Troubleshooting and Debugging “Stuff” on Linux

So, today, I decided to write a doc on how to troubleshoot
and debug on a Linux system. What I’m going for is a
cross between my old System Tuning
page and Mac OS X Debugging Magic.

That is, what tools, tips, tactics, processes, utilities, approaches, etc.
people can use to debug “stuff”, where “stuff” is basically defined as
“software that is not doing what it should be doing”.

It’s theoretically focused on Red Hat Linux, but so far it’s
mostly generic to any Linux system.

So, I ask you, what are your debugging/troubleshooting secrets?


This is a guide to basic, and not so basic, troubleshooting and
debugging on Red Hat Linux systems. Goals include describing
common tools and how to use them, how to find information, etc.
Basically, info that may be helpful to someone diagnosing
a problem. The emphasis will be on software issues, but it
might cover hardware as well.



Environment
   - allowing cores
   - thread stuff (LD_ASSUME_KERNEL, etc)
   - glibc malloc stuff
     - all the glibc env variable stuff
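
For the "allowing cores" item, a minimal sketch of checking and raising the core-file limit for the current shell. The function name show_core_settings is made up for illustration, and the commented MALLOC_CHECK_ line uses a hypothetical program name; on newer glibc releases that variable may also need the separate malloc debug library preloaded, so treat it as a starting point only.

```shell
# Show the current core-file size limit and, on Linux, where the kernel writes cores.
show_core_settings() {
    echo "core size limit: $(ulimit -c)"
    if [ -r /proc/sys/kernel/core_pattern ]; then
        echo "core pattern: $(cat /proc/sys/kernel/core_pattern)"
    fi
}

# Allow core files for this shell and its children only.
# This can fail if the hard limit is lower, so report instead of aborting.
ulimit -c unlimited 2>/dev/null || echo "could not raise core limit" >&2
show_core_settings

# glibc malloc checking is switched on through the environment, for example:
#   MALLOC_CHECK_=3 ./myprog    # ./myprog is a hypothetical program
```

A limit of 0 means no core files are written at all, which is a common reason a crash leaves nothing behind to load into gdb.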


Tools
   - strace
	- simple usage
	- filtering output
	- examples
	- use as profiling
	- see what files are open
	- network connections

   - ltrace
	- simple usage
	- filtering output

   - gdb 
	- simple usage
	- debugging crashes
	- reporting crashes
	- getting stack traces
	- *-debuginfo
	- no optimization

   - python debugging
   - perl debugging
   - sh debugging

   - bug-buddy etc

   - top
   - ps 
   - free
   - sysstat/sar
   - tcpdump/ethereal (now Wireshark)
   - netstat 
	- what process is doing what and
	  to whom over the network
	- number of sockets open
	- socket status
   - lsof/fuser
   - nm/ld/ other linker stuff
   - file
   - netcat
	- to see network stuff
   - md5sum
	- verifying files
	- verifying ISOs
   - diff
	- compare versions for diffs
   - find
	- things changed recently
	- executables
	- owned by foo:bar
   - ls/stat
        - finding [sym|hard] links
        - out of space
   - df
	- out of space
   - watch
        - used to see if process output changes
        - free, df, etc
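
Two of the items above, "find: things changed recently" and "md5sum: verifying files", might look roughly like this. The function names recently_changed and checksum_roundtrip are invented for this sketch; the find and md5sum options themselves are the standard ones.

```shell
# List regular files under a directory modified in the last N minutes.
# Defaults: current directory, 60 minutes.
recently_changed() {
    find "${1:-.}" -type f -mmin "-${2:-60}" 2>/dev/null
}

# Record a checksum next to the file, then verify the file against it.
# md5sum -c exits non-zero if the file no longer matches.
checksum_roundtrip() {
    md5sum "$1" > "$1.md5" && md5sum -c "$1.md5"
}
```

For example, `recently_changed /etc 30` shows what was touched under /etc in the last half hour, which is often the quickest answer to "what changed?". Wrapping a command in watch, as in `watch df -h`, reruns it every couple of seconds so you can see whether the output moves.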



Logs
   - messages, dmesg, lastlog, etc
   - log filtering tools?
	
Using RPM to help troubleshoot
   - package verify
   - missing deps
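
A rough sketch of the "package verify" idea. The wrapper name verify_package is made up here, and it only guards against rpm being absent; the rpm options shown are the standard query and verify ones.

```shell
# Verify an installed package against the RPM database.
# Each output line flags what differs, e.g. "S.5....T.  c /etc/foo.conf":
#   S = size, 5 = digest, T = mtime; "c" marks a config file.
verify_package() {
    if ! command -v rpm >/dev/null 2>&1; then
        echo "rpm not found on this system" >&2
        return 127
    fi
    rpm -V "$1"
}

# Related queries (pass a package name or a path):
#   rpm -qf /path/to/file   # which package owns this file
#   rpm -qc PACKAGE         # config files shipped by the package
#   rpm -qR PACKAGE         # what the package requires
```

No output from the verify step means the installed files still match what the package shipped. Changed config files are usually expected; changed binaries or libraries deserve a closer look.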

Types Of Problems
   - missing stuff
     - files
     - libs
     - perms
     - deps

  - networking
    - firewall checks
    - ifconfig
    - tcpdump/ethereal (now Wireshark)
    - netcat
    - netstat

  - programs crashing
   - strace
   - ltrace
   - gdb
   - debuginfo
   - core files

  - configs screwed up
   - finding config files with rpm
   - gconf
   - .rpmnew/etc
   - diff
   - stat (when did it change)
     - atime, etc
   - apps not reloading config
     - nohup
     - daemon already running
     - process not restarting 
       - bashrc for example

  - kernel issues
   - single user
   - init=/bin/bash
   - bootloader configs
   - log levels
 
  - stuff not writing to disk
    - df (space?)
    - stat/ls (perms?)
    - mount (mount ro?)

  - files doing weird stuff
    - view hidden symbols
    - ending new line
    - vi -b, then :set list, to see hidden stuff
    - od to look for hidden stuff

  - env stuff
    - things work as user/not root, vice versa
    - printenv
    - what basic env stuff means
    - su/sudo issues
    - env -i to launch with clean env
    - su -s, etc 
    - sudo -l

  - shell scripting
   - echo
   - sh -x 
   - aliases
   - trap 
   - bash debugger

  - name resolution
   - usage of dig
   - /etc/hosts
   - nscd
   - /etc/nsswitch.conf
   - splat names/host typos

  - auth info
   - getent
   - ypwhich/ypmatch/ypcat
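
To make a couple of the items above concrete, here is a small sketch for "od to look for hidden stuff" and "env -i to launch with clean env". The helper names show_hidden and run_clean are invented for illustration, and the PATH value passed to env is just an assumption about where the basic tools live.

```shell
# Dump a file byte by byte so stray characters become visible:
# od -c prints a carriage return as \r, a tab as \t, a newline as \n.
show_hidden() {
    od -c "$1"
}

# Run a command with an almost empty environment (only a minimal PATH).
# If the problem disappears, something in the normal environment is the cause.
run_clean() {
    env -i PATH=/usr/bin:/bin "$@"
}
```

For instance, `show_hidden script.sh` quickly reveals DOS line endings in a script that fails with a "bad interpreter" error, and `run_clean env` prints exactly what the child process sees. For shell scripts themselves, `sh -x script.sh` echoes each command as it runs.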
   

 
App specific
   - apache
     - scoreboard stuff
     - module debugging
     - log files
     - init file "configtest"
     - -X debug mode

  - php
	- 

  - gtk apps
    - event debugging stuff?

  - X apps
    - nosync stuff
    - X log

  - ssh
    - debug flags
    - sshd -d -d 
   
  - samba?

  - pam/auth/nss
    - logging options?
    - getent

  - sendmail