System Monitoring and Logging

Commands

  • uptime
  • top 
  • powertop 
  • iftop 
  • vmstat
  • netstat
  • lsof

Configuration

  • /var/log

Introduction

Administrators must be able to determine what the system is doing at all times. Problems with your Linux host are invisible until you reveal them with the right commands. This lecture focuses on the tools used to discover where your host's resources are being spent. 

Memory Resources

The fundamental unit of memory to an operating system is called a page. When a process allocates memory, it allocates it in pages. Most processors use a 4 kB page size, though newer processor architectures may support simultaneous use of different page sizes. To see the page size on your system, run the command:

    $ getconf PAGE_SIZE
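
Because memory is handed out in whole pages, a size reported in kB can be turned into a page count by dividing by the page size. The following is a minimal sketch of that arithmetic. The name kb_to_pages is a hypothetical helper, not a standard command, and the optional second argument exists only so the page size can be overridden:

```shell
# kb_to_pages: convert a size in kB to a number of pages.
# Usage: kb_to_pages SIZE_KB [PAGE_SIZE_BYTES]
# (hypothetical helper name, for illustration only)
kb_to_pages() {
    size_kb=$1
    # Default to the system page size unless one is given explicitly.
    page_bytes=${2:-$(getconf PAGE_SIZE)}
    # Round up so a partially used page still counts as a whole page.
    echo $(( (size_kb * 1024 + page_bytes - 1) / page_bytes ))
}
```

For example, kb_to_pages 7212 4096 prints 1803.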

Each process in Linux "sees" a completely empty memory space (or nothing but empty pages). As a process allocates new memory pages Linux finds space for those pages in main memory. The diagram below shows how two processes have their pages sprinkled throughout main memory.  You can examine how much memory a process uses by looking into the /proc filesystem. For example, to see how much memory the process with PID 1212 is using, run the command:

$ sudo cat /proc/1212/status | grep Vm
VmPeak:    7212 kB
VmSize:    7212 kB
VmLck:       0 kB
VmPin:       0 kB
VmHWM:     360 kB
VmRSS:     360 kB
VmData:     184 kB
VmStk:     136 kB
VmExe:      44 kB
VmLib:    1908 kB
VmPTE:      36 kB
VmSwap:       0 kB
 
There's a lot more information in that file, but grepping for "Vm" limits the output to virtual memory usage. The output shows that the process's virtual memory size (VmSize) is 7212 kB (or 1803 pages of 4 kB), of which 360 kB (VmRSS) is currently resident in physical memory. We can see the total memory usage for the system by looking in the /proc/meminfo pseudo file:

$ cat /proc/meminfo
MemTotal:        8060452 kB
MemFree:          324508 kB
Buffers:          493664 kB
Cached:          2559792 kB
SwapCached:            8 kB
Active:          4356928 kB
Inactive:        2238916 kB
Active(anon):    3229324 kB
Inactive(anon):  1110968 kB
Active(file):    1127604 kB
Inactive(file):  1127948 kB
Unevictable:        1588 kB
Mlocked:            1588 kB
SwapTotal:       5521404 kB
SwapFree:        5520828 kB
...[snip]...

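The output shows that my system is using 7735944 kB (8060452 kB total − 324508 kB free) of memory. Much of that is buffers and cache (the Buffers and Cached lines), which Linux can reclaim when processes need the memory.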
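
The same subtraction can be scripted. This is a minimal sketch, assuming the MemTotal and MemFree field names shown in the output above. The name mem_used_kb is a hypothetical helper:

```shell
# mem_used_kb: read /proc/meminfo-style text on stdin and print
# MemTotal minus MemFree, in kB. (hypothetical helper name)
mem_used_kb() {
    awk '$1 == "MemTotal:" { total = $2 }
         $1 == "MemFree:"  { free = $2 }
         END { print total - free }'
}
```

Run it as mem_used_kb < /proc/meminfo. With the figures above it prints 7735944.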

Swap

The picture on the right shows what happens when processes on the operating system request more memory than is physically present in the system. Linux can still serve the additional pages, but the overflow goes into swap. Swap is space on the disk used to back memory. Having swap means that Linux can gracefully handle a short-term oversubscription of memory. You can see how much swap is used by looking in /proc/meminfo:

$ cat /proc/meminfo  | grep Swap 
SwapCached:            8 kB
SwapTotal:       5521404 kB
SwapFree:        5520828 kB

If your machine has never needed more memory than it has, the swap space will be empty. Otherwise it will contain some pages. Even if you have free memory available you may still see some pages in swap, because Linux will not swap them in until they're needed by the program that owns them. Depending on the program, that might be a long while.

CPU Resources

The following commands show you what your CPU is doing:
  1. uptime
  2. top
  3. powertop 
  4. vmstat
Here's an example of running the uptime command:

$ uptime
 12:29:31 up 6 days, 19:10,  3 users,  load average: 0.07, 0.13, 0.09

The output tells you how long the system has been up, and therefore when the last reboot was. This is important to know because an unexpected reboot tells you there's been a crash or a power failure. The load average is also a key number for understanding how busy your system is. The three numbers are the average number of runnable processes (on Linux, processes waiting on disk I/O are counted too) over the last:
  1. 1 minute
  2. 5 minutes
  3. 15 minutes
When a number exceeds the number of processors on the machine, it indicates that the machine is being asked to work beyond its peak capacity and that the scheduler is rationing time. In some cases that may be normal, but usually it means that something has gone out of control.
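
The comparison between load and processor count can be sketched as a small shell function. The name load_exceeds_cpus is a hypothetical helper. The commented usage assumes a Linux host with /proc/loadavg and the nproc command from GNU coreutils:

```shell
# load_exceeds_cpus: exit 0 when LOAD is greater than NCPU, else exit 1.
# Usage: load_exceeds_cpus LOAD NCPU   (hypothetical helper name)
load_exceeds_cpus() {
    # awk does the floating-point comparison that sh arithmetic cannot.
    awk -v load="$1" -v ncpu="$2" 'BEGIN { exit !(load + 0 > ncpu + 0) }'
}

# Example on a live host (assumes /proc/loadavg and nproc exist):
# read -r one five fifteen rest < /proc/loadavg
# if load_exceeds_cpus "$one" "$(nproc)"; then
#     echo "1-minute load average exceeds the CPU count"
# fi
```

Checking the 5 and 15 minute figures the same way helps separate a brief spike from sustained overload.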

The top command shows a continuously updated view of running processes: 

$ top 
top - 12:35:14 up 6 days, 19:15,  3 users,  load average: 0.19, 0.19, 0.13
Tasks: 270 total,   1 running, 269 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.8 us,  0.1 sy,  0.0 ni, 99.0 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:   8060480 total,  7504036 used,   556444 free,   863164 buffers
KiB Swap:  5521404 total,        0 used,  5521404 free.  3783600 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND     
 2338 root      20   0  503784  79588  66292 S  12.7  1.0  21:01.97 Xorg        
20945 mimatera  20   0   24944   1568   1076 R   6.4  0.0   0:00.01 top         
    1 root      20   0   34028   3432   1500 S   0.0  0.0   0:01.42 init        
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.04 kthreadd    
    3 root      20   0       0      0      0 S   0.0  0.0   0:00.10 ksoftirqd/0 
    4 root      20   0       0      0      0 S   0.0  0.0   0:33.69 kworker/0:0 
    5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:+ 
    7 root      20   0       0      0      0 S   0.0  0.0   0:39.11 rcu_sched   
    8 root      20   0       0      0      0 S   0.0  0.0   0:10.65 rcuos/0     
    9 root      20   0       0      0      0 S   0.0  0.0   0:38.06 rcuos/1     
   10 root      20   0       0      0      0 S   0.0  0.0   0:09.26 rcuos/2     
   11 root      20   0       0      0      0 S   0.0  0.0   0:07.35 rcuos/3     
   12 root      20   0       0      0      0 S   0.0  0.0   0:03.32 rcuos/4     
   13 root      20   0       0      0      0 S   0.0  0.0   0:03.67 rcuos/5     
   14 root      20   0       0      0      0 S   0.0  0.0   0:03.46 rcuos/6     
   15 root      20   0       0      0      0 S   0.0  0.0   0:04.25 rcuos/7     
   16 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcu_bh      
   17 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuob/0     
   18 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuob/1     
   19 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuob/2     

The top command is a good "first look" at what the system is doing. You can sort the output by different columns. The vmstat command is similar to top in that it shows you an updating view of the system's performance. With no arguments it shows you the averages since boot:

$ vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 543328 863172 3797364    0    0     1     2    2    1  1  0 99  0  0

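Given a delay in seconds, it prints a new sample at that interval until interrupted. The -n argument prints the header only once instead of repeating it periodically: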

$ vmstat -n 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 543212 863176 3794008    0    0     1     2    2    1  1  0 99  0  0
 0  0      0 543204 863176 3793984    0    0     0     0  359  916  0  0 99  0  0
 0  0      0 543088 863176 3793980    0    0     0    24  340  818  0  0 99  0  0
 0  0      0 543088 863176 3794004    0    0     0    60  327  569  0  1 99  0  0
 0  0      0 542220 863176 3794004    0    0     0     0  257  483  0  0 100  0  0
 0  0      0 542220 863176 3793992    0    0     0     0  225  401  0  0 100  0  0
 0  0      0 543220 863176 3794000    0    0     0     0  216  444  0  0 100  0  0
 0  0      0 543220 863176 3793988    0    0     0     0  197  223  0  0 100  0  0
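
When a run of samples is long, it can help to reduce it to one number. This is a minimal sketch that reports the lowest idle percentage seen. It assumes the column layout shown above, where id is the 15th field. The name vmstat_min_idle is a hypothetical helper:

```shell
# vmstat_min_idle: read vmstat output on stdin and print the lowest
# value seen in the "id" (idle) column. (hypothetical helper name)
# Assumes "id" is field 15, as in the layout shown above.
vmstat_min_idle() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 15 {
             # Header lines do not start with a number, so they are skipped.
             if (min == "" || $15 + 0 < min) min = $15 + 0
         }
         END { if (min != "") print min }'
}
```

A possible use is vmstat -n 1 10 | vmstat_min_idle, which takes ten one-second samples. Keep in mind that the first sample is the average since boot.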

Powertop is most useful on platforms that support power monitoring, like laptops and mobile devices. Most desktops don't support power monitoring, but powertop still shows you which processes and devices are using power; it just can't tell you how much in watts. Powertop is driven by text menus.

Files and Network Sockets

The most important tool for monitoring files on your host is lsof (list open files). lsof lists all the open file handles for every process. On a busy machine its output can run to thousands of lines, so if you're looking for something specific you can pipe the output through grep:

$ sudo lsof | grep Desktop 
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/1000/gvfs
      Output information may be incomplete.
bash      20704              mimatera  cwd       DIR               0,36      4096    4849689 /home/mimatera/Desktop
sudo      21389                  root  cwd       DIR               0,36      4096    4849689 /home/mimatera/Desktop
grep      21390              mimatera  cwd       DIR               0,36      4096    4849689 /home/mimatera/Desktop
lsof      21394                  root  cwd       DIR               0,36      4096    4849689 /home/mimatera/Desktop
lsof      21395                  root  cwd       DIR               0,36      4096    4849689 /home/mimatera/Desktop
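
When the output runs to thousands of lines, a per-command tally can be a quicker summary than grep. This is a minimal sketch that counts lsof rows by their first column. The name count_open_by_command is a hypothetical helper:

```shell
# count_open_by_command: read lsof output on stdin and print how many
# rows each command has, busiest first. (hypothetical helper name)
count_open_by_command() {
    # Skip the header row, then tally the COMMAND column.
    awk '$1 != "COMMAND" { n[$1]++ }
         END { for (c in n) print n[c], c }' | sort -k1,1nr -k2,2
}
```

A possible use is sudo lsof 2>/dev/null | count_open_by_command. Redirecting standard error keeps lsof's warnings out of the tally.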

You can find open files by user ID and other selectors. See the manual page for the full list. The netstat command shows you the open network sockets on a host. The output of lsof also shows you sockets but doesn't give you specifics; netstat does. The following command shows you all active (non-listening) TCP connections, with numeric addresses and ports:

$ netstat -tn 
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 172.30.5.41:2049        192.168.0.19:860        ESTABLISHED
tcp        0      0 172.30.5.41:2049        192.168.0.12:939        ESTABLISHED
tcp       38      0 172.30.5.41:46232       54.192.145.245:443      CLOSE_WAIT 
tcp        0      0 172.30.5.41:2049        192.168.0.14:640        ESTABLISHED
tcp        1      0 172.30.5.41:37751       54.174.174.143:443      CLOSE_WAIT 
tcp        0      0 172.30.5.41:42068       108.160.169.178:443     ESTABLISHED
tcp        0      0 172.30.5.41:2049        192.168.0.11:986        ESTABLISHED
tcp6       0      0 2607:f380:80f:f42:39939 2607:f8b0:4005:803::443 ESTABLISHED
tcp6       0      0 2607:f380:80f:f42:33591 2607:f8b0:4005:803::443 ESTABLISHED
tcp6       0      0 2607:f380:80f:f42:39941 2607:f8b0:4005:803::443 ESTABLISHED
tcp6       0      0 2607:f380:80f:f42:44852 2607:f8b0:400e:c04:5228 ESTABLISHED
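
A long connection list is easier to read as a count per state. This is a minimal sketch, assuming the six-column layout shown above with the state in the last column. The name count_tcp_states is a hypothetical helper:

```shell
# count_tcp_states: read "netstat -tn" output on stdin and print the
# number of connections in each state. (hypothetical helper name)
count_tcp_states() {
    # Only rows whose protocol starts with "tcp" are connections;
    # the two header lines are ignored.
    awk '$1 ~ /^tcp/ { n[$6]++ }
         END { for (s in n) print n[s], s }' | sort -k1,1nr -k2,2
}
```

A possible use is netstat -tn | count_tcp_states. A large and growing CLOSE_WAIT count usually points at a local program that is not closing its sockets.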

The 'l' option shows you listening sockets. Those are sockets where your host is listening for inbound connections:

$ netstat -ltnp 
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:30003         0.0.0.0:*               LISTEN      -               
tcp        0      0 192.168.122.1:53        0.0.0.0:*               LISTEN      -               
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      -               
tcp        0      0 127.0.0.1:631           0.0.0.0:*               LISTEN      -               
tcp        0      0 0.0.0.0:51257           0.0.0.0:*               LISTEN      -               
tcp        0      0 0.0.0.0:46458           0.0.0.0:*               LISTEN      -               
tcp        0      0 0.0.0.0:17500           0.0.0.0:*               LISTEN      21298/dropbox   
tcp        0      0 0.0.0.0:57534           0.0.0.0:*               LISTEN      -               
tcp        0      0 127.0.0.1:17600         0.0.0.0:*               LISTEN      21298/dropbox   
tcp        0      0 0.0.0.0:2049            0.0.0.0:*               LISTEN      -               
tcp        0      0 127.0.0.1:17603         0.0.0.0:*               LISTEN      21298/dropbox   
tcp        0      0 0.0.0.0:33545           0.0.0.0:*               LISTEN      -               
tcp        0      0 0.0.0.0:54733           0.0.0.0:*               LISTEN      -               
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      -               
tcp6       0      0 :::22                   :::*                    LISTEN      -               
tcp6       0      0 ::1:631                 :::*                    LISTEN      -               
tcp6       0      0 :::35447                :::*                    LISTEN      -               
tcp6       0      0 :::36764                :::*                    LISTEN      -               
tcp6       0      0 :::34369                :::*                    LISTEN      -               
tcp6       0      0 :::2049                 :::*                    LISTEN      -               
tcp6       0      0 :::47397                :::*                    LISTEN      -               
tcp6       0      0 :::51595                :::*                    LISTEN      -               
tcp6       0      0 :::111                  :::*                    LISTEN      -               
tcp6       0      0 :::80                   :::*                    LISTEN      -               

The 'p' flag tells netstat to list the program that owns each socket. You can only see that for processes that you own (unless you're root).
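
To see only which ports are open for inbound connections, the listening output can be reduced to a sorted list of port numbers. This is a minimal sketch, assuming the column layout shown above. The name listening_ports is a hypothetical helper:

```shell
# listening_ports: read "netstat -ltn" output on stdin and print each
# listening TCP port once, in numeric order. (hypothetical helper name)
listening_ports() {
    awk '$1 ~ /^tcp/ && $6 == "LISTEN" {
             port = $4
             sub(/.*:/, "", port)   # strip everything up to the last colon
             print port
         }' | sort -n -u
}
```

A possible use is netstat -ltn | listening_ports. A port that is open on both IPv4 and IPv6 is reported only once.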

