Wednesday, 19 September 2012

The Linux Sysadmin's Toolkit

If you're an admin for Linux servers that are going to be doing any real kind of work, you'll need to know how to make sure they're running right. You need to understand how the CPU, memory and disk get utilised by the OS, and to do that you need to know how to use a few essential tools and how to interpret the results.

I'll try and write this so admins coming from a Windows background can understand how Linux works compared to Windows.


There are 3 things you need to be concerned about with regards to how the system is performing from a processor point of view.

1. CPU utilisation percentage
2. CPU run queue (load)
3. CPU I/O wait

In Windows, you're mostly concerned with CPU utilisation as a single percentage figure, with the maximum being 100%. This isn't really the whole story though.

In Linux, if you have 4 cores in total, the CPU utilisation will be shown as a percentage with the maximum 400%. That may seem strange to someone used to seeing 100% as the maximum but it actually makes more sense to add up the totals of each core and show you all the cores together.

The thing to understand about this is that CPU utilisation isn't actually how busy your system is. It's a part of it, but not the whole story. It's simply a representation of how long the CPU was seen as being busy over a time period. If the system looks at a CPU core for 10ms, and that core was busy for 2ms, it will be 20% busy. It will then sample the other 3 cores, and add those to the total. If they were also all busy for 2ms out of that 10, the total CPU utilisation of the system will be 80%, with the maximum being 400%.
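The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the sum, not how the kernel does its accounting; the function name and the 10ms sample window are made up for the example.

```python
def total_cpu_utilisation(busy_ms_per_core, sample_ms):
    """Sum per-core busy percentages, so the maximum is 100% x number of cores."""
    per_core = [100.0 * busy / sample_ms for busy in busy_ms_per_core]
    return per_core, sum(per_core), 100.0 * len(busy_ms_per_core)


# 4 cores, each busy for 2ms out of a 10ms sample, as in the example above.
per_core, total, maximum = total_cpu_utilisation([2, 2, 2, 2], sample_ms=10)
print(per_core)
print(f"{total:.0f}% of a possible {maximum:.0f}%")
```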

We have a percentage of how busy the CPU is, why isn't that the whole story?

Well, if a CPU core is used by 1 process for 2ms out of 10ms, but for those 2ms there are also 5 other processes waiting to jump on that core and do stuff, a utilisation of 20% isn't really accurate, is it? For those 2ms, the system is trying to do 6 times more than it actually can.

When you understand that both the CPU utilisation _and_ the CPU load are factors to be taken in conjunction with each other, you can interpret what the tools tell you.


top - 16:49:48 up 14 days,  6:18,  5 users,  load average: 2.75, 3.64, 3.87
Tasks: 315 total,   1 running, 314 sleeping,   0 stopped,   0 zombie
Cpu(s):  8.3%us,  1.6%sy,  0.0%ni, 86.4%id,  3.1%wa,  0.1%hi,  0.5%si,  0.0%st
Mem:  98871212k total, 81501412k used, 17369800k free,    50108k buffers
Swap:  9446212k total,    32700k used,  9413512k free,  7281528k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
32031 mysql    -10   0 71.4g  69g 6132 S   97 74.2  13958:45 mysqld
28358 root      20   0 44176  15m 1280 D   63  0.0 210:01.71 mysqlbackup
19749 root      20   0 69624  18m 3160 S    7  0.0   3:02.40 iotop
 6183 root      RT   0  161m  37m  17m S    5  0.0   1188:46 aisexec
 5397 root      39  19     0    0    0 S    1  0.0 241:39.12 kipmi0
 2971 root      15  -5     0    0    0 S    0  0.0  65:52.74 kjournald
    1 root      20   0  1064  392  324 S    0  0.0   0:16.52 init

top is the standard age-old tool for quickly looking at what's going on. The system above has 8 cores, which are hyperthreaded, so I know that it has 16 logical processors available (generally found out from cat /proc/cpuinfo). When I look at the processes, the mysqld process is taking 97%, but that's from a maximum of _around_ 1600%.
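If you'd rather not count the entries in /proc/cpuinfo by eye, something like the following Python sketch does it for you. It assumes each logical processor shows up as a line starting with "processor", which is the usual layout on x86 Linux; the sample text is a made-up, heavily trimmed stand-in for the real file.

```python
def count_logical_cpus(cpuinfo_text):
    """Count the 'processor : N' entries, one per logical processor."""
    return sum(
        1
        for line in cpuinfo_text.splitlines()
        if line.split(":")[0].strip() == "processor"
    )


# Hypothetical, heavily trimmed /proc/cpuinfo content for a 2-thread machine.
sample = """processor\t: 0
model name\t: Example CPU
processor\t: 1
model name\t: Example CPU
"""

logical = count_logical_cpus(sample)
print(f"{logical} logical processors, so top can show up to {logical * 100}%")
```

On a real server you would pass in open("/proc/cpuinfo").read() instead of the sample text.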

Then, as I said above, we can also look at the system load, which is represented as load average. In the output above, I can see that the first figure of 2.75 is the average over the last 1 minute, 3.64 over the last 5 minutes and 3.87 over the last 15 minutes.

What do these figures mean?

While the system was sampling how much was running on each usable CPU core, it also looked at how many processes were waiting to run. Out of 16 queues, around 3 were waiting at any time, 1 process has taken 97% of 1600%, and another 63%. Therefore, actually, what looked like the system was fairly busy, really has a lot of room to get busier. Until we're consistently filling almost all of the queues (16 on this system), and the CPU utilisation is getting nearer a total of 1600%, we don't need to worry.
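To put numbers on that, here is a small Python sketch that pulls the load averages out of the first line of top and compares them with the number of logical processors. The function name is made up for the example, and the 16 is simply what I know about this particular system.

```python
import re


def parse_load_average(top_header):
    """Extract the 1, 5 and 15 minute load averages from top's first line."""
    match = re.search(
        r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", top_header
    )
    if match is None:
        raise ValueError("no load average found")
    return tuple(float(value) for value in match.groups())


header = "top - 16:49:48 up 14 days,  6:18,  5 users,  load average: 2.75, 3.64, 3.87"
logical_cpus = 16  # 8 hyperthreaded cores, as on the system above

one, five, fifteen = parse_load_average(header)
for label, load in (("1 min", one), ("5 min", five), ("15 min", fifteen)):
    share = 100 * load / logical_cpus
    print(f"{label}: load {load} is {share:.0f}% of {logical_cpus} queues")
```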

The following is a Munin graph of the same system. We can see that the Max idle is 1600, and we're nowhere near it.

And this graph shows the load average

Again, it backs up what we saw from top. We don't have to worry about the load on this system, and we know this by combining the utilisation and load average to see what's really going on.

But what about IO?

A 3rd variable comes into the mix which complicates it a little further, which is IO wait. If a process is running on a CPU core, but you have a slow IO subsystem (e.g. a slow disk, or a saturated fibre channel host bus adapter), the process can be left waiting for an IO request to complete. That waiting time is reported as IO wait (the wa figure in top), and because Linux counts processes blocked on IO in the load average, it pushes the load average up too.

If you're seeing high CPU usage and need to find out why, you can see if it's IO wait by using vmstat.

These figures are from a web server. You can see that the io column has no blocks in and a few blocks out now and again. The blocks out are likely to be log files being written, and as it's a web server, everything is already in memory and doesn't need to be read in. No IO issues here.

procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 3  0    100 266688 302404 5135804    0    0     0     0 17822 24008 15  1 84  0  0
15  0    100 266532 302404 5135820    0    0     0   124 16510 24104 12  1 87  0  0
 0  0    100 265504 302404 5135848    0    0     0     0 18332 24488 17  2 82  0  0
 4  0    100 264312 302404 5135852    0    0     0     0 16986 23787 14  2 84  0  0
 6  0    100 265476 302404 5135864    0    0     0   344 16711 23948 15  1 83  1  0

This one is from a database server. You can see that the blocks in and blocks out (1 block is 1KB) are a lot larger, and as I ran this as vmstat 1 it's reporting every second, so it was reading 30-50MB/s and writing 10-20MB/s.

procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 3  0     33    879     57  24340    0    0 36108 15304 27169 42413 10  3 83  4  0
 2  1     33    852     57  24384    0    0 36576 15762 26833 40486  9  3 85  4  0
 2  0     33    780     57  24439    0    0 47296  9735 21587 33633  8  2 85  4  0
 0  1     33    721     57  24499    0    0 49496 19881 22993 36320  8  3 86  4  0
 4  0     33    683     57  24547    0    0 42176 13993 23573 36176  8  2 87  2  0
 5  2     33    632     57  24595    0    0 38748 10611 26785 41753 11  3 76 10  0
 4  0     33    584     57  24636    0    0 37636 12618 23149 36298 14  2 80  4  0
 6  0     33    551     57  24685    0    0 34060 13504 25268 39642 14  2 79  5  0
 3  0     33    481     57  24739    0    0 50360 10973 24150 37552 13  2 82  3  0

That's a lot of throughput. Is it affecting the CPU by waiting on IO? Well, the figures in the 'wa' column under 'cpu' are percentages out of 100%, and they're mostly in single digits compared to 76-87% in the 'id' (idle) column, so the CPU is not waiting on IO for very long at all. Therefore, this server is heavily utilised for IO, but thanks to a decent IO subsystem it's not affecting CPU utilisation or system load.
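Here's a rough Python sketch of how the throughput and IO wait figures fall out of the vmstat rows. The column list is copied from the header above, the three rows are taken from the database server output, and the conversion assumes 1 block is 1KB and a 1 second interval, as described.

```python
VMSTAT_COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa st".split()


def parse_vmstat_row(row):
    """Map one vmstat data row onto its column names."""
    return dict(zip(VMSTAT_COLUMNS, (int(field) for field in row.split())))


rows = [
    "3  0     33    879     57  24340    0    0 36108 15304 27169 42413 10  3 83  4  0",
    "0  1     33    721     57  24499    0    0 49496 19881 22993 36320  8  3 86  4  0",
    "5  2     33    632     57  24595    0    0 38748 10611 26785 41753 11  3 76 10  0",
]

for row in rows:
    sample = parse_vmstat_row(row)
    # With a 1 second interval, bi/bo are KB per second (assuming 1 block = 1KB).
    read_mb = sample["bi"] / 1024
    write_mb = sample["bo"] / 1024
    print(
        f"read {read_mb:.1f}MB/s, write {write_mb:.1f}MB/s, "
        f"idle {sample['id']}%, IO wait {sample['wa']}%"
    )
```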

IO is a bit easier to see by using iostat, which gives you % utilised of your IO subsystem.

# iostat -x -d 1
Linux (xxxxxxx)      09/19/12        _x86_64_

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00  349.00   71.00 92320.00 21136.00   270.13     0.99    2.35   1.04  43.60
sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-0            145.00  1795.00  349.00   71.00 92320.00 21136.00   270.13     1.15    2.77   1.07  44.80
dm-1              0.00     0.00  493.00 1857.00 91976.00 14856.00    45.46    30.01   13.64   0.19  44.40
sdd               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00   14.00     0.00  9490.00   677.86     0.20   14.57   2.00   2.80
dm-2              0.00     0.00    0.00   14.00     0.00  9490.00   677.86     0.21   15.14   2.57   3.60
dm-3              0.00     0.00    0.00   14.00     0.00  9490.00   677.86     0.21   15.14   2.57   3.60
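The rsec/s and wsec/s columns are in sectors, not KB, so they need converting before you can compare them with vmstat. A small Python sketch, assuming the usual 512-byte sector that iostat reports in (the figures are copied from the output above):

```python
SECTOR_BYTES = 512  # assumption: iostat counts 512-byte sectors


def sectors_to_mb(sectors_per_second):
    """Convert a sectors-per-second rate into MB per second."""
    return sectors_per_second * SECTOR_BYTES / (1024 * 1024)


# device: (rsec/s, wsec/s, await in ms, %util), copied from the output above
devices = {
    "sdb": (92320.00, 21136.00, 2.35, 43.60),
    "dm-1": (91976.00, 14856.00, 13.64, 44.40),
    "sde": (0.00, 9490.00, 14.57, 2.80),
}

for name, (rsec, wsec, await_ms, util) in devices.items():
    print(
        f"{name}: read {sectors_to_mb(rsec):.1f}MB/s, "
        f"write {sectors_to_mb(wsec):.1f}MB/s, "
        f"await {await_ms}ms, {util}% utilised"
    )
```

The sdb read rate comes out in the same 30-50MB/s range that vmstat showed for this kind of workload.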

Even easier to use is iotop, which shows IO usage per process in a top style interface.

Finally, on to memory

Memory is really misunderstood in Linux. Unused memory is inefficient. Some people see the below and panic.

# free -m
             total       used       free     shared    buffers     cached
Mem:         96553      93561       2992          0         73      21044
-/+ buffers/cache:      72443      24110
Swap:         9224         31       9192

That's 93GB used of 96GB installed RAM in the server going by the Mem: row.

Wrong. The Linux kernel grabs as much memory as it can, leaving only a small amount unused and then dishes it out to applications which request it. Anything which isn't requested by an application is then utilised for buffers and caches, including the IO buffer. Read the values in the -/+ buffers/cache line. 24GB is free, and 72GB is used by applications. That's obviously still a lot, but this is a database server, and we want to give the database engine as much memory to cache stuff as possible.

Here's another one from a slightly more modest server:

# free -m
             total       used       free     shared    buffers     cached
Mem:           463        397         65          0        134        136
-/+ buffers/cache:        126        336
Swap:          475         10        465

463MB of RAM and only 65MB free?! Nope, 336MB free as the kernel hasn't needed to dish it out and has allocated it to buffers and caches.
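The -/+ buffers/cache line is just the Mem: row with the buffers and cached columns moved from "used" to "free". A quick Python sketch of that sum, using the figures from the two servers above (the function name is made up for the example):

```python
def real_memory_usage(used, free, buffers, cached):
    """Return (used by applications, available to applications) in the input units."""
    reclaimable = buffers + cached
    return used - reclaimable, free + reclaimable


# Figures in MB from the Mem: rows of the two free -m outputs above.
big = real_memory_usage(used=93561, free=2992, buffers=73, cached=21044)
small = real_memory_usage(used=397, free=65, buffers=134, cached=136)

print(f"database server: {big[0]}MB used by applications, {big[1]}MB available")
print(f"modest server: {small[0]}MB used by applications, {small[1]}MB available")
```

The results can be 1MB out compared with what free prints, because free -m rounds each column separately.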

Sunday, 2 September 2012

How To Make A Great CAT5 Cable

If you want a really nice rack install, or you're cabling up long runs of Cat 5/6 twisted pair cable with RJ45 connectors, you really need to make your own cables. It can be a bit of a faff getting the RJ45 connectors on the end, but I'll show you the easiest way I've found to do it.

You need:

* Crimping tool. Can't really do it without one.
* Side-cutters. If not included in the crimping tool.
* RJ45 connectors.
* Cat 5/6 cable.
* Recommended: Cable tester. Really helps if you're doing a lot.

1. Measure the cable to the correct length and cut.

2. Strip the sheath off using the crimp tool's sizer, or about 15mm.

3. There you'll see 4 twisted pairs of sheathed wires: 1 x Orange / Orange-White striped, 1 x Green / Green-White striped, 1 x Brown / Brown-White striped, 1 x Blue / Blue-White striped.

At this point, make sure your boots are on, if you're using them.

Don't forget your booties, it's cold outside!

4. Untwist all 4 pairs about 2 or 3 times, trying to straighten each wire as you do so.

5. If you can get them into the same kind of order as on the right, it will be a bit easier. Don't worry if not.

6. By pushing the cables together, get the order of Orange, Green and Brown, with the striped cables being first.

7. Then get the Blue cables, swap the stripe / plain order back to front, then push in between the Greens.

8. Try to straighten them as much as possible. If you need to trim them with side-cutters, now's the time.

9. While holding tightly, gently push the wires into the RJ45 blank, checking they're still in the correct order before pushing all the way home.

10. Then push all the way home very tightly. Check the wires on the edges are pushed up to the end.

11. Then crimp tightly.

12. And admire your handiwork, testing with a tester if you have one.
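If you lose track of the order, steps 6 and 7 can be written out as a short Python sketch. It only rearranges a list of colour names, and the result is the order commonly known as T568B (the names in the code are mine, not anything printed on the cable).

```python
# Step 6: Orange, Green and Brown pairs, striped wire first in each pair.
wires = []
for colour in ["orange", "green", "brown"]:
    wires += [f"{colour}-white", colour]

# Step 7: Blue pair with the plain wire first, pushed in between the Greens.
position = wires.index("green-white") + 1
wires[position:position] = ["blue", "blue-white"]

for pin, wire in enumerate(wires, start=1):
    print(pin, wire)
```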

Monday, 2 April 2012

Prying Dave's Folly

Hello Dave
So the UK government wants to monitor your emails, web usage, calls and texts. Let's all panic.

But, let's not. This government is akin to a racist on Twitter, expelling rubbish directly from their deep, dark fantasies, not allowing the inhibitions to take hold and rein them in. I doubt this one passed the Cabinet Reality Assessment Procedure (CRAP) test before going public. Fabulously, they've gone public very prematurely on this one, not bothering to actually consult anyone who knows what they're talking about, or listening to the wrong people, and at the same time showing their hand. While you should worry about the intentions and the implications, you needn't worry about it actually happening. Here's why.

Let's start with web site visits. This is the easy one. There are a few stages involved in getting your computer to communicate with a server somewhere else in the world which serves you web content. The first part is a DNS lookup. You type in a web address and your computer goes to your configured DNS server (usually your ISP's) to translate the DNS name to a numeric IP address for direct access. This DNS request is in plain text, transmitted in the clear, and can be intercepted by your ISP. So it's relatively easy for the government to pressurise your ISP into syphoning off all DNS requests into their own systems to log and analyse. They would be supplied with lists of web sites that you have requested in your browser, or indeed, anything that your computer has been instructed to access, either by you or by some nefarious bit of spyware / malware you've been afflicted by. This is also something to be mindful of: not everything your computer accesses is initiated by you.
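To see just how readable a DNS request is, here's a Python sketch that builds a minimal query packet by hand, following the standard DNS wire format. It doesn't send anything; example.com is a placeholder name and the query ID is arbitrary.

```python
import struct


def build_dns_query(hostname, query_id=0x1234):
    """Build a minimal DNS query for an A record in standard wire format."""
    # Header: ID, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Name: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.split(".")
    ) + b"\x00"
    # QTYPE 1 = A record, QCLASS 1 = Internet.
    return header + qname + struct.pack("!HH", 1, 1)


packet = build_dns_query("example.com")
print(packet)
```

The name you asked for sits in the packet as plain ASCII, which is exactly what anyone in the middle gets to read.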

The next bit is the actual transfer of data, and where it gets interesting.

Years ago, it became apparent that we needed to secure information exchanges across the Internet. Something called Secure Sockets Layer (SSL) was invented, and then more recently, an enhanced version called Transport Layer Security (TLS) superseded it. This works by encrypting the data sent between the web site and your computer. For your specific session, it can only be decrypted at your end, or on the web server serving the content; nowhere in between.
By now, every web site should be SSL by default. Every web site you visit should make your address bar turn green. Those sites that don't do this just need a little kick up the owners' / administrators' arses. (Conveniently, a good way of doing this is to introduce something which will make users much more likely to visit if they do, such as a government snooping on you...) Then, if web browsers start attempting to connect to web sites using SSL first, and then falling back to plain text with a warning if they can't, all web sites would very quickly be SSL only.
So that's intercepting traffic in the middle taken care of, but what about the actual web sites? Well, Big Bad Dave can't get every single web site everywhere to log traffic for him, so there's no way they can monitor what you're actually doing on a web site that is secure. They can see where you're going from DNS lookups (and then only maybe, I'll elaborate later), but not what you're doing when you're there.
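If you want to check what a site actually negotiates, Python's standard ssl module can do it in a few lines. This is only a sketch: tls_details is a name I've made up, it needs network access when called, and the default context it uses insists on a valid certificate and matching hostname.

```python
import socket
import ssl


def tls_details(host, port=443, timeout=5.0):
    """Connect to host over TLS and return (protocol version, cipher name)."""
    context = ssl.create_default_context()  # verifies certificate and hostname
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as secure:
            return secure.version(), secure.cipher()[0]


# No network needed for this part: just show the defaults the context enforces.
context = ssl.create_default_context()
print("certificate required:", context.verify_mode == ssl.CERT_REQUIRED)
print("hostname checked:", context.check_hostname)
```

Calling tls_details with the name of a site you use will tell you which protocol version and cipher your session ended up with.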

Now that we've seen how SSL and TLS secure conversations between users and web sites, email security is a bit easier to understand. There are 2 major ways people use email: webmail and remote mail servers. Communications with webmail servers happen through your browser and get the exact same SSL encryption as visits to any other web site. Take GMail and Hotmail for example: both enforce SSL by default, so sending someone with a GMail address an email from your own GMail address means that the mail never goes outside of Google, and neither your ISP nor anyone else can see anything to do with what's in the mail, who it's for etc. They would need Google to give them info for that...

It gets slightly more complicated when mail goes outside of webmail. If you send an email from your GMail to an address elsewhere, and the mail server for that address is a standard old SMTP server, then the info is likely to be sent in the clear and it can be intercepted. However, any aspiring terrorists (which these plans are made to catch, right?) will encrypt the mail before it gets sent. There is Pretty Good Privacy (PGP), and its free counterpart, GNU Privacy Guard (GPG), both of which are personal level encryption tools where it's simple to encrypt a document, or mail etc. The use of these tools is widespread, and will become the default simple way of doing things in mail client programs such as Outlook and Thunderbird, with support in webmail coming soon after.

A final consideration is a Virtual Private Network (VPN). VPNs were invented to provide privacy and security between computers communicating over the Internet. They provide a layer of segregation where the data is being sent over public networks, but the data can only be read if you're part of the private network. In the late 90s and early 2000s, when the Arab states started getting more western immigrants wanting the same unrestricted Internet access they'd become accustomed to, they attempted to control access at the ISP level. This caused the users to use VPNs to subvert any interception or security in place, and meant that the ISPs were unable to block specific types of traffic inside the VPNs. I personally supported someone who moved from the UK to Dubai and found that he was unable to use Skype out there as the local ISP had blocked Skype in favour of their own paid-for version. A simple VPN configuration later and he was using Skype and there was nothing the local ISP could do about it. This is also where it comes back to DNS lookups, because if they're done inside of a VPN, they're also encrypted and can't be intercepted. The Tor Project works along similar lines, routing traffic through a network of encrypted relays, and allows normal users to remain anonymous in much the same way.

So to summarise, yes, if the government want to snoop on which web sites you're visiting, then it's not difficult for them to do so, unless you use a different DNS server. It becomes almost impossible for them to see what you're actually doing on a web site, unless the site owner is willing to give them traffic logs (which is unrealistic, it places far too much overhead on the site owners). Most people use webmail now so that is already secure and works by the same rules as secure web sites.

What can you do to make sure the government can't snoop on you? Well, start by encouraging the use of web sites which are secure by default. Always type https:// at the start of an address to attempt to connect securely.
Then, use a different DNS server to your ISP's. Google runs nice, reliable public DNS servers. It won't stop your ISP from being able to intercept DNS lookup traffic, but they can't just hand over simple logs.
Then, if you're not using webmail, use GPG to encrypt your emails.

In the end, the logistics of the government being able to snoop and log everything everyone does are insurmountable. But they only want you to be worried that they could be looking at any time, the fact they actually can't doesn't enter into it. Even then, it's almost useless for them to do so. This is due to logistics on databases, but that's a whole other topic.

It's also important to distinguish between anyone being able to look at where you're going, and being able to look at what you're doing. The media would like you to believe it's the latter, but this is only in very few cases. So for now, don't worry about it. Just think about your privacy and look for encryption everywhere. The web is built on some pretty solid foundations and a massively right-wing, paranoid, temporary government can't change that.