How to interpret logs

In a previous blog I promised to show how to use pipes to read log files more effectively. This can be an intimidating process, to say the least, so let's start with an example.

[root@DNS01 logs]: ls -lh
-rw-r--r-- 1 root root    0 Mar 28 04:02 access_log
-rw-r--r-- 1 root root 320K Mar 24 22:14 access_log.1
-rw-r--r-- 1 root root 327K Mar 24 03:36 access_log.2
-rw-r--r-- 1 root root  94K Mar  6 21:48 access_log.3
-rw-r--r-- 1 root root 1.7M Feb 28 03:10 access_log.4

Notice I have about two and a half megabytes of access logs here. This is a sandbox that only I really play on. I've got WordPress, Drupal and a few other minor things installed. Nothing special, virtually NO traffic. Regardless, when we do:

[root@DNS01 logs]: cat access_log* | wc -l
16176

There are over 16,000 entries! That is a lot of text to go through, and these logs are TINY compared to what you can see on a production server, where they easily run to millions of lines. Without a good question we will not get a good answer, so we have to know what data we are looking for and how to narrow the pool down to only the entries we need, or at the very least to an amount we can go through manually. Sometimes you won't even know which log file to look in. In that case, you can use grep like so:

grep searchterm -ilr /logdir/*

and this should give you the names of the files worth digging into. I don't usually have to do this unless I am unfamiliar with the application, but if you're getting an error with a configuration this is a GREAT way to find out which configuration file contains the parameter you need to set.
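If you want to try this safely, here is a tiny sketch against throwaway files; the file names and contents are invented for the demo.

```shell
# Make a scratch directory with two fake config files.
demo=$(mktemp -d)
echo "max_connections = 100" > "$demo/db.conf"
echo "listen 80"             > "$demo/web.conf"

# -i: case-insensitive, -l: print only matching file names, -r: recurse.
match=$(grep -ilr max_connections "$demo")
echo "$match"

rm -rf "$demo"
```

Only db.conf is printed, because -l reports the file that contains the term rather than the matching line itself.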

Getting back to the task at hand, interpreting our findings: let's say I was looking for some PHP code that was causing trouble. Because network traffic monitoring is set up, I know I had an outbound flood to the IP address 122.222.233.234. I can grep for that IP in the Apache logs to find out what is happening and where. These logs can be set up a few different ways, so assuming you're on cPanel I would run

[root@DNS01 logs]: grep 122.222.233.234 -ilr /etc/httpd/domlogs/*

and see which log files mention this IP address. It would likely only turn up entries for one or two PHP pages, and they would likely be highly suspicious. Let's say we found a script called suspicious.php with this IP in its GET string, mentioned a ton of times in one domain's log. We can count those mentions by running:

[root@DNS01 logs]: grep -i 122.222.233.234 /etc/httpd/domlogs/domain.com | grep suspicious.php | wc -l

and this tells us the number of times it has been run. (Note we dropped the -l flag here: with -l grep prints only file names, so there would be nothing for the second grep to filter.) Depending on the attack script this can be thousands of log entries. Now let's say we want to see whether the attacker has been doing anything else on the server, but the suspicious.php entries are so numerous that we would have thousands of lines to wade through. Instead we would just run:

[root@DNS01 logs]: grep -ir 111.222.254.254 /etc/httpd/domlogs/* | grep -v suspicious.php

Where 111.222.254.254 is the IP that accessed suspicious.php. This eliminates any lines containing suspicious.php from the output. We can chain this multiple times with different terms as necessary. By the gradual inclusion or exclusion of terms we can boil the logs down into usable data. It takes a bit of time, but you can track compromises, spammers and other server-side problems with this. You can even do it in real time. Let's say you had a 503 error. All you would do is run:

[root@DNS01 logs]: tail -f /etc/httpd/log/error_log | grep 1.2.3.4

Where 1.2.3.4 is the IP of the computer you are using to hit the problem page. This filters out requests from all other IP addresses. Using tail -f by itself on a high-traffic server would be virtually impossible because the entries scroll by too fast; this way only the entries you asked for are shown. This works great for email sending problems as well. Let's say you don't think your email is being sent out. All you have to do is run:

[root@DNS01 logs]: tail -f /var/log/exim_mainlog | grep you@example.com

Where you@example.com is the sending or receiving address. This will show you the flow of that email through the server.
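The include/exclude narrowing described above is easy to dry-run against a scratch file. The log lines, IPs and script name below are fabricated for illustration:

```shell
# Build a tiny fake access log.
log=$(mktemp)
cat > "$log" <<'EOF'
111.222.254.254 - - "GET /suspicious.php?ip=122.222.233.234"
111.222.254.254 - - "GET /suspicious.php?ip=122.222.233.234"
111.222.254.254 - - "POST /wp-login.php"
10.0.0.5 - - "GET /index.php"
EOF

# Every request from the suspect IP...
hits=$(grep -c 111.222.254.254 "$log")
# ...minus the suspicious.php flood (-v excludes, -c counts).
other=$(grep 111.222.254.254 "$log" | grep -vc suspicious.php)
echo "$hits $other"

rm -f "$log"
```

Here the IP shows up three times total, but excluding the flood leaves a single line worth reading, which is the whole point of the technique.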

How to rebuild an initrd

To start off with, what's an initrd? It is the initial RAM disk the system boots from: a small filesystem image holding a basic set of kernel modules and a few other early-boot pieces. Part way through the boot process the system switches over to the real root partition and boots up the rest of the way.

How do we know we have a problem? That's fairly simple. If a system gives an error pertaining to a "switchroot" and panics, we can scroll up and see whether the drivers are missing. In the most recent case I had to deal with, there were no 3ware modules in the initrd. I have also seen this happen with drives moved between systems with radically different chipsets (think AMD versus Intel).

Switchroot errors can happen for a ton of reasons: the filesystem not existing, files missing from it, or the modules for the hardware in question not being present. Use your head when you get them, but if you seem to have the problem above the solution is pretty simple.

All you have to do is run the /sbin/mkinitrd command. This is as simple as doing something along the lines of:

/sbin/mkinitrd -v initrd-2.6.18-194.el5PAE.img 2.6.18-194.el5PAE

Provided there isn't already a file with that name, a few moments later you will have a fresh initrd containing whatever modules are loaded on the system at the time, provided the kernel you are building for has them available. Pretty cool, huh? I will get more in depth with initrds at a later date; you can even hack them into your own mini OS.

Pipin ain’t easy (unless you read this guide)

I have gone over more than a few Linux-based commands at this point, so I want to introduce a new way of using them: pipes. Pipes are really cool because they let you take one command's output and feed it into another command, and there is a nearly infinite number of ways to use them. Enough with the introductions, let's get on to the commands. Let's say I want to determine the number of connections currently open on port 80. We can run

netstat -anp

And it will give us a long list of entries. We are looking for traffic on port 80, and quite frankly we have a bunch of lines we don't care to bother with. This is where the pipe comes in.

netstat -anp | grep :80

This will show any traffic to port 80 locally, or any traffic going to port 80 on a remote server. Let's say we have a huge amount of traffic, though, and just want a count. On a busy server counting by hand is a nuisance at best and virtually impossible at worst. At this point we just feed the results of our grep through another pipe like this:

netstat -anp | grep :80 | wc -l

And then we have a raw count of connections on port 80. Pretty neat. Because pipes are an infinitely versatile tool, we can use them on static files or on the server in real time with utilities like tail. Want to know more about tail? Check out my next blog, where I'll show some tricks for server-side troubleshooting.
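If you want to see the counting pipe work without a live server, you can feed it canned netstat-style output; the addresses below are invented:

```shell
# Stand-in for netstat -anp: three fake connection lines, two on port 80.
fake_netstat() {
cat <<'EOF'
tcp 0 0 10.0.0.1:80 93.184.216.34:51432 ESTABLISHED
tcp 0 0 10.0.0.1:80 93.184.216.35:40022 ESTABLISHED
tcp 0 0 10.0.0.1:22 93.184.216.36:55001 ESTABLISHED
EOF
}

# Same pipe as above: filter for :80, then count the lines.
open80=$(fake_netstat | grep -c :80)
echo "$open80"
```

One caveat worth knowing: grep :80 matches the substring anywhere, so a port like :8080 would also match; on a real server you may want a tighter pattern.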

Sort Files with Bash

I've always been a fan of collecting fonts; as I go across the net I find random font files and save them for later use. It seems like I can never find the fonts I want because they are scattered all over, unsorted. I got tired of having disorganized font files lying around, so I wrote this to organize them.

#!/bin/bash
#Edit path to the location of where you want your fonts organized
#create a folder called "Unsorted" and place all the files into that.
path="Fonts/"
cd "${path}Unsorted/"
for mFile in *
do
  #Check whether it's a file or a folder; if it's a folder, skip it!
  if ! [[ -f $mFile ]]; then continue; fi

  #Grab the first letter of the filename and store it in $mFirstChar.
  mFirstChar="${mFile:0:1}"

  #Convert a lowercase first letter to upper case.
  mFL=$(tr "[:lower:]" "[:upper:]"<<<"$mFirstChar")

  #If the filename starts with a non-letter such as "!@#$%^&*()",
  # set the $mFL var to "MSC".
  if [[ $mFL != [[:alpha:]] ]]; then mFL="MSC"; fi

  #Create the target dir; -p also keeps mkdir quiet if it already exists.
  mkdir -p "../${mFL}"

  #Move files to their new home.
  mv -v "${mFile}" "../${mFL}/${mFile}"
done

Cleaning spam out

I've seen my fair share of spam in my day; however, in a lot of cases a server has a sysadmin who doesn't quite know what to look for in order to track it down. In fact, a lot of these machines are only caught because they crash under the load the spam puts on them. While my methods are probably not the best way of doing it, they do work, and you can clean a queue out without just deleting everything and starting over. You can also use an email verifier and cleaning tool to help you manage your emails.

You can follow these simple steps if you want.

First things first: on a cPanel/Exim server (the most common setup you'll see if you do cPanel hosting) we simply go into /var/spool/exim/input. At this point you will see a bunch of directories; there should be about 62 here: the letters A through Z in both upper and lower case, as well as 0 through 9. Exim splits the queue across these subdirectories. If we do

[root@dns01 input]: ls ./* -lh

here, you will see a bunch of files ending in -D and -H if your queue is being stuffed. If it is empty, you may still be spamming, just at a lower rate. Now then, let's say we do have a spamming problem; there are a few things we can do here. The first is to isolate the source of the problem, if at all possible. After that, removing everything (directories and all) is an option, especially if you're doing low-end or free shared hosting. If you have to preserve your ham, though, you will have to clean. The first thing that needs to be done is to figure out what content is there. I prefer to view a few of these emails and find a common topic in them. So let's say the spam is about some Nigerian king who has been poisoned. All we have to do is run:

[root@dns01 input]: grep -ilr 'Nigeria' ./* > check

This populates the file check with the names of the files containing "Nigeria". These are our probable spam messages. You may lose a few legitimate emails this way, but the vast majority will be safe. If you use a longer phrase you are virtually assured that no legit emails will be deleted.

So we would do something like this (note the -i flag, which makes sed edit check in place rather than just printing it):

[root@dns01 input]: sed -i -e 's/^/rm -f /' check

[root@dns01 input]: sed -i -e 's/-D$/-*/' check

[root@dns01 input]: chmod 755 check

[root@dns01 input]: ./check

This goes through and deletes the emails and their associated headers (widening -D to the -* glob catches both the -D data file and the matching -H header file). While this isn't too hard, spammers are like mice, or roaches: once they find a way in they are sure to be back. To that end we need to at least try to find the hole before we delete the evidence. We can move the messages instead of deleting them, which lets us view them at our convenience, or we can evaluate the headers in the input directory. There are two things I tend to look for.

  • Rogue PHP scripts
  • Open Relays
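Before trying the delete-by-list trick on a real queue, you can dry-run it against throwaway files; the message IDs below are made up:

```shell
# Fake queue: two messages, each with a -D data file and -H header file.
q=$(mktemp -d)
touch "$q/1a2b3c-D" "$q/1a2b3c-H" "$q/4d5e6f-D" "$q/4d5e6f-H"

# Pretend our grep -ilr flagged the first message as spam.
echo "$q/1a2b3c-D" > "$q/check"

# Turn each line into an rm command, and widen -D to -* so the
# matching -H header file goes too.
sed -i -e 's/^/rm -f /' -e 's/-D$/-*/' "$q/check"

# Run the generated script: only the flagged message is removed.
sh "$q/check"
ls "$q"
```

After the run, both 1a2b3c files are gone while the 4d5e6f message is untouched, which is exactly what you want from a selective cleanup.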

PHP scripts can be a hotbed of insecurity on many servers. Personally, I'm not a fan of how a lot of people use, or misuse, them, but they are a necessary evil given how much function PHP adds to a site. The quickest and dirtiest thing to add is the PHP mail() header patch, which stamps each outgoing message with the script that sent it. This can be done via EasyApache, or you can download the latest and greatest from http://choon.net/php-mail-header.php. This gives you something convenient to track with. Now all you have to do is run

[root@dns01 input]: grep -ir php ./*

before you delete the emails, and you will get every mention of a PHP file name in the queue. All you have to do is find the one(s) with a lot of entries; they are either going to be message boards or other mailing lists, or they are going to be spam scripts. Pretty simple.

In regards to open relays, just find a checker via your search engine of choice and go to town. Telnet also makes a fun tool for checking whether you can relay or not, but that is likewise another episode.

Some SED basics

One of my favorite tools for sysadmin work is the Stream EDitor utility, or just SED. SED is useful for many things, and is a stepping stone along the way to writing variable-based shell scripts as well. Don't want to edit the nameservers in a million zone files by hand? SED it. Need to do the same thing to a million files at once? SED it. In conjunction with cat, find, and grep, SED is devastatingly effective at finding and eliminating administrator headaches. Let's start out with something extremely basic.

sed -i 's/ns1.domain1.com/ns1.domain.com/' /var/named/*.db

What does this do? It goes through the DNS zone files and changes instances of ns1.domain1.com to ns1.domain.com. The s stands for substitute, and the -i writes the change back into the original file; without -i, sed just prints the result to the terminal. One caveat: without the g flag at the end of the expression, sed only replaces the first instance on each line, so I advise grepping for the term afterward to make sure nothing was missed. If some instances remain, you can repeat the command, or add g (s/.../.../g) to replace every instance at once.
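You can see the first-match-per-line behavior for yourself with a quick echo test:

```shell
# Two copies of the name on one line; without g only the first changes.
echo "ns1.domain1.com ns1.domain1.com" | sed 's/domain1/domain/'
# prints: ns1.domain.com ns1.domain1.com

# With the g flag, every instance on the line is replaced.
echo "ns1.domain1.com ns1.domain1.com" | sed 's/domain1/domain/g'
# prints: ns1.domain.com ns1.domain.com
```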

Well, that’s pretty cool, but what about some other situations that come up? Lets say we’re migrating a cpanel box. There are a ton of scripts out there, but we have some special needs. Say we want to run it with –skip-homedir because this is going to be a pseudo-manual migration and we’ll sync the homedir over later. All we have to do is make a copy of /etc/users and then do the following:

sed -i 's/^/\/scripts\/pkgacct /' users

sed -i 's/$/ --skip-homedir/' users

chmod +x users

./users

Yes, this is a few commands, but we want to look over what's happening step by step. In the first sed, the ^ anchors the start of each line, so /scripts/pkgacct is prepended to the beginning of every line. Notice the space at the end of the replacement so that the username doesn't run into the command and error out. Also notice the backslashes: \ is an escape character that lets a special character such as / be used literally inside the expression instead of being taken as one of sed's own delimiters. The second line is similar, except $ anchors the end of each line, so --skip-homedir is appended to every line. The last thing we do is make our script executable with chmod +x (you can chmod 755 if you want, with similar results) and then run it. If we were smart we would probably put a shebang at the top (#!/bin/bash) so that it is run with bash.

Our input file would look like this:

user1

user2

user3

user4

and the output would look like

/scripts/pkgacct user1 --skip-homedir

/scripts/pkgacct user2 --skip-homedir

/scripts/pkgacct user3 --skip-homedir

/scripts/pkgacct user4 --skip-homedir
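Here is the whole transformation run end to end on a throwaway copy of the list. Nothing is executed; /scripts/pkgacct is just text being built here:

```shell
# Build the users list in a temp file so nothing real is touched.
users=$(mktemp)
printf 'user1\nuser2\n' > "$users"

# Prepend the command (^ = start of line) and append the flag ($ = end).
sed -i 's/^/\/scripts\/pkgacct /' "$users"
sed -i 's/$/ --skip-homedir/' "$users"

result=$(cat "$users")
echo "$result"

rm -f "$users"
```

Each input line comes out as a complete /scripts/pkgacct command, matching the output shown above.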

There are a ton of uses for this. I love cleaning spam out of Exim's mail queue with it when I'm not allowed to BOFH the system and delete the "clean" email along with the spam. That will be a later episode, however.

Linux Daily Tip – Bulk Delete Files By Name

Ever find yourself tediously removing files by hand across a large, complex directory structure? Luckily, there's a simple, easy way to delete files, as long as their names have a part in common.

For example, to delete all PDF files in the current directory, recursively, run

find . -name "*.pdf" -exec rm -f {} \;
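A safe way to see it work is in a scratch directory; the names here are invented:

```shell
# Scratch tree with two PDFs (one nested) and one file to keep.
d=$(mktemp -d)
mkdir -p "$d/sub"
touch "$d/a.pdf" "$d/sub/b.pdf" "$d/keep.txt"

# Same recursive delete as above, scoped to the scratch directory.
find "$d" -name "*.pdf" -exec rm -f {} \;

# Only keep.txt should remain.
left=$(find "$d" -type f)
echo "$left"

rm -rf "$d"
```

Both PDF files are removed, including the one in the subdirectory, while keep.txt survives.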

Linux Daily Tip – Concatenate

An extremely useful tool for outputting the contents of a file is cat, short for concatenate. Cat prints a file's contents to standard output, which normally means your screen.

A useful example:

cat -n -s /proc/cpuinfo | more

Options: -n numbers the output lines, while -s squeezes runs of blank lines down to a single blank line.

It's also useful to pipe the output into more to make it easier to read.
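A quick way to see -s in action on a scratch file:

```shell
# Five lines, three of them blank in a row.
f=$(mktemp)
printf 'one\n\n\n\ntwo\n' > "$f"

# -s squeezes the run of blanks down to one, leaving three lines.
squeezed=$(cat -s "$f" | wc -l)
echo "$squeezed"

rm -f "$f"
```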

Want to learn more about pipes? Check out Alex's article here.