OpenStack Kilo (Open vSwitch) Networking in a nutshell

 

OVS… it’s simple, really!

It’s taken me almost a week to figure out how they expect the OVS networking to work, and no one explains it simply.  So here’s a 30-second explanation that will actually make sense.

You have three Open vSwitch bridges: br-int, br-ex and br-tun.

The VMs all get ports on br-int, br-ex carries the actual physical network traffic, and br-tun holds the tunnel interfaces between instances.

Open vSwitch uses flow rules and virtual patch ports between br-ex and br-int to provide connectivity.

Add your physical interfaces to br-ex, and create a management port of type internal so Linux can assign IPs to it.  In the example below we bond two NICs for load balancing and redundancy.

 

(diagram: ovs-neutron bridge layout)

Commands to build this configuration:

ovs-vsctl add-br br-ex
ovs-vsctl add-br br-int
ovs-vsctl add-br br-tun
ovs-vsctl add-bond br-ex bond0 em1 em2 -- set port bond0 bond_mode=balance-slb
ovs-vsctl add-port br-ex mgmt tag=15 -- set interface mgmt type=internal

What it should look like:

[root@s2138 ~]# ovs-vsctl show
0646ec2b-3bd3-4bdb-b805-2339a03ad286
    Bridge br-ex
        Port br-ex
            Interface br-ex
                type: internal
        Port mgmt
            tag: 15
            Interface mgmt
                type: internal
        Port "bond0"
            Interface "em1"
            Interface "em2"
    Bridge br-int
        fail_mode: secure
        Port br-int
            Interface br-int
                type: internal
    Bridge br-tun
        Port br-tun
            Interface br-tun
                type: internal
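To make the mgmt port survive reboots on CentOS, the Open vSwitch RPMs ship network-script support for ifcfg files. A sketch, with the assumption that the address below is a placeholder you would replace with your own (the port name, tag and bridge come from the example above):

```shell
# /etc/sysconfig/network-scripts/ifcfg-mgmt
# OVS internal port on br-ex, tagged for VLAN 15.
# IPADDR/NETMASK are made-up examples; use your own addressing.
DEVICE=mgmt
DEVICETYPE=ovs
TYPE=OVSIntPort
OVS_BRIDGE=br-ex
OVS_OPTIONS="tag=15"
ONBOOT=yes
BOOTPROTO=static
IPADDR=10.0.15.10
NETMASK=255.255.255.0
```

With this in place, `service network restart` (or ifup mgmt) recreates the port and its IP without any manual ovs-vsctl calls.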

Installing Open vSwitch 2.3.1 LTS on CentOS 6

yum install kernel-headers kernel-devel kernel-debug-devel gcc make python-devel openssl-devel graphviz automake rpm-build redhat-rpm-config libtool git

cd /root/

wget http://ftp.gnu.org/gnu/autoconf/autoconf-2.64.tar.gz

tar xvf autoconf-2.64.tar.gz

cd autoconf-2.64/

./configure

make

make install

 

cd /root/

wget http://openvswitch.org/releases/openvswitch-2.3.1.tar.gz -O /root/openvswitch-2.3.1.tar.gz

mkdir -p /root/rpmbuild/SOURCES

cp /root/openvswitch-2.3.1.tar.gz /root/rpmbuild/SOURCES/

tar xvf /root/openvswitch-2.3.1.tar.gz
cd openvswitch-2.3.1/

rpmbuild -bb rhel/openvswitch.spec
rpmbuild -bb rhel/openvswitch-kmod-rhel6.spec

rpm -ivh /root/rpmbuild/RPMS/*/*.rpm

 

You can also use our public CloudStack repo here:

http://mirror.beyondhosting.net/Cloudstack/

 

Recommendations I make to save critical data

First off, your data is the most valuable part of any server. Setting up even a fairly basic web site involves many hours of work that is very hard, if not impossible, to replace. That doesn’t even include things like client information, orders, etc. that directly cost you money if you lose them.

Not all backup methods are for everyone, because needs for data security vary widely, as do budgets. Someone with a page doing e-commerce transactions will likely need a lot more in the way of backups than someone with a bi-weekly blog, for instance.

There are two different modes of failure you will encounter as a sysadmin. The first is the “hard” failure. This includes drives, or even whole RAID arrays (yes, it does happen), going bad. I love RAID and think it’s a great measure for data protection, but it’s not foolproof by any means and is no substitute for backups.

The second type of failure is the “soft” failure: for whatever reason, data on the system is gone. This can be anything from a user deleting their public_html directory to data corruption because the drive is heavily overrun. Commonly this is someone running an FS check on a machine and having it dump a few thousand files into lost+found. I have seen my fair share of machines come up after this and run fine, and plenty that didn’t. It can also be the result of hackers messing around on your system. One thing I will warn about: if you use a secondary drive in the same server for backups, it can be deleted by hackers as well. If you leave the drive mounted after backups are done and they run rm -rf /*, it will be erased. Be sure to unmount your backup drive if you use this method. In general I do not advise relying on it for this reason; however, it makes for a great way to keep backups on a system without waiting for them to transfer.

The first rule I have: no matter what, you should have a minimum of three copies of your data, at least one of which is totally off site and not within the same company as your server/colocation/shared host. This gives you options if something happens, and you’re not relying on one group of people to ensure your data is intact. This can be as simple as having your system upload the files to a home or office computer via DynDNS and port mapping, then burning the images to a CD weekly. On a higher level, it can be storage with a company offering cloud storage such as Amazon.

How often you should back your data up, and how long to retain it, is another fairly common question. This is largely subjective, and is a compromise between how much data you can afford to lose versus how much space you can afford. If you’re running a streaming video site, this can get pricey very quickly, to the point it may be best to get a low-end server and put big drives in it to back up to. After all, if you pay $0.50/GB and need 1TB of backup space, $500 buys a good bit of server!

What to back up is another good question. If you’re running a forum or something like that where there aren’t really all that many changes made to the underlying software, doing a single full backup and then backing the user upload directories (eg images) and the database may be enough. If the site is undergoing constant development, full backups would be a great deal more prudent.

The last thing to consider is how these backups are going to be made. I have done backups with shell scripts, and used both Plesk’s and cPanel’s backup mechanisms. With a shell script you gain a ton of versatility in how and what you back up, at the price of being far more tedious to configure. These backups are really nice if you want the system to back up only certain things on varying intervals. The panel-based backups are so easy to configure that there is little to no reason not to set them up: you just specify how often you want backups, where they will be stored and what will be backed up. The caveat I will give about panel-based backup systems is that even with CPU-level tweaks in the config files they can heavily load a system, so my advice is to run them off hours.
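As a concrete example of the shell-script route, here is a minimal sketch of a datestamped backup with 30-day retention. The paths here are throwaway temp directories so the sketch runs anywhere; in practice SRC would be your docroot and DEST your backup volume:

```shell
#!/bin/bash
# Minimal sketch: datestamped tar backup with 30-day retention.
# SRC/DEST are temp dirs for illustration only; swap in your
# site directory and backup volume.
SRC=$(mktemp -d); DEST=$(mktemp -d)
echo "demo content" > "$SRC/index.html"
stamp=$(date +%Y%m%d)
tar czf "$DEST/site-$stamp.tar.gz" -C "$SRC" .
# prune archives older than 30 days
find "$DEST" -name 'site-*.tar.gz' -mtime +30 -delete
ls "$DEST"
```

Cron this nightly and you have the "back up certain things on an interval" behavior described above with about five lines of shell.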

16 x 256GB Samsung 830 SSD Raid 6 with LSI 9266-8i Controller in a Dell R720 (16 Bay)

 

As a systems administrator, it seems like I’m constantly battling IO contention and latency in our SAN and local storage environments. As the months roll by, these new SSD drives keep getting cheaper and cheaper, offering better write wear and longer life spans for write-intensive environments, so I’m finally taking the plunge and converting our most intensive systems over to solid state.

In the process of exploring solid state disks, the Samsung 256GB 830 series really stood out from the crowd. The 830 offers fantastic read and write latency and throughput, and it is one of the only SSD series on the market where both the flash and the storage controller come from the same manufacturer.

The main reason for choosing the Samsung was this benchmark at Extreme Systems.

 

 

Update: 8/24/12

We ended up going back to the Dell H710P after having a few issues with the uEFI BIOS not playing well with the controller at POST.  Not to mention that LSI’s WebBIOS is a horrible pile of useless shit: this is 2012, why the hell do we still have this prehistoric UI on a RAID controller?  Whoever at LSI approved shipping that on the cards should be forced to stand in a fire.

The H710P has Dell’s lovely customized controller BIOS, which is keyboard driven, EASY to use and FAST to configure with.   Performance of the H710P is actually a little better than the 9266-8i, even though the hardware is identical.

Another major issue with the 9266: when you removed a drive (failure simulation) and replaced it, the controller would mark the new drive as bad instead of treating it as a fresh drive to rebuild onto.  Without the CLI or MegaRAID Storage Manager this is a rather annoying problem to deal with, as you need to reboot the system to fix it in WebBIOS. POS.

The H710P, of course, works with Dell’s unified management system and can be accessed a number of ways without the operating system even knowing about it.

 The configuration:

  • 16x Samsung 830 256GB MLC SSD
  • RAID 6 with read and write caching (BBU backed), 64KB block size
  • Dell R720 16-bay SAS6 expanded backplane, 2 ports, 16 devices

The Benchmarks!

Here are some preliminary benchmarks of actual performance inside a VMware virtual machine.

LSI 9266-8i

Children see throughput for 32 initial writers  =  214905.26 ops/sec
Parent sees throughput for 32 initial writers   =  198172.68 ops/sec
Min throughput per process                      =    6392.06 ops/sec
Max throughput per process                      =    7173.76 ops/sec
Avg throughput per process                      =    6715.79 ops/sec
Min xfer                                        =  925970.00 ops

Children see throughput for 32 readers          =  734057.97 ops/sec
Parent sees throughput for 32 readers           =  734011.56 ops/sec
Min throughput per process                      =   22833.85 ops/sec
Max throughput per process                      =   23062.16 ops/sec
Avg throughput per process                      =   22939.31 ops/sec
Min xfer                                        = 1038205.00 ops

Children see throughput for 32 random readers   =   55662.96 ops/sec
Parent sees throughput for 32 random readers    =   55662.71 ops/sec
Min throughput per process                      =    1730.88 ops/sec
Max throughput per process                      =    1751.76 ops/sec
Avg throughput per process                      =    1739.47 ops/sec
Min xfer                                        = 1036073.00 ops

Children see throughput for 32 random writers   =   19827.16 ops/sec
Parent sees throughput for 32 random writers    =   19090.45 ops/sec
Min throughput per process                      =     584.53 ops/sec
Max throughput per process                      =     663.61 ops/sec
Avg throughput per process                      =     619.60 ops/sec
Min xfer                                        =  967988.00 ops

Dell H710P

Children see throughput for 32 initial writers  =  489124.60 ops/sec
Parent sees throughput for 32 initial writers   =  435746.51 ops/sec
Min throughput per process                      =   14005.25 ops/sec
Max throughput per process                      =   17028.75 ops/sec
Avg throughput per process                      =   15285.14 ops/sec
Min xfer                                        =  860278.00 ops

Children see throughput for 32 readers          =  678563.56 ops/sec
Parent sees throughput for 32 readers           =  678524.72 ops/sec
Min throughput per process                      =   21111.18 ops/sec
Max throughput per process                      =   21253.53 ops/sec
Avg throughput per process                      =   21205.11 ops/sec
Min xfer                                        = 1041599.00 ops

Children see throughput for 32 random readers   =   59482.27 ops/sec
Parent sees throughput for 32 random readers    =   59482.00 ops/sec
Min throughput per process                      =    1851.91 ops/sec
Max throughput per process                      =    1869.25 ops/sec
Avg throughput per process                      =    1858.82 ops/sec
Min xfer                                        = 1038852.00 ops

Children see throughput for 32 random writers   =   20437.99 ops/sec
Parent sees throughput for 32 random writers    =   19228.06 ops/sec
Min throughput per process                      =     610.33 ops/sec
Max throughput per process                      =     695.63 ops/sec
Avg throughput per process                      =     638.69 ops/sec
Min xfer                                        =  945641.00 ops

 

 

Update 7/20/13!

So we’ve been running this configuration in production for almost a year now without fault.   Performance remains fantastic and we’ve had zero disk failures or faults.

We’ve begun testing the 840 PRO series of disks, and so far testing has not been as favorable: we’re having minor issues with 512GB drives being kicked from the array or faulting for no apparent reason.

I can confirm that the 840 PRO series is NOT compatible with the 24-bay chassis: the backplane power is designed for 12V utilization and the Samsung drives are 5V.  You will get random system lockups with a message about not enough system power being available.  If you need to populate a 24-bay chassis, we recommend looking at the Intel eMLC drives, which utilize 12V power.

The Sword of SEO part II

Well, it’s been a long time since I posted the first article on this; my time, or lack thereof, got the best of me. Countering this attack is actually very, very easy. The first thing you do is find out who the referrer is, which is simply done by tailing the logs. If you have a single domain, this is fairly easy. Otherwise my preferred method involves “watch ls -l” on the log directory and seeing which log grows the fastest. That tends to be the one getting hit, or at least a likely suspect. (I will probably eventually write a script to check this and tell me which log grows the most in, say, 10 seconds.) After this, you can use tail in the manner of:

tail -f /etc/httpd/domlogs/domain.log

When you do this, you will see which IPs are querying the page and the referrer they came from. Look for anything that doesn’t look like a search engine. To actually block attackers once they are identified, deny the attack based on the referrer in .htaccess. See the convenient rewrite code I lifted from another web site (about the same as what I did when I actually saw the attack):

RewriteEngine on
# Options +FollowSymlinks
RewriteCond %{HTTP_REFERER} attacker\.com [NC]
RewriteRule .* - [F]
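Rather than eyeballing the tail, you can sum up referrers directly with awk over a combined-format log. A sketch; the log lines below are fabricated samples, so point the awk at your real domlog:

```shell
#!/bin/bash
# Count referrers in combined-format access log lines.
# Splitting on double quotes makes the referrer field 4.
# The sample log here is made up purely for illustration.
log=$(mktemp)
cat > "$log" <<'EOF'
1.2.3.4 - - [10/Oct/2012:13:55:36 -0400] "GET /forum.php HTTP/1.1" 200 512 "http://attacker.com/page" "Mozilla/5.0"
5.6.7.8 - - [10/Oct/2012:13:55:37 -0400] "GET /forum.php HTTP/1.1" 200 512 "http://attacker.com/page" "Mozilla/5.0"
9.9.9.9 - - [10/Oct/2012:13:55:38 -0400] "GET / HTTP/1.1" 200 128 "http://www.google.com/" "Mozilla/5.0"
EOF
awk -F'"' '{print $4}' "$log" | sort | uniq -c | sort -rn
```

The top line of the output is the referrer to drop into the RewriteCond above.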

So, why does this work, you may ask? In the scenario I saw, the attacker was hitting a “high value” target: a page that hits the database and serves dynamically generated content with no caching. Server-side configuration CAN make these sorts of attacks a lot harder to perpetrate as well; anything you can do to increase the robustness of a server will help with a DoS. When you add a rule like this denying access by referrer, you serve static content instead, and static content uses virtually no resources compared to something PHP-based and backed by a database. It’s a good idea to know about this sort of attack, as I could see it getting bigger in the future. Black-hat SEO is very common these days, and if you have the SEO part down, the resources needed for the rest of this attack are virtually nothing compared to what it does. It is also plausible we will see this attack combined with conventional, network-level DoSing to increase its effectiveness.
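The “which log grows fastest” check mentioned in the opening paragraph can be sketched in shell rather than perl. The directory and the simulated traffic below are stand-ins; in practice you would point it at your Apache log directory and simply sleep through real traffic:

```shell
#!/bin/bash
# Snapshot log sizes, wait, snapshot again, report the biggest growth.
# The temp dir and appended bytes simulate a log directory under attack.
dir=$(mktemp -d)
echo start > "$dir/a.log"; echo start > "$dir/b.log"
declare -A before
for f in "$dir"/*.log; do before[$f]=$(stat -c%s "$f"); done
head -c 4096 /dev/zero >> "$dir/b.log"   # simulated traffic in the window
sleep 1
for f in "$dir"/*.log; do
  echo "$(( $(stat -c%s "$f") - ${before[$f]} )) $f"
done | sort -rn | head -1
```

The winner of the sort is the log to start tailing.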

A quickie MySQL backup script

I’ve seen my fair share of clients who need basic MySQL backups but have no control panel, or don’t want to bother with control-panel-based backups. This is a really simple setup that does DB backups into a local directory on the server. It could likely be modified to rsync to another server as well if you wanted to. There are a ton of options that could be added, your imagination (and shell scripting capacity) being the only limitations. Some suggestions I have would be:

-Mail on success or failure and on old file deletion

-Connect to a remote DB

-Monitor the overall size

Well enough with the abstract, on to the shell!

#!/bin/bash
date=`date +%Y%m%d`
mysqldump --all-databases > /mysqlbackups/mysql-$date.sql
find /mysqlbackups/ -mtime +30 -delete

If you notice, this takes up all of four lines. The first is the shebang, the second establishes the date stamp, the third dumps the databases and the last purges any old backups. The only real variable you have to change here is the “+30”: the number of days you want to retain backups, minus one.

The sword of SEO

I was on a client’s server that was getting attacked; the DoS was heavily distributed. Since he’d mentioned something about someone linking to his web site, I was poking through the Apache logs and noticed that one site was generating a huge number of referrals. Investigating deeper, I found this on the referring site:

<iframe src="http://www.domain.com" width="1" height="1"></iframe>
0<br><iframe src="http://www.domain.com" width="1" height="1"></iframe>
1<br><iframe src="http://www.domain.com" width="1" height="1"></iframe>
2<br><iframe src="http://www.domain.com" width="1" height="1"></iframe>
3<br><iframe src="http://www.domain.com" width="1" height="1"></iframe>

…….

30<br><iframe src="http://www.domain.com" width="1" height="1"></iframe>

This is one of the slicker DoSes I’ve seen in a while. Because of the way it was set up, it would be very difficult if not impossible to block at the network level, and not traceable back to any particular IP at the network level (read: iptables, RTG or a hardware firewall). Within a few assumptions, here is what I believe happens:

-Person sets up a web site with just a park page or similar on it.
-Person directs traffic to it using SEO (back links, etc.) to gain it status on search engines.
-Person puts up an attack page like the one above.
-Every time someone clicks through from a search engine, they load a few dozen copies of the target page.
-The iframes point to a “high value” target that generates a lot of load on the server, such as a forum or other dynamic content.

I personally saw this attack decimate a late-model server with 16GB of RAM, with enough IP distribution that blocking it was not plausible. It is viciously effective when planned out and done properly, and it can be done with virtually NO resources using a free shared hosting account. The person who loads the page probably never realizes they just took part in an attack on a server, either. The plus side is that if you track it, you can limit the damage very easily, provided you know what you are looking for. That will be my next blog.

Adding lots of IPs to a debian box

At work I had a client with a Debian system that needed a bunch of IPs added to it. Since Debian’s interfaces file doesn’t really support ranges (at least that I can find), I came up with the following script.

#!/bin/bash
# append an alias stanza to the interfaces file for each address
j=42
for i in {186..190}
do
  j=$(expr $j + 1)
  echo "auto eth0:$j" >> interfaces
  echo "iface eth0:$j inet static" >> interfaces
  echo "address 192.168.41.$i" >> interfaces
  echo "netmask 255.255.255.248" >> interfaces
done

How it works: j is the number of the last alias currently set in the interfaces file, the address is defined in the script, and the range is defined in the i section. Just change the numbers to match what you want, put this into /etc/network, run it and restart networking. This example is only five IPs, but you could do hundreds or thousands this way if that was the desired effect. Or you can use a distro that supports ranges :>

Search Specific Files for Specific Content!

At Beyond Hosting we have a lot of customers who use CSF (ConfigServer Security & Firewall).  After about 30 installations of CSF, the MD5 checker can really cause problems with iowait.   So below is a script that checks every container’s CSF config for the MD5 checker setting; you can adapt it to check any file, really.

for i in /vz/private/*
do
  grep "LF_INTEGRITY" "$i/etc/csf/csf.conf"
  echo "$i"
done

Why you pay money for ECC RAM

Tonight presents a valuable lesson. I had a box running heavy MySQL duty that would crash at odd times. I could get MySQL to start, but the processes would die, it wouldn’t terminate cleanly, and even a freshly started copy was giving me “out of memory” errors. After fighting this for some time (read: hours) and assuming the problem was me, the user, I checked the hardware in a bout of frustration.

Being a Xeon, my first stop after rebooting was the BIOS error log, which held a lone ECC error. Where before I couldn’t even run show databases;, the server now goes through a full check and stays up. I bring this up as it presents two invaluable lessons:

A) It’s usually the software or the sysadmin that screws a server up, not the hardware; that said, hardware is still worth considering. This is only the second time in two years and several hundred servers that I’ve seen a machine with ECC RAM screw up like this. I have seen maybe 20 ECC-equipped machines that actually had bad DIMMs, probably half that in truth. With that said, MySQL tends to show it first.

B) ECC RAM is worth the extra outlay in the datacenter. This could easily have gone undetected for a long period of time, and cost a client, plus the next client that would have been put on the server.