High end consumer SSD benchmarks

Running consumer ssd in a server has been deemed hazardous and silly… but that’s only the case when your utilizing a hardware raid solution.

Provided you have UPS systems and software storage that can talk to the disk directly its perfectly safe.  We use Ceph!

These are tested with FIO on a Dell M620 with H310 JBOD mode controller.

Micron/Crucial M500

running IO "sequential read" test... 
	result is 491.86MB per second

 running IO "sequential write" test... 
	result is 421.42MB per second

 running IO "seq read/seq write" test... 
	result is 228.74MB/184.88MB per second

 running IO "random read" test... 
	result is 240.35MB per second
	equals 61530.2 IOs per second

 running IO "random write" test... 
	result is 230.34MB per second
	equals 58968.2 IOs per second

 running IO "rand read/rand write" test... 
	result is 93.90MB/94.01MB per second
	equals 24038.8/24067.5 IOs per second

Micron/Crucial M550

running IO "sequential read" test... 
	result is 523.79MB per second

 running IO "sequential write" test... 
	result is 476.59MB per second

 running IO "seq read/seq write" test... 
	result is 211.70MB/173.50MB per second

 running IO "random read" test... 
	result is 253.36MB per second
	equals 64861.0 IOs per second

 running IO "random write" test... 
	result is 233.42MB per second
	equals 59754.2 IOs per second

 running IO "rand read/rand write" test... 
	result is 102.42MB/102.28MB per second
	equals 26219.5/26184.0 IOs per second

Micron M600

running IO "sequential read" test... 
 result is 507.47MB per second

running IO "sequential write" test... 
 result is 477.18MB per second

running IO "seq read/seq write" test... 
 result is 198.38MB/166.73MB per second

running IO "random read" test... 
 result is 244.66MB per second
 equals 62633.2 IOs per second

running IO "random write" test... 
 result is 238.35MB per second
 equals 61017.5 IOs per second

running IO "rand read/rand write" test... 
 result is 103.10MB/102.95MB per second
 equals 26393.8/26354.0 IOs per second

Sandisk 960GB SSD Extreme Pro

running IO "sequential read" test... 
 result is 394.66MB per second

running IO "sequential write" test... 
 result is 451.28MB per second

running IO "seq read/seq write" test... 
 result is 181.48MB/158.89MB per second

running IO "random read" test... 
 result is 255.99MB per second
 equals 65533.5 IOs per second

running IO "random write" test... 
 result is 223.86MB per second
 equals 57309.2 IOs per second

running IO "rand read/rand write" test... 
 result is 71.47MB/71.46MB per second
 equals 18296.0/18294.2 IOs per second

Crucial MX300 1TB

running IO "sequential read" test... 
	result is 504.80MB per second

 running IO "sequential write" test... 
	result is 501.97MB per second

 running IO "seq read/seq write" test... 
	result is 239.47MB/210.71MB per second

 running IO "random read" test... 
	result is 175.78MB per second
	equals 45000.0 IOs per second

 running IO "random write" test... 
	result is 291.85MB per second
	equals 74713.5 IOs per second

 running IO "rand read/rand write" test... 
	result is 137.10MB/137.09MB per second
	equals 35096.8/35095.0 IOs per second

Ceph — Basic Management of OSD location and weight in the crushmap

It’s amazing how crappy hard disk are!   No really!   We operate a 100 disk ceph pool for our object based backups and Its almost a weekly task to replace a failing drive.   I’ve only seen one go entirely unresponsive but normally we get read error and rear failures that stop the osd service and show up in dmesg as faults.

 

To change the weight of a drive:

ceph osd crush reweight osd.90 1.82

To replace a drive:

#Remove old disk
ceph osd out osd.31
ceph osd crush rm osd.31
ceph osd rm osd.31
ceph auth del osd.31
#Provision new disk
ceph-deploy osd prepare --overwrite-conf hostname01:/dev/diskname

Move a host into a different root bucket.

ceph osd crush move hostname01 root=BUCKETNAME

Quick and Dirty Ceph Deployment

Replace the disk names and ssd device name.   This will build a ceph cluster with 2 object redundancy in about 5 minutes.

ceph-deploy purge ceph0-mon0 ceph0-mon1 ceph0-mon2 ceph0-node0 ceph0-node1
ceph-deploy purgedata ceph0-mon0 ceph0-mon1 ceph0-mon2 ceph0-node0 ceph0-node1
ceph-deploy forgetkeys


ceph-deploy new ceph0-mon0 ceph0-mon1 ceph0-mon2

echo "osd pool default size = 2" >> ~/ceph.conf
echo "public network = 10.1.8.0/22" >> ~/ceph.conf
echo "cluster network = 10.1.12.0/22" >> ~/ceph.conf
echo "osd journal size = 12000" >> ~/ceph.conf

ceph-deploy install ceph0-mon0 ceph0-mon1 ceph0-mon2 ceph0-node0 ceph0-node1
ceph-deploy mon create-initial

ceph-deploy admin ceph0-mon0 ceph0-mon1 ceph0-mon2 ceph0-node0 ceph0-node1

sudo chmod +r /etc/ceph/ceph.client.admin.keyring

ceph-deploy disk zap ceph0-node0:/dev/oczpcie_4_0_ssd
ceph-deploy disk zap ceph0-node0:/dev/sdb
ceph-deploy disk zap ceph0-node0:/dev/sdc
ceph-deploy disk zap ceph0-node0:/dev/sdd
ceph-deploy disk zap ceph0-node0:/dev/sde
ceph-deploy disk zap ceph0-node0:/dev/sdf
ceph-deploy disk zap ceph0-node0:/dev/sdg
ceph-deploy disk zap ceph0-node0:/dev/sdh
ceph-deploy disk zap ceph0-node0:/dev/sdi
ceph-deploy disk zap ceph0-node0:/dev/sdj
ceph-deploy disk zap ceph0-node0:/dev/sdk
ceph-deploy disk zap ceph0-node0:/dev/sdl
ceph-deploy disk zap ceph0-node0:/dev/sdm

ceph-deploy disk zap ceph0-node1:/dev/oczpcie_4_0_ssd
ceph-deploy disk zap ceph0-node1:/dev/sdb
ceph-deploy disk zap ceph0-node1:/dev/sdc
ceph-deploy disk zap ceph0-node1:/dev/sdd
ceph-deploy disk zap ceph0-node1:/dev/sde
ceph-deploy disk zap ceph0-node1:/dev/sdf
ceph-deploy disk zap ceph0-node1:/dev/sdg
ceph-deploy disk zap ceph0-node1:/dev/sdh
ceph-deploy disk zap ceph0-node1:/dev/sdi
ceph-deploy disk zap ceph0-node1:/dev/sdj
ceph-deploy disk zap ceph0-node1:/dev/sdk
ceph-deploy disk zap ceph0-node1:/dev/sdl
ceph-deploy disk zap ceph0-node1:/dev/sdm

ceph-deploy osd prepare ceph0-node0:/dev/sdb:/dev/oczpcie_4_0_ssd
ceph-deploy osd prepare ceph0-node1:/dev/sdb:/dev/oczpcie_4_0_ssd

ceph-deploy osd prepare ceph0-node0:/dev/sdc:/dev/oczpcie_4_0_ssd
ceph-deploy osd prepare ceph0-node1:/dev/sdc:/dev/oczpcie_4_0_ssd

ceph-deploy osd prepare ceph0-node0:/dev/sdd:/dev/oczpcie_4_0_ssd
ceph-deploy osd prepare ceph0-node1:/dev/sdd:/dev/oczpcie_4_0_ssd

ceph-deploy osd prepare ceph0-node0:/dev/sde:/dev/oczpcie_4_0_ssd
ceph-deploy osd prepare ceph0-node1:/dev/sde:/dev/oczpcie_4_0_ssd

ceph-deploy osd prepare ceph0-node0:/dev/sdf:/dev/oczpcie_4_0_ssd
ceph-deploy osd prepare ceph0-node1:/dev/sdf:/dev/oczpcie_4_0_ssd

ceph-deploy osd prepare ceph0-node0:/dev/sdg:/dev/oczpcie_4_0_ssd
ceph-deploy osd prepare ceph0-node1:/dev/sdg:/dev/oczpcie_4_0_ssd

ceph-deploy osd prepare ceph0-node0:/dev/sdh:/dev/oczpcie_4_0_ssd
ceph-deploy osd prepare ceph0-node1:/dev/sdh:/dev/oczpcie_4_0_ssd

ceph-deploy osd prepare ceph0-node0:/dev/sdi:/dev/oczpcie_4_0_ssd
ceph-deploy osd prepare ceph0-node1:/dev/sdi:/dev/oczpcie_4_0_ssd

ceph-deploy osd prepare ceph0-node0:/dev/sdj:/dev/oczpcie_4_0_ssd
ceph-deploy osd prepare ceph0-node1:/dev/sdj:/dev/oczpcie_4_0_ssd

ceph-deploy osd prepare ceph0-node0:/dev/sdk:/dev/oczpcie_4_0_ssd
ceph-deploy osd prepare ceph0-node1:/dev/sdk:/dev/oczpcie_4_0_ssd

ceph-deploy osd prepare ceph0-node0:/dev/sdl:/dev/oczpcie_4_0_ssd
ceph-deploy osd prepare ceph0-node1:/dev/sdl:/dev/oczpcie_4_0_ssd

ceph-deploy osd prepare ceph0-node0:/dev/sdm:/dev/oczpcie_4_0_ssd
ceph-deploy osd prepare ceph0-node1:/dev/sdm:/dev/oczpcie_4_0_ssd

Dell M1000e Manually Configure Set Minimal Fan Speed Control

You can manually configure the minimum fan speed of the m1000e so that the chassis maintains a lower operating temperature.

SSH The CMC with the cmc ip address and port 22.  User will be root and calvin unless changed.

Then run:

racadm config -g cfgThermal -o cfgThermalMFSPercent  75

This will set the minimum fan speed to 75%.  You can set it from 0-100%.  Obviously 0% is more like 35% but you won’t be able to tell.

You can view the requested fan speed by the servers in the chassis by running:

racadm getfanreqinfo

Example:

[Server Module Fan Request Table]

<Slot#>   <Server Name>   <Blade Type>       <Power State>  <Presence>   <Fan Request%>   

1         s2086.corp PowerEdgeM610      ON             Present      48               

2         s2087.corp PowerEdgeM610      ON             Present      48               

3         s2088.corp PowerEdgeM610      ON             Present      48               

[Switch Module Fan Request Table]

<IO>      <Name>                           <Type>             <Presence>   <Fan Request%>   

Switch-1  MXL 10/40GbE                     10 GbE KR          Present      30               

Switch-2  MXL 10/40GbE                     10 GbE KR          Present      30               

Switch-3  N/A                              None               Not Present  N/A              

Switch-4  N/A                              None               Not Present  N/A              

Switch-5  N/A                              None               Not Present  N/A              

Switch-6  N/A                              None               Not Present  N/A              

[Minimum Fan Speed %]

65

Rescan linux partition table on active disk with centos 6

If you try to rescan the partition table of an active disk it will fail and require a reboot to discover new partitions.

You can get around this by using partx which will scan for individual new partitions and inject them into the running kernel.

 

#partx -v -a /dev/sda

 

root@linux # partx -l /dev/sda
# 1:      2048-  1026047 (  1024000 sectors,    524 MB)
# 2:   1026048-1048575999 (1047549952 sectors, 536345 MB)
# 3: 1048576000-1572859889 (524283890 sectors, 268433 MB)
# 4:         0-       -1 (        0 sectors,      0 MB)
root@linux# partx -v -a /dev/sda
device /dev/sda: start 0 size 1572864000
gpt: 0 slices
dos: 4 slices
# 1:      2048-  1026047 (  1024000 sectors,    524 MB)
# 2:   1026048-1048575999 (1047549952 sectors, 536345 MB)
# 3: 1048576000-1572859889 (524283890 sectors, 268433 MB)
# 4:         0-       -1 (        0 sectors,      0 MB)
BLKPG: Device or resource busy
error adding partition 1
BLKPG: Device or resource busy
error adding partition 2
added partition 3

 

EMC XtremIO

6a00e552e53bd28833019aff2b85a5970bWell It’s been a while since I posted  but I got some new exciting stuff to talk about!

We’ve purchased our first EMC XtremIO array!   The system is 2x 20T Xbricks which gives us around 30TB raw capacity.  Compression and dedupe brings us to around 210TB usable storage, this is based on a  7:1  compress/dedupe factor which is proving to be the standard for the current software.

 

The system comes in up to an 8 brick configuration:

emc-XtremIO-f2

 

What each X-Brick looks like racked:

Each X-Brick contains:

  • Eaton high quality UPS unit.
  • 2 Storage Controllers.
  • DAE enclosure sas connected to the Storage Controllers.
  •  Once you go to a 2 or more brick system you will have 2 48G infiniban switches connecting the storage controllers together.

 

Update 12/1/14

 

So we’ve had our array in production for about 8 weeks now.  I have nothing but good things to say, performance has been absolutely incredible and stability / reliability has been everything promised.

The storage controller servers are Intel chassis with dual power supply, dual infiniban controllers and dual sas hba.  They appear to be running some sort of E5 cpu and boast 256GB of ram each.

Inside XtremIO Storage Controller

 

The system is pretty busy once cabled up, the architecture is very cluster oriented so there is a lot of redundancy in the cabling.

Xtremio Xbrick Cabling

Some pictures for the front of the array/Eaton UPS.

EMC EATON XtremIO UPS

 

20141202_180648

We are currently running around 550 production virtual machines that service 7,000 customer servers.   We are averaging 350-400MB/s read/writes at 20k io day in day out.   We’ve seen well over 20GB/s transfers and over 200k iop.

Storage vMotion and VAAI actions are extremely fast and completed almost instantly.  At the current time our data reduction/dedupe ratio is around 2.5:1 but I believe this numbers inaccurate as the total amount of data in our datastores is much larger than it shows stored.  🙂

Some UI Screenshots of our environment:

 

XtremIO UI Bandiwdth

XtremIO UI IOPS

xtremioui3

The lights on the disks in the UI blink based on activity, pretty cool eye candy.

Installing OpenVSwitch 2.3.1 LTS on CentOS 6

yum install kernel-headers kernel-devel gcc make python-devel openssl-devel kernel-devel, graphviz kernel-debug-devel automake rpm-build redhat-rpm-config libtool git

cd /root/

wget http://ftp.gnu.org/gnu/autoconf/autoconf-2.64.tar.gz

tar xvf autoconf-2.64.tar.gz

cd autoconf-2.64/

./configure

make

make install

 

cd /root/

wget http://openvswitch.org/releases/openvswitch-2.3.1.tar.gz -O /root/openvswitch-2.3.1.tar.gz

 

mkdir /root/rpmbuild/SOURCES

cp /root/openvswitch-2.3.1.tar.gz /root/rpmbuild/SOURCES/

rpmbuild -bb rhel/openvswitch.spec
rpmbuild -bb rhel/openvswitch-kmod-rhel6.spec

rpm -ivh /root/rpmbuild/RPMS/*.rpm

 

You can also use our public repo here for cloudstack.

http://mirror.beyondhosting.net/Cloudstack/

 

Recommendations I make to save critical data

First off, your data is the most valuable part of any server. There are many many hour of very hard if not impossible to replace work involved in setting up even a fairly basic web site. This doesn’t even include things like client information, orders etc. that directly cost you money if you lose them.

Not all backup methods are for everyone. The reason is that there are widely variable needs for data security as well as a wide variety of budgets. Someone with a page that is doing e-commerce transactions will likely need a lot more in regards to backups than someone with a bi-weekly blog for instance.

First off, there are two different modes of failure one will encounter as a sysadmin. The first is a “hard” failure. This includes drives or RAID arrays (yes it does happen) going bad. I love RAID, I think it’s a great measure to ensuring data protection but it’s not fool proof by any means and is no substitute for backups.

The second type of failure is the “soft” failure. With this failure mode for whatever reason data on the system is gone. This can be anything from a user deleting off their public_html directory to data corruption because the drive is heavily over run. Commonly this is someone running an FS check on a machine and having it dump a few thousand files to lost&found. I have seen my fair share of machines come up after this and run fine, and have seen plenty that didn’t too. This can also be the result of hackers etc. messing around on your system. Something I will warn of is if you use a secondary drive in the same server for backups, it can be something that is deleted by hackers as well. If you leave the drive mounted after backups are done and they do rm -rf /* it will be erased. Be sure to unmount your backup drive if you use this method. In general I do not advise relying on it for this reason, however it makes for a great way to have backups on a system without waiting for them to transfer.

The first rule I have is no matter what you should have minimum three copies of your data, at least one of which is totally off site and not within the same company as your server/colocation/shared host etc. This gives you options if something happens, and you’re not relying on one group of people to ensure your data is in tact.This can be as simple as having your system upload the files to a home or office computer via DynDNS and back mapping the port, then burning the images on to a CD weekly. On a higher level it can be storage by a company offering cloud storage such as Amazon.

How often you should back your data up and retain it is another question that is fairly common. This is largely subjective, and is a compromise between how much data you can afford to lose versus how much space you can afford. If you’re running a streaming video site, this can get quite pricey very quickly. Even to the point it may be best to try and get a low end server and put big drives in it to back up to. Afterall if you pay .50/gb and need a 1TB of backup space $500 buys a good bit of server!

What to back up is another good question. If you’re running a forum or something like that where there aren’t really all that many changes made to the underlying software, doing a single full backup and then backing the user upload directories (eg images) and the database may be enough. If the site is undergoing constant development, full backups would be a great deal more prudent.

The last thing to consider is how these backups are going to be made. I have done backups before with shell scripts, and used both Plesk’s and CPanel’s backup mechanisms. When doing a shell script for backups, you gain a ton of versatility in how and what you back up, at the price of being a lot more tedious to configure. These sort of backups are really nice if you’re wanting to make it so that your system backs up only certain things on varying interval. The panel based backups are so easy to configure, there is little to no reason you shouldn’t set them up. You just specify how often you want backups, where they will be stored and what will be backed up. The caveat I will warn about using a panel based backup system is that even with CPU level tweaks in the config files these can heavily load a system so my advice is to run them off hours.