We recently overhauled our whole network because our 1Gbps network, running on unsupported eBay gear, wasn’t cutting it anymore. I’ll go into more detail regarding the upgrade later, but for now I am going to focus on the access switches that we chose: the Dell / Force10 MXL (DF10MXL)! We chose the Force10 MXL because it offers both 1 and 10Gbps server-side connectivity, 10 and 40Gbps uplinks, “FCoE”, and it’s well priced! With the exception of a few issues outlined below, they have been pretty decent switches.
Issue Number 1 – MAC addresses and memory leaks.
We started on firmware version 184.108.40.206. It did not take us long to realize our VMware environment was a little too much for these switches. With VMs, and consequently MAC addresses, being moved all over our network by vMotion, we started to have random IP address reachability issues, and occasionally switches would reboot. We quickly learned that issuing the command “clear mac-address-table dynamic all” on the switches servicing the IP address in question resolved the issue and made the IP address reachable again. After a little time on Google and browsing through Force10 documentation, we found the following in the release notes for firmware version 220.127.116.11, which is the latest release after 18.104.22.168.
Microcode (Resolved) (Resolved in version 22.214.171.124)
Severity: Sev 2
Synopsis: System may experience memory leak when it learns new MAC addresses continuously.
Release Notes: When MAC addresses are learned continuously, the system may fail to release allocated memory if internal software processes are busy processing newly learned MAC addresses and may experience a reboot due to memory exhaustion.
We found our issue!.. or so we thought. At the time we did not have access to firmware version 126.96.36.199, so we looked in the archive for the latest release without this issue. This led us to 9.5(0.0P2). After a whole day of downgrading switches, 40 in total, our environment calmed down and our issues disappeared. Yay!
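For anyone stuck on the affected firmware, the MAC table workaround mentioned above could be scripted roughly like this. It is only a sketch: the hostnames and the `admin` user are made up, and it assumes your switches will accept a single command passed over SSH, which I have not verified on every FTOS release. It only builds the command lines and does not run anything.

```python
import shlex

# The workaround command quoted earlier in this post.
CLEAR_MAC_CMD = "clear mac-address-table dynamic all"


def build_clear_commands(switches, user="admin"):
    """Return one ssh argument list per switch; nothing is executed here.

    `user` and the hostnames are hypothetical placeholders.
    """
    return [["ssh", f"{user}@{host}", CLEAR_MAC_CMD] for host in switches]


if __name__ == "__main__":
    # Hypothetical hostnames for the switches servicing the unreachable IP.
    for argv in build_clear_commands(["mxl-a1.example.net", "mxl-a2.example.net"]):
        print(shlex.join(argv))
```

Printing the commands first makes it easy to eyeball the target list before pasting them into a shell or handing each list to `subprocess.run`.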
Issue Number 2 – Running hot.
Five weeks later we started to notice some of our switches running extremely hot: 60-100 degrees Celsius (140-212 degrees Fahrenheit). We were seeing a lot of syslog messages from these switches with reboot warnings but no actual reboots. It didn’t take long for the reboots to start. The four to five switches that were running in excess of 70 degrees Celsius started to reboot at random intervals. After working our way through Dell support, we were able to get some answers. Firmware version 9.5(0.0P2) contains a bug that does not correctly report temperature / requested fan speed to the M1000e chassis. The chassis were only running at 30% fan speed regardless of how hot the switches were getting. As a temporary solution, Dell pointed us to the RACADM Command Line Reference Guide found here. Using this guide we were able to manually set the fan speed on our chassis to cool the switches. Here is a post explaining exactly how to do that. We settled on 65% fan speed. This kept the switches cool and the noise level down.
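If you want to keep an eye on your own switches, here is a small sketch of the arithmetic behind the numbers above. The 70 degree threshold comes from what we saw (the switches above it were the ones rebooting); the switch names and readings are invented for illustration, and how you collect the readings (SNMP, syslog, the chassis controller) is up to you.

```python
# Threshold taken from our experience: switches above ~70 C started rebooting.
REBOOT_RISK_C = 70


def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def at_risk(readings_c, threshold_c=REBOOT_RISK_C):
    """Return the names of switches whose temperature exceeds the threshold."""
    return sorted(name for name, temp in readings_c.items() if temp > threshold_c)


if __name__ == "__main__":
    # Hypothetical readings in degrees Celsius, keyed by switch name.
    readings = {"mxl-a1": 62, "mxl-a2": 84, "mxl-b1": 71}
    for name in at_risk(readings):
        print(f"{name}: {readings[name]} C / {c_to_f(readings[name]):.0f} F")
```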
Issue Number 3 – Stack Formation.
FTOS 188.8.131.52 will not form a 4 switch stack. No documentation is available as to why. When the 4th switch joins the stack, the 3rd and 4th switches kernel panic and reboot.
So… Force10 FTOS in a Nutshell.
- 9.5(0.0P2) contains a bug that does not report temperature and/or requested fan speed correctly to the chassis and as a result it runs too hot and reboots.
- 184.108.40.206 doesn’t run hot but has MAC address mobility issues, which can apparently be worked around by enabling MAC Masquerading. This is done with one simple command: “mac-address-table station-move refresh-arp”. I am hesitant to take this route as we could still experience the memory leak issue noted above.
- 220.127.116.11 is available and should resolve both of our issues, but I am beginning to wonder what other ‘features’ we may find in the latest release.
- Update on 18.104.22.168: if you have more than 3 switches in a stack, the 4th switch will continuously reboot as it tries to join the stacked cluster.
More to come. For now the fans are hard set to 65% and here are some fun graphs to look at showing the temperatures before and after setting the fan speeds.
Operating Temperature Drop on Force10 MXL with 65% minimum fan speed.
Power Impact of 65% minimum fan speed.
Update – 4/1/2015
- Today is April 1st, 2015. Dell recently released 22.214.171.124 and we tested it in a lab for a few weeks before throwing it into production. FTOS / Dell OS 126.96.36.199 appears to resolve all of our issues. The switches no longer run hot, the switches form a stack like they should, and we have not had any reboots. I’ll follow up in a few weeks to let you know if we happen to have any issues.
Update – 6/9/2015
- 188.8.131.52 is rock solid. Use it.