Once an IT environment gets past a certain size, the need for automation grows very quickly. When you’re working in an environment with a handful of servers and a dozen or so PCs, it’s easy to just spend the time doing everything manually. If you’re in a situation like mine right now, where a single deployment covers upwards of 400 physical servers, that approach isn’t particularly feasible, especially if you want any level of consistency across the fleet. It should come as no surprise, then, that I spend the vast majority of my time automating the commissioning of IT infrastructure. Since I don’t want to do anything 400 times, I often end up trying to automate things that really don’t want to be automated.
Take, for instance, this little fellow: a Dell 8/4 Fibre Channel Interconnect module for an M1000e chassis (sounds sexy, right?). Don’t let the Dell badge on the outside fool you. Like a lot of Dell hardware, it’s actually a rebranded Brocade fibre switch under the hood, albeit with a significantly pared-down feature set. For the most part it’s just a dumb NPIV device that acts as a pass-through for high-speed connections, but it does have a little bit of smarts in it, enough that it would typically come under the purview of your on-site storage team. However, because of its pared-down nature it doesn’t work with any of Brocade’s management software (at least none that we have here), so the storage team wasn’t particularly interested in managing it. Fair cop, but there were still a few things that needed to be configured on it, so my colleague and I set about figuring out how to do that.
Usually this is when I’ll track down the CLI or automation guide for the product and dig around for the commands I need to get it configured. Try as I might, I couldn’t find anything from Brocade themselves, as they usually recommend using something like DCFM for configuration. There is an SSH interface on the devices, however, and it has a rather comprehensive set of commands. Unfortunately, there didn’t appear to be any way to run those commands remotely other than typing them into an interactive session. We could, of course, use something like TCL with EXPECT to automate that session, but that’s traditionally quite messy, so we asked our on-site Brocade resident if there was a better solution.
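For anyone who hasn’t used EXPECT, the core pattern is simple: wait until the device prints a known prompt, send a line, and repeat. The sketch below shows that loop in Python rather than TCL. It runs against a fake session object, so it works without a switch. The `FakeSession` class, the `expect`/`run_commands` helpers and the default `name:user> ` prompt regex are all hypothetical illustrations, not part of Expect or any Brocade tool. A real script would read from an SSH channel instead.

```python
import re
from typing import Iterable, List

# Hypothetical helpers for illustration; not part of Expect or any Brocade tool.
# Assumed prompt format "switchname:user> "; check what your switch actually prints.
DEFAULT_PROMPT = r"\S+:\S+> $"


class FakeSession:
    """Stand-in for an SSH channel: replays canned output and records what was sent."""

    def __init__(self, chunks: Iterable[str]) -> None:
        self._chunks = iter(chunks)
        self.sent: List[str] = []

    def read(self) -> str:
        # An empty string means "no more output", i.e. the channel went quiet.
        return next(self._chunks, "")

    def send(self, text: str) -> None:
        self.sent.append(text)


def expect(session: FakeSession, pattern: str, max_reads: int = 100) -> str:
    """Accumulate output until `pattern` matches, then return everything read.

    Any text arriving after the match in the same chunk is returned with this
    result rather than carried over to the next call.
    """
    regex = re.compile(pattern)
    buffer = ""
    for _ in range(max_reads):
        chunk = session.read()
        if not chunk:
            break
        buffer += chunk
        if regex.search(buffer):
            return buffer
    raise TimeoutError(f"never saw {pattern!r}; last output was {buffer!r}")


def run_commands(session: FakeSession, commands: Iterable[str],
                 prompt: str = DEFAULT_PROMPT) -> List[str]:
    """Wait for a prompt, send each command, and collect output up to the next prompt."""
    expect(session, prompt)  # initial prompt after login
    outputs = []
    for command in commands:
        session.send(command + "\n")
        outputs.append(expect(session, prompt))
    return outputs


if __name__ == "__main__":
    demo = FakeSession([
        "sw1:admin> ",
        "switchshow output here\nsw1:admin> ",
        "version output here\nsw1:admin> ",
    ])
    for out in run_commands(demo, ["switchshow", "version"]):
        print(out.splitlines()[0])
```

Everything hinges on the patterns matching what the switch actually prints, and that is where this approach gets fragile.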
There isn’t, apparently.
So off we went, building up a TCL file that would do the configuration for us, and initially it all worked as expected (pun completely unintentional, I assure you). Once we had worked the initial kinks out of the script, it worked every time in our test environment, and we were confident enough to start moving it up the chain. Of course, this is when problems started to become apparent. During testing we began to see some really weird behaviour from the switches: things that aren’t mentioned anywhere and aren’t obvious unless you’re doing exactly what we’re doing.
To build up the original TCL script, I’d PuTTY into one of the switches and run each command by hand. Once I had confirmed the changes had taken effect, I’d add those commands to the script. Pretty standard stuff, but after re-running the scripts I’d find they would inexplicably fail at certain points, usually when attempting to reconfigure a switch that had already been deployed. Essentially, the script would try the default password, look for an “Access denied” message, and then send the correct password, as that’s all that was required when using PuTTY.
However, the logs showed that when the script connected, the message was different (“Login incorrect”), and the switch asked for the user name again rather than just the password. There are also significant differences in the way output is written between the two interfaces. With something like EXPECT you have to code around them, otherwise you’ll end up sending input at the wrong times and reading lines you don’t want. It’s clear that there are two interfaces to the Brocade switches, and they differ enough that a script written against one won’t work with the other, which is just unacceptable.
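One way to code around the two login behaviours is to react to the prompts rather than to the error text. Whenever a username prompt appears, send the username. Whenever a password prompt appears, send the current candidate password. Treat either failure message as a signal to move on to the next password, default first, as in the script described above. The Python sketch below assumes the prompts end in `login:` and `Password:` and that “Access denied” and “Login incorrect” are the only failure strings. Those prompt formats are assumptions, and the `LoginDriver` name is hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed prompt/failure formats; confirm against your own session logs.
LOGIN_PROMPT = re.compile(r"login:\s*$", re.IGNORECASE)
PASSWORD_PROMPT = re.compile(r"password:\s*$", re.IGNORECASE)
# "Access denied" was seen interactively (PuTTY), "Login incorrect" from the script.
LOGIN_FAILURE = re.compile(r"Access denied|Login incorrect")


@dataclass
class LoginDriver:
    """Hypothetical helper that decides what to send for each chunk of login output."""

    username: str
    passwords: List[str]  # candidates in order, e.g. [default, deployed]
    attempt: int = 0

    def reply(self, output: str) -> Optional[str]:
        """Return the line to send, or None if no login input is wanted."""
        if LOGIN_FAILURE.search(output):
            # Either failure message means the current password was rejected.
            self.attempt += 1
            if self.attempt >= len(self.passwords):
                raise PermissionError("every candidate password was rejected")
        # Respond to whichever prompt is showing, so it doesn't matter whether the
        # interface asks for the password alone or the username and password again.
        if LOGIN_PROMPT.search(output):
            return self.username + "\n"
        if PASSWORD_PROMPT.search(output):
            return self.passwords[self.attempt] + "\n"
        return None
```

Driving the login off the prompts keeps a single code path for both interfaces. The error strings only advance the password counter.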
Realistically, what’s required is for Brocade to release some kind of configuration tool, like Dell’s RACADM, that provides a direct hook into these devices so they can be automated properly. I’ve found old forum posts that reference something like that for Perl, but as far as I (and the Brocade people I’ve talked to) know, there’s nothing like that available for these particular devices. It’s not that it’s impossible to code EXPECT up to do what we want, but the result is ugly, unmaintainable and likely to break with firmware updates. If there is a better solution I’d love to hear it, but after all the time I’ve invested in this, I’m pretty sure there isn’t one.
Unless Brocade has something in the works, nudge nudge 😉