Since my side projects (including this blog) don’t really have any revenue generating potential, I tend to shy away from spending a lot on them if I can avoid it. This blog is probably the most extravagant of the lot, getting its own dedicated server, which, I’ll admit, is overkill, but I’d had such bad experiences with shared providers before that I’m willing to bear the cost. Cloud hosting, on the other hand, can get nightmarishly expensive if you don’t keep an eye on it, and that’s exactly why I shied away from it for my side projects. That was until I got accepted into the Microsoft BizSpark program, which came with a decent amount of free usage, enough for me to consider it for my next application.
The Azure benefits for BizSpark are quite decent, with a smattering of all their offerings chucked in, easily enough to power a nascent start-up’s site through the initial idea verification stage. That’s exactly what I’ve been using them for and, as longtime readers will tell you, my experiences have been fairly positive, with most of the issues arising from my misuse of different technologies. The limits, as I found out recently, are hard, and hitting them causes all sorts of undesirable behaviour, particularly the compute and storage limits. I ran up against the former thanks to a misunderstanding of how a preview technology was billed, but I hadn’t hit the latter until last week.
The BizSpark benefits are pretty generous for SQL storage, giving you a couple of 5GB databases (or a larger number of smaller 1GB ones) gratis. That sounds like a lot, and indeed it should be sufficient for pretty much any burgeoning application; however, mine is based around gathering data from another site and then performing analytics on it, so the amount of data I have is actually quite large. In the beginning this wasn’t much of a problem as I had a lot of headroom, but after I made a number of performance improvements I started gathering data at a much faster rate and the 5GB limit loomed over me. In the space of a couple of weeks I managed to fill it completely and had to shut the gatherer down, lest my inbox fill up with “Database has reached its quota” errors.
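If you want to keep an eye on how close you are to the quota yourself, the standard dynamic management views do work on Azure SQL. A rough sketch of the kind of query I mean:

```sql
-- Total space reserved by the current database, in GB
-- (reserved_page_count is in 8KB pages, hence the arithmetic).
SELECT SUM(reserved_page_count) * 8.0 / 1024 / 1024 AS reserved_gb
FROM sys.dm_db_partition_stats;
```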
Looking over the database in the Azure management studio (strangely, one of the few parts of Azure that still uses Silverlight) showed that one particular table was consuming the majority of the space. A quick look at the rows made it pretty obvious why: a couple of columns contained lengthy URLs, and over the 6 million or so records I had, this amounted to a huge amount of space. No worries, I thought, SQL has to have some kind of built-in compression to deal with this, and so off I went looking for an easy solution.
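If you’d rather do the same hunt in plain SQL instead of the Silverlight portal, something along these lines should show you where the space is going:

```sql
-- Space reserved per table, largest first, to find the offender.
-- Only index_id 0 (heap) or 1 (clustered index) carries the true row count,
-- so the other indexes are excluded from the tally.
SELECT OBJECT_NAME(object_id) AS table_name,
       SUM(reserved_page_count) * 8.0 / 1024 AS reserved_mb,
       SUM(CASE WHEN index_id IN (0, 1) THEN row_count ELSE 0 END) AS total_rows
FROM sys.dm_db_partition_stats
GROUP BY object_id
ORDER BY reserved_mb DESC;
```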
As it turns out, SQL Server does, and its implementation would’ve provided the benefits I was looking for without much work on my end. However, Azure SQL doesn’t support it, and the current advice is to implement row-based compression inside your application. If you’re straight up dumping large XML files or giant wads of text into SQL rows, this might be of use to you; if you’re trying to compress data at the page level, though, you’re out of luck unless you want to code up an extravagant solution (like creating a compression dictionary table in the same database, which is borderline psychotic if you ask me).
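For reference, on a full SQL Server instance it really is a one-liner (the table name here is made up for illustration); it’s this statement that Azure SQL turns its nose up at:

```sql
-- Rebuild the table with page-level compression. Works on a regular
-- SQL Server instance, but Azure SQL rejects it at the time of writing.
ALTER TABLE dbo.CrawledPages
REBUILD WITH (DATA_COMPRESSION = PAGE);
```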
The solution for me was to move the problem table into its own database and, during the migration, trim out all the fat contained within the data. There were multiple columns I never ended up using, the URL fields were all very similar, and the largest column, the one most likely chewing through so much space, was no longer needed now that I could query the data properly rather than working around Azure Table Storage’s limitations. Page compression would’ve been a quick fix, but it would’ve only been a matter of time before I found myself in the same situation, struggling to find space wherever I could get it.
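The migration itself was nothing fancy, essentially a filtered copy into the slimmed-down schema. A rough sketch, with the table and column names invented for illustration:

```sql
-- Copy just the columns that earn their keep into the new, leaner table.
-- Azure SQL doesn't allow cross-database queries, so in practice the rows
-- travel through the application (or a tool like bcp) rather than a single
-- INSERT...SELECT, but the shape of the copy is the same.
INSERT INTO dbo.PageData (Id, Title, FetchedAt)
SELECT Id, Title, FetchedAt
FROM dbo.PageData_Old;
```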
For me this experience aptly demonstrated why it’s good to work within strict constraints, as left unchecked these issues would’ve hit me much harder later on. Sure, it can feel like I’m spinning my wheels when hitting issues like this is a monthly occurrence, but I’m still in the learning stage of this whole thing, and lessons learned now are far better than ones learned after I finally move this thing into production.