My post last week about the trials and tribulations of sorting ones media collection struck a chord with a lot of my friends. Like me they’d been doing this sort of thing for decades and the fact that none of us had any kind of sense to our sorting systems (apart from the common thread of “just leave it where it lies”) came at something of a surprise. I mean just taking the desk I’m sitting at right now for an example it’s clear of everything bar computer equipment and the stuff I bring in with me every day. The fact that this kind of organization doesn’t extend to our file systems means that we either simply don’t care enough or that it’s just too bothersome to get things sorted. Whilst I can’t change the former I decided I could do something about the latter.
Enter Sortilio.
So my quest last week proving fruitless I set about developing a program that could sort media based on a couple cues derived from the files themselves. Now for the most part media files have a few clues as to what they actually are. For the more organized of us the top level folder will contain the episode name but since mine was all over the place I figured it couldn’t be trusted. Instead I figured that the file name would be semi-reliable based on a cursory glance at my media folder and that most of them were single strings delimited with only a few characters. Additionally the identifier for season and episode number is usually pretty standard (S01E01, 2×01,1008, etc) so that pulling the season out of them would be relatively easy. What I was missing was something to verify that I was looking in the right place and that’s where I TheTVDB comes in.
The TV Database is like IMDB for TV shows except that it’s all community driven. Also unlike IMDB they have a really nice API that someone has wrapped up in a nice C# library that I could just import straight into my project. What I use this for is a kind of fuzzy matching filter for TV show names so that I can generate a folder with the correct name. At this point I could also probably rename the files with the right name (if I was so inclined) but for the point of making the tool simple I opted not to do this (at this point). With that under my belt I started on the really hard stuff: figuring out how to sort the damn files.
Now I could have cracked open the source of some other renaming programs to see how they did it but I figured out a half decent process after pondering the idea for a short while. It’s a multi-stage process that makes a few assumptions but seems to work well for my test data. First I take the file name and split it up based on common delimiters used in media files. Then I build up a search string using those broken up names stopping when I hit a string that matches a season/episode identifier. I then add that into a list of search terms to query for later, checking first to see if it’s already added. If it’s already in there I then add the file path into another list for that specific search term, so that I know that all files under that search term belong to the same series. Finally I create the new file location string and then present this all to the user, which ends up looking like this:
The view you see here is just a straight up data table of the list of files that Sortilio has found and identified as media (basically anything with the extension .avi or .mkv currently) and the confidence level it has in its ability to sort said media. Green means that in the search for the series name it only found one match, so it’s a pretty good assumption that it’s got it right. Yellow means that when I was doing a search for that particular title I got multiple responses back from TheTVDB so the confidence in the result is a little lower. Right now all I do is take the first response and use that for verification which has served me well with the test data, but I can easily see how that could go wrong. Red means I couldn’t find any match at all (you can see what terms I was searching for in the debug log) and everything marked like that will end up in one giant “Unsorted” folder for manual processing. Once you hit the sort button it will perform the move operations, and suffice to say, it works pretty darn well:
Of course it’s your standard hacked-together-over-the-weekend type deal with a lot of not quite necessary but really nice to have features left out. For starters there’s no way to tell it that a file belongs to a certain series (like if something is misspelled) or if it picks the wrong series to tell it to pick another. Eventually I’m planning to make it so you can click on the items and change the series, along with a nice dialog box to search for new ones should it not get it right. This means you might want to do this on a small subset of your media each time (another thing I can code in) as otherwise you might get files ending up in strange folders.
Also lacking is any kind of options page where you can specify things like other extensions, regex expressions for season/episode matching and a whole host of other preferences that are currently hard coded in. These things are nice to have but take forever to get right so they’ll eventually make their way into another revision but for now you’re stuck with the way I think things should be done. Granted I believe they’ll work for the majority of people out there, but I won’t blame you if you wait for the next release.
Finally the code will eventually be open sourced once I get it to a point where I’m not so embarrassed by it. If you really want to know what I did in the ~400 odd lines that constitute this program then shoot me an email/twitter and I’ll send the source code to you. Realistically any half decent programmer could come up with this in half the amount of time I did so I can’t imagine anyone will need it yet, unless you really need to save 3 hours 😛
So without further ado, Sortilio can be had here. Download it, unleash it on your media files and let me know how it works for you. Comments, questions, bugs and feature requests can be left here as a comment, an @ message on Twitter or you can email me on therefinedgeek@gmail.com.