I spent most of today trying to find out why my experimental results were less than 1/5th as good as they had been for the same dataset a few months ago when I wrote my paper. It’s quite disturbing to run a program for half an hour and have it come back and tell you that you have nothing worth writing your thesis about. Quite disturbing indeed. Especially when the paper didn’t even get accepted with the results I used to have, but I’ll not get into that.

I was pretty put out, but I reassured myself that I had managed to obtain useful results before, so it must just be a bug somewhere in my code. I figured it could take some time to find though, as my year’s work has become somewhat gnarly along the way for one reason or another. But at least it was just a matter of finding and fixing a bug.

Or was it? The thing is, my first set of results had a little bit of a cheat in them. I’m not going to try to describe the setup in any really meaningful way, because that’s basically what this thesis is about and I don’t have weeks to write this blog post. But there were some circumstances under which I would get a few correct answers "for free". These shouldn’t have been counted with the rest of the results, but they had been when I presented the results in my paper. I noticed this a few weeks ago and decided to fix it. Could it be that removing my "free" right answers had thrown off my whole set of results to the extent that I was seeing?

The obvious thing to do was to root around in the code and revert whatever I had changed so that the cheat was back in place. If I went back to getting good results then I would know that the new, bad numbers were the legitimate results and the good ones were basically fake. This would have been more than a bit bothersome. Now, I’m sitting here writing a blog post rather than, say, chucking myself under a train, so you can probably surmise that this wasn’t what had gone wrong. In fact I had never even fixed the cheating! Well, that was a relief anyway. At least when I did find whatever was wrong it would be something I could fix without feeding my academic integrity to the ducks.

Given the complexity and the sheer quantity of the code involved in this project, and my relative unfamiliarity with parts of it that I wrote too long ago and didn’t properly document, I expected to spend the rest of the week tracking down the problem. In the end it took me only about two hours of the usual ritual of reading code, adding debugging statements, compiling, running, panicking, reading more code, adding more debugging statements, going for a piss, and finally reading the right bit of code and adding the right debugging statement to find the bug. A one-liner fixed the major symptoms, and a little more work tomorrow will fix it entirely.

I also fixed the little cheat that had been in the system when I generated the earlier results. As I was doing this it occurred to me that it shouldn’t actually have any effect on the numbers I was reporting. A quick check bore this out. So not only had I got the results back to being as good as they had been, they were now completely legitimate.

By this time it was almost time for the bus (which I had intended not to get, since I thought I’d need to be in all evening). But I had time to run a couple more iterations of one or two experiments. I ran through the newer dataset and discovered that I can get a precision of 100%, with 97.9% recall (don’t worry about what those words mean in this context; what’s important is that you can’t do better than 100%). And that’s under what is intuitively the most difficult set of conditions, so I expect to see results at least as good under all other conditions on that dataset.

The last 15 minutes of my day in college were very good. How was your day?