The Legacy and Opportunity of 1.5 Million Lines of Code

This is the story of the founding of NBSoftSolutions. Cue the nostalgic music. It was the beginning of 2008, and I was 15 and obsessed with a game called Europa Universalis III. I had recently started to get more into programming and needed a project to test my newfound skills. The perfect idea struck me. Whenever the player saved their game, a human-readable file was created. The problem was that these files often ran to millions of lines, and the syntax was key. Editing in Notepad took an inordinate amount of time, especially loading and saving. Programs such as Notepad++ alleviated this problem, but it was still terribly cumbersome to navigate to the correct location in the file. I resolved that I could program a fix.

A few weeks later, on March 10th, 2008, I released the first version. It was buggy, slow, and feature-poor, but what initial release of a program isn’t? Despite the problems, the release generated a lot of interest. The support of the community fueled my continued work on the savegame editor. While this was going on, Paradox Interactive, the company that produced Europa Universalis III, gave others an opportunity to license the underlying game engine. Because of the Non-Disclosure Agreement signed by those Paradox accepted, the developers were silent, so I put it in the back of my mind. Fast forward to December 2009, when I received an email from a well-known community member from Portugal, famous for modifying Europa Universalis III into a more realistic experience. It turned out that this community member had been granted a license by Paradox and had a proposition for me. The email detailed an offer: modify my savegame editor to work with the game he and his team were building.

I was 16 at the time and ecstatic. I had achieved international fame! Needless to say, I wasted no time in responding and accepting the offer. The thought of 500€ (later raised to 1000€) for a product I had developed with no intention of compensation induced long-term euphoria. The first order of business was downloading the code for the game, which was housed in something called “Subversion”. Looking back, this was my first encounter with revision control. On barely “high speed” internet, downloading the code easily took several hours. Considering I was essentially downloading text files, I couldn’t fathom the complexity of the game. Last week, I decided to calculate how many lines of code there were:

# Quote the patterns so the shell does not expand them before find sees them
find . -name '*.cpp' -o -name '*.hpp' -o -name '*.c' -o -name '*.h' |\
xargs cat |\
wc -l

1582158

Update 6/6/2013: Another method

find . -regextype posix-egrep -regex ".*\.((h|c)(pp)?)$" -exec wc -l {} \; |\
awk '{total += $1 } END { print total }'

Update 6/14/2013: Yet another method

find . -regextype posix-egrep -regex ".*\.((h|c)(pp)?)$" -print0 |\
wc -l --files0-from=- |\
tail -n1

# Same idea using quoted -name tests instead of a regex
find . -name '*.cpp' -print0 \
    -o -name '*.hpp' -print0 \
    -o -name '*.c' -print0 \
    -o -name '*.h' -print0 |\
wc -l --files0-from=- |\
tail -n1

I benchmarked the previous methods using time. The reported times are user + sys.

method1: 0m0.460s
method2: 0m0.568s
method3: 0m0.424s
method4: 0m0.452s
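
A minimal sketch of how a single user + sys figure like the ones above could be collected, assuming bash (its time keyword and its TIMEFORMAT variable). The cpu_time helper name is hypothetical, and this is not necessarily how the numbers above were produced.

```shell
#!/usr/bin/env bash
# cpu_time is a hypothetical helper (an assumption, not from the original post).
# It runs a command and prints user + sys CPU seconds as reported by bash's `time`.
cpu_time() {
    local LC_ALL=C                # keep '.' as the decimal separator
    local TIMEFORMAT='%3U %3S'    # report only user and sys seconds, 3 decimals each
    local timing
    # `time` reports on the shell's stderr, so capture that while
    # discarding the timed command's own stdout and stderr.
    timing=$( { time "$@" >/dev/null 2>&1; } 2>&1 )
    # Add the two fields to get user + sys.
    awk '{ printf "%.3f\n", $1 + $2 }' <<< "$timing"
}
```

Since each method is a pipeline, it would need to be wrapped in a single command to be timed this way, for example by passing the whole pipeline as a string to bash -c after cpu_time. Numbers this small are dominated by disk caching, so repeating each run a few times gives a fairer comparison.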

That’s over 1.5 million lines of code! No wonder I could go to the bathroom and come back to find the code still compiling. For the most part, I ignored this code and concerned myself with my own project. A couple of years went by. I weathered the complexity of receiving an international payment. That experience taught me that the exchange rate is never as good as the listed one. The listed rate, apparently, applies only to transfers of millions of dollars; for anything less, the rate is worse. It was also the reason I founded a company, NBSoftSolutions, with my dad, so that we could route the money through a business account instead of a personal one. This business has always been a source of pride for me.

Continuing on, sometime late in the project I was asked if I could contribute to the underlying engine. It was written in C++, originally in 1999. It was a behemoth. I was given a small task to fix the interface, but I flopped. I didn’t know much C++ then. I was simply overwhelmed; it was a programmer’s nightmare. It didn’t help that the turnaround time was enormous. A change in the code here, a compilation there, and starting the game in debug mode over yonder took the better part of 20 minutes. I’m an optimist, but even I knew that, given the circumstances, I couldn’t produce bug-free code on the first write. It was a frustrating stretch of development that ended only in wasted time.

It wasn’t that the code was hard to read or used foreign constructs; it was just written in a different time. I can’t imagine writing C++ back in 1999. Aspects of the language that I take for granted on a daily basis didn’t exist or were in their infancy. For instance, I didn’t understand why so much code was dedicated to replicating the functionality of the STL. Didn’t the developer know how to use a list or vector? Only after researching the history of the STL did I find out that Microsoft’s C++ compiler uses Dinkumware’s implementation, which didn’t reach conformance with the ISO/ANSI C++ Standard until 1998. Now imagine you are developing a game engine. You need everything to work flawlessly and fast. Are you more likely to use a library that is largely immature and unproven, or to write your own set of classes? Nowadays, it is foolish to ignore the power of the STL, but back then, rolling your own was understandable and commonplace.

In the meantime, the game that was being developed with the license to the engine was canceled by Paradox. There’s a bit of controversy as to the reason for the cancellation. Some say that a development time of four years was too long, others that the game was going to interfere with Paradox’s upcoming game. Frankly, I don’t care about the reason. All I know is that I am sitting on a goldmine of 1.5 million lines of code that I’m sure I can learn from for the rest of my life. So I’m in debt to the community member who liked my savegame editor and gave me such an opportunity, and I’m in debt to Paradox, who recognized the potential benefits of licensing their game engine.

Lately, I have been looking at the engine code more than ever, even though it’s been six months since the cancellation. It takes me a little longer to come to understand the code than I would like, and the hand-rolled version of the STL doesn’t help. The key is to not give up. I am motivated by the thought of improving my savegame editor by understanding how the game was laid out. The engine code has turned out to be an invaluable friend. If I have a difficult time figuring out what a construct means, or the exact parsing rules of the savegame, I can dig into the code. The task is often long and arduous, but the knowledge gained from the foray is worth it. I was able to take 1,500 lines of C++ and translate them into 200 lines of simple C#, replacing complex logic I had previously written. Hardly anything is as satisfying as such a drastic refactor.

I believe Paradox is using a lot of the same code in their newer games. To those programmers who can wrap their heads around 1.5 million lines of code, I tip my hat. And to the one programmer who single-handedly wrote much of the foundation back in 1999, and whose code is still in use after 15 years: I want to be you. I envy your impact.
