Friday, April 18, 2014

Computer Science fundamentals still hold true

There are some discussions out there about what a software developer should really know nowadays. Arguments are raised that most contemporary software development work is no longer computer science - which meant creating new stuff out of nothing - but "just" engineering work - using already built and verified components and approaches, gluing and mixing them together to create stuff out of other stuff. Questions are raised whether a deep understanding of algorithms, data structures and other fundamentals of computer science is crucial for putting together existing libraries and frameworks, which is basically what most of us do every day.

The need for a good software developer to hold a Computer Science degree is often questioned, and there is plenty of advice out there on how to be successful in the field without a degree. Indeed, I've never actually implemented sorting or tree search on my own for professional needs, only as an academic exercise. But I think the knowledge I gained during my CS studies gave me a lot of insight into how things work and gives me a sort of confidence in what I do. Without knowledge of time and memory complexity, I'd be wandering in the dark.

Right now I'm working in a self-sufficient Agile team with a strong desire to avoid knowledge and responsibility silos, so everyone is encouraged to take on every task needed to reach the goal. Of course there are still (and always will be) people more skilled in database stuff and others more skilled in HTML, and that's fine. But with no code ownership it's also fine when more front-end inclined developers do some back-end tasks and vice versa.

But unless someone is doing something purely declarative in nature, like plain HTML or CSS, code is still code, regardless of whether it's low-level C or JavaScript on the client. That means understanding the mechanics of how things work and knowing the fundamentals of data structures and algorithms is still crucial for all software developers, no matter where in the stack they fit best. Of course, not having a degree doesn't disqualify anyone from being a good software engineer, but the theoretical gaps need to be filled properly.

Recently I stumbled upon a simple piece of code we already had in production, working well enough that it hadn't drawn any attention as long as the data volume was small. The code goes like this:

foreach (var foo in foos)
{
    var matchingBars = bars.Where(x => x.Foo == foo);
    foreach (var bar in matchingBars)
    {
        DoSomethingWith(foo, bar);
    }
}

Simple enough, isn't it? But the data kept growing. When we reached more than 20k foos and more than 20k bars, this is what happened:

That simple piece of code hit us badly with a quite obvious O(n^2) number of comparisons. Foos and bars are both plain lists, so finding the matching elements requires traversing the whole bars collection for each and every foo. Each comparison is insignificantly cheap, but doing it 425 million times takes more than a minute!

I've changed the code to use Lookup, which is a basic hash-based structure that allows quick access to elements by key. The code now looks like this:

var barsLookup = bars.ToLookup(x => x.Foo);
foreach (var foo in foos)
{
    foreach (var bar in barsLookup[foo])
    {
        DoSomethingWith(foo, bar);
    }
}

That simple change replaced more than 400 million comparisons with only the 20k operations needed to build the lookup plus 20k cheap lookup reads. The result? Total execution time fell to just 115 ms.

That's 538 times faster, just by one simple data structure change.
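
To make the difference in the amount of work tangible, here is a minimal, self-contained sketch that counts the operations performed by both approaches. The Foo and Bar classes, the LookupDemo name and the collection size are assumptions made up for illustration, as the real types are not shown in this post; it also assumes reference equality between foos, which is what both == and ToLookup use for a plain class.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the real types; only the Bar -> Foo reference matters here.
public sealed class Foo { }

public sealed class Bar
{
    public Bar(Foo foo) { Foo = foo; }
    public Foo Foo { get; }
}

public static class LookupDemo
{
    // Nested scan: every foo is compared against every bar.
    public static (int Pairs, long Comparisons) NestedScan(IList<Foo> foos, IList<Bar> bars)
    {
        int pairs = 0;
        long comparisons = 0;
        foreach (var foo in foos)
        {
            foreach (var bar in bars)
            {
                comparisons++;
                if (bar.Foo == foo) pairs++;
            }
        }
        return (pairs, comparisons);
    }

    // Lookup: one pass over bars to group them by Foo, then one keyed read per foo.
    public static (int Pairs, long KeyedReads) WithLookup(IList<Foo> foos, IList<Bar> bars)
    {
        var barsLookup = bars.ToLookup(x => x.Foo);
        int pairs = 0;
        long keyedReads = 0;
        foreach (var foo in foos)
        {
            keyedReads++;
            // A missing key yields an empty sequence, so no existence check is needed.
            pairs += barsLookup[foo].Count();
        }
        return (pairs, keyedReads);
    }

    public static void Main()
    {
        const int n = 2000; // assumed size, kept small so the nested scan finishes quickly
        var foos = Enumerable.Range(0, n).Select(_ => new Foo()).ToList();
        var bars = foos.Select(f => new Bar(f)).ToList(); // exactly one bar per foo

        var nested = NestedScan(foos, bars);  // 2000 x 2000 = 4,000,000 comparisons
        var lookup = WithLookup(foos, bars);  // 2000 keyed reads after one grouping pass

        Console.WriteLine($"Nested: {nested.Pairs} pairs, {nested.Comparisons} comparisons");
        Console.WriteLine($"Lookup: {lookup.Pairs} pairs, {lookup.KeyedReads} keyed reads");
    }
}
```

Both methods find the same pairs; only the amount of work differs. The nested count grows with the product of the two collection sizes, while the lookup version grows with their sum, which is why the gap gets so dramatic at a bit over 20k elements each.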

I've found a great algorithms and data structures complexity cheat sheet. I think one may not call oneself a software developer without understanding what at least the basic stuff in those tables means.

Thursday, April 10, 2014

Waiting screen - doing bad things right - is it even possible?

Sometimes in large web applications it is necessary to make a client wait before the server is able to provide any content. There may be some heavy calculations to perform, caches to refresh etc. In most cases it probably can - and should - be avoided by using background workers that are not part of the actual web request, or some kind of asynchronous AJAX calls. Those approaches make it possible either to leave the user experience completely undisturbed or at least to reduce the fuss and eliminate the need for a blocking wait.

But there's a chance the task we are to accomplish can be neither offloaded to a background thread on the server nor performed asynchronously with AJAX during normal site operation. I had that kind of situation in a project where authorization was handled by an external, awfully enterprisey component, available through a web service which was slow as a snail and completely out of our control. It was not possible to show anything more than a waiting screen on the first log on, as everything of any value had to be authorized first.

In that hopeless situation, we decided we needed a waiting screen, so that users at least see something that lets them know the service is being prepared for them. You know, if it's slow, it's a clear indication there's a lot of value inside and it's worth paying big money for it. If it loaded quickly, it would seem to have no real value (in case my English is not fluent enough to express my thoughts accurately - yes, this was meant to be sarcastic).

Anyway, I was looking for a waiting screen solution that ideally has all of the features listed here, in my subjective order of importance.

  1. it needs to be available not just as a JS throbber to be put somewhere on the site - it needs to be a response for the initial request to the website;
  2. it needs to draw something on screen while waiting (obvious to say, but not so obvious to implement);
  3. it should be HTTP-compliant in terms of status codes etc. so that it doesn't confuse any browsers or web crawlers;
  4. it should not break the "back" button;
  5. while waiting, it should indicate "waiting" status in a browser's status bar, to convince the user something is really going on.

Serving a simple wait screen with a 302 Redirect to the target request that will do the actual work is not an option, as it fails requirement no. 2 - the browser will issue the redirected request without rendering our wait screen. But in order to fulfill points 3 and 4, we need a 302 Redirect - serving 200 OK without the actual content harms both the protocol and the browser's history badly, as the two are tightly coupled. So there's a big contradiction here. We can try going the unwanted HTML redirect route or use a JavaScript redirect - it satisfies point 2, but it's no better in terms of protocol compliance or browser history behavior.

Well, I got stuck here. I've asked for help on StackOverflow, but the question drew no interest at all. Let me know if I'm missing something.

For now, I've sacrificed protocol compliance and proper "back" button behavior in order to have anything shown on screen - points 1 and 2 are the crucial ones for the general user experience. It's still quite a weak experience, but without those, there is no experience at all. I've chosen the JavaScript redirect route, triggered on the window.load jQuery event (note that the document.ready event is raised when the DOM is ready, but possibly not yet rendered - a bit too early for us to be sure something is already drawn).

I can't see a way to have the waiting screen done right. Well, maybe there is no right way to do bad things?