On Code, And Other Things: 2011

Monday, 8 August 2011

The operational mentality in software development

A talk by Theo Schlossnagle spurred this line of thought. The description of the talk is "about the evolution of a career in web operations", but he talks more about why developers need to think operationally. In other words, he takes a different stance on the meaning of DevOps from the prevalent one. DevOps is usually described as increased collaboration between development and operations, with knowledge sharing between the two groups leading to better delivery.

Theo Schlossnagle is of the view that developers need to take the operational view when writing code. It's a very valid point of view and something that I suspect most of us overlook.

I'm going to expand on what he said and share my ideas on that.

Let's take the software you're writing now (assuming it's web software). Is it operable?
It probably does at least these things:

Fulfills your requirements document
Passes your unit tests and integration tests
Has a usable and responsive UI

But is it operable? Once it's deployed, can it survive unprecedented load? Fringe cases? Subsystems going down? Third party services it depends on becoming unavailable?

And can it degrade gracefully while it does all this?

Selective Failure
Most of the time, we stress systems before deployment, using load tests to simulate real-world conditions. That takes care of one aspect. But most of us don't think about selective failures, where individual subsystems fail while the rest keep running, especially when the system is distributed and its components interact in complex ways. That describes most big web applications. Handling selective subsystem failures is not purely an operations responsibility. The application has to be written with selective failure in mind.
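To make this concrete, here is a minimal sketch in C of one way to code for selective failure. Everything in it is hypothetical and not from the talk: fetch_fn stands for a call into a subsystem that may be down, fetch_ok and fetch_down are stubs for the healthy and failed cases, and render_sidebar falls back to static content instead of failing the whole page.

```c
#include <stdio.h>

/* Hypothetical signature for a call into a subsystem that may be down.
 * Returns 0 and fills buf on success, -1 on failure. */
typedef int (*fetch_fn)(char *buf, size_t len);

/* Stub standing in for a healthy subsystem (illustration only). */
int fetch_ok(char *buf, size_t len) {
    snprintf(buf, len, "personalised recommendations");
    return 0;
}

/* Stub standing in for a subsystem that is unavailable (illustration only). */
int fetch_down(char *buf, size_t len) {
    (void)buf;
    (void)len;
    return -1;
}

/* Fill buf with live content if the subsystem answers, otherwise with a
 * static fallback. Returns 0 for live content, 1 if we had to degrade. */
int render_sidebar(fetch_fn fetch, char *buf, size_t len) {
    if (fetch(buf, len) == 0)
        return 0;
    snprintf(buf, len, "popular items");
    return 1;
}
```

The caller gets something usable either way, and the return value tells it (or a monitoring hook) that the page was served in degraded mode.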

In the video, Theo brings up an analogy with security. Security is not a feature, but a way of thinking.
Sanitizing user input before putting it in a database query is not a feature.
Not allowing access to your internal web services is not a feature.
These are security restrictions you automatically think of when you develop.

In the same way, operational thinking should not be something postponed until deployment. It should be de rigueur in the design and development process.

Thursday, 21 July 2011

Some string matching

This puzzle from Facebook's engineering puzzles page is a good introduction to string matching.
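A solution is possible using the Levenshtein distance, which counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another.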

Let's take the example on the puzzle page -

TIHS SENTENTCNES ISS NOUT VARRRY GOUD

It's a munged up version of

THIS SENTENCE IS NOT VERY GOOD

The original problem statement involves minimizing the score (number of changes necessary) to transform each word in the munged up sentence into words that are in a predefined wordlist. It says nothing about grammatical correctness of the result.
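As a rough illustration of the scoring described above (a sketch, not the implementation I actually submitted), the distance can be computed with the standard two-row dynamic programming approach. The names levenshtein and closest_word are my own for this example, and the word list is assumed to be a plain array of strings.

```c
#include <stdlib.h>
#include <string.h>

static size_t min3(size_t a, size_t b, size_t c) {
    size_t m = a < b ? a : b;
    return m < c ? m : c;
}

/* Levenshtein distance between a and b.
 * Returns (size_t)-1 if memory allocation fails. */
size_t levenshtein(const char *a, const char *b) {
    size_t la = strlen(a), lb = strlen(b);
    size_t *prev = malloc((lb + 1) * sizeof *prev);
    size_t *curr = malloc((lb + 1) * sizeof *curr);
    if (!prev || !curr) {
        free(prev);
        free(curr);
        return (size_t)-1;
    }
    /* Row 0: turning the empty prefix of a into b[0..j) takes j insertions. */
    for (size_t j = 0; j <= lb; j++)
        prev[j] = j;
    for (size_t i = 1; i <= la; i++) {
        curr[0] = i; /* i deletions turn a[0..i) into the empty string */
        for (size_t j = 1; j <= lb; j++) {
            size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            curr[j] = min3(prev[j] + 1,         /* deletion */
                           curr[j - 1] + 1,     /* insertion */
                           prev[j - 1] + cost); /* substitution or match */
        }
        size_t *tmp = prev;
        prev = curr;
        curr = tmp;
    }
    size_t d = prev[lb];
    free(prev);
    free(curr);
    return d;
}

/* Return the word in list[0..n) with the smallest distance to word and
 * store that distance in *score. Ties go to the word listed first. */
const char *closest_word(const char *word, const char *const *list,
                         size_t n, size_t *score) {
    const char *best = NULL;
    size_t best_d = (size_t)-1;
    for (size_t k = 0; k < n; k++) {
        size_t d = levenshtein(word, list[k]);
        if (d < best_d) {
            best_d = d;
            best = list[k];
        }
    }
    if (score)
        *score = best_d;
    return best;
}
```

The score for a sentence is then the sum of the per-word distances. Note that a transposition such as TIHS to THIS costs 2 here, which is why a word like TICS (cost 1) can win.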

So going by that, the score and the output sentence from my implementation are, respectively, 8 and

TICS SENTENCES IDS NOT VAGARY GAUD

From the standpoint of the original problem, this solution is correct.
But not from a grammatical point of view. After some poking around, I realized it's not possible for this program to spew out the correct version of the sentence without it being aware of English grammar rules.

P.S. Without getting into grammar correction, which I have no idea how to implement, I made a small change in the program as an experiment. The second version favours the shorter word whenever two candidates have equal scores.
And the result is (the score remains the same)

TIS SENTENCES IS NOT VARY GOD

Sunday, 3 April 2011

A study plan

I have been formulating a self-study plan for some time now. The plan involves mostly math subjects, as I eventually want it to culminate in data analysis/mining/AI/ML - subjects I've always been interested in but never took up seriously. Judging by the experience of people who have done this before, a solid foundation in the underlying math is absolutely essential. So I've started off with statistics and probability. Having been out of touch with the basics of both for a long time (15 years!), I chose to begin simply - Sheldon M. Ross's Introductory Statistics, supplemented with Khan Academy's video lectures (these are perfect for brushing the dust off those rusty concepts). After finishing this I'll aim for a book like Bertsekas combined with MIT's OCW lectures.

I have decided to post my solutions to the exercises here.

Sunday, 30 January 2011

gcc does not check out of scope names in unreachable code

While attempting to write a small HTTP server in C, I copied some code over from a previously written C file and immediately noticed a bug.

File httpd.c

#include "../mynet.h"

/* excerpt from inside a function */
if(errno = EINTR) {  /* bug: assignment instead of == */
    //do something
} else {
    err_sys("read error");
}

Yes, it's a stupid beginner mistake - typing the assignment operator instead of the equality operator. Because EINTR is a non-zero constant, the condition is always true and the thread of execution would never enter the else block. I corrected it, but the interesting part came when I tried to compile it.

cc ../mynet.c httpd.c

mynet.c contains some handy helper functions that I've used in my other server programs. Guess what - the build failed with this message

"httpd.c:(.text+0x6a): undefined reference to `err_sys'"

I checked my header and the err_sys function was nowhere to be seen. If this function is missing, how did my other program (from where I copied this code) compile previously?
After some fiddling around I put the assignment operator bug back, and guess what? The code compiled fine.

Based on just these observations, we can conclude that the gcc C compiler ignored the unreachable (else) part of the code. It did not even check if the code inside the else block was legitimate. How far did this behaviour go? Let's see.

File httpd.c

#include "../mynet.h"

if(errno = EINTR) {
    //do something
} else {
    mocha(); //Undefined function
}

This compiles fine.

File httpd.c

#include "../mynet.h"

if(errno = EINTR) {
    //do something
} else {
    asdf;
}

This correctly fails with an error.

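So undeclared identifiers are caught even in code that is known to be unreachable, but calls to undefined functions are not. A bug? I would say yes, with one caveat: the original "undefined reference" message comes from the linker, not the compiler, so it only appears when a call to the missing function actually ends up in the object file. Google did not turn up much except this old link - http://compgroups.net/comp.lang.c++.moderated/could-if-else-avoid-syntax-checking-compile-time-unreachable-code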
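For reference, here is a minimal sketch of why the else branch is unreachable in the first place. The value of an assignment expression is the value assigned, and EINTR is a positive constant, so the condition can never be false. The function names are made up for this illustration, and the extra parentheses around the assignment are only there to keep gcc's -Wall warning quiet.

```c
#include <errno.h>

/* Returns 1 if the "then" branch runs, 0 if the else branch runs.
 * Uses the buggy form: assignment inside the condition. */
int takes_then_branch_assign(void) {
    errno = 0;
    if ((errno = EINTR)) { /* value of the expression is EINTR, never 0 */
        return 1;
    } else {
        return 0; /* never reached */
    }
}

/* Same check with the intended comparison. With errno reset to 0
 * beforehand, the else branch runs. */
int takes_then_branch_compare(void) {
    errno = 0;
    if (errno == EINTR) {
        return 1;
    } else {
        return 0;
    }
}
```

Since the compiler can see that the condition in the first function is a non-zero constant, it is free to drop the else branch entirely, which would fit the behaviour observed above.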

Sunday, 9 January 2011

Working through UNP

As I wrote previously, I've been working through Richard Stevens' Unix Network Programming (3rd Ed) Vol 1. It covers the basics of the Sockets API in UNIX and similar OSs.

Unfortunately I've been able to devote time to this mostly on weekends. This translates to slow progress because UNP goes into a lot of depth about everything. This is a good thing, but it also means that I have to reread and review the last few pages every time I try to pick up where I left off. This does not really help when I'm trying to understand concepts in depth. So what's the solution? I'll try to do a bit at least 3-4 days a week from now on.

UNP is an amazingly detailed book - and as one of my colleagues said - "If you read that book properly, Stevens makes sure that there's nothing left for you to know on the topic". I agree.

Stevens wrote his own wrapper functions over common socket functions and used them in all the code examples in the book. The wrappers handle all error codes and portability issues (like IPv4/IPv6). These are included in a header, unp.h (available in the back of the book as well as online at http://www.unpbook.com/src.html).

Some reviewers of the book gripe about this and say that this is an obstacle to learning the actual functions. But I think that there was no other way to do it without littering every example snippet with code for portability and error handling. The wrapper strategy makes it easier to follow the examples, and at the same time - as I found out - it makes you write those wrappers yourself. True, you can just include the unp.h header as you try the examples, but then you'll never know what those functions are doing. I've found that creating my own header and writing the functions as I come across them, after looking at the book's source code, works great. Most of them will end up identical to those in unp.h.
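To give an idea of what writing such a wrapper looks like, here is a simplified sketch in the spirit of the book's convention of capitalising the wrapper's name. It is my own cut-down version, not a copy of unp.h: in particular, this err_sys takes a plain string rather than a printf-style format.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Simplified stand-in for err_sys: print msg followed by the text
 * for the current errno, then exit. */
void err_sys(const char *msg) {
    perror(msg);
    exit(EXIT_FAILURE);
}

/* Wrapper over socket(): same arguments, but it terminates the program
 * on failure, so callers never need to check the return value. */
int Socket(int family, int type, int protocol) {
    int fd = socket(family, type, protocol);
    if (fd < 0)
        err_sys("socket error");
    return fd;
}

/* Wrapper over close() in the same style. */
void Close(int fd) {
    if (close(fd) == -1)
        err_sys("close error");
}
```

With these in a header, an example can say Socket(AF_INET, SOCK_STREAM, 0) on one line and stay focused on the networking logic.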

I'm pushing the examples I'm trying out to GitHub - it's a scratchpad, so not everything might compile.

I've added a generic startserver function to my header - this takes a pointer to a function as an argument. The generic function starts a server socket (bind/listen/accept), forks a child when a client connects and calls the function that was earlier passed as an argument, abstracting out the actual serving part. The function pointer syntax was not hard to figure out - I'd read Peter van der Linden's algorithm for unscrambling declarations in C last week. Interesting how things add up!
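Here is a rough sketch of the idea, not the exact code in my header. The names handler_fn, listen_on, start_server and echo_handler are made up for this example, the port parameter is an addition for illustration, and error handling is kept to the bare minimum.

```c
#include <netinet/in.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Pointer to a function taking a connected socket and returning nothing. */
typedef void (*handler_fn)(int connfd);

/* Create a TCP socket listening on the given port on all IPv4
 * interfaces. Returns the descriptor, or -1 on error. */
int listen_on(uint16_t port) {
    struct sockaddr_in addr;
    int on = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on);
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Accept clients forever, forking one child per connection and running
 * handler in the child. Returns -1 only if the listener cannot be set up. */
int start_server(uint16_t port, handler_fn handler) {
    int listenfd = listen_on(port);
    if (listenfd < 0)
        return -1;
    signal(SIGCHLD, SIG_IGN); /* do not leave zombie children behind */
    for (;;) {
        int connfd = accept(listenfd, NULL, NULL);
        if (connfd < 0)
            continue; /* retry; a real server would inspect errno */
        if (fork() == 0) {
            close(listenfd); /* child does not need the listener */
            handler(connfd);
            close(connfd);
            _exit(0);
        }
        close(connfd); /* parent: the child (if any) has its own copy */
    }
}

/* Example handler: echo everything back until the client stops sending. */
void echo_handler(int connfd) {
    char buf[512];
    ssize_t n;
    while ((n = read(connfd, buf, sizeof buf)) > 0) {
        if (write(connfd, buf, (size_t)n) != n)
            break;
    }
}
```

Starting an echo server is then a single call such as start_server(9877, echo_handler), and a different server only needs a different handler.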
