Saturday 26 December 2009

A Ruby script to search bookstores online

I started dabbling in Ruby some weeks back. The initial interest was sparked after reading "Treating Code as an Essay" (Yukihiro Matsumoto) - one of the chapters in Beautiful Code. So I started doing these bootstrapping exercises in Ruby. Some of the exercises are good - but nothing beats doing a small project to learn a new language.

I buy a lot of books, mostly online. There are a few good online bookstores in India, notably Flipkart.com, Infibeam.com and Indiaplaza.in (Sadly, Amazon does not have full-fledged shipping to India yet). The way I usually search for a book in online bookstores is (was, till now)

  1. Go to books.google.com and enter the book title
  2. Click on the best match
  3. Click on 'All Sellers' on the left of the page
  4. The Indian bookstores are usually listed towards the bottom. It does not include all stores, and sometimes the prices are not listed. I have to go to each individual site and check them out.

I wanted to collapse these steps into one - a simple script that would accept the name of the book and show results from all these bookstores, with comparative pricing. And the result was this

http://github.com/talonx/book-search

It's in Ruby, runs from the command line and writes its output to an HTML file called 'search.html' in the same directory. Much remains to be done, such as

  • Price based listing with the lowest on top
  • A web interface for the search
  • Add more bookstores - it's only Flipkart.com, Infibeam.com, Indiaplaza and Bookadda.com right now.

To run the script, type this (you need Ruby 1.8.x, available from http://www.ruby-lang.org/en/downloads/ and the Hpricot HTML parser library, available from http://github.com/whymirror/hpricot)
ruby lib\book-search.rb "<book title (in quotes if it has spaces)>"
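
The first item on the to-do list above, price-based listing, could look roughly like this. It is only a sketch and not the code in the repository: the Offer struct and the sample prices are made up for illustration, and it assumes each store's scraper already yields a numeric price, or nil when no price is listed.

```ruby
# Hypothetical record for one search hit; not taken from the book-search repository.
Offer = Struct.new(:store, :title, :price)

# Sort offers so the cheapest comes first. Offers without a listed price
# (price == nil) are kept, but pushed to the end of the list.
def sort_by_price(offers)
  priced, unpriced = offers.partition { |o| o.price }
  priced.sort_by { |o| o.price } + unpriced
end

# Made-up sample data standing in for scraped results.
offers = [
  Offer.new("Flipkart.com", "Beautiful Code", 450.0),
  Offer.new("Infibeam.com", "Beautiful Code", nil),
  Offer.new("Indiaplaza",   "Beautiful Code", 425.0),
  Offer.new("Bookadda.com", "Beautiful Code", 440.0)
]

sort_by_price(offers).each do |o|
  price = o.price ? format("Rs. %.2f", o.price) : "price not listed"
  puts "#{o.store}: #{o.title} (#{price})"
end
```

Keeping the unpriced results at the bottom, instead of dropping them, means a store still shows up even when its price could not be read.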

Saturday 12 December 2009

Do that side project

Do that side project.

How many times have you told yourself

  • I'll start that open source project I've been thinking of
  • I'll write that utility which will make my job easier
  • I'll enroll for that course on Artificial Intelligence and write that amazing recommendation system

and then did nothing?

Well, guess what. Time passes. Yes, really.

Annie Dillard said

"How we spend our days is, of course, how we spend our lives."
Think about that for a moment.

Don't waste time on thinking about when to think about planning to think about thinking about when to start thinking about doing it. Do it now.

Here are some more resources on the subject -

  1. Shut up and Hack - http://www.slideshare.net/bluesmoon/shut-up-and-hack
  2. Do it Now - http://www.stevepavlina.com/articles/do-it-now.htm
  3. Do it Fucking now - http://seoblackhat.com/2007/01/29/do-it-fucking-now/
  4. Chris Wanstrath's keynote - http://gist.github.com/6443

Monday 7 December 2009

Adding MySQL server instances using mysqlmanager

The MySQL instance manager - mysqlmanager - provides a way to manage multiple MySQL server instances on the same installation. All these instances use a common my.cnf file - but each can be configured individually (using the same file). mysqlmanager itself provides a command line interface to control the individual instances.

Part of a sample my.cnf with multiple MySQL instances

[mysqld1]
user = mysql
datadir = /data/mysql-1
socket = /tmp/mysql-1.sock
port = 3306

[mysqld2]
user = mysql
datadir = /data/mysql-2
socket = /tmp/mysql-2.sock
port = 3307

The ability to set up multiple database servers quickly is particularly useful on development boxes where fresh DBs need to be created often. In my team, we often need to do this. Every time a new DB has to be set up, we have to go through the steps of creating a datadir, installing the system tables, adding a root password, adding the entries to the my.cnf file and starting the instance using the mysqlmanager shell.

So I whipped up a small Linux shell script which automates this process.

Here it is.

It's still in a quite primitive state - but it works!

Usage is simple -
add-mysql-instance.sh mysql config-file-location datadir groupname username password instance port instance-name mysqlmanager-user mysqlmanager-password mysqlmanager-socket-file

Of course, mysqlmanager has to be running for this to work.

I'll be adding improvements to this script - like the ability to generate a mysql instance name based on existing instances (instance names are usually mysqld1, mysqld2 etc), picking up the user name from the file itself etc.
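
As a rough idea of how the instance name could be generated from the existing ones, here is a minimal sketch. It is written in Ruby and is not part of the shell script; it assumes instance sections are named [mysqldN] as in the sample above, and the next_port helper and its 3306 default are my own additions for illustration.

```ruby
# Return the next free instance name (mysqld1, mysqld2, ...) given the text
# of a config file. Assumes instance sections are named [mysqldN].
def next_instance_name(config_text)
  numbers = config_text.scan(/^\s*\[mysqld(\d+)\]/).flatten.map { |n| n.to_i }
  "mysqld#{(numbers.max || 0) + 1}"
end

# Return one more than the highest port found in the file. Assumes every
# "port = N" line belongs to an instance; falls back to 3306 (an assumed
# default) when the file defines no ports at all.
def next_port(config_text)
  ports = config_text.scan(/^\s*port\s*=\s*(\d+)/).flatten.map { |p| p.to_i }
  ports.empty? ? 3306 : ports.max + 1
end

# Sample config mirroring the excerpt shown earlier in this post.
sample = <<~CNF
  [mysqld1]
  user = mysql
  datadir = /data/mysql-1
  socket = /tmp/mysql-1.sock
  port = 3306

  [mysqld2]
  user = mysql
  datadir = /data/mysql-2
  socket = /tmp/mysql-2.sock
  port = 3307
CNF

puts next_instance_name(sample)  # prints "mysqld3"
puts next_port(sample)           # prints "3308"
```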

Friday 11 September 2009

India Needs an AntiSpam Law

The Problem
I dread it whenever I have to enter my email address at an Indian ecommerce site. It's mandatory if I am buying something, and I do it reluctantly. After the product is bought, I go to the My Account link, if there is one on the site, and unsubscribe from all marketing notifications (because most of the time they do not bother to tell you, when you register or enter your email address, that you have been auto-subscribed to such mails).
Note that I do not mind receiving notifications from system administrators and mails related to the delivery of the product I bought. But I do not want to keep on receiving general mails about things I am not interested in.

The inevitable happens after a couple of weeks. I get emails from the site offering me discounts on new products, new deals; in short, commercial email. Unsolicited – because I did not opt in. And in some cases I opted out explicitly. In other words, Spam. Some of these mails have an Unsubscribe link at the bottom. After you have apparently 'Unsubscribed' using the said link, one of the following things happens -

1. Similar mails keep coming, with the same Unsubscribe link. Most of these links are just mailto: links as opposed to an http: link. An http: link usually means it’s a mailing list manager software, which actually works. But a mailto: link more often than not means that somebody has to manually do the removal. Which does not happen.

2. The Unsubscribe mail bounces. Either because the Unsubscribe mailbox does not exist (Surprise!) or it has exceeded its quota because people keep on Unsubscribing and nobody reads or deletes them (Surprise!)

Here are some sites that do not have a working Unsubscribe link in their emails. All my efforts to Unsubscribe from their unwanted mails have failed. Most of these are commercial sites I use regularly.

http://www.sulekha.com
http://www.pvrcinemas.com
http://www.citibank.co.in (These guys take the cake as far as repeated requests to remove my address, repeated responses that they have done so, and the sorry-sir-it-won't-happen-again routine are concerned)
http://www.siliconindia.com
http://www.indiaplaza.in
http://www.bookmyshow.com

At this point I would distinguish between two kinds of spamming -

1. The kind I describe above. You cannot mark them as spam since you might be getting legitimate mails from the same address in future (like when you buy another product and there is a confirmation) and cannot afford to miss them.


2. The 'normal' spam that you get every day in your junk mail folder. All mail providers detect and mark them as spam automatically. These are sent by people whose only job is to spam others, usually sitting in a country whose laws are lenient enough to allow it.

To start with, ecommerce sites need to understand that giving my email address for a necessary purpose does not entitle them to send any email they like to that address.

My email address has a privacy status similar to my telephone number.

It’s like calling up someone every week with irrelevant news just because you happen to have their phone number. (On a related note, the Indian NDNC – National Do Not Call Registry – is a step in the right direction as far as controlling whom telemarketers in India can call is concerned).

How do other countries deal with this?

Almost all progressive countries have laws and directives dealing with this explicitly.

EU : http://en.wikipedia.org/wiki/Directive_on_Privacy_and_Electronic_Communications
Aus : http://www.dbcde.gov.au/online_safety_and_security/spam
NZ : http://www.dia.govt.nz/DIAwebsite.nsf/wpg_URL/Services-Anti-Spam-Index
US : http://en.wikipedia.org/wiki/CAN-SPAM_Act_of_2003

Here is a more comprehensive list maintained by SpamLinks.
http://spamlinks.net/legal-laws.htm#country

More…

Then there are the ISPs (Internet Service Providers).

I have a Tata Indicom broadband connection. From time to time, these guys feel I need to know about their latest antivirus offerings, or some cool deal they have for the festive season. These mails don't even have an Unsubscribe option. When I call them up and ask to be removed from receiving these mails, the customer service people are initially clueless, and on further pressing inform me that these mails are to keep me informed. Er, what? And what if I don’t want to receive them? They say they cannot remove my email.

India needs an enforceable AntiSpam law, and now.

The Indian IT Act of 2000 and its 2008 Amendment:

Disclaimer: I am not a lawyer nor do I claim to understand law well. The views below are based on a reading and an attempt to understand publicly available documents.

The only section in the Indian IT Act – the only law in the country that deals with cyber offences – that I could find dealing with unwanted email is Section 66(A).
"any electronic mail or electronic mail message for the purpose of causing annoyance or inconvenience or to deceive or to mislead the addressee or recipient about the origin of such messages"

Section 66(A) does not even begin to address the spam problems I describe above.

Either the existing law needs to include sections for dealing more specifically with spam or we need a standalone set of laws for making this kind of unsolicited email criminally prosecutable.

Sunday 30 August 2009

Wondering about the state of Java Developers

A friend of mine forwarded this article by Yakov Fain on sys-con.com -

http://in.sys-con.com/node/1040135

The essence of the article is this

The author interviewed a lot of people for developer positions, and most of those who call themselves Java developers and cite extensive experience in J2EE lack basic knowledge of core Java.

This might sound suspiciously like a gross generalization, but I believe that's not the case. I had a similar experience when I interviewed people for developer positions on my team last month. The position called for both Java and Javascript experience. These are the things I encountered -

  • Most people who have worked solely on services (read outsourced) projects list all J* technologies on their resume, but have very little in-depth knowledge of Java programming.
  • Some people lack any kind of programmer mentality or skills, yet list their current role as something like Programmer Analyst. This cannot be ascertained from the resume alone. They often try to highlight other (non-software development) achievements.
  • SCJP certification is no guarantee that a person can code in Java (Surprise? Not at all)
  • There are people who have 3.5 years of experience, with multiple services projects under their belts, and familiarity with a host of technologies, who cannot write a Java class which will print out the prime numbers between 0 and 100.
  • Most core CS concepts are forgotten after 2-3 years of working in services projects.

Please note that I am not generalizing, but these facts do indicate a problem somewhere. These developers actually represent a very small, distinct sample of the worldwide developer community, since all my interviews were done in India (both face to face in my Hyderabad office and over the phone).

Another interesting point I noted was that most non-Javascript developers think that Javascript is used only for form validation. Such usage also qualifies as 'extensive Javascript knowledge' in their resumes.
What should I conclude from this? Is this malaise widespread in other parts of the world as well? Is it specific to developers in India working on outsourced projects? (No, as the link by Yakov Fain shows) Is it a result of outsourcing, leading to a lack of innovation? Or is the innovation there, but the signal to noise ratio too low?

Sunday 12 July 2009

Consistency in Development

Consistency in Development?

Simply put, it means following a set of basic guidelines in all development activities, from coding to deployment. This does not imply having rigid protocols and processes, because immutable rules don't help development but obstruct it. What it does imply is having simple, tried and tested conventions and some formal processes that people are comfortable with - 'Whatever works best for the team'. The key to getting the most out of consistency is to arrive at these rules by consensus and make sure everyone follows them.

It has to start at the very bottom.

Coding Standards
I cannot stress this enough. While there is no such thing as a perfect code convention, a team should definitely have one agreed-upon convention that everyone follows. This is true for any language. Imagine the plight of a developer who has to work on code originally written by someone else, and who takes hours to figure out what the code does because the original author's coding style is completely different. Junior developers often don't get this. The standards can be chosen democratically by involving every member of the team, freezing the conventions and applying them to everyone's IDE (most IDEs support code style import/export). Ideally I would trust the developers to follow this, but it can also be enforced at the source control level, where style checks can fail a checkin in case somebody messes up.

           Speak the same dialect so that others understand you and vice versa.

Maintenance becomes much easier.
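
To make the idea of a style check failing a checkin concrete, here is a minimal sketch in Ruby. The three rules (no tabs, no trailing whitespace, a 100 character line limit) are hypothetical and chosen only for illustration; a real team would plug in whatever conventions it agreed on, and how the check is wired into a particular source control system is left out.

```ruby
# Hypothetical team rule, chosen only for illustration.
MAX_LINE_LENGTH = 100

# Return a list of human-readable violations for one source file's text.
def style_violations(source, max_length = MAX_LINE_LENGTH)
  violations = []
  source.each_line.with_index(1) do |line, number|
    text = line.chomp
    violations << "line #{number}: tab character" if text.include?("\t")
    violations << "line #{number}: trailing whitespace" if text =~ /[ \t]+\z/
    violations << "line #{number}: longer than #{max_length} characters" if text.length > max_length
  end
  violations
end

# A hook script would run this over each file in the checkin and
# exit with a non-zero status if anything is reported.
sample = "class Foo {\n\tint x;\n    int y;   \n}\n"
problems = style_violations(sample)
problems.each { |p| puts p }
puts(problems.empty? ? "checkin allowed" : "checkin rejected")
```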

Architecture
Don't run and start to hack away the moment you get the requirements. Stop for a moment - think awhile. Put your thoughts on the whiteboard and discuss with your peers. Run through your design with somebody who knows the big picture. Come up with at least two or three different solutions to the problem - that way you would know that you have looked at it from various angles and chosen the right one.

Application Structure
You might have multiple web applications in your project, differing in configuration files, HTMLs, images, css, libraries, server side scripts etc. Storing them in a consistent manner across applications helps keep them organized and makes it easier for developers, especially new ones, to find things. You will not be surprised by the completely different disk layout of an application if it follows the same directory structure as all the others. Deep down it also appeals to the organized mindset that most good developers have.

Issue Tracking
New bugs/features come up every day. Small teams can probably manage these for some time with sticky notes and paper and pen. Some developers have their own ways of keeping track of their ToDo lists - but without a unified interface, you are not going to scale. You will have chaos - files missed in checkins, people clueless about who is looking into a particular issue, wondering about the status of a critical bug. When your team grows large, you need some kind of tracking system to track milestones, open issues and tasks, to assign issues to developers and to prioritize them. There are lots of good bug tracking systems out there - get one which works best for you.
                Track issues in the same way across people


Deployment
Web applications will need to scale somewhere in their lifetimes, especially if they are successful. Think about your initial deployment environment - one webserver and one database server (on the same machine). As time goes by and your app becomes popular, you add servers. And features. New features translate to new application modules and new databases. The simple script you used to upload and deploy your small app is useless now.


At this point, scaling has two meanings

  • The ability to handle increasing load and maintain baseline performance, and
  • The ability to push code into deployment quickly.

Both of these are affected by having (or not having) a consistent model.

The first point is actual application scaling - the complexity of your infrastructural setup is going to increase hugely as it grows. The second point has to do with how your team scales to increasing demands.

They are related. Critical fixes and features might need to be pushed immediately. These demands can reach proportions where you cannot afford to spend time figuring out why Server No 6 in your cluster does not have the latest changes. Automation is the key here. Automation demands formal, well-defined processes (for making builds, uploading to production servers etc). Formal deployment processes imply consistency. They do not mean a bureaucracy – just well-followed and automated rules about how to deploy a change into production. As your app and infrastructure grow, it becomes more and more important to be able to roll back changes if necessary. This is possible only if you have well-defined deployment paths and scripts.
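
The 'Server No 6' problem is exactly the kind of check worth automating. A minimal sketch in Ruby, assuming some earlier step has already collected the deployed revision from each server (the server names and revision strings below are made up, and how they are gathered is out of scope here):

```ruby
# Given a map of server name => deployed revision, return the names of
# the servers that are not running the expected revision.
def stale_servers(deployed, expected)
  deployed.reject { |_server, revision| revision == expected }.keys.sort
end

# Made-up data: what each server in the cluster reports as deployed.
deployed = {
  "web1" => "r1042",
  "web2" => "r1042",
  "web6" => "r1038"
}

stale = stale_servers(deployed, "r1042")
if stale.empty?
  puts "all servers are on r1042"
else
  puts "out of date: #{stale.join(', ')}"
end
```

Run after every deployment, a check like this turns a late-night hunt into a one-line report.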

Automate wherever possible. Minimize the number of things you have to keep in your head. And in the process, lower your stress levels!

Note: These thoughts are not entirely mine - these are the culmination of what I feel about consistency after having read the experiences and opinions of many others in blogs, books and articles, coupled with my own experiences in developing products. Also, you might have noted that I have been talking of web application development in some of the sections above, but these apply to any kind of software development.


Sunday 28 June 2009

First post! Well, not quite

Just moved this blog to my own domain here from JRoller. I can't think of any good reason why I had kept it there so long. JRoller does not score on any points - be it reliability of the server, ease of editing posts or look and feel.

All my old entries remain there - at http://www.jroller.com/talonx.