Pages

Friday, 17 May 2013

Book Review : The Art of Scalability

About the author: Theo Schlossnagle is the founder and CEO of OmtiTI.

The Book
This book aims to be a comprehensive, technology stack-agnostic compendium of strategies and guidelines to achieving scalability objectives for internet applications. It is quite thin (262 pages) and came out in 2007, when the DevOps meme was not around in its current form.  

Why I like this book
I like this book because it's oriented towards building a solid foundation on topics related to scaling. Compare this book with 'Web Operations: Keeping the Data on Time' (published in late 2010), and you'll find the book under discussion to be more grounded in fundamental principles, and the latter more oriented towards new trends. Now there's nothing wrong with the 'latest-trend' books, but it's better if one reads this kind first to get a good grounding.

Overview of Chapters:
The first three chapters cover basic principles, managing release cycles and operations teams. 

A big part of chapter 4 is devoted to explaining the difference between high availability and load balancing. There's no coverage of Cloud based options here – this is for you if you manage your own datacenters. Also, cloud based options will invariably be tied to specific vendors. Different HA options are considered with almost academic rigour. 

Chapter 5 examines load balancing options at different layers of the OSI network stack.

Chapter 6 is a mini-guide to building your own Content Delivery Network. From calculating your expected traffic, cost estimates, inter-node synchronization in a cluster to choosing the OS and having an HA network configuration – it's an interesting journey. It brings out the challenges which are invisible to most of us who push our static content to a third party CDN and forget about it. There's a section on DNS issues as well covering Anycast.

Chapter 7 covers five caching techniques. True to the general theme of the book, it does not talk about specific technologies but about theory that can be studied and applied to the problem at hand. An example of speeding up a news website is used to illustrate how to deploy and tune memcached (for that specific site's design).

In Chapter 8, we see an overview of distributed databases, including an overview of different database replication strategies. Managing, storing, aggregating and parsing logs is a challenge we all face – this is covered in Chapter 9. This chapter is dated now as there have been many advances on this topic.

Overall, a must-have for anybody who is interested or works in scaling internet facing applications.

Amazon US URL: http://www.amazon.com/Scalable-Internet-Architectures-Theo-Schlossnagle/dp/067232699X

Indian bookstores: http://isbn.net.in/9788131706114

Saturday, 4 May 2013

Thoughts on "A Note on Distributed Computing"

A Note on Distributed Computing by Jim Waldo, Geoff Wyant, Ann Wollrath, and Sam Kendall is a widely cited paper. I have been reading and trying to understand it for sometime. It's available here - http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.7628

The title of the paper is innocuous but it's much more than a "note". It analyzes the key differences between local and distributed computing, and explains why attempts to unify their programming models are misguided because of the fundamental differences underlying them.


The authors used to be part of the erstwhile Sun Microsystems when they wrote it - it dates from 1994. Later some of them were members of the JINI technology team and also wrote RMI, and if you look at the Java RMI source code,  you can see some of their names.


But in 1994 when this paper was written, Java had not emerged yet. There was no J2EE and CORBA was still young.

I wish to share my thoughts after reading it, and the realization that the opinions expressed in it influenced the design of Java's RMI.


Unification

Briefly, unification = unification of the local and the distributed programming models. Note that we are talking about distributed object oriented systems here.

The unification attempts assume that objects are essentially of a single top level type (like in Java), which might span different address spaces on the same or different machines (like different JVMs on the same or different machines in the case of Java) and they can communicate in the same way irrespective of where they are located. In other words, location (same JVM versus another JVM in another country) is merely an implementation detail that can be abstracted away behind the interfaces used to communicate between two objects without any side effects.


Such a (hypothetical) system would have the following characteristics

  1. Program functionality is not affected by the location of the object on which an operation has been invoked. Or viewing it from a slightly higher level, there is a single design as to how a system communicates irrespective of whether it's deployed in one address space or in multiple ones. 
  2. Maintenance and upgrades can be done to individual objects without affecting the rest of the system. 
  3. There is no need to handle failure and performance issues in the system design.
  4. Object interfaces are always the same regardless of the context (i.e. remote or local)
The authors contend that all these statements are flawed. I'll not attempt to go into those details - the paper explains them well.

The paper then goes onto examine the 4 areas where local and distributed computing differ drastically:
    Latency

    Memory Access
    Partial Failure
    Concurrency

Those of us who have worked on distributed enterprise and internet software have come across these. These 4 differences cannot be papered over to present a 'unified' view of objects which lie on different machines.

 

Java RMI
If you look at RMI, you can see its design influenced by the assumption that the above 4 points are invalid.
  • "Remote" objects have to extend the java.rmi.Remote interface. "Remote" objects - objects that can be invoked from another JVM - are different from local objects.
  • Remote (inter-JVM) method calls have to explicitly handle the java.rmi.RemoteException, which is a checked exception, thus highlighting the fact that a distributed call is subject to modes of failure that are non-existent in a local call. In fact, it extends java.io.IOException and the javadoc is explicit about network issues "is the common superclass for a number of communication-related exceptions that may occur during the execution of a remote method call".
Let's look at #2 again. From the paper:
"As long as the interfaces between objects remain constant, the implementations of those objects can be altered at will".
Premonition of SOA, anyone? This concept would be familiar today to anybody who is acquainted with the fundamental principles of service oriented system design (replace 'object' with 'service'). But since these statements are challenged and refuted later in the paper, the question naturally arises - how come SOA is successful?  

SOA assumes that things are independent and distributed services, and any invocation of a service assumes that there are failure modes which exist because of the communication's distributed nature. This builds on the same RMI concept as having to explicitly throw RemoteException when making a remote (distributed) call. This same concept is taken into consideration while writing any SOA system, which is another way of saying that the authors of the paper were correct.

Note: A short and readable summary of Java RMI is to be found in Jim Waldo's book Java: The Good Parts.

Sunday, 28 April 2013

"Upgrading" to Fedora 18

I have been running Fedora 16 on my work laptop. It was EOL'ed early this year, which means no more upgrades, including for things like Firefox. There was no option but to upgrade. I had two choices - opt for something with long term support like Ubuntu LTS or try the new Fedora (and try the newer one in 6 months).

I opted for the latter, since I've been using Fedora for a while, hoping that I would not have to do a Windows-style post installation cleanup. 

No such luck. The things that were broken, still are.

Some highlights from the experience:

Nepomuk, Akonadi
Disable them? Sure. They are disabled in System Settings, but insist on starting up anyways. Uninstall them? Not possible. They're so tightly coupled with KDE that uninstalling them uninstalls all of KDE. The developers don't seem to be listening to the users here. 

Disk space
My installation ran out of disk space 20 minutes after I rebooted post-installation. There seemed to be some continuous process in the background which was eating up space. Some investigation identified the culprit.
[talonx@****** apps]$ pwd
/home/talonx/.kde/share/apps
[talonx@****** apps]$ find . -type f -size +50000k -exec ls -lh {} \; | awk '{ print $9 ": " $5 }'
./nepomuk/repository/main/data/virtuosobackend/soprano-virtuoso.log: 151G
./nepomuk/repository/main/data/virtuosobackend/soprano-virtuoso.db: 68M
Yes, it created a log file of 151G within 20 minutes. What kind of application does that? What about basic stuff like log file rotation?

PackageKit
Another of those which does not go away, and causes endless irritation.

Fedora is not something that I would prescribe to new Linux users. Others have pointed out that its instability and some features are probably the result of staying at the cutting edge. Granting that, it remains difficult to get it to a state where even people like software developers can use it to be productive.

Saturday, 8 December 2012

Experiences in building a home NAS - Part 1

Some months back I had this email conversation with a friend about the best strategy for storing movies/music/photos at home. I was running out of space, again. One of my external 1TB drives had crashed.

Hard drives fail all the time. To backup the hard drive, you need another. Multiply that by the number of drives you have. The bigger the hard drive which crashed, the bigger the psychic shock. And so on. The buy-another-1TB-drive-strategy was not working.

My friend had done some research on this topic, and he suggested building a NAS box out of commodity hardware. We exchanged mails back and forth and researched a lot for a month. There's a huge internet community doing the same thing so finding information was not a problem. An option which seemed attractive was combining a small server from HP with my own storage and RAM. This was the HP ProLiant MicroServer N36L/N40L.

The hardware

I finally went with this to avoid having to choose the non-storage hardware myself, being no expert in it. The specs for the N36L model were decent enough - 
  • AMD Athlon II Neo (dual core) 1.8 GHz
  • 1 GB included RAM (max 8 GB)
  • Seagate 160 GB
  • Gigabit ethernet
The RAM and HDD were of course, not enough. But the rest of it was. It came with 4 drive bays - to put my own hard disks in.

There were other factors in choosing this
  • I did not want a hardware RAID controller - it ties you down to the RAID controller.
  • I did not care for more storage bays.
  • It has 7 USB ports, with one internal which can host a flash drive to boot from, thus saving the other drive bays for storage!
  • To top it all, it was good looking.

The software

That's the hardware. What about the OS? NAS software? Like I said, my friend had already done some research, and I got the pointers from him. FreeNAS is a superb option if you're building your home NAS. It's based on FreeBSD and supports the ZFS file system. ZFS was conceived and implemented at the erstwhile Sun Microsystems. The virtues of this filesystem - I'll just point you to the documentation. Resilvering (auto repair) and copy-on-write are two of them.

Software RAID

ZFS supports various software RAID options, including RAID-Z. RAID-Z1 (the first level) essentially gives you the ability to survive one hard disk crash if you have 3.

With RAID-Z1, if you have 3 x 1 TB drives, you will use 2 x 1 TB for data and 1 x 1 TB for parity information. Even if one of the hard disks crashes, you can add a new one and you're fine. You use 1 TB to pay for recoverability. There are other configurations if you want more redundancy, like being able to recover if you lose more than one drive at the same time - which are more expensive as they need more disks.

Buying the hardware

I bought the ProLiant from a local dealer after scouring the Hyderabad classified pages. The HDD (3 x 1TB) and 4 GB of RAM I bought online, from Flipkart. Here are two bits of learning if you're doing the same
  • Buy ECC RAM if you want surefire data integrity.
  • Buy more hard disk space than you need now - you won't regret it. It will cost a bit more, but it's better than the alternative. The alternative would be to recreate all your ZFS volumes on the new (bigger disks). With my configuration, I'll have to buy 3 x (whatever GB) to upgrade my setup. Of course, the third option is that you don't go this HP microserver way at all, and choose a bigger box (custom built or otherwise) which supports more drives.
It's a NAS, so I needed a network switch, which I bought from eBay. This had good reviews, had 8 ports and supported Gigabit ethernet. Plus some CAT6 cables.

Putting it all together

The HP came with 4 drive bays, with one small (160 GB) HDD, where I installed FreeNAS. The other three went for storage (and parity). The RAM worked without a hitch, but that was because I had ensured it was compatible (see these hardware compatibility lists).


(All those cables were to connect my desktop monitor, keyboard and DVD drive to the HP server).




Internal view - the 4 drive bays in vertically placed in front.


The FreeNAS interface is easy to use, so setting up ZFS volumes was a breeze.



It also comes with a Cacti-like interface which lets you view OS metrics.




To complete the setup, I connected the 8 port switch to my ISP's router, and plugged in my desktop and the HP to the switch. FreeNAS lets you create NFS shares for the data stored on the NAS which I can then access as a network mounted volume.

There really are a lot of resources about setting up a home NAS in this configuration. Some of the ones that I found helpful are

This does not complete the NAS setup. Performance testing has to be done. And also, backups. I'll write about these in the next post.
     

Monday, 19 November 2012

Nagle's algorithm and delayed acks...

...don't work well together. I finally understood why from Richard Steven's UNP book.

In a nutshell,

S(ender) sends a packet, and cannot send the second one if the second one's size is less than the MSS (maximum segment size) until the first is acknowledged (Nagle). R(eceiver) receives the first packet, but cannot acknowledge it until the receiver app tries to send data on which the ACK can piggyback (Delayed ACK). S waits, R waits - until the delayed ACK timer on R times out.