Archive for the ‘Algorithms’ Category
Reasoning About Phone TCP Performance
Here’s a question that has often been asked, but seldom been answered: how does one send an image to or from a phone as efficiently and quickly as possible? More generally, how can we reason about the behavior of TCP/IP as implemented on a modern smart phone? Phones don’t seem to play by the rules which TCP was designed for, we have a terribly unreliable channel pushing packets at random, with terrible delivery rates. This is exactly the kind of scenario that would wreak havoc with congestion avoidance and control algorithms, when in reality there might not be any valid reason to artificially throttle the connection. Not much work into how TCP performs in the mobile world has gone into commercial devices and I think it’s a really big open question that deserves some attention.
To that end I’ve started looking at characterizing TCP delivery on my mobile phone, in the hopes that from this characterization I can develop some more efficient algorithms, and just get an intuitive sense for what can and cannot be offloaded to the server. A big idea that has started gaining some traction lately is heterogeneous computing, where computationally expensive tasks are offloaded from your phone to the cloud for processing. Knowing the abilities of the network data channel will play a big roll in determining the feasibility of this, and many other ideas.
Let’s Transfer Some Images
My current side project is centered around manipulating images on the mobile phone, or in the cloud, so a good starting point in this study was to see how long it takes to upload and download images of various sizes, over HTTP. I used three image sizes:
- Small – 11KB
- Medium – 51KB
- Large – 86KB
I took these sizes from the three thumbnail sizes Facebook’s Haystack image store uses to present images to the end user, as those sizes seem to be working reasonably well for them. A large image is by no means a full-sized image, but it’s more than big enough to display on any smartphone screen, with some zooming supported.
Because this is a first pass, the implementation is pretty rough, and leaves room for optimization. My goal was not to see what the most optimal way to send an image across the wire was, but it was to characterize the most obvious way a typical developer might do it. I used the pre-bundled HTTP classes, and did everything the obvious way. I wrote two functions and wrapped them up in an Android service, with some logging facilities.
The functions themselves looked like this:
Downloading
private int doDownload(String path) {
long startTime = System.currentTimeMillis();
try {
URL url = new URL(path);
URLConnection connection = url.openConnection();
// getting file length, will cause the first
// read of the HTTP headers
int lenghtOfFile = connection.getContentLength();
// create an input stream to grab data from the connection
InputStream input = new BufferedInputStream(connection.getInputStream(), 8192);
// do a read to null basically and grab the entire image, forcing
// the internals of the stream implementations to do what they must
// do to push the image to us.
byte data[] = new byte[1024];
long total = 0;
int count;
while ((count = input.read(data)) != -1) {
total += count;
}
// close the connection
input.close();
} catch (Exception e) {
return -1;
}
long stopTime = System.currentTimeMillis();
return (int) (stopTime - startTime);
}
Uploading
private int doUpload(String filePath, String path) {
long startTime = System.currentTimeMillis();
try {
// No simple way to do a multipart post, so we
// build it from scratch in a sense.
HttpClient httpClient = new DefaultHttpClient();
HttpContext localContext = new BasicHttpContext();
HttpPost httpPost = new HttpPost(path);
MultipartEntity entity = new MultipartEntity(HttpMultipartMode.BROWSER_COMPATIBLE);
entity.addPart("image", new FileBody(new File (filePath)));
httpPost.setEntity(entity);
HttpResponse response = httpClient.execute(httpPost, localContext);
} catch (IOException e) {
e.printStackTrace();
return -1;
}
long stopTime = System.currentTimeMillis();
return (int) (stopTime - startTime);
}
Quite frankly, they leave a lot of questions to be answered. To really understand what’s going on, we need to dive into the internals of the BufferedInputString class, and do some TCP dump captures of the packet data. It’s of great interest to see how the request headers and body are broken up, controlling TCP packet segmentation could have big gains when transferring images. As I said before however, this is only preliminary exploration, so we’ll leave all that to be answered in the future.
Results
I left my sample application service running on my phone as I went for a run over the course of two hours in the city. I figured this would give me a decent spread of typical mobile conditions, in a diverse set of areas, and give decent data. I collected approximately 1500 data points and produced some histograms of the data.

Taking a look at the data already yields some really interesting insights. Images are generally transfered within 2-3 seconds, that’s the common case, with an extremely long tail of extended delivery. Interestingly enough I didn’t have a single timeout, but some images took over a minute to transfer, which is completely unacceptable for the end-user experience, considering the size of the images involved.
The difference in size as compared to transfer time was approximately linear. Uploading of the small image seemed to exceed expectations, and didn’t actually have much of a long tail like other transfers. I’m guessing the length of the tail is a function of the number of round-trips required to successfully transfer the data. Examining the packet flow of these images would yield some interesting insight into that relationship.
Conclusions
So this really just creates a bunch of further questions. It’s a really high-level view of delivery times for small files on the phone, but closer examination could yield some more interesting information. I’m really curious about applications of this knowledge in designing efficient image uploading algorithms. Relaxing some of the guarantees of TCP might allow for significantly faster image upload, and data transfers more suited for mobile apps. Expect to see more along this vector in the blog in the future!
Keeping your Sanity when Working on Large Projects
The size of my projects, both personal and academic, grows larger and larger as I tackle bigger problems. My most recent personal project is starting to touch 10,000 lines of written code, with many more of third-party modules, and no end in sight. While lines of code are a poor metric for the complexity of a project it’s obvious that there’s a direct relationship between the two. As these projects continue to get larger, it becomes impossible to keep the entire thing in my head. As I focus on various pieces of it, others fade away, and the whole thing starts to become a little hard to come to terms with.
Even more alarming is that as the codebase grows in complexity, my confidence in it shrinks. When the code is small, it is easy to maintain. I understand the entire system, and see it for what it is. When I don’t like something about the code, I simply refactor the entire thing, and again I am happy. Once the project grows past a couple thousand lines refactoring gets tougher, and no longer do I feel like I have a good grip on the code. It starts to feel sloppy, and thrown together. I start feeling like there should be a better way, and that my code isn’t up to standard, and won’t be maintainable. In essence, I start to think my code sucks.
At first this really concerned me, if I couldn’t write more than a few thousand lines of code before everything turned into unorganized spaghetti, then I surely was a poor programmer. I started looking through the old code, trying to figure out where I went wrong, and why I felt it was written so poorly.
Taking a look at all of this written code, it turns out that there really wasn’t anything wrong. Sure, there’s minor pieces that could be written differently, with a better trick I picked up only after that module was completed, but there’s nothing terrible about it. No matter where I looked, I couldn’t find the lines of code that I was so sure sucked. This being the case, if the architectural decisions were decent, and implementations were clean then why did I feel so terrible about the project?
I think this answer has to do with what happens when a program exceeds your ability to fit it in your head. There’s a point where you can no longer easily hold your entire program in your head, and it happens pretty early on in a large project. There’s just too much code to keep all of the right assertions and intuitive reasoning patterns needed to fluently work in your head at once, and as a result you have to page out the ones that aren’t immediate to your current task.
This feeling has bothered me for quite some time, I couldn’t understand it before. I think the only solution is to just come to terms with it, and accept that anytime a project grows above a certain threshold you will inevitably feel like there’s some smells coming from already written code. Writing smaller modules, and ensuring things are self-contained with a consistent exception framework has really helped to contain this feeling for me. If you can write code that doesn’t surprise you later on when invoked, and which performs small, obvious functions then you’re doing something right. Focusing on creating abstractions with a point of avoiding leakage and planning upfront for code extension has also significantly helped.
If you feel this way too I wouldn’t worry too much, it only means that you’re pushing the limits of what you’re capable of doing, and creating complex systems that actually solve real world problems! Very few projects are small enough to fit in one mind, so simply focus on always creating tidy code when focusing on a specific part of your application, and making a consistent style for error handling and other generic aspects of your code.
Machine Learning in Haskell – Linear Regression
One of the classic techniques in machine learning is linear regression. This approach models a function using a linear relationship between one or more input variables and the output set. It’s used in a ton of different situations, from building classifiers for large e-commerce sites to suggest new products, to fitting a line to a data set in Excel.
Today we’ll implement this algorithm in Haskell as both an exercise in using the hmatrix library, and as a practical endeavor in using a functional language for a useful purpose.
Linear regression is implemented using linear algebra and an approach that minimizes the least squares error of a matrix of weights, using a training data set.
Overview of Linear Regression
We define a linear equation that accepts an arbitrary number of variables.
We setup a matrix of X values with a column of ones appended to the beginning to allow for an offset term.
Our matrix is set up with m rows of data, and n columns of independent variables. This represents the a set of input data.
You’ll notice that when we multiply w by the data set, we produce a column matrix of output values, corresponding to the y value for each of the set of independent variables.
This defines a set of equations, but we still have to determine the value of the weight matrix. To do this we take a set of data, called the training set, and plug it into the input data matrix, along with the output values. We subtract the estimated value, obtained via the weight matrix, from the actual output value and square this, take the derivative of the equation, set it equal to zero, and solve for the weight matrix.
All these steps are a little complex, and if you’d like to learn more the Wikipedia article on linear regression is pretty well done and gives a lot of information on the statistical theory behind regression, but in the end you end up with the following equation to determine the weight matrix:
Implementation in Haskell
To implement these functions in Haskell is actually pretty simple, we make use of hmatrix, an excellent matrix library built in Haskell.
Installing hmatrix in Mac OS is a little tricky, we have to install the GNU Scientific Library development libs first, luckily MacPorts can help us out:
sudo port install gsl-devel cabal install hmatrix
After that we’ll go ahead and load in some data. The authors of hmatrix provide a wonderful manual to get up and running with the library and a number of functions have been built to allow us to quickly build matrices up.
In this case we’ll use the loadMatrix IO function, which will pull in some input data from files. We format our data files as space-separated rows of numbers:
test.txt
0 0.660691332817078 0.1 0.754551916894156 0.2 0.925818388603147 0.3 0.904216776317371 0.4 0.754324606651347 0.5 0.572540852930199 0.6 0.226045290129906 0.7 0.135596809334937 0.8 0.075248476906341 0.9 0.186604404237266 1 0.546356452677449
train.txt
0 0.465670943260193 0.1 0.799516151425082 0.2 0.894981363854821 0.3 0.836263760453476 0.4 0.749666819566768 0.5 0.491481010373484 0.6 0.138236347335107 0.7 0.101170288570109 0.8 0.170054617567581 0.9 0.319242745425697 1 0.455485771272382
The input parameter is on the left, with the function output on the right. We have both a test and training set of data so we can evaluate the performance of our regression algorithm. Loading these files in Haskell is a cinch.
dat1 <- loadMatrix "train.txt"
dat2 <- loadMatrix "test.txt"
let [x_in, y_in] = toColumns dat1
let [x_in_test, y_in_test] = toColumns dat2
Next we need to add some dimensionality to our input variable. A cool trick commonly used in linear regression is to allow fitting of higher-order polynomials by adding additional columns to our input variable data set, and computing higher order values of the input variable for them. In our case we’ll use a power series from 0 to 2, to allow for a quadratic fit:
let x = fromColumns $ map (x_in^) [0..2]
let x_test = fromColumns $ map (x_in_test^) [0..2]
After performing this our X matrix will look something like this:
11x3 1.000 0.000 0.000 1.000 0.100 0.010 1.000 0.200 0.040 1.000 0.300 0.090 1.000 0.400 0.160 1.000 0.500 0.250 1.000 0.600 0.360 1.000 0.700 0.490 1.000 0.800 0.640 1.000 0.900 0.810 1.000 1.000 1.000
The first column is the input value raised to the 0th power, which gives us the ability to compute an offset in our weight matrix, the second column is the input value raised to the first power, and the third column is that value raised to the second power.
Next we’ll compute the weight matrix, using the equation defined previously. Using the reference sheet in hmatrix’s manual proved to be useful here to look up your basic matrix manipulation functions:
let w = (pinv $ (ctrans x) <> x) <> (ctrans x) <> y_in
Breaking this apart we use a couple functions from hmatrix.
pinv– This generates the pseudo-inverse of a matrix for us, equivalent to X^(-1).ctrans– The transpose of a matrix. Swaps out the rows and columns.<>– The cross product of two matrices.
The function operates over the entire training set, and computes a weight matrix that minimizes the error given the dimensional constraints of the input matrix.
Finally we can evaluate our function on both the test set and training set:
let train_y = x <> w
let test_y = x_test <> w
putStrLn "Training set root-mean-square value:"
print $ sqrt . sum $ toList ((y_in - train_y) ^ 2)
putStrLn "Test set root-mean-square value:"
print $ sqrt . sum $ toList ((y_in_test - test_y) ^ 2)
It’s super easy to use the weight matrix, multiplying an input set of x values against the weight matrix generates the output values. In our case we’ll get output that looks something like this:
Training set root-mean-square value: 0.7082342551441069 Test set root-mean-square value: 0.7088158510569752
Evaluation
The data set was generated from a sinusoidal function added with a slight amount of Gaussian noise, so a quadratic fit is not the best approximation. A 3rd order fit will generate a very nice root-mean-square value, and a sin(x) kernel function will generate an almost perfect fit.
The approach does generalize well though, hmatrix is a wonderful library and allows you to perform a lot of statistical operations in Haskell that you would typically reach for a program like MATLAB for.
Why does Haskell use Singly Linked Lists Anyway?
As I continue down the road often less traveled in my journey to learn Haskell, I’ve found some tasks are exceedingly difficult to perform efficiently. One of the strengths of a functional language is its ability to allow a programmer to work with a higher-level representation of the problem, leaving implementation details to the system, whether it be a compiler, interpreter, or operating system. Systems like relational databases have allowed us to take a declarative approach to specifying a program, merely naming fields and describing relationships is enough to allow the database engine to efficiently choose an algorithm and pull results from the database. Functional languages like Haskell promise this same abstraction with algorithms themselves. In imperative languages we set up carefully constructed loops, and iterate over lists to find values and calculate results. In functional programming, we instead focus on using higher order reusable iteration functions, and composing a chain of functions over a list that achieve the desired result.
In theory this sounds great, it’s a more mathematical way to think about programming. Traditionally you focus a lot on how exactly the computer should manipulate data structures to get a desired result, but in Haskell you focus on formulating your problem. If you can adequately describe the problem, GHC takes care of turning that functional description into runnable code, life is good, your job is done, and the program runs.
Functional Programming: The Op Amp of Computer Science
A popular sentiment in the functional programming community is that teaching a functional language like Haskell first is the way things should be done, and would create better CS students. I don’t agree with this. While expressing problems in a functional manner is great, I think it makes the performance of an algorithm too opaque, understanding what makes functional code truly efficient is extremely difficult.

In fact, Haskell reminds me a lot of another tool that was designed to be easy to use in a different domain, the classical op-amp.
Op-amps were created in the world of electronics to be ideal amplifiers, allowing for designers to create large, linear gains in the amplitude of a small signal using a few low-cost components. They were created because before their existence, doing this was hard to get right, and created a lot of headaches.
To their credit, op-amps solve a number of difficult problems, and make the lives of electrical engineers everywhere much, much easier. The problem arises when one tries to do something mildly complex with an op-amp, and the abstraction breaks down. A ton of scary mathematics is brought into the picture, and one has to understand all the theoretical underpinnings of the device. It turns out op amps can only be maximally useful to those who understand how to build circuits without them, what their drawbacks are, and all the theory behind their design.
This sounds a lot like Haskell. It’s a wonderful tool, and allows for a programmer to offload a lot of the mental effort required to solve an issue to the compiler, but once you move beyond simple tasks you can quickly find yourself in deep water, with poor, unexplained performance.
A good example is list manipulation in Haskell. In imperative languages there’s an intuitive grasp of how efficient an algorithm will be. You’ll literally be writing the loops out, so the asymptotic runtime is explicitly stated in the code. However, in Haskell, things aren’t so simple. Prepending to the front of lists will take on the order of O(1), while appending elements to the end of a list will operate on the order of O(n). Why is this exactly? Well, the answer isn’t immediately obvious to someone first getting started in functional programming. Some research will lead you to the conclusion that it’s due to lists in Haskell being implemented as singly linked data structures. Some further thought will give insight into why this data structure was chosen when at first it seems like a limiting design choice. One can imagine that a singly linked list would be the data structure of choice for a functional language with lazy evaluation. The compiler can make assumptions and reuse lists due to their immutable nature, and create branches from elements because they lack a backwards facing pointer, along with a number of other nifty optimizations.
That’s all great, but again, it’s not obvious to someone who isn’t skilled in the trade and has a deep understanding of traditional imperative languages.
Not to say it’s all bad
In Haskell’s defense, they have done a wonderful job creating tools and documentation that try to address this problem. GHC contains some beautiful profiling tools that allow you to analyze what the performance-limiting steps are in your computations, and the documentation for the prelude frequently mentions asymptotic runtime of functions.
Make no mistake, Haskell is a powerful tool, the class of languages it exemplifies may very well be the future of programming, but to properly utilize it requires a pretty deep understanding of what’s going on under the hood, and of the fundamentals of computer science. I think teaching a functional language as a first language would be a mistake.
Algorithm Puzzle: Finding Three Numbers that Sum to a Value
As part of the new year I’d like to start presenting a classic algorithm problem every week, briefly analyze the runtime, and shows implementations in a standard language such as C++ or Python, and an implementation in Haskell.
The first question I’d like to look at is a simple interview question I was asked a while ago. The basic problem is defined as follows:
Given a list of random numbers of arbitrary length, find a combination of three numbers that sum to a certain value with the best possible big-O runtime.
There are generally two accepted approaches to this problem, the naive approach, with a Big-O time complexity of O(n3), and a slightly more intelligent approach, with O(n2).
The Naive Approach
The obvious way to first attack this problem is by simply iterating over every possible combination of values, and testing each one. This is easily implemented in Python.
#Naive Approach def naiveApproach(numbers, value): for x in numbers: for y in numbers: for z in numbers: if (x+y+z == value): print 'Value found: %s' % str((x, y, z)) return
In Haskell this approach is also really elegant looking, I took the approach of simply creating a list of tuples representing every possible combination, and filtering it for tuples that only meet the criteria of being correct.
isSum :: Int -> (Int, Int, Int) -> Bool
isSum sum (a, b, c) = sum == (a + b + c)
naiveApproach :: [Int] -> Int -> (Int, Int, Int)
naiveApproach xs sum = head $ flip List.filter (combinations xs) $ isSum sum
where combinations xs = [ (a, b, c) | c <- xs, b <- xs, a <- xs ]
This works alright for smaller datasets, but becomes impractical for large sets of numbers and slows down dramatically.
The Better Approach
To solve this I take the typical approach and see if I can exploit some sort of underlying structure in the problem. In this case the data source is a random array of numbers, but we know that sorting a list can be done in O(nlg(n)) time, so this is a good starting point for optimization, considering that the runtime is pretty dismal as it stands.
After sorting the list it’s not entirely obvious how to approach this problem. I found that examining a simpler problem helped out. Simplifying the problem, I asked how I would go about finding two numbers in a list that sum together to a certain value. This turns out to be pretty easy. One can sort the list, assuming the sorted list ascends from left to right and then, starting from both ends, move towards the middle. For each pair, sum the values and compare to the required value. If the value summed is less than the required value, increment the left pointer by one, else decrement the right pointer. Once a value is found, or when these pointers collide, abort.
After solving this simpler problem, we apply it to the three number problem. We can apply this sub solution iteratively to each element in the list. That is, for each element in the list, test if there exists two other elements using our sub solution that, when summed together with the current element, equal the required value. The Python implementation looks something like this:
def betterApproach(numbers, value): numbers.sort() for z in range(0, len(numbers)): x = 0 y = len(numbers) - 1 while (x <= y): sumValue = numbers[x] + numbers[y] + numbers[z] if (sumValue == value): print 'Value found: %s' % str((numbers[x], numbers[y], numbers[z])) return elif (sumValue < value): x += 1 else: y -= 1
The sub solution has a runtime of O(n), assuming the list is presorted, and applying the sub solution to each element in the list nets us another n of time, bringing the total runtime of the algorithm to O(n2).
Writing this solution in Haskell was not easy, but this is my first venture into serious functional programming.
betterApproach :: [Int] -> Int -> (Int, Int, Int)
betterApproach zs sumVal = head $ betterHelper1 (sort zs)
where betterHelper1 :: [Int] -> [(Int, Int, Int)]
betterHelper1 (w:ws) = (betterHelper2 0 w (w:ws) (reverse (w:ws))) ++ betterHelper1 ws
betterHelper1 [] = []
betterHelper2 :: Int -> Int -> [Int] -> [Int] -> [(Int, Int, Int)]
betterHelper2 c z (x:xs) (y:ys)
| c >= (length zs - 1) = []
| sum [x, y, z] == sumVal = [(z, x, y)]
| sum [x, y, z] > sumVal = betterHelper2 (c + 1) z (x:xs) ys
| sum [x, y, z] < sumVal = betterHelper2 (c + 1) z xs (y:ys)
betterHelper2 _ _ _ _ = []
What we’re doing here is defining two helper functions. The first helper is pretty simple, recursing over the list and appending any results from the second helper to the end of a list. The second helper is a little more complex.
It takes four parameters, a depth counter, the current z value, and two lists of numbers. To emulate traversal from both ends of a list, I pass in a copy of the list reversed as the second list. I traverse each list head first, using guards to conditionally return a result based on the comparison of the sum of the three values with the required value. When we recurse through the list, depending on if the value is greater or less than the required value I will pop the front element of one of the lists off and discard it.
As a performance enhancement, and to prevent repetitive evaluation, we squash the recursive branch when the number of pops has exceeded the size of the list of random numbers, which is the functional equivalent of detecting when the pointers overlap.
Evaluating Performance
Predictably the Haskell implementation far outpaced the Python implementation. I turned on multicore support when running benchmarks, which gives Haskell a slightly unfair advantage, but since concurrency is one of the core strengths of the language I felt it was fair.
The following benchmarks were run using a small Python script over iterations of 10-100 random data sets of 10,000 or 100,000 elements ranging from 1 to 10,000,000. Three numbers were chosen at random to be summed together as the target value.
| Naive Algorithm 10,000 pts |
Better Algorithm 100,000 pts |
|
|---|---|---|
| Python | 27127ms | 1972ms |
| Haskell | 384ms | 682ms |
As you can see, Haskell really shines when evaluating the naive approach, which parallelizes easily. When using the more efficient implementation, Python and Haskell both can handle much larger data sets, and the performance gap narrows a little, although Haskell still comes out on top.
If you can find a more efficient implementation please let me know! I’ll be the first to admit my Haskell skills aren’t that great at this point in time. Feel free to write in the comments, or send me an e-mail.