## Multiple Stab Wounds May Be Harmful to Monkeys

A couple of items of interest with some nonzero potential. A grad student in our group presented today on a new imaging technique that’s been in the literature lately: compressed sensing. In effect it delivers full-resolution, nearly lossless image compression, much like the techniques with which we are familiar. But that’s not what makes it special. As its name implies, compressed sensing happens at the image acquisition stage, meaning only as much data is taken as is necessary to resolve the pixels most essential to the integrity of the image. As a result, the technique works best on images that are sparse or compressible: good contrast and some structure, rather than dense, noise-like detail everywhere.

Back to the beginning. Why compressed sensing? If you’re like me, with a great 10-megapixel digital camera and a lousy SD card, image compression is a perennial problem for you. As it is for me. There are a lot of pretty standard ways of saving space by doing some basic compression right in the camera before data storage. Some of these are even on-chip methods, involving, say, summing over n×n pixel arrays during readout and thereby reducing the resolution of the image. 2×2 binning is common, for instance. But now, instead of 10 million pixels, you have 2.5 million. Oh well. Other methods involve some kind of averaging function that in essence smudges out the details of your image on a per-pixel level: you’ll still have 10 million pixels, but lots of the neighboring ones carry the same value, removing those great, crisp lines that can make a photograph look so lifelike. This isn’t really much better. Anyway, if acquisition time and storage space are not a problem for you, then just take the full image; you have no need for this technique. But if you could have lossless compression without losing resolution, why wouldn’t you? Let me describe it in a general way (the only way I know how).
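For the curious, the 2×2 binning mentioned above is easy to sketch in a few lines of numpy. This is a toy example on a made-up 4×4 sensor readout, not anything resembling real camera firmware:

```python
import numpy as np

# A toy 4x4 "sensor" readout with made-up pixel values.
img = np.arange(16, dtype=float).reshape(4, 4)

# 2x2 binning: sum each non-overlapping 2x2 block of pixels,
# trading resolution (4x4 -> 2x2) for per-pixel signal.
binned = img.reshape(2, 2, 2, 2).sum(axis=(1, 3))
```

The reshape splits each axis into (block index, offset within block), so summing over the two offset axes collapses every 2×2 block into one value.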

The CCD sensor of a digital camera is a grid of photo-sensitive pixels. An image is focused onto the CCD lighting up some areas of the sensor and leaving other areas dark. This information is recorded. The grid of CCD pixels outputs a matrix of values corresponding to how many photons each pixel received during the time of exposure. These values are stored on a memory device and interpreted by software as an image.

Now imagine that, instead of exposing the full grid of CCD pixels, we expose only a randomly chosen subset of those pixels at a time and sum their values. We do this $m$ times and record the results in a vector $\vec{y}$. If you think we’re losing information by doing this, well, clearly we are; otherwise this would be a pointless exercise. But the point of compression is to recover the most essential bits and smooth out the rest, and it’s quite amazing: the end result is often indistinguishable from the original. But first, some math. Because I just realized that WordPress supports LaTeX. And that is huge.

We can label the $N$ total pixels in order from 1 to $N$ and work with a one-dimensional array of values. In this basis, we’ll call the original vector, with all the raw pixel-by-pixel data, $\vec{x}$. This is what we will try to reconstruct in as few steps as possible. The operation of choosing a random subset of pixels and adding up their values can be expressed by the equation $\vec{q_i} \cdot \vec{x} = y_i$ for $i = 1, 2, \ldots, m$. This is just a simple inner product between two vectors. Here, the $\vec{q_i}$ are binary-valued vectors of length $N$ (the same length as $\vec{x}$) which represent the randomly chosen open apertures. Instead of working with $m$ equations, each with index $i$, we can consolidate the statement into $\widetilde{Q} \cdot \vec{x} = \vec{y}$, where $\widetilde{Q}$ is a matrix of 0’s and 1’s and each of the $m$ rows of $\widetilde{Q}$ is one of the randomly determined configurations $\vec{q_i}$. $\vec{y}$ is fully determined on account of being what is actually measured. $\vec{x}$ is the only unknown, and it is what we seek. Now, if we were to set $m = N$ and take as many samples as there are pixels in the camera, the problem would be perfectly constrained, and (as long as the rows of $\widetilde{Q}$ are linearly independent) we could solve for each pixel exactly. That would be a horrible waste of time, however, as we’d have made things very difficult for ourselves only to accomplish no compression at all. The point is to use as small an $m$ as possible, that is, as few samples as possible, while still reaching an extremely good approximation of the original image in areas of detail and filling in zero or near-zero values for areas of void.
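To make the bookkeeping concrete, here is a minimal numpy sketch of the measurement step. The dimensions ($N = 64$, $m = 24$) and the toy image are of course made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

N = 64   # total number of pixels (the flattened image x)
M = 24   # number of random-aperture exposures, M < N

# A sparse toy "image": mostly dark, a few bright pixels.
x = np.zeros(N)
x[[5, 17, 40]] = [3.0, 1.5, 2.0]

# Each row q_i of Q is a random binary aperture pattern:
# 1 = pixel exposed, 0 = pixel blocked.
Q = rng.integers(0, 2, size=(M, N)).astype(float)

# y_i = q_i . x, the summed photon count for exposure i.
y = Q @ x
```

Each entry of $\vec{y}$ is one scalar readout, so we have recorded 24 numbers in place of 64 pixel values.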

It may be obvious to you that with $m < N$, there will be multiple ways to choose $\vec{x}$ that still satisfy the matrix equation above. The real insight here comes from Terence Tao (together with Emmanuel Candès). It is a mathematical method of selecting the sparsest, least complicated configuration that still satisfies the above constraint: minimize the $\ell_1$ norm $\|\vec{x}\|_1$ subject to $\widetilde{Q} \cdot \vec{x} = \vec{y}$. This is pretty ridiculous. Think about it: undersampling the space ordinarily leaves us with no information about the exact value of any pixel. But counterintuitively, by randomizing our samples and applying a rather straightforward minimization algorithm, the solution to the problem can be approximated with great precision. For a slightly more detailed explanation, see this document. Among other things, it has a neat little comparison between conventional compression methods and compressive sensing (CS) methods. CS comes out pretty strong (though it has the advantage of some information from the other technique… you’ll see), but that’s not what interests me about the image they used to test their algorithms:
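The minimization itself can be sketched with off-the-shelf tools. The standard trick is to recast the $\ell_1$ problem as a linear program by splitting $\vec{x}$ into nonnegative positive and negative parts, $\vec{x} = \vec{u} - \vec{v}$, and letting scipy's `linprog` do the work. The toy signal and dimensions below are made up, matching nothing in particular:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
N, M = 64, 24

# Sparse ground truth: three nonzero pixels out of 64.
x_true = np.zeros(N)
x_true[[5, 17, 40]] = [3.0, 1.5, 2.0]

# Random binary aperture patterns and the measured sums.
Q = rng.integers(0, 2, size=(M, N)).astype(float)
y = Q @ x_true

# Minimize ||x||_1 subject to Q x = y, as a linear program:
# write x = u - v with u, v >= 0 and minimize sum(u + v).
c = np.ones(2 * N)
A_eq = np.hstack([Q, -Q])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
x_rec = res.x[:N] - res.x[N:]
```

With this few measurements exact recovery isn't guaranteed, but for a signal this sparse the $\ell_1$ solution typically lands on (or very near) the true image, which is exactly the counterintuitive part.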

That background… looks oddly familiar…

The cameraman is apparently ubiquitous in the image-processing world as a test shot. I tried to find information about its origins on Google but, again, to no avail…

But, right: aside from being pretty counterintuitive, compressive sensing has one additional advantage, and this is quite interesting. One can imagine a single-pixel camera which spits out not an array of data but a single data point, the sum of photon counts, with each exposure. Not super useful yet, but add a fine grid which can selectively block out light, and you have on your hands on-camera lossless compression. Certainly, because not all the pixels are active at all times, this would increase the length of the exposure (by how much, I don’t know), but I can only encourage this kind of thinking, which could eventually make my life just that much better.

I’ll leave you with an interesting study in the news about primates with possibly far-reaching ramifications.