AI and Distributed Systems

June 11, 2014

My sister shared this article with me earlier today, and I felt it raised a lot of worthwhile points. It is a somewhat thoughtful response to the announcement that a computer had, earlier this week, passed the Turing Test. In general, however, I felt that it missed the bigger picture: how AI will affect the future of computing and our daily interaction with various devices, and how information systems architecture will play a critical role in that.

As my sister put it, “parallel processing and hierarchical connections are obviously the best ways of functioning,” and she didn’t “see why a ‘move to brain-based computing’” is such a huge paradigm shift.

I have to agree with her view on processing systems, and I believe that is the reason cloud computing and distributed systems are such a big deal. It’s why AMQP-based services are so popular. Essentially, you have a system of what I’ll call consumers and producers, with a router (the broker, in AMQP terms) sitting between them. Consumers ask for information by sending their requests to the router, which works as a load balancer and distributes the requests among the producers. When a producer receives a request, it sends an acknowledgement back to the router, which passes the acknowledgement on to the consumer. Once the request has been processed, the producer sends the data back to the router, and the consumer receives the message and sends an acknowledgement to the router. As soon as the producer is free again, it tells the router that it can handle more requests. (Strictly speaking, AMQP calls whoever publishes a message the producer and whoever receives it the consumer, so in a request/reply exchange like this one each side plays both roles; here the names just mean the requester and the worker.)

Neither consumers nor producers are aware of each other; they rely on declared queue names, service names, and generated key pairs that tell the router where to send which information. Producers and consumers keep a “listening” channel open, where they watch for matching keys to know when a certain payload has arrived.
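To make the key matching concrete, here is a minimal in-memory sketch in Python. It is not a real AMQP client: the `Router` class, the queue names, and the round-robin dispatch are all assumptions made up for illustration. The requester and the workers never reference each other, only queue names and a generated correlation key.

```python
import itertools
import uuid
from collections import deque


class Router:
    """Toy in-memory stand-in for an AMQP broker (hypothetical, not a real client API)."""

    def __init__(self, worker_names):
        # One request queue per worker; requests are dealt out round-robin.
        self.request_queues = {name: deque() for name in worker_names}
        self._next_worker = itertools.cycle(worker_names)
        # One reply queue per requester, keyed by its declared queue name.
        self.reply_queues = {}

    def declare_reply_queue(self, queue_name):
        self.reply_queues[queue_name] = deque()

    def send_request(self, reply_to, payload):
        # The generated key lets the requester match a reply to its request.
        correlation_id = str(uuid.uuid4())
        worker = next(self._next_worker)
        self.request_queues[worker].append((correlation_id, reply_to, payload))
        return correlation_id

    def send_reply(self, reply_to, correlation_id, result):
        self.reply_queues[reply_to].append((correlation_id, result))


def run_worker(router, name, handler):
    """Drain one worker's queue; the worker only sees queue names and keys."""
    queue = router.request_queues[name]
    while queue:
        correlation_id, reply_to, payload = queue.popleft()
        router.send_reply(reply_to, correlation_id, handler(payload))


def collect_replies(router, queue_name, pending):
    """Return {correlation_id: result} for the keys this requester is waiting on."""
    results = {}
    queue = router.reply_queues[queue_name]
    while queue:
        correlation_id, result = queue.popleft()
        if correlation_id in pending:
            results[correlation_id] = result
    return results


router = Router(["worker-a", "worker-b"])
router.declare_reply_queue("client-1")
ids = [router.send_request("client-1", n) for n in (1, 2, 3)]
for name in ("worker-a", "worker-b"):
    run_worker(router, name, lambda n: n * n)
replies = collect_replies(router, "client-1", set(ids))
print([replies[i] for i in ids])  # prints [1, 4, 9]
```

The replies come back out of order (worker-a finishes requests 1 and 3 before worker-b handles 2), and the keys are what put them back with the right request.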

The great thing is that, by design, the AMQP system balances load to reduce bottlenecks, and it has a persistence layer: if a message goes unacknowledged or a request fails, the system automatically knows to queue another request or resend the information.
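The resend-on-failure behavior can be sketched in a few lines of Python. This is a toy model of the idea, not how any particular broker implements it; the function name, the `max_attempts` limit, and the flaky handler are all assumptions for the example.

```python
from collections import deque


def deliver_with_redelivery(messages, handler, max_attempts=3):
    """Process messages; anything not acknowledged goes back on the queue.

    `handler(message, attempt)` returns True to acknowledge; returning False
    or raising counts as a failure. Returns (acknowledged, dead_lettered).
    """
    queue = deque((message, 1) for message in messages)
    acknowledged, dead_lettered = [], []
    while queue:
        message, attempt = queue.popleft()
        try:
            ok = handler(message, attempt)
        except Exception:
            ok = False
        if ok:
            acknowledged.append(message)
        elif attempt < max_attempts:
            queue.append((message, attempt + 1))  # requeue for another try
        else:
            dead_lettered.append(message)  # give up after max_attempts
    return acknowledged, dead_lettered


def flaky_handler(message, attempt):
    # Hypothetical worker: "b" fails on its first attempt, "c" always fails.
    if message == "b":
        return attempt > 1
    return message != "c"


acked, dead = deliver_with_redelivery(["a", "b", "c"], flaky_handler)
print(acked, dead)  # prints ['a', 'b'] ['c']
```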

Looking at the neurogrid mentioned in the Wired article, the issue is not that parallel processing hasn’t been possible: you can look at the wiki page and see that the disadvantages listed there are exactly what the neurogrid deals with, namely minimizing power requirements. So it’s not that it hasn’t been possible; it has just been very expensive. It’s the falling cost that is driving the shift. You have to remember that, just like in the brain, there is no such thing as true parallelism. You can’t think the same thought at the same time from different perspectives, and typing, reading, and writing this email at the same time requires my brain power to be split up.

To make it analogous, I’m going to use a thought experiment. If we were to use an AMQP distributed system to simulate the process of a robot writing this email, I would have one processing cluster that deals with the typing, one that deals with the reading, and one that deals with the cognitive process that pumps out new content. (I’m going to ignore how the information is distributed, because it is irrelevant and you can figure it out by applying the producer-consumer paradigm. If you don’t understand it, just think of it as a black box and continue reading.) Each of these clusters gets the same input stream and then “listens” to different portions of it to get what it needs. One queue deals with the motor information on how much to raise and lower my fingers on the keyboard, another handles the visual stream to optically recognize the characters, and yet another portion of me deals with putting the thoughts together to actually get content out.

The motor skills required to write this can be broken down by finger. Let’s say one process deals with each finger, and I give timing instructions as to when to press down on each key so that the words form in the right order. I can send instructions to each finger simultaneously, because a different processor is in charge of each. Basically, there is one stream of data, which is not itself parallel; the parallelism comes from each processor using that stream concurrently to complete a single task that forms part of a larger function. Then there is the task of optical character recognition (OCR). It works the same way: I can read a word and have one processor recognize each character, but then there is a final non-concurrent element, and that is reading back the word. I’m not going to go into the cognitive portion of this thought experiment, the part that relates to the actual content I am writing, because there is too much going on there for me to account for it.
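Here is a rough Python sketch of the finger example: one shared stream of timed keystrokes, one worker per finger picking out only its own keys, and a final non-concurrent step that puts the presses back in order. The key-to-finger map and the 100 ms interval are invented for the example, and Python threads are used only to show the structure, not to claim real hardware parallelism.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical touch-typing map: which finger owns which key.
FINGER_FOR_KEY = {
    "h": "right-index",
    "e": "left-middle",
    "l": "right-ring",
    "o": "right-ring",
}


def plan_keystrokes(text, interval_ms=100):
    """The single shared stream: (time_ms, key) pairs in typing order."""
    return [(i * interval_ms, key) for i, key in enumerate(text)]


def finger_worker(finger, stream):
    """Each finger scans the same stream and keeps only the keys it owns."""
    return [(t, key) for t, key in stream if FINGER_FOR_KEY[key] == finger]


def type_text(text):
    stream = plan_keystrokes(text)
    fingers = sorted(set(FINGER_FOR_KEY.values()))
    with ThreadPoolExecutor(max_workers=len(fingers)) as pool:
        per_finger = list(pool.map(lambda f: finger_worker(f, stream), fingers))
    # Final non-concurrent step: merge the presses back into time order.
    presses = sorted(press for presses in per_finger for press in presses)
    return "".join(key for _, key in presses)


print(type_text("hello"))  # prints hello
```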

However, obviously some of these tasks take longer than others. The OCR is slow, because I have to process each character from a visual. I could even add further parallelism by breaking the information down into areas of 5 square pixels. The OCR would be in charge of tracing the letter outline in each area, reducing the information needed to put together the character. If each pixel contained 30kb of data, but I only need to know 1) the length of the segment and 2) the path it takes, I could encode the information as an integral from pixel 1 to 5 and reduce the amount of stored data to 2kb. If a processor is working on each 5-square-pixel area, and there are 50 square pixels in total, that makes ten areas, and parallelism cuts the time required to process all the information by up to a factor of ten. Then a final non-concurrent process adds all the integrals together, applies a function that matches the value of that integral against whatever the character and font are registered as in its “learning” data set, and with a certain percentage accuracy recognizes the character. Again, you can think about how multiple cores and a distributed system can make this whole task more efficient.
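Plugging in the numbers from the paragraph above gives a quick back-of-the-envelope check. The sketch assumes one processor per 5-square-pixel area, that the 2kb figure applies to each encoded area, and that the final merge step is free, so the speedup it reports is a best case.

```python
def tiling_estimate(total_area_px=50, tile_area_px=5,
                    kb_per_pixel=30, kb_per_tile_encoded=2):
    """Best-case numbers for splitting the image into equal tiles."""
    tiles = total_area_px // tile_area_px
    raw_kb = total_area_px * kb_per_pixel
    encoded_kb = tiles * kb_per_tile_encoded
    return {
        "tiles": tiles,
        # With one processor per tile, the ideal speedup equals the tile count.
        "ideal_speedup": tiles,
        "raw_kb": raw_kb,
        "encoded_kb": encoded_kb,
        "data_reduction": raw_kb / encoded_kb,
    }


print(tiling_estimate())
```

With these inputs there are 10 tiles, so the ideal speedup is 10x, and the stored data drops from 1500kb to 20kb, a 75x reduction.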

Let’s say that means that MichalBot v.1 takes 100ms to process all the incoming visuals (reading), and MichalBot v.2 takes 10ms to do so. Since v.1 needs more time to process the visuals, but there is interdependence of all three tasks in order for me to fluidly compose this email, it looks like MichalBot v.1 is…err…a bit slow in the “head” since he just kind of hangs there waiting for the information to be processed before his next decision can be made. Since MichalBot v.2 has more computing power across which to distribute the reading tasks, he operates more smoothly.
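A tiny Python model of why v.1 “hangs”: if the three clusters run concurrently but every compose step needs all of them to finish, the slowest stage sets the pace. The reading times come from the paragraph above; the typing and thinking times are made-up placeholders.

```python
def step_time_ms(stage_ms):
    """Time per compose step when stages run concurrently but each step needs all of them."""
    return max(stage_ms.values())


# Reading times are from the text; typing and thinking times are invented placeholders.
v1 = {"reading": 100, "typing": 20, "thinking": 15}
v2 = {"reading": 10, "typing": 20, "thinking": 15}

for name, stages in (("v.1", v1), ("v.2", v2)):
    bottleneck = max(stages, key=stages.get)
    print(name, step_time_ms(stages), "ms per step, limited by", bottleneck)
```

Under these placeholder numbers, v.2 is limited by typing rather than reading, so the tenfold faster reading only buys a fivefold faster step overall.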

Unfortunately, getting Michal to be faster is very computationally expensive and, in turn, financially expensive. That’s why the neurogrid is great: it’s power efficient, faster, and therefore cheaper in the traditional sense because I consume less energy. But that’s where we can throw in another paradigm: cloud computing. Even though neurogrid will be too expensive to put into every single device, the alternative is to use paired neurogrid-powered computers, connected in distributed networks built on AMQP, and transmit requests and responses over the internet. Having it available as a cloud platform that devices can query: that is the real paradigm shift.

P.S. Think about what it would mean if our brains actually worked on a quantum level, and we applied this to traditional computing.