In mid-January, the ongoing race for AI put Montreal-based Maluuba on our radar. Microsoft acquired the startup and its team of researchers to build better machine intelligence tools for analyzing unstructured text to enable more natural human-computer interaction: think bots that can actually respond with reasonable intelligence to a text you send. The team has dropped its first paper since being acquired, and it sheds light on the group’s priorities.
The paper outlines a method for multi-advisor reinforcement learning that breaks problems down into smaller, more computationally tractable pieces. In oversimplified terms, Maluuba is effectively trying to teach leadership to groups of machines working together to solve problems.
Existing conversational interfaces are rigid and easily broken. Siri, Alexa and Cortana are miles ahead of old-fashioned dialog trees, but they are still a far cry from generalized intelligence. From a computational standpoint, a complete model of the world would be infeasible to create, so instead engineers build specialized machine intelligence tools that can perform well on a smaller number of tasks. This is why you can ask Siri to make a phone call but can’t ask it to organize a large dinner event.
A lot of attention is being given to reinforcement learning, a specialized branch of machine learning. As I have explained previously, reinforcement learning steals the idea of utility from economists in an effort to quantify and iteratively evaluate decision making. Instead of explicitly telling an autonomous car every rule of the road, it can be more effective to gamify the problem and assign figurative “points” that the intelligent system can optimize. The system could hypothetically lose points for driving over a double yellow line and gain them for maintaining the speed limit.
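To make the “points” idea concrete, here is a minimal sketch of what such a reward function might look like in Python. The state fields and point values are hypothetical, invented purely for illustration:

```python
# A minimal, hypothetical reward function for a driving agent. The state
# fields and point values are invented for this sketch; real systems use
# far richer signals.

def driving_reward(state: dict) -> float:
    reward = 0.0
    if state["crossed_double_yellow"]:
        reward -= 10.0   # lose points for an illegal lane crossing
    if state["within_speed_limit"]:
        reward += 1.0    # gain points for maintaining the speed limit
    return reward
```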
This allows for a much more adaptable system, but it is unfortunately still a complex problem requiring a lot of compute. This is where multi-advisor reinforcement learning comes in.
The Maluuba team is trying to solve these complexity problems that face reinforcement learning. Their approach is to use multiple “advisors” to break the problem down into smaller, more digestible chunks. Traditionally, a single virtual agent is used for reinforcement learning, but in recent years multi-agent approaches have become more common.
In conversation, the group offered the example of an intelligent scheduling assistant. Rather than have a single agent learn to schedule every kind of meeting optimally, it could someday make sense to assign different agents to different classes of meetings. The challenge is getting all of these agents to work together in concert.
Intuitively, it’s easy to imagine these agents as humans splitting up a task. Getting people to work together efficiently is no small feat, even though a divide-and-conquer strategy can outperform the lone-wolf approach.
The solution is to have an aggregator sit on top of all the “advisors” to make decisions. Each advisor in Maluuba’s paper has a different focus with respect to the larger problem being solved, and each receives a different reward for the actions it specializes in. If advisors take different positions, the aggregator steps in and arbitrates.
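One simple way to picture this arbitration is an aggregator that sums each advisor’s preferences over the available actions and greedily picks the best total. The interfaces below are hypothetical, a hedged sketch rather than the paper’s actual implementation, though the paper’s aggregator combines advisors’ value estimates in a similar spirit:

```python
# Hedged sketch of an advisor/aggregator split. The class and method
# names are hypothetical, invented for illustration.

from typing import Dict, List

class Advisor:
    """Scores each candidate action against its own specialized reward."""
    def action_values(self, state) -> Dict[str, float]:
        raise NotImplementedError

class Aggregator:
    def __init__(self, advisors: List[Advisor]):
        self.advisors = advisors

    def decide(self, state, actions: List[str]) -> str:
        # Sum every advisor's preference for each action, then act greedily.
        # Disagreements are arbitrated implicitly: the action with the
        # highest combined score wins.
        totals = {a: 0.0 for a in actions}
        for advisor in self.advisors:
            values = advisor.action_values(state)
            for action in actions:
                totals[action] += values.get(action, 0.0)
        return max(totals, key=totals.get)
```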
Maluuba used a simplified version of Ms. Pac-Man, called Pac-Boy, to test different aggregation methods for its multi-advisor reinforcement learning framework. The team wants to study the process of breaking down problems. Ideally there is some universality in how problems can be decomposed across a range of optimal aggregators. This is another place where it’s interesting to think about how humans decompose problems, often inefficiently. Think of it as leadership 101 for machines.
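To illustrate how such a decomposition might look in a Pac-Boy-like grid world, here is a hypothetical setup built on the sketch above: one advisor drawn toward each pellet and one repelled by each ghost. The move model and distance-based scoring are invented for this example and are not the paper’s actual method:

```python
# Hypothetical Pac-Boy-style decomposition using the Advisor/Aggregator
# sketch above. The step() move model and Manhattan-distance scoring are
# invented for illustration.

def step(pos, action):
    """Apply a move on a grid: pos is (x, y), action is a direction name."""
    dx, dy = {"up": (0, -1), "down": (0, 1),
              "left": (-1, 0), "right": (1, 0)}[action]
    return (pos[0] + dx, pos[1] + dy)

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

class PelletAdvisor(Advisor):
    """Prefers moves that close the distance to one specific pellet."""
    def __init__(self, pellet):
        self.pellet = pellet

    def action_values(self, state):
        pos = state["pacboy"]
        return {a: -manhattan(step(pos, a), self.pellet)
                for a in state["actions"]}

class GhostAdvisor(Advisor):
    """Penalizes moves that close the distance to one specific ghost."""
    def __init__(self, ghost):
        self.ghost = ghost

    def action_values(self, state):
        pos = state["pacboy"]
        return {a: manhattan(step(pos, a), self.ghost)
                for a in state["actions"]}

# One advisor per pellet and per ghost; the aggregator arbitrates.
state = {"pacboy": (2, 2), "actions": ["up", "down", "left", "right"]}
advisors = [PelletAdvisor((5, 2)), PelletAdvisor((2, 0)), GhostAdvisor((0, 2))]
print(Aggregator(advisors).decide(state, state["actions"]))
```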