Apple’s annual developer conference kicked off yesterday with a sprawling, much-anticipated keynote. The talk, along with the follow-up “Platforms State of the Union” presentation, included a number of interesting announcements about Apple’s machine learning efforts.
It’s particularly interesting to compare these announcements with those Google made at I/O, which preceded WWDC by a month. Google has built its fortune on cloud computing and large-scale application of machine learning. Accordingly, many of its announcements centered on computationally expensive initiatives, like using ML itself to build new ML models and assembling huge arrays of GPU-enabled machines to provide tremendous amounts of processing power. Apple’s success over the past decade has been dominated by the iPhone, and its announcements focused more on creating and running ML models quickly and efficiently.
First off, Apple announced CreateML. Like Google’s ML Kit (which you might be forgiven for thinking was an Apple product, based on the name), CreateML provides high-level, task-specific APIs for creating and training machine learning models. It includes functionality for image classifiers, text classifiers, and generic classification and regression.
A couple of interesting distinctions here: because of Apple’s emphasis on privacy, its ML architecture is designed to run on-device rather than in the cloud, and CreateML models are no exception. And because its task-specific models solve known problems, they can be more efficient than generic solutions.
Consider the image classification task of recognizing types of flowers. The traditional way to train a model would be to create a convolutional neural network, initialize it with random values, and keep feeding it images until it learns to pull a variety of features out of the images and to figure out which features are relevant to the specific flowers we’re trying to identify. In this case, much of the most computationally expensive work happens in learning to recognize the features; once the features are known, picking out those specific to a rose is comparatively easy.
CreateML, on the other hand, uses transfer learning, which lets a developer train a flower-recognition model much more quickly and with less data. This is because the image classification model it builds on has been pre-trained on general computer vision tasks and already knows about feature extraction. In effect, you don’t have to teach the model how to see; it already knows that. You have only to teach it what flowers look like. This makes for quicker, more efficient training and smaller models.
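The shape of the idea can be sketched in a few lines of NumPy. This is a toy illustration, not CreateML’s actual API: the “pre-trained” extractor below is just a frozen random projection standing in for a real vision model’s convolutional layers, and the flower data is synthetic. The point is that only the small classifier head on top ever gets trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pre-trained vision model's convolutional layers: a frozen
# projection whose weights are never updated during our training.
W_frozen = rng.normal(size=(64, 8))

def extract_features(images):
    return np.maximum(images @ W_frozen, 0.0)  # frozen layers + ReLU

# Synthetic two-class "flower" data: 64-pixel images whose means differ.
X = np.concatenate([rng.normal(0.0, 1.0, size=(100, 64)),
                    rng.normal(1.0, 1.0, size=(100, 64))])
y = np.concatenate([np.zeros(100), np.ones(100)])

# Transfer learning trains only this small logistic-regression "head",
# so only 9 parameters (not the whole network) need data and compute.
w, b = np.zeros(8), 0.0
feats = extract_features(X)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))  # predicted probabilities
    grad = p - y                                # logistic-loss gradient
    w -= 0.1 * feats.T @ grad / len(y)
    b -= 0.1 * grad.mean()

p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
accuracy = ((p > 0.5) == y).mean()              # training accuracy
```

With the extractor frozen, the head converges in a fraction of a second; training the extractor itself from scratch is where the bulk of the data and compute would go.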
CreateML also integrates nicely with Playgrounds, Xcode’s Jupyter-notebook-like environment for writing and experimenting with code iteratively. There, developers can drag images into a simple GUI to feed the image classifier, making it even easier to train a model without having to worry about reading the data in from disk.
Apple continues to emphasize efficient use of ML models. Because it controls both the hardware and the software, Apple’s platforms have some unique advantages for running ML efficiently. Metal allows iOS and macOS to take advantage of GPUs where they’re available. CoreML provides a lingua franca for delivering and using trained models from all of the various network-training systems. And the A11 Bionic chip in the iPhone X even has two cores designed for, and dedicated solely to, running ML networks.
Speaking of CoreML, Apple also announced CoreML 2. Notably, CoreML 2 runs models faster by using batch prediction. It can also substantially reduce the size of trained models by quantizing them, reducing precision in places where doing so doesn’t affect the network’s output.
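The quantization trick behind those smaller models is straightforward to demonstrate. Here is a minimal NumPy sketch of linear 8-bit weight quantization; it is my own illustration of the general technique, not CoreML’s actual implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
weights = rng.normal(size=1000).astype(np.float32)     # 32-bit model weights

# Linear 8-bit quantization: map each float onto one of 256 evenly spaced
# levels between the smallest and largest weight.
lo, hi = float(weights.min()), float(weights.max())
scale = (hi - lo) / 255.0
q = np.round((weights - lo) / scale).astype(np.uint8)  # 1 byte per weight
dequant = q.astype(np.float32) * scale + lo            # what the runtime sees

size_ratio = q.nbytes / weights.nbytes            # 0.25: a four-fold reduction
max_err = float(np.abs(weights - dequant).max())  # at most half a step
```

Storing one byte per weight instead of four cuts model size by 75%, and the rounding error is bounded by half a quantization step, which for many networks is too small to change the output.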
The first day of WWDC has left no doubt that Apple is very serious about applying machine learning throughout its product line, both to make its operating systems more capable and to allow developers to take efficient advantage of the technology. With both Apple and Google introducing progressively higher-level and more performant tools, the door is now open for developers to take advantage of the tech without devoting weeks to study.
Singularity Hub brings news that researchers at Google’s DeepMind, while training a neural network to navigate a maze, noticed structures in the network that mirror biological “grid cells”:
Grid cells were the subject of the 2014 Nobel Prize in Physiology or Medicine, alongside other navigation-related neurons. These cells are arranged in a lattice of hexagons, and the brain effectively overlays this pattern onto its environment. Whenever the animal crosses a point in space represented by one of these hexagons, a neuron fires, allowing the animal to track its movement.
[Researchers] found that, after training, patterns of activity that looked very similar to grid cells spontaneously appeared in one of the layers of the neural network. The researchers hadn’t programmed the model to exhibit this behavior.
It’s fascinating to see this sort of convergent evolution in action: both biological and synthetic systems developing similar structures to solve a given problem. Even more compelling is the fact that, when the researchers prevented the grid cells from forming, the network’s ability to navigate mazes successfully was sharply reduced, confirming the importance of these particular structures for spatial tasks.
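For the curious, the hexagonal geometry the article describes has a classic idealization in the neuroscience literature: summing three cosine plane waves whose directions are 60 degrees apart interferes into exactly a hexagonal lattice of firing fields. The sketch below is that textbook model, not DeepMind’s network:

```python
import numpy as np

# Idealized grid cell: three cosine plane waves with directions 60 degrees
# apart interfere into a hexagonal lattice of firing fields.
def grid_cell_rate(x, y, spacing=0.5, phase=(0.0, 0.0)):
    k = 4 * np.pi / (np.sqrt(3) * spacing)      # wave number for this spacing
    angles = np.deg2rad([0.0, 60.0, 120.0])
    total = sum(np.cos(k * ((x - phase[0]) * np.cos(a) +
                            (y - phase[1]) * np.sin(a)))
                for a in angles)
    return (total + 1.5) / 4.5                  # normalize into [0, 1]

# Evaluate on a 2 m x 2 m box: peaks (rate near 1) sit on a hexagonal
# lattice whose nearest-neighbor peak distance equals `spacing`.
xs, ys = np.meshgrid(np.linspace(0, 2, 200), np.linspace(0, 2, 200))
rates = grid_cell_rate(xs, ys)
```

Whenever the animal’s position lands on one of the lattice vertices, the modeled firing rate peaks, just as described above.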
Since the Google Duplex demos hit the web, numerous commentators have expressed concern about the ethics of having a computer system make calls while passing itself off as human. Google claims this will not be a problem when the system actually goes into production, as Duplex will identify itself:
In a statement, a Google spokesperson said “transparency” is a key part of the technology and: “We are designing this feature with disclosure built-in, and we’ll make sure the system is appropriately identified.” We’ll see what that looks like when the Duplex technology rolls out in testing as a part of Google Assistant this summer.
Of course, even if Google has a strict policy on this, other tech companies likely won’t be far behind implementing similar solutions. Google itself is likely to make key functionality available through APIs at some point, which will enable others to build systems with looser controls.
Thus, while respectable businesses may take the high road, it’s nearly inevitable that we’ll eventually see robots taking over calling duties for politicians and phone scammers, as paying for computer time becomes cheaper and more effective than paying humans to make those calls.
The flip side is that, soon after, we’ll end up using the same tech to answer our phones for us. (Duplex’s name itself hints at this.) In the same way we rely on spam filters to keep the garbage out of our email boxes today, we will depend on our robot administrative assistants to deflect callers with bad intent before they ever get to us. The same restaurants that Duplex called to make reservations will have Duplex answering the phone and taking reservations without the caller ever getting a busy signal, being put on hold, or accidentally getting hung up on.
There will be some growing pains that result from Duplex, but good policies about self-identification will mitigate them in the short term, and using the same tech on the other end of the phone will provide a solid long-term solution that will ultimately improve the experience of making or receiving a phone call.
The lengths to which some engineers will go to avoid interacting with other people are legendary in certain social circles. Over on its AI blog, Google recently announced Google Duplex, a system for making reservations, scheduling doctor’s appointments, and inquiring about other business information over the phone. The project is not only a remarkable technical achievement, but also a tremendous enabler for this sort of introverted engineer.
Today we announce Google Duplex, a new technology for conducting natural conversations to carry out “real world” tasks over the phone. The technology is directed towards completing specific tasks, such as scheduling certain types of appointments. For such tasks, the system makes the conversational experience as natural as possible, allowing people to speak normally, like they would to another person, without having to adapt to a machine.
The article is worth reading for the details on the system and how it works. But the truly jaw-dropping part of the post is the audio samples. In these, we get to eavesdrop on the system making a variety of appointments, fielding ambiguous questions, difficult pronunciation, and unexpected constraints. It’s truly an AI tour de force, and clears the bar for a constrained-problem Turing test easily — none of the people involved in the conversations seem to have any idea that they’re not talking with another human being.
One of the notable features of the system is how it uses “ummm” and “ahh” interjections both to create a more natural impression and to cover processing delays. (Some critics have posited that this is actually deceptive, since it’s designed to make the system seem human.) Combining a recurrent neural network that helps maintain context with multiple text-to-speech engines creates an impressively organic conversation. And Google seems well on its way to making this a back-end capability for Google Assistant: you will be able to ask Google Assistant to make dinner reservations for you, and it will call the restaurant in the background to set them up and then add them to your calendar.
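As a toy illustration of why fillers are useful engineering and not just theater, here’s a sketch (entirely my own, not Google’s implementation) of masking response latency with an interjection. A real system would speak the filler while still generating, rather than after the fact:

```python
import time

# "umm"/"ahh"-style fillers, used here to mask the latency of a slow
# response generator.
FILLERS = ["umm", "ahh"]

def respond(generate, threshold=0.15):
    """Run the (possibly slow) `generate` callable and, if it took longer
    than `threshold` seconds, lead the reply with a filler so the pause
    sounds like a person thinking rather than a machine stalling."""
    start = time.monotonic()
    text = generate()
    elapsed = time.monotonic() - start
    if elapsed > threshold:
        return f"{FILLERS[0]}... {text}"
    return text
```

A quick reply comes back unadorned, while one that takes noticeably long gets softened with an “umm”.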
If our new robot overlords spare us painful phone calls, then I, for one, welcome them.
At Google I/O yesterday, the company announced ML Kit, a new Firebase API for using Machine Learning capabilities on mobile apps.
Getting started with machine learning can be difficult for many developers. Typically, new ML developers spend countless hours learning the intricacies of implementing low-level models, using frameworks, and more. Even for the seasoned expert, adapting and optimizing models to run on mobile devices can be a huge undertaking. Beyond the machine learning complexities, sourcing training data can be an expensive and time consuming process, especially when considering a global audience.
With ML Kit, you can use machine learning to build compelling features, on Android and iOS, regardless of your machine learning expertise.
The tool looks super-useful for developers, bringing with it specific APIs for Text Recognition, Face Detection, Barcode Scanning, Image Labeling, and Landmark Recognition. (It’s worth noting that Apple and Google already provide some of these capabilities through other APIs on their respective operating systems, but ML Kit adds additional capabilities and does so with a common API across Firebase’s supported platforms.)
ML Kit’s models can be used in local, device-only mode, or can bring the power of Firebase’s cloud resources to bear for better accuracy or to avoid embedding a fixed model in an app. And if the built-in use cases don’t address your need, the tool supports using custom TensorFlow Lite models developers can build themselves or download from other sources.
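The local-versus-cloud choice amounts to a simple fallback pattern. Everything below is hypothetical stand-in code, not the real ML Kit API; it exists only to show the control flow of preferring the on-device model and escalating when needed:

```python
# All three helpers are hypothetical stand-ins, not real ML Kit calls.

def label_on_device(image):
    # Small bundled model: fast and private, but less accurate.
    return {"label": "flower", "confidence": 0.62}

def label_in_cloud(image):
    # Larger server-side model: needs connectivity, but more accurate.
    return {"label": "rose", "confidence": 0.91}

def network_available():
    return True

def label_image(image, threshold=0.7):
    """Prefer the on-device model; fall back to the cloud model only when
    the local prediction is below `threshold` and the network is up."""
    result = label_on_device(image)
    if result["confidence"] < threshold and network_available():
        result = label_in_cloud(image)
    return result
```

Apps that can tolerate lower accuracy (or that handle sensitive images) would simply raise the threshold to 0 or skip the cloud call entirely.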
Singularity Hub brings news of a new development on the self-driving vehicle front. While existing efforts have required careful mapping in advance with humans tagging items like stop signs, lane markers, buildings, etc., the MapLite system, developed by a trio of students at MIT’s Computer Science and Artificial Intelligence Laboratory, can successfully navigate rural roads for which it has no prior information:
In tests on unmarked country roads in Devens, Massachusetts, the researchers were able to get the car to navigate along a 1 kilometer (0.6 mile) stretch without human intervention. Their results will be presented at the International Conference on Robotics and Automation (ICRA) in Brisbane, Australia, at the end of this month.
Since current autonomous vehicles can only operate where advance surveying and mapping have been done, MapLite stands to substantially expand the areas where they can work.
One of the most interesting and difficult disciplines in Machine Learning is designing learning networks or models. TensorFlow and other low-level tools provide immense power, but aren’t very approachable for people who haven’t spent a good deal of time digging into and understanding their intricacies. On the other hand, there are a number of prebuilt models becoming available for download and use, but they tend to be very focused on a single purpose, such as object recognition or text classification.
Enter Lobe. Lobe not only provides a visual development environment for building custom machine learning models, but also does some excellent visualization at each step whenever possible to help the model designer intuitively grasp what is happening at that layer. It can accept visual, auditory, or numeric inputs, and has a wealth of examples that new users can learn from. In the company’s own words:
Building deep learning models can be a slow and complicated process. It’s hard to figure out how to get started, there’s a lot of technical language to learn, and even once you’re all set up, it’s hard to visualize and understand what you’re doing. Most people don’t stand a chance. That’s why we created Lobe, to give more people from diverse backgrounds and disciplines the ability to invent with deep learning and enable the next wave of intelligent products and experiences.
Their demo video makes an even more compelling case, and shows several of the examples in action.
Lobe looks like a really promising way to make machine learning more approachable for a far wider range of people. The product is currently in beta, so if you’re interested in getting a crack at it as soon as possible, get your name in now.
Welcome to ML Nexus! I’m Sean McMains, the editor of this weblog. I’ve been a developer for decades, and have, over the past year, begun an informal study of Machine Learning and where it can be applied to my work and beyond. After all of my years banging out code, I’m fascinated that we can now have computers begin to work out on their own what logic to apply to a variety of problems, with no guidance other than “yes, that’s a good guess” or “nope, that’s dead wrong.”
As I continue to learn more about this field, I thought it would be fun (for me, but I hope for you too) and helpful (ditto) to share what I’m reading about it, along with a bit of commentary and perspective. I’m not yet an expert, but I have a decent grasp of the high-level concepts. I think having that understanding, while still remembering what it’s like not to know much at all, will help make conversations here interesting both for developers who want to add ML to their bag of programmer tricks and for laypeople who are just interested in the technology and what it can do for us.
So, I’m glad you’re here, and hope you’ll come back to visit often, subscribe to the RSS feed, or follow along on Twitter. Thanks for being part of this little adventure!