There is little doubt that the future of AI is on the cloud. The cloud, together with the data that fuels business knowledge, brings a new degree of accessibility to AI technology.
Why cloud in general for AI?
Speed: Easy availability of specialized hardware (GPUs/TPUs) that can help accelerate AI development.
Cloud AI APIs: A quick jump start on complex tasks instead of building from scratch. For cases like speech-to-text or language translation, an enterprise might also lack the data needed to build models as accurate as those available in the cloud.
Cloud AutoML: Train high-quality models specific to business needs with citizen data scientists or even business users.
Cloud Bursting: With advances in hybrid cloud, start small in the local data center and use the cloud to scale AI compute.
In this article, we are going to focus on AI and related services offered by Google Cloud Platform. Let us start by looking at the Google Cloud AI building blocks following the recent announcements at Next’19.
Some of the key new additions to Google Cloud Platform's AI capabilities were the introduction of AI Platform, which enables seamless creation of an end-to-end machine learning pipeline; AutoML Tables, which automatically builds and deploys machine learning models on structured data; and support for new ML algorithms in BigQuery ML.
Below is a summary of the AI capabilities added or enhanced as part of the Google Next’19 announcements.
AI development and training is a relatively small fraction of the end-to-end machine learning life cycle. Data ingestion, data engineering, feature engineering, data analysis and validation, model deployment, and model performance monitoring are where a typical data engineer and data scientist spend 90+% of their time.
While this article is focused on AI capabilities, let us quickly look at how different but integrated Google services make end-to-end ML possible.
An interesting aspect of these services is how well they integrate with each other to create a seamless pipeline. One example is TensorFlow Transform, which makes full passes over the input data during model training and exports the resulting transformations as a TensorFlow graph that is applied to single instances during serving. This prevents training-serving skew, because the same transformations are applied during both stages.
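The analyze-once, apply-everywhere idea behind this can be sketched in a few lines of plain Python. This is not the TensorFlow Transform API; the function names (`analyze`, `make_transform`) and the sample ages are hypothetical and only illustrate why reusing the training-time statistics at serving time avoids skew.

```python
import math


def analyze(column):
    """Full pass over the training column: compute its mean and standard deviation."""
    n = len(column)
    mean = sum(column) / n
    variance = sum((x - mean) ** 2 for x in column) / n
    return {"mean": mean, "std": math.sqrt(variance)}


def make_transform(stats):
    """Freeze the training statistics into a function that scales one value.

    Assumes the training column was not constant, so std is non-zero.
    """
    def transform(x):
        return (x - stats["mean"]) / stats["std"]
    return transform


# Training time: one full pass to compute statistics, then transform every row.
training_ages = [20.0, 30.0, 40.0, 50.0]  # hypothetical feature values
stats = analyze(training_ages)
scale_age = make_transform(stats)
training_features = [scale_age(x) for x in training_ages]

# Serving time: a single instance goes through the very same frozen transform,
# with no access to (and no recomputation over) the training data.
serving_feature = scale_age(35.0)
```

Because `scale_age` carries the statistics computed during training, a serving request is scaled exactly the way the training rows were, which is the property the exported transform graph is meant to guarantee.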
Let us now discuss the key GCP capabilities recently announced at Google Next.
AutoML Tables enables an entire team of data scientists, analysts, and developers to automatically build and deploy state-of-the-art machine learning models on structured data at massively increased speed and scale. Every aspect of ML is really automated starting from
The diagram below summarizes the simplicity of AutoML Tables. Once you have your model dataset, most of the activity is UI guided, with minimal or no coding.
AutoML Tables also supports automated feature engineering for most data types.
Currently, it runs models for the algorithms below against the input dataset, based on the selected configuration parameters.
Based on the complexity of the data, it might also run Neural + Tree Architecture Search.
BigQuery ML brings ML to the data. Models are trained and accessed in BigQuery using SQL. BigQuery ML democratizes the use of ML by empowering data analysts, the primary data warehouse users, to build and run models using existing business intelligence tools and spreadsheets.
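As a rough sketch of what "trained and accessed using SQL" looks like, the helpers below assemble a BigQuery ML `CREATE MODEL` statement and an `ML.PREDICT` query as strings. The dataset, table, and column names (`mydataset.churn_model`, `mydataset.customers`, `churned`) are made up for illustration, and the helper functions themselves are hypothetical; the resulting SQL would be run in the BigQuery console or through a client of your choice.

```python
def build_create_model_sql(model, table, label, model_type="logistic_reg"):
    """Return a BigQuery ML CREATE MODEL statement trained on every column of `table`."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = '{model_type}', input_label_cols = ['{label}']) AS\n"
        f"SELECT * FROM `{table}`"
    )


def build_predict_sql(model, table):
    """Return a query that scores every row of `table` with a trained model."""
    return f"SELECT * FROM ML.PREDICT(MODEL `{model}`, TABLE `{table}`)"


# Hypothetical names, for illustration only.
create_sql = build_create_model_sql(
    model="mydataset.churn_model",
    table="mydataset.customers",
    label="churned",
)
predict_sql = build_predict_sql("mydataset.churn_model", "mydataset.new_customers")

print(create_sql)
print(predict_sql)
```

The point of the sketch is that both training and prediction are ordinary SQL statements, so they fit into the tools a data analyst already uses.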
While AutoML Tables requires no knowledge of ML, BigQuery ML does require a basic understanding of it.
A nice illustration of the different ML capabilities along with user personas. Note that Cloud ML Engine is now called AI Platform Training.
Model performance and metrics can be tracked using the BigQuery UI. The UI provides details on the confusion matrix, ROC curve, and precision and recall, among others.
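For readers less familiar with these metrics, here is a minimal sketch of how a binary confusion matrix, precision, and recall are computed from actual and predicted labels. The label lists are invented for illustration, and this is only the underlying arithmetic, not how BigQuery computes or displays the values.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    pairs = list(zip(actual, predicted))
    return {
        "tp": sum(1 for a, p in pairs if a == 1 and p == 1),
        "fp": sum(1 for a, p in pairs if a == 0 and p == 1),
        "fn": sum(1 for a, p in pairs if a == 1 and p == 0),
        "tn": sum(1 for a, p in pairs if a == 0 and p == 0),
    }


def precision_recall(counts):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = counts["tp"] / (counts["tp"] + counts["fp"])
    recall = counts["tp"] / (counts["tp"] + counts["fn"])
    return precision, recall


# Invented labels, for illustration only.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(actual, predicted)
precision, recall = precision_recall(counts)
print(counts)             # {'tp': 3, 'fp': 1, 'fn': 1, 'tn': 3}
print(precision, recall)  # 0.75 0.75
```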
And finally, most of the steps below happen behind the scenes during the three-step model creation.
Below are the supported algorithms and the roadmap as highlighted at Google Next’19.
AI Platform provides seamless creation of an end-to-end ML pipeline, from ingesting data to preparing, discovering, training, and deploying ML models. The images below summarize the AI Platform end-to-end process.
AI Platform comes with managed notebook instances that are integrated with BigQuery, Cloud Dataproc, and Cloud Dataflow, making it easy to go from data ingestion to pre-processing and exploration, and eventually to model training and deployment.
AI Platform supports Kubeflow, which lets you build portable ML pipelines that you can run on-premises or on Google Cloud without significant code changes. Below are the services available as part of AI Platform that help build an end-to-end machine learning pipeline.
You will also have access to AI technologies like TensorFlow and TensorFlow Extended (TFX) tools as you deploy your AI applications to production. If you want to know more about TFX, check my multi-part series on this topic.
Keep watching for future series of TFX...
There were a few other announcements in the AI space. I will give a quick rundown of the key ones.
AutoML Natural Language — Custom entity extraction lets you identify custom fields from the input text
Document understanding AI enables companies to digitize, classify, and extract knowledge. It also helps to organize and store knowledge graphs and other extracted data for easy search, query, consumption, and actionable insights
A nice representation of the Document Understanding AI solution architecture is below.
A few other products that received new enhancements or capabilities are highlighted below. You can check the references section below for more information on the newly added features.