Microsoft Steps Up Data Platform and AI Ambitions

Microsoft unveils big-data-capable SQL Server 2019 and extended AI capabilities to power data-driven innovation. Here’s my analysis.  

Microsoft CEO Satya Nadella set the tone at the September 24-27 Ignite event in Orlando by sharing at least half a dozen stories of leading companies innovating and pioneering new business models with the aid of artificial intelligence (AI). It was a crisp, one-hour presentation long on vision and surprisingly short on promotion or even mentions of the significant technology announcements that followed.

Nadella warned the more than 30,000 attendees that the ability to innovate and drive new business models is as much or more about changing corporate cultures and business processes as it is about applying technology. And when the technology decisions are ready to be made, Nadella counselled executives to know which capabilities are commodities and which warrant custom development to drive differentiation. Nadella said he sees smart companies liberating data silos and moving the bulk of enterprise workloads and innovation initiatives to “the intelligent cloud” and the “intelligent edge.”

From the perspective of my data-to-decisions research, the standout tech announcements that followed Nadella's keynote included the public preview release of Microsoft SQL Server 2019, a string of Auto ML and artificial intelligence (AI) enhancements, and new analytical capabilities for big and small data. Here’s a look at what’s available, what’s coming and how they'll stand up.

Introduced in public preview at Ignite, Microsoft SQL Server 2019 is an ambitious step forward from mere database to unified data platform. Those who wish to keep using SQL Server as a conventional database deployed in conventional ways will be able to do just that, but the release promises so much more. For starters, SQL Server 2019 is built to deploy and scale out using Kubernetes, so it’s ready to support whatever evolving, elastic hybrid deployment approach is required.

The next innovation is big data cluster software (see image below) that puts both SQL Server and Apache Spark query and processing engines on top of HDFS (Hadoop Distributed File System) data nodes. The architecture also supports high-scale, columnar data marts said by Microsoft execs to beat Impala and other database-on-HDFS options on query performance.

Microsoft’s PolyBase technology, previously used mostly to tap into HDFS, will be used by SQL Server 2019 to tap into external sources such as Oracle, Teradata, MongoDB, PostgreSQL and IoT sources without moving data. As shown below, the architecture brings together the worlds of SQL and big data on a single platform. A new Azure Data Studio user interface, also announced at Ignite, will support both SQL-based querying and a notebook-style experience for data scientists.

Microsoft SQL Server 2019 deploys on Kubernetes and includes a big data cluster architecture that unifies the worlds of relational and big data analysis.
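To make the idea of querying an external source "without moving data" more concrete, here is a loose analogy in Python using SQLite from the standard library. It is not PolyBase and does not use any SQL Server API; it only assumes that SQLite's ATTACH DATABASE is a fair small-scale stand-in for joining local tables against a table that lives in a separate store. The table names, file name and sample rows (Contoso, Fabrikam) are hypothetical.

```python
import os
import sqlite3
import tempfile


def build_external_db(path):
    """Create a stand-in 'external' database file holding a customers table."""
    ext = sqlite3.connect(path)
    ext.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    ext.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "Contoso"), (2, "Fabrikam")],  # hypothetical sample data
    )
    ext.commit()
    ext.close()


def query_across_sources(ext_path):
    """Join a local orders table against the external customers table in place."""
    local = sqlite3.connect(":memory:")
    # Attach the external file first; its rows are read where they live,
    # they are never copied into the local database.
    local.execute("ATTACH DATABASE ? AS ext", (ext_path,))
    local.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    local.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 120.0), (1, 30.0), (2, 75.5)],  # hypothetical sample data
    )
    local.commit()
    rows = local.execute(
        "SELECT c.name, SUM(o.amount) "
        "FROM orders AS o "
        "JOIN ext.customers AS c ON c.id = o.customer_id "
        "GROUP BY c.name ORDER BY c.name"
    ).fetchall()
    local.close()
    return rows


with tempfile.TemporaryDirectory() as tmp:
    ext_path = os.path.join(tmp, "external.db")
    build_external_db(ext_path)
    totals = query_across_sources(ext_path)
    print(totals)
```

The point of the sketch is only the shape of the query: one SQL statement spans two stores, and the caller never stages a copy of the external table. Real data virtualization adds connectors, pushdown and security concerns that this toy ignores.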

Data scientists and progressive data analysts and data power users have been blending structured, variably structured and unstructured data for years. In SQL Server 2019, all data, including highly structured data as well as raw log files, social streams, JSON data from mobile apps, web clickstreams and more can reside on one platform with elastic and scalable deployment capabilities and myriad data analysis possibilities. As noted, Microsoft SQL Server 2019 will still be deployable and usable as a conventional database. The heart of the database management system is all in the head node. But Microsoft expects visionary companies to also deploy the SQL Server 2019 big data cluster, included software that won’t be a separate SKU or extra-cost option.

MyPOV on Microsoft SQL Server 2019. This is a compelling unification of the SQL and big data worlds, but as a practical matter, many if not most enterprises have already re-platformed with some combination of Hadoop and Spark capacity. Many if not most also have a mix of cloud-based capacity including object storage. Finally, Microsoft’s rivals ranging from Oracle, Teradata and IBM to Cloudera, Hortonworks and MapR have also been working on blending, streamlining and cloud-enabling relational and big data capabilities.

I’m sure Microsoft SQL Server 2019 will keep current SQL Server customers on the upgrade path, but the Big Data Cluster capabilities are likely to get a mixed response. The unification will be very appealing in greenfield situations, but where there are legacy deployments, existing workloads and employee familiarity to consider, Microsoft will have to come up with very practical and cost-competitive migration options that avoid rip and replace. One Microsoft exec suggested existing HDFS instances could be brought under Microsoft management without having to move or recreate the clusters, but I suspect it won’t be that simple.

In short, I see SQL Server 2019 as driving greenfield, day-forward or gradual migrations to its unified platform, not immediate replacements. That’s fine. The platform won’t be generally available until next year and would-be customers will need time to plan their future data platform strategies.

AI figured in many of the customer innovation stories that Nadella shared, whether it was BMW’s Azure-powered Intelligent Personal Assistant, Buhler adding machine vision to its grain-processing equipment to spot and extract foreign matter before it enters the food supply, or real estate services firm CBRE using Azure IoT to optimize energy usage and Azure Digital Twins to simulate and better manage office space. Microsoft made announcements on all five levels of what it describes as its AI stack, shown below.

Microsoft is investing in machine learning and AI at five levels.

Cognitive Services: At the top are Microsoft’s Azure Cognitive Services, where the company announced the general availability of Speech services including speech-to-text, text-to-speech and translation. It also delivered the ability to customize models by training against your own industry- or company-specific language and terms. MyPOV: These are the sort of services that Nadella alluded to as commoditized. Indeed, AWS, Microsoft and Google have all delivered vision, speech, language and translation services and the ability to do custom training, but I’d say Google has performance advantages on many of these core services.

Frameworks and Languages: Microsoft is beefing up support for Python and extending its work with Facebook, Nvidia and other partners on ONNX (the Open Neural Network Exchange) to encourage framework diversity and usability. MyPOV: It was interesting to see Google add support for Scikit-Learn and XGBoost at this year’s Google Next event, which was a win for framework diversity beyond TensorFlow. I like the objectives of ONNX and laud Microsoft for promoting openness, modeling efficiency and choice.

Services: At this level the big news was the introduction of automated machine learning and hyperparameter tuning. These are must-have features if data science is to be made accessible to developers, power-user analysts and aspiring data scientists. Azure ML also gained a Python SDK, another acknowledgement of the rise of this language. MyPOV: Microsoft is in a close horse race with AWS and Google, both of which have introduced similar automation and hyperparameter tuning features.
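For readers less familiar with what automated hyperparameter tuning actually does, here is a minimal sketch in plain Python. It is not the Azure ML SDK and makes no claim about how Microsoft, AWS or Google implement the feature; it assumes a toy "model" (gradient descent on a one-variable quadratic) and uses random search, one of the simplest tuning strategies, to pick a learning rate. All function names and the search range are hypothetical.

```python
import random


def validation_error(learning_rate, steps=50):
    """Toy stand-in for training a model and scoring it.

    Runs gradient descent on f(w) = (w - 3)^2 from w = 0 and returns
    the squared distance from the optimum after a fixed number of steps.
    """
    w = 0.0
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)
        w -= learning_rate * grad
    return (w - 3.0) ** 2


def random_search(n_trials=30, seed=0):
    """Try random learning rates and keep the one with the lowest error."""
    rng = random.Random(seed)  # fixed seed so repeated runs match
    best_lr, best_err = None, float("inf")
    for _ in range(n_trials):
        # Sample log-uniformly between 1e-4 and 1 (an assumed search range).
        lr = 10 ** rng.uniform(-4, 0)
        err = validation_error(lr)
        if err < best_err:
            best_lr, best_err = lr, err
    return best_lr, best_err


best_lr, best_err = random_search()
print(f"best learning rate: {best_lr:.4f}, validation error: {best_err:.3e}")
```

Commercial services layer smarter search strategies, early stopping and parallel execution on top of this loop, but the core idea is the same: the service, rather than the data scientist, runs the trials and keeps the best configuration.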

Infrastructure: Microsoft offers GPU capacity, but it’s alone among its cloud rivals in heavily pushing FPGAs (field-programmable gate arrays) for accelerated inferencing. Here it added support for five more deep learning algorithms (beyond the few previously supported) for image classification and recognition. MyPOV: Microsoft is touting cost and power-consumption advantages over GPUs in the inferencing role, but the evidence I’ve seen is in whitepapers and lab benchmarks. I’m waiting to see more real-world adoption and production success stories that prove these claimed advantages.

Deployment: At the deployment level, Microsoft has a particularly strong strategy around IoT and Edge deployment, and it’s clearly gaining adoption. Walmart’s CIO, Clay Johnson, shared details at Ignite on the retailer’s massive IoT deployment aimed at optimizing energy usage across its 10,000-plus stores worldwide. MyPOV: Azure Sphere was a brilliant move to get Azure security and services built right into the microcontrollers that power sensors and devices. Azure IoT Edge is a comprehensive environment that supports deployment of models and gathering of intelligence and predictions from the edge. There are plenty of real-world customer success stories.

Other interesting data-related announcements from Ignite included the coming integration of Azure Cognitive Services and Automated ML into Microsoft Power BI, which fits with my research and predictions early this year on “How ML and AI Will Change BI and Analytics.” The company also introduced Azure Data Explorer, a powerful ad hoc discovery and visual analysis cloud service for big data discovery. Data Explorer looks like a handy, scalable service offering fast, big-data querying through a SQL-like intuitive language that can handle structured and variably structured data including log files, clickstreams and JSON data.

Microsoft also made a splashy Open Data Initiative announcement with the CEOs of SAP and Adobe joining Nadella on stage. Some cynically read this as an effort to steal thunder from Salesforce’s Dreamforce event, which followed Ignite by just one day. I’ll take partners Microsoft, SAP and Adobe at their word that they’re trying to help customers by developing and promoting open data-sharing standards and data schemas that could be used by other software providers (including, say, Salesforce and Oracle). It’s very early days, however, as executives acknowledged, so we’ll have to see if this bears fruit. We’ll know whether it’s real if we hear about it again next year or three years down the road.

Related Reading:
Google Next: A Deeper Dive on AI and Machine Learning Advances
Cloudera Transitions, Doubles Down on Data Science, Analytics and Cloud
Amazon Web Services Adds Yet More Data and ML Services, But When is Enough Enough?
