
Reshaping Business With Artificial Intelligence


Perhaps the most telling difference among the four maturity clusters is in their understanding of the critical interdependence between data and AI algorithms. Compared to Passives, Pioneers are 12 times more likely to understand the process for training algorithms, 10 times more likely to understand the development costs of AI-based products and services, and 8 times more likely to understand the data that’s needed for training AI algorithms. (See Figure 8.)

Most organizations represented in the survey have little understanding of the need to train AI algorithms on their own data so that the algorithms can recognize the sort of problem patterns that Airbus's AI application revealed. Fewer than half of respondents said their organization understands the processes required to train algorithms or the data needs of algorithms.

Generating business value from AI is directly connected to effective training of AI algorithms. Many current AI applications start with one or more “naked” algorithms that become intelligent only upon being trained (predominantly on company-specific data). Successful training depends on having well-developed information systems that can pull together relevant training data. Many Pioneers already have robust data and analytics infrastructures along with a broad understanding of what it takes to develop the data for training AI algorithms. Investigators and Experimenters, by contrast, struggle because they have little analytics expertise and keep their data largely in silos, where it is difficult to integrate. While over half of Pioneer organizations invest significantly in data and training, organizations from the other maturity clusters invest substantially less. For example, only one-quarter of Investigators have made significant investments in AI technology, the data required to train AI algorithms, and processes to support that training.

Our research revealed several data-related misconceptions. One misunderstanding is that sophisticated AI algorithms alone, without sufficient data, can deliver valuable business solutions. Jacob Spoelstra, director of data science at Microsoft, observes:

No amount of algorithmic sophistication will overcome a lack of data. This is particularly relevant as organizations work to use AI to advance the frontiers of their performance.

Some forms of data scarcity go unrecognized: Positive results alone may not be enough for training AI. Citrine Informatics, a materials-aware AI platform helping to accelerate product development, uses data from both published experiments (which are biased toward successful experiments) and unpublished experiments (which include failed experiments) through a large network of relationships with research institutions. “Negative data is almost never published, but the corpus of negative results is critical for building an unbiased database,” says Bryce Meredig, Citrine’s cofounder and chief science officer. This approach has allowed Citrine to cut R&D time in half for specific applications. W.L. Gore & Associates, Inc., developer of Gore-Tex waterproof fabric, similarly records both successful and unsuccessful results in its push to innovate; knowing what does not work helps it to know where to explore next.3
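The bias Meredig describes can be sketched with a toy calculation (the numbers below are hypothetical, not Citrine's data): a model that learns only from published, success-heavy results sees a very different base rate than one trained on the full corpus of outcomes.

```python
# Hypothetical illustration of publication bias in training data.
# Suppose a lab ran 100 experiments, of which 20 succeeded. Published
# reports skew heavily toward successes, so a model trained only on
# published results learns a badly inflated success rate.

experiments = [True] * 20 + [False] * 80                # full corpus: 20% success
published = [r for r in experiments if r][:18] + [False] * 2  # mostly successes

def success_rate(results):
    """Fraction of experiments that succeeded."""
    return sum(results) / len(results)

print(success_rate(experiments))  # unbiased estimate from the full corpus: 0.2
print(success_rate(published))    # estimate from published data alone: 0.9
```

The gap between the two estimates is exactly the distortion that recording negative results, as Citrine and Gore do, is meant to remove.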

Sophisticated algorithms can sometimes overcome limited data if that data is of high quality, but bad data is simply paralyzing. Data collection and preparation are typically the most time-consuming activities in developing an AI-based application, much more so than selecting and tuning a model. As Airbus's Evans says:

Pioneer organizations understand the value of their data infrastructure to fuel AI algorithms.

Additionally, companies sometimes erroneously believe that they already have access to the data they need to exploit AI. Data ownership is a vexing problem for managers across all industries. Some data is proprietary, and the organizations that own it may have little incentive to make it available to others. Other data is fragmented across sources, requiring consolidation and agreements with multiple organizations to assemble more complete information for training AI systems. In still other cases, ownership of important data may be uncertain or contested. Getting business value from AI may be theoretically possible but pragmatically difficult.

Even if the organization owns the data it needs, fragmentation across multiple systems can hinder the process of training AI algorithms. Agus Sudjianto, executive vice president of corporate model risk at Wells Fargo & Co., puts it this way:

The need to train AI algorithms with appropriate data has wide-ranging implications for the traditional make-versus-buy decision that companies typically face with new technology investments. Generating value from AI is more complex than simply making or buying AI for a business process. Training AI algorithms involves a variety of skills, including understanding how to build algorithms, how to collect and integrate the relevant data for training purposes, and how to supervise the training of the algorithm. “We have to bring in people from different disciplines. And then, of course, we need the machine learning and AI people,” says Sudjianto. “Somebody who can lead that type of team holistically is very important.”

Pioneers rely heavily on developing internal skills through training or hiring. Organizations with less experience and understanding of AI put more emphasis on gaining access to outsourced AI-related skills, but outsourcing brings problems of its own. (See Figure 9.)

The chief information officer of a large pharma company describes the products and services that AI vendors provide as “very young children.” The AI tech suppliers “require us to give them tons of information to allow them to learn,” he says, reflecting his frustration. “The amount of effort it takes to get the AI-based service to age 17, or 18, or 21 does not appear worth it yet. We believe the juice is not worth the squeeze.”

To be sure, for some support functions, such as IT management and payroll support, companies might choose to outsource the entire process (and pass along all of their data). Even if companies expect to rely largely on external support, they need their own people who know how to structure the problem, handle the data, and stay aware of evolving opportunities. “Five years ago, we would have leveraged labor arbitrage arrangements with large outsourcers to access lower cost human labor to do that work,” the pharma company CIO says. “What the vendors have done in the meantime is begin to automate those processes, oftentimes on our systems using our infrastructure, but using their technology. And I would not want it to be characterized as just rule-based. They actually have quite a bit more sophisticated heuristics to automate those things.” But such an approach is clearly not suited for companies’ distinctive offerings or core processes.

Eric Horvitz, director of Microsoft Research, argues that the tech sector is quickly catching up with the new model of offering technology tools to use with proprietary data, or “providing industry with toolsets, computation, and storage that helps to democratize AI.” Many AI algorithms and tools are already in the public domain, including Google’s TensorFlow, open-source code shared on GitHub, and application programming interfaces from tech vendors. According to Horvitz:

The data and the algorithms constituting AI cannot simply be accurate and high performing; they also need to satisfy privacy concerns and meet regulatory requirements. Yet only half the respondents in our survey agree that their industries have established data privacy rules.

Ensuring data privacy depends on having strong data governance practices. Pioneers (73%) are far more likely to have good data governance practices than the Experimenters (34%) and Passives (30%). (See Figure 10.) This large gap represents another barrier for companies that are behind in developing their AI capabilities.

Data issues can be especially pronounced in heavily regulated industries such as insurance, which is shifting from a historic model based on risk pooling toward an approach that incorporates elements that predict specific risks. Some attributes are off limits, however: while factors such as sex and religion could be used to predict some risks, regulators deem them unacceptable in some applications and jurisdictions.

Regulators in other financial markets also have stringent transparency requirements. As Wells Fargo’s Sudjianto says: “Models have to be very, very transparent and checked by the regulators all the time. When we choose not to use machine learning as the final model, it’s because regulatory requirements oftentimes demand solutions be less ‘black box’ and something the regulator can see very clearly. But we use machine learning algorithms to assess the model’s non-linear construction, variables and features entered, and as a benchmark for how well the traditional model performs.”
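The pattern Sudjianto describes, keeping a transparent model as the model of record while using machine learning as a benchmark, can be sketched in miniature. Everything below is illustrative (synthetic data and made-up rules, not Wells Fargo's actual models): a simple linear rule a regulator can inspect is scored against a nonlinear stand-in that captures an interaction the linear rule misses, and the accuracy gap quantifies what the transparent model gives up.

```python
# Illustrative sketch: transparent model vs. machine-learning benchmark.
# Data and rules are hypothetical, for demonstration only.
import random

random.seed(0)

# Synthetic applicants: (income_score, debt_score) in [0, 1]. True default
# risk is nonlinear -- high debt only matters when income is low.
data = [(random.random(), random.random()) for _ in range(1000)]
labels = [1 if (debt > 0.5 and income < 0.5) else 0 for income, debt in data]

def transparent_model(income, debt):
    # Interpretable linear scorecard a regulator can read directly.
    return 1 if (0.6 * debt - 0.5 * income) > 0.1 else 0

def benchmark_model(income, debt):
    # Stand-in for an ML benchmark that captures the debt/income interaction.
    return 1 if (debt > 0.5 and income < 0.5) else 0

def accuracy(model):
    """Fraction of applicants the model classifies correctly."""
    return sum(model(i, d) == y for (i, d), y in zip(data, labels)) / len(data)

print(accuracy(transparent_model))  # lower: the linear rule misses the interaction
print(accuracy(benchmark_model))    # higher: the gap shows what transparency costs
```

In practice the benchmark would be a trained model rather than a hand-written rule, but the comparison logic is the same: the gap tells the modeling team how much predictive power the regulator-friendly model leaves on the table.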

As technology races ahead of consumer expectations and preferences, companies and the public sector walk an increasingly fine line among their AI initiatives, privacy protections, and customer service. Some financial services providers are using voice-recognition technology to identify customers on the phone and save time verifying identity. Customers welcome rather than balk at this experience, in part because they value the service and trust the company not to misuse the capability or the data that enables it. Likewise, a technology vendor offers an AI-based service that helps call center operators recognize when customers are getting frustrated, using real-time sentiment analysis of voice data. Less welcome applications may be on the horizon, however. In a few years, any of the 170 million installed cameras in China or the 50 million cameras in the U.S. will be able to recognize faces. In fact, jaywalkers in Shanghai can already be fined (or shamed) based on such images.4
