Google enters the contract extraction space!
On 10 April 2019 Google announced a beta version of a new product (or toolkit) called “Document Understanding AI” (“DUAI”).
Although details are limited, it seems like a play for the enterprise contract review, extraction, analytics and automation space, including for legal.
In other words, the busy space currently occupied by Kira Systems, Seal, iManage Extract, Eigen Technologies, Luminance, eBrevia and similar.
From the official press release and the marketing copy, DUAI sounds very much like incumbent point solutions in this space, including those named above.
How might it be similar to incumbent extraction providers?
For now, it’s all in the ad copy (so take it with a pinch of salt), which is all there is to go on until trial access opens up.
The most interesting (or perhaps least surprising) fact is that the copy uses more or less identical language to that used by incumbents in this space.
For example, DUAI copy includes:
- “Unlock the knowledge and insights hiding in your documents: Document Understanding AI uses machine learning on a scalable cloud-based platform to help your organization efficiently analyze documents. By automatically classifying, extracting, and enriching this information, Document Understanding AI can unlock insights and improve decision-making.”
- “Improve accuracy, governance and compliance: Lots of companies with large amounts of legacy documents go digital by having people manually enter the data, which often is a recipe for errors and redundancies. By automating and validating document workflows and archiving documents from multiple content sources into one cloud-based system, Document Understanding AI reduces these risks and ensures compliance.”
- “Turns insights into better decisions: Document Understanding AI enables you to take advantage of the facts, insights, relationships, knowledge graph representations, and predictions in your unstructured documents. These newfound insights will empower your company to make more educated, critical business decisions and improve your bottom line by unlocking the power and value hidden in your documents.”
If you visit the incumbents’ product pages linked in the intro to this article, you’ll see much the same language. Spookily so in some cases.
Features
In terms of features, DUAI claims to offer the following:
- “Document and content management”, so possibly a threat to DMS / CMS providers?
- “Digital transaction management” for domains such as “contracts and real estate” — potentially a rival to legal A.I. tools in that space?
- “Clustering and classification and semantic question answering” — again, common to most legal A.I. tools.
- System integrations with existing solutions in finance, legal and healthcare for custom entity extraction and representation via knowledge graphing — again common to a lot of existing tools.
- Robotic Process Automation (“RPA”). Whether this is packaged together or is another integration via API is unclear. If packaged, that could be a point of differentiation vs. incumbents, which typically have to be stitched to a third-party RPA tool via APIs, usually through professional services rather than through connectors aimed at non-technical users.
- Automated invoicing and expense management. Not something the incumbents offer, but a logical interaction with adjacent uses and systems. May also suggest a corporate / financial enterprise focus rather than a law firm one, perhaps?
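To make “custom entity extraction and representation via knowledge graphing” a little more concrete, here is a minimal sketch of the general idea: pull a few entities out of a clause and store them as subject / predicate / object triples. Everything in it is our own assumption for illustration, including the regex patterns, the `clause_to_triples` helper, the predicate names and the sample clause. It says nothing about how DUAI (or any incumbent) actually works, and a real system would use trained models rather than regexes.

```python
import re

# Hypothetical, deliberately naive patterns (illustration only).
# AMOUNT: a currency symbol followed by digits, optional thousands
# separators and optional decimals.
AMOUNT = re.compile(r"([£$])\s?(\d+(?:,\d{3})*(?:\.\d+)?)")
PARTY = re.compile(r"\b(Tenant|Landlord)\b")


def clause_to_triples(doc_id, clause):
    """Turn one clause into (subject, predicate, object) triples."""
    triples = []
    # One triple per distinct party role mentioned in the clause.
    for party in sorted(set(PARTY.findall(clause))):
        triples.append((doc_id, "mentions_party", party))
    # One triple per monetary amount, normalised to (currency, number).
    for symbol, digits in AMOUNT.findall(clause):
        currency = "GBP" if symbol == "£" else "USD"
        amount = float(digits.replace(",", ""))
        triples.append((doc_id, "has_amount", (currency, amount)))
    return triples


# Made-up sample clause, not taken from any real lease.
clause = (
    "The Tenant shall pay the Landlord an annual rent of £25,000.00, "
    "payable quarterly in advance."
)
triples = clause_to_triples("lease-1", clause)
for triple in triples:
    print(triple)
```

Triples like these are one simple way to feed a graph store or a structured reporting tool; the hard part, which this sketch skips entirely, is extracting the entities reliably across messy real-world documents.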
Partners
Partners already include Iron Mountain, DocuSign, UiPath, Accenture, Egnyte, Box and Taulia. Of note is that all are players in the legaltech and / or enterprise content management / processing space.
Unknowns
Many. In the coming months hopefully the following become clearer:
- The UI / UX. These types of systems are still finding the right trade-off between usability and functionality, which is super hard when trying to abstract ML and other complex concepts into something resembling a Lego kit of components that can intuitively be combined, trained and understood with confidence by a non-technical user who is expert in the domain problem these systems try to solve. This is because training the system must be undertaken by an SME, e.g. a lawyer if the use case is contract extraction. Making that workflow clear, confident and capable is no easy task.
- Is this a purely self-service system or a consultancy-led one, i.e. will you be able to self-train data point extractions entirely using your own data and SMEs, or will it be necessary to hire external or Google developers to help build nice applications? We suspect a combination given the existing Google ML tools in the text space.
- Will there be any out-of-the-box (OOTB) pre-trained data point extractions? If so, what will they be, how many will there be, and how well will they perform? Also, how were they trained and by whom? Can they be “topped up” with user-specific training, e.g. to tailor an assignment clause extractor to work specifically for credit agreements?
- Will the system be easier or harder to use than incumbents? As noted, usability and confidence in such systems remain a blocker to adoption without the need for lots of user training and experimentation.
- Will the system be interpretable, i.e. transparent not only in how it works but also in why it reaches its outcomes? Interpretability of A.I. systems is needed in the long term and is not particularly well (or at all) addressed in most incumbent tools. Tied to this is whether it will be auditable, particularly if intended for use cases involving anything resembling a four eyes review process.
- Will it be faster and more accurate, in particular with regard to mining intra-clause data such as financial figures, e.g. finding the actual £ / $ rent number rather than just the rent clause? How does its precision, recall and F1 scoring fare, and to what extent will control regarding the capture, measurement and tracking of these metrics be available to advanced users? Again, this is usually absent or underdeveloped in incumbent products, partly because it’s hard to trade off simplicity vs. utility.
- How will data / model ownership be managed? This will be crucial, as many incumbents have had to tread carefully, neither doing nor suggesting that one client’s training of the system is reused, or reusable, by the vendor for its own benefit or for that of competing clients.
- Is this part of the wider Google Cloud platform play for enterprise (other Google marketing material and press coverage coming out of the Cloud Next conference suggests so)? We’d think so given Google’s increasing enterprise penetration more generally.
- Licensing options: what will these be? How quick, easy and cheap will it be to trial?
- Use case specific features, e.g. for LIBOR repapering, lease extraction and so on, that are already being explored by incumbents. Will these emerge via initial clients or via Google’s product team?
- The interoperability with RPA tools (UiPath is a noted partner) and adjacent technologies, e.g. DocuSign, also a partner. How will this hang together and how self-service vs. custom will this be?
- Will there be integrations with incumbents, either extraction tools or DMS and CMS systems? (Or is the plan to replace them?)
Anyway, all of the above remains speculation until more details come out in the wash.
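For readers less familiar with the precision, recall and F1 metrics mentioned in the list above, here is a minimal sketch of how they could be computed for an extraction task. The `precision_recall_f1` helper and the lease data are hypothetical and entirely our own. We assume the simplest possible scoring, where an extracted value counts as correct only if it exactly matches a hand-labelled “gold” value for the same document; real tools may score partial matches differently.

```python
def precision_recall_f1(predicted, expected):
    """Score extracted values against a hand-labelled gold set.

    precision = correct extractions / all extractions made
    recall    = correct extractions / all values that should be found
    F1        = harmonic mean of precision and recall
    """
    predicted, expected = set(predicted), set(expected)
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold labels: (document id, annual rent) pairs tagged by an SME.
gold = {
    ("lease-1", "£25,000"),
    ("lease-2", "$40,000"),
    ("lease-3", "£12,500"),
    ("lease-4", "£90,000"),
}

# Hypothetical system output: two correct, one garbled, one lease missed.
predicted = {
    ("lease-1", "£25,000"),
    ("lease-2", "$40,000"),
    ("lease-3", "£125,00"),
}

p, r, f1 = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Here two of the three extractions are right (precision of about 0.67) but only two of the four rents were found (recall of 0.50). That gap is exactly the kind of thing an advanced user would want to see and track over time.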
Impact
Hard to say as too early, but incredibly exciting.
Whilst we always take an anti-hype stance at lawtomated, it’s hard not to get excited. For 18 months or more, we’ve debated competitor analyses of this space and singled out the likelihood of Google, Amazon or Microsoft sidestepping into this logical domain and use case, given each one’s push into enterprise applications surrounding unstructured data and text.
Amazon already has Amazon Comprehend Medical for extracting information from medical text, for instance. Likewise, Amazon, Microsoft and Google have each for some time offered open source tooling for use in these extraction, classification and search challenges, but it now seems that a move to productising for common problems might be what comes next for the big boys. Indeed, some incumbent providers very likely use those open source technologies to different degrees.
Of the incumbent products, it’s hard to say whether their first-mover advantages will be a help or hindrance vs. the sheer scale and quality of Google’s technical team + leadership in A.I., including NLP applications. For one thing, Google’s original and biggest business — search — was a second mover play. Perhaps the same might apply here? But second mover doesn’t always win, even if you’re Google… who remembers / actually used Google+, the Google rival to Facebook?
As we’ve covered extensively at lawtomated, legal A.I. products for docs remain somewhat immature, often having some but not all features necessary to drive adoption at enterprise-wide scale, either within a law firm or within a legal ops function at a corporation or financial services organisation. It’s a shame because between the incumbents all the features are mostly there, just unevenly distributed between them.
That said, common to all such tools is the need for the right quality and quantity of data in the right place from the outset. Google’s DUAI will be on a level playing field vs. incumbents in that regard, as it’s wholly dependent on the buyer’s data and information systems, which are presupposed to be sub-optimal given DUAI, like other tools, seems positioned to solve that challenge.
Perhaps Google’s scale and second mover learning will allow them to knock out the missing features and combine them into a winning formula? Or perhaps Google will just buy an incumbent? That would be a huge story and perhaps an interesting extension of the huge growth in legaltech funding and the Law Society’s 2019 finding that the market is ripe for greater consolidation.
We’ll have to wait and see.
“What ifs” aside, a limiting factor for law firms will be that this is a cloud product. Whilst firms and their clients are becoming more comfortable with the cloud, cloud tools remain a tricky sell. Perhaps the Google stamp of approval might, in the longer term, give Google easier headway vs. smaller providers relying on Google Cloud, AWS or Azure services, given that Google is itself the source of one of these cloud platforms.
That said, there is a huge (and potentially bigger) need for this type of solution in corporate legal for large organisations and banks, given the sheer volume of manual lift and shift of unstructured data from contracts into one or more reporting tools set up to capture only structured data. For organisations of that ilk already using Google Cloud tools, perhaps this is where Google intends to play first. The copy suggests as much, including this line:
“Lots of companies with large amounts of legacy documents go digital by having people manually enter the data, which often is a recipe for errors and redundancies. By automating and validating document workflows and archiving documents from multiple content sources into one cloud-based system, Document Understanding AI reduces these risks and ensures compliance.”
Equally, however, Google is not afraid to ditch products that don’t stick. There’s a large graveyard of products launched to fanfare only to be deprecated months or a few years thereafter.
All that said, we’re excited but watchful as to where this will go and what it will mean for law firms, legal users and incumbents in an already busy space.
As always, a solution, DUAI included, is only as good as an identified use case mapped to an underlying set of information about the people, process and problem to be solved.
But as these types of challenges are becoming increasingly defined (through organisations working with incumbent players) it may prove easier for Google to be targeted in its penetration, understanding of the customer and in turn its ability to tightly map this product to winning use cases that scale.
Watch this space!
Update: see also our follow-up article analysing the Google Cloud Next ’19 showcase demo of DUAI, including solution architecture, screenshots and detailed use cases.
Originally published at lawtomated.