A Closer Look: Using Azure AI Cognitive Service for Language to Create a 'Golden Record' of Customers

CRAIG PILKENTON

Vice President, Technical Delivery

 

While helping clients implement optimized Extract, Transform, and Load (ETL) pipelines, our teams are often tasked with merging many disparate sources of customer data along with more common business information records.  These initiatives usually involve working through separate customer records from different systems within a company, or from external systems, that then need to be merged together to create a single version of the truth.  The differences between records can be as simple as a missing character in a name field, or as involved as nickname variants of a first name that must be evaluated for similarity.

 

The result of consolidating all accounts of a customer down to just one core item is called a golden record.  A golden customer record refers to a 'single source of truth' or a 'single customer view': one unified, trusted version of that user, capturing all the information needed to know them.  Without this time-consuming process, the result for a company can be a series of disjointed interactions or services that negatively affect its reputation in the eyes of a customer.

 

During these ETL projects, the quickest and most cost-effective way to achieve this consolidation is to insert a managed data-integration layer to hold and process the multiple records.  The additional effort is worthwhile because creating a golden record has many benefits for the relationship with the customer, ranging from improved customer retention and sales up to better business decision-making from having a full view of all the unique customers the company serves.

 

On a recent ETL initiative that merged multiple client rows into a golden record per customer, the first-name field of the incoming data was found to contain dozens of first-name variations per email address, due to the different customer systems feeding the data aggregation.  Some of the records analyzed contained common variations such as "Bob == Robert" or "Jan == Janet" that are simple to match, but hundreds more contained variants from other countries, such as "Lexi == Alexandra".  This large number of nickname options was pushing many non-matches into a research queue that team members had to review manually.

 

To solve for these fuzzy-matching variations, the team tested several word-matching libraries to see if their capabilities could reduce the high number of rejections.  After several days of mixed results with the library approach, it was suggested that Machine Learning (ML) algorithms could analyze the large set of nicknames and identify first-name patterns not currently found with libraries.  This would allow the team to utilize the cloud's economies of scale to minimize the non-matches that would have to be reviewed by humans.
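To see why general-purpose string matching struggles here, consider a quick sketch using Python's standard-library difflib (an illustrative stand-in; the article does not name the specific libraries the team tried).  Character-level similarity scores real nickname pairs well below any reasonable match threshold:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Character-level similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Real nickname pairs from the data share few characters with the
# primary name, so edit-distance style scores stay far below a
# typical 0.7 match threshold and the records get queued for review.
for nick, name in [("Bob", "Robert"), ("Bobby", "Robert"), ("Lexi", "Alexandra")]:
    print(f"{nick} vs {name}: {similarity(nick, name):.2f}")
```

Nickname equivalence is a learned, many-to-one relationship rather than a spelling distance, which is what pointed the team toward an ML-based matcher.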

 

To support this option, the team tested an Azure cloud-based conversational Artificial Intelligence (AI) service called Conversational Language Understanding (CLU) to extract useful information from the data being analyzed.  These Natural Language Processing (NLP) solutions are primarily focused on chat-bot conversations, where a user utterance such as 'talk to customer service' is matched to a routable Intent such as 'Support' for sending to a call-center agent.  But with the power of Machine Learning (ML), their many-to-one matching capability can be applied to any key-value sets of words.

 

The team started testing this initiative in a Proof of Concept (PoC) Tenant by collating common nickname values from several Internet sources, creating Intents in CLU's Language Studio around each primary name such as "Robert", then adding in the utterances, or nicknames, that should match.  In this case the utterances were nicknames to be matched to the primary name, such as "Bob, Bobby, Rob, Robbie, Robby, Robin, Bert", along with the core utterance "Robert" itself.
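Conceptually, the Intent-to-utterance setup is a many-to-one mapping like the one sketched below (a minimal illustration; the real project is authored in Language Studio, and the nickname lists for "Janet" and "Alexandra" here are examples rather than the team's full training data).  The sketch also shows why a plain dictionary is not enough on its own:

```python
# Each Intent (the primary/golden name) maps to the utterances (nicknames)
# labeled as matching it. CLU trains a model on these examples, so it can
# also score unseen variants instead of relying on verbatim lookups.
intents = {
    "Robert": ["Robert", "Bob", "Bobby", "Rob", "Robbie", "Robby", "Robin", "Bert"],
    "Janet": ["Janet", "Jan"],
    "Alexandra": ["Alexandra", "Lexi"],
}

def exact_lookup(nickname):
    """A dictionary only resolves nicknames it has seen verbatim."""
    for primary, nicknames in intents.items():
        if nickname in nicknames:
            return primary
    return None  # e.g. "Bobbie" misses, though a trained model could score it
```

The trained CLU model generalizes beyond these labeled utterances, which is what the exact-lookup approach cannot do.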

After inputting these utterance nicknames into CLU's Language Studio and then training and deploying a model, the team began testing.  This involved first using Language Studio's testing mode to validate the Intent-to-utterance matching, then calling the CLU API from within the core application, passing in a particular nickname and evaluating that the correct Intent came back with a high enough Confidence Score, which in this case was targeted at greater than or equal to 70%, as opposed to a more common 90% or higher.  A record matched at that score must also match on one or more combinations of other fields (last name + email, or address + phone).  The lower confidence threshold allowed the team to match nicknames that were less common but still possible.
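A call to the deployed model can be sketched as building a request body in the shape of the CLU analyze-conversations REST API, with the incoming nickname as the text to classify.  The project name and deployment name below are placeholders, not the client's actual values:

```python
def build_clu_request(nickname: str,
                      project_name: str = "golden-record-names",  # placeholder
                      deployment_name: str = "production") -> dict:
    """Build the JSON body for a CLU analyze-conversations call,
    treating the incoming nickname as the utterance to classify."""
    return {
        "kind": "Conversation",
        "analysisInput": {
            "conversationItem": {
                "id": "1",
                "participantId": "1",
                "text": nickname,
            }
        },
        "parameters": {
            "projectName": project_name,
            "deploymentName": deployment_name,
        },
    }

# This body would be POSTed to the Language resource's runtime endpoint
# (…/language/:analyze-conversations?api-version=…) with the resource key
# supplied in the Ocp-Apim-Subscription-Key header.
```

The nightly load can then issue one such call per unresolved first-name value.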

When the service is called, it returns an evaluation in which the Confidence Score indicates how confident the model is that the top Intent is the right match for the given input, along with any alternative Intents and their own lower Confidence Scores.  The team then used these matches to determine whether the Intent-to-utterance setup fit the expected input pattern, or whether more words were needed to re-train the model.
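Putting the two checks together, the evaluation logic can be sketched like this.  The response shape follows the CLU prediction format (a topIntent plus a scored intents list), and the 0.70 threshold and secondary field combinations follow the rules described above; the sample records and field names are illustrative:

```python
CONFIDENCE_THRESHOLD = 0.70  # the team's target, vs. a more common 0.90

def is_golden_match(clu_result: dict, incoming: dict, existing: dict) -> bool:
    """Accept a nickname match only when the top Intent clears the
    confidence threshold AND a secondary field combination also matches."""
    prediction = clu_result["result"]["prediction"]
    top = prediction["topIntent"]
    score = next(i["confidenceScore"] for i in prediction["intents"]
                 if i["category"] == top)
    if score < CONFIDENCE_THRESHOLD:
        return False

    def same(field):
        # Missing fields never count as a match.
        return incoming.get(field) is not None and incoming.get(field) == existing.get(field)

    # Secondary corroboration: last name + email, or address + phone.
    return (same("last_name") and same("email")) or (same("address") and same("phone"))

# Sample response shaped like a CLU prediction (values are illustrative):
sample = {"result": {"prediction": {
    "topIntent": "Robert",
    "intents": [{"category": "Robert", "confidenceScore": 0.83},
                {"category": "Robin", "confidenceScore": 0.12}]}}}

record_in = {"first_name": "Bobby", "last_name": "Smith", "email": "bob@example.com"}
record_db = {"first_name": "Robert", "last_name": "Smith", "email": "bob@example.com"}
```

Records that fail either check fall back to the manual research queue instead of being merged automatically.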

The final step was to run the solution through several variation tests to verify acceptability within the larger system.  These tests were designed to work through the main use cases and to solve for the core many-to-one matching need of creating a golden record.  After the first two tests passed their repeated runs, the third test integrated the Azure CLU endpoint into the codebase to send it real-time queries.

  1. Validate consistent matching repeatability of targeted full names / Intents

  2. Review the Confidence Scores returned by the model to ensure the percentages did not vary depending on the nicknames / utterances passed in

  3. Ensure the Azure Conversational Language Understanding service could handle the number of API matching requests at scale with low latency 

After testing the multiple scenarios to ensure this solution would work at a repeatable, scalable level in a testing Azure instance, the team walked the client through deploying an import project file into their Development and Production Tenants, using a project export to ensure consistency and to validate that the testing results matched.  With this approach the team was able to easily swap the API connection string from the PoC endpoint to the Development one and validate that the same results were returned.

 

The cloud-based Azure service approach gave the team the on-demand benefits of ML/AI at a very low price. Since the team did not need to create and host their own model, they achieved a very cost-effective solution, with current pricing starting at $1 per month for 0.5 million API calls. In the solution provided, they simply called the Azure CLU API on demand for new records found during the nightly load. The service is projected to add a mere 0.1% to the client's monthly Azure analytics spend, costing $10-15 per month based on their data volume.

 

While many other efforts contributed to the team's success in creating golden records of the client's customers, this targeted usage within the bigger solution allowed the project to take advantage of AI and Machine Learning at scale to solve a difficult problem.  This focus on using the right cloud tool for the right need shows how these kinds of services can help turn blocking issues into opportunities.
