Project snapshot
The amount of data created worldwide exceeds 400 million terabytes a day, yet few companies know what their data is truly capable of. Our client, a global provider of identity verification, wanted to change that. In preparation for advanced analytics and artificial intelligence, they engaged Intellias for a data capability analysis. The client’s IT managers wanted to know whether they could support AI-driven analytics in production and what they would need to do to raise their level of data maturity.
Although many companies find that they need to modernize systems and restructure data, that was not the case with our client. They had already created a central platform that combined dashboards with logic for generating advanced analytics. This data analytics platform became a stepping stone from data immaturity to a data-driven organization.
However, during our investigation, we found that data ingestion and access control were inconsistent across services. We reorganized and modernized the system architecture, standardized the data schema, and provided a clean environment for AI to ingest and process the client’s data. The solution also involved applying governance, such as role-based access control, to the client’s existing data analytics platform. Finally, we demonstrated the success of the new data structure by solving one of the client’s business challenges: using AI to automate performance recommendations for background check configurations.
Business challenge
The client offered background check and verification services through three products, one each for EMEA, the Americas, and APAC. Each product had its own systems, rule engines, and data ingestion methods, as well as a different UI for the customer experience. The client’s customers could configure their own verification flows, which often meant checking multiple documents across different identity validation layers and setting the tolerance for fraud scoring.
The client’s customer-facing personnel needed a way to understand which settings were working and which were not. Internal analysts could review outcomes on a case-by-case basis, but there was no scalable way to benchmark performance or recommend improvements based on shared analysis.
Challenges included:
- Version control gap: Client configuration histories were not tracked or versioned
- Inconsistent metrics: Pass rate outcomes were stored in different formats and calculated differently across systems
- Partial ingestion: Data pipelines fed into the data analytics platform, but only partially and without enforced schemas
- Lack of tracking: There was no automated way to identify which verification rules had the greatest effect on whether a background check succeeded or failed
- Manual reporting: Instead of standard reporting methods, personnel wrote their own scripts, developed ad hoc dashboards, and created static reports to share with customers
- Disconnected AI: AI models existed, but they were exploratory, disconnected from the client’s operations, and unavailable to customers
Although the client had positioned themselves to benefit from advanced analytics and AI assistance, they needed clarity on what it would take to support those use cases consistently and securely across their platforms.
Solution
We led a structured data capability analysis of the client’s entire data ecosystem, including a detailed review of system architecture, data structures, and operational workflows. The work was done in four phases:
- Phase 1: Assess the client’s current data ingestion, data governance, reporting layers, and AI maturity across all ID verification platforms
- Phase 2: Identify strengths, such as existing use of the data analytics platform and initial ingestion pipelines, and weaknesses, such as schema inconsistencies, limited access control, and lack of audit trails
- Phase 3: Provide clear, actionable recommendations to improve the client’s level of data maturity:
- Standardize data schemas across platforms to ensure compatibility and support analytics
- Apply schema validation during ingestion to prevent structural errors and missing fields (a validation sketch follows this list)
- Implement role-based access controls and a centralized permissions framework
- Track client configuration histories and store versioned records to support change tracking and performance benchmarking
- Add explainability, training traceability, and model audit logs for operational AI
- Phase 4: Apply our recommendations to demonstrate how they solve one of the client’s real-world business problems: generating AI-based configuration suggestions to improve background check success rates
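To illustrate the schema validation recommendation above, here is a minimal sketch of ingestion-time validation. It assumes pydantic as the validation library; the VerificationEvent fields and the ingest helper are illustrative placeholders, not the client’s actual schema.

```python
from datetime import datetime
from typing import Optional
from pydantic import BaseModel, ValidationError

class VerificationEvent(BaseModel):
    # Illustrative fields; the client's real schema is not public
    event_id: str
    client_id: str
    rule_id: str
    passed: bool
    created_at: datetime

def ingest(record: dict) -> Optional[VerificationEvent]:
    """Reject structurally invalid or incomplete records before they
    reach the analytics platform."""
    try:
        return VerificationEvent(**record)
    except ValidationError as err:
        # In production this might route to a dead-letter queue; here we log
        print(f"rejected record: {err}")
        return None

# Example: a record missing `passed` fails validation and is rejected
ingest({"event_id": "e1", "client_id": "c42", "rule_id": "rule_x",
        "created_at": "2024-01-01T00:00:00Z"})
```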
Reorganized data platform to demonstrate recommendations
The demonstration use case was applied to the EMEA platform. Using 90-day rolling datasets, we built a data pipeline that applied SHAP (SHapley Additive exPlanations) values to identify the contribution of each verification rule to overall pass rate success or failure. SHAP is a method for explaining the output of machine learning models by assigning feature importance based on each feature’s contribution to a specific model prediction.
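As a rough illustration of that rule-scoring step, the sketch below trains a gradient-boosted classifier on rule outcomes and scores each rule with SHAP. The file name, column names, and model choice are assumptions for illustration; the case study does not specify them.

```python
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingClassifier

# 90-day rolling window of verification outcomes (hypothetical file and schema)
df = pd.read_parquet("verifications_last_90d.parquet")
rule_cols = [c for c in df.columns if c.startswith("rule_")]  # one flag per rule

X, y = df[rule_cols], df["passed"]  # passed: 1 = background check succeeded

model = GradientBoostingClassifier().fit(X, y)

# TreeExplainer computes SHAP values efficiently for tree ensembles
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Mean absolute SHAP value per rule approximates its overall influence
# on pass rate outcomes
rule_influence = (
    pd.DataFrame(shap_values, columns=rule_cols)
    .abs().mean().sort_values(ascending=False)
)
print(rule_influence.head(10))
```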
The system produced:
- Customer segmentation: Customers were grouped based on company size, industry, risk appetite, and transaction volumes
- Rule scoring with SHAP: Each verification rule was scored for its influence on pass rate outcomes
- Natural-language recommendations: Peer-based comparisons were translated into plain, human-readable statements such as, “Customers in your segment have a 4.8% higher pass rate when Rule X is enabled” (see the sketch following this list)
- Impact projections: Each recommendation included a numerical estimate of its likely improvement
- Exportable reports: Recommendations were saved as downloadable spreadsheets for internal use or client preparation
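A minimal sketch of how a peer-segment comparison can be turned into one of these plain-language statements; the function name, rates, and thresholds are assumptions for illustration.

```python
from typing import Optional

def recommend(rule: str, client_rate: float, peer_rate_with_rule: float) -> Optional[str]:
    """Translate a peer-segment comparison into a human-readable recommendation."""
    lift = peer_rate_with_rule - client_rate
    if lift <= 0:
        return None  # only surface rules with a positive projected impact
    return (f"Customers in your segment have a {lift:.1%} higher pass rate "
            f"when {rule} is enabled")

print(recommend("Rule X", client_rate=0.871, peer_rate_with_rule=0.919))
# -> Customers in your segment have a 4.8% higher pass rate when Rule X is enabled
```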
The recommendations excluded certain rules flagged as “non-negotiable” because they were required for compliance. In addition, each customer’s risk and fraud profile was built into the segmentation logic. All generated recommendations were stored in a table, along with the time they were created, a description of the conditions or inputs that led to the recommendation, and a written explanation of the logic behind it.
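A sketch of what that recommendation table might look like, assuming the PostgreSQL store named in the tech stack; the table and column names are illustrative, not the client’s actual schema.

```python
import psycopg2  # PostgreSQL driver; the store is named in the tech stack

DDL = """
CREATE TABLE IF NOT EXISTS rule_recommendations (
    id             BIGSERIAL PRIMARY KEY,
    client_id      TEXT        NOT NULL,
    rule_id        TEXT        NOT NULL,
    recommendation TEXT        NOT NULL,  -- plain-language statement
    trigger_inputs JSONB       NOT NULL,  -- conditions/inputs that produced it
    rationale      TEXT        NOT NULL,  -- written explanation of the logic
    created_at     TIMESTAMPTZ NOT NULL DEFAULT now()
);
"""

# Connection string is a placeholder for the real environment
with psycopg2.connect("dbname=analytics") as conn:
    with conn.cursor() as cur:
        cur.execute(DDL)
```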
We also set up an internal dashboard on Amazon QuickSight. It gave users a way to trigger recommendations manually and review the configuration history to see which suggestions were linked to actual pass rate improvements.
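A minimal sketch of the manual trigger, assuming the dashboard action invokes an AWS Lambda function (Lambda appears in the tech stack); run_recommendations and the event fields are hypothetical placeholders for the pipeline described above.

```python
import json

def run_recommendations(client_id: str) -> list:
    """Placeholder for the pipeline above (segmentation, SHAP scoring,
    recommendation generation)."""
    return [{"client_id": client_id, "rule_id": "rule_x", "projected_lift": 0.048}]

def lambda_handler(event, context):
    # `client_id` is assumed to be supplied by the dashboard's trigger action
    client_id = event["client_id"]
    recs = run_recommendations(client_id)
    return {"statusCode": 200, "body": json.dumps(recs)}
```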
The use case served as a working prototype for scaling AI recommendations across other platforms. It also validated the architecture and workflow changes required to operationalize AI across the client’s identity verification products.
Business outcomes
The data capability analysis gave our client a clear understanding of their data. They were able to see what was working and what needed improvement. By using a real business challenge to demonstrate the effect of our recommendations, we helped the client understand the value of their data in a measurable, relevant way.
The client now has:
- Access to consistent and explainable recommendations based on system behavior and configuration
- Visibility into performance trends that can be tracked in near real time
- The ability to compare rule effectiveness for different customers, industries, and timeframes
- A system that records configuration history, captures the effect of changes to the system, and collects feedback
The business case also provided a repeatable framework for introducing AI to other products.
- 41% faster config updates
- 27% higher pass rates
- 89% increased adoption
Technology stack
Amazon QuickSight, Apache Airflow, Python, SHAP, scikit-learn, PostgreSQL, Amazon S3, AWS Lambda, custom RBAC, audit logging