Machine learning model building is part art, part science. A model is a representation of a process; the goal is to approximate that process or phenomenon as accurately as possible to build a task-specific model. In the world of Socure and our flagship product, Sigma, that task is fraud prevention.
The key is to accurately estimate fraud rates at any given time. Effective fraud prevention requires building adaptable models that can calculate up-to-date fraud rates based on emerging trends and patterns. As the nature of fraud shifts, models must be updated to detect and address new types of fraudulent activity.
Having a thoughtful, business-focused data science team is crucial to get the modeling right. Whether you are a product manager, fraud strategist, or data scientist, you must understand customer pain points, risk appetite, goals, and constraints. Only then can you define the objective and think about how to accurately represent the process.
Feature Engineering for Fraud Detection
Model building is a shiny object for many data scientists. In reality, building a performant and robust model is only possible with clean data, features that pack relevant signals, and an infrastructure that enables parity between training and real-time decision environments.
Imagine you are building your first fraud detection model, and you have some data and historical outcomes. It may be tempting to generate this model rapidly by simply feeding all available data into a commoditized platform and clicking a button to train it. However, such models often learn spurious relationships that do not hold over time, fail to generalize across different populations, and discriminate based on group characteristics.
At Socure, feature engineering looks at patterns and behaviors that distinguish legitimate customers from malicious actors. Which behaviors suggest legitimate use, and which suggest fraud, where there is no intent to use the service legitimately?
The key is to engineer features that hint at causality, carry orthogonal signals, and remain stable across populations and over time.
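As a rough illustration of what "stable" and "orthogonal" can mean in practice, here is a minimal sketch that assumes two common proxies: the Population Stability Index (PSI) for drift between a training sample and a production sample, and Pearson correlation as a crude check for redundant signals. The feature names and data are hypothetical, and this is not a description of how Sigma evaluates features. A PSI above roughly 0.25 is often quoted as a rule of thumb for a meaningful shift, but the right threshold depends on your data.

```python
import math

def psi(baseline, current, n_bins=10):
    """Population Stability Index of one numeric feature between two samples."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant feature

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1  # clamp out-of-range values
        # A small floor avoids log(0) when a bin is empty
        return [max(c / len(values), 1e-6) for c in counts]

    b, c = bin_shares(baseline), bin_shares(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))

def pearson(x, y):
    """Pearson correlation: near 0 suggests little linear overlap between features."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical feature: account age in days, at training time vs. in production
train_age = [float(d) for d in range(100)]
prod_age_stable = list(train_age)                  # same distribution
prod_age_drifted = [d + 50.0 for d in train_age]   # population has shifted

psi_stable = psi(train_age, prod_age_stable)
psi_drifted = psi(train_age, prod_age_drifted)

# Hypothetical second and third features to compare against account age
redundant_feature = [2.0 * d + 1.0 for d in train_age]       # same signal, rescaled
unrelated_feature = [(-1.0) ** i for i in range(100)]        # alternating flag

corr_redundant = pearson(train_age, redundant_feature)
corr_unrelated = pearson(train_age, unrelated_feature)

print(f"PSI stable={psi_stable:.3f}, drifted={psi_drifted:.3f}")
print(f"corr redundant={corr_redundant:.3f}, unrelated={corr_unrelated:.3f}")
```

Correlation only captures linear overlap, so treat it as a first screen rather than proof that two features carry independent signals.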
Even before feature engineering, quality data and signals are critical. If signals change daily, models can’t learn. Consolidating data sources and ensuring consistency is essential. Relying on a single source is dangerous — you need corroboration. While data minimization is recommended, striking a healthy balance of redundancy helps models stay stable. This will also allow your team to proactively analyze gaps and invest in new data sources to catch novel fraud attacks.
For example, in the case of an account takeover attack, the personal information of the user may have already been compromised. The only way to detect a sophisticated attack is through behavioral analytics, looking at signals such as whether the account is being accessed from another device with different touch-screen interaction patterns. If these signals are missing, models will struggle to identify new fraud, no matter how much feature engineering is done on data containing only personal information.
Your models depend on having visibility into relevant signals. It’s critical to make strategic investments to understand and obtain the signals you lack.
Optimizing Model Learning
With quality inputs in place, optimizing model learning is critical. One effective approach is to use decision trees that branch at key decision points, then boost performance by combining many trees into an ensemble model architecture. This is the idea behind gradient boosting, which your data science folks likely know through libraries such as XGBoost!
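To make the boosting idea concrete, here is a from-scratch sketch in plain Python. It is not XGBoost itself, which adds regularization and many optimizations; it only shows the core loop under simplifying assumptions: each "tree" is a one-split stump, the loss is squared error, and every new stump is fitted to the residuals left by the ensemble so far. The two features (device age in days, transactions per hour) and the labels are made up for illustration.

```python
def fit_stump(X, residuals):
    """Return (feature, threshold, left_value, right_value) with the lowest
    squared error on the residuals, or None if no split separates the rows."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not right:
                continue  # a threshold at the maximum value splits nothing
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lv, rv)
    return None if best is None else best[1:]

def fit_boosted_stumps(X, y, n_rounds=50, learning_rate=0.5):
    """Start from the average label, then add stumps that correct what is left."""
    base = sum(y) / len(y)
    scores = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - si for yi, si in zip(y, scores)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        j, t, lv, rv = stump
        stumps.append(stump)
        scores = [s + learning_rate * (lv if row[j] <= t else rv)
                  for row, s in zip(X, scores)]
    return base, stumps, learning_rate

def score(model, row):
    """Ensemble score: base rate plus the shrunken contribution of every stump."""
    base, stumps, learning_rate = model
    return base + sum(learning_rate * (lv if row[j] <= t else rv)
                      for j, t, lv, rv in stumps)

def sse(model, X, y):
    return sum((yi - score(model, row)) ** 2 for row, yi in zip(X, y))

# Hypothetical rows: [device_age_days, transactions_per_hour]; 1 = fraud
X = [[400, 1], [350, 2], [300, 1], [2, 1], [250, 9], [3, 8], [2, 9], [1, 10]]
y = [0, 0, 0, 0, 0, 1, 1, 1]

baseline = fit_boosted_stumps(X, y, n_rounds=0)  # base rate only, no trees
model = fit_boosted_stumps(X, y)

print("baseline error:", round(sse(baseline, X, y), 3))
print("boosted error: ", round(sse(model, X, y), 3))
print("new device, high velocity:", round(score(model, [1, 10]), 3))
print("old device, low velocity: ", round(score(model, [400, 1]), 3))
```

In this toy data no single split separates fraud from legitimate traffic, because a new device alone or high velocity alone also appears among legitimate rows. That is the case where stacking many weak trees pays off.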
Additionally, employing graph networks can help uncover and leverage identity relationship signals, even without access to explicit relationship data. These networks recognize linkages within the data that provide powerful predictive capacity. Allowing models to learn from these connections, whether directly provided or implicitly embedded in the data, significantly improves detection rates.
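One simple way to picture implicit linkage is to treat applications as nodes and connect any two that share an identifier value, then look at the resulting clusters. The sketch below does this with a small union-find structure. It is a deliberately simplified stand-in for a graph network, and the field names, application ids, and values are all hypothetical.

```python
from collections import defaultdict

def find(parent, x):
    """Find the cluster representative of x, shortening paths along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def link_applications(applications):
    """Group application ids that share any identifier value, directly or
    through a chain of other applications."""
    parent = {app_id: app_id for app_id in applications}
    first_seen = {}  # (field, value) -> first application that used it
    for app_id, fields in applications.items():
        for key in fields.items():
            if key in first_seen:
                root_a = find(parent, app_id)
                root_b = find(parent, first_seen[key])
                parent[root_a] = root_b  # merge the two clusters
            else:
                first_seen[key] = app_id
    clusters = defaultdict(set)
    for app_id in applications:
        clusters[find(parent, app_id)].add(app_id)
    return list(clusters.values())

# Hypothetical applications with made-up identifiers
applications = {
    "a1": {"phone": "555-0100", "device": "d-1"},
    "a2": {"phone": "555-0100", "device": "d-2"},
    "a3": {"phone": "555-0199", "device": "d-2"},
    "a4": {"phone": "555-0142", "device": "d-9"},
}

clusters = link_applications(applications)
# Cluster size can then be fed to a model as a linkage feature
cluster_size = {app_id: len(c) for c in clusters for app_id in c}

print(sorted(sorted(c) for c in clusters))
print(cluster_size)
```

Note that a1 and a3 share nothing directly; they end up in the same cluster only because a2 shares a phone with one and a device with the other. That kind of indirect connection is what explicit relationship data often misses.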
Thoughtful Data Science
All of this is to say that the goal of model building at Socure is to choose the fairest, most inclusive modeling approach possible.
By investing in thoughtful data science, Socure builds fairer, more stable models that improve over time. With this principled, pillar-based approach in our Sigma Fraud Suite, Socure can help our customers achieve better fraud prevention and business outcomes.
Connect with me on LinkedIn so we can continue the conversation.
Yigit Yildirim is a technology executive with extensive experience leading diverse teams in hypergrowth startups and established public companies. Yigit has held key positions at Emailage as VP of Data Science and Engineering and at LexisNexis as Head of Data Science and Product Innovation, and currently serves as the GM and SVP of Fraud & Risk Products and ML Platforms at Socure, leading their identity risk product suite. With a PhD in Computer Science and a Master’s in Industrial Engineering, Yigit specializes in AI-driven solutions for risk management, fraud prevention, and identity verification.