A lot has been said about big data adoption and its endless possibilities. There are so many buzzwords, names and tools that “it’s hard to see the forest for the trees.” A CIO trying to understand the various options and current gaps may feel like they are walking in the dark.
Let us discuss some of the key points of supporting your compliance needs by applying Big Data technologies, techniques and methodologies. As the financial industry adopts Big Data and Advanced Analytics, your institution’s core functionality and reputation depend on the approach you take and the solutions you choose. There are endless possibilities, so I’ll keep it simple.
Traditional BI - Business Needs that Couldn’t Be Addressed Yet
Before starting, ask, “What’s missing?” For many years everything seemed to work. Now there is a huge wave of people and companies talking about new trends and technologies to support compliance. It’s important to first understand what can and can’t be done with traditional technologies. Only when you understand the capabilities and limitations of traditional technologies should you proceed to the second phase: How to attack and fill those technology gaps.
Learn the Basics of the Big Data Landscape
Sometimes managers know a bit about everything but never enough about what they need to know. Don’t delegate here. I recommend that even CIOs learn the basics of the Big Data landscape, i.e., what NoSQL, Hadoop, MapReduce, Advanced Analytics, Machine Learning, etc. are. Focus on the basic disruptive technologies so that you understand the concepts, the methodologies, and the reasons behind them.
Don’t be afraid to ask lots of questions. Today’s disruptive technologies combine with machine learning to dredge information from vast data lakes in ways that were not possible before Hadoop. Since this technology has been around for more than a decade, it is no longer risky to discuss productionizing a Hadoop-based data lake. You have to know where things fit in the dynamic puzzle to better understand what’s possible.
By the time you finish learning basics on the first 15 new technologies, it’s likely 3-4 will no longer be relevant, and maybe several more will improve on the ones you’ve just studied. No need to learn about each vendor’s pros/cons in a specific field – that’s why you have the experts. Instead, focus on understanding the main purpose.
Steps to Building a Data Lake – Don’t Get Swamped!
Once you get through the first two steps, it’s likely you’ll decide to create your own Hadoop-based data lake. A large data hub can hold an enormous amount of data by overcoming earlier data barriers such as the cost of storing data (when databases could only scale up) and data velocity, which used to restrict us to batch mode most of the time. Dealing with files instead of table records overcomes a variety of issues, so unstructured and semi-structured data can now be handled efficiently on the same platform.
The first step toward designing and implementing your data lake is to plan ahead. Consider future needs, potential stakeholders and possible processes. Another critical issue is to protect your data lake from becoming a compliance “Data Swamp”. Manage input to ensure that your data is truly pertinent. Much is said and done to prevent “swamping” a data lake, but the most intelligent advice is to treat it like a bucket partially filled with very small diamonds (your pertinent data).
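One way to picture “managing input” is an admission check that every candidate data set must pass before it lands in the lake. The sketch below is only an illustration: the function name and the three metadata fields are hypothetical, not part of Hadoop or any specific governance tool, and a real policy would be defined by your own compliance team.

```python
# Hypothetical metadata fields; adapt them to your own governance policy.
REQUIRED_FIELDS = ("source_system", "data_owner", "compliance_use_case")


def admission_check(dataset_meta):
    """Return (admitted, missing_fields) for a candidate data set.

    A data set is admitted only when every required metadata field
    is present and non-empty.
    """
    missing = [field for field in REQUIRED_FIELDS if not dataset_meta.get(field)]
    return (not missing, missing)


# Example: a feed with no named owner is held back until someone claims it.
candidate = {
    "source_system": "core_banking",
    "data_owner": "",
    "compliance_use_case": "AML",
}
admitted, missing = admission_check(candidate)
print(admitted, missing)
```

The point of the design is that nothing enters the lake without a known source, an accountable owner and a stated compliance purpose, which is what keeps the “diamonds” findable later.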
The Future of Machine Learning in Fraud/AML/Trade Surveillance
Understanding Machine Learning is critical for every CIO to successfully manage and adapt support for Fraud, AML and Trade Surveillance into the foreseeable future. Combining the capabilities of machine learning with the insight that human knowledge and understanding bring can really spark strategic thinking. There is no longer any need to limit statistical modeling to small data sets: big data enables us to run ML on top of the entire set. Enriched data, historic data, and sources that weren’t considered relevant when the cost was prohibitive can now be added for improved modeling outcomes.
Empowering FIUs with Big Data
FIUs play a key role in compliance in the Big Data era. Financial Intelligence Units have turned Fraud/AML and Trade Surveillance detection into a whole different ball game. Now the daily routine runs through a first layer, investigating thousands of activities and patterns to find a few relevant anomalies that get sent to the FIU for a different investigation. Due to the ability to work with small data sets, this approach has advanced and streamlined how we can store data and run analytics. That also makes it possible to have the entire account history, images, voice-to-text details and more information available for the FIU to monitor.
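The first-layer idea can be sketched in a few lines. This is a deliberately simple illustration, not how any particular surveillance product works: it assumes each activity is a record with hypothetical `account` and `amount` fields, and it uses a plain z-score against the account’s own history as the anomaly rule. Real first layers combine many more signals and models.

```python
from statistics import mean, pstdev


def first_layer_screen(transactions, threshold=3.0, min_history=10):
    """Return the few transactions that deviate strongly from their
    account's own history; these are the candidates passed to the FIU."""
    by_account = {}
    for tx in transactions:
        by_account.setdefault(tx["account"], []).append(tx)

    flagged = []
    for txs in by_account.values():
        amounts = [t["amount"] for t in txs]
        if len(amounts) < min_history:
            continue  # too little history to judge what is normal
        mu = mean(amounts)
        sigma = pstdev(amounts)
        if sigma == 0:
            continue  # every amount is identical, nothing stands out
        for t in txs:
            z = (t["amount"] - mu) / sigma
            if abs(z) >= threshold:
                flagged.append({**t, "z_score": round(z, 2)})
    return flagged


# Example: twenty routine payments and one that is far out of pattern.
history = [{"account": "A", "amount": 100.0} for _ in range(20)]
history.append({"account": "A", "amount": 10_000.0})
for alert in first_layer_screen(history):
    print(alert["account"], alert["amount"])
```

The `min_history` guard reflects the point made in the text: the more account history the platform can hold, the better the first layer can tell a genuine anomaly from normal variation.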
FIUs in the Big Data era enable faster, more efficient and effective identification of potential Fraud, AML and Trade Surveillance issues. If you plan to participate, it’s wise to start small. Use a business case and processes such as the ones the FIUs handle on a daily basis. Leverage data exploration skills with the right tools. Understand the architecture that currently enables such units to succeed in the big data era: smart advanced analytics (with integrated data science) on one hand, data analytics and exploration on top of a modern data lake on the other, all backed by the investigators’ vast knowledge and existing BI tools.
FUTUREVIEW: Compliance and Fraud Prevention in the Age of Data Lakes and FIUs
The new era offers lots of promises and ideas. Most of them will become industry standard in less than a decade. Don’t ignore the trends: embrace them with small, smart steps by understanding the existing gaps. Determine what can be covered smartly by those new technologies, and undertake step-by-step initiatives. Once started, expect some failures and treat them as learning experiences. Most initial phases won’t produce huge outcomes. But that’s OK: move slowly and carefully. Like Machine Learning, this journey will improve as adoption progresses.
Adding the new technologies to your existing environment will require a Hybrid mode state. That is the safest way to go. By preparing the Hadoop infrastructure and adding the relevant sources, you can start working simultaneously with existing analytical tools by using an intermediate layer that integrates well with the platform. Gradually, your data lake can become one of the most important resources for your analytical teams and can support your compliance solutions in delivering enhanced and enriched detection and alerting (AML, Trade Surveillance and Fraud). As you set up future steps, moving towards the hybrid mode can remove constraints along the way to achieving a much safer and more convenient environment.