Big Data for Retail Banking
Big data in retail banking covers areas such as:
- Individualization of product offers to existing clients
- Early fraud detection and fraud damage mitigation
- Prediction of product cancellations and client defections
- Optimal allocation of cash to ATMs and bank branches
- Minimization of usage of expensive bank channels such as branch visits
- Reliable assessment of clients for debt products
Datasets from backups and relational databases are copied into Hadoop. Machine learning technologies are applied to find hidden patterns and correlations in data.
A common dataset of monthly expense and income categories is built for all clients. It is created from bank account movements, direct debits and standing orders. Each account movement is usually accompanied by a movement code, for example for electricity, a phone bill or a type of restaurant. The merchant's name, description and comment fields are also used to categorize each transaction.
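The categorization step can be sketched as a lookup on the movement code with a fallback to keywords found in the text fields. This is a minimal illustration only: the code table, the keyword table and the `Movement` record layout are hypothetical and would be replaced by the bank's own definitions.

```python
import re
from dataclasses import dataclass

# Hypothetical movement codes and keywords, for illustration only.
CODE_TO_CATEGORY = {"EL": "energy", "TEL": "phone", "REST": "restaurants"}
KEYWORD_TO_CATEGORY = {"fuel": "car", "rent": "housing", "bookshop": "education"}


@dataclass
class Movement:
    amount: float
    code: str = ""
    merchant: str = ""
    description: str = ""
    comment: str = ""


def categorize(m: Movement) -> str:
    """Prefer the movement code; otherwise look for keywords in the text fields."""
    if m.code in CODE_TO_CATEGORY:
        return CODE_TO_CATEGORY[m.code]
    text = " ".join([m.merchant, m.description, m.comment]).lower()
    words = set(re.findall(r"[a-z]+", text))  # whole words, so "current" does not match "rent"
    for keyword, category in KEYWORD_TO_CATEGORY.items():
        if keyword in words:
            return category
    return "uncategorized"
```

Matching on whole words rather than substrings keeps a description such as "current account fee" from being filed under housing.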
We recognize several categories of expenses such as housing expenses (rent or mortgage), energy expenses (gas and electricity), food and household related expenses, education (schools, books, courses), car expenses (fuel and repairs), restaurants, big ticket items (TV, furniture), taxes, recreation and hobby, credit card and loan payments, luxury items and so on.
Income categories include salaries, dividends, tax refunds, social benefits, rental income, sales and so on. Simple regression analysis of this dataset gives overall trends for total expenses, income and savings, as well as detailed trends for each income and expense category for each client.
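As a rough sketch of the regression step, the trend of one category can be taken as the least-squares slope of its monthly totals against the month index. The function names and the input layout (a dictionary of monthly totals per category for one client) are assumptions made for this example.

```python
def trend_slope(monthly_totals):
    """Least-squares slope of the totals against month index 0, 1, 2, ..."""
    n = len(monthly_totals)
    if n < 2:
        raise ValueError("need at least two months of data")
    mean_x = (n - 1) / 2
    mean_y = sum(monthly_totals) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(monthly_totals))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    return sxy / sxx


def category_trends(history):
    """history: {category: [monthly totals]} -> {category: change per month}."""
    return {category: trend_slope(totals) for category, totals in history.items()}
```

A positive slope means the category is growing month over month; a client whose energy expenses rise by 10 per month would get a slope of 10 for that category.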
Machine Learning and Predictions
We can use the full range of machine learning algorithms and models to make predictions. There are two broad categories of algorithms: supervised and unsupervised.
Supervised learning algorithms use historical data to learn which combinations of input values lead to which output values. Our models are trained and verified on samples of historical data. Sample data can be chosen randomly, but we have seen better results when datasets are categorized first. The customer dataset has categories such as age, income, location based on town size, education and savings. Each category is split into brackets. For example, the age category is split into 20 five-year brackets. We know the number of customers in each age bracket, so we can sample 5% of the records from each bracket. These samples are well suited to showing which categories contribute most to the overall results. For example, we can see that education contributes most to whether a client accepts a certain investment product.
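The bracket-based sampling described above is a form of stratified sampling. A minimal sketch, assuming customer records are dictionaries with an `age` key (a hypothetical layout) and using a fixed seed so the sample is reproducible:

```python
import random
from collections import defaultdict


def age_bracket(age, width=5):
    """Lower bound of the five-year bracket, e.g. 37 -> 35."""
    return (age // width) * width


def stratified_sample(customers, fraction=0.05, seed=0):
    """Draw the same fraction of records from every age bracket."""
    rng = random.Random(seed)
    by_bracket = defaultdict(list)
    for customer in customers:
        by_bracket[age_bracket(customer["age"])].append(customer)
    sample = []
    for bracket in sorted(by_bracket):
        group = by_bracket[bracket]
        k = max(1, round(len(group) * fraction))  # keep at least one record per bracket
        sample.extend(rng.sample(group, k))
    return sample
```

The same pattern extends to the other categories (income, town size, education, savings) by changing the key used for grouping.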
Unsupervised machine learning algorithms look for unknown patterns in available data.
We can use patterns of unusual client behaviour to find early signs of fraud.
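As a deliberately simple stand-in for an unsupervised method, the sketch below flags amounts that lie far from a client's typical value, using the median and the median absolute deviation. It is a statistical outlier test rather than one of the learning algorithms named in this document, and the threshold of 3.5 is a commonly used rule of thumb, not a value from the source.

```python
from statistics import median


def unusual_amounts(amounts, threshold=3.5):
    """Return amounts whose modified z-score exceeds the threshold."""
    med = median(amounts)
    mad = median(abs(a - med) for a in amounts)  # median absolute deviation
    if mad == 0:
        return []  # no spread in the data, so nothing can be ranked as unusual
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [a for a in amounts if 0.6745 * abs(a - med) / mad > threshold]
```

For a client who usually withdraws between 20 and 30, a single withdrawal of 950 would be flagged while the ordinary amounts would not.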
Individualization of Product Offers
Banks can save money on broad and expensive marketing campaigns that promote bank products. Products are offered only to customers who need them and are likely to accept them, so customers see fewer irrelevant offers. This requires deep knowledge of who accepted given products in the past.
Datasets of subscriptions to bank products and services, together with their historical values, are analyzed. A separate model is created for each product and subscription. For each model, the best learning algorithm is chosen and verified, and the categories and variables with the biggest influence are identified.
Early Fraud Detection and Fraud Damage Mitigation
This area includes detection of identity fraud, credit card fraud, wire fraud, attacks on internet and mobile banking, and money laundering. New types of fraud and new schemes require flexible and fast detection algorithms. In the past, banks used only statistical and rule-based algorithms to determine whether suspicious activity was taking place. These algorithms are limited: they recognize only known fraud patterns, require expensive maintenance, do not work with the full history of each client and produce a high rate of false positives.
A dataset of known fraud cases was used. Fraud cases are sorted into categories such as overdraft fraud with a stolen identity, stolen credit cards, consumer loan fraud, credit card top-up with a fraudulent check, stolen checks, skimming with card duplication, attacks on online banking with stolen customer credentials and/or security devices, rogue online merchant fraud using credit cards and so on. Neural networks with backpropagation were used, as well as decision tree algorithms. These algorithms were applied to existing datasets to find previously undetected occurrences of fraud.
Prediction of Product Cancellations and Client Defections
Prediction of bank product cancellations and client defections is very time sensitive. The bank has just days to act before a client irreversibly decides to cancel a product or move to a competitor. The bank needs to identify clients who are likely to defect, contact them and proactively offer alternative products or solve their issues. It is much cheaper to retain highly profitable clients than to win them back.
Syoncloud uses account movements, debit and credit card movements, the client dataset from the CRM, the product subscription dataset, call centre and branch visit transactions, and log information as primary data sources for predictions. It also uses the common dataset of incomes and expenses.
Syoncloud creates time series of key events such as direct debit cancellations, income to the account from salaries, dividends and rents, transfers to the client's accounts at other banks, call centre and branch contacts made by the client (separated into categories), credit card cancellations and so on.
Syoncloud selects a second set of clients who match the defecting clients in categories such as age, income, savings and location over the same time interval but who remain clients.
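One way to picture this selection is as bracket matching: each departed client is paired with remaining clients who fall into the same age, income and location brackets. The sketch below is an assumption about how such matching could work, not a description of Syoncloud's implementation; the record fields and bracket widths are hypothetical.

```python
def bracket_key(client):
    """Coarse matching key; the bracket widths are illustrative assumptions."""
    return (client["age"] // 5, client["income"] // 10000, client["town_size"])


def select_controls(defectors, remaining, per_defector=1):
    """Pick remaining clients that share a bracket key with each defector."""
    pool = {}
    for client in remaining:
        pool.setdefault(bracket_key(client), []).append(client)
    controls = []
    for defector in defectors:
        candidates = pool.get(bracket_key(defector), [])
        for _ in range(min(per_defector, len(candidates))):
            controls.append(candidates.pop())  # each control is used at most once
    return controls
```

Comparing the two groups over the same time interval helps separate behaviour that precedes defection from behaviour that is simply typical for clients of that profile.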
Based on these input datasets, it creates models that can predict the behaviour of clients before they irreversibly decide to move to competitors. It uses several supervised learning algorithms, such as Support Vector Machines for binary classification and neural networks with backpropagation for predictions. Among unsupervised machine learning algorithms, it uses K-Means and Mean Shift clustering after Principal Component Analysis has been applied to reduce the dimensionality of the input data.
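To illustrate only the K-Means step, here is a minimal pure-Python version of Lloyd's algorithm. It assumes the input points are already low-dimensional tuples of numbers (for example, the output of a dimensionality reduction step that is not shown), and it seeds the centroids with the first `k` points for reproducibility, which a production system would likely replace with a better initialization.

```python
import math


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm on tuples of numbers; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment
```

Clients that land in the same cluster as known defectors are natural candidates for a closer look.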
Syoncloud identified several hundred profitable clients in recent data who match the patterns of clients who moved their accounts to competitors. These clients should be contacted by their respective bank branches.
Optimal Allocation of Cash for ATMs and Bank Branches
Demand for cash at many ATMs and bank branches varies strongly during the year. The variability is caused by weather, local events, vacations, tourism and so on. It is important to predict the right amount of cash to deposit into ATMs and bank branches. Servicing ATMs too often is costly, and so is having cash machines out of order due to a lack of cash. At the same time, we want to limit the amount of unnecessary cash stored for long periods in ATMs and bank branches, because it leads to suboptimal cash allocation and attracts crime.
As primary datasets, Syoncloud uses ATM service logs, geographic locations of ATMs and bank branches, the withdrawals dataset for each ATM, weather reports for ATM and bank branch locations, and schedules of sports, cultural and other events as well as holidays for all locations. Syoncloud also uses credit and debit card movements to assess demand for cash at various locations and at different times of the year. It uses the common dataset of incomes to see when salaries, social benefits and other incomes arrived in clients' accounts at different locations.
Syoncloud creates a dataset of median cash withdrawal amounts for each day of the year and hour of the day for all ATMs. This dataset is used to calculate the influence of weather, events, the day of the week and holidays on demand for cash at a given location.
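A minimal sketch of how such a median dataset could be built, assuming withdrawals arrive as `(atm_id, timestamp, amount)` tuples (a hypothetical layout) and that the median is taken over individual withdrawal amounts across all years in the data:

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def median_withdrawals(withdrawals):
    """Median amount per (ATM, day of year, hour of day)."""
    buckets = defaultdict(list)
    for atm_id, timestamp, amount in withdrawals:
        key = (atm_id, timestamp.timetuple().tm_yday, timestamp.hour)
        buckets[key].append(amount)
    return {key: median(amounts) for key, amounts in buckets.items()}
```

Note that the day-of-year index shifts by one after 29 February in leap years; a real pipeline would need to decide how to align those dates.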
Syoncloud uses a dataset of significant cultural, sports and other events during the past 4 years, with location coordinates. It calculates the influence of each event on cash demand for all ATMs within a 100 m radius of the event. It can rank all events by their influence on cash demand. This dataset is used to predict the influence of similar events.
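Finding the ATMs near an event comes down to a distance calculation on coordinates. The sketch below uses the haversine formula with a mean Earth radius of 6,371 km; the function names and the dictionary layout for ATM coordinates are assumptions for this example.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def atms_near_event(event, atms, radius_m=100):
    """event: (lat, lon); atms: {atm_id: (lat, lon)} -> ids within the radius."""
    return [
        atm_id
        for atm_id, (lat, lon) in atms.items()
        if haversine_m(event[0], event[1], lat, lon) <= radius_m
    ]
```

At this scale a flat-earth approximation would also work, but the haversine form stays correct if the radius is later widened.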
Syoncloud also calculates the correlation between cash demand and local weather parameters, such as precipitation, temperature and wind, at the location of each ATM.
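The source does not say which correlation measure is used. As an illustration, the Pearson coefficient between a daily weather parameter and daily cash demand at one ATM can be computed as follows; the function name and the paired-list input are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)
```

A value near -1 for precipitation, for instance, would suggest that withdrawals at that ATM drop on rainy days.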
Syoncloud creates a correlation dataset between the days when clients receive income, such as salaries and social benefits, and cash demand at different locations.
It creates models that can predict cash demand for each day of the year for each ATM and bank branch location. These models take into account historical weather forecast data and schedules of events. Syoncloud uses algorithms such as the Restricted Boltzmann Machine, the Perceptron and Gaussian Discriminant Analysis.
Minimization of Usage of Expensive Channels
Syoncloud can help minimize the usage of expensive bank channels such as over-the-counter operations and other visits to bank branches, as well as calls to call centres.
This can be achieved by optimizing online banking and mobile banking applications, help pages and wizards, as well as the web pages on the bank's websites. Another way to encourage reluctant clients to switch to cheaper channels is through targeted campaigns.
The primary sources of data for analysis are web log files from the online banking and mobile banking applications. Syoncloud also uses bank account movements with bank channel codes, the dataset of call centre transactions, the CRM dataset with information about customers and the dataset of transactions from bank branches.
Another important dataset consists of complaints and enquiries from the call centre, emails, letters and branches. Syoncloud sorts this dataset by area of interest and correlates it with help web pages. It can identify help pages that are unclear and cause confusion and unnecessary calls to the call centre. It also identifies operations in online banking that are complex and generate a higher number of complaints. It uncovered several areas related to exchange rates during credit card payments that were not covered by help pages but were often discussed over the phone or even during bank branch visits. Changes made to product-related web pages, self-help content, search optimization, online banking operations and mobile banking applications can bring quick savings on outsourced call centres and bank branch visits.
Syoncloud analyses results from marketing campaigns that aim to move reluctant clients to online and mobile banking or self-service kiosks. It used correlation analysis and uncovered that some broad marketing campaigns were not efficient. Syoncloud also analyses the patterns of bank clients who recently moved most of their operations online. This gave us a tool to select the portion of clients who are more likely to move online. These customers should be targeted by personalized marketing campaigns or by demonstrations of the advantages at bank branches.
Assessment of Clients for Debt Products
In order to reliably assess risks and approve debt products for existing clients, we need to take into account not just the current credit scores and current disposable income of the clients but also their complete history and social context. This decreases risk for the bank and increases income from valuable clients who would otherwise be rejected.
As primary sources of data, Syoncloud uses the common dataset of incomes and expenses, the complete payment history for credit cards, consumer loans, mortgages, overdrafts and other debt products, and CRM information about clients.
It uses a Markov chain stochastic process to model the debt and payment behaviour of clients. This model is tested on historical data of profitable and defaulted loans, credit cards and other debt products. We have noticed improved reliability of credit scores, and we were able to suggest suitable alternative debt products for rejected clients.
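The core of a Markov chain model is a table of transition probabilities between states. A minimal sketch of estimating that table from observed payment histories follows; the state names (`on_time`, `late`, `default`) and the one-state-per-month encoding are hypothetical and chosen only to make the idea concrete.

```python
from collections import Counter, defaultdict


def transition_probabilities(histories):
    """Estimate P(next state | current state) from sequences of payment states."""
    counts = defaultdict(Counter)
    for states in histories:
        for current, following in zip(states, states[1:]):
            counts[current][following] += 1
    return {
        state: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for state, row in counts.items()
    }
```

With such a table, the estimated chance that a client who is currently late moves into default next period can be read off directly and compared across client segments.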