
(Co-authored with Vimal Balasubramaniam, Kusan Biswas, and Mithila A. Sarah)

The lockdown that began in the last week of March 2020 has affected people from all walks of life. Frictions faced by several sectors and stakeholders, such as essential service suppliers, farms, factories and migrant labourers, have been widely reported. One set of frictions that is yet to be studied is that faced by consumers of financial products.

The financial system is extremely important to households as they navigate the crisis. An increasing number of households will be making online payments and using non-cash technologies such as QR codes to undertake cashless transactions. Households may need to borrow afresh, and also need to continue paying their EMIs on existing debts. Given the volume of transaction activity and the centrality of access to the formal financial system for households, it is important to understand the types of concerns they faced during this lockdown. In this article, we ask: a) was there an increase in the number of complaints about accessing financial services during the lockdown? and b) what is the nature of the problems we observe?

Such statistics are generally hard to come by, especially in India, where firms do not present information on complaint resolution, and where we do not possess a public record of complaints, such as the CFPB Consumer Complaints database, from which to estimate a high-frequency measure of consumer complaints. We use Twitter, a platform that is often used to air grievances, and take a first look at what it may offer in our context. We present an analysis of Twitter posts related to the banking sector surrounding the current crisis. No doubt, the Twitterati in India are not representative of the average Indian household. By focussing only on Twitter, we restrict our gaze to a select sample of Indian households that have access to the internet (on phones and otherwise) and use English as a medium of communication. However, we believe that this is one source of information that provides a high-frequency monitor of the types of grievances generated in a data-scarce environment. Such an analysis does not tell us how the system resolved such problems. However, it presents us with statistics on the kinds of problems faced and sheds light on where the bottlenecks lay in the financial system, in general, and in the implementation of specific policy measures.

Approach

The methodology for our study is as follows:

1. Banks: We gathered tweets related to all Indian private and public bank handles from 27th January to 23rd April, 2020. The total number of tweets for this period was 1,83,295, of which 70,419 were about public sector banks and 1,12,876 were about private sector banks. The difference in tweet frequency between banks need not reflect the intensity of complaints, as different banks have different customer bases. For instance, customers of private sector banks are more likely to be tech-savvy and hence able to use Twitter. Figure 1 presents each bank's share of tweets within its bank type (public or private).

Figure 1: Proportion of tweets per bank in both public and private sector

To simplify our presentation, we focus on four banks in the analysis: SBI, PNB, HDFC and ICICI Bank. We choose the banks by the total volume of deposits (which includes demand deposits, savings bank deposits and term deposits) they hold. According to the RBI's data on liabilities and assets of scheduled commercial banks as of March 2019, amongst the public banks SBI has the highest volume of deposits (34.3%), followed by Punjab National Bank (PNB) (8%). Amongst the private sector banks, HDFC Bank has the highest volume of deposits (24.5%), followed by ICICI Bank (17.3%). This leaves us with a total of 1,18,428 tweets.

2. Dates: The first instance of a nationwide lockdown was on 22nd March 2020, when the Prime Minister of India requested all citizens to observe a "janta curfew". The complete lockdown was announced on the evening of 24th March, and was enforced from 25th March to 14th April 2020. The first extension of the lockdown, which was supposed to end on 3rd May 2020, was announced on 13th April. However, an order for a second extension was issued on 1st May, and the lockdown is now expected to end on 17th May. We choose 22nd March as our "event date", and study the performance before and after the announcement of the lockdown.

3. Analysis: One of the serious challenges in using complex linguistic algorithms to classify tweets from India is that the language of these tweets is non-standard. Typically, such tweets use any particular language (English in our case) poorly and are often rife with spelling errors. Some tweets contain key English words surrounded by text that is harder to classify by language ("Hinglish", for instance), making it challenging to undertake more sophisticated analysis of text data.

Our approach is limited by this language consideration. Manual inspection of these tweets is not a practical way to find patterns in the data either. To keep this tractable, and also to accommodate the unusual language consideration, we use an unsupervised learning method which helps us classify tweets into different clusters. Once clustered, we go through the tweets in each group to qualitatively assess their nature and draw insights from them.

We start by creating a corpus of all the unique words that appear in these tweets. We exclude non-English terms and restrict our analysis to words in the English dictionary. For each tweet, we scan for these terms and code each term as one when the word occurs, and zero otherwise. Once we have quantified the combinations of words that occur in these tweets, we employ the simplest approach to unsupervised learning: the K-means clustering algorithm. The algorithm, in summary, partitions the dataset into a pre-specified number, K, of distinct non-overlapping subgroups (clusters). At the end of the analysis, each data point (a tweet in our context) belongs to only one group. The algorithm, therefore, forms clusters such that the total squared distance (in technical parlance, the within-cluster sum of squares) between the data points in a cluster and the cluster's mean is minimized.
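To make the encoding and clustering steps concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the code used in this study: the four tweets and the six-word vocabulary are made up, whitespace tokenisation stands in for whatever cleaning was actually applied, and the centroids are seeded with the first K distinct vectors for reproducibility (library implementations usually seed randomly).

```python
def binary_vectors(tweets, vocabulary):
    """Encode each tweet as a 0/1 vector: 1 if the vocabulary word occurs in it."""
    vectors = []
    for text in tweets:
        tokens = set(text.lower().split())  # naive tokenisation (assumption)
        vectors.append([1 if word in tokens else 0 for word in vocabulary])
    return vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iterations=50):
    """Lloyd's algorithm. Returns (labels, centroids, within-cluster sum of squares)."""
    # Deterministic seeding: the first k distinct vectors (a simplification).
    centroids = []
    for v in vectors:
        if v not in centroids:
            centroids.append(list(v))
        if len(centroids) == k:
            break
    if len(centroids) < k:
        raise ValueError("fewer distinct tweets than clusters")

    labels = None
    for _ in range(max_iterations):
        # Assignment step: each tweet joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels

    wcss = sum(squared_distance(v, centroids[lab]) for v, lab in zip(vectors, labels))
    return labels, centroids, wcss


# Made-up example data, for illustration only.
tweets = [
    "loan moratorium emi request",
    "emi moratorium loan due",
    "branch service poor",
    "branch staff service rude",
]
vocabulary = ["loan", "moratorium", "emi", "due", "branch", "service"]

vectors = binary_vectors(tweets, vocabulary)
labels, centroids, wcss = kmeans(vectors, k=2)
print(labels, wcss)
```

In this toy example the two loan-related tweets end up in one cluster and the two branch-related tweets in the other; every tweet carries exactly one label, which is the property described above.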

A vital step in this process is to determine the optimal number of clusters (K). The algorithm does not choose the number of groups automatically. Instead, we make use of the within-cluster sum of squares to identify the smallest number of groups that can explain most of the word combinations that are prevalent in the data. K is determined at the "elbow" of the relationship between the number of clusters and the variation explained. Based on this assessment, we group all the tweets for the top four banks in our sample period into three clusters, and then qualitatively assess the nature of these clusters below.
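The elbow is often read off a plot by eye. One way to make the choice mechanical is sketched below, purely as an illustration: it picks the K whose within-cluster sum of squares lies farthest below the straight line joining the first and last points of the curve. Both the heuristic and the numbers in `wcss_by_k` are assumptions made for this example; they are not the values or the rule used in this study.

```python
def elbow_k(wcss_by_k):
    """Return the K farthest below the chord joining the first and last (K, WCSS) points."""
    ks = sorted(wcss_by_k)
    k_first, k_last = ks[0], ks[-1]
    w_first, w_last = wcss_by_k[k_first], wcss_by_k[k_last]

    def gap(k):
        # Height of the chord at k minus the observed WCSS at k.
        chord = w_first + (w_last - w_first) * (k - k_first) / (k_last - k_first)
        return chord - wcss_by_k[k]

    return max(ks, key=gap)


# Hypothetical within-cluster sums of squares for K = 1..6.
wcss_by_k = {1: 1000.0, 2: 600.0, 3: 300.0, 4: 260.0, 5: 230.0, 6: 210.0}
best_k = elbow_k(wcss_by_k)
print(best_k)
```

With these made-up values the curve flattens after three clusters, so the heuristic returns 3. On real data the curve is rarely this clean, and the mechanical choice is best checked against a plot.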

Overall intensity

Figure 2 presents the tweet intensity for the four banks before and after the lockdown. We measure intensity relative to the median number of tweets for each bank before the lockdown. For example, at the peak, tweet frequency for each of the four banks was three times higher than that bank's own pre-lockdown median. Indeed, by this measure, Twitter does seem to pick up information about consumer concerns around this period.
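The intensity measure can be sketched as follows. This is a minimal illustration that assumes daily tweet counts per bank and reads "three times" as a ratio of 3 to the pre-lockdown median; the counts are invented, and `tweet_intensity` is a hypothetical helper rather than the code behind Figure 2.

```python
from datetime import date
from statistics import median

EVENT_DATE = date(2020, 3, 22)  # the "event date" chosen in the article


def tweet_intensity(daily_counts, event_date=EVENT_DATE):
    """Scale each day's tweet count by the median daily count before the event date."""
    baseline = median(
        count for day, count in daily_counts.items() if day < event_date
    )
    return {day: count / baseline for day, count in daily_counts.items()}


# Invented daily counts for one bank.
daily_counts = {
    date(2020, 3, 19): 100,
    date(2020, 3, 20): 120,
    date(2020, 3, 21): 80,
    date(2020, 3, 25): 300,
}
intensity = tweet_intensity(daily_counts)
print(intensity[date(2020, 3, 25)])
```

Normalising by each bank's own median puts banks with very different tweet volumes on a common scale, which is what allows the four series to be compared in one figure.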

Figure 2: Tweet intensity for the four banks

Qualitative assessment of Tweet Groups

There is a change in the terms that most frequently appeared in tweets before and after the lockdown started. This is an indicator of the underlying concerns of the bank customers. Figure 3 plots a simple frequency chart of the word types before and after lockdown. Tweets before lockdown consisted of words like 'banking', 'customer', 'service', 'time', 'transaction', 'call', 'care', whereas after lockdown was announced, tweets contained words such as 'loan', 'credit', 'due', 'moratorium', 'pay', etc. Anecdotally, we know that there was considerable anxiety about repayments, and this is reflected in the tweets we see around the time.
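A frequency comparison of this kind can be sketched with a pair of counters. The snippet is illustrative only: the dated tweets and the vocabulary are made up, each word is counted at most once per tweet (an assumption, in keeping with a one/zero coding of word occurrence), and punctuation handling is omitted for brevity.

```python
from collections import Counter
from datetime import date


def top_words_by_period(dated_tweets, vocabulary, event_date, n=5):
    """Return the n most frequent vocabulary words before and after the event date."""
    before, after = Counter(), Counter()
    for day, text in dated_tweets:
        target = before if day < event_date else after
        # Count each vocabulary word at most once per tweet.
        target.update(set(text.lower().split()) & vocabulary)
    return before.most_common(n), after.most_common(n)


# Invented tweets, for illustration only.
dated_tweets = [
    (date(2020, 3, 10), "customer care call not answered"),
    (date(2020, 3, 12), "call customer service please"),
    (date(2020, 3, 28), "loan moratorium please"),
    (date(2020, 4, 2), "emi due loan moratorium request"),
]
vocabulary = {"customer", "care", "call", "service", "loan", "moratorium", "due"}

before_top, after_top = top_words_by_period(
    dated_tweets, vocabulary, event_date=date(2020, 3, 22)
)
print(before_top)
print(after_top)
```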

Figure 3: Words that most frequently appeared in tweets before and after lockdown was announced

Analysing each cluster

Using the K-means clustering exercise, we find that the most frequent words in each cluster reflect three qualitative categories: transaction-related, branch-related, and a miscellaneous group, "others". We find a meaningful increase in the number of tweets in the transaction-related cluster (on average, about two percentage points), and a reduction in branch-related concerns. While this may seem natural, the nature of transaction-related concerns also changed towards liquidity, credit, and moratoriums.
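A change "in percentage points" is the difference between each cluster's share of tweets after and before the event date. A small sketch, using made-up cluster labels rather than the study's data:

```python
from collections import Counter


def cluster_shares(labels):
    """Share of tweets in each cluster, in percent."""
    counts = Counter(labels)
    return {cluster: 100 * n / len(labels) for cluster, n in counts.items()}


def share_change(before_labels, after_labels):
    """Change in each cluster's share, in percentage points (after minus before)."""
    before = cluster_shares(before_labels)
    after = cluster_shares(after_labels)
    clusters = set(before) | set(after)
    return {c: after.get(c, 0.0) - before.get(c, 0.0) for c in clusters}


# Invented cluster assignments for tweets before and after the event date.
before_labels = ["transaction"] * 50 + ["branch"] * 30 + ["others"] * 20
after_labels = ["transaction"] * 52 + ["branch"] * 27 + ["others"] * 21

change = share_change(before_labels, after_labels)
print(change)
```

Because the shares in each period sum to 100, the changes across clusters sum to zero: a rise in one cluster's share is necessarily matched by a fall elsewhere.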

The words appearing in the transaction-related cluster before the lockdown announcement were 'call', 'branch', 'service', 'help', 'loan', 'work', 'one', 'credit', 'care', 'issue', while the terms after the announcement were 'loan', 'moratorium', 'help', 'deduct', 'time', 'pay', 'due', 'request', 'refund', 'charge'. This also indicates a shift in customers' concerns from general to more lockdown-specific ones. Transaction-related tweets after the lockdown announcement highlight the concerns of customers who have to pay Equated Monthly Installments (EMI) during this period. A few examples of this category are:

"Request @YESBANK @TheOfficialSBI to provide interest free moratorium for EMIs due to #COVID19outbreak till normal business atmosphere is restored."

"dear @TheOfficialSBI Do you give a 2 month moratorium on loan repayments? Critical for daily wage, farmers and entrepreneurs at this point of time! The salaried class does not get impacted."

Branch-related concerns pertain to complaints or queries specific to a bank branch. These are mostly centred around the customer service provided by banks at their branches. A few examples are:

"@ICICIBank @ICICIBank_Care Very poor service delivery by ICICI Bank Bhabua Branch, Bihar. They are not taking customer concern seriously and only bypass the customer issues they are least bothered to help to the customer. How Bank like ICICI hire these kind of senseless people's."

"@HDFC_Bank @HDFCBank_Cares Due to some reasons my account was blocked including my net banking. Now to reactivate it I need to visit branch. But due to curfew in my area I am not able to visit branch. Please help me out. Thank you"

The decrease in the terms appearing in this category indicates that, with access to branches limited by the lockdown, branch-related concerns have reduced.

A limited exercise such as this highlights an often-seen challenge in India. The Reserve Bank of India had announced various measures to ease the economic constraints for households in India. However, banks did not follow suit and communicate these measures well enough, soon enough. Naturally, annual aggregate information on consumer complaints cannot capture the timing of consumer grievances. An approach to monitoring platforms where the public air their grievances may, therefore, be fruitful in understanding the speed and extent to which regulatory actions translate into ground realities.

Conclusion and limitations

An analysis of the Twitter feed suggests that there were substantial frictions faced in the access to and use of financial products. While K-means clustering is a straightforward approach to classifying tweets, more advanced approaches will allow for separating grievances from "opinions", improving the precision of these estimates. Twitter information, though helpful as a "leading indicator", is representative only if the sole difference between sub-groups of the Indian population lies in their channels and language of communication. However, this is not necessarily true, especially for the previously unbanked population. Therefore, we need a far more systematic measurement of grievance incidence if we are ever to be able to resolve these grievances. Twitter, however, can complement this systematic measurement infrastructure with a timely indication of pressure points in the retail financial infrastructure.

India has made significant strides on financial inclusion. However, our progress on building systems of grievance redress is limited. Universal financial inclusion will generate more frictions, especially for households that have either never used formal finance or have limited dependence on it. One such example is that of migrants, daily wage earners and other informal sector workers, who are dependent on cash withdrawals and are facing problems with new payment systems such as Aadhaar Enabled Payment System (AePS) transactions. The grievance landscape is certainly more complex than what Twitter may capture. Frictions faced by households may be as simple as money getting stuck in an ATM, as complex as being mis-sold an insurance policy, or as devastating as stolen bank deposits. If there is no recourse for solving these problems, then not only is there a welfare loss to the individual concerned, but also damage to the larger trust in the system. Sustained universal financial inclusion, therefore, requires investments in systems of grievance redress.

Renuka Sane is Associate Professor, Kusan Biswas and Mithila A. Sarah are Research Fellows, NIPFP. Vimal Balasubramaniam is a researcher at Queen Mary University, London. We thank Hemen Sampat for useful comments.

The views expressed in the post are those of the authors only. No responsibility for them should be attributed to NIPFP.

This article was first published in The Leap Blog on May 14, 2020.

Estimating customer complaints using Twitter feeds