Acquiring Valuable Data Insights via Hadoop

Anup Purohit, CIO, Yes Bank | Wednesday, 13 September 2017, 10:02 IST

We are living in what I like to call the ‘Data Age’, where data has become the single most valuable asset. Picture the volume of data gathered by social media sites alone: millions of check-ins on Foursquare, billions of tweets and an even greater number of Facebook likes and posts. With the rapid growth in digital transactions driven by increased digital enablement, the consumer is leaving a rich trail of data all across the digital world. Linking information from these hitherto untapped outside sources with the organization's own information systems has become indispensable for building an ‘ANALYTICAL KYC’ of the customer. This is the backbone for delivering a personalized sales and service experience to the customer and achieving consumer delight.

How do we analyze this huge volume of data? Are organizations technologically equipped to digest everything on the plate? And are the objectives really clear about what is to be done with this kind of data?

Looking at the traditional landscape, we find that existing database technologies, though quite reliable and effective, are nowhere near capable of handling the volumes of information in circulation today. Relying on these technologies alone would be futile, since they do not scale easily, burn ever deeper holes in organizational pockets and are unable to analyze disparate sources of data. Traditional RDBMS systems took years to mature, but people now iterate and deliver solutions in a matter of months, or even weeks. This calls for organizations to adopt newer technologies that are highly scalable, have parallel processing capabilities and can handle the diversity of the data. This agility and flexibility is the need of the hour, and organizations have to be extremely nimble-footed to ‘Carpe Diem - Seize the Opportunity’.

On scouting the landscape for a one-stop solution to the problem, Hadoop emerges as a front-runner. It provides a technology platform to handle the three most critical business problems:

1. Scalability with Reliability

2. Flexibility and Versatility

3. Optimized Costs

Hadoop provides a scale-out, fault-tolerant architecture with enough technologies in its arsenal to help businesses with most of their data analysis needs. Running on commodity servers rather than specialized hardware results in substantial cost savings on the infrastructure required to analyze these large data sets.
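
To make the scale-out idea a little more concrete, here is a minimal sketch in plain Python of the map-and-reduce style of processing that Hadoop popularized. It is an illustration only and does not use Hadoop itself: the function names, the sample records and the use of threads to stand in for cluster nodes are all assumptions made for this example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(records, block_size):
    """Cut the input into fixed-size blocks, loosely as a distributed file system would."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def map_block(block):
    """Map step: each worker counts event types within its own block only."""
    return Counter(event for _customer, event in block)


def reduce_counts(partials):
    """Reduce step: merge the per-block counts into a single result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def count_events(records, block_size=2, workers=3):
    """Process blocks in parallel (threads stand in for nodes), then combine."""
    blocks = split_into_blocks(records, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_block, blocks))
    return reduce_counts(partials)


# Hypothetical (customer, event) records, invented for illustration.
records = [
    ("c1", "tweet"), ("c2", "check-in"), ("c1", "like"),
    ("c3", "tweet"), ("c2", "like"), ("c1", "tweet"),
]
totals = count_events(records)
```

Because every block is processed independently, adding more workers (or, on a real cluster, more commodity servers) allows more data to be handled without changing the logic. The sketch leaves out fault tolerance, which on a real cluster comes from replicating data blocks and re-running failed tasks.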

One of the most common buzzwords in Hadoop circles is the setting up of the organization's ‘Data Lake’. As the name implies, this stores all the disparate data for the Hadoop ecosystem – from structured data sets to completely unstructured objects such as images and videos. But do we jump into this data lake with both feet? Well, knowing the potential of Hadoop, I already have!

Developments in Hadoop technologies are gaining momentum, with innovations happening across the ecosystem. New technologies are emerging to handle areas ranging from traditional integrations to search-based technologies. The ecosystem is now broad enough to address everything from traditional use cases such as data warehousing and ETL to cutting-edge use cases in Artificial Intelligence, real-time streaming applications and IoT.

Having said that, Hadoop is only about ten years old and is still maturing by the day. Even today, it is able to solve most of these problems, and its capabilities will only grow with the passage of time. I see a lot of Hadoop distributions augmenting the base stack with enterprise-level capabilities in areas such as monitoring, access control, encryption, security and the lineage of data from source to destination. And this space will keep evolving rapidly.

However, a couple of cautions before signing off:

Firstly, Hadoop is not a mantra for everything. It is not a replacement for the existing relational databases that serve most applications. To formulate a blind strategy of replacing all these mature RDBMSes with Hadoop in the current scenario would be a fool’s errand.

Secondly, and more importantly, there has to be a clear direction and purpose to your deployment of Hadoop. To get the best out of your Hadoop initiative, you should know not only your question but also the path to be taken to get the answers, and whether those answers make sense. It is of practical importance to judge in hindsight whether the results are consistent with business logic and to establish a cause-and-effect relationship between the problem and the solution.

Remember the quote attributed to Ronald Coase: “Torture the data, and it will confess to anything.”
