Monday, June 17, 2013

Open Access to Big Data a Major Driver of Value for eBay

At Teradata's Big Data Analytics Summit, held recently in Sydney, Alex Liang of eBay (Director of the offshore Analytics Platform and Delivery) presented on the company's big data ecosystem. His description has to be taken in the context of eBay: a company whose business is its website, a marketplace on which it aims to match customers' desires with sellers' products. This takes place on a massive scale, with a requirement for 99.9+ percent availability.

Despite this challenging environment, eBay is committed to the democratisation of data, that is, making data available to a large number of employees to query, predict and experiment. To this end it has a decentralised data management system which allows employees to create a virtual datamart, ingest data, identify a trend or gain an insight, form a hypothesis, design an experiment to test that hypothesis, implement that experiment on the eBay website (via A/B testing), measure the results and undo the changes if necessary, all with a great deal of autonomy. However, with freedom comes responsibility, so employees using data in this way are also responsible for the results they generate from their data analyses.
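To make the "measure the results" step concrete, here is a minimal sketch of how the outcome of an A/B test might be checked, using a standard two-proportion z-test. This is not eBay's method, which Alex did not describe in detail; the function name and the visitor and conversion counts are hypothetical.

```python
import math

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / n_a is the control group, conv_b / n_b is the variant.
    Uses the pooled-proportion standard error and a normal approximation.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 1000 visitors per arm, 100 vs 120 conversions.
z, p = two_proportion_z_test(100, 1000, 120, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A large p-value would suggest the observed difference could be noise, which is the point at which an employee might decide to undo the change.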

Can this approach be applied to other companies? As a general principle it can: the philosophy of allowing a larger number of users access to the valuable data held by a company can, if implemented well, lead to outstanding results. The approach of allowing users to run free on the data (within certain well-defined limits, of course), only reining them in when they approach those limits, can allow companies to exploit the value of big data far more effectively. There is a profound philosophical difference between giving users wide access to data and placing restrictions only where needed, as opposed to starting with very limited access and adding to it if and only if there is a compelling business need (somewhat analogous to the difference between continental and English common law systems).

Has this philosophy of making data as widely available as possible taken root in Australia? Based on the number of questions asked during the conference about how to restrict access and monitor behaviors, I would say we still have a long way to go. Of course there is a need to balance free access to data with appropriate restrictions, and Alex outlined eBay's approach to implementing those restrictions, including permissions, automated monitoring, automated retirement of cold datamarts, productionisation of hot datamarts and the need to pass an exam to get access to Teradata. But it is telling that most questions were about how to manage restrictions rather than about the benefits of open access to data.
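As an illustration only, the idea of retiring cold datamarts and productionising hot ones could be sketched as a simple usage-based rule. The thresholds, field names and the `classify` function below are my own assumptions, not details given in the talk.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Datamart:
    name: str
    last_queried: date
    queries_last_30_days: int

# Hypothetical thresholds; a real policy would be tuned to the business.
COLD_AFTER = timedelta(days=90)
HOT_QUERY_COUNT = 1000

def classify(mart: Datamart, today: date) -> str:
    """Suggest an action for a datamart based on how it is being used."""
    if today - mart.last_queried > COLD_AFTER:
        return "retire"          # nobody has touched it for a long time
    if mart.queries_last_30_days >= HOT_QUERY_COUNT:
        return "productionise"   # heavily used, worth hardening
    return "keep"

today = date(2013, 6, 17)
marts = [
    Datamart("old_experiment", date(2013, 1, 1), 0),
    Datamart("seller_trends", date(2013, 6, 16), 5000),
    Datamart("pricing_test", date(2013, 6, 1), 10),
]
for mart in marts:
    print(mart.name, "->", classify(mart, today))
```

The appeal of a rule like this is that it restricts by exception: users create datamarts freely and the system only steps in when usage shows a datamart is either abandoned or important.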

Another interesting use of data at eBay is the meta-analysis of the queries being submitted. Alex described a program that uses Python to analyze the queries being submitted to Teradata, Singularity and Hadoop. The aim is to identify sub-optimal queries, but also to identify commonly requested information and to develop more efficient ways to deliver it to users. This is an example of a growing trend of data generating data: the data in the warehouse indirectly causes users to write queries, which themselves become data that can be analyzed.
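To give a flavour of what such query meta-analysis might look like, here is a minimal Python sketch. It is not eBay's program; the normalisation rules, the two "sub-optimal" heuristics and all function names are assumptions chosen for illustration. It reduces each query to a template so that commonly requested information shows up as a frequent template, and flags a couple of patterns that often indicate an expensive query.

```python
import re
from collections import Counter

def normalize(query: str) -> str:
    """Reduce a query to a template by replacing literals with '?'."""
    q = query.strip().rstrip(";").lower()
    q = re.sub(r"'[^']*'", "?", q)           # string literals
    q = re.sub(r"\b\d+(\.\d+)?\b", "?", q)   # numeric literals
    q = re.sub(r"\s+", " ", q)               # collapse whitespace
    return q

def flag_issues(query: str) -> list:
    """Return simple heuristic warnings for a query (illustrative only)."""
    q = normalize(query)
    issues = []
    if re.search(r"select\s+\*", q):
        issues.append("select-star")
    if q.startswith("select") and " where " not in f" {q} ":
        issues.append("no-where-clause")
    return issues

def summarize(log: list):
    """Count query templates and collect the queries that raise warnings."""
    templates = Counter(normalize(q) for q in log)
    flagged = {q: flag_issues(q) for q in log if flag_issues(q)}
    return templates, flagged

# Hypothetical query log.
log = [
    "SELECT * FROM items WHERE price > 10;",
    "select * from items where price > 25",
    "SELECT id FROM users",
]
templates, flagged = summarize(log)
print(templates.most_common(1))
print(flagged)
```

Frequent templates are candidates for a shared, pre-built table or view, while flagged queries are candidates for a quiet word with their author.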

Ross Farrelly is the Chief Scientist for Teradata Date ANZ and is responsible for data mining, analytics and advanced modeling projects using the Aster Teradata platform. He is a Six Sigma Black Belt and has many years of experience in a variety of statistical roles.

