Saturday 2 p.m.–2:40 p.m.

PygData : Python in BigData

Krupesh Desai

Audience level:


A concise overview of the term "Big Data" with simple examples, followed by a look at the areas where Python is widely used in big data solutions.


Python is growing by leaps and bounds as a language for building big data solutions, powering every stage from data procurement through APIs to data cleansing and analysis. When someone says "Big Data" in a group discussion or staff meeting, is everyone in the room thinking about the same thing? Probably not. This talk will help clear the cloud around the term "Big Data". While we’ll mostly cover several aspects of big data solutions, we’ll also see how Python can be used to develop them and what makes Python well suited to the job.

After a concise overview of big data, including the MapReduce programming model, Hadoop and the cloud, the talk will move on to two main sections with code snippets: solving a big data problem and doing data science with Python. We will first explore the powerful mrjob package for processing large datasets. Data science and big data are two sides of the same coin, so the second section will give an overview of machine learning and how it can be done with Python.
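
As a rough illustration of the MapReduce programming model mentioned above, here is a minimal in-process sketch of a word count in plain Python. It is not taken from the talk and does not use mrjob or Hadoop; the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input are made up for this example. It only shows the shape of the three phases: map emits key/value pairs, shuffle groups them by key, and reduce collapses each group.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as a framework would do."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in-process on an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, chosen only for illustration.
    sample = ["big data is big", "python handles big data"]
    print(word_count(sample))
```

In mrjob, the same idea is typically written as mapper and reducer methods on a subclass of MRJob, with the framework taking care of the shuffle step and of distributing the work; the sketch above keeps everything in one process so the data flow is easy to follow.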