PyCon 2014

Open Data with Python and the IPython Notebook

Julia Evans


Excerpt from the automatic transcript of the video, generated by YouTube.

Okay, thank you everyone for coming along to this first after-lunch session at PyCon 2014 here in Montreal. We've got three really excellent speeches all lined up for you. Our first presenter is a programmer and data scientist based here in Montreal. She works

on Stripe's data team and is a co-organizer of PyLadies Montreal and the Montreal All-Girl Hack Night. To speak to us today about diving into open data with IPython Notebook and pandas, please welcome Julia Evans. All right. So today, what I want to talk to you

about: I work on a data team, which means I work with data, and I have a bunch of different tools that I use. Sometimes I use Hadoop, sometimes I use Python, sometimes I use R, once. And my favorite tools to use, the things that I enjoy

using the most, are Python tools. I enjoy using IPython Notebook and pandas together to answer questions about my data, and what I want to explain in this talk is why they're my favorite, and why they could maybe be your favorite if you don't use them

already. So the way this talk is structured (it's kind of small, it's not gonna get better, apparently) is: step one, what are IPython Notebook and pandas? What do those words mean? You may know a little bit already if you watched Fernando Pérez's excellent

keynote this morning. Most of my time is going to be spent on practical examples of how to use these to answer questions about your data, and then I'll end with a little bit of advice about where to go next if you want to learn more. So the first

question is: what are IPython Notebook and pandas and NumPy, and how do these scientific computing tools work together? So the first thing I want to talk about is IPython Notebook. IPython Notebook is a web-based user interface

for Python, and you may think, why would I want that? You can already write Python, maybe. Why would you want it to be in a web page? So the way IPython Notebook works is it runs as a

server on your machine, and then you open a new notebook. I'm going to give you a really short demo instead of listing a bunch of features. The most important thing, which isn't always obvious, is that when you use IPython Notebook

you can write any Python code that you would normally write. It's not a limited environment where there are only certain things you can do. It's just Python: you can do any Python things that you would otherwise do. Like, you

could connect to databases; you can do everything. One thing that I really like about it: let's say I import something and then I type string.split and I want to know how it works. Normally you would look up

the documentation. Here it sees that I've hesitated and it's like, would you like some documentation? And then I press Tab again and it's like, oh, would you like even more? And then I press Tab again because I'm kind of frustrated, and it pops up all of

the documentation at the bottom, explaining how split works for me. I can resize it, and I didn't have to leave, and I didn't even have to know about it; it just came up for me. So I find it a really useful interactive tool for doing exploratory

work. I can do string.split, right, like this, one, two, three, and then be like, oh, that wasn't what I wanted, maybe I need to tell it to split on a comma. Oh, I see. So my workflow around this is often: I'll do something, do it wrong, as we

do, and then change it, and I can iterate really quickly. So that's IPython Notebook. The next thing I want to talk about is pandas. Often I will do some kind of data analysis: I'll have a dataset with every complaint call ever made

in New York, for fun, and let's say I want to know, for every hour, how many noise complaints were made, except I would like to split it out by borough, and also maybe I only care about the period in September

and December. So I want to do a lot of filtering and aggregating, and you might think that you have to write a loop. If you were writing a program, that would be something that could happen. With pandas you don't have to write

loops, and also, doing something like: read all the noise complaints, filter out the things you don't want, and then graph the number each hour, is like five lines of code. So it's really compact, it's really

easy. But it's not super obvious how to get started, so that's what I'm going to explain. But pandas is kind of the tool that lets you do all this filtering and parsing. So pandas is built on top of NumPy. What is NumPy? If

you went to Brandon Rhodes's talk yesterday about Python data structures, you will know that if you have a Python list with, like, five million things, and you iterate through that list, you have to make five million objects, which takes time even though computers

are fast. But saying NumPy is faster, that's not specific enough. So I wanted to do an example, and the example was: I'll take a whole bunch of numbers and then find the sum of

the squares. So I did it in Python. I wrote a pretty simple example. It took 1.2 seconds to add up all those numbers squared, which is pretty good; computers are fast, and that was a big number. But if I do it in NumPy it

takes like 83 milliseconds, which is like 15 times faster. So when I say NumPy is faster, I mean NumPy is like 15 times faster; that's what you should be thinking about, or more. And the reason it's faster is because it implements

all the operations, like, it stores your data structures as, like, series, and then when you operate on NumPy arrays you do it all at once. So instead of writing a for loop, you'll just say, sum all these things, and you just

[ ... ]
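
The sum-of-squares comparison described in the excerpt can be sketched roughly as follows. This is a minimal illustration, not the code shown in the talk: the size `N` and both function names are assumptions, and the timings on any given machine will differ from the 1.2 seconds and 83 milliseconds the speaker quotes.

```python
import timeit

import numpy as np

# Hypothetical size: the talk does not say how many numbers were summed.
N = 1_000_000


def sum_of_squares_python(n):
    """Sum the squares of 0..n-1 with an explicit Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def sum_of_squares_numpy(n):
    """Same computation on a NumPy array, with no Python-level loop."""
    # int64 is requested explicitly so the total cannot overflow at this size.
    values = np.arange(n, dtype=np.int64)
    return int((values * values).sum())


# Time one run of each version; absolute numbers depend on the machine.
python_seconds = timeit.timeit(lambda: sum_of_squares_python(N), number=1)
numpy_seconds = timeit.timeit(lambda: sum_of_squares_numpy(N), number=1)
print(f"pure Python: {python_seconds:.4f} s")
print(f"NumPy:       {numpy_seconds:.4f} s")
```

Both functions return the same total; only the NumPy version hands the whole array to a single vectorized operation instead of visiting each number in a Python loop.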

Note: the other 3,042 words of the full transcript have been omitted to comply with YouTube's "fair use" rules.