PyCon US Portland 2016

2016-06-06

Here are some notes from PyCon this year in Portland, Oregon. It was my second time in Portland.

THE COBBLER’S CHILDREN HAVE NO SHOES, OR BUILDING BETTER TOOLS FOR OURSELVES

Talk by Alex Gaynor.

  • Good talk on writing CLIs for internal use: turning functionality into scripts and using argparse to define the interface.
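
As a rough illustration of the argparse point above, here is a minimal sketch of an internal script with a defined interface. The tool name, arguments and message are made up for the example and are not from the talk.

```python
import argparse


def build_parser():
    # Hypothetical internal tool: the name and options are invented for illustration.
    parser = argparse.ArgumentParser(
        prog="resize-images",
        description="Resize every image in a directory.",
    )
    parser.add_argument("directory", help="directory containing the images")
    parser.add_argument("--width", type=int, default=800, help="target width in pixels")
    parser.add_argument("--dry-run", action="store_true", help="only print what would be done")
    return parser


def main(argv=None):
    # argv=None makes argparse read sys.argv; passing a list makes testing easy.
    args = build_parser().parse_args(argv)
    action = "Would resize" if args.dry_run else "Resizing"
    message = f"{action} images in {args.directory} to {args.width}px wide"
    print(message)
    return message
```

In a real script, main() would be called from an `if __name__ == "__main__":` guard, and `--help` output comes for free from the help strings.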

LASER CUTTERS, 3D PRINTERS, AND PYTHON

  • SVG files are just XML! Way simpler than I thought.
  • Check out OpenSCAD software.
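
To convince myself that SVG really is just XML, here is a small sketch that builds one with the standard library's XML module and parses it back. The helper name, the square shape and the stroke settings are my own assumptions, not something shown in the talk.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def make_square_svg(size_mm=50):
    # Hypothetical helper: a square outline of the kind a laser cutter might follow.
    ET.register_namespace("", SVG_NS)
    svg = ET.Element(
        f"{{{SVG_NS}}}svg",
        {
            "width": f"{size_mm}mm",
            "height": f"{size_mm}mm",
            "viewBox": f"0 0 {size_mm} {size_mm}",
        },
    )
    ET.SubElement(
        svg,
        f"{{{SVG_NS}}}rect",
        {
            "x": "0",
            "y": "0",
            "width": str(size_mm),
            "height": str(size_mm),
            "fill": "none",
            "stroke": "black",
            "stroke-width": "0.1",
        },
    )
    return ET.tostring(svg, encoding="unicode")


svg_text = make_square_svg()
# Parsing the text back shows it is ordinary XML with one child element.
root = ET.fromstring(svg_text)
```

Writing `svg_text` to a `.svg` file should give something a browser or vector editor can open.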

VISUAL DIAGNOSTICS FOR MORE INFORMED MACHINE LEARNING: WITHIN AND BEYOND SCIKIT-LEARN

  • More about how good scikit-learn is.
  • More general machine learning talk.
  • Pandas for loading data. What is it?
  • She thinks that the future of machine learning practitioners looks like her: no formal training. That is because of Python and scikit-learn, in part.
  • Informed machine learning is hard.

BETTER TESTING WITH LESS CODE: PROPERTY BASED TESTING WITH PYTHON

  • Testing frameworks. Can generate data for you, and you can massage it to make the data fit your design.
  • Property-based testing with Hypothesis helps find bugs.
  • Look at the slides! They should have some good links.
  • Can bump up the number of generated tests in Travis, i.e. have it run quickly locally.
  • Recommended splitting things up: run fast unit tests and property-based tests separately.
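
To make the idea concrete, here is a hand-rolled sketch of a property-based test using only the standard library. It is not Hypothesis, which does the data generation (and much more) for you. The run-length encoding functions are a made-up example of code under test.

```python
import random


def run_length_encode(text):
    # Example function under test (hypothetical, not from the talk).
    encoded = []
    for char in text:
        if encoded and encoded[-1][0] == char:
            encoded[-1] = (char, encoded[-1][1] + 1)
        else:
            encoded.append((char, 1))
    return encoded


def run_length_decode(pairs):
    return "".join(char * count for char, count in pairs)


def check_round_trip(num_examples=200, seed=0):
    # Property: decoding an encoded string gives back the original string.
    rng = random.Random(seed)
    for _ in range(num_examples):
        length = rng.randint(0, 20)
        text = "".join(rng.choice("ab ") for _ in range(length))
        assert run_length_decode(run_length_encode(text)) == text, text
    return num_examples
```

The `num_examples` parameter is where the "quick locally, thorough on CI" idea would plug in: a small number on a laptop and a larger one in the CI job.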

PROTOTYPING NEW APIS WITH FLASK

  • Check out marshmallow for data validation.
  • Also use flask_login for login. There’s another one that was recommended by that British guy. It provides login pages that you can override, etc. Not sure if this was flask_login or not.

BUILD SERVERLESS REALTIME DATA PIPELINES WITH PYTHON AND AWS LAMBDA

  • AWS Lambda: you only pay for the time the function runs.
  • Could be good for realtime calculations for EyeSpace 3
  • Stateless, scalable.
  • Looks like you need to upload by putting code and dependencies into a zip file. Could this work with numpy?
  • Deployment: can run the AWS CLI from a Travis CI job. How does this fit in with the zip file above?
  • What is API Gateway?
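
For reference, a Python Lambda function is just a function that takes an event and a context. Below is a minimal sketch of a stateless handler. The event shape (a "readings" list) and the response format are assumptions made up for this example, not something from the talk.

```python
import json


def handler(event, context):
    # AWS Lambda calls the configured function with (event, context).
    # The "readings" key is a hypothetical input format for this sketch.
    readings = event.get("readings", [])
    if not readings:
        return {"statusCode": 400, "body": json.dumps({"error": "no readings"})}
    mean = sum(readings) / len(readings)
    return {
        "statusCode": 200,
        "body": json.dumps({"count": len(readings), "mean": mean}),
    }


# Local smoke test: nothing is kept between calls, so the handler can be
# exercised like any other function, without AWS.
result = handler({"readings": [1.0, 2.0, 3.0]}, None)
```

Because the handler keeps no state between calls, it can be tested locally before worrying about the zip file and deployment questions.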

WHEN IS IT GOOD TO BE BAD? WEB SCRAPING AND DATA ANALYSIS OF NHL PENALTIES

  • Need to look at pandas!
  • Beautiful Soup.
  • “The brother problem”. I like the idea of naming a problem.

STATISTICS FOR HACKERS

  • Basically, this talk looks at how you can “simulate” the scenario to calculate a p-value.
  • Shuffling approach works when the null hypothesis assumes two groups are equivalent.
  • Bootstrap sampling kinda works by resampling the data.
  • Protect against overfitting by cross validation. It is the go-to method for machine learning where you can’t really do an analytic analysis.
  • Direct simulation method used in astronomy. Crazy example of simulating the night sky!
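
Here is my rough sketch of the shuffling and bootstrap ideas above, using only the standard library. The function names, the one-sided comparison and the sample numbers are my own assumptions, not code from the talk.

```python
import random
import statistics


def mean(values):
    return sum(values) / len(values)


def shuffle_p_value(group_a, group_b, num_shuffles=10000, seed=0):
    # Null hypothesis: the group labels do not matter. Shuffle the pooled
    # data and count how often the shuffled difference in means is at
    # least as large as the observed one (a one-sided test).
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    split = len(group_a)
    at_least_as_large = 0
    for _ in range(num_shuffles):
        rng.shuffle(pooled)
        if mean(pooled[:split]) - mean(pooled[split:]) >= observed:
            at_least_as_large += 1
    return at_least_as_large / num_shuffles


def bootstrap_std_error_of_mean(data, num_resamples=10000, seed=0):
    # Resample the data with replacement many times and look at how much
    # the mean varies between resamples.
    rng = random.Random(seed)
    means = [mean(rng.choices(data, k=len(data))) for _ in range(num_resamples)]
    return statistics.stdev(means)


# Made-up numbers, purely for illustration.
p_value = shuffle_p_value([10, 11, 12, 13, 14], [1, 2, 3, 4, 5])
std_error = bootstrap_std_error_of_mean([10, 11, 12, 13, 14])
```

With two groups this clearly separated, only a tiny fraction of shuffles match the observed difference, so the estimated p-value comes out small.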

Other notes

  • GitHub Issues is becoming popular; we should probably switch to it for our internal issue tracker.
  • Visual screenshot diffing for tests could be a good idea.