PyCon US Portland 2016
2016-06-06
Here are some notes from PyCon this year in Portland, Oregon. It was my second time in Portland.
THE COBBLER’S CHILDREN HAVE NO SHOES, OR BUILDING BETTER TOOLS FOR OURSELVES
- Good talk on writing CLIs for internal use - turning functionality into scripts and using argparse to define the interface
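To make the argparse idea concrete, here is a minimal sketch of wrapping a piece of functionality in a script. The tool name `resize-images` and its options are made up for illustration and are not from the talk.

```python
import argparse


def build_parser():
    """Define the command-line interface for a hypothetical internal tool."""
    parser = argparse.ArgumentParser(
        prog="resize-images",
        description="Resize every image in a directory (illustrative only).",
    )
    parser.add_argument("directory", help="directory containing the images")
    parser.add_argument("--width", type=int, default=800,
                        help="target width in pixels")
    parser.add_argument("--dry-run", action="store_true",
                        help="print what would be done without doing it")
    return parser


def main(argv=None):
    """Parse arguments (sys.argv when argv is None) and report the action."""
    args = build_parser().parse_args(argv)
    action = "Would resize" if args.dry_run else "Resizing"
    message = f"{action} images in {args.directory} to {args.width}px wide"
    print(message)
    return message

# In a real script, finish with:
#     if __name__ == "__main__":
#         main()
```

Passing `argv` explicitly, e.g. `main(["photos", "--width", "640", "--dry-run"])`, makes the interface easy to test without touching `sys.argv`.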
LASER CUTTERS, 3D PRINTERS, AND PYTHON
- SVG files are just XML! Way simpler than I thought.
- Check out OpenSCAD software.
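Since SVG is just XML, a file can be generated with nothing but the standard library. This is a rough sketch, not something shown in the talk: the function name, the 50 mm size, and the hairline stroke are all assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def make_square_svg(size_mm=50):
    """Return an SVG document (as a string) containing one outlined square."""
    svg = ET.Element("svg", {
        "xmlns": SVG_NS,
        "width": f"{size_mm}mm",
        "height": f"{size_mm}mm",
        "viewBox": f"0 0 {size_mm} {size_mm}",
    })
    # A square inset 5 units from each edge, drawn as an outline only.
    ET.SubElement(svg, "rect", {
        "x": "5", "y": "5",
        "width": str(size_mm - 10), "height": str(size_mm - 10),
        "fill": "none", "stroke": "black", "stroke-width": "0.1",
    })
    return ET.tostring(svg, encoding="unicode")
```

Writing the returned string to a `.svg` file gives something any vector editor can open.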
VISUAL DIAGNOSTICS FOR MORE INFORMED MACHINE LEARNING: WITHIN AND BEYOND SCIKIT-LEARN
- More about how good scikit-learn is.
- More general machine learning talk.
- pandas was used for loading data. What is it?
- She thinks that the future of machine learning practitioners looks like her: no formal training. That is possible because of Python and scikit-learn, in part.
- Informed machine learning is hard.
BETTER TESTING WITH LESS CODE: PROPERTY BASED TESTING WITH PYTHON
- Testing frameworks can generate data for you, and you can massage it to make the data fit your design.
- Property-based testing with Hypothesis: it helps find bugs.
- Look at the slides! They should have some good links.
- Can bump up the number of generated tests on Travis, i.e. keep the count low locally so it runs quickly.
- Recommended splitting them up: run fast unit tests and property-based tests separately.
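To remind myself what a "property" looks like, here is a hand-rolled sketch using only the standard library. It is not Hypothesis itself, which handles the data generation (and shrinking of failing cases) for you; the run-length encoder is a made-up example, and the `examples` parameter is just a stand-in for the idea of running more cases on CI than locally.

```python
import random


def run_length_encode(text):
    """Encode 'aaab' as [('a', 3), ('b', 1)]."""
    encoded = []
    for char in text:
        if encoded and encoded[-1][0] == char:
            encoded[-1] = (char, encoded[-1][1] + 1)
        else:
            encoded.append((char, 1))
    return encoded


def run_length_decode(pairs):
    """Invert run_length_encode."""
    return "".join(char * count for char, count in pairs)


def random_text(rng, max_length=20):
    """Generate a short random string over a tiny alphabet to force repeats."""
    length = rng.randint(0, max_length)
    return "".join(rng.choice("ab") for _ in range(length))


def check_round_trip(examples=200, seed=0):
    """Property: decoding an encoded string gives back the original."""
    rng = random.Random(seed)
    for _ in range(examples):
        text = random_text(rng)
        assert run_length_decode(run_length_encode(text)) == text, text
    return examples
```

The property is stated once and checked against many generated inputs, instead of writing each input by hand.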
PROTOTYPING NEW APIS WITH FLASK
- Check out marshmallow for data validation.
- Also use flask_login for login. There’s another one that was recommended by that British guy. It provides login pages that you can override, etc. Not sure whether this was flask_login or not.
BUILD SERVERLESS REALTIME DATA PIPELINES WITH PYTHON AND AWS LAMBDA
- AWS Lambda: you only pay for the time the function runs.
- Could be good for realtime calculations for EyeSpace 3
- Stateless, scalable.
- Looks like you need to upload by putting code and dependencies into a zip file. Could this work with numpy?
- Deployment: can run the AWS CLI from a Travis CI job. How does this fit in with the zip file above?
- What is API Gateway?
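For reference, a Python Lambda function is a plain function that receives an event and a context object. The sketch below is a guess at what a stateless realtime calculation could look like; the `readings` field in the event and the returned keys are assumptions for illustration, not anything from the talk.

```python
def lambda_handler(event, context):
    """Entry point Lambda would call; holds no state between invocations."""
    # "readings" is a hypothetical field name for the incoming numbers.
    readings = event.get("readings", [])
    if not readings:
        return {"error": "no readings supplied"}
    mean = sum(readings) / len(readings)
    return {"count": len(readings), "mean": mean}
```

Because everything the function needs arrives in `event`, it can be exercised locally with a plain dict, e.g. `lambda_handler({"readings": [1, 2, 3]}, None)`.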
WHEN IS IT GOOD TO BE BAD? WEB SCRAPING AND DATA ANALYSIS OF NHL PENALTIES
- Need to look at pandas!
- Beautiful Soup.
- “The brother problem”. I like the idea of naming a problem.
STATISTICS FOR HACKERS
- Basically, this talk looks at how you can “simulate” the scenario to calculate a p-value.
- Shuffling approach works when the null hypothesis assumes two groups are equivalent.
- Bootstrap sampling works by resampling the data (with replacement).
- Protect against overfitting with cross-validation. It is the go-to method in machine learning, where you can’t really do an analytic analysis.
- Direct simulation method used in astronomy. Crazy example of simulating the night sky!
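The shuffling and bootstrap ideas above can be sketched with just the standard library. This is my own reconstruction rather than the speaker's code: the function names, the one-sided difference-of-means test, and the percentile interval are all assumptions.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, trials=10_000, seed=0):
    """One-sided p-value for mean(a) - mean(b) under 'groups are equivalent'."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(trials):
        # If the labels do not matter, any relabelling is equally likely.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / trials


def bootstrap_mean_interval(data, trials=10_000, seed=0, level=0.95):
    """Percentile interval for the mean, resampling with replacement."""
    rng = random.Random(seed)
    means = sorted(
        mean(rng.choices(data, k=len(data))) for _ in range(trials)
    )
    lower_index = int((1 - level) / 2 * trials)
    upper_index = int((1 + level) / 2 * trials) - 1
    return means[lower_index], means[upper_index]
```

A small p-value from `permutation_p_value` means that shuffled labels rarely reproduce a difference as large as the observed one.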
Other notes
- GitHub Issues is becoming popular; we should probably switch to it for our internal issue tracker.
- Visual screenshot diffing for tests could be a good idea.