Data Awesome #13
Welcome to Data Awesome #13! Superstition has no place for data folks. 😉
Let’s get to it! 🚀
Awesome Articles 📃
One-hot encoding is the first encoding method for nominal categorical data that most people trained in statistics learn. When using a linear model, you were probably taught that you must drop one of the resulting columns. Damien Martin makes a bunch of good points about when you don’t want to drop one of the columns in a machine learning context. He also provides helpful guidelines in Are You Getting Burned By One-Hot Encoding? I 100% agree with his suggestions - especially avoiding pd.get_dummies.
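One reason to be wary of pd.get_dummies in a machine learning workflow is that it builds columns from whatever values happen to be in the DataFrame you hand it, so training data and new data can come out with different columns. Here's a minimal sketch of that problem, with scikit-learn's OneHotEncoder as one alternative. The toy color data is made up for illustration, and this is not code from Damien's article.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: "purple" never appears in the training set.
train = pd.DataFrame({"color": ["red", "green", "blue"]})
test = pd.DataFrame({"color": ["green", "purple"]})

# pd.get_dummies builds columns from the values it sees in each call,
# so the train and test frames end up with different columns.
train_dummies = pd.get_dummies(train)
test_dummies = pd.get_dummies(test)
columns_match = list(train_dummies.columns) == list(test_dummies.columns)

# OneHotEncoder learns the categories in fit() and reuses them in transform().
# handle_unknown="ignore" encodes unseen categories as all zeros.
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(train)
train_encoded = encoder.transform(train).toarray()
test_encoded = encoder.transform(test).toarray()

print("get_dummies columns match:", columns_match)
print("Encoded test rows:")
print(test_encoded)
```

The encoder keeps the same three columns for both data sets, so a model trained on the first can score the second without column juggling.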
Graph Neural Networks - for a while I thought that term was redundant, because neural networks can be represented as graphs. 🤔 Turns out, it’s not redundant. 😀 The term refers to a family of neural networks that are being used to solve lots of interesting problems. For an overview of the problems they are addressing, see the TWIML AI Podcast episode Trends in Graph Machine Learning with Michael Bronstein. For an overview of how Graph Neural Networks work, see this article by Kung-Hsiang, Huang (Steeve).
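To give a flavor of the core idea, here's a rough NumPy sketch of one message-passing step, where each node updates its features by averaging over itself and its neighbors. This is one simplified formulation among many, not necessarily the one the linked article walks through. The tiny graph, the feature values, and the message_passing_step name are all made up for illustration.

```python
import numpy as np

# Toy undirected graph with 3 nodes and edges 0-1 and 1-2.
adjacency = np.array(
    [
        [0, 1, 0],
        [1, 0, 1],
        [0, 1, 0],
    ],
    dtype=float,
)

# One feature per node.
features = np.array([[1.0], [2.0], [3.0]])


def message_passing_step(adjacency, features, weight):
    """One simplified graph layer: mean-aggregate neighbors, transform, ReLU."""
    # Add self-loops so each node keeps its own information.
    a_hat = adjacency + np.eye(adjacency.shape[0])
    # Row-normalize so each node averages over its neighborhood.
    a_norm = a_hat / a_hat.sum(axis=1, keepdims=True)
    # Aggregate, apply the (normally learned) weight matrix, then ReLU.
    return np.maximum(a_norm @ features @ weight, 0.0)


# Identity weight, chosen only so the averaging is easy to see.
weight = np.array([[1.0]])
updated = message_passing_step(adjacency, features, weight)
print(updated)
```

In a real model the weight matrix is learned, and several of these steps are stacked so information can travel more than one hop across the graph.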
Making analytics truly self-service for other teams in an organization is hard. Conor Dewey shares thoughtful commentary on the issues in Scaling Data and Self-Serve Analytics.
My previous article on popular data science job listing tech keywords was recently updated by Terence Shin. He found that cloud continues to become more common in data science job listings while Hadoop saw the largest drop.
Awesome Package
The new Rich Python package gives you cool progress monitoring bars in the terminal.
Rich is a Python library for writing rich text (with color and style) to the terminal, and for displaying advanced content such as tables, markdown, and syntax highlighted code. - From the Docs.
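For the progress bars, a minimal sketch using Rich's track helper might look like the following. The process_items function and the sleep call are placeholders standing in for real work.

```python
import time

from rich.progress import track


def process_items(items):
    """Square each item while Rich draws a progress bar in the terminal."""
    results = []
    for item in track(items, description="Processing..."):
        time.sleep(0.01)  # placeholder for real work
        results.append(item * item)
    return results


squares = process_items(range(5))
print(squares)
```

Wrapping any sized iterable in track is enough to get a bar with a percentage and a time estimate.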
Rich makes it easy to inspect your data structures in a Jupyter Notebook - and it works fine with JupyterLab. Here’s an example:
from rich import inspect
my_list = ["Mambo Number 1", "Mambo Number 2"]
inspect(my_list, methods=True)
Rich returns all this nicely formatted info 🌈:
Awesome Podcast 🎙
I mentioned the TWIML AI podcast above. On This Week In Machine Learning & Artificial Intelligence, host Sam Charrington asks his guests great questions. Jacqueline Nolis was the most recent guest, and (small world) I’ll be co-hosting the Data Science DC MeetUp next week, where she’ll be recording an episode of her Build a Career in Data Science Podcast with her co-host, Emily Robinson. Come join us! 🌍
What I’ve been up to 🖊
Streamlit continues to add great features for easily turning your Python data science project into a web app. The ability to deploy your app in just a few clicks on Streamlit Sharing is awesome — and free! I recently created a Streamlit app that demonstrates how to make popular visualizations in a half-dozen Python plotting libraries. The fine folks at Streamlit were nice enough to feature it in their Weekly Roundup. Check it out here.
I’ve started working on some more advanced PostgreSQL content, beyond the material in my Memorable SQL book. Have a SQL topic you’d like to learn more about? Drop me a note in the comments or on Twitter with any SQL topics you’d be interested in seeing in a sequel. 😂
I got some nice feedback from a reader of the books in my Data Skills Book Bundle: “I am reading your memorable Docker book. It is like the scales falling from my eyes! … All text books should be written by you.” - Thanks, but that’s probably more than I have time to tackle. 😀
That’s all the awesomeness for now! Until next time, stay awesome data people! 🎉