Data Awesome #14
The best transformer guides, where to code, everything 🤗, and more!
This is the 14th issue of Data Awesome, where I share great resources and use the word awesome more than any one person should. 😉 Let’s get to it! 🚀
Awesome Articles 🖋
I teach Data Science for General Assembly and I just made the switch from using Jupyter Notebooks in JupyterLab with my students to using Visual Studio Code (VS Code). I’d dabbled before, but Chris Moffitt’s 16 Reasons to Use VS Code for Developing Jupyter Notebooks convinced me to make the leap. Some of my favorite features are the Git integration, the variable inspector, IntelliSense, and the ease of switching kernels. There are a few areas where I prefer the JupyterLab interface, but overall I’d recommend folks use VS Code.
Relatedly, last year I asked Python data folks on Twitter where they do most of their coding. Small sample size and all kinds of potential selection bias issues, but interesting to see VS Code was the most popular answer nonetheless.
Next up we’ve got a pair of awesome deep learning transformer architecture walk-throughs.
Transformers from Scratch by Brandon Rohrer is an excellent deep dive into the Transformer deep neural network architecture. Transformers are best known for improving the state of the art in NLP, but they are being applied in other areas such as computer vision. I highly recommend this guide to understand what’s going on behind the scenes.
In conjunction with Transformers from Scratch, I’d suggest checking out The Illustrated Transformer by Jay Alammar. This post delivers on its promise with lots of animated illustrations to help the transformer architecture stick. 🎉
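Both guides build up to the same core operation: scaled dot-product attention. As a small taste of what they cover, here’s a minimal NumPy sketch of that operation (the function name and the toy shapes are my own, not taken from either post):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal attention: softmax(Q K^T / sqrt(d)) @ V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)  # how much each query "attends" to each key
    # numerically stable softmax over the keys
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Toy example: 3 tokens, embedding dimension 4
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))
output, weights = scaled_dot_product_attention(Q, K, V)
```

Each row of `weights` sums to 1, and the output for each token is a weighted blend of the value vectors. The real architecture stacks this with multiple heads, projections, and feed-forward layers, which is exactly what the two guides walk through.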
Speaking of helping things stick, I recently came across Michael Nielsen’s extensive post on the free Anki program for spaced repetition. The system is awesome for helping you remember nearly anything. I have used Anki some in the past, but fell off the horse 🐎 and abandoned it years ago. Michael’s post motivated me to start using it again. I found it encouraging that he had had some false starts but then managed to use it to great effect.
Awesome Tutorial, Package, Datasets & Hosting 🍎
Hugging Face 🤗 seems to be everywhere these days.
The Hugging Face Course is an excellent and free introduction to using the Hugging Face library for transfer learning. This tutorial also has an excellent description of transformers. Noticing a theme? 🤔
Hugging Face provides a bunch of datasets, so I’ve added it to my article on the best places to find datasets. 📊
Hugging Face recently debuted Hugging Face Spaces, which is quite possibly the best place to host an interactive website. 🚀 Spaces integrates with GitHub, Streamlit, and Gradio (which the Hugging Face company recently acquired). Spaces is free for basic use cases.
Awesome Book 📖
The second edition of Introduction to Statistical Learning (ISLR) by James, Witten, Hastie & Tibshirani came out a few months back. 🎉 This machine learning holy book is free to download on the book’s website. The new edition has lots of new material, including sections on deep learning, naive Bayes, and survival analysis.
Awesome Update ⬆️
Scikit-learn continues to make life nicer for data scientists with its 1.0 update released this fall. You can’t directly get a pandas DataFrame out of a transformer yet, but the team is working on it, and this release makes it easier to stitch together the names of the features from each transformation. I wrote a guide to the new features and changes in version 1.0 here.
That’s all the awesomeness for now! Until next time, stay awesome data people! 🙂 + 📊