Photo by Kelly Sikkema on Unsplash

A short primer on scaling up your deep learning to multiple GPUs

In this multipart article, I outline how to scale your deep learning to multiple GPUs and multiple machines using Horovod, Uber’s distributed deep learning framework.

Read part one here:

Unsurprisingly, getting distributed training to work correctly isn’t straightforward. You can follow along with my steps to get your experiment loaded and training on GCP.


  1. Package/restructure your application (see GitHub repo here for an example)
  2. Create a Docker image and push that image to Google’s Container Registry
  3. Create an instance and run your training job

If everything was configured correctly, you should now have an easy-to-follow recipe for parallelizing your deep learning on GCP. …

Photo by ThisisEngineering RAEng on Unsplash

A short primer on scaling up your deep learning to multiple GPUs

In this multipart article, I outline how to scale your deep learning to multiple GPUs and multiple machines using Horovod, Uber’s distributed deep learning framework.

Part 1 — Setting up the experiment and laying the foundation

Deep neural networks have reached a size at which training on a single machine can take multiple days to weeks (or more!). The latest and greatest text generation models have parameter counts that exceed 1 billion!

Google Colab is fantastic — really. If you’re a deep learning researcher, subscribe to Pro. It’s $9.99 a month and you get great connection times and much better reliability. …

Photo by Henry & Co. on Unsplash


A tensor’s journey through an LSTM Layer visualized

When building a deep neural network, especially with some of the higher-level frameworks such as Keras, we often don’t fully understand what’s happening in each layer. The sequential model will get you far indeed, but when it’s time to do something more complex or intriguing, you will need to dive into the details.

In this article, I’m going to explain exactly what’s happening as you pass a batch of data through an LSTM layer with an example from PyTorch. …
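As a rough illustration of what passing a batch through an LSTM layer looks like, here is a minimal sketch that only tracks tensor shapes. The dimensions (batch of 4, sequence length 10, 3 features, hidden size 8) are arbitrary values chosen for the example and are not taken from the article.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes only (assumptions, not from the article)
batch_size, seq_len, n_features, hidden_size = 4, 10, 3, 8

# batch_first=True means inputs are shaped (batch, sequence, features)
lstm = nn.LSTM(input_size=n_features, hidden_size=hidden_size, batch_first=True)

x = torch.randn(batch_size, seq_len, n_features)

# output holds the hidden state at every time step;
# h_n and c_n hold the final hidden and cell states
output, (h_n, c_n) = lstm(x)

print(output.shape)  # (batch, seq_len, hidden_size)
print(h_n.shape)     # (num_layers, batch, hidden_size)
print(c_n.shape)     # (num_layers, batch, hidden_size)

# For a single-layer, unidirectional LSTM, the last time step of
# output is the same tensor as the final hidden state.
last_step_matches = torch.allclose(output[:, -1, :], h_n[0])
print(last_step_matches)
```

The final check is a handy sanity test when deciding whether to feed `output` or `h_n` into the next layer.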

Photo by Kenneth Berrios Alvarez on Unsplash

Deep Learning in Practice

Creating custom data loaders for PyTorch — MADE EASY!

I was in the middle of creating a custom PyTorch training module that overcomplicated things, especially when it came to generating batches for training and ensuring that those batches weren’t repeated during the training epoch. “This is a solved problem,” I thought to myself as I furiously coded away in the depths of the lab.

There are reasons why you don’t want to just increment indices as you select items from your dataset: 1) this doesn’t scale out to multiple workers, and 2) you need to randomize your sequences to maximize training performance.

This is where Torch’s data utilities come in handy. …
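A minimal sketch of the idea, assuming the utilities in question are `torch.utils.data.Dataset` and `DataLoader`. The `ToyDataset` class and its contents are hypothetical and exist only to show that shuffled batches cover every item exactly once per epoch.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ToyDataset(Dataset):
    """Hypothetical dataset holding the numbers 0..n_items-1."""

    def __init__(self, n_items):
        self.data = torch.arange(n_items, dtype=torch.float32)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]


dataset = ToyDataset(10)

# shuffle=True draws indices in random order without repeats within an epoch.
# num_workers=0 loads in the main process; raising it spreads loading
# across worker processes.
loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=0)

seen = []
for batch in loader:
    seen.extend(batch.tolist())

print(len(loader))   # number of batches per epoch
print(sorted(seen))  # every item appears exactly once
```

Because the loader owns the sampling, there is no manual index bookkeeping to get wrong.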

Photo by Jason Leung on Unsplash

Using a “markovian” streaming data source to create a real-time Markov transition matrix

In this article, I review a method to create real-time Markov transition matrices from a streaming data source. There are a number of interesting applications for this, especially in IoT devices collecting data that have a weak or strong Markov property.

With this method, many useful insights can arise, such as comparing the matrices of different sources to see if there are statistically significant differences in probabilities (where there should be none) or comparing signals in time (weekdays vs weekends, or differences by hour).

You will likely notice that there are few naturally occurring stochastic processes that can be classified as a strong Markov process. However, many naturally occurring processes have a weak dependence on previous states, so much so that they can be analyzed as a traditional Markov process. …
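One way such a running transition matrix could be maintained is sketched below, using only the standard library. The class name `StreamingTransitionMatrix` and the state labels are hypothetical; this is an illustration of the general counting approach, not necessarily the method the article uses.

```python
from collections import defaultdict


class StreamingTransitionMatrix:
    """Hypothetical helper: counts first-order transitions as states arrive."""

    def __init__(self):
        # counts[src][dst] = number of times dst followed src
        self.counts = defaultdict(lambda: defaultdict(int))
        self.previous = None

    def update(self, state):
        """Consume one observation from the stream."""
        if self.previous is not None:
            self.counts[self.previous][state] += 1
        self.previous = state

    def probabilities(self):
        """Row-normalize the counts into transition probabilities."""
        matrix = {}
        for src, row in self.counts.items():
            total = sum(row.values())
            matrix[src] = {dst: n / total for dst, n in row.items()}
        return matrix


stm = StreamingTransitionMatrix()
for state in ["A", "B", "A", "A", "B", "C", "A"]:
    stm.update(state)

probs = stm.probabilities()
print(probs["A"])  # A was followed by B twice and by A once
print(probs["B"])  # B was followed by A once and by C once
```

Each update is constant-time, so the matrix can be refreshed as every new observation arrives.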

Photo by Efe Kurnaz on Unsplash

Deep Learning in Practice

Demonstrating the use of LSTM Autoencoders for analyzing multidimensional timeseries

In this article, I’d like to demonstrate a very useful model for understanding time series data. I’ve used this method for unsupervised anomaly detection, but it can also be used as an intermediate step in forecasting via dimensionality reduction (e.g. forecasting on the latent embedding layer vs the full layer).

In a nutshell, this method compresses a multidimensional sequence (think a windowed time series of multiple counts, from sensors or clicks, etc.) to a single vector representing this information. …

Photo by Campaign Creators on Unsplash

The business of data science

How to hire and retain amazing data scientists

After being on both sides of the table, interviewing hundreds of candidates as well as being interviewed, I’ve come up with what I like and what I dislike about this process.

Step 1: Know what you need

Many times, I’ve come across businesses or companies that aren’t necessarily sure what they need. They want to do something with data, but aren’t clear on who or what they need. This is the first problem to solve.

Get a clear picture of where you are, maturity-wise, when it comes to data. If your company is still working with Excel spreadsheets and maybe a small database, you probably don’t need a deep learning engineer with years of backend experience. …


Data Science Quickie

Getting a video annotated in 10 minutes

In your machine learning research, you’ve most likely come across a video that has been nicely annotated by the famous YOLO algorithm.

Such as:

I’m going to show you how to achieve this for a personal video in less than 10 minutes for a short clip (5–10 seconds). The inference time is obviously correlated with the length of the video, so be forewarned.

I’m not going to try to explain YOLOv4 to you. A thorough explanation would include darknet, backbones and other quandaries from the depths of computer vision research. …


Using Random Forest regression to identify important features

Photo by Chris Liverani on Unsplash

Many times, in the course of analysis, we find ourselves asking questions like:

“What boosts our sneaker revenue more? YouTube Ads, Facebook Ads, or Google Ads?”

with a small complication: we didn’t measure where the revenue came from, and we didn’t run any experiments to see what our incremental revenue is for each.

In this case, understanding the direct causality is hard, or impossible. However, we still need ways of inferring what is more important and we’d like to back that up with data.

Although this isn’t a new technique, I’d like to review how feature importances can be used as a proxy for causality. …
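A minimal sketch of the technique with scikit-learn's `RandomForestRegressor`. The data here is synthetic and made up for illustration: the spend figures, the coefficients, and the channel names are all assumptions, with YouTube deliberately built in as the dominant driver so the result can be checked.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_samples = 500
channels = ["youtube", "facebook", "google"]

# Synthetic ad spend per channel (hypothetical data)
spend = rng.uniform(0, 100, size=(n_samples, 3))

# Synthetic revenue: youtube matters most, facebook a little,
# google not at all, plus some noise
revenue = 5.0 * spend[:, 0] + 1.0 * spend[:, 1] + rng.normal(0, 5, size=n_samples)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(spend, revenue)

# Impurity-based importances; they sum to 1 across features
importances = dict(zip(channels, model.feature_importances_))
ranked = sorted(importances, key=importances.get, reverse=True)

for name in ranked:
    print(f"{name}: {importances[name]:.3f}")
```

The importances rank features by how much they reduce prediction error in the forest, which is why they serve only as a proxy: correlated features or unmeasured confounders can shift the ranking.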

Deep Learning Demystified

Using GPT-2 to create a digital Terence McKenna

It’s been a long, strange, trip. Not-so-good photoshop: Me.

** Disclaimer, in no way, shape or form am I attempting to besmirch the late Terence McKenna**

In this article, I want to demonstrate some of the latest techniques in natural language generation, and how transfer learning can be used to narrow down the generative process.

The opportunities and applications for this, done in the right way, are endless. Some interesting examples are lyric generation, auto generating news articles, chatbots, question answering systems, and more.

I occasionally listen to McKenna’s lectures, and find his viewpoints interesting and provoking. I chose him specifically, because he was one of the first “New-Age” philosophers to really embrace the idea of Artificial Superintelligence. …



Data Scientist
