Deep Learning for Coders, Lesson 3

Questions

Hand-drawn bears and black-and-white images, i.e. data unlike the photos the model was trained on.

Current text models can generate compelling, context-appropriate text, but they cannot guarantee that their responses are factually correct.

They could be used on social media to generate disinformation at scale.

In scenarios where the model and a human user interact closely, so that a human can review and correct the model's output.

Time series.

It would only recommend things a user would probably like, rather than anything actually helpful (e.g. the user has probably already heard of a familiar author, so recommending that author's books drives no new sales).

  1. Defined objective: what outcome we are trying to achieve.
  2. Levers: what inputs we can control.
  3. Data: what data we can collect.
  4. Models: how the levers influence the objective.

The objective of a recommendation engine is to drive additional sales by surprising and delighting the customer with recommendations of items they would not have purchased without the recommendation. The lever is the ranking of the recommendations. New data must be collected to generate recommendations that will cause new sales. This will require conducting many randomised experiments in order to collect data about a wide range of recommendations for a wide range of customers. This is a step that few organisations take; but without it, you don’t have the information you need to actually optimise recommendations based on your true objective (more sales!).

Having downloaded some data, we need to assemble it in a format suitable for training by creating an object called DataLoaders. This is a thin fastai class that stores the multiple DataLoader objects you pass to it, normally a training and a validation one. The key functionality is provided by these four lines of code:

class DataLoaders(GetAttr):
    def __init__(self, *loaders): self.loaders = loaders
    def __getitem__(self, i): return self.loaders[i]
    # expose the first two loaders as the .train and .valid properties
    train, valid = add_props(lambda i, self: self[i])
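
In practice you rarely construct a DataLoaders by hand, since the DataBlock API described below builds one for you. Still, as a minimal sketch of what the class above provides (assuming train_dl and valid_dl are existing fastai DataLoader objects):

dls = DataLoaders(train_dl, valid_dl)
xb, yb = dls.train.one_batch()  # dls.train is just the first loader, dls.valid the second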

  1. What kinds of data we are working with
  2. How to get the list of items
  3. How to label these items
  4. How to create the validation set
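
These four things map directly onto the arguments of the DataBlock API. A sketch following the book's bear-classifier example (here path is assumed to point at a folder of images organised into one subfolder per class):

from fastai.vision.all import *

bears = DataBlock(
    blocks=(ImageBlock, CategoryBlock),  # 1. kinds of data: images in, categories out
    get_items=get_image_files,           # 2. how to get the list of items
    get_y=parent_label,                  # 3. how to label them: use the parent folder name
    splitter=RandomSplitter(valid_pct=0.2, seed=42),  # 4. how to create the validation set
    item_tfms=Resize(128))               # resize each image so items can be batched

dls = bears.dataloaders(path)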

The splitter splits the data into training and validation sets. RandomSplitter does this randomly, and you can set the seed so that the same split is used each time.

seed=42, passed in as an argument.
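
A small sketch of RandomSplitter on its own; fixing the seed makes the split reproducible across runs:

from fastai.data.transforms import RandomSplitter

splitter = RandomSplitter(valid_pct=0.2, seed=42)  # hold out 20% of items for validation
train_idx, valid_idx = splitter(range(100))        # two lists of shuffled indices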

The independent variable is often referred to as x and the dependent variable as y.

Pad fills out the image with zeros (black) to the requested size; crop crops the image to a square of the requested size, using the full width or height; and squish squashes or stretches the image down to the requested size. All are problematic: squishing and stretching distort the image into unrealistic shapes, which lowers accuracy; cropping removes features that might be needed for recognition; and padding leaves empty space, which wastes computation and lowers the effective resolution of the useful part of the image.
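
These strategies correspond to the ResizeMethod options of the Resize transform. A sketch, reusing the bears DataBlock from above:

bears.new(item_tfms=Resize(128, ResizeMethod.Squish))                 # distorts shapes
bears.new(item_tfms=Resize(128, ResizeMethod.Pad, pad_mode='zeros'))  # black borders
bears.new(item_tfms=Resize(128, ResizeMethod.Crop))                   # the default; may cut off features

The book's suggested alternative is RandomResizedCrop (e.g. RandomResizedCrop(128, min_scale=0.3)), which crops a different random part of the image on each epoch, so the model eventually sees every region.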

Data augmentation refers to creating random variations of the input data, such that they appear different but do not actually change the meaning of the data. Examples include rotation, flipping, perspective warping and contrast changes. Because the item transforms have already resized all the images to the same size, we can apply these augmentations to a whole batch at once on the GPU.

To tell fastai we want to apply transforms to a whole batch, we use the batch_tfms parameter; item_tfms runs transformations on individual items, for example resizing every image to the same size.
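
A sketch combining the two, again following the book's example: RandomResizedCrop runs per item on the CPU, while aug_transforms supplies a standard set of augmentations applied per batch on the GPU:

bears = bears.new(
    item_tfms=RandomResizedCrop(224, min_scale=0.5),  # per item: random crop and resize
    batch_tfms=aug_transforms())                      # per batch: rotate, flip, warp, contrast
dls = bears.dataloaders(path)
dls.train.show_batch(max_n=8, nrows=2, unique=True)   # same image, different augmentations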

The diagonal of the confusion matrix shows the images that were classified correctly, and the off-diagonal cells show those that were classified incorrectly. It is calculated using the validation set.
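
A sketch of plotting one with fastai, assuming a trained Learner named learn:

interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()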
