
yanglab-convnet

This project was originally hosted on Google Code; we hope GitHub will prove more reliable.

Background

We used this project to automatically determine whether Drosophila were "on" or "off" egg-laying substrates for each frame of our fly videos; see the sample classifications below and our paper for details. The code uses convolutional neural networks (CNNs) to learn the "on"/"off" classification task, which should make it applicable to other tasks with relatively few changes. We got the CNNs to an error rate of only 0.07% on our task (roughly human-level performance). Without automation, the task would be prohibitively labor intensive due to the huge number of frames in a video.

The CNNs are implemented by Alex Krizhevsky's cuda-convnet; our code has two parts:

  • it extends cuda-convnet with two additional image transformations for data augmentation
  • it provides several Python scripts; most of them interact with cuda-convnet, either by executing cuda-convnet's own scripts (e.g., to train CNNs) or by generating image data files that cuda-convnet can read for both training and prediction (sketched below)
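
As an illustration of the second part, here is a minimal sketch of writing such a data file; it assumes the column-major CIFAR-style pickle layout that cuda-convnet's standard data provider reads (one flattened image per column, channels stored back to back) and omits details such as the batches.meta file:

```python
import cPickle  # cuda-convnet targets Python 2
import numpy as np

def write_batch(images, labels, path):
    """images: iterable of HxWx3 uint8 arrays, all the same size.
    Flattens each image channel-major (all R pixels, then G, then B)
    and stacks one image per column, as in cuda-convnet's CIFAR batches."""
    data = np.column_stack([im.transpose(2, 0, 1).reshape(-1)
                            for im in images]).astype(np.uint8)
    with open(path, 'wb') as f:
        cPickle.dump({'data': data, 'labels': map(int, labels)}, f,
                     cPickle.HIGHEST_PROTOCOL)

# e.g., write_batch(imgs, labs, 'data_batch_1') for 600-image batches
# as in the onsubs-data directory
```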

Installation

Instead of detailed installation instructions, the following gives an overview of the project and explains several aspects of the data and code in the hope that this will be sufficient for researchers and programmers to get things working.

yanglab-convnet directories

If you check out yanglab-convnet, you get the following directories:

  • cuda-convnet
    • contains all the cuda-convnet files that we modified
    • once you have Alex's cuda-convnet working (Alex provides good documentation for this), replace the corresponding cuda-convnet files with the files in this directory
      • note (10 Mar 2016)
        • Alex's cuda-convnet was hosted on Google Code, which is shutting down, so I exported the code to GitHub (ulrichstern/cuda-convnet)
        • the documentation is now here
        • for my install on Ubuntu 12.04, this gist was required. (The link used to be in the documentation comments on Google Code, which did not get exported to GitHub.)
        • I may port my changes to Alex's cuda-convnet2 but cannot promise a timeline yet
  • onsubs
    • our scripts, see Overview of code below for details
    • you have to customize the paths used by the scripts for your situation
  • onsubs-data
    • the data we used in the paper: images labeled with whether flies are "on" or "off" egg-laying substrates
    • V0-7: 36 batches with 600 images each
      • used for training and validation
    • V0-7_test: 9 batches with 600 images each
      • used for test
      • note: the test batches are 1-9; batches 10-36 in this directory are just copied from V0-7 to keep the layout identical
  • onsubs-layers
    • our cuda-convnet layer definition and layer parameter files
      • the former defines the CNN architecture, the latter the learning parameters; a sketch of the two formats follows this list
    • used for the paper: 3c2f-2.cfg (3 convolutional and 2 fully connected layers) and 3c2f-2-params.cfg
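
For orientation, here is a small, illustrative pair of files in cuda-convnet's configuration format; the layer names and all numbers are made-up placeholders, not the paper's 3c2f-2 settings (see cuda-convnet's layer documentation for the full set of options):

```ini
# layer definition file (architecture), e.g. example.cfg
[data]
type=data
dataIdx=0

[labels]
type=data
dataIdx=1

[conv1]
type=conv
inputs=data
channels=3
filters=32
filterSize=5
padding=2
stride=1
neuron=relu
initW=0.0001
partialSum=1
sharedBiases=1

[fc2]
type=fc
inputs=conv1
outputs=2
initW=0.01

[probs]
type=softmax
inputs=fc2

[logprob]
type=cost.logreg
inputs=labels,probs

# layer parameter file (learning settings), e.g. example-params.cfg
[conv1]
epsW=0.001
epsB=0.002
momW=0.9
momB=0.9
wc=0.004

[fc2]
epsW=0.001
epsB=0.002
momW=0.9
momB=0.9
wc=0.01

[logprob]
coeff=1
```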

Overview of code

Note: each script below can be called with "-h" to see all command-line options.

  • a labeling script that we used to generate the labeled data (see the onsubs-data directory) from our fly videos
    • it has a very simple UI for human labeling (a sketch of such a labeling loop follows this list)
    • you need to run this script only if you want to generate labeled data yourself
      • since the fly videos are large (~1 GB each), they would have put us over Google's quota, so we did not include them in yanglab-convnet
  • autoCc.py automates the training of multiple nets
    • "autoCc" stands for "automate cuda-convnet"; the script automates calling cuda-convnet (both convnet.py and shownet.py); a sketch of this pattern follows this list
    • each training run results in an experiment directory with
      • all the nets and the prediction files for each net
      • a log file for the experiment with all cuda-convnet output
      • the cuda-convnet layer definition and layer parameter files used
        • the parameter file is edited by autoCc.py to, e.g., lower the learning rate after a certain number of epochs
    • the code at the beginning of the file can be edited to change how the nets are trained
  • a script that runs additional predictions for previously trained nets
    • we used this, e.g., to figure out which kind of "data augmentation for test" worked best
  • a script that measures the error rates of multiple nets
    • supports model averaging (including the bootstrap) and data augmentation for test; a sketch of simple model averaging follows this list
    • error rates are calculated from prediction files (from the original training run or from additional prediction runs)
  • classify.py classifies each fly in each frame of a fly video as "on" or "off" substrate
    • calls cuda-convnet's shownet.py to generate the predictions
    • uses positional information from Ctrax (the tracking software we use) to crop small fly images (with the fly in the center) out of the full frame; a cropping sketch follows this list
  • a script for fly behavior analysis based on Ctrax trajectories and classify.py's "on"/"off" classifications
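
To make the labeling workflow above concrete, here is a minimal sketch of a keyboard-driven labeling loop using OpenCV; the file pattern, key bindings, and function name are illustrative assumptions, not the actual script's interface:

```python
import glob
import cv2

def label_images(pattern='crops/*.png'):  # assumed location of cropped images
    """Show each cropped fly image; record 1 for "on", 0 for "off"."""
    labels = {}
    for path in sorted(glob.glob(pattern)):
        cv2.imshow('o = on, f = off, q = quit', cv2.imread(path))
        key = chr(cv2.waitKey(0) & 0xff)
        if key == 'q':
            break
        if key in 'of':
            labels[path] = 1 if key == 'o' else 0
    cv2.destroyAllWindows()
    return labels
```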
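The autoCc.py pattern of driving cuda-convnet and editing the parameter file can be sketched roughly as follows; the paths, batch ranges, and learning-rate edit are assumptions for illustration (convnet.py's options themselves are documented with cuda-convnet), not the real script's behavior:

```python
import re
import subprocess

def train_net(expt_dir, layer_def, layer_params, epochs):
    """One training run; cuda-convnet saves checkpoints under expt_dir."""
    cmd = ['python', 'convnet.py',
           '--data-path=../onsubs-data/V0-7',           # assumed location
           '--save-path=' + expt_dir,
           '--train-range=1-27', '--test-range=28-36',  # assumed split
           '--layer-def=' + layer_def,
           '--layer-params=' + layer_params,
           '--epochs=%d' % epochs]
    with open(expt_dir + '/train.log', 'a') as log:
        subprocess.check_call(cmd, stdout=log, stderr=subprocess.STDOUT)

def scale_eps(layer_params, factor=0.1):
    """Crudely multiply every epsW/epsB learning rate in the layer
    parameter file by factor (a stand-in for autoCc.py's edits)."""
    text = open(layer_params).read()
    repl = lambda m: '%s=%g' % (m.group(1), float(m.group(2)) * factor)
    open(layer_params, 'w').write(
        re.sub(r'\b(epsW|epsB)=([0-9.eE+-]+)', repl, text))
```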
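Model averaging over nets reduces the error rate by averaging the nets' softmax outputs before taking the argmax; here is a sketch under the assumption that each net's prediction file yields a (number of images x number of classes) probability array:

```python
import numpy as np

def error_rate(probs_list, labels):
    """probs_list: one softmax-output array per net; averaging the
    arrays before the argmax is simple model averaging."""
    avg = np.mean(probs_list, axis=0)
    return np.mean(np.argmax(avg, axis=1) != labels)

def bootstrap_error(probs_list, labels, num_samples=1000):
    """Bootstrap over nets: resample (with replacement) which nets
    enter the average, to estimate the error rate's variability."""
    n = len(probs_list)
    errs = [error_rate([probs_list[i] for i in np.random.randint(0, n, n)],
                       labels)
            for _ in range(num_samples)]
    return np.mean(errs), np.std(errs)
```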
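The Ctrax-based cropping in classify.py can be sketched as follows; the crop size and the zero padding near the frame border are assumptions for illustration (Ctrax provides, among other things, per-frame fly centroid positions):

```python
import numpy as np

def crop_fly(frame, x, y, size=64):  # size is an assumed crop width
    """Crop a size x size patch centered on the fly centroid (x, y),
    zero-padding where the patch extends past the frame border."""
    h, w = frame.shape[:2]
    half = size // 2
    x0, y0 = int(round(x)) - half, int(round(y)) - half
    out = np.zeros((size, size) + frame.shape[2:], dtype=frame.dtype)
    sx0, sy0 = max(x0, 0), max(y0, 0)
    sx1, sy1 = min(x0 + size, w), min(y0 + size, h)
    out[sy0 - y0:sy1 - y0, sx0 - x0:sx1 - x0] = frame[sy0:sy1, sx0:sx1]
    return out
```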