Exploring convolutional neural networks with DL4J


Motivation
TL;DR version: This post walks through an image classification problem hosted on Kaggle for Yelp. I use Scala, DeepLearning4J and convolutional neural networks. For a self-guided tour, check out the project on GitHub here.
Why…
This project was motivated by a personal desire of mine to:
explore deep learning on a computer vision problem.
implement an end-to-end data science project in Scala.
build an image processing pipeline using real images.
Rather than using the MNIST or CIFAR datasets with pre-processed and standardized images, I wanted to go with a more “wild” dataset of “real-world” images.
I opted for the Kaggle Yelp Restaurant Photo Classification problem. The ~200,000 training images are raw uploads from Yelp users from mobile devices or cameras with a variety of sizes, dimensions, colors and quality.
What I did instead…
I was initially going to document this project end-to-end from image processing to training the convolutional neural networks. However, upon more research and practice actually tuning convolutional networks, I’ve reconsidered my process. While the Kaggle Yelp Photo Classification problem is a novel problem, it turns out not to be a great match with the deep learning techniques I wanted to explore. Thus, this article will focus mainly on the image processing pipeline using Scala. While I introduce DL4J here, I plan to discuss my experience with it in more detail in a forthcoming post.
The Kaggle problem
The Kaggle problem is this. Yelp wants to auto-classify restaurants on the 9 characteristics below:
good_for_lunch
ambience_is_classy
good_for_kids
Each restaurant has some number of images (from a couple to several hundred). However, there are no restaurant features beyond these images. Thus it is a multiple-instance learning problem where each business in the training data is represented by its bag of images.
This is also a multiple-label classification problem where each business can have one or more of the 9 characteristics listed above.
Initial approach
To deal with the multiple-instance issue, I simply applied the labels of the restaurant to all of the images associated with it and treated each image as a separate record.
To deal with the multiple-label problem, I simply handled each class as a separate binary classification problem. While there are breeds of neural networks capable of classifying multiple labels, such as BP-MLL, these are not currently available in DL4J.
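A minimal sketch of these two simplifications in plain Scala. The names `LabelPrep`, `ImageRecord`, `flattenBags` and `binaryTarget` are hypothetical and only illustrate the idea; they are not taken from the project code.

```scala
object LabelPrep {
  // One training record per image: the image inherits its restaurant's labels.
  final case class ImageRecord(imgId: Int, bizId: String, labels: Set[Int])

  // Multiple-instance step: copy each business's label set onto all of its images.
  // Images whose business has no label entry are dropped (an assumption of this sketch).
  def flattenBags(imgToBiz: Map[Int, String],
                  bizLabels: Map[String, Set[Int]]): List[ImageRecord] =
    imgToBiz.toList.sortBy(_._1).flatMap { case (imgId, bizId) =>
      bizLabels.get(bizId).map(labels => ImageRecord(imgId, bizId, labels))
    }

  // Multiple-label step: one binary target per class (1 if the class is present).
  def binaryTarget(record: ImageRecord, bizClass: Int): Int =
    if (record.labels.contains(bizClass)) 1 else 0
}
```

With this layout, training one model per class amounts to calling `binaryTarget` with a different `bizClass` over the same list of records.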
Pivot
While I didn’t expect my initial approach would land me at the top of the Kaggle leaderboard, I did expect it would allow me to build a reasonable model while exploring new and untested (to me) tools and techniques: DeepLearning4j, Scala and convolutional nets. That assumption turned out to be bigger than I expected.
The noise-to-signal ratio turned out to be too high with the Yelp data to train a meaningful convolutional network given my self-imposed constraints. From what I’ve deduced from the Kaggle forum, most teams are using pre-trained neural networks to extract features from each image. From there it can be tackled as a classical (non-image) classification problem with crafty feature creation and aggregation from the image to restaurant level.
While this is far more computationally efficient and could yield better predictions, it cuts out exactly the part I wanted to explore. I eventually compromised with myself and decided to refactor the image pipeline I developed on this project for a similar, better-posed problem using CIFAR or a dataset I create myself from image-net.
Approach
Image processing
Images in the training set come in various shapes and sizes. See some examples below. My first pass at processing consists of:
squaring images
And some are random other things…
1. Square images
While images in the training set varied in orientation (portrait or landscape) and in pixel count, most were roughly square. Many were exactly 500 x 375, which was also the largest size, presumably the output of Yelp’s own image processing system.
To train a convolutional net, all images need to be the same shape and size. While there are likely fancier tricks and techniques that allow for different sized images, I started simple: make all images square, while preserving as much of the image as possible. I assume that the material of interest is centered, so I capture the middle-most square of each image.
Example:
This example was created with the following code:
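A minimal sketch of the centered square crop described above, using only `java.awt.image.BufferedImage`. The object name `SquareCrop` is hypothetical, and this is not necessarily the exact code behind the example.

```scala
import java.awt.image.BufferedImage

object SquareCrop {
  // Crop the largest centered square out of an image.
  def makeSquare(img: BufferedImage): BufferedImage = {
    val side = math.min(img.getWidth, img.getHeight)
    val x = (img.getWidth - side) / 2   // horizontal offset of the crop
    val y = (img.getHeight - side) / 2  // vertical offset of the crop
    img.getSubimage(x, y, side, side)
  }
}
```

For a 500 x 375 image this keeps the middle 375 x 375 region. Note that `getSubimage` returns a view that shares pixel data with the source image.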
3. Grayscale
While DL4J and convolutional nets can certainly handle color images, I decided to simplify computation and start with grayscale. This way a single 64 x 64 pixel image is represented by 4096 features rather than 4096*3 (one for each color channel: R, G, B). There is a good discussion of the numerous ways to do this here. I opted to start with the simplest of all (averaging), which appeared to work quite well. Here’s an example:
original image (left); grayscale conversion using RGB averaging (right)
This example was created with the following code:
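A minimal sketch of grayscale conversion by RGB averaging. The names `GrayAverage`, `pixelToGray` and `imageToGray` are hypothetical; the row-major ordering of the output vector is an assumption of this sketch.

```scala
import java.awt.image.BufferedImage

object GrayAverage {
  // Average the red, green and blue channels of one packed RGB pixel.
  def pixelToGray(rgb: Int): Int = {
    val r = (rgb >> 16) & 0xFF
    val g = (rgb >> 8) & 0xFF
    val b = rgb & 0xFF
    (r + g + b) / 3
  }

  // Flatten an image into a row-major vector of gray values in 0..255.
  def imageToGray(img: BufferedImage): Vector[Int] =
    (for {
      y <- 0 until img.getHeight
      x <- 0 until img.getWidth
    } yield pixelToGray(img.getRGB(x, y))).toVector
}
```

A 64 x 64 image run through `imageToGray` yields a vector of 4096 values, one feature per pixel.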
Pipeline - images
Much of this section is specific to the Kaggle problem and discusses the data structures I created to store and manage images with their corresponding labels. It’s mainly an exploration of how to structure a data science project with Scala. If you’re primarily interested in DL4J, skip ahead to the Pipeline - DL4J section.
Image processing
In my image processing pipeline, I turned the functions in the Gists above into methods on the java.awt.image.BufferedImage class.
This allows me to operate on images with chaining like this:
import imgUtils._

val img = ImageIO.read(new File("myimagefile.jpg"))
  .makeSquare
  .resizeImg(128, 128)
  .image2gray
I’m not sure if this approach of extending an existing class with new methods is preferred to creating a new class, but it seemed to work well for my problem. I imagine it would be less clean if all instances of the original class do not need the newly defined methods. However, this wasn’t the case for me: all images need the new methods.
Code below: extendBufferedImage.scala
package modeling.processing

import scala.Vector
import org.imgscalr._

object imgUtils yield }

  def image2gray: Vector[Int] = image2vec(pixels2gray)
  def image2color: Vector[Int] = image2vec(pixels2color).flatten

  // make image square
  def makeSquare = }

  // resize pixels
  def resizeImg(width: Int, height: Int) = }
}
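In Scala, adding methods to an existing class like this is commonly done with an implicit class. Assuming that is the mechanism at work here, a minimal self-contained sketch looks as follows. The names `ImgSyntax`, `RichImage`, `isSquare` and `pixelCount` are hypothetical and stand in for the project's own methods.

```scala
import java.awt.image.BufferedImage

object ImgSyntax {
  // Wraps a BufferedImage so the methods below can be called on it directly
  // wherever `import ImgSyntax._` is in scope.
  implicit class RichImage(val img: BufferedImage) {
    def isSquare: Boolean = img.getWidth == img.getHeight
    def pixelCount: Int = img.getWidth * img.getHeight
  }
}
```

After `import ImgSyntax._`, any `BufferedImage` value can call `img.isSquare` as if the method were defined on the class itself, which is what makes chained calls possible.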
I/O
We need to load a couple of CSV files containing metadata about each image. There are Scala CSV reader libraries out there, such as scala-csv, but I skipped them to get more practice with Scala itself. I defined a basic file reader, readcsv, which is used by readBizLabels and readBiz2ImgLabels to read the text files containing the labels for each Yelp business and the image-to-business mappings, respectively.
Code below: readCsvData.scala
package modeling.io

import scala.io.Source

object readCsvData try finally }

  /** Create map from bizid to labels of form bizid -> Set(labels) */
  def readBizLabels(csv: String, rows: List[Int]=List(-1)): Map[String, Set[Int]] = ).toMap }

  /** Create map from imgID to bizID of form imgID -> busID */
  def readBiz2ImgLabels(csv: String, rows: List[Int] = List(-1)): Map[Int, String] = ).toMap }
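A rough sketch of how the two maps could be parsed from CSV lines. It assumes each file has a header row, that the label file has rows like `1000,1 2 3` (space-separated integer labels, possibly empty), and that the mapping file has rows like `204149,3034`. These layouts and the names `CsvSketch`, `parseBizLabels` and `parseImgToBiz` are assumptions for illustration, not the project's actual reader.

```scala
object CsvSketch {
  // Parse rows such as "1000,1 2 3" into bizId -> Set(labels).
  // Assumes a header row and space-separated integer labels.
  def parseBizLabels(lines: Seq[String]): Map[String, Set[Int]] =
    lines.drop(1).map(_.split(",", -1).map(_.trim)).collect {
      case Array(bizId, labels) =>
        bizId -> labels.split(" ").filter(_.nonEmpty).map(_.toInt).toSet
    }.toMap

  // Parse rows such as "204149,3034" into imgId -> bizId.
  def parseImgToBiz(lines: Seq[String]): Map[Int, String] =
    lines.drop(1).map(_.split(",", -1).map(_.trim)).collect {
      case Array(imgId, bizId) => imgId.toInt -> bizId
    }.toMap
}
```

The lines themselves can come from `scala.io.Source.fromFile(path).getLines().toList`, with the source closed afterwards.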
Wrangling
I make heavy use of Scala’s Map class. Essentially we have three maps:
bizMap (imgID -> bizID)
dataMap (imgID -> img data)
labMap (bizID -> labels)
I suppose I could have made classes for each of these as well, but they’re really just intermediate data structures, so I didn’t bother.
readBizLabels from the code above creates the labMap and readBiz2ImgLabels creates the bizMap. processImages from the code below creates the dataMap. Next step: create a single data representation of these three separate but related data structures.
Code below: images.scala
package modeling.processing

import java.io.File
import javax.imageio.ImageIO
import scala.util.matching.Regex
import imgUtils._

object images

  /** Get a list of images to load and process
    *
    * @param photoDir directory where the raw images reside
    * @param ids optional parameter to subset the images loaded from photoDir.
    *
    * @example println(getImageIds("/Users/abrooks/Documents/kaggle_yelp_photo/train_photos/", ids=List.range(0,10)))
    */
  def getImageIds(photoDir: String, bizMap: Map[Int, String] = Map(-1 -> "-1"), bizIds: List[String] = List("-1")): List[String] = else }

  /** Read and process images into a photoID -> vector map
    *
    * @param imgs list of images to read-in. created from getImageIds function.
    * @param resizeImgDim dimension to rescale square images to
    * @param nPixels number of pixels to maintain. mainly used to sample image to drastically reduce runtime while testing features.
    *
    * @example val imgs = getImageIds("/Users/abrooks/Documents/kaggle_yelp_photo/train_photos/", ids=List(0,1,2,3,4))
    *          println(processImages(imgs, resizeImgDim = 128, nPixels = 16))
    */
  def processImages(imgs: List[String], resizeImgDim: Int = 128, nPixels: Int = -1): Map[Int, Vector[Int]] = ).filter( x => x._2 != ()) .toMap }
Data structure
So there are four pieces of information to keep track of for each image:
imageID
businessID
labels
pixel data
The data is represented like this:
I defined a class alignedData to manage it all. When instantiating an instance of alignedData, the bizMap, dataMap and labMap are provided. I used Scala’s Option type for labMap since we don’t have this information when we score test data. None is provided in that case.
Under the hood, the primary data structure has the following type:
List[(Int, String, Vector[Int], Set[Int])]
which corresponds to a list of Tuple4s containing this information:
List[(imgID, bizID, pixel data vector, labels)]
Code below: alignedData.scala
package modeling.processing

class alignedData(dataMap: Map[Int, Vector[Int]], bizMap: Map[Int, String], labMap: Option[Map[String, Set[Int]]])
                 (rowindices: List[Int] = dataMap.keySet.toList) yield }

  def alignLabels(dataMap: Map[Int, Vector[Int]], bizMap: Map[Int, String], labMap: Option[Map[String, Set[Int]]])
                 (rowindices: List[Int] = dataMap.keySet.toList): List[(Int, String, Vector[Int], Set[Int])] = yield flatten1(p, labs) } }

  // pre-computing and saving data as a val so method does not need to re-compute each time it is called.
  lazy val data = alignLabels(dataMap, bizMap, labMap)(rowindices)

  // getter functions
  def getImgIds = data.map(_._1)
  def getBizIds = data.map(_._2)
  def getImgVectors = data.map(_._3)
  def getBizLabels = data.map(_._4)
  def getImgCntsPerBiz = getBizIds.groupBy(identity).mapValues(x => x.size)
}
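The core of the alignment is a join of the three maps on imgID and bizID. A minimal sketch of that join is below. The names `AlignSketch`, `Row` and `align` are hypothetical, and the handling of missing entries (skipping images with no business mapping, using an empty label set when no labels are available) is an assumption of this sketch rather than the documented behavior of alignedData.

```scala
object AlignSketch {
  // (imgID, bizID, pixel data vector, labels)
  type Row = (Int, String, Vector[Int], Set[Int])

  // Join the three maps on imgID and bizID. Images without a business mapping
  // are skipped; when no label map is given (test data) the label set is empty.
  def align(dataMap: Map[Int, Vector[Int]],
            bizMap: Map[Int, String],
            labMap: Option[Map[String, Set[Int]]]): List[Row] =
    for {
      imgId <- dataMap.keys.toList.sorted
      bizId <- bizMap.get(imgId).toList
    } yield {
      val labels = labMap.flatMap(_.get(bizId)).getOrElse(Set.empty[Int])
      (imgId, bizId, dataMap(imgId), labels)
    }
}
```

Passing `None` for `labMap` covers the test-data case, where every row carries an empty label set.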
Make ND4J dataset
Last step is to create the data structure that DL4J needs for training convolutional nets. That data structure is an ND4J DataSet .
This is relatively straightforward once you figure out how to convert native Scala data structures to this type.
Code below: makeDataSets.scala
package modeling.processing

import org.nd4j.linalg.dataset.
import org.nd4s.Implicits._
import org.nd4j.linalg.api.ndarray.INDArray
import alignedData._

object makeDataSets

  def makeDataSetTE(alignedData: alignedData): INDArray = }
Run
The code to actually run the image processing pipeline boils down to the following:
Code below: snippet from main.scala
import io.readCsvData._
import processing.alignedData
import processing.images._
import processing.makeDataSets._

object runPipeline

import org.apache.commons.io.FileUtils
import org.deeplearning4j.eval.Evaluation
import org.deeplearning4j.nn.api.OptimizationAlgorithm
import org.deeplearning4j.nn.conf.layers.setup.ConvolutionLayerSetup
import org.deeplearning4j.nn.conf.layers.
import org.deeplearning4j.nn.conf.
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
import org.deeplearning4j.nn.weights.WeightInit
import org.deeplearning4j.optimize.api.IterationListener
import org.deeplearning4j.optimize.listeners.ScoreIterationListener
import org.nd4j.linalg.api.ndarray.INDArray
import org.nd4j.linalg.dataset.SplitTestAndTrain
import org.nd4j.linalg.factory.Nd4j
import org.nd4j.linalg.lossfunctions.LossFunctions
import org.slf4j.LoggerFactory
import scala.collection.JavaConverters._
import org.deeplearning4j.datasets.iterator.MultipleEpochsIterator
import org.deeplearning4j.datasets.iterator.impl.ListDataSetIterator

object cnnEpochs System.out.println(eval.stats())
    val endtime = System.currentTimeMillis()
    log.info("End time: " + java.util.Calendar.getInstance().getTime())
    log.info("computation time: " + (endtime-begintime)/1000.0 + " seconds")
    log.info("Write results....")
    if(!saveNN.isEmpty)
    log.info("****************Example finished********************")
  }
}
Save/load networks
This is definitely something you’ll want to do. These things take too long to train to not save immediately.
saveNN is pretty straightforward. It saves a .json file with the network configuration and a .bin with all the weights and parameters of the network you just trained.
loadNN just reads back the .json and .bin file you created with saveNN to a MultiLayerNetwork object that you can use to score new test data.
Code below: nn.scala
package modeling.io

import org.deeplearning4j.nn.conf.
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
import org.nd4j.linalg.factory.Nd4j
import java.io.File
import org.apache.commons.io.FileUtils
import java.io.
import java.nio.file.

object nn

  def saveNN(model: MultiLayerNetwork, NNconfig: String, NNparams: String) = }
Scoring
I won’t say much here, since I didn’t end up putting much emphasis on this step for reasons explained at the beginning of this post.
My scoring approach assigns business-level labels by averaging the image-level predictions. I classify a business as label “0” if the average probability of belonging to class “0” across all of its images is greater than 0.5.
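A minimal sketch of that averaging rule for a single class, written against plain Scala collections rather than an INDArray. The names `ScoreSketch` and `aggregate` are hypothetical; the input is assumed to be one (bizID, probability) pair per image.

```scala
object ScoreSketch {
  // Average image-level probabilities per business and threshold the mean.
  // `scores` pairs each image's bizID with the model's probability for one class.
  def aggregate(scores: Seq[(String, Double)], thresh: Double = 0.5): Map[String, Boolean] =
    scores.groupBy(_._1).map { case (bizId, rows) =>
      val mean = rows.map(_._2).sum / rows.size
      bizId -> (mean > thresh)
    }
}
```

A business whose images average exactly 0.5 is not assigned the label, since the comparison is strictly greater than the threshold.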
Code below: scoring.scala
package modeling.processing

import org.nd4j.linalg.api.ndarray.INDArray
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork

object scoring

  /** Take model predictions from scoreModel and merge with alignedData*/
  def aggImgScores2Biz(scores: INDArray, alignedData: alignedData ) = )) } }
Submit to Kaggle
Also not much to say here, but this is how I aggregated image predictions to business scores for each model. And this is the code to generate the output CSV for Kaggle.
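As a rough illustration of the last step, the sketch below turns per-business averaged probabilities into CSV lines. It assumes a submission layout of a `business_id,labels` header followed by one row per business with space-separated class indices. That layout and the names `SubmissionSketch` and `toCsvLines` are assumptions for illustration, not the project's own writer.

```scala
object SubmissionSketch {
  // Build CSV lines from per-business class probabilities: keep every class
  // index whose averaged probability exceeds `thresh`.
  def toCsvLines(bizScores: Map[String, Vector[Double]], thresh: Double = 0.5): List[String] = {
    val rows = bizScores.toList.sortBy(_._1).map { case (bizId, probs) =>
      val labels = probs.zipWithIndex.collect { case (p, cls) if p > thresh => cls }
      val joined = labels.mkString(" ")
      s"$bizId,$joined"
    }
    "business_id,labels" :: rows
  }
}
```

A business with no class above the threshold gets an empty label field.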
Run
The whole project can be run from main.scala . Here it is:
package modeling

import io.nn._
import io.kaggleSubmission._
import io.readCsvData._
import processing.alignedData
import processing.images._
import processing.kaggleSubmission._
import processing.makeDataSets._
import processing.scoring._
import training.cnn._
import training.cnnEpochs._

object runPipeline {
  def main(args: Array[String]): Unit = {

    // image processing on training data
    val labMap = readBizLabels("data/labels/train.csv")
    val bizMap = readBiz2ImgLabels("data/labels/train_photo_to_biz_ids.csv")
    val imgs = getImageIds("data/images/train", bizMap, bizMap.map(_._2).toSet.toList).slice(0,20000) // 20000 images
    val dataMap = processImages(imgs, resizeImgDim = 128)
    val alignedData = new alignedData(dataMap, bizMap, Option(labMap))()

    // training (one model/class at a time). Many microparameters hardcoded within
    val cnn0 = trainModelEpochs(alignedData, bizClass = 0, saveNN = "results/modelsV0/model0")
    val cnn1 = trainModelEpochs(alignedData, bizClass = 1, saveNN = "results/modelsV0/model1")
    val cnn2 = trainModelEpochs(alignedData, bizClass = 2, saveNN = "results/modelsV0/model2")
    val cnn3 = trainModelEpochs(alignedData, bizClass = 3, saveNN = "results/modelsV0/model3")
    val cnn4 = trainModelEpochs(alignedData, bizClass = 4, saveNN = "results/modelsV0/model4")
    val cnn5 = trainModelEpochs(alignedData, bizClass = 5, saveNN = "results/modelsV0/model5")
    val cnn6 = trainModelEpochs(alignedData, bizClass = 6, saveNN = "results/modelsV0/model6")
    val cnn7 = trainModelEpochs(alignedData, bizClass = 7, saveNN = "results/modelsV0/model7")
    val cnn8 = trainModelEpochs(alignedData, bizClass = 8, saveNN = "results/modelsV0/model8")

    // processing test data for scoring
    val bizMapTE = readBiz2ImgLabels("data/labels/test_photo_to_biz.csv")
    val imgsTE = getImageIds("data/images/test/", bizMapTE, bizMapTE.map(_._2).toSet.toList)
    val dataMapTE = processImages(imgsTE, resizeImgDim = 128)
    val alignedDataTE = new alignedData(dataMapTE, bizMapTE, None)()

    // creating csv file to submit to kaggle (scores all models)
    val kaggleResults = createKaggleSubmitObj(alignedDataTE, "results/ModelsV0/")
    val kaggleSubmitResults = writeKaggleSubmissionFile("results/kaggleSubmission/kaggleSubmitFile.csv", kaggleResults, thresh = 0.5)
  }
}
Thoughts
This was my first foray into deep neural networks. I haven’t used Theano or any of the other widely used implementations out there, so I unfortunately don’t have much to compare my experience to.
I will say that the current documentation will only take you so far. I spent a lot of time reading Neural Networks and Deep Learning to understand the concepts and reviewing the DL4J source code to try and figure out how to implement what I thought I wanted to do.
Discovering the DL4J Gitter was the single most useful moment I had. The creators are actively answering all sorts of questions in real-time. There’s also a room for earlyadopters discussing testing and feature requests which was interesting to browse. Very impressed with the commitment and willingness to help. I even got an email from someone on the DL4J team after I pushed this project to GitHub offering to help and pointing me to the CNN specialists.
Gitter is where the action is. There’s way more here than on StackOverflow. However, the content doesn’t appear to be indexed nearly as well on Google, so I found myself “Googling” in the Gitter search bar for keywords and perusing conversations to get answers.
I recommend using the deeplearning4j-ui tool if you can. I unfortunately wasn’t able to get it working, but it looks super useful for understanding how your net training is going.
Other awesome resources I found for visualizing training for CNNs are ConvNetJS and this one.
