Sunday, 3 February 2019

NVIDIA DIGITS (failed): Object detection with DetectNet and custom data

Let me start with this:
I know nothing about ML, DL, AI, or any of those big buzzwords you keep hearing a couple of times a day. I've barely been scratching at those huge mountains for the past two months, and I still can't do even that successfully. So I only know the names, nothing else, but with a lot of optimism I'm trying to use a few software and programming tools to train a pre-trained network on custom data using transfer learning.

In this blog post I will try to explain how I miserably tried and failed at training an object detection model on custom data with NVIDIA DIGITS using DetectNet.

The total process is divided into three steps:
  1. Installing and setting up DIGITS on the system
  2. Collecting and preparing the data
  3. Training the model

1. Installing and setting up DIGITS on the system
        There is an exhaustive guide on how to set up NVIDIA DIGITS on the host system or in the cloud: https://github.com/dusty-nv/jetson-inference

Follow it line by line. If you are lucky, you can set it up and test it in two days as stated in the documents, but it took me almost an entire week.

Possible Pitfalls :

2. Collecting and preparing the data
     I'm trying to detect the (lemon) leaves in the following image.

So I went to a nearby field and collected around 100 images like it using my phone. I labelled them using labelImg. That's quite laborious work for 100 images; imagine doing it for a couple of thousand training images.

labelImg gives us annotations as XML files in PASCAL VOC format, like this:


<object>
        <name>leaf</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>2451</xmin>
            <ymin>142</ymin>
            <xmax>2798</xmax>
            <ymax>986</ymax>
        </bndbox>
 </object>
 <object>
        <name>leaf</name>
        ...
 </object>
But DIGITS needs the data in KITTI format, which is a TXT file with one line of specific fields per object, so I wrote a Python script for converting labelImg XML files to KITTI format. You can find it in my git repo.
To use my code, copy all your images into a directory and specify it in SRC_DIR, and it will do the rest. The script creates a directory named 'labels' and saves the generated files there.
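My actual script lives in the repo; a minimal sketch of such a converter might look like the following. The `SRC_DIR` value and the zero-filled 3-D fields are assumptions: DetectNet only reads the class name and the 2-D bounding box from each KITTI line, so the remaining fields can be left as zeros.

```python
import os
import xml.etree.ElementTree as ET

SRC_DIR = "images"                       # hypothetical: folder with images + labelImg XMLs
OUT_DIR = os.path.join(SRC_DIR, "labels")

def voc_to_kitti(xml_path, out_dir):
    """Convert one PASCAL VOC annotation file to a KITTI label file."""
    tree = ET.parse(xml_path)
    lines = []
    for obj in tree.findall("object"):
        name = obj.find("name").text
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (float(box.find(k).text)
                                  for k in ("xmin", "ymin", "xmax", "ymax"))
        # KITTI line: type truncated occluded alpha x1 y1 x2 y2 h w l x y z ry
        # Only the class name and the 2-D box matter to DetectNet.
        lines.append(f"{name} 0.0 0 0.0 {xmin:.2f} {ymin:.2f} "
                     f"{xmax:.2f} {ymax:.2f} 0 0 0 0 0 0 0")
    base = os.path.splitext(os.path.basename(xml_path))[0]
    with open(os.path.join(out_dir, base + ".txt"), "w") as f:
        f.write("\n".join(lines))

if __name__ == "__main__" and os.path.isdir(SRC_DIR):
    os.makedirs(OUT_DIR, exist_ok=True)
    for fname in os.listdir(SRC_DIR):
        if fname.endswith(".xml"):
            voc_to_kitti(os.path.join(SRC_DIR, fname), OUT_DIR)
```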

After that you need to divide the data into training and validation sets as specified in the docs. You can use another of my scripts to do that.
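Again, the real script is in the repo; roughly, it does something like this (assuming the source folder has `images` and `labels` subfolders, and DIGITS expects `train`/`val` directories each with their own `images` and `labels`):

```python
import os
import random
import shutil

def split_dataset(src_dir, dst_dir, val_fraction=0.25, seed=42):
    """Copy image/label pairs into train/val folders for DIGITS.

    Assumes src_dir/images holds the images and src_dir/labels holds
    KITTI .txt files with matching base names.
    """
    images = sorted(f for f in os.listdir(os.path.join(src_dir, "images"))
                    if f.lower().endswith((".jpg", ".jpeg", ".png")))
    random.Random(seed).shuffle(images)           # deterministic shuffle
    n_val = int(len(images) * val_fraction)
    splits = {"val": images[:n_val], "train": images[n_val:]}
    for split, files in splits.items():
        for sub in ("images", "labels"):
            os.makedirs(os.path.join(dst_dir, split, sub), exist_ok=True)
        for img in files:
            label = os.path.splitext(img)[0] + ".txt"
            shutil.copy(os.path.join(src_dir, "images", img),
                        os.path.join(dst_dir, split, "images", img))
            shutil.copy(os.path.join(src_dir, "labels", label),
                        os.path.join(dst_dir, split, "labels", label))
    return splits
```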


3. Training the model
From the NVIDIA blog:
DIGITS is completely interactive so that data scientists can focus on designing and training networks rather than programming and debugging
Well, it is really easy to train (until you see the results). I just followed the "Two Days to a Demo" document to set up the training parameters. The only modification I made was reducing the batch size to 1, because I was getting an out-of-memory error with a batch size of 2. After 40 epochs my model looks like this:

Parameters of the model:
Dataset:

And by looking at that graph, even I (knowing nothing about models) can say something is terribly wrong with it. So I started googling about it.

A few mistakes I think I made:
1. Image size is not uniform: maybe I have to resize all the images to a single size. I also haven't changed the image size in the detectnet_network.prototxt file.
Ref: https://github.com/NVIDIA/DIGITS/issues/1011#issue-173648152

2. Number of images in the dataset: currently I have only around 100 (training + validation) images; maybe I have to collect a few more.
Ref: https://github.com/NVIDIA/DIGITS/issues/1011#issuecomment-243494118

3. Ignored the object size: "Note that in order for DetectNet to successfully detect objects in an image, the object size should be between 50×50 and 400×400 px in the input images"
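Mistake 3 is easy to check before training: scan the generated KITTI labels and count how many boxes fall outside the recommended range. This is a hypothetical helper of my own, not something DIGITS provides:

```python
import glob
import os

def bbox_size_report(labels_dir, lo=50, hi=400):
    """Count bounding boxes whose width or height falls outside
    DetectNet's recommended 50-400 px range."""
    in_range = out_of_range = 0
    for path in glob.glob(os.path.join(labels_dir, "*.txt")):
        for line in open(path):
            parts = line.split()
            if len(parts) < 8:
                continue
            # KITTI fields 4..7 are xmin, ymin, xmax, ymax
            x1, y1, x2, y2 = map(float, parts[4:8])
            w, h = x2 - x1, y2 - y1
            if lo <= w <= hi and lo <= h <= hi:
                in_range += 1
            else:
                out_of_range += 1
    return in_range, out_of_range
```

If most boxes land outside the range, resizing or padding the input images (point 1 above) is probably unavoidable.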


A few points to note:
  • A dataset for DetectNet should be images where the object you wish to detect is some smaller part of the image and has a bounding box label that is a smaller part of the image. Some of these images could have objects that take up a large part of the image, but not all of them, as it is important for DetectNet to be able to learn what "non-object" pixels look like around a bounding box. That ability to learn a robust background model is why DetectNet can work well. Ref - https://github.com/NVIDIA/DIGITS/issues/980#issuecomment-244932886
  • There's no definitive way to use 'padding image' and 'resize image', but to use DetectNet without modification you want to ensure that most of your objects are within the 50x50 to 400x400 pixel range. The benefit of padding is that you maintain aspect ratio and pixel resolution/object scaling. Having said that, if you have large variation in your input image sizes it is not desirable to pad too much around small images, so you may choose to resize all images to some size in the middle. Ref - https://github.com/NVIDIA/DIGITS/issues/980#issuecomment-245800374
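One thing I keep reminding myself: whichever route you take, pad or resize, the bounding boxes have to be transformed along with the pixels, or the labels no longer match the images. A sketch of the letterbox arithmetic, where the 1248×384 target is an assumption based on DetectNet's stock KITTI input size:

```python
def letterbox_params(img_w, img_h, target_w=1248, target_h=384):
    """Compute the scale and padding needed to fit an image into the
    target canvas while preserving aspect ratio (letterboxing)."""
    scale = min(target_w / img_w, target_h / img_h)
    new_w, new_h = round(img_w * scale), round(img_h * scale)
    pad_x = (target_w - new_w) // 2        # equal padding left/right
    pad_y = (target_h - new_h) // 2        # equal padding top/bottom
    return scale, pad_x, pad_y

def transform_bbox(bbox, scale, pad_x, pad_y):
    """Map an (xmin, ymin, xmax, ymax) box into the letterboxed image."""
    x1, y1, x2, y2 = bbox
    return (x1 * scale + pad_x, y1 * scale + pad_y,
            x2 * scale + pad_x, y2 * scale + pad_y)
```

For example, a 2496×768 phone photo fits the 1248×384 canvas at scale 0.5 with no padding, so every box coordinate is simply halved.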
And a few blog posts:
  1. https://devblogs.nvidia.com/deep-learning-object-detection-digits/ 
  2. https://devblogs.nvidia.com/exploring-spacenet-dataset-using-digits/ 
  3. https://devblogs.nvidia.com/detectnet-deep-neural-network-object-detection-digits/ 
  4. https://jkjung-avt.github.io/detectnet-training/ 
  5. https://blog.slavv.com/37-reasons-why-your-neural-network-is-not-working-4020854bd607 
Well, that's all for now. I will keep posting updates on my progress. In the meantime, if you notice anything wrong in my setup or have any suggestions, please let me know.
