Mask R-CNN for Ship Detection & Segmentation
In this post we’ll use Mask R-CNN to build a model that takes satellite images as input and outputs bounding boxes and segmentation masks for each ship instance in the image.
We’ll use the train and dev datasets provided by the Kaggle Airbus Challenge competition as well as the great Mask R-CNN implementation library by Matterport.
Link to code in Github: https://github.com/gabrielgarza/Mask_RCNN
Deep Learning
One of the most exciting applications of deep learning is the ability for machines to understand images. Fei-Fei Li has referred to this as giving machines the “ability to see”. There are four main classes of problems in detection and segmentation, as described in image (a) below.
There are several approaches to instance segmentation; in this post we will use Mask R-CNN.
Mask R-CNN
Mask R-CNN is an extension of Faster R-CNN. Faster R-CNN predicts bounding boxes, and Mask R-CNN essentially adds one more branch for predicting an object mask in parallel.
I’m not going to go into detail on how Mask R-CNN works but here are the general steps the approach follows:
- Backbone model: a standard convolutional neural network that serves as a feature extractor. For example, it will turn a 1024x1024x3 image into a 32x32x2048 feature map that serves as input for the next layers.
- Region Proposal Network (RPN): Using regions defined with as many as 200K anchor boxes, the RPN scans each region and predicts whether or not an object is present. One of the great advantages of the RPN is that it does not scan the actual image: the network scans the feature map, making it much faster.
- Region of Interest Classification and Bounding Box: In this step the algorithm takes the regions of interest proposed by the RPN as inputs and outputs a classification (softmax) and a bounding box (regressor).
- Segmentation Masks: In the final step, the algorithm takes the positive ROI regions as inputs and generates 28x28 pixel masks with float values as outputs for the objects. During inference, these masks are scaled up.
Training and Inference with Mask R-CNN
Instead of replicating the entire algorithm based on the research paper, we’ll use the awesome Mask R-CNN library that Matterport built. We’ll have to A) generate our train and dev sets, B) do some wrangling to load them into the library, C) set up our training environment in AWS, D) use transfer learning to start training from the COCO pre-trained weights, and E) tune our model to get good results.
Step 1: Download Kaggle Data and Generate Train and Dev Splits
The dataset provided by Kaggle consists of hundreds of thousands of images, so the easiest thing is to download them directly to the AWS machine where we will be doing our training. Once we download them, we’ll have to split them into train and dev sets, which will be done at random through a Python script.
I highly recommend using a spot instance to download the data from Kaggle using Kaggle’s API and upload that zipped data into an S3 bucket. You’ll later download that data from S3 and unzip it at training time.
Kaggle provides a csv file called train_ship_segmentations.csv with two columns: ImageId and EncodedPixels (run-length encoding format). Assuming we have downloaded the images into the ./datasets/train_val/ path, we can split and move the images into train and dev set folders with a short Python script.
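Here’s a minimal sketch of that split script. The 10% dev fraction, the seed, and the ./datasets/val/ destination folder are illustrative assumptions rather than the exact values from the repo:

```python
import os
import random
import shutil

# Paths and the 10% dev split are assumptions for illustration
train_dir = "./datasets/train_val/"
dev_dir = "./datasets/val/"
os.makedirs(dev_dir, exist_ok=True)

images = [f for f in os.listdir(train_dir) if f.endswith(".jpg")]
random.seed(42)  # make the split reproducible
dev_images = random.sample(images, int(len(images) * 0.1))

for filename in dev_images:
    shutil.move(os.path.join(train_dir, filename),
                os.path.join(dev_dir, filename))
```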
Step 2: Load data into the library
There is a specific convention the Mask R-CNN library follows for loading datasets. We need to create a ShipDataset class that implements the main functions required:
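Here’s a minimal sketch of the class, following the Matterport convention of subclassing utils.Dataset and implementing load_mask and image_reference. The rle_masks_for_image helper is hypothetical, standing in for a lookup into train_ship_segmentations.csv, and rle_decode is the decoder shown below:

```python
import os
import numpy as np
from mrcnn import utils

class ShipDataset(utils.Dataset):
    def load_ship(self, dataset_dir, image_ids):
        # Register our single "ship" class and each image with the library
        self.add_class("ship", 1, "ship")
        for image_id in image_ids:
            self.add_image("ship", image_id=image_id,
                           path=os.path.join(dataset_dir, image_id))

    def load_mask(self, image_id):
        # Return a [height, width, instances] boolean tensor plus the
        # class id (always "ship") for each instance
        info = self.image_info[image_id]
        rle_list = rle_masks_for_image(info["id"])  # hypothetical CSV lookup
        masks = np.stack([rle_decode(rle) for rle in rle_list], axis=-1)
        return masks, np.ones([masks.shape[-1]], dtype=np.int32)

    def image_reference(self, image_id):
        return self.image_info[image_id]["path"]
```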
To convert a run-length-encoded mask to an image mask (boolean tensor) we use the function rle_decode below. This is used to generate the ground truth masks that we load into the library for training in our ShipDataset class.
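Here’s a standard version of that decoder (the exact code in the repo may differ slightly); 768x768 is the Airbus image size:

```python
import numpy as np

def rle_decode(mask_rle, shape=(768, 768)):
    """Decode a run-length-encoded string into a boolean mask.

    mask_rle: space-separated (start, length) pairs, 1-indexed
    shape: (height, width) of the output mask
    """
    s = mask_rle.split()
    starts = np.asarray(s[0::2], dtype=int) - 1  # RLE is 1-indexed
    lengths = np.asarray(s[1::2], dtype=int)
    ends = starts + lengths
    img = np.zeros(shape[0] * shape[1], dtype=bool)
    for lo, hi in zip(starts, ends):
        img[lo:hi] = True
    # Kaggle's RLE runs top-to-bottom down columns, hence the transpose
    return img.reshape(shape).T
```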
Step 3: Setup Training with P3 Spot Instances and AWS Batch
Given the large dataset we want to train with, we’ll need to use AWS GPU instances to get good results in a practical amount of time. P3 instances are quite expensive, but by using Spot Instances you can get a p3.2xlarge for around $0.90/hr, which represents about 70% savings. The key here is to be efficient and automate as much as we can in order not to waste any time/money on non-training tasks such as setting up the data. To do that, we’ll use shell scripts and docker containers, and then use the awesome AWS Batch service to schedule our training.
The first thing I did was create a Deep Learning AMI configured for AWS Batch that uses nvidia-docker, following this AWS Guide. The AMI ID is ami-073682d8e65240b76 and it is open to the community. This will allow us to train using docker containers with GPUs.
Next is creating a Dockerfile that has all of the dependencies we need, as well as the shell scripts that will take care of downloading the data and running training.
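Here’s a minimal sketch of such a Dockerfile. The base image and exact package list are assumptions for illustration, not the actual file from the repo:

```dockerfile
# Base image is an assumption; any CUDA image compatible with
# tensorflow-gpu would work
FROM nvidia/cuda:9.0-cudnn7-runtime-ubuntu16.04

# Python, git, and the AWS CLI for pulling data/weights from S3
RUN apt-get update && apt-get install -y python3 python3-pip git unzip awscli

# Mask R-CNN dependencies (versions assumed)
RUN pip3 install tensorflow-gpu keras numpy scipy scikit-image pandas

# Shell scripts that set up the project/data and drive training/inference
COPY setup_project_and_data.sh train.sh predict.sh /home/
WORKDIR /home
```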
Note the last three shell scripts copied into the container:
- setup_project_and_data.sh -> clones our Mask R-CNN repo, downloads and unzips our data from S3, splits the data into train and dev sets, and downloads the latest weights we have saved in S3
- train.sh -> loads the latest weights, runs the train command python3 ./ship.py train --dataset=./datasets --weights=last, and uploads the trained weights to S3 after training ends (a sketch of this script follows the list)
- predict.sh -> downloads the Kaggle Challenge test dataset (which is used to submit your entry to the challenge), generates predictions for each of the images, converts the masks to run-length encoding, and uploads the predictions CSV file to S3
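As an illustration, here’s a minimal sketch of what train.sh does (the S3 bucket name and paths are assumptions):

```bash
#!/bin/bash
# Pull the latest saved weights from S3 (bucket name is an assumption)
aws s3 cp s3://my-ship-bucket/weights/mask_rcnn_ship_last.h5 ./logs/

# Resume training from the latest weights
python3 ./ship.py train --dataset=./datasets --weights=last

# Upload the newly trained weights back to S3 when training ends
aws s3 cp ./logs/ s3://my-ship-bucket/weights/ --recursive
```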
Step 4: Train the model using AWS Batch
The beauty of AWS Batch is that you can create a compute environment that uses a Spot Instance and it will run a job using your docker container, and then terminate your Spot Instance as soon as your job ends.
I won’t go into great detail here (might make this another post), but essentially you build your image, upload it to AWS ECR, then in AWS Batch you schedule your training or inference job to run with the command bash predict.sh or bash train.sh and wait for it to finish (you can follow the progress by looking at the logs in AWS CloudWatch). The resulting files (trained weights or predictions csv) are uploaded to S3 by our script. The first time we train, we pass in the coco argument (in train.sh) in order to use transfer learning and train our model on top of the pre-trained COCO weights:
python3 ./ship.py train --dataset=./datasets --weights=coco
Once we have finished our initial training run, we’ll pass in the last argument to the train command so that training starts where we left off:
python3 ./ship.py train --dataset=./datasets --weights=last
We can tune our model using the ShipConfig class by overriding the default settings. Setting the Non-Max Suppression threshold to 0 was important to get rid of overlapping ship mask predictions (which the Kaggle challenge doesn’t allow).
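Here’s a sketch of such an override using the Matterport Config class (values other than the zeroed NMS threshold are illustrative):

```python
from mrcnn.config import Config

class ShipConfig(Config):
    NAME = "ship"
    NUM_CLASSES = 1 + 1  # background + ship
    IMAGES_PER_GPU = 2   # illustrative; depends on GPU memory

    # Disallow overlapping detections entirely, since the Kaggle
    # challenge does not allow overlapping ship masks
    DETECTION_NMS_THRESHOLD = 0.0
```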
Step 5: Predict ship segmentations
To generate our predictions, all we have to do is run our container in AWS Batch with the bash predict.sh command. This uses the script generate_predictions.py; here’s a snippet of what inference looks like:
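Here’s a minimal sketch of that inference step using the Matterport API (the weights path and test image path are assumptions):

```python
import skimage.io
from mrcnn import model as modellib

class InferenceConfig(ShipConfig):
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1  # predict one image at a time

model = modellib.MaskRCNN(mode="inference", config=InferenceConfig(),
                          model_dir="./logs")
model.load_weights("./logs/mask_rcnn_ship.h5", by_name=True)  # path assumed

image = skimage.io.imread("./datasets/test/some_test_image.jpg")  # path assumed
results = model.detect([image], verbose=0)
r = results[0]  # dict with 'rois', 'masks', 'class_ids', and 'scores'
```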
I ran into several challenging cases, such as waves and clouds in the images, which the model initially thought were ships. To overcome this, I modified the region proposal network’s anchor box sizes (RPN_ANCHOR_SCALES) to be smaller. This dramatically improved results, as the model no longer predicted small waves to be ships.
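In the Matterport config this is a one-line override. The library’s default scales are (32, 64, 128, 256, 512); the smaller values below are illustrative rather than the exact ones used:

```python
class ShipConfig(Config):
    # ...
    # Smaller anchors than the default (32, 64, 128, 256, 512) so the RPN
    # proposes regions sized for small ships (exact values illustrative)
    RPN_ANCHOR_SCALES = (8, 16, 32, 64, 128)
```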
Results
You can get decent results after about 30 epochs (defined in ship.py). I trained for 160 epochs and was able to get to 80.5% accuracy in my Kaggle submission.
I’ve included a Jupyter Notebook called inspect_shyp_model.ipynb that allows you to run the model and make predictions on any image locally on your computer.
Here are some example images with predicted probabilities of the instance being a ship, segmentation masks, and bounding boxes overlaid on top:
Mask R-CNN model predicting 8/8 ships with masks:
Model predicting 2/2 ships:
Model having some issues with ships that are right next to each other:
Waves generate false positives for the model. Would need to further train/tune to overcome completely:
Difficult image with some docked ships and some ships located on land.
Conclusion
Overall, I learned a lot about how Mask R-CNN works and how powerful it is. The possibilities are endless in terms of how this technology can be applied, and it is exciting to think about how giving machines the “ability to see” can help make the world better.
Github Repo: https://github.com/gabrielgarza/Mask_RCNN