Tensorflow 2.x Object Detection ❤⌛
July 10, 2020 TensorFlow 2 meets the Object Detection API (Blog)
Link to the official Blog :- https://blog.tensorflow.org/2020/07/tensorflow-2-meets-object-detection-api.html
Object Detection Repo :- https://github.com/tensorflow/models/tree/master/research/object_detection
This release for object detection includes:
- New binaries for train/eval/export that are eager mode compatible.
- A suite of TF2 compatible (Keras-based) models; this includes migrations of our most popular TF1 models (e.g., SSD with MobileNet, RetinaNet, Faster R-CNN, Mask R-CNN), as well as a few new architectures for which we will only maintain TF2 implementations: (1) CenterNet - a simple and effective anchor-free architecture based on the recent Objects as Points paper by Zhou et al, and (2) EfficientDet — a recent family of SOTA models discovered with the help of Neural Architecture Search.
- COCO pre-trained weights for all of the models provided as TF2 style object-based checkpoints.
- Access to DistributionStrategies for distributed training: traditionally, we have mainly relied on asynchronous training for our TF1 models. We now support synchronous training as the primary strategy; our TF2 models are designed to be trainable using sync multi-GPU and TPU platforms.
- Colab demonstrations of eager mode compatible few-shot training and inference.
- First-class support for keypoint estimation, including multi-class estimation, more data augmentation support, better visualizations, and COCO evaluation.
In this post, I am going to walk through the necessary steps for training a custom model with TensorFlow 2 Object Detection.
It will be a long one, but stick till the end for a fruitful result.
We will be using Google Colab💚. I love getting the tensor computational power of its GPUs.
Let’s divide this tutorial into different sections:
- Installation of all necessary libraries
- Preparing Dataset for Custom Training
- Connecting Google Drive with Colab
- Using a pretrained model
- Creating Labelmap.pbtxt
- Creating xml to csv
- Creating tensorflow record files from csv
- Getting the config file and making the necessary changes
- Start the training
- Stop/Resume the training
- Evaluating the model
- Exporting the graph
- Doing prediction on the custom trained model
- Using Webcam for Prediction
- Working with Videos
- Converting to Tflite
- Creating Docker Images for a Detection App [TODO]
- Building a Flask App [TODO]
- Building FastAPI App [TODO]
- Applying Multithreading [TODO]
1. Installation of all necessary libraries
We need to install the necessary libraries for the execution of the project. Let’s open Google Colab first.
Click on the File menu and then select New Notebook.
We will follow the official notebook provided by the community as a reference.
Link to the Official Notebook
You can follow the official notebook, execute all the cells, and get the results. But I will be creating a notebook and doing everything from scratch.
Change the Runtime of the Notebook to GPU
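To confirm that TensorFlow can actually see the GPU, you can run a quick check first (a minimal sketch; an empty list means the runtime is still set to CPU):

```python
import tensorflow as tf

# List the GPUs visible to TensorFlow in this runtime.
print(tf.config.list_physical_devices('GPU'))
```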
Let’s start installing the packages required
By default, the TensorFlow GPU package comes pre-installed in the Colab environment.
But you can always check by running:

```
pip freeze
```
In the next steps we follow the execution flow of the official notebook.

```
# install tf_slim
pip install tf_slim

# install pycocotools
pip install pycocotools
```
Clone the official Models repo:

```python
import os
import pathlib

# If we are already inside the models repo, move up to its parent;
# otherwise clone a shallow copy of the repo.
if "models" in pathlib.Path.cwd().parts:
    while "models" in pathlib.Path.cwd().parts:
        os.chdir('..')
elif not pathlib.Path('models').exists():
    !git clone --depth 1 https://github.com/tensorflow/models
```
Tensorflow Models Repository :- https://github.com/tensorflow/models
Convert the protos files to Python files:

```
cd models/research/
protoc object_detection/protos/*.proto --python_out=.
```
Install the object detection library (the TF2 setup.py lives under object_detection/packages, so copy it up first):

```
cd models/research
cp object_detection/packages/tf2/setup.py .
pip install .
```
Importing all the necessary packages:

```python
# Import the necessary packages
import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile

from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image
from IPython.display import display

# Import the object detection modules
from object_detection.utils import ops as utils_ops
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util
```
Configuring some patches for TF2 from TF1:

```python
# Patch tf1 into `utils.ops`
utils_ops.tf = tf.compat.v1

# Patch the location of gfile
tf.gfile = tf.io.gfile
```
Model downloader and loading function.
Model selection can be done from the Tensorflow 2 Model Zoo.

```python
def load_model(model_name):
    base_url = 'http://download.tensorflow.org/models/object_detection/tf2/20200711/'
    model_file = model_name + '.tar.gz'
    # Download and untar the archive, then load the SavedModel inside it.
    model_dir = tf.keras.utils.get_file(
        fname=model_name,
        origin=base_url + model_file,
        untar=True)
    model_dir = pathlib.Path(model_dir) / "saved_model"
    model = tf.saved_model.load(str(model_dir))
    return model
```
Loading the Labelmap
```python
# List of the strings that are used to add the correct label for each box.
PATH_TO_LABELS = 'models/research/object_detection/data/mscoco_label_map.pbtxt'
category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)
```
Moving on to the detection part.

```python
# If you want to test the code with your own images, just add their paths to TEST_IMAGE_PATHS.
PATH_TO_TEST_IMAGES_DIR = pathlib.Path('models/research/object_detection/test_images')
TEST_IMAGE_PATHS = sorted(list(PATH_TO_TEST_IMAGES_DIR.glob("*.jpg")))
TEST_IMAGE_PATHS
```
Model Selection from the Model Zoo
In the model zoo there are various types of SOTA models available. We can use any one of them for inference.

```python
model_name = "efficientdet_d0_coco17_tpu-32"
detection_model = load_model(model_name)
```

I am using EfficientDet here; you can use any model according to your choice.
Checking the signature of the model, i.e., its inputs, output dtypes, and output shapes:

```python
print(detection_model.signatures['serving_default'].inputs)
detection_model.signatures['serving_default'].output_dtypes
detection_model.signatures['serving_default'].output_shapes
```
Defining the function that loads an image and runs prediction:

```python
def run_inference_for_single_image(model, image):
    image = np.asarray(image)
    # The input needs to be a tensor, convert it using `tf.convert_to_tensor`.
    input_tensor = tf.convert_to_tensor(image)
    # The model expects a batch of images, so add an axis with `tf.newaxis`.
    input_tensor = input_tensor[tf.newaxis, ...]

    # Run inference
    model_fn = model.signatures['serving_default']
    output_dict = model_fn(input_tensor)

    # All outputs are batch tensors.
    # Convert to numpy arrays, and take index [0] to remove the batch dimension.
    # We're only interested in the first num_detections.
    num_detections = int(output_dict.pop('num_detections'))
    output_dict = {key: value[0, :num_detections].numpy()
                   for key, value in output_dict.items()}
    output_dict['num_detections'] = num_detections

    # detection_classes should be ints.
    output_dict['detection_classes'] = output_dict['detection_classes'].astype(np.int64)

    # Handle models with masks:
    if 'detection_masks' in output_dict:
        # Reframe the bbox mask to the image size.
        detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
            output_dict['detection_masks'], output_dict['detection_boxes'],
            image.shape[0], image.shape[1])
        detection_masks_reframed = tf.cast(detection_masks_reframed > 0.5,
                                           tf.uint8)
        output_dict['detection_masks_reframed'] = detection_masks_reframed.numpy()

    return output_dict
```
This is the function that does the prediction on the test images.
Running the object detection module on some test images.
There is a folder called test_images in the object_detection folder with two images.
```python
def show_inference(model, image_path):
    # The array-based representation of the image will be used later in order
    # to prepare the result image with boxes and labels on it.
    image_np = np.array(Image.open(image_path))
    # Actual detection.
    output_dict = run_inference_for_single_image(model, image_np)
    # Visualization of the results of a detection.
    vis_util.visualize_boxes_and_labels_on_image_array(
        image_np,
        output_dict['detection_boxes'],
        output_dict['detection_classes'],
        output_dict['detection_scores'],
        category_index,
        instance_masks=output_dict.get('detection_masks_reframed', None),
        use_normalized_coordinates=True,
        line_thickness=8)
    display(Image.fromarray(image_np))
```
Here we will be using the function to do inference on the images.
Displaying the final results
```python
for image_path in TEST_IMAGE_PATHS:
    show_inference(detection_model, image_path)
```
2. Preparing Dataset for Custom Training
Here we will be using the famous Cards Dataset provided by Edge Electronics. The data is already annotated, so we do not need to do that hard work.
Readers who already have an annotated dataset may skip this part, as we will be talking about the annotation process and splitting the dataset.
The tool that we will be using is LabelImg.
Linux Users :- Follow steps mentioned in the Github Repo
Windows Download Link :- Download Link
After the installation is successful, open the tool.
First Look of the tool
Select Open Directory and then select the folder containing the images.
The images will be shown as a list at the bottom right.
Let’s start annotating.
Select the PascalVOC option, not YOLO.
Click on Create RectBox and then annotate the object or objects in the image.
Click on Save.
Click on Next and continue with the same process for each image.
After completing the whole annotation process, it is good to have a train/test split of the dataset.
For example, if we have 1000 images and their 1000 corresponding annotation files, we split them in an 80:20 ratio: 800 images and their 800 corresponding annotation files go into the train folder, and 200 images and their 200 corresponding annotation files go into the test folder. A small scripted sketch of this split is shown after the directory layout below.
So the directory structure of the dataset will look like :-
```
+-- images
|   +-- train
|   |   +-- image_train_1.jpg
|   |   +-- image_train_1.xml
|   +-- test
|   |   +-- image_test_1.jpg
|   |   +-- image_test_1.xml
```
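If you want to automate the split, here is a minimal sketch (the flat images folder, the 80:20 ratio, and the assumption that every .jpg has a same-named .xml next to it are mine, based on the layout above):

```python
import os
import random
import shutil

SRC = 'images'  # assumed folder holding all the .jpg files and their .xml annotations
random.seed(42)

images = [f for f in os.listdir(SRC) if f.endswith('.jpg')]
random.shuffle(images)
split = int(0.8 * len(images))

for subset, subset_images in [('train', images[:split]), ('test', images[split:])]:
    out_dir = os.path.join(SRC, subset)
    os.makedirs(out_dir, exist_ok=True)
    for img in subset_images:
        xml = img[:-4] + '.xml'  # matching annotation file
        shutil.move(os.path.join(SRC, img), os.path.join(out_dir, img))
        shutil.move(os.path.join(SRC, xml), os.path.join(out_dir, xml))
```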
3. Connecting Google Drive with Colab
Here we will be connecting Google Drive with Google Colab, so that our training checkpoints are saved in Drive if a runtime disconnection happens; we know a Colab session has a limit of around 8-12 hours.

```python
from google.colab import drive
drive.mount('/content/drive')
```

Then click on the provided URL and paste the authorization key. Your Google Drive will be mounted inside the /content folder.
I will be creating a new folder in Google Drive called TFOD2. Then I will clone the models repository into TFOD2 for training and for future reference of the model checkpoints. I will be keeping the complete repository and folder structure in the TFOD2 folder.
Some sample pictures are provided below :-
The directory structure in my Google Drive: here I created a new folder called TFOD2 and kept the cloned tensorflow models repository in it.
Inside models we have other folders, of which research and official are the important ones.
Inside the research folder we have the most important folder, object_detection.
All the files we need are available under the object_detection folder, which we can switch into as shown below.
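A minimal sketch of changing into that folder from a notebook cell (the TFOD2 path reflects my own Drive layout above; newer Colab mounts Drive under MyDrive, older under "My Drive", so adjust the path to yours):

```
%cd /content/drive/MyDrive/TFOD2/models/research/object_detection
```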
4. Using a Pre-trained Model
From the Model Zoo we will be selecting the COCO-trained RetinaNet50 (SSD ResNet50 V1 FPN 640x640).
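You can download and unpack it directly in the notebook. A sketch, assuming the archive name from the TF2 model zoo (it matches the checkpoint path used in the config later):

```
!wget http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.tar.gz
!tar -xzvf ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.tar.gz
```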
5. Creating Labelmap.pbtxt
Create a file called labelmap.pbtxt where we will be keeping the names of the classes in our Cards Dataset.
```
item {
  id: 1
  name: 'nine'
}
item {
  id: 2
  name: 'ten'
}
item {
  id: 3
  name: 'jack'
}
item {
  id: 4
  name: 'queen'
}
item {
  id: 5
  name: 'king'
}
item {
  id: 6
  name: 'ace'
}
```
The file labelmap.pbtxt is available in the utility_files.zip provided via the Google Drive link below.
6. Creating xml to csv
```
!python xml_to_csv.py
```

The file xml_to_csv.py is available in the utility_files.zip provided via the Google Drive link below.
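For reference, here is a minimal sketch of what such an xml_to_csv.py typically does: it parses the PascalVOC XML annotations into one CSV per split. The column names follow the common TFOD tutorials and are an assumption; check them against the provided file.

```python
import glob
import os
import xml.etree.ElementTree as ET

import pandas as pd


def xml_to_csv(path):
    # Collect one row per bounding box across all XML files in `path`.
    rows = []
    for xml_file in glob.glob(os.path.join(path, '*.xml')):
        root = ET.parse(xml_file).getroot()
        size = root.find('size')
        for member in root.findall('object'):
            box = member.find('bndbox')
            rows.append((root.find('filename').text,
                         int(size.find('width').text),
                         int(size.find('height').text),
                         member.find('name').text,
                         int(box.find('xmin').text),
                         int(box.find('ymin').text),
                         int(box.find('xmax').text),
                         int(box.find('ymax').text)))
    columns = ['filename', 'width', 'height', 'class',
               'xmin', 'ymin', 'xmax', 'ymax']
    return pd.DataFrame(rows, columns=columns)


# Write one CSV per split, matching the paths used by generate_tfrecord.py below.
for split in ['train', 'test']:
    df = xml_to_csv(os.path.join('images', split))
    df.to_csv(os.path.join('images', f'{split}_labels.csv'), index=False)
```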
7. Creating tensorflow record files from csv
The file generate_tfrecord.py is available in the utility_files.zip provided via the Google Drive link below.
Change the class mapping in generate_tfrecord.py as per the classes in your dataset:
```python
# TO-DO replace this with label map
def class_text_to_int(row_label):
    if row_label == 'nine':
        return 1
    elif row_label == 'ten':
        return 2
    elif row_label == 'jack':
        return 3
    elif row_label == 'queen':
        return 4
    elif row_label == 'king':
        return 5
    elif row_label == 'ace':
        return 6
    else:
        None
```
Execute the generate_tfrecord.py file to create the TF record files:

```
# for training data
python generate_tfrecord.py --csv_input=images/train_labels.csv --image_dir=images/train --output_path=train.record

# for validation data
python generate_tfrecord.py --csv_input=images/test_labels.csv --image_dir=images/test --output_path=test.record
```
If you get a TypeError because the elif ladder falls through to None, change the else branch to return 0:

```python
# TO-DO replace this with label map
def class_text_to_int(row_label):
    if row_label == 'nine':
        return 1
    elif row_label == 'ten':
        return 2
    elif row_label == 'jack':
        return 3
    elif row_label == 'queen':
        return 4
    elif row_label == 'king':
        return 5
    elif row_label == 'ace':
        return 6
    else:
        return 0
```
8. Getting the config file and making the necessary changes
The config file is available inside the downloaded pretrained model folder.
Pretrained model we are using :- RetinaNet50
After downloading it, unzip it and the pipeline.config file will be available.
Open the file in any text editor and make the following changes.
Change num_classes to 6 (it is based on the number of classes in the dataset).
```
model {
  ssd {
    num_classes: 6 ## Change Here
    image_resizer {
      fixed_shape_resizer {
        height: 640
        width: 640
      }
```

The comments must be removed from the actual config file.
Change the fine_tune_checkpoint value to the checkpoint file of the pretrained model, num_steps to your desired number, and the fine_tune_checkpoint_type value from "classification" to "detection". (If you are training on a GPU rather than a TPU, it is also worth setting use_bfloat16 to false, since bfloat16 is meant for TPUs.)

```
fine_tune_checkpoint: "ssd_resnet50_v1_fpn_640x640_coco17_tpu-8/checkpoint/ckpt-0"
num_steps: 50000 ## Changed
startup_delay_steps: 0.0
replicas_to_aggregate: 8
max_number_of_boxes: 100
unpad_groundtruth_tensors: false
fine_tune_checkpoint_type: "detection" ## Changed
use_bfloat16: true
fine_tune_checkpoint_version: V2
}
```
Change the paths for labelmap.pbtxt, train.record, and test.record:

```
train_input_reader {
  label_map_path: "training/labelmap.pbtxt"
  tf_record_input_reader {
    input_path: "train.record"
  }
}

eval_config {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
}

eval_input_reader {
  label_map_path: "training/labelmap.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "test.record"
  }
}
```
Google Drive download link for the utility files
Files provided :-
- xml_to_csv.py
- generate_tfrecord.py
- config file for model
- Dataset
Download link :- utility_files.zip
We will be creating a new folder called training inside the object_detection folder, where we will keep two files: pipeline.config and labelmap.pbtxt.
The picture above shows my training folder inside object_detection.
9. Start the training
So we are ready to start the training.
model_main_tf2.py is the file needed to start the training.

```
!python model_main_tf2.py --model_dir=training --pipeline_config_path=training/pipeline.config
```

I have trained for 50000 steps.
All the checkpoints will be saved in the training folder.
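To keep an eye on the loss while training runs, you can point TensorBoard at the same folder (a sketch using Colab's built-in TensorBoard magics):

```
%load_ext tensorboard
%tensorboard --logdir=training
```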
10. Stop/Resume the training
Running the same command again resumes the training from the latest checkpoint saved in the training folder:

```
!python model_main_tf2.py --model_dir=training --pipeline_config_path=training/pipeline.config
```

Stop the training
It can be stopped by a keyboard interrupt (Ctrl+C).
11. Evaluating the Model
Passing --checkpoint_dir switches model_main_tf2.py into evaluation mode, where it computes COCO metrics on the test set from the latest checkpoint:

```
!python model_main_tf2.py --model_dir=training --pipeline_config_path=training/pipeline.config --checkpoint_dir=training
```
12. Exporting the graph
```
!python exporter_main_v2.py --input_type image_tensor --pipeline_config_path training/pipeline.config --trained_checkpoint_dir training/ --output_directory exported-models/my_model
```

At the end you can see something similar to this:
13. Doing prediction on the custom trained model
For prediction we will be using the notebook that we used at the beginning, or the one provided in the repository, i.e., object_detection_tutorial.ipynb.
Here we are using the model loading function, then loading the labelmap.pbtxt, getting the test images, and checking the model signatures.
This is the same function to run inference on a single image, taken from the official notebook.
Finally, we display the images with predictions in the notebook, as in the condensed sketch below.
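Putting those steps together for the custom model, a condensed sketch looks like this (the exported-models/my_model path comes from the export step above; the test image path is a hypothetical placeholder for one of your own card images):

```python
# Load the exported custom model instead of a model-zoo checkpoint.
custom_model = tf.saved_model.load('exported-models/my_model/saved_model')

# Load the custom labelmap with our six card classes.
category_index = label_map_util.create_category_index_from_labelmap(
    'training/labelmap.pbtxt', use_display_name=True)

# Reuse the helper functions defined earlier to run inference and draw boxes.
show_inference(custom_model, 'images/test/image_test_1.jpg')
```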
Final Results