Semantic Segmentation in Aerial Imagery with fast.ai

This post describes a deep learning semantic segmentation model built for the Kaggle Airbus ship detection challenge (detecting ships in aerial imagery). The implementation uses fast.ai and is based on the camvid example, modified to work for binary segmentation instead of multi-class segmentation. In this project I employed several of fast.ai's signature techniques:

* Finding the initial learning rate
* Utilizing pre-training, which is built into the Learner
* Freezing and unfreezing layers

This post highlights one crucial step where I needed to subclass a standard fast.ai image data class to load training data for binary segmentation. With the release of new fast.ai versions, I understand this subclassing process, which I found to be lightly documented, will change.

The appeal of fast.ai for me is the ability to prototype quickly. I also especially like that one can subclass the standard library. I plan to maintain my own fork in the future.

Required Before Starting:

  1. For any deep learning task, you will eventually need a GPU-enabled environment to train your network. However, for setting up your experiment (tasks like reading and viewing samples of your data, creating your labels, preprocessing files, tuning some parameters, etc.) you do not need to be running and paying for a GPU instance. Fast.ai has instructions for setting up an AWS GPU-enabled EC2 instance here: https://course.fast.ai/start_aws.html, but for setting up a project I prefer Google Colab.

  2. Download the contest training data (or a subset!) from https://www.kaggle.com/c/airbus-ship-detection and save it to your file system. Kaggle datasets often 1) contain extra files you do not need (the Kaggle CLI lets you download specific files) and 2) come as folders jammed with tens of GB of files, which is more than you probably need when initially setting up your project; only the most serious rounds of training require the full dataset. A sketch using the Kaggle Python API follows this list.
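For example, to grab just the RLE label file rather than the full image archive, the official kaggle Python package can download a single competition file. This is a minimal sketch, assuming the package is installed and a ~/.kaggle/kaggle.json API token is in place; the destination path is a placeholder.

from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()  # reads credentials from ~/.kaggle/kaggle.json
# download only the RLE label csv instead of the multi-GB image archives
api.competition_download_file('airbus-ship-detection',
                              'train_ship_segmentations_v2.csv',
                              path='/path/to/downloaded/data/')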

The full notebook is available here on github. It uses a subset of the contest training data hosted in an open S3 bucket, which is enough to demonstrate preprocessing and training but not enough to train a working model. Open and run it in Google Colab!

Import libraries and set the path variable to the data directory

from fastai.vision import *          # fastai v1 star imports also pull in pandas, numpy, matplotlib, pathlib
from fastai.callbacks.hooks import *
from fastai.utils.mem import *

DATA_DIR = Path('/path/to/downloaded/data/')   # root folder of the downloaded training data

Read labels, create validation set.

In this semantic segmentation challenge, we eventually must determine which pixels in images are ships.

The fast.ai image data learner requires the training set and label set to be image files, and semantic segmentation data is often distributed this way. In this challenge, however, the labels ("masks") are distributed in run length encoded (RLE) format in a csv file: each mask is a string of alternating start-pixel and run-length pairs. We read the csv and, for each encoded mask, decode it and save it as an image file.
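To make the format concrete, here is a minimal, fast.ai-independent sketch of an RLE decoder; rle_decode is a hypothetical helper (the notebook uses fastai's open_mask_rle instead) following the contest's 1-indexed, column-major convention.

import numpy as np

def rle_decode(rle, shape=(768, 768)):
    "Decode 'start length start length ...' (1-indexed, column-major) into a binary mask."
    mask = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    nums = list(map(int, rle.split()))
    for start, length in zip(nums[0::2], nums[1::2]):
        mask[start - 1:start - 1 + length] = 1
    return mask.reshape(shape, order='F')  # column-major, per the contest spec

# '1 3' marks the first three pixels of the first column as ship
print(rle_decode('1 3', shape=(4, 4)))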

label_df_raw = pd.read_csv(f'{DATA_DIR}/train_ship_segmentations_v2.csv', low_memory=False)
label_df_raw = label_df_raw.replace(np.nan, '', regex=True)
pd.set_option("display.max_colwidth", 10000)

# make a dataframe of RLE data for the masks:
# keep only rows whose ImageId is in our downloaded subset (ship_files holds those image filenames)
def in_mini(f):
    return f['ImageId'] in ship_files

masks = label_df_raw[label_df_raw.apply(in_mini, axis=1)]
masks.shape, label_df_raw.shape

# an image can contain several ships; concatenate their RLEs into one row per ImageId
masks = masks.groupby('ImageId')['EncodedPixels'].apply(lambda x: ' '.join(x)).reset_index()
masks.shape

For each mask in RLE format, use the fastai open_mask_rle function to read the data into an image-like format and save it in the label/ folder, so it can later be read by the Learner.

"""## Write masks to jpgs in /labels"""
dmasks = masks.to_dict()

for r in range(len( dmasks['ImageId'])):
try:
image_id = dmasks['ImageId'][r]
filename = str( DATA_DIR/'label') + '/' + image_id
rle_0 = masks.query('ImageId=="{}"'.format(image_id))['EncodedPixels'].to_string(index=False)
rle_mask = open_mask_rle( rle_0, [768,768])
rle_mask = rle_mask.rotate(-90).flip_lr()
rle_mask.save( DATA_DIR/'label/{}'.format(image_id).replace('jpg','png') )
except: # remove bas masks
print( 'bad mask ... removing ', str(DATA_DIR) + '/Train/' + image_id )
os.remove(str(DATA_DIR) + '/Train/' + image_id)

fnames = get_image_files(path_img)
lbl_names = get_image_files(path_lbl)

Our training and label folders contain image files with corresponding names (i.e. Train/432.jpg corresponds to label/432.png). To designate a validation set, we pass the learner a list of image file names.

"""## Move some ship files to valid.txt"""
ship_files = os.listdir(path_lbl)
valid_ships = ship_files[:50]

with open(str(DATA_DIR) + '/valid.txt', mode='wt', encoding='utf-8') as myfile:
myfile.write('\n'.join(valid_ships))

get_y_fn is needed by the datablock API - for a training image Train/432.jpg, it returns the path of the corresponding label, label/432.png.

# for a given training image, build the path of the corresponding mask in the label/ directory (saved as .png)
def get_y_fn(y):
    s = str(DATA_DIR/'label') + '/' + str(y).replace(str(DATA_DIR) + '/Train/', '')
    return s.replace('jpg', 'png')

It is helpful to check the output of get_y_fn. The fast.ai image library includes practical tools for inspecting data in a notebook cell.

# check that a training image and the mask returned by get_y_fn line up
img_f = fnames[-4]
img = open_image(img_f)
mask = open_mask(get_y_fn(img_f), div=True)

# overlay the mask on the image
fig, ax = plt.subplots(1, 1, figsize=(10,10))
img.show(ax=ax)
mask.show(ax=ax, alpha=0.5)

src_size = np.array(mask.shape[1:])   # original image size (768 x 768)
print(src_size)
mask.data

Set parameters for the datablock API

size = src_size//4   # train at 1/4 of the original 768x768 resolution to keep memory use down
size

# free = gpu_mem_get_free_no_cache()
# # the max size of bs depends on the available GPU RAM
# if free > 8200: bs=8
# else: bs=4
# print(f"using bs={bs}, have {free}MB of GPU RAM free")
bs = 4

# codes.txt lists the segmentation class names, one per line (for this binary task, e.g. background and ship)
codes = np.loadtxt(DATA_DIR/'codes.txt', dtype=str); codes

Key step: fast.ai will read the label images written above. In those images the ship pixels are stored as 255 and the background as 0, but the learner needs class indices: 0 (no ship) or 1 (ship). The default SegmentationLabelList open() reads the raw pixel values, so we 1) create a subclass and 2) reimplement open() to load 0 or 1, which open_mask() does when passed div=True (it divides the values by 255).

The idea for this came from this fast.ai forum thread: https://forums.fast.ai/t/unet-binary-segmentation/29833/40

# subclass SegmentationLabelList so masks are opened with open_mask(fn, div=True),
# which divides the pixel values by 255 and yields class indices 0/1
# idea from https://forums.fast.ai/t/unet-binary-segmentation/29833/40

class SegLabelListCustom(SegmentationLabelList):
    def open(self, fn): return open_mask(fn, div=True)

class SegItemListCustom(ImageImageList):
    _label_cls = SegLabelListCustom
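A quick sanity check (assuming lbl_names[0] points at one of the masks saved above): without div=True the mask tensor keeps the raw pixel values, with div=True it holds class indices 0 and 1.

raw = open_mask(lbl_names[0])                 # raw pixel values (0/255 in our saved labels)
divided = open_mask(lbl_names[0], div=True)   # divided by 255 -> class indices 0/1
print(raw.data.unique(), divided.data.unique())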

Run the datablock API to bundle the data, transforms, and batch settings for the model.

# build the source ItemList with our custom classes (following the camvid pattern):
# images from Train/, validation split from valid.txt, labels located via get_y_fn
src = (SegItemListCustom.from_folder(path_img)
       .split_by_fname_file(str(DATA_DIR/'valid.txt'))
       .label_from_func(get_y_fn, classes=codes))

tfms = get_transforms(flip_vert=True, max_warp=0, max_zoom=1.2, max_lighting=0.3)
data = (src.transform(tfms, size=size, tfm_y=True)
        .databunch(bs=bs)
        .normalize(imagenet_stats))

Some notebook commands to inspect the newly created data object

data                                           # summary of the DataBunch
data.train_ds.x[1]                             # a training image
data.train_ds.y[1]                             # its mask
data.show_batch(2, figsize=(6,6), alpha=0.7)   # a few images with masks overlaid
data.classes

Loss functions and metrics for the learner, adapted from fast.ai examples

"""## Create Learner"""
import pdb

def dice_loss(input, target):
smooth = 1.
input = input[:,1,None].sigmoid()
iflat = input.contiguous().view(-1).float()
tflat = target.view(-1).float()
intersection = (iflat * tflat).sum()
return (1 - ((2. * intersection + smooth) / ((iflat + tflat).sum() +smooth)))

def combo_loss(pred, targ):
bce_loss = CrossEntropyFlat(axis=1)
return bce_loss(pred,targ) + dice_loss(pred,targ)

def acc_fixed(input, targs):
n = targs.shape[0]
targs = targs.squeeze(1)
targs = targs.view(n,-1)
input = input.argmax(dim=1).view(n,-1)
return (input==targs).float().mean()

def acc_thresh(input:Tensor, target:Tensor, thresh:float=0.5, sigmoid:bool=True)->Rank0Tensor:
"Compute accuracy when `y_pred` and `y_true` are the same size."

if sigmoid: input = input.sigmoid()
n = input.shape[0]
input = input.argmax(dim=1).view(n,-1)
target = target.view(n,-1)
return ((input>thresh)==target.byte()).float().mean()

metrics = [dice_loss, accuracy_thresh, dice]

Create the learner that builds and trains our model. We use the U-Net learner with a ResNet34 backbone pre-trained on ImageNet (a fast.ai specialty: the U-Net's downsampling path is created from a pretrained ResNet34). We also pass in the data object, metrics, and loss function.

learn = unet_learner(data, models.resnet34, metrics=metrics)
learn.loss_func = combo_loss           # cross entropy + dice
learn.loss_func, learn.metrics
print(gpu_mem_get())                   # check GPU memory headroom before training

Next we start training. Anybody who has taken the fast.ai course will recognize the standard technique of using lr_find to identify a suitable learning rate.

learn.lr_find()
learn.recorder.plot()
lr = 1e-3      # chosen from the plot: just before the loss starts to climb

Next we perform an initial round of training with most layers frozen.

learn.fit_one_cycle(3, max_lr=lr)      # 3 epochs with the pretrained encoder frozen
learn.save('stage-1')
learn.load('stage-1')
learn.show_results(rows=4, figsize=(8,9))

We follow this by training the entire network end to end.

learn.unfreeze()
lrs = slice(lr/400, lr/4)              # discriminative learning rates: smaller for the early layers
learn.fit_one_cycle(3, lrs, pct_start=0.8)
learn.save('stage-2');
learn.load('stage-2');
learn.show_results(rows=3, figsize=(8,9))

That is enough training for now. The linked notebook is a full example, including code for 1) a second round of training on larger images, a technique known as progressive resizing, and 2) running inference to generate predictions on the Kaggle contest test data, which can then be submitted, evaluated, and scored. A rough sketch of both steps follows.
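This is only a sketch, not the notebook's exact code: the sizes, epoch counts, and test image path below are illustrative. Progressive resizing amounts to rebuilding the DataBunch at a larger image size and continuing to train, and inference uses learn.predict.

# progressive resizing: rebuild the DataBunch at half resolution instead of a quarter
size = src_size // 2
data = (src.transform(tfms, size=size, tfm_y=True)
        .databunch(bs=bs // 2)           # larger images -> smaller batch size
        .normalize(imagenet_stats))
learn.data = data
learn.freeze()                           # start the new round with the encoder frozen again
learn.fit_one_cycle(3, max_lr=lr)

# inference on a single test image (hypothetical path)
img = open_image(DATA_DIR/'test_images'/'some_image.jpg')
pred_mask, _, _ = learn.predict(img)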

Results

[Figures: training results]