
Logo Recognition Network

This example shows code generation for a logo classification application that uses deep learning. It uses a pretrained network called LogoNet to classify an input image into one of 32 logo categories. The example also describes how to train the network by using a preprocessed training data set. Finally, the example uses the codegen command to generate a MEX function that performs the prediction.

This example illustrates the following concepts:

  • Preprocess the training images by extracting the logos and resizing to 227-by-227-by-3. Subsequently, use image augmentation to increase training data size.

  • Train the network by using the stochastic gradient descent with momentum (SGDM) optimizer.

  • Generate a CUDA® MEX and run the MEX.

Third-Party Prerequisites

Required

This example generates CUDA MEX and requires a CUDA-enabled NVIDIA® GPU and a compatible driver.

Optional

For non-MEX builds such as static libraries, dynamic libraries, or executables, this example has the following additional requirements.

Verify GPU Environment

Use the coder.checkGpuInstall (GPU Coder) function to verify that the compilers and libraries for running this example are set up correctly.

envCfg = coder.gpuEnvConfig('host');
envCfg.DeepLibTarget = 'cudnn';
envCfg.DeepCodegen = 1;
envCfg.Quiet = 1;
coder.checkGpuInstall(envCfg);

Logo Recognition Network

Logos assist users in brand identification and recognition. Many companies incorporate their logos in advertising, documentation materials, and promotions. The logo recognition network was developed in MATLAB® and contains 22 layers. The network contains four sets of convolution and max pooling layers, three fully connected layers, and dropout layers that help reduce overfitting. The network takes an input image of size 227-by-227-by-3 and classifies it into 32 logo categories. Because this network focuses on recognition, you can use it in applications where localization is not required.

The network was trained in MATLAB by using the FlickrLogos-32 [1] and Logos-32plus [2] training data sets. The two data sets contain around 200 images for each logo. The network was trained by using the stochastic gradient descent with momentum (SGDM) optimizer, a learning rate of 0.0001, 40 epochs, and a mini-batch size of 45. By default, the example uses a pretrained logo recognition network. The pretrained network enables you to run the entire example without having to wait for training to complete.
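The training settings described above could be expressed with the Deep Learning Toolbox trainingOptions function roughly as follows. This is a sketch under stated assumptions, not the contents of trainLogonet: the solver, learning rate, epoch count, and mini-batch size come from the description, while the Momentum, Shuffle, and Verbose values are assumptions because the text does not state them.

```matlab
% Sketch only. Solver, learning rate, epochs, and mini-batch size are taken
% from the description above. Momentum, Shuffle, and Verbose are assumed.
options = trainingOptions('sgdm', ...
    'InitialLearnRate', 1e-4, ...
    'MaxEpochs', 40, ...
    'MiniBatchSize', 45, ...
    'Momentum', 0.9, ...           % assumption: the toolbox default value
    'Shuffle', 'every-epoch', ...  % assumption: not stated in the text
    'Verbose', false);             % assumption: suppress command-line output
```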

To train the network, set the doTraining variable in the following code to true. You must also download the Logos-32plus data set from Deep Learning for Logo Recognition and provide the location of the downloaded Logos-32plus_v1.0.1.zip file to logozipPath. The Logos-32plus data set is 1.95 GB in size, so the download can take time depending on your internet connection. The data set has 32 image subfolders containing a total of 7830 logo images from various brands. The groundtruth MAT-file provides the bounding box information of the logo in each image.

The preprocessLogoData function preprocesses the data for network training. The images in the Logos-32plus data set are of varying size, so you must resize them to the input layer size of the network (227-by-227-by-3). The images also contain background information that you must remove. The preprocessLogoData function performs these steps by using the bounding box information to extract the logos, and it creates an imageDatastore object that you can use for network training. The trainLogonet function creates the logo recognition layers and trains the network by using the specified training options. The network is trained using data that contains at least 110 images for each logo.
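The crop-and-resize step can be illustrated on a single image. The following is a minimal sketch, not the code in preprocessLogoData: it uses a synthetic image, and the bounding box values and the [x y width height] format are assumptions made for illustration.

```matlab
% Synthetic stand-in for a data set image (300 rows, 400 columns, RGB).
img = uint8(randi([0 255], 300, 400, 3));

% Hypothetical bounding box, assumed to be in [x y width height] pixel format.
bbox = [50 80 120 90];

% Crop the logo region with plain indexing, clamped to the image bounds.
rows = max(1, bbox(2)) : min(size(img, 1), bbox(2) + bbox(4) - 1);
cols = max(1, bbox(1)) : min(size(img, 2), bbox(1) + bbox(3) - 1);
logo = img(rows, cols, :);

% Replicate a grayscale image to three channels so the depth matches the network.
if size(logo, 3) == 1
    logo = repmat(logo, 1, 1, 3);
end

% Resize to the input layer size of the network.
logo = imresize(logo, [227 227]);
disp(size(logo))
```

Cropping before resizing keeps the logo at the highest available resolution and discards the background that the network should not learn from.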

You can also increase the number of training samples by using data augmentation. Data augmentation helps prevent the network from overfitting and memorizing the exact details of the training images. To increase the training data, four types of data augmentation are provided: random flipping, Gaussian blur, shearing, and contrast normalization. To use data augmentation, set the doAugmentation variable in the following code to true.

doTraining = false;

if ~doTraining
    getLogonet;
else
    logozipPath = ''; % Provide the path of the downloaded zip file
    zipData = fullfile(logozipPath,'Logos-32plus_v1.0.1.zip');
    unpackedData = fullfile(logozipPath,'Logos32plus');
    
    if ~exist(unpackedData,'dir')
        unzip(zipData,unpackedData);
    end

    doAugmentation = false;
    logoData = preprocessLogoData(unpackedData,doAugmentation);
    trainLogonet(logoData);
end
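Two of the augmentation types mentioned above, random flipping and contrast normalization, can be sketched with base MATLAB functions. This is an illustration under assumptions, not the implementation in preprocessLogoData: the flip probability of 0.5 and the min-max definition of contrast normalization are choices made here, and Gaussian blur and shearing are omitted.

```matlab
rng(0); % fixed seed so that the sketch is repeatable

% Synthetic stand-in for a preprocessed 227-by-227-by-3 logo image.
img = uint8(randi([0 255], 227, 227, 3));

% Random horizontal flip with an assumed probability of 0.5.
augmented = img;
if rand < 0.5
    augmented = flip(augmented, 2);
end

% Contrast normalization, assumed here to be a min-max stretch to [0, 255].
pix = double(augmented);
lo = min(pix(:));
hi = max(pix(:));
if hi > lo
    pix = (pix - lo) / (hi - lo) * 255;
end
augmented = uint8(pix);
```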

load('LogoNet.mat');
convnet
convnet = 
  SeriesNetwork with properties:

         Layers: [22×1 nnet.cnn.layer.Layer]
     InputNames: {'imageinput'}
    OutputNames: {'classoutput'}

Convert the SeriesNetwork network object to a dlnetwork object and save the network to a MAT-file.

dlconvnet = dag2dlnetwork(convnet);
save dlLogoNet.mat dlconvnet

To view the network architecture, use the analyzeNetwork function.

analyzeNetwork(dlconvnet)

The logonet_predict Entry-Point Function

The logonet_predict.m entry-point function takes an image input and performs prediction on the image by using the deep learning network saved in the dlLogoNet.mat file. The function loads the network object from dlLogoNet.mat into a persistent variable dlLogonet and reuses the persistent variable on subsequent prediction calls. The function creates the dlarray object internally, so its input and output are primitive data types. For more information, see Code Generation for dlarray (GPU Coder).

type('logonet_predict.m')
function out = logonet_predict(in)
%#codegen

% Copyright 2017-2023 The MathWorks, Inc.

% A persistent object dlLogonet is used to load the network object. At the
% first call to this function, the persistent object is constructed and
% set up. When the function is called subsequent times, the same object is
% reused to call predict on inputs, thus avoiding reconstructing and
% reloading the network object.

dlIn = dlarray(in, 'SSC');

persistent dlLogonet;

if isempty(dlLogonet)
   
    dlLogonet = coder.loadDeepLearningNetwork('dlLogoNet.mat','dlLogonet');

end

dlOut = predict(dlLogonet, dlIn);

out = extractdata(dlOut);

end

Generate CUDA MEX for the logonet_predict Function

Create a GPU code configuration object for a MEX target and set the target language to C++. Use the coder.DeepLearningConfig (GPU Coder) function to create a cuDNN deep learning configuration object, and assign it to the DeepLearningConfig property of the GPU code configuration object. To generate CUDA MEX, use the codegen command and specify an input of size [227,227,3]. This value corresponds to the input layer size of the logo recognition network.

cfg = coder.gpuConfig('mex');
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');
codegen -config cfg logonet_predict -args {ones(227,227,3,'single')} -report
Code generation successful: View report

Run Generated MEX

Load an input image. Call logonet_predict_mex on the input image.

im = imread('test.png');
imshow(im);


im = imresize(im, [227,227]);
predict_scores = logonet_predict_mex(single(im));
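One way to sanity-check the generated MEX is to compare its scores with those from running the entry-point function directly in MATLAB. The sketch below assumes that the previous steps ran successfully and that im still holds the resized image. The names refScores, mexScores, and maxAbsDiff are introduced here for illustration, and the size of an acceptable difference depends on your GPU and library versions.

```matlab
in = single(im);

refScores = logonet_predict(in);      % runs the entry-point function in MATLAB
mexScores = logonet_predict_mex(in);  % runs the generated CUDA MEX

% Largest element-wise deviation between the two sets of scores.
maxAbsDiff = max(abs(refScores(:) - mexScores(:)));
fprintf('Maximum absolute difference: %g\n', maxAbsDiff);
```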

Map the top five prediction scores to the corresponding logo class labels.

synsetOut = convnet.Layers(end).Classes;

[val,indx] = sort(predict_scores, 'descend');
scores = val(1:5)*100;
top5labels = synsetOut(indx(1:5));
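The same top-k selection can be tried on a small example. This sketch uses made-up scores and labels and the maxk function as an alternative to sorting the full vector. It is shown for illustration only and is not part of the shipped example.

```matlab
% Made-up scores and labels, for illustration only.
scores = single([0.05; 0.60; 0.10; 0.02; 0.20; 0.03]);
labels = categorical({'a'; 'b'; 'c'; 'd'; 'e'; 'f'});

k = 3;
[topScores, topIdx] = maxk(scores, k);  % k largest values, in descending order
topLabels = labels(topIdx);

% Print each label with its score as a percentage.
for n = 1:k
    fprintf('%s: %.2f%%\n', char(topLabels(n)), topScores(n) * 100);
end
```

For 32 classes the difference between sort and maxk is negligible, so the choice is mostly one of readability.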

Display the top five classification labels.

outputImage = zeros(227,400,3, 'uint8');
for k = 1:3
    outputImage(:,174:end,k) = im(:,:,k);
end

scol = 1;
srow = 20;

for k = 1:5
    outputImage = insertText(outputImage, [scol, srow],...
        [char(top5labels(k)),' ',num2str(scores(k),'%2.2f'),'%'],...
        'TextColor', 'w','FontSize',15, 'BoxColor', 'black');
    srow = srow + 20;
end

imshow(outputImage);


Clear the static network object that was loaded in memory.

clear logonet_predict_mex;

References

[1] Romberg, Stefan, Lluis Garcia Pueyo, Rainer Lienhart, and Roelof van Zwol. "Scalable Logo Recognition in Real-World Images." ACM International Conference on Multimedia Retrieval 2011 (ICMR11): 1-8. https://doi.org/10.1145/1991996.1992021.

[2] Bianco, Simone, Marco Buzzelli, Davide Mazzini, and Raimondo Schettini. "Deep Learning for Logo Recognition." Neurocomputing 245 (2017): 23-30. https://doi.org/10.1016/j.neucom.2017.03.051.
