Recognize handwritten characters using convolutional neural networks

Overview

This use case demonstrates the application of machine learning in character recognition, specifically converting handwritten characters to digital format using a Convolutional Neural Network (CNN) model.

Problem Statement

Despite the increasing use of digital documents, paper documentation remains common. Machines cannot directly understand content on physical paper, and converting handwritten characters to digital form has been challenging. Effective document processing requires a reliable method to convert handwritten text to digital format.

Solution

Advancements in Machine Learning have led to algorithms that can accurately recognize handwritten characters. Convolutional Neural Networks (CNNs) have proven effective in computer vision tasks, including handwritten character recognition.
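
To make the idea of a convolutional layer concrete, here is a minimal pure-Python sketch of the three operations a CNN typically stacks: convolution, ReLU activation, and max pooling. It is an illustration only, not the model trained in this use case. The function names, the synthetic image, and the 3x3 kernel are assumptions chosen for the example.

```python
def conv2d(image, kernel):
    """Stride-1 convolution without padding (cross-correlation, as used in CNNs)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Replace negative activations with zero."""
    return [[max(0.0, value) for value in row] for row in feature_map]


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Synthetic 28x28 image (assumption): blank except for a vertical stroke in column 14.
image = [[1.0 if c == 14 else 0.0 for c in range(28)] for _ in range(28)]

# Hypothetical 3x3 kernel that responds where brightness increases from left to right.
kernel = [[-1.0, 0.0, 1.0] for _ in range(3)]

features = max_pool2d(relu(conv2d(image, kernel)))
print(len(features), len(features[0]))  # 13 13
```

A 28x28 input shrinks to 26x26 after the 3x3 convolution and to 13x13 after 2x2 pooling. In a trained CNN the kernel values are learned from data rather than written by hand.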

In this use case, we train a CNN model using the MNIST dataset, which contains 70,000 images of handwritten digits. Each image is 28x28 pixels, representing a single digit. The model is trained on 60,000 images and tested on the remaining 10,000.
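
The dataset figures above can be checked with a short sketch that also shows two preprocessing steps commonly applied before training: scaling the 8-bit pixel values to the range 0 to 1, and one-hot encoding the digit label. Whether the training script in this use case does exactly this is an assumption. The helper names and the synthetic image are hypothetical.

```python
IMAGE_SIZE = 28          # each MNIST image is 28x28 pixels
NUM_CLASSES = 10         # digits 0 through 9
TRAIN_COUNT = 60_000
TEST_COUNT = 10_000


def normalize(image):
    """Scale 8-bit grayscale pixel values (0-255) into [0.0, 1.0]."""
    return [[pixel / 255.0 for pixel in row] for row in image]


def one_hot(label, num_classes=NUM_CLASSES):
    """Encode a digit label as a vector with a single 1.0 at the label's index."""
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector


# Synthetic stand-in for one image (assumption): a repeating 0-255 gradient.
raw_image = [
    [(r * IMAGE_SIZE + c) % 256 for c in range(IMAGE_SIZE)]
    for r in range(IMAGE_SIZE)
]

x = normalize(raw_image)
y = one_hot(7)

print(TRAIN_COUNT + TEST_COUNT)   # 70000
print(IMAGE_SIZE * IMAGE_SIZE)    # 784
print(y)
```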

Objectives

Model Building: Use the Remote Python Script Snap from the ML Core Snap Pack to deploy a Python script that trains a CNN model on the MNIST dataset.
Model Testing: Test the trained model with sample data.
Model Hosting: Use the Remote Python Script Snap to deploy a script that hosts the model and schedules an Ultra Task for API access.
API Testing: Use the REST Post Snap to send a sample request to the Ultra Task to verify the API functionality.

Model building

In this pipeline, the Remote Python Script Snap runs a Python script that downloads the MNIST dataset through the Keras library, trains a CNN model to recognize handwritten digits, and evaluates the trained model.

After training, the model is formatted with the JSON Formatter Snap and saved on SnapLogic File System (SLFS) using the File Writer Snap.
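
A trained model is binary data, so it has to be encoded as text before it can travel inside a JSON document. The sketch below shows one plausible approach using Base64. This is an assumption for illustration and not necessarily how the pipeline's script encodes the model. The function names, the field names, and the placeholder bytes are all hypothetical.

```python
import base64
import json


def model_to_json(model_bytes, metadata):
    """Wrap binary model data and descriptive metadata in a JSON string."""
    document = {
        "metadata": metadata,
        # Base64 turns arbitrary bytes into ASCII text that JSON can carry.
        "model": base64.b64encode(model_bytes).decode("ascii"),
    }
    return json.dumps(document)


def model_from_json(text):
    """Recover the binary model data and metadata from the JSON string."""
    document = json.loads(text)
    return base64.b64decode(document["model"]), document["metadata"]


# Placeholder bytes standing in for a serialized model file (assumption).
fake_model = b"\x00\x01binary-model-bytes\xff"

saved = model_to_json(fake_model, {"dataset": "MNIST", "input_shape": [28, 28]})
restored_model, restored_metadata = model_from_json(saved)

print(restored_model == fake_model)
print(restored_metadata)
```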

Model testing

In this pipeline, we evaluate the model against a sample test set. The model’s accuracy on the MNIST test set of 10,000 images helps validate its performance in recognizing handwritten digits.

The Remote Python Script Snap executes a Python script to load the saved model and test it with sample images, calculating the prediction accuracy.
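
The accuracy calculation can be sketched as follows, assuming the model returns one score per digit class for each image and the predicted digit is the class with the highest score. The helper names and the sample scores are made up for illustration.

```python
def argmax(scores):
    """Return the index of the highest score."""
    return max(range(len(scores)), key=lambda i: scores[i])


def accuracy(predictions, labels):
    """Fraction of samples whose highest-scoring class matches the true label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    correct = sum(
        1 for scores, label in zip(predictions, labels) if argmax(scores) == label
    )
    return correct / len(labels)


def scores_for(digit):
    """Hypothetical model output that strongly favors one digit."""
    return [0.91 if i == digit else 0.01 for i in range(10)]


# Four sample predictions; the last one disagrees with its true label (9).
predictions = [scores_for(3), scores_for(8), scores_for(1), scores_for(7)]
labels = [3, 8, 1, 9]

print(accuracy(predictions, labels))  # 0.75
```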

Model hosting

This pipeline is configured as an Ultra Task to provide a REST API accessible by external applications. It includes File Reader, JSON Parser, and Remote Python Script Snaps, similar to the Model Testing pipeline, but accepts API requests instead of direct input.

The Filter Snap authenticates each API request by comparing the token in the request with the token configured in the pipeline parameters. One Mapper Snap then extracts the required fields from the request, and a second Mapper Snap maps the prediction to $content.pred for the response body. CORS headers are added to the response so that browser-based clients can call the API.
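
The request flow described above can be sketched in plain Python. This is an illustration of the logic only, not SnapLogic code. The function name, the use of None for a rejected request, the wildcard CORS origin, and the stub predictor are assumptions. The token, params, and content.pred field names follow the pipeline description.

```python
def handle_request(request, expected_token, predict):
    """Mimic the Filter -> Mapper -> predict -> Mapper flow for one request."""
    # Filter step: let the request through only if its token matches.
    if request.get("token") != expected_token:
        return None  # assumption: a rejected request produces no response

    # First Mapper step: extract the fields the model needs.
    params = request.get("params")

    # Prediction step: 'predict' stands in for the hosted model.
    prediction = predict(params)

    # Second Mapper step: place the prediction at content.pred and add CORS headers.
    return {
        "content": {"pred": prediction},
        "headers": {
            # Wildcard origin is an assumption; restrict it in production.
            "Access-Control-Allow-Origin": "*",
        },
    }


def stub_predict(params):
    """Hypothetical predictor that always answers 5."""
    return 5


accepted = handle_request(
    {"token": "secret", "params": [0.0] * 784}, "secret", stub_predict
)
rejected = handle_request(
    {"token": "wrong", "params": [0.0] * 784}, "secret", stub_predict
)

print(accepted["content"]["pred"])  # 5
print(rejected)                     # None
```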

Building API

To deploy this pipeline as a REST API, click the calendar icon in the toolbar and choose either a Triggered Task or an Ultra Task. A Triggered Task starts a new pipeline instance for each request, which suits batch processing. An Ultra Task is better suited to low-latency REST API access, so we use an Ultra Task in this case.

No Bearer token is required, because the Filter Snap handles authentication within the pipeline. To retrieve the API URL, click "Show Tasks in this project in Manager" in the Create Task window, then select Details from the task's options.

API testing

In this pipeline, a sample request is generated by the JSON Generator Snap. The request is sent to the Ultra Task via the REST Post Snap, and the Mapper Snap extracts the response from $response.entity.

The JSON Generator Snap produces a document containing $token and $params, which form the request body. The URL is specified in the pipeline parameters and can be retrieved from the Manager page. If the endpoint's certificate cannot be validated (for example, a self-signed certificate), select the Trust all certificates option in the REST Post Snap.
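
An external client could build the same request outside SnapLogic. The sketch below constructs the POST request with the Python standard library, without sending it. The URL, the token value, and the shape of params (a flat list of 784 pixel values) are placeholders and assumptions, because the exact structure of $params is not described here.

```python
import json
import urllib.request


def build_request(url, token, params):
    """Build a JSON POST request carrying the token and params in the body."""
    body = json.dumps({"token": token, "params": params}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values (assumptions): replace with the Ultra Task URL and token.
request = build_request(
    "https://example.invalid/ultra-task",
    "my-token",
    [0.0] * 784,
)

print(request.get_method())
print(request.full_url)
# Passing the request to urllib.request.urlopen(...) would send it.
```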

The output of the REST Post Snap contains the prediction result, which the final Mapper Snap extracts from $response.entity to show the predicted value for the handwritten character.