This app demonstrates digit recognition using deep learning. Draw a digit 0-9 (or something else) in the drawing area, then press Predict. You will get two predictions: which digit 0-9 the drawing most resembles, and whether the drawing is a digit at all.
The pixels of the drawn image (each with a value 0-255) are the input to a model that transforms the 784 input pixels (28 x 28) into 10 values (the predictions). These can be interpreted as the probabilities that the drawn symbol is each of the digits 0-9. The model contains almost 600,000 parameters and various layers that handle and transform the input pixels. The values of the parameters are learned via a process called training on a large number of labeled images. Labeled means that each input image is associated with a number 0-9 telling which digit the image shows. The training process is very slow, so this app only predicts using a saved model.
The prediction is just a mapping of 784 values into 10 values. So no magic even if it is Deep Learning.
The other prediction (is it a digit or not) is made by another model trained in another way (see Model: digit or not). Here the 784 pixels are transformed into two values - digit or not.
So the first prediction tells which digit 0-9 is most similar to the drawn symbol and the second prediction tells if the symbol is a digit or not.
Our model is trained using the famous MNIST dataset. MNIST is a dataset of handwritten digits. The training set of MNIST contains 60,000 images and the test set 10,000 images. The test set is not used for training but for evaluation, to see how good the model is. The pictures in the MNIST dataset are centered and normalized to the same size. This is why "position and size matter": a drawing that is off-center or very small looks unlike the images the model was trained on.
____________________________________________
Layer (type)                 Output shape
============================================
conv2d_Conv2D1_input (InputL [null,28,28,1]
____________________________________________
conv2d_Conv2D1 (Conv2D)      [null,26,26,32]
____________________________________________
conv2d_Conv2D2 (Conv2D)      [null,24,24,32]
____________________________________________
max_pooling2d_MaxPooling2D1  [null,12,12,32]
____________________________________________
conv2d_Conv2D3 (Conv2D)      [null,10,10,64]
____________________________________________
conv2d_Conv2D4 (Conv2D)      [null,8,8,64]
____________________________________________
max_pooling2d_MaxPooling2D2  [null,4,4,64]
____________________________________________
flatten_Flatten1 (Flatten)   [null,1024]
____________________________________________
dropout_Dropout1 (Dropout)   [null,1024]
____________________________________________
dense_Dense1 (Dense)         [null,512]
____________________________________________
dropout_Dropout2 (Dropout)   [null,512]
____________________________________________
dense_Dense2 (Dense)         [null,10]
============================================
Total params: 594922
Trainable params: 594922
Non-trainable params: 0
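The total of 594,922 parameters can be checked by hand from the output shapes in the summary. The sketch below is only an illustration: the helper names are made up, and the 3x3 kernel size is an assumption inferred from each Conv2D layer shrinking the image by two pixels per side pair (28 to 26, 26 to 24, and so on). Pooling, flatten and dropout layers have no parameters.

```javascript
// Parameters of a Conv2D layer: one weight per (kernel pixel, input channel,
// filter) plus one bias per filter.
function conv2dParams(kernelSize, inChannels, filters) {
  return kernelSize * kernelSize * inChannels * filters + filters;
}

// Parameters of a Dense layer: one weight per (input, unit) plus one bias per unit.
function denseParams(inputs, units) {
  return inputs * units + units;
}

// Assumption: 3x3 kernels (not stated in the summary, inferred from the shapes).
const layerParams = {
  conv2d_Conv2D1: conv2dParams(3, 1, 32),
  conv2d_Conv2D2: conv2dParams(3, 32, 32),
  conv2d_Conv2D3: conv2dParams(3, 32, 64),
  conv2d_Conv2D4: conv2dParams(3, 64, 64),
  dense_Dense1: denseParams(1024, 512),
  dense_Dense2: denseParams(512, 10),
};

const totalParams = Object.values(layerParams).reduce((sum, n) => sum + n, 0);
console.log(layerParams);
console.log("Total params:", totalParams); // 594922, as in the summary
```

Most of the parameters (524,800) sit in the first dense layer, which connects the 1024 flattened values to 512 units.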
At prediction time the pixel-values of our image are the input to the first Conv2D-layer. The output of one layer is the input to the next. In the end we have a layer with just 10 values as output. Those 10 values are transformed into our 10 prediction values (the probabilities).
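The last step, turning 10 raw output values into 10 probabilities, is commonly done with a softmax function. The text above does not name the function used by this model, so the sketch below is an assumption, and the input numbers are made up for illustration.

```javascript
// Softmax: exponentiate each value and divide by the sum, so the results
// are positive and add up to 1. Subtracting the maximum first avoids
// overflow in Math.exp without changing the result.
function softmax(values) {
  const maxValue = Math.max(...values);
  const exps = values.map((v) => Math.exp(v - maxValue));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((v) => v / sum);
}

// Index of the largest value, i.e. the predicted digit.
function argmax(values) {
  return values.indexOf(Math.max(...values));
}

// Made-up raw outputs of the last layer, one per digit 0-9.
const exampleOutputs = [0.1, -1.2, 0.3, 4.5, -0.7, 1.1, -2.0, 0.0, 0.9, -0.4];
const probabilities = softmax(exampleOutputs);

console.log(probabilities.map((p) => p.toFixed(3)));
console.log("Predicted digit:", argmax(probabilities)); // 3, the largest input
```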
The layers have different roles.
Before any prediction is possible, the 594,922 parameters of the model must have proper values, i.e. the model must be trained. The training process tries to minimize the loss value. The process is complicated but is handled by TensorFlow.
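To give a feeling for what a loss value is: a common choice for a classifier like this one is categorical cross-entropy. The text does not say which loss this model uses, so treat the sketch below as an assumption. The function names and the example predictions are made up.

```javascript
// Cross-entropy loss for one image: the negative logarithm of the
// probability the model gave to the correct digit. A confident correct
// prediction gives a loss near 0; a wrong one gives a large loss.
function crossEntropy(probabilities, label) {
  const epsilon = 1e-12; // guards against Math.log(0)
  return -Math.log(Math.max(probabilities[label], epsilon));
}

// Mean loss over a batch of predictions and their labels.
function meanCrossEntropy(batch, labels) {
  const total = batch.reduce(
    (sum, probabilities, i) => sum + crossEntropy(probabilities, labels[i]),
    0
  );
  return total / batch.length;
}

// Two made-up predictions for an image labeled 7.
const goodPrediction = [0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.91, 0.01, 0.01];
const unsurePrediction = new Array(10).fill(0.1);

console.log(crossEntropy(goodPrediction, 7));   // small loss
console.log(crossEntropy(unsurePrediction, 7)); // larger loss
console.log(meanCrossEntropy([goodPrediction, unsurePrediction], [7, 7]));
```

Training adjusts the parameters step by step so that this mean loss over the training images goes down.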
We can see that we get an accuracy on the test set of 99.4%.
Epoch 1 / 15
eta=0.6 =========================================> acc=0.964 loss=0.134
237005ms 4647us/step - acc=0.921 loss=0.245 val_acc=0.979 val_loss=0.0704
Epoch 2 / 15
eta=0.7 =========================================> acc=0.964 loss=0.0486
271554ms 5325us/step - acc=0.978 loss=0.0676 val_acc=0.989 val_loss=0.0384
...................................................
Epoch 14 / 15
eta=0.7 ==========================================> acc=1.00 loss=0.000358
560396ms 5494us/step - acc=0.996 loss=0.0114 val_acc=0.994 val_loss=0.0275
Epoch 15 / 15
eta=0.7 ==========================================> acc=1.00 loss=0.0000813
580674ms 5693us/step - acc=0.997 loss=0.0107 val_acc=0.994 val_loss=0.0277
Evaluation result: Loss = 0.022; Accuracy = 0.994
This model is built on the pre-trained "digit 0-9" model. This means that the model already knows a lot about what the digits look like before any additional training. The new model is created from the "digit 0-9" model by freezing all layers except the last two dense layers. Freezing means that the parameters are not changed during training. The last layer (the output layer) is modified to give two values: a prediction - digit or not digit. This method is called transfer learning.
Data for training is labeled digit or not digit. The data is produced by running a slightly modified version of this app. The pixel values of the drawn symbols are written to the local storage of the browser and then exported for training. The number of symbols for training is now around 1500. If the model makes a bad prediction for a certain symbol, you have to add that symbol with a label to the training set and retrain the model to get a better model. This means quite a lot of work.
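Freezing can be illustrated by counting how many parameters are still trained. The sketch below uses the total from the "digit 0-9" model summary and assumes that the new output layer is a dense layer with 2 units fed by the same 512 values as before. That layer shape, and all names in the code, are assumptions made for illustration.

```javascript
// Numbers taken from the "digit 0-9" model summary.
const totalOriginalParams = 594922;
const dense1Params = 1024 * 512 + 512; // dense_Dense1: weights + biases
const oldOutputParams = 512 * 10 + 10; // dense_Dense2 (the 10-value output)

// Everything before the two dense layers is frozen.
const frozenParams = totalOriginalParams - dense1Params - oldOutputParams;

// Assumption: the new output layer is Dense with 2 units on 512 inputs.
const newOutputParams = 512 * 2 + 2;

const transferLayers = [
  { name: "frozen layers before the dense layers", params: frozenParams, trainable: false },
  { name: "dense_Dense1", params: dense1Params, trainable: true },
  { name: "new 2-value output layer", params: newOutputParams, trainable: true },
];

// Sum the parameters of all layers with the given trainable flag.
function countParams(layers, trainable) {
  return layers
    .filter((layer) => layer.trainable === trainable)
    .reduce((sum, layer) => sum + layer.params, 0);
}

console.log("Trainable params:", countParams(transferLayers, true));      // 525826
console.log("Non-trainable params:", countParams(transferLayers, false)); // 64992
```

Under these assumptions, the frozen part keeps what it learned from MNIST about strokes and shapes, while only the dense layers are adjusted to the new digit-or-not task.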
To limit the number of symbols needed for training, the app checks whether there is more than one symbol in the drawing area (only one symbol is allowed). The check flood fills one symbol with the fill color (black). If any non-black pixels remain in the drawing area after the flood fill, the drawing is interpreted as containing more than one symbol.
The app is written in JavaScript using TensorFlow.js for machine learning and Bootstrap for the UI. All code related to the app runs in the browser. The web server only serves the files of the app. The files (including the model) are downloaded to the browser, so the predictions are made in the browser. The training, however, is done outside the browser because the models are rather large.
The source code of this app is available on GitHub.
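The flood-fill check can be sketched as follows. This is not the app's actual code: the function name, the grid representation (rows of numbers, 0 for black background and any other value for a drawn pixel) and the choice to treat diagonal neighbors as connected are all assumptions.

```javascript
// Returns true if the drawing contains more than one connected symbol.
function hasMoreThanOneSymbol(pixels) {
  const grid = pixels.map((row) => row.slice()); // work on a copy
  const height = grid.length;
  const width = height > 0 ? grid[0].length : 0;

  // Find the first drawn (non-black) pixel.
  let start = null;
  for (let y = 0; y < height && start === null; y++) {
    for (let x = 0; x < width; x++) {
      if (grid[y][x] !== 0) {
        start = [y, x];
        break;
      }
    }
  }
  if (start === null) return false; // empty drawing area

  // Flood fill: paint the symbol containing the start pixel black.
  const stack = [start];
  grid[start[0]][start[1]] = 0;
  while (stack.length > 0) {
    const [y, x] = stack.pop();
    for (let dy = -1; dy <= 1; dy++) {
      for (let dx = -1; dx <= 1; dx++) {
        const ny = y + dy;
        const nx = x + dx;
        const inside = ny >= 0 && ny < height && nx >= 0 && nx < width;
        if (inside && grid[ny][nx] !== 0) {
          grid[ny][nx] = 0;
          stack.push([ny, nx]);
        }
      }
    }
  }

  // Any pixel still non-black belongs to another symbol.
  return grid.some((row) => row.some((value) => value !== 0));
}

const oneSymbol = [
  [0, 255, 0, 0],
  [0, 255, 0, 0],
  [0, 0, 255, 0],
];
const twoSymbols = [
  [255, 0, 0, 255],
  [255, 0, 0, 255],
  [0, 0, 0, 0],
];
console.log(hasMoreThanOneSymbol(oneSymbol));  // false
console.log(hasMoreThanOneSymbol(twoSymbols)); // true
```

A check like this also rejects a single symbol drawn with a gap between its strokes, since the separate parts are not connected.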
Copyright 2019 GubboIT Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
This app was inspired by the excellent book "Deep Learning with JavaScript" from Manning Publications.