This app demonstrates image classification using Deep Learning. In other words, the app tells you (predicts) what an image or a frame of a video shows: dog, cat, and so on.
Slide and Video show examples of the app making good or bad predictions.
The app is simple: an image is fed into a pretrained model and the top 3 predictions are displayed. The model is a convnet trained on 1000 classes (cat, dog, ...). The model itself is not described here; see the Digit app for more on models, training and so on.
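To give an idea of how little code this needs, a classification call with the tfjs-models mobilenet wrapper could look roughly like this (the element id and function name are just placeholders, not the app's actual code):

    // Assumes the TensorFlow.js and mobilenet scripts have been loaded on the page.
    async function classifyImage() {
      // Load the pretrained MobileNet model (the weights are downloaded to the browser).
      const model = await mobilenet.load();

      // Grab an image (or a video frame) from the page.
      const img = document.getElementById('animal-image');

      // Ask for the top 3 predictions; each entry has a className and a probability.
      const predictions = await model.classify(img, 3);
      for (const p of predictions) {
        console.log(`${p.className}: ${(p.probability * 100).toFixed(1)}%`);
      }
    }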
Probably not too much. The app cannot make good predictions about something the model was not trained on. At best you will get a prediction of something that looks like the real thing; otherwise you will just get a very silly prediction.
Our model handles 1000 classes (cat, dog, ...). Around 400 of them are animals, and around 130 of those are dog breeds. The remaining roughly 600 classes cover various things, but a lot is of course missing. Making a general app/model that can classify almost anything is very hard (or impossible) - it requires an enormous number of images.
What can we do if we want to add new classes to our model? In principle we have to collect a number of images (a few hundred per class) for the new classes, label them according to class, modify the model (only slightly), and train again, but only for the new classes. This is often called transfer learning. So it means some work...
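For the curious, a very rough sketch of such retraining in TensorFlow.js could look like this. Everything here - the hosted model URL, the layer name, the number of new classes and the training data xs/ys - is an illustrative assumption, not the app's actual code:

    // Assumes tf (TensorFlow.js) is loaded, and that xs (images) and ys (one-hot
    // labels for the new classes) have been prepared elsewhere.
    async function retrainForNewClasses(xs, ys) {
      // Load a hosted MobileNet and cut it at an inner activation layer.
      // The layer name is an assumption; inspect base.summary() to pick one.
      const base = await tf.loadLayersModel(
        'https://storage.googleapis.com/tfjs-models/tfjs/mobilenet_v1_0.25_224/model.json');
      const layer = base.getLayer('conv_pw_13_relu');
      const truncated = tf.model({inputs: base.inputs, outputs: layer.output});

      // A small new "head" for, say, 3 new classes.
      const head = tf.sequential({layers: [
        tf.layers.flatten({inputShape: layer.outputShape.slice(1)}),
        tf.layers.dense({units: 100, activation: 'relu'}),
        tf.layers.dense({units: 3, activation: 'softmax'})
      ]});
      head.compile({optimizer: 'adam', loss: 'categoricalCrossentropy'});

      // Only the new head is trained; the pretrained layers are left untouched.
      const features = truncated.predict(xs);
      await head.fit(features, ys, {epochs: 20});
      return {truncated, head};
    }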
The accuracy of our model is around 70%. Does this mean that we can expect 70% of our predictions to be correct? No, we cannot. It could be better, but probably not. The 70% is measured on the test set. The test set is the part of the original images that the model is NOT trained on; it is only used to measure how well the model performs on data it has never seen during training. But our images are probably different from the images in the test set. One reason is that we may use images that do not belong to any class at all - how should we know whether our image belongs to one of the 1000 classes? Another reason is that we may see the animals in the video from other angles, e.g. from above or from behind.
The app is written in JavaScript, using TensorFlow.js for the machine learning and Bootstrap for the UI. All code related to the app runs in the browser. The web server only serves the files of the app; the files (including the model) are downloaded to the browser, so the predictions are made in the browser.
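To make that concrete, a minimal page could pull everything from a CDN like this (the file names and structure are just an illustration, not the app's actual setup):

    <!-- TensorFlow.js and the MobileNet wrapper are downloaded by the browser -->
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"></script>
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/mobilenet"></script>
    <!-- The app's own code is also just a static file; no prediction is ever sent to the server -->
    <script src="app.js"></script>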
The app uses a pretrained MobileNet model and its supporting software. For more info, see tfjs-models mobilenet on GitHub. "MobileNets are small, low-latency, low-power models parameterized to meet the resource constraints of a variety of use cases." The number of parameters is around 4.2 million, and the accuracy is around 70%.
Copyright 2019 GubboIT

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
This app was inspired by the excellent book "Deep Learning with JavaScript" from Manning Publications.
Video clips from Animals at Skansen, Stockholm.