
Inferring Hiragana in the Browser with TensorFlow.js

created 2021-03-15

While trying out TensorFlow.js, I made a demo that recognizes handwritten hiragana entirely in the browser.

https://tfjs-hiragana.surge.sh/

The model reaches about 99.0% accuracy on the original dataset, but in the browser it makes a fair number of mistakes. I suspect this is because the training data is handwriting done with pens and brushes, and strokes drawn in the browser look different. Characters such as "か" are especially hard for it.

I trained a simple fully connected network in Python TensorFlow on about 700,000 grayscale hiragana images of 48x48 pixels. The network has 20 layers of 100 neurons each and 71 output classes. The original Keras model was about 5 MB. I converted it for JavaScript with tensorflowjs_converter --quantize_uint8 --input_format keras input.h5 output_dir, after which the files loaded by the browser are about 437 KB of trained weights plus about 49 KB of model metadata in JSON. gzip shrinks them by roughly another 20%.
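
As a rough sanity check on those sizes, the parameter count can be estimated by hand. The sketch below assumes the network is 20 hidden Dense layers of 100 units each followed by a 71-unit output layer, all with biases, and that uint8 quantization stores about one byte per parameter. The article does not spell out either detail, so treat the result as an estimate.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def count_params(layer_sizes: list[int]) -> int:
    """Total parameters of a Dense stack, given unit counts starting with the input size."""
    return sum(dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))


# Assumed layout: flattened 48x48 input, 20 hidden layers of 100 units, 71 classes.
LAYER_SIZES = [48 * 48] + [100] * 20 + [71]

total_params = count_params(LAYER_SIZES)
float32_bytes = total_params * 4  # unquantized 32-bit weights
uint8_bytes = total_params * 1    # one byte each, ignoring small per-tensor metadata

print(f"parameters: {total_params:,}")
print(f"float32:    {float32_bytes / 1024:.0f} KiB")
print(f"uint8:      {uint8_bytes / 1024:.0f} KiB")
```

Under these assumptions the network has 429,571 parameters, or roughly 420 KiB at one byte each, which is in the same range as the 437 KB weight file.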

The source data is the Character Image Dataset: 73 Hiragana Characters. Its image license is Public Domain Mark 1.0. I used 71 characters from it, excluding ゑ and ゐ.
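
To turn a predicted class index back into a character, the 71 labels need a fixed order. The sketch below builds one from the Unicode hiragana block. It assumes the dataset's 73 classes are the full-size kana from あ to ん, including voiced and semi-voiced forms, and that classes are sorted by code point. Both are assumptions, so the ordering should be checked against the labels actually used in training.

```python
# Small kana in the range U+3041..U+3093, assumed not to be dataset classes.
SMALL_KANA = set("ぁぃぅぇぉっゃゅょゎ")
EXCLUDED = set("ゐゑ")  # the two characters dropped in this article

# Full-size hiragana from U+3041 through ん (U+3093), minus the small forms.
ALL_KANA = [chr(cp) for cp in range(0x3041, 0x3093 + 1) if chr(cp) not in SMALL_KANA]
LABELS = [ch for ch in ALL_KANA if ch not in EXCLUDED]


def index_to_char(class_index: int) -> str:
    """Return the character for a predicted class index (hypothetical ordering)."""
    return LABELS[class_index]


print(len(ALL_KANA), len(LABELS))
print(index_to_char(0), index_to_char(len(LABELS) - 1))
```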

Converting a Keras model to TensorFlow.js is easy if you follow the documentation, but only models built from officially supported APIs can be imported.

Models using unsupported ops or layers, e.g. custom layers, Lambda layers, custom losses, or custom metrics, cannot be automatically imported, because they depend on Python code that cannot be reliably translated into JavaScript.
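
One way to catch such layers before running the converter is to scan the model's JSON description on the Python side. The sketch below is a hypothetical helper, not part of TensorFlow or tensorflowjs. It assumes the JSON layout produced by Keras model.to_json(), with layers listed under config.layers, and its allowlist is only illustrative; the real set of importable layers is whatever the TensorFlow.js documentation says.

```python
import json

# Illustrative allowlist only; the actual set of importable layers is defined
# by the TensorFlow.js documentation, not by this sketch.
ASSUMED_PORTABLE_LAYERS = frozenset(
    {"InputLayer", "Flatten", "Dense", "Dropout", "Activation"}
)


def find_unportable_layers(model_json: str, allowed=ASSUMED_PORTABLE_LAYERS) -> list[str]:
    """Return 'name (class)' for each layer whose class is not in the allowlist."""
    config = json.loads(model_json)["config"]
    # Older Keras versions store a Sequential config as a bare list of layers.
    layers = config if isinstance(config, list) else config.get("layers", [])
    flagged = []
    for layer in layers:
        class_name = layer["class_name"]
        if class_name not in allowed:
            layer_name = layer.get("config", {}).get("name", "?")
            flagged.append(f"{layer_name} ({class_name})")
    return flagged


# Hand-written example config, standing in for model.to_json() output.
example = json.dumps({
    "class_name": "Sequential",
    "config": {
        "name": "hiragana",
        "layers": [
            {"class_name": "Flatten", "config": {"name": "flatten"}},
            {"class_name": "Lambda", "config": {"name": "scale_pixels"}},
            {"class_name": "Dense", "config": {"name": "out", "units": 71}},
        ],
    },
})

print(find_unportable_layers(example))
```

A check like this only looks at layer class names. Custom losses and metrics live in the training configuration rather than in the layer list, so they would need a separate check.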

Custom loss functions and similar pieces are common, so anything beyond a fairly plain Keras or Python TensorFlow model may not import cleanly. If the final target is TensorFlow.js, it may be better to design the Python model with that in mind, or to build and train the model in TensorFlow.js from the start. TensorFlow.js now seems to support GPU use from Node, and its API is fairly complete.

TensorFlow.js and TensorFlow Lite are interesting because, despite their constraints, they let you run inference in browsers, on smartphones, and on other edge devices. I was impressed by how easily this kind of inference can now be done.