cat articles/fasttext-quantize

Quantizing fastText to build a practical 1.7 MB text classifier

This note records my surprise at building a text classifier with fastText and quantization that decides whether an English article is AI-related or not. The resulting model was practical and only 1.7MB. 1.7MB!


As I wrote in "Launching AI News and how I used OpenAI behind it", AI News currently classifies text as AI-related or not by converting articles into 1536-dimensional vectors with OpenAI's text-embedding-ada-002 and training lightGBM on those vectors. The problem with this approach is that every article must go through the OpenAI API. On days with many long articles, this can cost several tens of yen per day. Monthly, it probably costs 500 to 1000 yen. Small costs add up.

The data had started to accumulate, so I wanted to classify articles without spending money on the OpenAI API. The data source is about 1,100 English article titles and bodies, with a ratio of about 2 AI articles to 8 non-AI articles. I split it 7:2:1 into train, validation, and test. Also, because I want to avoid mistakenly classifying AI articles as non-AI as much as possible, I look not only at accuracy but also at recall.
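As a minimal sketch of the setup described above, the following splits a labeled dataset 7:2:1 while keeping the class ratio in each part, and defines accuracy and recall for the AI class. The label names `ai` and `other`, the function names, and the toy data are assumptions for illustration; the article does not say how the split was actually implemented.

```python
import random

def stratified_split(items, label_of, parts=(7, 2, 1), seed=0):
    """Split items into train/validation/test, keeping the label ratio in each part."""
    rng = random.Random(seed)
    total = sum(parts)
    by_label = {}
    for item in items:
        by_label.setdefault(label_of(item), []).append(item)
    train, valid, test = [], [], []
    for group in by_label.values():
        rng.shuffle(group)
        n_train = len(group) * parts[0] // total
        n_valid = len(group) * parts[1] // total
        train += group[:n_train]
        valid += group[n_train:n_train + n_valid]
        test += group[n_train + n_valid:]  # whatever is left goes to test
    return train, valid, test

def accuracy(y_true, y_pred):
    """Share of all items whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive="ai"):
    """Share of truly positive items that were also predicted positive."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)

# Toy data with the article's rough 2:8 class ratio (1,100 items in total).
data = [("ai", i) for i in range(220)] + [("other", i) for i in range(880)]
train, valid, test = stratified_split(data, label_of=lambda item: item[0])
print(len(train), len(valid), len(test))  # 770 220 110
```

Recall is the metric to watch here because a missed AI article is lost for good, while a false positive can still be caught by a later check.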

Current classifier: OpenAI embeddings + lightGBM

Accuracy was 0.9636, and recall was 0.777. For embeddings without fine-tuning, this is quite high.

Transformer: deberta-v3-xsmall

Accuracy was 0.9636, and recall was 0.888. This is a properly fine-tuned transformer, deberta-v3-xsmall. The test set is small, about 110 items, so it is hard to say too much, but it is roughly the same performance as the current classifier. That is expected, since it is fine-tuned.

I also tried deberta-v3-large, but the score actually dropped. The training data may be too small for fine-tuning a larger model to fit well.

fastText: cc.en.300

Accuracy was 0.9454, and recall was 1.0. Accuracy dropped compared with the two classifiers above, but no AI article in the test set was missed. Because the test data is small, I cannot strongly claim the recall is excellent, but the balance looks good.
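To make "the test data is small" concrete, the reported accuracies can be turned back into error counts. This is only a rough check: it assumes the test set has exactly 110 items, while the article says "about 110".

```python
TEST_SIZE = 110  # assumption: the article only says "about 110 items"

# Accuracy figures reported so far in the article.
reported_accuracy = {
    "OpenAI embeddings + lightGBM": 0.9636,
    "fastText cc.en.300": 0.9454,
}

def implied_errors(acc, n=TEST_SIZE):
    """Number of misclassified items implied by an accuracy on n items."""
    return n - round(acc * n)

errors = {name: implied_errors(acc) for name, acc in reported_accuracy.items()}
for name, n_err in errors.items():
    print(f"{name}: about {n_err} of {TEST_SIZE} misclassified")
```

Under that assumption the gap between 0.9636 and 0.9454 is two articles, so small differences between the scores in this article should not be over-read.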

The trained model size at this point was 4.6GB. That makes sense because the original cc.en.300 is large.

fastText: ag news

Accuracy was 0.9454, and recall was 1.0. The result was the same as cc.en.300. The trained model size was 88MB, already much smaller.

The ag news model came from the Supervised models page on the fastText site. cc.en.300 provides 300-dimensional vectors trained on Common Crawl and Wikipedia, while ag news provides 10-dimensional vectors trained on a corpus of news article titles and descriptions. Since that corpus is close to my use case, I think it was a good fit.
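A hedged sketch of how such fine-tuning could look with the fastText Python bindings. fastText's supervised mode expects one example per line with a `__label__` prefix, and `dim` has to equal the dimension of the pretrained vectors. The label name, helper names, and file paths are assumptions for illustration, not the author's actual script.

```python
def to_fasttext_line(label, title, body):
    """Format one article as a single fastText supervised training line."""
    # fastText reads one example per line, so newlines and repeated spaces are collapsed.
    text = " ".join(f"{title} {body}".split())
    return f"__label__{label} {text}"

def train_with_pretrained(train_path, vec_path, dim):
    """Train a supervised model initialised from a .vec file.

    dim must equal the dimension of the vectors in vec_path
    (300 for cc.en.300, 10 for the ag news vectors in the article).
    """
    import fasttext  # imported lazily so the formatting helper works without the package
    return fasttext.train_supervised(input=train_path, pretrainedVectors=vec_path, dim=dim)

print(to_fasttext_line("ai", "New LLM released", "The model\nbeats prior work."))
# __label__ai New LLM released The model beats prior work.
```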

At this point the model was already reasonably small and practical, but on the Supervised models page I noticed that very lightweight quantized models are also provided. Their scores barely degrade: for ag news, for example, the score goes from 0.924 to 0.92 while the model size goes from 387MB to 1.6MB.

So I tested performance with quantization too.

fastText: ag news + quantization

Accuracy was 0.9363, and recall was 1.0. Accuracy dropped slightly, but the model size shrank dramatically from 88MB to 1.7MB.

With the fastText command, quantization can be done quickly like this. Inference with the quantized model also worked without any special handling.

fasttext quantize -output ./trained.ag_news -input ./trained.ag_news.bin -qnorm -retrain -cutoff 100000
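For reference, roughly the same step through the fastText Python bindings might look like the sketch below. `quantize`, `save_model`, and `load_model` are part of the official bindings; the file names `train.txt` and `trained.ag_news.ftz` and the helper names are assumptions. With `retrain=True`, `input` is the training text file used for retraining.

```python
def quantize_and_save(model, train_path, out_path, cutoff=100000):
    """Quantize a loaded supervised model in place and write it out as a .ftz file."""
    # Mirrors the CLI flags -qnorm -retrain -cutoff from the command above.
    model.quantize(input=train_path, qnorm=True, retrain=True, cutoff=cutoff)
    model.save_model(out_path)
    return out_path

def main():
    import fasttext  # requires the fasttext package
    model = fasttext.load_model("trained.ag_news.bin")   # hypothetical path
    quantize_and_save(model, "train.txt", "trained.ag_news.ftz")
    # The .ftz file loads and predicts like the .bin model.
    quantized = fasttext.load_model("trained.ag_news.ftz")
    print(quantized.predict("OpenAI releases a new language model"))
```

`main()` is not called automatically; run it with fastText installed and real file paths.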

Extra: fastText cc.en.300 + quantization

Accuracy was 0.9181, and recall was 0.714. The score dropped a lot, although the model size also shrank dramatically, from 4.6GB to 16MB. The 300-dimensional size of cc.en.300 may have worked against it here.
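A back-of-envelope estimate helps explain why the quantized sizes differ so much between the two models. The sketch assumes product quantization with fastText's default sub-vector size of 2, one byte per code (256 centroids), one byte per row for the quantized norm, and that the same cutoff of 100000 rows was used for both models. It ignores the dictionary, the output matrix, and file overhead, so it is only a rough lower bound.

```python
def pq_matrix_bytes(rows, dim, dsub=2, centroids=256, with_norms=True):
    """Rough size in bytes of a product-quantized embedding matrix."""
    n_sub = -(-dim // dsub)                   # number of sub-vectors per row (ceiling division)
    codes = rows * n_sub                      # one 1-byte code per sub-vector
    norms = rows if with_norms else 0         # one 1-byte quantized norm per row
    codebook = n_sub * centroids * dsub * 4   # float32 centroids
    return codes + norms + codebook

def dense_matrix_bytes(rows, dim):
    """Size in bytes of the same matrix stored as float32."""
    return rows * dim * 4

ROWS = 100_000  # rows kept by -cutoff 100000 (assumed for both models)
for name, dim in [("cc.en.300", 300), ("ag news", 10)]:
    dense = dense_matrix_bytes(ROWS, dim)
    quantized = pq_matrix_bytes(ROWS, dim)
    print(f"{name}: {dense / 1e6:.1f} MB dense -> {quantized / 1e6:.1f} MB quantized")
```

Under these assumptions the 300-dimensional matrix comes to roughly 15 MB and the 10-dimensional one to well under 1 MB, which is in the same range as the 16MB and 1.7MB files reported here; the remainder is presumably the dictionary and other data not modelled in this sketch.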

Performance summary

The classifiers I tried performed as follows. OpenAI embeddings are clearly strong. If cost does not matter, it feels like they are good enough. DeBERTa v3 is also good if the machine has enough resources. But AI News data processing runs on a VPS with 1 GB of memory, so it is overkill there.

So I decided to use the memory-efficient, quantized, and still practical fastText ag news model as the first-stage classifier, and then use OpenAI embeddings + lightGBM as the second stage. The first stage should filter out about 80% of articles, so the number of calls to OpenAI in the second stage should drop sharply.

  • OpenAI embeddings + lightGBM
    • acc 0.9636, recall 0.777
  • deberta-v3-xsmall
    • acc 0.9636, recall 0.8888
  • fastText cc.en.300, after fine-tuning: 4.6GB
    • acc 0.9454, recall 1.0
  • fastText cc.en.300, after fine-tuning and quantization: 16MB
    • acc 0.9181, recall 0.7142
  • fastText ag news, after fine-tuning: 88MB
    • acc 0.9454, recall 1.0
  • fastText ag news, after fine-tuning and quantization: 1.7MB
    • acc 0.9363, recall 1.0
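The two-stage setup described in this section can be sketched as a small cascade: a cheap local classifier runs on every article, and only its positives are sent to the expensive one. The two predicate functions below are keyword-based stand-ins invented for illustration. In practice the first would wrap the quantized fastText model (for example by checking whether `model.predict(text)[0][0]` equals a label such as `__label__ai`, an assumed label name) and the second would call OpenAI embeddings + lightGBM.

```python
def classify_two_stage(articles, cheap_is_ai, expensive_is_ai):
    """Run the cheap classifier on everything, the expensive one only on its positives."""
    results, expensive_calls = [], 0
    for text in articles:
        if not cheap_is_ai(text):
            results.append(False)  # filtered out locally, no API call
            continue
        expensive_calls += 1
        results.append(expensive_is_ai(text))
    return results, expensive_calls

# Hypothetical stand-ins for the two classifiers.
def cheap_is_ai(text):
    lowered = text.lower()
    return "ai" in lowered.split() or "model" in lowered  # deliberately over-inclusive

def expensive_is_ai(text):
    return "ai" in text.lower().split()

articles = [
    "AI beats benchmark",
    "New phone model announced",
    "Stock markets fall",
    "Local elections held",
    "Recipe of the week",
]
results, calls = classify_two_stage(articles, cheap_is_ai, expensive_is_ai)
print(results, calls)  # [True, False, False, False, False] 2
```

An article rejected by the first stage never reaches the second, so the first stage's recall caps the recall of the whole pipeline. That is why a recall of 1.0 matters more than a slightly lower accuracy for the first-stage model.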

fastText + quantization as an option

Until now I did not know that quantizing a fastText model could reduce the model size this much. Going forward, when I want to run text classification inference on a low-spec machine, I will consider fastText + quantization as one strong option.

Extra note: bin to vec

fasttext print-word-vectors ag_news.bin

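print-word-vectors reads words from standard input and writes each word followed by its vector to standard output, so redirecting that output gives a .vec-style file containing only word vectors. Pass this file with -pretrainedVectors when training. fastText expects a .vec file to begin with a header line giving the number of words and the dimension, so that line may need to be added.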
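As an alternative to the command line, the conversion can be sketched with the Python bindings, using `get_words`, `get_dimension`, and `get_word_vector`. The function name, the output path, and the four-decimal formatting are assumptions for illustration.

```python
def write_vec(model, out_path):
    """Write word vectors in .vec text format: a 'count dim' header, then one word per line."""
    words = model.get_words()
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(f"{len(words)} {model.get_dimension()}\n")
        for word in words:
            values = " ".join(f"{x:.4f}" for x in model.get_word_vector(word))
            f.write(f"{word} {values}\n")

def main():
    import fasttext  # requires the fasttext package
    model = fasttext.load_model("ag_news.bin")
    write_vec(model, "ag_news.vec")  # hypothetical output path
```

`main()` is not called automatically; run it with fastText installed and the .bin file in place.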

cat related_articles/fasttext-quantize.yaml

  1. Measuring speed, data size, and accuracy for vector search algorithms and quantization parameters: A benchmark of FAISS vector search settings, including IVF, HNSW, and product quantization, with a focus on recall@1, @3, and @5 for RAG systems where top-N retrieval quality matters.
  2. Launching AI News and how I used OpenAI behind it: I launched AI News, a site that collects AI, data science, and machine learning topics and summarizes them into three lines with AI. This article describes why I built it and how I used OpenAI APIs for classification and summarization.
  3. Training a Q&A + RAG-focused LLM with SFT, making 4-bit quantized models, and exceeding GPT-3.5 with a 7B model: I fine-tuned rinna's youri-7b-instruction with SFT for Japanese Q&A over RAG context, quantized it with 4-bit methods, and compared exact match, partial match, speed, and GPU memory against GPT-3.5 and GPT-4.