cat articles/text-embedding-3-small

Evaluating OpenAI's new text-embedding-3-small on a RAG task

OpenAI recently released new embedding models, text-embedding-3-small and text-embedding-3-large. According to OpenAI, text-embedding-3-small costs one fifth as much as the older ada-v2 embedding model while improving performance.

According to OpenAI's article, MTEB scores improved slightly, and MIRACL scores improved substantially. MIRACL, Multilingual Information Retrieval Across a Continuum of Languages, is an information retrieval task across multiple languages. A large score improvement there suggests we can also expect better accuracy for Japanese information retrieval tasks.

Evaluation on a Wikipedia Q&A RAG task

So I evaluated it right away. I used the same method as Solving the first AI-Ou quiz competition with vector search only: vector search over about 5.5 million passages and checking whether the retrieved results contain the answer. In other words, this tests whether RAG retrieval can find text containing the appropriate answer. For text-embedding-3-small, I passed an option to the OpenAI API to reduce the embedding dimension to 512, and used those dimension-reduced results. If the original 1536-dimensional data were used, the score would probably improve slightly.

The results are below.

Evaluation result
Evaluation result

The text-embedding-3-small result was low, and the multilingual-e5 series remained overwhelmingly strong for Q&A-style information retrieval tasks. That said, one strength of OpenAI embeddings may be their ability to embed long token inputs. The passages used here were under 400 Japanese characters, and the score might also be a little higher without dimension reduction.

Data and code used

Extra: cost

The total was 1,490,618,785 tokens, and cost about 30 USD. With the old model this would have cost five times as much, so I would not have felt like trying it. At this level, it feels roughly acceptable even for an individual.

cat related_articles/text-embedding-3-small.yaml

  1. Releasing a Japanese StaticEmbedding Model for Practical 100x Faster Text EmbeddingsI released static-embedding-japanese, a fast non-Transformer embedding model for Japanese and English text, and evaluated it on JMTEB.
  2. Releasing Japanese SPLADE v2, a Strong Retrieval Model for Texts Under 512 TokensI released japanese-splade-v2, an improved Japanese SPLADE model with strong JMTEB retrieval scores for documents up to 512 tokens.
  3. Releasing High-Performance Japanese Rerankers, and What Rerankers AreI released Japanese reranker models trained for search reranking and explain how rerankers improve retrieval quality after initial vector or keyword search.