
OpenProvence: A Model for Removing Irrelevant Sentences Before Passing Text to an LLM

created 2025-10-31

Systems that prepare context for an LLM now often search recursively: they gather information, generate additional search queries from multiple angles to fill gaps in knowledge, and extract only the necessary information from the results, all to build "good knowledge" for the LLM to answer with. With 2025 trends such as AI agents, Deep Research, and context engineering, this kind of search often happens behind the scenes, and the ability to retrieve useful information can be a key part of the system.

However, when a system searches a lot, the amount of search-result text also grows. That can make it harder for the LLM to extract the truly necessary information, increase hallucinations, slow processing because of larger inputs, and increase LLM usage cost.

Provence is an approach for deleting irrelevant information from search results before they are passed to an LLM, while also assigning a relevance score. It removes the irrelevant parts of documents returned by search. When I measured the model's performance in an evaluation on long-form question-answer datasets (MLDR combined with LLM-based evaluation), it removed about 80-95% of the text. In other words, a 10,000-character text can be reduced to roughly 500-2,000 characters before being passed to the LLM, which substantially reduces input size. Even on datasets made of many shorter, sentence-like chunks, it removed about 30-70% of sentences, depending on the domain.
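The arithmetic behind that estimate can be sketched in a few lines. The helper name `remaining_chars` is my own illustration and is not part of OpenProvence; it only converts a removal percentage into the number of characters left.

```python
def remaining_chars(original_chars: int, removed_percent: int) -> int:
    """Return how many characters remain after pruning removed_percent of the text."""
    if not 0 <= removed_percent <= 100:
        raise ValueError("removed_percent must be between 0 and 100")
    # Integer arithmetic avoids floating-point rounding surprises.
    return original_chars * (100 - removed_percent) // 100


# A 10,000-character document pruned by 95% and by 80%.
print(remaining_chars(10_000, 95), remaining_chars(10_000, 80))  # prints: 500 2000
```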

The Provence implementation and models published for research are licensed for non-commercial use only, and no Japanese dataset was available. I therefore created a project called OpenProvence and published the training and inference code, model weights, and related artifacts under open licenses. I also created and published Japanese datasets. Many of the datasets themselves are not under open licenses, because they inherit the licenses of their original sources.

Trying OpenProvence

I prepared a Hugging Face Spaces demo that runs on CPU:

🤗 https://huggingface.co/spaces/hotchpotch/open_provence_demo

For example, if you use the sample Wikipedia page about information retrieval and run sentence pruning with the query "What is vector search?", the article of about 5,000 Japanese characters is reduced to about 400 characters, leaving only the information about vector search.

You can also run the demo locally with the following steps. On a recent MacBook, inference should be fairly fast.

git clone https://huggingface.co/spaces/hotchpotch/open_provence_demo
cd open_provence_demo
uv sync
uv run python app.py

Using It from Python

It can be used from Python as follows. The small xsmall model can run on CPU. On a GPU (an NVIDIA GPU with FlashAttention 2), inference should finish almost immediately. I think it is fast enough to integrate into a production search environment.

from transformers import AutoModel

# Change this to the model you want to use.
model_name = "hotchpotch/open-provence-reranker-xsmall-v1"
provence = AutoModel.from_pretrained(model_name, trust_remote_code=True)

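# The question means "About the capital of Japan".
question: str = "日本の首都について"
# Three lines: a diary note about school, a statement that Tokyo is Japan's
# capital (population about 14 million), and a note about skipping a party.
context: str = """
今日は学校に行き、さまざまなことを学んだり、友達と学食でたらふく食べた。
日本の首都は東京で、東京は日本の政治、経済、文化の中心地らしい。この都市は約1,400万人の人口を抱える世界有数の大都市らしい。
夜は飲み会に誘われたが、参加せずに帰宅した、今月そんなにお金が残ってないからなぁ、残念だ。
"""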

result = provence.process(question, context, threshold=0.1)
print(f"Reranking Score: {result['reranking_score']:.4f}")
print(f"Compression Rate: {result['compression_rate']:.1f}%")
print(f"Pruned Context:\n{result['pruned_context']}")

# Output example:
# Reranking Score: 0.7043
# Compression Rate: 62.5%
# Pruned Context:
# 日本の首都は東京で、東京は日本の政治、経済、文化の中心地らしい。
# この都市は約1,400万人の人口を抱える世界有数の大都市らしい。
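In a search pipeline you would typically prune several retrieved documents and pass only the survivors to the LLM. The following is a minimal sketch of that step, not part of OpenProvence: `build_llm_context`, `keyword_prune`, and the `min_score=0.5` cutoff are hypothetical names and values I chose for illustration. It assumes only that the pruning callable takes `(question, context, threshold=...)` and returns a dict with `pruned_context` and `reranking_score`, as `provence.process` does in the example above. The toy `keyword_prune` stands in for the model so the sketch runs without downloading weights.

```python
from typing import Any, Callable, Dict, List

# Any callable shaped like provence.process(question, context, threshold=...).
PruneFn = Callable[..., Dict[str, Any]]


def build_llm_context(
    question: str,
    documents: List[str],
    prune: PruneFn,
    threshold: float = 0.1,
    min_score: float = 0.5,  # hypothetical cutoff; tune it for your own data
    separator: str = "\n\n",
) -> str:
    """Prune each document, drop weak or empty results, and join the rest."""
    kept = []
    for doc in documents:
        result = prune(question, doc, threshold=threshold)
        pruned = str(result["pruned_context"]).strip()
        score = float(result["reranking_score"])
        if pruned and score >= min_score:
            kept.append((score, pruned))
    # Highest-scoring documents first; ties keep their original order.
    kept.sort(key=lambda item: item[0], reverse=True)
    return separator.join(text for _, text in kept)


def keyword_prune(question: str, context: str, threshold: float = 0.1) -> Dict[str, Any]:
    """Toy stand-in for the model: keeps lines that contain the question text.

    The "score" is just the fraction of lines kept, not a real relevance score.
    """
    lines = [line for line in context.splitlines() if line.strip()]
    kept = [line for line in lines if question.lower() in line.lower()]
    score = len(kept) / len(lines) if lines else 0.0
    return {"pruned_context": "\n".join(kept), "reranking_score": score}


docs = [
    "Tokyo is the capital of Japan.\nI had lunch with friends.",
    "I went home early tonight.",
]
print(build_llm_context("capital", docs, keyword_prune))
```

With the real model loaded as shown earlier, the same function could be called as `build_llm_context(question, docs, provence.process)`.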

Using Coding Agents

For OpenProvence, I set myself a constraint: I would not write a single line of code. All implementation work, including the inference code, model training code, evaluation code, and dataset creation code, was done by coding agents such as Claude Code and Codex. I still had to give many corrective instructions, but for a side project developed in my spare time, I think the result is quite good. Looking at the final code, it could probably be simpler. At the same time, with current LLMs, this amount of explicit and somewhat verbose code may be easier for a model to understand and modify.

The term vibe engineering has emerged for building production-quality software in collaboration with AI by continuing to provide suitable instructions, development guidelines, and an environment where AI can develop and improve the project itself.

Coding agents are normally paired with the usual software development practices: development guidelines, unit tests, CI, and code review. I found that machine learning model projects of a certain size can also be developed this way if you additionally prepare a small baseline that trains quickly, evaluation data whose accidental changes would indicate bugs, and detailed explanations of the datasets.

Closing

An approach like OpenProvence, which removes text unrelated to the question, should work especially well for products that process very large documents.

RAG was a major topic in 2024, and in 2025 trends such as AI agents, Deep Research, and context engineering have made this area even more important. I am impressed by, and grateful for, the foresight of the Provence team at Naver Labs Europe, who worked on an important technical point early. Provence was published in January 2025.

For products using LLMs, information retrieval behind the scenes can increase value, and retrieval technology remains very interesting. I hope this project is useful in products or research.