Understanding LangChain Expression Language (LCEL)

LCEL is a way to build LangChain chains easily. Development became active in the second half of 2023, and as of January 2024, writing LangChain code with LCEL is generally recommended, although the older style can still be used. For LCEL's benefits, the official LCEL documentation is a good reference.

However, when I started writing LCEL, code worked if I followed the official documentation exactly, but small changes often broke it. This was simply because I did not understand LCEL's behavior. The official documentation and tutorials contain many examples showing how simply LLM + RAG code can be written with LCEL, but they do not explain much about LCEL's behavior itself. Even when they do, the examples are often combined with ChatGPT or templates, so I had trouble finding an explanation of "how LCEL behaves in the first place."

So I wrote this notebook-style article, using LangChain 0.1.0, which covers only the basics of LCEL behavior and builds up understanding step by step. The Colab notebook is here:

LCEL basics

The basic idea of LCEL in LangChain is simple. An object receives an input value and passes its output value to the next object. This is no different from composing ordinary functions.

First, define a function that doubles a value.

def double(x):
    return x * 2

double(2)
4

Next, define a function that prints the argument to standard output and returns the argument unchanged.

def tap_print(x):
    """
    Print the argument to standard output and return it unchanged.
    """
    print(f"tap_print: {x}")
    return x

Now run the two functions together. Give an argument to double, run it, and pass the result to tap_print.

tap_print(double(2))
tap_print: 4

4

That produced the expected result.

Next, wrap these functions in RunnableLambda, a subclass of Runnable. Runnable is the base class for everything LCEL can execute.

from langchain_core.runnables import RunnableLambda

r_double = RunnableLambda(double)

After conversion, you can use the Runnable interface. Let's call invoke, which runs a Runnable.

r_double.invoke(2)
4

You can also turn a function into a RunnableLambda with the @chain decorator. Because a variable named chain appears later, I import the decorator as chain_decorator to avoid confusion.

from langchain_core.runnables import chain as chain_decorator

@chain_decorator
def r_double(x):
    return x * 2

r_double.invoke(2)  # r_double is now a RunnableLambda, so it can be run with invoke
4

Convert tap_print into a Runnable as well.

r_tap_print = RunnableLambda(tap_print)
r_tap_print.invoke(2)
tap_print: 2

2

Now finally, let's connect and run them with |, the core of LCEL.

chain = r_double | r_tap_print
chain.invoke(2)
tap_print: 4

4

Good. r_double returns a result, and r_tap_print prints that result while returning it. What exactly is this chain?

chain.__class__
langchain_core.runnables.base.RunnableSequence

The chain is a RunnableSequence, a Runnable that runs its steps serially. Let's write the same thing without the | syntactic sugar.

from langchain_core.runnables import RunnableSequence

chain = RunnableSequence(r_double, r_tap_print)
chain.invoke(2)
tap_print: 4

4

That produced the same result.
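
To make "runs serially" concrete, here is a minimal plain-Python sketch of what a sequence of steps does: each step's return value becomes the next step's argument. run_in_sequence is a hypothetical helper written for this illustration; it is not how LangChain implements RunnableSequence.

```python
from functools import reduce


def double(x):
    return x * 2


def tap_print(x):
    print(f"tap_print: {x}")
    return x


def run_in_sequence(steps, value):
    """Feed `value` through each step in order (hypothetical helper)."""
    return reduce(lambda acc, step: step(acc), steps, value)


# Same data flow as `r_double | r_tap_print`, without any LangChain objects.
result = run_in_sequence([double, tap_print], 2)
```

With an empty list of steps the input simply comes back unchanged, which is a handy way to remember that a sequence only forwards values from one step to the next.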

Now let's display the execution flow, or execution graph, for this Runnable.

chain.get_graph().print_ascii()
  +----------------+   
  | r_double_input |   
  +----------------+   
          *            
          *            
          *            
+------------------+   
| Lambda(r_double) |   
+------------------+   
          *            
          *            
          *            
+-------------------+  
| Lambda(tap_print) |  
+-------------------+  
          *            
          *            
          *            
+------------------+   
| tap_print_output |   
+------------------+   

r_double receives the input and passes its result to tap_print, and the final output is tap_print_output.

Now look at this code:

chain = r_double | tap_print  # tap_print is not a RunnableLambda!
chain.invoke(2)
tap_print: 4

4

Why does this work even though tap_print is not a Runnable? Because Runnable overloads Python's | operator through the __or__ and __ror__ methods: when either operand is a Runnable, the other operand is automatically converted into a Runnable. The actual Runnable code looks like this:

    def __or__(
        self,
        other: Union[
            Runnable[Any, Other],
            Callable[[Any], Other],
            Callable[[Iterator[Any]], Iterator[Other]],
            Mapping[str, Union[Runnable[Any, Other], Callable[[Any], Other], Any]],
        ],
    ) -> RunnableSerializable[Input, Other]:
        """Compose this runnable with another object to create a RunnableSequence."""
        return RunnableSequence(self, coerce_to_runnable(other))

    # __ror__ is defined in the same way

It passes the right-hand operand of | (tap_print in this example) through coerce_to_runnable, then wraps both sides in a RunnableSequence.
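
The same operator mechanism can be reproduced in a few lines of plain Python. The sketch below is a toy under my own assumptions: Step is a hypothetical stand-in for Runnable, and its wrapping logic only imitates the idea of coerce_to_runnable. It shows that __or__ handles `step | function` and __ror__ handles `function | step`.

```python
class Step:
    """Toy stand-in for a Runnable (hypothetical, for illustration only)."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # Called for `step | other`: wrap the right-hand side if needed.
        right = other if isinstance(other, Step) else Step(other)
        return Step(lambda value: right.invoke(self.invoke(value)))

    def __ror__(self, other):
        # Called for `other | step` when `other` cannot handle `|` itself.
        left = other if isinstance(other, Step) else Step(other)
        return Step(lambda value: self.invoke(left.invoke(value)))


def double(x):
    return x * 2


def add_one(x):
    return x + 1


left_is_step = Step(double) | add_one    # dispatched to Step.__or__
right_is_step = double | Step(add_one)   # dispatched to Step.__ror__
```

Both expressions return a Step that doubles first and then adds one, so it does not matter which side of | was already wrapped.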

coerce_to_runnable is also important, so let's look at it.

def coerce_to_runnable(thing: RunnableLike) -> Runnable[Input, Output]:
    """Coerce a runnable-like object into a Runnable.

    Args:
        thing: A runnable-like object.

    Returns:
        A Runnable.
    """
    if isinstance(thing, Runnable):
        return thing
    elif inspect.isasyncgenfunction(thing) or inspect.isgeneratorfunction(thing):
        return RunnableGenerator(thing)
    elif callable(thing):
        return RunnableLambda(cast(Callable[[Input], Output], thing))
    elif isinstance(thing, dict):
        return cast(Runnable[Input, Output], RunnableParallel(thing))
    else:
        raise TypeError(
            f"Expected a Runnable, callable or dict."
            f"Instead got an unsupported type: {type(thing)}"
        )

This function performs the following conversions, checked in this order:

  • If the object is already a Runnable, return it unchanged.
  • If it is a generator function (sync or async), convert it to RunnableGenerator.
  • If it is any other callable, such as a function, convert it to RunnableLambda.
  • If it is a dict, convert it to RunnableParallel.
  • Otherwise, raise a TypeError.
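
The order of these checks matters, because a generator function is also callable. The sketch below mirrors the branch order of the quoted source using only the standard library. describe_coercion is a hypothetical helper that returns the name of the class each value would be converted into; it builds no Runnables and skips the first "already a Runnable" branch, since no Runnable class exists in this sketch.

```python
import inspect


def describe_coercion(thing):
    """Name the branch a value would take, in the same order as the source.

    Hypothetical helper: it returns labels only.
    """
    if inspect.isasyncgenfunction(thing) or inspect.isgeneratorfunction(thing):
        return "RunnableGenerator"
    if callable(thing):
        return "RunnableLambda"
    if isinstance(thing, dict):
        return "RunnableParallel"
    raise TypeError(f"unsupported type: {type(thing)}")


def double(x):
    return x * 2


def stream_doubles(values):
    # A generator function: callable, but caught by the earlier check.
    for value in values:
        yield value * 2


labels = [
    describe_coercion(double),
    describe_coercion(stream_doubles),
    describe_coercion({"double_value": double}),
]
```

Here labels lists the three class names in the order the values were passed, and a value such as 42 falls through to the TypeError branch.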

From this, you can see that in LCEL, if either side connected by | is a Runnable, the other side is converted into a Runnable, and a RunnableSequence connecting them is returned.

Now run the next code.

chain = double | r_tap_print  # double is not a RunnableLambda!

chain.invoke(2)
tap_print: 4

4

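This time the left operand, double, is not a Runnable. A plain function does not implement |, so Python falls back to r_tap_print.__ror__, which coerces double into a Runnable and returns a RunnableSequence, so the chain can run.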

So far, we have seen:

  • A Runnable basically receives an input value with invoke, processes it, and returns an output value.
  • Connecting Runnables with | creates code that runs serially through RunnableSequence.
  • If either side of | is not a Runnable, it is automatically converted into a Runnable.

Seen this way, Runnable feels simple and understandable.

Dict syntax converted into RunnableParallel

Next is the syntax using | and dict, which confused me a lot at first. Consider this implementation:

  • Pass a number as the argument.
    • Keep the first value in original_value.
    • Put the doubled value in double_value.
  • Pass those results to tap_print.

Let's write code that does this.

chain = {
    "original_value": lambda x: x,
    "double_value": double,
} | r_tap_print

chain.invoke(2)
tap_print: {'original_value': 2, 'double_value': 4}

{'original_value': 2, 'double_value': 4}

It worked, though it is not obvious why. If you use this by intuition without understanding the behavior, it gradually becomes confusing. That happened to me.

Let's check what this is doing by displaying the execution graph.

chain.get_graph().print_ascii()
+--------------------------------------------+    
| Parallel<original_value,double_value>Input |    
+--------------------------------------------+    
               **              **                 
            ***                  ***              
          **                        **            
+-------------+               +----------------+  
| Lambda(...) |               | Lambda(double) |  
+-------------+               +----------------+  
               **              **                 
                 ***        ***                   
                    **    **                      
+---------------------------------------------+   
| Parallel<original_value,double_value>Output |   
+---------------------------------------------+   
                        *                         
                        *                         
                        *                         
             +-------------------+                
             | Lambda(tap_print) |                
             +-------------------+                
                        *                         
                        *                         
                        *                         
              +------------------+                
              | tap_print_output |                
              +------------------+                

The graph now fans out into parallel branches, gathers their results, and passes the combined result to tap_print.

This is the confusing point. When a dict is connected with |, coerce_to_runnable converts it into a RunnableParallel. A RunnableParallel in turn converts each dict value into a Runnable, runs them all on the same input, and returns the results under the corresponding keys.

Let's use coerce_to_runnable directly and see the result of the type conversion that happened through |.

from langchain_core.runnables.base import coerce_to_runnable

parallel = coerce_to_runnable(
    {
        "original_value": lambda x: x,
        "double_value": double,
    }
)

parallel.invoke(2)
{'original_value': 2, 'double_value': 4}
parallel.__class__
langchain_core.runnables.base.RunnableParallel

Declaratively writing code with the same behavior looks like this:

from langchain_core.runnables import RunnableParallel

parallel = RunnableParallel(
    {
        "original_value": coerce_to_runnable(lambda x: x),
        "double_value": coerce_to_runnable(double),
    }
)
parallel.invoke(2)
{'original_value': 2, 'double_value': 4}

So in LCEL, connecting a dict with | produces code that runs in parallel with RunnableParallel and returns values.
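
As a plain-Python picture of that behavior, the sketch below calls every function in a dict with the same input and collects the results under the same keys. run_parallel is a hypothetical helper, and using a thread pool is my choice for illustrating concurrent execution; treat it as a sketch of the idea rather than a description of RunnableParallel's internals.

```python
from concurrent.futures import ThreadPoolExecutor


def double(x):
    return x * 2


def run_parallel(steps, value):
    """Call every function in `steps` with the same `value` (hypothetical helper).

    Returns a dict with the same keys, each holding that function's result.
    """
    with ThreadPoolExecutor() as executor:
        futures = {key: executor.submit(func, value) for key, func in steps.items()}
        return {key: future.result() for key, future in futures.items()}


result = run_parallel(
    {
        "original_value": lambda x: x,
        "double_value": double,
    },
    2,
)
```

The key point is that every branch sees the original input, 2, and not the output of a sibling branch.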

invoke and dict

Next, call invoke with a dict. Note that this is completely different from the dict that is converted into RunnableParallel above. This is an ordinary call with a dict as the argument.

data = {
    "input_value": 2,
    "input_do_nothing": 100,
}
chain = r_double | r_tap_print
try:
    chain.invoke(data)
except Exception as e:
    print("Error:", e)
Error: unsupported operand type(s) for *: 'dict' and 'int'

r_double expects an int, but it received a dict, so it cannot process it. You may say, "Just write chain.invoke(data['input_value'])", and that is true. But if r_double is in the middle of a chain and a dict reaches it, it cannot handle it.

In that case, insert a function that extracts only input_value.

data = {
    "input_value": 2,
    "input_do_nothing": 100,
}
chain = (lambda x: x["input_value"]) | r_double | r_tap_print
chain.invoke(data)
tap_print: 4

4

That worked.
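
The standard library's operator.itemgetter builds the same kind of key-extracting callable as the lambda above. Since it is an ordinary callable, I would expect it to take the same callable branch in coerce_to_runnable, but this sketch only checks the plain-Python behavior and makes no LangChain calls.

```python
from operator import itemgetter

data = {
    "input_value": 2,
    "input_do_nothing": 100,
}


def double(x):
    return x * 2


# itemgetter("input_value") behaves like `lambda x: x["input_value"]`.
pick_input_value = itemgetter("input_value")

picked = pick_input_value(data)
doubled = double(picked)
```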

Now let's write code that passes input_do_nothing onward unchanged, passes the calculation result of input_value onward as double_value, and also passes the original input_value.

data = {
    "input_value": 2,
    "input_do_nothing": 100,
}
chain = {
    "double_value": (lambda x: x["input_value"]) | r_double,
    "input_value": lambda x: x["input_value"],
    "input_do_nothing": lambda x: x["input_do_nothing"],
} | r_tap_print
chain.invoke(data)
tap_print: {'double_value': 4, 'input_value': 2, 'input_do_nothing': 100}

{'double_value': 4, 'input_value': 2, 'input_do_nothing': 100}

The intended values were passed to r_tap_print.

In this chain the left operand of | is a plain dict. The dict cannot combine itself with a Runnable, so Python calls r_tap_print.__ror__, which passes the dict to coerce_to_runnable and converts it into a RunnableParallel. That produces the intended behavior.
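
This fallback is ordinary Python operator dispatch and can be observed without LangChain. In the sketch below, Recorder is a hypothetical class that only reports what its __ror__ receives. On Python 3.9 and later, dict defines __or__ for merging two dicts, but it declines any operand that is not a dict, which is what hands control to the right-hand side.

```python
class Recorder:
    """Toy right-hand operand that reports what `__ror__` receives (hypothetical)."""

    def __ror__(self, other):
        # Python calls this when the left operand cannot handle `|` itself.
        return ("__ror__ received", type(other).__name__)


steps = {"double_value": lambda x: x * 2}

# dict.__or__ exists on Python 3.9+; it returns NotImplemented for non-dict operands.
dict_declines = (
    dict.__or__(steps, Recorder()) is NotImplemented
    if hasattr(dict, "__or__")
    else True
)

# Because the dict declines, Python tries Recorder.__ror__ next.
outcome = steps | Recorder()
```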

But this is very verbose. If data had many more keys and you wanted to pass all of them as input values to later Runnables, it would be painful. For that situation, RunnablePassthrough exists. Let's rewrite this using RunnablePassthrough.

from langchain_core.runnables import RunnablePassthrough

data = {
    "input_value": 2,
    "input_do_nothing": 100,
}
chain = (
    RunnablePassthrough().assign(
        double_value=(lambda x: x["input_value"]) | r_double,
    )
    | r_tap_print
)
chain.invoke(data)
tap_print: {'input_value': 2, 'input_do_nothing': 100, 'double_value': 4}

{'input_value': 2, 'input_do_nothing': 100, 'double_value': 4}

Let's also add a result that triples the value.

chain = (
    RunnablePassthrough().assign(
        double_value=(lambda x: x["input_value"]) | r_double,
        triple_value=lambda x: x["input_value"] * 3,  # implicitly converted to RunnableLambda
    )
    | r_tap_print
)
chain.invoke(data)
tap_print: {'input_value': 2, 'input_do_nothing': 100, 'double_value': 4, 'triple_value': 6}

{'input_value': 2,
 'input_do_nothing': 100,
 'double_value': 4,
 'triple_value': 6}

As you can see, RunnablePassthrough is very convenient when you want to modify only part of an input dict, add keys, and pass it to the next Runnable.
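
In plain Python, the effect seen in the outputs above amounts to "copy the input dict, then add computed keys". The sketch below reproduces that result. assign here is a hypothetical helper, not LangChain's method, and it says nothing about how RunnablePassthrough is implemented.

```python
def assign(data, **new_fields):
    """Return a copy of `data` plus computed keys (hypothetical helper).

    Each keyword value is a function that receives the original dict.
    """
    computed = {key: func(data) for key, func in new_fields.items()}
    return {**data, **computed}


data = {
    "input_value": 2,
    "input_do_nothing": 100,
}

result = assign(
    data,
    double_value=lambda x: x["input_value"] * 2,
    triple_value=lambda x: x["input_value"] * 3,
)
```

Each new key is computed from the original input dict, and the input itself is left unmodified.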

Be careful not to confuse passing a dict as the argument to invoke with writing LCEL by connecting a dict with |, which effectively converts it into RunnableParallel. They have completely different intentions and behavior.

RunnablePassthrough can also be used as a value inside the dict syntax. This turns a non-dict input value into a dict while keeping the original value under a key.

value_format = "value is {value}, double value is {double_value}"

def template(data):
    return value_format.format(**data)

r_template = RunnableLambda(template)

chain = (
    {
        "value": RunnablePassthrough(),
        "double_value": RunnablePassthrough() | double,
    }  # double works here instead of r_double; it is coerced to RunnableLambda automatically
    | r_template
    | r_tap_print
)
chain.invoke(100)
tap_print: value is 100, double value is 200

'value is 100, double value is 200'

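By now it should be clear why this works: the dict is coerced into a RunnableParallel, each of its values receives the input 100, and the resulting dict is passed on to r_template.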
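
For reference, the same data flow can be traced by hand in plain Python. This is only a sketch of the values moving between steps: the identity function stands in for RunnablePassthrough(), and no LangChain objects are involved.

```python
value_format = "value is {value}, double value is {double_value}"


def double(x):
    return x * 2


def template(data):
    return value_format.format(**data)


def identity(x):
    # Stand-in for RunnablePassthrough(): returns its input unchanged.
    return x


# Step 1: the dict fans the single input out to every key.
fanned_out = {
    "value": identity(100),
    "double_value": double(identity(100)),
}

# Step 2: the resulting dict becomes the input of the template step.
text = template(fanned_out)
```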

Next steps

If you understand this much, you should be able to read LangChain LCEL sample code and most LCEL code written by others. Looking back, once you understand how automatic conversion into Runnable works and that a dict placed on either side of | becomes a RunnableParallel, you should run into far fewer implementation problems.

If you want to know why something behaves a certain way in practice, reading the Runnable source code is often the fastest way to understand it, and I recommend doing so.

I hope this article and notebook help someone understand LCEL.