翻译: It’s not always obvious when tail-call optimization is allowed

mwish

2023-09-05

原文链接：https://quuxplusone.github.io/blog/2021/01/09/tail-call-optimization/

翻译许可：

我最初本文作为 “A C++ acronym glossary” (2019-08-02) 中的一部分编写，但决定最好将其单独列出来放到一个博客中。

“TCO”代表“tail call optimization”。这是一种编译器优化，它将代码中的函数调用（通常会向栈上 push frame）并将其转换为 jump 指令（不会将新的 frame push 到栈上）。只有 return bar() 这样在函数为尾部发生调用的时候，编译器可能会采用这种优化：编译器说：“看，我知道 bar 执行完就返回给调用者，就是 return bar() 的这个函数了，让我们直接在本函数的 caller 处调用 bar()”！

Tail call optimization 通常可以用于 return bar() 这种形式，但不能用于 return bar()+1。

在 C++ 中，编程者很难准确地弄清楚哪里允许发生 TCO。主要原因是 non-trivial destructors:

1 2	void foo1() { bar(); } // tail call void foo2() { std::lock_guard lk(m); bar(); } // not a tail call

这也包含 temporary objects 的析构：

1
2
3

void bar(std::string_view);
void foo1() { bar("hello"); }  // tail call
void foo2() { bar("hello"s); }  // not a tail call

但即使这里所有东西都是 trivially destructible 的，您也可能因为需要 stack pointer 或其他东西，导致阻止发生 TCO 。https://godbolt.org/z/vcY3v9 ：

void bar();
void escape(const int&);
void foo1() { escape(42); bar(); }  // tail-call on GCC and MSVC
void foo2() { const int i = 42; escape(i); bar(); }  // not a tail-call

有趣的是，在上面的示例中，GCC 和 MSVC 为 foo1 生成 jmp bar，但 Clang 和 ICC 没有该优化。

您可能会合理地问为什么我们不能对 foo2 进行相同的优化。我认为原因是 C++ 保证每个变量（在其生命周期内）都有一个唯一的地址。如果我们要像这样实现 bar：

const int *addr_of_i;
void escape(const int& i) {
    addr_of_i = &i;
}
void bar() {
    int j;
    assert(&j != addr_of_i);
}

那么程序可以判断该实现是否（不符合规定）将 j 放入与 i 相同的内存位置。然而，没有规则规定 j不能与 42 生成的临时对象共享内存位置，因为该临时对象的生命周期与 j 的生命周期不重叠。

一个实现是否执行 TCO 是可以观察到的，因为从某种意义上说，tail-recursive 函数在使用 TCO 时可能使用 O(1) 的栈空间，但在不使用 TCO 时使用 O(n) 的栈空间——因此，当递归足够深时，会“爆破堆栈” （stack overflow）。然而，C++ 的抽象机实际上并没有任何「爆破堆栈」的概念。 C++ 程序没有一致的方法来检测或处理该情况。

原文：However, C++’s abstract machine doesn’t really have any notion of blowing the stack. There’s no conforming way for a C++ program to detect that condition or deal with it. 不确定翻译对了…

因此，足够偏执的 C++ 代码合作者将避免非常深的递归。执行此操作的一种技术是手动进行 tail-recursion optimization：

int gcd(int x, int y) {
    if (x == 0) return y;
    return gcd(y % x, x);
}

改写为:

int gcd(int x, int y) {
top:
    if (x == 0) return y;
    std::tie(x, y) = std::tuple(y % x, x);
    goto top;
}

并改写成

int gcd(int x, int y) {
    while (x != 0) {
        std::tie(x, y) = std::tuple(y % x, x);
    }
    return y;
}

执行最后一步通常非常重要，这不仅因为这种方式（ https://en.wikipedia.org/wiki/Structured_programming ）使代码更易读，还因为 goto 是极少数阻止将函数标记为 constexpr 的 C++ 结构之一（见 https://stackoverflow.com/questions/45266577/why-disallow-goto-in-constexpr-functions ）。如果您希望函数为 constexpr（无论出于何种原因），则必须避免 goto。这是 C++ 在风格上固执己见的罕见情况。

(如果你没有见过 tie = tuple trick , 你可能会喜欢我在 CppCon 2020 上的演讲 “Back to Basics: Algebraic Data Types.”.)

Posted 2021-01-09