Always Give Your child_process an on('data') Handler

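OK, I admit the title is clickbait. There are quite a few conditions attached: what needs on('data') handling are the functions in the child_process module that take a stdio option.

Which ones involve stdio? For example, the options argument of child_process.spawn() has an optional stdio property, which you can set to inherit, pipe, ignore, and so on.

And what counts as adding on('data') handling? Listening for the event is one way, and setting stdio to something like ignore counts too.

From here on I'll use child_process.spawn() as the running example.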

`child_process.spawn(command[, args][, options])`

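Let's first look at the child_process.spawn() function:

command: the command to run;
[, args]: the command-line arguments for that command;
[, options]: additional options.

We don't care about the first two here, only about the stdio property of options.

options.stdio can be an array, or simply a string.

If options.stdio is an array, it specifies what the child's fd at each index should be. By default, with nothing configured, the child process object returned by spawn() (call it child) has three Stream objects, child.stdin, child.stdout, and child.stderr, and the child's stdin, stdout, and stderr fds are redirected to those three streams through pipes. That is equivalent to setting options.stdio to 'pipe'.

If options.stdio is a string, the child's first three fds all take the meaning of that string. For example, 'pipe' is equivalent to [ 'pipe', 'pipe', 'pipe' ].

Each fd in the array can be one of the following types (shamelessly lifted from the docs):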

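'pipe': creates a pipe between the two processes. In the current process, the pipe is exposed as a stream in child.stdio[], and child.stdin, child.stdout, and child.stderr correspond to child.stdio[0] through child.stdio[2]. The child's corresponding fd is redirected to the matching stream in the current process;
'ipc': creates an IPC channel between the two processes so that parent and child can exchange messages over it (this assumes both are Node.js processes). This type should not be used for std*, though; it belongs in one of the later fd slots of the array;
'ignore': attaches /dev/null to the corresponding fd;
'inherit': literally, inherit from the current process. The child's corresponding fd is passed through to the current process's corresponding fd via the current process's stream. Only the first three positions (stdin, stdout, and stderr) take effect; in any later position, inherit is equivalent to ignore;
Stream: a readable or writable stream object, such as a TTY, file, socket, or pipe, that is shared with the child process. The stream's underlying fd is shared with the child's corresponding fd, provided the stream actually has an underlying file descriptor; a file stream that has not been opened yet, for example, does not have one;
Positive integer: similar to Stream; the value is interpreted as a file descriptor;
null / undefined: keeps the default for that fd, which is pipe for the first three fds and ignore for the rest.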
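
To make the list above a bit more concrete, here is a minimal sketch of how a few stdio settings show up on the returned child object. As a placeholder command it spawns the current Node.js binary (process.execPath) with an empty -e script; the variable names (noop, piped, ignored, mixed) are made up for illustration.

```javascript
'use strict';

const { spawn } = require('child_process');

// A do-nothing child: the current Node.js binary running an empty script.
const noop = [ '-e', '' ];

// The string form is shorthand for the same value in all three slots.
const piped = spawn(process.execPath, noop, { stdio: 'pipe' });
const ignored = spawn(process.execPath, noop, { stdio: 'ignore' });
// Array form: stdin ignored, stdout piped, stderr inherited from the parent.
const mixed = spawn(process.execPath, noop, { stdio: [ 'ignore', 'pipe', 'inherit' ] });

// Only 'pipe' slots get a stream on the parent side; the other slots are null.
console.log(piped.stdout !== null && piped.stdout === piped.stdio[1]);
console.log(ignored.stdout === null);
console.log(mixed.stdin === null, mixed.stdout !== null, mixed.stderr === null);
```

None of these children writes anything, so leaving piped.stdout unread is harmless here. The rest of the article is about what happens when the child does write a lot.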

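With that covered, we can narrow things down: what the title really means is that for settings like pipe, where the parent is supposed to consume the child's stdio, we have to actually consume it.

The reason is spelled out in the docs too, and I'll show it at the end of this article.

Let's start experimenting.

Running the Experiment

First, prepare the child process file: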

// child.js
'use strict';

let str = '123';
// let str = Array(1000000).fill('0000000000').join('');
console.log(str);

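The file has two declarations of str, one of which is commented out. For the short string, use the code as is; for the long string, swap which of the two lines is commented out.

Short String Test

We write the following parent process code: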

// index.js
'use strict';

const cp = require('child_process');
const child = cp.spawn('node', [ 'child.js' ]);
child.on('exit', () => {
  console.log('hello');
  process.exit(0);
});

Run $ node index.js. Everything works and our 'hello' is printed. No problem. Now add this to the code above:

child.stdout.on('data', () => {});

Run it again and nothing seems to change. A pointless extra step, apparently. Let's spice things up a little:

// index.js
'use strict';

const cp = require('child_process');
const child = cp.spawn('node', [ 'child.js' ]);
let data = '';
child.stdout.on('data', chunk => data += chunk.toString());
child.on('exit', () => {
  console.log(data);
  process.exit(0);
});

Run it again and '123' is printed. Everything is just as we expected.
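
The listing above prints the collected data from the 'exit' handler. The Node.js docs note that the stdio streams may still be open when 'exit' fires, so a slightly more defensive variant waits for 'close' instead. The following is only a sketch of that idea; collectStdout is a hypothetical helper name, and the child command is a placeholder that runs the current Node.js binary.

```javascript
'use strict';

const { spawn } = require('child_process');

// Hypothetical helper: run a command and resolve with everything it wrote to stdout.
function collectStdout(command, args) {
  return new Promise((resolve, reject) => {
    // stdin is not needed, stdout is piped to us, stderr goes to the parent's stderr.
    const child = spawn(command, args, { stdio: [ 'ignore', 'pipe', 'inherit' ] });
    const chunks = [];
    child.stdout.on('data', chunk => chunks.push(chunk));
    child.on('error', reject);
    // 'close' fires after the process has ended and its stdio streams are closed.
    child.on('close', code => {
      resolve({ code, output: Buffer.concat(chunks).toString() });
    });
  });
}

collectStdout(process.execPath, [ '-e', 'console.log("123")' ])
  .then(({ code, output }) => console.log(code, JSON.stringify(output)));
// prints: 0 "123\n"
```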

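Long String Test

Next, comment out the short string in the child process and enable the long one.

Start with the last listing from the Short String Test, the file containing chunk => data += chunk.toString(). Run $ node index.js and look at the result.

Whoa, it prints a huge wall of '0' characters.

That's too annoying to look at, so remove the data-related code and change the stdout data listener back to this: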

child.stdout.on('data', () => {});

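Then change the console.log back to 'hello' as well. Run it again and the world is quiet: only hello is left.

So far everything still looks normal.

Where It Goes Wrong

Now things start to go wrong. Remove the whole child.stdout.on statement so the parent code is back to its original form, and add a little extra: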

// index.js
'use strict';

const cp = require('child_process');
const child = cp.spawn('node', [ 'child.js' ]);
child.on('exit', () => {
  console.log('hello');
  process.exit(0);
});

let i = 1;
setInterval(() => {
  console.log(`Oh, 死月 really is such a goofball. x${i++}`);
}, 100);

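Type $ node index.js and hit Enter:

The timer message gets printed over and over, with no end in sight.

Our program is stuck. The source is short, so it's clear at a glance that it hangs because process.exit(0) is never reached. That, in turn, is because the 'exit' event on child never fires, and one step further back, because the child process never exits.

If you don't believe that it hasn't exited, check which processes are alive while the timer messages scroll by.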

$ ps aux | grep node
xadillax 3844947  1.5  0.0 552184 31384 pts/202  Sl+  16:30   0:00 node index.js
xadillax 3844954  1.1  0.1 612436 41620 pts/202  Sl+  16:30   0:00 node child.js
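
The same stall can be reproduced from a single file. This is a rough sketch that assumes a Linux-like platform; isChildStuck is a hypothetical helper, the inline -e script stands in for the long-string child.js, and the one-second wait is an arbitrary choice.

```javascript
'use strict';

const { spawn } = require('child_process');

// Child script: print roughly 10 MB in one go, like the long-string child.js.
const script = 'console.log("0".repeat(10 * 1024 * 1024))';

// Hypothetical helper: resolves to true if the child is still alive after `ms`.
function isChildStuck(ms) {
  return new Promise(resolve => {
    // stdio defaults to 'pipe', and nothing ever reads child.stdout here.
    const child = spawn(process.execPath, [ '-e', script ]);
    setTimeout(() => {
      const stillRunning = child.exitCode === null && child.signalCode === null;
      // Clean up so that this demo itself can exit.
      child.kill();
      child.stdin.destroy();
      child.stdout.destroy();
      child.stderr.destroy();
      resolve(stillRunning);
    }, ms);
  });
}

isChildStuck(1000).then(stuck => console.log('child still running:', stuck));
```

A child that only has to print a string would normally be done long before the timer fires, so seeing it still alive is the same symptom the ps listing shows.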

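Hands-on with GDB

Without peeking at the answer, let's attach GDB and see where it is stuck. Building a debug version of Node.js takes a long time, so to keep things simple we'll write the simplest possible child process in C, which is enough for this experiment.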

// child.c
#include <stdio.h>

int main() {
    setvbuf(stdout, NULL, _IONBF, 0);
    for (int i = 0; i < 1000000; i++) printf("0000000000");
    return 0;
}

Then compile it:

$ gcc child.c -g

That produces a.out. Now change the spawn() call in the JavaScript parent process:

const child = cp.spawn('/tmp/lab/a.out');

Once it is running, it will of course get stuck in the same way. This time, grab the PID and attach GDB.

$ ps aux | grep a.out
xadillax 3848598  0.0  0.0   2488   588 pts/202  S+   16:46   0:00 /tmp/lab/a.out
$ gdb
(gdb) attach 3848598
...
(gdb) bt
#0  0x00007f066486c057 in __GI___libc_write (fd=1, buf=0x5654950b12a0, nbytes=4096) at ../sysdeps/unix/sysv/linux/write.c:26
#1  0x00007f06647ed00d in _IO_new_file_write (f=0x7f06649476a0 <_IO_2_1_stdout_>, data=0x5654950b12a0, n=4096) at fileops.c:1176
#2  0x00007f06647eead1 in new_do_write (to_do=4096, data=0x5654950b12a0 '0' <repeats 200 times>..., fp=0x7f06649476a0 <_IO_2_1_stdout_>) at libioP.h:948
#3  _IO_new_do_write (to_do=4096, data=0x5654950b12a0 '0' <repeats 200 times>..., fp=0x7f06649476a0 <_IO_2_1_stdout_>) at fileops.c:426
#4  _IO_new_do_write (fp=0x7f06649476a0 <_IO_2_1_stdout_>, data=0x5654950b12a0 '0' <repeats 200 times>..., to_do=4096) at fileops.c:423
#5  0x00007f06647ed835 in _IO_new_file_xsputn (n=10, data=<optimized out>, f=<optimized out>) at libioP.h:948
#6  _IO_new_file_xsputn (f=0x7f06649476a0 <_IO_2_1_stdout_>, data=<optimized out>, n=10) at fileops.c:1197
#7  0x00007f06647d4af2 in __vfprintf_internal (s=0x7f06649476a0 <_IO_2_1_stdout_>, format=0x565493233004 "0000000000", ap=ap@entry=0x7ffddfc84640, mode_flags=mode_flags@entry=0) at ../libio/libioP.h:948
#8  0x00007f06647bfebf in __printf (format=<optimized out>) at printf.c:33
#9  0x000056549323216f in main () at child.c:4
(gdb) frame 9
#9  0x000056549323216f in main () at child.c:4
4           for (int i = 0; i < 1000000; i++) printf("0000000000");

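We can see that it is stuck at the printf call in the for loop of child.c (reported as line 4 in this trace). The frames above it are printf internals all the way down to the blocked write().

So now we know: if we don't handle the events mentioned at the start of this article, the child process can block on operations that write to fds such as stdout and stderr, printf being one example.

Unix Domain Socket Buffer

Let's look back at what our experiment used to connect stdout between the parent and the child. According to the doc excerpt near the beginning: oh, it's a pipe!

Normally, a pipe buffer on Linux is 65536 bytes. However, when the stdio value of a Node.js child process is pipe, what actually gets created is a Unix domain socket.

In other words, the child's stdout is a Unix domain socket established with the parent process. The processes at both ends treat this "pipe" as a file: the child writes into it and the parent reads from it.

Let's turn our attention to a construction site.

A pipe has a finite size. If we plug the outlet and keep pouring water in, eventually no more water will go in and the flow is blocked. The same applies to our code above.

In other words, the first version of our code didn't get stuck because it output too little to fill the pipe buffer, so nothing blocked and the program exited safely. The later version got stuck because it output so much that the buffer filled up almost immediately, and since the parent never consumed the data, the child blocked.

The Parent Stops Reading

Why does adding on('data') mean the data gets consumed, while leaving it out means it doesn't? You would think Node.js reads the data over, emits it, considers the job done, and moves on to the next batch. It doesn't work that way.

Look at the logic Node.js uses to decide whether a Readable Stream should read more (https://github.com/nodejs/node/blob/v12.18.3/lib/_stream_readable.js#L586-L621).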

function maybeReadMore_(stream, state) {
  while (!state.reading && !state.ended &&
         (state.length < state.highWaterMark ||
          (state.flowing && state.length === 0))) {
    const len = state.length;
    debug('maybeReadMore read 0');
    stream.read(0);
    if (len === state.length)
      // Didn't get any data, stop spinning.
      break;
  }
  state.readingMore = false;
}

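Let's set aside the other ordinary preconditions, namely that no read is already in progress and that the stream has not ended.

When the Readable Stream's internal buffer length is below the high-water mark (typically 16384), or the stream is in flowing state with an empty buffer, the stream keeps reading from the source.

A Readable Stream's flowing state starts out as null. In that state, once the buffer reaches the high-water mark, no more data is read.

So when does the state change? Here: https://github.com/nodejs/node/blob/v12.18.3/lib/_stream_readable.js#L868-L897.

When you call stream.on(), it checks which event you are listening for. If the event is 'data' and the current flowing state is not false: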

  if (ev === 'data') {
    // Update readableListening so that resume() may be a no-op
    // a few lines down. This is needed to support once('readable').
    state.readableListening = this.listenerCount('readable') > 0;

    // Try start flowing on next tick if stream isn't explicitly paused
    if (state.flowing !== false)
      this.resume();
  }

the Readable Stream calls resume(). Inside resume() (https://github.com/nodejs/node/blob/v12.18.3/lib/_stream_readable.js#L955-L969):

// pause() and resume() are remnants of the legacy readable stream API
// If the user uses them, then switch into old mode.
Readable.prototype.resume = function() {
  const state = this._readableState;
  if (!state.flowing) {
    debug('resume');
    // We flow only if there is no one listening
    // for readable, but we still have to call
    // resume()
    state.flowing = !state.readableListening;
    resume(this, state);
  }
  state[kPaused] = false;
  return this;
};

flowing is set to true (as long as nothing is listening for 'readable'). As the Node.js docs put it:

All Readable streams begin in paused mode but can be switched to flowing mode in one of the following ways:

Adding a 'data' event handler.

Calling the stream.resume() method.

Calling the stream.pipe() method to send the data to a Writable.

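That is, every Readable Stream starts out paused, and it only switches to flowing state once you add a data handler (or use one of the other two ways listed). While it is paused, the stdio pipe stream first buffers an amount of data equal to or slightly above the high-water mark.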
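
The mode switch can be observed directly through the readableFlowing property. A minimal sketch, using a do-nothing Readable rather than a real child process stream:

```javascript
'use strict';

const { Readable } = require('stream');

// A source that never produces anything is enough to watch the mode flag change.
const stream = new Readable({ read() {} });

console.log(stream.readableFlowing); // null: no consumer has been attached yet

stream.on('data', () => {});
console.log(stream.readableFlowing); // true: adding a 'data' listener called resume()

stream.pause();
console.log(stream.readableFlowing); // false: explicitly paused
```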

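While the stream is paused, it only starts reading data and handing it to you once you add a data handler. If instead the stream is already in flowing state and you merely remove the consumer, that data is lost, because the stream was never actually paused.

The docs say that if we never listen for data while the stream is in its initial paused state, the stream won't generate data. In the actual implementation, though, data is generated, and that portion is buffered.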
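
That buffering can be watched on a plain Readable. The sketch below is an illustration under a few assumptions: the highWaterMark is set explicitly to 16384 to match the value discussed here, the 4096-byte chunks are arbitrary, and the initial read(0) is there to imitate a socket that starts pulling data on its own without any consumer.

```javascript
'use strict';

const { Readable } = require('stream');

let reads = 0;
const source = new Readable({
  highWaterMark: 16384, // set explicitly to match the value discussed in this article
  read() {
    reads += 1;
    this.push(Buffer.alloc(4096)); // pretend the underlying source always has more data
  },
});

// Start pulling without attaching any consumer.
source.read(0);

setImmediate(() => {
  // The internal buffer has filled up to the high-water mark and read() is no longer called.
  console.log({
    reads,
    buffered: source.readableLength,
    flowing: source.readableFlowing,
  });
});
```

The source here could produce data forever, yet the number of read() calls stays small: once the buffer reaches the high-water mark and nobody consumes it, the stream simply stops asking for more.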

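Reading the Docs

OK, back to the beginning. Earlier I said that the reason is spelled out in the docs and that I would show it at the end of this article. Now is the time. See here: https://nodejs.org/api/child_process.html#child_process_child_process.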

By default, pipes for stdin, stdout, and stderr are established between the parent Node.js process and the spawned child. These pipes have limited (and platform-specific) capacity. If the child process writes to stdout in excess of that limit without the output being captured, the child process will block waiting for the pipe buffer to accept more data. This is identical to the behavior of pipes in the shell. Use the { stdio: 'ignore' } option if the output will not be consumed.

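In short: the pipes that spawn and friends set up for stdin, stdout, and stderr have a limited, platform-specific capacity. A child that writes more than that to stdout while nobody captures the output will block until the pipe has room again, just like a pipe in the shell. If you don't care about the output, set { stdio: 'ignore' }.

See, that's the whole story. If we set it to ignore, the child's three std* fds are pointed at /dev/null.

Summary

So that is what the clickbait title actually means.

Once you create a child process whose stdout (or a similar fd) is a pipe, you are responsible for its data. Even if you only listen for the event with an empty function, Node.js considers the data consumed. Otherwise Node.js keeps the child's data sitting in its Stream buffer, and once that passes a certain level (above 16384), it stops reading from the child. That in turn makes the child block on write.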
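
Putting the options side by side, here is a sketch of three ways to take responsibility for the output: an empty 'data' listener, resume(), and stdio: 'ignore'. The runNoisyChild helper and its strategy names are hypothetical, and the inline -e script stands in for the long-string child.js.

```javascript
'use strict';

const { spawn } = require('child_process');

// Child script that prints roughly 10 MB, far more than the pipe can buffer.
const script = 'console.log("0".repeat(10 * 1024 * 1024))';

// Hypothetical helper: spawn the noisy child using one of three strategies
// and resolve with its exit code.
function runNoisyChild(strategy) {
  return new Promise((resolve, reject) => {
    const stdio = strategy === 'ignore' ? 'ignore' : 'pipe';
    const child = spawn(process.execPath, [ '-e', script ], { stdio });

    if (strategy === 'listen') {
      // An empty listener is enough to switch the streams to flowing mode.
      child.stdout.on('data', () => {});
      child.stderr.on('data', () => {});
    } else if (strategy === 'resume') {
      // resume() also switches to flowing mode; the data is simply discarded.
      child.stdout.resume();
      child.stderr.resume();
    }

    child.on('error', reject);
    child.on('exit', code => resolve(code));
  });
}

Promise.all([ 'listen', 'resume', 'ignore' ].map(strategy => runNoisyChild(strategy)))
  .then(codes => console.log(codes));
```

With any of the three strategies the child gets to finish writing and exits, so every entry in the printed array should be an ordinary exit code rather than a hang.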
