開發與維運

MYSQL SUBQUERY執行過程

實為吾之愚見,望諸君酌之!聞過則喜,與君共勉

環境

version | 5.6.24-debug |
| version_comment | Source distribution |
| version_compile_machine | x86_64 |
| version_compile_os | Linux |

SQL

該SQL是一個subquery SQL

SELECT h_1.*, o.S FROM h h_1, p o WHERE o.id = h_1.T AND h_1.id IN ( SELECT substring_index(GROUP_CONCAT(h_11.id ORDER BY h_11.C DESC), ',', 1) FROM h h_11, p o1 WHERE h_11.HI = 90 AND h_11.F = 81 AND o1.id = h_11.T GROUP BY T )

問題

subquery內的單獨的SQL耗時0.01S,合併起來後,整個SQL耗時4min20S,耗時非常長

執行時間與執行計劃對比

整個SQL的執行時間與執行計劃:

SELECT h_1.*, o.S FROM h h_1, p o WHERE o.id = h_1.T AND h_1.id IN ( SELECT substring_index(GROUP_CONCAT(h_11.id ORDER BY h_11.C DESC), ',', 1) FROM h h_11, p o1 WHERE h_11.HI = 90 AND h_11.F = 81 AND o1.id = h_11.T GROUP BY T )

7 rows in set (4 min 20.57 sec)

id select_T table T possible_keys key key_len ref rows Extra
1 PRIMARY o ALL PRIMARY NULL NULL NULL 150 NULL
1 PRIMARY h_1 ref idx_T idx_T 5 alitest.o.id 278 Using where
2 DEPENDENT SUBQUERY h_11 index_merge index_HI,idx_T,idx_F idx_F,index_HI 5,5 NULL 6 Using intersect(idx_F,index_HI); Using where; Using filesort
2 DEPENDENT SUBQUERY o1 eq_ref PRIMARY PRIMARY 4 alitest.h_11.T 1 Using index

SQL拆分執行時間如下:

subquery SQL:

SELECT substring_index(GROUP_CONCAT(h_11.id ORDER BY h_11.C DESC), ',', 1) FROM h h_11, p o1 WHERE h_11.HI = 90 AND h_11.F = 81 AND o1.id = h_11.T GROUP BY T

7 rows in set (0.01 sec)

id select_T table T possible_keys key key_len ref rows Extra
1 SIMPLE h_11 index_merge index_HI,idx_T,idx_F idx_F,index_HI 5,5 NULL 6 Using intersect(idx_F,index_HI); Using where; Using filesort
1 SIMPLE o1 eq_ref PRIMARY PRIMARY 4 alitest.h_11.T 1 Using index

外層SQL:

SELECT h_1.*, o.S FROM h h_1, p o WHERE o.id = h_1.T

60000 rows in set (1.38 sec)

id select_T table T possible_keys key key_len ref rows Extra
1 SIMPLE o ALL PRIMARY NULL NULL NULL 150 NULL
1 SIMPLE h_1 ref idx_T idx_T 5 alitest.o.id 278 NULL

問題分析

分析方法

藉助GDB調試MYSQL,確認問題

耗時環節代碼

該SQL整體執行時,代碼的主要執行部分分為2部分,這兩部分構成了MYSQL的nested loop算法,分別如下:

代碼1

sub_select (join=0x7fbe78005808, join_tab=0x7fbe78006738, end_of_records=false) at /opt/mysql-5.6.24/sql/sql_executor.cc:1203

主要代碼塊:該代碼塊以while進行循環,獲取多表關聯時第一個表的數據(取決於執行計劃的執行順序)循環讀取並進行比較判斷,while循環結束的前提是error<0,也就是數據取完

while (rc == NESTED_LOOP_OK && join->return_tab >= join_tab)

  {

    int error;

    if (in_first_read)

    {

      in_first_read= false;
       //表的read first record記錄
      error= (*join_tab->read_first_record)(join_tab);

    }

    else
       ////取出表的下一行記錄直到最後一條記錄
      error= info->read_record(info);

    DBUG_EXECUTE_IF("bug13822652_1", join->thd->killed= THD::KILL_QUERY;);

    if (error > 0 || (join->thd->is_error()))   // Fatal error

      rc= NESTED_LOOP_ERROR;

    else if (error < 0)
      //以error狀態判斷數據是否取完,取完後循環在此終止
      break;

    else if (join->thd->killed)     // Aborted by user

    {

      join->thd->send_kill_message();

      rc= NESTED_LOOP_KILLED;

    }

    else

    {

      if (join_tab->keep_current_rowid)

        join_tab->table->file->position(join_tab->table->record[0]);
       //對獲取到的行記錄,進行比較,該函數內部可能會繼續調用sub select,產生nest loop
      rc= evaluate_join_record(join, join_tab);

    }

  }

代碼2

evaluate_join_record (join=0x7fbe64005478, join_tab=0x7fbe640063a8) at /opt/mysql-5.6.24/sql/sql_executor.cc:1449

主要代碼塊:

@@1部分主要對拿到的數據進行判斷,確認是否符合where後的條件,以該SQL為例,如果從表h h_1裡拿到了一行數據,因為該表where後有判斷條件,條件為:

h_1.id IN ( SELECT substring_index(GROUP_CONCAT(h_11.id ORDER BY h_11.C DESC), ',', 1) FROM h h_11, p o1 WHERE h_11.HI = 90 AND h_11.F = 81 AND o1.id = h_11.T GROUP BY T )

則該代碼塊(@@1)會對這個subquery進行調用(相當於重新執行一次這個subquery,gdb跟蹤時可以跟蹤到最終調用JOIN::exec->do_select->sub_select->evaluate_join_record),所以沒取一次數據,就要對其進行判斷,故這個subquery每一次都要重新執行,它並不是只執行一次拿到數據然後對比。

@@2 部分的*join_tab->next_select會重新調用sub_select,進入循環部分,獲取下一個關聯表的數據,並再次進入evaluate_join_record 進行一系列判斷,直至數據取完

@@1

 if (condition)

  {

    found= MY_TEST(condition->val_int());

    if (join->thd->killed)

    {

      join->thd->send_kill_message();

      DBUG_RETURN(NESTED_LOOP_KILLED);

    }

    /* check for errors evaluating the condition */

    if (join->thd->is_error())

      DBUG_RETURN(NESTED_LOOP_ERROR);

  }

@@2

enum enum_nested_loop_state rc;

      /* A match from join_tab is found for the current partial join. */

      rc= (*join_tab->next_select)(join, join_tab+1, 0);

      join->thd->get_stmt_da()->inc_current_row_for_warning();

      if (rc != NESTED_LOOP_OK)

        DBUG_RETURN(rc);

推測和結論

從代碼調試的結果看,subquery並不是執行一次就結束,mysql針對這個查詢,會先執行外層查詢(while循環,具體循環次數取決於記錄數),然後每一次都要調用evaluate_join_record 進行判斷(無論是p o表還是h h_1表),當取h h_1表時,每一次讀取都會對subquery進行一次編譯,循環往復,直至數據取完,所以在這個過程中,subquery的SQL會被執行很多次,造成耗時增加。

https://dev.mysql.com/doc/refman/5.7/en/explain-output.html#explain-extra-information

For DEPENDENT SUBQUERY, the subquery is re-evaluated only once for each set of different values of the variables from its outer context.

解決辦法

改寫為join查詢:

SELECT h_1.*, o.S FROM h h_1, p o, ( SELECT SUBSTRING_INDEX(GROUP_CONCAT(h_11.id ORDER BY h_11.C DESC), ',', 1) AS ceshi FROM h h_11, p o1 WHERE h_11.HI = 90 AND h_11.F = 81 AND o1.id = h_11.T GROUP BY T ) alitest WHERE o.id = h_1.T AND h_1.id = alitest.ceshi

id select_T table T possible_keys key key_len ref rows filtered Extra
1 PRIMARY ALL NULL NULL NULL NULL 5 100.00 Using where
1 PRIMARY h_1 eq_ref PRIMARY,idx_T PRIMARY 4 alitest.ceshi 1 100.00 Using where
1 PRIMARY o eq_ref PRIMARY PRIMARY 4 alitest.h_1.T 1 100.00 NULL
2 DERIVED h_11 index_merge index_HI,idx_T,idx_F idx_F,index_HI 5,5 NULL 6 83.33 Using intersect(idx_F,index_HI); Using where; Using filesort
2 DERIVED o1 eq_ref PRIMARY PRIMARY 4 alitest.h_11.T 1 100.00 Using index

Leave a Reply

Your email address will not be published. Required fields are marked *