實為吾之愚見,望諸君酌之!聞過則喜,與君共勉
環境
version | 5.6.24-debug |
| version_comment | Source distribution |
| version_compile_machine | x86_64 |
| version_compile_os | Linux |
SQL
該SQL是一個subquery SQL
SELECT h_1.*, o.S
FROM h h_1, p
o WHERE o.id = h_1.T AND h_1.id IN ( SELECT substring_index(GROUP_CONCAT(h_11.id ORDER BY h_11.C DESC), ',', 1) FROM h h_11, p
o1 WHERE h_11.HI = 90 AND h_11.F = 81 AND o1.id = h_11.T GROUP BY T )
問題
subquery內的單獨的SQL耗時0.01S,合併起來後,整個SQL耗時4min20S,耗時非常長
執行時間與執行計劃對比
整個SQL的執行時間與執行計劃:
SELECT h_1.*, o.S
FROM h h_1, p
o WHERE o.id = h_1.T AND h_1.id IN ( SELECT substring_index(GROUP_CONCAT(h_11.id ORDER BY h_11.C DESC), ',', 1) FROM h h_11, p
o1 WHERE h_11.HI = 90 AND h_11.F = 81 AND o1.id = h_11.T GROUP BY T )
7 rows in set (4 min 20.57 sec)
id | select_T | table | T | possible_keys | key | key_len | ref | rows | Extra |
---|---|---|---|---|---|---|---|---|---|
1 | PRIMARY | o | ALL | PRIMARY | NULL | NULL | NULL | 150 | NULL |
1 | PRIMARY | h_1 | ref | idx_T | idx_T | 5 | alitest.o.id | 278 | Using where |
2 | DEPENDENT SUBQUERY | h_11 | index_merge | index_HI,idx_T,idx_F | idx_F,index_HI | 5,5 | NULL | 6 | Using intersect(idx_F,index_HI); Using where; Using filesort |
2 | DEPENDENT SUBQUERY | o1 | eq_ref | PRIMARY | PRIMARY | 4 | alitest.h_11.T | 1 | Using index |
SQL拆分執行時間如下:
subquery SQL:
SELECT substring_index(GROUP_CONCAT(h_11.id ORDER BY h_11.C DESC), ',', 1) FROM h h_11, p
o1 WHERE h_11.HI = 90 AND h_11.F = 81 AND o1.id = h_11.T GROUP BY T
7 rows in set (0.01 sec)
id | select_T | table | T | possible_keys | key | key_len | ref | rows | Extra |
---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | h_11 | index_merge | index_HI,idx_T,idx_F | idx_F,index_HI | 5,5 | NULL | 6 | Using intersect(idx_F,index_HI); Using where; Using filesort |
1 | SIMPLE | o1 | eq_ref | PRIMARY | PRIMARY | 4 | alitest.h_11.T | 1 | Using index |
外層SQL:
SELECT h_1.*, o.S
FROM h h_1, p
o WHERE o.id = h_1.T
60000 rows in set (1.38 sec)
id | select_T | table | T | possible_keys | key | key_len | ref | rows | Extra |
---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | o | ALL | PRIMARY | NULL | NULL | NULL | 150 | NULL |
1 | SIMPLE | h_1 | ref | idx_T | idx_T | 5 | alitest.o.id | 278 | NULL |
問題分析
分析方法
藉助GDB調試MYSQL,確認問題
耗時環節代碼
該SQL整體執行時,代碼的主要執行部分分為2部分,這兩部分構成了MYSQL的nested loop算法,分別如下:
代碼1
sub_select (join=0x7fbe78005808, join_tab=0x7fbe78006738, end_of_records=false) at /opt/mysql-5.6.24/sql/sql_executor.cc:1203
主要代碼塊:該代碼塊以while進行循環,獲取多表關聯時第一個表的數據(取決於執行計劃的執行順序)循環讀取並進行比較判斷,while循環結束的前提是error<0,也就是數據取完
while (rc == NESTED_LOOP_OK && join->return_tab >= join_tab)
{
int error;
if (in_first_read)
{
in_first_read= false;
//表的read first record記錄
error= (*join_tab->read_first_record)(join_tab);
}
else
////取出表的下一行記錄直到最後一條記錄
error= info->read_record(info);
DBUG_EXECUTE_IF("bug13822652_1", join->thd->killed= THD::KILL_QUERY;);
if (error > 0 || (join->thd->is_error())) // Fatal error
rc= NESTED_LOOP_ERROR;
else if (error < 0)
//以error狀態判斷數據是否取完,取完後循環在此終止
break;
else if (join->thd->killed) // Aborted by user
{
join->thd->send_kill_message();
rc= NESTED_LOOP_KILLED;
}
else
{
if (join_tab->keep_current_rowid)
join_tab->table->file->position(join_tab->table->record[0]);
//對獲取到的行記錄,進行比較,該函數內部可能會繼續調用sub select,產生nest loop
rc= evaluate_join_record(join, join_tab);
}
}
代碼2
evaluate_join_record (join=0x7fbe64005478, join_tab=0x7fbe640063a8) at /opt/mysql-5.6.24/sql/sql_executor.cc:1449
主要代碼塊:
@@1部分主要對拿到的數據進行判斷,確認是否符合where後的條件,以該SQL為例,如果從表h h_1裡拿到了一行數據,因為該表where後有判斷條件,條件為:
h_1.id IN ( SELECT substring_index(GROUP_CONCAT(h_11.id ORDER BY h_11.C DESC), ',', 1) FROM h h_11, p
o1 WHERE h_11.HI = 90 AND h_11.F = 81 AND o1.id = h_11.T GROUP BY T )
則該代碼塊(@@1)會對這個subquery進行調用(相當於重新執行一次這個subquery,gdb跟蹤時可以跟蹤到最終調用JOIN::exec->do_select->sub_select->evaluate_join_record),所以沒取一次數據,就要對其進行判斷,故這個subquery每一次都要重新執行,它並不是只執行一次拿到數據然後對比。
@@2 部分的*join_tab->next_select會重新調用sub_select,進入循環部分,獲取下一個關聯表的數據,並再次進入evaluate_join_record 進行一系列判斷,直至數據取完
@@1
if (condition)
{
found= MY_TEST(condition->val_int());
if (join->thd->killed)
{
join->thd->send_kill_message();
DBUG_RETURN(NESTED_LOOP_KILLED);
}
/* check for errors evaluating the condition */
if (join->thd->is_error())
DBUG_RETURN(NESTED_LOOP_ERROR);
}
@@2
enum enum_nested_loop_state rc;
/* A match from join_tab is found for the current partial join. */
rc= (*join_tab->next_select)(join, join_tab+1, 0);
join->thd->get_stmt_da()->inc_current_row_for_warning();
if (rc != NESTED_LOOP_OK)
DBUG_RETURN(rc);
推測和結論
從代碼調試的結果看,subquery並不是執行一次就結束,mysql針對這個查詢,會先執行外層查詢(while循環,具體循環次數取決於記錄數),然後每一次都要調用evaluate_join_record 進行判斷(無論是p o表還是h h_1表),當取h h_1表時,每一次讀取都會對subquery進行一次編譯,循環往復,直至數據取完,所以在這個過程中,subquery的SQL會被執行很多次,造成耗時增加。
https://dev.mysql.com/doc/refman/5.7/en/explain-output.html#explain-extra-information
For DEPENDENT SUBQUERY, the subquery is re-evaluated only once for each set of different values of the variables from its outer context.
解決辦法
改寫為join查詢:
SELECT h_1.*, o.S
FROM h h_1, p
o, ( SELECT SUBSTRING_INDEX(GROUP_CONCAT(h_11.id ORDER BY h_11.C DESC), ',', 1) AS ceshi FROM h h_11, p
o1 WHERE h_11.HI = 90 AND h_11.F = 81 AND o1.id = h_11.T GROUP BY T ) alitest WHERE o.id = h_1.T AND h_1.id = alitest.ceshi
id | select_T | table | T | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|
1 | PRIMARY | ALL | NULL | NULL | NULL | NULL | 5 | 100.00 | Using where | |
1 | PRIMARY | h_1 | eq_ref | PRIMARY,idx_T | PRIMARY | 4 | alitest.ceshi | 1 | 100.00 | Using where |
1 | PRIMARY | o | eq_ref | PRIMARY | PRIMARY | 4 | alitest.h_1.T | 1 | 100.00 | NULL |
2 | DERIVED | h_11 | index_merge | index_HI,idx_T,idx_F | idx_F,index_HI | 5,5 | NULL | 6 | 83.33 | Using intersect(idx_F,index_HI); Using where; Using filesort |
2 | DERIVED | o1 | eq_ref | PRIMARY | PRIMARY | 4 | alitest.h_11.T | 1 | 100.00 | Using index |