[MySQL] Query stalls caused by the time zone setting

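Author: 田傑
Long-running queries that the application perceives as "stalls" are common in day-to-day database support and usage. SQL execution stalling because of the time zone setting, however, is an interesting case that we had not looked at closely before.
This time, the customer's attention to detail and persistence led us to the root cause.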

1. Glossary

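No.	Term	Description
1	CPU usage	Share of CPU time that is not idle.
2	User CPU usage	Share of CPU time consumed by application code in user space.
3	Sys CPU usage	Share of CPU time consumed by kernel code in system (kernel) space.
4	Futex	A fast user-space lock/semaphore primitive provided by the Linux kernel. Without contention it runs entirely in user space; under contention it makes a system call.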
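
To make the first three terms concrete, here is a minimal C sketch that derives these percentages from two samples of the aggregate cpu line in Linux's /proc/stat. The struct and function names (cpu_sample, cpu_shares, parse_cpu_line, cpu_shares_between) are hypothetical. Counting irq and softirq time as Sys, and treating idle plus iowait as idle time, are assumptions; monitoring tools may group the counters differently.

```c
#include <assert.h>
#include <stdio.h>

/* Counters from the aggregate "cpu" line of /proc/stat, in clock ticks. */
struct cpu_sample {
    unsigned long long user, nice, system, idle, iowait, irq, softirq, steal;
};

/* Percentages of elapsed CPU time between two samples. */
struct cpu_shares {
    double user_pct, sys_pct, iowait_pct, busy_pct;
};

/* Parse the aggregate line, e.g. "cpu  100 0 50 800 50 0 0 0".
   Not meant for the per-CPU lines ("cpu0", "cpu1", ...).
   Returns 0 on success, -1 if the line does not match. */
int parse_cpu_line(const char *line, struct cpu_sample *s)
{
    assert(line != NULL && s != NULL);
    int n = sscanf(line, "cpu %llu %llu %llu %llu %llu %llu %llu %llu",
                   &s->user, &s->nice, &s->system, &s->idle,
                   &s->iowait, &s->irq, &s->softirq, &s->steal);
    return n == 8 ? 0 : -1;
}

/* Compute shares from the deltas between an earlier and a later sample.
   Returns -1 if no time elapsed between the samples. */
int cpu_shares_between(const struct cpu_sample *earlier,
                       const struct cpu_sample *later,
                       struct cpu_shares *out)
{
    assert(earlier != NULL && later != NULL && out != NULL);
    unsigned long long user = (later->user - earlier->user)
                            + (later->nice - earlier->nice);
    /* Assumption: interrupt handling time is counted as Sys. */
    unsigned long long sys = (later->system - earlier->system)
                           + (later->irq - earlier->irq)
                           + (later->softirq - earlier->softirq);
    unsigned long long idle = later->idle - earlier->idle;
    unsigned long long iowait = later->iowait - earlier->iowait;
    unsigned long long steal = later->steal - earlier->steal;
    unsigned long long total = user + sys + idle + iowait + steal;
    if (total == 0)
        return -1;

    out->user_pct = 100.0 * user / total;
    out->sys_pct = 100.0 * sys / total;
    out->iowait_pct = 100.0 * iowait / total;
    /* CPU usage: everything that is neither idle nor waiting for I/O. */
    out->busy_pct = 100.0 * (total - idle - iowait) / total;
    return 0;
}
```

With the made-up samples "cpu  1000 0 500 8000 500 0 0 0" and "cpu  1100 0 700 8600 600 0 0 0", the deltas are 100 user, 200 system, 600 idle, and 100 iowait ticks, which gives 10% User, 20% Sys, 10% iowait, and 30% overall CPU usage.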

2. Symptoms
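Between 2020-03-19 22:03 and 22:04, the customer's MySQL 8.0 instance accumulated a large number of active connections, and many low-cost queries appeared in the slow query log. Overall CPU usage was not high, but Sys CPU usage fluctuated abnormally.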

3. Troubleshooting

3.1 OS level

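Let us consider which factors could cause the stalls:
• Fluctuation at the OS level of the physical host (ruled out via the IO_WAIT metric).
• MySQL's own mechanisms.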

3.2 MySQL level

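With OS-level anomalies ruled out, we focused on analyzing the call stacks of the mysqld process.
To make MySQL's behavior easier to analyze, Alibaba's database team provides the 扁鵲 (Bianque) system, which traces, aggregates, and displays in-process function calls over a given time window.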

Analyzing the resulting profile, we can see that 40.5% of CPU time is spent in calls to Time_zone_system::gmt_sec_to_TIME(), whose code is shown below.

void Time_zone_system::gmt_sec_to_TIME(MYSQL_TIME *tmp, my_time_t t) const {
  struct tm tmp_tm;
  time_t tmp_t = (time_t)t;
  localtime_r(&tmp_t, &tmp_tm);
  localtime_to_TIME(tmp, &tmp_tm);
  tmp->time_type = MYSQL_TIMESTAMP_DATETIME;
  adjust_leap_second(tmp);
}

Reading this code carefully, localtime_to_TIME() and adjust_leap_second() are simple format conversions and calculations that involve no system calls.
localtime_r(), however, goes into glibc's __localtime_r(), shown below:

/* Return the `struct tm' representation of *T in local time,
   using *TP to store the result.  */
struct tm *
__localtime_r (t, tp)
     const time_t *t;
     struct tm *tp;
{
  return __tz_convert (t, 1, tp);
}
weak_alias (__localtime_r, localtime_r)

Drilling down further, here is the implementation of __tz_convert():

/* Return the `struct tm' representation of *TIMER in the local timezone.
   Use local time if USE_LOCALTIME is nonzero, UTC otherwise.  */
struct tm *
__tz_convert (const time_t *timer, int use_localtime, struct tm *tp)
{
  long int leap_correction;
  int leap_extra_secs;
  if (timer == NULL)
    {
      __set_errno (EINVAL);
      return NULL;
    }
  __libc_lock_lock (tzset_lock);
  /* Update internal database according to current TZ setting.
     POSIX.1 8.3.7.2 says that localtime_r is not required to set tzname.
     This is a good idea since this allows at least a bit more parallelism.  */
  tzset_internal (tp == &_tmbuf && use_localtime, 1);
  if (__use_tzfile)
    __tzfile_compute (*timer, use_localtime, &leap_correction,
                      &leap_extra_secs, tp);
  else
    {
      if (! __offtime (timer, 0, tp))
        tp = NULL;
      else
        __tz_compute (*timer, tp, use_localtime);
      leap_correction = 0L;
      leap_extra_secs = 0;
    }
  if (tp)
    {
      if (! use_localtime)
        {
          tp->tm_isdst = 0;
          tp->tm_zone = "GMT";
          tp->tm_gmtoff = 0L;
        }
      if (__offtime (timer, tp->tm_gmtoff - leap_correction, tp))
        tp->tm_sec += leap_extra_secs;
      else
        tp = NULL;
    }
  __libc_lock_unlock (tzset_lock);
  return tp;
}

Note that this code acquires and releases a lock. Let us look at the definition of __libc_lock_lock():

#if IS_IN (libc) || IS_IN (libpthread)
# ifndef __libc_lock_lock
#  define __libc_lock_lock(NAME) \
  ({ lll_lock (NAME, LLL_PRIVATE); 0; })
# endif
#else
# undef __libc_lock_lock
# define __libc_lock_lock(NAME) \
  __libc_maybe_call (__pthread_mutex_lock, (&(NAME)), 0)
#endif

Tracing further into the implementation of lll_lock():

static inline void
__attribute__ ((always_inline))
__lll_lock (int *futex, int private)
{
  int val = atomic_compare_and_exchange_val_24_acq (futex, 1, 0);
  if (__glibc_unlikely (val != 0))
    {
      if (__builtin_constant_p (private) && private == LLL_PRIVATE)
        __lll_lock_wait_private (futex);
      else
        __lll_lock_wait (futex, private);
    }
}
#define lll_lock(futex, private) __lll_lock (&(futex), private)

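The code uses atomic_compare_and_exchange_val_24_acq() to try to lock the futex.
Because the futex is a piece of memory shared among threads, contention among multiple client threads (multiple sessions/queries) triggers system calls and a switch into kernel mode, which raises Sys CPU usage.
In addition, the lock protecting this critical section limits the concurrency of the time zone conversion function __tz_convert(), so multiple sessions/queries end up waiting to acquire the lock before they can enter the critical section. When contention is heavy, this causes stalls.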
So what triggers the calls to Time_zone_system::gmt_sec_to_TIME()? Tracing back leads to Field_timestampf::get_date_internal(), shown below:

bool Field_timestampf::get_date_internal(MYSQL_TIME *ltime) {
  THD *thd = table ? table->in_use : current_thd;
  struct timeval tm;
  my_timestamp_from_binary(&tm, ptr, dec);
  if (tm.tv_sec == 0) return true;
  thd->time_zone()->gmt_sec_to_TIME(ltime, tm);
  return false;
}

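This method calls gmt_sec_to_TIME(), a virtual function of the base class Time_zone, to convert seconds into a time structure in the session's time zone. Judging by the name Field_timestampf::get_date_internal(), the queries most likely access columns of the timestamp data type.
Based on this hypothesis, we checked the stalled queries and the data types involved: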

# Slow query
SELECT 
    id, 
    ......
    create_time, update_time, 
    ...... 
FROM mytab 
WHERE duid IN (?,?,?,?,? ) 
and (state in (2, 3) 
    or ptype !=0)
# Table used by the query
CREATE TABLE `mytab` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `duid` char(32) NOT NULL,
  ......
  `state` tinyint(2) unsigned NOT NULL DEFAULT '0',
  `ptype` tinyint(4) NOT NULL DEFAULT '0',
  `create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `update_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  ......,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB

As shown above, both create_time and update_time are timestamp columns, which confirms the hypothesis.

4. Resolution

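The analysis above shows that the OS-level futex lock contention introduced by calls to Time_zone_system::gmt_sec_to_TIME() is what made low-cost queries stall.
To avoid calling this method, change the time_zone parameter in the instance console from system to the local time zone offset, for example '+8:00' for China (UTC+8).
After the change, MySQL calls Time_zone_offset::gmt_sec_to_TIME() and does the calculation directly inside MySQL, which avoids the glibc function and the OS-level locking and unlocking it brings.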
Effect of the change (time to complete the same number of queries against timestamp columns):
• time_zone='system': about 15 minutes.
• time_zone='+8:00': about 5 minutes.
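
A rough way to observe the contention outside MySQL is to have several threads call localtime_r() in a tight loop and time the run. The harness below is only a sketch under assumptions: the struct and function names are hypothetical, it needs a POSIX system with pthreads (compile with -pthread), and its timings are not the source of the figures above, which came from MySQL queries.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

struct worker_arg {
    long iterations;  /* conversions this worker should attempt */
    long completed;   /* conversions that succeeded */
};

/* Each worker repeatedly converts a timestamp through localtime_r(),
   the same libc entry point that Time_zone_system ends up calling. */
static void *convert_worker(void *p)
{
    struct worker_arg *arg = p;
    struct tm out;
    time_t t = 1584626580;  /* arbitrary sample instant */
    long done = 0;
    for (long i = 0; i < arg->iterations; i++) {
        if (localtime_r(&t, &out) != NULL)
            done++;
        t++;
    }
    arg->completed = done;
    return NULL;
}

/* Run nthreads workers concurrently. Returns the total number of
   successful conversions, or -1 if setup failed. The wall-clock
   duration in seconds is stored in *elapsed_sec. */
long run_localtime_workers(int nthreads, long iterations, double *elapsed_sec)
{
    assert(nthreads > 0 && iterations >= 0 && elapsed_sec != NULL);
    pthread_t *tids = malloc(sizeof *tids * (size_t)nthreads);
    struct worker_arg *args = malloc(sizeof *args * (size_t)nthreads);
    if (tids == NULL || args == NULL) {
        free(tids);
        free(args);
        return -1;
    }

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);

    int started = 0;
    for (; started < nthreads; started++) {
        args[started].iterations = iterations;
        args[started].completed = 0;
        if (pthread_create(&tids[started], NULL, convert_worker,
                           &args[started]) != 0)
            break;
    }

    long total = 0;
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        total += args[i].completed;
    }

    clock_gettime(CLOCK_MONOTONIC, &end);
    *elapsed_sec = (double)(end.tv_sec - start.tv_sec)
                 + (double)(end.tv_nsec - start.tv_nsec) / 1e9;

    free(tids);
    free(args);
    return started == nthreads ? total : -1;
}
```

Calling run_localtime_workers() with the same total number of conversions but a growing thread count, and comparing elapsed_sec between runs, gives a feel for how much the shared lock inside localtime_r() limits scaling on a given machine.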

5. Best practices

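For high-concurrency applications that access timestamp data frequently:
• If you do need the timestamp type, set the time_zone parameter in the console to a UTC/GMT offset such as '+8:00' (UTC+8). This effectively reduces the execution overhead of highly concurrent queries and lowers response time (RT).
• In MySQL 5.7 and later, the DATETIME type supports the same default value as TIMESTAMP (DEFAULT CURRENT_TIMESTAMP) as well as the ON UPDATE CURRENT_TIMESTAMP attribute, so consider using DATETIME instead of TIMESTAMP.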