Scope

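The previous article mentioned that Envoy creates metrics through a Scope. Why introduce Scope at all? A Scope exists to manage a group of related stats. Take cluster stats: their names all start with the prefix cluster., so Envoy can create a Scope named cluster. and manage every cluster-related stat through it. Stats created through that Scope can omit the cluster. prefix from their own names, which saves a good deal of memory. A Scope can also create child Scopes, and a child Scope's name carries its parent's name as a prefix.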

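The figure above shows how the upstream_rq_total metric of two clusters is represented with Scopes. The full metric names are cluster.http1_cluster.upstream_rq_total and cluster.http2_cluster.upstream_rq_total. Envoy first creates a cluster. Scope, then uses it to create an http1_cluster. Scope and an http2_cluster. Scope, and finally uses those two Scopes to create the upstream_rq_total stats. Scopes therefore do two things: they manage a group of stats as a unit, and they let stats of the same kind share a prefix instead of each storing a redundant copy of it. In the example above, each upstream_rq_total stat only needs to store the string upstream_rq_total and takes the rest of its name from its Scope.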

  ScopePtr root_scope = store_->createScope("cluster.");
  auto http1_scope = root_scope->createScope("http1_cluster.");
  auto http2_scope = root_scope->createScope("http2_cluster.");
  auto upstream_rq_total_http1 = http1_scope->counter("upstream_rq_total");
  auto upstream_rq_total_http2 = http2_scope->counter("upstream_rq_total");
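
To make the naming behavior concrete, here is a minimal, self-contained sketch. ToyStore, ToyScope, and ToyCounter are hypothetical stand-ins invented for this illustration, not Envoy classes. The sketch only models how prefixes compose; it keeps full names in a plain map and does not model the memory savings Envoy gets from sharing the prefix.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical toy types for illustration; not Envoy's real classes.
struct ToyCounter {
  uint64_t value = 0;
  void inc() { ++value; }
};

class ToyStore {
public:
  // Returns the counter registered under the full name, creating it on first use.
  ToyCounter& counter(const std::string& full_name) { return counters_[full_name]; }
  std::size_t size() const { return counters_.size(); }

private:
  std::map<std::string, ToyCounter> counters_;
};

class ToyScope {
public:
  ToyScope(ToyStore& store, std::string prefix) : store_(store), prefix_(std::move(prefix)) {}

  // A child scope's prefix is the parent's prefix followed by the child's name.
  ToyScope createScope(const std::string& name) const { return ToyScope(store_, prefix_ + name); }

  // Callers pass only the short name; the scope supplies the prefix.
  ToyCounter& counter(const std::string& name) { return store_.counter(prefix_ + name); }

  const std::string& prefix() const { return prefix_; }

private:
  ToyStore& store_;
  std::string prefix_;
};
```

With this toy, creating a "cluster." scope, deriving "http1_cluster." and "http2_cluster." children from it, and asking each child for "upstream_rq_total" yields two distinct counters whose full names match the ones in the text.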

Store, ThreadLocalStore, and TlsScope

With Scopes in place, how are they created, and how are all the metrics created through them managed?

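Store inherits from the Scope interface and adds three methods, counters, gauges, and histograms, that collect the metrics from every Scope. StoreRoot inherits from Store and adds three methods related to TagProducer, StatsMatcher, and Sink. ThreadLocalStoreImpl implements all three interfaces. Start with createScope, which creates a Scope and returns it; every Scope is also recorded in the scopes_ member. The concrete type returned is ScopeImpl, which inherits from TlsScope.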

ScopePtr ThreadLocalStoreImpl::createScope(const std::string& name) {
  auto new_scope = std::make_unique<ScopeImpl>(*this, name);
  Thread::LockGuard lock(lock_);
  scopes_.emplace(new_scope.get());
  return new_scope;
}
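
The bookkeeping described above, where the store remembers every Scope it hands out so that it can later collect metrics from all of them, might look roughly like the sketch below. ToyStore, ToyScope, and ToyCounter are hypothetical names. Unregistering a scope when it is destroyed is an assumption of this sketch; the excerpt above does not show how Envoy handles scope teardown.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy types for illustration; not Envoy's real classes.
struct ToyCounter {
  std::string name;
  uint64_t value = 0;
};

class ToyStore;

class ToyScope {
public:
  ToyScope(ToyStore& parent, std::string prefix) : parent_(parent), prefix_(std::move(prefix)) {}
  ~ToyScope(); // Defined below, once ToyStore is complete.

  ToyCounter& counter(const std::string& name) {
    std::shared_ptr<ToyCounter>& slot = counters_[name];
    if (slot == nullptr) {
      slot = std::make_shared<ToyCounter>();
      slot->name = prefix_ + name;
    }
    return *slot;
  }

private:
  friend class ToyStore;
  ToyStore& parent_;
  std::string prefix_;
  std::map<std::string, std::shared_ptr<ToyCounter>> counters_;
};

class ToyStore {
public:
  // Creates a scope and records a raw pointer to it, like scopes_.emplace(...) above.
  std::unique_ptr<ToyScope> createScope(const std::string& name) {
    std::unique_ptr<ToyScope> scope(new ToyScope(*this, name));
    std::lock_guard<std::mutex> guard(lock_);
    scopes_.insert(scope.get());
    return scope;
  }

  // Walks every live scope and gathers its counters.
  std::vector<std::shared_ptr<ToyCounter>> counters() {
    std::lock_guard<std::mutex> guard(lock_);
    std::vector<std::shared_ptr<ToyCounter>> out;
    for (ToyScope* scope : scopes_) {
      for (const auto& entry : scope->counters_) {
        out.push_back(entry.second);
      }
    }
    return out;
  }

private:
  friend class ToyScope;
  void release(ToyScope* scope) {
    std::lock_guard<std::mutex> guard(lock_);
    scopes_.erase(scope);
  }

  std::mutex lock_;
  std::set<ToyScope*> scopes_;
};

// Assumption of this sketch: a scope removes itself from the store on destruction.
inline ToyScope::~ToyScope() { parent_.release(this); }
```

The store owns none of the scopes; callers own them through the returned unique_ptr, and the store keeps only non-owning pointers for aggregation.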

Next, look at TlsScope.

class TlsScope : public Scope {
public:
  ~TlsScope() override = default;
  virtual Histogram& tlsHistogram(StatName name, ParentHistogramImpl& parent) PURE;
};

It only adds one extra method, tlsHistogram. Here is its implementation.

  struct ScopeImpl : public TlsScope {
    ......
    ScopePtr createScope(const std::string& name) override {
      return parent_.createScope(symbolTable().toString(prefix_.statName()) + "." + name);
    }
        ....
    static std::atomic<uint64_t> next_scope_id_;

    const uint64_t scope_id_;
    ThreadLocalStoreImpl& parent_;
    StatNameStorage prefix_;
    mutable CentralCacheEntry central_cache_;
  };

  struct CentralCacheEntry {
    StatMap<CounterSharedPtr> counters_;
    StatMap<GaugeSharedPtr> gauges_;
    StatMap<ParentHistogramImplSharedPtr> histograms_;
    StatNameStorageSet rejected_stats_;
  };

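Every Scope has a CentralCacheEntry member that holds its cached metrics. ScopeImpl::createScope ultimately calls ThreadLocalStoreImpl::createScope, which is why ThreadLocalStoreImpl can keep track of every Scope that gets created. Next, see how ScopeImpl creates metrics.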

Counter& ScopeImpl::counter(const std::string& name) {
  StatNameManagedStorage storage(name, symbolTable());
  return counterFromStatName(storage.statName());
}
Counter& ScopeImpl::counterFromStatName(StatName name) {
  // Step 1: if the StatsMatcher rejects all stats, return the shared NullCounter right away.
  if (parent_.rejectsAll()) {
    return parent_.null_counter_;
  }

  // Step 2: join the scope prefix and the stat name into the full stat name.
  Stats::SymbolTable::StoragePtr final_name = symbolTable().join({prefix_.statName(), name});
  StatName final_stat_name(final_name.get());

  // Step 3: fetch this scope's entry from the thread-local cache.
  StatMap<CounterSharedPtr>* tls_cache = nullptr;
  StatNameHashSet* tls_rejected_stats = nullptr;
  if (!parent_.shutting_down_ && parent_.tls_) {
    TlsCacheEntry& entry = parent_.tls_->getTyped<TlsCache>().scope_cache_[this->scope_id_];
    tls_cache = &entry.counters_;
    tls_rejected_stats = &entry.rejected_stats_;
  }
  // Step 4: create the Counter (or find the existing one).
  return safeMakeStat<Counter>(
      final_stat_name, central_cache_.counters_, central_cache_.rejected_stats_,
      [](Allocator& allocator, StatName name, absl::string_view tag_extracted_name,
         const std::vector<Tag>& tags) -> CounterSharedPtr {
        return allocator.makeCounter(name, tag_extracted_name, tags);
      },
      tls_cache, tls_rejected_stats, parent_.null_counter_);
}
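
The rejection path in the code above (return a shared null counter, and remember rejected names in a rejected_stats_ set) can be illustrated with a small sketch. Everything here is hypothetical: ToyScope, ToyCounter, ToyNullCounter, and the std::function matcher are stand-ins chosen for the example and do not reflect the real StatsMatcher interface. The idea shown is that callers always get a usable counter reference, and a rejected name is evaluated by the matcher only once.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical toy types for illustration; not Envoy's real classes.
class ToyCounter {
public:
  virtual ~ToyCounter() = default;
  virtual void inc() { ++value_; }
  virtual uint64_t value() const { return value_; }

private:
  uint64_t value_ = 0;
};

// A counter that ignores every update, so callers never need a null check.
class ToyNullCounter : public ToyCounter {
public:
  void inc() override {}
  uint64_t value() const override { return 0; }
};

class ToyScope {
public:
  // The matcher returns true for names that should be rejected.
  using Matcher = std::function<bool(const std::string&)>;

  explicit ToyScope(Matcher rejects) : rejects_(std::move(rejects)) {}

  ToyCounter& counter(const std::string& name) {
    // Names rejected earlier are answered from the remembered set.
    if (rejected_.count(name) != 0) {
      return null_counter_;
    }
    auto it = counters_.find(name);
    if (it != counters_.end()) {
      return it->second;
    }
    // First time this name is seen: consult the matcher and remember a rejection.
    if (rejects_(name)) {
      rejected_.insert(name);
      return null_counter_;
    }
    return counters_[name];
  }

private:
  Matcher rejects_;
  ToyNullCounter null_counter_;
  std::map<std::string, ToyCounter> counters_;
  std::set<std::string> rejected_;
};
```

Returning a do-nothing counter instead of a null pointer keeps call sites simple: code that increments a rejected stat still compiles and runs, it just has no effect.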

Why does creating a Counter need the TlsCache, and how are TlsCacheEntry and CentralCacheEntry related?

struct TlsCache : public ThreadLocal::ThreadLocalObject {
  absl::flat_hash_map<uint64_t, TlsCacheEntry> scope_cache_;
};

struct TlsCacheEntry {
  StatMap<CounterSharedPtr> counters_;
  StatMap<GaugeSharedPtr> gauges_;
  StatMap<TlsHistogramSharedPtr> histograms_;
  StatMap<ParentHistogramSharedPtr> parent_histograms_;
  StatNameHashSet rejected_stats_;
};

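As the definitions show, TlsCache holds a map. The key is the Scope id, so that one thread-local object can hold entries for many Scopes and tell them apart. The value is a TlsCacheEntry, whose layout closely mirrors the CentralCacheEntry inside each Scope. The point of all this is to let Envoy record stats on its core path without taking a lock. If several threads shared one Scope and each of them reached the metrics through that Scope's CentralCacheEntry, the accesses would race, and every access to the CentralCacheEntry would need a lock. If instead every thread has its own independent cache entry for the Scope, and all of those entries refer to the same metrics, then a thread can safely look up a metric in its own entry, and updating the metric is itself thread-safe, so the whole path is lock-free.

For this reason a Scope is decoupled from the metrics it holds. The CentralCacheEntry starts out empty. When a stat is requested, the thread-local cache is checked first. On a miss, the CentralCacheEntry is checked. If the stat is not there either, it is created and stored in the CentralCacheEntry, and a reference to it is then also stored in the thread-local cache. Keeping the central copy lets the main thread walk all Scopes and read each CentralCacheEntry for the final aggregation. The comments in the code below walk through the details.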

template <class StatType>
StatType& ThreadLocalStoreImpl::ScopeImpl::safeMakeStat(
    StatName name, StatMap<RefcountPtr<StatType>>& central_cache_map,
    StatNameStorageSet& central_rejected_stats, MakeStatFn<StatType> make_stat,
    StatMap<RefcountPtr<StatType>>* tls_cache, StatNameHashSet* tls_rejected_stats,
    StatType& null_stat) {
  // Step 1: return the null stat if this name was already rejected on this thread.
  if (tls_rejected_stats != nullptr &&
      tls_rejected_stats->find(name) != tls_rejected_stats->end()) {
    return null_stat;
  }
  // Step 2: check the TLS cache and return the stat if it is already there.
  // If we have a valid cache entry, return it.
  if (tls_cache) {
    auto pos = tls_cache->find(name);
    if (pos != tls_cache->end()) {
      return *pos->second;
    }
  }

  // We must now look in the central store so we must be locked. We grab a reference to the
  // central store location. It might contain nothing. In this case, we allocate a new stat.
  // Step 3: search the central cache and create the stat if it is missing. This needs the lock
  // because the main thread and the other threads all access the central cache.
  Thread::LockGuard lock(parent_.lock_);
  auto iter = central_cache_map.find(name);
  RefcountPtr<StatType>* central_ref = nullptr;
  if (iter != central_cache_map.end()) {
    central_ref = &(iter->second);
  } else if (parent_.checkAndRememberRejection(name, central_rejected_stats, tls_rejected_stats)) {
    // Note that again we do the name-rejection lookup on the untruncated name.
    return null_stat;
  } else {
    TagExtraction extraction(parent_, name);
    RefcountPtr<StatType> stat =
        make_stat(parent_.alloc_, name, extraction.tagExtractedName(), extraction.tags());
    ASSERT(stat != nullptr);
    central_ref = &central_cache_map[stat->statName()];
    *central_ref = stat;
  }
    
  // Step 4: insert a reference into the TLS cache too, so it stays consistent with the central cache.
  // If we have a TLS cache, insert the stat.
  if (tls_cache) {
    tls_cache->insert(std::make_pair((*central_ref)->statName(), *central_ref));
  }

  // Finally we return the reference.
  return **central_ref;
}
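
The lookup order in safeMakeStat can be modeled with standard C++ alone. The sketch below is a simplified toy under stated assumptions: ToyScope and ToyCounter are hypothetical, a function-local thread_local map stands in for Envoy's ThreadLocal slot, std::mutex stands in for parent_.lock_, and the slowPathLookups counter exists only so the behavior can be observed. It ignores rejection, tag extraction, and scope teardown.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical toy types for illustration; not Envoy's real classes.
struct ToyCounter {
  std::atomic<uint64_t> value{0};
  // Updating the metric itself is thread-safe without any lock.
  void inc() { value.fetch_add(1, std::memory_order_relaxed); }
};
using ToyCounterPtr = std::shared_ptr<ToyCounter>;
using ToyCounterMap = std::unordered_map<std::string, ToyCounterPtr>;

class ToyScope {
public:
  ToyScope() : scope_id_(nextScopeId()) {}

  ToyCounter& counter(const std::string& name) {
    // One cache per thread, keyed by scope id so it can serve many scopes.
    thread_local std::unordered_map<uint64_t, ToyCounterMap> tls_cache;
    ToyCounterMap& local = tls_cache[scope_id_];

    // Fast path: a hit in this thread's cache needs no lock.
    auto hit = local.find(name);
    if (hit != local.end()) {
      return *hit->second;
    }

    // Slow path: consult (and possibly fill) the central cache under the lock.
    std::lock_guard<std::mutex> guard(lock_);
    ++slow_path_lookups_;
    ToyCounterPtr& central = central_cache_[name];
    if (central == nullptr) {
      central = std::make_shared<ToyCounter>();
    }
    // Remember the shared counter in this thread's cache for next time.
    local.emplace(name, central);
    assert(central != nullptr);
    return *central;
  }

  std::size_t centralSize() {
    std::lock_guard<std::mutex> guard(lock_);
    return central_cache_.size();
  }

  std::size_t slowPathLookups() {
    std::lock_guard<std::mutex> guard(lock_);
    return slow_path_lookups_;
  }

private:
  static uint64_t nextScopeId() {
    static std::atomic<uint64_t> next{0};
    return next++;
  }

  const uint64_t scope_id_;
  std::mutex lock_;
  ToyCounterMap central_cache_;
  std::size_t slow_path_lookups_ = 0;
};
```

In this toy, each thread takes the lock at most once per stat name per scope; every later lookup on that thread is served from its own cache, and all threads end up incrementing the same shared counter.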

The figure below shows how a Scope's TlsCache, central cache, and metrics relate to one another.

IsolatedStoreImpl

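Finally, IsolatedStoreImpl. Broadly, Envoy has two kinds of stats store. The first is ThreadLocalStore. Through the StoreRoot interface it can be given a TagProducer, a StatsMatcher, and Sinks, which means its stats can have tags extracted and can be sent elsewhere through the configured Sinks. The Sinks Envoy currently supports include statsd, dog_statsd, metrics_service, and hystrix. When stats are sent, the configured StatsMatcher also determines which stats are included. The second kind is IsolatedStoreImpl, which only stores stats that Envoy uses internally, such as per-upstream-host stats. There are a great many of these, so they live in an IsolatedStoreImpl and are not exposed through the admin stats endpoint. The other use of IsolatedStoreImpl is in unit tests.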

Summary

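This article first covered the intent behind Scope: a Scope manages a group of stats and lets them share a name prefix, which avoids redundant strings. It then covered the stats stores. The first kind is ThreadLocalStore, whose central cache and TLS cache design keeps locking off the core stat-recording path. Every Scope has a central cache plus a TlsCache entry in each thread's ThreadLocal storage, and all of these caches refer to the same shared metrics. The second kind is IsolatedStoreImpl, which is not thread-safe and is used in two places in Envoy: per-host stats, and unit tests, where it serves as a simple stats store for testing stats-related code.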
