嚐鮮阿里雲容器服務Kubernetes 1.16，共享TensorFlow實驗室《二》–共享GPU的彈性

上一篇文章《嚐鮮阿里雲容器服務Kubernetes 1.16，共享TensorFlow實驗室》我們講述瞭如何通過CGPU的方案來實現CGPU資源的共享和隔離。
本文介紹基於CGPU資源的彈性能力。
ps：下面的說明是基於上一篇文章的環境來進行的描述，環境的搭建請參考上一篇文章。

配置彈性伸縮組

在“集群列表”中目標集群的“更多”的下拉菜單中選中“自動伸縮”
配置基礎的“縮容規則”後，“創建伸縮組”，選擇“共享GPU實例”
然後選中需要的類型，比如本例中選擇規格“ecs.gn6i-c4g1.xlarge”，其中我們已經默認設置了彈出節點的標籤 "cgpu: true, workload_type: gpushare"
點擊確定後，彈性伸縮組配置完成

觸發擴容

將下面的內存存儲為 mem_deployment.yaml，通過命令 kubectl apply -f mem_deployment.yaml 來初始化環境

---
# Define the tensorflow deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tf-notebook
  labels:
    app: tf-notebook
spec:
  replicas: 1
  selector: # define how the deployment finds the pods it mangages
    matchLabels:
      app: tf-notebook
  template: # define the pods specifications
    metadata:
      labels:
        app: tf-notebook
    spec:
      containers:
      - name: tf-notebook
        image: tensorflow/tensorflow:1.4.1-gpu-py3
        resources:
          limits:
            aliyun.com/gpu-mem: 4
          requests:
            aliyun.com/gpu-mem: 4
        ports:
        - containerPort: 8888
        env:
          - name: PASSWORD
            value: mypassw0rd

# Define the tensorflow service
---
apiVersion: v1
kind: Service
metadata:
  name: tf-notebook
spec:
  ports:
  - port: 80
    targetPort: 8888
    name: jupyter
  selector:
    app: tf-notebook
  type: LoadBalancer

通過命令kubectl scale --replicas 7 deploy/tf-notebook擴大副本數至7，觸發彈性伸縮組擴容

jumper(⎈ |zjk-gpu:default)➜  ~ kubectl scale --replicas 7 deploy/tf-notebook
deployment.extensions/tf-notebook scaled
jumper(⎈ |zjk-gpu:default)➜  ~ kubectl get pod -o wide
NAME                           READY   STATUS    RESTARTS   AGE   IP             NODE                           NOMINATED NODE   READINESS GATES
tf-notebook-7cf4575d78-dc2fr   0/1     Pending   0          19s   <none>         <none>                         <none>           <none>
tf-notebook-7cf4575d78-jm2cb   0/1     Pending   0          19s   <none>         <none>                         <none>           <none>
tf-notebook-7cf4575d78-lmn5w   0/1     Pending   0          19s   <none>         <none>                         <none>           <none>
tf-notebook-7cf4575d78-n9ldb   1/1     Running   0          19s   172.20.64.39   cn-zhangjiakou.192.168.3.184   <none>           <none>
tf-notebook-7cf4575d78-rzgtl   1/1     Running   0          19s   172.20.64.40   cn-zhangjiakou.192.168.3.184   <none>           <none>
tf-notebook-7cf4575d78-vzxvb   1/1     Running   0          58m   172.20.64.36   cn-zhangjiakou.192.168.3.184   <none>           <none>
tf-notebook-7cf4575d78-w6spt   0/1     Pending   0          19s   <none>         <none>                         <none>           <none>

#彈出資源需要一定的時間...

jumper(⎈ |zjk-gpu:default)➜  ~ kubectl get pod -o wide
NAME                           READY   STATUS    RESTARTS   AGE     IP             NODE                           NOMINATED NODE   READINESS GATES
tf-notebook-7cf4575d78-dc2fr   1/1     Running   0          2m10s   172.20.67.21   cn-zhangjiakou.192.168.3.198   <none>           <none>
tf-notebook-7cf4575d78-jm2cb   1/1     Running   0          2m10s   172.20.67.20   cn-zhangjiakou.192.168.3.198   <none>           <none>
tf-notebook-7cf4575d78-lmn5w   1/1     Running   0          2m10s   172.20.67.79   cn-zhangjiakou.192.168.3.199   <none>           <none>
tf-notebook-7cf4575d78-n9ldb   1/1     Running   0          2m10s   172.20.64.39   cn-zhangjiakou.192.168.3.184   <none>           <none>
tf-notebook-7cf4575d78-rzgtl   1/1     Running   0          2m10s   172.20.64.40   cn-zhangjiakou.192.168.3.184   <none>           <none>
tf-notebook-7cf4575d78-vzxvb   1/1     Running   0          60m     172.20.64.36   cn-zhangjiakou.192.168.3.184   <none>           <none>
tf-notebook-7cf4575d78-w6spt   1/1     Running   0          2m10s   172.20.67.22   cn-zhangjiakou.192.168.3.198   <none>           <none>
jumper(⎈ |zjk-gpu:default)➜  ~ kubectl get node -L cgpu,workload_type
NAME                           STATUS   ROLES    AGE    VERSION            CGPU   WORKLOAD_TYPE
cn-zhangjiakou.192.168.0.138   Ready    master   19d    v1.16.6-aliyun.1
cn-zhangjiakou.192.168.1.112   Ready    master   19d    v1.16.6-aliyun.1
cn-zhangjiakou.192.168.1.113   Ready    <none>   19d    v1.16.6-aliyun.1
cn-zhangjiakou.192.168.3.115   Ready    master   19d    v1.16.6-aliyun.1
cn-zhangjiakou.192.168.3.184   Ready    <none>   8d     v1.16.6-aliyun.1   true
cn-zhangjiakou.192.168.3.189   Ready    <none>   7d9h   v1.16.6-aliyun.1
cn-zhangjiakou.192.168.3.198   Ready    <none>   134m   v1.16.6-aliyun.1   true   gpushare
cn-zhangjiakou.192.168.3.199   Ready    <none>   129m   v1.16.6-aliyun.1   true   gpushare
jumper(⎈ |zjk-gpu:default)➜  ~ arena top node -s -d

NAME:       cn-zhangjiakou.192.168.3.184
IPADDRESS:  192.168.3.184

NAME                          NAMESPACE  GPU0(Allocated)
tf-notebook-7cf4575d78-n9ldb  default    4
tf-notebook-7cf4575d78-rzgtl  default    4
tf-notebook-7cf4575d78-vzxvb  default    4
Allocated :                   12 (85%)
Total :                       14
----------------------------------------------------------------------------------------------------------------------------------

NAME:       cn-zhangjiakou.192.168.3.198
IPADDRESS:  192.168.3.198

NAME                          NAMESPACE  GPU0(Allocated)
tf-notebook-7cf4575d78-dc2fr  default    4
tf-notebook-7cf4575d78-jm2cb  default    4
tf-notebook-7cf4575d78-w6spt  default    4
Allocated :                   12 (85%)
Total :                       14
----------------------------------------------------------------------------------------------------------------------------------

NAME:       cn-zhangjiakou.192.168.3.199
IPADDRESS:  192.168.3.199

NAME                          NAMESPACE  GPU0(Allocated)
tf-notebook-7cf4575d78-lmn5w  default    4
Allocated :                   4 (28%)
Total :                       14
----------------------------------------------------------------------------------------------------------------------------------


Allocated/Total GPU Memory In GPUShare Node:
28/42 (GiB) (66%)

如上所示，當副本數調至7時，額外彈出了兩個gpu節點，“cgpu: true,workload_type: gpushare”
通過arena的命令可以看到顯存資源使用了 28/42

觸發縮容

由上可見，對於共享型GPU，是可以正常的彈出資源的。接下來我們把資源釋放，來驗證共享GPU資源的縮容情況

jumper(⎈ |zjk-gpu:default)➜  ~ kubectl scale --replicas 1 deploy/tf-notebook
deployment.extensions/tf-notebook scaled
jumper(⎈ |zjk-gpu:default)➜  ~ kubectl  get pod -o wide
NAME                           READY   STATUS        RESTARTS   AGE    IP             NODE                           NOMINATED NODE   READINESS GATES
tf-notebook-7cf4575d78-dc2fr   1/1     Terminating   0          4m7s   172.20.67.21   cn-zhangjiakou.192.168.3.198   <none>           <none>
tf-notebook-7cf4575d78-jm2cb   1/1     Terminating   0          4m7s   172.20.67.20   cn-zhangjiakou.192.168.3.198   <none>           <none>
tf-notebook-7cf4575d78-lmn5w   1/1     Terminating   0          4m7s   172.20.67.79   cn-zhangjiakou.192.168.3.199   <none>           <none>
tf-notebook-7cf4575d78-n9ldb   1/1     Terminating   0          4m7s   172.20.64.39   cn-zhangjiakou.192.168.3.184   <none>           <none>
tf-notebook-7cf4575d78-rzgtl   1/1     Terminating   0          4m7s   172.20.64.40   cn-zhangjiakou.192.168.3.184   <none>           <none>
tf-notebook-7cf4575d78-vzxvb   1/1     Running       0          62m    172.20.64.36   cn-zhangjiakou.192.168.3.184   <none>           <none>
tf-notebook-7cf4575d78-w6spt   1/1     Terminating   0          4m7s   172.20.67.22   cn-zhangjiakou.192.168.3.198   <none>           <none>
jumper(⎈ |zjk-gpu:default)➜  ~ kubectl get node
NAME                           STATUS   ROLES    AGE    VERSION
cn-zhangjiakou.192.168.0.138   Ready    master   19d    v1.16.6-aliyun.1
cn-zhangjiakou.192.168.1.112   Ready    master   19d    v1.16.6-aliyun.1
cn-zhangjiakou.192.168.1.113   Ready    <none>   19d    v1.16.6-aliyun.1
cn-zhangjiakou.192.168.3.115   Ready    master   19d    v1.16.6-aliyun.1
cn-zhangjiakou.192.168.3.184   Ready    <none>   8d     v1.16.6-aliyun.1
cn-zhangjiakou.192.168.3.189   Ready    <none>   7d8h   v1.16.6-aliyun.1
cn-zhangjiakou.192.168.3.198   Ready    <none>   78m    v1.16.6-aliyun.1
cn-zhangjiakou.192.168.3.199   Ready    <none>   73m    v1.16.6-aliyun.1
#此時新彈出來的機器的狀態都是Ready，在下一個縮容週期中會縮容這些新彈出的Node，一段時間之後，這個時間取決於彈性伸縮組的縮容週期的設置

jumper(⎈ |zjk-gpu:default)➜  ~ kubectl  get node
NAME                           STATUS     ROLES    AGE    VERSION
cn-zhangjiakou.192.168.0.138   Ready      master   19d    v1.16.6-aliyun.1
cn-zhangjiakou.192.168.1.112   Ready      master   19d    v1.16.6-aliyun.1
cn-zhangjiakou.192.168.1.113   Ready      <none>   19d    v1.16.6-aliyun.1
cn-zhangjiakou.192.168.3.115   Ready      master   19d    v1.16.6-aliyun.1
cn-zhangjiakou.192.168.3.184   Ready      <none>   8d     v1.16.6-aliyun.1
cn-zhangjiakou.192.168.3.189   Ready      <none>   7d9h   v1.16.6-aliyun.1
cn-zhangjiakou.192.168.3.198   NotReady   <none>   142m   v1.16.6-aliyun.1
cn-zhangjiakou.192.168.3.199   NotReady   <none>   137m   v1.16.6-aliyun.1

如上所示，通過降低副本數後，經過一段時間，新彈出的機器會重新釋放 -- 此處使用了ECS的極速模式，故大家看到的狀態是NotReady而不是節點直接消失，極速模式可以讓下次啟動的速度更快，代價是會產生少量的存儲費用。