Data Transformation - Log Enrichment in Practice

Introduction to Data Transformation

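Data Transformation is a feature of Alibaba Cloud Log Service (SLS) for real-time, row-by-row ETL processing of structured or unstructured logs. It currently provides more than 200 operators. Starting from the enrichment scenario, this article shows how to use enrichment functions in a data transformation job.
PS: "Enrichment" here corresponds to a join in a SQL ETL scenario.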

[Image]

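Entry point for Data Transformation:
Open the SLS Log Service console and go to the query page of the relevant Logstore. There is a "Data Transformation" switch on that page; turn it on to start writing data transformation code.
[Image]
Overview of data transformation functions: http://help.aliyun.com/document_detail/159702.html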

Scenario

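This article uses the enrichment of HTTP status codes in Nginx logs as a starting example to help you get familiar with log enrichment in Data Transformation.
HTTP status codes are common in access logs. Enriching them makes the status of each request easier to read and enables further statistics.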

[Image]

The mapping table below lists the meanings of common HTTP status codes.
[Image: mapping table of common HTTP status codes]

Enriching Logs with Data Transformation

Method 1 - Using a res_local advanced parameter

Assume the enrichment data is a CSV that stores the code mapping:

code,alias,category,description
100,1xx,Informational,Continue
101,1xx,Informational,Switching Protocols
...

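Save the code mapping as an advanced parameter of the data transformation job, with http_code as the key and the content of the CSV file as the value. The rule below shows that CSV content inline: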

e_table_map(tab_parse_csv("code,alias,category,description\n100,1xx,Informational,Continue\n101,1xx,Informational,Switching Protocols\n102,1xx,Informational,Processing (WebDAV)\n200,2xx,Success,OK\n201,2xx,Success,Created\n202,2xx,Success,Accepted\n203,2xx,Success,Non-Authoritative Information\n204,2xx,Success,No Content\n205,2xx,Success,Reset Content\n206,2xx,Success,Partial Content\n207,2xx,Success,Multi-Status (WebDAV)\n208,2xx,Success,Already Reported (WebDAV)\n226,2xx,Success,IM Used\n300,3xx,Redirection,Multiple Choices\n301,3xx,Redirection,Moved Permanently\n302,3xx,Redirection,Found\n303,3xx,Redirection,See Other\n304,3xx,Redirection,Not Modified\n305,3xx,Redirection,Use Proxy\n306,3xx,Redirection,(Unused)\n307,3xx,Redirection,Temporary Redirect\n308,3xx,Redirection,Permanent Redirect (experimental)\n400,4xx,Client Error,Bad Request\n401,4xx,Client Error,Unauthorized\n402,4xx,Client Error,Payment Required\n403,4xx,Client Error,Forbidden\n404,4xx,Client Error,Not Found\n405,4xx,Client Error,Method Not Allowed\n406,4xx,Client Error,Not Acceptable\n407,4xx,Client Error,Proxy Authentication Required\n408,4xx,Client Error,Request Timeout\n409,4xx,Client Error,Conflict\n410,4xx,Client Error,Gone\n411,4xx,Client Error,Length Required\n412,4xx,Client Error,Precondition Failed\n413,4xx,Client Error,Request Entity Too Large\n414,4xx,Client Error,Request-URI Too Long\n415,4xx,Client Error,Unsupported Media Type\n416,4xx,Client Error,Requested Range Not Satisfiable\n417,4xx,Client Error,Expectation Failed\n418,4xx,Client Error,I'm a teapot (RFC 2324)\n420,4xx,Client Error,Enhance Your Calm (Twitter)\n422,4xx,Client Error,Unprocessable Entity (WebDAV)\n423,4xx,Client Error,Locked (WebDAV)\n424,4xx,Client Error,Failed Dependency (WebDAV)\n425,4xx,Client Error,Reserved for WebDAV\n426,4xx,Client Error,Upgrade Required\n428,4xx,Client Error,Precondition Required\n429,4xx,Client Error,Too Many Requests\n431,4xx,Client Error,Request Header Fields Too Large\n444,4xx,Client Error,No Response (Nginx)\n449,4xx,Client Error,Retry With (Microsoft)\n450,4xx,Client Error,Blocked by Windows Parental Controls (Microsoft)\n451,4xx,Client Error,Unavailable For Legal Reasons\n499,4xx,Client Error,Client Closed Request (Nginx)\n500,5xx,Server Error,Internal Server Error\n501,5xx,Server Error,Not Implemented\n502,5xx,Server Error,Bad Gateway\n503,5xx,Server Error,Service Unavailable\n504,5xx,Server Error,Gateway Timeout\n505,5xx,Server Error,HTTP Version Not Supported\n506,5xx,Server Error,Variant Also Negotiates (Experimental)\n507,5xx,Server Error,Insufficient Storage (WebDAV)\n508,5xx,Server Error,Loop Detected (WebDAV)\n509,5xx,Server Error,Bandwidth Limit Exceeded (Apache)\n510,5xx,Server Error,Not Extended\n511,5xx,Server Error,Network Authentication Required\n598,5xx,Server Error,Network read timeout error\n599,5xx,Server Error,Network connect timeout error\n"),
              [("http_code","code")],
              [("alias","http_code_alias"), ("description","http_code_desc"), 
              ("category","http_code_category")])
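
To make the join semantics of the rule above concrete, here is a minimal plain-Python sketch of the same lookup. It is only an illustration and not how SLS implements e_table_map: the helper names parse_csv_table and table_map are hypothetical, the CSV is shortened to three rows taken from the mapping above, and the options of the real function (for example, handling of fields that already exist) are not modeled.

```python
import csv
import io

# A shortened copy of the mapping CSV from the article (three rows only).
HTTP_CODE_CSV = """code,alias,category,description
200,2xx,Success,OK
404,4xx,Client Error,Not Found
500,5xx,Server Error,Internal Server Error
"""


def parse_csv_table(text):
    """Parse CSV text into a list of row dicts keyed by the header names."""
    return list(csv.DictReader(io.StringIO(text)))


def table_map(event, table, key_pairs, output_pairs):
    """Hypothetical stand-in for the join: copy columns from the first table
    row whose key columns equal the event fields. key_pairs holds
    (event_field, table_column); output_pairs holds (table_column, new_field).
    Events without a matching row are returned unchanged."""
    for row in table:
        if all(event.get(field) == row[column] for field, column in key_pairs):
            enriched = dict(event)
            for column, new_field in output_pairs:
                enriched[new_field] = row[column]
            return enriched
    return dict(event)


table = parse_csv_table(HTTP_CODE_CSV)
event = {"http_code": "404", "request_uri": "/missing"}
enriched = table_map(
    event,
    table,
    [("http_code", "code")],
    [("alias", "http_code_alias"),
     ("description", "http_code_desc"),
     ("category", "http_code_category")],
)
print(enriched)
```

The log keeps its original fields and gains http_code_alias, http_code_desc, and http_code_category, which is the same shape of result a SQL left join on http_code = code would give.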

Result:
[Image]

Method 2 - Enrichment using an OSS file

Assume the HTTP code mapping is stored in a file with the following format:

code,alias,category,description
100,1xx,Informational,Continue
101,1xx,Informational,Switching Protocols
...

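Upload the http_code.csv file to OSS:
Open the OSS console: http://oss.console.aliyun.com
Find an existing bucket or create a new one, then upload the file by following the instructions in the console.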
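
If you do not already have such a file, the sketch below shows one possible way to generate http_code.csv with the same four columns using only the Python standard library. The function names are hypothetical, and the descriptions come from Python's http.HTTPStatus, so the wording can differ from the table in this article and vendor-specific codes such as 444 or 499 are not included.

```python
import csv
import io
from http import HTTPStatus

# Category names for each status class, as used in the article's mapping table.
CATEGORIES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def build_http_code_csv():
    """Return CSV text with the columns code,alias,category,description."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["code", "alias", "category", "description"])
    for status in HTTPStatus:
        status_class = status.value // 100
        writer.writerow([
            status.value,
            f"{status_class}xx",
            CATEGORIES[status_class],
            status.phrase,
        ])
    return buffer.getvalue()


def save_http_code_csv(path="http_code.csv"):
    """Write the generated CSV to a local file, ready to upload to OSS."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(build_http_code_csv())


csv_text = build_http_code_csv()
print(csv_text.splitlines()[0])
```

Calling save_http_code_csv() writes the file to the current directory, after which it can be uploaded through the console as described above.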

[Image]

Enrich the logs with a data transformation rule:

e_table_map(
      tab_parse_csv(
           res_oss_file(endpoint="oss-cn-shanghai-internal.aliyuncs.com",
              ak_id=res_local("AK_ID"), ak_key=res_local("AK_KEY"), 
              bucket="ali-sls-etl-test", 
              file="http_code.csv", format='text')),
              [("http_code","code")],
              [("alias","http_code_alias"),
               ("description","http_code_desc"),
               ("category","http_code_category")])

The values referenced by res_local must be defined in the advanced parameters.

[Image]
Result:

[Image]

Method 3 - Enrichment using a MySQL table

Assume the http_code mapping is stored in a MySQL table:
[Image]

Enrich the logs with a data transformation rule:

e_table_map(res_rds_mysql(address="MySQL host address",
                  username="username", password="password",
                  database="database name",table="table name", refresh_interval=300),
              [("http_code","code")],
              [("alias","http_code_alias"), ("description","http_code_desc"),
              ("category","http_code_category")])
              

Note: Data Transformation supports accessing RDS over a VPC. To configure VPC connectivity, see https://help.aliyun.com/document_detail/162753.html
Result:

[Image]

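Summary

Overall flow

[Image: overall flow]

Comparison of methods

[Image: comparison of the methods]

Limit: all dimension tables are limited to 2 GB.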

References

https://help.aliyun.com/document_detail/125489.html Enrichment functions
https://help.aliyun.com/document_detail/129401.html Resource functions
https://www.restapitutorial.com/httpstatuscodes.html HTTP status codes
