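When a cube is built on a partitioned Hive table, it can be built incrementally on a daily schedule. This requires a Linux script that dynamically computes the start and end time of each incremental build and then calls the RESTful API provided by Kylin. A scheduling tool then runs the script at fixed times, which gives a scheduled incremental cube build.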
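Base64-encode the Kylin username and password in the form username:password. The result is the value that follows "Authorization: Basic" in the request header, and it is what authenticates each RESTful API request.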
[root@kylin1 ~]# python -c "import base64; print base64.standard_b64encode('ADMIN:KYLIN')"
QURNSU46S1lMSU4=
[root@kylin1 ~]#
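If Python is not available on the host, the same value can probably be produced with the base64 command-line tool. The following is a minimal sketch that assumes the coreutils base64 utility is installed; the function name basic_auth_token is hypothetical and is not part of Kylin.

```shell
#!/bin/sh
# Hypothetical helper (not part of Kylin): print the Base64 form of
# "user:password" for use in an "Authorization: Basic ..." header.
basic_auth_token() {
    # printf is used instead of echo so that no trailing newline
    # is fed into base64, which would change the encoded value.
    printf '%s:%s' "$1" "$2" | base64
}

# Example: basic_auth_token ADMIN KYLIN
# prints QURNSU46S1lMSU4=
```

Keeping the credentials out of the script body (for example, passing them in as arguments) avoids hard-coding the encoded token, which is trivially reversible.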
The following example submits a SQL query through the API:
[root@kylin1 ~]# curl -X POST -H "Authorization: Basic QURNSU46S1lMSU4=" -H "Content-Type: application/json" -d '{"sql":"select depart_name, sum(salary) salary_sum from kylin_test.employee a join kylin_test.department b on a.depart_no = b.depart_no group by depart_name", "project":"salary_project"}' http://kylin1:7070/kylin/api/query
{"columnMetas":[{"isNullable":1,"displaySize":256,"label":"DEPART_NAME","name":"DEPART_NAME","schemaName":"KYLIN_TEST","catelogName":null,"tableName":"DEPARTMENT","precision":256,"scale":0,"columnType":12,"columnTypeName":"VARCHAR","autoIncrement":false,"caseSensitive":true,"searchable":false,"currency":false,"signed":true,"writable":false,"definitelyWritable":false,"readOnly":true},{"isNullable":1,"displaySize":15,"label":"SALARY_SUM","name":"SALARY_SUM","schemaName":null,"catelogName":null,"tableName":null,"precision":15,"scale":0,"columnType":8,"columnTypeName":"DOUBLE","autoIncrement":false,"caseSensitive":true,"searchable":false,"currency":false,"signed":true,"writable":false,"definitelyWritable":false,"readOnly":true}],"results":[["数据部","65000.0"],["后端部","60000.0"],["前端部","55000.0"],["产品部","50000.0"]],"cube":"CUBE[name=salary_cube]","cuboidIds":"1","realizationTypes":"2","affectedRowCount":0,"isException":false,"exceptionMessage":null,"duration":77245,"totalScanCount":8,"totalScanBytes":1977,"totalScanFiles":2,"metadataTime":3,"totalSparkScanTime":3312,"hitExceptionCache":false,"storageCacheUsed":false,"traceUrl":null,"traces":[{"name":"SQL_TRANSFORMATION","group":"PREPARATION","duration":10089},{"name":"SQL_PARSE_AND_OPTIMIZE","group":"PREPARATION","duration":6774},{"name":"CUBE_MATCHING","group":"PREPARATION","duration":1695},{"name":"PREPARE_AND_SUBMIT_JOB","group":null,"duration":55132},{"name":"WAIT_FOR_EXECUTION","group":null,"duration":131},{"name":"EXECUTION","group":null,"duration":2823},{"name":"FETCH_RESULT","group":null,"duration":0}],"partial":false,"pushDown":false,"sparkPool":"lightweight_tasks"}[root@kylin1 ~]#
The results field of the returned JSON holds the rows produced by the query.
The following example submits a cube build request:
[root@kylin1 ~]# curl -X PUT -H "Authorization: Basic QURNSU46S1lMSU4=" -H "Content-Type: application/json" -d '{"startTime":"1658880000000", "endTime":"1658966400000", "buildType":"BUILD"}' http://kylin1:7070/kylin/api/cubes/salary_cube/build
{"uuid":"0461d336-704a-40cc-a09a-fdc4ffcd0c2c","last_modified":1658978465025,"version":"4.0.0.0","name":"BUILD CUBE - salary_cube - FULL_BUILD - GMT+08:00 2022-07-28 11:21:04","projectName":"salary_project","type":"BUILD","duration":0,"related_cube":"salary_cube","display_cube_name":"salary_cube","related_segment":"1991da1b-dfbc-24fb-4b25-22d28c979043","related_segment_name":"FULL_BUILD","exec_start_time":0,"exec_end_time":0,"exec_interrupt_time":0,"mr_waiting":0,"steps":[{"interruptCmd":null,"id":"0461d336-704a-40cc-a09a-fdc4ffcd0c2c-00","name":"Detect Resource","sequence_id":0,"exec_cmd":null,"interrupt_cmd":null,"exec_start_time":0,"exec_end_time":0,"exec_wait_time":0,"step_status":"PENDING","cmd_type":"SHELL_CMD_HADOOP","info":{},"run_async":false},{"interruptCmd":null,"id":"0461d336-704a-40cc-a09a-fdc4ffcd0c2c-01","name":"Build Cube with Spark","sequence_id":1,"exec_cmd":null,"interrupt_cmd":null,"exec_start_time":0,"exec_end_time":0,"exec_wait_time":0,"step_status":"PENDING","cmd_type":"SHELL_CMD_HADOOP","info":{},"run_async":false}],"submitter":"ADMIN","job_status":"PENDING","build_instance":"unknown","progress":0.0}[root@kylin1 ~]#
[root@kylin1 ~]#
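startTime and endTime are the millisecond timestamps that bound the build. The range is left-closed and right-open, [startTime, endTime), and it becomes the filter condition on the partition column of the Hive table. If the range that actually gets built is 8 hours earlier than the one you specified, it is because Kylin interprets the timestamps in UTC only: 00:00 UTC corresponds to 08:00 in UTC+8. In that case, consider adding 8 hours to the times you specify.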
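As an alternative to adding 8 hours by hand, the day boundaries can be computed directly in UTC, so the result does not depend on the time zone of the host. This is a sketch that assumes GNU date (the -u and -d options); the function names day_start_ms and day_end_ms are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Kylin); GNU date is assumed.

# Print the 13-digit (millisecond) timestamp of 00:00:00 UTC on the given day.
day_start_ms() {
    seconds=$(date -u -d "$1 00:00:00" +%s)
    echo $((seconds * 1000))
}

# Print the exclusive end of a one-day range: the start plus 24 hours in ms.
day_end_ms() {
    echo $(( $(day_start_ms "$1") + 86400000 ))
}

# Example: day_start_ms 2022-07-27 prints 1658880000000
#          day_end_ms   2022-07-27 prints 1658966400000
```

These two values match the startTime and endTime used in the build request shown above, which suggests that request covered the 2022-07-27 partition.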
buildType specifies the type of build. The valid values are BUILD, MERGE, and REFRESH.
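To avoid sending a request with a mistyped buildType, the request body can be assembled by a small function that checks the value first. This is only a sketch: build_payload is a hypothetical name, and the accepted values are simply the three listed above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Kylin): print the JSON body of a build request.
# Usage: build_payload START_MS END_MS [BUILD_TYPE]   (BUILD_TYPE defaults to BUILD)
build_payload() {
    build_type=${3:-BUILD}
    # Accept only the build types named in the text; reject anything else.
    case "$build_type" in
        BUILD|MERGE|REFRESH) ;;
        *) echo "unsupported buildType: $build_type" >&2; return 1 ;;
    esac
    printf '{"startTime":%s, "endTime":%s, "buildType":"%s"}\n' "$1" "$2" "$build_type"
}

# Example: build_payload 1658880000000 1658966400000 REFRESH
```

The output can be passed to curl with -d "$(build_payload ...)" in place of a hand-written JSON string.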
The Linux script is as follows:
[root@kylin1 ~]# cat salary_cube.sh
#!/usr/bin/sh

# Read cube_name from the first argument
cube_name=$1

# Read the build date from the second argument; default to yesterday if it is omitted
if [ -n "$2" ]
then
    build_date=$2
else
    build_date=`date -d "1 days ago" "+%Y-%m-%d"`
fi

# 10-digit (second) start timestamp of the build; 08:00:00 local time in UTC+8 is 00:00:00 UTC
start_build_unixtime10=`date -d "$build_date 08:00:00" +%s`
# Convert the 10-digit start timestamp to a 13-digit (millisecond) timestamp
start_build_unixtime13=$(($start_build_unixtime10*1000))
# 13-digit end timestamp of the build: the start plus one day
end_build_unixtime13=$(($start_build_unixtime13+86400000))

# Submit the cube build
curl -X PUT -H "Authorization: Basic QURNSU46S1lMSU4=" -H "Content-Type: application/json" -d '{"startTime":'$start_build_unixtime13', "endTime":'$end_build_unixtime13', "buildType":"BUILD"}' http://kylin1:7070/kylin/api/cubes/$cube_name/build
[root@kylin1 ~]#
Run the Linux script to build the cube:
[root@kylin1 ~]# sh salary_cube.sh salary_cube
{"uuid":"257d7630-6470-41b1-968b-e27fef0fddfc","last_modified":1658989552142,"version":"4.0.0.0","name":"BUILD CUBE - salary_cube - FULL_BUILD - GMT+08:00 2022-07-28 14:25:52","projectName":"salary_project","type":"BUILD","duration":0,"related_cube":"salary_cube","display_cube_name":"salary_cube","related_segment":"ba26e4c9-310b-2654-0f83-a8bd129d3faf","related_segment_name":"FULL_BUILD","exec_start_time":0,"exec_end_time":0,"exec_interrupt_time":0,"mr_waiting":0,"steps":[{"interruptCmd":null,"id":"257d7630-6470-41b1-968b-e27fef0fddfc-00","name":"Detect Resource","sequence_id":0,"exec_cmd":null,"interrupt_cmd":null,"exec_start_time":0,"exec_end_time":0,"exec_wait_time":0,"step_status":"PENDING","cmd_type":"SHELL_CMD_HADOOP","info":{},"run_async":false},{"interruptCmd":null,"id":"257d7630-6470-41b1-968b-e27fef0fddfc-01","name":"Build Cube with Spark","sequence_id":1,"exec_cmd":null,"interrupt_cmd":null,"exec_start_time":0,"exec_end_time":0,"exec_wait_time":0,"step_status":"PENDING","cmd_type":"SHELL_CMD_HADOOP","info":{},"run_async":false}],"submitter":"ADMIN","job_status":"PENDING","build_instance":"unknown","progress":0.0}[root@kylin1 ~]#
[root@kylin1 ~]#
The submitted cube build job is visible on the Monitor page of the Kylin web UI.
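For a script that runs unattended, it can help to log the job's uuid and job_status from the build response rather than rely on the web UI alone. The sketch below pulls a string field out of the one-line JSON response with sed. It is a shortcut, not a real JSON parser, and json_string_field is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper (not part of Kylin): read a one-line JSON response on
# stdin and print the value of the named string field, e.g. uuid or job_status.
# If the field name occurs more than once, the last occurrence is printed.
json_string_field() {
    sed -n 's/.*"'"$1"'":"\([^"]*\)".*/\1/p'
}

# Example (response holds the output of the build request):
#   job_id=$(printf '%s\n' "$response" | json_string_field uuid)
#   status=$(printf '%s\n' "$response" | json_string_field job_status)
# The uuid could then be used to poll a job status endpoint; the path
# /kylin/api/jobs/<uuid> is an assumption here and should be verified
# against the REST documentation of your Kylin version.
```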
When a full cube build is run more than once, the newest segment replaces the old one.
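For the scheduled execution described at the start, one possible choice of scheduling tool is cron. The entry below is only an illustration: the 01:00 run time and the log file path are assumptions, and the script path assumes salary_cube.sh sits in root's home directory as in the session above.

```shell
# Edit root's crontab with: crontab -e
# minute hour day-of-month month day-of-week  command
0 1 * * * sh /root/salary_cube.sh salary_cube >> /var/log/salary_cube_build.log 2>&1
```

Because no date argument is passed, the script falls back to yesterday's date, so each nightly run builds the previous day's partition.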