赞
踩
前文:
Uniq 模型拳打KUDU、HUDI,Aggregate 模型脚踢Kylin、ClickHouse,Duplicate 模型跟Hive 势不两立。Doris+Flink将会是实时数据仓库的重要基石。
目录
优缺点明显,使用一个新的技术,就得知道他的局限性,没错,指标列排序过滤慢得离谱。
必须清楚需求和模型的关系!划清维度表(筛选条件/获取属性)和事实表(出指标)。
- -- 常用命令
-
- -- 性能调优
- -- 设置比并发数 每个 BE 节点上执行实例的个数
- set parallel_fragment_exec_instance_num = 5;
- -- 数据包扫描行数
- set batch_size = 4096;
- -- 设置为20G,只针对当前session 有效,默认值为2G
- set exec_mem_limit=21474836480;
- -- 开启缓存
- set enable_sql_cache = true;
- set enable_partition_cache=true;
- -- 聚合节点数,针对场景适用
- set parallel_exchange_instance_num = -1;
-
- -- 查询资源
- -- 查询BE
- SHOW BACKENDS;
- -- 查询BROKER
- SHOW BROKER;
- -- 查询资源
- SHOW RESOURCES;
- -- 查询导入
- show load;
- -- 查询构建物化视图
- SHOW ALTER TABLE ROLLUP FROM dw;
- -- 取消构建物化视图
- cancel alter table rollup from dw.dwa_tg_sku_1d_detail_w (134548);
- -- 查询分区
- show partitions from dwa_tg_sku_1d_detail_w;
- -- 查询数据分布
- show data from dw.dwa_tg_sku_1d_detail_w;
- -- 查询物化视图
- desc dwa_tg_sku_1d_detail_w all;
- -- 查询计划
- show query profile "/";
- -- 查询变量
- show variables like '%exec_mem_limit%';
-
- -- 构建bitmap索引
- CREATE INDEX zone_index ON xxx(zone) USING BITMAP COMMENT'balabala';
- -- 构建
- create materialized view products_summary as
- -- 修改配置
- ALTER TABLE dw.dwa_tg_sku_1d_detail_w_detail set ("colocate_with" = "products_id");
-
- -- 建表及导入
- CREATE TABLE dw.xxx(
- `dctime` date,
- `domain` int,
- `products_id` bigint,
- `sku` varchar(500)
- `record_cnt` int )
- ENGINE=olap
- DUPLICATE KEY(
- dctime,
- domain,
- products_id
- )
- PARTITION BY RANGE (`dctime`) (
- PARTITION p20210401 VALUES LESS THAN ("20210402"),
- PARTITION p20210401 VALUES LESS THAN ("20210403")
- )
- DISTRIBUTED BY HASH (products_id)
- PROPERTIES (
- "replication_num" = "2",
- "dynamic_partition.enable" = "true",
- "dynamic_partition.time_unit" = "DAY",
- "dynamic_partition.end" = "3",
- "dynamic_partition.prefix" = "p",
- "dynamic_partition.buckets" = "6",
- "colocate_with" = "products_id"
- );
-
- LOAD LABEL dw.xxyyzz
- (
- DATA FROM TABLE xx_hive_ex
- INTO TABLE xx
- SET
- (
- dctime=cast(dctime as date)
- )
- WHERE(
- dctime BETWEEN '2021-04-21' AND '2021-04-30'
- )
- )
- WITH RESOURCE 'my_spark'
- (
- "spark.executor.cores" = "4",
- "spark.dynamicAllocation.maxExecutors" = "4",
- "spark.executor.memory" = "16g",
- "spark.yarn.executor.memoryOverhead" = "32368",
- "spark.network.timeout" = "1200000"
- )
- PROPERTIES
- (
- "timeout" = "28800"
- ,"exec_mem_limit"="64424509440"
- );
-
- -- 构建rollup
- create materialized view r1 as
- SELECT dctime
- ,domain
- ,products_id
- ,SUM(pv) pv
- ,HLL_UNION(hll_hash(A)) --HLL_UNION
- ,bitmap_union(to_bitmap(B)) --BITMAP_UNION
- FROM xxx
- GROUP BY dctime
- ,domain
- ,products_id
- 查询时使用 APPROX_COUNT_DISTINCT(A) / HLL_UNION_AGG(HLL_HASH(A)) / BITMAP_UNION_COUNT(TO_BITMAP(B))

记得把SQL几个参数调优下!SQL慢去看explain+webui!
敲重点,实时数据仓库可以把聚合逻辑丢到OLAP系统中处理了,我们要做的只需要拼接好明细,就可以复用啦!
Agg模型应该是使用最多的模型,但有个致命的缺陷,即使完全命中Rollup也会把SQL逻辑执行一遍(在Rollup表查询),目的是为了将历史与未compaction的数据合并,这就导致了只要是group by的数据集不小的话,速度就很慢。
HLL节省bitmap 40%计算时间,且误差在1%
用Rollup是真香,记得查询前加些查询参数。
3.1 0.14 版本Bitmap在 SparkLoad 那 null 会被转为 0,导致订单uv全是错的
3.2 明细层建完rollup 不能 SparkLoad
3.3 s3的 broker load 7000万数据40分钟太慢
3.4 insert into 后的数据不能建rollup
不要想着一个聚合模型解决所有查询,但是需求离谱到所有维度都想要(六七千万个组合,接近明细数据),聚合不起来。。。目前没有办法,用Kylin吧?
BITMAP/HLL计算贼慢,SUM指标还好,SQL的查询工作量在那,多一个指标,工作量越多,加一倍机器,速度提升一倍(MPP原因),3台30秒不够,那就扩到9台10秒,充钱就能解决。
https://blog.csdn.net/wypblog/article/details/103590812
https://blog.csdn.net/CZ_yjsy_data/article/details/114292958
doris非常适合于星型模型,不需要把所有筛选条件冗余到事实表,只要事实表和维度表通过Broadcast/Bucket Shuffle Join/Colocation Join时,就join两边的表,就像单表一样处理,也可以命中rollup。
- -- 因为走的时Bucket Shuffle Join,所以可以sj_id就像在事实表中处理,包括t1的rollup为 (dctime,domain,products_id)也可命中
-
- SELECT t1.dctime
- ,t1.domain
- ,t1.products_id
- ,t2.sj_id
- ,SUM(A) pv
- ,APPROX_COUNT_DISTINCT(B)
- ,BITMAP_UNION_COUNT(TO_BITMAP(C))
- FROM xxx t1
- join yyy t2
- on t1.`domain` = t2.`domain` and t1.products_id = t2.products_id
- WHERE t1.dctime BETWEEN '2021-04-07' AND '2021-04-07'
- and t2.domain =1
- GROUP BY t1.dctime
- ,t1.domain
- ,t1.products_id
- ,t2.sj_id
- order by pv desc
- LIMIT 400

Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。