• 欢迎访问搞代码网站,推荐使用最新版火狐浏览器和Chrome浏览器访问本网站!
  • 如果您觉得本站非常有看点,那么赶紧使用Ctrl+D 收藏搞代码吧

Hive分析窗口函数(五) GROUPING SETS,GROUPING__ID,CUBE,ROLLUP

mysql 搞代码 4年前 (2022-01-09) 10次浏览 已收录 0个评论

1.GROUPING SETS与另外哪种方式等价? 2.根据GROUP BY的维度的所有组合进行聚合由哪个关键字完成? 3.ROLLUP与ROLLUP关系是什么? GROUPING SETS,GROUPING__ID,CUBE,ROLLUP这几个分析函数通常用于OLAP中,不能累加,而且需要根据不同维度上钻和下钻的指标统

1.GROUPING SETS与另外哪种方式等价?
2.根据GROUP BY的维度的所有组合进行聚合由哪个关键字完成?

3.ROLLUP与ROLLUP关系是什么?


GROUPING SETS,GROUPING__ID,CUBE,ROLLUP这几个分析函数通常用于OLAP中,不能累加,而且需要根据不同维度上钻和下钻的指标统计,比如,分小时、天、月的UV数。Hive版本为 apache-hive-0.13.1数据准备:

    2015-03,2015-03-10,cookie1    2015-03,2015-03-10,cookie5    2015-03,2015-03-12,cookie7    2015-04,2015-04-12,cookie3    2015-04,2015-04-13,cookie2    2015-04,2015-04-13,cookie4    2015-04,2015-04-16,cookie4    2015-03,2015-03-10,cookie2    2015-03,2015-03-10,cookie3    2015-04,2015-04-12,cookie5    2015-04,2015-04-13,cookie6    2015-04,2015-04-15,cookie3    2015-04,2015-04-15,cookie2    2015-04,2015-04-16,cookie1    CREATE EXTERNAL TABLE lxw1234 (    month STRING,    day STRING,    cookieid STRING    ) ROW FORMAT DELIMITED    FIELDS TERMINATED BY ','    stored as textfile location '/tmp/lxw11/';    hive> select * from lxw1234;    OK    2015-03 2015-03-10      cookie1    2015-03 2015-03-10      cookie5    2015-03 2015-03-12      cookie7    2015-04 2015-04-12      cookie3    2015-04 2015-04-13      cookie2    2015-04 2015-04-13      cookie4    2015-04 2015-04-16      cookie4    2015-03 2015-03-10      cookie2    2015-03 2015-03-10      cookie3    2015-04 2015-04-12      cookie5    2015-04 2015-04-13      cookie6    2015-04 2015-04-15      cookie3    2015-04 2015-04-15      cookie2    2015-04 2015-04-16      cookie1

GROUPING SETS
在一个GROUP BY查询中,根据不同的维度组合进行聚合,等价于将不同维度的GROUP BY结果集进行UNION ALL

    SELECT    month,    day,    COUNT(DISTINCT cookieid) AS uv,    GROUPING__ID    FROM lxw1234    GROUP BY month,day    GROUPING SETS (month,day)    ORDER BY GROUPING__ID;    month      day            uv      GROUPING__ID    ------------------------------------------------    2015-03    NULL            5       1    2015-04    NULL            6       1    NULL       2015-03-10      4       2    NULL       2015-03-12      1       2    NULL       2015-04-12      2       2    NULL       2015-04-13      3       2    NULL       2015-04-15      2       2    NULL       2015-04-16      2       2    等价于    SELECT month,NULL,COUNT(DISTINCT cookieid) AS uv,1 AS GROUPING__ID FROM lxw1234 GROUP BY month    UNION ALL    SELECT NULL,day,COUNT(DISTINCT cookieid) AS uv,2 AS GROUPING__ID FROM lxw1234 GROUP BY day

再如:

    SELECT    month,    day,    COUNT(DISTINCT cookieid) AS uv,    GROUPING__ID    FROM lxw1234    GROUP BY month,day    GROUPING SETS (month,day,(month,day))    ORDER BY GROUPING__ID;    month         day             uv      GROUPING__ID    ------------------------------------------------    2015-03       NULL            5       1    2015-04       NULL            6       1    NULL          2015-03-10      4       2    NULL          2015-03-12      1       2    NULL          2015-04-12      2       2    NULL          2015-04-13      3       2    NULL          2015-04-15      2       2    NULL          2015-04-16      2       2    2015-03       2015-03-10      4       3    2015-03       2015-03-12      1       3    2015-04       2015-04-12      2       3    2015-04       2015-04-13      3       3    2015-04       2015-04-15      2       3    2015-04       2015-04-16      2       3    等价于    SELECT month,NULL,COUNT(DISTINCT cookieid) AS uv,1 AS GROUPING__ID FROM lxw1234 GROUP BY month    UNION ALL    SELECT NULL,day,COUNT(DISTINCT cookieid) AS uv,2 AS GROUPING__ID FROM lxw1234 GROUP BY day    UNION ALL    SELECT month,day,COUNT(DISTINCT cookieid) AS uv,3 AS GROUPING__ID FROM lxw1234 GROUP BY month,day

其中的 GROUPING__ID,表示结果属于哪一个分组集合。

CUBE
根据GROUP BY的维度的所有组合进行聚合。

    SELECT    month,    day,    COUNT(DISTINCT cookieid) AS uv,    GROUPING__ID    FROM lxw1234    GROUP BY month,day    WITH CUBE    ORDER BY GROUPING__ID;    month                              day             uv     GROUPING__ID    --------------------------------------------    NULL            NULL            7       0    2015-03         NULL            5       1    2015-04         NULL            6       1    NULL            2015-04-12      2       2    NULL            2015-04-13      3       2    NULL            2015-04-15      2       2    NULL            2015-04-16      2       2    NULL            2015-03-10      4       2    NULL            2015-03-12      1       2    2015-03         2015-03-10      4       3    2015-03         2015-03-12      1       3    2015-04         2015-04-16      2       3    2015-04         2015-04-12      2       3    2015-04         2015-04-13      3       3    2015-04         2015-04-15      2       3    等价于    SELECT NULL,NULL,COUNT(DISTINCT cookieid) AS uv,0 AS GROUPING__ID FROM lxw1234    UNION ALL    SELECT month,NULL,COUNT(DISTINCT cookieid) AS uv,1 AS GROUPING__ID FROM lxw1234 GROUP BY month    UNION ALL    SELECT NULL,day,COUNT(DISTINCT cookieid) AS uv,2 AS GROUPING__ID FROM lxw1234 GROUP BY day    UNION ALL    SELECT month,day,COUNT(DISTINCT cookieid) AS uv,3 AS GROUPING__ID FROM lxw1234 GROUP BY month,day

ROLLUP
是CUBE的子集,以最左侧的维度为主,从该维度进行层级聚合。

    比如,以month维度进行层级聚合:    SELECT    month,    day,    COUNT(DISTINCT cookieid) AS uv,    GROUPING__ID      FROM lxw1234    GROUP BY month,day    WITH ROLLUP    ORDER BY GROUPING__ID;    month                              day             uv     GROUPING__ID    ---------------------------------------------------    NULL             NULL            7       0    2015-03          NULL            5       1    2015-04          NULL            6       1    2015-03          2015-03-10      4       3    2015-03          2015-03-12      1       3    2015-04          2015-04-12      2       3    2015-04          2015-04-13      3       3    2015-04          2015-04-15      2       3    2015-04          2015-04-16      2       3    可以实现这样的上钻过本文来源gaodaima#com搞(代@码$网6程:    月天的UV->月的UV->总UV复制代码    --把month和day调换顺序,则以day维度进行层级聚合:    SELECT    day,    month,    COUNT(DISTINCT cookieid) AS uv,    GROUPING__ID      FROM lxw1234    GROUP BY day,month    WITH ROLLUP    ORDER BY GROUPING__ID;    day                                month              uv     GROUPING__ID    -------------------------------------------------------    NULL            NULL               7       0    2015-04-13      NULL               3       1    2015-03-12      NULL               1       1    2015-04-15      NULL               2       1    2015-03-10      NULL               4       1    2015-04-16      NULL               2       1    2015-04-12      NULL               2       1    2015-04-12      2015-04            2       3    2015-03-10      2015-03            4       3    2015-03-12      2015-03            1       3    2015-04-13      2015-04            3       3    2015-04-15      2015-04            2       3    2015-04-16      2015-04            2       3    可以实现这样的上钻过程:    天月的UV->天的UV->总UV    (这里,根据天和月进行聚合,和根据天聚合结果一样,因为有父子关系,如果是其他维度组合的话,就会不一样)

搞代码网(gaodaima.com)提供的所有资源部分来自互联网,如果有侵犯您的版权或其他权益,请说明详细缘由并提供版权或权益证明然后发送到邮箱[email protected],我们会在看到邮件的第一时间内为您处理,或直接联系QQ:872152909。本网站采用BY-NC-SA协议进行授权
转载请注明原文链接:Hive分析窗口函数(五) GROUPING SETS,GROUPING__ID,CUBE,ROLLUP

喜欢 (0)
[搞代码]
分享 (0)
发表我的评论
取消评论

表情 贴图 加粗 删除线 居中 斜体 签到

Hi,您需要填写昵称和邮箱!

  • 昵称 (必填)
  • 邮箱 (必填)
  • 网址