Hive Regular Expressions

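As our use of Hive has grown, the business now also needs to match text, and ordinary arithmetic and aggregate functions are no longer enough. Short strings can be handled with the built-in string functions, but pulling specific content out of large amounts of text requires regular expressions. The examples below are copied from http://blog.csdn.net/bitcarmanlee/article/details/51106726 and serve as a reference for the relevant functions; knowing how to use them is enough.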

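Watch out for escape characters. To match a digit you would normally write \d in a regular expression, but in Hive the backslash is itself an escape character inside a string literal, so it has to be doubled: \\d.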
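
As a minimal sketch of the escaping rule, the queries below compare the doubled backslash with an equivalent character class that needs no backslash at all. The literal 'abc123' is made up for illustration, and running SELECT without a FROM clause assumes Hive 0.13 or later; on older versions a FROM clause such as "from test1 limit 1" would be needed.

```sql
-- The string literal '\\d+' reaches the regex engine as \d+ (one or more digits).
select 'abc123' rlike '\\d+';      -- true

-- A character class avoids the backslash entirely and matches the same text.
select 'abc123' rlike '[0-9]+';    -- true
```

Writing [0-9] instead of \\d is a reasonable choice when a pattern passes through several layers of quoting (shell, script, Hive), since there is no backslash to lose along the way.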

For background on regular expression syntax itself, refer to a regular expression reference.

1.regexp

Syntax: A REGEXP B
Operand types: strings
Description: Same as RLIKE.

select count(*) from olap_b_dw_hotelorder_f where create_date_wid not regexp '\\d{8}';

This is equivalent to the following query:

select count(*) from olap_b_dw_hotelorder_f where create_date_wid not rlike '\\d{8}';
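
One point worth checking in queries like these: REGEXP and RLIKE succeed when any substring of the value matches the pattern, so '\\d{8}' also accepts values that merely contain eight consecutive digits. If the intent is that the whole value is exactly eight digits, anchoring the pattern with ^ and $ is one option. The sketch below uses made-up literals instead of the olap_b_dw_hotelorder_f table and assumes Hive 0.13 or later for SELECT without FROM.

```sql
-- Unanchored: true as soon as eight digits appear somewhere in the value.
select '20170524'  rlike '\\d{8}';       -- true
select '201705240' rlike '\\d{8}';       -- true (nine digits contain an eight-digit run)

-- Anchored: the entire value must be exactly eight digits.
select '20170524'  rlike '^\\d{8}$';     -- true
select '201705240' rlike '^\\d{8}$';     -- false
```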

2.regexp_extract

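Syntax: regexp_extract(string subject, string pattern, int index)
Returns: string
Description: Matches the string subject against the regular expression pattern and returns the capture group selected by index. Index 0 returns the entire match; 1, 2, ... return the corresponding parenthesized group.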

hive> select regexp_extract('IloveYou','I(.*?)(You)',1) from test1 limit 1;
Total jobs = 1
...
Total MapReduce CPU Time Spent: 7 seconds 340 msec
OK
love
Time taken: 28.067 seconds, Fetched: 1 row(s)

hive> select regexp_extract('IloveYou','I(.*?)(You)',2) from test1 limit 1;
Total jobs = 1
...
OK
You
Time taken: 26.067 seconds, Fetched: 1 row(s)

hive> select regexp_extract('IloveYou','(I)(.*?)(You)',1) from test1 limit 1;
Total jobs = 1
...
OK
I
Time taken: 26.057 seconds, Fetched: 1 row(s)

hive> select regexp_extract('IloveYou','(I)(.*?)(You)',0) from test1 limit 1;
Total jobs = 1
...
OK
IloveYou
Time taken: 28.06 seconds, Fetched: 1 row(s)
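
As a slightly more realistic sketch of the same idea, the queries below pull a numeric id out of a URL. The URL and the parameter name id are hypothetical, and SELECT without FROM assumes Hive 0.13 or later.

```sql
-- Index 1 returns the first parenthesized group: the digits after "id=".
select regexp_extract('http://example.com/order?id=12345&src=app', 'id=(\\d+)', 1);   -- 12345

-- Index 0 returns the whole match, including the "id=" prefix.
select regexp_extract('http://example.com/order?id=12345&src=app', 'id=(\\d+)', 0);   -- id=12345
```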

3.regexp_replace

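Syntax: regexp_replace(string A, string B, string C)
Returns: string
Description: Replaces every part of string A that matches the Java regular expression B with C. Note that some characters need to be escaped. The function is similar to regexp_replace in Oracle.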

hive> select regexp_replace("IloveYou","You","") from test1 limit 1;
Total jobs = 1
...
OK
Ilove
Time taken: 26.063 seconds, Fetched: 1 row(s)

hive> select regexp_replace("IloveYou","You","lili") from test1 limit 1;
Total jobs = 1
...
OK
Ilovelili
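
The note about escaping can be illustrated with a short sketch. The input strings are made up, and SELECT without FROM assumes Hive 0.13 or later. The second query also relies on the replacement string following Java regular expression conventions, where $1, $2 and so on refer to capture groups.

```sql
-- A literal dot has to be escaped; an unescaped . would match every character.
select regexp_replace('a.b.c', '\\.', '_');                                    -- a_b_c

-- Capture groups can be reused in the replacement to reshape the matched text.
select regexp_replace('2017-05-24', '(\\d{4})-(\\d{2})-(\\d{2})', '$1$2$3');   -- 20170524
```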

Posted in: Sql

2017-05-24

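Reposting: unless otherwise noted, articles on this site are published under the CC-4.0 license; please credit the source when reposting.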
