hive udtf_随笔_内存溢出

hive udtf

LanguageManual UDF - Apache Hive - Apache Software Foundationhttps://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDFexplode (array)

select explode(array('A','B','C'));

select tf.* from (select 0) t lateral view explode(array('A','B','C')) tf;

select tf.*,t.* from (select 0) t lateral view explode(array('A','B','C')) tf as col;

explode (map)

select explode(map('A',10,'B',20,'C',30));

select explode(map('A',10,'B',20,'C',30)) as (key,value); --和上面一样

select tf.* from (select 0) t lateral view explode(map('A',10,'B',20,'C',30)) tf;--和上面一样

select tf.* from (select 0) t lateral view explode(map('A',10,'B',20,'C',30)) tf as key,value;

posexplode

select posexplode(array('A','B','C'));
select posexplode(array('A','B','C')) as (pos,val);

select tf.* from (select 0) t lateral view posexplode(array('A','B','C')) tf;
select tf.* from (select 0) t lateral view posexplode(array('A','B','C')) tf as pos,val;

inline (array of structs)

inline( ARRAY( STRUCT()[,STRUCT()] - explodes and array and struct into a table

select inline(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2016-02-02')));
select inline(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2016-02-02'))) as (col1,col2,col3);
select tf.* from (select 0) t lateral view inline(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2016-02-02'))) tf;
select tf.* from (select 0) t lateral view inline(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2016-02-02'))) tf as col1,col2,col3;

注意col3 的类型去掉date了就是string了

stack (values)

stack(n, cols...) - turns k columns into n rows of size k/n each

select stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01');

select stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01') as (col0,col1,col2);

select tf.* from (select 0) t lateral view stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01') tf;

select tf.* from (select 0) t lateral view stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01') tf as col0,col1,col2;

注意stack和inline看着很像其实不一样

首先inline的入参是array of structs 意思就是首先必须是array ，里面必须是struct

stack 则是 stack(n列，n倍的参数)

select stack (3,1,2,3) ，我想把123 分成3行

select stack (3,1,"cc",2,"love",3,"zbf") 我想每一行有两列，这个很适合与解析有规律的字符串

explode(ARRAY a)

Explodes an array to multiple rows. Returns a row-set with a single column (col), one row for each element from the array.

Tkey,Tvalue

explode(MAP m)

Explodes a map to multiple rows. Returns a row-set with a two columns (key,value) , one row for each key-value pair from the input map. (As of Hive 0.8.0.).

int,Tposexplode(ARRAY a)Explodes an array to multiple rows with additional positional column of int type (position of items in the original array, starting with 0). Returns a row-set with two columns (pos,val), one row for each element from the array.

T1,...,Tn

inline(ARRAY> a)

Explodes an array of structs to multiple rows. Returns a row-set with N columns (N = number of top level elements in the struct), one row per struct from the array. (As of Hive 0.10.)

T1,...,Tn/rstack(int r,T1 V1,...,Tn/r Vn)Breaks up n values V1,...,Vn into r rows. Each row will have n/r columns. r must be constant.

string1,...,stringn

json_tuple(string jsonStr,string k1,...,string kn)

Takes JSON string and a set of n keys, and returns a tuple of n values. This is a more efficient version of the get_json_object UDF because it can get multiple keys with just one call.

string 1,...,stringn

parse_url_tuple(string urlStr,string p1,...,string pn)

Takes URL string and a set of n URL parts, and returns a tuple of n values. This is similar to the parse_url() UDF but can extract multiple parts at once out of a URL. Valid part names are: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO, QUERY:.

select json_tuple('{"id":2,"name":"cc2"}',"id","name")

select get_json_object('{"id":2,"name":"cc2"}','$.id'),get_json_object('{"id":2,"name":"cc2"}','$.name')

欢迎分享，转载请注明来源：内存溢出

原文地址:https://54852.com/zaji/5656051.html

hive udtf

发表评论

评论列表（0条）