Hive中使用自定义函数(UDF)实现分析函数row_number的功能

2014-11-24 17:41:27 · 作者: · 浏览: 0

之前部门实现row_number是使用的transform,我觉得用UDF实现后,平时的使用会更方便,免去了transform相对繁琐的语法。


用到的测试表为:


hive> desc row_number_test;
OK
id1 int
id2 string
age int
score double
name string



hive> select * from row_number_test;
OK
2 t04 25 60.0 youlia
1 t01 20 85.0 liujiannan
1 t02 24 70.0 zengqiu
2 t03 30 88.0 hongqu
2 t03 27 70.0 yongqi
1 t02 19 75.0 wangdong
1 t02 24 70.0 zengqiu



select row_number() over (partition by id1 order by age desc) from row_number_test;


转换为hive语句应该是:


select row_number(id1) from --partition by的字段传到row_number函数中去


(select * from row_number_test distribute by id1 sort by id1,age desc) a;



如果partition by 两个字段:


select row_number() over (partition by id1,id2 order by score) from row_number_test;


转换为hive语句应该是:


select row_number(id1,id2) --partition by的字段传到row_number函数中去


from (select * from row_number_test distribute by id1,id2 sort by id1,id2,score) a;



展示一下查询结果:


1.


select id1,id2,age,score,name,row_number(id1) rn from (select * from row_number_test distribute by id1 sort by id1,age desc) a;



OK
2 t03 30 88.0 hongqu 1
2 t03 27 70.0 yongqi 2
2 t04 25 60.0 youlia 3
1 t02 24 70.0 zengqiu 1
1 t02 24 70.0 zengqiu 2
1 t01 20 85.0 liujiannan 3
1 t02 19 75.0 wangdong 4



2.


select id1,id2,age,score,name,row_number(id1,id2) rn from (select * from row_number_test distribute by id1,id2 sort by id1,id2,score) a;



OK
2 t04 25 60.0 youlia 1
1 t02 24 70.0 zengqiu 1
2 t03 27 70.0 yongqi 1
1 t02 24 70.0 zengqiu 2
1 t02 19 75.0 wangdong 3
1 t01 20 85.0 liujiannan 1
2 t03 30 88.0 hongqu 2



下面是代码,只实现了接收1个参数和2个参数的eva luator方法,参数再多的照搬代码就可以了,代码仅供参考:


import org.apache.hadoop.hive.ql.exec.UDF;


import org.apache.hadoop.hive.ql.udf.UDFType;


@UDFType(deterministic = false)


public class Row_number extends UDF {


private static int MAX_VALUE = 50;


private static String comparedColumn[] = new String[MAX_VALUE];


private static int rowNum = 1;


public int eva luate (Object ...args){


String columnValue[] = new String[args.length];


for(int i=0;i

columnValue[i] = args[i].toString();


if (rowNum == 1)


{


for(int i=0;i

comparedColumn[i] = columnValue[i];


}


for(int i=0;i

{


if ( !comparedColumn[i].equals(columnValue[i]) )


{


for (int j=0;j

{


comparedColumn[j] = columnValue[j];


}


rowNum = 1;


return rowNum++;


}


}


return rowNum++;


}


public static void main(String args[])


{


Row_number t = new Row_number();


System.out.println(t.eva luate(123));


System.out.println(t.eva luate(123));


System.out.println(t.eva luate(123));


System.out.println(t.eva luate(1234));


System.out.println(t.eva luate(1234));


System.out.println(t.eva luate(1234));


System.out.println(t.eva luate(1235));


}


}


相关阅读