Optimizing full sorts in Hive
2014-11-24 07:25:18
Tags: hive, sorting, optimization
Full sorts
Hive's sorting keyword is SORT BY. It is deliberately named differently from the traditional database ORDER BY, precisely to emphasize the distinction: SORT BY only sorts within a single machine (each reducer sorts its own data). Consider the following table definition:
CREATE TABLE if not exists t_order(
    id int,          -- order ID
    sale_id int,     -- sale ID
    customer_id int, -- customer ID
    product_id int,  -- product ID
    amount int       -- quantity
) PARTITIONED BY (ds STRING);
ÔÚ±íÖвéѯËùÓÐÏúÊۼǼ£¬²¢°´ÕÕÏúÊÛIDºÍÊýÁ¿ÅÅÐò£º
set mapred.reduce.tasks=2;
select sale_id, amount
from t_order
sort by sale_id, amount;
This query may produce an unexpected ordering. The data distributed to the two specified reducers might look like this (each reducer sorted locally):
Reducer 1:

sale_id | amount
--------+-------
0       | 100
1       | 30
1       | 50
2       | 20

Reducer 2:

sale_id | amount
--------+-------
0       | 110
0       | 120
3       | 50
4       | 20
Because the query above has no reduce key, Hive generates a random number to serve as the reduce key, so input records are distributed to the reducers at random. To guarantee that no sale_id appears on more than one reducer, use the DISTRIBUTE BY keyword to make sale_id the distribution key. The revised HQL:
set mapred.reduce.tasks=2;
select sale_id, amount
from t_order
distribute by sale_id
sort by sale_id, amount;
ÕâÑùÄܹ»±£Ö¤²éѯµÄÏúÊۼǼ¼¯ºÏÖУ¬ÏúÊÛID¶ÔÓ¦µÄÊýÁ¿ÊÇÕýÈ·ÅÅÐòµÄ£¬µ«ÊÇÏúÊÛID²»ÄÜÕýÈ·ÅÅÐò£¬Ô­ÒòÊÇhiveʹÓÃhadoopĬÈϵÄHashPartitioner·Ö·¢Êý¾Ý¡£
This brings us to the problem of a full (global) sort. There are essentially two solutions:
1.) Do not distribute the data; use a single reducer:
set mapred.reduce.tasks=1;
The drawback of this method is that the reduce side becomes a performance bottleneck, and with large data volumes it usually cannot produce a result at all. In practice it is nevertheless the most common approach, because a sorting query is usually run to obtain the top-ranked results, so a LIMIT clause can cut the data volume drastically. With limit n, the number of records transferred to the (single-machine) reduce side drops to n * (number of map tasks).
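A rough sketch of that arithmetic, assuming limit n is pushed down so that each map task emits only its local top n (the map counts and row values below are hypothetical):

```python
import heapq

def rows_reaching_reducer(map_outputs, n):
    # Each map task keeps only its local top-n rows, so the single
    # reducer receives n * (number of maps) rows instead of everything.
    local_tops = [heapq.nlargest(n, rows) for rows in map_outputs]
    merged = [r for part in local_tops for r in part]
    return len(merged), heapq.nlargest(n, merged)  # global top-n

# 3 hypothetical map tasks with 1000 rows each, limit 50:
maps = [list(range(start, start + 1000)) for start in (0, 5000, 9000)]
count, top = rows_reaching_reducer(maps, 50)
# 50 * 3 = 150 rows cross the shuffle instead of 3000
```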
2.) Replace the Partitioner; this method achieves a true full sort. Here we can use Hadoop's built-in TotalOrderPartitioner (which came out of Yahoo!'s TeraSort project), a Partitioner developed specifically to distribute ordered data across reducers. It requires a file in SequenceFile format that specifies the key ranges used for distribution. Assuming we have already generated this file (stored at /tmp/range_key_list, splitting the data across 100 reducers), the query above can be rewritten as:
set mapred.reduce.tasks=100;
set hive.mapred.partitioner=org.apache.hadoop.mapred.lib.TotalOrderPartitioner;
set total.order.partitioner.path=/tmp/range_key_list;
select sale_id, amount
from t_order
distribute by sale_id
sort by sale_id, amount;
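The idea behind TotalOrderPartitioner can be sketched in a few lines of plain Python, with `bisect` standing in for the range-file lookup (the cut points and keys below are hypothetical):

```python
import bisect

def range_partition(keys, cuts):
    # Partition i holds keys in [cuts[i-1], cuts[i]), so the partitions
    # themselves are ordered end to end -- unlike hash partitioning.
    parts = [[] for _ in range(len(cuts) + 1)]
    for k in keys:
        parts[bisect.bisect_right(cuts, k)].append(k)
    return [sorted(p) for p in parts]  # each reducer sorts locally

parts = range_partition([7, 1, 9, 3, 5, 2, 8], cuts=[3, 6])
# Concatenating the locally sorted partitions now yields a full sort.
merged = [k for p in parts for k in p]
```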
There are many ways to generate this range file (for example the o.a.h.mapreduce.lib.partition.InputSampler tool that ships with Hadoop). Here is a way to generate it with Hive itself. Suppose there is a t_sale table ordered by id:
CREATE TABLE if not exists t_sale ( id int, name string, loc string );
Then the range file keyed on sale_id can be generated as follows:
create external table range_keys(sale_id int)
row format serde 'org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe'
stored as
  inputformat 'org.apache.hadoop.mapred.TextInputFormat'
  outputformat 'org.apache.hadoop.hive.ql.io.HiveNullValueSequenceFileOutputFormat'
location '/tmp/range_key_list';

insert overwrite table range_keys
select distinct sale_id
from t_sale tablesample(bucket 100 out of 100 on rand()) s
sort by sale_id;
The generated file (under the /tmp/range_key_list directory) lets TotalOrderPartitioner distribute the data each reducer processes in sale_id order. The main concern when building the range file is how evenly it balances the data across reducers, which depends on a thorough understanding of the data.
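One common way to obtain balanced cut points is to sample the keys and take evenly spaced quantiles of the sample; a minimal sketch in Python, where the sample size and key distribution are assumptions:

```python
import random

def sample_cut_points(keys, num_reducers, sample_size=1000):
    # Sort a random sample and take evenly spaced quantiles as the
    # num_reducers - 1 boundaries of the range file.
    sample = sorted(random.sample(keys, min(sample_size, len(keys))))
    step = len(sample) / num_reducers
    return [sample[int(step * i)] for i in range(1, num_reducers)]

random.seed(0)  # fixed seed so the example is repeatable
cuts = sample_cut_points(list(range(100000)), num_reducers=4)
# cuts sit near the 25th/50th/75th percentiles of the key space
```

If the key distribution is skewed, quantiles of a representative sample still split the rows roughly evenly, which is what the balance requirement above asks for.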
Test case:
Data: 140 GB; sort descending on the time field and select the largest 50 rows.
ʹÓà һ°ã·½·¨ select * from table order by time desc limit 50. Ö´ÐÐÁË1Сʱ6·ÖÖÓÍêÈ«Ëã³ö¡£
ÈÎÎñÊý1¸ö mapÊý 1783 reduce 1
Whereas

select * from (select * from table distribute by time sort by time desc limit 50) t order by time desc limit 50;

finished in 5 minutes, with identical results.
It ran as 2 jobs:

job 1: map 1783, reduce 245
job 2: map 245, reduce 1
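The two-job trick above can be simulated in a few lines of plain Python, with heapq.nlargest standing in for each reducer's sort by time desc limit 50 (the row values are randomly generated for illustration):

```python
import heapq
import random

def two_stage_top_n(rows, n, num_reducers):
    # Job 1: hash-distribute the rows; each reducer keeps its local top-n.
    buckets = [[] for _ in range(num_reducers)]
    for r in rows:
        buckets[hash(r) % num_reducers].append(r)
    local_tops = [heapq.nlargest(n, b) for b in buckets]
    # Job 2: a single reducer merges the num_reducers * n candidates.
    return heapq.nlargest(n, [r for t in local_tops for r in t])

random.seed(1)
rows = [random.randrange(10**9) for _ in range(100000)]
top = two_stage_top_n(rows, 50, 245)  # matches a global sort + limit 50
```

The single final reducer only ever sees 245 * 50 candidate rows rather than the whole data set, which is why the second plan finishes in minutes while the single-reducer global sort takes over an hour.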