[¿áÍæ Spark] Spark Streaming Ô´Âë½âÎöϵÁУ¬·µ»ØĿ¼ÇëÃÍ´ÁÕâÀï
¡¸ÌÚѶ¡¤¹ãµãͨ¡¹¼¼ÊõÍŶÓÈÙÓþ³öÆ·
±¾ÎÄÄÚÈÝÊÊÓ÷¶Î§£º
- 2016.02.25 update, Spark 2.0 ȫϵÁÐ ¡Ì (2.0.0-SNAPSHOT ÉÐδÕýʽ·¢²¼)
- 2016.03.10 update, Spark 1.6 ȫϵÁÐ ¡Ì (1.6.0, 1.6.1)
- 2015.11.09 update, Spark 1.5 ȫϵÁÐ ¡Ì (1.5.0, 1.5.1, 1.5.2)
- 2015.07.15 update, Spark 1.4 ȫϵÁÐ ¡Ì (1.4.0, 1.4.1)
ÔĶÁ±¾ÎÄÇ°£¬ÇëÒ»¶¨ÏÈÔĶÁSpark
Streaming ʵÏÖ˼·ÓëÄ£¿é¸ÅÊöÒ»ÎÄ£¬ÆäÖиÅÊöÁË Spark Streaming µÄ 4 ´óÄ£¿éµÄ»ù±¾×÷Óã¬ÓÐÁËÈ«¾Ö¸ÅÄîºóÔÙ¿´±¾ÎĶÔÄ£¿é 3£ºÊý¾Ý²úÉúÓëµ¼Èë
ϸ½ÚµÄ½âÊÍ¡£
ÒýÑÔ
ÎÒÃÇÇ°ÃæÔÚDStream,
DStreamGraph Ïê½â½²µ½£¬Õû¸öDStreamGraph
ÊÇÓÉoutput
stream
ͨ¹ýdependencyÒýÓùØϵ£¬Ë÷Òýµ½ÉÏÓÎDStream
½Úµã¡£¶øµÝ¹éµÄ×·Ëݵ½×îÉÏÓεÄInputDStream
½Úµãʱ£¬¾ÍûÓжÔÆäËüDStream
½ÚµãµÄÒÀÀµÁË£¬ÒòΪInputDStream
½Úµã±¾Éí¾Í´ú±íÁË×îÔʼµÄÊý¾Ý¼¯¡£
ÎÒÃǶÔÄ£¿é 3£ºÊý¾Ý²úÉúÓëµ¼Èë
ϸ½ÚµÄ½âÊÍ£¬ÊǽöÕë¶ÔReceiverInputDStream
¼°Æä×ÓÀàµÄ£»ÆäËüInputDStream
×ÓÀàµÄ½²½â£¬ÎÒÃÇÔÚÁíÍâµÄÎÄÕÂÖнøÐС£¼´£¬±¾Ä£¿éµÄÌÖÂÛ·¶Î§ÊÇ£º
- ReceiverInputDStream
- ×ÓÀà SocketInputDStream
- ×ÓÀà TwitterInputDStream
- ×ÓÀà RawInputDStream
- ×ÓÀà FlumePollingInputDStream
- ×ÓÀà MQTTInputDStream
- ×ÓÀà FlumeInputDStream
- ×ÓÀà PluggableInputDStream
- ×ÓÀà KafkaInputDStream
ReceiverTracker
·Ö·¢ Receiver ¹ý³Ì
ÎÒÃÇÒѾ֪µÀ£¬ReceiverTracker
×ÔÉíÔËÐÐÔÚ driver ¶Ë£¬ÊÇÒ»¸ö¹ÜÀí·Ö²¼ÔÚ¸÷¸ö executor ÉϵÄReceiver
µÄ×ÜÖ¸»ÓÕß¡£
ÔÚssc.start()
ʱ£¬½«Òþº¬µØµ÷ÓÃReceiverTracker.start()
£»¶øReceiverTracker.start()
×îÖØÒªµÄÈÎÎñ¾ÍÊǵ÷ÓÃ×Ô¼ºµÄlaunchReceivers()
·½·¨½«Receiver
·Ö·¢µ½¶à¸ö
executor ÉÏÈ¥¡£È»ºóÔÚÿ¸ö executor ÉÏ£¬ÓÉReceiverSupervisor
À´·Ö±ðÆô¶¯Ò»¸öReceiver
½ÓÊÕÊý¾Ý¡£Õâ¸ö¹ý³ÌÓÃÏÂͼ±íʾ£º
ÎÒÃǽ«ÒÔ 1.4.0 ºÍ 1.5.0 ÕâÁ½¸ö°æ±¾Îª´ú±í£¬×Ðϸ·ÖÎöһϠlaunchReceivers() µÄʵÏÖ¡£
1.4.0 ´ú±íÁË 1.5.0 ÒÔÇ°µÄ°æ±¾£¬Èç 1.2.x, 1.3.x, 1.4.x
1.5.0 ´ú±íÁË 1.5.0 ÒÔÀ´µÄ°æ±¾£¬Èç 1.5.x, 1.6.x
Spark
1.4.0 µÄ launchReceivers() ʵÏÖ
Spark 1.4.0 µÄlaunchReceivers()
µÄ¹ý³ÌÈçÏ£º
-
(1.a)¹¹Ôì Receiver RDD¡£¾ßÌåµÄ£¬ÊÇÏȱéÀúËùÓеÄReceiverInputStream
£¬»ñµÃ½«ÒªÆô¶¯µÄËùÓÐx
¸öReceiver
µÄʵÀý¡£È»ºó£¬°ÑÕâЩʵÀýµ±×öx
·ÝÊý¾Ý£¬ÔÚ
driver ¶Ë¹¹ÔìÒ»¸öRDD
ʵÀý£¬Õâ¸öRDD
·ÖΪx
¸ö
partition£¬Ã¿¸ö partition °üº¬Ò»¸öReceiver
Êý¾Ý£¨¼´Receiver
ʵÀý£©¡£
-
(1.b)¶¨Òå¼ÆËã func¡£ÎÒÃǽ«ÔÚ¶à¸ö executor ÉϹ²Æô¶¯x
¸öTask
£¬Ã¿¸öTask
¸ºÔðÒ»¸ö
partition µÄÊý¾Ý£¬¼´Ò»¸öReceiver
ʵÀý¡£ÎÒÃÇÒª¶ÔÕâ¸öReceiver
ʵÀý×öµÄ¼ÆË㶨ÒåΪfunc
º¯Êý£¬¾ßÌåµÄ£¬func
ÊÇ£º
- ÒÔÕâ¸ö
Receiver
ʵÀýΪ²ÎÊý£¬¹¹ÔìеÄReceiverSupervisor
ʵÀýsupervisor
£ºsupervisor
= new ReceiverSupervisorImpl(receiver, ...)
supervisor.start()
£»ÕâÒ»²½½«Æô¶¯ÐÂÏß³ÌÆô¶¯Receiver
ʵÀý£¬È»ºóºÜ¿ì·µ»Øsupervisor.awaitTermination()
£»½«Ò»Ö± block סµ±Ç°Task
µÄÏß³Ì
-
(1.c)·Ö·¢ RDD(Receiver) ºÍ func µ½¾ßÌåµÄ executor¡£ÉÏÃæ (a)(b) Á½²½Ö»ÊÇÔÚ driver ¶Ë¶¨ÒåÁËRDD[Receiver]
ºÍ
Õâ¸öRDD
Ö®ÉϽ«Ö´ÐеÄfunc
£¬µ«²¢Ã»ÓоßÌåµÄÈ¥×ö¡£ÕâÒ»²½Êǽ«Á½ÕߵĶ¨Òå·Ö·¢µ½
executor ÉÏÈ¥£¬ÂíÉϾͿÉÒÔʵ¼ÊÖ´ÐÐÁË¡£
-
(2)ÔÚ¸÷¸ö executor ¶Ë£¬Ö´ÐÐ(1.b) Öж¨ÒåµÄfunc
¡£¼´Æô¶¯Receiver
ʵÀý£¬²¢Ò»Ö±
block סµ±Ç°Ï̡߳£
ÕâÑù£¬Í¨¹ý 1 ¸öRDD
ʵÀý°üº¬x
¸öReceiver
£¬¶ÔÓ¦Æô¶¯
1 ¸öJob
°üº¬x
¸öTask
£¬¾Í¿ÉÒÔÍê³ÉReceiver
µÄ·Ö·¢ºÍ²¿ÊðÁË¡£ÉÏÊö
(1.a)(1.b)(1.c)(2) µÄ¹ý³ÌʾÒâÈçÏÂͼ£º
ÕâÀï Spark Streaming ϲãµÄ Spark Core ¶ÔReceiver
·Ö·¢ÊǺÁÎÞ¸ÐÖªµÄ£¬ËüÖ»ÊÇÖ´ÐÐÁË¡°Ó¦ÓòãÃ桱 -- ¶Ô Spark Core À´½²£¬Spark Streaming ¾ÍÊÇ¡°Ó¦ÓòãÃ桱-- µÄÒ»¸öÆÕͨJob
£»µ«
Spark Streaming ֻͨ¹ýÕâ¸öÆÕͨJob
¼´¿ÉÍê¡°ÌØÊ⹦ÄÜ¡±µÄReceiver
·Ö·¢£¬¿ÉνÇÉÃîÇÉÃî¡£
ÉÏÊöÂ߼ʵÏÖµÄÔ´ÂëÇëµ½Spark 1.4.0
µÄ ReceiverTracker²é¿´¡£
Spark
1.5.0 µÄ launchReceivers() ʵÏÖ
ÆäʵÉÏÃæÕâ¸öʵÏÖ£¬Õâ¸ö³¤Ê±ÔËÐеķַ¢Job
»¹´æÔÚһЩÎÊÌ⣺
- Èç¹ûij¸ö
Task
ʧ°Ü³¬¹ýspark.task.maxFailures(ĬÈÏ=4)
´ÎµÄ»°£¬Õû¸öJob
¾Í»áʧ°Ü¡£Õâ¸öÔÚ³¤Ê±ÔËÐеÄ
Spark Streaming ³ÌÐòÀExecutor
¶àʧЧ¼¸´Î¾ÍÓпÉÄܵ¼ÖÂTask
ʧ°Ü´ïµ½ÉÏÏÞ´ÎÊýÁË¡£ - Èç¹ûij¸ö
Task
ʧЧһÏ£¬Spark Core µÄTaskScheduler
»á½«ÆäÖØв¿Êðµ½ÁíÒ»¸ö
executor ÉÏÈ¥ÖØÅÜ¡£µ«ÕâÀïµÄÎÊÌâÔÚÓÚ£¬¸ºÔðÖØÅÜµÄ executor ¿ÉÄÜÊÇÔÚÏ·¢ÖØÅܵÄÄÇÒ»¿ÌÊÇÕýÔÚÖ´ÐÐTask
Êý½ÏÉٵģ¬µ«²»Ò»¶¨Äܹ»½«Receiver
·Ö²¼µÄ×î¾ùºâµÄ¡£ - ÓиöÓû§ code ¿ÉÄÜ»áÏë×Ô¶¨ÒåÒ»¸ö
Receiver
µÄ·Ö²¼²ßÂÔ£¬±ÈÈçËùÓеÄReceiver
¶¼²¿Êðµ½Í¬Ò»¸ö½ÚµãÉÏÈ¥¡£
´Ó 1.5.0 ¿ªÊ¼£¬Spark Streaming Ìí¼ÓÁËÔöÇ¿µÄReceiver
·Ö·¢²ßÂÔ¡£¶Ô±È֮ǰµÄ°æ±¾£¬Ö÷ÒªµÄ±ä¸üÔÚÓÚ£º
- Ìí¼Ó¿É²å°ÎµÄ
ReceiverSchedulingPolicy
- °Ñ
1
¸öJob
£¨°üº¬x
¸öTask
£©£¬¸ÄΪx
¸öJob
£¨Ã¿¸öJob
Ö»°üº¬1
¸öTask
£© - Ìí¼Ó¶Ô
Receiver
µÄ¼à¿ØÖØÆô»úÖÆ
ÎÒÃÇÒ»¸öÒ»¸ö¿´Ò»¿´¡£
(1)
¿É²å°ÎµÄ ReceiverSchedulingPolicy
ReceiverSchedulingPolicy
µÄÖ÷ҪĿµÄ£¬ÊÇÔÚ Spark Streaming ²ãÃæÌí¼Ó¶ÔReceiver
µÄ·Ö·¢Ä¿µÄµØµÄ¼ÆË㣬Ïà¶ÔÓÚ֮ǰ°æ±¾ÒÀÀµ
Spark Core µÄTaskScheduler
½øÐÐͨÓ÷ַ¢£¬ÐµÄReceiverSchedulingPolicy
»á¶Ô
Streaming Ó¦ÓõĸüºÃµÄÓïÒåÀí½â£¬Ò²ÄܼÆËã³ö¸üºÃµÄ·Ö·¢²ßÂÔ¡£
ReceiverSchedulingPolicy
ÓÐÁ½¸ö·½·¨£¬·Ö±ðÓÃÓÚ£º
ĬÈϵÄReceiverSchedulingPolicy
ÊÇʵÏÖΪround-robin
ʽµÄÁË¡£ÎÒÃǾÙÀý˵Ã÷ÏÂÕâÁ½¸ö·½·¨£º
ÆäÖУ¬ÔÚReceiver y
ʧЧʱ£¬ÒÔÇ°µÄ Spark Streaming ÓпÉÄÜ»áÔÚ executor 1 ÉÏÖØÆôRecever
y
£¬¶ø 1.5.0 ÒÔÀ´£¬½«ÔÚ executor 3 ÉÏÖØÆôReceiver y
¡£
(2)
ÿ¸ö Receiver ·Ö·¢Óе¥¶ÀµÄ Job ¸ºÔð
1.5.0 °æ±¾ÒÔÀ´µÄ Spark Streaming£¬ÊÇΪÿ¸öReceiver
¶¼·ÖÅäµ¥¶ÀµÄÖ»ÓÐ 1 ¸öTask
µÄJob
À´³¢ÊÔ·Ö·¢£¬ÕâÓëÒÔÇ°°æ±¾½«x
¸öReceiver
¶¼·Åµ½Ò»¸öÓÐx
¸öTask
µÄJob
Àï·Ö·¢ÊǺܲ»Ò»ÑùµÄ¡£
¶øÇÒ£¬¶ÔÓÚÕâ½öÓеÄÒ»¸öTask
£¬Ö»ÔÚµÚ 1 ´ÎÖ´ÐÐʱ£¬²Å³¢ÊÔÆô¶¯Receiver
£»Èç¹û¸ÃTask
ÒòΪʧЧ¶ø±»µ÷¶Èµ½ÆäËü
executor Ö´ÐÐʱ£¬¾Í²»ÔÙ³¢ÊÔÆô¶¯Receiver
¡¢Ö»×öÒ»¸ö¿Õ²Ù×÷£¬´Ó¶øµ¼Ö±¾Job
µÄ״̬Êdzɹ¦Ö´ÐÐÒÑÍê³É¡£ReceiverTracker
»áÁíÍâµ÷ÆðÒ»¸öJob
¡ª¡ª
ÓпÉÄÜ»áÖØмÆËãReceiver
µÄÄ¿µÄµØ ¡ª¡ª À´¼ÌÐø³¢ÊÔReceiver
·Ö·¢¡¡Èç´ËÖ±µ½³É¹¦ÎªÖ¹¡£
ÁíÍ⣬ÓÉÓÚ Spark Core µÄTask
Ï·¢Ê±Ö»»á²Î¿¼²¢´ó²¿·Öʱºò×ðÖØ Spark Streaming ÉèÖõÄpreferredLocation
Ä¿µÄµØÐÅÏ¢£¬»¹ÊÇÓÐÒ»¶¨¿ÉÄܸ÷ַ¢Receiver
µÄJob
²¢Ã»ÓÐÔÚÎÒÃÇÏëÒªµ÷¶ÈµÄ
executor ÉÏÔËÐС£´Ëʱ£¬ÔÚµÚ 1 ´ÎÖ´ÐÐTask
ʱ£¬»áÊ×ÏÈÏòReceiverTracker
·¢ËÍRegisterReceiver
ÏûÏ¢£¬Ö»Óеõ½¿Ï¶¨µÄ´ð¸´Ê±£¬²ÅÕæÕýÆô¶¯Receiver
£¬·ñÔò¾Í¼ÌÐø×öÒ»¸ö¿Õ²Ù×÷£¬µ¼Ö±¾Job
µÄ״̬Êdzɹ¦Ö´ÐÐÒÑÍê³É¡£µ±È»£¬ReceiverTracker
Ò²»áÁíÍâµ÷ÆðÒ»¸öJob
£¬À´¼ÌÐø³¢ÊÔReceiver
·Ö·¢¡¡Èç´ËÖ±µ½³É¹¦ÎªÖ¹¡£
ÎÒÃÇÓÃͼʾÀ´±í´ïÕâ¸ö¸Ä¶¯£º
ËùÒÔͨ¹ýÉÏÃæ¿ÉÒÔ¿´µ½£¬Ò»¸öReceiver
µÄ·Ö·¢Job
ÊÇÓпÉÄÜûÓÐÍê³É·Ö·¢Receiver
µÄÄ¿µÄµÄ£¬ËùÒÔReceiverTracker
»á¼ÌÐøÔÙÆðÒ»¸öJob
À´³¢ÊÔReceiver
·Ö·¢¡£Õâ¸ö»úÖƱ£Ö¤ÁË£¬Èç¹ûÒ»´ÎReceiver
Èç¹ûûÓеִïÔ¤ÏȼÆËãºÃµÄ
executor£¬¾ÍÓлú»áÔٴνøÐзַ¢£¬´Ó¶øʵÏÖÔÚ Spark Streaming ²ãÃæ¶ÔReceiver
ËùÔÚλÖøüºÃµÄ¿ØÖÆ¡£
(3)
¶ÔReceiver
µÄ¼à¿ØÖØÆô»úÖÆ
ÉÏÃæ·ÖÎöÁËÿ¸öReceiver
¶¼ÓÐרÃŵÄJob
À´±£Ö¤·Ö·¢ºó£¬ÎÒÃÇ·¢ÏÖÕâÑùÒ»À´£¬Receiver
µÄʧЧÖØÆô¾Í²»ÊÜspark.task.maxFailures(ĬÈÏ=4)
´ÎµÄÏÞÖÆÁË¡£
ÒòΪÏÖÔÚµÄReceiver
ÖØÊÔ²»ÊÇÔÚTask
¼¶±ð£¬¶øÊÇÔÚJob
¼¶±ð£»²¢ÇÒReceiver
ʧЧºó²¢²»»áµ¼ÖÂÇ°Ò»´ÎJob
ʧ°Ü£¬¶øÊÇÇ°Ò»´ÎJob
³É¹¦¡¢²¢ÐÂÆðÒ»¸öJob
ÔٴνøÐзַ¢¡£ÕâÑùÒ»À´£¬²»¹Ü
Spark Streaming ÔËÐж೤ʱ¼ä£¬Receiver
×ÜÊDZ£³Ö»îÐԵģ¬²»»áËæ×Å executor µÄ¶ªÊ§¶øµ¼ÖÂReceiver
ËÀÈ¥¡£
×ܽá
ÎÒÃÇÔÙ¼òµ¥¶Ô±ÈһϠ1.4.0 ºÍ 1.5.0 °æ±¾ÔÚReceiver
·Ö·¢ÉϵÄÇø±ð£º
ͨ¹ýÒÔÉÏ·ÖÎö£¬ÎÒÃÇ×ܽ᣺
|
Spark Streaming 1.4.0 |
Spark Streaming 1.5.0 |
Receiver »îÐÔ |
²»±£Ö¤ÓÀ»î |
ÎÞÏÞÖØÊÔ¡¢±£Ö¤ÓÀ»î |
Receiver ¾ùºâ·Ö·¢ |
ÎÞ±£Ö¤ |
round-robin ²ßÂÔ |
×Ô¶¨Òå Receiver ·Ö·¢ |
ºÜ tricky |
·½±ã |
ÖÂл
±¾ÎÄËù·ÖÎöµÄ 1.5.0 ÒÔÀ´ÔöÇ¿µÄReceiver
·Ö·¢²ßÂÔ£¬ÊÇÓÉÖìÊ«ÐÛͬѧǿÊƹ±Ï׸øÉçÇøµÄ£º