ÉèΪÊ×Ò³ ¼ÓÈëÊÕ²Ø

TOP

Receiver ·Ö·¢Ïê½â
2018-11-13 15:20:02 ¡¾´ó ÖРС¡¿ ä¯ÀÀ:87´Î
Tags£ºReceiver ·Ö·¢ Ïê½â

[¿áÍæ Spark] Spark Streaming Ô´Âë½âÎöϵÁУ¬·µ»ØĿ¼ÇëÃÍ´ÁÕâÀï

¡¸ÌÚѶ¡¤¹ãµãͨ¡¹¼¼ÊõÍŶÓÈÙÓþ³öÆ·

±¾ÎÄÄÚÈÝÊÊÓ÷¶Î§£º

  • 2016.02.25 update, Spark 2.0 ȫϵÁÐ ¡Ì (2.0.0-SNAPSHOT ÉÐδÕýʽ·¢²¼)
  • 2016.03.10 update, Spark 1.6 ȫϵÁÐ ¡Ì (1.6.0, 1.6.1)
  • 2015.11.09 update, Spark 1.5 ȫϵÁÐ ¡Ì (1.5.0, 1.5.1, 1.5.2)
  • 2015.07.15 update, Spark 1.4 ȫϵÁÐ ¡Ì (1.4.0, 1.4.1)

ÔĶÁ±¾ÎÄÇ°£¬ÇëÒ»¶¨ÏÈÔĶÁSpark Streaming ʵÏÖ˼·ÓëÄ£¿é¸ÅÊöÒ»ÎÄ£¬ÆäÖиÅÊöÁË Spark Streaming µÄ 4 ´óÄ£¿éµÄ»ù±¾×÷Óã¬ÓÐÁËÈ«¾Ö¸ÅÄîºóÔÙ¿´±¾ÎĶÔÄ£¿é 3£ºÊý¾Ý²úÉúÓëµ¼Èëϸ½ÚµÄ½âÊÍ¡£

ÒýÑÔ

ÎÒÃÇÇ°ÃæÔÚDStream, DStreamGraph Ïê½â½²µ½£¬Õû¸öDStreamGraphÊÇÓÉoutput streamͨ¹ýdependencyÒýÓùØϵ£¬Ë÷Òýµ½ÉÏÓÎDStream½Úµã¡£¶øµÝ¹éµÄ×·Ëݵ½×îÉÏÓεÄInputDStream½Úµãʱ£¬¾ÍûÓжÔÆäËüDStream½ÚµãµÄÒÀÀµÁË£¬ÒòΪInputDStream½Úµã±¾Éí¾Í´ú±íÁË×îԭʼµÄÊý¾Ý¼¯¡£

image

ÎÒÃǶÔÄ£¿é 3£ºÊý¾Ý²úÉúÓëµ¼Èëϸ½ÚµÄ½âÊÍ£¬ÊǽöÕë¶ÔReceiverInputDStream¼°Æä×ÓÀàµÄ£»ÆäËüInputDStream×ÓÀàµÄ½²½â£¬ÎÒÃÇÔÚÁíÍâµÄÎÄÕÂÖнøÐС£¼´£¬±¾Ä£¿éµÄÌÖÂÛ·¶Î§ÊÇ£º

- ReceiverInputDStream
  - ×ÓÀà SocketInputDStream
  - ×ÓÀà TwitterInputDStream
  - ×ÓÀà RawInputDStream
  - ×ÓÀà FlumePollingInputDStream
  - ×ÓÀà MQTTInputDStream
  - ×ÓÀà FlumeInputDStream
  - ×ÓÀà PluggableInputDStream
  - ×ÓÀà KafkaInputDStream

ReceiverTracker ·Ö·¢ Receiver ¹ý³Ì

ÎÒÃÇÒѾ­ÖªµÀ£¬ReceiverTracker×ÔÉíÔËÐÐÔÚ driver ¶Ë£¬ÊÇÒ»¸ö¹ÜÀí·Ö²¼ÔÚ¸÷¸ö executor ÉϵÄReceiverµÄ×ÜÖ¸»ÓÕß¡£

ÔÚssc.start()ʱ£¬½«Òþº¬µØµ÷ÓÃReceiverTracker.start()£»¶øReceiverTracker.start()×îÖØÒªµÄÈÎÎñ¾ÍÊǵ÷ÓÃ×Ô¼ºµÄlaunchReceivers()·½·¨½«Receiver·Ö·¢µ½¶à¸ö executor ÉÏÈ¥¡£È»ºóÔÚÿ¸ö executor ÉÏ£¬ÓÉReceiverSupervisorÀ´·Ö±ðÆô¶¯Ò»¸öReceiver½ÓÊÕÊý¾Ý¡£Õâ¸ö¹ý³ÌÓÃÏÂͼ±íʾ£º

image

ÎÒÃǽ«ÒÔ 1.4.0 ºÍ 1.5.0 ÕâÁ½¸ö°æ±¾Îª´ú±í£¬×Ðϸ·ÖÎöһϠlaunchReceivers() µÄʵÏÖ¡£

1.4.0 ´ú±íÁË 1.5.0 ÒÔÇ°µÄ°æ±¾£¬Èç 1.2.x, 1.3.x, 1.4.x

1.5.0 ´ú±íÁË 1.5.0 ÒÔÀ´µÄ°æ±¾£¬Èç 1.5.x, 1.6.x

Spark 1.4.0 µÄ launchReceivers() ʵÏÖ

Spark 1.4.0 µÄlaunchReceivers()µÄ¹ý³ÌÈçÏ£º

  • (1.a)¹¹Ôì Receiver RDD¡£¾ßÌåµÄ£¬ÊÇÏȱéÀúËùÓеÄReceiverInputStream£¬»ñµÃ½«ÒªÆô¶¯µÄËùÓÐx¸öReceiverµÄʵÀý¡£È»ºó£¬°ÑÕâЩʵÀýµ±×öx·ÝÊý¾Ý£¬ÔÚ driver ¶Ë¹¹ÔìÒ»¸öRDDʵÀý£¬Õâ¸öRDD·ÖΪx¸ö partition£¬Ã¿¸ö partition °üº¬Ò»¸öReceiverÊý¾Ý£¨¼´ReceiverʵÀý£©¡£

  • (1.b)¶¨Òå¼ÆËã func¡£ÎÒÃǽ«ÔÚ¶à¸ö executor ÉϹ²Æô¶¯x¸öTask£¬Ã¿¸öTask¸ºÔðÒ»¸ö partition µÄÊý¾Ý£¬¼´Ò»¸öReceiverʵÀý¡£ÎÒÃÇÒª¶ÔÕâ¸öReceiverʵÀý×öµÄ¼ÆË㶨ÒåΪfuncº¯Êý£¬¾ßÌåµÄ£¬funcÊÇ£º

    • ÒÔÕâ¸öReceiverʵÀýΪ²ÎÊý£¬¹¹ÔìеÄReceiverSupervisorʵÀýsupervisor£ºsupervisor = new ReceiverSupervisorImpl(receiver, ...)
    • supervisor.start()£»ÕâÒ»²½½«Æô¶¯ÐÂÏß³ÌÆô¶¯ReceiverʵÀý£¬È»ºóºÜ¿ì·µ»Ø
    • supervisor.awaitTermination()£»½«Ò»Ö± block סµ±Ç°TaskµÄÏß³Ì
  • (1.c)·Ö·¢ RDD(Receiver) ºÍ func µ½¾ßÌåµÄ executor¡£ÉÏÃæ (a)(b) Á½²½Ö»ÊÇÔÚ driver ¶Ë¶¨ÒåÁËRDD[Receiver]ºÍ Õâ¸öRDDÖ®ÉϽ«Ö´ÐеÄfunc£¬µ«²¢Ã»ÓоßÌåµÄÈ¥×ö¡£ÕâÒ»²½Êǽ«Á½ÕߵĶ¨Òå·Ö·¢µ½ executor ÉÏÈ¥£¬ÂíÉϾͿÉÒÔʵ¼ÊÖ´ÐÐÁË¡£

  • (2)ÔÚ¸÷¸ö executor ¶Ë£¬Ö´ÐÐ(1.b) Öж¨ÒåµÄfunc¡£¼´Æô¶¯ReceiverʵÀý£¬²¢Ò»Ö± block סµ±Ç°Ï̡߳£

ÕâÑù£¬Í¨¹ý 1 ¸öRDDʵÀý°üº¬x¸öReceiver£¬¶ÔÓ¦Æô¶¯ 1 ¸öJob°üº¬x¸öTask£¬¾Í¿ÉÒÔÍê³ÉReceiverµÄ·Ö·¢ºÍ²¿ÊðÁË¡£ÉÏÊö (1.a)(1.b)(1.c)(2) µÄ¹ý³ÌʾÒâÈçÏÂͼ£º

image

ÕâÀï Spark Streaming ϲãµÄ Spark Core ¶ÔReceiver·Ö·¢ÊǺÁÎÞ¸ÐÖªµÄ£¬ËüÖ»ÊÇÖ´ÐÐÁË¡°Ó¦ÓòãÃ桱 -- ¶Ô Spark Core À´½²£¬Spark Streaming ¾ÍÊÇ¡°Ó¦ÓòãÃ桱-- µÄÒ»¸öÆÕͨJob£»µ« Spark Streaming ֻͨ¹ýÕâ¸öÆÕͨJob¼´¿ÉÍê¡°ÌØÊ⹦ÄÜ¡±µÄReceiver·Ö·¢£¬¿ÉνÇÉÃîÇÉÃî¡£

ÉÏÊöÂß¼­ÊµÏÖµÄÔ´ÂëÇëµ½Spark 1.4.0 µÄ ReceiverTracker²é¿´¡£

Spark 1.5.0 µÄ launchReceivers() ʵÏÖ

ÆäʵÉÏÃæÕâ¸öʵÏÖ£¬Õâ¸ö³¤Ê±ÔËÐеķַ¢Job»¹´æÔÚһЩÎÊÌ⣺

  • Èç¹ûij¸öTaskʧ°Ü³¬¹ýspark.task.maxFailures(ĬÈÏ=4)´ÎµÄ»°£¬Õû¸öJob¾Í»áʧ°Ü¡£Õâ¸öÔÚ³¤Ê±ÔËÐÐµÄ Spark Streaming ³ÌÐòÀExecutor¶àʧЧ¼¸´Î¾ÍÓпÉÄܵ¼ÖÂTaskʧ°Ü´ïµ½ÉÏÏÞ´ÎÊýÁË¡£
  • Èç¹ûij¸öTaskʧЧһÏ£¬Spark Core µÄTaskScheduler»á½«ÆäÖØв¿Êðµ½ÁíÒ»¸ö executor ÉÏÈ¥ÖØÅÜ¡£µ«ÕâÀïµÄÎÊÌâÔÚÓÚ£¬¸ºÔðÖØÅÜµÄ executor ¿ÉÄÜÊÇÔÚÏ·¢ÖØÅܵÄÄÇÒ»¿ÌÊÇÕýÔÚÖ´ÐÐTaskÊý½ÏÉٵģ¬µ«²»Ò»¶¨Äܹ»½«Receiver·Ö²¼µÄ×î¾ùºâµÄ¡£
  • ÓиöÓû§ code ¿ÉÄÜ»áÏë×Ô¶¨ÒåÒ»¸öReceiverµÄ·Ö²¼²ßÂÔ£¬±ÈÈçËùÓеÄReceiver¶¼²¿Êðµ½Í¬Ò»¸ö½ÚµãÉÏÈ¥¡£

´Ó 1.5.0 ¿ªÊ¼£¬Spark Streaming Ìí¼ÓÁËÔöÇ¿µÄReceiver·Ö·¢²ßÂÔ¡£¶Ô±È֮ǰµÄ°æ±¾£¬Ö÷ÒªµÄ±ä¸üÔÚÓÚ£º

  1. Ìí¼Ó¿É²å°ÎµÄReceiverSchedulingPolicy
  2. °Ñ1¸öJob£¨°üº¬x¸öTask£©£¬¸ÄΪx¸öJob£¨Ã¿¸öJobÖ»°üº¬1¸öTask£©
  3. Ìí¼Ó¶ÔReceiverµÄ¼à¿ØÖØÆô»úÖÆ

ÎÒÃÇÒ»¸öÒ»¸ö¿´Ò»¿´¡£

(1) ¿É²å°ÎµÄ ReceiverSchedulingPolicy

ReceiverSchedulingPolicyµÄÖ÷ҪĿµÄ£¬ÊÇÔÚ Spark Streaming ²ãÃæÌí¼Ó¶ÔReceiverµÄ·Ö·¢Ä¿µÄµØµÄ¼ÆË㣬Ïà¶ÔÓÚ֮ǰ°æ±¾ÒÀÀµ Spark Core µÄTaskScheduler½øÐÐͨÓ÷ַ¢£¬ÐµÄReceiverSchedulingPolicy»á¶Ô Streaming Ó¦ÓõĸüºÃµÄÓïÒåÀí½â£¬Ò²ÄܼÆËã³ö¸üºÃµÄ·Ö·¢²ßÂÔ¡£

ReceiverSchedulingPolicyÓÐÁ½¸ö·½·¨£¬·Ö±ðÓÃÓÚ£º

  • ÔÚ Streaming ³ÌÐòÊ×´ÎÆô¶¯Ê±£º

    • ÊÕ¼¯ËùÓÐInputDStream°üº¬µÄËùÓÐReceiverʵÀý ¡ª¡ªreceivers
    • ÊÕ¼¯ËùÓÐµÄ executor ¡ª¡ªexecutors¡ª¡ª ×÷ΪºòÑ¡Ä¿µÄµØ
    • È»ºó¾Íµ÷ÓÃReceiverSchedulingPolicy.scheduleReceivers(receivers, executors)À´¼ÆËãÿ¸öReceiverµÄÄ¿µÄµØ executor Áбí
  • ÔÚ Streaming ³ÌÐòÔËÐйý³ÌÖУ¬Èç¹ûÐèÒªÖØÆôij¸öReceiver£º

    • ½«Ê×ÏÈ¿´Ò»¿´Ö®Ç°¼ÆËã¹ýµÄÄ¿µÄµØ executor »¹Ã»Óл¹ alive µÄ
    • Èç¹ûûÓУ¬¾ÍÐèÒªReceiverSchedulingPolicy.rescheduleReceiver(receiver, ...)À´ÖØмÆËãÕâ¸öReceiverµÄÄ¿µÄµØ executor Áбí

ĬÈϵÄReceiverSchedulingPolicyÊÇʵÏÖΪround-robinʽµÄÁË¡£ÎÒÃǾÙÀý˵Ã÷ÏÂÕâÁ½¸ö·½·¨£º

image

ÆäÖУ¬ÔÚReceiver yʧЧʱ£¬ÒÔÇ°µÄ Spark Streaming ÓпÉÄÜ»áÔÚ executor 1 ÉÏÖØÆôRecever y£¬¶ø 1.5.0 ÒÔÀ´£¬½«ÔÚ executor 3 ÉÏÖØÆôReceiver y¡£

(2) ÿ¸ö Receiver ·Ö·¢Óе¥¶ÀµÄ Job ¸ºÔð

1.5.0 °æ±¾ÒÔÀ´µÄ Spark Streaming£¬ÊÇΪÿ¸öReceiver¶¼·ÖÅäµ¥¶ÀµÄÖ»ÓÐ 1 ¸öTaskµÄJobÀ´³¢ÊÔ·Ö·¢£¬ÕâÓëÒÔÇ°°æ±¾½«x¸öReceiver¶¼·Åµ½Ò»¸öÓÐx¸öTaskµÄJobÀï·Ö·¢ÊǺܲ»Ò»ÑùµÄ¡£

¶øÇÒ£¬¶ÔÓÚÕâ½öÓеÄÒ»¸öTask£¬Ö»ÔÚµÚ 1 ´ÎÖ´ÐÐʱ£¬²Å³¢ÊÔÆô¶¯Receiver£»Èç¹û¸ÃTaskÒòΪʧЧ¶ø±»µ÷¶Èµ½ÆäËü executor Ö´ÐÐʱ£¬¾Í²»ÔÙ³¢ÊÔÆô¶¯Receiver¡¢Ö»×öÒ»¸ö¿Õ²Ù×÷£¬´Ó¶øµ¼Ö±¾JobµÄ״̬Êdzɹ¦Ö´ÐÐÒÑÍê³É¡£ReceiverTracker»áÁíÍâµ÷ÆðÒ»¸öJob¡ª¡ª ÓпÉÄÜ»áÖØмÆËãReceiverµÄÄ¿µÄµØ ¡ª¡ª À´¼ÌÐø³¢ÊÔReceiver·Ö·¢¡­¡­Èç´ËÖ±µ½³É¹¦ÎªÖ¹¡£

ÁíÍ⣬ÓÉÓÚ Spark Core µÄTaskÏ·¢Ê±Ö»»á²Î¿¼²¢´ó²¿·Öʱºò×ðÖØ Spark Streaming ÉèÖõÄpreferredLocationÄ¿µÄµØÐÅÏ¢£¬»¹ÊÇÓÐÒ»¶¨¿ÉÄܸ÷ַ¢ReceiverµÄJob²¢Ã»ÓÐÔÚÎÒÃÇÏëÒªµ÷¶ÈµÄ executor ÉÏÔËÐС£´Ëʱ£¬ÔÚµÚ 1 ´ÎÖ´ÐÐTaskʱ£¬»áÊ×ÏÈÏòReceiverTracker·¢ËÍRegisterReceiverÏûÏ¢£¬Ö»Óеõ½¿Ï¶¨µÄ´ð¸´Ê±£¬²ÅÕæÕýÆô¶¯Receiver£¬·ñÔò¾Í¼ÌÐø×öÒ»¸ö¿Õ²Ù×÷£¬µ¼Ö±¾JobµÄ״̬Êdzɹ¦Ö´ÐÐÒÑÍê³É¡£µ±È»£¬ReceiverTrackerÒ²»áÁíÍâµ÷ÆðÒ»¸öJob£¬À´¼ÌÐø³¢ÊÔReceiver·Ö·¢¡­¡­Èç´ËÖ±µ½³É¹¦ÎªÖ¹¡£

ÎÒÃÇÓÃͼʾÀ´±í´ïÕâ¸ö¸Ä¶¯£º

image

ËùÒÔͨ¹ýÉÏÃæ¿ÉÒÔ¿´µ½£¬Ò»¸öReceiverµÄ·Ö·¢JobÊÇÓпÉÄÜûÓÐÍê³É·Ö·¢ReceiverµÄÄ¿µÄµÄ£¬ËùÒÔReceiverTracker»á¼ÌÐøÔÙÆðÒ»¸öJobÀ´³¢ÊÔReceiver·Ö·¢¡£Õâ¸ö»úÖƱ£Ö¤ÁË£¬Èç¹ûÒ»´ÎReceiverÈç¹ûûÓеִïÔ¤ÏȼÆËãºÃµÄ executor£¬¾ÍÓлú»áÔٴνøÐзַ¢£¬´Ó¶øʵÏÖÔÚ Spark Streaming ²ãÃæ¶ÔReceiverËùÔÚλÖøüºÃµÄ¿ØÖÆ¡£

(3) ¶ÔReceiverµÄ¼à¿ØÖØÆô»úÖÆ

ÉÏÃæ·ÖÎöÁËÿ¸öReceiver¶¼ÓÐרÃŵÄJobÀ´±£Ö¤·Ö·¢ºó£¬ÎÒÃÇ·¢ÏÖÕâÑùÒ»À´£¬ReceiverµÄʧЧÖØÆô¾Í²»ÊÜspark.task.maxFailures(ĬÈÏ=4)´ÎµÄÏÞÖÆÁË¡£

ÒòΪÏÖÔÚµÄReceiverÖØÊÔ²»ÊÇÔÚTask¼¶±ð£¬¶øÊÇÔÚJob¼¶±ð£»²¢ÇÒReceiverʧЧºó²¢²»»áµ¼ÖÂÇ°Ò»´ÎJobʧ°Ü£¬¶øÊÇÇ°Ò»´ÎJob³É¹¦¡¢²¢ÐÂÆðÒ»¸öJobÔٴνøÐзַ¢¡£ÕâÑùÒ»À´£¬²»¹Ü Spark Streaming ÔËÐж೤ʱ¼ä£¬Receiver×ÜÊDZ£³Ö»îÐԵģ¬²»»áËæ×Å executor µÄ¶ªÊ§¶øµ¼ÖÂReceiverËÀÈ¥¡£

×ܽá

ÎÒÃÇÔÙ¼òµ¥¶Ô±ÈһϠ1.4.0 ºÍ 1.5.0 °æ±¾ÔÚReceiver·Ö·¢ÉϵÄÇø±ð£º

imageimage

ͨ¹ýÒÔÉÏ·ÖÎö£¬ÎÒÃÇ×ܽ᣺

Spark Streaming 1.4.0 Spark Streaming 1.5.0
Receiver »îÐÔ ²»±£Ö¤ÓÀ»î ÎÞÏÞÖØÊÔ¡¢±£Ö¤ÓÀ»î
Receiver ¾ùºâ·Ö·¢ ÎÞ±£Ö¤ round-robin ²ßÂÔ
×Ô¶¨Òå Receiver ·Ö·¢ ºÜ tricky ·½±ã

ÖÂл

±¾ÎÄËù·ÖÎöµÄ 1.5.0 ÒÔÀ´ÔöÇ¿µÄReceiver·Ö·¢²ßÂÔ£¬ÊÇÓÉÖìÊ«ÐÛͬѧǿÊƹ±Ï׸øÉçÇøµÄ£º

¡¾´ó ÖРС¡¿¡¾´òÓ¡¡¿ ¡¾·±Ìå¡¿¡¾Í¶¸å¡¿¡¾Êղء¿ ¡¾ÍƼö¡¿¡¾¾Ù±¨¡¿¡¾ÆÀÂÛ¡¿ ¡¾¹Ø±Õ¡¿ ¡¾·µ»Ø¶¥²¿¡¿
ÉÏһƪ£ºSparkËã×Ó£ºRDD»ù±¾×ª»»²Ù×÷(7)¨.. ÏÂһƪ£ºSparkѧϰÌåϵÕûÀí(»ù´¡Æª¡¢Öм¶..

×îÐÂÎÄÕÂ

ÈÈÃÅÎÄÕÂ

Hot ÎÄÕÂ

Python

C ÓïÑÔ

C++»ù´¡

´óÊý¾Ý»ù´¡

linux±à³Ì»ù´¡

C/C++ÃæÊÔÌâÄ¿