
An Introduction to the Web Service REST APIs in Hadoop
2019-01-28 12:39:52

An Introduction to the Web Service REST APIs in Hadoop YARN

First, what is REST?

REST stands for REpresentational State Transfer. It refers to a set of architectural constraints and principles; an application design that satisfies them is called RESTful.

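So what is the difference between an architecture and a framework?

A framework is essentially a half-finished application: a set of components you pick from to build your own system. Put simply, someone else has built the stage and you perform on it. Frameworks are also usually mature software that keeps being upgraded.

An architecture is what is usually called the software architecture of a system. It is generally described in three parts: components, which describe the computation; connectors, which describe how the components are linked; and configuration, which combines components and connectors into a coherent whole.

Comparing the two, an architecture is a design specification, whereas a framework is program code. An architecture mostly guides the implementation and development of a software system, whereas the primary purpose of a framework is reuse. A framework can therefore have an architecture of its own, which guides the framework's development.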


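REST is not a new technology or language, nor is it a new framework. It is a concept, a style, or a set of constraints: a recommendation to return to what HTTP itself already provides.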

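The basic technologies of the web:

URI (Uniform Resource Identifier)

HTTP (Hypertext Transfer Protocol), with its methods POST, GET, PUT and DELETE

Hypertext (describes the content and state of a resource; any resource can be described in HTML, XML, JSON, or a custom text format)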

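Constraints a REST design should satisfy

1. Every resource has a unique identifier.

2. Resource state is changed using standard methods.

3. Requests and responses are self-describing.

4. A resource can have multiple representations.

5. The service is stateless.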
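To make the constraints a little more concrete, here is a toy sketch in Python. It is not a real web server and has nothing to do with Hadoop: the `handle` function, the `resources` dictionary and the `/apps/1` URI are all made up for illustration. It only shows one URI per resource, state changes through standard verbs, and a handler that keeps no per-client session.

```python
import json

# Hypothetical in-memory store: maps a URI to the state of one resource.
resources = {}


def handle(method, uri, body=None):
    """Apply one self-contained request and return (status, body)."""
    if method == "PUT":
        # Create or replace the resource identified by this URI.
        resources[uri] = body
        return 200, json.dumps(body)
    if method == "GET":
        if uri not in resources:
            return 404, ""
        # JSON is just one possible representation of the resource.
        return 200, json.dumps(resources[uri])
    if method == "DELETE":
        if uri not in resources:
            return 404, ""
        del resources[uri]
        return 204, ""
    # Any other verb is not supported by this toy handler.
    return 405, ""


print(handle("PUT", "/apps/1", {"state": "RUNNING"}))
print(handle("GET", "/apps/1"))
print(handle("DELETE", "/apps/1"))
print(handle("GET", "/apps/1"))
```

Every call carries everything the handler needs (verb, URI, body), which is the point of the stateless and self-describing constraints.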



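Hadoop YARN ships with a set of web service REST APIs through which we can access information about the cluster, nodes, applications, and application history. The URI resources are grouped according to the type of information they return: some return a collection, others return a singleton. The syntax of these web service REST APIs is: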

http://{http address of service}/ws/{version}/{resourcepath}

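Here {http address of service} is the address of the service we want to query; currently the ResourceManager, NodeManager, MapReduce application master, and history server are supported. {version} is the API version; currently only v1 is supported. {resourcepath} is the path that identifies a singleton resource or a collection resource.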
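The URL pattern can be assembled mechanically. The sketch below uses only the Python standard library; `build_ws_url` and `build_request` are hypothetical helper names. The host `host.domain.com:8088` is the one that appears in this article's commands. The resource path `cluster/apps/{appid}` is not spelled out in the text; it follows the documented layout of the ResourceManager API and should be treated as an assumption here. The request is only prepared, not sent.

```python
from urllib.request import Request


def build_ws_url(http_address, resource_path, version="v1"):
    """Assemble http://{http address of service}/ws/{version}/{resourcepath}."""
    return "http://{}/ws/{}/{}".format(
        http_address, version, resource_path.lstrip("/")
    )


def build_request(url):
    """Prepare (but do not send) a GET request that asks for JSON."""
    return Request(url, headers={"Accept": "application/json"}, method="GET")


# Assumed resource path for a single application on the ResourceManager.
url = build_ws_url(
    "host.domain.com:8088",
    "cluster/apps/application_1326821518301_0010",
)
req = build_request(url)
print(req.get_method(), req.full_url)
print(req.get_header("Accept"))
```

To actually issue the call against a live cluster, the prepared request could be passed to `urllib.request.urlopen(req)`.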
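The following example shows how to use these web services.
Suppose you have submitted a job with application ID application_1326821518301_0010 and it is still running. The following command returns some information about it: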

$ curl --compressed -H "Accept: application/json" -X \

The command returns JSON like the following:

{
"app": {
"finishedTime":0,
"trackingUI":"ApplicationMaster",
"state":"RUNNING",
"user":"user1",
"id":"application_1326821518301_0010",
"clusterId":1326821518301,
"finalStatus":"UNDEFINED",
"amHostHttpAddress":"host.domain.com:8042",
"progress":82.44703,
"name":"Sleep job",
"startedTime":1326860715335,
"elapsedTime":31814,
"diagnostics":"",
"queue":"a1"
}
}
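A response like this is easy to consume programmatically. The sketch below parses a trimmed copy of the JSON above with Python's `json` module; `summarize_app` is a hypothetical helper. Treating `finishedTime == 0` as "not finished yet" is an inference from the sample data, not a documented rule.

```python
import json

# Trimmed copy of the "app" response shown above.
response_text = """
{"app": {"finishedTime": 0, "state": "RUNNING",
         "id": "application_1326821518301_0010",
         "finalStatus": "UNDEFINED", "progress": 82.44703,
         "startedTime": 1326860715335, "elapsedTime": 31814}}
"""


def summarize_app(text):
    """Pick out the fields most useful for a quick status check."""
    app = json.loads(text)["app"]  # singleton resource: one "app" object
    return {
        "id": app["id"],
        "state": app["state"],
        "finished": app["finishedTime"] != 0,  # assumption: 0 means still running
        "progress": round(app["progress"], 1),
    }


summary = summarize_app(response_text)
print(summary)
```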

With this information you can dig further into application_1326821518301_0010. For example, following the trackingUrl field of the app resource through the ResourceManager gives more detail:

$ curl --compressed -H "Accept: application/json" -X \
Output:
{
"jobs": {
"job": [
{
"runningReduceAttempts":1,
"reduceProgress":72.104515,
"failedReduceAttempts":0,
"newMapAttempts":0,
"mapsRunning":0,
"state":"RUNNING",
"successfulReduceAttempts":0,
"reducesRunning":1,
"acls": [
{
"value":" ",
"name":"mapreduce.job.acl-modify-job"
},
{
"value":" ",
"name":"mapreduce.job.acl-view-job"
}
],
"reducesPending":0,
"user":"user1",
"reducesTotal":1,
"mapsCompleted":1,
"startTime":1326860720902,
"id":"job_1326821518301_10_10",
"successfulMapAttempts":1,
"runningMapAttempts":0,
"newReduceAttempts":0,
"name":"Sleep job",
"mapsPending":0,
"elapsedTime":64432,
"reducesCompleted":0,
"mapProgress":100,
"diagnostics":"",
"failedMapAttempts":0,
"killedReduceAttempts":0,
"mapsTotal":1,
"uberized":false,
"killedMapAttempts":0,
"finishTime":0
}
]
}
}
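This response is a collection resource: the `jobs` object wraps a `job` list, even when the list has a single element. A minimal Python sketch for walking it follows; `job_rows` is a hypothetical helper and the input is a trimmed copy of the JSON above.

```python
import json

# Trimmed copy of the "jobs" response shown above.
jobs_text = """
{"jobs": {"job": [
  {"id": "job_1326821518301_10_10", "state": "RUNNING",
   "mapsTotal": 1, "mapsCompleted": 1,
   "reducesTotal": 1, "reducesCompleted": 0,
   "mapProgress": 100, "reduceProgress": 72.104515}
]}}
"""


def job_rows(text):
    """Return (id, state, maps done/total, reduces done/total) per job."""
    rows = []
    for job in json.loads(text)["jobs"]["job"]:
        maps = "{}/{}".format(job["mapsCompleted"], job["mapsTotal"])
        reduces = "{}/{}".format(job["reducesCompleted"], job["reducesTotal"])
        rows.append((job["id"], job["state"], maps, reduces))
    return rows


for row in job_rows(jobs_text):
    print(row)
```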

To get task information for the job above, whose job ID is job_1326821518301_10_10, run:

$ curl --compressed -H "Accept: application/json" -X \
Output:
{
"tasks": {
"task": [
{
"progress":100,
"elapsedTime":5059,
"state":"SUCCEEDED",
"startTime":1326860725014,
"id":"task_1326821518301_10_10_m_0",
"type":"MAP",
"successfulAttempt":"attempt_1326821518301_10_10_m_0_0",
"finishTime":1326860730073
},
{
"progress":72.104515,
"elapsedTime":0,
"state":"RUNNING",
"startTime":1326860732984,
"id":"task_1326821518301_10_10_r_0",
"type":"REDUCE",
"successfulAttempt":"",
"finishTime":0
}
]
}
}
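The task list can be filtered to find the tasks that still need attention. The Python sketch below picks out every task that has not succeeded and builds the URL of its attempts resource. `unfinished_attempt_urls` is a hypothetical helper, and `JOB_URL` is the proxy URL prefix used by this article's own curl commands.

```python
import json

# Trimmed copy of the "tasks" response shown above.
tasks_text = """
{"tasks": {"task": [
  {"id": "task_1326821518301_10_10_m_0", "type": "MAP",
   "state": "SUCCEEDED", "progress": 100},
  {"id": "task_1326821518301_10_10_r_0", "type": "REDUCE",
   "state": "RUNNING", "progress": 72.104515}
]}}
"""

JOB_URL = (
    "http://host.domain.com:8088/proxy/application_1326821518301_0010"
    "/ws/v1/mapreduce/jobs/job_1326821518301_10_10"
)


def unfinished_attempt_urls(text, job_url):
    """Build the attempts URL for every task that has not succeeded."""
    tasks = json.loads(text)["tasks"]["task"]
    return [
        "{}/tasks/{}/attempts".format(job_url, task["id"])
        for task in tasks
        if task["state"] != "SUCCEEDED"
    ]


for attempts_url in unfinished_attempt_urls(tasks_text, JOB_URL):
    print(attempts_url)
```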

As the output shows, the map task has completed but the reduce task is still running. To look at the attempts of task task_1326821518301_10_10_r_0, use the following command:

$ curl --compressed -X GET \
"http://host.domain.com:8088/proxy/application_1326821518301_0010/ws/v1/mapreduce/jobs/job_1326821518301_10_10/tasks/task_1326821518301_10_10_r_0/attempts"
Output:
{
"taskAttempts": {
"taskAttempt": [
{
"elapsedMergeTime":158,
"shuffleFinishTime":1326860735378,
"assignedContainerId":"container_1326821518301_0010_01_000003",
"progress":72.104515,
"elapsedTime":0,
"state":"RUNNING",
"elapsedShuffleTime":2394,
"mergeFinishTime":1326860735536,
"rack":"/10.10.10.0",
"elapsedReduceTime":0,
"nodeHttpAddress":"host.domain.com:8042",
"type":"REDUCE",
"startTime":1326860732984,
"id":"attempt_1326821518301_10_10_r_0_0",
"finishTime":0
}
]
}
}
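The timing fields in this response are consistent with each other, which helps when reading them. In the sample, the elapsed shuffle and merge times are simply differences between the millisecond timestamps. The Python check below uses only values copied from the output above; reading the fields this way is inferred from the sample, not from a specification.

```python
# Values copied from the task attempt shown above (all in milliseconds).
attempt = {
    "startTime": 1326860732984,
    "shuffleFinishTime": 1326860735378,
    "mergeFinishTime": 1326860735536,
    "elapsedShuffleTime": 2394,
    "elapsedMergeTime": 158,
}

# Shuffle runs from the start of the attempt until shuffleFinishTime.
shuffle_ms = attempt["shuffleFinishTime"] - attempt["startTime"]
# Merge runs from the end of the shuffle until mergeFinishTime.
merge_ms = attempt["mergeFinishTime"] - attempt["shuffleFinishTime"]

print(shuffle_ms, merge_ms)  # 2394 158
print(shuffle_ms == attempt["elapsedShuffleTime"])  # True
print(merge_ms == attempt["elapsedMergeTime"])  # True
```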

The reduce attempt is still running. To see the current counter values of that attempt, use:

$ curl --compressed -H "Accept: application/json" -X GET \
"http://host.domain.com:8088/proxy/application_1326821518301_0010/ws/v1/mapreduce/jobs/job_1326821518301_10_10/tasks/task_1326821518301_10_10_r_0/attempts/attempt_1326821518301_10_10_r_0_0/counters"
Output:
{
"JobTaskAttemptCounters": {
"taskAttemptCounterGroup": [
{
"counterGroupName":"org.apache.hadoop.mapreduce.FileSystemCounter",
"counter": [
{
"value":4216,
"name":"FILE_BYTES_READ"
},
{
"value":77151,
"name":"FILE_BYTES_WRITTEN"
},
{
"value":0,
"name":"FILE_READ_OPS"
},
{
"value":0,
"name":"FILE_LARGE_READ_OPS"
},
{
"value":0,
"name":"FILE_WRITE_OPS"
},
{
"value":0,
"name":"HDFS_BYTES_READ"
},
{
"value":0,
"name":"HDFS_BYTES_WRITTEN"
},
{
"value":0,
"name":"HDFS_READ_OPS"
},
{
"value":0,
"name":"HDFS_LARGE_READ_OPS"
},
{
"value":0,
"name":"HDFS_WRITE_OPS"
}
]
},
{
"counterGroupName":"org.apache.hadoop.mapreduce.TaskCounter",
"counter": [
{
"value":0,
"name":"COMBINE_INPUT_RECORDS"
},
{
"value":0,
"name":"COMBINE_OUTPUT_RECORDS"
},
{
"value":1767,
"name":"REDUCE_INPUT_GROUPS"
},
{
"value":25104,
"name":"REDUCE_SHUFFLE_BYTES"
},
{
"value":1767,
"name":"REDUCE_INPUT_RECORDS"
},
{
"value":0,
"name":"REDUCE_OUTPUT_RECORDS"
},
{
"value":0,
"name":"SPILLED_RECORDS"
},
{
"value":1,
"name":"SHUFFLED_MAPS"
},
{
"value":0,
"name":"FAILED_SHUFFLE"
},
{
"value":1,
"name":"MERGED_MAP_OUTPUTS"
},
{
"value":50,
"name":"GC_TIME_MILLIS"
},
{
"value":1580,
"name":"CPU_MILLISECONDS"
},
{
"value":141320192,
"name":"PHYSICAL_MEMORY_BYTES"
},
{
"value":1118552064,
"name":"VIRTUAL_MEMORY_BYTES"
},
{
"value":73728000,
"name":"COMMITTED_HEAP_BYTES"
}
]
},
{
"counterGroupName":"Shuffle Errors",
"counter": [
{
"value":0,
"name":"BAD_ID"
},
{
"value":0,
"name":"CONNECTION"
},
{
"value":0,
"name":"IO_ERROR"
},
{
"value":0,
"name":"WRONG_LENGTH"
},
{
"value":0,
"name":"WRONG_MAP"
},
{
"value":0,
"name":"WRONG_REDUCE"
}
]
},
{
"counterGroupName":"org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter",
"counter": [
{
"value":0,
"name":"BYTES_WRITTEN"
}
]
}
],
"id":"attempt_1326821518301_10_10_r_0_0"
}
}
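Most counters in a response like this are zero, so a common first step is to flatten the groups and keep only the non-zero values. The Python sketch below does that on a trimmed copy of the output above; `nonzero_counters` is a hypothetical helper, and it assumes counter names are unique across groups, which holds for this sample.

```python
import json

# Trimmed copy of the counters response shown above.
counters_text = """
{"JobTaskAttemptCounters": {
  "taskAttemptCounterGroup": [
    {"counterGroupName": "org.apache.hadoop.mapreduce.FileSystemCounter",
     "counter": [{"value": 4216, "name": "FILE_BYTES_READ"},
                 {"value": 77151, "name": "FILE_BYTES_WRITTEN"},
                 {"value": 0, "name": "HDFS_BYTES_READ"}]},
    {"counterGroupName": "Shuffle Errors",
     "counter": [{"value": 0, "name": "IO_ERROR"}]}
  ],
  "id": "attempt_1326821518301_10_10_r_0_0"}}
"""


def nonzero_counters(text):
    """Flatten all counter groups into {name: value}, dropping zeros."""
    root = json.loads(text)["JobTaskAttemptCounters"]
    result = {}
    for group in root["taskAttemptCounterGroup"]:
        for counter in group["counter"]:
            if counter["value"] != 0:
                result[counter["name"]] = counter["value"]
    return result


print(nonzero_counters(counters_text))
```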

Once the job has completed, its information can be retrieved from the history server with:

$ curl --compressed -X GET \
Output:
{
"job": {
"avgReduceTime":1250784,
"failedReduceAttempts":0,
"state":"SUCCEEDED",
"successfulReduceAttempts":1,
"acls": [
{
"value":" ",
"name":"mapreduce.job.acl-modify-job"
},
{
"value":" ",
"name":"mapreduce.job.acl-view-job"
}
],
"user":"user1",
"reducesTotal":1,
"mapsCompleted":1,
"startTime":1326860720902,
"id":"job_1326821518301_10_10",
"avgMapTime":5059,
"successfulMapAttempts":1,
"name":"Sleep job",
"avgShuffleTime":2394,
"reducesCompleted":1,
"diagnostics":"",
"failedMapAttempts":0,
"avgMergeTime":2552,
"killedReduceAttempts":0,
"mapsTotal":1,
"queue":"a1",
"uberized":false,
"killedMapAttempts":0,
"finishTime":1326861986164
}
}
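The times in the history record are in milliseconds, which is hard to read at a glance. The Python sketch below turns them into minutes and seconds using values copied from the output above; `format_ms` is a hypothetical helper.

```python
# Values copied from the history server response above (milliseconds).
job = {
    "startTime": 1326860720902,
    "finishTime": 1326861986164,
    "avgMapTime": 5059,
    "avgShuffleTime": 2394,
    "avgMergeTime": 2552,
    "avgReduceTime": 1250784,
}


def format_ms(ms):
    """Render a millisecond duration as 'Xm Ys' using integer arithmetic."""
    minutes, seconds = divmod(ms // 1000, 60)
    return "{}m {}s".format(minutes, seconds)


# Wall-clock duration of the whole job.
wall_clock_ms = job["finishTime"] - job["startTime"]
print(wall_clock_ms, format_ms(wall_clock_ms))  # 1265262 21m 5s

for key in ("avgMapTime", "avgShuffleTime", "avgMergeTime", "avgReduceTime"):
    print(key, format_ms(job[key]))
```

In this sample almost all of the wall-clock time is spent in the reduce phase, which fits a "Sleep job" with a single reducer.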

The final information about the application can also be retrieved from the ResourceManager:

$ curl --compressed -H "Accept: application/json" -X GET \
Output:
{
"app": {
"finishedTime":1326861991282,
"trackingUI":"History",
"state":"FINISHED",
"user":"user1",
"id":"application_1326821518301_0010",
"clusterId":1326821518301,
"finalStatus":"SUCCEEDED",
"amHostHttpAddress":"host.domain.com:8042",
"progress":100,
"name":"Sleep job",
"startedTime":1326860715335,
"elapsedTime":1275947,
"diagnostics":"",
"queue":"a1"
}
}
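Comparing this final record with the earlier RUNNING one suggests a simple completion check for a polling script. The Python sketch below is one way to write it; `is_done` and `succeeded` are hypothetical helpers. FINISHED, FAILED and KILLED are used as the terminal YARN application states, and the two dictionaries are trimmed copies of the app responses in this article.

```python
# YARN application states after which the state no longer changes.
TERMINAL_STATES = {"FINISHED", "FAILED", "KILLED"}


def is_done(app):
    """True once the application has reached a terminal state."""
    return app["state"] in TERMINAL_STATES


def succeeded(app):
    """True only if the application is done and reported success."""
    return is_done(app) and app["finalStatus"] == "SUCCEEDED"


running_app = {"state": "RUNNING", "finalStatus": "UNDEFINED",
               "startedTime": 1326860715335, "finishedTime": 0,
               "elapsedTime": 31814}
finished_app = {"state": "FINISHED", "finalStatus": "SUCCEEDED",
                "startedTime": 1326860715335, "finishedTime": 1326861991282,
                "elapsedTime": 1275947}

print(is_done(running_app), succeeded(running_app))    # False False
print(is_done(finished_app), succeeded(finished_app))  # True True

# For the finished app, elapsedTime equals finishedTime - startedTime.
elapsed_ms = finished_app["finishedTime"] - finished_app["startedTime"]
print(elapsed_ms == finished_app["elapsedTime"])  # True
```

Note that state and finalStatus are separate fields: an application can reach FINISHED while its finalStatus is something other than SUCCEEDED, so a script should check both.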
