This is a short summary of problems I ran into while running Spark, for reference only. I no longer remember the exact sources I consulted at the time; if you need more detail, search for the specific error message.
1. Error when running on a YARN cluster:
Failed to send RPC 5111091680910991783 to /192.168.xxx.xxxx:49208: java.nio.channels.ClosedChannelException
Solution: add the following to yarn-site.xml:
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
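Disabling the pmem/vmem checks hides the symptom rather than curing the underlying memory pressure. A gentler alternative (my suggestion, not from the original post) is to give each executor more off-heap headroom when submitting; the jar name and the 1024 MB value below are placeholders:

```shell
# Sketch: raise per-executor off-heap headroom instead of disabling
# YARN's memory checks. spark.yarn.executor.memoryOverhead is the
# Spark 2.2-era property name; 1024 (MB) is an illustrative value.
spark-submit \
  --master yarn \
  --conf spark.yarn.executor.memoryOverhead=1024 \
  your_app.jar
```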
2. Hadoop warning:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Solution: add the following to hadoop-env.sh:
export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib/native"
3. YARN warning:
Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
Solution:
Create a directory on HDFS to hold Spark's jars: hdfs dfs -mkdir -p /jars/spark_jars
Upload Spark's jars: hdfs dfs -put /home/bin/spark/spark-2.2.0-bin-without-hadoop/jars/* /jars/spark_jars/
Add this line to conf/spark-defaults.conf: spark.yarn.jars=hdfs://master:9000/jars/spark_jars/*
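The warning itself also mentions spark.yarn.archive; as an alternative sketch (not from the original post), the jars can be bundled into a single zip so YARN distributes one file instead of hundreds of jars:

```shell
# Alternative to spark.yarn.jars: ship one archive instead of many jars.
cd /home/bin/spark/spark-2.2.0-bin-without-hadoop/jars
zip -q spark-libs.zip ./*.jar     # jar files must sit at the archive root
hdfs dfs -put spark-libs.zip /jars/
# Then in conf/spark-defaults.conf:
#   spark.yarn.archive=hdfs://master:9000/jars/spark-libs.zip
```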
4. YARN error:
Attempted to request executors before the AM has registered!
Solution:
The spark.yarn.jars entry in spark-defaults.conf was wrong. Correct the line from problem 3 above, replacing the hostname with the actual NameNode hostname:
spark.yarn.jars=hdfs://master:9000/jars/spark_jars/* → spark.yarn.jars=hdfs://SparkMaster:9000/jars/spark_jars/*
5. YARN warning:
Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
Solution:
When reading files with many fields, exceeding the default limit of 25 triggers this warning. It can be handled by adding one line to spark-defaults.conf:
spark.debug.maxToStringFields=100
Restart the slaves and enter spark-shell again; the warning is gone.
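If you would rather not change the cluster-wide default, the same setting can be passed for a single session instead:

```shell
# Per-session override; no edit to spark-defaults.conf or restart needed.
spark-shell --conf spark.debug.maxToStringFields=100
```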
6. Hadoop error:
Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Solution:
Hadoop failed to load its native libraries.
Method 1: in Hadoop's core-site.xml, control whether native libraries are used:
<property>
  <name>hadoop.native.lib</name>
  <value>true</value>
  <description>Should native hadoop libraries, if present, be used.</description>
</property>
Method 2: add the following to hadoop-env.sh:
export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib/native"
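Either way, Hadoop's built-in check will show whether the native libraries are actually being picked up:

```shell
# Lists each native component (hadoop, zlib, snappy, ...) and whether it loaded.
hadoop checknative -a
```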
7. Spark error:
No module named "XXX"
Solution:
A Spark job can depend on two kinds of packages: files belonging to the project itself, and third-party packages.
Project files are submitted via the spark.files parameter, e.g. spark.files=ClusterAPI.py,DataFormateAPI.py (they must be on the PYTHONPATH).
For third-party packages, install the dependencies on every node and add the package path to the environment variables.
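For the project-file case, an equivalent sketch uses spark-submit's --py-files flag; the file names are the ones from the original post, and main.py stands in for the driver script:

```shell
# Ship the listed .py files to the driver and every executor,
# putting them on the Python path automatically.
spark-submit \
  --master yarn \
  --py-files ClusterAPI.py,DataFormateAPI.py \
  main.py
```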
8. Hadoop error:
java.io.EOFException: End of File Exception between local host is: "SparkWorker2/192.168.xx.xxx"; destination host is: "SparkMaster":9000; : java.io.EOFException;
Solution:
Reformat the NameNode: hadoop namenode -format
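Formatting the NameNode destroys existing HDFS metadata, so treat it as a last resort. A cautious sketch of the full sequence, assuming the standard sbin scripts are on the PATH:

```shell
# WARNING: this wipes HDFS metadata; back up anything you need first.
stop-dfs.sh                 # stop the HDFS daemons
hadoop namenode -format     # re-initialize the NameNode metadata
start-dfs.sh                # restart HDFS
```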
These are some of the problems encountered while running Spark previously.
---------------------
ref: https://blog.csdn.net/xiaoQL520/article/details/79538277