Using TensorFlowOnSpark
2018-12-13 18:44:54
Tags: TensorFlowOnSpark usage
Copyright notice: this is an original article by the author and may not be reproduced without the author's permission. https://blog.csdn.net/mtj66/article/details/79094518

For setup, see the previous article.

1. Export the environment variables that specify the Python path

export PYTHON_ROOT=/data/Python

export PYSPARK_PYTHON=${PYTHON_ROOT}/bin/python

export SPARK_YARN_USER_ENV="PYSPARK_PYTHON=Python/bin/python"
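
Before exporting these variables it can help to confirm that the interpreter really is where PYTHON_ROOT points. The following is a minimal sketch, not part of TensorFlowOnSpark; the function name check_python_root is hypothetical.

```shell
# Hypothetical helper: succeed only if <root>/bin/python exists and is executable.
check_python_root() {
    local root="$1"
    if [ -x "${root}/bin/python" ]; then
        echo "ok: ${root}/bin/python"
    else
        echo "missing or not executable: ${root}/bin/python" >&2
        return 1
    fi
}

# Usage (path taken from the exports above):
#   check_python_root /data/Python && export PYTHON_ROOT=/data/Python
```

This only checks the submitting host. On the executors, PYSPARK_PYTHON resolves inside the unpacked Python archive instead.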

2. Submit the job

Permission problems are common; troubleshoot them layer by layer.
hdfs dfs -chmod 777 /user/hdfs

hdfs dfs -ls /user/hdfs

hdfs dfs -mkdir /user/hdfs/mnist_model
chown -R hdfs:hdfs /data/TensorFlowOnSpark

Because the output directory is created by YARN, make sure the path has execute as well as read and write permissions.
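
One way to check a directory after changing its mode is to read the permission column that hdfs dfs -ls prints. The sketch below assumes the usual ten-character form such as drwxrwxrwx; the function name others_have_rwx is hypothetical.

```shell
# Hypothetical helper: given the permission column printed by `hdfs dfs -ls`
# (for example "drwxrwxrwx"), succeed only if "other" users have r, w and x.
# Layout: 1 type character, 3 owner, 3 group, 3 other.
others_have_rwx() {
    case "$1" in
        ???????rwx*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a cluster node (-d lists the directory itself, not its contents):
#   perms=$(hdfs dfs -ls -d /user/hdfs | awk '{print $1}')
#   others_have_rwx "$perms" || echo "/user/hdfs is not open to other users"
```

Mode 777 passes this check, while 766 does not, because it drops the execute bit needed to traverse the directory.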

spark-submit --master yarn --deploy-mode cluster --num-executors 3 --executor-memory 2g \
--queue default \
--py-files TensorFlowOnSpark/tfspark.zip,TensorFlowOnSpark/examples/mnist/tf/mnist_dist.py \
--conf spark.dynamicAllocation.enabled=false --conf spark.yarn.maxAppAttempts=1 \
--archives hdfs:///user/${USER}/Python.zip#Python \
--conf spark.executorEnv.LD_LIBRARY_PATH="/data/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib64:/data/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib64/libhdfs.so:/usr/java/jdk1.8.0_131/jre/lib/amd64/server/" \
TensorFlowOnSpark/examples/mnist/tf/mnist_spark.py \
--images mnist/tfr/train \
--format tfr \
--mode train \
--model mnist_model

spark-submit --master yarn --deploy-mode cluster --queue default \
--num-executors 3 \
--executor-memory 3g \
--py-files /data/TensorFlowOnSpark/tfspark.zip,/data/TensorFlowOnSpark/examples/mnist/tf/mnist_dist.py \
--conf spark.dynamicAllocation.enabled=false \
--conf spark.yarn.maxAppAttempts=1 \
--archives hdfs:///user/${USER}/Python.zip#Python \
--conf spark.executorEnv.LD_LIBRARY_PATH="/data/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib64:/usr/java/jdk1.8.0_131/jre/lib/amd64/server/" \
/data/TensorFlowOnSpark/examples/mnist/tf/mnist_spark.py --images mnist/tfr/test --mode inference \
--model mnist_model \
-o predictions2
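
The training and inference commands share most of their options, so they can be wrapped once and reused. This is only a sketch under assumptions: the function tfos_submit and the variables TFOS_HOME, EXECUTOR_MEMORY and DRY_RUN are names invented here, and the ${USER:-hdfs} fallback assumes the job is submitted as the hdfs user.

```shell
# Hypothetical wrapper (not part of TensorFlowOnSpark): holds the options shared
# by the training and inference runs, then appends the per-run arguments.
tfos_submit() {
    local tfos_home="${TFOS_HOME:-/data/TensorFlowOnSpark}"
    local lib_path="/data/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib64:/usr/java/jdk1.8.0_131/jre/lib/amd64/server/"
    # With DRY_RUN set to a non-empty value, the command is printed, not executed.
    ${DRY_RUN:+echo} spark-submit --master yarn --deploy-mode cluster --queue default \
        --num-executors 3 \
        --executor-memory "${EXECUTOR_MEMORY:-2g}" \
        --py-files "${tfos_home}/tfspark.zip,${tfos_home}/examples/mnist/tf/mnist_dist.py" \
        --conf spark.dynamicAllocation.enabled=false \
        --conf spark.yarn.maxAppAttempts=1 \
        --archives "hdfs:///user/${USER:-hdfs}/Python.zip#Python" \
        --conf "spark.executorEnv.LD_LIBRARY_PATH=${lib_path}" \
        "${tfos_home}/examples/mnist/tf/mnist_spark.py" "$@"
}

# Usage:
#   tfos_submit --images mnist/tfr/train --format tfr --mode train --model mnist_model
#   EXECUTOR_MEMORY=3g tfos_submit --images mnist/tfr/test --mode inference \
#       --model mnist_model -o predictions2
```

Running with DRY_RUN=1 first shows the assembled command line without contacting YARN.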

Notes

/usr/java/jdk1.8.0_131/jre/lib/amd64/server/libjvm.so

/data/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib64/libhdfs.so
/data/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/hadoop-hdfs/lib/native/libhdfs.so

The TensorFlowOnSpark framework dependencies, plus the mnist_dist.py needed to run MNIST in distributed mode:

--py-files /data/TensorFlowOnSpark/tfspark.zip,/data/TensorFlowOnSpark/examples/mnist/tf/mnist_dist.py \

Specifies the path of the prebuilt Python. Here USER is hdfs; the environment variable is already set once you switch to the hdfs user.

--archives hdfs:///user/${USER}/Python.zip#Python

Specifies the libraries required for HDFS file operations:

--conf spark.executorEnv.LD_LIBRARY_PATH="/data/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib64:/usr/java/jdk1.8.0_131/jre/lib/amd64/server/" \
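
LD_LIBRARY_PATH is a colon-separated list of directories, so an entry that points at a file (for example a .so itself) is not searched. A small sketch for checking a candidate value on a node follows; the function name check_lib_path is hypothetical.

```shell
# Hypothetical helper: print every entry of a colon-separated library path
# that is not an existing directory; fail if any such entry was found.
check_lib_path() {
    local bad=0 entry
    local IFS=:
    for entry in $1; do
        if [ ! -d "$entry" ]; then
            echo "not a directory: $entry"
            bad=1
        fi
    done
    return "$bad"
}

# Usage:
#   check_lib_path "/data/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib64:/usr/java/jdk1.8.0_131/jre/lib/amd64/server/"
```

The check is local to the host where it runs, so it would need to be repeated on every node that hosts executors.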

Platform error

NotFoundError: /data/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/hadoop-hdfs/lib/native/libhdfs.so: cannot open shared object file: No such file or directory

ansible test_hadoop -m shell -a "mkdir -p /data/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/hadoop-hdfs/lib/native/"

ansible test_hadoop -m shell -a "ln -s /data/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib64/libhdfs.so /data/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/hadoop-hdfs/lib/native/libhdfs.so"
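
A plain ln -s fails if the link already exists, which gets in the way when the fix is rerun. The sketch below shows a rerunnable variant of the same two steps for a single node; the function name link_lib is hypothetical.

```shell
# Hypothetical helper: create the directory for <link> if needed, then point
# <link> at <target>, replacing an existing symlink so reruns are harmless.
link_lib() {
    local target="$1" link="$2"
    if [ ! -e "$target" ]; then
        echo "target does not exist: $target" >&2
        return 1
    fi
    mkdir -p "$(dirname "$link")"
    ln -sfn "$target" "$link"
}

# Usage (paths from the error message above):
#   cdh=/data/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29
#   link_lib "$cdh/lib64/libhdfs.so" "$cdh/lib/hadoop-hdfs/lib/native/libhdfs.so"
```

Refusing to link to a missing target avoids leaving a dangling symlink that would reproduce the same NotFoundError.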
