Using TensorFlowOnSpark

2018-12-13 18:44:54

Tags: Using TensorFlowOnSpark

For cluster setup, see the previous article.

1. Export the environment variables and set the Python path

export PYTHON_ROOT=/data/Python

export PYSPARK_PYTHON=${PYTHON_ROOT}/bin/python

export SPARK_YARN_USER_ENV="PYSPARK_PYTHON=Python/bin/python"
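
Before submitting, it can help to confirm that the interpreter these variables point to actually exists. The following is a minimal sketch, assuming a hypothetical helper named check_python (it is not part of Spark or TensorFlowOnSpark) that takes the Python root directory as its only argument:

```shell
#!/usr/bin/env bash
# check_python is a hypothetical helper, not part of Spark or TensorFlowOnSpark.
# It succeeds only if <root>/bin/python exists and is executable.
check_python() {
    local root="$1"
    if [ -x "${root}/bin/python" ]; then
        echo "ok: ${root}/bin/python"
    else
        echo "missing or not executable: ${root}/bin/python" >&2
        return 1
    fi
}

# Usage with the value exported above (adjust to your own layout):
# check_python "${PYTHON_ROOT:-/data/Python}"
```

This only checks the local copy on the submitting machine; the executors use the Python shipped in the archive passed to spark-submit.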

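2. Submit the job

Permission problems are common; check them one layer at a time.
hdfs dfs -chmod 777 /user/hdfs

hdfs dfs -ls /user/hdfs

hdfs dfs -mkdir /user/hdfs/mnist_model
chown -R hdfs:hdfs /data/TensorFlowOnSpark

The output directory is created by YARN, so make sure the path has execute permission as well as read and write permission.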
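
Checking "one layer at a time" means looking at every directory on the way down to the output path. A small sketch of that, assuming a hypothetical helper named list_ancestors that only does string handling and prints each level of an absolute path:

```shell
#!/usr/bin/env bash
# list_ancestors is a hypothetical helper: it prints every ancestor of an
# absolute path, from the top level down to the path itself, one per line.
list_ancestors() {
    local rest="${1#/}"   # strip the leading slash
    rest="${rest%/}"      # strip a trailing slash, if any
    local current=""
    while [ -n "$rest" ]; do
        local part="${rest%%/*}"      # first remaining path component
        current="${current}/${part}"
        echo "$current"
        case "$rest" in
            */*) rest="${rest#*/}" ;; # more components remain
            *)   rest="" ;;           # that was the last one
        esac
    done
}

# Usage: show the permissions of each level on HDFS
# for dir in $(list_ancestors /user/hdfs/mnist_model); do
#     hdfs dfs -ls -d "$dir"
# done
```

Every level needs execute permission for the user the job runs as, otherwise the deeper directories cannot be reached regardless of their own mode.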

spark-submit --master yarn --deploy-mode cluster --num-executors 3 --executor-memory 2g \
--queue default \
--py-files TensorFlowOnSpark/tfspark.zip,TensorFlowOnSpark/examples/mnist/tf/mnist_dist.py \
--conf spark.dynamicAllocation.enabled=false --conf spark.yarn.maxAppAttempts=1 \
--archives hdfs:///user/${USER}/Python.zip#Python \
--conf spark.executorEnv.LD_LIBRARY_PATH="/data/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib64:/data/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib64/libhdfs.so:/usr/java/jdk1.8.0_131/jre/lib/amd64/server/" \
TensorFlowOnSpark/examples/mnist/tf/mnist_spark.py \
--images mnist/tfr/train \
--format tfr \
--mode train \
--model mnist_model

spark-submit --master yarn --deploy-mode cluster --queue default \
--num-executors 3 \
--executor-memory 3g \
--py-files /data/TensorFlowOnSpark/tfspark.zip,/data/TensorFlowOnSpark/examples/mnist/tf/mnist_dist.py \
--conf spark.dynamicAllocation.enabled=false \
--conf spark.yarn.maxAppAttempts=1 \
--archives hdfs:///user/${USER}/Python.zip#Python \
--conf spark.executorEnv.LD_LIBRARY_PATH="/data/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib64:/usr/java/jdk1.8.0_131/jre/lib/amd64/server/" \
/data/TensorFlowOnSpark/examples/mnist/tf/mnist_spark.py --images mnist/tfr/test --mode inference \
--model mnist_model \
-o predictions2
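
The long LD_LIBRARY_PATH value is easy to mistype when it is repeated in each command. One possible way to build it once is sketched below; join_lib_dirs, CDH_ROOT, JVM_LIB and LIB_PATH are illustrative names introduced here, and the paths are the ones used in the commands above:

```shell
#!/usr/bin/env bash
# join_lib_dirs is a hypothetical helper: it joins its arguments with ':'
# to form a value suitable for LD_LIBRARY_PATH.
join_lib_dirs() {
    local IFS=':'
    echo "$*"   # "$*" joins the arguments with the first character of IFS
}

# Illustrative variable names; adjust the paths to your own cluster.
CDH_ROOT="/data/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29"
JVM_LIB="/usr/java/jdk1.8.0_131/jre/lib/amd64/server"
LIB_PATH="$(join_lib_dirs "${CDH_ROOT}/lib64" "${JVM_LIB}")"

# Then pass it to spark-submit in place of the literal value:
#   --conf spark.executorEnv.LD_LIBRARY_PATH="${LIB_PATH}"
```

Only directories belong in this list, since the dynamic loader searches each entry for the library file by name.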

Platform errors

NotFoundError: /data/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/hadoop-hdfs/lib/native/libhdfs.so: cannot open shared object file: No such file or directory

The library is expected at the path in the error message, but on this cluster it lives under lib64. Creating the missing directory and a symlink on every node resolves it:

ansible test_hadoop -m shell -a "mkdir -p /data/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/hadoop-hdfs/lib/native/"

ansible test_hadoop -m shell -a "ln -s /data/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib64/libhdfs.so /data/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/hadoop-hdfs/lib/native/libhdfs.so"
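
Running ln -s a second time fails because the link already exists. If the fix may be applied more than once, a guarded version could look like the sketch below; ensure_link is a hypothetical helper written for this illustration, not an ansible or Hadoop command:

```shell
#!/usr/bin/env bash
# ensure_link is a hypothetical helper: it creates the parent directory of
# <link> if needed and points <link> at <target>, unless <link> already exists.
ensure_link() {
    local target="$1" link="$2"
    if [ -e "$link" ] || [ -L "$link" ]; then
        echo "exists: $link"
        return 0
    fi
    mkdir -p "$(dirname "$link")"
    ln -s "$target" "$link"
    echo "linked: $link -> $target"
}

# Usage on one node, with the paths from the commands above:
# CDH=/data/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29
# ensure_link "$CDH/lib64/libhdfs.so" "$CDH/lib/hadoop-hdfs/lib/native/libhdfs.so"
```

The function would still need to be distributed to each node, for example as a small script, before it can be run across the cluster.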

