spark-submit: --packages vs --jars

java scala cassandra apache-spark

2022-09-02 09:20:50

Can someone explain the difference between --packages and --jars in a spark-submit script?

nohup ./bin/spark-submit   --jars ./xxx/extrajars/stanford-corenlp-3.8.0.jar,./xxx/extrajars/stanford-parser-3.8.0.jar \
--packages datastax:spark-cassandra-connector_2.11:2.0.7 \
--class xxx.mlserver.Application \
--conf spark.cassandra.connection.host=192.168.0.33 \
--conf spark.cores.max=4 \
--master spark://192.168.0.141:7077  ./xxx/xxxanalysis-mlserver-0.1.0.jar   1000  > ./logs/nohup.out &

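Also, do I need the --packages configuration if the dependency is already in my application's pom.xml? (I ask because I just blew up my application by changing the version in --packages and forgetting to change it in pom.xml.)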

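I am currently using --jars because the jars are massive (over 100GB) and would slow down compilation of the shaded jar. I admit I am not sure why I am using --packages, other than that I am following the DataStax documentation.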

Answer 1

If you run spark-submit --help, it will show:

--jars JARS                 Comma-separated list of jars to include on the driver
                              and executor classpaths.

--packages                  Comma-separated list of maven coordinates of jars to include
                              on the driver and executor classpaths. Will search the local
                              maven repo, then maven central and any additional remote
                              repositories given by --repositories. The format for the
                              coordinates should be groupId:artifactId:version.
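To make the two argument formats from the help text concrete, here is a minimal shell sketch. Both helpers (join_jars and is_coordinate) are hypothetical names made up for illustration; they are not part of Spark. The first builds the comma-separated file list that --jars expects, and the second only checks that a string has the groupId:artifactId:version shape that --packages expects. It does not check that the artifact actually exists in any repository.

```shell
#!/bin/sh
# Hypothetical helper (not part of Spark): join every .jar file in a
# directory into the comma-separated list that --jars expects.
join_jars() {
  dir=$1
  list=""
  for jar in "$dir"/*.jar; do
    [ -e "$jar" ] || continue        # skip when the glob matches nothing
    list="${list:+$list,}$jar"       # add a comma only between entries
  done
  printf '%s\n' "$list"
}

# Hypothetical helper (not part of Spark): succeed only if the argument
# looks like groupId:artifactId:version, i.e. exactly three non-empty
# fields separated by colons. This is a shape check, nothing more.
is_coordinate() {
  case $1 in
    *:*:*:*)  return 1 ;;            # more than three fields
    ?*:?*:?*) return 0 ;;            # three non-empty fields
    *)        return 1 ;;            # anything else, e.g. a file path
  esac
}
```

Assuming the directory layout from the question, the list could then be passed as --jars "$(join_jars ./xxx/extrajars)", while a value such as datastax:spark-cassandra-connector_2.11:2.0.7 passes is_coordinate and belongs under --packages. A plain file name like stanford-corenlp-3.8.0.jar fails the check, which reflects the distinction in the help text: --jars takes jar files, --packages takes Maven coordinates.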

If it is --jars