# Debugging the Spark Source Code

## Environment

Covers only compiling, testing, and debugging the Java and Scala code.
- macOS 10.14.5
- Java 8u152
- Scala 2.12.8
- Maven 3.6.0
- SBT 0.13.18
- IDEA 2018.3.4
- Spark 2.4.3
## Commands

### Build

#### maven
```shell
./build/mvn -DskipTests clean package
./build/mvn -pl :spark-core_2.12 -DskipTests clean package -am
```
- `-pl`: build only the specified module, given as `{groupId}:{artifactId}` or as a directory path
- `-am`: also build the modules the selected module depends on
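The two `-pl` selector forms are interchangeable; a hedged sketch for the core module (paths assume a Spark 2.4.x checkout; the `groupId` before the `:` may be omitted):

```shell
# Equivalent ways to build core plus the modules it depends on, skipping tests:
./build/mvn -pl :spark-core_2.12 -am -DskipTests package   # {groupId}:{artifactId} form
./build/mvn -pl core -am -DskipTests package               # dir_path form
```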
```shell
./dev/make-distribution.sh --tgz -Phadoop-2.7 -Phive -Phive-thriftserver -Pyarn -DskipTests
```
#### sbt

```shell
./build/sbt -Pscala-2.12 -Phive -Phive-thriftserver -Pyarn -Phadoop-2.7 -Dsbt.override.build.repos=false "project core" package
```
### Test

#### maven

```shell
./build/mvn test -pl core
./build/mvn test -DwildcardSuites=none -Dtest=org.apache.spark.streaming.JavaAPISuite
```
- `-DwildcardSuites`: select Scala (ScalaTest) suites
- `-Dtest`: select Java (JUnit) tests
```shell
./build/mvn test -pl core -Dtest=none -Dsuites='*DAGSchedulerSuite SPARK-3353'
```

- ScalaTest supports wildcard matching on class, method, and test names
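On the Java side, Surefire also accepts a `Class#method` pattern, so a single JUnit test method can be selected the same way (the method name `map` below is a placeholder for illustration, not checked against the 2.4.3 tree):

```shell
# Run one JUnit test method in core; '#map' is a hypothetical method name.
./build/mvn test -pl core -DwildcardSuites=none -Dtest='JavaAPISuite#map'
```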
#### sbt

```shell
./build/sbt -Pscala-2.12 -Phive -Phive-thriftserver -Pyarn -Phadoop-2.7 -Dsbt.override.build.repos=false "project core" "testOnly *DAGSchedulerSuite -- -z \"SPARK-3353\""
```
### Debug

IDEA: Run > Edit Configurations > + > Remote, with Host `localhost` and Port `5005`; IDEA shows the matching JVM argument as `-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005`.

```shell
./bin/spark-submit --driver-java-options "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005" --class org.apache.spark.examples.SparkPi examples/target/spark-examples_2.12-2.4.3-SNAPSHOT.jar
```

- `suspend=y` makes the driver JVM wait until the debugger attaches before running
- In Standalone mode, a single Executor can be started, which makes debugging easier
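The same agent string also works on the executor side. A minimal sketch, assuming port 5005 is free on the executor host (`spark.executor.extraJavaOptions` is a standard Spark configuration key):

```shell
# Build the JDWP agent string once and reuse it (port 5005 is an assumption;
# any free port works).
DEBUG_PORT=5005
JDWP_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=${DEBUG_PORT}"

# Pass it to the driver, or to executors via spark.executor.extraJavaOptions:
#   ./bin/spark-submit --driver-java-options "$JDWP_OPTS" ...
#   ./bin/spark-submit --conf "spark.executor.extraJavaOptions=$JDWP_OPTS" ...
echo "$JDWP_OPTS"
```

With `suspend=y` every executor blocks at startup until a debugger attaches, and multiple executors on one host would contend for the same port — two more reasons to run a single Executor while debugging.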
#### maven

```shell
./build/mvn test -pl core -Dtest=none -Dsuites='*DAGSchedulerSuite' -DdebugForkedProcess=true -DdebuggerPort=5005
```

- `-DdebugForkedProcess=true`: pause the forked test JVM until a debugger attaches; the port defaults to 5005 and can be changed with `-DdebuggerPort`
#### sbt

```shell
./build/sbt -Pscala-2.12 -Phive -Phive-thriftserver -Pyarn -Phadoop-2.7 -Dsbt.override.build.repos=false -jvm-debug 5005 "project core" "set fork in Test := false" "testOnly org.apache.spark.scheduler.DAGSchedulerSuite"
```

- `-jvm-debug 5005`: start the sbt JVM with a JDWP agent listening on port 5005
- `set fork in Test := false`: run the tests inside the sbt JVM so the attached debugger can hit breakpoints
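As an alternative to the IDEA Remote configuration, the JDK's command-line debugger can attach to any of the listening JVMs above (assumes a JDK on the PATH; host and port match the earlier examples):

```shell
# Attach jdb to a JVM started with
# -agentlib:jdwp=transport=dt_socket,server=y,address=5005
jdb -attach localhost:5005
```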