Appendix A - Spark
Application -1:n-> Session(Context) -1:n-> Job -1:n-> Stage
Task -1:n-> Partition -1:n-> Block
Configuration
Application Properties
| Name | Version | Default | Recommended | Description |
| --- | --- | --- | --- | --- |
| spark.driver.memory | - | 1g | 2g, 4g | Driver memory |
| spark.driver.cores | - | 1 | 2, 4 | Number of driver cores |
| spark.executor.memory | - | 1g | 4g, 16g | Memory per executor |
| spark.executor.cores | - | 1 | 2, 8 | Number of cores per executor |
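The properties above are usually passed at submit time, partly because in client mode the driver JVM has already started by the time application code runs, so `spark.driver.memory` cannot be changed from inside the application. The following is a minimal sketch, not a definitive launcher: it turns a dictionary of properties into a `spark-submit` argument list. The helper name `build_submit_command` and the application path `my_job.py` are hypothetical, and the values are simply picked from the "Recommended" column.

```python
import shlex

# Values taken from the "Recommended" column above; tune them per workload.
APP_PROPERTIES = {
    "spark.driver.memory": "2g",
    "spark.driver.cores": "2",
    "spark.executor.memory": "4g",
    "spark.executor.cores": "2",
}


def build_submit_command(app_path, properties):
    """Return a spark-submit argument list with one --conf flag per property."""
    command = ["spark-submit"]
    for key, value in properties.items():
        command += ["--conf", f"{key}={value}"]
    command.append(app_path)
    return command


# "my_job.py" is a hypothetical application path used only for illustration.
command = build_submit_command("my_job.py", APP_PROPERTIES)
print(shlex.join(command))
```

Keeping the properties in a dictionary makes it easy to swap one set of values for another per job. The same key/value pairs could instead go into `spark-defaults.conf`.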
Runtime Environment
Shuffle Behavior
Compression and Serialization
Memory Management
Execution Behavior
Networking
Scheduling
Dynamic Allocation
Spark SQL
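| Name | Version | Default | Recommended | Description |
| --- | --- | --- | --- | --- |
| spark.sql.shuffle.partitions | - | 200 | 20, 400 | Number of shuffle partitions (joins, aggregations) |
| spark.sql.autoBroadcastJoinThreshold | - | 10L * 1024 * 1024 | (32, 64) * 1024 * 1024 | Maximum table size, in bytes, below which a join is automatically converted to a broadcast join |
| spark.sql.adaptive.enabled | - | false | true | Adaptive query execution (broadcast, partition, skew) |
| spark.sql.adaptive.shuffle.targetPostShuffleInputSize | - | 64 * 1024 * 1024 | (32, 128) * 1024 * 1024 | Target size, in bytes, of the shuffle data read by each partition |
| spark.sql.adaptive.minNumPostShufflePartitions | - | -1 | 10, 200 | Minimum number of shuffle partitions |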
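The size-based settings above are written as expressions such as `32 * 1024 * 1024`, while Spark expects a plain byte count. The sketch below evaluates those expressions and renders each setting as a Spark SQL `SET key=value;` statement. The helper name `to_set_statements` is hypothetical and the values are just one pick from the "Recommended" column. The two `spark.sql.adaptive.*` size and partition keys are the names used in the table. Newer Spark releases may use different names, so check the configuration page for the version in use.

```python
MIB = 1024 * 1024

# Values picked from the "Recommended" column above; starting points, not rules.
SQL_SETTINGS = {
    "spark.sql.shuffle.partitions": 400,
    "spark.sql.autoBroadcastJoinThreshold": 32 * MIB,
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.shuffle.targetPostShuffleInputSize": 128 * MIB,
    "spark.sql.adaptive.minNumPostShufflePartitions": 10,
}


def to_set_statements(settings):
    """Render each setting as a Spark SQL `SET key=value;` statement."""
    return [f"SET {key}={value};" for key, value in settings.items()]


statements = to_set_statements(SQL_SETTINGS)
for statement in statements:
    print(statement)
```

The resulting statements can be pasted into a Spark SQL session, or the same key/value pairs can be passed as `--conf` flags at submit time.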
Yarn
Hive
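| Name | Version | Default | Recommended | Description |
| --- | --- | --- | --- | --- |
| hive.exec.dynamic.partition | - | false | true | Enables dynamic partitioning |
| hive.exec.dynamic.partition.mode | - | strict | nonstrict | Dynamic partition mode |
| hive.exec.max.dynamic.partitions | - | 1000 | 100-1000 | Maximum number of dynamic partitions that may be created |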
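To show how the three settings above fit together, the following sketch builds a small HiveQL script: the `SET` statements, then an insert that uses a dynamic partition. In `strict` mode Hive requires at least one static partition column, which is why `nonstrict` is set before an insert whose only partition column is dynamic. The helper `dynamic_partition_script` and the table and column names (`events`, `events_staging`, `user_id`, `action`, `dt`) are hypothetical and exist only for illustration.

```python
# Values from the "Recommended" column above.
HIVE_SETTINGS = {
    "hive.exec.dynamic.partition": "true",
    "hive.exec.dynamic.partition.mode": "nonstrict",
    "hive.exec.max.dynamic.partitions": 1000,
}


def dynamic_partition_script(target, source, columns, partition_column):
    """Build a HiveQL script: SET statements followed by a dynamic-partition insert."""
    lines = [f"SET {key}={value};" for key, value in HIVE_SETTINGS.items()]
    # The dynamic partition column goes last in the SELECT list.
    select_list = ", ".join(list(columns) + [partition_column])
    lines.append(
        f"INSERT OVERWRITE TABLE {target} PARTITION ({partition_column}) "
        f"SELECT {select_list} FROM {source};"
    )
    return "\n".join(lines)


# Table and column names below are hypothetical.
script = dynamic_partition_script(
    "events", "events_staging", ["user_id", "action"], "dt"
)
print(script)
```

If the insert would create more partitions than `hive.exec.max.dynamic.partitions` allows, the job fails, so the limit should be chosen with the expected number of distinct partition values in mind.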
MapReduce
HDFS
JVM
Reference