Skip to content

ARTS - 2019 Week 7-2




Oversubscribing Apache Spark Resource Usage for Fun and $$$

Apache Spark is quickly being adopted at Facebook and now powers an important portion of Facebook’s batch ETL workload. While Spark is typically more efficient than Hive, we continue to search for opportunities to further reduce hardware costs. Recently, we have started an effort to apply custom resource oversubscription for every unique Spark job. We often observe that system resources (e.g. CPU / memory) tend to be underutilized in Spark jobs.

For example, when the data can not fit in memory, disk spill may dominate overall performance. Since the default configuration allocates one CPU core per Spark task, this can lead to situations where Spark jobs are not using all the CPU resources allocated to them and as a result, the overall cluster CPU utilization can remain low even under peak scheduling conditions. Similarly, for memory, depending on the input data and shuffle sizes, the job might not fully utilize its reserved memory and a significant amount of cluster memory can be wasted. In this session, we will describe the CPU and memory oversubscription technique we use to increase overall resource utilization of our Spark clusters. We leverage a historical stats based resource oversubscription algorithm that considers the historical resource usage of each unique Spark job and predicts the ideal resource allocation to minimize unused resources and increase the cluster utilization.

We’ll also explain the changes necessary to Spark and the resource manager in order to support oversubscription. Finally, we will conclude by sharing our results and the future direction for this project.

【PDF】Tuning Apache Spark Resource Usage For Fun And Efficiency

Spark 运行模型、内存模型:

  • CPU/IO:Metrics、Efficiency
  • Memory:Metrics、Efficiency

History-based Resource Auto-tuning


  • CPU 核支持分配小数
  • 周期运行作业资源使用的日志


  • 预测作业所需要 CPU 和内存
  • 作业获取预测的资源进行申请


  • 作业过去10天运行的历史信息
  • CPU 使用时间、CPU 申请时间
  • Executor 最大使用内存
  • 作业标识


  • 内存使用量
  • CPU使用量


  • CPU 上下文切换
  • 内存使用超限额被 Kill

Kill Job


  • CPU:申请、使用、剩余
  • Memory:申请、使用、剩余
  • Backlog:延迟、积压
  • Resource Waiting:资源等待时间


Spark 常用参数

  • spark.executor.cores=4;
  • spark.executor.memory=8G;
  • spark.executor.memory.overhead=4G;
  • spark.task.cpus=1;
  • spark.sql.shuffle.partitions=800;


Spark SQL 规则