ARTS - 2019 Week 7-2¶
Apache Spark is quickly being adopted at Facebook and now powers an important portion of Facebook’s batch ETL workload. While Spark is typically more efficient than Hive, we continue to search for opportunities to further reduce hardware costs. Recently, we have started an effort to apply custom resource oversubscription for every unique Spark job. We often observe that system resources (e.g. CPU / memory) tend to be underutilized in Spark jobs.
For example, when the data can not fit in memory, disk spill may dominate overall performance. Since the default configuration allocates one CPU core per Spark task, this can lead to situations where Spark jobs are not using all the CPU resources allocated to them and as a result, the overall cluster CPU utilization can remain low even under peak scheduling conditions. Similarly, for memory, depending on the input data and shuffle sizes, the job might not fully utilize its reserved memory and a significant amount of cluster memory can be wasted. In this session, we will describe the CPU and memory oversubscription technique we use to increase overall resource utilization of our Spark clusters. We leverage a historical stats based resource oversubscription algorithm that considers the historical resource usage of each unique Spark job and predicts the ideal resource allocation to minimize unused resources and increase the cluster utilization.
We’ll also explain the changes necessary to Spark and the resource manager in order to support oversubscription. Finally, we will conclude by sharing our results and the future direction for this project.
History-based Resource Auto-tuning¶
- CPU 核支持分配小数
- 预测作业所需要 CPU 和内存
- CPU 使用时间、CPU 申请时间
- Executor 最大使用内存
- CPU 上下文切换
- 内存使用超限额被 Kill
- Resource Waiting：资源等待时间