Skip to content

计算(Compute)

RDD

Characteristics

  • 分布(Partitions)
  • 本地化(PreferredLocations)
  • 依赖(Dependencies)
  • 迭代(Iterator)
  • 分区(Partitioner)

Operations

  • Creation
  • Transformation
  • Storage
  • Action

Dependencies

  • Narrow Dependencies
  • Shuffle/Wide Dependencies

Stage

  • ResultStage
  • ShuffleMapStage

DAG

  • Lineage
  • Fault Tolerance
  • Data Dependency

Shuffle

  • Read/Write
  • Server/Client
  • Pull/Push

Tungsten

  • Memory Management and Binary Processing
  • Cache-aware computation
  • Code generation
  • No virtual function dispatches
  • Intermediate data in memory vs CPU registers
  • Loop unrolling and SIMD

Reference