Spark Shuffle

Concept

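  • Task(ShuffleMapTask, ResultTask): a ShuffleMapTask writes partitioned shuffle output for the next stage; a ResultTask computes a partition of the final stage and sends its result back to the driver.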
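
  The split between the two task kinds can be sketched as a toy two-stage job. This is plain Python, not Spark code: shuffle_map_task, result_task, run_job and partition_for are hypothetical names, and the CRC32-based partitioner only stands in for a hash partitioner.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    # Hypothetical, deterministic stand-in for a hash partitioner.
    return zlib.crc32(repr(key).encode("utf-8")) % num_partitions


def shuffle_map_task(records, num_reduce_partitions):
    """Map-side task: bucket (key, value) pairs by target reduce partition."""
    buckets = defaultdict(list)
    for key, value in records:
        buckets[partition_for(key, num_reduce_partitions)].append((key, value))
    return dict(buckets)


def result_task(map_outputs, reduce_id):
    """Final-stage task: read one reduce partition from every map output, sum per key."""
    totals = defaultdict(int)
    for output in map_outputs:
        for key, value in output.get(reduce_id, []):
            totals[key] += value
    return dict(totals)


def run_job(input_partitions, num_reduce_partitions):
    # Stage 0: one map-side task per input partition.
    map_outputs = [shuffle_map_task(p, num_reduce_partitions) for p in input_partitions]
    # Stage 1: one result task per reduce partition; the "driver" merges the results.
    merged = {}
    for reduce_id in range(num_reduce_partitions):
        merged.update(result_task(map_outputs, reduce_id))
    return merged
```

  Every key lands in exactly one reduce partition, so the per-partition results never overlap and the final merge is a plain dictionary update.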

Shuffle Write

  • ShuffleMapTask#runTask -> ShuffleManager(SortShuffleManager)#getWriter -> ShuffleWriter(SortShuffleWriter, UnsafeShuffleWriter)#write

  • ShuffleWriter

    SortShuffleWriter(ExternalSorter, Aggregator, PartitionedAppendOnlyMap)

    UnsafeShuffleWriter(ShuffleExternalSorter)
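
  A sort-based writer leaves each map task with a single data file plus an index of per-partition offsets. The sketch below models only that layout, with an optional map-side combine step. It is plain Python under stated assumptions: sort_shuffle_write, read_segment and partition_for are hypothetical names, pickle stands in for the serializer, and spilling to disk is left out entirely.

```python
import pickle
import zlib
from itertools import groupby


def partition_for(key, num_partitions):
    # Hypothetical, deterministic stand-in for a hash partitioner.
    return zlib.crc32(repr(key).encode("utf-8")) % num_partitions


def sort_shuffle_write(records, num_partitions, merge_value=None):
    """Return (data, index) for one map task.

    data  : bytes, partition segments concatenated in partition-id order
    index : num_partitions + 1 byte offsets into data
    """
    if merge_value is not None:
        # Map-side combine: keep one combined value per key.
        combined = {}
        for key, value in records:
            combined[key] = merge_value(combined[key], value) if key in combined else value
        records = combined.items()

    # Sort by partition id (stable sort) so each partition's records are contiguous.
    tagged = sorted(
        ((partition_for(k, num_partitions), k, v) for k, v in records),
        key=lambda t: t[0],
    )
    segments = {
        pid: [(k, v) for _, k, v in group]
        for pid, group in groupby(tagged, key=lambda t: t[0])
    }

    data = bytearray()
    index = [0]
    for pid in range(num_partitions):
        if pid in segments:
            data += pickle.dumps(segments[pid])
        # An empty partition yields a zero-length segment.
        index.append(len(data))
    return bytes(data), index


def read_segment(data, index, reduce_id):
    """Slice out one reduce partition using the index offsets."""
    segment = data[index[reduce_id]:index[reduce_id + 1]]
    return pickle.loads(segment) if segment else []
```

  A reducer needs only two adjacent index entries to locate its segment, which is why one data file per map task is enough regardless of the number of reduce partitions.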

Shuffle Read

  • ShuffledRDD#compute -> ShuffleManager(SortShuffleManager)#getReader -> ShuffleReader(BlockStoreShuffleReader)#read

  • BlockStoreShuffleReader(ExternalSorter, Aggregator, ExternalAppendOnlyMap)
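
  The read side can be sketched as three steps: fetch this reduce partition's block from every map output, optionally aggregate by key, optionally sort by key. Again this is plain Python rather than Spark code: shuffle_read and its parameters are hypothetical, map outputs are modelled as in-memory dicts, and remote fetching and spilling are omitted.

```python
def shuffle_read(map_outputs, reduce_id, merge=None, sort_keys=False):
    """Read one reduce partition.

    map_outputs : one dict per map task, mapping reduce_id -> list of (key, value)
    merge       : optional function combining two values that share a key
    sort_keys   : when True, return records ordered by key
    """
    # 1. Fetch: chain this reduce partition's block from every map output.
    fetched = (kv for output in map_outputs for kv in output.get(reduce_id, []))

    # 2. Aggregate (optional): merge values that share a key.
    if merge is not None:
        combined = {}
        for key, value in fetched:
            combined[key] = merge(combined[key], value) if key in combined else value
        fetched = combined.items()

    # 3. Sort (optional): order the records by key.
    records = list(fetched)
    if sort_keys:
        records.sort(key=lambda kv: kv[0])
    return records
```

  With neither option set, the records come back in fetch order, one map output after another.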