Spark Shared Variables

Concept

Broadcast

Broadcast variables allow the programmer to keep a read-only variable cached on each machine rather than shipping a copy of it with tasks.

Broadcast(TorrentBroadcast)

  • unpersist(TorrentBroadcast.unpersist(id, removeFromDriver = false, blocking = false))

  • destroy(TorrentBroadcast.unpersist(id, removeFromDriver = true, blocking = true))
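The lifecycle above can be sketched from the user side as follows. This is a minimal example, assuming a local master and Spark core on the classpath; the object name BroadcastSketch, the app name, and the lookup map are hypothetical and only serve the illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical sketch: create a broadcast, read it in tasks, then release it.
object BroadcastSketch {
  def run(): Array[String] = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("broadcast-sketch")
    val sc = SparkContext.getOrCreate(conf)
    try {
      // Shipped to each executor once, instead of once per task.
      val countryNames = sc.broadcast(Map("DE" -> "Germany", "FR" -> "France"))

      val resolved = sc.parallelize(Seq("DE", "FR", "XX"))
        .map(code => countryNames.value.getOrElse(code, "unknown"))
        .collect()

      // unpersist: drop the cached copies on executors; the driver keeps its
      // copy, so the broadcast can be re-sent if it is used again.
      countryNames.unpersist()

      // destroy: remove all data and metadata, including the driver's copy.
      // The broadcast must not be used after this call.
      countryNames.destroy()

      resolved
    } finally {
      sc.stop()
    }
  }
}
```

Calling BroadcastSketch.run() should yield the three resolved names in input order. In practice unpersist is the call to reach for when the broadcast may be needed again later, and destroy when it is finished for good.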

BroadcastFactory(TorrentBroadcastFactory)

  • BlockManager

  • BroadcastManager -> BroadcastFactory(TorrentBroadcastFactory)#newBroadcast

Accumulators

AccumulatorV2

AccumulatorV2[IN, OUT] is an abstract, parameterized class that represents an accumulator: it accumulates values of type IN and produces a result of type OUT.

  • AccumulatorMetadata

  • AccumulatorContext

  • LongAccumulator, DoubleAccumulator, CollectionAccumulator

  • isZero, copy, reset, add, merge, value
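As an illustration of the six methods listed above, here is a minimal sketch of a custom accumulator. The class DistinctStringsAccumulator and the helper object are hypothetical names, not part of Spark; the example assumes Spark core on the classpath and a local master.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.util.AccumulatorV2

// Hypothetical accumulator: IN = String, OUT = Set[String].
class DistinctStringsAccumulator extends AccumulatorV2[String, Set[String]] {
  private var seen: Set[String] = Set.empty

  // True when the accumulator holds its initial (empty) state.
  override def isZero: Boolean = seen.isEmpty

  // Independent copy carrying the current state.
  override def copy(): DistinctStringsAccumulator = {
    val acc = new DistinctStringsAccumulator
    acc.seen = seen
    acc
  }

  // Back to the zero state.
  override def reset(): Unit = seen = Set.empty

  // Called in tasks for each input value.
  override def add(v: String): Unit = seen += v

  // Called on the driver to fold a task's partial result into this one.
  override def merge(other: AccumulatorV2[String, Set[String]]): Unit =
    seen ++= other.value

  override def value: Set[String] = seen
}

object DistinctStringsSketch {
  def run(): Set[String] = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("accumulator-v2-sketch")
    val sc = SparkContext.getOrCreate(conf)
    try {
      val acc = new DistinctStringsAccumulator
      sc.register(acc, "distinctWords") // custom accumulators must be registered
      sc.parallelize(Seq("a", "b", "a")).foreach(w => acc.add(w))
      acc.value
    } finally {
      sc.stop()
    }
  }
}
```

The built-in LongAccumulator, DoubleAccumulator and CollectionAccumulator follow the same contract, so a custom subclass is only needed when none of them fits the result type.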

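To be on the safe side, update accumulators only inside actions. Spark guarantees that each task's update is applied exactly once when it is made inside an action, whereas updates made inside transformations may be applied more than once if a task or stage is re-executed.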
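The difference can be sketched with two counters over the same data. This is a minimal example assuming a local master and no caching; the object and accumulator names are hypothetical. Recomputing an uncached RDD across two actions is used here as a stand-in for the re-execution that retries can cause on a real cluster.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical sketch: same data, one counter updated in an action,
// one updated in a transformation that ends up being evaluated twice.
object AccumulatorInActionSketch {
  def run(): (Long, Long) = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("accumulator-action-sketch")
    val sc = SparkContext.getOrCreate(conf)
    try {
      val numbers = sc.parallelize(1 to 100)

      // Updated inside an action (foreach): counted once per element.
      val inAction = sc.longAccumulator("inAction")
      numbers.foreach(_ => inAction.add(1))

      // Updated inside a transformation (map): the map is lazy and, without
      // caching, is recomputed by every action that depends on it.
      val inTransformation = sc.longAccumulator("inTransformation")
      val mapped = numbers.map { n => inTransformation.add(1); n }
      mapped.count()
      mapped.count()

      (inAction.value.longValue(), inTransformation.value.longValue())
    } finally {
      sc.stop()
    }
  }
}
```

In this local run the first counter ends at 100 and the second at 200, because each count() re-evaluates the map. Caching mapped would hide the effect here but would not remove the underlying risk from task or stage re-execution.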