Concept
Broadcast
Broadcast variables allow the programmer to keep a read-only variable cached on each machine rather than shipping a copy of it with tasks.
Broadcast(TorrentBroadcast)
unpersist(TorrentBroadcast.unpersist(id, removeFromDriver = false, blocking = false))
destroy(TorrentBroadcast.unpersist(id, removeFromDriver = true, blocking = true))
BroadcastFactory(TorrentBroadcastFactory)
BlockManager
BroadcastManager -> BroadcastFactory(TorrentBroadcastFactory)#newBroadcast
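The difference between unpersist and destroy comes down to the removeFromDriver flag: unpersist drops only the copies cached on executors, so the value can be fetched again on next use, while destroy also drops the driver's copy and the broadcast cannot be used afterwards. The sketch below is a toy in-memory model of that lifecycle, not Spark code. ToyBroadcast and all of its methods and attributes are hypothetical names, and the blocking flag is left out.

```python
class ToyBroadcast:
    """Hypothetical in-memory stand-in for a broadcast variable (not Spark's API)."""

    def __init__(self, value):
        self._driver_copy = value      # copy kept on the "driver"
        self._executor_cache = {}      # executor id -> cached copy
        self._destroyed = False

    def value_on(self, executor):
        """Read the value on an executor, fetching it from the driver on a cache miss."""
        if self._destroyed:
            raise RuntimeError("broadcast used after destroy")
        if executor not in self._executor_cache:
            self._executor_cache[executor] = self._driver_copy
        return self._executor_cache[executor]

    def _remove(self, remove_from_driver):
        # Loosely mirrors unpersist(id, removeFromDriver, blocking); blocking is ignored.
        self._executor_cache.clear()
        if remove_from_driver:
            self._driver_copy = None
            self._destroyed = True

    def unpersist(self):
        self._remove(remove_from_driver=False)

    def destroy(self):
        self._remove(remove_from_driver=True)

    def cached_on(self):
        """Executor ids that currently hold a cached copy."""
        return sorted(self._executor_cache)


lookup = ToyBroadcast({"a": 1})
lookup.value_on("exec-1")   # first read caches the value on exec-1
lookup.unpersist()          # executor copies dropped, driver copy kept
lookup.value_on("exec-2")   # still usable: fetched from the driver again
lookup.destroy()            # driver copy dropped; further reads raise RuntimeError
```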
Accumulators
AccumulatorV2
AccumulatorV2[IN, OUT] is a parameterized class that accumulates inputs of type IN and produces a result of type OUT.
AccumulatorMetadata
AccumulatorContext
LongAccumulator, DoubleAccumulator, CollectionAccumulator
isZero, copy, reset, add, merge, value
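The six methods listed above can be read as a small contract: each task works on a zeroed copy, adds values locally, and the driver merges the per-task copies back into the original. The sketch below models that flow in plain Python. It is an illustration only: ToyLongAccumulator and run_job are hypothetical names, loosely patterned on a sum-and-count accumulator, and nothing here calls Spark.

```python
class ToyLongAccumulator:
    """Hypothetical pure-Python model of the accumulator contract (not Spark's class)."""

    def __init__(self):
        self._sum = 0
        self._count = 0

    def is_zero(self):
        # True when nothing has been accumulated yet.
        return self._sum == 0 and self._count == 0

    def copy(self):
        # Independent accumulator carrying the same state.
        new = ToyLongAccumulator()
        new._sum, new._count = self._sum, self._count
        return new

    def reset(self):
        # Return to the zero state.
        self._sum = 0
        self._count = 0

    def add(self, v):
        # Accumulate one input value (the IN side).
        self._sum += v
        self._count += 1

    def merge(self, other):
        # Fold another accumulator of the same type into this one.
        self._sum += other._sum
        self._count += other._count

    @property
    def value(self):
        # Current result (the OUT side).
        return self._sum


def run_job(partitions, driver_acc):
    """Simulate one task per partition, each updating its own zeroed copy."""
    for partition in partitions:
        task_acc = driver_acc.copy()
        task_acc.reset()
        assert task_acc.is_zero()
        for x in partition:
            task_acc.add(x)
        driver_acc.merge(task_acc)   # the driver merges each task's result
    return driver_acc.value


total = run_job([[1, 2], [3], []], ToyLongAccumulator())   # total == 6
```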
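To be on the safe side, update accumulators inside actions only: Spark guarantees that each task's update is applied exactly once for updates made in actions, whereas updates made inside transformations may be applied more than once if a task or stage is re-executed.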
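One way to see the risk is that a transformation is lazy and may be evaluated more than once, so any counter it updates is incremented on every evaluation. The toy model below imitates this in plain Python. Counter, lazy_map and tag are hypothetical names, and the thunk is only a rough stand-in for a recomputed, uncached transformation.

```python
class Counter:
    """Hypothetical stand-in for a simple counting accumulator."""

    def __init__(self):
        self.value = 0

    def add(self, n):
        self.value += n


def lazy_map(data, fn):
    # Returns a thunk: the mapping is re-run each time an "action" calls it.
    return lambda: [fn(x) for x in data]


seen = Counter()


def tag(x):
    seen.add(1)        # update made inside the "transformation"
    return x * 2


mapped = lazy_map([1, 2, 3], tag)
total = sum(mapped())      # first action evaluates the transformation
largest = max(mapped())    # second action evaluates it again
# seen.value is now 6 although the data has only 3 elements

in_action = Counter()
for x in [1, 2, 3]:        # stands in for an action that visits each element once
    in_action.add(1)
# in_action.value is 3
```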
Links
- Author: HyperJ
- Source: HyperJ’s Blog
- Link: Spark Shared Variables