Hive Optimization

Hive

Configuration Properties

dynamic partition

hive.exec.dynamic.partition

Default Value: false prior to Hive 0.9.0; true in Hive 0.9.0 and later (HIVE-2835)
Added In: Hive 0.6.0
Whether or not to allow dynamic partitions in DML/DDL.
set hive.exec.dynamic.partition = true

hive.exec.dynamic.partition.mode

Default Value: strict
Added In: Hive 0.6.0
In strict mode, the user must specify at least one static partition, which guards against accidentally overwriting all partitions. In nonstrict mode, all partitions are allowed to be dynamic.
Set to nonstrict to support INSERT ... VALUES, UPDATE, and DELETE transactions (Hive 0.14.0 and later). For a complete list of parameters required for turning on Hive transactions, see hive.txn.manager.
set hive.exec.dynamic.partition.mode = nonstrict
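
As a rough illustration of the two modes, the sketch below assumes a hypothetical target table sales partitioned by (dt, region) and a hypothetical source table sales_staging; neither table comes from these notes. Dynamic partition columns go last in the SELECT list, in the same order as in the PARTITION clause.

```sql
-- Hypothetical tables: sales is partitioned by (dt STRING, region STRING);
-- sales_staging holds the same rows with dt and region as ordinary columns.
SET hive.exec.dynamic.partition = true;

-- strict mode: at least one partition column needs a static value.
SET hive.exec.dynamic.partition.mode = strict;
INSERT OVERWRITE TABLE sales PARTITION (dt = '2020-01-01', region)
SELECT order_id, amount, region
FROM sales_staging
WHERE dt = '2020-01-01';

-- nonstrict mode: every partition column may be resolved from the data.
SET hive.exec.dynamic.partition.mode = nonstrict;
INSERT OVERWRITE TABLE sales PARTITION (dt, region)
SELECT order_id, amount, dt, region
FROM sales_staging;
```

The strict form can only overwrite partitions under the one dt value that is named explicitly, which is the protection the strict default is meant to give.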

hive.exec.max.dynamic.partitions

Default Value: 1000
Added In: Hive 0.6.0
Maximum number of dynamic partitions allowed to be created in total.
set hive.exec.max.dynamic.partitions = 1000

hive.exec.max.dynamic.partitions.pernode

Default Value: 100
Added In: Hive 0.6.0
Maximum number of dynamic partitions allowed to be created in each mapper/reducer node.
set hive.exec.max.dynamic.partitions.pernode = 100
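
A job that tries to create more partitions than these limits allow fails, so the limits may need raising for wide fan-out inserts. The sketch below is only an illustration: the tables sales and sales_staging and the limit values are hypothetical, and the DISTRIBUTE BY clause is a commonly used way to send all rows of one partition to the same reducer so that each task writes to fewer partitions, not something these notes prescribe.

```sql
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Illustrative values only; size them to the expected partition count.
SET hive.exec.max.dynamic.partitions = 2000;
SET hive.exec.max.dynamic.partitions.pernode = 200;

-- Hypothetical tables: sales is partitioned by (dt, region).
INSERT OVERWRITE TABLE sales PARTITION (dt, region)
SELECT order_id, amount, dt, region
FROM sales_staging
DISTRIBUTE BY dt, region;
```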

parallel

hive.exec.parallel

Default Value: false
Added In: Hive 0.5.0
Whether to execute jobs in parallel. Applies to MapReduce jobs that can run in parallel, for example jobs that process different source tables before a join. As of Hive 0.14, it also applies to move tasks that can run in parallel, for example moving files to insert targets during a multi-insert.
set hive.exec.parallel = true

hive.exec.parallel.thread.number

Default Value: 8
Added In: Hive 0.6.0
How many jobs at most can be executed in parallel.
set hive.exec.parallel.thread.number = 12
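
A query shape that can benefit is one whose subqueries do not depend on each other. The following is a minimal sketch with hypothetical tables orders and refunds; whether the stages actually run concurrently depends on the generated plan and on available cluster resources.

```sql
SET hive.exec.parallel = true;
SET hive.exec.parallel.thread.number = 8;

-- Hypothetical tables: the two aggregations are independent of each other,
-- so their jobs are candidates for parallel execution before the join.
SELECT o.dt, o.order_cnt, r.refund_cnt
FROM (SELECT dt, COUNT(*) AS order_cnt FROM orders GROUP BY dt) o
JOIN (SELECT dt, COUNT(*) AS refund_cnt FROM refunds GROUP BY dt) r
  ON o.dt = r.dt;
```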

merge

hive.merge.mapfiles

Default Value: true
Added In: Hive 0.4.0
Merge small files at the end of a map-only job.
set hive.merge.mapfiles = true

hive.merge.mapredfiles

Default Value: false
Added In: Hive 0.4.0
Merge small files at the end of a map-reduce job.
set hive.merge.mapredfiles = true
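
The two merge switches are typically set together before a write that would otherwise leave many small files. The sketch below uses hypothetical tables (sales_daily, sales_staging); the two size thresholds are related properties that these notes do not cover, and the values shown are believed to be their documented defaults, so check them against the Hive version in use.

```sql
SET hive.merge.mapfiles = true;
SET hive.merge.mapredfiles = true;

-- Related thresholds, not covered in the notes above (assumed defaults):
-- target size of merged files, and the average output file size below
-- which an extra merge job is started.
SET hive.merge.size.per.task = 256000000;
SET hive.merge.smallfiles.avgsize = 16000000;

-- Hypothetical tables: the aggregation runs as a map-reduce job, so
-- hive.merge.mapredfiles governs whether its output files are merged.
INSERT OVERWRITE TABLE sales_daily
SELECT dt, SUM(amount) AS total_amount
FROM sales_staging
GROUP BY dt;
```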

optimize

hive.optimize.groupby

Default Value: true
Added In: Hive 0.5.0
Whether to enable the bucketed group by from bucketed partitions/tables.
set hive.optimize.groupby = true
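
The optimization targets queries that group by the column a table is bucketed on. The sketch below assumes a hypothetical table user_events; whether the optimizer actually applies the bucketed group by depends on the Hive version and on the table having been populated with bucketing respected.

```sql
-- Hypothetical table, bucketed on user_id.
CREATE TABLE user_events (
  user_id BIGINT,
  event   STRING
)
CLUSTERED BY (user_id) INTO 32 BUCKETS;

SET hive.optimize.groupby = true;

-- The GROUP BY key matches the bucketing column, so all rows for a given
-- user_id are expected to sit in a single bucket file.
SELECT user_id, COUNT(*) AS event_cnt
FROM user_events
GROUP BY user_id;
```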
