附录 B - Spark¶
数据类型(Data Types)¶
Spark SQL and DataFrames support the following data types:
- Numeric types
- ByteType: Represents 1-byte signed integer numbers. The range of numbers is from -128 to 127.
- ShortType: Represents 2-byte signed integer numbers. The range of numbers is from -32768 to 32767.
- IntegerType: Represents 4-byte signed integer numbers. The range of numbers is from -2147483648 to 2147483647.
- LongType: Represents 8-byte signed integer numbers. The range of numbers is from -9223372036854775808 to 9223372036854775807.
- FloatType: Represents 4-byte single-precision floating point numbers.
- DoubleType: Represents 8-byte double-precision floating point numbers.
- DecimalType: Represents arbitrary-precision signed decimal numbers. Backed internally by java.math.BigDecimal. A BigDecimal consists of an arbitrary precision integer unscaled value and a 32-bit integer scale.
- String type
- StringType: Represents character string values.
- Binary type
- BinaryType: Represents byte sequence values.
- Boolean type
- BooleanType: Represents boolean values.
- Datetime type
- TimestampType: Represents values comprising values of fields year, month, day, hour, minute, and second.
- DateType: Represents values comprising values of fields year, month, day.
- Complex types
- ArrayType(elementType, containsNull): Represents values comprising a sequence of elements with the type of elementType. containsNull is used to indicate if elements in a ArrayType value can have null values.
- MapType(keyType, valueType, valueContainsNull): Represents values comprising a set of key-value pairs. The data type of keys are described by keyType and the data type of values are described by valueType. For a MapType value, keys are not allowed to have null values. valueContainsNull is used to indicate if values of a MapType value can have null values.
- StructType(fields): Represents values with the structure described by a sequence of StructFields (fields).
- StructField(name, dataType, nullable): Represents a field in a StructType. The name of a field is indicated by name. The data type of a field is indicated by dataType. nullable is used to indicate if values of this fields can have null values.
Manual¶
DDL (Data Definition Language)¶
CREATE/DROP/ALTER/TRUNCATE/SHOW/DESCRIBE
DATABASE/TABLE/COLUMN/VIEW/INDEX/MACRO/FUNCTION
DML (Data Manipulation Language)¶
LOAD/INSERT/UPDATE/DELETE/MERGE
IMPORT/EXPORT
EXPLAIN
DQL (Data Query Language)¶
SELECT
DCL (Data Control Language)¶
GRANT/REVOKE
ROLE/PRIVILEGE
DTL (Data Transaction Language)¶
LOCKS/TRANSACTIONS/COMPACTIONS
Functions¶
array_distinct¶
array_distinct(array) - Removes duplicate values from the array.
Examples:
> SELECT array_distinct(array(1, 2, 3, null, 3)); [1,2,3,null]
Since: 2.4.0
array_except¶
array_except(array1, array2) - Returns an array of the elements in array1 but not in array2, without duplicates.
Examples:
> SELECT array_except(array(1, 2, 3), array(1, 3, 5)); [2]
Since: 2.4.0
array_intersect¶
array_intersect(array1, array2) - Returns an array of the elements in the intersection of array1 and array2, without duplicates.
Examples:
> SELECT array_intersect(array(1, 2, 3), array(1, 3, 5)); [1,3]
Since: 2.4.0
array_join¶
array_join(array, delimiter[, nullReplacement]) - Concatenates the elements of the given array using the delimiter and an optional string to replace nulls. If no value is set for nullReplacement, any null value is filtered.
Examples:
> SELECT array_join(array('hello', 'world'), ' '); hello world > SELECT array_join(array('hello', null ,'world'), ' '); hello world > SELECT array_join(array('hello', null ,'world'), ' ', ','); hello , world
Since: 2.4.0
array_max¶
array_max(array) - Returns the maximum value in the array. NULL elements are skipped.
Examples:
> SELECT array_max(array(1, 20, null, 3)); 20
Since: 2.4.0
array_min¶
array_min(array) - Returns the minimum value in the array. NULL elements are skipped.
Examples:
> SELECT array_min(array(1, 20, null, 3)); 1
Since: 2.4.0
array_position¶
array_position(array, element) - Returns the (1-based) index of the first element of the array as long.
Examples:
> SELECT array_position(array(3, 2, 1), 1); 3
Since: 2.4.0
array_remove¶
array_remove(array, element) - Remove all elements that equal to element from array.
Examples:
> SELECT array_remove(array(1, 2, 3, null, 3), 3); [1,2,null]
Since: 2.4.0
array_repeat¶
array_repeat(element, count) - Returns the array containing element count times.
Examples:
> SELECT array_repeat('123', 2); ["123","123"]
Since: 2.4.0
array_sort¶
array_sort(array) - Sorts the input array in ascending order. The elements of the input array must be orderable. Null elements will be placed at the end of the returned array.
Examples:
> SELECT array_sort(array('b', 'd', null, 'c', 'a')); ["a","b","c","d",null]
Since: 2.4.0
array_union¶
array_union(array1, array2) - Returns an array of the elements in the union of array1 and array2, without duplicates.
Examples:
> SELECT array_union(array(1, 2, 3), array(1, 3, 5)); [1,2,3,5]
Since: 2.4.0
arrays_overlap¶
arrays_overlap(a1, a2) - Returns true if a1 contains at least a non-null element present also in a2. If the arrays have no common element and they are both non-empty and either of them contains a null element null is returned, false otherwise.
Examples:
> SELECT arrays_overlap(array(1, 2, 3), array(3, 4, 5)); true
Since: 2.4.0
arrays_zip¶
arrays_zip(a1, a2, ...) - Returns a merged array of structs in which the N-th struct contains all N-th values of input arrays.
Examples:
> SELECT arrays_zip(array(1, 2, 3), array(2, 3, 4)); [{"0":1,"1":2},{"0":2,"1":3},{"0":3,"1":4}] > SELECT arrays_zip(array(1, 2), array(2, 3), array(3, 4)); [{"0":1,"1":2,"2":3},{"0":2,"1":3,"2":4}]
Since: 2.4.0
from_json¶
from_json(jsonStr, schema[, options]) - Returns a struct value with the given jsonStr and schema.
Examples:
> SELECT from_json('{"a":1, "b":0.8}', 'a INT, b DOUBLE'); {"a":1, "b":0.8} > SELECT from_json('{"time":"26/08/2015"}', 'time Timestamp', map('timestampFormat', 'dd/MM/yyyy')); {"time":"2015-08-26 00:00:00.0"}
Since: 2.2.0
to_json¶
to_json(expr[, options]) - Returns a json string with a given struct value
Examples:
> SELECT to_json(named_struct('a', 1, 'b', 2)); {"a":1,"b":2} > SELECT to_json(named_struct('time', to_timestamp('2015-08-26', 'yyyy-MM-dd')), map('timestampFormat', 'dd/MM/yyyy')); {"time":"26/08/2015"} > SELECT to_json(array(named_struct('a', 1, 'b', 2)); [{"a":1,"b":2}] > SELECT to_json(map('a', named_struct('b', 1))); {"a":{"b":1}} > SELECT to_json(map(named_struct('a', 1),named_struct('b', 2))); {"[1]":{"b":2}} > SELECT to_json(map('a', 1)); {"a":1} > SELECT to_json(array((map('a', 1)))); [{"a":1}]
Since: 2.2.0