
Hadoop Hive

Hive data is organized into tables, which are further divided into partitions and buckets.

Hive Service

Hive Shell/CLI, Beeline, HiveServer2, Hive Web Interface, jar, metastore.

Hive Shell

The primary way to interact with Hive is by issuing commands in HiveQL.

  • % hive — start the interactive shell
  • hive> ... — issue HiveQL statements at the prompt
  • % hive -f 'script' — run the HiveQL statements in a script file
  • % hive -e 'hiveql' — run an inline HiveQL statement and exit

HiveServer2

Runs Hive as a server exposing a Thrift service, enabling access from a range of clients written in different languages (for example via JDBC/ODBC, or the Beeline CLI).

Hadoop Cluster

FileSystem

A Hive table is logically made up of the data being stored and the associated metadata describing the layout of the data in the table.

Data resides in a Hadoop filesystem, which can be the local filesystem, S3, or HDFS.

Metadata is stored separately in an RDBMS, which defaults to Derby.

Execution Engine

  • SET hive.execution.engine=mr|tez|spark;
  • MapReduce (mr) is the original default. Tez and Spark are general DAG engines that provide more flexibility and higher performance than MapReduce.
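For example, the engine can be switched per session from the Hive shell (a minimal sketch; hive.execution.engine is a real Hive property):

```sql
-- Show the current engine, then switch this session to Tez
SET hive.execution.engine;
SET hive.execution.engine=tez;
```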

Resource Manager

  • Defaults to the local job runner.
  • Set yarn.resourcemanager.address to submit jobs to a YARN cluster instead.

Metastore

The central repository of Hive metadata, which is divided into two pieces:

  • metastore service: by default it runs in the same JVM as Hive (the embedded metastore)
  • metastore database: by default an embedded Derby database backed by the local disk, which allows only one user to connect at a time.

Configuration

Precedence hierarchy of configuring Hive

  • The Hive set command (hive>)
  • The command line -hiveconf option
  • hive-site.xml and Hadoop site files 
    • core-site.xml
    • hdfs-site.xml
    • mapred-site.xml
    • yarn-site.xml
  • Hive default and Hadoop default
    • core-default.xml
    • hdfs-default.xml
    • mapred-default.xml
    • yarn-default.xml
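To illustrate the hierarchy, the same property can be set at several levels, with the most specific level winning (the property name is real; the values are illustrative):

```sql
-- hive-site.xml may set mapred.reduce.tasks site-wide;
-- `hive -hiveconf mapred.reduce.tasks=5` overrides it for one invocation;
-- a session-level SET overrides both:
SET mapred.reduce.tasks=10;
SET mapred.reduce.tasks;   -- prints the effective value
```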

Table

Create Table

  ----- MANAGED TABLE -----
  -- data is moved into the Hive warehouse
  CREATE TABLE table_name (
    field1 type1,
    field2 type2,
    field3 type3,
    ...
  )
  ----- EXTERNAL TABLE -----
  -- data remains where it is; nothing is moved
  CREATE EXTERNAL TABLE table_name (
    field1 type1,
    field2 type2,
    field3 type3,
    ...
  )
  LOCATION 'path'
  ----- STORAGE FORMAT -----
  -- default: TEXTFILE
  -- row-based binary: AVRO, SEQUENCEFILE
  -- column-based binary: PARQUET, RCFILE, ORC
  STORED AS TEXTFILE
  ----- ROW FORMAT -----
  -- only meaningful for TEXTFILE: DELIMITED or SERDE
  ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\001'
    COLLECTION ITEMS TERMINATED BY '\002'
    MAP KEYS TERMINATED BY '\003'
    LINES TERMINATED BY '\n'
  -- or:
  ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
  WITH SERDEPROPERTIES (
    ...
  )
  ----- STORAGE HANDLER -----
  -- non-native storage, for example HBase
  STORED BY 'storage.handler.class.name'
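Putting the pieces together, a minimal managed-table sketch (the table and column names are hypothetical):

```sql
-- Managed text table with explicit delimiters
CREATE TABLE employees (
  name   STRING,
  salary DOUBLE,
  deptno INT
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
```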

Load Data

  ----- LOAD DATA -----
  LOAD DATA
  LOCAL INPATH 'path to source file'
  -- replace the table's existing contents
  [OVERWRITE]
  -- copies the file to $HIVE/warehouse/table_name/
  INTO TABLE table_name
  ----- IMPORT DATA -----
  -- at creation (CTAS)
  CREATE TABLE target_table
  AS
  SELECT field1, field2 ...
  FROM source_table
  -- post creation
  INSERT [OVERWRITE] TABLE target_table
  [PARTITION (dt=value)]
  SELECT field1, field2 ...
  FROM source_table
  -- one source, multiple targets
  FROM source_table
  INSERT [OVERWRITE] TABLE target_table1
  SELECT ...
  INSERT [OVERWRITE] TABLE target_table2
  SELECT ...
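The multi-insert form above can be sketched as follows (the table names and LIKE filters are hypothetical):

```sql
-- Scan the source table once, populate two targets
FROM logs
INSERT OVERWRITE TABLE errors
  SELECT ts, line WHERE line LIKE '%ERROR%'
INSERT OVERWRITE TABLE warnings
  SELECT ts, line WHERE line LIKE '%WARN%';
```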

Others

Partition and Bucket

Partitioning divides a table into coarse-grained parts based on the value of a partition column, such as a date; queries restricted to a slice of the data only need to read the matching partitions. Bucketing further subdivides the data into a fixed number of buckets by hashing a column, which helps with sampling and map-side joins.

  -- create a table partitioned by date and country
  CREATE TABLE log (ts BIGINT, line STRING)
  PARTITIONED BY (dt STRING, country STRING)
  -- load data into one specific partition
  LOAD DATA
  LOCAL INPATH 'path to source'
  INTO TABLE log
  PARTITION (dt='2001-01-01', country='GB')
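Bucketing, the other half of this section, can be sketched as follows (the table name is hypothetical):

```sql
-- 4 buckets by hash of id; useful for sampling and map-side joins
CREATE TABLE bucketed_users (id INT, name STRING)
CLUSTERED BY (id) INTO 4 BUCKETS;
```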

 

Query

Sorting and Aggregation, MapReduce Scripts, Subqueries, Views, Joins

  • Inner Joins
  • Outer Joins
  • Semi Joins
  • Map Joins
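The join types above, sketched over hypothetical tables sales(id, ...) and things(id, name):

```sql
-- inner join
SELECT s.*, t.name FROM sales s JOIN things t ON (s.id = t.id);
-- outer join: keep sales rows with no match
SELECT s.*, t.name FROM sales s LEFT OUTER JOIN things t ON (s.id = t.id);
-- left semi join: things that appear in sales, without duplicating rows
SELECT * FROM things t LEFT SEMI JOIN sales s ON (s.id = t.id);
-- map join hint: load the small table into memory on each mapper
SELECT /*+ MAPJOIN(t) */ s.*, t.name FROM sales s JOIN things t ON (s.id = t.id);
```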

User Defined Function (UDF)

UDF (operates on a single row and produces a single row) and UDAF (user-defined aggregate function: operates on multiple rows and produces a single result).
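Registering and calling a custom UDF might look like this (the jar path, class name, and function name are hypothetical):

```sql
-- Make the jar containing the UDF available, then bind a function name to its class
ADD JAR /path/to/hive-udfs.jar;
CREATE TEMPORARY FUNCTION strip AS 'com.example.hive.Strip';
SELECT strip(name) FROM some_table;
```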

Reposted from: https://my.oschina.net/u/3551123/blog/1483956
