Programming Hive Study Notes (3): Data Operations

Author: 你好赵伟 | 2024-06-16 04:27:48


Creating the employees table

create table employees(
    name string,
    salary float,
    subordinates array<string>,
    deductions map<string,float>,
    address struct<street:string,city:string,state:string,zip:int>
)
partitioned by (country string,state string);

Loading data into the table

load data local inpath '/opt/datafiles/employees.txt' overwrite into table employees partition (country='US',state='CA');

When loading into a partitioned table, the partition clause must give a value for every partition key.
The data is stored in this directory:
/user/hive/warehouse/learnhive.db/employees/country=US/state=CA
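
To check that the load created the expected partition, the partitions of the table can be listed. This is a minimal sketch; it assumes the load statement above has already been run and that the current database is learnhive (the database name is taken from the warehouse path).

```sql
-- List the partitions of employees. After the load this should include country=US/state=CA.
show partitions employees;

-- Partition keys can be selected and filtered like ordinary columns.
select name, country, state
from employees
where country = 'US' and state = 'CA';
```

Filtering on the partition keys lets Hive read only the matching partition directory instead of scanning the whole table.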

select name,salary from employees;
+-------------------+-----------+
|       name        |  salary   |
+-------------------+-----------+
| John Doe          | 100000.0  |
| Mary Smith        | 80000.0   |
| Todd Jones        | 70000.0   |
| Bill King         | 60000.0   |
| Boss Man          | 200000.0  |
| Fred Finance      | 150000.0  |
| Stacy Accountant  | 60000.0   |
+-------------------+-----------+

When a selected column has a collection type, Hive prints its value using JSON syntax. The subordinates column is an array. Note that string elements inside a collection are quoted, whereas values of a plain string column are not.

select name,subordinates from employees;

The deductions column is a map.

select name,deductions from employees;

The address column is a struct.

select name,address from employees;

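Referencing elements of collection types

Referencing an array element (selecting the second element of subordinates; array indexes start at 0)

select name,subordinates[1] from employees;

Referencing an element that does not exist returns NULL. String values extracted this way are no longer quoted.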
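
The NULL behaviour can be seen by indexing past the end of the array. The sketch below assumes the employees data loaded earlier. The index 99 is an arbitrary value chosen to be out of range, and size() is Hive's built-in function for the number of elements in an array or map.

```sql
-- first_subordinate: element 0, or NULL when the array is empty
-- out_of_range:      no element at index 99, so NULL
-- subordinate_count: number of elements in the array
select name,
       subordinates[0]    as first_subordinate,
       subordinates[99]   as out_of_range,
       size(subordinates) as subordinate_count
from employees;
```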

Referencing a map element

select name,deductions['Insurance'] from employees;

Referencing a struct element

select name,address.city from employees;

Creating the stocks table

create table stocks(
    exchange_e string,
    symbol string,
    ymd string,
    price_open float,
    price_high float,
    price_low float,
    price_close float,
    volume int,
    price_adj_close float
)
row format delimited fields terminated by ',';

Loading the data

load data local inpath '/opt/datafiles/stocks.csv' overwrite into table stocks;

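Specifying columns with a regular expression

The following setting has to be applied first. Otherwise Hive treats the backquoted text as a literal column name rather than a regular expression.

set hive.support.quoted.identifiers=none;

select symbol,`price.*` from stocks limit 5;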
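
A regular expression can also narrow the match or leave columns out. The sketch below assumes the stocks table and the session setting shown above. The second query uses the `(col)?+.+` exclusion pattern to select every column except volume.

```sql
set hive.support.quoted.identifiers=none;

-- Only price_open and price_close match this pattern.
select symbol, `price_(open|close)` from stocks limit 5;

-- Every column except volume.
select `(volume)?+.+` from stocks limit 5;
```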

Computing with column values

select upper(name),salary,deductions["Federal Taxes"],round(salary*(1-deductions["Federal Taxes"])) from employees;

Arithmetic operators

+, addition
-, subtraction
*, multiplication
/, division
%, remainder
&, bitwise AND
|, bitwise OR
^, bitwise XOR
~, bitwise NOT
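
The operators can be tried on literals. This sketch assumes a Hive version that allows select without a from clause (0.13.0 or later); on older versions the same expressions can be selected from any existing table.

```sql
-- Expected values, in order: 9, 5, 14, 3.5, 1, 2, 7, 5, -7
-- Division returns a double, and the bitwise results follow from 6 = 110 and 3 = 011 in binary.
select 7 + 2,
       7 - 2,
       7 * 2,
       7 / 2,
       7 % 2,
       6 & 3,
       6 | 3,
       6 ^ 3,
       ~6;
```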

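The limit clause

select name,salary from employees limit 2;

select name,salary from employees limit 1,2;

With two arguments (supported since Hive 2.0.0), the first is the row offset, counted from 0, and the second is the maximum number of rows to return. This query therefore skips the first row and returns the next 2 rows.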
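
Without an order by clause the order of the returned rows is not guaranteed, so an offset is mostly useful on sorted output. A minimal sketch, assuming the employees table above and a Hive version that accepts the two-argument form of limit:

```sql
-- The first two rows of the result sorted by salary, highest first.
select name, salary from employees order by salary desc limit 0,2;

-- The next two rows of the same ordering.
select name, salary from employees order by salary desc limit 2,2;
```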

Column aliases

select name as n,salary from employees;

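DML data operations

Importing data

load

load data [local] inpath 'path/to/data' [overwrite] into table student [partition (partcol1=val1,…)];

(1) load data: loads data into a table
(2) local: loads from the local file system; without it, the path refers to HDFS
(3) inpath: the path of the data to load
(4) overwrite: replaces the data already in the table; without it, the new data is appended
(5) into table: names the table to load into
(6) student: the target table in this example
(7) partition: loads into the specified partition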
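
As a sketch of these options, the following statement loads a file that is already on HDFS and appends it to the table. The path /tmp/staging/student.txt is a hypothetical example, and student is assumed to be an unpartitioned table whose row format matches the file.

```sql
-- Without local, the path is on HDFS and Hive moves the file into the table directory.
-- Without overwrite, existing rows are kept and the new file is added.
load data inpath '/tmp/staging/student.txt' into table student;
```

Note that an HDFS load moves the source file rather than copying it, so the file no longer exists at the original path afterwards.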

insert

insert into table student_par values(1,'wangwu'),(2,'zhaoliu');

insert into: appends rows to the table or partition; existing data is kept
insert overwrite: replaces the data already in the table or partition

insert overwrite table student_par select id, name from student;
(inserts the result of a query on a single table)
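
For a partitioned table, the target partition is named in the statement. The sketch below uses the employees table from earlier. staged_employees is a hypothetical staging table, assumed to hold the same five data columns plus cnty and st columns for the country and state.

```sql
-- Replace the contents of one partition with the result of a query.
insert overwrite table employees
partition (country='US', state='OR')
select se.name, se.salary, se.subordinates, se.deductions, se.address
from staged_employees se
where se.cnty = 'US' and se.st = 'OR';
```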

Creating a table from a query result

create table if not exists student3 as select id, name from student;

Specifying the data path with location when creating a table

create external table if not exists student5(
    id int, name string
)
row format delimited fields terminated by '\t'
location '/student';

Exporting data

Exporting with insert

1) Export a query result to the local file system

insert overwrite local directory '/opt/outfiles' select * from employees;

2) Export a query result to the local file system with formatting

insert overwrite local directory '/opt/outfiles' 
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
select * from employees;
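
Since employees has array, map, and struct columns, the delimiters used inside those columns can be set as well. A sketch, assuming a Hive version that supports row format in insert overwrite directory (0.11.0 or later); the delimiter characters are arbitrary choices for illustration.

```sql
insert overwrite local directory '/opt/outfiles'
row format delimited
fields terminated by '\t'
collection items terminated by ','
map keys terminated by ':'
select * from employees;
```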

3) Export a query result to HDFS (without local)

insert overwrite directory '/opt/outfiles' 
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
select * from employees;

Exporting to the local file system with a Hadoop command

dfs -get /user/hive/warehouse/learnhive.db/test2/test.txt /opt/outfiles/out_test.txt;

Exporting with a Hive shell command

[xwk@hadoop102 outfiles]$ cd /opt/software/hive/
[xwk@hadoop102 hive]$ ./bin/hive -e 'select * from learnhive.test2;' > /opt/outfiles/out_test01.txt;

Exporting to HDFS with export

export and import are mainly used to migrate Hive tables between two Hadoop clusters.

export table learnhive.test2 to '/opt/outfiles';

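Importing data into a Hive table with import

Note: the data must first be exported with export before it can be imported.
Delete the existing data in the target table first.

import table test2 from '/opt/outfiles';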
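
The export can also be imported under a different table name, which avoids having to empty the original table. A sketch, assuming the export above was written to /opt/outfiles; test2_copy is a hypothetical name for a table that does not exist yet.

```sql
-- Creates test2_copy from the exported metadata and data.
import table test2_copy from '/opt/outfiles';

select * from test2_copy limit 5;
```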

Clearing the data in a table (truncate)

Note: truncate only works on managed tables; it cannot remove the data of an external table.

truncate table test2;
select * from test2;
+-------------+----------------+-----------------+----------------+
| test2.name  | test2.friends  | test2.children  | test2.address  |
+-------------+----------------+-----------------+----------------+
+-------------+----------------+-----------------+----------------+
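
truncate also accepts a partition clause, so a single partition can be emptied instead of the whole table. A sketch using the employees table created earlier, which is a managed table because it was created without the external keyword:

```sql
-- Removes only the rows of the country=US/state=CA partition.
truncate table employees partition (country='US', state='CA');
```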

