
Lab 5: Hive Queries_(1) Create the emp table, containing empno(int), ename(string), gender(string

Author: 代码探险家 | 2024-08-12 11:52:54



Lab 5: Hive Queries

Objectives and Requirements

Understand the basic SQL syntax of Hive.
Master the various ways of querying data in Hive.

System Environment and Versions

Linux Ubuntu 20.04
JDK 1.8
Hadoop 3.1.0
MySQL 8.0.28
Hive 3.1.2

Tasks

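Use Hive full-table queries, alias queries, filtered queries, and multi-table join queries.
Use Hive multi-table inserts and multi-directory output.
Use a shell script to list the tables in Hive.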

Procedure

Run jps to check whether the Hadoop daemons are running. If they are not, change to the /usr/local/hadoop directory and start Hadoop.

jps

cd /usr/local/hadoop

./sbin/start-all.sh


Start the MySQL database, which stores Hive's metadata:

sudo service mysql start

In the terminal, run the hive command to start the Hive CLI:

hive


Open a new terminal and change to the /opt/datas directory:

cd /opt/datas


Use WinSCP to copy the data files emp.txt and dept.txt from Windows to the local /opt/datas directory on Linux.

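In the Hive CLI, create the emp table (named emp5 in the statements below) with 8 fields: empno (int), ename (string), gender (string), bday (string), area (string), score (double), deptno (int), and scholarship (double), using '\t' (tab) as the field delimiter.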

create table if not exists emp5(
  empno int, ename string, gender string, bday string,
  area string, score double, deptno int, scholarship double)
row format delimited fields terminated by '\t';
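How the delimiter is used: each line of emp.txt becomes one row, and its tab-separated fields fill the columns in declaration order. The Python sketch below is a simplified model of that mapping, not Hive code; `parse_emp_line` and the sample lines are hypothetical. It mirrors the behaviour of Hive's default text SerDe as far as this table needs: a field that cannot be converted to the column type becomes NULL (None here), missing trailing fields are NULL, and extra fields are ignored.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Column names and converters in the same order as the CREATE TABLE emp5 statement.
EMP_COLUMNS: List[Tuple[str, Callable[[str], object]]] = [
    ("empno", int),
    ("ename", str),
    ("gender", str),
    ("bday", str),
    ("area", str),
    ("score", float),
    ("deptno", int),
    ("scholarship", float),
]


def parse_emp_line(line: str) -> Dict[str, Optional[object]]:
    """Split one tab-delimited line and convert each field to its column type."""
    fields = line.rstrip("\n").split("\t")
    row: Dict[str, Optional[object]] = {}
    for i, (name, convert) in enumerate(EMP_COLUMNS):
        if i >= len(fields):
            row[name] = None  # missing trailing field -> NULL
            continue
        try:
            row[name] = convert(fields[i])
        except ValueError:
            row[name] = None  # value not convertible to the column type -> NULL
    return row  # any fields beyond the eighth were never read, i.e. ignored
```

So if a line of emp.txt has a non-numeric value in the score position, `select * from emp5` shows NULL for that column instead of failing the query.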


Create the dept table (named dept5 below) with 3 fields: deptno (int), dname (string), and buildingsno (int), using '\t' as the field delimiter.

create table if not exists dept5(
  deptno int, dname string, buildingsno int)
row format delimited fields terminated by '\t';


Load the local data file /opt/datas/emp.txt into the Hive table emp5, and /opt/datas/dept.txt into dept5:

load data local inpath '/opt/datas/emp.txt' into table emp5;

load data local inpath '/opt/datas/dept.txt' into table dept5;


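Set the character encoding of the emp and dept tables; otherwise Chinese characters are displayed as garbled text. The value must match the encoding of the data files: the statements below use utf-8, and if the files were saved as GBK (common for files created on Chinese-language Windows), use 'GBK' instead. Note: GBK must be written in uppercase.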

alter table emp5
set serdeproperties('serialization.encoding'='utf-8');

alter table dept5
set serdeproperties('serialization.encoding'='utf-8');
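Garbled Chinese text usually means the bytes in a data file are decoded with a different charset from the one they were written in. The plain-Python demonstration below (not Hive code; the sample name and the `decode_as` helper are hypothetical) encodes a string as GBK, decodes it as UTF-8 to reproduce the garbling, then decodes it with the matching charset. Setting `serialization.encoding` plays the role of choosing that decoding charset for the table.

```python
def decode_as(raw: bytes, charset: str) -> str:
    """Decode raw file bytes with the given charset, replacing invalid sequences."""
    return raw.decode(charset, errors="replace")


sample = "张三"  # hypothetical value from the ename field
gbk_bytes = sample.encode("gbk")  # how a GBK-encoded data file stores it

wrong = decode_as(gbk_bytes, "utf-8")  # mismatched charset: replacement characters
right = decode_as(gbk_bytes, "gbk")    # matching charset: the original text
```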

Full-table query. Select all columns of the emp table:

select * from emp5;


Alias query. Select the empno and bday columns of emp through the table alias e:

select e.empno,e.bday from emp5 e;


Filtered query (WHERE). Select empno for the rows of emp where deptno = 100:

select empno from emp5 where deptno=100;


Two-table join. Join emp and dept on deptno, and select bday from emp and dname from dept:

select e.bday,d.dname from emp5 e,dept5 d where e.deptno=d.deptno;
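The comma-separated FROM with `where e.deptno=d.deptno` acts as an inner join: each emp row is paired with every dept row that has the same deptno, and rows without a match (including rows whose deptno is NULL) produce no output. A minimal Python model using a hash table on deptno; the function name and sample rows are hypothetical, and Hive itself chooses its own physical join strategy:

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def inner_join_on_deptno(
    emp: List[Dict[str, object]], dept: List[Dict[str, object]]
) -> List[Tuple[object, object]]:
    """Return (e.bday, d.dname) pairs for emp and dept rows with equal deptno."""
    # Build phase: index dept rows by the join key (NULL keys never match).
    by_deptno: Dict[object, List[Dict[str, object]]] = defaultdict(list)
    for d in dept:
        if d["deptno"] is not None:
            by_deptno[d["deptno"]].append(d)
    # Probe phase: each emp row yields one output row per matching dept row.
    result: List[Tuple[object, object]] = []
    for e in emp:
        if e["deptno"] is None:
            continue
        for d in by_deptno.get(e["deptno"], []):
            result.append((e["bday"], d["dname"]))
    return result
```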


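Multi-table insert. A multi-table insert reads the same data once and inserts it into several tables in a single statement. Here emp is the source table, and two new tables, emp1 and emp2, are the targets.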

① Create tables emp1 and emp2 with the same schema as emp:

create table emp1 like emp5;

create table emp2 like emp5;


② Insert the data of emp into emp1 and emp2:

from emp5
insert overwrite table emp1 select *
insert overwrite table emp2 select *;
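The advantage of the `from ... insert ... insert ...` form is that emp5 is scanned once and each row is passed to every insert clause. The Python sketch below models that single pass; `multi_insert` is a hypothetical helper, the in-memory lists stand in for emp1 and emp2, and the per-target predicate stands in for the WHERE condition each insert clause may carry (both clauses in this lab keep every row).

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


def multi_insert(
    source: Iterable[Row], targets: Dict[str, Callable[[Row], bool]]
) -> Dict[str, List[Row]]:
    """Scan source once and copy each row into every target whose filter accepts it."""
    # Targets start empty, mirroring INSERT OVERWRITE replacing old contents.
    outputs: Dict[str, List[Row]] = {name: [] for name in targets}
    for row in source:  # the source is iterated exactly once
        for name, keep in targets.items():
            if keep(row):
                outputs[name].append(row)
    return outputs


# A generator can be consumed only once, like a single scan of emp5.
emp_rows = ({"empno": n, "deptno": d} for n, d in [(1, 100), (2, 200)])
result = multi_insert(emp_rows, {"emp1": lambda row: True, "emp2": lambda row: True})
```

Because `emp_rows` is a generator, both targets can only be filled if every insert clause is served from the same pass, which is the constraint the multi-insert syntax satisfies.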


③ Query the data in emp1 and emp2:

select * from emp1;

select * from emp2;


Delete any existing output directories under /opt/datas to prepare for the next step:

rm -r /opt/datas/output*


Multi-directory output: write the data of one table to several local directories. Export emp to the local directories /opt/datas/output1 and /opt/datas/output2:

from emp5
insert overwrite local directory '/opt/datas/output1'
row format delimited fields terminated by '\t'
select *
insert overwrite local directory '/opt/datas/output2'
row format delimited fields terminated by '\t'
select *;
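Each `insert overwrite local directory` clause writes plain text files (such as the 000000_0 file viewed in the next step) with columns separated by the delimiter from `row format`. The Python sketch below shows how one row becomes one output line; `format_row` and `format_rows` are hypothetical helpers, and the sketch assumes Hive's default NULL marker `\N`. It is a simplified model and does not reproduce Hive's text rendering of every data type.

```python
from typing import Iterable, Optional, Sequence

NULL_MARKER = r"\N"  # Hive's default text representation of NULL


def format_row(values: Sequence[Optional[object]], delimiter: str = "\t") -> str:
    """Render one row as a delimited text line, writing NULL as \\N."""
    return delimiter.join(NULL_MARKER if v is None else str(v) for v in values)


def format_rows(rows: Iterable[Sequence[Optional[object]]], delimiter: str = "\t") -> str:
    """Render all rows, one newline-terminated line per row."""
    return "".join(format_row(r, delimiter) + "\n" for r in rows)
```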


Change to the local /opt/datas/output1 directory and view the output file:

cd /opt/datas/output1

cat 000000_0


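Bucket table sampling queries. These queries use the bucketed table student_b, which is not created in this lab and must already exist; the queries below imply that it is clustered on name into 4 buckets.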

Query student_b and sample bucket 1:

select id,name from student_b tablesample(bucket 1 out of 4 on name);


Query student_b and sample buckets 1 and 3:

select id,name from student_b tablesample(bucket 1 out of 2 on name);
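`tablesample(bucket x out of y on col)` selects buckets by arithmetic: when y divides the table's bucket count N, Hive reads buckets x, x+y, x+2y, and so on; when y is a multiple of N, it reads only an N/y slice of a single bucket. With the 4 buckets the two queries above imply for student_b, `1 out of 4` reads bucket 1 and `1 out of 2` reads buckets 1 and 3. This pruning applies because the sample column, name, is presumably the table's bucketing column; sampling on any other column makes Hive scan the whole table. A small Python helper reproducing the selection (the name `sampled_buckets` is hypothetical):

```python
from fractions import Fraction
from typing import Dict


def sampled_buckets(x: int, y: int, num_buckets: int) -> Dict[int, Fraction]:
    """Map each bucket number (1-based) read by BUCKET x OUT OF y to the fraction read."""
    if not 1 <= x <= y:
        raise ValueError("x must satisfy 1 <= x <= y")
    if num_buckets % y == 0:
        # y divides N: whole buckets x, x+y, x+2y, ... up to N
        return {b: Fraction(1) for b in range(x, num_buckets + 1, y)}
    if y % num_buckets == 0:
        # y is a multiple of N: an N/y slice of a single bucket
        return {(x - 1) % num_buckets + 1: Fraction(num_buckets, y)}
    raise ValueError("y must be a divisor or a multiple of the bucket count")
```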


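Query student_b with random sampling. Hashing on rand() instead of the bucketing column assigns each row to one of 4 groups at random and returns group 1, roughly a quarter of the rows, which differ from run to run: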

select id,name from student_b tablesample(bucket 1 out of 4 on rand());
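With `on rand()` the sample is no longer tied to the stored buckets: every row gets a fresh random value, the rows are grouped into 4 on that value, and group 1 is kept, so the whole table is scanned and about a quarter of the rows come back. A Python model of that behaviour; the function name and the seed are illustrative only, and the seed exists just to make the sketch repeatable, whereas Hive's rand() sample changes between runs:

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def random_bucket_sample(
    rows: Sequence[T], x: int, y: int, seed: Optional[int] = None
) -> List[T]:
    """Keep the rows whose randomly drawn group number (1..y) equals x."""
    rng = random.Random(seed)
    return [row for row in rows if rng.randint(1, y) == x]


sample = random_bucket_sample(list(range(10_000)), x=1, y=4, seed=42)
# len(sample) is close to 2500; the exact count depends on the random draws
```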

Sample by a percentage of the data size:

select ename, bday, score from emp5 tablesample(10 percent);


Sample by data size (1 byte):

select ename, bday, score from emp5 tablesample(1b);


Sample by data size (1 KB):

select ename, bday, score from emp5 tablesample(1k);
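Percentage and size sampling (`10 percent`, `1b`, `1k`) are block sampling: Hive reads at least the requested amount of input, but at the granularity of input splits (HDFS blocks), so it cannot stop partway through a block. For a small file such as emp.txt, which fits in one block, all three queries are therefore likely to return every row. The Python sketch below models that rounding up under the simplifying assumption that blocks are taken in order; the helper names are hypothetical, and the size literal follows Hive's b/k/m/g units with k = 1024 bytes.

```python
import math
import re
from typing import List, Sequence

_UNITS = {"b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size_literal(text: str) -> int:
    """Convert a TABLESAMPLE size literal such as '1b' or '1k' to bytes."""
    match = re.fullmatch(r"(\d+)([bkmg])", text.strip().lower())
    if match is None:
        raise ValueError(f"not a size literal: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def blocks_to_read(block_sizes: Sequence[int], target_bytes: int) -> List[int]:
    """Indices of the whole blocks read so that at least target_bytes are covered."""
    chosen: List[int] = []
    total = 0
    for i, size in enumerate(block_sizes):
        if chosen and total >= target_bytes:
            break  # enough data already; never split a block
        chosen.append(i)
        total += size
    return chosen


def blocks_for_percent(block_sizes: Sequence[int], percent: float) -> List[int]:
    """Blocks read for TABLESAMPLE(n PERCENT): at least n% of the total size."""
    target = math.ceil(sum(block_sizes) * percent / 100)
    return blocks_to_read(block_sizes, target)
```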


Sample by row count (the count applies to each input split, not to the table as a whole):

select ename, bday, score from emp5 tablesample(8 rows);
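Because `tablesample(n rows)` is applied per input split, each split contributes up to n rows, and the total can exceed n when the table spans several splits. A minimal Python model, with splits represented as plain lists and a hypothetical function name:

```python
from itertools import chain, islice
from typing import Iterable, List, Sequence, TypeVar

T = TypeVar("T")


def sample_rows_per_split(splits: Iterable[Sequence[T]], n: int) -> List[T]:
    """Take up to n rows from each split and concatenate the results."""
    return list(chain.from_iterable(islice(split, n) for split in splits))
```

With a single small file such as emp.txt there is one split, so the query returns at most 8 rows.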

Call Hive queries from a shell script.

① Change to the local /opt/datas directory and use vim to write a shell script named alltable that lists all tables in Hive.

cd /opt/datas

vim alltable


② Enter the following script in alltable, then save and exit:

#!/bin/bash
hive -e 'show tables;'


③ Make alltable executable:

chmod +x alltable


④ Run the shell script:

./alltable
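The script simply runs `hive -e` with a query string. If the same call is needed from a program rather than a shell script, a Python sketch using `subprocess` could look like the following. It assumes the `hive` command is on PATH (as `./alltable` does), and uses the Hive CLI's `-S` (silent) option to suppress log output; the helper names are hypothetical. Passing an argument list instead of a shell string avoids quoting problems with the SQL text.

```python
import subprocess
from typing import List


def hive_command(sql: str, silent: bool = True) -> List[str]:
    """Build the argument list for: hive [-S] -e "<sql>"."""
    args = ["hive"]
    if silent:
        args.append("-S")  # Hive CLI silent mode: print only query results
    args += ["-e", sql]
    return args


def run_hive(sql: str) -> List[str]:
    """Run the query and return stdout lines; raises CalledProcessError on failure."""
    result = subprocess.run(
        hive_command(sql), capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


# Equivalent of the alltable script (requires a working Hive installation):
# tables = run_hive("show tables;")
```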


Source: wpsshop blog, https://www.wpsshop.cn/w/代码探险家/article/detail/969634. The article was contributed by a user; copyright belongs to the original author.