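Summary up front: use the Fair Scheduler!!!
Let me walk through what happened in chronological order.
A few years ago I set up a data warehouse cluster. The data volume was small, and the offline batch jobs finished comfortably within a single night. At that time I used Hadoop's default scheduler, the Capacity Scheduler, and left its default configuration untouched. So what happens then?
Under the root parent queue there is only the default queue. So although it is nominally the Capacity Scheduler, in practice it behaves like plain FIFO.
A few months later, as the number of scheduled jobs grew, I noticed something was wrong: the cluster's resources were not being used. So I added queues. For example, under root there was a parent queue hive with two child queues, hive1 and hive2, plus a kylin queue, a flink queue, and so on. All of these are configured in $HADOOP_HOME/etc/hadoop/capacity-scheduler.xml. I won't describe it in detail here; if you don't know how to configure it, a quick search online will help.
About a year after that, no matter what I optimized (the database, the ingestion, the code, and so on), the results were still unsatisfying: the resources were still not fully utilized. At that point I concluded that the Capacity Scheduler itself had shortcomings, and I started trying the Fair Scheduler. With a simple configuration in place, I ran a test using hadoop-mapreduce-examples-2.7.2.jar, the examples program that ships with Hadoop. Running it from several terminal windows at the same time, the jobs ran very fast and finished at almost the same time. So I switched to the Fair Scheduler!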
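For readers who want a starting point, the queue layout described above could look roughly like the following in capacity-scheduler.xml. This is only a sketch: the queue names come from the description above, but every percentage is an assumed value, not the configuration actually used on this cluster.

```xml
<!-- capacity-scheduler.xml: illustrative sketch, all capacity values are assumptions -->
<configuration>
  <!-- Queues directly under root -->
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,hive,kylin,flink</value>
  </property>
  <!-- hive is a parent queue with two child queues -->
  <property>
    <name>yarn.scheduler.capacity.root.hive.queues</name>
    <value>hive1,hive2</value>
  </property>

  <!-- Capacities of sibling queues must add up to 100 at each level -->
  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>10</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.hive.capacity</name>
    <value>50</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.kylin.capacity</name>
    <value>20</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.flink.capacity</name>
    <value>20</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.hive.hive1.capacity</name>
    <value>50</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.hive.hive2.capacity</name>
    <value>50</value>
  </property>

  <!-- Upper bound the hive queue may grow to when other queues are idle (assumed value) -->
  <property>
    <name>yarn.scheduler.capacity.root.hive.maximum-capacity</name>
    <value>80</value>
  </property>
</configuration>
```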
Edit yarn-site.xml and add the following:
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  <description>Use the Fair Scheduler instead of the default Capacity Scheduler</description>
</property>
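The Fair Scheduler reads its queue definitions from an allocation file, which defaults to fair-scheduler.xml looked up on the classpath (this normally includes the Hadoop configuration directory). If the file is kept somewhere else, the location can be set explicitly in yarn-site.xml. A minimal sketch, where the path is a hypothetical example:

```xml
<!-- yarn-site.xml: optional, only needed when the allocation file is not in the conf directory -->
<property>
  <name>yarn.scheduler.fair.allocation.file</name>
  <!-- Hypothetical path, adjust to your own layout -->
  <value>/etc/hadoop/conf/fair-scheduler.xml</value>
</property>
```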
Then create a new file named fair-scheduler.xml under $HADOOP_HOME/etc/hadoop with the following content (adjust it to your own needs):
<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!--
  This file contains pool and user allocations for the Fair Scheduler.
  Its format is explained in the Fair Scheduler documentation at
  http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html.
  The documentation also includes a sample config file.
-->

<allocations>
  <defaultQueueSchedulingPolicy>fair</defaultQueueSchedulingPolicy>

  <queue name="default">
    <weight>50</weight>
  </queue>

  <queue name="kylin">
    <weight>50</weight>
  </queue>

  <queue name="hive1">
    <weight>50</weight>
  </queue>

  <queuePlacementPolicy>
    <rule name="specified" create="false" />
    <rule name="primaryGroup" create="false" />
    <rule name="default" queue="default" />
  </queuePlacementPolicy>
</allocations>
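With the placement policy above, the `specified` rule puts an application into the queue it names at submission time, provided that queue already exists (`create="false"`). Otherwise the scheduler tries a queue named after the user's primary group, and finally falls back to `default`. For MapReduce jobs the requested queue is the `mapreduce.job.queuename` property. A minimal sketch, assuming jobs submitted from this client should land in hive1 unless told otherwise:

```xml
<!-- mapred-site.xml on the submitting client: illustrative sketch -->
<property>
  <name>mapreduce.job.queuename</name>
  <!-- Assumed choice; any queue defined in fair-scheduler.xml works here -->
  <value>hive1</value>
</property>
```

Jobs that accept Hadoop's generic options can usually override this per run by passing the same property with `-D` on the command line.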

Restart YARN.
Done. You can now check the queues on the Hadoop web UI!