Cloudera Impala provides fast, interactive SQL queries directly on the data you store in HDFS (Apache Hadoop) and HBase. In addition to using the same unified storage platform, Impala uses the same metadata, SQL syntax (HiveQL), ODBC driver, and user interface (Hue Beeswax) as Hive. Impala thus provides a familiar, unified platform for both batch and real-time queries.
II. Impala installation
1. Installation requirements
(1) Software requirements
Red Hat Enterprise Linux (RHEL)/CentOS 6.2 (64-bit)
CDH 4.1.0 or higher
Hive
A relational database (MySQL is used here)
(2) Hardware requirements
During join queries, data sets must be loaded into memory for computation, so the machines running impalad need plenty of memory.
2. Installation preparation
(1) Check the operating system version
> more /etc/issue
CentOS release 6.2 (Final)
Kernel \r on an \m
(2) Machine preparation
10.28.169.112 mr5
10.28.169.113 mr6
10.28.169.114 mr7
10.28.169.115 mr8
Roles installed on each machine:
mr5: NameNode, ResourceManager, SecondaryNameNode, Hive, impala-state-store
mr6, mr7, mr8: DataNode, NodeManager, impalad
(3) User preparation
Create a new user named hadoop on each machine, and set up SSH access between the nodes.
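A minimal sketch of this step (run as root on every machine; the key-distribution loop is illustrative and assumes the hadoop user's password is known):
# Create the hadoop user on each machine
useradd hadoop
passwd hadoop
# As the hadoop user on mr5, generate a key and distribute it for passwordless SSH
su - hadoop
ssh-keygen -t rsa -P ''
for host in mr5 mr6 mr7 mr8; do ssh-copy-id hadoop@$host; done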
(4) Software preparation
Download from the official Cloudera website:
Hadoop:
hadoop-2.0.0-cdh4.1.2.tar.gz
Hive:
hive-0.9.0-cdh4.1.2.tar.gz
Impala:
impala-0.3-1.p0.366.el6.x86_64.rpm
impala-debuginfo-0.3-1.p0.366.el6.x86_64.rpm
impala-server-0.3-1.p0.366.el6.x86_64.rpm
impala-shell-0.3-1.p0.366.el6.x86_64.rpm
Impala dependency package download:
4. Install hadoop-2.0.0-cdh4.1.2
(1) Installation package preparation
Log in to the mr5 machine as the hadoop user, upload hadoop-2.0.0-cdh4.1.2.tar.gz to the /home/hadoop/ directory, and decompress it:
tar zxvf hadoop-2.0.0-cdh4.1.2.tar.gz
(2) Configure environment variables
Modify the .bash_profile environment variables in the hadoop user's home directory /home/hadoop/ on the mr5 machine:
export JAVA_HOME=/usr/jdk1.6.0_30
export JAVA_BIN=${JAVA_HOME}/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export JAVA_OPTS="-Djava.library.path=/usr/local/lib -server -Xms1024m -Xmx2048m -XX:MaxPermSize=256m -Djava.awt.headless=true -Dsun.net.client.defaultReadTimeout=60000 -Djmagick.systemclassloader=no -Dnetworkaddress.cache.ttl=300 -Dsun.net.inetaddr.ttl=300"
export HADOOP_HOME=/home/hadoop/hadoop-2.0.0-cdh4.1.2
export HADOOP_PREFIX=$HADOOP_HOME
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export HADOOP_YARN_HOME=${HADOOP_HOME}
export PATH=$PATH:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin
export JAVA_HOME JAVA_BIN PATH CLASSPATH JAVA_OPTS
export HADOOP_LIB=${HADOOP_HOME}/lib
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
(3) Modify the configuration file
On the mr5 machine, log in as the hadoop user and modify the Hadoop configuration files (configuration directory: hadoop-2.0.0-cdh4.1.2/etc/hadoop).
(1) slaves:
Add the following nodes:
mr6
mr7
mr8
(2) hadoop-env.sh:
Add the following environment variables:
export JAVA_HOME=/usr/jdk1.6.0_30
export HADOOP_HOME=/home/hadoop/hadoop-2.0.0-cdh4.1.2
export HADOOP_PREFIX=${HADOOP_HOME}
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export HADOOP_YARN_HOME=${HADOOP_HOME}
export PATH=$PATH:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin
export JAVA_HOME JAVA_BIN PATH CLASSPATH JAVA_OPTS
export HADOOP_LIB=${HADOOP_HOME}/lib
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
(3) core-site.xml:
<property>
  <name>fs.default.name</name>
  <value>hdfs://mr5:9000</value>
  <description>The name of the default file system. Either the literal string "local" or a host:port for DFS.</description>
  <final>true</final>
</property>
<property>
  <name>io.native.lib.available</name>
  <value>true</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
(4) hdfs-site.xml:
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/home/hadoop/dfsdata/name</value>
  <description>Determines where on the local filesystem the DFS name node should store the name table. If this is a comma-delimited list of directories, then the name table is replicated in all of the directories, for redundancy.</description>
  <final>true</final>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/home/hadoop/dfsdata/data</value>
  <description>Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.</description>
  <final>true</final>
</property>
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
(5) mapred-site.xml:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.job.tracker</name>
  <value>hdfs://mr5:9001</value>
  <final>true</final>
</property>
<property>
  <name>mapreduce.task.io.sort.mb</name>
  <value>512</value>
</property>
<property>
  <name>mapreduce.task.io.sort.factor</name>
  <value>100</value>
</property>
<property>
  <name>mapreduce.reduce.shuffle.parallelcopies</name>
  <value>50</value>
</property>
<property>
  <name>mapreduce.cluster.temp.dir</name>
  <value>file:/home/hadoop/mapreddata/system</value>
  <final>true</final>
</property>
<property>
  <name>mapreduce.cluster.local.dir</name>
  <value>file:/home/hadoop/mapreddata/local</value>
  <final>true</final>
</property>
(6) yarn-env.sh:
Add the following environment variables:
export JAVA_HOME=/usr/jdk1.6.0_30
export HADOOP_HOME=/home/hadoop/hadoop-2.0.0-cdh4.1.2
export HADOOP_PREFIX=${HADOOP_HOME}
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export HADOOP_YARN_HOME=${HADOOP_HOME}
export PATH=$PATH:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin
export JAVA_HOME JAVA_BIN PATH CLASSPATH JAVA_OPTS
export HADOOP_LIB=${HADOOP_HOME}/lib
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
(7) yarn-site.xml:
<property>
  <name>yarn.resourcemanager.address</name>
  <value>mr5:8080</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>mr5:8081</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>mr5:8082</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce.shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>file:/home/hadoop/nmdata/local</value>
  <description>The local directories used by the NodeManager.</description>
</property>
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>file:/home/hadoop/nmdata/log</value>
  <description>The directories used by the NodeManager as log directories.</description>
</property>
(4) Copy to other nodes
(1) After completing steps 2 and 3 on mr5, re-compress hadoop-2.0.0-cdh4.1.2:
rm hadoop-2.0.0-cdh4.1.2.tar.gz
tar zcvf hadoop-2.0.0-cdh4.1.2.tar.gz hadoop-2.0.0-cdh4.1.2
Then copy hadoop-2.0.0-cdh4.1.2.tar.gz to the mr6, mr7, and mr8 machines:
scp /home/hadoop/hadoop-2.0.0-cdh4.1.2.tar.gz hadoop@mr6:/home/hadoop/
scp /home/hadoop/hadoop-2.0.0-cdh4.1.2.tar.gz hadoop@mr7:/home/hadoop/
scp /home/hadoop/hadoop-2.0.0-cdh4.1.2.tar.gz hadoop@mr8:/home/hadoop/
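The archive still has to be unpacked on each target node; a minimal sketch, assuming the passwordless SSH set up earlier:
for host in mr6 mr7 mr8; do
  # decompress the copied archive in each node's home directory
  ssh hadoop@$host 'cd /home/hadoop && tar zxvf hadoop-2.0.0-cdh4.1.2.tar.gz'
done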
(2) Remotely copy the hadoop user's environment configuration file .bash_profile from the mr5 machine to the mr6, mr7, and mr8 machines:
scp /home/hadoop/.bash_profile hadoop@mr6:/home/hadoop/
scp /home/hadoop/.bash_profile hadoop@mr7:/home/hadoop/
scp /home/hadoop/.bash_profile hadoop@mr8:/home/hadoop/
After the copy completes, execute the following in the /home/hadoop/ directory on the mr5, mr6, mr7, and mr8 machines to make the environment variables take effect:
source .bash_profile
(5) Start HDFS and YARN
After all the above steps are completed, log in to the mr5 machine as the hadoop user and execute in turn:
hdfs namenode -format
start-dfs.sh
start-yarn.sh
Check with the jps command:
mr5 should have started the NameNode, ResourceManager, and SecondaryNameNode processes.
mr6, mr7, and mr8 should have started the DataNode and NodeManager processes.
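For reference, jps output on mr5 would look roughly like the following (the process IDs are illustrative):
> jps
2001 NameNode
2002 SecondaryNameNode
2003 ResourceManager
2004 Jps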
(6) Verify the installation
Check the running status of the nodes and the execution of jobs in the following way:
Browser access (the hosts mapping needs to be configured on the local machine).
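For example, a local hosts file (e.g., /etc/hosts) would map the cluster hostnames using the addresses from the machine list above:
10.28.169.112 mr5
10.28.169.113 mr6
10.28.169.114 mr7
10.28.169.115 mr8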
5. Install hive-0.9.0-cdh4.1.2
(1) Installation package preparation
As the hadoop user, upload hive-0.9.0-cdh4.1.2.tar.gz to the /home/hadoop/ directory of the mr5 machine and extract it:
tar zxvf hive-0.9.0-cdh4.1.2.tar.gz
(2) Configure environment variables
Add the following environment variables to .bash_profile:
export HIVE_HOME=/home/hadoop/hive-0.9.0-cdh4.1.2
export PATH=$PATH:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${HIVE_HOME}/bin
export HIVE_CONF_DIR=$HIVE_HOME/conf
export HIVE_LIB=$HIVE_HOME/lib
After adding them, execute the following command to make the environment variables take effect:
. .bash_profile
(3) Modify the configuration file
Modify the Hive configuration files (configuration directory: hive-0.9.0-cdh4.1.2/conf/).
Create a new hive-site.xml file in the hive-0.9.0-cdh4.1.2/conf/ directory and add the following configuration:
<property>
  <name>hive.metastore.local</name>
  <value>true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://10.28.169.61:3306/hive_impala?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hadoop</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>123456</value>
</property>
<property>
  <name>hive.security.authorization.enabled</name>
  <value>false</value>
</property>
<property>
  <name>hive.security.authorization.createtable.owner.grants</name>
  <value>ALL</value>
</property>
<property>
  <name>hive.querylog.location</name>
  <value>${user.home}/hive-logs/querylog</value>
</property>
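This configuration assumes the MySQL instance at 10.28.169.61 already has a hadoop account that can create the hive_impala database; a minimal sketch of that preparation (run on the MySQL host as an administrator; the credentials match the ConnectionUserName/ConnectionPassword values above):
mysql -u root -p -e "CREATE USER 'hadoop'@'%' IDENTIFIED BY '123456'; GRANT ALL PRIVILEGES ON hive_impala.* TO 'hadoop'@'%'; FLUSH PRIVILEGES;"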
(4) Verify the installation
After completing the above steps, verify that Hive was installed successfully.
Run hive on the mr5 command line and enter "show tables;"; the following output indicates that Hive has been installed successfully:
> hive
hive> show tables;
OK
Time taken: 18.952 seconds
hive>
6. Impala installation
Notes:
(1) Steps 1, 2, 3, and 4 below are executed as the root user on mr5, mr6, mr7, and mr8.
(2) Step 5 below is executed as the hadoop user.
(1) Install related software packages:
Install mysql-connector-java:
yum install mysql-connector-java
Install bigtop:
rpm -ivh bigtop-utils-0.4+300-1.cdh4.0.1.p0.1.el6.noarch.rpm
Install libevent:
rpm -ivh libevent-1.4.13-4.el6.x86_64.rpm
If you need to install other related packages, you can visit the following link:
/centos/6.3/os/x86_64/Packages/
(2) Install the Impala RPMs, executing each in turn:
rpm -ivh impala-0.3-1.p0.366.el6.x86_64.rpm
rpm -ivh impala-server-0.3-1.p0.366.el6.x86_64.rpm
rpm -ivh impala-debuginfo-0.3-1.p0.366.el6.x86_64.rpm
rpm -ivh impala-shell-0.3-1.p0.366.el6.x86_64.rpm
(3) Find the Impala installation directory.
After completing steps 1 and 2, run the following command:
find / -name impala
Output:
/usr/lib/debug/usr/lib/impala
/usr/lib/impala
/var/run/impala
/var/log/impala
/var/lib/alternatives/impala
/etc/default/impala
/etc/alternatives/impala
The Impala installation directory is /usr/lib/impala.
(4) Configure Impala
Create a conf directory under the Impala installation directory /usr/lib/impala, and copy core-site.xml and hdfs-site.xml from Hadoop's configuration directory, along with hive-site.xml from Hive's conf directory, into it.
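A minimal sketch of this step, using the installation paths from the earlier sections:
mkdir /usr/lib/impala/conf
cp /home/hadoop/hadoop-2.0.0-cdh4.1.2/etc/hadoop/core-site.xml /usr/lib/impala/conf/
cp /home/hadoop/hadoop-2.0.0-cdh4.1.2/etc/hadoop/hdfs-site.xml /usr/lib/impala/conf/
cp /home/hadoop/hive-0.9.0-cdh4.1.2/conf/hive-site.xml /usr/lib/impala/conf/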
Add the following to the core-site.xml file:
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.client.read.shortcircuit.skip.checksum</name>
  <value>false</value>
</property>
Add the following to the hdfs-site.xml files of both Hadoop and Impala, then restart Hadoop and Impala:
<property>
  <name>dfs.datanode.data.dir.perm</name>
  <value>755</value>
</property>
<property>
  <name>dfs.block.local-path-access.user</name>
  <value>hadoop</value>
</property>
<property>
  <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
  <value>true</value>
</property>
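Restarting Hadoop can be done with the scripts used earlier, as the hadoop user on mr5 (the Impala daemons themselves are started in the next step):
stop-yarn.sh
stop-dfs.sh
start-dfs.sh
start-yarn.sh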
(5) Start the services
(1) Start the Impala state store on mr5 with the following command:
> GLOG_v=1 nohup statestored -state_store_port=24000 &
If the statestore starts normally, you can check /tmp/statestored.INFO; if it fails, locate the error message in /tmp/statestored.ERROR.
(2) Start impalad on mr6, mr7, and mr8 with the following commands:
mr6:
> GLOG_v=1 nohup impalad -state_store_host=mr5 -nn=mr5 -nn_port=9000 -hostname=mr6 -ipaddress=10.28.169.113 &
mr7:
> GLOG_v=1 nohup impalad -state_store_host=mr5 -nn=mr5 -nn_port=9000 -hostname=mr7 -ipaddress=10.28.169.114 &
mr8:
> GLOG_v=1 nohup impalad -state_store_host=mr5 -nn=mr5 -nn_port=9000 -hostname=mr8 -ipaddress=10.28.169.115 &
If impalad starts normally, you can check /tmp/impalad.INFO; if it fails, locate the error message in /tmp/impalad.ERROR.
(6) Use the shell
Use impala-shell to start the Impala shell, connect to each impalad host (mr6, mr7, mr8) in turn, and refresh the metadata. The related commands are as follows (they can be executed on any node):
> impala-shell
[Not connected] > connect mr6:21000
[mr6:21000] > refresh
[mr6:21000] > connect mr7:21000
[mr7:21000] > refresh
[mr7:21000] > connect mr8:21000
[mr8:21000] > refresh
(7) Verify the installation
Use impala-shell to start the Impala shell, connect to an impalad host, refresh the metadata, and then execute shell commands. The related commands are as follows (they can be executed on any node):
> impala-shell
[Not connected] > connect mr6:21000
[mr6:21000] > refresh
[mr6:21000] > show databases
default
[mr6:21000] >
If the above prompt appears, the installation was successful.