What is Impala and how do you install it?

I. Introduction to Impala

Cloudera Impala provides fast, interactive SQL queries directly on data stored in HDFS (Apache Hadoop) and HBase. In addition to using the same unified storage platform as Hive, Impala uses the same metadata, SQL syntax (HiveQL), ODBC driver, and user interface (Hue Beeswax) as Hive. Impala therefore offers a single, familiar platform for both batch-oriented and real-time queries.

II. Impala Installation

1. Installation requirements

(1) Software requirements

Red Hat Enterprise Linux (RHEL)/CentOS 6.2 (64-bit)

CDH 4.1.0 or higher

Hive

A relational database (MySQL is used in this guide)

(2) Hardware requirements

During join queries, the participating data sets must be loaded into memory for computation, so machines on which impalad is installed need a large amount of memory.
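As a quick pre-flight check, the installed memory on each node that will run impalad can be inspected before installation (a minimal sketch; this guide does not specify a concrete minimum, so any threshold you apply is your own):

> free -g
> grep MemTotal /proc/meminfo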

2. Installation preparation

(1) Check the operating system version

> more /etc/issue

CentOS release 6.2 (Final)
Kernel \r on an \m

(2) Machine preparation

10.28.169.112 mr5
10.28.169.113 mr6
10.28.169.114 mr7
10.28.169.115 mr8

Roles installed on each machine:

mr5: NameNode, ResourceManager, SecondaryNameNode, Hive, impala-state-store

mr6, mr7, mr8: DataNode, NodeManager, impalad

(3) User preparation

Log in to each machine over SSH and create a new user named hadoop.
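A minimal sketch of this step (run as root on each node; the passwordless-SSH setup is an assumption added here so that the later scp commands and the start-dfs.sh/start-yarn.sh scripts can reach the other nodes without password prompts):

# As root on each of mr5, mr6, mr7, mr8:
useradd hadoop
passwd hadoop

# As the hadoop user on mr5:
ssh-keygen -t rsa -P ''
ssh-copy-id hadoop@mr6
ssh-copy-id hadoop@mr7
ssh-copy-id hadoop@mr8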

(4) Software preparation

Download the following from the Cloudera official website:

Hadoop:

hadoop-2.0.0-cdh4.1.2.tar.gz

Hive:

hive-0.9.0-cdh4.1.2.tar.gz

Impala:

impala-0.3-1.p0.366.el6.x86_64.rpm
impala-debuginfo-0.3-1.p0.366.el6.x86_64.rpm
impala-server-0.3-1.p0.366.el6.x86_64.rpm
impala-shell-0.3-1.p0.366.el6.x86_64.rpm

Impala dependency package download:

4. Install hadoop-2.0.0-cdh4.1.2

(1) Installation package preparation

Log in to the mr5 machine as the hadoop user, upload hadoop-2.0.0-cdh4.1.2.tar.gz to the /home/hadoop/ directory, and extract it:

tar zxvf hadoop-2.0.0-cdh4.1.2.tar.gz

(2) Configure environment variables

Modify the .bash_profile environment-variable file in the hadoop user's home directory /home/hadoop/ on the mr5 machine:

export JAVA_HOME=/usr/jdk1.6.0_30
export JAVA_BIN=${JAVA_HOME}/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export JAVA_OPTS="-Djava.library.path=/usr/local/lib -server -Xms1024m -Xmx2048m -XX:MaxPermSize=256m -Djava.awt.headless=true -Dsun.net.client.defaultReadTimeout=60000 -Djmagick.systemclassloader=no -Dnetworkaddress.cache.ttl=300 -Dsun.net.inetaddr.ttl=300"
export HADOOP_HOME=/home/hadoop/hadoop-2.0.0-cdh4.1.2
export HADOOP_PREFIX=$HADOOP_HOME
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export HADOOP_YARN_HOME=${HADOOP_HOME}
export PATH=$PATH:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin
export JAVA_HOME JAVA_BIN PATH CLASSPATH JAVA_OPTS
export HADOOP_LIB=${HADOOP_HOME}/lib
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop

(3) Modify the configuration files

Log in to the mr5 machine as the hadoop user and modify the hadoop configuration files (configuration directory: hadoop-2.0.0-cdh4.1.2/etc/hadoop).

(1) slaves:

Add the following nodes:

mr6

mr7

mr8

(2) hadoop-env.sh:

Add the following environment variables:

export JAVA_HOME=/usr/jdk1.6.0_30
export HADOOP_HOME=/home/hadoop/hadoop-2.0.0-cdh4.1.2
export HADOOP_PREFIX=${HADOOP_HOME}
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export HADOOP_YARN_HOME=${HADOOP_HOME}
export PATH=$PATH:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin
export JAVA_HOME JAVA_BIN PATH CLASSPATH JAVA_OPTS
export HADOOP_LIB=${HADOOP_HOME}/lib
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop

(3) core-site.xml:

<property>
<name>fs.default.name</name>
<value>hdfs://mr5:9000</value>
<description>The name of the default file system. Either the literal string "local" or a host:port for DFS.</description>
<final>true</final>
</property>

<property>
<name>io.native.lib.available</name>
<value>true</value>
</property>

<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>

(4) hdfs-site.xml:

<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/dfsdata/name</value>
<description>Determines where on the local filesystem the DFS name node should store the name table. If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.</description>
<final>true</final>
</property>

<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/dfsdata/data</value>
<description>Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.</description>
<final>true</final>
</property>

<property>
<name>dfs.replication</name>
<value>3</value>
</property>

<property>
<name>dfs.permissions</name>
<value>false</value>
</property>

(5) mapred-site.xml:

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

<property>
<name>mapreduce.job.tracker</name>
<value>hdfs://mr5:9001</value>
<final>true</final>
</property>

<property>
<name>mapreduce.task.io.sort.mb</name>
<value>512</value>
</property>

<property>
<name>mapreduce.task.io.sort.factor</name>
<value>100</value>
</property>

<property>
<name>mapreduce.reduce.shuffle.parallelcopies</name>
<value>50</value>
</property>

<property>
<name>mapreduce.cluster.temp.dir</name>
<value>file:/home/hadoop/mapreddata/system</value>
<final>true</final>
</property>

<property>
<name>mapreduce.cluster.local.dir</name>
<value>file:/home/hadoop/mapreddata/local</value>
<final>true</final>
</property>

(6) yarn-env.sh:

Add the following environment variables:

export JAVA_HOME=/usr/jdk1.6.0_30
export HADOOP_HOME=/home/hadoop/hadoop-2.0.0-cdh4.1.2
export HADOOP_PREFIX=${HADOOP_HOME}
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export HADOOP_YARN_HOME=${HADOOP_HOME}
export PATH=$PATH:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin
export JAVA_HOME JAVA_BIN PATH CLASSPATH JAVA_OPTS
export HADOOP_LIB=${HADOOP_HOME}/lib
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop

(7) yarn-site.xml:

<property>
<name>yarn.resourcemanager.address</name>
<value>mr5:8080</value>
</property>

<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>mr5:8081</value>
</property>

<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>mr5:8082</value>
</property>

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce.shuffle</value>
</property>

<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

<property>
<name>yarn.nodemanager.local-dirs</name>
<value>file:/home/hadoop/nmdata/local</value>
<description>the local directories used by the nodemanager</description>
</property>

<property>
<name>yarn.nodemanager.log-dirs</name>
<value>file:/home/hadoop/nmdata/log</value>
<description>the directories used by Nodemanagers as log directories</description>
</property>

(4) Copy to other nodes

(1) After completing steps (2) and (3) above on mr5, re-compress the hadoop-2.0.0-cdh4.1.2 directory:

rm hadoop-2.0.0-cdh4.1.2.tar.gz
tar zcvf hadoop-2.0.0-cdh4.1.2.tar.gz hadoop-2.0.0-cdh4.1.2

Then copy hadoop-2.0.0-cdh4.1.2.tar.gz to the mr6, mr7 and mr8 machines:

scp /home/hadoop/hadoop-2.0.0-cdh4.1.2.tar.gz hadoop@mr6:/home/hadoop/
scp /home/hadoop/hadoop-2.0.0-cdh4.1.2.tar.gz hadoop@mr7:/home/hadoop/
scp /home/hadoop/hadoop-2.0.0-cdh4.1.2.tar.gz hadoop@mr8:/home/hadoop/
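The archive must also be unpacked on each target node so that the environment variables point at a real directory; a minimal sketch (assuming passwordless SSH for the hadoop user):

ssh hadoop@mr6 'cd /home/hadoop && tar zxvf hadoop-2.0.0-cdh4.1.2.tar.gz'
ssh hadoop@mr7 'cd /home/hadoop && tar zxvf hadoop-2.0.0-cdh4.1.2.tar.gz'
ssh hadoop@mr8 'cd /home/hadoop && tar zxvf hadoop-2.0.0-cdh4.1.2.tar.gz'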

(2) Copy the hadoop user's environment configuration file .bash_profile on the mr5 machine to the mr6, mr7 and mr8 machines:

scp /home/hadoop/.bash_profile hadoop@mr6:/home/hadoop/
scp /home/hadoop/.bash_profile hadoop@mr7:/home/hadoop/
scp /home/hadoop/.bash_profile hadoop@mr8:/home/hadoop/

After the copy completes, execute the following in the /home/hadoop/ directory of the mr5, mr6, mr7 and mr8 machines to make the environment variables take effect:

source .bash_profile

(5) Start HDFS and YARN

After all the above steps are completed, log in to the mr5 machine as the hadoop user and execute the following commands in turn:

hdfs namenode -format

start-dfs.sh

start-yarn.sh

Check with the jps command:

mr5 should show the NameNode, ResourceManager and SecondaryNameNode processes running.

mr6, mr7 and mr8 should show the DataNode and NodeManager processes running.
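For example, on mr5 the check looks roughly like this (the process IDs are illustrative):

> jps
2743 NameNode
2891 SecondaryNameNode
3012 ResourceManager
3340 Jps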

(6) Verify the installation

Check the running status of the nodes and the execution of jobs in the following way:

Access the web UIs from a browser (the local hosts file must map the cluster hostnames).
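The exact URLs are not given here; with stock Hadoop 2 web ports they would typically be the following (an assumption based on the defaults, not on this guide):

http://mr5:50070 (NameNode / HDFS status)
http://mr5:8088 (ResourceManager / job status)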

5. Install hive-0.9.0-cdh4.1.2

(1) Installation package preparation

As the hadoop user, upload hive-0.9.0-cdh4.1.2.tar.gz to the /home/hadoop/ directory of the mr5 machine and extract it:

tar zxvf hive-0.9.0-cdh4.1.2.tar.gz

(2) Configure environment variables

Add the following environment variables to .bash_profile:

export HIVE_HOME=/home/hadoop/hive-0.9.0-cdh4.1.2
export PATH=$PATH:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${HIVE_HOME}/bin
export HIVE_CONF_DIR=$HIVE_HOME/conf
export HIVE_LIB=$HIVE_HOME/lib

After adding them, execute the following command to make the environment variables take effect:

. .bash_profile

(3) Modify the configuration files

Modify the hive configuration files (configuration directory: hive-0.9.0-cdh4.1.2/conf/).

Create a new hive-site.xml file in the hive-0.9.0-cdh4.1.2/conf/ directory and add the following configuration:

<property>
<name>hive.metastore.local</name>
<value>true</value>
</property>

<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://10.28.169.61:3306/hive_impala?createDatabaseIfNotExist=true</value>
</property>

<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>

<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hadoop</value>
</property>

<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123456</value>
</property>

<property>
<name>hive.security.authorization.enabled</name>
<value>false</value>
</property>

<property>
<name>hive.security.authorization.createtable.owner.grants</name>
<value>all</value>
</property>

<property>
<name>hive.querylog.location</name>
<value>${user.home}/hive-logs/querylog</value>
</property>
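The ConnectionURL above assumes a MySQL server is already running at 10.28.169.61 and that the hadoop account may create the hive_impala database. A minimal sketch of the corresponding MySQL-side setup (user name, password and database name are taken from the configuration above; adjust them to your environment):

mysql -u root -p
mysql> CREATE USER 'hadoop'@'%' IDENTIFIED BY '123456';
mysql> GRANT ALL PRIVILEGES ON hive_impala.* TO 'hadoop'@'%';
mysql> FLUSH PRIVILEGES;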

(4) Verify the installation

After completing the above steps, verify that hive was installed successfully.

Run hive on the mr5 command line and enter show tables; output like the following indicates that hive has been installed successfully:

> hive
hive> show tables;
OK
Time taken: 18.952 seconds
hive>
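As a slightly stronger check, a throwaway table can be created and listed (an illustrative extra step, not part of the original procedure; test_tbl is a hypothetical name):

hive> create table test_tbl (id int, name string);
hive> show tables;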

6. Install Impala

Notes:

(1) Steps (1) through (4) below are executed as the root user on mr5, mr6, mr7 and mr8.

(2) Step (5) below is executed as the hadoop user.

(1) Install related software packages:

Install mysql-connector-java:

yum install mysql-connector-java

Install bigtop:

rpm -ivh bigtop-utils-0.4+300-1.cdh4.0.1.p0.1.el6.noarch.rpm

Install libevent:

rpm -ivh libevent-1.4.13-4.el6.x86_64.rpm

If you need to install other related software packages, you can get them from the following link:

/centos/6.3/os/x86_64/Packages/

(2) Install the Impala RPMs, executing each of the following in turn:

rpm -ivh impala-0.3-1.p0.366.el6.x86_64.rpm
rpm -ivh impala-server-0.3-1.p0.366.el6.x86_64.rpm
rpm -ivh impala-debuginfo-0.3-1.p0.366.el6.x86_64.rpm
rpm -ivh impala-shell-0.3-1.p0.366.el6.x86_64.rpm
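To confirm that the packages were installed (an optional quick check):

rpm -qa | grep impala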

(3) Find the Impala installation directory.

After completing steps (1) and (2), run the following command:

find / -name impala

Output:

/usr/lib/debug/usr/lib/impala

/usr/lib/impala

/var/run/impala

/var/log/impala

/var/lib/alternatives/impala

/etc/default/impala

/etc/alternatives/impala

This shows that the Impala installation directory is /usr/lib/impala.

(4) Configure Impala

Create a conf directory under the Impala installation directory /usr/lib/impala, and copy core-site.xml and hdfs-site.xml from hadoop's configuration directory and hive-site.xml from hive's conf directory into it.

Add the following to the core-site.xml file:

<property>
<name>dfs.client.read.shortcircuit</name>
<value>true</value>
</property>

<property>
<name>dfs.client.read.shortcircuit.skip.checksum</name>
<value>false</value>
</property>

Add the following to the hdfs-site.xml files of both hadoop and Impala, then restart hadoop and Impala:

<property>
<name>dfs.datanode.data.dir.perm</name>
<value>755</value>
</property>

<property>
<name>dfs.block.local-path-access.user</name>
<value>hadoop</value>
</property>

<property>
<name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
<value>true</value>
</property>
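A minimal sketch of the restart, run as the hadoop user on mr5 (assuming the stock start/stop scripts that ship with the tarball):

stop-yarn.sh
stop-dfs.sh
start-dfs.sh
start-yarn.sh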

(5) Start the services

(1) On mr5, start the Impala state store with the following command:

> GLOG_v=1 nohup statestored -state_store_port=24000 &

If the statestore starts normally, you can check /tmp/statestored.INFO; if it fails, locate the error message in /tmp/statestored.ERROR.

(2) Start impalad on mr6, mr7 and mr8 with the following commands:

mr6:

> GLOG_v=1 nohup impalad -state_store_host=mr5 -nn=mr5 -nn_port=9000 -hostname=mr6 -ipaddress=10.28.169.113 &

mr7:

> GLOG_v=1 nohup impalad -state_store_host=mr5 -nn=mr5 -nn_port=9000 -hostname=mr7 -ipaddress=10.28.169.114 &

mr8:

> GLOG_v=1 nohup impalad -state_store_host=mr5 -nn=mr5 -nn_port=9000 -hostname=mr8 -ipaddress=10.28.169.115 &

If impalad starts normally, you can check /tmp/impalad.INFO; if it fails, locate the error message in /tmp/impalad.ERROR.
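A quick way to confirm the daemons are up on each node (illustrative commands):

ps -ef | grep -E 'impalad|statestored' | grep -v grep
tail -n 20 /tmp/impalad.INFO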

(6) Use the shell

Use impala-shell to start the Impala shell, connect to the impala hosts (mr6, mr7, mr8) in turn, and refresh the metadata before running queries. The commands are as follows (they can be executed from any node):

> impala-shell
[Not connected] > connect mr6:21000
[mr6:21000] > refresh
[mr6:21000] > connect mr7:21000
[mr7:21000] > refresh
[mr7:21000] > connect mr8:21000
[mr8:21000] > refresh

(7) Verify the installation

Use impala-shell to start the Impala shell, connect to an impala host, refresh the metadata, and then run commands. For example (executable from any node):

> impala-shell
[Not connected] > connect mr6:21000
[mr6:21000] > refresh
[mr6:21000] > show databases
default
[mr6:21000] >

If the prompt above appears, the installation has succeeded.
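As a final smoke test, any table created in Hive can be queried through Impala (test_tbl is the hypothetical table from the Hive check above; run refresh first so that impalad picks up the new metadata):

[mr6:21000] > refresh
[mr6:21000] > select count(*) from test_tbl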