Tag - hadoop

Environment Setup hadoop cdh    2019-05-06 06:51:53    41    0    0

Before You Begin

SSH Configuration

vi /etc/hosts
x.x.x.x linode01
x.x.x.x linode02
x.x.x.x linode03

hostnamectl set-hostname linode01

ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub linode01
ssh-copy-id -i ~/.ssh/id_rsa.pub linode02
ssh-copy-id -i ~/.ssh/id_rsa.pub linode03
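
The three ssh-copy-id calls above follow one pattern, so they can be generated from a node list. A minimal sketch, assuming the same three hostnames as in /etc/hosts; `NODES` and `print_copy_id_commands` are illustrative names, not part of any tool. It only prints the commands so they can be reviewed before anything is run:

```shell
# Hostnames taken from the /etc/hosts entries above.
NODES="linode01 linode02 linode03"

# Print (rather than run) one ssh-copy-id command per node.
# The ~ stays literal here and is expanded only when the command is executed.
print_copy_id_commands() {
    for node in $NODES; do
        echo "ssh-copy-id -i ~/.ssh/id_rsa.pub $node"
    done
}

print_copy_id_commands
```

Piping the output to `sh` would execute the commands; each one still prompts for the remote password once.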

Disable Firewall

systemctl stop firewalld.service
systemctl disable firewalld.service

Dependency

yum install psmisc -y
yum install libxslt-devel -y
yum install chkconfig bind-utils psmisc libxslt zlib sqlite cyrus-sasl-plain cyrus-sasl-gssapi fuse portmap fuse-libs redhat-lsb -y
yum install python-psycopg2 -y 
yum install snappy snappy-devel  -y

#NFS
yum install rpcbind -y
service rpcbind start

Install the Oracle JDK

cd /opt/
sudo wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn/java/jdk/7u80-b15/jdk-7u80-linux-x64
Environment Setup hadoop cdh    2019-05-06 06:51:53    26    0    0

Before You Begin

SSH Configuration

vi /etc/hosts
x.x.x.x linode01
x.x.x.x linode02
x.x.x.x linode03

hostnamectl set-hostname linode01

ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub linode01
ssh-copy-id -i ~/.ssh/id_rsa.pub linode02
ssh-copy-id -i ~/.ssh/id_rsa.pub linode03

Disable Firewall

systemctl stop firewalld.service
systemctl disable firewalld.service

Dependency

yum install psmisc -y
yum install libxslt-devel -y
yum install chkconfig bind-utils psmisc libxslt zlib sqlite cyrus-sasl-plain cyrus-sasl-gssapi fuse portmap fuse-libs redhat-lsb -y
yum install python-psycopg2 -y 
yum install snappy snappy-devel  -y

#NFS
yum install rpcbind -y
service rpcbind start

Install and Configure External Databases

sudo yum install postgresql-server postgresql -y
sudo su - postgres
initdb -D /var/lib/pgsql/data

#remote access
vi /var/lib/pgsql/data/postgresql.conf
listen_addresses ='*'

vi /var/lib/pgsql/data/pg_h
Environment Setup hadoop    2019-05-06 06:51:53    189    0    0

Overview

The NFS Gateway supports NFSv3 and allows HDFS to be mounted as part of the client’s local file system. Currently NFS Gateway supports and enables the following usage patterns:

  • Users can browse the HDFS file system through their local file system on NFSv3 client compatible operating systems.
  • Users can download files from the HDFS file system to their local file system.
  • Users can upload files from their local file system directly to the HDFS file system.
  • Users can stream data directly to HDFS through the mount point. File append is supported but random write is not supported.

The NFS gateway machine needs everything required to run an HDFS client, such as the Hadoop JAR files and the HADOOP_CONF directory. The NFS gateway can run on the same host as a DataNode, the NameNode, or any HDFS client.
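
To make the mount usage concrete, here is a sketch of the client-side mount command. The option string follows the form given in the Apache Hadoop NFS Gateway documentation; the gateway host `linode01` and the mount point `/mnt/hdfs` are assumptions for illustration, and `nfs_mount_command` is a hypothetical helper. It only prints the command, since running it needs root and a live gateway:

```shell
# Build the NFSv3 mount command for an HDFS NFS Gateway.
# $1 = gateway host (assumed), $2 = local mount point (assumed)
nfs_mount_command() {
    gateway_host="$1"
    mount_point="$2"
    # vers=3: the gateway speaks NFSv3; proto=tcp: use TCP transport;
    # nolock: the gateway does not support NLM file locking.
    echo "mount -t nfs -o vers=3,proto=tcp,nolock ${gateway_host}:/ ${mount_point}"
}

nfs_mount_command linode01 /mnt/hdfs
```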

Configuration

In the core-site.xml of the NameNode, the following must be set (in non-secure mode):

<property>
  <name>hadoop.proxyuser.nfs
Environment Setup hadoop    2019-05-06 06:51:53    35    0    0

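Install the JDK

See the JDK installation post.

Configure Passwordless SSH Login

See the passwordless SSH login post.

Install and Configure Hadoop

Download Hadoop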

wget http://mirror.csclub.uwaterloo.ca/apache/hadoop/common/hadoop-2.6.4/hadoop-2.6.4.tar.gz
tar zxvf hadoop-2.6.4.tar.gz 
cd hadoop-2.6.4/

Edit the Configuration Files

Edit hadoop-env.sh (in hadoop-2.6.4/etc/hadoop/)

vi hadoop-env.sh
export JAVA_HOME=/opt/jdk1.8.0_91

Edit yarn-env.sh

vi yarn-env.sh
export JAVA_HOME=/opt/jdk1.8.0_91

Edit the slaves file

vi slaves
node2
node3

Edit core-site.xml

mkdir ~/hadoop-2.6.4/data
vi core-site.xml

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/data</value>
        <description>
            A base for other temporary directories.
        </description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://node1:9000</value> 
        <description>
            The name of the default file system. 
            A URI whose scheme and authority determine the FileSystem imp
Environment Setup hadoop    2019-05-06 06:51:53    32    0    0

Install Scala

wget http://downloads.lightbend.com/scala/2.10.6/scala-2.10.6.tgz
tar zxvf scala-2.10.6.tgz

vi /etc/profile

export SCALA_HOME=/home/hadoop/scala-2.10.6
export PATH=$PATH:$SCALA_HOME/bin

source /etc/profile
scala -version

Install Spark

wget http://d3kbcqa49mib13.cloudfront.net/spark-1.6.2-bin-hadoop2.6.tgz
tar zxvf spark-1.6.2-bin-hadoop2.6.tgz
cd spark-1.6.2-bin-hadoop2.6/conf
cp spark-env.sh.template spark-env.sh

vi spark-env.sh

export JAVA_HOME=/opt/jdk1.8.0_91
export SCALA_HOME=/home/hadoop/scala-2.10.6
export HADOOP_HOME=/home/hadoop/hadoop-2.6.4
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export YARN_HOME=/home/hadoop/hadoop-2.6.4
export YARN_CONF_DIR=${YARN_HOME}/etc/hadoop
export SPARK_HOME=/home/hadoop/spark-1.6.2-bin-hadoop2.6
export SPARK_LOCAL_DIRS=/home/hadoop/spark-1.6.2-bin-hadoop2.6
export SPARK_LIBRARY_PATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$HADOOP_HOME/lib/native
    expo
Environment Setup hadoop    2019-05-06 06:51:53    16    0    0

Config Metastore

Install the Hive metastore somewhere in your cluster, see hive installation.

As part of this process, you configure the Hive metastore to use an external database as a metastore. Impala uses this same database for its own table metadata. You can choose either a MySQL or PostgreSQL database as the metastore.

It is recommended to set up a Hive metastore service rather than connecting directly to the metastore database; this configuration is required when running Impala under CDH 4.1. Make sure the /etc/impala/conf/hive-site.xml file contains the following settings, substituting the appropriate hostname for metastore_server_host:

<property>
<name>hive.metastore.uris</name>
<value>thrift://metastore_server_host:9083</value>
</property>
<property>
<name>hive.metastore.client.socket.timeout</name>
<value>3600</value>
<description>MetaStore Client socket timeout in seconds</description>
</property>
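
A quick way to confirm the setting is present is to search the file for the property name. This is a minimal sketch: `has_metastore_uri` is a hypothetical helper, and it does a plain text match rather than parsing the XML, so it cannot tell whether the property sits inside a comment.

```shell
# Return 0 if the given file mentions hive.metastore.uris and a thrift:// value.
has_metastore_uri() {
    conf_file="$1"
    [ -f "$conf_file" ] || return 1
    grep -q '<name>hive.metastore.uris</name>' "$conf_file" &&
        grep -q '<value>thrift://[^<]*</value>' "$conf_file"
}

# Path taken from the text above.
if has_metastore_uri /etc/impala/conf/hive-site.xml; then
    echo "hive.metastore.uris is set"
else
    echo "hive.metastore.uris is missing"
fi
```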

Install Imp

Environment Setup hadoop    2019-05-06 06:51:53    390    0    0

By default Hadoop HTTP web-consoles (JobTracker, NameNode, TaskTrackers and DataNodes) allow access without any form of authentication.

The next section describes how to configure Hadoop HTTP web-consoles to require user authentication.

Configuration

The following properties should be in the core-site.xml of all the nodes in the cluster.

    <property>
      <name>hadoop.http.filter.initializers</name>
      <value>org.apache.hadoop.security.AuthenticationFilterInitializer</value>
      <description>
        Authentication for Hadoop HTTP web-consoles:
        add the org.apache.hadoop.security.AuthenticationFilterInitializer initializer class to this property.
      </description>
    </property>
    <property>
      <name>hadoop.http.authentication.type</name>
      <value>pers.louyj.utils.hadoop.auth.ext.StandardAuthenticationHandler</value>
      <description>
                    Defines authentic
Environment Setup hadoop    2019-05-06 06:51:53    35    0    0

Install Hive

wget http://mirror.bit.edu.cn/apache/hive/hive-2.1.0/apache-hive-2.1.0-bin.tar.gz
tar zxvf apache-hive-2.1.0-bin.tar.gz
mv apache-hive-2.1.0-bin hive-2.1.0

Install PostgreSQL

sudo -u postgres psql

CREATE ROLE hive LOGIN PASSWORD 'hive_password';
CREATE DATABASE metastore OWNER hive ENCODING 'UTF8';
GRANT ALL PRIVILEGES ON DATABASE metastore TO hive;


cd /home/hadoop/hive-2.1.0/lib
wget http://central.maven.org/maven2/org/postgresql/postgresql/9.4.1211.jre7/postgresql-9.4.1211.jre7.jar

Configuration

cd /home/hadoop/hive-2.1.0/conf

vi hive-site.xml

<configuration>
<property>
    <name>hive.exec.scratchdir</name>
    <value>hdfs://linode01.touchworld.link:9000/hive/scratchdir</value>
</property>
<property>
    <name>hive.metastore.warehouse.dir</name>
    <value>hdfs://linode01.touchworld.link:9000/hive/warehousedir</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <val
Environment Setup hadoop    2019-05-06 06:51:53    91    0    0

Install database

install postgresql

sudo yum install postgresql-server postgresql

init database

sudo su - postgres
initdb -D /var/lib/pgsql/data

start service

systemctl status postgresql.service
systemctl start postgresql.service
systemctl stop postgresql.service

remote access

vi /var/lib/pgsql/data/postgresql.conf
listen_addresses = '*'
vi /var/lib/pgsql/data/pg_hba.conf
host all all 0.0.0.0/0 trust
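
The two edits above can also be applied without opening an editor. A minimal sketch, assuming the same file paths as above: `append_once` is a hypothetical helper that appends a line only when an identical line is not already in the file, so rerunning it does not create duplicates. The calls are left commented out because they modify the live PostgreSQL configuration:

```shell
# Append a line to a file unless an identical line is already there.
append_once() {
    line="$1"
    file="$2"
    # -x: match the whole line, -F: treat the text literally, -q: no output
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Run as the postgres user:
# append_once "listen_addresses = '*'" /var/lib/pgsql/data/postgresql.conf
# append_once "host all all 0.0.0.0/0 trust" /var/lib/pgsql/data/pg_hba.conf
```

Appending works for postgresql.conf because a later setting of the same parameter overrides an earlier one; a restart is still needed for listen_addresses to take effect.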

restart service

systemctl restart postgresql.service

set password

su - postgres
psql
\password postgres
xxxpgxxx

create cloudera-manager database

Connect to PostgreSQL:

sudo -u postgres psql

If you are not using the Cloudera Manager installer, create a database for the Cloudera Manager Server. The database name, user name, and password can be any value. Record the names chosen because you will need them later when running the scm_prepare_database.sh script.

CREATE ROLE scm LOGIN PASSWORD 'scm';
CREATE DATABA