Environment Setup hadoop 2019-05-06 06:51:53 100 0 0

Install Jdk

Configure SSH

Install Ntp

yum install ntp
vi /etc/ntp.conf
server ntp1.aliyun.com iburst
server ntp2.aliyun.com iburst
server ntp3.aliyun.com iburst
systemctl enable ntpd
systemctl start ntpd
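
The manual edit of ntp.conf above can also be scripted. This is a minimal sketch, not the author's method: the function name `add_ntp_servers` is hypothetical, and it assumes the config path is passed as the first argument. It appends each of the three Aliyun servers only if the exact line is not already present, so running it twice does not duplicate entries.

```shell
# Hypothetical helper: append each Aliyun NTP server line to a config file once.
add_ntp_servers() {
    conf="$1"
    for host in ntp1.aliyun.com ntp2.aliyun.com ntp3.aliyun.com; do
        line="server $host iburst"
        # -x matches the whole line, -F treats the pattern as a fixed string;
        # a missing file simply makes grep fail, so the line gets appended.
        grep -qxF "$line" "$conf" 2>/dev/null || printf '%s\n' "$line" >> "$conf"
    done
}
```

As root, it would be called as `add_ntp_servers /etc/ntp.conf` before enabling and starting ntpd.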

Install Hadoop

Download Hadoop

wget https://archive.apache.org/dist/hadoop/core/hadoop-2.7.5/hadoop-2.7.5.tar.gz
tar zxvf hadoop-2.7.5.tar.gz 
cd hadoop-2.7.5/

Configure Hadoop

Config Env

vi .bashrc
export JAVA_HOME=/opt/jdk1.8.0_202
export HADOOP_PID_DIR=/data/hadooptemp

Configure slaves

vi slaves
test01
test02
test03

Configure core-site.xml

mkdir -p /data/hadoop
mkdir -p /data/hadooptemp

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/data/hadoop</value>
        <description>
            A base for other temporary directories.
        </description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://cluster01</value>
        <descr

CDH5 Installation Path-C

Environment Setup hadoop cdh 2019-05-06 06:51:53 56 0 0

Before You Begin

SSH Configuration

vi /etc/hosts
x.x.x.x linode01
x.x.x.x linode02
x.x.x.x linode03

hostnamectl set-hostname linode01

ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub linode01
ssh-copy-id -i ~/.ssh/id_rsa.pub linode02
ssh-copy-id -i ~/.ssh/id_rsa.pub linode03
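
With more than a few nodes, the repeated ssh-copy-id lines can be generated from a host list. The following is a sketch under assumptions: the function name `print_copy_id_commands` is hypothetical, and it only prints the commands so they can be reviewed before being run.

```shell
# Hypothetical helper: print (not run) one ssh-copy-id command per host argument.
print_copy_id_commands() {
    for host in "$@"; do
        printf 'ssh-copy-id -i ~/.ssh/id_rsa.pub %s\n' "$host"
    done
}

# Dry run for the three hosts used in this post.
print_copy_id_commands linode01 linode02 linode03
```

Printing instead of executing keeps the password prompts under manual control; the output can be pasted into the shell once it looks right.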

Disable Firewall

systemctl stop firewalld.service
systemctl disable firewalld.service

Dependency

yum install psmisc -y
yum install libxslt-devel -y
yum install chkconfig bind-utils psmisc libxslt zlib sqlite cyrus-sasl-plain cyrus-sasl-gssapi fuse portmap fuse-libs redhat-lsb -y
yum install python-psycopg2 -y
yum install snappy snappy-devel -y

#NFS
yum install rpcbind -y
service rpcbind start

Install the Oracle JDK

cd /opt/
sudo wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn/java/jdk/7u80-b15/jdk-7u80-linux-x64

CDH5 Installation Path-B

Environment Setup hadoop cdh 2019-05-06 06:51:53 43 0 0

Before You Begin

SSH Configuration

vi /etc/hosts
x.x.x.x linode01
x.x.x.x linode02
x.x.x.x linode03

hostnamectl set-hostname linode01

ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub linode01
ssh-copy-id -i ~/.ssh/id_rsa.pub linode02
ssh-copy-id -i ~/.ssh/id_rsa.pub linode03

Disable Firewall

systemctl stop firewalld.service
systemctl disable firewalld.service

Dependency

yum install psmisc -y
yum install libxslt-devel -y
yum install chkconfig bind-utils psmisc libxslt zlib sqlite cyrus-sasl-plain cyrus-sasl-gssapi fuse portmap fuse-libs redhat-lsb -y
yum install python-psycopg2 -y
yum install snappy snappy-devel -y

#NFS
yum install rpcbind -y
service rpcbind start

Install and Configure External Databases

sudo yum install postgresql-server postgresql -y
sudo su - postgres
initdb -D /var/lib/pgsql/data

#remote access
vi /var/lib/pgsql/data/postgresql.conf
listen_addresses = '*'

vi /var/lib/pgsql/data/pg_h

Mount HDFS by NFS

Environment Setup hadoop 2019-05-06 06:51:53 436 0 0

Overview

The NFS Gateway supports NFSv3 and allows HDFS to be mounted as part of the client’s local file system. Currently NFS Gateway supports and enables the following usage patterns:

Users can browse the HDFS file system through their local file system on NFSv3 client compatible operating systems.
Users can download files from the HDFS file system onto their local file system.
Users can upload files from their local file system directly to the HDFS file system.
Users can stream data directly to HDFS through the mount point. File append is supported but random write is not supported.

The NFS gateway machine needs the same things an HDFS client needs to run, such as the Hadoop JAR files and the HADOOP_CONF directory. The NFS gateway can be on the same host as a DataNode, the NameNode, or any HDFS client.
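
On the client side, the export is mounted as an ordinary NFSv3 share. This is a minimal sketch: the function name `hdfs_nfs_mount_cmd`, the gateway host `test01`, and the mount point `/mnt/hdfs` are placeholders chosen for illustration, and the function only prints the command rather than mounting anything.

```shell
# Hypothetical helper: print the NFSv3 mount command for an HDFS NFS gateway.
# Both arguments are placeholders supplied by the caller.
hdfs_nfs_mount_cmd() {
    gateway_host="$1"
    mount_point="$2"
    # vers=3: the gateway supports NFSv3; proto=tcp: TCP is the transport;
    # nolock: the gateway does not support NLM locking.
    printf 'mount -t nfs -o vers=3,proto=tcp,nolock %s:/ %s\n' "$gateway_host" "$mount_point"
}

hdfs_nfs_mount_cmd test01 /mnt/hdfs
```

The printed command has to be run as root on the client, after the gateway services are up and the mount point directory exists.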

Configuration

In the core-site.xml of the NameNode, the following must be set (in non-secure mode):

<property>
  <name>hadoop.proxyuser.nfs

Spark Installation

Environment Setup hadoop 2019-05-06 06:51:53 45 0 0

Install Scala

wget http://downloads.lightbend.com/scala/2.10.6/scala-2.10.6.tgz
tar zxvf scala-2.10.6.tgz

vi /etc/profile

export SCALA_HOME=/home/hadoop/scala-2.10.6
export PATH=$PATH:$SCALA_HOME/bin

source /etc/profile
scala -version

Install Spark

wget http://d3kbcqa49mib13.cloudfront.net/spark-1.6.2-bin-hadoop2.6.tgz
tar zxvf spark-1.6.2-bin-hadoop2.6.tgz
cd spark-1.6.2-bin-hadoop2.6/conf
cp spark-env.sh.template spark-env.sh

vi spark-env.sh

export JAVA_HOME=/opt/jdk1.8.0_91
export SCALA_HOME=/home/hadoop/scala-2.10.6
export HADOOP_HOME=/home/hadoop/hadoop-2.6.4
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export YARN_HOME=/home/hadoop/hadoop-2.6.4
export YARN_CONF_DIR=${YARN_HOME}/etc/hadoop
export SPARK_HOME=/home/hadoop/spark-1.6.2-bin-hadoop2.6
export SPARK_LOCAL_DIRS=/home/hadoop/spark-1.6.2-bin-hadoop2.6
export SPARK_LIBRARY_PATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$HADOOP_HOME/lib/native
    expo

Impala Installation

Environment Setup hadoop 2019-05-06 06:51:53 47 0 0

Config Metastore

Install the Hive metastore somewhere in your cluster; see Hive Installation.

As part of this process, you configure the Hive metastore to use an external database as a metastore. Impala uses this same database for its own table metadata. You can choose either a MySQL or PostgreSQL database as the metastore.

Setting up a Hive metastore service is recommended over connecting directly to the metastore database; this configuration is required when running Impala under CDH 4.1. Make sure the /etc/impala/conf/hive-site.xml file contains the following settings, substituting the appropriate hostname for metastore_server_host:

<property>
<name>hive.metastore.uris</name>
<value>thrift://metastore_server_host:9083</value>
</property>
<property>
<name>hive.metastore.client.socket.timeout</name>
<value>3600</value>
<description>MetaStore Client socket timeout in seconds</description>
</property>
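
A quick way to confirm that the setting is in place is to read the URI back out of the file. This is a rough sketch, not an XML parser: the function name `metastore_uri` is hypothetical, and it assumes the `<value>` element sits on the line directly after the `<name>` element, as in the snippet above.

```shell
# Hypothetical check: print the thrift URI configured in a hive-site.xml, if any.
# Assumes <value> is on the line immediately following the matching <name>.
metastore_uri() {
    site_file="$1"
    awk '/<name>hive.metastore.uris<\/name>/ {
        getline                                  # move to the <value> line
        gsub(/.*<value>|<\/value>.*/, "")        # strip the surrounding tags
        print
        exit
    }' "$site_file"
}
```

Called as `metastore_uri /etc/impala/conf/hive-site.xml`, it prints nothing when the property is missing, which makes it easy to use in a pre-start check.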

Install Imp

Simple Authentication for Hadoop

Environment Setup hadoop 2019-05-06 06:51:53 1049 0 0

By default Hadoop HTTP web-consoles (JobTracker, NameNode, TaskTrackers and DataNodes) allow access without any form of authentication.

The next section describes how to configure Hadoop HTTP web-consoles to require user authentication.

Configuration

The following properties should be in the core-site.xml of all the nodes in the cluster.

    <property>
      <name>hadoop.http.filter.initializers</name>
      <value>org.apache.hadoop.security.AuthenticationFilterInitializer</value>
      <description>
                    Authentication for Hadoop HTTP web-consoles.
                    Add the org.apache.hadoop.security.AuthenticationFilterInitializer initializer class to this property.
      </description>
    </property>
    <property>
      <name>hadoop.http.authentication.type</name>
      <value>pers.louyj.utils.hadoop.auth.ext.StandardAuthenticationHandler</value>
      <description>
                    Defines authentic

Hive Installation

Environment Setup hadoop 2019-05-06 06:51:53 65 0 0

Install Hive

wget http://mirror.bit.edu.cn/apache/hive/hive-2.1.0/apache-hive-2.1.0-bin.tar.gz
tar zxvf apache-hive-2.1.0-bin.tar.gz
mv apache-hive-2.1.0-bin hive-2.1.0

Install Postgresql

sudo -u postgres psql

CREATE ROLE hive LOGIN PASSWORD 'hive_password';
CREATE DATABASE metastore OWNER hive ENCODING 'UTF8';
GRANT ALL PRIVILEGES ON DATABASE metastore TO hive;


cd /home/hadoop/hive-2.1.0/lib
wget http://central.maven.org/maven2/org/postgresql/postgresql/9.4.1211.jre7/postgresql-9.4.1211.jre7.jar

Configuration

cd /home/hadoop/hive-2.1.0/conf

vi hive-site.xml

<configuration>
<property>
    <name>hive.exec.scratchdir</name>
    <value>hdfs://linode01.touchworld.link:9000/hive/scratchdir</value>
</property>
<property>
    <name>hive.metastore.warehouse.dir</name>
    <value>hdfs://linode01.touchworld.link:9000/hive/warehousedir</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <val

Hadoop CDH 5 Installation

Environment Setup hadoop 2019-05-06 06:51:53 117 0 0

Install database

install postgresql

sudo yum install postgresql-server postgresql

init database

sudo su - postgres
initdb -D /var/lib/pgsql/data

start service

systemctl start postgresql.service
systemctl status postgresql.service
# stop the service only when needed:
# systemctl stop postgresql.service

remote access

vi /var/lib/pgsql/data/postgresql.conf
listen_addresses = '*'
vi /var/lib/pgsql/data/pg_hba.conf
host all all 0.0.0.0/0 trust

restart service

systemctl restart postgresql.service

set password

su - postgres
psql
\password postgres
xxxpgxxx

create cloudera-manager database

Connect to PostgreSQL:

sudo -u postgres psql

If you are not using the Cloudera Manager installer, create a database for the Cloudera Manager Server. The database name, user name, and password can be any value. Record the names chosen because you will need them later when running the scm_prepare_database.sh script.

CREATE ROLE scm LOGIN PASSWORD 'scm';
CREATE DATABA
