Heron Environment Setup
storm
2019-05-06 06:51:53
louyj
# Building Heron on CentOS 7

## Step 1 - Install the required dependencies

```
sudo yum install gcc gcc-c++ kernel-devel wget unzip zlib-devel zip git automake cmake patch libtool -y
yum install python-devel -y
yum install gmp gmp-devel -y
```

## Step 2 - Install libunwind from source

```
wget http://download.savannah.gnu.org/releases/libunwind/libunwind-1.1.tar.gz
tar xvf libunwind-1.1.tar.gz
cd libunwind-1.1
./configure
make
sudo make install
```

## Step 3 - Set the following environment variables

```
export CC=/usr/bin/gcc
export CXX=/usr/bin/g++
```

## Step 4 - Install JDK

[See Install JDK here](http://note.louyj.com/blog/post/louyj/Install-JDK)

## Step 5 - Install Bazel 0.1.2

```
wget https://github.com/bazelbuild/bazel/releases/download/0.1.2/bazel-0.1.2-installer-linux-x86_64.sh
chmod +x bazel-0.1.2-installer-linux-x86_64.sh
./bazel-0.1.2-installer-linux-x86_64.sh --user
```

## Step 6 - Download Heron and compile it

```
cd
git clone https://github.com/twitter/heron.git && cd heron
./bazel_configure.py
bazel build --config=centos heron/...
```

## Step 7 - Build the binary packages

```
bazel build --config=centos scripts/packages:binpkgs
bazel build --config=centos scripts/packages:tarpkgs
```

This will build the packages below the `bazel-bin/scripts/packages/` directory.

## Step 8 - Install Heron using the installation scripts

```
cd bazel-bin/scripts/packages/
./heron-client-install.sh --help
./heron-client-install.sh --user
./heron-api-install.sh --user
./heron-tools-install.sh --user
```

or using a prefix:

```
cd bazel-bin/scripts/packages/
./heron-client-install.sh --prefix=/root/software/heron
./heron-tools-install.sh --prefix=/root/software/heron
./heron-api-install.sh --prefix=/root/software/heron
```

# Launch topology in local mode (single node)

## Step 1 - Launch an example topology

```
export JAVA_HOME=/opt/jdk1.8.0_91
heron submit local ~/.heron/examples/heron-examples.jar com.twitter.heron.examples.ExclamationTopology ExclamationTopology --deploy-deactivated
```

This submits the topology to your locally running Heron cluster but does not activate it. The output shows whether the topology was launched successfully, as well as its working directory. To check what is under the working directory, run:

```
ls /root/.herondata/topologies/local/root/ExclamationTopology
```

All instances' log files can be found in `log-files` under the working directory:

```
ls /root/.herondata/topologies/local/root/ExclamationTopology/log-files
```

## Step 2 - Start Heron Tracker

The Heron Tracker is a web service that continuously gathers information about your Heron cluster. You can launch it by running the `heron-tracker` command:

```
heron-tracker --port 8888
```

You can reach Heron Tracker in your browser at http://localhost:8888 and see something like the following upon successful submission of the topology.

## Step 3 - Start Heron UI

Heron UI is a user interface that uses Heron Tracker to provide detailed visual representations of your Heron topologies. To launch Heron UI:

```
heron-ui --port=8889 --tracker_url="http://localhost:8888"
```

You can open Heron UI in your browser at http://localhost:8889 and see something like this upon successful submission of the topology.

## Step 4 - Explore topology management commands

In Step 1 you submitted a topology to your local cluster. The `heron` CLI tool also enables you to activate, deactivate, and kill topologies, and more:

```
heron activate local ExclamationTopology
heron deactivate local ExclamationTopology
heron kill local ExclamationTopology
```
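If you prefer to drive this lifecycle from a script instead of typing each command, a minimal sketch like the following ties the CLI and the Tracker together. The `/topologies` path is assumed from the Tracker REST API described later in this post; verify it against your Tracker version before relying on it.

```
#!/usr/bin/env bash
# Minimal sketch: submit, activate, inspect, and tear down the example topology.
# Assumes the heron CLI, a running heron-tracker on port 8888, and curl.
set -e

TOPOLOGY=ExclamationTopology
JAR=~/.heron/examples/heron-examples.jar
CLASS=com.twitter.heron.examples.ExclamationTopology

# Submit deactivated, then activate once the containers are up
heron submit local "$JAR" "$CLASS" "$TOPOLOGY" --deploy-deactivated
heron activate local "$TOPOLOGY"

# Ask the Tracker which topologies it currently knows about
# (endpoint path assumed; check your Tracker's REST API)
curl -s http://localhost:8888/topologies

# Tear the topology down again
heron kill local "$TOPOLOGY"
```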
# Deploying Heron (multi node)

A Heron deployment requires several components working together. The following must be deployed to run Heron topologies in a cluster:

- Scheduler: Heron requires a scheduler to run its topologies. It can be deployed on an existing cluster running alongside other big data frameworks, or on a cluster of its own. Heron currently supports several scheduler options: Aurora, Local, or Slurm.
- State Manager: The Heron state manager tracks the state of all deployed topologies. The topology state includes its logical plan, physical plan, and execution state. Heron supports the following state managers: Local File System or ZooKeeper.
- Uploader: The Heron uploader distributes the topology jars to the servers that run them. Heron supports several uploaders: HDFS, Local File System, or Amazon S3.
- Metrics Sinks: Heron collects several metrics during topology execution. These metrics can be routed to a sink for storage and offline analysis. Currently, Heron supports the following sinks: File Sink, Graphite Sink, or Scribe Sink.
- Heron Tracker: The Tracker serves as the gateway for exploring topologies. It exposes a REST API for exploring the logical plan and physical plan of topologies and for fetching their metrics.
- Heron UI: The UI provides the ability to find and explore topologies visually. It displays the DAG of the topology and how the DAG is mapped to physical containers running in the cluster. It also lets you view logs, take heap dumps and memory histograms, show metrics, and so on.

## Step 1 - Setting Up ZooKeeper State Manager

Heron relies on ZooKeeper for a wide variety of cluster coordination tasks. You can use either a shared or a dedicated ZooKeeper cluster. There are a few things you should be aware of regarding Heron and ZooKeeper:

- Heron uses ZooKeeper only for coordination, not for message passing, so ZooKeeper load should generally be fairly low. A single-node and/or shared ZooKeeper may suffice for your Heron cluster, depending on usage.
- Heron uses ZooKeeper more efficiently than Storm. This makes Heron less likely than Storm to require a bulky or dedicated ZooKeeper cluster, but your use case may still require one.
- We strongly recommend running ZooKeeper under supervision.

### ZooKeeper State Manager Configuration

You can make Heron aware of the ZooKeeper cluster by modifying the `/root/.heron/conf/aurora/statemgr.yaml` config file specific to the Heron cluster. You'll need to specify the following for each cluster:

- `heron.class.state.manager`: The class to be loaded (via reflection) for managing state in ZooKeeper. You should set this to `com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager`.
- `heron.statemgr.connection.string`: The host address and port used to connect to the ZooKeeper cluster, e.g. "127.0.0.1:2181".
- `heron.statemgr.root.path`: The root ZooKeeper node to be used by Heron. We recommend providing Heron with an exclusive root node; if you do not, make sure that the following child nodes are unused: /tmasters, /topologies, /pplans, /executionstate, /schedulers.
- `heron.statemgr.zookeeper.is.initialize.tree`: Whether the nodes under the ZooKeeper root (/tmasters, /topologies, /pplans, /executionstate, and /schedulers) should be created if they are not found. Set it to True if you would like Heron to create those nodes; if they already exist, set it to False. The absence of this configuration implies True.
- `heron.statemgr.zookeeper.session.timeout.ms`: How long, in milliseconds, to wait before declaring the ZooKeeper session dead.
- `heron.statemgr.zookeeper.connection.timeout.ms`: How long, in milliseconds, to wait before declaring the connection to ZooKeeper dead.
- `heron.statemgr.zookeeper.retry.count`: The number of retry attempts when connecting to ZooKeeper.
- `heron.statemgr.zookeeper.retry.interval.ms`: Time in milliseconds to wait between retries.

### Example ZooKeeper State Manager Configuration

```
# state manager class for managing state in ZooKeeper
heron.class.state.manager: com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager

# state manager connection string
heron.statemgr.connection.string: "louyj.top:2181"

# path of the root node used to store state in ZooKeeper
heron.statemgr.root.path: "/heron001"

# create the zookeeper nodes, if they do not exist
heron.statemgr.zookeeper.is.initialize.tree: True

# timeout in ms to wait before considering the zookeeper session dead
heron.statemgr.zookeeper.session.timeout.ms: 30000

# timeout in ms to wait before considering the zookeeper connection dead
heron.statemgr.zookeeper.connection.timeout.ms: 30000

# number of retry attempts when connecting to zookeeper
heron.statemgr.zookeeper.retry.count: 10

# duration of time to wait until the next retry
heron.statemgr.zookeeper.retry.interval.ms: 10000
```
## Step 2 - Setting Up the Aurora Cluster (Scheduler)

Aurora doesn't have a Heron scheduler per se. Instead, when a topology is submitted to Heron, the `heron` CLI interacts with Aurora to automatically deploy all the components necessary to manage topologies.

### ZooKeeper

To run Heron on Aurora, you'll need to set up a ZooKeeper cluster and configure Heron to communicate with it.

### Hosting Binaries

To deploy Heron, the Aurora cluster needs access to the Heron core binary, which can be hosted wherever you'd like, as long as it's accessible to Aurora. Once your Heron binaries are hosted somewhere accessible to Aurora, you should run tests to ensure that Aurora can successfully fetch them.

### Aurora Scheduler Configuration

To configure Heron to use the Aurora scheduler, modify the `scheduler.yaml` config file specific to the Heron cluster. The following must be specified for each cluster:

- `heron.class.scheduler`: The class to be loaded for the Aurora scheduler. You should set this to `com.twitter.heron.scheduler.aurora.AuroraScheduler`.
- `heron.class.launcher`: The class to be loaded for launching and submitting topologies. To configure the Aurora launcher, set this to `com.twitter.heron.scheduler.aurora.AuroraLauncher`.
- `heron.package.core.uri`: The location of the Heron core binary package. The scheduler uses this URI to download the core package to the working directory.
- `heron.directory.sandbox.java.home`: The Java home to be used when running topologies in the containers.
- `heron.scheduler.is.service`: Indicates whether the scheduler is a service. In the case of Aurora, it should be set to False.

### Example Aurora Scheduler Configuration

```
# scheduler class for distributing the topology for execution
heron.class.scheduler: com.twitter.heron.scheduler.aurora.AuroraScheduler

# launcher class for submitting and launching the topology
heron.class.launcher: com.twitter.heron.scheduler.aurora.AuroraLauncher

# location of the core package
heron.package.core.uri: file:///root/heron/bazel-bin/scripts/packages/heron-core.tar.gz

# location of java in the containers
heron.directory.sandbox.java.home: /opt/jdk1.8.0_91/

# invoke the IScheduler as a library directly, not as a service
heron.scheduler.is.service: False
```

### Working with Topologies

After setting up ZooKeeper and generating an Aurora-accessible Heron core binary release, any machine that has the `heron` CLI tool can be used to manage Heron topologies (i.e. submit, activate, and deactivate them, etc.). The most important thing at this stage is to ensure that the `heron` CLI is available across all machines. Once the CLI is available, Aurora can be enabled as the scheduler by specifying the proper configuration when managing topologies, as in the sketch below.
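For illustration, submitting the bundled example topology through an Aurora-backed cluster looks roughly like the following. The cluster/role/env triple (`aurora/root/devel` here) is a placeholder; it must match your Aurora setup and the name of the corresponding config directory under `~/.heron/conf`.

```
# Submit the example topology through the Aurora scheduler
# ("aurora/root/devel" is a hypothetical cluster/role/env triple)
heron submit aurora/root/devel \
  ~/.heron/examples/heron-examples.jar \
  com.twitter.heron.examples.ExclamationTopology ExclamationTopology

# The same cluster/role/env triple is used for lifecycle commands
heron activate aurora/root/devel ExclamationTopology
heron kill aurora/root/devel ExclamationTopology
```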
## Step 3 - Setting Up the Local File System Uploader

When you submit a topology to Heron, the topology jars are uploaded to a stable location. The submitter provides this location to the scheduler, which passes it to the executor in each container. Heron can use a local file system as stable storage for topology jar distribution. There are a few things you should be aware of regarding the local file system uploader:

- It is mainly used in conjunction with the local scheduler.
- It is ideal if you want to run Heron on a single server, laptop, or edge device.
- It is useful for Heron developers who want to test components locally.

### Local File System Uploader Configuration

You can make Heron aware of the local file system uploader by modifying the `uploader.yaml` config file specific to the Heron cluster. You'll need to specify the following for each cluster:

- `heron.class.uploader`: The uploader class to be loaded. You should set this to `com.twitter.heron.uploader.localfs.LocalFileSystemUploader`.
- `heron.uploader.localfs.file.system.directory`: The name of the directory where the topology jar should be uploaded. The directory name should be unique per cluster. You can use the Heron environment variable ${CLUSTER}, which will be substituted with the cluster name.

### Example Local File System Uploader Configuration

```
# uploader class for transferring the topology jar/tar files to storage
heron.class.uploader: com.twitter.heron.uploader.localfs.LocalFileSystemUploader

# name of the directory to upload topologies for the local file system uploader
heron.uploader.localfs.file.system.directory: ${HOME}/.herondata/topologies/${CLUSTER}
```
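After the next submission you can confirm that the uploader placed the topology package in the configured directory. A quick check, assuming the example configuration above and a cluster named `local`:

```
# Submit with the local scheduler/uploader, then inspect the upload directory
heron submit local ~/.heron/examples/heron-examples.jar \
  com.twitter.heron.examples.ExclamationTopology ExclamationTopology --deploy-deactivated

# ${CLUSTER} expands to "local", so the uploaded package should appear here
ls -l ${HOME}/.herondata/topologies/local/
```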