git clone https://git-wip-us.apache.org/repos/asf/falcon.git falcon

cd falcon

export MAVEN_OPTS="-Xmx1024m -XX:MaxPermSize=256m" && mvn clean install                    [For hadoop 1]
export MAVEN_OPTS="-Xmx1024m -XX:MaxPermSize=256m" && mvn clean install -Dhadoop.profile=2 [For hadoop 2]

[optionally -Dhadoop.version=<<hadoop.version>> can be appended to build for a specific version of hadoop]
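For example, a Hadoop 2 build pinned to a specific release might look like this (the version number below is only illustrative):

export MAVEN_OPTS="-Xmx1024m -XX:MaxPermSize=256m" && mvn clean install -Dhadoop.profile=2 -Dhadoop.version=2.2.0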
Once the build successfully completes, artifacts can be packaged for deployment. The package can be built in embedded or distributed mode.
Embedded Mode
mvn clean assembly:assembly -DskipTests -DskipCheck=true                    [For hadoop 1]
mvn clean assembly:assembly -DskipTests -DskipCheck=true -Dhadoop.profile=2 [For hadoop 2]
Tar can be found in {project dir}/target/falcon-${project.version}-bin.tar.gz
Tar is structured as follows
|- bin
   |- falcon
   |- falcon-start
   |- falcon-stop
   |- falcon-config.sh
   |- service-start.sh
   |- service-stop.sh
|- conf
   |- startup.properties
   |- runtime.properties
   |- client.properties
   |- log4j.xml
   |- falcon-env.sh
|- docs
|- client
   |- lib (client support libs)
|- server
   |- webapp
      |- falcon.war
|- hadooplibs
|- README
|- NOTICE.txt
|- LICENSE.txt
|- DISCLAIMER.txt
|- CHANGES.txt
Distributed Mode
mvn clean assembly:assembly -DskipTests -DskipCheck=true -P distributed -Dhadoop.profile=1 [For hadoop 1]
mvn clean assembly:assembly -DskipTests -DskipCheck=true -P distributed -Dhadoop.profile=2 [For hadoop 2]
Tar can be found in {project dir}/target/falcon-distributed-${project.version}-server.tar.gz
Tar is structured as follows
|- bin
   |- falcon
   |- falcon-start
   |- falcon-stop
   |- falcon-config.sh
   |- service-start.sh
   |- service-stop.sh
   |- prism-stop
   |- prism-start
|- conf
   |- startup.properties
   |- runtime.properties
   |- client.properties
   |- log4j.xml
   |- falcon-env.sh
|- docs
|- client
   |- lib (client support libs)
|- server
   |- webapp
      |- falcon.war
      |- prism.war
|- hadooplibs
|- README
|- NOTICE.txt
|- LICENSE.txt
|- DISCLAIMER.txt
|- CHANGES.txt
Installing Falcon
tar -xzvf {falcon package}
cd falcon-distributed-${project.version} or falcon-${project.version}
Configuring Falcon
By default, the config directory used by falcon is {package dir}/conf. To override this, set the environment variable FALCON_CONF to the path of the conf dir.
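For example, to point falcon at a conf directory outside the package (the path below is illustrative):

export FALCON_CONF=/etc/falcon/conf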
falcon-env.sh has been added to the falcon conf. This file can be used to set the environment variables needed by your services; you can also set any other environment variables you might need. This file is sourced by the falcon scripts before any commands are executed. The following environment variables are available to set:
# The java implementation to use. If JAVA_HOME is not found we expect java and jar to be in path
#export JAVA_HOME=

# any additional java opts you want to set. This will apply to both client and server operations
#export FALCON_OPTS=

# any additional java opts that you want to set for client only
#export FALCON_CLIENT_OPTS=

# java heap size we want to set for the client. Default is 1024MB
#export FALCON_CLIENT_HEAP=

# any additional opts you want to set for prism service.
#export FALCON_PRISM_OPTS=

# java heap size we want to set for the prism service. Default is 1024MB
#export FALCON_PRISM_HEAP=

# any additional opts you want to set for falcon service.
#export FALCON_SERVER_OPTS=

# java heap size we want to set for the falcon server. Default is 1024MB
#export FALCON_SERVER_HEAP=

# What is considered as falcon home dir. Default is the base location of the installed software
#export FALCON_HOME_DIR=

# Where log files are stored. Default is logs directory under the base install location
#export FALCON_LOG_DIR=

# Where pid files are stored. Default is logs directory under the base install location
#export FALCON_PID_DIR=

# where the falcon active mq data is stored. Default is logs/data directory under the base install location
#export FALCON_DATA_DIR=

# Where do you want to expand the war file. By default it is in /server/webapp dir under the base install dir.
#export FALCON_EXPANDED_WEBAPP_DIR=
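A minimal set of overrides in falcon-env.sh might look like this (all values are illustrative and should be adapted to your environment):

export JAVA_HOME=/usr/java/default
export FALCON_LOG_DIR=/var/log/falcon
export FALCON_PID_DIR=/var/run/falcon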
Starting Falcon Server
bin/falcon-start [-port <port>]
By default,
* the falcon server starts at port 15000. To change the port, use the -port option
* the falcon server starts an embedded ActiveMQ. To control this behaviour, set the corresponding system properties via the -D option in the environment variable FALCON_OPTS
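For example, to start the server on a non-default port while passing extra JVM system properties through FALCON_OPTS (the property name below is just a placeholder):

export FALCON_OPTS="-D<property>=<value>"
bin/falcon-start -port 15001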
Adding Extension Libraries

Library extensions allow users to add custom libraries to entity lifecycles such as feed retention, feed replication and process execution. This is useful for use cases such as adding filesystem extensions. To enable this, add the following configs to startup.properties:

*.libext.paths=<paths to be added to all entity lifecycles>
*.libext.feed.paths=<paths to be added to all feed lifecycles>
*.libext.feed.retentions.paths=<paths to be added to feed retention workflow>
*.libext.feed.replication.paths=<paths to be added to feed replication workflow>
*.libext.process.paths=<paths to be added to process workflow>
The configured jars are added to the falcon classpath and to the corresponding workflows.
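As a sketch, assuming the custom jars have been staged under a hypothetical directory /projects/falcon/libext, the startup.properties entries could look like:

*.libext.paths=/projects/falcon/libext/common
*.libext.feed.replication.paths=/projects/falcon/libext/replication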
Starting Prism
bin/prism-start [-port <port>]
By default,
* the prism server starts at port 16000. To change the port, use the -port option
* prism starts with conf from {package dir}/conf. To override this (to use the same conf with multiple prism upgrades), set the environment variable FALCON_CONF to the path of the conf dir
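For example, to run prism with an external conf directory on a non-default port (the path and port below are illustrative):

export FALCON_CONF=/etc/falcon/prism-conf
bin/prism-start -port 16001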
Using Falcon
bin/falcon admin -version
Falcon server build version: {Version:"0.3-SNAPSHOT-rd7e2be9afa2a5dc96acd1ec9e325f39c6b2f17f7",Mode:"embedded"}

bin/falcon help (for more details about falcon cli usage)
Dashboard
Once falcon / prism is started, you can view the status of falcon entities using the Web-based dashboard. The web UI works in both distributed and embedded mode. You can open your browser at the corresponding port to use the web UI.
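For example, in embedded mode with the default port, the dashboard would be reachable at a URL like:

http://<falcon-server-host>:15000/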
Stopping Falcon Server
bin/falcon-stop
Stopping Prism
bin/prism-stop
Falcon Examples

Start the falcon server:

bin/falcon-start
Make sure the hadoop and oozie endpoints in examples/entity/standalone-cluster.xml match your setup, then submit the cluster entity:
bin/falcon entity -submit -type cluster -file examples/entity/standalone-cluster.xml
Submit input and output feeds:
bin/falcon entity -submit -type feed -file examples/entity/in-feed.xml
bin/falcon entity -submit -type feed -file examples/entity/out-feed.xml
Set up the workflow for the process:
hadoop fs -put examples/app /
Submit and schedule the process:
bin/falcon entity -submitAndSchedule -type process -file examples/entity/oozie-mr-process.xml
bin/falcon entity -submitAndSchedule -type process -file examples/entity/pig-process.xml
Generate input data:
examples/data/generate.sh <<hdfs endpoint>>
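For example, pointing the script at an illustrative local HDFS endpoint:

examples/data/generate.sh hdfs://localhost:8020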
Get status of instances:
bin/falcon instance -status -type process -name oozie-mr-process -start 2013-11-15T00:05Z -end 2013-11-15T01:00Z