This project has retired. For details please refer to its Attic page.
Falcon - FalconCLI

FalconCLI

FalconCLI is a interface between user and Falcon. It is a command line utility provided by Falcon. FalconCLI supports Entity Management, Instance Management and Admin operations.There is a set of web services that are used by FalconCLI to interact with Falcon.

Common CLI Options

Falcon URL

Optional -url option indicating the URL of the Falcon system to run the command against can be provided. If not mentioned it will be picked from the system environment variable FALCON_URL. If FALCON_URL is not set then it will be picked from client.properties file. If the option is not provided and also not set in client.properties, Falcon CLI will fail.

Proxy user support

The -doAs option allows the current user to impersonate other users when interacting with the Falcon system. The current user must be configured as a proxyuser in the Falcon system. The proxyuser configuration may restrict from which hosts a user may impersonate users, as well as users of which groups can be impersonated.

Proxyuser support described here.

Debug Mode

If you export FALCON_DEBUG=true then the Falcon CLI will output the Web Services API details used by any commands you execute. This is useful for debugging purposes to or see how the Falcon CLI works with the WS API. Alternately, you can specify '-debug' through the CLI arguments to get the debug statements. Example: $FALCON_HOME/bin/falcon entity -submit -type cluster -file /cluster/definition.xml -debug

Entity Management Operations

Submit

Submit option is used to set up entity definition.

Usage: $FALCON_HOME/bin/falcon entity -submit -type [cluster|datasource|feed|process] -file <entity-definition.xml>

Example: $FALCON_HOME/bin/falcon entity -submit -type cluster -file /cluster/definition.xml

Note: The url option in the above and all subsequent commands is optional. If not mentioned it will be picked from client.properties file. If the option is not provided and also not set in client.properties, Falcon CLI will fail.

Schedule

Once submitted, an entity can be scheduled using schedule option. Process and feed can only be scheduled.

Usage: $FALCON_HOME/bin/falcon entity -type [process|feed] -name <<name>> -schedule

Optional Arg : -skipDryRun. When this argument is specified, Falcon skips oozie dryrun.

Example: $FALCON_HOME/bin/falcon entity -type process -name sampleProcess -schedule

Suspend

Suspend on an entity results in suspension of the oozie bundle that was scheduled earlier through the schedule function. No further instances are executed on a suspended entity. Only schedule-able entities(process/feed) can be suspended.

Usage: $FALCON_HOME/bin/falcon entity -type [feed|process] -name <<name>> -suspend

Resume

Puts a suspended process/feed back to active, which in turn resumes applicable oozie bundle.

Usage: $FALCON_HOME/bin/falcon entity -type [feed|process] -name <<name>> -resume

Delete

Delete removes the submitted entity definition for the specified entity and put it into the archive.

Usage: $FALCON_HOME/bin/falcon entity -type [cluster|datasource|feed|process] -name <<name>> -delete

List

Entities of a particular type can be listed with list sub-command.

Usage: $FALCON_HOME/bin/falcon entity -list

Optional Args : -fields <<field1,field2>> -type <<[cluster|datasource|feed|process],[cluster|datasource|feed|process]>> -nameseq <<namesubsequence>> -tagkeys <<tagkeyword1,tagkeyword2>> -filterBy <<field1:value1,field2:value2>> -tags <<tagkey=tagvalue,tagkey=tagvalue>> -orderBy <<field>> -sortOrder <<sortOrder>> -offset 0 -numResults 10

Optional params described here.

Summary

Summary of entities of a particular type and a cluster will be listed. Entity summary has N most recent instances of entity.

Usage: $FALCON_HOME/bin/falcon entity -type [feed|process] -summary

Optional Args : -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'" -fields <<field1,field2>> -filterBy <<field1:value1,field2:value2>> -tags <<tagkey=tagvalue,tagkey=tagvalue>> -orderBy <<field>> -sortOrder <<sortOrder>> -offset 0 -numResults 10 -numInstances 7

Optional params described here.

Update

Update operation allows an already submitted/scheduled entity to be updated. Cluster and datasource updates are currently not allowed.

Usage: $FALCON_HOME/bin/falcon entity -type [feed|process] -name <<name>> -update -file <<path_to_file>>

Optional Arg : -skipDryRun. When this argument is specified, Falcon skips oozie dryrun.

Example: $FALCON_HOME/bin/falcon entity -type process -name HourlyReportsGenerator -update -file /process/definition.xml

Touch

Force Update operation allows an already submitted/scheduled entity to be updated.

Usage: $FALCON_HOME/bin/falcon entity -type [feed|process] -name <<name>> -touch

Optional Arg : -skipDryRun. When this argument is specified, Falcon skips oozie dryrun.

Status

Status returns the current status of the entity.

Usage: $FALCON_HOME/bin/falcon entity -type [cluster|datasource|feed|process] -name <<name>> -status

Dependency

With the use of dependency option, we can list all the entities on which the specified entity is dependent. For example for a feed, dependency return the cluster name and for process it returns all the input feeds, output feeds and cluster names.

Usage: $FALCON_HOME/bin/falcon entity -type [cluster|datasource|feed|process] -name <<name>> -dependency

Definition

Definition option returns the entity definition submitted earlier during submit step.

Usage: $FALCON_HOME/bin/falcon entity -type [cluster|datasource|feed|process] -name <<name>> -definition

Lookup

Lookup option tells you which feed does a given path belong to. This can be useful in several scenarios e.g. generally you would want to have a single definition for common feeds like metadata with same location otherwise it can result in a problem (different retention durations can result in surprises for one team) If you want to check if there are multiple definitions of same metadata then you can pick an instance of that and run through the lookup command like below.

Usage: $FALCON_HOME/bin/falcon entity -type feed -lookup -path /data/projects/my-hourly/2014/10/10/23/

If you have multiple feeds with location as /data/projects/my-hourly/${YEAR}/${MONTH}/${DAY}/${HOUR} then this command will return all of them.

SLAAlert

Since: 0.8

This command lists all the feed instances which have missed sla and are still not available. If a feed instance missed sla but is now available, then it will not be reported in results. The purpose of this API is alerting and hence it doesn't return feed instances which missed SLA but are available as they don't require any action.

* Currently sla monitoring is supported only for feeds.

* Option end is optional and will default to current time if missing.

* Option name is optional, if provided only instances of that feed will be considered.

Usage:

Example 1

$FALCON_HOME/bin/falcon entity -type feed -start 2014-09-05T00:00Z -slaAlert -end 2016-05-03T00:00Z -colo local

name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T11:59Z, tags: Missed SLA High name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T12:00Z, tags: Missed SLA High name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T12:01Z, tags: Missed SLA High name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T12:02Z, tags: Missed SLA High name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T12:03Z, tags: Missed SLA High name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T12:04Z, tags: Missed SLA High name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T12:05Z, tags: Missed SLA High name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T12:06Z, tags: Missed SLA High name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T12:07Z, tags: Missed SLA High name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T12:08Z, tags: Missed SLA Low

Response: default/Success!

Request Id: default/216978070@qtp-830047511-4 - f5a6c129-ab42-4feb-a2bf-c3baed356248

Example 2

$FALCON_HOME/bin/falcon entity -type feed -start 2014-09-05T00:00Z -slaAlert -end 2016-05-03T00:00Z -colo local -name in

name: in, type: FEED, cluster: local, instanceTime: 2015-09-26T06:00Z, tags: Missed SLA High

Response: default/Success!

Request Id: default/1580107885@qtp-830047511-7 - f16cbc51-5070-4551-ad25-28f75e5e4cf2

Instance Management Options

Kill

Kill sub-command is used to kill all the instances of the specified process whose nominal time is between the given start time and end time.

Note: 1. The start time and end time needs to be specified in TZ format. Example: 01 Jan 2012 01:00 => 2012-01-01T01:00Z

3. Process name is compulsory parameter for each instance management command.

Usage: $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -kill -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'"

Suspend

Suspend is used to suspend a instance or instances for the given process. This option pauses the parent workflow at the state, which it was in at the time of execution of this command.

Usage: $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -suspend -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'"

Continue

Continue option is used to continue the failed workflow instance. This option is valid only for process instances in terminal state, i.e. KILLED or FAILED.

Usage: $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -continue -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'"

Rerun

Rerun option is used to rerun instances of a given process. On issuing a rerun, by default the execution resumes from the last failed node in the workflow. This option is valid only for process instances in terminal state, i.e. SUCCEEDED, KILLED or FAILED. If one wants to forcefully rerun the entire workflow, -force should be passed along with -rerun Additionally, you can also specify properties to override via a properties file and this will be prioritized over force option in case of contradiction.

Usage: $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -rerun -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'" [-force] [-file <<properties file>>]

Resume

Resume option is used to resume any instance that is in suspended state.

Usage: $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -resume -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'"

Status

Status option via CLI can be used to get the status of a single or multiple instances. If the instance is not yet materialized but is within the process validity range, WAITING is returned as the state. Along with the status of the instance time is also returned. Log location gives the oozie workflow url If the instance is in WAITING state, missing dependencies are listed. The job urls are populated for all actions of user workflow and non-succeeded actions of the main-workflow. The user then need not go to the underlying scheduler to get the job urls when needed to debug an issue in the job.

Example : Suppose a process has 3 instance, one has succeeded,one is in running state and other one is waiting, the expected output is:

{"status":"SUCCEEDED","message":"getStatus is successful","instances":[{"instance":"2012-05-07T05:02Z","status":"SUCCEEDED","logFile":"http://oozie-dashboard-url"},{"instance":"2012-05-07T05:07Z","status":"RUNNING","logFile":"http://oozie-dashboard-url"}, {"instance":"2010-01-02T11:05Z","status":"WAITING"}]

Usage: $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -status

Optional Args : -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'" -colo <<colo>> -filterBy <<field1:value1,field2:value2>> -lifecycle <<lifecycles>> -orderBy field -sortOrder <<sortOrder>> -offset 0 -numResults 10

Optional params described here.

List

List option via CLI can be used to get single or multiple instances. If the instance is not yet materialized but is within the process validity range, WAITING is returned as the state. Instance time is also returned. Log location gives the oozie workflow url If the instance is in WAITING state, missing dependencies are listed

Example : Suppose a process has 3 instance, one has succeeded,one is in running state and other one is waiting, the expected output is:

{"status":"SUCCEEDED","message":"getStatus is successful","instances":[{"instance":"2012-05-07T05:02Z","status":"SUCCEEDED","logFile":"http://oozie-dashboard-url"},{"instance":"2012-05-07T05:07Z","status":"RUNNING","logFile":"http://oozie-dashboard-url"}, {"instance":"2010-01-02T11:05Z","status":"WAITING"}]}

Usage: $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -list

Optional Args : -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'" -colo <<colo>> -lifecycle <<lifecycles>> -filterBy <<field1:value1,field2:value2>> -orderBy field -sortOrder <<sortOrder>> -offset 0 -numResults 10

Optional params described here.

Summary

Summary option via CLI can be used to get the consolidated status of the instances between the specified time period. Each status along with the corresponding instance count are listed for each of the applicable colos. The unscheduled instances between the specified time period are included as UNSCHEDULED in the output to provide more clarity.

Example : Suppose a process has 3 instance, one has succeeded,one is in running state and other one is waiting, the expected output is:

{"status":"SUCCEEDED","message":"getSummary is successful", instancesSummary:[{"cluster": <<name>> "map":[{"SUCCEEDED":"1"}, {"WAITING":"1"}, {"RUNNING":"1"}]}]}

Usage: $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -summary

Optional Args : -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'" -colo <<colo>> -filterBy <<field1:value1,field2:value2>> -lifecycle <<lifecycles>> -orderBy field -sortOrder <<sortOrder>>

Optional params described here.

Running

Running option provides all the running instances of the mentioned process.

Usage: $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -running

Optional Args : -colo <<colo>> -lifecycle <<lifecycles>> -filterBy <<field1:value1,field2:value2>> -orderBy <<field>> -sortOrder <<sortOrder>> -offset 0 -numResults 10

Optional params described here.

FeedInstanceListing

Get falcon feed instance availability.

Usage: $FALCON_HOME/bin/falcon instance -type feed -name <<name>> -listing

Optional Args : -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'" -colo <<colo>>

Optional params described here.

Logs

Get logs for instance actions

Usage: $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -logs

Optional Args : -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'" -runid <<runid>> -colo <<colo>> -lifecycle <<lifecycles>> -filterBy <<field1:value1,field2:value2>> -orderBy field -sortOrder <<sortOrder>> -offset 0 -numResults 10

Optional params described here.

LifeCycle

Describes list of life cycles of a entity , for feed it can be replication/retention and for process it can be execution. This can be used with instance management options. Default values are replication for feed and execution for process.

Usage: $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -status -lifecycle <<lifecycletype>> -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'"

Triage

Given a feed/process instance this command traces it's ancestors to find what all ancestors have failed. It's useful if lot of instances are failing in a pipeline as it then finds out the root cause of the pipeline being stuck.

Usage: $FALCON_HOME/bin/falcon instance -triage -type <<feed/process>> -name <<name>> -start "yyyy-MM-dd'T'HH:mm'Z'"

Params

Displays the workflow params of a given instance. Where start time is considered as nominal time of that instance and end time won't be considered.

Usage: $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -params -start "yyyy-MM-dd'T'HH:mm'Z'"

Dependency

Display the dependent instances which are dependent on the given instance. For example for a given process instance it will list all the input feed instances(if any) and the output feed instances(if any).

An example use case of this command is as follows: Suppose you find out that the data in a feed instance was incorrect and you need to figure out which all process instances consumed this feed instance so that you can reprocess them after correcting the feed instance. You can give the feed instance and it will tell you which process instance produced this feed and which all process instances consumed this feed.

NOTE: 1. instanceTime must be a valid instanceTime e.g. instanceTime of a feed should be in it's validity range on applicable clusters, and it should be in the range of instances produced by the producer process(if any)

2. For processes with inputs like latest() which vary with time the results are not guaranteed to be correct.

Usage: $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -params -instanceTime "yyyy-MM-dd'T'HH:mm'Z'"

For example: $FALCON_HOME/bin/falcon instance -dependency -type feed -name out -instanceTime 2014-12-15T00:00Z name: producer, type: PROCESS, cluster: local, instanceTime: 2014-12-15T00:00Z, tags: Output name: consumer, type: PROCESS, cluster: local, instanceTime: 2014-12-15T00:03Z, tags: Input name: consumer, type: PROCESS, cluster: local, instanceTime: 2014-12-15T00:04Z, tags: Input name: consumer, type: PROCESS, cluster: local, instanceTime: 2014-12-15T00:02Z, tags: Input name: consumer, type: PROCESS, cluster: local, instanceTime: 2014-12-15T00:05Z, tags: Input

Response: default/Success!

Request Id: default/1125035965@qtp-503156953-7 - 447be0ad-1d38-4dce-b438-20f3de69b172

Optional params described here.

Metadata Lineage Options

Lineage

dot format. You can use the output and view a graphical representation of DAG using an online graphviz viewer like this.

Usage:

$FALCON_HOME/bin/falcon metadata -lineage -pipeline my-pipeline

pipeline is a mandatory option.

Vertex

Get the vertex with the specified id.

Usage: $FALCON_HOME/bin/falcon metadata -vertex -id <<id>>

Example: $FALCON_HOME/bin/falcon metadata -vertex -id 4

Vertices

Get all vertices for a key index given the specified value.

Usage: $FALCON_HOME/bin/falcon metadata -vertices -key <<key>> -value <<value>>

Example: $FALCON_HOME/bin/falcon metadata -vertices -key type -value feed-instance

Vertex Edges

Get the adjacent vertices or edges of the vertex with the specified direction.

Usage: $FALCON_HOME/bin/falcon metadata -edges -id <<vertex-id>> -direction <<direction>>

Example: $FALCON_HOME/bin/falcon metadata -edges -id 4 -direction both $FALCON_HOME/bin/falcon metadata -edges -id 4 -direction inE

Edge

Get the edge with the specified id.

Usage: $FALCON_HOME/bin/falcon metadata -edge -id <<id>>

Example: $FALCON_HOME/bin/falcon metadata -edge -id Q9n-Q-5g

Metadata Discovery Options

List

Lists of all dimensions of given type. If the user provides optional param cluster, only the dimensions related to the cluster are listed. Usage: $FALCON_HOME/bin/falcon metadata -list -type [cluster_entity|datasource_entity|feed_entity|process_entity|user|colo|tags|groups|pipelines|replication_metrics]

Optional Args : -cluster <<cluster name>>

Example: $FALCON_HOME/bin/falcon metadata -list -type process_entity -cluster primary-cluster $FALCON_HOME/bin/falcon metadata -list -type tags

To display replication metrics from recipe based replication process and from feed replication. Usage: $FALCON_HOME/bin/falcon metadata -list -type replication_metrics -process/-feed <entity name> Optional Args : -numResults <<value>>

Example: $FALCON_HOME/bin/falcon metadata -list -type replication_metrics -process hdfs-replication $FALCON_HOME/bin/falcon metadata -list -type replication_metrics -feed fs-replication

Relations

List all dimensions related to specified Dimension identified by dimension-type and dimension-name. Usage: $FALCON_HOME/bin/falcon metadata -relations -type [cluster_entity|feed_entity|process_entity|user|colo|tags|groups|pipelines] -name <<Dimension Name>>

Example: $FALCON_HOME/bin/falcon metadata -relations -type process_entity -name sample-process

Admin Options

Help

Usage: $FALCON_HOME/bin/falcon admin -help

Version

Version returns the current version of Falcon installed. Usage: $FALCON_HOME/bin/falcon admin -version

Status

Status returns the current state of Falcon (running or stopped). Usage: $FALCON_HOME/bin/falcon admin -status

Recipe Options

Submit Recipe

Submit the specified recipe.

Usage: $FALCON_HOME/bin/falcon recipe -name <name> Name of the recipe. User should have defined <name>-template.xml and <name>.properties in the path specified by falcon.recipe.path in client.properties file. falcon.home path is used if its not specified in client.properties file. If its not specified in client.properties file and also if files cannot be found at falcon.home, Falcon CLI will fail.

Optional Args : -tool <recipeToolClassName> Falcon provides a base tool that recipes can override. If this option is not specified the default Recipe Tool RecipeTool defined is used. This option is required if user defines his own recipe tool class.

Example: $FALCON_HOME/bin/falcon recipe -name hdfs-replication