Attributes and Error Handling

Some statements can be adorned with attributes that affect their behavior. Attributes appear after the statement on the same line, right before the do keyword for expressions that have one (for example define, sub, concurrent, map, foreach). Multiple attributes on a single statement are separated with commas. Attributes make it possible to specify and handle timeouts, errors, and cancelations.

An attribute has a name and a value. The syntax is: name: value.
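As a minimal sketch of this syntax, the snippet below places two attributes on a single sub statement, separated by a comma and written right before the do keyword. It assumes that @server was defined earlier in the workflow; the specific timeout value is only illustrative.

```
# Two attributes (timeout and on_error) on one sub statement.
# @server is assumed to have been defined before this block.
sub timeout: 10m, on_error: skip do
  @instance = @server.launch()
end
```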

The acceptable types for attribute values depend on the attribute, as listed in the following table.

Acceptable Types for Attribute Values

| Acceptable Type | Example |
|---|---|
| Number | wait_task: 1 |
| Strings | on_timeout: skip |
| Arrays | wait_task: [ "this_task", "that_task" ] |
| Definition names with arguments | on_timeout: handle_timeout() |

Some of the attributes define behavior that applies to tasks, while others define behavior for the whole process. Processes and tasks are described in detail in Processes. To understand the attribute behavior described below, it is enough to know that a single process consists of one or more tasks and that a task is a single thread of execution.

Some attributes attach behavior to the expression they adorn and all their sub-expressions. The sub-expressions of an expression are expressions that belong to the block defined by the parent expression. Not all expressions define blocks so not all expressions have sub-expressions. Expressions that may have sub-expressions include define, sub, concurrent and the looping expressions (foreach, concurrent foreach, etc.).

The following table is the exhaustive list of attributes supported by the language:

Attributes Supported by the Cloud Workflow Language

| Attribute | Applies to | Possible Values | Description |
|---|---|---|---|
| on_error | define, sub, call, concurrent | name of the definition with arguments, skip or retry | Behavior to trigger when an error occurs in an expression or a sub-expression. For concurrent blocks, the handler is applied to all sub-expressions. |
| on_rollback | define, sub, call, concurrent, concurrent map, concurrent foreach | name of the definition with arguments | Name of the definition called when an expression causes a rollback (due to an error or the task being canceled). |
| on_timeout | sub, concurrent, call | name of the definition with arguments, skip or retry | Behavior to trigger when a timeout occurs in an expression or a sub-expression. |
| task_name | sub | string representing the name of a task | Changes the current task name to the given value. |
| task_prefix | concurrent foreach, concurrent map | string representing the prefix of task names | Specifies the prefix of the names of tasks created by a concurrent loop (the suffix is the iteration index). |
| timeout | sub, call | string representing a duration | Maximum time allowed for the expressions in a statement and any sub-expression to execute. |
| wait_task | concurrent, concurrent foreach, concurrent map | number of tasks to be waited on or name(s) of task(s) to be waited on | Pauses the execution of a task until the condition defined by the value is met. |
| task_label | sub, call, define | string representing a label | Labels allow processes to return progress information to clients. They are associated with an arbitrary name that gets returned by the cloud workflow APIs. |

The task_name, task_prefix, and wait_task attributes are described in Processes.

The following subsections detail the other attributes dealing with errors and timeouts.

Errors and Error Handling
Resource Action Errors
Handlers and State
Timeouts
Labels
Logging
Attributes and Error Handling Summary

Errors and Error Handling

Defining the steps involved in handling error cases is an integral part of all cloud workflows. This is another area where workflows and traditional programs differ: a workflow needs to describe the steps taken when errors occur the same way it describes the steps taken in the normal flow. Error handlers are thus first-class citizens in Cloud Workflow Language and are implemented as definitions themselves. Handling an error could mean alerting someone, cleaning up resources, triggering another workflow, etc. Cloud Workflow Language makes it possible to do any of these things through the on_error attribute. Only the define, sub, and concurrent expressions may be adorned with that attribute.

The associated value is a string that can be one of:

skip—aborts the execution of the statement and any sub-expression then proceeds to the next statement. No cancel handler is called in this case.
retry—retries the execution of the statement and any sub-expression.

To illustrate the behavior associated with the different values consider the following snippet:

sub on_error: skip do 

  raise "uh oh" 

end 

log_info("I'm here")

The engine raises an error when the raise expression executes, which triggers the on_error attribute of the parent expression. The associated value is skip, so the engine ignores the error and proceeds to the first expression after the block (the log_info expression). If the value associated with on_error had been retry instead, the engine would have re-run the block, which in this case would result in an infinite loop.

As mentioned in the introduction, a cloud workflow may need to define additional steps that need to be executed in case of errors. The on_error attribute allows specifying a definition that gets run when an error occurs. The syntax allows for passing arguments to the definition so that the error handler can be provided with contextual information upon invocation. On top of arguments being passed explicitly, the error handler also has access to all the variables and references that were defined in the scope of the expression that raised the error.

The error handler can stipulate how the caller should behave once it completes by assigning one of the string values listed above (skip or retry) to the special $_error_behavior local variable. If the error definition does not define $_error_behavior, then the caller uses the default behavior (raise) after the error definition completes. This default behavior causes the error to be re-raised so that any error handler defined on a parent scope may handle it. If no error handler is defined, or if all error handlers end up re-raising, then the task terminates and its final status is failed. The skip behavior does not raise the error and forces the calling block to skip any remaining expressions after the error occurs. The retry behavior runs the entire caller block again.
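The re-raise behavior described above can be sketched with two nested sub blocks. This is an illustration only: the definition names log_only and give_up are hypothetical, and the snippet relies solely on the behaviors described in this section. The inner handler does not set $_error_behavior, so the default (raise) applies and the error reaches the handler of the outer block, which chooses skip.

```
# Hypothetical inner handler: leaves $_error_behavior unset,
# so the error is re-raised to the parent scope.
define log_only() do
  log_error("inner handler ran")
end

# Hypothetical outer handler: stops the propagation.
define give_up() do
  $_error_behavior = "skip"
end

sub on_error: give_up() do
  sub on_error: log_only() do
    raise "inner failure"
  end
  log_info("skipped") # Not run: the outer block skips its remaining expressions
end

log_info("execution continues here")
```

If give_up also left $_error_behavior unset, no handler would stop the error and, as described above, the task would terminate with the failed status.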

The following example shows how to implement a limited number of retries using an error handler:

define handle_retries($attempts) do 

  if $attempts <= 3 

    $_error_behavior = "retry" 

  else 

    $_error_behavior = "skip" 

  end 

end 

$attempts = 0 

sub on_error: handle_retries($attempts) do 

  $attempts = $attempts + 1 

  ... # Statements that will get retried 3 times in case of errors 

end 

Errors can originate from evaluating expressions (for example, division by 0) or from executing resource actions (for example, trying to launch an already running server). A variation on the former is an error generated intentionally using the raise keyword. In all these cases the innermost error handler defined using the on_error attribute gets executed.

The raise keyword, optionally followed by a message, causes an error which can be caught by an error handler. All error handlers have access to a special variable, $_error, that contains information about the error being raised. $_error is a hash that contains three keys:

"type"—A string that describes the error type. All errors raised using the raise keyword have the type set to user.
"message"—A string that contains information specific to this occurrence of the error. The string contains any message given to the raise keyword for user errors.
"origin"—A [ line, column ] array pointing at where the error occurred in the Cloud Workflow Language source.

define handle_error() do 

  log_error($_error["type"] + ": " + $_error["message"]) # Will log "user: ouch" 

  $_error_behavior = "skip" 

end 

sub on_error: handle_error() do 

  raise "ouch" 

end 

Resource Action Errors

Resource actions always operate atomically on resource collections. In other words, the expression @servers.launch() is semantically equivalent to making concurrent launch API calls to all resources in the @servers array. This means that multiple errors may happen concurrently if multiple resources in the collection fail to run the action. When that happens, an error handler needs access to the set of resources that failed, the set that succeeded, and the initial collection in order to take the appropriate actions. We have already seen the special $_error variable made available to error handlers. When an error results from calling an action on a resource collection, Cloud Workflow Language also makes the following variables available to the error handler:

@_original—The resource collection that initially executed the action that failed.
@_done—A resource collection containing all the resources that successfully executed the action.
@_partial—A resource collection containing the partial results of the action if the action returns a collection of resources.
$_partial—An array containing the partial results of the action if the action returns an array of values.
$_errors—An array of hashes containing specific error information.

The $_errors variable contains an array of hashes. Each element includes the following values:

resource_href—Href of the underlying resource on which the action failed. Example: /account/71/instances/123
action—Name of the action that failed. Example: run_executable
action_arguments—Hash of action arguments as specified in the definition. Example: { recipe_name: sys:timezone }
request—Hash containing information related to the request including the following values:
    url—Full request URL. Example: https://my.rightscale.com/instances/...run_executable
    verb—HTTP verb used to make the request. Example: POST
    headers—Hash of HTTP request headers and associated value
    body—Request body (string)
response: Hash containing information related to the response including the following values:
    code—HTTP response code (string)
    headers—Hash of HTTP response headers
    body—Response body (string)

In case of resource action errors the $_error variable is initialized with the type resource_action and includes the detailed error message with the problem, summary, and resolution fields as a string.

Given the above, the following definition implements a retry:

define handle_terminate_error() do 

  foreach $error in $_errors do 

    @instance = rs_cm.get($error["resource_href"]) # Retrieve the instance that failed to terminate 

    if @instance.state != "stopped" # Make sure it is still running

      log_error("Instance " + @instance.name + " failed to terminate, retrying...")

      sub on_error: skip do 

        @instance.terminate() # If so try again to terminate but this time ignore any error 

      end 

    end 

  end 

  $_error_behavior = "skip" # Proceed with the next statement in caller 

end 

sub on_error: handle_terminate_error() do 

  @instances.terminate() 

end 

In the definition above, the error handler sets the special $_error_behavior local variable to skip, which means that the process will not raise the error and will instead skip the rest of the block where the error occurred. Note how the handler itself uses on_error to catch errors and ignore them (using skip).
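The per-resource fields of $_errors listed earlier can also be used purely for diagnostics. The sketch below is an assumption-laden illustration rather than a reference implementation: the definition name log_action_errors is hypothetical, and it assumes that nested hash values can be read with chained index lookups such as $error["response"]["code"].

```
# Hypothetical handler that reports every low-level failure, then skips.
define log_action_errors() do
  foreach $error in $_errors do
    # "code" is documented as a string, so it can be concatenated directly
    log_error($error["action"] + " failed on " + $error["resource_href"] + " with HTTP code " + $error["response"]["code"])
  end
  $_error_behavior = "skip" # Proceed with the next statement in caller
end

sub on_error: log_action_errors() do
  @instances.terminate()
end
```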

Actions may return nothing, a collection of resources, or an array of values. When an action has a return value (collection or array), the error handler needs to be able to modify that value before it is returned to the calling block. For example, an error handler may retry certain actions and as a result may need to add to the returned value, which would initially only contain values for the resources that ran the action successfully. An error handler can achieve this by reading the @_partial collection or the $_partial array, handling the error cases, and returning the complete results as the return value of the error handler definition.

To take a concrete example let's consider the Flexera servers resource launch() action. This action returns a collection of launched instances. The following handler retries any failure to launch and joins the @_partial collection with instances that successfully launched on retry:

define retry_launch() return @instances do 

  @instances = @_partial 

  foreach $error in $_errors do 

    @server = rs_cm.get($error["resource_href"]) # Retrieve the server that failed to launch 

    if @server.state == "stopped" # Make sure it is still stopped 

      log_error("Server " + @server.name + " failed to launch, retrying...")

      sub on_error: skip do

        @instance = @server.launch() # If so try again to launch but this time ignore any error

      end 

      @instances = @instances + @instance # @instance may be empty in case the launch failed again 

    end 

  end 

  $_error_behavior = "skip" # Don't raise the error -- skip the rest of the caller block 

end 

sub on_error: retry_launch() retrieve @instances do 

  @instances = @servers.launch()

end 

The definition above retries each failed launch and adds any instance that launches successfully on retry to the @instances collection, which is then returned to the calling block.

Handling Errors Returned by create_copies 

create_copies makes it possible to create multiple resources with one expression. Like any other action, create_copies executes atomically. That is, it attempts to create all resources concurrently and thus may have to report multiple errors.

When a call to create_copies results in errors the value of the "type" field of $_error is set to create_copies_action. In this case each element in the $_errors variable also contains a "copy_index" field which corresponds to the index of the copy whose creation resulted in the error.

create_copies conveniently accepts a set of indices which represents the indices to use when evaluating copy_index in the field values. This makes it simple to retry only the create calls that failed. Here is an example of an implementation of a retry algorithm for create_copies:

# bulk_create creates n deployments. 

# $fields contain the static fields while $copy_fields contains the fields that 

# make use of copy_index() 

define bulk_create($n, $fields, $copy_fields) return @deployments do 

  $attempts = 0 

  $create_indices = $n 

  @created = rs_cm.deployments.empty()

  sub on_error: compute_indices($attempts, @created) retrieve @created, $create_indices do 

    $attempts = $attempts + 1 

    @deployments = rs_cm.deployments.create_copies($create_indices, $fields, $copy_fields)

  end 

end 

 

# compute_indices looks at $_errors and returns the indices that must be retried. 

define compute_indices($attempts, @created) return @created, $failed_indices do 

  @created = @created + @_partial 

  if $_error["type"] == "create_copies_action" && $attempts <= 3 

    $failed_indices = []

    foreach $error in $_errors do 

      $failed_indices << $error['copy_index']

    end 

    $_error_behavior = "retry" 

  else 

    $_error_behavior = "raise"

    delete(@created)

  end 

end 

Handlers and State

We've seen before that definitions executed via call only have access to the references and variables passed as arguments (and to global references and variables). Definitions executed through handlers, on the other hand, inherit all the local variables and references defined at the time the handler is invoked (that is, at the time an error is raised, a timeout occurs, or a cancelation is triggered).

define handle_errors() do 

  log_error("Process failed while handling " + inspect(@servers)) # Note: handler has access to @servers

  $_error_behavior = "skip"

end 

@servers = rs_cm.get(href: "/api/servers/123")

sub on_error: handle_errors() do 

  @servers.launch()

end 

In the snippet above, the error handler has access to @servers even though that collection is defined in the main scope. (The various log_xxx() functions append messages to process logs, and the inspect() function produces a human-friendly string representation of the object it is given.)
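For contrast, here is a sketch of the same collection being used by a definition executed via call. Because such a definition only sees what it is given, @servers has to be passed explicitly as an argument. The definition name describe_servers is hypothetical.

```
# Hypothetical definition run via call: it only sees its arguments
# (and global references and variables), so @servers must be passed in.
define describe_servers(@servers) do
  log_info("Working on " + inspect(@servers))
end

@servers = rs_cm.get(href: "/api/servers/123")

call describe_servers(@servers)
```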

Timeouts

The timeout and on_timeout attributes allow setting time limits on the execution of expressions and specifying the behavior when a time limit is reached respectively:

sub timeout: 30m, on_timeout: handle_launch_timeout() do 

  @server = rs_cm.get(href: "/api/servers/1234")

  @instance = @server.launch()

  sleep_until(@instance.state == "operational")

  @server = rs_cm.get(href: "/api/servers/1235")

  @instance = @server.launch()

  sleep_until(@instance.state == "operational")

end 

The block in the snippet above must execute in less than 30 minutes; otherwise its execution is canceled and the handle_launch_timeout definition is executed. Timeout values can be suffixed with d, h, m, or s (day, hour, minute, or second, respectively).

Note: Not every timeout attribute needs an associated on_timeout attribute. Instead, the innermost expression that includes the on_timeout attribute gets triggered when a timeout occurs:

sub on_timeout: outer_handler() do 

  ...

  sub timeout: 10m, on_timeout: inner_handler() do 

    ...

    @instance = @server.launch()

    sleep_until(@instance.state == "operational")

    ...

  end 

  ...

end 

In the snippet above, inner_handler gets executed if the sleep_until function takes more than 10 minutes to return.

Similar to the on_error attribute, the on_timeout attribute can be followed by a definition name or one of the behavior values (skip or retry).

Note: Using the raise behavior in an on_timeout attribute will cause the next on_timeout handler to be executed. Timeouts never cause error handlers to be executed and vice versa.
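Because timeouts and errors are handled separately, a block that needs both kinds of protection can carry both attributes. The following sketch assumes that @server is defined earlier; the definition names on_launch_error and on_launch_timeout are hypothetical. Each handler sets its own behavior variable.

```
# Hypothetical handler, runs only when an error is raised in the block
define on_launch_error() do
  $_error_behavior = "skip"
end

# Hypothetical handler, runs only when the 10 minute limit is reached
define on_launch_timeout() do
  $_timeout_behavior = "skip"
end

sub timeout: 10m, on_timeout: on_launch_timeout(), on_error: on_launch_error() do
  @instance = @server.launch()
  sleep_until(@instance.state == "operational")
end
```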

On top of specifying the behavior directly in the on_timeout attribute as in:

sub timeout: 10m, on_timeout: skip do 

  @instance = @server.launch()

end 

It's also possible for a definition handling the timeout to specify what the behavior should be by setting the $_timeout_behavior local variable:

define handle_timeout() do 

  $_timeout_behavior = "retry" 

end 

Finally, the timeout handler may accept arguments that can be specified with the on_timeout attribute. The values of the references and variables at the point when the timeout occurs are given to the handler:

define handle_timeout($retries) do 

  if $retries < 3

    $_timeout_behavior = "retry" 

  else 

    $_timeout_behavior = "skip" 

  end 

end 

$retries = 0 

sub timeout: 10m, on_timeout: handle_timeout($retries) do 

  $retries = $retries + 1 

  sleep(10 * 60 + 1) # Force the timeout handler to trigger 

end 

The snippet above causes the handle_timeout definition to execute three times. The third time, $retries is equal to 3, so the timeout handler sets $_timeout_behavior to skip and the block is canceled.

Labels

The task_label attribute is used to report progress information to clients. It does not affect the execution of the process; it is simply a way to report what the process is currently doing. The task_label attribute can be used on sub and call:

define main() do 

  sub task_label: "Initialization" do 

    ..

  end 

  sub task_label: "Launching servers" do 

    ..

  end 

  call setup_app() task_label: "Setting up application" 

end 

Logging

Important: The following functions are not yet working. They are here for future planning only.

As shown in earlier snippets, Cloud Workflow Language has built-in support for logging, which helps when developing and troubleshooting cloud workflows. Each process is associated with a unique log that is automatically created on launch. Logging is done using the following functions:

log_title(): To append a section title to the log
log_info(): To append an informational message to the log
log_error(): To append an error message to the log

Logs for a process can be retrieved using the Cloud Workflow API or through the Flexera dashboard by looking at the process audit entries.

Attributes and Error Handling Summary

We have seen how a cloud workflow may use attributes to annotate statements and define additional behaviors. Attributes apply to the statement they adorn, and some also apply to its sub-expressions. Definitions can be written to handle errors, timeouts, and cancelation. Definitions handling errors that occur during resource action execution have access to all the underlying low-level errors and can modify the return value of the action.