Attributes and Error Handling
An attribute has a name and a value. The syntax is: name: value.
The acceptable types for attribute values depend on the attribute, as listed in the following table.
Acceptable Type for Attribute Value | Example |
Number | wait_task: 1 |
String | on_timeout: skip |
Array | wait_task: [ "this_task", "that_task" ] |
Definition name with arguments | on_timeout: handle_timeout() |
Some attributes define behavior that applies to tasks, while others define behavior for the whole process. Processes and tasks are described in detail in Processes. To understand the attribute behavior described below, it is enough to know that a single process consists of one or more tasks and that a task is a single thread of execution.
Some attributes attach behavior to the expression they adorn and all their sub-expressions. The sub-expressions of an expression are expressions that belong to the block defined by the parent expression. Not all expressions define blocks so not all expressions have sub-expressions. Expressions that may have sub-expressions include define, sub, concurrent and the looping expressions (foreach, concurrent foreach, etc.).
All attributes supported by the language are listed below:
Attribute | Applies to | Possible Values | Description |
on_error | define, sub, call, concurrent | name of the definition with arguments, skip or retry | Behavior to trigger when an error occurs in an expression or a sub-expression. For concurrent blocks, the handler is applied to all sub-expressions. |
on_rollback | define, sub, call, concurrent, concurrent map, concurrent foreach | name of the definition with arguments | Name of the definition called when an expression causes a rollback (due to an error or the task being canceled). |
on_timeout | sub, concurrent, call | name of the definition with arguments, skip or retry | Behavior to trigger when a timeout occurs in an expression or a sub-expression. |
task_name | sub | string representing the name of a task | Changes the current task name to the given value. |
task_prefix | concurrent foreach, concurrent map | string representing the prefix of task names | Specifies the prefix of the names of tasks created by a concurrent loop (the suffix is the iteration index). |
timeout | sub, call | string representing a duration | Maximum time allowed for the expressions in a statement and any sub-expression to execute. |
wait_task | concurrent, concurrent foreach, concurrent map | number of tasks to be waited on or name(s) of task(s) to be waited on | Pauses the execution of a task until the condition defined by the value is met. |
task_label | sub, call, define | string representing a label | Labels allow processes to return progress information to clients. They are associated with an arbitrary name that gets returned by the cloud workflow APIs. |
The task_name, task_prefix, and wait_task attributes are described in Processes.
The following subsections detail the other attributes dealing with errors and timeouts.
• Errors and Error Handling
• Resource Action Errors
• Handlers and State
• Timeouts
• Labels
• Logging
• Attributes and Error Handling Summary
Errors and Error Handling
Defining the steps involved in handling error cases is an integral part of all cloud workflows. This is another area where workflows and traditional programs differ: a workflow needs to describe the steps taken when errors occur the same way it describes the steps taken in the normal flow. Error handlers are thus first-class citizens in Cloud Workflow Language and are implemented as definitions themselves. Handling an error could mean alerting someone, cleaning up resources, triggering another workflow, etc. Cloud Workflow Language makes it possible to do any of these things through the on_error attribute. Only the define, sub, and concurrent expressions may be adorned with that attribute.
The associated value can be the name of a definition with arguments (described later in this section) or one of the following strings:
• skip: aborts the execution of the statement and any sub-expression, then proceeds to the next statement. No cancel handler is called in this case.
• retry: retries the execution of the statement and any sub-expression.
To illustrate the behavior associated with the different values, consider the following snippet:
sub on_error: skip do
  raise "uh oh"
end
log_info("I'm here")
The engine generates an exception when the raise expression executes. This exception triggers the on_error attribute of the parent expression. The associated value is skip, which means the engine ignores the error and proceeds to the first expression after the block (the log_info expression). If the value of the on_error attribute had been retry instead, the engine would have re-run the block (which in this case would result in an infinite loop).
As mentioned in the introduction, a cloud workflow may need to define additional steps that need to be executed in case of errors. The on_error attribute allows specifying a definition that gets run when an error occurs. The syntax allows for passing arguments to the definition so that the error handler can be provided with contextual information upon invocation. On top of arguments being passed explicitly, the error handler also has access to all the variables and references that were defined in the scope of the expression that raised the error.
The error handler can stipulate how the caller should behave once it completes by assigning one of the string values listed above (skip or retry) to the special $_error_behavior local variable. If the error definition does not define $_error_behavior, the caller uses the default behavior (raise) after the error definition completes. This default behavior causes the error to be re-raised so that any error handler defined on a parent scope may handle it. If no error handler is defined, or if all error handlers end up re-raising, the task terminates and its final status is failed. The skip behavior does not raise the error and forces the calling block to skip any remaining expressions after the point where the error occurred. The retry behavior runs the entire caller block again.
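As a sketch of how re-raising propagates through nested scopes, the snippet below nests two sub blocks, each with its own handler. The names inner_handler and outer_handler are hypothetical and chosen only for illustration, and the flow described in the comments is our reading of the rules above rather than the output of a verified run.

```rcl
# Hypothetical handler: logs, then leaves $_error_behavior unset,
# so the default behavior (raise) should apply.
define inner_handler() do
  log_error("inner handler ran")
end

# Hypothetical handler: logs, then asks the caller to skip the rest of its block.
define outer_handler() do
  log_error("outer handler ran")
  $_error_behavior = "skip"
end

sub on_error: outer_handler() do
  sub on_error: inner_handler() do
    raise "uh oh"
  end
  log_info("expected to be skipped") # remaining expression of the outer block
end
log_info("execution is expected to resume here")
```

Under these assumptions, the inner handler runs first, the error is re-raised to the outer block, and the outer handler then skips the remainder of that block so the process continues with the final log_info.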
The following example shows how to implement a limited number of retries using an error handler:
define handle_retries($attempts) do
  if $attempts <= 3
    $_error_behavior = "retry"
  else
    $_error_behavior = "skip"
  end
end
$attempts = 0
sub on_error: handle_retries($attempts) do
  $attempts = $attempts + 1
  ... # Statements that will get retried 3 times in case of errors
end
Errors can originate from evaluating expressions (for example, division by 0) or from executing resource actions (for example, trying to launch an already running server). A variation on the former is an error generated intentionally using the raise keyword. In all these cases, the innermost error handler defined using the on_error attribute gets executed.
The raise keyword, optionally followed by a message, causes an error that can be caught by an error handler. All error handlers have access to a special variable, $_error, that contains information about the error being raised. $_error is a hash that contains three keys:
• “type”: A string that describes the error type. All errors raised using the raise keyword have the type set to user.
• “message”: A string that contains information specific to this occurrence of the error. For user errors, the string contains any message given to the raise keyword.
• “origin”: A [ line, column ] array pointing at where the error occurred in the Cloud Workflow Language source.
define handle_error() do
  log_error($_error["type"] + ": " + $_error["message"]) # Will log "user: ouch"
  $_error_behavior = "skip"
end
sub on_error: handle_error() do
  raise "ouch"
end
Resource Action Errors
Resource actions always operate atomically on resource collections. In other words, the expression @servers.launch() is semantically equivalent to making concurrent launch API calls to all resources in the @servers collection. This means that multiple errors may happen concurrently if multiple resources in the collection fail to run the action. When that happens, an error handler needs access to the set of resources that failed, the set that succeeded, and the initial collection in order to take the appropriate actions. The special $_error variable described above is also available to error handlers when the error results from calling an action on a resource collection. Cloud Workflow Language additionally makes the following variables available to the error handler:
• @_original: The resource collection that initially executed the action that failed.
• @_done: A resource collection containing all the resources that successfully executed the action.
• @_partial: A resource collection containing the partial results of the action if the action returns a collection of resources.
• $_partial: An array containing the partial results of the action if the action returns an array of values.
• $_errors: An array of hashes containing specific error information.
The $_errors variable contains an array of hashes. Each element includes the following values:
• resource_href: Href of the underlying resource on which the action failed. Example: /account/71/instances/123
• action: Name of the action that failed. Example: run_executable
• action_arguments: Hash of action arguments as specified in the definition. Example: { recipe_name: sys:timezone }
• request: Hash containing information related to the request, including the following values:
  • url: Full request URL. Example: https://my.rightscale.com/instances/...run_executable
  • verb: HTTP verb used to make the request. Example: POST
  • headers: Hash of HTTP request headers and associated values
  • body: Request body (string)
• response: Hash containing information related to the response, including the following values:
  • code: HTTP response code (string)
  • headers: Hash of HTTP response headers
  • body: Response body (string)
In case of resource action errors the $_error variable is initialized with the type resource_action and includes the detailed error message with the problem, summary, and resolution fields as a string.
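To make the shape of these variables concrete, here is a minimal sketch of a handler that only logs what failed. The definition name log_action_errors is hypothetical, and the nested hash lookup on "response" assumes the fields are laid out exactly as listed above.

```rcl
# Hypothetical handler: logs the summary error, then one line per failed resource.
define log_action_errors() do
  log_error($_error["type"] + ": " + $_error["message"])
  foreach $error in $_errors do
    # "code" is documented as a string, so plain concatenation is assumed to work
    log_error($error["action"] + " failed on " + $error["resource_href"] + " with HTTP " + $error["response"]["code"])
  end
  # $_error_behavior is left unset, so the error is re-raised after logging
end

sub on_error: log_action_errors() do
  @instances.terminate()
end
```

Because the handler does not set $_error_behavior, it only observes the failure; a handler defined on a parent scope, if any, still gets a chance to handle the error.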
Given the above, the following definition implements a retry:
define handle_terminate_error() do
  foreach $error in $_errors do
    @instance = rs_cm.get($error["resource_href"]) # Retrieve the instance that failed to terminate
    if @instance.state != "stopped" # Make sure it is still running
      log_error("Instance " + @instance.name + " failed to terminate, retrying...")
      sub on_error: skip do
        @instance.terminate() # If so try again to terminate but this time ignore any error
      end
    end
  end
  $_error_behavior = "skip" # Proceed with the next statement in caller
end
sub on_error: handle_terminate_error() do
  @instances.terminate()
end
In the definition above, the error handler sets the special $_error_behavior local variable to skip, which means that the process will not raise the error and will instead skip the rest of the block where the error occurred. Note how the handler itself uses on_error to catch errors and ignore them (using skip).
Actions may return nothing, a collection of resources, or an array of values. When an action has a return value (collection or array), the error handler needs to be able to modify that value before it is returned to the calling block. For example, an error handler may retry certain actions and, as a result, may need to add to the returned value, which would initially only contain values for the resources that ran the action successfully. An error handler can achieve this by reading the @_partial collection or the $_partial array, handling the error cases, and returning the complete results as the return value of the error handler definition.
To take a concrete example let's consider the Flexera servers resource launch() action. This action returns a collection of launched instances. The following handler retries any failure to launch and joins the @_partial collection with instances that successfully launched on retry:
define retry_launch() return @instances do
  @instances = @_partial
  foreach $error in $_errors do
    @server = rs_cm.get($error["resource_href"]) # Retrieve the server that failed to launch
    if @server.state == "stopped" # Make sure it is still stopped
      log_error("Server " + @server.name + " failed to launch, retrying...")
      sub on_error: skip do
        @instance = @server.launch() # If so try again to launch but this time ignore any error
      end
      @instances = @instances + @instance # @instance may be empty in case the launch failed again
    end
  end
  $_error_behavior = "skip" # Don't raise the error -- skip the rest of the caller block
end
sub on_error: retry_launch() retrieve @instances do
  @instances = @servers.launch()
end
The definition above retries each failed launch and adds any instance that launches successfully on retry to the @instances collection, which the calling block then retrieves.
Handling Errors Returned by create_copies
create_copies makes it possible to create multiple resources with one expression. Like any other action, create_copies executes atomically. That is, it attempts to create all resources concurrently and thus may have to report multiple errors.
When a call to create_copies results in errors the value of the "type" field of $_error is set to create_copies_action. In this case each element in the $_errors variable also contains a "copy_index" field which corresponds to the index of the copy whose creation resulted in the error.
create_copies conveniently accepts a set of indices which represents the indices to use when evaluating copy_index in the field values. This makes it simple to retry only the create calls that failed. Here is an example of an implementation of a retry algorithm for create_copies:
# bulk_create creates n deployments.
# $fields contain the static fields while $copy_fields contains the fields that
# make use of copy_index()
define bulk_create($n, $fields, $copy_fields) return @deployments do
  $attempts = 0
  $create_indices = $n
  @created = rs_cm.deployments.empty()
  sub on_error: compute_indices($attempts, @created) retrieve @created, $create_indices do
    $attempts = $attempts + 1
    @deployments = rs_cm.deployments.create_copies($create_indices, $fields, $copy_fields)
  end
end
# compute_indices looks at $_errors and returns the indices that must be retried.
define compute_indices($attempts, @created) return @created, $failed_indices do
  @created = @created + @_partial
  if $_error["type"] == "create_copies_action" && $attempts <= 3
    $failed_indices = []
    foreach $error in $_errors do
      $failed_indices << $error['copy_index']
    end
    $_error_behavior = "retry"
  else
    $_error_behavior = "raise" # Give up: re-raise the error to any parent handler
    delete(@created)
  end
end
Handlers and State
We've seen before that definitions executed via call only have access to the references and variables passed as arguments (and global references and variables). Definitions executed through handlers, on the other hand, inherit all the local variables and references defined at the time the handler is invoked (that is, at the time an exception is thrown, a timeout occurs, or a cancelation is triggered).
define handle_errors() do
  log_error("Process failed while handling " + inspect(@servers)) # Note: handler has access to @servers
  $_error_behavior = "skip"
end
@servers = rs_cm.get(href: "/api/servers/123")
sub on_error: handle_errors() do
  @servers.launch()
end
In the snippet above, the error handler has access to @servers even though that collection is defined in the main scope. The various log_xxx() functions append messages to process logs, and the inspect() function produces a human-friendly string representation of the object it is given.
Timeouts
The timeout and on_timeout attributes allow setting time limits on the execution of expressions and specifying the behavior when a time limit is reached respectively:
sub timeout: 30m, on_timeout: handle_launch_timeout() do
  @server = rs_cm.get(href: "/api/servers/1234")
  @instance = @server.launch()
  sleep_until(@instance.state == "operational")
  @server = rs_cm.get(href: "/api/servers/1235")
  @instance = @server.launch()
  sleep_until(@instance.state == "operational")
end
The block in the snippet above must execute in less than 30 minutes; otherwise, its execution is canceled and the handle_launch_timeout definition is executed. Timeout values can be suffixed with d, h, m, or s (respectively day, hour, minute, or second).
Note: There does not need to be an on_timeout associated with every timeout attribute. Instead, the innermost expression that includes the on_timeout attribute gets triggered when a timeout occurs:
sub on_timeout: outer_handler() do
  ...
  sub timeout: 10m, on_timeout: inner_handler() do
    ...
    @instance = @server.launch()
    sleep_until(@instance.state == "operational")
    ...
  end
  ...
end
In the snippet above, inner_handler gets executed if the sleep_until function takes more than 10 minutes to return.
Similar to the on_error attribute, the on_timeout attribute can be followed by a definition name or one of the behavior values (skip or retry).
Note: Using the raise behavior in an on_timeout attribute causes the next on_timeout handler to be executed. Timeouts never cause error handlers to be executed, and vice versa.
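The earlier note that a timeout attribute does not need its own on_timeout can be sketched as follows. Here the inner sub sets only a time limit, so the handler on the enclosing sub is the innermost one available. The name outer_handler is a hypothetical definition, and @server is assumed to have been assigned earlier in the process.

```rcl
# Hypothetical handler attached to the enclosing block.
define outer_handler() do
  log_error("launch did not complete within 10 minutes")
end

sub on_timeout: outer_handler() do
  # No on_timeout here: only the time limit is set on the inner block.
  sub timeout: 10m do
    @instance = @server.launch()
    sleep_until(@instance.state == "operational")
  end
end
```

If the inner block runs longer than 10 minutes, outer_handler is the handler expected to be triggered, since it is the innermost on_timeout that encloses the expression that timed out.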
On top of specifying the behavior directly in the on_timeout attribute as in:
sub timeout: 10m, on_timeout: skip do
  @instance = @server.launch()
end
It's also possible for a definition handling the timeout to specify what the behavior should be by setting the $_timeout_behavior local variable:
define handle_timeout() do
  $_timeout_behavior = "retry"
end
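Since timeouts and errors are handled by separate attributes, one block can carry both. The following sketch combines them on a single sub, using the comma-separated attribute syntax shown earlier. The handler names are hypothetical, and @server is assumed to be defined before the block runs.

```rcl
# Hypothetical handler: runs only when the block raises an error.
define handle_launch_error() do
  log_error("launch raised an error")
  $_error_behavior = "skip"
end

# Hypothetical handler: runs only when the block exceeds its time limit.
define handle_launch_timeout() do
  log_error("launch timed out")
  $_timeout_behavior = "skip"
end

sub timeout: 10m, on_timeout: handle_launch_timeout(), on_error: handle_launch_error() do
  @instance = @server.launch()
  sleep_until(@instance.state == "operational")
end
```

Keeping the two handlers separate mirrors the rule that a timeout never triggers an error handler and an error never triggers a timeout handler, so each definition sets only its own behavior variable.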
Finally, the timeout handler may accept arguments that can be specified with the on_timeout attribute. The values of the references and variables at the point when the timeout occurs are given to the handler:
define handle_timeout($retries) do
  if $retries < 3
    $_timeout_behavior = "retry"
  else
    $_timeout_behavior = "skip"
  end
end
$retries = 0
sub timeout: 10m, on_timeout: handle_timeout($retries) do
  $retries = $retries + 1
  sleep(10 * 60 + 1) # Force the timeout handler to trigger
end
The snippet above causes the handle_timeout definition to execute three times. The third time, $retries is equal to 3, so the timeout handler sets $_timeout_behavior to skip and the block is canceled.
Labels
The task_label attribute is used to report progress information to clients. It does not affect the execution of the process and is simply a way to report what the process is currently doing. The task_label attribute can be used on sub and call:
define main() do
  sub task_label: "Initialization" do
    ...
  end
  sub task_label: "Launching servers" do
    ...
  end
  call setup_app() task_label: "Setting up application"
end
Logging
Important: The following functions are not yet working. They are here for future planning only.
As shown in the snippets above, RCL has built-in support for logging, which helps with developing and troubleshooting cloud workflows. Each process is associated with a unique log that is automatically created on launch. Logging is done using the following functions:
• log_title(): Appends a section title to the log
• log_info(): Appends an informational message to the log
• log_error(): Appends an error message to the log
Logs for a process can be retrieved using the Cloud Workflow API or through the Flexera dashboard by looking at the process audit entries.
Attributes and Error Handling Summary
We have seen how a cloud workflow may use attributes to annotate statements and define additional behaviors. Attributes apply to the statement they adorn, and some also apply to its sub-expressions. Definitions can be written to handle errors, timeouts, and cancelation. Definitions handling errors that occur during resource action execution have access to all the underlying low-level errors and can modify the return value of the action.