StackStorm is:
a platform for integration and automation across services and tools. It ties together your existing infrastructure and application environment so you can more easily automate that environment. It has a particular focus on taking actions in response to events.
This sounds a lot like what Amygdala does, hence the following story:
WS-1087
This page documents the planning and analysis of StackStorm for this story.
Findings
These are just my feelings after doing the prototyping work in this sprint.
Surprises
The polling sensor code is run on the polling interval, whether or not there are triggers configured.
There are many different kinds of state in a sensor: sensor configuration, trigger configuration, persistent state, and object state.
Pros
StackStorm was very easy to get running using Docker. It does more-or-less what you expect it to do, in more-or-less the way you expect it to.
The sensors and actions are run in separate processes. This protects the system from misbehaving code in the packs. It shouldn't be possible to induce a complete failure of the StackStorm system by putting broken code into it.
While Sensors need to be written in Python to inject events into the system, Actions can be implemented in a variety of ways, including as HTTP endpoints or bash scripts. This adds a lot of flexibility to the integration platform.
StackStorm offers a comprehensive UI. While not the most polished thing in the world, it's very good.
Cons
First, I found that developing a sensor was not straightforward. Each time I changed the sensor code, I had to reinstall the pack. There may be a more efficient way to do this but it wasn't made clear in the documentation I saw.
Second, there are a lot of ways to fail at making a sensor. It wasn't obvious how to debug it. I wound up tailing the logs. Several times I got into a state where the web service backing the UI was returning a 500, such as when a sensor changed its API significantly. These were resolved by using the command line tool st2.
Third, the developer documentation—while extensive—still failed to include Python API documentation. I wound up reading the source code to existing sensors to discover what the right way to use the APIs was.
StackStorm terminology
Amygdala is a little shy on terminology. It essentially has one major interface, Loader, whose role is basically to sit on a certain channel of an AMQP connection and do something with the events that come in.
StackStorm, in contrast, has a fairly rich suite of interfaces:
- Actions, which represent behaviors associated with rules; these may be integrations, generic action code, or custom code and behavior
- Rules, which execute actions when conditions are matched against events
- Sensors, which detect or receive events from external systems and act as sources, bringing them into StackStorm
- Triggers, which represent events in the StackStorm system
More information about this can be found in the StackStorm "About" documentation.
Alternatives
Amygdala Migration
The purpose of this section is to draft out what Amygdala processes exist, and how to convert them to StackStorm processes.
Monitor and Control Systems
The package edu.nrao.archive.amygdala.bridges contains several classes which together collaborate to handle new science data models (SDMs) and binary data files (BDFs) as they come off the array. The earliest moment our software could possibly be apprised of these files is when the metadata capture and formatting (MCAF) process finishes producing the SDM, or when the correlator backend (CBE) finishes producing the BDF. Amygdala was architected around this idea: the class MCMulticastBridge connects to MCAF, registers a web hook with itself, and begins processing Monitor & Control UDP packets. It level-changes these into Java EE events and AMQP events.
From here, the BdfIngestionBatcher runs on an adjustable (currently 5 minute) timer, batch-ingesting however many BDFs have arrived since the last run. SdmIngestionTrigger simply initiates an SDM ingestion whenever a complete SDM is generated; this occurs just once per observation, in contrast to the BDFs, which are generated continuously during ingestion.
This architecture is now considered flawed. It is prone to network filesystem synchronization issues: just because the CBE is done writing a file does not mean that the file is yet visible somewhere else in the network. Also, we have seen that the webhook for MCAF needs to be reissued whenever MCAF is restarted. But since we aren't apprised of when exactly that happens, we wind up resetting the webhook on a periodic timer.
Overall, this whole approach is brittle and offers no specific benefit over the older technique pioneered by John Benson: monitoring the filesystem periodically.
StackStorm version
This will wind up being a pair of very short workflows that look something like this:
- Filesystem monitor for BDF files
- Run Workspaces workflow/capability of ingestion
StackStorm: Necessary Functionality
Monitor and Control Systems
Filesystem Monitoring
Per the discussion above, the correct approach here is a filesystem monitor: at a configurable interval, emit an event if new files are present in the directory.
There are several ways to do this:
- Emit a trigger about new files arriving in the directory
- Emit a trigger when there are new files, but include all the files in the directory
- Emit a trigger whenever there are any files in the directory, including or not including all the files
Option 1 seems like the most useful, but requires the most state from the trigger. It also opens the question of whether StackStorm is an "at least once" or an "at most once" system.
Option 2 requires less state and seems like it would be more robust if StackStorm turns out to be an "at most once" system.
Option 3 seems like it would be the most irritating and least helpful, except for the fact that it would be totally sufficient for this problem and requires no particular state.
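The core of any of these options is just a set difference between the current directory listing and the filenames seen on the previous poll. The sketch below illustrates option 2 (emit when new files appear, but include all files); the function names and payload shape are my own inventions, and in a real StackStorm sensor this logic would live in a PollingSensor subclass's poll() method rather than free functions:

```python
import os


def detect_new_files(current, seen):
    """Return the set of filenames present now that were not seen before."""
    return set(current) - set(seen)


def poll_directory(path, seen):
    """One polling pass over `path` (option 2: report new files, include all).

    Returns a payload-like dict and the updated `seen` set. In a real
    sensor, the payload would be handed to sensor_service.dispatch().
    """
    current = set(os.listdir(path))
    new = detect_new_files(current, seen)
    payload = {
        "directory": path,
        "new_files": sorted(new),
        "all_files": sorted(current),
    }
    return payload, current
```

Option 3 would drop the `seen` argument entirely (no state), and option 1 would put only `new_files` in the payload, which is why it demands the most state and raises the delivery-semantics question.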
Queue Runner/Loaders
These come in two flavors: ALMA and VLBA. They share a similar architecture, which is very interesting and cool (I wrote the first one) but mainly exists to cap the number of concurrent executions, which is intrinsic functionality of Workspaces (capability queue limits). So it should not be necessary to bring much of that complexity along for the migration.
ALMA
Despite the name, AlmaReingestionQueueLoader actually accounts for three things:
- Reingestion of raw ASDMs from the ALMA instrument
- Reingestion of calibrations from ALMA
- Marking ASDMs as not calibratable
What is needed to implement this is:
- Source of ALMA ASDMs to load
- Source of ALMA calibrations to load
- Action of marking an ASDM as not calibratable
VLBA
The VLBA system is simpler: it only handles the appearance of VLBA data to ingest. There is no automatic calibration of VLBA data. What is needed to implement this is therefore only a source of VLBA data to load.
StackStorm Development Notes
There is not much developer API documentation inside the Python code for StackStorm. A certain amount of reverse engineering was required to create the new file detector. Here are some notes about what I discovered while doing this implementation.
Sensors
Your sensor will be instantiated exactly once.
A lot of weird stuff about the API is somewhat explained by the fact that your sensor can define many trigger types. Each trigger type has a parameter type and a payload type. These are expressed using JSON Schema. So the Python code winds up accepting and producing lots of dictionaries representing JSON.
Triggers get manufactured by StackStorm when the user configures a rule to depend on a trigger type coming out of your sensor, but they aren't instances of a real class, just a dictionary. When this happens, StackStorm will call add_trigger on your sensor instance with a trigger dictionary. The most interesting slot of this to you will be the trigger's parameters, which will have whatever parameters you defined in the parameter schema in the YAML file. So for the directory watcher, this is the directory to check. It could be other things too, like the glob string to use.
It's important to hold onto this configuration somewhere, because StackStorm will then issue update and delete calls with the same dictionary, and you need the dictionary to call sensor_service.dispatch with. It's these calls that create events in the StackStorm system which the action in a rule will catch.
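The bookkeeping this implies is small but easy to get wrong, so here it is as a self-contained sketch. The method names add_trigger/update_trigger/remove_trigger are the ones StackStorm calls on a sensor, as described above; the exact dictionary keys ("ref", "parameters", "directory") are assumptions based on the directory-watcher prototype:

```python
class TriggerRegistry:
    """Minimal bookkeeping for trigger dictionaries handed to a sensor.

    StackStorm calls add/update/remove with the same dictionary shape;
    the sensor must remember each one so it can later dispatch events
    for it during polling.
    """

    def __init__(self):
        self._triggers = {}  # trigger ref -> trigger dict

    def add_trigger(self, trigger):
        self._triggers[trigger["ref"]] = trigger

    def update_trigger(self, trigger):
        # Same shape as add; the new dict simply replaces the old one.
        self._triggers[trigger["ref"]] = trigger

    def remove_trigger(self, trigger):
        self._triggers.pop(trigger["ref"], None)

    def directories_to_watch(self):
        # For the directory watcher, each trigger's parameters carry the
        # directory to check (hypothetical parameter name).
        return [t["parameters"]["directory"] for t in self._triggers.values()]
```

On each poll, the sensor would walk directories_to_watch() and call sensor_service.dispatch for any trigger whose directory has new files.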
State
In terms of saving state, using your sensor instance isn't super safe, because the service could go down and come back. Helpfully, StackStorm provides a key-value store that is keyed to your instance. Unhelpfully, it's a key-value store, so you'll have to invent keys for whatever properties you want to hold on to for longer than the life of a single instance.
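A sketch of persisting the seen-file set through that key-value store follows; a plain dict stands in for the real sensor_service so the example is self-contained. The get_value/set_value method names match the st2 sensor service API, while the key name and the JSON encoding are invented here:

```python
import json


class FakeKVStore:
    """Stand-in for StackStorm's sensor_service key-value API."""

    def __init__(self):
        self._data = {}

    def set_value(self, name, value):
        self._data[name] = value  # the real store holds strings

    def get_value(self, name):
        return self._data.get(name)


SEEN_KEY = "seen_files"  # invented key name


def save_seen(kv, seen):
    """Persist the set of seen filenames as a JSON string."""
    kv.set_value(SEEN_KEY, json.dumps(sorted(seen)))


def load_seen(kv):
    """Restore the seen set, tolerating a fresh (empty) datastore."""
    raw = kv.get_value(SEEN_KEY)
    return set(json.loads(raw)) if raw else set()
```

Because the store only holds flat values, any structured state (a set, a per-trigger mapping) has to be serialized like this, which is the key-inventing chore described above.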
Development Cycle
The easiest thing to do, following the advice of StackStorm Docker, is this:
- Clone the docker repo
- Run docker compose up -d
- Develop your pack in the packs.dev directory inside the cloned Docker repo
- Run docker compose exec st2client bash and then change to the packs.dev directory
- Run st2 pack install file://$PWD/<your-pack>
- Navigate to the UI at http://localhost and create a rule using your actions and sensors
Whenever you modify the pack source code, you'll need to rerun st2 pack install file://$PWD/<your-pack>. In another console, follow the logs with docker compose logs -f.
This is fairly onerous. I am not entirely sure what can be done about it at this time.
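One possible (untested) mitigation is a small polling script that reruns the install command whenever anything in the pack directory changes, so the reinstall at least happens without manual intervention. The polling approach, interval, and paths here are all my own assumptions:

```python
import os
import subprocess
import time


def latest_mtime(root):
    """Most recent modification time of any file under `root`."""
    newest = 0.0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            newest = max(newest, os.path.getmtime(os.path.join(dirpath, name)))
    return newest


def watch_and_reinstall(pack_dir, interval=2.0):
    """Rerun `st2 pack install` whenever a file in pack_dir changes.

    Meant to run inside the st2client container, from the packs.dev
    directory, mirroring the manual cycle described above.
    """
    last = latest_mtime(pack_dir)
    while True:
        time.sleep(interval)
        now = latest_mtime(pack_dir)
        if now > last:
            last = now
            subprocess.run(
                ["st2", "pack", "install", "file://" + os.path.abspath(pack_dir)]
            )
```

This trades a manual step for a polling loop; it does not address the underlying cost of reinstalling the pack on every change.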
StackStorm Checklist
- Sensor: Filesystem monitor
- Action: Workspaces Capability create-and-run
- Action: Workspaces Workflow create-and-run
- Sensor: ALMA ASDMs to load
- Sensor: ALMA calibrations to load
- Sensor: VLBA data to load
- Action: Mark ASDM as not calibratable