User Guide
This guide provides information on the Padas User Interface menus and configuration items.
Streaming Configurations
The following sections describe how to configure Padas to run streaming tasks that transform events and/or apply a set of filtering rules to generate alerts. Please refer to the Introduction before moving forward in order to understand the engine's processing concepts.
All of the configuration views (Topologies, Pipelines, Tasks, Rules) provide the ability to bulk upload or download configurations in JSON format.
Topologies
A topology is simply a group of one or more ordered pipelines that reads from a single input topic and writes to one or more output topics. Both the input and output topic(s) are mandatory for any topology the Padas engine runs. Within a topology, the output of one pipeline becomes the input of the next pipeline definition.
It's possible to define any number of topologies per Padas Engine, where each topology starts a separate processing task running in one or more threads. For a more detailed architectural description of the Kafka Streams processor topology, please refer to the Confluent Documentation.
Description of fields can be found below.
Field | Description |
---|---|
ID | Unique identifier. This ID is also used as a key when updating/deleting the entry. |
Name | A descriptive name. |
Description | Detailed description of topology functionality. |
Group | Consumer group associated with this topology. |
Input | Input topic to consume and apply configured pipeline(s). |
Output | One or more output topics to send transformed data to. |
Enabled | Set to true when enabled, false otherwise. |
Pipelines | An ordered list of Pipelines to execute when streaming data from specified Input. When multiple Pipelines are specified, output from a Pipeline becomes an input for the next Pipeline. |
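For reference, a minimal topology entry might look like the following when downloaded in JSON format. The key names, casing, and value shapes shown here are assumptions based on the field table above, not a confirmed schema:

```json
{
  "id": "topology-firewall",
  "name": "Firewall Events",
  "description": "Parses firewall logs and applies detection rules.",
  "group": "default",
  "input": "firewall_raw",
  "output": ["firewall_alerts"],
  "enabled": true,
  "pipelines": ["pipeline-parse", "pipeline-detect"]
}
```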
Pipelines
A pipeline consists of one or more ordered tasks, where the output of one task becomes the input of the next task definition. A pipeline is a logical grouping of tasks toward a specific goal. In terms of processing, a single pipeline with 12 different tasks is equivalent to 3 consecutive pipelines with 4 different tasks each.
Description of fields can be found below.
Field | Description |
---|---|
ID | Unique identifier. This ID is also used as a key when updating/deleting the entry. |
Name | A descriptive name. |
Description | Detailed description of the pipeline functionality. |
Tasks | An ordered list of Tasks to execute. When multiple Tasks are specified, output from a Task becomes an input for the next Task. |
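A hypothetical pipeline entry in JSON might look like the following (key names and casing are assumptions based on the field table above; the task IDs are illustrative):

```json
{
  "id": "pipeline-parse",
  "name": "Parse Firewall Logs",
  "description": "Extracts fields and normalizes timestamps.",
  "tasks": ["task-extract-fields", "task-timestamp"]
}
```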
Tasks
A task is a single unit of work performed on event data. Each task uses one of the following built-in functions to process an event:

FILTER
: Filter an event (keep or drop) based on a PDL or regex definition. For PDL, the input must be JSON.

EXTRACT
: Extract fields from any event input with the provided regular expression definition (named groups). The output is JSON.

PARSE_CSV
: Parse an input CSV event into JSON.

PARSE_KV
: Parse an input key-value pairs event into JSON.

TIMESTAMP
: Define a field from within the event data (JSON formatted) to use as the timestamp.

EVAL
: Add, remove, or rename fields within JSON data. Both input and output are JSON.

APPLY_RULES
: Apply predefined rules that are tagged with specific data models to events. It's possible to provide a PDL condition for events to match certain data models.
Description of fields can be found below.
Field | Description |
---|---|
ID | Unique identifier. This ID is also used as a key when updating/deleting the entry. |
Name | A descriptive name. |
Description | Detailed description of the task functionality. |
Function | One of the built-in functions for this task. |
Definition | Function definition. Each function has different definition parameters. Please see below for details. |
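As an illustration, a complete task entry might look like the following in JSON. The key names, casing, and definition schema are assumptions based on the tables in this section, not a confirmed format:

```json
{
  "id": "task-filter-errors",
  "name": "Filter Errors",
  "description": "Keeps only events that match the error pattern.",
  "function": "FILTER",
  "definition": {
    "type": "regex",
    "value": "ERROR|FATAL",
    "action": "keep"
  }
}
```

Each function's definition parameters are described in the sections below.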
FILTER Definition
This function allows filtering (keeping or dropping) an event if it matches the specified regular expression (`regex`) or PDL query (`pdl`). The event itself is not transformed.
Field | Description |
---|---|
Type | Must be `pdl` or `regex`. Defines the type of filtering. |
Value | Depending on the Type, this must be a PDL query or a regular expression to match the event. |
Action | The action applied to the event when the query/regex matches. Must be `keep` or `drop`. |
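For example, a definition that drops verbose log events might look like the following (JSON key names are assumed from the field table above, not a confirmed schema):

```json
{
  "type": "regex",
  "value": "DEBUG|TRACE",
  "action": "drop"
}
```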
EXTRACT Definition
This function allows the use of named capturing groups in a regular expression to extract fields from event data. The output is a JSON formatted event with the named groups as fields.
Field | Description |
---|---|
Regex | Regular expression with named capturing groups to match the event. Captured named groups become JSON field names. |
Keep Raw | Boolean to keep the raw data in a separate field. If set to `true`, a field name should be provided. |
Raw Field Name | If the raw data is to be kept, this is the field to store it in. Default is `_raw`. |
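As a sketch (JSON key names assumed from the field table above), the following definition would turn an event such as `10.0.0.1 GET /index.html` into JSON fields `src`, `method`, and `uri`, keeping the original event in `_raw`:

```json
{
  "regex": "^(?<src>\\S+)\\s+(?<method>\\S+)\\s+(?<uri>\\S+)",
  "keepRaw": true,
  "rawFieldName": "_raw"
}
```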
PARSE_CSV Definition
This function allows parsing CSV formatted data with any delimiter. The output is a JSON formatted event with the specified field names.
Field | Description |
---|---|
Field Names | Comma-separated list of field names for the CSV data. |
Delimiter | Field separator for the CSV items. Default is comma (`,`). |
Keep Raw | Boolean to keep the raw data in a separate field. If set to `true`, a field name should be provided. |
Raw Field Name | If the raw data is to be kept, this is the field to store it in. Default is `_raw`. |
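As a sketch (JSON key names assumed), this definition would parse a pipe-delimited event such as `2023-01-01T00:00:00.000+0000|host1|allow` into `time`, `host`, and `action` fields:

```json
{
  "fieldNames": "time,host,action",
  "delimiter": "|",
  "keepRaw": false
}
```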
PARSE_KV Definition
This function allows parsing key-value pairs within event data with any delimiter. The left side of the delimiter becomes the field name and the right side becomes the field value. The output is a JSON formatted event.
Field | Description |
---|---|
Delimiter | Separator between a key and its value. Default is `=`. |
Keep Raw | Boolean to keep the raw data in a separate field. If set to `true`, a field name should be provided. |
Raw Field Name | If the raw data is to be kept, this is the field to store it in. Default is `_raw`. |
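As a sketch (JSON key names assumed), the following would parse an event like `user=alice action=login status=success` into corresponding JSON fields while preserving the original event:

```json
{
  "delimiter": "=",
  "keepRaw": true,
  "rawFieldName": "_raw"
}
```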
TIMESTAMP Definition
This function extracts the event timestamp from the given field with the provided format. The output is the time in milliseconds, stored in a new field (if specified). The timestamp information is utilized by stream processing.
Field | Description |
---|---|
Field | JSON data field name that holds the timestamp value to be parsed. |
Format | Pattern used to parse the timestamp from the field data, based on Java SE Patterns for Formatting and Parsing. Default pattern is `yyyy-MM-dd'T'HH:mm:ss.SSSZ`. |
Add New Field | Boolean to add a new field for the extracted timestamp, represented in milliseconds. Default is `true`. |
Time Field Name | Field name to add if the above is set to `true`. Default is `_time`. |
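As a sketch (JSON key names assumed), this definition parses a `timestamp` field in the default format and stores the result in milliseconds in `_time`:

```json
{
  "field": "timestamp",
  "format": "yyyy-MM-dd'T'HH:mm:ss.SSSZ",
  "addNewField": true,
  "timeFieldName": "_time"
}
```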
EVAL Definition
This function allows data enrichment via several small transformation actions. The input must be JSON, since fields and conditions require this in order to process event data.
Field | Description |
---|---|
Condition | Matching PDL query that must be satisfied for the specified EVAL action to execute. An empty or null query matches all events. |
Action | Must be one of `add`, `alias`, `regex`, `remove`, `rename`. |
Field | Field name on which to apply the action. `add`: new field name. `alias`: existing field name to create an alias for. `regex`: existing field to apply the regular expression to. `remove`: existing field to remove. `rename`: existing field to rename. |
Value | The meaning of the value depends on the action. `add`: new field value. `alias`: new alias field name. `regex`: named-capturing regular expression whose matched groups are added as fields. `remove`: N/A. `rename`: new field name. |
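As a sketch (JSON key names assumed), the following renames an existing `src` field to `src_ip` on every event, since the empty condition matches all events:

```json
{
  "condition": "",
  "action": "rename",
  "field": "src",
  "value": "src_ip"
}
```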
APPLY_RULES Definition
This function applies predefined rules (PDL queries) in order to generate alerts for matching events. The output is enriched with a `padas_rules` object array that contains the matching rule information as well as the event data.
Field | Description |
---|---|
Condition | Matching PDL query to associate the event with the data model, so that matching rules can be evaluated. |
Data model | Rules with this matching data model will be evaluated against the event. |
Match All | If set to `true`, all rules for this data model are evaluated. If set to `false`, the first match wins and evaluation stops. |
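As a sketch (JSON key names assumed), the following would evaluate every event against all rules tagged with the `authentication` data model and record all matches:

```json
{
  "condition": "",
  "datamodel": "authentication",
  "matchAll": true
}
```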
Rules
A rule is a PDL query that matches an event. The goal is to associate a rule with a specific data model and assign one or more annotations (e.g. MITRE ATT&CK technique IDs) for further processing by other analytics systems.
Description of fields can be found below.
Field | Description |
---|---|
ID | Unique identifier. This ID is also used as a key when updating/deleting the entry. |
Name | A descriptive name. |
Description | Detailed description of the rule functionality. |
Data model | Data model name this rule applies to. This is specified in a Task with the APPLY_RULES function. Any arbitrary data model can be specified; please refer to Datamodel Reference for more generic types and conventions. |
PDL | PDL query to match the streaming data. Please refer to PDL Reference for details. Sample rules can also be found in PadasRules_sample.json. |
Annotations | List of applicable annotations for this rule. For example, a common usage would be adding MITRE ATT&CK Technique IDs. |
Enabled | Set to true when enabled, false otherwise. |
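For reference, a rule entry might look like the following in JSON. The key names are assumptions based on the field table above, and the PDL expression is purely illustrative; please refer to PDL Reference and PadasRules_sample.json for the actual syntax:

```json
{
  "id": "rule-failed-login",
  "name": "Failed Login Attempts",
  "description": "Flags authentication failure events.",
  "datamodel": "authentication",
  "pdl": "action=\"failure\"",
  "annotations": ["T1110"],
  "enabled": true
}
```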
Test
The Test view provides a simple interface to try out sample data and verify configurations.
Description of fields and state details can be found below.
Field | Description |
---|---|
Test Function | Functionality to test, can be a Task, Pipeline or Rule. |
Tasks/Pipelines/Rules | Based on the above choice, this selection lists available/configured options. |
Event Data | Copy/paste your sample event data here. |
Result | Output will be displayed under this section. |
Example Test View:
Management Configurations
Users
As an administrator user, you can view and edit all configuration items, including user account settings. Currently there are 2 roles available for a user: `admin` and `user`, where `user` has read-only access to configurations. The "Users" view can be accessed via the "Settings --> Users" menu.
Nodes
The Node Information table provides details on registered Padas engine instances.
Description of fields and state details can be found below.
Field | Description | Example |
---|---|---|
UUID | Unique identifier for this instance. | 26ee88e3-a753-4c8b-9adf-e0432abbbded |
Host | Hostname of the instance. | padas.local |
REST | REST API endpoint that the UI connects to. | https://padas.local:8999 |
Group | Consumer group associated with this instance. | default |
State | Current state of this streaming application. See below table for details. | RUNNING |
State Details
Padas Engine is built as a Kafka Streams application, and its state information is inherited from KafkaStreams.State; the following is excerpted from that documentation. A Padas Engine instance can only be in one state at a time. The expected state transition is defined as:
NOTE: In order to reach a `RUNNING` state, you need at least 1 enabled Topology configuration that is assigned to the same group as the Padas Engine.
Topics
The Topics view displays information on the Padas topics required for storing configuration items. Details can be found in the Topic Properties section of the Admin Guide.