CrowdStrike Parsing Standard (CPS)
The standard for our data format as parsed in Next-Gen SIEM.
The standard is based on Elastic Common Schema (ECS), with all deviations and clarifications noted below.
Changelog
- 1.0.0
- The Parsing Standard was previously embedded in the old Package Standards document. That document still exists to document our approach to packages as a whole, but the parsing standard has been extracted so it can be referenced outside of packages. Going forward, the PaSta acronym refers to the parsing standard only.
- Compared to the previous standard from the Package Standards document, the Parsing Standard is changed in the following ways:
- Adds new fields to tag
- Removes the `Product` field, replaced by guidelines for `event.module` and `event.dataset`
- Removes the `event.code` field (to be reinstated later)
- Removes the `related` fields
- Normalises values for a range of new fields
Version 1.0.0
We use the latest 8.x version of ECS (which is the current major version at the time of writing). We are free to upgrade to minor and patch revisions without updating this standard, but going to a new major version requires a new revision of the standard.
Rules
- The following fields shall be tagged:
  - `Cps.version`
  - `Vendor`
  - `ecs.version`
  - `event.dataset`
  - `event.kind`
  - `event.module`
  - `event.outcome`
  - `observer.type`
- The following fields shall always be populated for all events, unless otherwise noted:
  - Event categorization fields (kind, type, category, outcome)
    - `event.outcome` shall only be assigned when an event can logically contain an outcome.
    - `event.type` and `event.category` shall be assigned as LogScale arrays, and they are permitted to be empty.
`ecs.version`
- This field shall contain the version of ECS that is being followed by the parser.

`Cps.version`
- This field shall contain a MAJOR.MINOR.PATCH version number à la Semantic Versioning.
- The version denotes the version of this standard which the parser targeted during ingest.
`Parser.version`
- This field shall contain a MAJOR.MINOR.PATCH version number à la Semantic Versioning.
- This version number is specific to the parser which parsed the event, and is not related to e.g. the version of the package the parser may have been installed from.
- The rules for updating the version number are:
- Any change to an existing field (large or small) is a breaking change, and requires a new major version.
- If new fields are added, then a new minor version is usually sufficient.
- Patch versions are for parser changes that do not affect which fields are output by it (performance optimization, etc.)
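The bump rules above can be sketched as a small helper. This is purely illustrative: the function name and the change-type labels are assumptions for the example, not part of the standard.

```python
def next_parser_version(version: str, change: str) -> str:
    """Return the next MAJOR.MINOR.PATCH parser version for a change.

    change is one of:
      "field_changed" - any change to an existing field (breaking)
      "field_added"   - new fields were added
      "internal"      - no change to output fields (e.g. performance)
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "field_changed":
        return f"{major + 1}.0.0"   # any field change is breaking
    if change == "field_added":
        return f"{major}.{minor + 1}.0"
    if change == "internal":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change}")

print(next_parser_version("1.4.2", "field_changed"))  # 2.0.0
print(next_parser_version("1.4.2", "field_added"))    # 1.5.0
print(next_parser_version("1.4.2", "internal"))       # 1.4.3
```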
`Vendor`
- If the event was parsed with a parser from a package, the vendor name used here must match the vendor name used in the package scope (e.g. “fortinet” for “fortinet/fortigate”).
- If a parser sets any of the following fields, those must be consistent with the vendor names used in other PaSta-compliant parsers:
  - `observer.vendor`
  - `vulnerability.scanner.vendor`
  - `device.manufacturer`
- See the Vendor guidelines for guidance on which vendor name to use.
`event.module`
- Shall contain, roughly, the name of the product or service that the event belongs to.
- Existing `event.module` values shall be reused whenever appropriate. See the `event.module` guidelines for further guidance.
`event.dataset`
- Shall either:
  - Contain the specific name of the dataset within the module described by `event.module`, prefixed by the value of `event.module` with a dot in between.
  - Not exist, if it doesn’t contain any information beyond what is present in `event.module`.
- Example combinations of the above fields can be:
Vendor |
event.module |
event.dataset |
microsoft |
azure |
azure.entraid |
zscaler |
zia |
zia.web |
- For any given data source, the author of the parser shall determine, on a best-effort basis, which domain specific fields are applicable to the data.
- The only fields in ECS from which parsers shall deviate are:
  - Fields which we use as tags have their names prefixed with `#`.
  - The field `event.original` shall not be present, since we use `@rawstring` instead.
  - The field `event.ingested` shall not be present, since we use `@ingesttimestamp` instead.
  - The field `@timestamp` shall contain a Unix timestamp, rather than a human-readable timestamp.
  - The field `event.code` shall not be present for the moment, since we plan to tag it in the future (thus introducing a breaking change).
    - The value from `event.code` can still be made available in a vendor-specific field, e.g. `Vendor.event_type`.
  - The `related` fields shall not be present.
  - The following fields shall all have their values lowercased using the en-US locale:
    - `*.address`
    - `*.domain`
    - `email.*.address`
    - `host.hostname`
    - `*.hash.*`
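As a rough illustration of the lowercasing rule, the patterns above can be matched with glob-style wildcards. The use of `fnmatch` here is an assumption for the sketch; LogScale parsers express this normalization differently.

```python
import fnmatch

# Patterns from the standard whose field values must be lowercased.
LOWERCASE_PATTERNS = [
    "*.address", "*.domain", "email.*.address", "host.hostname", "*.hash.*",
]

def normalize_case(event: dict) -> dict:
    """Lowercase the values of fields matching any of the patterns above."""
    out = {}
    for name, value in event.items():
        if any(fnmatch.fnmatch(name, pattern) for pattern in LOWERCASE_PATTERNS):
            out[name] = value.lower()
        else:
            out[name] = value
    return out

print(normalize_case({"source.domain": "Example.COM", "user.name": "Alice"}))
# {'source.domain': 'example.com', 'user.name': 'Alice'}
```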
- Parsers shall strive to make all fields in a log event available as actual LogScale fields, even if they don’t match a field in ECS.
  - Fields from the event which do not exist in ECS shall have their names prefixed with the string literal “Vendor.”
    - This gives the ECS fields the “root” namespace, while vendor-specific fields can always be found under the “Vendor.” prefix.
  - If a field can exist as both an ECS field and a vendor-specific field, the following logic applies:
    - If the value of both fields is byte-for-byte the same, it is allowed to only preserve the ECS field and discard the vendor-specific field.
    - If the values of the fields differ, both fields shall be preserved.
      - For example, an ECS field may require its value to be lowercased, but the original log has mixed casing. In that case, the vendor-specific field shall contain the original, mixed-case value.
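The keep-one-or-both logic above can be sketched as follows. The field names in the example are hypothetical, chosen only to show the mixed-casing case.

```python
def merge_fields(ecs_name: str, ecs_value: str,
                 vendor_name: str, vendor_value: str) -> dict:
    """Keep only the ECS field when both values are byte-for-byte identical;
    otherwise preserve both the ECS and the vendor-specific field."""
    if ecs_value == vendor_value:
        return {ecs_name: ecs_value}  # discard the redundant vendor copy
    return {ecs_name: ecs_value, vendor_name: vendor_value}

# Values differ only in casing, so both fields are preserved:
print(merge_fields("source.domain", "example.com",
                   "Vendor.domain", "Example.COM"))
# {'source.domain': 'example.com', 'Vendor.domain': 'Example.COM'}
```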
- When adding new fields to this standard, all fields which are not taken directly from ECS must have a capital letter from the point where they differ from the schema.
  - Using capital letters for field names follows the guidance from ECS on how to add event fields outside the schema.
  - Example of a fully custom field: `Parser.version` is similar to `ecs.version`, but the `Parser` namespace is our own, so it must start with a capital letter.
  - Example of extending ECS with a custom field: `observer.Fictional_field`, where `observer` is an existing namespace in the schema, but `Fictional_field` is our own field inside that namespace.
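A simplified check for this naming rule might look like the sketch below. Note the ECS namespace list is a tiny illustrative subset, not the real schema, and the check only inspects the first and last path segments.

```python
# Illustrative subset of ECS top-level namespaces (not the full schema).
KNOWN_ECS_NAMESPACES = {"event", "observer", "ecs", "host", "source"}

def valid_custom_field(name: str) -> bool:
    """Check that a custom field is capitalized from where it leaves ECS."""
    first = name.split(".")[0]
    if first in KNOWN_ECS_NAMESPACES:
        # Extending an ECS namespace: the custom leaf must be capitalized.
        return name.split(".")[-1][0].isupper()
    # Fully custom namespace: the namespace itself must be capitalized.
    return first[0].isupper()

print(valid_custom_field("Parser.version"))           # True
print(valid_custom_field("observer.Fictional_field")) # True
print(valid_custom_field("observer.fictional_field")) # False
```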
Appendix
Appendix A - Reasoning behind packaging standard v0.1.0
These are the reasons we settled on using ECS as our schema, and the deviations we make from it. The decision was made May 31st 2023.
There were several options in play initially:
- We could decide on not having a data model now, but just normalizing field names, so we could apply a data model later via field aliasing.
- This was not deemed good enough to move forward. The main concern is that we will break compatibility with data whenever we make a move, so instead of making half a move now and half a move later when we decide on a data model, we should just make the full leap.
- XDR is not considered broad enough of a model to cover all the general purpose logging that LogScale covers
- We also do not want to create yet another custom data model, as that’s quite a slog to get started on and do right
- Elastic Common Schema (ECS) is the only candidate which seems viable
- It covers a broad range of topics, making it applicable across different log sources
- It can be implemented in a piecemeal fashion
- It has been formally donated to OpenTelemetry (OTel), so we can hopefully refer to it via them, instead of via Elastic (a competitor)
- Where we have decided to deviate from ECS:
- We want to have the original fields of the log available next to the ECS fields. In order to avoid name conflicts between those fields and the ECS fields, all the original fields will be available with the “Vendor.” prefix. Additionally, users can filter on fields which allow them to find only the logs for the product and vendor they care about. With those logs in hand, they can search using the “Vendor.” fields for that product.
- We need to decide up front which fields across all ECS event types need to be tagged. That’s because these tags must be consistent across data sources for the search experience to be coherent.
Appendix B - Deviating from ECS
There’s been a lot of debate about whether we should deviate from ECS or not, with the arguments and decisions captured here.
Those who wish to comply 100% with ECS want us to be able to have simple messaging around being ECS compliant, and not have to attach any asterisks to the statement that we are compliant.
Further, there is a concern that if we “edit” the standard in any way, that we will continue to do so in the future, provoking unnecessary churn for all users.
On the other hand, those who wish to deviate from ECS are wishing to do so in order to have it play well with LogScale in the long run, and make sure we don’t invite unnecessary costs and user experience pains.
As such, the overall principles we are using for deviating from ECS today are:
- Tagging fields changes their names, but we must be able to tag for performance reasons. However, we are keeping tagging to categorisation-like fields, and not other fields, to make the tagging more predictable for users.
- Field values can be subject to different rules than in ECS, like enforcing casing of text in places where ECS does not enforce it.
- Any fields we add outside of ECS must comply with the ECS naming convention for custom fields.
- If we ban certain fields, we do so because they would inflict more user pain than gain in the long run.
Overall, we are committed to our ECS mappings being predictable, and we intend to live with the standard as it is in most places.
Appendix C - Reasoning behind moving to parsing standard v1.0.0
We have built and released a number of parsers that normalize incoming events to ECS, and have gathered some experience with them. We also want a stable foundation for NG-SIEM to build on, so we are making this update now, before NG-SIEM takes off.
The main concerns that we want the standard to support going forward are (in no particular order):
- Supporting faster search speeds as people search across more data
- Keeping performance high and COGS low as best we can going forward
- Enabling users to search across fields in a more consistent manner
This primarily means:
- applying more tags than before
- removing the `related` fields
- normalising additional fields
- clarifying ambiguities in the first standard
These changes have led to some in-depth discussions, which have their outcomes captured below:
Tagging arrays
We want to tag all the four ECS categorisation fields, but two of those fields are arrays, which don’t work well with the tagging mechanism we have today.
In the end, we settled on not tagging these arrays, as the approaches we could come up with within the current system all had some big flaws.
As such, we are hoping that arrays will become properly taggable at some point in the future.
The two approaches that were discussed were:
- Changing array entries to dedicated fields of their own
  - So `event.category[0] := "network"` would become something like `event.category.network := "true"`
- Merging the array fields into a single field
  - So if an event has `event.category[0] := "network"` and `event.category[1] := "api"`, it would instead have `event.category := "network|api"`
Approach #1 suffers from an unintuitive searching experience, where these tags are really just “presence” tags.
That is, we don’t want to assign them any values, since those are meaningless. We only want to tag whether they are present or not.
But creating all the values as individual tags also makes our safety net of tag grouping ineffective.
That is, tag grouping will be enabled when a single tag creates too many data sources, but if all these values have their own tags, tag grouping cannot apply to them collectively, and we must instead rely on the “next” safety net, where no new data sources get created, which is also quite bad.
Approach #2 suffers from a very fragile user experience.
It requires users to search the field differently than other fields, but it also only works as a tag if all the values in the field always have the same sorting.
It’s also very easy for users to search in a wrong way.
For example, they may think that if they are looking for events which belong to the `api` and `network` categories, they can search with something like `event.category = "api|network"`, but that won’t find events which are categorized as `event.category = "api|authentication|network"`.
And if a new entry is ever added to the list of categories which happens to be a substring of another category (e.g. `file` and `profile`), it becomes even easier to write queries which are wrong.
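Both failure modes of the merged-field encoding are easy to demonstrate with plain string operations on the example values above:

```python
merged = "api|authentication|network"

# A naive equality search misses this event entirely:
print(merged == "api|network")       # False

# Substring matching misfires once one category name contains another:
print("file" in "profile|network")   # True, yet the event has no "file" category

# A correct membership test has to split the field first:
print("api" in merged.split("|"))    # True
```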
Removing the related fields
ECS defines the following fields, which we will remove completely:
- `related.ip`
- `related.user`
- `related.hosts`
- `related.hash`
These fields would serve two overall purposes in LogScale:
- On a given event, show a nice list of all IP addresses, user names, etc. contained within.
- Allow pivoting from e.g. an IP address to seeing all events which relate to that address, by running a search like `array:contains("related.ip[]", value="1.1.1.1")`
However, this comes with a number of tradeoffs in LogScale:
- Data is duplicated across fields, increasing storage costs and slowing search performance
- Parsers become more complex to write and maintain
- Anyone using field aliasing to map data to ECS will be unable to use these arrays properly, as mapping fields to arrays today is very brittle
- In cases where an event contains a lot of data, the duplication increases the likelihood that the event will hit the 1000 field limit and become truncated
- The system is limited to only four “types” today, and if we want to extend that, we have to either add our own new fields here (which risks conflicting with future ECS versions), or we add them in a custom namespace to avoid conflicts. Additionally, any time we want to add an extra type, we also increase the size of events.
We are removing the fields to avoid these tradeoffs. Today we do have some options for supporting the use cases of those fields in a different way, like using raw text search and saved queries, but these also have unsatisfactory tradeoffs. For example, raw text search can result in poorer search performance and false positive results in some cases, while saved queries require maintenance but won’t have clear ownership for now at least.
However, we believe we can build functionality which supports the previous uses without these tradeoffs. That will take some time though, and we will need to rely on saved queries and raw text search in the interim.