The standard is based on Elastic Common Schema (ECS), with all deviations and clarifications noted below.
Vendor. field when normalizing to ECS
event.severity mapping rules
event.kind := "alert" should only be set when event.category, event.type, and event.severity fields are present and set
Product field, replaced by guidelines for event.module and event.dataset
event.code field (to be reinstated later)
related fields
We use the latest 9.x version of ECS (which is the current major version at the time of writing). We are free to upgrade to minor and patch revisions without updating this standard, but going to a new major version requires a new revision of the standard.
The following fields shall be tagged:
Cps.version
Vendor
ecs.version
event.dataset
event.kind
event.module
event.outcome
observer.type
The following fields shall always be populated for all events, unless there is no applicable ECS value:
Event categorization fields (kind, type, category, outcome)
event.outcome shall only be assigned when an event can logically contain an outcome.
event.type and event.category shall be assigned as LogScale arrays using the array:append() function
ecs.version
Cps.version
Parser.version
event.kind, event.module, event.dataset, event.category, event.type, etc.)
Vendor
observer.vendor
vulnerability.scanner.vendor
device.manufacturer
event.kind
event.kind := "alert": event.category[], event.type[], and event.severity are required. This adheres to the Common Information Model mappings and is required for proper detections logic. If these fields are not available, the event should be set to event.kind := "event" and not event.kind := "alert".
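The rule above can be sketched as a small check. This is a minimal illustration in Python, not LogScale parser code; the flat dictionary used to represent an event and the function name `alert_kind_or_event` are assumptions made for this example.

```python
def alert_kind_or_event(event: dict) -> str:
    """Pick event.kind for an event the parser would like to mark as an alert.

    `event` is a hypothetical flat mapping of field names to values,
    with array fields represented as Python lists.
    """
    categories = event.get("event.category") or []
    types = event.get("event.type") or []
    severity = event.get("event.severity")

    # "alert" is only allowed when all three required fields are set.
    if categories and types and severity is not None:
        return "alert"
    # Otherwise fall back to a plain event.
    return "event"
```

An event with categories and types but no severity would therefore be kept as `event.kind := "event"`.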
event.severity shall always map to a value within the range 1-100. The detections logic interprets numerical severities as follows:
Vendor severity | Range |
---|---|
Informational | 1-19 |
Low | 20-39 |
Medium | 40-59 |
High | 60-79 |
Critical | 80-100 |
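As an illustration of how the ranges in the table partition the 1-100 scale, the lookup can be sketched in Python. The names `SEVERITY_BANDS` and `severity_label` are hypothetical and exist only for this example; the band boundaries are copied from the table.

```python
# Bands copied from the table above: (lowest value, highest value, label).
SEVERITY_BANDS = [
    (1, 19, "Informational"),
    (20, 39, "Low"),
    (40, 59, "Medium"),
    (60, 79, "High"),
    (80, 100, "Critical"),
]


def severity_label(severity: int) -> str:
    """Return the label of the band that contains a numeric event.severity."""
    for low, high, label in SEVERITY_BANDS:
        if low <= severity <= high:
            return label
    # Values outside 1-100 violate the standard, so reject them explicitly.
    raise ValueError(f"event.severity must be within 1-100, got {severity}")
```

A parser author could use a lookup like this as a sanity check that a chosen numeric value lands in the intended band.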
event.module shall contain roughly the name of the product or service that the event belongs to.
event.module values shall be reused whenever appropriate. See the event.module guidelines for more details.
event.dataset shall either:
event.module, prefixed by the value of event.module with a dot in between.
event.module.
Vendor | event.module | event.dataset |
---|---|---|
microsoft | azure | azure.entraid |
zscaler | zia | zia.web |
cloudflare | zero-trust | zero-trust.network-session |
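The relationship between event.module and event.dataset shown in the table can be checked with a short sketch. It assumes the two permitted forms are a value prefixed by event.module and a dot, or the bare event.module value itself; the second form is an assumption of this example, and the function name `dataset_matches_module` is hypothetical.

```python
def dataset_matches_module(module: str, dataset: str) -> bool:
    """Check that event.dataset is consistent with event.module."""
    if dataset == module:
        # Assumed second permitted form: dataset equals the module name.
        return True
    prefix = module + "."
    # Require the module prefix, a dot, and a non-empty suffix after the dot.
    return dataset.startswith(prefix) and len(dataset) > len(prefix)
```

For the rows above, `dataset_matches_module("azure", "azure.entraid")` holds, while a dataset such as `entraid` on its own would be rejected.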
For any given data source, the author of the parser shall determine, on a best-effort basis, which domain specific fields are applicable to the data.
The only fields in the ECS which parsers shall deviate from are:
#.
event.original shall not be present, since we use @rawstring instead.
event.ingested shall not be present, since we use @ingesttimestamp instead.
@timestamp shall contain a Unix timestamp, rather than a human-readable timestamp.
event.code shall not be present for the moment, since we plan to tag it in the future (thus introducing a breaking change).
event.code can still be available to use in a vendor-specific field, e.g. Vendor.event_type.
related fields shall not be present.
en-us locale.
*.address
*.domain
email.*.address
host.hostname
*.hash.*
event.module
event.dataset
Vendor
*.email
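The "shall not be present" deviations listed earlier in this section (event.original, event.ingested, event.code, and the related fields) can be illustrated with a small filter. This is a sketch in Python over a hypothetical flat dictionary of field names. It is not how a LogScale parser would drop fields, and the names `REMOVED_FIELDS` and `strip_deviating_fields` are invented for the example.

```python
# Fields that the standard says shall not be present on parsed events.
REMOVED_FIELDS = {"event.original", "event.ingested", "event.code"}


def strip_deviating_fields(event: dict) -> dict:
    """Return a copy of the event without the fields the standard removes."""
    return {
        name: value
        for name, value in event.items()
        # Drop the named fields and everything in the related.* namespace.
        if name not in REMOVED_FIELDS and not name.startswith("related.")
    }
```

Vendor-specific fields such as `Vendor.event_type` pass through untouched, which matches the allowance for keeping the event code in a vendor-specific field.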
Parsers shall strive to make all fields in a log event available as actual LogScale fields, even if they don’t match a field in ECS.
Vendor.
Vendor. prefix.
Vendor. field when normalizing third party fields to ECS.
Vendor. AND ECS fields.
Vendor. field name.
Vendor. field names and values unchanged to ensure direct correlation with source logs.
Vendor.User Name should become Vendor.User_Name.
When adding new fields to this standard, all fields which are not taken directly from ECS must have a capital letter from the point where they differ from the schema.
Parser.version is similar to ecs.version, but the Parser namespace is our own, so it must start with a capital letter.
observer.Fictional_field where observer is an existing namespace in the schema, but Fictional_field is our own field inside that namespace.
These are the reasons we settled on using ECS as our schema, and the deviations we make from it. The decision was made May 31st 2023.
There were several options in play initially:
There’s been a lot of debate about whether we should deviate from ECS or not, with the arguments and decisions captured here. Those who wish to comply 100% with ECS want us to have simple messaging around being ECS compliant, without attaching any asterisks to the statement that we are compliant. Further, there is a concern that if we “edit” the standard in any way, we will continue to do so in the future, provoking unnecessary churn for all users.
On the other hand, those who wish to deviate from ECS want to do so in order to have it play well with LogScale in the long run, and to make sure we don’t invite unnecessary costs and user experience pains.
As such, the overall principles we are using for deviating from ECS today are:
Overall, we are committed to our ECS mappings being predictable, and we intend to live with the standard as it is in most places.
We have built and released a number of parsers that normalize incoming events to ECS, so we have gathered some experience. We want a stable foundation for NG-SIEM to build on, so we are making the update now, before NG-SIEM takes off. The main concerns that we want the standard to support going forward are (in no particular order):
This primarily means:
related fields
These changes have led to some in-depth discussions, whose outcomes are captured below:
We want to tag all four ECS categorisation fields, but two of those fields are arrays, which don’t work well with the tagging mechanism we have today. In the end, we settled on not tagging these arrays, as the approaches we could come up with within the current system all had some big flaws. As such, we are hoping that arrays will become properly taggable at some point in the future.
The two approaches that were discussed were:
event.category[0] := "network" would become something like event.category.network := "true"
event.category[0] := "network" and event.category[1] := "api", it would instead have event.category := "network|api"
Approach #1 suffers from an unintuitive searching experience, where these tags are really just “presence” tags. That is, we don’t want to assign them any values, since those are meaningless. We only want to tag whether they are present or not. But creating all the values as individual tags also makes our safety net of tag grouping ineffective. That is, tag grouping will be enabled when a single tag creates too many data sources, but if all these values have their own tags, tag grouping cannot apply to them collectively, and we must instead rely on the “next” safety net, where no new data sources get created, which is also quite bad.
Approach #2 suffers from a very fragile user experience.
It requires users to search the field differently than other fields, but it also only works as a tag if all the values in the field always have the same sorting.
It’s also very easy for users to search in a wrong way.
For example, they may think that if they are looking for events which belong to the api and network categories, they can search with something like event.category = "api|network", but that won’t find events which are categorized as event.category = "api|authentication|network".
And if a new entry is ever added to the list of categories, which happens to be a substring of another category (e.g. file and profile), it becomes even easier to write queries which are wrong.
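The two pitfalls of approach #2 can be reproduced with plain strings. The sketch below uses Python purely for illustration; `join_categories` and `has_categories` are hypothetical helpers, and `profile` is the hypothetical future category from the example above, not an existing ECS value.

```python
def join_categories(categories):
    """Build the pipe-joined value; sorting keeps the value stable across events."""
    return "|".join(sorted(categories))


stored = join_categories(["network", "api", "authentication"])

# Pitfall 1: an exact comparison misses events that carry extra categories.
exact_match = stored == "api|network"

# Pitfall 2: a substring test matches an unrelated category name.
profile_only = join_categories(["profile"])
substring_match = "file" in profile_only


def has_categories(value, wanted):
    """Split the joined value back into elements before comparing."""
    return set(wanted) <= set(value.split("|"))
```

Only a comparison that splits the value back into its elements, such as `has_categories`, behaves the way a user would expect, which is the extra burden the approach places on every query.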
related fields
ECS defines the following fields, which we will remove completely:
These fields would serve two overall purposes in LogScale:
array:contains("related.ip[]", value="1.1.1.1")
However, this comes with a number of tradeoffs in LogScale:
We are removing the fields to avoid these tradeoffs. Today we do have some options for supporting the use cases of those fields in a different way, like using raw text search and saved queries, but these also have unsatisfactory tradeoffs. For example, raw text search can result in poorer search performance and false positive results in some cases, while saved queries require maintenance but won’t have clear ownership for now at least.
However, we believe we can build functionality which supports the previous uses without these tradeoffs. That will take some time though, and we will need to rely on saved queries and raw text search in the interim.