Discussion: Tracking changes to index templates, component templates and ingest pipelines #108469

Open
flash1293 opened this issue May 9, 2024 · 9 comments
Labels
:Data Management/Indices APIs APIs to create and manage indices and templates :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP discuss Team:Data Management Meta label for data/management team

Comments

@flash1293
Contributor

flash1293 commented May 9, 2024

Description

The "Logs+" initiative in Observability tries to make the experience around logs in the Elastic stack as seamless as possible.

An important part of this is detecting and mitigating ingestion issues. Most of the time, ingestion issues start because something in the system changed. This can be a change either on the collection side or on the Elasticsearch side (mappings / ingest pipelines were rearranged, Fleet integration packages got updated, ...).

When investigating an issue in this area, it would be very helpful to be able to understand what changes were made when things started to go south. There already is a very important building block for this - via the _ignored field and the failure store, it's possible to reconstruct when things started to act up.

The other important part is correlating the occurring errors with changes to the system - in a visual way, this is what I'm trying to get to:
[Screenshot 2024-05-09 at 17 06 57]

It's already possible to plot the errors over time. What's challenging is giving the user access to the annotations, that is, the changes to the configuration of the system. Having access to this information and correlating both signals should speed up time-to-resolution considerably in many cases. It would also make it possible to automate, or at least simplify, getting back to a working system by rolling back applied changes.

Some rough ideas / thoughts:

  • For each data stream, there could be a hidden .changes index which is written to each time an index template matching the stream, a component template referenced in this index template, or an ingest pipeline referenced in it is updated
  • The change documents would need to contain:
    • timestamp of the change
    • delta of the change (what part of the configuration got updated how)
    • metadata about the change (who triggered it)
  • This isn't really something that can live on the Kibana layer - Kibana could track changes made through Fleet automation, but it would miss changes that target Elasticsearch APIs directly, which can be quite common depending on the user's setup
  • There are permission and storage concerns - who can access this information and how long should it live?
  • This is slightly distinct from the whole "stack monitoring" use case, as it's ultimately about the soundness of the configuration, not operational concerns - for example even on serverless this kind of information would be relevant to users
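To make the shape of such a change document more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration only: the field names (`@timestamp`, `asset`, `delta`, `changed_by`), the flat dotted-path delta format and the helper names `diff_config` and `build_change_document` are hypothetical and not an existing Elasticsearch API.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def diff_config(old: dict[str, Any], new: dict[str, Any], prefix: str = "") -> dict[str, dict[str, Any]]:
    """Return a flat delta keyed by dotted path: {"a.b": {"old": ..., "new": ...}}.

    Hypothetical format. Keys that themselves contain dots would be
    ambiguous in this representation.
    """
    delta: dict[str, dict[str, Any]] = {}
    for key in sorted(old.keys() | new.keys()):
        path = f"{prefix}.{key}" if prefix else key
        before, after = old.get(key), new.get(key)
        if isinstance(before, dict) and isinstance(after, dict):
            # Recurse so that only the leaves that actually changed are recorded.
            delta.update(diff_config(before, after, path))
        elif before != after:
            delta[path] = {"old": before, "new": after}
    return delta


def build_change_document(
    asset_type: str,
    asset_name: str,
    old: dict[str, Any],
    new: dict[str, Any],
    changed_by: Optional[str] = None,
    timestamp: Optional[datetime] = None,
) -> dict[str, Any]:
    """Assemble one change document: timestamp, delta and metadata (all names assumed)."""
    return {
        "@timestamp": (timestamp or datetime.now(timezone.utc)).isoformat(),
        "asset": {"type": asset_type, "name": asset_name},
        "delta": diff_config(old, new),
        "changed_by": changed_by,
    }
```

A flat delta like this would be easy to show as an annotation tooltip, but storing the full before/after bodies instead might be the better choice if rolling back is a goal.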

Any thoughts @ruflin @dakrone @felixbarny ?

@flash1293 flash1293 added discuss needs:triage Requires assignment of a team area label labels May 9, 2024
@dakrone dakrone added :Data Management/Indices APIs APIs to create and manage indices and templates :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP and removed needs:triage Requires assignment of a team area label labels May 9, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine elasticsearchmachine added the Team:Data Management Meta label for data/management team label May 9, 2024
@ruflin
Member

ruflin commented May 14, 2024

My ideal scenario would be that Elasticsearch versions its assets, like ingest pipelines and index templates, which would not only allow tracking changes (also historically) but also rolling them back. As this will be a massive effort, we should start simpler.

A few constraints:

  • A single template / ingest pipeline can affect many data streams
  • Component templates especially are reused; changing one can affect all data streams
  • An upgrade of the Elasticsearch cluster is also a change that can affect things
  • Rollover can have an effect, as this is when the templates apply
  • Mappings / settings can also be changed directly on the data stream itself

The above sounds a lot like an audit log. How much of this is captured today in audit logs? Instead of having this per data stream, could we have a global data stream for it with all the changes? If we have all the changes, it would allow Kibana to "stitch together" the different changes and show them where relevant. For example, if logs@custom changes, the change would show up in all data streams that have rolled over since the change (which reminds me of #75031).
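As a rough illustration of the "stitching" part, the sketch below filters a global list of template changes down to the ones that are relevant for a single data stream. It is only a sketch under assumptions: the `Change` and `DataStream` types and the `effective_changes` helper are hypothetical, and it only models template changes, which (as noted in the constraints above) apply at rollover.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Change:
    asset: str            # name of the changed template, e.g. "logs@custom"
    timestamp: datetime   # when the change was made


@dataclass(frozen=True)
class DataStream:
    name: str
    assets: frozenset[str]    # templates this stream depends on (assumed to be known)
    last_rollover: datetime   # most recent rollover of the stream


def effective_changes(stream: DataStream, changes: list[Change]) -> list[Change]:
    """Changes to the stream's templates that a rollover has already picked up."""
    return [
        change
        for change in changes
        if change.asset in stream.assets and change.timestamp <= stream.last_rollover
    ]
```

Changes made after the last rollover are dropped here on purpose; a UI could show them separately as "pending until next rollover".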

To start simple, only admin users would have access to the full changelog.

Besides having the audit log, ideally the asset itself (an ingest pipeline, for example) would carry this information too: the system would automatically update meta information around created, last_changed and changed_by.

@flash1293
Contributor Author

If we have all the changes, it would allow Kibana to "stitch together" the different changes and show them where relevant

I can imagine this part getting complicated over time, but I agree that piping the audit log into a separate data stream seems like a good way to get started here.

@dakrone
Member

dakrone commented May 14, 2024

The above sounds a lot like an audit log. How much of this is captured today in audit logs?

We already do have an audit log in ES, and it can be configured to emit request bodies. However, I would argue that its purpose is separate from the intended use case for this. I think this new concept is more of a changelog and less tied to security/auditing and the permissions granted for a particular API.

Instead of having this per data stream, could we have a global data stream for it with all the changes?

To me it makes more sense to have this be global also. It could likely go into its own data stream.

the system would automatically update meta information around created, last_changed and changed_by.

We do have these for some of our configuration items (like ILM policies), and we can expand that list fairly easily. There's a little bit of a discussion around leaking username information in a changed_by field, but we could have a separate discussion about that.
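For illustration, a minimal sketch of what stamping such fields could look like, written in Python. Storing them under a `_meta` object is an assumption made for this example, and `stamp_meta` is a hypothetical helper, not something Elasticsearch provides. `changed_by` is optional so that the username question can be decided separately.

```python
import copy
from datetime import datetime, timezone
from typing import Any, Optional


def stamp_meta(
    asset_body: dict[str, Any],
    now: datetime,
    changed_by: Optional[str] = None,
) -> dict[str, Any]:
    """Return a copy of the asset body with created / last_changed / changed_by set.

    Assumption: the fields live under "_meta". The input is not mutated.
    """
    stamped = copy.deepcopy(asset_body)
    meta = stamped.setdefault("_meta", {})
    # "created" is only written the first time the asset is stamped.
    meta.setdefault("created", now.isoformat())
    meta["last_changed"] = now.isoformat()
    if changed_by is not None:
        meta["changed_by"] = changed_by
    else:
        # Drop a stale username rather than attribute the change to the wrong user.
        meta.pop("changed_by", None)
    return stamped
```

This only ever reflects the most recent change, so it carries no history.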

@ruflin
Member

ruflin commented May 15, 2024

We do have these for some of our configuration items (like ILM policies), and we can expand that list fairly easily. There's a little bit of a discussion around leaking username information in a changed_by field, but we could have a separate discussion about that.

I could see this as low-hanging fruit to get started, as it would at least allow us to indicate the most recent change, without history.

@flash1293
Contributor Author

I could see this as low-hanging fruit to get started, as it would at least allow us to indicate the most recent change, without history.

Agreed, this is a good starting point. Together with the rollover timestamp of individual data streams, this can probably go quite far in terms of providing visibility.

@flash1293
Contributor Author

@dakrone

We do have these for some of our configuration items (like ILM policies), and we can expand that list fairly easily. There's a little bit of a discussion around leaking username information in a changed_by field, but we could have a separate discussion about that

Would it be worth opening a separate, more implementation-focused issue around that?

@dakrone
Member

dakrone commented May 15, 2024

Would it be worth opening a separate, more implementation-focused issue around that?

Yes, that would be useful, to keep this one a bit more focused.

@flash1293
Contributor Author

Split out this first step into #108754


4 participants