Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Purge causes recorder to stop writing to the DB until HA is restarted #117263

Open
HertogArjan opened this issue May 11, 2024 · 9 comments
Open
Labels
database: sqlite3 integration: recorder problem in database The problem is with the database setup and cannot be fixed in Home Assistant

Comments

@HertogArjan
Copy link

HertogArjan commented May 11, 2024

The problem

Every night at around ~4:10 the histories for all entities stop. This has been happening since at least April 9th. I updated Home Assistant to 2024.4.1 on April 5th, but I can't say for sure if this issue started directly afterwards. A restart of Home Assistant allows recording again but does not restore the history missed since ~4:10. I suspect it has something to do with the Recorder auto purge at 4:12 because the same symptoms happen when the purge is run manually.

I don't think the manual or automatic purge is currently able to finish because the (SQLite) database seems way too large (>6GB) for my configured purge_keep_days of 7.

If I run recorder.purge from the web UI the same symptoms happen like during the night. By looking at the mtime it is clear home-assistant_v2.db does not get written to anymore. htop shows HA using 100% of one CPU core continously and iotop show HA reading from disk at ~400MB/s continously. This went on for at least 25 minutes before I stopped the process.

The logs show nothing unusual happening around 4:12. When I run recorder.purge from the web UI with verbose logging enabled the logs just show:

2024-05-11 15:16:28.560 INFO (MainThread) [homeassistant.helpers.script.websocket_api_script] websocket_api script: Running websocket_api script
2024-05-11 15:16:28.560 INFO (MainThread) [homeassistant.helpers.script.websocket_api_script] websocket_api script: Executing step call service

When HA is stopped using SIGTERM the shutdown takes a long time and it is clear from the logs it is waiting for a Recorder task:

2024-05-11 15:20:00.573 WARNING (MainThread) [homeassistant.core] Shutdown stage 'final write': still running: <Task pending name='Task-2684' coro=<Recorder._async_shutdown() running at /srv/homeassistant/lib/python3.12/site-packages/homeassistant/components/recorder/core.py:475> wait_for=<Future pending cb=[_chain_future.<locals>._call_check_cancel() at /usr/lib/python3.12/asyncio/futures.py:387, <1 more>, Task.task_wakeup()] created at /usr/lib/python3.12/asyncio/base_events.py:449> cb=[set.remove()] created at /srv/homeassistant/lib/python3.12/site-packages/homeassistant/util/async_.py:40>

See the rest of the relevant messages during shutdown below.

What version of Home Assistant Core has the issue?

core-2024.5.2

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant Core

Integration causing the issue

Recorder

Link to integration documentation on our website

https://www.home-assistant.io/integrations/recorder/#service-purge

Diagnostics information

No response

Example YAML snippet

recorder:
  # keep 30 days of history for all states by default
  purge_keep_days: 7
  exclude:
    domains:
      - weather
    entities:
      - sun.sun
    entity_globs:
      - 'automation.abrp_live_data_*'
      - 'timer.abrp_live_data_*'
      - 'automation.pvoutput_*'
      - 'timer.pvoutput_*'
      - 'sensor.sampled_stroomsterkte_fase_l?'
      - 'sensor.stroomsterkte_fase_l?_*_sec_gem'

Anything in the logs that might be useful for us?

2024-05-11 15:20:00.573 WARNING (MainThread) [homeassistant.core] Timed out waiting for final writes to complete, the shutdown will continue
2024-05-11 15:20:00.573 WARNING (MainThread) [homeassistant.core] Shutdown stage 'final write': still running: <Task pending name='Task-2684' coro=<Recorder._async_shutdown() running at /srv/homeassistant/lib/python3.12/site-packages/homeassistant/components/recorder/core.py:475> wait_for=<Future pending cb=[_chain_future.<locals>._call_check_cancel() at /usr/lib/python3.12/asyncio/futures.py:387, <1 more>, Task.task_wakeup()] created at /usr/lib/python3.12/asyncio/base_events.py:449> cb=[set.remove()] created at /srv/homeassistant/lib/python3.12/site-packages/homeassistant/util/async_.py:40>
2024-05-11 15:20:30.580 WARNING (MainThread) [homeassistant.core] Timed out waiting for close event to be processed, the shutdown will continue
2024-05-11 15:20:30.580 WARNING (MainThread) [homeassistant.core] Shutdown stage 'close': still running: <Task pending name='Task-2684' coro=<Recorder._async_shutdown() running at /srv/homeassistant/lib/python3.12/site-packages/homeassistant/components/recorder/core.py:475> wait_for=<Future pending cb=[_chain_future.<locals>._call_check_cancel() at /usr/lib/python3.12/asyncio/futures.py:387, <1 more>, Task.task_wakeup()] created at /usr/lib/python3.12/asyncio/base_events.py:449> cb=[set.remove()] created at /srv/homeassistant/lib/python3.12/site-packages/homeassistant/util/async_.py:40>
2024-05-11 15:20:30.580 WARNING (MainThread) [homeassistant.core] Shutdown stage 'close': still running: <Task pending name='Task-2714' coro=<Recorder._async_close() running at /srv/homeassistant/lib/python3.12/site-packages/homeassistant/components/recorder/core.py:467> wait_for=<Future pending cb=[_chain_future.<locals>._call_check_cancel() at /usr/lib/python3.12/asyncio/futures.py:387, <1 more>, Task.task_wakeup()] created at /usr/lib/python3.12/asyncio/base_events.py:449> cb=[set.remove()] created at /srv/homeassistant/lib/python3.12/site-packages/homeassistant/util/async_.py:40>
2024-05-11 15:20:30.752 WARNING (Thread-4 (_do_shutdown)) [homeassistant.util.executor] Thread[SyncWorker_2] is still running at shutdown: File "/usr/lib/python3.12/threading.py", line 1030, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.12/threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.12/concurrent/futures/thread.py", line 92, in _worker
    work_item.run()
  File "/usr/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/lib/python3.12/threading.py", line 1147, in join
    self._wait_for_tstate_lock()
  File "/usr/lib/python3.12/threading.py", line 1167, in _wait_for_tstate_lock
    if lock.acquire(block, timeout):
2024-05-11 15:20:30.919 WARNING (Thread-4 (_do_shutdown)) [homeassistant.util.executor] Thread[SyncWorker_4] is still running at shutdown: File "/usr/lib/python3.12/threading.py", line 1030, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.12/threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.12/concurrent/futures/thread.py", line 92, in _worker
    work_item.run()
  File "/usr/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/lib/python3.12/threading.py", line 1147, in join
    self._wait_for_tstate_lock()
  File "/usr/lib/python3.12/threading.py", line 1167, in _wait_for_tstate_lock
    if lock.acquire(block, timeout):
2024-05-11 15:20:31.403 WARNING (Thread-4 (_do_shutdown)) [homeassistant.util.executor] Thread[SyncWorker_2] is still running at shutdown: File "/usr/lib/python3.12/threading.py", line 1030, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.12/threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.12/concurrent/futures/thread.py", line 92, in _worker
    work_item.run()
  File "/usr/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/lib/python3.12/threading.py", line 1147, in join
    self._wait_for_tstate_lock()
  File "/usr/lib/python3.12/threading.py", line 1167, in _wait_for_tstate_lock
    if lock.acquire(block, timeout):
2024-05-11 15:20:31.887 WARNING (Thread-4 (_do_shutdown)) [homeassistant.util.executor] Thread[SyncWorker_4] is still running at shutdown: File "/usr/lib/python3.12/threading.py", line 1030, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.12/threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.12/concurrent/futures/thread.py", line 92, in _worker
    work_item.run()
  File "/usr/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/lib/python3.12/threading.py", line 1147, in join
    self._wait_for_tstate_lock()
  File "/usr/lib/python3.12/threading.py", line 1167, in _wait_for_tstate_lock
    if lock.acquire(block, timeout):
2024-05-11 15:20:40.751 WARNING (MainThread) [homeassistant.util.executor] Thread[SyncWorker_2] is still running at shutdown: File "/usr/lib/python3.12/threading.py", line 1030, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.12/threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.12/concurrent/futures/thread.py", line 92, in _worker
    work_item.run()
  File "/usr/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/lib/python3.12/threading.py", line 1147, in join
    self._wait_for_tstate_lock()
  File "/usr/lib/python3.12/threading.py", line 1167, in _wait_for_tstate_lock
    if lock.acquire(block, timeout):
2024-05-11 15:20:40.918 WARNING (MainThread) [homeassistant.util.executor] Thread[SyncWorker_4] is still running at shutdown: File "/usr/lib/python3.12/threading.py", line 1030, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.12/threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.12/concurrent/futures/thread.py", line 92, in _worker
    work_item.run()
  File "/usr/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/lib/python3.12/threading.py", line 1147, in join
    self._wait_for_tstate_lock()
  File "/usr/lib/python3.12/threading.py", line 1167, in _wait_for_tstate_lock
    if lock.acquire(block, timeout):
2024-05-11 15:20:41.402 WARNING (MainThread) [homeassistant.util.executor] Thread[SyncWorker_2] is still running at shutdown: File "/usr/lib/python3.12/threading.py", line 1030, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.12/threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.12/concurrent/futures/thread.py", line 92, in _worker
    work_item.run()
  File "/usr/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/lib/python3.12/threading.py", line 1147, in join
    self._wait_for_tstate_lock()
  File "/usr/lib/python3.12/threading.py", line 1167, in _wait_for_tstate_lock
    if lock.acquire(block, timeout):
2024-05-11 15:20:41.886 WARNING (MainThread) [homeassistant.util.executor] Thread[SyncWorker_4] is still running at shutdown: File "/usr/lib/python3.12/threading.py", line 1030, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.12/threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.12/concurrent/futures/thread.py", line 92, in _worker
    work_item.run()
  File "/usr/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/lib/python3.12/threading.py", line 1147, in join
    self._wait_for_tstate_lock()
  File "/usr/lib/python3.12/threading.py", line 1167, in _wait_for_tstate_lock
    if lock.acquire(block, timeout):

Additional information

I thought maybe my database could be corrupted, so with HA shutdown I ran mv home-assistant_v2.db home-assistant_v2_old.db; sqlite3 home-assistant_v2_old.db ".recover" | sqlite3 home-assistant_v2.db and then tried to run a purge again. Unfortunately the problem was not resolved. My database did shrink by about 1.5 GB.

@home-assistant
Copy link

Hey there @home-assistant/core, mind taking a look at this issue as it has been labeled with an integration (recorder) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of recorder can trigger bot actions by commenting:

  • @home-assistant close Closes the issue.
  • @home-assistant rename Awesome new title Renames the issue.
  • @home-assistant reopen Reopen the issue.
  • @home-assistant unassign recorder Removes the current integration label and assignees on the issue, add the integration domain after the command.
  • @home-assistant add-label needs-more-information Add a label (needs-more-information, problem in dependency, problem in custom component) to the issue.
  • @home-assistant remove-label needs-more-information Remove a label (needs-more-information, problem in dependency, problem in custom component) on the issue.

(message by CodeOwnersMention)


recorder documentation
recorder source
(message by IssueLinks)

@bdraco
Copy link
Member

bdraco commented May 11, 2024

It sounds like you have an integration that is filling the database with state changes so quickly that the system cannot keep up.

Try enabling debug logging for homeassistant.core and check which states are being changed very frequently

@HertogArjan
Copy link
Author

Hi, I enabled debug logging for homeassistant.core and while there are certainly a lot of state changes being reported, they do not increase notably after starting a purge when the CPU and I/O starts spiking.

I did just now install Home Assistant Container on a separate machine using the same configuration and noticed I could not reproduce the issue by running a manual purge. This could mean the issue is somehow caused by the installation environment.

After doing a manual purge on the Container version the database size has been reduced to 620 MB, nice.

When doing a purge on my original system with debug logging enabled for homeassistant.components.recorder all messages from the recorder stop abruptly after:

2024-05-11 18:29:27.435 DEBUG (Recorder) [homeassistant.components.recorder.purge] Selected 4000 state ids and 1059 attributes_ids to remove
2024-05-11 18:29:27.485 DEBUG (Recorder) [homeassistant.components.recorder.purge] Updated <sqlalchemy.engine.cursor.CursorResult object at 0x7f41add689f0> states to remove old_state_id
2024-05-11 18:29:27.521 DEBUG (Recorder) [homeassistant.components.recorder.purge] Deleted <sqlalchemy.engine.cursor.CursorResult object at 0x7f41b4052270> states
2024-05-11 18:29:27.538 DEBUG (Recorder) [homeassistant.components.recorder.purge] Selected 4000 state ids and 1021 attributes_ids to remove
2024-05-11 18:29:27.601 DEBUG (Recorder) [homeassistant.components.recorder.purge] Updated <sqlalchemy.engine.cursor.CursorResult object at 0x7f41b4052270> states to remove old_state_id
2024-05-11 18:29:27.628 DEBUG (Recorder) [homeassistant.components.recorder.purge] Deleted <sqlalchemy.engine.cursor.CursorResult object at 0x7f41add689f0> states
2024-05-11 18:29:27.640 DEBUG (Recorder) [homeassistant.components.recorder.purge] Selected 4000 state ids and 1027 attributes_ids to remove
2024-05-11 18:29:27.689 DEBUG (Recorder) [homeassistant.components.recorder.purge] Updated <sqlalchemy.engine.cursor.CursorResult object at 0x7f41b4775550> states to remove old_state_id
2024-05-11 18:29:27.716 DEBUG (Recorder) [homeassistant.components.recorder.purge] Deleted <sqlalchemy.engine.cursor.CursorResult object at 0x7f41b4052270> states
2024-05-11 18:29:28.065 DEBUG (Recorder) [homeassistant.components.recorder.purge] Selected 25155 shared attributes to remove
2024-05-11 18:29:28.208 DEBUG (Recorder) [homeassistant.components.recorder.purge] Deleted <sqlalchemy.engine.cursor.CursorResult object at 0x7f41b1745a20> attribute states
2024-05-11 18:29:28.301 DEBUG (Recorder) [homeassistant.components.recorder.purge] Deleted <sqlalchemy.engine.cursor.CursorResult object at 0x7f41b4f81fd0> attribute states
2024-05-11 18:29:28.359 DEBUG (Recorder) [homeassistant.components.recorder.purge] Deleted <sqlalchemy.engine.cursor.CursorResult object at 0x7f41b2c79da0> attribute states
2024-05-11 18:29:28.414 DEBUG (Recorder) [homeassistant.components.recorder.purge] Deleted <sqlalchemy.engine.cursor.CursorResult object at 0x7f41b1747e70> attribute states
2024-05-11 18:29:28.467 DEBUG (Recorder) [homeassistant.components.recorder.purge] Deleted <sqlalchemy.engine.cursor.CursorResult object at 0x7f41b2c79da0> attribute states
2024-05-11 18:29:28.520 DEBUG (Recorder) [homeassistant.components.recorder.purge] Deleted <sqlalchemy.engine.cursor.CursorResult object at 0x7f41b4f81fd0> attribute states
2024-05-11 18:29:28.537 DEBUG (Recorder) [homeassistant.components.recorder.purge] Deleted <sqlalchemy.engine.cursor.CursorResult object at 0x7f41b2c79da0> attribute states
2024-05-11 18:29:28.538 DEBUG (Recorder) [homeassistant.components.recorder.purge] After purging states and attributes_ids remaining=True
2024-05-11 18:29:28.736 DEBUG (Recorder) [homeassistant.components.recorder.purge] Selected 4000 event ids and 817 data_ids to remove

@bdraco
Copy link
Member

bdraco commented May 11, 2024

That sounds like a corrupt index

You might try running an integrity check https://www.sqlite.org/pragma.html#pragma_integrity_check on the database and reindex anything that comes up: https://www.sqlite.org/lang_reindex.html

It could also indicate a problem with your storage medium

@bdraco bdraco added database: sqlite3 problem in database The problem is with the database setup and cannot be fixed in Home Assistant labels May 11, 2024
@HertogArjan
Copy link
Author

It is pretty much certainly an issue with my installation of HA Core and not with the database or storage medium. I installed HA Container on the same machine and now the issue is gone.

Any suggestions why the issue only occurs with HA Core?

@bdraco
Copy link
Member

bdraco commented May 11, 2024

Do you have a different version of SQLite on the core install?

@HertogArjan
Copy link
Author

HertogArjan commented May 12, 2024

No because SQLite (the CLI tool) is not part of the Core dependencies. The Recorder page mentions Home Assistant uses SQLAlchemy for database access, but that module is not installed manually when installing Core. I expect HA installs this itself and also keeps it up to date. I don't know how I would compare the version numbers for this module.

@bdraco
Copy link
Member

bdraco commented May 12, 2024

You can find the version under setting, system, repairs, three dot menu, system information

@HertogArjan
Copy link
Author

Thanks @bdraco. According to the system information my Core install does indeed use a newer SQLite version than the Container install (3.45.3 > 3.44.2). It also uses a slightly newer Python version (3.12.3 > 3.12.2) for a slightly lower HA Core version (core-2024.5.2 < core-2024.5.3).

I can try and see what happens if I equalize the Core install to the Container install. Do you know how I would downgrade the SQLite version?

If you're interested I attach the system information outputs for the Core and Container installs:
sys_info_core.txt
sys_info_container.txt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
database: sqlite3 integration: recorder problem in database The problem is with the database setup and cannot be fixed in Home Assistant
Projects
None yet
Development

No branches or pull requests

2 participants