OOM-kill/SIGKILL of postmaster leaves backends running, never restarts #10726

@sharnoff

This is a follow-up to INC-395

Steps to reproduce

Either:

  1. Trigger an OOM kill in which the postmaster happens to be the victim; or
  2. Run kill -9 on the postmaster process (the only one whose command line reads /usr/local/bin/postgres ..., rather than postgres: ...)
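
For step 2, the naming rule above can be turned into a small helper that picks the postmaster PID out of `ps -eo pid,args` output. This is only a sketch of that heuristic, not part of the compute tooling: the function name `find_postmaster_pids` is hypothetical, and the sample PIDs and data directory are made up for illustration.

```python
import os


def find_postmaster_pids(ps_output: str) -> list[int]:
    """Return PIDs that look like the postmaster in `ps -eo pid,args` output.

    Assumed heuristic, taken from the naming described in the issue:
    backends retitle themselves to "postgres: ...", while the postmaster
    keeps its executable path, e.g. "/usr/local/bin/postgres ...".
    """
    pids = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # skip the header row and blank lines
        pid, args = int(parts[0]), parts[1]
        if args.startswith("postgres:"):
            continue  # a backend or background worker, not the postmaster
        argv0 = args.split()[0]
        if os.path.basename(argv0) == "postgres":
            pids.append(pid)
    return pids


# Illustrative input only: the PIDs and the data directory are made up.
sample = """\
  PID COMMAND
  101 /usr/local/bin/postgres -D /hypothetical/pgdata
  102 postgres: checkpointer
  103 postgres: walwriter
  200 grep postgres
"""
print(find_postmaster_pids(sample))
```

The returned PID is the one to pass to kill -9; on a real system the input would come from running `ps -eo pid,args` inside the compute.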

Expected result

Remaining postgres backends exit, and postgres is restarted.

Actual result

Some backends are left running, and postgres is never restarted.

For example, when I ran this on staging, I saw these two left around:

$ ps aux | grep postgres
...
postgres  1523  0.0  0.5 357964  5428 ?        Ssl  18:48   0:00 postgres: rag_jina_reranker_v1_tiny_en reranking background worker 
postgres  1524  0.0  0.5 357964  5428 ?        Ssl  18:48   0:00 postgres: rag_bge_small_en_v15 embeddings background worker
...

In the first occurrence of the issue, however, many idle backends were also left running.
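
One way to check for this state is to look for "postgres: ..." processes whose parent is no longer a live postmaster. The sketch below does that over `ps -eo pid,ppid,args` output. It relies on the general Linux behaviour that children of a killed process are reparented (typically to PID 1 or a subreaper); the function name `find_orphaned_backends` is hypothetical, and the PPID values in the sample are assumptions, since the `ps aux` output above does not show them.

```python
def find_orphaned_backends(ps_output: str) -> list[int]:
    """Return PIDs of "postgres: ..." processes with no live postmaster parent.

    Expects the output of `ps -eo pid,ppid,args`.
    """
    rows = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3 or not (parts[0].isdigit() and parts[1].isdigit()):
            continue  # skip the header row and blank lines
        rows.append((int(parts[0]), int(parts[1]), parts[2]))

    def is_backend(args: str) -> bool:
        return args.startswith("postgres:")

    def is_postmaster(args: str) -> bool:
        # The postmaster keeps its executable path as its command line.
        return not is_backend(args) and args.split()[0].rsplit("/", 1)[-1] == "postgres"

    postmasters = {pid for pid, _, args in rows if is_postmaster(args)}
    return [pid for pid, ppid, args in rows
            if is_backend(args) and ppid not in postmasters]


# The two workers from the staging output; PPID 1 is an assumed value.
after_kill = """\
  PID  PPID COMMAND
 1523     1 postgres: rag_jina_reranker_v1_tiny_en reranking background worker
 1524     1 postgres: rag_bge_small_en_v15 embeddings background worker
"""
print(find_orphaned_backends(after_kill))
```

A non-empty result after the postmaster has been killed corresponds to the "Actual result" described here; an empty result would match the expected behaviour.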

Logs, links

Metadata

Labels

c/compute (Component: compute, excluding postgres itself), migrated_to_jira, t/bug (Issue Type: Bug)
