Skip to content

Conversation

@Davda-James
Copy link
Contributor

@Davda-James Davda-James commented Oct 5, 2025

Proposed change

  • Queue-based Architecture: Database operations are enqueued and processed by a separate writer thread
  • Persistent Storage: Uses litequeue for disk-backed job persistence across process restarts
  • Fallback Support: Falls back to in-memory queues and synchronous writes when litequeue is unavailable
  • WAL Mode: Enabled SQLite WAL mode for improved concurrent access
  • CLI Worker: Added standalone database worker process for production deployments

Closes issue: #609

Type of change

Feat + Fix

  • New or existing module/payload change
  • Dependency upgrade
  • Written test for new change

Files Changed

writer.py
db.py
db_worker.py
pyproject.toml
test_writer.py

Note

pyproject.toml has been updated with litequeue (^0.9), hence poetry.lock

Checklist

  • I've followed the [contributing guidelines][contributing-guidelines]
  • I've run make pre-commit, it didn't generate any changes
  • I've run make test, all tests passed locally

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 5, 2025

Summary by CodeRabbit

  • New Features

    • Added a background database writer for faster, non-blocking data persistence with batching and graceful shutdown.
    • Introduced a CLI command to drain or monitor the writer queue, with options for one-time drain, batch size, interval, max items, and summary output.
  • Tests

    • Added end-to-end tests verifying queued writes are persisted correctly.
  • Chores

    • Registered the new CLI entry point in the package scripts.
    • Added a new dependency to support the queue backend.

Walkthrough

Adds a DBWriter batching subsystem and CLI worker, integrates writer enqueueing into three DB submit functions with direct-write fallback, registers a CLI script and litequeue dependency, and adds end-to-end tests validating persisted report and host-log writes.

Changes

Cohort / File(s) Summary of changes
Writer subsystem & CLI
nettacker/database/writer.py, nettacker/cli/db_worker.py
New DBWriter class and helpers (get_writer, get_writer_configured, get_stats) implementing batched, background writes using litequeue or multiprocessing Queue; enqueue API, start/stop, drain_once, stats, and a CLI (db_worker) with args --once, --batch-size, --interval, --max-items, --summary and signal handling.
DB integration
nettacker/database/db.py
submit_report_to_db, submit_logs_to_db, and submit_temp_logs_to_db now try to enqueue jobs to the writer and, on enqueue failure/exception, fall back to the existing direct SQLAlchemy write path.
Project config
pyproject.toml
Adds nettacker-db-worker script entry pointing to nettacker.cli.db_worker:run and adds dependency litequeue = "^0.9".
Tests
tests/database/test_writer.py
New end-to-end test that configures a DBWriter against a temporary SQLite DB, enqueues insert_report and insert_hostslog jobs, drains the writer, and asserts persisted rows in reports and scan_events tables.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The title clearly summarizes the primary change by stating that database lock issues are addressed using litequeue, which directly reflects the new queue-based writer architecture introduced in this PR.
Description Check ✅ Passed The description thoroughly outlines the new queue-based architecture, litequeue integration, fallback strategy, WAL mode, CLI worker, and the linked issue, all of which align with the actual changes in the PR.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ad76ce5 and dc1ea7b.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • nettacker/cli/db_worker.py (1 hunks)
  • nettacker/database/db.py (4 hunks)
  • nettacker/database/writer.py (1 hunks)
  • pyproject.toml (2 hunks)
  • tests/database/test_writer.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use 4-space indents in Python code
Limit lines to 99 characters (ruff/ruff-format/isort profile=black)
Module and file names should use lower_snake_case
Function and variable names should use lower_snake_case
Class names should use PascalCase
Constants should use UPPER_SNAKE_CASE
Keep functions small and add type hints where practical

Files:

  • nettacker/database/writer.py
  • nettacker/database/db.py
  • tests/database/test_writer.py
  • nettacker/cli/db_worker.py
nettacker/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Add docstrings for public APIs in the nettacker package

Files:

  • nettacker/database/writer.py
  • nettacker/database/db.py
  • nettacker/cli/db_worker.py
tests/**/test_*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/test_*.py: Place tests under tests/ and name files as test_*.py
Use pytest (with pytest-asyncio, xdist) for tests

Files:

  • tests/database/test_writer.py
tests/**

📄 CodeRabbit inference engine (AGENTS.md)

Mirror package layout in tests (e.g., tests/core/, tests/lib/, etc.)

Files:

  • tests/database/test_writer.py
🧠 Learnings (1)
📚 Learning: 2025-09-07T19:20:58.332Z
Learnt from: CR
PR: OWASP/Nettacker#0
File: AGENTS.md:0-0
Timestamp: 2025-09-07T19:20:58.332Z
Learning: Applies to nettacker.py : Repository provides nettacker.py as an entry script (use as Python entry point)

Applied to files:

  • pyproject.toml
🧬 Code graph analysis (4)
nettacker/database/writer.py (2)
nettacker/config.py (2)
  • Config (183-187)
  • as_dict (26-27)
nettacker/database/models.py (3)
  • Report (7-27)
  • HostsLog (67-100)
  • TempEvents (30-64)
nettacker/database/db.py (1)
nettacker/database/writer.py (2)
  • get_writer (288-296)
  • enqueue (105-114)
tests/database/test_writer.py (1)
nettacker/database/writer.py (3)
  • DBWriter (17-281)
  • enqueue (105-114)
  • drain_once (209-239)
nettacker/cli/db_worker.py (1)
nettacker/database/writer.py (5)
  • get_writer (288-296)
  • stop (100-103)
  • get_writer_configured (299-315)
  • get_stats (318-336)
  • drain_once (209-239)
🪛 Ruff (0.13.3)
nettacker/database/writer.py

45-45: Do not catch blind exception: Exception

(BLE001)


67-67: Do not catch blind exception: Exception

(BLE001)


88-89: try-except-pass detected, consider logging the exception

(S110)


88-88: Do not catch blind exception: Exception

(BLE001)


111-111: Consider moving this statement to an else block

(TRY300)


112-112: Do not catch blind exception: Exception

(BLE001)


129-129: Do not catch blind exception: Exception

(BLE001)


149-150: try-except-pass detected, consider logging the exception

(S110)


149-149: Do not catch blind exception: Exception

(BLE001)


153-153: Do not catch blind exception: Exception

(BLE001)


158-158: Do not catch blind exception: Exception

(BLE001)


178-178: Do not catch blind exception: Exception

(BLE001)


183-183: Do not catch blind exception: Exception

(BLE001)


188-188: Do not catch blind exception: Exception

(BLE001)


200-200: Do not catch blind exception: Exception

(BLE001)


202-203: try-except-pass detected, consider logging the exception

(S110)


202-202: Do not catch blind exception: Exception

(BLE001)


206-207: try-except-pass detected, consider logging the exception

(S110)


206-206: Do not catch blind exception: Exception

(BLE001)


225-225: Do not catch blind exception: Exception

(BLE001)


231-231: Do not catch blind exception: Exception

(BLE001)


236-237: try-except-pass detected, consider logging the exception

(S110)


236-236: Do not catch blind exception: Exception

(BLE001)


294-295: try-except-pass detected, consider logging the exception

(S110)


294-294: Do not catch blind exception: Exception

(BLE001)


308-309: try-except-pass detected, consider logging the exception

(S110)


308-308: Do not catch blind exception: Exception

(BLE001)


313-314: try-except-pass detected, consider logging the exception

(S110)


313-313: Do not catch blind exception: Exception

(BLE001)


329-329: Do not catch blind exception: Exception

(BLE001)


334-334: Do not catch blind exception: Exception

(BLE001)

nettacker/database/db.py

112-113: try-except-pass detected, consider logging the exception

(S110)


112-112: Do not catch blind exception: Exception

(BLE001)


165-166: try-except-pass detected, consider logging the exception

(S110)


165-165: Do not catch blind exception: Exception

(BLE001)


202-203: try-except-pass detected, consider logging the exception

(S110)


202-202: Do not catch blind exception: Exception

(BLE001)

tests/database/test_writer.py

16-16: subprocess call: check for execution of untrusted input

(S603)

nettacker/cli/db_worker.py

8-8: Unused function argument: signum

(ARG001)


8-8: Unused function argument: frame

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Run tests
  • GitHub Check: CodeQL (python)
  • GitHub Check: CodeQL (javascript)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
nettacker/database/writer.py (2)

84-89: Prefer SQLAlchemy's text() construct for raw SQL.

The PRAGMA statement uses a raw string. While acceptable here, using text() is more explicit and aligns with SQLAlchemy best practices.

Apply this diff:

+from sqlalchemy import text
 
 if engine_url.startswith("sqlite"):
     try:
         with self.engine.connect() as conn:
-            conn.execute("PRAGMA journal_mode=WAL")
+            conn.execute(text("PRAGMA journal_mode=WAL"))
     except Exception:
         pass

324-341: Consider thread-safety for runtime configuration changes.

Mutating batch_size and interval on a running writer (lines 332, 337) is not thread-safe if called concurrently with the writer thread reading these values. In practice this may be acceptable if configuration only happens at startup, but be aware of the potential race.

If concurrent configuration changes are expected, protect these properties with a lock or use atomic operations.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dc1ea7b and 702c9e9.

📒 Files selected for processing (2)
  • nettacker/cli/db_worker.py (1 hunks)
  • nettacker/database/writer.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use 4-space indents in Python code
Limit lines to 99 characters (ruff/ruff-format/isort profile=black)
Module and file names should use lower_snake_case
Function and variable names should use lower_snake_case
Class names should use PascalCase
Constants should use UPPER_SNAKE_CASE
Keep functions small and add type hints where practical

Files:

  • nettacker/cli/db_worker.py
  • nettacker/database/writer.py
nettacker/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Add docstrings for public APIs in the nettacker package

Files:

  • nettacker/cli/db_worker.py
  • nettacker/database/writer.py
🧬 Code graph analysis (2)
nettacker/cli/db_worker.py (1)
nettacker/database/writer.py (5)
  • get_writer (313-321)
  • stop (100-103)
  • get_writer_configured (324-340)
  • get_stats (343-361)
  • drain_once (226-264)
nettacker/database/writer.py (2)
nettacker/config.py (2)
  • Config (183-187)
  • as_dict (26-27)
nettacker/database/models.py (3)
  • Report (7-27)
  • HostsLog (67-100)
  • TempEvents (30-64)
🪛 Ruff (0.13.3)
nettacker/cli/db_worker.py

8-8: Unused function argument: signum

(ARG001)


8-8: Unused function argument: frame

(ARG001)

nettacker/database/writer.py

45-45: Do not catch blind exception: Exception

(BLE001)


67-67: Do not catch blind exception: Exception

(BLE001)


88-89: try-except-pass detected, consider logging the exception

(S110)


88-88: Do not catch blind exception: Exception

(BLE001)


111-111: Consider moving this statement to an else block

(TRY300)


112-112: Do not catch blind exception: Exception

(BLE001)


122-123: try-except-pass detected, consider logging the exception

(S110)


122-122: Do not catch blind exception: Exception

(BLE001)


155-155: Do not catch blind exception: Exception

(BLE001)


160-160: Consider moving this statement to an else block

(TRY300)


162-162: Do not catch blind exception: Exception

(BLE001)


188-188: Do not catch blind exception: Exception

(BLE001)


197-197: Do not catch blind exception: Exception

(BLE001)


202-202: Do not catch blind exception: Exception

(BLE001)


217-217: Do not catch blind exception: Exception

(BLE001)


219-220: try-except-pass detected, consider logging the exception

(S110)


219-219: Do not catch blind exception: Exception

(BLE001)


223-224: try-except-pass detected, consider logging the exception

(S110)


223-223: Do not catch blind exception: Exception

(BLE001)


246-246: Do not catch blind exception: Exception

(BLE001)


256-256: Do not catch blind exception: Exception

(BLE001)


261-262: try-except-pass detected, consider logging the exception

(S110)


261-261: Do not catch blind exception: Exception

(BLE001)


319-320: try-except-pass detected, consider logging the exception

(S110)


319-319: Do not catch blind exception: Exception

(BLE001)


333-334: try-except-pass detected, consider logging the exception

(S110)


333-333: Do not catch blind exception: Exception

(BLE001)


338-339: try-except-pass detected, consider logging the exception

(S110)


338-338: Do not catch blind exception: Exception

(BLE001)


354-354: Do not catch blind exception: Exception

(BLE001)


359-359: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Build package
  • GitHub Check: Test Docker 26.1.4-1ubuntu.24.04noble image build
  • GitHub Check: Test Docker image
  • GitHub Check: Test Docker 26.0.0-1ubuntu.24.04noble image build
  • GitHub Check: Test Docker 27.5.0-1ubuntu.24.04noble image build
🔇 Additional comments (9)
nettacker/cli/db_worker.py (3)

8-11: Signal handler fix confirmed.

The previous issue about the process not exiting after SIGINT/SIGTERM has been resolved. The raise SystemExit(0) on Line 11 ensures the process terminates cleanly after stopping the writer.

Note: The static analysis warnings about unused signum and frame arguments are false positives—these parameters are required by the signal.signal() API signature.


14-39: LGTM!

The argument parsing, signal registration, and configuration logic are well-structured:

  • CLI options appropriately support both drain-once and continuous modes.
  • Runtime configuration via get_writer_configured allows batch size and interval tuning.
  • Summary output provides useful stats when requested.

41-47: LGTM!

The main loop correctly relies on signal handlers for termination. The infinite sleep loop is appropriate since _handle_sig will call writer.stop() and raise SystemExit, cleanly terminating the process.

nettacker/database/writer.py (6)

93-104: LGTM!

The lifecycle methods correctly manage the writer thread:

  • start() prevents duplicate thread creation.
  • stop() signals shutdown and waits with a timeout.
  • Daemon thread ensures graceful process termination.

105-124: LGTM!

The enqueue and acknowledgment methods correctly implement the deferred acknowledgment pattern:

  • Messages are serialized to JSON for litequeue.
  • _acknowledge_message() is only invoked after successful commit (addressed in processing loops).
  • Fallback logging on enqueue failure is appropriate.

125-164: LGTM!

The _pop_one() method correctly implements non-destructive pop semantics:

  • Messages are retrieved but not acknowledged immediately.
  • The message_id is returned alongside the payload for deferred acknowledgment after commit.
  • Handles various litequeue API patterns gracefully.

205-224: LGTM!

The shutdown cleanup loop correctly processes jobs one-at-a-time with immediate commit-then-acknowledge, avoiding the batch rollback issue present in the main loop. Each job's transaction is isolated, so rollback on failure only affects that single job.


266-307: LGTM!

The _apply_job() method correctly maps action types to ORM models:

  • Properly constructs Report, HostsLog, and TempEvents instances.
  • Uses json.dumps() for structured fields.
  • Logs unsupported actions.

343-361: LGTM!

The get_stats() function correctly queries queue size across both litequeue and standard queue backends, providing useful runtime observability.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
nettacker/database/writer.py (4)

84-89: Log WAL setup failures for debugging.

The silent exception handling on WAL mode setup may hide SQLite configuration issues. Consider logging the exception to aid troubleshooting.

Apply this diff:

         if engine_url.startswith("sqlite"):
             try:
                 with self.engine.connect() as conn:
                     conn.execute("PRAGMA journal_mode=WAL")
-            except Exception:
-                pass
+            except Exception as e:
+                log.debug(f"Failed to enable WAL mode: {e}")

192-194: Use log.exception for better error context.

Replace log.error with log.exception to automatically include the stack trace, which aids debugging without requiring explicit traceback formatting.

Apply this diff:

                         except Exception as e:
                             job_session.rollback()
-                            log.error(f"Failed to process job: {e}")
+                            log.exception("Failed to process job")
                             # Job is not acknowledged, so it can be retried

260-262: Use log.exception for better error context.

Replace log.error with log.exception in both exception handlers to automatically include stack traces.

Apply this diff:

                 except Exception as e:
                     job_session.rollback()
-                    log.error(f"Failed to process job during drain: {e}")
+                    log.exception("Failed to process job during drain")
                     # Job is not acknowledged, so it can be retried
                 finally:
                     job_session.close()
 
                 iterations += 1
         except Exception as e:
-            log.error(f"Error during drain operation: {e}")
+            log.exception("Error during drain operation")

Also applies to: 268-269


331-347: Consider thread safety for runtime configuration updates.

The direct attribute assignments to batch_size and interval (lines 339, 344) are read by the background thread without synchronization. While Python's GIL provides basic atomicity for simple assignments, there's no guarantee of immediate visibility or ordering. For production use, consider whether runtime reconfiguration is necessary or if settings should be fixed at initialization.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 702c9e9 and 8a2c1d1.

📒 Files selected for processing (1)
  • nettacker/database/writer.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use 4-space indents in Python code
Limit lines to 99 characters (ruff/ruff-format/isort profile=black)
Module and file names should use lower_snake_case
Function and variable names should use lower_snake_case
Class names should use PascalCase
Constants should use UPPER_SNAKE_CASE
Keep functions small and add type hints where practical

Files:

  • nettacker/database/writer.py
nettacker/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Add docstrings for public APIs in the nettacker package

Files:

  • nettacker/database/writer.py
🧬 Code graph analysis (1)
nettacker/database/writer.py (2)
nettacker/config.py (2)
  • Config (183-187)
  • as_dict (26-27)
nettacker/database/models.py (3)
  • Report (7-27)
  • HostsLog (67-100)
  • TempEvents (30-64)
🪛 Ruff (0.13.3)
nettacker/database/writer.py

45-45: Do not catch blind exception: Exception

(BLE001)


67-67: Do not catch blind exception: Exception

(BLE001)


88-89: try-except-pass detected, consider logging the exception

(S110)


88-88: Do not catch blind exception: Exception

(BLE001)


111-111: Consider moving this statement to an else block

(TRY300)


112-112: Do not catch blind exception: Exception

(BLE001)


122-123: try-except-pass detected, consider logging the exception

(S110)


122-122: Do not catch blind exception: Exception

(BLE001)


155-155: Do not catch blind exception: Exception

(BLE001)


160-160: Consider moving this statement to an else block

(TRY300)


162-162: Do not catch blind exception: Exception

(BLE001)


192-192: Do not catch blind exception: Exception

(BLE001)


194-194: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


202-202: Do not catch blind exception: Exception

(BLE001)


224-224: Do not catch blind exception: Exception

(BLE001)


226-226: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


229-230: try-except-pass detected, consider logging the exception

(S110)


229-229: Do not catch blind exception: Exception

(BLE001)


260-260: Do not catch blind exception: Exception

(BLE001)


262-262: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


268-268: Do not catch blind exception: Exception

(BLE001)


269-269: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


326-327: try-except-pass detected, consider logging the exception

(S110)


326-326: Do not catch blind exception: Exception

(BLE001)


340-341: try-except-pass detected, consider logging the exception

(S110)


340-340: Do not catch blind exception: Exception

(BLE001)


345-346: try-except-pass detected, consider logging the exception

(S110)


345-345: Do not catch blind exception: Exception

(BLE001)


361-361: Do not catch blind exception: Exception

(BLE001)


366-366: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Test Docker 26.0.0-1ubuntu.24.04noble image build
  • GitHub Check: Test Docker 26.1.4-1ubuntu.24.04noble image build
  • GitHub Check: Test Docker 27.5.0-1ubuntu.24.04noble image build
  • GitHub Check: Test Docker image
🔇 Additional comments (3)
nettacker/database/writer.py (3)

125-163: Well done: Deferred acknowledgment implemented correctly.

This method properly returns both the job data and message_id, enabling acknowledgment only after successful processing. This addresses the critical issue from the previous review about acknowledging messages before writes succeed.


175-199: Excellent: Critical session rollback issue resolved.

The code now processes each job individually with immediate commits, ensuring that a failure in one job does not roll back previously successful jobs. Message acknowledgment is correctly deferred until after the commit succeeds. This fully addresses the critical issue from the previous review.


246-265: Excellent: Individual job processing in drain_once.

The drain method correctly processes each job individually with immediate commits, matching the pattern in _run(). This ensures durability and proper message acknowledgment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not in accordance to the main architecture. You're defining a new argument parsing module which is not necessary imo. Please look into that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay will look into that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're using pytest for qa automation

# Verify the jobs were written to the database
import sqlite3

conn = sqlite3.connect(db_path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You really don't want to make actual live connections during testing. I think you should mock the database call if you're doing unit testing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants