Sync www-repopsitories #2164
Summary by CodeRabbit
Walkthrough
Adds repository Markdown content extraction, new Django management commands that update repository chunks and context using that extractor, corresponding Makefile targets, and extensive tests. Also updates the Chunk model with uniqueness constraints, logging, and duplicate handling in bulk save, plus a spelling dictionary entry.
Changes
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Assessment against linked issues
Out-of-scope changes
Possibly related PRs
Suggested labels
Suggested reviewers
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 12
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
docker-compose/local.yaml (1)
25-25: Bug: backend service sets DJANGO_REDIS_PASSWORD from DJANGO_REDIS_HOST. This sets the password to the host value whenever DJANGO_REDIS_HOST is defined.

    - DJANGO_REDIS_PASSWORD: ${DJANGO_REDIS_HOST:-nest-cache-password}
    + DJANGO_REDIS_PASSWORD: ${DJANGO_REDIS_PASSWORD:-nest-cache-password}

backend/apps/ai/models/chunk.py (1)
85-93: Handle the race with the unique constraint in update_data(). Concurrent writers can still insert the same (context, text) between the exists() check and save(), raising IntegrityError. Catch it and return None to match the method contract.

    - if save:
    -     chunk.save()
    + if save:
    +     from django.db import IntegrityError
    +     try:
    +         chunk.save()
    +     except IntegrityError:
    +         return None

backend/apps/slack/events/__init__.py (1)
1-20: Make event configuration idempotent and skip heavy imports when the Slack app is disabled. Prevents duplicate registrations and avoids importing event modules if tokens are missing.

    +_EVENTS_CONFIGURED = False
    +
     def configure_slack_events():
         """Configure Slack events after Django apps are ready."""
    -    from apps.slack.apps import SlackConfig
    -    from apps.slack.events import (
    -        app_home_opened,
    -        message_posted,
    -        team_join,
    -        url_verification,
    -    )
    -    from apps.slack.events.event import EventBase
    -    from apps.slack.events.member_joined_channel import (
    -        catch_all,
    -        contribute,
    -        gsoc,
    -        project_nest,
    -    )
    -
    -    if SlackConfig.app:
    -        EventBase.configure_events()
    +    from apps.slack.apps import SlackConfig
    +
    +    global _EVENTS_CONFIGURED
    +    if not SlackConfig.app or _EVENTS_CONFIGURED:
    +        return
    +    from apps.slack.events import (
    +        app_home_opened,
    +        message_posted,
    +        team_join,
    +        url_verification,
    +    )
    +    from apps.slack.events.event import EventBase
    +    from apps.slack.events.member_joined_channel import (
    +        catch_all,
    +        contribute,
    +        gsoc,
    +        project_nest,
    +    )
    +    EventBase.configure_events()
    +    _EVENTS_CONFIGURED = True

backend/tests/apps/ai/common/extractors/repository_test.py (1)
434-455: Log message lacks the repository context expected by tests. Tests assert that the debug log contains `test-repo`. Update the extractor logging to include the repo key/owner/path as shown above.
🧹 Nitpick comments (48)
backend/test_commands.py (1)
1-1: Avoid a stray test_ module outside tests/*. Pytest will collect this empty module, adding noise. Either remove it, rename/move it under a non-collected path (e.g., `backend/commands_testutils.py`), or move it into `tests/` with actual tests.

backend/settings/base.py (1)
145-152: Harden the RQ config: prefer the URL form and make the timeout configurable. Using a single Redis URL avoids partial env mismatches and mirrors your cache config; exposing the timeout via env helps ops tuning.

Apply within this block:

    - "default": {
    -     "HOST": REDIS_HOST,
    -     "PORT": 6379,
    -     "PASSWORD": REDIS_PASSWORD,
    -     "DEFAULT_TIMEOUT": 360,
    - }
    + "default": {
    +     # Use URL form for parity with CACHES and simpler overrides.
    +     "URL": f"redis://:{REDIS_PASSWORD}@{REDIS_HOST}:6379/0",
    +     # Make tunable via env; 360s default preserved.
    +     "DEFAULT_TIMEOUT": values.IntegerValue(
    +         environ_name="RQ_DEFAULT_TIMEOUT",
    +         default=360,
    +     ),
    + }

Optionally consider adding:
- RQ exception handlers that report to Sentry.
- Separate queues (e.g., "ai", "slack") with different timeouts.
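The optional queue split above can be sketched in django-rq's RQ_QUEUES shape; the queue names ("ai", "slack"), the RQ_DEFAULT_TIMEOUT env var, and the timeout ratios are illustrative assumptions, not values from this PR.

```python
import os

def build_rq_queues(redis_host: str, redis_password: str) -> dict:
    """Build an RQ_QUEUES mapping with URL-form connections and per-queue timeouts."""
    url = f"redis://:{redis_password}@{redis_host}:6379/0"
    default_timeout = int(os.environ.get("RQ_DEFAULT_TIMEOUT", "360"))
    return {
        "default": {"URL": url, "DEFAULT_TIMEOUT": default_timeout},
        # Long-running AI jobs get more headroom than chatty Slack jobs.
        "ai": {"URL": url, "DEFAULT_TIMEOUT": default_timeout * 4},
        "slack": {"URL": url, "DEFAULT_TIMEOUT": 120},
    }
```

Keeping the URL in one place means a single env override changes every queue's connection at once.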
backend/tests/apps/ai/common/utils_test.py (1)
108-116: Add a minimal assertion to lock the invocation count. Verifies we don't loop on failures.

      result = create_chunks_and_embeddings(
          chunk_texts=["some text"],
          context=None,
          openai_client=mock_openai_client,
      )
      mock_logger.exception.assert_called_once_with("Failed to create chunks and embeddings")
    + mock_update_data.assert_called_once()
      assert result == []
backend/apps/slack/constants.py (1)
25-80: Use a frozenset for OWASP_KEYWORDS and prune overly generic entries. Convert OWASP_KEYWORDS to a frozenset to prevent accidental mutation; remove broad terms (e.g. "security", "project", "event") to cut down on false positives; matching is already case-insensitive via text.lower(). Evaluate the false-positive rate on a representative Slack log before enabling auto-replies.

backend/apps/ai/models/chunk.py (1)
5-5: Also import IntegrityError. If you adopt the changes above, add IntegrityError to the imports.

    -from django.db import DatabaseError, models
    +from django.db import DatabaseError, IntegrityError, models

backend/tests/apps/ai/models/chunk_test.py (4)
31-42: Assert the list-clearing contract after bulk_save. If bulk_save is expected to clear the input list (as BulkSaveModel does), assert it to prevent regressions.

      with patch("apps.common.models.BulkSaveModel.bulk_save") as mock_bulk_save:
          Chunk.bulk_save(mock_chunks)
      mock_bulk_save.assert_called_once_with(Chunk, mock_chunks, fields=None)
    + assert mock_chunks == []
43-55: Mirror the clearing assertion for the fields path. Keep expectations consistent across both code paths.

      with patch("apps.common.models.BulkSaveModel.bulk_save") as mock_bulk_save:
          Chunk.bulk_save(mock_chunks, fields=fields)
      mock_bulk_save.assert_called_once_with(Chunk, mock_chunks, fields=fields)
    + assert mock_chunks == []
65-86: Remove the redundant @patch on Chunk.save. You replace Chunk with a Mock later; the decorator patch becomes moot and can confuse readers.

    -@patch("apps.ai.models.chunk.Chunk.save")
    -def test_update_data_save_with_context(self, mock_save):
    +def test_update_data_save_with_context(self):
31-55: Add a test ensuring updates aren't filtered out. Guard against regressions where chunks with ids (updates) get dropped by existence filtering.

Proposed new test:

    def test_bulk_save_includes_updates():
        c1 = Mock(spec=Chunk); c1.id = 42; c1.context = Mock(); c1.text = "t1"
        c2 = Mock(spec=Chunk); c2.id = None; c2.context = Mock(); c2.text = "t2"
        with patch("apps.ai.models.chunk.Chunk.objects.filter") as mock_filter, \
             patch("apps.common.models.BulkSaveModel.bulk_save") as mock_bulk:
            mock_filter.return_value.exists.return_value = False
            chunks = [c1, c2]
            Chunk.bulk_save(chunks)
            # c1 (update) must still be passed through
            mock_bulk.assert_called_once()
            passed = mock_bulk.call_args.args[1]
            assert c1 in passed and c2 in passed

Want me to push this into the suite?
backend/apps/ai/common/extractors/repository.py (5)
130-146: Broaden markdown coverage (common variants). Consider additional common filenames/paths to increase the hit rate on OWASP www repos (case and format variants).

     markdown_files = [
         "README.md",
    +    "README.rst",
    +    "README.adoc",
    +    "readme.md",
    +    "README",
         "CONTRIBUTING.md",
         "CODE_OF_CONDUCT.md",
         "SECURITY.md",
         "CHANGELOG.md",
         "LICENSE.md",
    +    "LICENSE",
         "docs/README.md",
         "docs/index.md",
         "docs/overview.md",
         "docs/getting-started.md",
         "docs/installation.md",
         "docs/usage.md",
         "docs/api.md",
         "docs/development.md",
         "docs/deployment.md",
    +    "index.md",
    +    "content/index.md",
    +    "content/README.md",
    +    "site/README.md",
     ]
156-160: Branch fallback for site repos. If default_branch is missing, prefer gh-pages for Pages-backed www repos before falling back to master.

    - raw_url = (
    -     f"https://raw.githubusercontent.com/{owner}/{repository.key}/"
    -     f"{repository.default_branch or 'main'}/{file_path}"
    - )
    + branch = (
    +     repository.default_branch
    +     or ("gh-pages" if getattr(repository, "has_pages", False) else None)
    +     or "master"
    + )
    + raw_url = f"https://raw.githubusercontent.com/{owner}/{repository.key}/{branch}/{file_path}"
167-169: Improve logging with context. Log which file/owner/repo failed and keep the traceback; the current message loses diagnostics.

    - except (ValueError, TypeError, OSError, ConnectionError):
    -     logger.debug("Failed to fetch")
    + except (ValueError, TypeError, OSError, ConnectionError):
    +     logger.debug(
    +         "Failed to fetch markdown file",
    +         extra={"file_path": file_path, "owner": owner, "repo": getattr(repository, "key", None)},
    +         exc_info=True,
    +     )
          continue
148-166: Surface fetched files in metadata (or drop the unused variable). Either remove fetched_files or expose it so downstream context can reference what was pulled.

      fetched_files = []
      @@
      if content and content.strip():
          fetched_files.append(file_path)
          prose_parts.append(f"## {file_path}\n\n{content}")
      @@
      return (
          DELIMITER.join(filter(None, prose_parts)),
    -     DELIMITER.join(filter(None, metadata_parts)),
    +     DELIMITER.join(
    +         filter(
    +             None,
    +             metadata_parts
    +             + (
    +                 [f"Fetched Markdown Files: {', '.join(fetched_files)}"]
    +                 if fetched_files
    +                 else []
    +             ),
    +         )
    +     ),
      )

Also applies to: 171-174
148-166: Consider rate limits and 404 noise from raw URLs. Fetching many files per repo can hit GitHub rate limits, and failed raw fetches return error bodies rather than empty content. Prefer the GitHub contents API (with If-None-Match) or add status checks in get_repository_file_content so it returns "" on non-200 responses.

I can provide an API-backed extractor variant and harden get_repository_file_content to ignore non-200 responses. Want a follow-up PR?
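As a minimal sketch of that hardening, assuming hypothetical helper names (not the repo's actual get_repository_file_content): build the raw URL separately, and return "" for any non-200 status or network error.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

def build_raw_url(owner: str, repo: str, branch: str, file_path: str) -> str:
    """Compose the raw.githubusercontent.com URL the extractor would fetch."""
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{file_path}"

def fetch_file_content(url: str, timeout: float = 10.0) -> str:
    """Return the body on HTTP 200, or "" on any error or non-200 status."""
    try:
        with urlopen(url, timeout=timeout) as response:
            if response.status != 200:
                return ""
            return response.read().decode("utf-8", errors="replace")
    except (HTTPError, URLError, TimeoutError, OSError):
        return ""
```

Keeping URL construction pure makes the branch-selection logic unit-testable without any network access.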
backend/apps/slack/admin/conversation.py (1)
30-31: Expose the new flag in filters for quicker triage. Adding the boolean to list_filter eases admin workflows without altering the data model.

Apply outside this hunk:

    @@
     list_filter = (
         "sync_messages",
         "is_archived",
         "is_channel",
         "is_general",
         "is_im",
         "is_private",
    +    "is_nest_bot_assistant_enabled",
     )
backend/apps/slack/commands/ai.py (2)
21-23: Avoid a KeyError when Slack omits text; use a safe default. Use `.get("text") or ""` to guard against missing/None values and prevent an AttributeError on `.strip()`.

    - return get_blocks(
    -     query=command["text"].strip(),
    - )
    + return get_blocks(
    +     query=(command.get("text") or "").strip(),
    + )
19-20: Document the local-import intent. Add a short comment so future readers know this is deliberate for lazy loading/cycle avoidance.

    - from apps.slack.common.handlers.ai import get_blocks
    + # Local import to avoid a circular import and reduce import-time overhead.
    + from apps.slack.common.handlers.ai import get_blocks

backend/apps/ai/Makefile (1)
37-44: Declare PHONY targets for the new commands. Prevents make from treating similarly named files as up-to-date targets.

     ai-update-repository-context:
     	@echo "Updating repository context"
     	@CMD="python manage.py ai_update_repository_context" $(MAKE) exec-backend-command

    +.PHONY: ai-update-repository-chunks ai-update-repository-context
backend/apps/ai/management/commands/ai_update_repository_context.py (1)
18-28: Eager-load FKs used by the extractor. `extract_repository_markdown_content` accesses `organization.login` and `owner.login`; add `select_related` to avoid N+1 queries during batch runs.

      return (
          super()
          .get_base_queryset()
          .filter(
              is_owasp_site_repository=True,
              is_archived=False,
              is_empty=False,
          )
    - )
    + ).select_related("organization", "owner")

backend/apps/slack/services/message_auto_reply.py (1)
33-35: Race window note (optional). A small race exists between the replies check and posting; acceptable, but consider an idempotency key or a re-check on 409s if this becomes noisy.

backend/apps/slack/common/handlers/ai.py (4)
16-19: Fix docstring: remove the undocumented parameter. The `presentation` argument isn't in the signature.

    - presentation (EntityPresentation | None): Configuration for entity presentation.
26-28: Guard against oversized Slack block text. Slack mrkdwn text is limited (~3000 chars per section). Chunk long responses into multiple blocks.

    - if ai_response:
    -     return [markdown(ai_response)]
    + if ai_response:
    +     max_len = 2900
    +     if len(ai_response) <= max_len:
    +         return [markdown(ai_response)]
    +     return [markdown(ai_response[i : i + max_len]) for i in range(0, len(ai_response), max_len)]
31-46: Make models configurable via settings. Avoid hard-coding model IDs; read them from Django settings with sane defaults.

    + from django.conf import settings
      rag_tool = RagTool(
    -     chat_model="gpt-4o",
    -     embedding_model="text-embedding-3-small",
    +     chat_model=getattr(settings, "AI_CHAT_MODEL", "gpt-4o"),
    +     embedding_model=getattr(settings, "AI_EMBEDDING_MODEL", "text-embedding-3-small"),
      )
10-10: Unused logger. Either use `logger` for diagnostics or remove it.

backend/apps/slack/MANIFEST.yaml (2)
111-124: Tighten bot scopes (least privilege). Line 123 adds channels:history, which is needed for message.channels. Line 122 (channels:manage) appears unnecessary for this feature set; drop it if unused.

      - users:read
    - - groups:write
    - - channels:manage
    + - groups:write
    + # Removed channels:manage (not required for read/AI reply workflows)
      - channels:history
131-134: Validate event coverage vs. intended surfaces. message.channels covers public channels only. If replies are desired in private channels/DMs, add message.groups and message.im plus the corresponding groups:history/im:history scopes; otherwise consider app_mention to reduce noise.
backend/apps/slack/migrations/0019_conversation_is_nest_bot_assistant_enabled.py (1)
11-17: Field OK; consider an index and the default rollout.

- If this flag is frequently filtered, add db_index=True or a model Index.
- Default False will disable the assistant for all existing conversations. Confirm that's intended, or add a data migration to enable it for selected workspaces/channels.

    - field=models.BooleanField(default=False, verbose_name="Is Nest Bot Assistant Enabled"),
    + field=models.BooleanField(
    +     default=False,
    +     verbose_name="Is Nest Bot Assistant Enabled",
    +     db_index=True,
    + ),

backend/apps/ai/management/commands/ai_update_repository_chunks.py (1)
18-28: Avoid N+1 queries on owner/org during extraction. The extractor touches repository.organization/owner; select_related prevents per-row FK hits.

     def get_base_queryset(self) -> QuerySet:
         """Return the base queryset with filtering for OWASP site repositories."""
         return (
             super()
             .get_base_queryset()
    +        .select_related("organization", "owner")
             .filter(
                 is_owasp_site_repository=True,
                 is_archived=False,
                 is_empty=False,
             )
         )
backend/tests/apps/ai/management/commands/ai_update_repository_chunks_test.py (3)
45-47: Fix pluralization to "repositories". The current expectation hardcodes a misspelling ("repositorys") and bakes it into the UX.

    - assert command.entity_name_plural == "repositorys"
    + assert command.entity_name_plural == "repositories"

Outside this file, consider deriving the plural from model meta to avoid manual pluralization:

    # In BaseAICommand (or similar)
    @property
    def entity_name_plural(self) -> str:
        return getattr(self.model_class._meta, "verbose_name_plural", f"{self.entity_name}s")
67-82: The patch target is brittle. Patching `command.__class__.__bases__[0].get_base_queryset` relies on MRO order. Patch the concrete base symbol instead.

    - with patch.object(command.__class__.__bases__[0], "get_base_queryset") as mock_super:
    + from apps.ai.common.base.chunk_command import BaseChunkCommand
    + with patch.object(BaseChunkCommand, "get_base_queryset") as mock_super:
104-119: Remove duplicated test logic. This repeats test_get_base_queryset; drop it to reduce noise.

    - def test_queryset_filtering_logic(self, command):
    -     """Test that the queryset filtering logic works correctly."""
    -     with patch.object(command.__class__.__bases__[0], "get_base_queryset") as mock_super:
    -         mock_queryset = Mock()
    -         mock_super.return_value = mock_queryset
    -         mock_queryset.filter.return_value = "filtered_queryset"
    -
    -         result = command.get_base_queryset()
    -
    -         mock_queryset.filter.assert_called_once_with(
    -             is_owasp_site_repository=True,
    -             is_archived=False,
    -             is_empty=False,
    -         )
    -         assert result == "filtered_queryset"
    + # Removed: duplicate of test_get_base_queryset

backend/tests/apps/slack/common/handlers/ai_test.py (2)
75-83: Avoid hard-coding model IDs; read them from settings. Make chat/embedding model names configurable; update the test to assert against settings-derived values.

    - mock_rag_tool.assert_called_once_with(
    -     chat_model="gpt-4o",
    -     embedding_model="text-embedding-3-small",
    - )
    + from django.conf import settings
    + mock_rag_tool.assert_called_once_with(
    +     chat_model=getattr(settings, "SLACK_AI_CHAT_MODEL", "gpt-4o"),
    +     embedding_model=getattr(settings, "SLACK_AI_EMBEDDING_MODEL", "text-embedding-3-small"),
    + )

Outside this file (handler):

    from django.conf import settings

    rag_tool = RagTool(
        chat_model=getattr(settings, "SLACK_AI_CHAT_MODEL", "gpt-4o"),
        embedding_model=getattr(settings, "SLACK_AI_EMBEDDING_MODEL", "text-embedding-3-small"),
    )
132-143: Add a guard for blank-only queries. Avoid calling the RAG pipeline when the stripped query is empty; return error blocks directly, and add a test.

    + def test_get_blocks_with_blank_query(self):
    +     with patch("apps.slack.common.handlers.ai.get_error_blocks") as mock_err:
    +         mock_err.return_value = [{"type": "section"}]
    +         with patch("apps.slack.common.handlers.ai.process_ai_query") as mock_proc:
    +             result = get_blocks("   ")
    +             mock_proc.assert_not_called()
    +             assert result == [{"type": "section"}]

backend/tests/apps/ai/management/commands/ai_update_repository_context_test.py (2)
67-83: Replace brittle base-class patching with import-path patching. Using `__bases__[0]` is fragile. Patch the known symbol instead.

    - with patch.object(command.__class__.__bases__[0], "get_base_queryset") as mock_super:
    + with patch("apps.ai.common.base.context_command.BaseContextCommand.get_base_queryset") as mock_super:
          mock_queryset = Mock()
          mock_super.return_value = mock_queryset
          mock_queryset.filter.return_value = "filtered_queryset"

Repeat the same replacement in the other tests that patch the base queryset.

Also applies to: 104-119, 127-141, 142-156, 157-171
127-171: DRY up the three queryset-filter tests. They assert the same filter arguments; parametrize them into one test to reduce noise and speed up the suite.

If helpful, I can provide a parametrized version.
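A sketch of that collapse, using only stdlib mocks: one shared assertion body driven by a table of cases. The lambda stand-ins for the command methods are hypothetical; in the real suite pytest.mark.parametrize would supply the cases instead of the loop.

```python
from unittest.mock import Mock

# Filter kwargs taken from the command's queryset filters in this PR.
EXPECTED_FILTERS = {
    "is_owasp_site_repository": True,
    "is_archived": False,
    "is_empty": False,
}

def assert_repository_filters(get_queryset) -> None:
    """Shared assertion body for each parametrized case."""
    mock_queryset = Mock()
    mock_queryset.filter.return_value = "filtered_queryset"
    assert get_queryset(mock_queryset) == "filtered_queryset"
    mock_queryset.filter.assert_called_once_with(**EXPECTED_FILTERS)

# Hypothetical stand-ins for the three command methods under test:
for _ in range(3):
    assert_repository_filters(lambda qs: qs.filter(**EXPECTED_FILTERS))
```

The payoff is that a change to the expected filters is edited in exactly one place.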
backend/tests/apps/slack/services/message_auto_reply_test.py (3)
48-96: Also assert the retrieval call contract. Validate that we call Message.objects.get with pk for added confidence.

      generate_ai_reply_if_unanswered(mock_message.id)
    + mock_message_get.assert_called_once_with(pk=mock_message.id)
      mock_client.conversations_replies.assert_called_once_with(
          channel=mock_message.conversation.slack_channel_id,
          ts=mock_message.slack_message_id,
          limit=1,
      )
97-106: Add an assertion for the not-found path. Confirms the correct lookup argument.

      generate_ai_reply_if_unanswered(99999)
    + mock_message_get.assert_called_once_with(pk=99999)
172-211: Merge the duplicate "no/empty AI response" tests via parametrization. Reduces repetition while preserving coverage.

    - @patch("apps.slack.services.message_auto_reply.Message.objects.get")
    - @patch("apps.slack.services.message_auto_reply.WebClient")
    - @patch("apps.slack.services.message_auto_reply.process_ai_query")
    - def test_generate_ai_reply_no_ai_response(
    -     self,
    -     mock_process_ai_query,
    -     mock_webclient,
    -     mock_message_get,
    -     mock_message,
    - ):
    -     """Test when AI doesn't return a response."""
    -     mock_message_get.return_value = mock_message
    -     mock_client = Mock()
    -     mock_webclient.return_value = mock_client
    -     mock_client.conversations_replies.return_value = {"messages": [{"reply_count": 0}]}
    -     mock_process_ai_query.return_value = None
    -
    -     generate_ai_reply_if_unanswered(mock_message.id)
    -
    -     mock_client.chat_postMessage.assert_not_called()
    -
    - @patch("apps.slack.services.message_auto_reply.Message.objects.get")
    - @patch("apps.slack.services.message_auto_reply.WebClient")
    - @patch("apps.slack.services.message_auto_reply.process_ai_query")
    - def test_generate_ai_reply_empty_response(
    -     self,
    -     mock_process_ai_query,
    -     mock_webclient,
    -     mock_message_get,
    -     mock_message,
    - ):
    -     """Test when AI returns empty response."""
    -     mock_message_get.return_value = mock_message
    -     mock_client = Mock()
    -     mock_webclient.return_value = mock_client
    -     mock_client.conversations_replies.return_value = {"messages": [{"reply_count": 0}]}
    -     mock_process_ai_query.return_value = ""
    -
    -     generate_ai_reply_if_unanswered(mock_message.id)
    -
    -     mock_client.chat_postMessage.assert_not_called()
    + @pytest.mark.parametrize("ai_response", [None, ""])
    + @patch("apps.slack.services.message_auto_reply.Message.objects.get")
    + @patch("apps.slack.services.message_auto_reply.WebClient")
    + @patch("apps.slack.services.message_auto_reply.process_ai_query")
    + def test_generate_ai_reply_no_or_empty_ai_response(
    +     self, mock_process_ai_query, mock_webclient, mock_message_get, mock_message, ai_response
    + ):
    +     """Test when AI returns no/empty response."""
    +     mock_message_get.return_value = mock_message
    +     mock_client = Mock()
    +     mock_webclient.return_value = mock_client
    +     mock_client.conversations_replies.return_value = {"messages": [{"reply_count": 0}]}
    +     mock_process_ai_query.return_value = ai_response
    +
    +     generate_ai_reply_if_unanswered(mock_message.id)
    +
    +     mock_client.chat_postMessage.assert_not_called()
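Relatedly, the race-window note on message_auto_reply.py above could be closed with an idempotency claim keyed by the message ts. A minimal in-process sketch is shown here; production would use a shared store (e.g. Redis SETNX or Django cache.add), and all names are illustrative, not from this PR.

```python
import threading

class IdempotencyGuard:
    """First claim(key) wins; later claims for the same key return False."""

    def __init__(self) -> None:
        self._seen: set[str] = set()
        self._lock = threading.Lock()

    def claim(self, key: str) -> bool:
        with self._lock:
            if key in self._seen:
                return False
            self._seen.add(key)
            return True

def maybe_post_reply(guard: IdempotencyGuard, message_ts: str, post) -> bool:
    """Post at most once per thread, even if two workers pass the replies check."""
    if not guard.claim(f"ai-reply:{message_ts}"):
        return False
    post()
    return True
```

Claiming the key before posting (rather than after) is what makes the second racing worker back off.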
18-23
: Avoid brittle assertions on pattern countsAsserting exact counts for
question_patterns
/compiled_patterns
couples tests to implementation details.- assert len(detector.question_patterns) == 4 - assert len(detector.compiled_patterns) == 4 + assert len(detector.question_patterns) >= 4 + assert len(detector.compiled_patterns) >= 4
49-64
: Confirm intent: treating “advice/recommend” statements as questionsTests expect declaratives like “I recommend this approach” to be flagged as questions. If that’s not intended, constrain these patterns (e.g., require a question mark or leading question terms).
144-156: Reduce reliance on pattern order. Index-based checks (`compiled_patterns[0/1]`) are fragile. Prefer matching by pattern content or behavior.

    - question_mark_pattern = detector.compiled_patterns[0]
    - assert question_mark_pattern.search("Is this right?") is not None
    + assert any(p.pattern == r"\?" and p.search("Is this right?") for p in detector.compiled_patterns)

    - question_word_pattern = detector.compiled_patterns[1]
    - assert question_word_pattern.search("What is this") is not None
    - assert question_word_pattern.search("How does it work") is not None
    + expected = r"^(what|how|why|when|where|which|who|can|could|would|should|is|are|does|do|did)"
    + assert any(p.pattern == expected and p.search("What is this") for p in detector.compiled_patterns)
    + assert any(p.pattern == expected and p.search("How does it work") for p in detector.compiled_patterns)

backend/apps/slack/common/question_detector.py (3)
16-19: Normalize keywords to lowercase once. Ensures case-insensitive matching regardless of how `OWASP_KEYWORDS` is defined.

    - self.owasp_keywords = OWASP_KEYWORDS
    + self.owasp_keywords = {k.lower() for k in OWASP_KEYWORDS}
20-25: Tighten the regex to reduce false positives. Add word boundaries and allow optional leading whitespace for question starters; this keeps behavior but trims accidental matches like "whatsoever".

    - r"^(what|how|why|when|where|which|who|can|could|would|should|is|are|does|do|did)",
    - r"(help|explain|tell me|show me|guide|tutorial|example)",
    - r"(recommend|suggest|advice|opinion)",
    + r"^\s*\b(what|how|why|when|where|which|who|can|could|would|should|is|are|does|do|did)\b",
    + r"\b(help|explain|tell me|show me|guide|tutorial|example)\b",
    + r"\b(recommend|suggest|advice|opinion)\b",
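The effect of the added boundaries can be checked directly; a self-contained comparison, with the starter alternation abbreviated to three words:

```python
import re

# Abbreviated versions of the old and tightened starter patterns.
old_pattern = re.compile(r"^(what|how|why)", re.IGNORECASE)
new_pattern = re.compile(r"^\s*\b(what|how|why)\b", re.IGNORECASE)

assert old_pattern.search("whatsoever happens") is not None  # false positive
assert new_pattern.search("whatsoever happens") is None      # trimmed by \b
assert new_pattern.search("  What is this") is not None      # leading spaces OK
```

The trailing `\b` fails between "what" and "soever" because both sides are word characters, which is exactly the overmatch being removed.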
48-57: Boundary-aware phrase matching for multi-word keywords. Substring matching can overmatch; consider a boundary-aware search that ignores punctuation/hyphens.

    # Example enhancement:
    def contains_owasp_keywords(self, text: str) -> bool:
        words = re.findall(r"\b\w+\b", text)
        text_words = set(words)
        if self.owasp_keywords.intersection(text_words):
            return True
        # Multi-word phrases: collapse whitespace/punctuation and match by tokens
        normalized = re.sub(r"[\s\-_]+", " ", text)
        for keyword in self.owasp_keywords:
            if " " in keyword and re.search(rf"\b{re.escape(keyword)}\b", normalized):
                return True
        return False

backend/tests/apps/slack/commands/ai_test.py (1)
13-17: Remove the unused autouse fixture or reuse it. You recreate `Ai()` in tests; either reuse `self.ai_command` or drop the autouse fixture.

    - @pytest.fixture(autouse=True)
    - def setup_method(self):
    -     """Set up test data before each test method."""
    -     self.ai_command = Ai()
    + @pytest.fixture
    + def ai_command(self):
    +     """Ai instance per test."""
    +     return Ai()

Then use `ai_command` in the tests.

backend/tests/apps/ai/common/extractors/repository_test.py (1)
397-405: Use call.args for robust URL assertions. Stringifying `call` is brittle; inspect the first positional arg instead.

    - assert any(
    -     "https://raw.githubusercontent.com/test-org/test-repo/develop/" in str(call)
    -     for call in mock_get_content.call_args_list
    - )
    + assert any(
    +     "https://raw.githubusercontent.com/test-org/test-repo/develop/" in c.args[0]
    +     for c in mock_get_content.call_args_list
    + )

backend/tests/apps/slack/events/message_posted_test.py (2)
95-123: Add a case for mid-message mentions (optional). If mentions anywhere should trigger an AI reply, add a test where the mention is not at the start (e.g., "Hi <@u987654> what is OWASP?") and confirm the stripped query is correct. If only leading mentions should trigger, assert the non-trigger path instead.
95-123: Consider guarding against None blocks from get_blocks. If `get_blocks` can return None, `chat_postMessage` may fail. Either assert in tests that we skip posting when None, or handle it in the event handler.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
backend/poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (36)
backend/apps/ai/Makefile (1 hunks)
backend/apps/ai/common/extractors/repository.py (1 hunks)
backend/apps/ai/management/commands/ai_update_repository_chunks.py (1 hunks)
backend/apps/ai/management/commands/ai_update_repository_context.py (1 hunks)
backend/apps/ai/models/chunk.py (3 hunks)
backend/apps/slack/MANIFEST.yaml (3 hunks)
backend/apps/slack/admin/conversation.py (1 hunks)
backend/apps/slack/apps.py (1 hunks)
backend/apps/slack/commands/__init__.py (1 hunks)
backend/apps/slack/commands/ai.py (1 hunks)
backend/apps/slack/common/handlers/ai.py (1 hunks)
backend/apps/slack/common/question_detector.py (1 hunks)
backend/apps/slack/constants.py (1 hunks)
backend/apps/slack/events/__init__.py (1 hunks)
backend/apps/slack/events/message_posted.py (1 hunks)
backend/apps/slack/migrations/0019_conversation_is_nest_bot_assistant_enabled.py (1 hunks)
backend/apps/slack/models/conversation.py (1 hunks)
backend/apps/slack/services/__init__.py (1 hunks)
backend/apps/slack/services/message_auto_reply.py (1 hunks)
backend/apps/slack/templates/commands/ai.jinja (1 hunks)
backend/pyproject.toml (1 hunks)
backend/settings/base.py (2 hunks)
backend/settings/urls.py (1 hunks)
backend/test_commands.py (1 hunks)
backend/tests/apps/ai/common/extractors/repository_test.py (1 hunks)
backend/tests/apps/ai/common/utils_test.py (1 hunks)
backend/tests/apps/ai/management/commands/ai_update_repository_chunks_test.py (1 hunks)
backend/tests/apps/ai/management/commands/ai_update_repository_context_test.py (1 hunks)
backend/tests/apps/ai/models/chunk_test.py (4 hunks)
backend/tests/apps/slack/commands/ai_test.py (1 hunks)
backend/tests/apps/slack/common/handlers/ai_test.py (1 hunks)
backend/tests/apps/slack/common/question_detector_test.py (1 hunks)
backend/tests/apps/slack/events/message_posted_test.py (1 hunks)
backend/tests/apps/slack/services/message_auto_reply_test.py (1 hunks)
cspell/custom-dict.txt (2 hunks)
docker-compose/local.yaml (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (20)
backend/apps/slack/apps.py (2)
- backend/apps/mentorship/apps.py (1): ready (12-14)
- backend/apps/slack/events/__init__.py (1): configure_slack_events (1-19)
backend/apps/slack/common/handlers/ai.py (2)
- backend/apps/ai/agent/tools/rag/rag_tool.py (2): RagTool (13-63), query (34-63)
- backend/apps/slack/blocks.py (1): markdown (21-34)
backend/tests/apps/slack/common/handlers/ai_test.py (2)
- backend/apps/slack/common/handlers/ai.py (2): get_error_blocks (49-61), process_ai_query (31-46)
- backend/apps/ai/agent/tools/rag/rag_tool.py (1): query (34-63)
backend/apps/ai/management/commands/ai_update_repository_context.py (4)
- backend/apps/ai/common/base/context_command.py (1): BaseContextCommand (9-54)
- backend/apps/ai/common/extractors/repository.py (1): extract_repository_markdown_content (114-174)
- backend/apps/github/models/repository.py (1): Repository (25-349)
- backend/apps/ai/management/commands/ai_update_repository_chunks.py (5): Command (10-36), extract_content (14-16), get_base_queryset (18-28), get_default_queryset (30-32), source_name (34-36)
backend/tests/apps/ai/management/commands/ai_update_repository_context_test.py (3)
- backend/apps/ai/management/commands/ai_update_repository_context.py (5): Command (10-36), source_name (34-36), extract_content (14-16), get_base_queryset (18-28), get_default_queryset (30-32)
- backend/apps/ai/common/base/context_command.py (1): BaseContextCommand (9-54)
- backend/apps/github/models/repository.py (1): Repository (25-349)
backend/apps/slack/commands/ai.py (2)
- backend/apps/slack/commands/command.py (1): CommandBase (20-157)
- backend/apps/slack/common/handlers/ai.py (1): get_blocks (13-28)
backend/apps/slack/events/message_posted.py (7)
- backend/apps/slack/common/handlers/ai.py (1): get_blocks (13-28)
- backend/apps/slack/common/question_detector.py (2): QuestionDetector (13-57), is_owasp_question (31-42)
- backend/apps/slack/events/event.py (1): EventBase (23-265)
- backend/apps/slack/models/conversation.py (2): Conversation (15-107), update_data (85-107)
- backend/apps/slack/models/member.py (1): Member (10-78)
- backend/apps/slack/models/message.py (2): Message (15-160), text (83-85)
- backend/apps/slack/services/message_auto_reply.py (1): generate_ai_reply_if_unanswered (16-48)
backend/apps/slack/common/question_detector.py (1)
- backend/apps/slack/models/message.py (1): text (83-85)
backend/tests/apps/slack/services/message_auto_reply_test.py (4)
- backend/apps/slack/models/conversation.py (1): Conversation (15-107)
- backend/apps/slack/models/message.py (3): Message (15-160), text (83-85), ts (88-90)
- backend/apps/slack/models/workspace.py (2): Workspace (11-44), bot_token (37-44)
- backend/apps/slack/services/message_auto_reply.py (1): generate_ai_reply_if_unanswered (16-48)
backend/tests/apps/ai/management/commands/ai_update_repository_chunks_test.py (3)
- backend/apps/ai/management/commands/ai_update_repository_chunks.py (5): Command (10-36), source_name (34-36), extract_content (14-16), get_base_queryset (18-28), get_default_queryset (30-32)
- backend/apps/ai/common/base/chunk_command.py (1): BaseChunkCommand (12-91)
- backend/apps/github/models/repository.py (1): Repository (25-349)
backend/apps/slack/services/message_auto_reply.py (4)
backend/apps/slack/common/handlers/ai.py (1)
process_ai_query
(31-46)backend/apps/slack/models/message.py (3)
Message
(15-160)ts
(88-90)text
(83-85)backend/apps/slack/models/workspace.py (1)
bot_token
(37-44)backend/apps/ai/agent/tools/rag/rag_tool.py (1)
query
(34-63)
backend/apps/ai/common/extractors/repository.py (1)
backend/apps/github/utils.py (1)
get_repository_file_content
(60-79)
backend/apps/slack/events/__init__.py (2)
backend/apps/slack/apps.py (1)
SlackConfig
(13-33)backend/apps/slack/events/event.py (2)
EventBase
(23-265)configure_events
(30-37)
backend/tests/apps/ai/models/chunk_test.py (3)
backend/tests/apps/ai/common/base/chunk_command_test.py (2)
mock_chunks
(72-81)mock_context
(51-60)backend/apps/ai/models/chunk.py (2)
Chunk
(16-93)bulk_save
(34-53)backend/tests/apps/ai/common/base/context_command_test.py (1)
mock_context
(48-55)
backend/apps/ai/management/commands/ai_update_repository_chunks.py (4)
backend/apps/ai/common/base/chunk_command.py (1)
BaseChunkCommand
(12-91)backend/apps/ai/common/extractors/repository.py (1)
extract_repository_markdown_content
(114-174)backend/apps/github/models/repository.py (1)
Repository
(25-349)backend/apps/ai/management/commands/ai_update_repository_context.py (5)
Command
(10-36)extract_content
(14-16)get_base_queryset
(18-28)get_default_queryset
(30-32)source_name
(34-36)
backend/tests/apps/slack/common/question_detector_test.py (2)
backend/apps/slack/common/question_detector.py (4)
QuestionDetector
(13-57)is_question
(44-46)contains_owasp_keywords
(48-57)is_owasp_question
(31-42)backend/tests/apps/slack/events/message_posted_test.py (1)
test_init
(44-48)
backend/tests/apps/slack/events/message_posted_test.py (3)
backend/apps/slack/events/message_posted.py (1)
MessagePosted
(17-77)backend/apps/slack/models/conversation.py (1)
Conversation
(15-107)backend/apps/slack/models/member.py (1)
Member
(10-78)
backend/apps/ai/models/chunk.py (3)
backend/apps/ai/models/context.py (1)
Context
(15-61)backend/apps/common/models.py (2)
BulkSaveModel
(10-34)TimestampedModel
(37-46)backend/apps/common/utils.py (1)
truncate
(164-176)
backend/tests/apps/slack/commands/ai_test.py (2)
backend/apps/slack/commands/ai.py (1)
Ai
(6-23)backend/apps/slack/commands/command.py (1)
CommandBase
(20-157)
backend/tests/apps/ai/common/extractors/repository_test.py (1)
backend/apps/ai/common/extractors/repository.py (2)
extract_repository_content
(11-111)extract_repository_markdown_content
(114-174)
🔇 Additional comments (26)

backend/pyproject.toml (1)

`22-22`: Manually verify django-rq compatibility with Python 3.13 and Django 5.1

Poetry wasn’t available to run the compatibility checks; please confirm at runtime that django-rq 3.1 (and its rq/redis transitive dependencies) works under Python 3.13 and Django 5.1, and pin rq/redis in pyproject.toml if needed.

cspell/custom-dict.txt (1)

`109-109`: Dictionary additions OK

“rqworker” and “webgoat” entries make sense for this repo’s domain.

Also applies to: 125-125

backend/apps/slack/services/__init__.py (1)

`1-1`: Package initializer LGTM

Clear, minimal package docstring. No issues.

backend/settings/urls.py (1)

`31-31`: Django-RQ dashboard is staff-only by default

An unconditional `path("django-rq/", include("django_rq.urls"))` is safe since django-rq v3.1+ enforces staff-only access on its dashboard views.

backend/settings/base.py (1)

`52-53`: Django RQ app registration looks correct.

Adding "django_rq" to THIRD_PARTY_APPS is appropriate and consistent with how INSTALLED_APPS is composed.

backend/tests/apps/ai/common/utils_test.py (1)

`97-107`: Decorator order and side-effect wiring are correct.

Patching order matches argument order; the AttributeError simulates the None-context path as intended.

backend/apps/ai/models/chunk.py (2)

`34-53`: No downstream code depends on `chunks` being cleared.

`Chunk.bulk_save` is only called in `chunk_command.py` (where the list isn’t used after the call) and in tests (which don’t assert on the list), so the updated clear-only-when-empty logic doesn’t break any callers.

`19-23`: Replace deprecated `unique_together` with a `UniqueConstraint` and plan a safe migration

```diff
 class Meta:
     db_table = "ai_chunks"
     verbose_name = "Chunk"
-    unique_together = ("context", "text")
+    constraints = [
+        models.UniqueConstraint(
+            fields=("context", "text"),
+            name="uq_ai_chunks_context_text",
+        )
+    ]
```

Add a data migration to dedupe existing rows before enforcing the constraint and use `CREATE INDEX CONCURRENTLY` on Postgres to avoid downtime.

backend/tests/apps/ai/models/chunk_test.py (1)

`101-104`: LGTM on direct context-based existence checks.

The assertions correctly verify the switch to `context=mock_context, text=text`.

Also applies to: 125-128, 148-150

backend/apps/ai/common/extractors/repository.py (1)

`108-111`: LGTM on structured repository metadata extraction.

The prose/metadata split and DELIMITER usage align with the RAG pipeline.

backend/apps/slack/commands/__init__.py (1)

`5-5`: LGTM: /ai command is now exported/loaded

Importing `ai` ensures command discovery via `CommandBase.configure_commands()`.

backend/apps/slack/models/conversation.py (1)

`30-32`: LGTM: opt-in feature flag on Conversation

Good that `from_slack()` does not overwrite this local flag.

backend/apps/ai/management/commands/ai_update_repository_context.py (1)

`30-32`: Upstream `get_base_queryset` has no `is_active` filter

`BaseAICommand.get_base_queryset` returns `.objects.all()` with no implicit filters, so overriding `get_default_queryset` correctly avoids filtering on the nonexistent `is_active` field.

backend/apps/slack/MANIFEST.yaml (1)

`98-103`: /ai slash command looks correct.

URL, usage, and escaping align with the existing pattern. Ensure the handler responds within 3s to Slack’s slash command ACK.

backend/apps/slack/migrations/0019_conversation_is_nest_bot_assistant_enabled.py (1)

`1-1`: Django version compatibility confirmed

The project’s `pyproject.toml` declares `django = "^5.1"` (i.e. ≥5.1,<6.0), which includes Django 5.2.5, so the migration header is consistent with the dependency range.

backend/apps/ai/management/commands/ai_update_repository_chunks.py (2)

`14-17`: Good: delegates extraction cleanly.

`34-36`: Consistent `source_name`.

Matches context command; good for downstream filters.

backend/tests/apps/slack/services/message_auto_reply_test.py (1)

`17-43`: Fixtures are clean and focused. LGTM.

backend/tests/apps/slack/common/question_detector_test.py (2)

`170-186`: Good real-world coverage

Nice parameterized cases that exercise diverse OWASP queries.

`188-196`: Initialization patching looks solid

Mocking `OWASP_KEYWORDS` verifies configurability without leaking global state.

backend/tests/apps/slack/commands/ai_test.py (2)

`18-42`: Patch target and whitespace trimming behavior look correct

Verifies the local import usage and exact query passed to `get_blocks`.

`127-140`: Exception propagation test is appropriate

Confirms `render_blocks` does not swallow errors, leaving the caller to handle them.

backend/tests/apps/ai/common/extractors/repository_test.py (1)

`241-256`: Baseline behavior verified

Description is included; metadata intentionally empty for markdown extraction. Looks good.

backend/tests/apps/slack/events/message_posted_test.py (3)

`95-123`: Bot-mention flow is well covered

Auth caching, mention stripping, and block rendering checks are solid.

`210-292`: OWASP question happy-path coverage looks good

Validates DB updates and delayed enqueue to the default queue.

`348-375`: Whitespace-trim behavior verified

Ensures query cleanup before rendering blocks. Nice.
```python
def test_extract_repository_markdown_content_file_fetch_exception(self, mock_get_content):
    """Test extraction when file fetching raises an exception."""
    repository = MagicMock()
    repository.description = "Test repository"
    repository.organization = MagicMock(login="test-org")
    repository.key = "test-repo"
    repository.default_branch = "main"

    mock_get_content.side_effect = Exception("Network error")

    prose, metadata = extract_repository_markdown_content(repository)

    assert "Repository Description: Test repository" in prose
    assert metadata == ""
```
Mismatch: extractor doesn’t catch generic exceptions

`extract_repository_markdown_content` currently catches a narrow set; these tests raise `Exception`, which will bubble and fail. Broaden exception handling in the extractor and keep going.
Proposed change in apps/ai/common/extractors/repository.py:
```python
# inside the `for file_path in markdown_files` loop
raw_url = f"https://raw.githubusercontent.com/{owner}/{repository.key}/{repository.default_branch or 'main'}/{file_path}"
try:
    content = get_repository_file_content(raw_url)
except Exception as exc:  # broad by design; network/parsing can vary
    logger.debug(
        "Failed to fetch %s for %s/%s from branch %s: %s",
        file_path,
        owner or "(unknown owner)",
        repository.key or "(no key)",
        repository.default_branch or "main",
        exc,
    )
    continue
```
🤖 Prompt for AI Agents
In backend/tests/apps/ai/common/extractors/repository_test.py around lines 340
to 354, the test triggers a generic Exception when fetching file content but the
extractor only catches narrow exceptions; update
apps/ai/common/extractors/repository.py inside the markdown_files loop to wrap
the get_repository_file_content call in a try/except that catches Exception, log
a debug message including file_path, owner (or "(unknown owner)"),
repository.key (or "(no key)"), branch (or "main") and the exception, then
continue to the next file so extraction proceeds without raising.
backend/tests/apps/ai/management/commands/ai_update_repository_context_test.py
6439df7 to b198ecf
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/apps/ai/models/chunk.py (1)
`85-93`: update_data has a race on duplicates; rely on the DB constraint

The pre-check can race; catch IntegrityError instead and return None.
```diff
-        if Chunk.objects.filter(context=context, text=text).exists():
-            return None
-
-        chunk = Chunk(text=text, embedding=embedding, context=context)
-
-        if save:
-            chunk.save()
-
-        return chunk
+        chunk = Chunk(text=text, embedding=embedding, context=context)
+        if save:
+            try:
+                chunk.save()
+            except Exception as exc:  # narrowed below via local import
+                from django.db import IntegrityError
+                if isinstance(exc, IntegrityError):
+                    # Duplicate per unique (context, text); treat as no-op
+                    logger.debug(
+                        "Chunk already exists for context_id=%s; skipping",
+                        getattr(context, "id", None),
+                    )
+                    return None
+                raise
+        return chunk
```

If you prefer top-level imports, add:
```diff
-from django.db import DatabaseError, models
+from django.db import DatabaseError, IntegrityError, models
```
♻️ Duplicate comments (2)
backend/apps/ai/models/chunk.py (1)
`35-53`: bulk_save drops updates, does N+1 queries, and breaks the clear-list contract
- Updates are filtered out (ids present) if an identical (context, text) exists.
- Per-object .exists() causes O(N) queries.
- Passing a new list to BulkSaveModel means the caller’s list isn’t cleared unless no uniques remain.
- Duplicate races aren’t handled; rely on the DB unique constraint and catch IntegrityError.
Replace the body with a single-probe, race-safe implementation that preserves the contract.
```diff
 @staticmethod
 def bulk_save(chunks, fields=None):
-    """Bulk save chunks with duplicate handling."""
-    unique_chunks = []
-    unique_chunks = [
-        chunk
-        for chunk in chunks
-        if not Chunk.objects.filter(context=chunk.context, text=chunk.text).exists()
-    ]
-
-    if unique_chunks:
-        try:
-            BulkSaveModel.bulk_save(Chunk, unique_chunks, fields=fields)
-        except (ValueError, TypeError, DatabaseError):
-            for chunk in unique_chunks:
-                try:
-                    chunk.save()
-                except (ValueError, TypeError, DatabaseError) as save_error:
-                    logger.exception("Failed to save chunk", exc_info=save_error)
-    else:
-        chunks.clear()
+    """Bulk save chunks with duplicate handling and minimal queries."""
+    if not chunks:
+        return
+
+    # Partition: updates vs. creates
+    to_update = [c for c in chunks if getattr(c, "id", None)]
+    to_create = [c for c in chunks if not getattr(c, "id", None)]
+
+    # In-memory dedupe among creates by (context_id, text)
+    seen: set[tuple[int | None, str]] = set()
+    deduped_create = []
+    for c in to_create:
+        ctx_id = getattr(c, "context_id", None) or getattr(getattr(c, "context", None), "id", None)
+        key = (ctx_id, c.text)
+        if key in seen:
+            continue
+        seen.add(key)
+        deduped_create.append(c)
+
+    # One DB probe for existing keys among incoming creates
+    ctx_ids = {getattr(c, "context_id", None) or getattr(getattr(c, "context", None), "id", None) for c in deduped_create}
+    texts = {c.text for c in deduped_create}
+    existing_keys: set[tuple[int, str]] = set()
+    if ctx_ids and texts:
+        existing_keys = set(
+            Chunk.objects.filter(context_id__in=ctx_ids, text__in=texts)
+            .values_list("context_id", "text")
+        )
+
+    final_create = []
+    for c in deduped_create:
+        ctx_id = getattr(c, "context_id", None) or getattr(getattr(c, "context", None), "id", None)
+        if (ctx_id, c.text) not in existing_keys:
+            final_create.append(c)
+
+    unique_chunks = to_update + final_create
+    if not unique_chunks:
+        chunks.clear()
+        return
+
+    try:
+        # Maintain contract: operate on caller list so BulkSaveModel clears it
+        chunks[:] = unique_chunks
+        BulkSaveModel.bulk_save(Chunk, chunks, fields=fields)
+    except (ValueError, TypeError, DatabaseError) as exc:
+        # Fallback: per-object saves with race-safe duplicate handling
+        from django.db import IntegrityError  # local import to avoid global if unused
+        for obj in unique_chunks:
+            try:
+                if getattr(obj, "id", None) and fields:
+                    obj.save(update_fields=fields)
+                else:
+                    obj.save()
+            except IntegrityError:
+                # Duplicate create due to race/uniqueness; ignore
+                continue
+            except (ValueError, TypeError, DatabaseError):
+                logger.exception("Failed to save chunk id=%s", getattr(obj, "id", None))
```
clears it + chunks[:] = unique_chunks + BulkSaveModel.bulk_save(Chunk, chunks, fields=fields) + except (ValueError, TypeError, DatabaseError) as exc: + # Fallback: per-object saves with race-safe duplicate handling + from django.db import IntegrityError # local import to avoid global if unused + for obj in unique_chunks: + try: + if getattr(obj, "id", None) and fields: + obj.save(update_fields=fields) + else: + obj.save() + except IntegrityError: + # Duplicate create due to race/uniqueness; ignore + continue + except (ValueError, TypeError, DatabaseError): + logger.exception("Failed to save chunk id=%s", getattr(obj, "id", None))backend/tests/apps/ai/common/extractors/repository_test.py (1)
`340-354`: Add a test for generic exceptions from the fetcher

Past feedback asked to catch generic Exception in the extractor; add a case that raises Exception to ensure we don’t break on unexpected errors.
```diff
 @patch("apps.ai.common.extractors.repository.get_repository_file_content")
 def test_extract_repository_markdown_content_file_fetch_exception(self, mock_get_content):
     """Test extraction when file fetching raises an exception."""
     repository = MagicMock()
     repository.description = "Test repository"
     repository.organization = MagicMock(login="test-org")
     repository.key = "test-repo"
     repository.default_branch = "main"

-    mock_get_content.side_effect = ConnectionError("Network error")
+    mock_get_content.side_effect = Exception("Network error")

     prose, metadata = extract_repository_markdown_content(repository)

     assert "Repository Description: Test repository" in prose
     assert metadata == ""
```
🧹 Nitpick comments (5)
backend/apps/ai/models/chunk.py (2)
`50-51`: Use logger.exception correctly

logger.exception already includes the traceback; passing an exc_info object is unnecessary and atypical.
```diff
-logger.exception("Failed to save chunk", exc_info=save_error)
+logger.exception("Failed to save chunk id=%s", getattr(chunk, "id", None))
```
`19-23`: Prefer UniqueConstraint over unique_together

Django now favors an explicit UniqueConstraint with a name; it is easier to reference in migrations and logs.
```diff
-unique_together = ("context", "text")
+constraints = [
+    models.UniqueConstraint(
+        fields=["context", "text"],
+        name="uniq_ai_chunk_context_text",
+    )
+]
```

backend/apps/ai/management/commands/ai_update_repository_chunks.py (1)
`18-28`: Optimize queryset for extractor access

Select related FKs to avoid extra queries when building raw URLs (owner/org), and narrow fields if desired.
```diff
 return (
     super()
     .get_base_queryset()
     .filter(
         is_owasp_site_repository=True,
         is_archived=False,
         is_empty=False,
     )
-)
+).select_related("organization", "owner")
```

backend/tests/apps/ai/common/extractors/repository_test.py (2)
`407-420`: Assert no network calls when owner/org missing

When both are absent, the extractor should not attempt fetches. Patch and assert not called.
```diff
-def test_extract_repository_markdown_content_no_owner_or_org(self):
+@patch("apps.ai.common.extractors.repository.get_repository_file_content")
+def test_extract_repository_markdown_content_no_owner_or_org(self, mock_get_content):
     """Test extraction when repository has neither organization nor owner."""
     repository = MagicMock()
     repository.description = "Test repository"
     repository.organization = None
     repository.owner = None
     repository.key = "test-repo"
     repository.default_branch = "main"

     prose, metadata = extract_repository_markdown_content(repository)

     assert "Repository Description: Test repository" in prose
     assert metadata == ""
+    mock_get_content.assert_not_called()
```
`421-433`: Also assert no network calls when key is missing

The key is required to form the raw URL; ensure no fetch is attempted.
```diff
-def test_extract_repository_markdown_content_no_key(self):
+@patch("apps.ai.common.extractors.repository.get_repository_file_content")
+def test_extract_repository_markdown_content_no_key(self, mock_get_content):
     """Test extraction when repository has no key."""
     repository = MagicMock()
     repository.description = "Test repository"
     repository.organization = MagicMock(login="test-org")
     repository.key = None
     repository.default_branch = "main"

     prose, metadata = extract_repository_markdown_content(repository)

     assert "Repository Description: Test repository" in prose
     assert metadata == ""
+    mock_get_content.assert_not_called()
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- backend/apps/ai/Makefile (1 hunks)
- backend/apps/ai/common/extractors/repository.py (1 hunks)
- backend/apps/ai/management/commands/ai_update_repository_chunks.py (1 hunks)
- backend/apps/ai/management/commands/ai_update_repository_context.py (1 hunks)
- backend/apps/ai/models/chunk.py (3 hunks)
- backend/test_commands.py (1 hunks)
- backend/tests/apps/ai/common/extractors/repository_test.py (1 hunks)
- backend/tests/apps/ai/common/utils_test.py (1 hunks)
- backend/tests/apps/ai/management/commands/ai_update_repository_chunks_test.py (1 hunks)
- backend/tests/apps/ai/management/commands/ai_update_repository_context_test.py (1 hunks)
- backend/tests/apps/ai/models/chunk_test.py (4 hunks)
- cspell/custom-dict.txt (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (9)
- backend/test_commands.py
- backend/tests/apps/ai/management/commands/ai_update_repository_context_test.py
- backend/tests/apps/ai/models/chunk_test.py
- backend/apps/ai/management/commands/ai_update_repository_context.py
- backend/apps/ai/common/extractors/repository.py
- cspell/custom-dict.txt
- backend/apps/ai/Makefile
- backend/tests/apps/ai/management/commands/ai_update_repository_chunks_test.py
- backend/tests/apps/ai/common/utils_test.py
🧰 Additional context used
🧬 Code graph analysis (3)
backend/tests/apps/ai/common/extractors/repository_test.py (1)
- backend/apps/ai/common/extractors/repository.py (2)
  - extract_repository_content (11-111)
  - extract_repository_markdown_content (114-174)

backend/apps/ai/management/commands/ai_update_repository_chunks.py (4)
- backend/apps/ai/common/base/chunk_command.py (1)
  - BaseChunkCommand (12-91)
- backend/apps/ai/common/extractors/repository.py (1)
  - extract_repository_markdown_content (114-174)
- backend/apps/github/models/repository.py (1)
  - Repository (25-349)
- backend/apps/ai/management/commands/ai_update_repository_context.py (5)
  - Command (10-36)
  - extract_content (14-16)
  - get_base_queryset (18-28)
  - get_default_queryset (30-32)
  - source_name (34-36)

backend/apps/ai/models/chunk.py (2)
- backend/apps/ai/models/context.py (1)
  - Context (15-61)
- backend/apps/common/models.py (2)
  - BulkSaveModel (10-34)
  - TimestampedModel (37-46)
🔇 Additional comments (5)
backend/apps/ai/management/commands/ai_update_repository_chunks.py (4)
`10-12`: LGTM: command wiring is correct

key_field_name and model_class look right.

`14-16`: LGTM: delegates to the markdown extractor

Extraction path aligns with the PR goals.

`30-36`: LGTM: default queryset override avoids is_active

Repository lacks is_active; good override.

`18-28`: Verified OWASP site repository filter uses `www-` prefixes

`check_owasp_site_repository` keys off `key.startswith(("www-chapter-", "www-committee-", "www-event", "www-project-"))`, so the queryset filter aligns with the intended `www-` heuristic.

backend/tests/apps/ai/common/extractors/repository_test.py (1)
`3-3`: CI/Python requirements support `datetime.UTC`

The project’s `pyproject.toml` specifies `python = "^3.13"` (>= 3.13), which includes the `datetime.UTC` alias introduced in Python 3.11; no action needed.
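As a side note, the `www-` prefix heuristic referenced in the queryset comment above can be sketched as a plain predicate. This is a simplified stand-in, not the real `check_owasp_site_repository` helper (which lives in the GitHub app); the prefixes mirror the ones cited in the review:

```python
# Simplified stand-in for the OWASP site repository check; the real
# implementation lives in apps/github. Prefixes mirror the review comment.
WWW_PREFIXES = ("www-chapter-", "www-committee-", "www-event", "www-project-")


def is_owasp_site_repository(key: str) -> bool:
    """Return True when a repository key uses one of the www- site prefixes."""
    return key.startswith(WWW_PREFIXES)


print(is_owasp_site_repository("www-project-juice-shop"))  # True
print(is_owasp_site_repository("nest"))  # False
```

`str.startswith` accepts a tuple of prefixes, so the whole heuristic stays a single call.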
@Dishant1804 you've submitted 1000+ lines of code for review.
I have a couple of questions:
- could you explain what OWASP Nest problem this PR solves?
- how did you verify that problem is solved with this PR?
This PR fetches the .md files that we have in the www- repositories and saves them as chunks and context so that we have more knowledge available.
Alright, so it sounds like you checked it with at least one repository and some relevant queries. Can you provide some examples?
This has a lot of chunks that are not up to the mark; maybe we should change the chunking delimiter or chunking strategy for this case, because the content has a lot of delimiters.
I meant your general approach for testing, e.g. did OWASP Nest repository content help to improve NestBot answers/context for any specific questions? Did it get new info based on this? My opinion -- it didn't. I haven't tested it yet, but based on this code I can reasonably assume that no new [useful] content was added by these jobs and this PR in general. I also reasonably assume that this is AI-generated code that has nothing to do with a real-life scenario. I call it cosplay-code. The bottom line is that this code has 0 value in its current form, but you requested mentors to review it w/o testing. I don't appreciate this kind of disrespect to mentors' time. This is not acceptable.
I'll update the chunking strategy for this and then request a review. We are getting the information from the README and CONTRIBUTING .md files, so the chunking needs to be proper.
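One possible direction for that update is a heading-aware splitter along these lines. This is a sketch under the assumption that Markdown headings make better chunk boundaries than a fixed delimiter for README/CONTRIBUTING files; the function name is hypothetical:

```python
import re


def split_markdown_by_headings(text: str, min_chars: int = 200) -> list[str]:
    """Split markdown on level-1/level-2 headings, merging short sections.

    A heading starts a new chunk; sections shorter than min_chars are folded
    into the previous chunk so boilerplate-heavy files don't produce noise.
    """
    sections: list[str] = []
    current: list[str] = []
    for line in text.splitlines():
        # An ATX heading (`# ` or `## `) closes the current section.
        if re.match(r"^#{1,2} ", line) and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())

    # Merge tiny sections into their predecessor.
    merged: list[str] = []
    for section in sections:
        if merged and len(section) < min_chars:
            merged[-1] = merged[-1] + "\n" + section
        else:
            merged.append(section)
    return merged
```

Keeping each section intact (instead of splitting on every delimiter) should cut down on the low-quality chunks mentioned above.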
Proposed change
Resolves #2144
Sync `www-` repositories.

Checklist

- Ran `make check-test` locally; all checks and tests passed.