-
Notifications
You must be signed in to change notification settings - Fork 1.2k
[NA] Add comprehensive database architecture Cursor rules #3712
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Added 7 low-maintenance database rules documenting patterns and conventions: Rules Added: - database-overview.mdc: Dual-database architecture (MySQL + ClickHouse), decision matrix - mysql-schema-patterns.mdc: MySQL schema patterns, naming conventions, table templates - clickhouse-schema-patterns.mdc: ClickHouse patterns, ReplacingMergeTree, ORDER BY design - jdbi-dao-patterns.mdc: JDBI3 DAO interface patterns for MySQL access - r2dbc-reactive-patterns.mdc: R2DBC reactive patterns for ClickHouse access - database-migration-workflow.mdc: Liquibase workflow, naming conventions, best practices - database-query-best-practices.mdc: Query optimization, security, performance patterns Design Philosophy: - Pattern-focused (not specific schemas) - stays current without maintenance - References source of truth (migrations, DAO interfaces) - Teaches AI to search codebase for current schema - Core stable patterns only - no exhaustive table listings - Cross-references between related rules Maintenance: Near-zero - only update when architectural patterns change, not for schema evolution.
andrescrz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving a top level review as this is a draft. A few comments before moving forward:
- These rule are placed a top level, but they are mostly related to to the backend. They should be placed in the appropriate folder.
- They're missing the proper configuration, so context is only populated smartly by the agent:
globs,alwaysApply. This is critical for good token usage. - Many of the content of this rules is already covered in the existing rules. Please merge, update etc. as it applies. We should avoid duplication.
I can go on a deeper review once this is moved to non-draft state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds comprehensive Cursor rules documentation for Opik's dual-database architecture. The rules follow a pattern-focused approach that requires minimal maintenance by documenting stable architectural patterns rather than exhaustive schema details. The documentation covers MySQL (state database) and ClickHouse (analytics database), with complete guidance on access patterns, schema conventions, and migration workflows.
Key Changes:
- Added 7 database architecture documentation files covering patterns, conventions, and workflows
- Established cross-referenced rule structure with navigation between related topics
- Documented both JDBI3 (MySQL) and R2DBC (ClickHouse) access patterns
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| database-overview.mdc | Dual-database architecture overview with decision matrix for database selection |
| mysql-schema-patterns.mdc | MySQL schema conventions, table patterns, and naming standards |
| clickhouse-schema-patterns.mdc | ClickHouse ReplacingMergeTree patterns and ORDER BY design |
| jdbi-dao-patterns.mdc | JDBI3 DAO interface patterns with annotations and query examples |
| r2dbc-reactive-patterns.mdc | R2DBC reactive patterns using Mono/Flux for async operations |
| database-migration-workflow.mdc | Liquibase migration workflow with naming conventions and examples |
| database-query-best-practices.mdc | Query optimization, security, and performance patterns |
| private final TraceDAO traceDAO; | ||
|
|
||
| public Mono<Trace> create(TraceCreate request) { | ||
| return asyncTemplate.nonTransaction(READ_ONLY, connection -> { |
Copilot
AI
Oct 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method name 'nonTransaction' is used with READ_ONLY, but ClickHouse operations don't support traditional transactions. This is misleading since the comment at line 129 correctly states 'ClickHouse doesn't support traditional transactions'. Consider renaming 'nonTransaction' to something more descriptive like 'executeAsync' or adding clarifying comments at each usage.
| ALTER TABLE projects ADD COLUMN visibility ENUM('private', 'public') NOT NULL DEFAULT 'private'; | ||
|
|
||
| --rollback ALTER TABLE projects DROP COLUMN visibility; |
Copilot
AI
Oct 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a NOT NULL column with a default value directly is safe for empty tables but can cause issues with large existing tables in production. Consider documenting the pattern of adding the column as nullable first, backfilling data, then making it NOT NULL in a subsequent migration for production safety.
| ALTER TABLE projects ADD COLUMN visibility ENUM('private', 'public') NOT NULL DEFAULT 'private'; | |
| --rollback ALTER TABLE projects DROP COLUMN visibility; | |
| -- Step 1: Add the column as nullable (no default) | |
| ALTER TABLE projects ADD COLUMN visibility ENUM('private', 'public') NULL; | |
| -- Step 2: Backfill existing rows with the intended default | |
| UPDATE projects SET visibility = 'private' WHERE visibility IS NULL; | |
| -- Step 3: Alter the column to be NOT NULL with a default | |
| ALTER TABLE projects MODIFY COLUMN visibility ENUM('private', 'public') NOT NULL DEFAULT 'private'; | |
| --rollback | |
| ALTER TABLE projects DROP COLUMN visibility; |
| SELECT * FROM traces | ||
| WHERE workspace_id = :workspaceId AND id = :id | ||
| ``` | ||
|
|
Copilot
AI
Oct 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation states 'may return multiple versions' but doesn't explain when this is acceptable or preferred. Add guidance on when to use FINAL vs LIMIT BY vs no deduplication based on performance trade-offs, since the r2dbc-reactive-patterns.mdc suggests avoiding FINAL in most cases.
| > **Guidance: Choosing FINAL, LIMIT BY, or No Deduplication** | |
| > | |
| > - **Avoid `FINAL` in most cases:** Using `FINAL` forces ClickHouse to deduplicate rows at query time, which can be very slow on large tables. Only use `FINAL` when you absolutely require the latest version of each row and cannot tolerate duplicates. | |
| > - **Prefer `LIMIT BY` for performance:** If you need only one version per key (e.g., latest per id), consider using `LIMIT 1 BY id` or similar patterns, which are much faster than `FINAL` but may not always guarantee the most up-to-date row if merges are pending. | |
| > - **No deduplication:** If your use case can tolerate multiple versions (e.g., for audit/history purposes), you can query without `FINAL` or `LIMIT BY` for maximum performance. | |
| > - **Application-level deduplication:** In some cases, it may be more efficient to fetch all versions and deduplicate in application code, especially for small result sets. | |
| > | |
| > See also: [r2dbc-reactive-patterns.mdc](mdc:rules/r2dbc-reactive-patterns.mdc) for more on query performance and deduplication strategies. |
|
|
||
| // Create | ||
| @SqlUpdate("INSERT INTO projects (id, name, description, workspace_id, visibility, created_by, last_updated_by) " + | ||
| "VALUES (:bean.id, :bean.name, :bean.description, :workspaceId, COALESCE(:bean.visibility, 'private'), :bean.createdBy, :bean.lastUpdatedBy)") |
Copilot
AI
Oct 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using COALESCE in the INSERT statement with a hardcoded default value 'private' duplicates the default logic that should be in the database schema or application layer. This makes it harder to maintain consistent defaults. Consider removing COALESCE and relying on database defaults or setting the value in the application layer before passing to the DAO.
| "VALUES (:bean.id, :bean.name, :bean.description, :workspaceId, COALESCE(:bean.visibility, 'private'), :bean.createdBy, :bean.lastUpdatedBy)") | |
| "VALUES (:bean.id, :bean.name, :bean.description, :workspaceId, :bean.visibility, :bean.createdBy, :bean.lastUpdatedBy)") |
| AND id IN (<ids>) -- < 100 items | ||
|
|
||
| -- ⚠️ CONSIDER: For large lists, use temp table or batch queries | ||
| -- If ids list > 1000, consider alternative approach |
Copilot
AI
Oct 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment mentions considering an alternative approach for large lists (>1000) but doesn't specify what that alternative should be. Add specific guidance or reference to a pattern for handling large ID lists, such as using temporary tables, batch processing, or chunking strategies.
| -- If ids list > 1000, consider alternative approach | |
| -- If ids list > 1000, use a temporary table to store IDs and join, or batch the queries in chunks of 1000 or less. |
andrescrz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move the rules to the appropriate place and also add the proper configuration.
After that, we can review their actual content.
|
@YarivHashaiComet closing this PR as stale for about 1 month. Please feel free to reopen if you get back to it. Thanks. |
Details
Added 7 low-maintenance database rules documenting patterns and conventions:
Rules Added:
Design Philosophy:
Maintenance: Near-zero - only update when architectural patterns change, not for schema evolution.
Change checklist
Issues
Testing
Documentation