[NA] Add comprehensive database architecture Cursor rules #3712

YarivHashaiComet · 2025-10-18T10:15:29Z

Details

Added 7 low-maintenance database rules documenting patterns and conventions:

Rules Added:

database-overview.mdc: Dual-database architecture (MySQL + ClickHouse), decision matrix
mysql-schema-patterns.mdc: MySQL schema patterns, naming conventions, table templates
clickhouse-schema-patterns.mdc: ClickHouse patterns, ReplacingMergeTree, ORDER BY design
jdbi-dao-patterns.mdc: JDBI3 DAO interface patterns for MySQL access
r2dbc-reactive-patterns.mdc: R2DBC reactive patterns for ClickHouse access
database-migration-workflow.mdc: Liquibase workflow, naming conventions, best practices
database-query-best-practices.mdc: Query optimization, security, performance patterns

Design Philosophy:

Pattern-focused (not specific schemas) - stays current without maintenance
References source of truth (migrations, DAO interfaces)
Teaches AI to search codebase for current schema
Core stable patterns only - no exhaustive table listings
Cross-references between related rules

Maintenance: Near-zero - only update when architectural patterns change, not for schema evolution.

Change checklist

User facing
Documentation update

Issues

Resolves #
OPIK-

Testing

Documentation

Added 7 low-maintenance database rules documenting patterns and conventions: Rules Added: - database-overview.mdc: Dual-database architecture (MySQL + ClickHouse), decision matrix - mysql-schema-patterns.mdc: MySQL schema patterns, naming conventions, table templates - clickhouse-schema-patterns.mdc: ClickHouse patterns, ReplacingMergeTree, ORDER BY design - jdbi-dao-patterns.mdc: JDBI3 DAO interface patterns for MySQL access - r2dbc-reactive-patterns.mdc: R2DBC reactive patterns for ClickHouse access - database-migration-workflow.mdc: Liquibase workflow, naming conventions, best practices - database-query-best-practices.mdc: Query optimization, security, performance patterns Design Philosophy: - Pattern-focused (not specific schemas) - stays current without maintenance - References source of truth (migrations, DAO interfaces) - Teaches AI to search codebase for current schema - Core stable patterns only - no exhaustive table listings - Cross-references between related rules Maintenance: Near-zero - only update when architectural patterns change, not for schema evolution.

andrescrz

Leaving a top level review as this is a draft. A few comments before moving forward:

These rule are placed a top level, but they are mostly related to to the backend. They should be placed in the appropriate folder.
They're missing the proper configuration, so context is only populated smartly by the agent: globs, alwaysApply. This is critical for good token usage.
Many of the content of this rules is already covered in the existing rules. Please merge, update etc. as it applies. We should avoid duplication.

I can go on a deeper review once this is moved to non-draft state.

Copilot

Pull Request Overview

This PR adds comprehensive Cursor rules documentation for Opik's dual-database architecture. The rules follow a pattern-focused approach that requires minimal maintenance by documenting stable architectural patterns rather than exhaustive schema details. The documentation covers MySQL (state database) and ClickHouse (analytics database), with complete guidance on access patterns, schema conventions, and migration workflows.

Key Changes:

Added 7 database architecture documentation files covering patterns, conventions, and workflows
Established cross-referenced rule structure with navigation between related topics
Documented both JDBI3 (MySQL) and R2DBC (ClickHouse) access patterns

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
database-overview.mdc	Dual-database architecture overview with decision matrix for database selection
mysql-schema-patterns.mdc	MySQL schema conventions, table patterns, and naming standards
clickhouse-schema-patterns.mdc	ClickHouse ReplacingMergeTree patterns and ORDER BY design
jdbi-dao-patterns.mdc	JDBI3 DAO interface patterns with annotations and query examples
r2dbc-reactive-patterns.mdc	R2DBC reactive patterns using Mono/Flux for async operations
database-migration-workflow.mdc	Liquibase migration workflow with naming conventions and examples
database-query-best-practices.mdc	Query optimization, security, and performance patterns

Copilot · 2025-10-20T12:51:09Z

.cursor/rules/r2dbc-reactive-patterns.mdc

+    private final TraceDAO traceDAO;
+
+    public Mono<Trace> create(TraceCreate request) {
+        return asyncTemplate.nonTransaction(READ_ONLY, connection -> {


The method name 'nonTransaction' is used with READ_ONLY, but ClickHouse operations don't support traditional transactions. This is misleading since the comment at line 129 correctly states 'ClickHouse doesn't support traditional transactions'. Consider renaming 'nonTransaction' to something more descriptive like 'executeAsync' or adding clarifying comments at each usage.

Copilot · 2025-10-20T12:51:09Z

.cursor/rules/database-migration-workflow.mdc

+ALTER TABLE projects ADD COLUMN visibility ENUM('private', 'public') NOT NULL DEFAULT 'private';
+
+--rollback ALTER TABLE projects DROP COLUMN visibility;


Adding a NOT NULL column with a default value directly is safe for empty tables but can cause issues with large existing tables in production. Consider documenting the pattern of adding the column as nullable first, backfilling data, then making it NOT NULL in a subsequent migration for production safety.

Suggested change

ALTER TABLE projects ADD COLUMN visibility ENUM('private', 'public') NOT NULL DEFAULT 'private';

--rollback ALTER TABLE projects DROP COLUMN visibility;

-- Step 1: Add the column as nullable (no default)

ALTER TABLE projects ADD COLUMN visibility ENUM('private', 'public') NULL;

-- Step 2: Backfill existing rows with the intended default

UPDATE projects SET visibility = 'private' WHERE visibility IS NULL;

-- Step 3: Alter the column to be NOT NULL with a default

ALTER TABLE projects MODIFY COLUMN visibility ENUM('private', 'public') NOT NULL DEFAULT 'private';

--rollback

ALTER TABLE projects DROP COLUMN visibility;

Copilot · 2025-10-20T12:51:09Z

.cursor/rules/clickhouse-schema-patterns.mdc

+SELECT * FROM traces
+WHERE workspace_id = :workspaceId AND id = :id
+```
+


The documentation states 'may return multiple versions' but doesn't explain when this is acceptable or preferred. Add guidance on when to use FINAL vs LIMIT BY vs no deduplication based on performance trade-offs, since the r2dbc-reactive-patterns.mdc suggests avoiding FINAL in most cases.

Suggested change

> **Guidance: Choosing FINAL, LIMIT BY, or No Deduplication**

>

> - **Avoid `FINAL` in most cases:** Using `FINAL` forces ClickHouse to deduplicate rows at query time, which can be very slow on large tables. Only use `FINAL` when you absolutely require the latest version of each row and cannot tolerate duplicates.

> - **Prefer `LIMIT BY` for performance:** If you need only one version per key (e.g., latest per id), consider using `LIMIT 1 BY id` or similar patterns, which are much faster than `FINAL` but may not always guarantee the most up-to-date row if merges are pending.

> - **No deduplication:** If your use case can tolerate multiple versions (e.g., for audit/history purposes), you can query without `FINAL` or `LIMIT BY` for maximum performance.

> - **Application-level deduplication:** In some cases, it may be more efficient to fetch all versions and deduplicate in application code, especially for small result sets.

>

> See also: [r2dbc-reactive-patterns.mdc](mdc:rules/r2dbc-reactive-patterns.mdc) for more on query performance and deduplication strategies.

Copilot · 2025-10-20T12:51:10Z

.cursor/rules/jdbi-dao-patterns.mdc

+
+    // Create
+    @SqlUpdate("INSERT INTO projects (id, name, description, workspace_id, visibility, created_by, last_updated_by) " +
+               "VALUES (:bean.id, :bean.name, :bean.description, :workspaceId, COALESCE(:bean.visibility, 'private'), :bean.createdBy, :bean.lastUpdatedBy)")


Using COALESCE in the INSERT statement with a hardcoded default value 'private' duplicates the default logic that should be in the database schema or application layer. This makes it harder to maintain consistent defaults. Consider removing COALESCE and relying on database defaults or setting the value in the application layer before passing to the DAO.

Suggested change

"VALUES (:bean.id, :bean.name, :bean.description, :workspaceId, COALESCE(:bean.visibility, 'private'), :bean.createdBy, :bean.lastUpdatedBy)")

"VALUES (:bean.id, :bean.name, :bean.description, :workspaceId, :bean.visibility, :bean.createdBy, :bean.lastUpdatedBy)")

Copilot · 2025-10-20T12:51:10Z

.cursor/rules/database-query-best-practices.mdc

+  AND id IN (<ids>)  -- < 100 items
+
+-- ⚠️ CONSIDER: For large lists, use temp table or batch queries
+-- If ids list > 1000, consider alternative approach


The comment mentions considering an alternative approach for large lists (>1000) but doesn't specify what that alternative should be. Add specific guidance or reference to a pattern for handling large ID lists, such as using temporary tables, batch processing, or chunking strategies.

Suggested change

-- If ids list > 1000, consider alternative approach

-- If ids list > 1000, use a temporary table to store IDs and join, or batch the queries in chunks of 1000 or less.

andrescrz

Let's move the rules to the appropriate place and also add the proper configuration.

After that, we can review their actual content.

andrescrz · 2025-11-13T12:31:01Z

@YarivHashaiComet closing this PR as stale for about 1 month. Please feel free to reopen if you get back to it. Thanks.

github-actions bot assigned YarivHashaiComet Oct 18, 2025

andrescrz reviewed Oct 20, 2025

View reviewed changes

YarivHashaiComet marked this pull request as ready for review October 20, 2025 12:49

YarivHashaiComet requested a review from a team as a code owner October 20, 2025 12:49

Copilot AI review requested due to automatic review settings October 20, 2025 12:49

Copilot AI reviewed Oct 20, 2025

View reviewed changes

andrescrz requested changes Oct 21, 2025

View reviewed changes

andrescrz closed this Nov 13, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[NA] Add comprehensive database architecture Cursor rules #3712

[NA] Add comprehensive database architecture Cursor rules #3712

Uh oh!

YarivHashaiComet commented Oct 18, 2025

Uh oh!

andrescrz left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Oct 20, 2025

Uh oh!

Copilot AI Oct 20, 2025

Uh oh!

Copilot AI Oct 20, 2025

Uh oh!

Copilot AI Oct 20, 2025

Uh oh!

Copilot AI Oct 20, 2025

Uh oh!

andrescrz left a comment

Uh oh!

andrescrz commented Nov 13, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		ALTER TABLE projects ADD COLUMN visibility ENUM('private', 'public') NOT NULL DEFAULT 'private';

		--rollback ALTER TABLE projects DROP COLUMN visibility;

-ALTER TABLE projects ADD COLUMN visibility ENUM('private', 'public') NOT NULL DEFAULT 'private';
---rollback ALTER TABLE projects DROP COLUMN visibility;
+-- Step 1: Add the column as nullable (no default)
+ALTER TABLE projects ADD COLUMN visibility ENUM('private', 'public') NULL;
+-- Step 2: Backfill existing rows with the intended default
+UPDATE projects SET visibility = 'private' WHERE visibility IS NULL;
+-- Step 3: Alter the column to be NOT NULL with a default
+ALTER TABLE projects MODIFY COLUMN visibility ENUM('private', 'public') NOT NULL DEFAULT 'private';
+--rollback
+ALTER TABLE projects DROP COLUMN visibility;

+> **Guidance: Choosing FINAL, LIMIT BY, or No Deduplication**
+>
+> - **Avoid `FINAL` in most cases:** Using `FINAL` forces ClickHouse to deduplicate rows at query time, which can be very slow on large tables. Only use `FINAL` when you absolutely require the latest version of each row and cannot tolerate duplicates.
+> - **Prefer `LIMIT BY` for performance:** If you need only one version per key (e.g., latest per id), consider using `LIMIT 1 BY id` or similar patterns, which are much faster than `FINAL` but may not always guarantee the most up-to-date row if merges are pending.
+> - **No deduplication:** If your use case can tolerate multiple versions (e.g., for audit/history purposes), you can query without `FINAL` or `LIMIT BY` for maximum performance.
+> - **Application-level deduplication:** In some cases, it may be more efficient to fetch all versions and deduplicate in application code, especially for small result sets.
+>
+> See also: [r2dbc-reactive-patterns.mdc](mdc:rules/r2dbc-reactive-patterns.mdc) for more on query performance and deduplication strategies.

	"VALUES (:bean.id, :bean.name, :bean.description, :workspaceId, COALESCE(:bean.visibility, 'private'), :bean.createdBy, :bean.lastUpdatedBy)")
	"VALUES (:bean.id, :bean.name, :bean.description, :workspaceId, :bean.visibility, :bean.createdBy, :bean.lastUpdatedBy)")

	-- If ids list > 1000, consider alternative approach
	-- If ids list > 1000, use a temporary table to store IDs and join, or batch the queries in chunks of 1000 or less.

[NA] Add comprehensive database architecture Cursor rules #3712

[NA] Add comprehensive database architecture Cursor rules #3712

Uh oh!

Conversation

YarivHashaiComet commented Oct 18, 2025

Details

Change checklist

Issues

Testing

Documentation

Uh oh!

andrescrz left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Copilot AI Oct 20, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Oct 20, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Oct 20, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Oct 20, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Oct 20, 2025

Choose a reason for hiding this comment

Uh oh!

andrescrz left a comment

Choose a reason for hiding this comment

Uh oh!

andrescrz commented Nov 13, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants