Skip to content

Conversation

@YarivHashaiComet
Copy link
Collaborator

Details

Added 7 low-maintenance database rules documenting patterns and conventions:

Rules Added:

  • database-overview.mdc: Dual-database architecture (MySQL + ClickHouse), decision matrix
  • mysql-schema-patterns.mdc: MySQL schema patterns, naming conventions, table templates
  • clickhouse-schema-patterns.mdc: ClickHouse patterns, ReplacingMergeTree, ORDER BY design
  • jdbi-dao-patterns.mdc: JDBI3 DAO interface patterns for MySQL access
  • r2dbc-reactive-patterns.mdc: R2DBC reactive patterns for ClickHouse access
  • database-migration-workflow.mdc: Liquibase workflow, naming conventions, best practices
  • database-query-best-practices.mdc: Query optimization, security, performance patterns

Design Philosophy:

  • Pattern-focused (not specific schemas) - stays current without maintenance
  • References source of truth (migrations, DAO interfaces)
  • Teaches AI to search codebase for current schema
  • Core stable patterns only - no exhaustive table listings
  • Cross-references between related rules

Maintenance: Near-zero - only update when architectural patterns change, not for schema evolution.

Change checklist

  • User facing
  • Documentation update

Issues

  • Resolves #
  • OPIK-

Testing

Documentation

Added 7 low-maintenance database rules documenting patterns and conventions:

Rules Added:
- database-overview.mdc: Dual-database architecture (MySQL + ClickHouse), decision matrix
- mysql-schema-patterns.mdc: MySQL schema patterns, naming conventions, table templates
- clickhouse-schema-patterns.mdc: ClickHouse patterns, ReplacingMergeTree, ORDER BY design
- jdbi-dao-patterns.mdc: JDBI3 DAO interface patterns for MySQL access
- r2dbc-reactive-patterns.mdc: R2DBC reactive patterns for ClickHouse access
- database-migration-workflow.mdc: Liquibase workflow, naming conventions, best practices
- database-query-best-practices.mdc: Query optimization, security, performance patterns

Design Philosophy:
- Pattern-focused (not specific schemas) - stays current without maintenance
- References source of truth (migrations, DAO interfaces)
- Teaches AI to search codebase for current schema
- Core stable patterns only - no exhaustive table listings
- Cross-references between related rules

Maintenance: Near-zero - only update when architectural patterns change, not for schema evolution.
Copy link
Member

@andrescrz andrescrz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving a top level review as this is a draft. A few comments before moving forward:

  1. These rule are placed a top level, but they are mostly related to to the backend. They should be placed in the appropriate folder.
  2. They're missing the proper configuration, so context is only populated smartly by the agent: globs, alwaysApply. This is critical for good token usage.
  3. Many of the content of this rules is already covered in the existing rules. Please merge, update etc. as it applies. We should avoid duplication.

I can go on a deeper review once this is moved to non-draft state.

@YarivHashaiComet YarivHashaiComet marked this pull request as ready for review October 20, 2025 12:49
@YarivHashaiComet YarivHashaiComet requested a review from a team as a code owner October 20, 2025 12:49
Copilot AI review requested due to automatic review settings October 20, 2025 12:49
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds comprehensive Cursor rules documentation for Opik's dual-database architecture. The rules follow a pattern-focused approach that requires minimal maintenance by documenting stable architectural patterns rather than exhaustive schema details. The documentation covers MySQL (state database) and ClickHouse (analytics database), with complete guidance on access patterns, schema conventions, and migration workflows.

Key Changes:

  • Added 7 database architecture documentation files covering patterns, conventions, and workflows
  • Established cross-referenced rule structure with navigation between related topics
  • Documented both JDBI3 (MySQL) and R2DBC (ClickHouse) access patterns

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
database-overview.mdc Dual-database architecture overview with decision matrix for database selection
mysql-schema-patterns.mdc MySQL schema conventions, table patterns, and naming standards
clickhouse-schema-patterns.mdc ClickHouse ReplacingMergeTree patterns and ORDER BY design
jdbi-dao-patterns.mdc JDBI3 DAO interface patterns with annotations and query examples
r2dbc-reactive-patterns.mdc R2DBC reactive patterns using Mono/Flux for async operations
database-migration-workflow.mdc Liquibase migration workflow with naming conventions and examples
database-query-best-practices.mdc Query optimization, security, and performance patterns

private final TraceDAO traceDAO;

public Mono<Trace> create(TraceCreate request) {
return asyncTemplate.nonTransaction(READ_ONLY, connection -> {
Copy link

Copilot AI Oct 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method name 'nonTransaction' is used with READ_ONLY, but ClickHouse operations don't support traditional transactions. This is misleading since the comment at line 129 correctly states 'ClickHouse doesn't support traditional transactions'. Consider renaming 'nonTransaction' to something more descriptive like 'executeAsync' or adding clarifying comments at each usage.

Copilot uses AI. Check for mistakes.
Comment on lines +94 to +96
ALTER TABLE projects ADD COLUMN visibility ENUM('private', 'public') NOT NULL DEFAULT 'private';

--rollback ALTER TABLE projects DROP COLUMN visibility;
Copy link

Copilot AI Oct 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a NOT NULL column with a default value directly is safe for empty tables but can cause issues with large existing tables in production. Consider documenting the pattern of adding the column as nullable first, backfilling data, then making it NOT NULL in a subsequent migration for production safety.

Suggested change
ALTER TABLE projects ADD COLUMN visibility ENUM('private', 'public') NOT NULL DEFAULT 'private';
--rollback ALTER TABLE projects DROP COLUMN visibility;
-- Step 1: Add the column as nullable (no default)
ALTER TABLE projects ADD COLUMN visibility ENUM('private', 'public') NULL;
-- Step 2: Backfill existing rows with the intended default
UPDATE projects SET visibility = 'private' WHERE visibility IS NULL;
-- Step 3: Alter the column to be NOT NULL with a default
ALTER TABLE projects MODIFY COLUMN visibility ENUM('private', 'public') NOT NULL DEFAULT 'private';
--rollback
ALTER TABLE projects DROP COLUMN visibility;

Copilot uses AI. Check for mistakes.
SELECT * FROM traces
WHERE workspace_id = :workspaceId AND id = :id
```

Copy link

Copilot AI Oct 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation states 'may return multiple versions' but doesn't explain when this is acceptable or preferred. Add guidance on when to use FINAL vs LIMIT BY vs no deduplication based on performance trade-offs, since the r2dbc-reactive-patterns.mdc suggests avoiding FINAL in most cases.

Suggested change
> **Guidance: Choosing FINAL, LIMIT BY, or No Deduplication**
>
> - **Avoid `FINAL` in most cases:** Using `FINAL` forces ClickHouse to deduplicate rows at query time, which can be very slow on large tables. Only use `FINAL` when you absolutely require the latest version of each row and cannot tolerate duplicates.
> - **Prefer `LIMIT BY` for performance:** If you need only one version per key (e.g., latest per id), consider using `LIMIT 1 BY id` or similar patterns, which are much faster than `FINAL` but may not always guarantee the most up-to-date row if merges are pending.
> - **No deduplication:** If your use case can tolerate multiple versions (e.g., for audit/history purposes), you can query without `FINAL` or `LIMIT BY` for maximum performance.
> - **Application-level deduplication:** In some cases, it may be more efficient to fetch all versions and deduplicate in application code, especially for small result sets.
>
> See also: [r2dbc-reactive-patterns.mdc](mdc:rules/r2dbc-reactive-patterns.mdc) for more on query performance and deduplication strategies.

Copilot uses AI. Check for mistakes.

// Create
@SqlUpdate("INSERT INTO projects (id, name, description, workspace_id, visibility, created_by, last_updated_by) " +
"VALUES (:bean.id, :bean.name, :bean.description, :workspaceId, COALESCE(:bean.visibility, 'private'), :bean.createdBy, :bean.lastUpdatedBy)")
Copy link

Copilot AI Oct 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using COALESCE in the INSERT statement with a hardcoded default value 'private' duplicates the default logic that should be in the database schema or application layer. This makes it harder to maintain consistent defaults. Consider removing COALESCE and relying on database defaults or setting the value in the application layer before passing to the DAO.

Suggested change
"VALUES (:bean.id, :bean.name, :bean.description, :workspaceId, COALESCE(:bean.visibility, 'private'), :bean.createdBy, :bean.lastUpdatedBy)")
"VALUES (:bean.id, :bean.name, :bean.description, :workspaceId, :bean.visibility, :bean.createdBy, :bean.lastUpdatedBy)")

Copilot uses AI. Check for mistakes.
AND id IN (<ids>) -- < 100 items

-- ⚠️ CONSIDER: For large lists, use temp table or batch queries
-- If ids list > 1000, consider alternative approach
Copy link

Copilot AI Oct 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment mentions considering an alternative approach for large lists (>1000) but doesn't specify what that alternative should be. Add specific guidance or reference to a pattern for handling large ID lists, such as using temporary tables, batch processing, or chunking strategies.

Suggested change
-- If ids list > 1000, consider alternative approach
-- If ids list > 1000, use a temporary table to store IDs and join, or batch the queries in chunks of 1000 or less.

Copilot uses AI. Check for mistakes.
Copy link
Member

@andrescrz andrescrz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move the rules to the appropriate place and also add the proper configuration.

After that, we can review their actual content.

@andrescrz
Copy link
Member

@YarivHashaiComet closing this PR as stale for about 1 month. Please feel free to reopen if you get back to it. Thanks.

@andrescrz andrescrz closed this Nov 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants