NVIDIA-NeMo · jgerh · Jun 25, 2025 · Jun 25, 2025 · Jun 25, 2025 · Jun 26, 2025
diff --git a/.cursor/rules/docs-assist-merge-conflict-resolution.mdc b/.cursor/rules/docs-assist-merge-conflict-resolution.mdc
@@ -0,0 +1,71 @@
+---
+description: Resolve merge conflicts using an LLM-assisted "smart rebase" approach
+globs: 
+alwaysApply: false
+---
+# LLM-Assisted Merge Conflict Resolution
+
+When documentation branches fall behind main, use this "smart rebase" approach with LLM assistance to resolve conflicts safely and accurately.
+
+## Prerequisites
+- Identify the branch that contains your changes (shown as `user/example-branch` in the commands below)
+- Ensure you have no uncommitted changes in your working directory
+
+## Resolution Steps
+
+1. Create a backup of your current branch:
+   ```bash
+   git branch backup/user-branch-$(date +%Y-%m-%d) user/example-branch
+   ```
+
+2. Get the latest main branch:
+   ```bash
+   git checkout main
+   git pull origin main
+   ```
+
+3. Create a new resolution branch:
+   ```bash
+   git checkout -b conflict-resolution/example-branch
+   ```
+
+4. Bring in changes from your original branch:
+   ```bash
+   git merge --squash user/example-branch
+   ```
+
+5. LLM-Assisted Conflict Resolution:
+   a. For each conflict marker (`<<<<<<<`), the LLM will:
+      - Analyze and explain the conflict context
+      - Show the semantic differences between versions
+      - Provide a recommended resolution with rationale
+      - Wait for user approval before proceeding
+
+   b. After each resolution:
+      - LLM verifies the resolved content maintains technical accuracy
+      - LLM checks for documentation consistency
+      - LLM ensures cross-references remain valid
+
+6. Final Validation:
+   - LLM performs comprehensive review of all resolved files
+   - Verifies all conflict markers are removed
+   - Checks documentation structure remains intact
+   - Ensures all technical content is accurate
+   - Validates all internal references and links
+
+7. Commit the resolved changes:
+   ```bash
+   git add .
+   git commit -m "Resolve conflicts from user/example-branch
+
+   - List major conflicts resolved
+   - Note any significant decisions made
+   - Reference relevant documentation updates"
+   ```
+
+## Post-Resolution
+You now have a clean branch based on the latest `main` with your changes integrated. Once you have verified that everything is correct, delete the backup branch, replacing `YYYY-MM-DD` with the date used in step 1:
+```bash
+git branch -D backup/user-branch-YYYY-MM-DD
+```
+
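The "Verifies all conflict markers are removed" check in step 6 can also be done mechanically before committing. The following is a minimal sketch, not part of the rule file: `find_conflict_markers` is a hypothetical helper name, and the pattern assumes Git's default seven-character markers. A Markdown setext underline of exactly seven `=` characters would be reported as a false positive.

```bash
#!/usr/bin/env sh
# Hypothetical helper (not part of the rule above): list every file under a
# directory that still contains a Git conflict marker line.
find_conflict_markers() {
  dir="${1:-.}"
  # -r: recurse, -l: print file names only, -E: extended regular expressions.
  # Matches lines that begin with exactly-placed '<<<<<<<', '=======' or
  # '>>>>>>>' followed by a space or the end of the line.
  # '|| true' keeps the exit status 0 when nothing is found.
  grep -rlE --exclude-dir=.git '^(<{7}|={7}|>{7})( |$)' "$dir" || true
}

# Usage (run from the repository root before `git add .`):
#   leftovers=$(find_conflict_markers docs)
#   [ -z "$leftovers" ] || printf 'Unresolved conflicts in:\n%s\n' "$leftovers"
```

An empty result means no marker lines were found; it does not replace the content review described in steps 5 and 6.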
diff --git a/.cursor/rules/docs-bump-version.mdc b/.cursor/rules/docs-bump-version.mdc
@@ -0,0 +1,13 @@
+---
+description: Version Bump Instructions for Docs Publishing
+globs:
+alwaysApply: false
+---
+
+1. Update the [versions1.json](mdc:docs/versions1.json) file with the version provided by the user: add a new entry at the top and set `preferred` to `false` on the previous entry.
+2. Update the `version` in [repo.toml](mdc:repo.toml) to the version provided by the user.
+3. Create a tag for the latest commit on the `main` branch using `git tag docs-v{}.{}.{}`.
+4. Push the tag.
+5. Recap everything you did to prepare for the release.
+
+If the user asks for a version bump without providing a full version number, ask them to confirm the exact version before making any changes.
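The "full version number" requirement and the tag format can be checked before tagging. This is a minimal sketch, assuming the three `{}` placeholders are numeric `MAJOR.MINOR.PATCH` components; `docs_tag_for_version` is a hypothetical helper name, not something defined in the repository.

```bash
#!/usr/bin/env sh
# Hypothetical helper: print the docs tag name for a full version number,
# or fail when the version is incomplete (for example "1.2").
# Assumption: a "full" version is three dot-separated integers.
docs_tag_for_version() {
  version="$1"
  if printf '%s\n' "$version" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'; then
    printf 'docs-v%s\n' "$version"
  else
    printf 'error: expected MAJOR.MINOR.PATCH, got "%s"\n' "$version" >&2
    return 1
  fi
}

# Usage (on an up-to-date `main`):
#   tag=$(docs_tag_for_version 1.2.3) && git tag "$tag" && git push origin "$tag"
```

Failing early on an incomplete version mirrors the rule's instruction to ask for clarification rather than guess.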
diff --git a/.cursor/rules/docs-check-source.mdc b/.cursor/rules/docs-check-source.mdc
@@ -0,0 +1,8 @@
+---
+description: Where to find source code for drafting details.
+globs: 
+alwaysApply: false
+---
+- You can find the source code used to draft docs in the `nemo_run/` directory.
+
+- Make sure the nmp submodule points to the `main` branch when verifying this information, because `main` has the latest changes.
diff --git a/.cursor/rules/docs-frontmatter.mdc b/.cursor/rules/docs-frontmatter.mdc
@@ -0,0 +1,235 @@
+---
+description: 
+globs: 
+alwaysApply: false
+---
+# Documentation Frontmatter Taxonomy Framework
+
+Every Markdown file in `docs/` should begin with frontmatter that follows this taxonomy framework.
+
+## Required Frontmatter Structure
+
+```yaml
+---
+description: "Brief description of the content (1-2 sentences)"
+categories: ["primary-category"]  # Single category (required)
+tags: ["tag1", "tag2", "tag3"]    # 2-8 tags recommended
+personas: ["persona1", "persona2"] # Target audience(s)
+difficulty: "beginner|intermediate|advanced|reference"
+content_type: "tutorial|concept|reference|troubleshooting|example"
+modality: "text-only|image-only|video-only|multimodal|universal"
+# only: not ga  # Optional: content gating
+---
+```
+
+## Taxonomy Guidelines
+
+### Categories (Choose ONE - Required)
+
+**Primary functional domains aligned with user workflows:**
+
+- `getting-started` - Installation, setup, quickstart guides
+- `concepts-architecture` - Core concepts, fundamentals, system architecture
+- `training-algorithms` - RL algorithms (GRPO, DPO, SFT), policy optimization
+- `model-development` - Model integration, validation, custom architectures
+- `research-advanced` - Custom algorithms, ablation studies, theory
+- `deployment-operations` - Infrastructure, deployment, maintenance
+- `integrations-apis` - External services, API docs, custom integrations
+- `reference` - API documentation, configuration, troubleshooting
+
+### Tags (Select 2-8 - Recommended)
+
+**Training Techniques:**
+
+
+
+- `reinforcement-learning` - RL algorithms and policy optimization
+- `policy-optimization` - Policy gradient methods and optimization
+- `model-training` - Model training and fine-tuning
+- `loss-functions` - Loss function implementations and gradients
+- `convergence` - Training convergence and stability
+- `environments` - RL environment interfaces and implementations
+- `rollouts` - Experience collection and rollout management
+- `evaluation` - Model evaluation and metrics
+
+
+
+**Technical Implementation:**
+
+- `gpu-accelerated` - GPU-specific processing
+- `distributed` - Multi-node/distributed processing
+- `kubernetes` - K8s deployment content
+- `slurm` - HPC/Slurm-related content
+- `docker` - Container-related content
+
+- `python-api` - Python API usage
+- `configuration` - Config file management
+
+
+**Data Types & Formats:**
+
+- `webdataset` - WebDataset format handling
+- `jsonl` - JSONL data format
+
+- `parquet` - Parquet data format
+- `bitext` - Parallel/bilingual text
+- `code-data` - Programming code datasets
+
+- `multimodal` - Cross-modal content
+
+**Workflow Stages:**
+
+
+- `data-loading` - Data ingestion processes
+- `data-processing` - Core processing steps
+- `data-export` - Output and export
+
+- `pipeline` - End-to-end workflows
+- `monitoring` - Observability and tracking
+
+
+**Performance & Scale:**
+
+- `large-scale` - Large dataset processing
+- `optimization` - Performance tuning
+- `memory-management` - Memory optimization
+
+- `batch-processing` - Batch operation techniques
+
+### Personas (Select 1+ - Required)
+
+**Target audiences based on user roles:**
+
+- `data-scientist-focused` - Analytics, metrics, model behavior
+- `mle-focused` - Implementation details, pipelines, optimization
+- `admin-focused` - Deployment, operations, maintenance
+- `devops-focused` - Infrastructure, automation, monitoring
+
+### Difficulty Levels
+
+- `beginner` - New users, basic concepts
+- `intermediate` - Some experience required
+- `advanced` - Expert-level content
+- `reference` - API docs, detailed specs
+
+### Content Types
+
+- `tutorial` - Step-by-step guides
+- `concept` - Explanatory content
+- `reference` - API/config documentation
+- `troubleshooting` - Problem-solving guides
+- `example` - Code samples and demos
+
+
+### Modality Focus
+
+- `text-only` - Text-specific content
+- `image-only` - Image-specific content
+- `video-only` - Video-specific content (early access only)
+
+- `multimodal` - Cross-modal content
+- `universal` - Applies to all modalities
+
+## Example Frontmatter
+
+### Tutorial Example
+
+
+```yaml
+---
+description: "Step-by-step guide to setting up distributed GRPO training for large language models"
+categories: ["training-algorithms"]
+tags: ["reinforcement-learning", "distributed", "large-scale", "configuration", "gpu-accelerated"]
+personas: ["mle-focused", "admin-focused"]
+
+difficulty: "intermediate"
+content_type: "tutorial"
+modality: "text-only"
+---
+```
+
+
+### Concept Example
+
+```yaml
+---
+description: "Core concepts behind experiment management and distributed computing for machine learning models"
+categories: ["concepts-architecture"]
+tags: ["loss-functions", "convergence", "multimodal", "machine-learning"]
+
+personas: ["data-scientist-focused", "mle-focused"]
+difficulty: "beginner"
+content_type: "concept"
+modality: "multimodal"
+
+---
+```
+
+### Reference Example
+
+```yaml
+---
+description: "Complete API reference for NeMo Run training algorithms and experiment management methods"
+categories: ["reference"]
+
+tags: ["reinforcement-learning", "python-api", "policy-optimization", "configuration"]
+personas: ["mle-focused", "data-scientist-focused"]
+difficulty: "reference"
+content_type: "reference"
+modality: "universal"
+---
+```
+
+### Operations Example
+
+```yaml
+---
+description: "Deploy NeMo Run on Kubernetes clusters with GPU acceleration and distributed training"
+categories: ["deployment-operations"]
+tags: ["kubernetes", "gpu-accelerated", "monitoring", "distributed", "docker"]
+personas: ["admin-focused", "devops-focused"]
+difficulty: "advanced"
+content_type: "tutorial"
+modality: "universal"
+---
+```
+
+## Content Gating Integration
+
+For early access or internal content, add the `only` field:
+
+```yaml
+---
+description: "Video pipeline customization and advanced GPU processing techniques"
+categories: ["research-advanced"]
+tags: ["pipeline", "gpu-accelerated", "customization"]
+personas: ["mle-focused"]
+difficulty: "advanced"
+
+content_type: "tutorial"
+modality: "video-only"
+only: not ga  # Exclude from GA builds
+---
+```
+
+## Benefits
+
+
+
+1. **User-Centric Navigation** - Categories align with natural user workflows
+2. **Flexible Discovery** - Tags enable cross-cutting content discovery
+3. **Persona Adaptation** - Content serves different user types effectively
+4. **Search Optimization** - Rich metadata improves search relevance
+5. **Content Strategy** - Clear guidelines for authors and maintainers
+6. **Scalability** - Structure accommodates future content and features
+
+## Validation
+When adding frontmatter, ensure:
+
+- ✅ Description is 1-2 sentences and informative
+- ✅ Exactly one category is specified
+- ✅ 2-8 relevant tags are selected
+- ✅ At least one persona is specified
+- ✅ Difficulty and content type are appropriate
+- ✅ Modality focus matches the content
+- ✅ Content gating (`only`) is used when needed
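Part of the validation checklist above can be automated. The sketch below is an illustration under stated assumptions, not tooling from the repository: `frontmatter_of` and `check_frontmatter` are hypothetical names, all seven keys from the required structure are treated as mandatory, and the single-category check assumes `categories` is written as an inline list on one line. It does not judge whether tags, difficulty, or modality are appropriate for the content.

```bash
#!/usr/bin/env sh
# Hypothetical helper: print the frontmatter of a Markdown file, that is, the
# lines between a leading '---' line and the next '---' line.
frontmatter_of() {
  awk 'NR == 1 && $0 != "---" { exit }   # no frontmatter at the top
       NR == 1 { next }                  # skip the opening delimiter
       $0 == "---" { exit }              # stop at the closing delimiter
       { print }' "$1"
}

# Hypothetical helper: report missing keys and multi-category entries.
# Returns 0 when every check passes, 1 otherwise.
check_frontmatter() {
  file="$1"
  fm=$(frontmatter_of "$file")
  rc=0
  for key in description categories tags personas difficulty content_type modality; do
    if ! printf '%s\n' "$fm" | grep -q "^$key:"; then
      printf '%s: missing required key "%s"\n' "$file" "$key"
      rc=1
    fi
  done
  # Assumption: categories uses the inline form, so a comma means 2+ entries.
  if printf '%s\n' "$fm" | grep -q '^categories:.*,'; then
    printf '%s: more than one category specified\n' "$file"
    rc=1
  fi
  return "$rc"
}

# Usage:
#   find docs -name '*.md' | while read -r f; do check_frontmatter "$f"; done
```

A line-based check like this is deliberately shallow; a real YAML parser would be the safer choice if the frontmatter ever uses block-style lists or multi-line values.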