
Commit 6bb8e15

issue-11979 - WIP - Phase 3 Plan
Signed-off-by: Helber Belmiro <[email protected]>
1 parent e213c9d commit 6bb8e15


1 file changed, 82 additions, 1 deletion


CONTEXT.md

- **DAG Completion Logic**: Working as designed for actual execution patterns
- **Test Infrastructure**: Proper isolation and validation

**The original DAG completion logic fixes were correct and working properly. The issue was test expectations not matching the actual KFP v2 execution model.**
## **Phase 3 Plan: Fix ParallelFor Parent DAG Completion Logic** 🎯

### **Problem Analysis**

ParallelFor parent DAGs remain in the `RUNNING` state even after all of their child iteration DAGs complete. Current issues:

1. **Parent DAG Completion**: Parent DAGs don't transition to `COMPLETE` when all iterations finish.
2. **Task Counting**: `total_dag_tasks` should equal `iteration_count` but shows incorrect values.
3. **Child DAG Detection**: The parent completion logic may not be detecting completed child DAGs correctly.
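
The three symptoms above can be phrased as checks over a snapshot of a parent DAG's state, which is also what the validation step in Task 4 needs. This is a minimal sketch under stated assumptions: `ParentDAGSnapshot`, its fields, and `checkParentInvariants` are hypothetical names rather than the KFP client API, and the state strings simply mirror the `RUNNING`/`COMPLETE`/`FAILED` values used in this plan.

```go
package main

import "fmt"

// ParentDAGSnapshot is a HYPOTHETICAL, simplified view of a ParallelFor
// parent DAG and its child iteration DAGs as read back from MLMD.
type ParentDAGSnapshot struct {
	State          string   // e.g. "RUNNING", "COMPLETE", "FAILED"
	IterationCount int      // value of the parent's iteration_count property
	TotalDAGTasks  int      // value of the parent's total_dag_tasks property
	ChildStates    []string // states of the child iteration DAGs that were found
}

// checkParentInvariants returns one message per expectation from the
// Problem Analysis that the snapshot violates; an empty result means consistent.
func checkParentInvariants(s ParentDAGSnapshot) []string {
	var violations []string

	// Issue 2: total_dag_tasks should equal iteration_count.
	if s.TotalDAGTasks != s.IterationCount {
		violations = append(violations, fmt.Sprintf(
			"total_dag_tasks=%d, want iteration_count=%d", s.TotalDAGTasks, s.IterationCount))
	}

	// Issue 3: every iteration should be visible as a child DAG.
	if len(s.ChildStates) != s.IterationCount {
		violations = append(violations, fmt.Sprintf(
			"found %d child DAGs, want %d", len(s.ChildStates), s.IterationCount))
	}

	// Issue 1: once every child is COMPLETE, the parent should be COMPLETE too.
	// Zero iterations are deliberately skipped here (an open edge case in Task 3).
	allComplete := s.IterationCount > 0 && len(s.ChildStates) == s.IterationCount
	for _, st := range s.ChildStates {
		if st != "COMPLETE" {
			allComplete = false
		}
	}
	if allComplete && s.State != "COMPLETE" {
		violations = append(violations, fmt.Sprintf(
			"all %d children COMPLETE but parent is %s", s.IterationCount, s.State))
	}
	return violations
}
```

For example, a parent stuck in `RUNNING` with three completed children and `total_dag_tasks` set to 1 reports two violations: the task-count mismatch and the missing parent transition.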
### **Detailed Implementation Plan**

#### **Phase 3 Task 1: Analyze ParallelFor DAG Structure**

**Goal**: Understand how ParallelFor creates DAG hierarchies and what should trigger parent completion.

**Actions**:
1. **Run the ParallelFor integration test** to observe the current behavior.
2. **Examine the MLMD structure** of ParallelFor runs:
   - Identify which properties distinguish the parent DAG from the iteration DAGs
   - Check the parent-child relationships
   - Validate how `iteration_count` and `iteration_index` are used
3. **Review the ParallelFor YAML structure** to understand the expected execution flow.
4. **Debug the current `isParallelForParentDAG()` detection logic**.
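
One way to make the Task 1 findings concrete is a small classifier that separates the parent DAG from its iteration DAGs. This sketch rests on an ASSUMPTION that Task 1 is meant to confirm: that the parent carries `iteration_count` while each iteration DAG carries `iteration_index`. `DAGExecution`, `ParentID`, `classifyDAG`, and `iterationsOf` are hypothetical stand-ins, not the real `isParallelForParentDAG()` or MLMD types.

```go
package main

// DAGExecution is a HYPOTHETICAL, simplified stand-in for an MLMD DAG
// execution; only integer custom properties are modeled.
type DAGExecution struct {
	ID       int64
	ParentID int64 // ID of the enclosing DAG execution (0 for the root)
	IntProps map[string]int64
}

// DAGRole classifies a DAG execution within a ParallelFor hierarchy.
type DAGRole int

const (
	RoleOther DAGRole = iota
	RoleParallelForParent
	RoleParallelForIteration
)

// classifyDAG applies the ASSUMED rule: iteration DAGs carry iteration_index,
// the ParallelFor parent carries iteration_count, anything else is "other".
func classifyDAG(e DAGExecution) DAGRole {
	_, hasIndex := e.IntProps["iteration_index"] // lookups on a nil map are safe
	_, hasCount := e.IntProps["iteration_count"]
	switch {
	case hasIndex:
		return RoleParallelForIteration
	case hasCount:
		return RoleParallelForParent
	default:
		return RoleOther
	}
}

// iterationsOf returns the iteration DAGs whose ParentID matches parentID.
func iterationsOf(parentID int64, all []DAGExecution) []DAGExecution {
	var out []DAGExecution
	for _, e := range all {
		if e.ParentID == parentID && classifyDAG(e) == RoleParallelForIteration {
			out = append(out, e)
		}
	}
	return out
}
```

Checking `iteration_index` first means an iteration DAG is never mistaken for a parent even if it also happened to carry a count property.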
#### **Phase 3 Task 2: Debug Parent DAG Completion Detection**

**Goal**: Identify why parent DAGs don't complete when their child iterations finish.

**Actions**:
1. **Add comprehensive debug logging** to the ParallelFor completion logic.
2. **Trace `GetExecutionsInDAG()` behavior** for parent DAGs:
   - Check whether child DAG executions are returned
   - Verify that the filtering logic doesn't exclude child DAGs
3. **Debug the child DAG counting logic**:
   - Verify that the `dagExecutions` count is correct
   - Check the `completedChildDags` calculation
4. **Test the parent-child DAG relationship queries**.
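
The logging in actions 1 to 3 is most useful when every filtering and counting decision is visible. The sketch below reuses the names `dagExecutions` and `completedChildDags` from this plan, but `ChildExecution`, `countChildDAGs`, and the `"DAG"`/`"CONTAINER"` type strings are hypothetical placeholders for whatever a `GetExecutionsInDAG()` call actually returns.

```go
package main

import "log"

// ChildExecution is a HYPOTHETICAL, simplified record of one execution
// returned for a parent DAG by a GetExecutionsInDAG()-style query.
type ChildExecution struct {
	ID    int64
	Type  string // placeholder type names: "DAG" or "CONTAINER"
	State string // e.g. "RUNNING", "COMPLETE", "FAILED"
}

// countChildDAGs counts child DAG executions and how many are COMPLETE,
// logging each decision so the filtering and counting can be traced.
func countChildDAGs(parentID int64, children []ChildExecution) (dagExecutions, completedChildDags int) {
	log.Printf("parent DAG %d: query returned %d executions", parentID, len(children))
	for _, c := range children {
		if c.Type != "DAG" {
			// Logged explicitly so a child DAG wrongly typed or filtered out shows up.
			log.Printf("parent DAG %d: skipping execution %d (type %q)", parentID, c.ID, c.Type)
			continue
		}
		dagExecutions++
		if c.State == "COMPLETE" {
			completedChildDags++
		}
		log.Printf("parent DAG %d: child DAG %d state=%s", parentID, c.ID, c.State)
	}
	log.Printf("parent DAG %d: dagExecutions=%d completedChildDags=%d",
		parentID, dagExecutions, completedChildDags)
	return dagExecutions, completedChildDags
}
```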
#### **Phase 3 Task 3: Fix ParallelFor Parent Completion Logic**

**Goal**: Implement correct completion detection for ParallelFor parent DAGs.

**Actions**:
1. **Fix child DAG detection** if `GetExecutionsInDAG()` isn't returning child DAGs properly.
2. **Correct the completion criteria**:
   - Ensure the parent completes when ALL child iteration DAGs are complete
   - Handle edge cases (0 iterations, failed iterations)
3. **Fix the `total_dag_tasks` calculation** for ParallelFor parent DAGs:
   - It should equal `iteration_count`, not a fixed value
4. **Update the parent completion logic** to count completed child DAGs correctly.
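
The completion criteria and the `total_dag_tasks` rule can be sketched as two small functions. This is an illustration under assumptions, not the fix in client.go: `IterationState`, `totalDAGTasks`, and `parentDAGState` are hypothetical names, and two edge-case policies are ASSUMED here because the plan leaves them open: a single failed iteration fails the parent immediately, and zero iterations complete the parent at once.

```go
package main

// IterationState is a simplified state for one child iteration DAG.
type IterationState string

const (
	StateRunning  IterationState = "RUNNING"
	StateComplete IterationState = "COMPLETE"
	StateFailed   IterationState = "FAILED"
)

// totalDAGTasks gives a ParallelFor parent one task per iteration,
// instead of a fixed value.
func totalDAGTasks(iterationCount int) int {
	return iterationCount
}

// parentDAGState derives the parent's state from its child iteration DAGs.
// ASSUMED policies: any FAILED child fails the parent immediately, and a
// parent with zero iterations is COMPLETE.
func parentDAGState(iterationCount int, children []IterationState) IterationState {
	completed := 0
	for _, s := range children {
		switch s {
		case StateFailed:
			return StateFailed
		case StateComplete:
			completed++
		}
	}
	if iterationCount == 0 || completed == totalDAGTasks(iterationCount) {
		return StateComplete
	}
	return StateRunning
}
```

Comparing with `==` rather than `>=` keeps a miscount (more children found than `iteration_count`) from silently completing the parent; such a case stays `RUNNING` and should surface in the Task 2 logging.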
#### **Phase 3 Task 4: Test and Validate Fix**

**Goal**: Ensure ParallelFor completion works correctly.

**Actions**:
1. **Run a single ParallelFor test** to verify the fix.
2. **Test edge cases**:
   - Dynamic iteration counts (2, 5, 10 iterations)
   - Failed iterations
   - Zero iterations
3. **Validate MLMD state consistency**:
   - The parent DAG reaches the `COMPLETE` state
   - `total_dag_tasks` equals `iteration_count`
4. **Run the full test suite** to check for regressions.
### **Success Criteria**

- [ ] ParallelFor parent DAGs transition from `RUNNING` to `COMPLETE` when all child iterations finish
- [ ] `total_dag_tasks` equals `iteration_count` for ParallelFor parent DAGs
- [ ] ParallelFor integration tests pass consistently
- [ ] Dynamic iteration counts work correctly (2, 5, 10 iterations)
- [ ] Failed iterations cause the parent DAG to transition to the `FAILED` state
- [ ] No regressions in conditional DAG logic or other DAG types
### **Expected Implementation Areas**

1. **`isParallelForParentDAG()` detection** (lines 1052-1057 in client.go)
2. **Parent DAG completion logic** (lines 898-914 in client.go)
3. **`GetExecutionsInDAG()` filtering** for child DAG relationships
4. **Task counting logic** for ParallelFor parent DAGs (lines 830-870 in client.go)

This approach will systematically identify and fix the root cause of the ParallelFor parent DAG completion issues, similar to how we successfully resolved the conditional DAG problems.
