Skip to content

Conversation

jlevydev
Copy link
Contributor

@jlevydev jlevydev commented Mar 5, 2025

When upgrading Panfactum versions and applying the infrastructure the current workflow is to use terragrunt run-all apply in order to run the full dependency graph for an environment. While a good utility in a stable state, this command seems like the wrong tool in the context of an upgrade where there may be breakages. This is because run-all is an all or nothing operation and all the work from previous layers of the dependency graph must be repeated to get to later layers in the case of a failure.

This pull request introduces a custom utility as an alternative pf-apply-all. It let's you do a terragrunt run-all apply in a more interactive way. The script works as follows:

  1. Generate the dependency graph using terragrunt output-module-groups
  2. For each group ask to run the group or skip
  3. On run:
    a. Run the whole group in parallel
    b. Report all failed module and allow the user to infinitely re-run each failed module interactively one by one
  4. Repeat 2-3 until all the groups are processed
    This lets the user debug individual module failures and then continue forward. It also enables starting at an arbitrary depth of the dependency graph.

A consideration for the WIP script is that it creates and then destroys log files in a temp directory at the root using mktemp. Might be worth moving that to a directory in the repo and not deleting them so you have a more complete audit history of the applies that happened in the background. If the logs were retained it would improve visibility vs the previous paradigm where the logs are all comingled and somewhat worthless to the user.

Tested successfully on WSL.


Important

Introduces pf-apply-all.sh for interactive Terragrunt module application with improved failure handling and parallel processing.

  • Behavior:
    • Introduces pf-apply-all.sh for interactive Terragrunt module application.
    • Allows retry, skip, or exit on module failures.
    • Processes modules in parallel within groups.
  • Implementation:
    • Uses terragrunt output-module-groups to generate dependency graph.
    • Creates temporary log files for each module application.
    • Deletes log files after processing, with a note on potential improvement for audit history.
  • Dependencies:
    • Requires jq for JSON processing.

This description was created by Ellipsis for 170e3c2. It will automatically update as commits are pushed.

@jlevydev jlevydev requested a review from fullykubed as a code owner March 5, 2025 16:58
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Reviewed everything up to 170e3c2 in 2 minutes and 57 seconds

More details
  • Looked at 248 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 7 drafted comments based on config settings.
1. packages/nix/packages/scripts/pf-apply-all.sh:82
  • Draft comment:
    Reading user input directly via /dev/tty is good; however, consider validating input strictly in case of unexpected responses.
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50%
    None
2. packages/nix/packages/scripts/pf-apply-all.sh:241
  • Draft comment:
    No newline at the end of file. Consider adding a newline.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
    While having a newline at the end of files is a common Unix convention, this is a very minor style issue that doesn't affect functionality. The rules state not to make comments that are obvious or unimportant. This seems like a trivial issue that doesn't require a comment.
    Missing newlines can sometimes cause issues with certain tools or when concatenating files. It is a widely accepted best practice.
    While true, any serious issues would be caught by linters or the build system. This is too minor to warrant a PR comment.
    Delete this comment as it points out a trivial style issue that doesn't meaningfully impact the code.
3. packages/nix/packages/scripts/pf-apply-all.sh:99
  • Draft comment:
    Consider adding a trap to clean up the temporary directory in case the script terminates unexpectedly.
  • Reason this comment was not posted:
    Marked as duplicate.
4. packages/nix/packages/scripts/pf-apply-all.sh:51
  • Draft comment:
    Check that the module path exists before attempting to change directory. This could provide clearer error messaging if the path is invalid.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
    The module paths come from terragrunt itself, so they should be valid. If a path is invalid, the cd will fail and be caught by the error handling. Adding an explicit check doesn't add much value since the error will be caught and handled gracefully anyway. The script already has good error reporting via log files and colored output.
    The comment has a point that explicit validation could provide clearer error messages. There's a small chance the paths from terragrunt are stale.
    The existing error handling is sufficient - failed modules are clearly reported with logs, and the user can retry or skip them. Adding path validation would be redundant.
    The comment should be deleted as the existing error handling is adequate and adding path validation would be redundant.
5. packages/nix/packages/scripts/pf-apply-all.sh:83
  • Draft comment:
    Handle empty user input from /dev/tty. If the user simply sends an EOF (or empty input), the script might behave unexpectedly.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
    The current code actually handles empty input gracefully - if response is empty, it won't match ^[Ee]$ or ^[Ss]$, so it will default to retrying which is a reasonable behavior. The suggestion would make the empty case more explicit but doesn't change the behavior. The code is already defensive.
    I could be underestimating the importance of explicitly handling EOF conditions in shell scripts. There may be edge cases I'm not considering.
    While being explicit about EOF handling is good practice, the current code already handles empty input safely by falling through to the retry case. The suggested change doesn't improve functionality.
    The comment should be removed since the current code already handles empty input gracefully and the suggestion doesn't improve the behavior.
6. packages/nix/packages/scripts/pf-apply-all.sh:155
  • Draft comment:
    Using separate arrays (pids, module_paths, log_files) to correlate modules can be error-prone. Consider using associative arrays or a single compound data structure.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
    The suggestion has merit from a code quality perspective - using separate indexed arrays can be error-prone if indices get out of sync. However, in this case the code is relatively straightforward, the arrays are scoped locally, and the indexing is managed consistently. The suggested alternatives (associative arrays) would add complexity without clear benefits.
    The comment identifies a real code quality concern. Parallel array usage is generally considered an anti-pattern in software engineering.
    While the concern is valid, the current implementation is simple and works well for this specific use case. The arrays are local in scope and the indexing is straightforward.
    The comment should be removed as it suggests a refactor that would add complexity without significant benefit in this specific context.
7. packages/nix/packages/scripts/pf-apply-all.sh:241
  • Draft comment:
    Add a newline at the end of the file. This is a minor POSIX best practice.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50%
    None

Workflow ID: wflow_2SSJkOAuVR296Ufd


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@@ -0,0 +1,241 @@
#!/bin/bash

set -e
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using a more robust shell option (e.g. set -euo pipefail) to catch unbound variables and pipeline errors.

Suggested change
set -e
set -euo pipefail

}

# Create a temporary directory for log files
setup_temp_dir() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a trap to clean up the temporary directory in case of unexpected script termination.

print_status "$YELLOW" "Starting module: $module_path"
(cd "$module_path" && terragrunt apply -auto-approve > "$log_file" 2>&1)
local exit_code=$?
echo "EXIT_CODE=$exit_code" >> "$log_file"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a more unique marker for exit codes. Relying on EXIT_CODE=0 might conflict with terragrunt output. Consider a prefix that's less likely to collide.

Suggested change
echo "EXIT_CODE=$exit_code" >> "$log_file"
echo "UNIQUE_EXIT_CODE=$exit_code" >> "$log_file"

print_status "$GREEN" "All groups processed!"
}

# Check for jq dependency
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Terragrunt is a required dependency but is not checked. Consider verifying that terragrunt is installed before proceeding.

# Confirm before proceeding with this group
print_status "$BLUE" "Would you like to proceed with $group_name? (y/n)"
# Read from /dev/tty to ensure we're getting user input
read -r response < /dev/tty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a fallback or warning if /dev/tty is not available (e.g., in non-interactive environments) when using it for reading user input.

Suggested change
read -r response < /dev/tty
if [ -e /dev/tty ]; then read -r response < /dev/tty; else echo "Warning: /dev/tty not available"; response="n"; fi

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant