@Lehmann-Fabian
Contributor

This plugin visualizes and traces the physical DAG in Nextflow pipelines. It provides insights into data flow, process dependencies, and file sizes. I’d love to share this with the community!

Signed-off-by: Lehmann_Fabian <[email protected]>
@Lehmann-Fabian
Contributor Author

Thank you, I fixed it!

@bentsherman
Member

Thanks Fabian. This plugin looks really cool. A few general comments:

  1. Have you looked at the dag format in nf-prov? It does a very similar thing, although your plugin looks more detailed. Would you be open to contributing your improvements there instead of creating a new plugin? nf-prov is widely used in nf-core, so you would have greater visibility. I won't stop you from doing your own thing, just letting you know.

  2. I'm concerned that the name nf-dataflow is too vague, because dataflow is such a core concept in Nextflow. I would be more comfortable with a more specific name, like nf-dataflow-dag or nf-dataflow-viz.

@Lehmann-Fabian
Contributor Author

Hi Ben, thanks for your review. The purpose of this plugin is to understand the dataflow without needing to go through all Nextflow files. The visualization is just an addition, since the data is available anyway. The primary purpose is to write the input and output CSV files.

  • Thanks for pointing me to nf-prov. Indeed, my plugin is similar in terms of rendering the physical DAG. For me, it is particularly interesting to see the amount of data exchanged. Also, showing individual files might get a bit too much. I am not sure I can fit all my requirements into nf-prov, and I would like to avoid waiting for PR reviews, which is why I prefer my own plugin.
  • The primary purpose is not to visualize the DAG. How about nf-task-io-logger? But I still prefer dataflow as this sounds much better and, to me, describes the purpose of my plugin better.
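
To illustrate the kind of summary that per-task input and output CSVs make possible, here is a minimal sketch that sums the bytes flowing between processes. The column names (`process`, `path`, `bytes`) and the sample rows are hypothetical and are not taken from the plugin's actual output format.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV layouts for illustration only; the plugin's real columns may differ.
OUTPUTS_CSV = """process,path,bytes
ALIGN,/work/a1/sample1.bam,1000
ALIGN,/work/a2/sample2.bam,2000
SORT,/work/b1/sample1.sorted.bam,900
"""

INPUTS_CSV = """process,path
SORT,/work/a1/sample1.bam
SORT,/work/a2/sample2.bam
INDEX,/work/b1/sample1.sorted.bam
"""


def bytes_per_edge(outputs_text, inputs_text):
    """Sum the bytes passed from each producing process to each consuming process."""
    # Map every produced file to the process that wrote it and its size.
    producer = {}
    for row in csv.DictReader(io.StringIO(outputs_text)):
        producer[row["path"]] = (row["process"], int(row["bytes"]))

    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(inputs_text)):
        # Files without a known producer are external pipeline inputs; skip them.
        if row["path"] in producer:
            src, size = producer[row["path"]]
            totals[(src, row["process"])] += size
    return dict(totals)


edges = bytes_per_edge(OUTPUTS_CSV, INPUTS_CSV)
for (src, dst), total in sorted(edges.items()):
    print(f"{src} -> {dst}: {total} bytes")
```

Aggregating per process pair rather than per file keeps the result readable when a run produces many individual files.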

@bentsherman
Member

Okay, I see what you mean about the CSV reports. In that case, you might want to take a look at this... nextflow-io/nextflow#5715

We are developing a datastore to track all of the execution metadata, including task inputs/outputs and file sizes. The PR includes a few CLI commands if you want to play with it. If I understand correctly, your reports could be generated from this datastore.

@Lehmann-Fabian
Contributor Author

It's nice to see that Nextflow will support this natively. But for now, I will stick with my plugin, as it fully covers my needs. In case someone else would benefit from it, I ask you to accept my plugin :)

@bentsherman
Member

Fair enough. In that case, I must insist on a more specific name. How about nf-dataflow-tracker?

@bentsherman
Member

Or nf-dataflow-logger?

@Lehmann-Fabian
Contributor Author

Thanks, I will rename it. How about nf-datawatch or nf-datatrail to keep it a single word?

@bentsherman
Member

Sure. nf-datatrail seems catchy

@Lehmann-Fabian
Contributor Author

Okay, it took much longer than I thought, but it is finished now.
I rewrote all the logic so that it no longer depends on Nextflow's DAG renderers.
The physical DAG can now be rendered with different options, so you can view it even for runs with thousands of tasks.
And I renamed it, of course :)

@Lehmann-Fabian changed the title from "Added nf-dataflow version 0.0.1" to "Added nf-datatrail version 0.0.1" on Apr 11, 2025
Signed-off-by: Lehmann_Fabian <[email protected]>
@bentsherman merged commit d85f3bc into nextflow-io:main on Apr 11, 2025
1 check passed