Conversation

JAicewizard
Contributor
This currently also includes the changes of #483, but I will rebase once that is merged.

This adds multiple ways to append, using all the variations of table sources.
Using just the row-based variant is actually slower than the current solution, which may be expected as it operates very similarly but with extra steps. The real performance gains come from the parallel implementations.
The benchmarks don't show much improvement (~25% for both parallel row and parallel chunk), but this is because both are bottlenecked on appending the chunks in DuckDB itself. If the bottleneck were instead the data ingestion/computation on the user side (rather than a simple counter), the results would be much more favorable to the parallel variants. Currently they barely show up as parallel on my laptop.

Another improvement would be setting entire vectors of data at a time instead of individual values; this could be done in a future PR.

@JAicewizard
Contributor Author

Benchmarks for the new function:

BenchmarkAppenderNested-16                       100    12083417 ns/op
BenchmarkAppenderSingle/int8-16                  288     4121898 ns/op
BenchmarkAppenderRowSingle/int8-16               217     5464273 ns/op
BenchmarkAppenderParallelRowSingle/int8-16       391     3000902 ns/op
BenchmarkAppenderChunkSingle/int8-16             378     3183649 ns/op
BenchmarkAppenderParallelChunkSingle/int8-16     379     3030166 ns/op

You may also notice that this is integrated into the already existing appender API, instead of what I did before. The current appender implementation was so similar to what I was building that duplicating all of it would have been a waste of code.
