Closed
Description
What do you think of adding a method to the flatten function in DataFrames that expands a data frame like df2, whose subdf column holds nested data frames, into the form of dfexp?
using CSV, DataFrames
df = DataFrame(a=1:3)
df1 = DataFrame(b=11:15)
CSV.write("tmp1.csv", df)
CSV.write("tmp2.csv", df)
CSV.write("tmp3.csv", df1)
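The issue does not show how df2 was built. The following is a minimal sketch of one way it could be constructed, assuming the files column lists the three CSV files written above and each subdf entry is simply the file read back with CSV.read. The column values are taken from the printed df2; the construction itself is an assumption.

```julia
using CSV, DataFrames

# Recreate the three CSV files from the example.
CSV.write("tmp1.csv", DataFrame(a=1:3))
CSV.write("tmp2.csv", DataFrame(a=1:3))
CSV.write("tmp3.csv", DataFrame(b=11:15))

# Assumed construction of df2: one row per file, plus an extra column.
df2 = DataFrame(files=["tmp1.csv", "tmp2.csv", "tmp3.csv"], other_cols=1:3)

# Read each file into its own data frame and store it as a cell value.
df2.subdf = [CSV.read(f, DataFrame) for f in df2.files]
```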
julia> df2
3×3 DataFrame
 Row │ files     other_cols  subdf
     │ String    Int64       DataFrame
─────┼─────────────────────────────────────
   1 │ tmp1.csv           1  3×1 DataFrame
   2 │ tmp2.csv           2  3×1 DataFrame
   3 │ tmp3.csv           3  5×1 DataFrame
julia> dfexp = flatten(df2, :subdf)
11×4 DataFrame
 Row │ files     other_cols  a        b
     │ String    Int64       Int64?   Int64?
─────┼────────────────────────────────────────
   1 │ tmp1.csv           1        1  missing
   2 │ tmp1.csv           1        2  missing
   3 │ tmp1.csv           1        3  missing
   4 │ tmp2.csv           2        1  missing
   5 │ tmp2.csv           2        2  missing
   6 │ tmp2.csv           2        3  missing
   7 │ tmp3.csv           3  missing       11
   8 │ tmp3.csv           3  missing       12
   9 │ tmp3.csv           3  missing       13
  10 │ tmp3.csv           3  missing       14
  11 │ tmp3.csv           3  missing       15
Here is an expression that produces the result shown above. It is only meant to illustrate what is being requested, not to suggest how it should be implemented.
mapreduce(r->crossjoin(DataFrame(r[Not(:subdf)]),r.subdf), (x, y) -> vcat(x, y; cols = :union), eachrow(df2))
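For experimenting with the idea, the one-liner above can be wrapped in a small helper. This is only a sketch: flatten_subdf is a hypothetical name, not part of DataFrames.jl, and the example builds df2 inline instead of reading the CSV files. It relies on crossjoin and on reduce(vcat, ...; cols=:union), both of which DataFrames.jl provides.

```julia
using DataFrames

# Hypothetical helper (not part of DataFrames.jl): expand a column of nested
# data frames, repeating the remaining columns of each parent row.
function flatten_subdf(df::AbstractDataFrame, col::Symbol)
    # For every parent row, pair its non-nested columns with each row of the
    # nested data frame stored in `col`.
    parts = [crossjoin(DataFrame(r[Not(col)]), r[col]) for r in eachrow(df)]
    # Stack the pieces; columns absent from a piece are filled with `missing`.
    return reduce(vcat, parts; cols=:union)
end

df2 = DataFrame(files=["tmp1.csv", "tmp2.csv", "tmp3.csv"],
                other_cols=1:3,
                subdf=[DataFrame(a=1:3), DataFrame(a=1:3), DataFrame(b=11:15)])

dfexp = flatten_subdf(df2, :subdf)
```

Two behaviors of this sketch are worth noting: a parent row whose nested data frame has zero rows disappears from the result, and crossjoin throws an error if a nested column name clashes with a parent column name.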