Understanding the value of .I for non-matching rows when using .EACHI #5457

@Henrik-P

I join two data.tables with by = .EACHI and use .I in j. When a row in i has no match in x, .I evaluates to 0 for that row. I would like to understand why this is the case.

Some toy data:

d1 = data.table(v = c("A", "B", "C", "A", "C"))

# add column identical (value-wise) to .I
d1[ , i := .I]

d2 = data.table(v = c("D", "A", "G", "C"))

d1
#    v i
# 1: A 1
# 2: B 2
# 3: C 3
# 4: A 4
# 5: C 5

d2
#    v
# 1: D
# 2: A
# 3: G
# 4: C

Join the two tables on 'v'. In j, call either "i" or .I. Use by = .EACHI ("evaluates j for the groups in 'DT' that each row in i joins to").

When j is the column "i" (which at least "looks the same" as .I), non-matched rows evaluate to NA. To me, this seems consistent with the default nomatch behaviour: "When a row in i has no match to x, nomatch=NA (default) means NA is returned":

d1[d2, on = .(v), i, by = .EACHI]
#    v  i
# 1: D NA # unmatched row in `i` evaluates to NA
# 2: A  1
# 3: A  4
# 4: G NA # unmatched row in `i` evaluates to NA
# 5: C  3
# 6: C  5

On the other hand, when j is .I, non-matched rows evaluate to 0:

d1[d2, on = .(v), .I, by = .EACHI]
#    v I
# 1: D 0 # unmatched row in `i` evaluates to 0
# 2: A 1
# 3: A 4
# 4: G 0 # unmatched row in `i` evaluates to 0
# 5: C 3
# 6: C 5

From ?.I:

While grouping, it holds for each item in the group, its row location in x

However, I fail to find documentation on how unmatched rows in i evaluate to 0 when j = .I. Can someone help me understand this seemingly inconsistent behaviour?
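
For comparison, here is a minimal sketch of a third way to ask for row locations in x: the which = TRUE argument of the join. As far as I can tell it follows the nomatch = NA convention and reports NA for unmatched rows in i, not 0. The toy data are rebuilt so the snippet runs on its own; the name "rows" is only for illustration.

```r
library(data.table)

# same toy data as above
d1 = data.table(v = c("A", "B", "C", "A", "C"))
d2 = data.table(v = c("D", "A", "G", "C"))

# which = TRUE returns the row numbers of x (d1) that each row of i (d2) joins to;
# with the default nomatch = NA, unmatched rows of i give NA
rows = d1[d2, on = .(v), which = TRUE]
rows
# [1] NA  1  4 NA  3  5
```

So both the column "i" and which = TRUE yield NA for "D" and "G", which is why the 0 from .I stands out to me.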


R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Tried on:
data.table_1.14.2 &
data.table 1.14.3 IN DEVELOPMENT built 2022-07-20 18:26:12 UTC
