Description
I join two data.tables, using `.I` in `j` and `by = .EACHI`. When a row in `i` has no match in `x`, the result is `0`. I would like to understand why this is the case.
Some toy data:
```r
library(data.table)

d1 = data.table(v = c("A", "B", "C", "A", "C"))
# add column identical (value-wise) to .I
d1[ , i := .I]
d2 = data.table(v = c("D", "A", "G", "C"))
d1
# v i
# 1: A 1
# 2: B 2
# 3: C 3
# 4: A 4
# 5: C 5
d2
# v
# 1: D
# 2: A
# 3: G
# 4: C
```
Join the two tables on `v`. In `j`, select either the column `i` or `.I`. Use `by = .EACHI` ("evaluates `j` for the groups in 'DT' that each row in `i` joins to").
When `j` is `i` (which at least "looks the same" as `.I`), non-matched rows evaluate to `NA`. To me, this seems consistent with the default `nomatch` behaviour: "When a row in `i` has no match to `x`, `nomatch=NA` (default) means `NA` is returned":
```r
d1[d2, on = .(v), i, by = .EACHI]
# v i
# 1: D NA # unmatched row in `i` evaluates to NA
# 2: A 1
# 3: A 4
# 4: G NA # unmatched row in `i` evaluates to NA
# 5: C 3
# 6: C 5
```
On the other hand, when `j` is `.I`, non-matched rows evaluate to `0`:
```r
d1[d2, on = .(v), .I, by = .EACHI]
# v I
# 1: D 0 # unmatched row in `i` evaluates to 0
# 2: A 1
# 3: A 4
# 4: G 0 # unmatched row in `i` evaluates to 0
# 5: C 3
# 6: C 5
```
From `?.I`:

> While grouping, it holds for each item in the group, its row location in `x`
However, I cannot find any documentation explaining why unmatched rows in `i` evaluate to `0` when `j = .I`. Can someone help me understand this seemingly inconsistent behaviour?
R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Tried on:
- data.table 1.14.2
- data.table 1.14.3 IN DEVELOPMENT built 2022-07-20 18:26:12 UTC