-
Notifications
You must be signed in to change notification settings - Fork 1k
Warn when by/keyby
is used with column selection via character or numeric j
#7254
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Generated via commit b150b0a Download link for the artifact containing the test results: ↓ atime-results.zip
|
man/data.table.Rd
Outdated
@@ -97,8 +97,8 @@ data.table(\dots, keep.rownames=FALSE, check.names=FALSE, key=NULL, stringsAsFac | |||
|
|||
See \href{../doc/datatable-intro.html}{\code{vignette("datatable-intro")}} and \code{example(data.table)}.} | |||
|
|||
\item{by}{ Column names are seen as if they are variables (as in \code{j} when \code{with=TRUE}). The \code{data.table} is then grouped by the \code{by} and \code{j} is evaluated within each group. The order of the rows within each group is preserved, as is the order of the groups. \code{by} accepts: | |||
|
|||
\item{by}{ Column names are seen as if they are variables (as in \code{j} when \code{with=TRUE}). \emph{Note that `by` and `keyby` are ignored when `j` is a character or numeric vector used for selecting columns (i.e., when the internal `with=FALSE` is triggered).} The \code{data.table} is then grouped by the \code{by} and \code{j} is evaluated within each group. The order of the rows within each group is preserved, as is the order of the groups. \code{by} accepts: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not use ` in Rd files like this, use \code{} instead
R/data.table.R
Outdated
@@ -739,6 +739,13 @@ replace_dot_alias = function(e) { | |||
if (!length(j) && !notj) return( null.data.table() ) | |||
if (is.factor(j)) j = as.character(j) # fix for FR: #358 | |||
if (is.character(j)) { | |||
if (!missingby) { | |||
warning( | |||
"`by` or `keyby` is ignored when `j` is a character vector used for column selection. ", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we do not use backticks in warnings (only in markdown), please remove.
also please use one long character string (easier to translate), instead of breaking into several strings/lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sadly it's not so simple:
l = make_linter_from_xpath(
"
//SYMBOL_FUNCTION_CALL[
starts-with(text(), 'stop')
or starts-with(text(), 'warning')
or starts-with(text(), 'message')
or starts-with(text(), 'packageStartupMessage')
]
/parent::expr[
following-sibling::expr
//STR_CONST[contains(text(), '`')]
]
",
'message with backtick'
)
lint_dir("R", l())
Gives hits
data.table.R:315:9: warning: [l] message with backtick
stopf("Invalid use of `:=` inside `{}`. `:=` must be the only expression inside `{}` when used in `j`. Instead of: DT[{tmp1 <- ...; tmp2 <- ...; someCol := tmp1 * tmp2}], Use: DT[, someCol := {tmp1 <- ...; tmp2 <- ...; tmp1 * tmp2}]")
^~~~~
data.table.R:724:31: warning: [l] message with backtick
if (jsub %iscall% ":=") stopf("`:=` is only supported under with=TRUE, see ?`:=`.")
^~~~~
data.table.R:2671:40: warning: [l] message with backtick
if (identical(name, quote(`*tmp*`))) stopf("setalloccol attempting to modify `*tmp*`")
^~~~~
data.table.R:2887:3: warning: [l] message with backtick
stopf('Check that is.data.table(DT) == TRUE. Otherwise, `:=` is defined for use in j, once only and in particular ways. See help(":=", "data.table"). A common reason for this error is allocating a new column in `j` and using `<-` instead of `:=`; e.g., `DT[, new_col <- 1]` should be `DT[, new_col := 1]`. Another is using `:=` in a multi-statement `{...}` block; please use `:=` as the only statement in `j`.', class="dt_invalid_let_error")
^~~~~
data.table.R:3241:5: warning: [l] message with backtick
stopf("It looks like you re-used `:=` in argument %d a functional assignment call -- use `=` instead: %s(col1=val1, col2=val2, ...)", jj-1L, call_name)
^~~~~
foverlaps.R:3:47: warning: [l] message with backtick
if (!is.data.table(y) || !is.data.table(x)) stopf("y and x must both be data.tables. Use `setDT()` to convert list/data.frames to data.tables by reference or as.data.table() to convert to data.tables by copying.")
^~~~~
groupingsets.R:72:5: warning: [l] message with backtick
stopf("When using `id=TRUE` the 'x' data.table must not have a column named 'grouping'.")
^~~~~
groupingsets.R:76:5: warning: [l] message with backtick
warningf("'sets' contains a duplicate (i.e., equivalent up to sorting) element at index %d; as such, there will be duplicate rows in the output -- note that grouping by A,B and B,A will produce the same aggregations. Use `sets=unique(lapply(sets, sort))` to eliminate duplicates.", idx)
^~~~~~~~
groupingsets.R:121:5: warning: [l] message with backtick
stopf("When using `id=TRUE` the 'j' expression must not evaluate to a column named 'grouping'.")
^~~~~
groupingsets.R:123:5: warning: [l] message with backtick
stopf("There exists duplicated column names in the results, ensure the column passed/evaluated in `j` and those in `by` are not overlapping.")
^~~~~
merge.R:32:5: warning: [l] message with backtick
stopf("`by.x` and `by.y` must be of same length.")
^~~~~
merge.R:34:5: warning: [l] message with backtick
warningf("Supplied both `by` and `by.x`/`by.y`. `by` argument will be ignored.")
^~~~~~~~
merge.R:37:7: warning: [l] message with backtick
stopf("A non-empty vector of column names is required for `by.x` and `by.y`.", class="dt_invalid_input_error")
^~~~~
merge.R:39:7: warning: [l] message with backtick
stopf("The following columns listed in `%s` are missing from %s: %s", "by.x", "x", brackify(by.x[!idx]))
^~~~~
merge.R:42:7: warning: [l] message with backtick
stopf("The following columns listed in `%s` are missing from %s: %s", "by.y", "y", brackify(by.y[!idx]))
^~~~~
merge.R:54:7: warning: [l] message with backtick
stopf("A non-empty vector of column names for `by` is required.")
^~~~~
merge.R:56:7: warning: [l] message with backtick
stopf("The following columns listed in `%s` are missing from %s: %s", "by", "x", brackify(by[!idx]))
^~~~~
merge.R:59:7: warning: [l] message with backtick
stopf("The following columns listed in `%s` are missing from %s: %s", "by", "y", brackify(by[!idx]))
^~~~~
rowwiseDT.R:4:5: warning: [l] message with backtick
stopf("Must provide at least one column (use `name=`). See ?rowwiseDT for details")
^~~~~
setops.R:197:7: warning: [l] message with backtick
warningf("Argument 'tolerance' was forced to lowest accepted value `sqrt(.Machine$double.eps)` from provided %s", format(tolerance, scientific=FALSE))
^~~~~~~~
xts.R:11:33: warning: [l] message with backtick
if (index_nm %chin% names(x)) stopf("Input xts object should not have '%s' column because it would result in duplicate column names. Rename '%s' column in xts or use `keep.rownames` to change the index column name.", index_nm, index_nm)
^~~~~
xts.R:21:36: warning: [l] message with backtick
if (!xts::is.timeBased(x[[1L]])) stopf("data.table must have a time based column in first position, use `setcolorder` function to change the order, or see ?timeBased for supported types")
^~~~~
Someday, we should make a guide around message styling (how to refer to functions, arguments, R values, etc) and try to enforce it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ouch, thanks
R/data.table.R
Outdated
@@ -762,6 +769,12 @@ replace_dot_alias = function(e) { | |||
# else the NA in ansvals are for join inherited scope (test 1973), and NA could be in irows from join and data in i should be returned (test 1977) | |||
# in both cases leave to the R-level subsetting of i and x together further below | |||
} else if (is.numeric(j)) { | |||
if (!missingby) { | |||
warning( | |||
"`by` or `keyby` is ignored when `j` is a numeric vector used for column selection. ", "Perhaps you intended to use `.SD`? For example: DT[, .SD[, ", deparse(jsub), "], by = ...]", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the same as above? can this duplication be avoided?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried implementing it .
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #7254 +/- ##
=========================================
Coverage ? 98.79%
=========================================
Files ? 81
Lines ? 15261
Branches ? 0
=========================================
Hits ? 15077
Misses ? 184
Partials ? 0 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
R/data.table.R
Outdated
if (is.character(j) || (is.numeric(j) && !is.logical(j))) { | ||
if (!missingby) { | ||
j_type = if (is.character(j)) "a character" else "a numeric" | ||
warning( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have warningf function that could be used here nicely
hi @tdhock can you please tell me the further improvements needed for the pr . |
Closes #5397.
This PR adds a warning when
by/keyby
are supplied butj
is a simple column selection vector (character/numeric). Previously this was accepted silently; this change now warns the user and guides them to use .SD if grouping was intended.The warning is suppressed when
with=FALSE
is explicitly provided, respecting user intent. New tests and documentation updates are included.hi @tdhock — please take a look when you have time.
Thanks!