I encountered unexpected behaviour when attempting to filter values of type Missing. I found a solution at https://discourse.julialang.org/t/query-jl-filtering-on-missing-data/14898. I suppose this issue is a feature request for documentation that clarifies missing values in Query.jl.
Anyhow, the case is as follows.
julia> using DataFrames, Query
julia> df = DataFrame(a=[1,2,3], b=[1,2,missing])
3×2 DataFrame
│ Row │ a     │ b       │
│     │ Int64 │ Int64⍰  │
├─────┼───────┼─────────┤
│ 1   │ 1     │ 1       │
│ 2   │ 2     │ 2       │
│ 3   │ 3     │ missing │

Attempting to filter for rows without missing values.
julia> df |> @filter(_.b !== missing) |> DataFrame
3×2 DataFrame
│ Row │ a     │ b       │
│     │ Int64 │ Int64⍰  │
├─────┼───────┼─────────┤
│ 1   │ 1     │ 1       │
│ 2   │ 2     │ 2       │
│ 3   │ 3     │ missing │
# Expected behaviour.
julia> df[df.b .!== missing, :]
2×2 DataFrame
│ Row │ a     │ b      │
│     │ Int64 │ Int64⍰ │
├─────┼───────┼────────┤
│ 1   │ 1     │ 1      │
│ 2   │ 2     │ 2      │

Attempting to filter for rows with missing values.
julia> df |> @filter(_.b === missing) |> DataFrame
0×2 DataFrame
# Expected behaviour.
julia> df[df.b .=== missing, :]
1×2 DataFrame
│ Row │ a     │ b       │
│     │ Int64 │ Int64⍰  │
├─────┼───────┼─────────┤
│ 1   │ 3     │ missing │

Using the isna function from DataValues.jl (called here as Query.isna) provides the expected result.
julia> df |> @filter(!Query.isna(_.b)) |> DataFrame
2×2 DataFrame
│ Row │ a     │ b      │
│     │ Int64 │ Int64⍰ │
├─────┼───────┼────────┤
│ 1   │ 1     │ 1      │
│ 2   │ 2     │ 2      │
julia> df |> @filter(Query.isna(_.b)) |> DataFrame
1×2 DataFrame
│ Row │ a     │ b       │
│     │ Int64 │ Int64⍰  │
├─────┼───────┼─────────┤
│ 1   │ 3     │ missing │
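
A minimal sketch of why the `===` comparisons above do not match, assuming that inside a query Query.jl hands `@filter` a `DataValue` from DataValues.jl rather than `missing` itself. The variable names `na_value` and `present_value` are only for illustration, and the snippet uses DataValues.jl directly instead of going through Query.jl.

```julia
using DataValues

# Assumption: this is how a `missing` entry of column b looks inside a query.
na_value = DataValue{Int}()      # a DataValue holding no value (NA)
present_value = DataValue(2)     # a DataValue wrapping the integer 2

# A DataValue is never identical to `missing`, so `=== missing` is false
# and `!== missing` is true for every row, including the NA one.
println(na_value === missing)       # false
println(na_value !== missing)       # true

# isna is the check that distinguishes the two cases.
println(isna(na_value))             # true
println(isna(present_value))        # false
```

If that assumption holds, it would explain both results above: `_.b !== missing` keeps every row and `_.b === missing` keeps none, while `isna` separates the rows as expected.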