Filtering missing #265

@CiaranOMara

I encountered unexpected behaviour when attempting to filter values of type Missing. I found a solution at https://discourse.julialang.org/t/query-jl-filtering-on-missing-data/14898. I suppose this issue is a feature request for documentation that clarifies how missing values are handled in Query.jl.

Anyhow, the case is as follows.

julia> using DataFrames, Query

julia> df = DataFrame(a=[1,2,3], b=[1,2,missing])
3×2 DataFrame
│ Row │ a     │ b       │
│     │ Int64 │ Int64⍰  │
├─────┼───────┼─────────┤
│ 1   │ 1     │ 1       │
│ 2   │ 2     │ 2       │
│ 3   │ 3     │ missing │

Attempting to filter for rows without missing values.

julia> df |> @filter(_.b !== missing) |> DataFrame
3×2 DataFrame
│ Row │ a     │ b       │
│     │ Int64 │ Int64⍰  │
├─────┼───────┼─────────┤
│ 1   │ 1     │ 1       │
│ 2   │ 2     │ 2       │
│ 3   │ 3     │ missing │

# Expected behaviour.
julia> df[df.b .!== missing, :]
2×2 DataFrame
│ Row │ a     │ b      │
│     │ Int64 │ Int64⍰ │
├─────┼───────┼────────┤
│ 1   │ 1     │ 1      │
│ 2   │ 2     │ 2      │

Attempting to filter for rows with missing values.

julia> df |> @filter(_.b === missing) |> DataFrame
0×2 DataFrame

# Expected behaviour.
julia> df[df.b .=== missing, :]
1×2 DataFrame
│ Row │ a     │ b       │
│     │ Int64 │ Int64⍰  │
├─────┼───────┼─────────┤
│ 1   │ 3     │ missing │

Using DataValues.jl's isna function, as suggested in the linked solution, provides the expected result.

julia> df |> @filter(!Query.isna(_.b)) |> DataFrame
2×2 DataFrame
│ Row │ a     │ b      │
│     │ Int64 │ Int64⍰ │
├─────┼───────┼────────┤
│ 1   │ 1     │ 1      │
│ 2   │ 2     │ 2      │

julia> df |> @filter(Query.isna(_.b)) |> DataFrame
1×2 DataFrame
│ Row │ a     │ b       │
│     │ Int64 │ Int64⍰  │
├─────┼───────┼─────────┤
│ 1   │ 3     │ missing │
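
A likely explanation for the difference, as far as I understand it, is that Query.jl hands each value to @filter wrapped in a DataValue rather than as Missing, so `_.b` is never identical to `missing`. A minimal sketch of that, assuming DataValues.jl is installed (the names `present` and `absent` are only for illustration):

```julia
using DataValues  # assumption: the package Query.jl relies on to represent missing values

present = DataValue(1)       # wrapped, non-missing value
absent  = DataValue{Int}()   # wrapped missing value

# A DataValue is never the `missing` object itself,
# so an identity check is false whether or not a value is present.
present_is_missing = present === missing
absent_is_missing  = absent === missing

# isna inspects the wrapper and reports whether a value is absent.
present_is_na = isna(present)
absent_is_na  = isna(absent)
```

Under that assumption, `_.b === missing` is false for every row and `_.b !== missing` is true for every row, which matches the 0-row and 3-row results above.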
