Commit 04a0ccf

Document support for special terms at run-time, one-sided formula (#155)
* warning about programmatic terms (and fix spell-o)
* two-pass Poly example
* document runtime-friendly special terms
* update doctests for runtime (plus GLM/dataframes updates)
* a note about one-sided formula
* use property syntax for dataframe columns
* run doctest=:fix again, printing numbers and var""
* run-time->runtime, pointer in poly example, clarify context
* move runtime up above context; missing methods; move up lm example
* run with doctest=:fix
* final little note
* move runtime poly method into main example and revise discussion
* run doctest = :fix
* bump patch version 0.6.5
1 parent 05dbb50 commit 04a0ccf

5 files changed: +255 -78 lines changed

Project.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 name = "StatsModels"
 uuid = "3eaba693-59b7-5ba5-a881-562e759f1c8d"
-version = "0.6.4"
+version = "0.6.5"
 
 [deps]
 CategoricalArrays = "324d7699-5711-5eae-9e2f-1d82baa6b597"

docs/src/formula.md

Lines changed: 53 additions & 34 deletions
@@ -54,7 +54,7 @@ Predictors:
 b(unknown) & c(unknown)
 
 julia> df = DataFrame(y = rand(9), a = 1:9, b = rand(9), c = repeat(["d","e","f"], 3))
-9×4 DataFrame
+9×4 DataFrames.DataFrame
 │ Row │ y          │ a     │ b         │ c      │
 │     │ Float64    │ Int64 │ Float64   │ String │
 ├─────┼────────────┼───────┼───────────┼────────┤
@@ -108,6 +108,11 @@ The left-hand side has one term `y` which means that the response variable is
 the column from the data named `:y`. The response can be accessed with the
 analogous `response(f, df)` function.
 
+!!! note
+
+    To make a "one-sided" formula (with no response), put a `0` on the left-hand
+    side, like `@formula(0 ~ 1 + a + b)`.
+
 The right hand side is made up of a number of different **terms**, separated by
 `+`: `1 + a + b + c + b&c`. Each term corresponds to one or more columns in the
 generated model matrix:
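
The note added in this hunk can be illustrated with a minimal sketch. It assumes StatsModels 0.6.5 or later is installed; the variable name `f` is only illustrative, and nothing beyond constructing and inspecting the formula is shown.

```julia
using StatsModels

# A one-sided formula: the `0` on the left-hand side stands in for "no response".
f = @formula(0 ~ 1 + a + b)

# The right-hand side is parsed exactly as in a two-sided formula:
# an intercept term plus the data terms `a` and `b`.
for t in f.rhs
    println(t)
end
```

The right-hand side is stored as a tuple of terms, so it can be iterated or indexed like any other tuple.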
@@ -214,34 +219,34 @@ For instance, to fit a linear regression to a log-transformed response:
 julia> using GLM
 
 julia> lm(@formula(log(y) ~ 1 + a + b), df)
-StatsModels.TableRegressionModel{LinearModel{LmResp{Array{Float64,1}},DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}
+StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}
 
 :(log(y)) ~ 1 + a + b
 
 Coefficients:
-──────────────────────────────────────────────────────
-Estimate Std.Error t value Pr(>|t|)
-──────────────────────────────────────────────────────
-(Intercept) -4.16168 2.98788 -1.39285 0.2131
-a 0.357482 0.342126 1.04489 0.3363
-b 2.32528 3.13735 0.741159 0.4866
-──────────────────────────────────────────────────────
+──────────────────────────────────────────────────────────────────────────────
+Estimate Std. Error t value Pr(>|t|) Lower 95% Upper 95%
+──────────────────────────────────────────────────────────────────────────────
+(Intercept) -4.16168 2.98788 -1.39285 0.2131 -11.4727 3.14939
+a 0.357482 0.342126 1.04489 0.3363 -0.479669 1.19463
+b 2.32528 3.13735 0.741159 0.4866 -5.35154 10.0021
+──────────────────────────────────────────────────────────────────────────────
 
-julia> df[:log_y] = log.(df[:y]);
+julia> df.log_y = log.(df.y);
 
 julia> lm(@formula(log_y ~ 1 + a + b), df) # equivalent
-StatsModels.TableRegressionModel{LinearModel{LmResp{Array{Float64,1}},DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}
+StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}
 
 log_y ~ 1 + a + b
 
 Coefficients:
-──────────────────────────────────────────────────────
-Estimate Std.Error t value Pr(>|t|)
-──────────────────────────────────────────────────────
-(Intercept) -4.16168 2.98788 -1.39285 0.2131
-a 0.357482 0.342126 1.04489 0.3363
-b 2.32528 3.13735 0.741159 0.4866
-──────────────────────────────────────────────────────
+──────────────────────────────────────────────────────────────────────────────
+Estimate Std. Error t value Pr(>|t|) Lower 95% Upper 95%
+──────────────────────────────────────────────────────────────────────────────
+(Intercept) -4.16168 2.98788 -1.39285 0.2131 -11.4727 3.14939
+a 0.357482 0.342126 1.04489 0.3363 -0.479669 1.19463
+b 2.32528 3.13735 0.741159 0.4866 -5.35154 10.0021
+──────────────────────────────────────────────────────────────────────────────
 
 ```
 
@@ -262,9 +267,9 @@ julia> modelmatrix(@formula(y ~ 1 + b + identity(1+b)), df)
  1.0 0.0203749 1.02037
 ```
 
-## Constructing a formula programatically
+## Constructing a formula programmatically
 
-A formula can be constructed at run-time by creating `Term`s and combining them
+A formula can be constructed at runtime by creating `Term`s and combining them
 with the formula operators `+`, `&`, and `~`:
 
 ```jldoctest 1
@@ -279,6 +284,20 @@ Predictors:
 a(unknown) & b(unknown)
 ```
 
+!!! warning
+
+    Even though the `@formula` macro supports arbitrary julia functions,
+    runtime (programmatic) formula construction does not. This is because to
+    resolve a symbol giving a function's _name_ into the actual _function_
+    itself, it's necessary to `eval`. In practice this is not often an issue,
+    _except_ in cases where a package provides special syntax by overloading a
+    function (like `|` for
+    [MixedModels.jl](https://github.com/dmbates/MixedModels.jl), or `absorb`
+    for [Econometrics.jl](https://github.com/Nosferican/Econometrics.jl)). In
+    these cases, you should use the corresponding constructors for the actual
+    terms themselves (e.g., `RanefTerm` and `FixedEffectsTerm` respectively), as
+    long as the packages have [implemented support for them](@ref extend-runtime).
+
 The [`term`](@ref) function constructs a term of the appropriate type from
 symbols and numbers, which makes it easy to work with collections of mixed type:
 
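
The distinction drawn in the warning can be sketched as follows. This is an illustration under stated assumptions, not part of the commit: it assumes StatsModels 0.6.5 or later, and the names `predictors`, `rhs`, `f`, and `g` are hypothetical. The first half builds a formula at runtime from terms only; the second half shows that a function call such as `log(a)` goes through the `@formula` macro instead.

```julia
using StatsModels

# Hypothetical predictor names, e.g. chosen by the user at runtime.
predictors = [:a, :b]

# `term` maps numbers to constant terms and symbols to data terms;
# folding with `+` collects them into a tuple of terms.
rhs = foldl(+, term.(predictors); init = term(1))

# Interactions use `&` on terms, just as inside `@formula`.
rhs = rhs + (term(:a) & term(:b))

# `~` joins the response and the predictors into a formula.
f = term(:y) ~ rhs
println(f)

# By contrast, a function call such as `log(a)` is captured by the macro,
# which sees the call expression itself rather than just the symbol `log`.
g = @formula(y ~ 1 + log(a))
println(g)
```

For package-specific syntax such as `|` or `absorb`, the runtime route is the term constructor named in the warning, provided the package supports it.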
@@ -338,26 +357,26 @@ julia> β_true = 1:8;
 
 julia> ϵ = randn(100)*0.1;
 
-julia> data[:y] = X*β_true .+ ϵ;
+julia> data.y = X*β_true .+ ϵ;
 
 julia> mod = fit(LinearModel, @formula(y ~ 1 + a*b), data)
-StatsModels.TableRegressionModel{LinearModel{LmResp{Array{Float64,1}},DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}
+StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}
 
 y ~ 1 + a + b + a & b
 
 Coefficients:
-───────────────────────────────────────────────────
-Estimate Std.Error t value Pr(>|t|)
-───────────────────────────────────────────────────
-(Intercept) 0.98878 0.0384341 25.7266 <1e-43
-a 2.00843 0.0779388 25.7694 <1e-43
-b: e 3.03726 0.0616371 49.2764 <1e-67
-b: f 4.03909 0.0572857 70.5078 <1e-81
-b: g 5.02948 0.0587224 85.6484 <1e-88
-a & b: e 5.9385 0.10753 55.2264 <1e-71
-a & b: f 6.9073 0.112483 61.4075 <1e-75
-a & b: g 7.93918 0.111285 71.3407 <1e-81
-───────────────────────────────────────────────────
+──────────────────────────────────────────────────────────────────────────
+Estimate Std. Error t value Pr(>|t|) Lower 95% Upper 95%
+──────────────────────────────────────────────────────────────────────────
+(Intercept) 0.98878 0.0384341 25.7266 <1e-43 0.912447 1.06511
+a 2.00843 0.0779388 25.7694 <1e-43 1.85364 2.16323
+b: e 3.03726 0.0616371 49.2764 <1e-67 2.91484 3.15967
+b: f 4.03909 0.0572857 70.5078 <1e-81 3.92531 4.15286
+b: g 5.02948 0.0587224 85.6484 <1e-88 4.91285 5.14611
+a & b: e 5.9385 0.10753 55.2264 <1e-71 5.72494 6.15207
+a & b: f 6.9073 0.112483 61.4075 <1e-75 6.6839 7.1307
+a & b: g 7.93918 0.111285 71.3407 <1e-81 7.71816 8.16021
+──────────────────────────────────────────────────────────────────────────
 
 ```
 