Commit 920efb6

Merge pull request #48 from JuliaAI/dev

For a 1.0 release 🎉

2 parents c5b7218 + 73d89eb, commit 920efb6

32 files changed: +335 -136 lines

Project.toml (1 addition, 1 deletion)

````diff
@@ -1,7 +1,7 @@
 name = "LearnAPI"
 uuid = "92ad9a40-7767-427a-9ee6-6e577f1266cb"
 authors = ["Anthony D. Blaom <[email protected]>"]
-version = "0.2.0"
+version = "1.0.0"
 
 [compat]
 julia = "1.10"
````

README.md (37 additions, 24 deletions)

````diff
@@ -6,45 +6,58 @@ A base Julia interface for machine learning and statistics
 [![Build Status](https://github.com/JuliaAI/LearnAPI.jl/workflows/CI/badge.svg)](https://github.com/JuliaAI/LearnAPI.jl/actions)
 [![codecov](https://codecov.io/gh/JuliaAI/LearnAPI.jl/graph/badge.svg?token=9IWT9KYINZ)](https://codecov.io/gh/JuliaAI/LearnAPI.jl?branch=dev)
 [![Docs](https://img.shields.io/badge/docs-dev-blue.svg)](https://juliaai.github.io/LearnAPI.jl/dev/)
-
-Comprehensive documentation is [here](https://juliaai.github.io/LearnAPI.jl/dev/).
+[![Docs](https://img.shields.io/badge/docs-stable-blue.svg)](https://juliaai.github.io/LearnAPI.jl/stable/)
 
 New contributions welcome. See the [road map](ROADMAP.md).
 
-## Code snippet
+## Synopsis
 
-Configure a machine learning algorithm:
+LearnAPI.jl provides for variations and elaborations on the following basic pattern in
+machine learning and statistics:
 
 ```julia
-julia> ridge = Ridge(lambda=0.1)
+model = fit(learner, data)
+predict(model, newdata)
 ```
 
-Inspect available functionality:
+Here `learner` specifies the configuration of the algorithm (the hyperparameters), while
+`model` stores learned parameters and any byproducts of algorithm execution.
 
-```
-julia> @functions ridge
-(fit, LearnAPI.learner, LearnAPI.strip, obs, LearnAPI.features, LearnAPI.target, predict, LearnAPI.coefficients)
-```
+LearnAPI.jl is mostly method stubs and lots of documentation. It does not provide
+meta-algorithms, such as cross-validation or hyperparameter optimization, but does aim to
+support such algorithms.
 
-Train:
+## Related packages
 
-```julia
-julia> model = fit(ridge, data)
-```
+- [MLCore.jl](https://github.com/JuliaML/MLCore.jl): The default sub-sampling API (`getobs`/`numobs`) for LearnAPI.jl implementations, which supports tables and arrays.
 
-Predict:
+- [LearnTestAPI.jl](https://github.com/JuliaAI/LearnTestAPI.jl): Package to test implementations of LearnAPI.jl (but documented here).
 
-```julia
-julia> predict(model, data)[1]
-"virginica"
-```
+- [LearnDataFrontEnds.jl](https://github.com/JuliaAI/LearnDataFrontEnds.jl): For adding flexible, user-friendly data front ends to LearnAPI.jl implementations ([docs](https://juliaai.github.io/LearnDataFrontEnds.jl/stable/)).
 
-Predict a probability distribution ([proxy](https://juliaai.github.io/LearnAPI.jl/dev/kinds_of_target_proxy/#proxy_types) for the target):
+- [StatisticalMeasures.jl](https://github.com/JuliaAI/StatisticalMeasures.jl): Package providing metrics, compatible with LearnAPI.jl.
+
+### Selected packages providing alternative APIs
+
+The following packages, listed alphabetically, provide public base APIs. Some provide
+additional functionality. PRs to add missing items are welcome.
+
+- [AutoMLPipeline.jl](https://github.com/IBM/AutoMLPipeline.jl)
+
+- [BetaML.jl](https://github.com/sylvaticus/BetaML.jl)
+
+- [FastAI.jl](https://github.com/FluxML/FastAI.jl) (focused on deep learning)
+
+- [LearnBase.jl](https://github.com/JuliaML/LearnBase.jl) (now archived but of historical interest)
+
+- [MLJModelInterface.jl](https://github.com/JuliaAI/MLJModelInterface.jl)
+
+- [MLUtils.jl](https://github.com/JuliaML/MLUtils.jl) (more than a base API, focused on deep learning)
+
+- [ScikitLearn.jl](https://github.com/cstjean/ScikitLearn.jl) (an API in addition to being a wrapper for [scikit-learn](https://scikit-learn.org/stable/))
+
+- [StatsAPI.jl](https://github.com/JuliaStats/StatsAPI.jl/tree/main) (specialized to the needs of traditional statistical models)
 
-```julia
-julia> predict(model, Distribution(), data)[1]
-UnivariateFinite{Multiclass{3}}(setosa=>0.0, versicolor=>0.25, virginica=>0.75)
-```
 
 ## Credits
````

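The `fit`/`predict` pattern introduced in the new Synopsis can be sketched with a toy learner. `MeanRegressor` and `MeanModel` below are invented for illustration and are not part of LearnAPI.jl; the sketch assumes only the two-call pattern shown in the diff above, in which a learner holds configuration and a model holds learned parameters:

```julia
# Hypothetical illustration of the fit/predict pattern from the new Synopsis.
# `MeanRegressor` is invented for this sketch; it is not part of LearnAPI.jl.
struct MeanRegressor end               # learner: holds hyperparameters (none here)
struct MeanModel
    mean::Float64                      # model: stores the learned parameter
end

fit(learner::MeanRegressor, data) = MeanModel(sum(data) / length(data))
predict(model::MeanModel, newdata) = fill(model.mean, length(newdata))

model = fit(MeanRegressor(), [1.0, 2.0, 3.0])
predict(model, [10.0, 20.0])  # → [2.0, 2.0]
```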
ROADMAP.md (9 additions, 9 deletions)

````diff
@@ -1,12 +1,12 @@
 # Road map
 
-- [ ] Mock up a challenging `update` use-case: controlling an iterative algorithm that
+- [x] Mock up a challenging `update` use-case: controlling an iterative algorithm that
   wants, for efficiency, to internally compute the out-of-sample predictions that will
   be used for *externally* determined early stopping cc: @jeremiedb
 
 - [ ] Get code coverage to 100% (see next item)
 
-- [ ] Add to this repo or a utility repo methods to test a valid implementation of
+- [x] Add to this repo or a utility repo methods to test a valid implementation of
   LearnAPI.jl
 
 - [ ] Flesh out "Common Implementation Patterns". The current plan is to mock up example
@@ -18,28 +18,28 @@
   - [ ] clustering
   - [x] gradient descent
   - [x] iterative algorithms
-  - [ ] incremental algorithms
-  - [ ] dimension reduction
+  - [x] incremental algorithms
+  - [x] dimension reduction
   - [x] feature engineering
   - [x] static algorithms
   - [ ] missing value imputation
-  - [ ] transformers
+  - [x] transformers
   - [x] ensemble algorithms
   - [ ] time series forecasting
   - [ ] time series classification
   - [ ] survival analysis
-  - [ ] density estimation
+  - [x] density estimation
   - [ ] Bayesian algorithms
   - [ ] outlier detection
   - [ ] collaborative filtering
   - [ ] text analysis
   - [ ] audio analysis
   - [ ] natural language processing
   - [ ] image processing
-  - [ ] meta-algorithms
+  - [x] meta-algorithms
 
-- [ ] In a utility package provide:
-  - [ ] Methods to facilitate common use-case data interfaces: support simultaneously
+- [x] In a utility package provide:
+  - [x] Methods to facilitate common use-case data interfaces: support simultaneously
   `fit` data of the form `data = (X, y)` where `X` is a table *or* matrix, and `data` a
   table with target specified by hyperparameter; here `obs` will return a thin wrapping
   of the matrix of `X`, the target `y`, and the names of all fields. We can have
````

docs/Project.toml (2 additions, 1 deletion)

````diff
@@ -2,7 +2,8 @@
 Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
 DocumenterInterLinks = "d12716ef-a0f6-4df4-a9f1-a5a34e75c656"
 LearnAPI = "92ad9a40-7767-427a-9ee6-6e577f1266cb"
-MLUtils = "f1d291b0-491e-4a28-83b9-f70985020b54"
+LearnTestAPI = "3111ed91-c4f2-40e7-bb19-7f6c618409b8"
+MLCore = "c2834f40-e789-41da-a90e-33b280584a8c"
 ScientificTypesBase = "30f210dd-8aff-4c5f-94ba-8e64358c1161"
 Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
````

docs/make.jl (3 additions, 1 deletion)

````diff
@@ -2,11 +2,12 @@ using Documenter
 using LearnAPI
 using ScientificTypesBase
 using DocumenterInterLinks
+using LearnTestAPI
 
 const REPO = Remotes.GitHub("JuliaAI", "LearnAPI.jl")
 
 makedocs(
-    modules=[LearnAPI,],
+    modules=[LearnAPI, LearnTestAPI],
     format=Documenter.HTML(
         prettyurls = true,#get(ENV, "CI", nothing) == "true",
         collapselevel = 1,
@@ -16,6 +17,7 @@ makedocs(
         "Anatomy of an Implementation" => "anatomy_of_an_implementation.md",
         "Reference" => [
             "Overview" => "reference.md",
+            "Public Names" => "list_of_public_names.md",
             "fit/update" => "fit_update.md",
             "predict/transform" => "predict_transform.md",
             "Kinds of Target Proxy" => "kinds_of_target_proxy.md",
````

docs/src/anatomy_of_an_implementation.md (14 additions, 14 deletions)

````diff
@@ -105,7 +105,7 @@ nothing # hide
 ```
 
 Note that we also include `learner` in the struct, for it must be possible to recover
-`learner` from the output of `fit`; see [Accessor functions](@ref) below.
+`learner` from the output of `fit`; see [Accessor functions](@ref af) below.
 
 The implementation of `fit` looks like this:
 
@@ -159,7 +159,7 @@ first element of the tuple returned by [`LearnAPI.kinds_of_proxy(learner)`](@ref
 we overload appropriately below.
 
 
-### Accessor functions
+### [Accessor functions](@id af)
 
 An [accessor function](@ref accessor_functions) has the output of [`fit`](@ref) as its
 sole argument. Every new implementation must implement the accessor function
@@ -334,7 +334,7 @@ assumptions about data from those made above.
 
 - If the `data` object consumed by `fit`, `predict`, or `transform` is not a suitable
   table¹, array³, tuple of tables and arrays, or some other object implementing the
-  [MLUtils.jl](https://juliaml.github.io/MLUtils.jl/dev/) `getobs`/`numobs` interface,
+  [MLCore.jl](https://juliaml.github.io/MLCore.jl/dev/) `getobs`/`numobs` interface,
   then an implementation must: (i) overload [`obs`](@ref) to articulate how provided data
   can be transformed into a form that does support this interface, as illustrated below
   under [Providing a separate data front end](@ref); or (ii) overload the trait
@@ -419,7 +419,7 @@ The [`obs`](@ref) methods exist to:
   how it works.
 
 In the typical case, where [`LearnAPI.data_interface`](@ref) is not overloaded, the
-alternative data representations must implement the MLUtils.jl `getobs/numobs` interface
+alternative data representations must implement the MLCore.jl `getobs/numobs` interface
 for observation subsampling, which is generally all a user or meta-algorithm will need,
 before passing the data on to `fit`/`predict`, as you would the original data.
 
@@ -436,14 +436,14 @@ one enables the following alternative:
 observations = obs(learner, data) # preprocessed training data
 
 # optional subsampling:
-observations = MLUtils.getobs(observations, train_indices)
+observations = MLCore.getobs(observations, train_indices)
 
 model = fit(learner, observations)
 
 newobservations = obs(model, newdata)
 
 # optional subsampling:
-newobservations = MLUtils.getobs(newobservations, test_indices)
+newobservations = MLCore.getobs(newobservations, test_indices)
 
 predict(model, newobservations)
 ```
@@ -555,8 +555,8 @@ above. Here we must explicitly overload them, so that they also handle the outpu
 
 ```@example anatomy2
 LearnAPI.features(::Ridge, observations::RidgeFitObs) = observations.A
-LearnAPI.target(::Ridge, observations::RidgeFitObs) = observations.y
 LearnAPI.features(learner::Ridge, data) = LearnAPI.features(learner, obs(learner, data))
+LearnAPI.target(::Ridge, observations::RidgeFitObs) = observations.y
 LearnAPI.target(learner::Ridge, data) = LearnAPI.target(learner, obs(learner, data))
 ```
 
@@ -568,15 +568,15 @@ LearnAPI.target(learner::Ridge, data) = LearnAPI.target(learner, obs(learner, da
   are generally different.
 
 - We need the adjoint operator, `'`, because the last dimension in arrays is the
-  observation dimension, according to the MLUtils.jl convention. Remember, `Xnew` is a
+  observation dimension, according to the MLCore.jl convention. Remember, `Xnew` is a
   table here.
 
 Since LearnAPI.jl provides fallbacks for `obs` that simply return the unadulterated data
 argument, overloading `obs` is optional. This is provided the data in publicized
 `fit`/`predict` signatures already consists only of objects implementing the
 [`LearnAPI.RandomAccess`](@ref) interface (most tables¹, arrays³, and tuples thereof).
 
-To opt out of supporting the MLUtils.jl interface altogether, an implementation must
+To opt out of supporting the MLCore.jl interface altogether, an implementation must
 overload the trait, [`LearnAPI.data_interface(learner)`](@ref). See [Data
 interfaces](@ref data_interfaces) for details.
 
@@ -593,15 +593,15 @@ LearnAPI.fit(learner::Ridge, X, y; kwargs...) = fit(learner, (X, y); kwargs...)
 ## [Demonstration of an advanced `obs` workflow](@id advanced_demo)
 
 We can now train and predict using internal data representations, resampled using the
-generic MLUtils.jl interface:
+generic MLCore.jl interface:
 
 ```@example anatomy2
-import MLUtils
+import MLCore
 learner = Ridge()
 observations_for_fit = obs(learner, (X, y))
-model = fit(learner, MLUtils.getobs(observations_for_fit, train))
+model = fit(learner, MLCore.getobs(observations_for_fit, train))
 observations_for_predict = obs(model, X)
-ẑ = predict(model, MLUtils.getobs(observations_for_predict, test))
+ẑ = predict(model, MLCore.getobs(observations_for_predict, test))
 ```
 
 ```julia
@@ -616,7 +616,7 @@ obs_workflows).
 ¹ In LearnAPI.jl a *table* is any object `X` implementing the
 [Tables.jl](https://tables.juliadata.org/dev/) interface, additionally satisfying
 `Tables.istable(X) == true` and implementing `DataAPI.nrow` (and whence
-`MLUtils.numobs`). Tables that are also (unnamed) tuples are disallowed.
+`MLCore.numobs`). Tables that are also (unnamed) tuples are disallowed.
 
 ² An implementation can provide further accessor functions, if necessary, but
 like the native ones, they must be included in the [`LearnAPI.functions`](@ref)
````

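The MLCore.jl convention relied on throughout the changes above (observations indexed by the *last* array dimension) can be illustrated without the package itself. The functions `numobs_like` and `getobs_like` below are invented stand-ins for this sketch; they only mimic the documented `numobs`/`getobs` behavior for matrices:

```julia
# Invented stand-ins illustrating the getobs/numobs convention from MLCore.jl:
# the last dimension of an array indexes observations.
numobs_like(X::AbstractArray) = size(X, ndims(X))
getobs_like(X::AbstractMatrix, i) = X[:, i]   # select observation(s) by column

X = [1.0 2.0 3.0;
     4.0 5.0 6.0]          # 2 features, 3 observations
numobs_like(X)             # → 3
getobs_like(X, 1:2)        # first two observations, a 2×2 sub-matrix
```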
docs/src/examples.md (91 additions, 2 deletions)

````diff
@@ -4,7 +4,8 @@ Below is the complete source code for the ridge implementations described in the
 [Anatomy of an Implementation](@ref).
 
 - [Basic implementation](@ref)
-- [Implementation with data front end](@ref)
+- [Implementation with a data front end](@ref)
+- [Implementation with a canned data front end](@ref)
 
 
 ## Basic implementation
@@ -85,7 +86,7 @@ LearnAPI.strip(model::RidgeFitted) =
 LearnAPI.fit(learner::Ridge, X, y; kwargs...) = fit(learner, (X, y); kwargs...)
 ```
 
-# Implementation with data front end
+# Implementation with a data front end
 
 ```julia
 using LearnAPI
@@ -190,3 +191,91 @@ LearnAPI.strip(model::RidgeFitted) =
     )
 
 ```
+
+# Implementation with a canned data front end
+
+The following implements the `Saffron` data front end from
+[LearnDataFrontEnds.jl](https://juliaai.github.io/LearnDataFrontEnds.jl/stable/), which
+allows for a greater variety of forms of input to `fit` and `predict`. Refer to that
+package's [documentation](https://juliaai.github.io/LearnDataFrontEnds.jl/stable/) for details.
+
+```julia
+using LearnAPI
+import LearnDataFrontEnds as FrontEnds
+using LinearAlgebra, Tables
+
+struct Ridge{T<:Real}
+    lambda::T
+end
+
+Ridge(; lambda=0.1) = Ridge(lambda)
+
+# struct for output of `fit`:
+struct RidgeFitted{T,F}
+    learner::Ridge
+    coefficients::Vector{T}
+    named_coefficients::F
+end
+
+frontend = FrontEnds.Saffron()
+
+# these will return objects of type `FrontEnds.Obs`:
+LearnAPI.obs(learner::Ridge, data) = FrontEnds.fitobs(learner, data, frontend)
+LearnAPI.obs(model::RidgeFitted, data) = obs(model, data, frontend)
+
+function LearnAPI.fit(learner::Ridge, observations::FrontEnds.Obs; verbosity=1)
+
+    lambda = learner.lambda
+
+    A = observations.features
+    names = observations.names
+    y = observations.target
+
+    # apply core learner:
+    coefficients = (A*A' + lambda*I)\(A*y) # p-vector
+
+    # determine named coefficients:
+    named_coefficients = [names[j] => coefficients[j] for j in eachindex(names)]
+
+    # make some noise, if allowed:
+    verbosity > 0 && @info "Coefficients: $named_coefficients"
+
+    return RidgeFitted(learner, coefficients, named_coefficients)
+
+end
+LearnAPI.fit(learner::Ridge, data; kwargs...) =
+    fit(learner, obs(learner, data); kwargs...)
+
+LearnAPI.predict(model::RidgeFitted, ::Point, observations::FrontEnds.Obs) =
+    (observations.features)'*model.coefficients
+LearnAPI.predict(model::RidgeFitted, ::Point, Xnew) =
+    predict(model, Point(), obs(model, Xnew))
+
+# training data deconstructors:
+LearnAPI.features(learner::Ridge, data) = LearnAPI.features(learner, data, frontend)
+LearnAPI.target(learner::Ridge, data) = LearnAPI.target(learner, data, frontend)
+
+# accessor functions:
+LearnAPI.learner(model::RidgeFitted) = model.learner
+LearnAPI.coefficients(model::RidgeFitted) = model.named_coefficients
+LearnAPI.strip(model::RidgeFitted) =
+    RidgeFitted(model.learner, model.coefficients, nothing)
+
+@trait(
+    Ridge,
+    constructor = Ridge,
+    kinds_of_proxy = (Point(),),
+    tags = ("regression",),
+    functions = (
+        :(LearnAPI.fit),
+        :(LearnAPI.learner),
+        :(LearnAPI.clone),
+        :(LearnAPI.strip),
+        :(LearnAPI.obs),
+        :(LearnAPI.features),
+        :(LearnAPI.target),
+        :(LearnAPI.predict),
+        :(LearnAPI.coefficients),
+    )
+)
+```
````

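The core computation in the ridge examples above, `(A*A' + lambda*I)\(A*y)`, can be checked in isolation with plain `LinearAlgebra`. This is a sketch under the last-dimension-is-observations convention, with `A` a features × observations matrix; the toy data is invented so the recovered coefficients are known exactly:

```julia
using LinearAlgebra

# Ridge solve as in the examples above: A is p (features) × n (observations).
A = [1.0 2.0 3.0 4.0;      # feature 1
     1.0 1.0 1.0 1.0]      # feature 2 (constant column, acts as an intercept)
y = [3.0, 5.0, 7.0, 9.0]   # y = 2*feature1 + 1, exactly

lambda = 0.0               # with no regularization we recover the exact coefficients
coefficients = (A*A' + lambda*I) \ (A*y)   # p-vector of coefficients
# coefficients ≈ [2.0, 1.0]
```

With `lambda > 0` the coefficients shrink toward zero, which is the point of the `lambda` hyperparameter in the `Ridge` learners above.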