
Commit f832a15

finished pass through documentation

1 parent 3fc5a0e commit f832a15

6 files changed: +177 -113 lines changed

docs/src/ref/learning.md

Lines changed: 13 additions & 13 deletions
@@ -51,10 +51,10 @@ end
 ```
 
 Let's suppose we are training the generative model.
-The first step is to initialize the values of the trainable parameters, which for generative functions constructed using the built-in modeling languages, we do with [`init_param!`](@ref):
+The first step is to initialize the values of the trainable parameters; for generative functions constructed using the built-in modeling languages, we do this with [`init_parameter!`](@ref):
 ```julia
-init_param!(model, :a, 0.)
-init_param!(model, :b, 0.)
+init_parameter!((model, :a), 0.0)
+init_parameter!((model, :b), 0.0)
 ```
 Each trace in the collection contains the observed data from an independent draw from our model.
 We can populate each trace with its observed data using [`generate`](@ref):
@@ -76,24 +76,24 @@ for trace in traces
     accumulate_param_gradients!(trace)
 end
 ```
-Finally, we can construct and gradient-based update with [`ParamUpdate`](@ref) and apply it with [`apply!`](@ref).
+Finally, we can construct a gradient-based update with [`init_optimizer`](@ref) and apply it with [`apply_update!`](@ref).
 We can put this all together into a function:
 ```julia
 function train_model(data::Vector{ChoiceMap})
-    init_param!(model, :theta, 0.1)
+    init_parameter!((model, :theta), 0.1)
     traces = []
     for observations in data
         trace, = generate(model, model_args, observations)
         push!(traces, trace)
     end
-    update = ParamUpdate(FixedStepSizeGradientDescent(0.001), model)
+    optimizer = init_optimizer(FixedStepGradientDescent(0.001), model)
     for iter=1:max_iter
         objective = sum([get_score(trace) for trace in traces])
         println("objective: $objective")
         for trace in traces
             accumulate_param_gradients!(trace)
         end
-        apply!(update)
+        apply_update!(optimizer)
     end
 end
 ```
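Aside: the `train_model` snippets in this file assume a `model`, `model_args`, observed `data`, and `max_iter` defined elsewhere. Below is a minimal, hypothetical setup consistent with those snippets; the Bernoulli model and the `:x` address are illustrative only, not part of the Gen documentation.

```julia
using Gen

# Hypothetical model with a single trainable parameter `theta`, used as the
# success probability of one observed coin flip at address :x.
@gen function model()
    @param theta::Float64
    @trace(bernoulli(theta), :x)
end
register_parameters!(model, [:theta])

# One choice map of observations per independent draw from the model,
# typed to match train_model(data::Vector{ChoiceMap}).
data = ChoiceMap[choicemap((:x, true)), choicemap((:x, false)), choicemap((:x, true))]
model_args = ()     # `model` takes no arguments in this sketch
max_iter = 100
```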
@@ -139,14 +139,14 @@ There are many variants possible, based on which Monte Carlo inference algorithm
 For example:
 ```julia
 function train_model(data::Vector{ChoiceMap})
-    init_param!(model, :theta, 0.1)
-    update = ParamUpdate(FixedStepSizeGradientDescent(0.001), model)
+    init_parameter!((model, :theta), 0.1)
+    optimizer = init_optimizer(FixedStepGradientDescent(0.001), model)
     for iter=1:max_iter
         traces = do_monte_carlo_inference(data)
         for trace in traces
             accumulate_param_gradients!(trace)
         end
-        apply!(update)
+        apply_update!(optimizer)
     end
 end
 
@@ -160,14 +160,14 @@ end
 Note that it is also possible to use a weighted collection of traces directly without resampling:
 ```julia
 function train_model(data::Vector{ChoiceMap})
-    init_param!(model, :theta, 0.1)
-    update = ParamUpdate(FixedStepSizeGradientDescent(0.001), model)
+    init_parameter!((model, :theta), 0.1)
+    optimizer = init_optimizer(FixedStepGradientDescent(0.001), model)
     for iter=1:max_iter
         traces, weights = do_monte_carlo_inference_with_weights(data)
         for (trace, weight) in zip(traces, weights)
             accumulate_param_gradients!(trace, nothing, weight)
         end
-        apply!(update)
+        apply_update!(optimizer)
     end
 end
 ```
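The helpers `do_monte_carlo_inference` and `do_monte_carlo_inference_with_weights` are left undefined in the documentation. Below is a hypothetical stand-in for the first, built on Gen's `importance_resampling`; the particle count is arbitrary, and the weighted variant would analogously return traces paired with importance weights.

```julia
using Gen

# Hypothetical helper: infer one trace per observed choice map using
# importance sampling with resampling, conditioning `model` on the observations.
function do_monte_carlo_inference(data::Vector{<:ChoiceMap})
    traces = []
    for observations in data
        # importance_resampling returns (trace, log marginal likelihood estimate)
        trace, _ = importance_resampling(model, model_args, observations, 100)
        push!(traces, trace)
    end
    return traces
end
```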

docs/src/ref/modeling.md

Lines changed: 12 additions & 12 deletions
@@ -254,6 +254,7 @@ See [Generative Function Interface](@ref) for more information about traces.
 
 A `@gen` function may begin with an optional block of *trainable parameter declarations*.
 The block consists of a sequence of statements, beginning with `@param`, that declare the name and Julia type for each trainable parameter.
+The Julia type must be either a subtype of `Real` or a subtype of `Array{<:Real}`.
 The function below has a single trainable parameter `theta` with type `Float64`:
 ```julia
 @gen function foo(prob::Float64)
@@ -264,23 +265,22 @@ The function below has a single trainable parameter `theta` with type `Float64`:
 end
 ```
 Trainable parameters obey the same scoping rules as Julia local variables defined at the beginning of the function body.
-The value of a trainable parameter is undefined until it is initialized using [`init_param!`](@ref).
+After the definition of the generative function, you must register all of the parameters used by the generative function using [`register_parameters!`](@ref) (this is not required if you instead use the [Static Modeling Language](@ref)):
+```julia
+register_parameters!(foo, [:theta])
+```
+The value of a trainable parameter is undefined until it is initialized using [`init_parameter!`](@ref):
+```julia
+init_parameter!((foo, :theta), 0.0)
+```
 In addition to the current value, each trainable parameter has a current **gradient accumulator** value.
 The gradient accumulator value has the same shape (e.g. array dimension) as the parameter value.
-It is initialized to all zeros, and is incremented by [`accumulate_param_gradients!`](@ref).
-
-The following methods are exported for the trainable parameters of `@gen` functions:
+It is initialized to all zeros, and is incremented by calling [`accumulate_param_gradients!`](@ref) on a trace.
+Additional functions for retrieving and manipulating the values of trainable parameters and their gradient accumulators are described in [Optimizing Trainable Parameters](@ref).
 ```@docs
-init_param!
-get_param
-get_param_grad
-set_param!
-zero_param_grad!
+register_parameters!
 ```
 
-Trainable parameters are designed to be trained using gradient-based methods.
-This is discussed in the next section.
-
 ## Differentiable programming
 
 Given a trace of a `@gen` function, Gen supports automatic differentiation of the log probability (density) of all of the random choices made in the trace with respect to the following types of inputs:
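Stepping outside the diff for a moment, here is a minimal, hypothetical end-to-end sketch of the declare/register/initialize/accumulate cycle described in the trainable-parameters hunk above; the generative function `bar` and its addresses are illustrative only.

```julia
using Gen

# Declare a trainable parameter with @param, as described above.
@gen function bar()
    @param mu::Float64
    @trace(normal(mu, 1.0), :y)
end

# Register and initialize the parameter (its value is undefined before this).
register_parameters!(bar, [:mu])
init_parameter!((bar, :mu), 0.0)

# Generate a trace and increment the gradient accumulator for `mu`.
# The accumulator starts at zero; it is read and reset by an optimizer
# (see Optimizing Trainable Parameters).
trace = simulate(bar, ())
accumulate_param_gradients!(trace)
```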

docs/src/ref/parameter_optimization.md

Lines changed: 67 additions & 18 deletions
@@ -1,33 +1,82 @@
 # Optimizing Trainable Parameters
 
-Trainable parameters of generative functions are initialized differently depending on the type of generative function.
-Trainable parameters of the built-in modeling language are initialized with [`init_param!`](@ref).
+## Parameter stores
 
-Gradient-based optimization of the trainable parameters of generative functions is based on interleaving two steps:
+Multiple traces of a generative function typically reference the same trainable parameters of the generative function, which are stored outside of the trace in a **parameter store**.
+Different types of generative functions may use different types of parameter stores.
+For example, the [`JuliaParameterStore`](@ref) (discussed below) stores parameters as Julia values in the memory of the Julia runtime process.
+Other types of parameter stores may store parameters in GPU memory, in a filesystem, or even remotely.
 
-- Incrementing gradient accumulators for trainable parameters by calling [`accumulate_param_gradients!`](@ref) on one or more traces.
+When generating a trace of a generative function with [`simulate`](@ref) or [`generate`](@ref), we may pass in an optional **parameter context**, which is a `Dict` that provides information about the parameter store(s) in which to look up the values of parameters.
+A generative function obtains a reference to a specific type of parameter store by looking up its key in the parameter context.
 
-- Updating the value of trainable parameters and resetting the gradient accumulators to zero, by calling [`apply!`](@ref) on a *parameter update*, as described below.
+If you are just learning Gen and are only using the built-in modeling language to write generative functions, you can ignore this complexity: when no parameter context is provided in the call to `simulate` or `generate`, the default parameter context [`default_parameter_context`](@ref) is used, which points to the default Julia parameter store [`default_julia_parameter_store`](@ref).
+```@docs
+default_parameter_context
+default_julia_parameter_store
+```
+
+## Julia parameter store
+
+Parameters declared using the `@param` keyword in the built-in modeling language are stored in a type of parameter store called a [`JuliaParameterStore`](@ref).
+A generative function can obtain a reference to a `JuliaParameterStore` by looking up the key [`JULIA_PARAMETER_STORE_KEY`](@ref) in a parameter context.
+This is how the built-in modeling language implementation finds the parameter stores to use for `@param`-declared parameters.
+Note that if you are defining your own [custom generative functions](@ref #Custom-generative-functions), you can also use a [`JuliaParameterStore`](@ref) (including the same parameter store used to store parameters of built-in modeling language generative functions) to store and optimize your trainable parameters.
 
-## Parameter update
+Different types of parameter stores provide different APIs for reading, writing, and updating the values of parameters and gradient accumulators for parameters.
+The `JuliaParameterStore` API is given below.
+The API uses tuples of the form `(gen_fn::GenerativeFunction, name::Symbol)` to identify parameters.
+(Note that most user learning code only needs to use [`init_parameter!`](@ref), as the other API functions are called by [Optimizers](@ref) which are discussed below.)
 
-A *parameter update* reads from the gradient accumulators for certain trainable parameters, updates the values of those parameters, and resets the gradient accumulators to zero.
-A paramter update is constructed by combining an *update configuration* with the set of trainable parameters to which the update should be applied:
 ```@docs
-ParamUpdate
+JuliaParameterStore
+init_parameter!
+increment_gradient!
+reset_gradient!
+get_parameter_value
+get_gradient
+JULIA_PARAMETER_STORE_KEY
 ```
-The set of possible update configurations is described in [Update configurations](@ref).
-An update is applied with:
+
+### Multi-threaded gradient accumulation
+
+Note that the [`increment_gradient!`](@ref) call is thread-safe, so that multiple threads can concurrently increment the gradient for the same parameters. This is helpful for parallelizing gradient computation for a batch of traces within stochastic gradient descent learning algorithms.
+
+## Optimizers
+
+Gradient-based optimization typically involves iterating between two steps:
+(i) computing gradients or estimates of gradients with respect to parameters, and
+(ii) updating the value of the parameters based on the gradient estimates according to some mathematical rule.
+Sometimes the optimization algorithm also has its own state that is separate from the value of the parameters and the gradient estimates.
+Gradient-based optimization algorithms in Gen are implemented by **optimizers**.
+Each type of parameter store provides implementations of optimizers for standard mathematical update rules.
+
+The mathematical rules are defined in **optimizer configuration** objects.
+The currently supported optimizer configurations are:
 ```@docs
-apply!
+FixedStepGradientDescent
+DecayStepGradientDescent
+```
+
+The most common way to construct an optimizer is via:
+```julia
+optimizer = init_optimizer(conf, gen_fn)
 ```
+which returns an optimizer that applies the mathematical rule defined by `conf` to all parameters used by `gen_fn` (even when the generative function uses parameters that are housed in multiple parameter stores).
+You can also pass a parameter context keyword argument to customize the parameter store(s) that the optimizer should use.
+Then, after accumulating gradients with [`accumulate_param_gradients!`](@ref), you can apply the update with:
+```julia
+apply_update!(optimizer)
+```
+
+The `init_optimizer` method described above constructs an optimizer that actually invokes multiple optimizers, one for each parameter store.
+To add support to a parameter store type for a new optimizer configuration type, you must implement the per-parameter-store optimizer methods:
 
-## Update configurations
+- `init_optimizer(conf, parameter_ids, store)`, which takes in an optimizer configuration object, a list of parameter IDs, and the parameter store in which to apply the updates, and returns an optimizer that mutates the given parameter store.
+
+- `apply_update!(optimizer)`, which takes a single argument (the optimizer) and applies its update rule, mutating the values of the parameters in its parameter store (and typically also resetting the gradient accumulators to zero).
 
-Gen has built-in support for the following types of update configurations.
 ```@docs
-FixedStepGradientDescent
-GradientDescent
-ADAM
+init_optimizer
+apply_update!
 ```
-For adding new types of update configurations, see [Optimizing Trainable Parameters (Internal)](@ref optimizing-internal).
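Below is a minimal, hypothetical sketch of the multi-threaded gradient accumulation pattern described in this file. It assumes `traces` and `optimizer` are set up as in the learning documentation, and that accumulating gradients for different traces concurrently is safe because the increments go through the thread-safe `increment_gradient!`, as the note above indicates.

```julia
using Gen

# Hypothetical batch gradient step: accumulate gradients for a batch of traces
# in parallel, then apply a single optimizer update.
function parallel_gradient_step!(optimizer, traces)
    Threads.@threads for i in eachindex(traces)
        accumulate_param_gradients!(traces[i])  # thread-safe per the note above
    end
    apply_update!(optimizer)
end
```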

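To make the extension point above concrete, here is a hypothetical skeleton of the two per-parameter-store methods for a made-up configuration type. Everything below is illustrative: `ClippedGradientDescent`, the state struct, and the exact signatures of the store accessors (`get_parameter_value`, `get_gradient`, `reset_gradient!`, and writing a value back via `init_parameter!`) are assumptions, not the documented Gen API.

```julia
using Gen

# Hypothetical optimizer configuration: gradient steps with clipped gradients.
struct ClippedGradientDescent
    step_size::Float64
    clip::Float64
end

# Hypothetical per-store optimizer state for a JuliaParameterStore.
struct ClippedGradientDescentState
    conf::ClippedGradientDescent
    parameter_ids::Vector
    store::JuliaParameterStore
end

# Signature taken from the text above: (conf, parameter_ids, store).
function Gen.init_optimizer(conf::ClippedGradientDescent, parameter_ids,
                            store::JuliaParameterStore)
    return ClippedGradientDescentState(conf, collect(parameter_ids), store)
end

# Moves each parameter in the direction of its accumulated gradient (clipped),
# then resets the gradient accumulator.
function Gen.apply_update!(state::ClippedGradientDescentState)
    for id in state.parameter_ids
        value = get_parameter_value(id, state.store)    # assumed signature
        grad = get_gradient(id, state.store)            # assumed signature
        step = state.conf.step_size .* clamp.(grad, -state.conf.clip, state.conf.clip)
        init_parameter!(id, value .+ step, state.store) # assumed write-back
        reset_gradient!(id, state.store)                # assumed signature
    end
end
```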
src/builtin_optimization.jl

Whitespace-only changes.

src/dynamic/dynamic.jl

Lines changed: 6 additions & 5 deletions
@@ -56,13 +56,14 @@ end
 """
     register_parameters!(gen_fn::DynamicDSLFunction, parameters)
 
-Register the altrainable parameters that are used by a DML generative function.
+Register the trainable parameters that are used by a DML generative function.
 
-This includes all parameters used within any calls made by the generative function.
+This includes all parameters used within any calls made by the generative function, and any parameters that may be used by only some possible traces (stochastic control flow may cause a parameter to be used by one trace but not another).
 
-There are two variants:
-
-# TODO document the variants
+The second argument is either a `Vector` or a `Function` that takes a parameter context and returns a `Dict` that maps parameter stores to `Vector`s of parameter IDs.
+When the second argument is a `Vector`, each element is either a `Symbol` that is the name of a parameter declared in the body of `gen_fn` using `@param`, or a tuple `(other_gen_fn::GenerativeFunction, name::Symbol)` where `@param <name>` was declared in the body of `other_gen_fn`.
+The `Function` input is used when `gen_fn` uses parameters that come from more than one parameter store, including parameters that are housed in parameter stores that are not `JuliaParameterStore`s (e.g. if `gen_fn` invokes a generative function that executes in another non-Julia runtime).
+See [Optimizing Trainable Parameters](@ref) for details on parameter contexts and parameter stores.
 """
 function register_parameters!(gen_fn::DynamicDSLFunction, parameters)
     gen_fn.parameters = parameters
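A hypothetical sketch of the two variants described in this docstring. The functions `outer` and `inner` and their parameters are illustrative only; the context lookup via `JULIA_PARAMETER_STORE_KEY` follows the parameter-optimization documentation above.

```julia
using Gen

@gen function inner()
    @param bias::Float64
    @trace(normal(bias, 1.0), :y)
end

@gen function outer()
    @param log_scale::Float64
    @trace(inner(), :inner)
    @trace(normal(0.0, exp(log_scale)), :x)
end

# Vector variant: own parameters by name, callee parameters as
# (generative function, name) tuples.
register_parameters!(outer, [:log_scale, (inner, :bias)])

# Function variant: given a parameter context, return a Dict mapping each
# parameter store to the IDs of the parameters housed in that store.
# (Only one of the two variants would be used in practice; the second call
# here simply overwrites the first, for illustration.)
register_parameters!(outer, function (parameter_context)
    store = parameter_context[JULIA_PARAMETER_STORE_KEY]
    Dict(store => [(outer, :log_scale), (inner, :bias)])
end)
```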
