Commit d7a515e

RFC Sampling APIs
1 parent 110db88 commit d7a515e

1 file changed: +106 -0 lines changed

rfc_samplers.md

Lines changed: 106 additions & 0 deletions
@@ -0,0 +1,106 @@
**DISCLAIMER** This RFC is currently lacking (at least):

- a "time-based" equivalent of dilation
- a "time-based" equivalent of random uniform sampling.

Main design principles
----------------------

- The existing feature space of different libraries (torchvision, torchmm,
  internal stuff) is supported. In particular "Random", "Uniform" and
  "Periodic" samplers are supported, with hopefully more descriptive names.
- The sampler is agnostic to decoding options: the decoder object is passed to
  the sampler by the user.
- Explicit **non-goal**: supporting arbitrary strategies that we haven't
  observed or heard usage of. For example, in all existing libraries the clip
  content is never random and is determined by just 2 parameters:
  `num_frames_per_clip` and `step_between_frames` (dilation). We allow for
  future extensibility, but we're not explicitly enabling additional
  clip-content strategies.
- The term "clip" denotes a sequence of frames in presentation order. The
  frames aren't necessarily consecutive (when `step_between_frames > 1`). We
  should add this term to the glossary. I don't feel too strongly about using
  the name "clip", but it's used in all existing libraries, and I don't know
  of an alternative that would be universally understood.

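
To make the "2 parameters" point above concrete, here is a minimal sketch of how a clip's content could be derived from a start index. The helper names `clip_frame_indices` and `clip_span` are hypothetical and not part of any proposed API; the sketch assumes frames are addressed by integer index in presentation order.

```python
def clip_frame_indices(start_index, num_frames_per_clip, step_between_frames=1):
    """Return the frame indices of one clip (hypothetical helper)."""
    if num_frames_per_clip < 1 or step_between_frames < 1:
        raise ValueError("both parameters must be >= 1")
    return [
        start_index + i * step_between_frames
        for i in range(num_frames_per_clip)
    ]


def clip_span(num_frames_per_clip, step_between_frames=1):
    """Number of source frames covered by a clip, first to last inclusive."""
    return (num_frames_per_clip - 1) * step_between_frames + 1


# A 4-frame clip with a dilation of 2, starting at frame 10.
indices = clip_frame_indices(10, num_frames_per_clip=4, step_between_frames=2)
span = clip_span(num_frames_per_clip=4, step_between_frames=2)
```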
Option 1
--------

- One function per sampling strategy.
- Note: this is not 100% stateless. The sampler seeks the decoder object, so
  there are side effects.

```py
from torchcodec.decoders import SimpleVideoDecoder
from torchcodec import samplers

video = "cool_vid"
decoder = SimpleVideoDecoder(video)

clips = samplers.get_uniformly_random_clips(
    decoder,
    num_clips=12,
    # clip content params:
    num_frames_per_clip=4,
    step_between_frames=2,  # often called "dilation"
    # sampler-specific params:
    prioritize_keyframes=False,  # to implement "fast" sampling.
                                 # Might need to be a separate function.
)

clips = samplers.get_evenly_spaced_clips(  # often called "Uniform"
    decoder,
    num_clips=12,
    # clip content params:
    num_frames_per_clip=4,
    step_between_frames=2,
)

clips = samplers.get_evenly_timed_clips(  # often called "Periodic"
    decoder,
    num_clips_per_second=3,
    # clip content params:
    num_frames_per_clip=4,
    step_between_frames=2,
    # sampler-specific params:
    clip_range_start_seconds=3,
    clip_range_end_seconds=4,
)
```

Option 2
--------

- One `ClipSampler` object where each sampling strategy is a method.
- Bonus: the underlying `decoder` object can be re-initialized by the sampler
  in-between calls, possibly improving seek time?

```py
from torchcodec.decoders import SimpleVideoDecoder
from torchcodec.sampler import ClipSampler

decoder = SimpleVideoDecoder("cool_vid")

sampler = ClipSampler(
    # Here: parameters that apply to all sampling strategies
    decoder,
    num_frames_per_clip=4,
    step_between_frames=2,
)
# One method per sampling strategy
sampler.get_uniformly_random_clips(num_clips=12, prioritize_keyframes=True)
sampler.get_evenly_spaced_clips(num_clips=12)
sampler.get_evenly_timed_clips(
    num_clips_per_second=3,
    clip_range_start_seconds=3,
    clip_range_end_seconds=4,
)
```

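
Both options expose the same three strategies, which differ only in how clip start positions are chosen. The sketch below shows one possible reading of each strategy, working on plain frame counts instead of a decoder. Everything in it is an assumption made for illustration: the helper names, the constant frame rate, and the exact spacing and rounding rules are not specified by this RFC.

```python
import random


def last_valid_start(num_frames, num_frames_per_clip, step_between_frames):
    """Largest start index such that the whole clip still fits in the video."""
    span = (num_frames_per_clip - 1) * step_between_frames + 1
    if span > num_frames:
        raise ValueError("video is too short to hold a single clip")
    return num_frames - span


def uniformly_random_starts(num_frames, num_clips, num_frames_per_clip,
                            step_between_frames, seed=None):
    """'Random': each start is drawn uniformly among the valid start indices."""
    last = last_valid_start(num_frames, num_frames_per_clip, step_between_frames)
    rng = random.Random(seed)
    return [rng.randint(0, last) for _ in range(num_clips)]


def evenly_spaced_starts(num_frames, num_clips, num_frames_per_clip,
                         step_between_frames):
    """'Uniform': starts spread evenly from index 0 to the last valid start."""
    last = last_valid_start(num_frames, num_frames_per_clip, step_between_frames)
    if num_clips == 1:
        return [0]
    return [i * last // (num_clips - 1) for i in range(num_clips)]


def evenly_timed_starts(num_frames, frames_per_second, num_clips_per_second,
                        num_frames_per_clip, step_between_frames,
                        clip_range_start_seconds=0.0,
                        clip_range_end_seconds=None):
    """'Periodic': one start every 1 / num_clips_per_second seconds.

    Assumes a constant frame rate; the end of the range is exclusive.
    """
    last = last_valid_start(num_frames, num_frames_per_clip, step_between_frames)
    duration = num_frames / frames_per_second
    if clip_range_end_seconds is None:
        end = duration
    else:
        end = min(clip_range_end_seconds, duration)
    starts = []
    k = 0
    while True:
        t = clip_range_start_seconds + k / num_clips_per_second
        index = round(t * frames_per_second)
        if t >= end or index > last:
            break
        starts.append(index)
        k += 1
    return starts


# Same parameter values as the examples above, on a hypothetical 30 fps video.
spaced = evenly_spaced_starts(107, num_clips=5, num_frames_per_clip=4,
                              step_between_frames=2)
timed = evenly_timed_starts(300, 30, num_clips_per_second=3,
                            num_frames_per_clip=4, step_between_frames=2,
                            clip_range_start_seconds=3,
                            clip_range_end_seconds=4)
rand = uniformly_random_starts(300, num_clips=12, num_frames_per_clip=4,
                               step_between_frames=2, seed=0)
```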
Questions
---------

- Should the returned `clips` be...?
  - a `List[FrameBatch]` where each `FrameBatch.data` is 4D
  - a `FrameBatch` where the `FrameBatch.data` is 5D
    - This is OK as long as `num_frames_per_clip` is not random (it is
      currently never random)
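
The constraint on the 5D option can be illustrated with shapes alone. The sketch below is not tied to `FrameBatch` or to any tensor library: `stacked_shape` is a hypothetical helper, and the `(num_frames, C, H, W)` axis order is an assumption. It shows that clips can only be stacked into a single 5D batch when every clip has the same 4D shape, which holds as long as `num_frames_per_clip` is fixed.

```python
def stacked_shape(clip_shapes):
    """Shape of one batch made by stacking clips along a new leading axis.

    Raises ValueError when the clips do not all share the same shape, which
    is what would happen if num_frames_per_clip varied from clip to clip.
    """
    if not clip_shapes:
        raise ValueError("need at least one clip")
    first = tuple(clip_shapes[0])
    for shape in clip_shapes[1:]:
        if tuple(shape) != first:
            raise ValueError(f"cannot stack {tuple(shape)} with {first}")
    return (len(clip_shapes), *first)


# 12 clips of 4 frames each, assumed layout (num_frames, C, H, W).
fixed = stacked_shape([(4, 3, 270, 480)] * 12)

# With a varying number of frames per clip, only a list of 4D items works.
try:
    stacked_shape([(4, 3, 270, 480), (6, 3, 270, 480)])
    ragged_ok = True
except ValueError:
    ragged_ok = False
```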
