**DISCLAIMER**: This RFC currently lacks (at least):

- a "time-based" equivalent of dilation
- a "time-based" equivalent of random uniform sampling

Main design principles
----------------------

- The existing feature space of different libraries (torchvision, torchmm,
  internal libraries) is supported. In particular, the "Random", "Uniform", and
  "Periodic" samplers are supported, hopefully under more descriptive names.
- The sampler is agnostic to decoding options: the user passes the decoder
  object to the sampler.
- Explicit **non-goal**: supporting arbitrary strategies that we haven't
  observed or heard of being used. For example, in all existing libraries the
  clip content is never random and is determined by just two parameters:
  `num_frames_per_clip` and `step_between_frames` (dilation). We leave room for
  future extensibility, but we're not explicitly enabling additional
  clip-content strategies.
- The term "clip" denotes a sequence of frames in presentation order. The
  frames aren't necessarily consecutive (they aren't when
  `step_between_frames > 1`). We should add this term to the glossary. I don't
  feel too strongly about the name "clip", but it's used in all existing
  libraries, and I don't know of an alternative that would be universally
  understood.

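Because clip content depends only on these two parameters, a clip is fully determined by the index of its first frame. The helpers below are hypothetical and not part of any proposed API. They make that relationship explicit:

```py
from typing import List


def clip_frame_indices(
    start_index: int, num_frames_per_clip: int, step_between_frames: int
) -> List[int]:
    """Frame indices (in presentation order) that make up one clip.

    Hypothetical helper: the clip is fully determined by its first frame
    plus the two clip-content parameters.
    """
    if num_frames_per_clip < 1 or step_between_frames < 1:
        raise ValueError("num_frames_per_clip and step_between_frames must be >= 1")
    return [start_index + i * step_between_frames for i in range(num_frames_per_clip)]


def clip_span(num_frames_per_clip: int, step_between_frames: int) -> int:
    """Number of consecutive frames covered from a clip's first to last frame."""
    return (num_frames_per_clip - 1) * step_between_frames + 1
```

For example, `clip_frame_indices(10, num_frames_per_clip=4, step_between_frames=2)` gives `[10, 12, 14, 16]`, a clip that spans `clip_span(4, 2) == 7` frames of the video.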
Option 1
--------

- One function per sampling strategy.
- Note: this is not 100% stateless. The functions seek the decoder object, so
  there are side effects.

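The three strategies differ only in how they choose where clips start; the clip content itself always comes from `num_frames_per_clip` and `step_between_frames`. The following is a minimal sketch of that start-selection logic, independent of decoding. The helper names, the inclusive/exclusive range conventions, and the rounding are all assumptions, not the proposed API, and `prioritize_keyframes` is not modeled:

```py
import random
from typing import List, Optional


def _last_valid_start(num_frames: int, num_frames_per_clip: int, step_between_frames: int) -> int:
    """Largest frame index at which a clip can start and still fit in the video."""
    span = (num_frames_per_clip - 1) * step_between_frames + 1
    if span > num_frames:
        raise ValueError("Video is too short for the requested clip content")
    return num_frames - span


def uniformly_random_starts(
    num_frames: int,
    num_clips: int,
    num_frames_per_clip: int,
    step_between_frames: int,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """The "Random" strategy: starts drawn uniformly, with replacement."""
    rng = rng if rng is not None else random.Random()
    last = _last_valid_start(num_frames, num_frames_per_clip, step_between_frames)
    # Sorting is an assumption: it lets the decoder only ever seek forward.
    return sorted(rng.randint(0, last) for _ in range(num_clips))


def evenly_spaced_starts(
    num_frames: int, num_clips: int, num_frames_per_clip: int, step_between_frames: int
) -> List[int]:
    """The "Uniform" strategy: starts spread evenly from first to last valid start."""
    last = _last_valid_start(num_frames, num_frames_per_clip, step_between_frames)
    if num_clips == 1:
        return [0]
    return [i * last // (num_clips - 1) for i in range(num_clips)]


def evenly_timed_start_seconds(
    num_clips_per_second: float,
    clip_range_start_seconds: float,
    clip_range_end_seconds: float,
) -> List[float]:
    """The "Periodic" strategy: one clip start every 1 / num_clips_per_second seconds.

    Assumes the range is half-open, [start, end), and only constrains where
    clips start; floating-point edge cases are ignored in this sketch.
    """
    period = 1.0 / num_clips_per_second
    num_clips = int((clip_range_end_seconds - clip_range_start_seconds) * num_clips_per_second)
    return [clip_range_start_seconds + i * period for i in range(num_clips)]
```

For a 100-frame video with `num_frames_per_clip=4` and `step_between_frames=2`, each clip spans 7 frames, so the last valid start is 93 and `evenly_spaced_starts(100, 4, 4, 2)` returns `[0, 31, 62, 93]`. `evenly_timed_start_seconds(3, 3, 4)` yields starts at roughly 3.0, 3.33 and 3.67 seconds. Mapping those timestamps to frames would be left to the decoder.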
```py
from torchcodec.decoders import SimpleVideoDecoder
from torchcodec import samplers

video = "cool_vid"
decoder = SimpleVideoDecoder(video)

clips = samplers.get_uniformly_random_clips(
    decoder,
    num_clips=12,
    # clip content params:
    num_frames_per_clip=4,
    step_between_frames=2,  # often called "dilation"
    # sampler-specific params:
    prioritize_keyframes=False,  # to implement "fast" sampling;
    # might need to be a separate function.
)

clips = samplers.get_evenly_spaced_clips(  # often called "Uniform"
    decoder,
    num_clips=12,
    # clip content params:
    num_frames_per_clip=4,
    step_between_frames=2,
)

clips = samplers.get_evenly_timed_clips(  # often called "Periodic"
    decoder,
    num_clips_per_second=3,
    # clip content params:
    num_frames_per_clip=4,
    step_between_frames=2,
    # sampler-specific params:
    clip_range_start_seconds=3,
    clip_range_end_seconds=4,
)
```

Option 2
--------

- One `ClipSampler` object, where each sampling strategy is a method.
- Bonus: the sampler could re-initialize the underlying `decoder` object
  in-between calls, possibly improving seek time (unverified).

```py
from torchcodec.decoders import SimpleVideoDecoder
from torchcodec.samplers import ClipSampler

decoder = SimpleVideoDecoder("cool_vid")

sampler = ClipSampler(
    # Here: parameters that apply to all sampling strategies
    decoder,
    num_frames_per_clip=4,
    step_between_frames=2,
)
# One method per sampling strategy
sampler.get_uniformly_random_clips(num_clips=12, prioritize_keyframes=True)
sampler.get_evenly_spaced_clips(num_clips=12)
sampler.get_evenly_timed_clips(
    num_clips_per_second=3,
    clip_range_start_seconds=3,
    clip_range_end_seconds=4,
)
```

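Here is a minimal sketch of how Option 2 might be structured internally. It assumes the decoder can be treated as a sequence of frames, meaning anything that supports `len()` and integer indexing. A plain list stands in for it here; this is an assumption, not the `SimpleVideoDecoder` API. Clip-content parameters are validated once in the constructor, and only strategy-specific parameters go on the methods. `prioritize_keyframes` and `get_evenly_timed_clips` are omitted because they need keyframe and timestamp information.

```py
import random
from typing import Any, List, Optional, Sequence


class ClipSampler:
    """Hypothetical sketch of Option 2: one object, one method per strategy."""

    def __init__(self, decoder: Sequence[Any], num_frames_per_clip: int, step_between_frames: int = 1):
        # Assumption: `decoder` supports len() and integer indexing in presentation order.
        self.decoder = decoder
        self.num_frames_per_clip = num_frames_per_clip
        self.step_between_frames = step_between_frames
        span = (num_frames_per_clip - 1) * step_between_frames + 1
        if span > len(decoder):
            raise ValueError("Video is too short for the requested clip content")
        # Last index at which a clip can start and still fit in the video.
        self._last_start = len(decoder) - span

    def _clip_at(self, start: int) -> List[Any]:
        """Fetch the frames of the clip starting at `start`."""
        return [
            self.decoder[start + i * self.step_between_frames]
            for i in range(self.num_frames_per_clip)
        ]

    def get_uniformly_random_clips(self, num_clips: int, seed: Optional[int] = None) -> List[List[Any]]:
        rng = random.Random(seed)
        # Sorted so that frames are fetched in increasing index order.
        starts = sorted(rng.randint(0, self._last_start) for _ in range(num_clips))
        return [self._clip_at(s) for s in starts]

    def get_evenly_spaced_clips(self, num_clips: int) -> List[List[Any]]:
        if num_clips == 1:
            return [self._clip_at(0)]
        starts = [i * self._last_start // (num_clips - 1) for i in range(num_clips)]
        return [self._clip_at(s) for s in starts]
```

With `ClipSampler(list(range(100)), num_frames_per_clip=4, step_between_frames=2)`, `get_evenly_spaced_clips(num_clips=4)` returns `[[0, 2, 4, 6], [31, 33, 35, 37], [62, 64, 66, 68], [93, 95, 97, 99]]`. Compared with Option 1, the shared clip-content state lives on the object, which is also where a decoder re-initialization between calls could be hooked in.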
Questions
---------

- Should the returned `clips` be:
  - a `List[FrameBatch]` where each `FrameBatch.data` is 4D, or
  - a single `FrameBatch` where `FrameBatch.data` is 5D?
    - This is OK as long as `num_frames_per_clip` is not random (it is
      currently never random).
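
The trade-off can be illustrated with plain tensors standing in for `FrameBatch.data`. This is a simplification, and the `(C, H, W)` per-frame layout is an assumption. Stacking the per-clip 4D tensors into one 5D tensor only works when every clip has the same number of frames:

```py
import torch

num_clips, num_frames_per_clip = 12, 4
C, H, W = 3, 8, 8  # small, made-up frame size

# List form: one 4D tensor per clip, shape (num_frames_per_clip, C, H, W).
clips_as_list = [torch.zeros(num_frames_per_clip, C, H, W) for _ in range(num_clips)]

# 5D form: possible because all clips have the same length.
clips_5d = torch.stack(clips_as_list)
print(clips_5d.shape)  # torch.Size([12, 4, 3, 8, 8])

# If num_frames_per_clip ever varied, only the list form would still work.
ragged = [torch.zeros(4, C, H, W), torch.zeros(5, C, H, W)]
try:
    torch.stack(ragged)
except RuntimeError:
    print("cannot stack clips with different numbers of frames")
```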