Commit db2ea07

committed
Clean up comment
1 parent 343ed3e commit db2ea07

1 file changed: +1 -38 lines changed

src/torchcodec/_core/Frame.h

Lines changed: 1 addition & 38 deletions
@@ -58,49 +58,12 @@ struct AudioFramesOutput {
 // FRAME TENSOR ALLOCATION APIs
 // --------------------------------------------------------------------------

-// Note [Frame Tensor allocation and height and width]
+// Note [Frame Tensor allocation]
 //
 // We always allocate [N]HWC tensors. The low-level decoding functions all
 // assume HWC tensors, since this is what FFmpeg natively handles. It's up to
 // the high-level decoding entry-points to permute that back to CHW, by calling
 // maybePermuteHWC2CHW().
-//
-// TODO: Rationalize the comment below with refactoring.
-//
-// Also, importantly, the way we figure out the the height and width of the
-// output frame tensor varies, and depends on the decoding entry-point. In
-// *decreasing order of accuracy*, we use the following sources for determining
-// height and width:
-// - getHeightAndWidthFromResizedAVFrame(). This is the height and width of the
-// AVframe, *post*-resizing. This is only used for single-frame decoding APIs,
-// on CPU, with filtergraph.
-// - getHeightAndWidthFromOptionsOrAVFrame(). This is the height and width from
-// the user-specified options if they exist, or the height and width of the
-// AVFrame *before* it is resized. In theory, i.e. if there are no bugs within
-// our code or within FFmpeg code, this should be exactly the same as
-// getHeightAndWidthFromResizedAVFrame(). This is used by single-frame
-// decoding APIs, on CPU with swscale, and on GPU.
-// - getHeightAndWidthFromOptionsOrMetadata(). This is the height and width from
-// the user-specified options if they exist, or the height and width form the
-// stream metadata, which itself got its value from the CodecContext, when the
-// stream was added. This is used by batch decoding APIs, for both GPU and
-// CPU.
-//
-// The source of truth for height and width really is the (resized) AVFrame: it
-// comes from the decoded ouptut of FFmpeg. The info from the metadata (i.e.
-// from the CodecContext) may not be as accurate. However, the AVFrame is only
-// available late in the call stack, when the frame is decoded, while the
-// CodecContext is available early when a stream is added. This is why we use
-// the CodecContext for pre-allocating batched output tensors (we could
-// pre-allocate those only once we decode the first frame to get the info frame
-// the AVFrame, but that's a more complex logic).
-//
-// Because the sources for height and width may disagree, we may end up with
-// conflicts: e.g. if we pre-allocate a batch output tensor based on the
-// metadata info, but the decoded AVFrame has a different height and width.
-// it is very important to check the height and width assumptions where the
-// tensors memory is used/filled in order to avoid segfaults.
-
 torch::Tensor allocateEmptyHWCTensor(
 const FrameDims& frameDims,
 const torch::Device& device,
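
The retained note says frames are allocated as HWC and permuted to CHW only at the high-level entry points. As a rough illustration of what that layout change means for element offsets, here is a minimal sketch on a plain buffer. The helper `hwcToChw` is hypothetical and not part of torchcodec. It copies the data, whereas a tensor permute (which `maybePermuteHWC2CHW()` presumably relies on) would typically return a view without copying.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper for illustration only (not a torchcodec API):
// copies a contiguous HWC buffer into a new contiguous CHW buffer.
std::vector<std::uint8_t> hwcToChw(
    const std::vector<std::uint8_t>& hwc,
    std::size_t height,
    std::size_t width,
    std::size_t channels) {
  // Check the shape assumption before touching memory.
  if (hwc.size() != height * width * channels) {
    throw std::invalid_argument("buffer size does not match H * W * C");
  }
  std::vector<std::uint8_t> chw(hwc.size());
  for (std::size_t h = 0; h < height; ++h) {
    for (std::size_t w = 0; w < width; ++w) {
      for (std::size_t c = 0; c < channels; ++c) {
        // HWC offset: (h * W + w) * C + c
        // CHW offset: (c * H + h) * W + w
        chw[(c * height + h) * width + w] =
            hwc[(h * width + w) * channels + c];
      }
    }
  }
  return chw;
}
```

For a 1x2 image with three channels stored as `{1, 2, 3, 4, 5, 6}` (two interleaved pixels), the helper yields one plane per channel: `{1, 4, 2, 5, 3, 6}`.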
