Skip to content

Conversation

Dan-Flores
Copy link
Contributor

This PR implements the backend functions needed for video encoding.
The main changes are in Encoder.cpp/h and test_ops.py, the rest are scaffolding to allow the code to build / test.

Encoder.cpp implements primary functions:

  • initializeEncoder
  • convertTensorToAVFrame
  • encodeFrame
  • encode (the top level function)

One test is implemented in test_video_encoder_test_round_trip:

  • This test converts the nasa_13013.mp4 to mp4, mov, and avi, with a resulting peak-signal to noise ratio around 30 - 40.

Remaining items to implement are marked with TODO-VideoEncoder:

  • Remove hardcoding of H264 codec
  • Validate sample rate, similar to the AudioEncoder
  • Enable selection of VideoStreamOptions data to be set or remove them if unneeded
    • height, width, bitrate, GOP size, max B-frames
  • test additional video container formats

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Sep 3, 2025
@Dan-Flores Dan-Flores marked this pull request as ready for review September 5, 2025 14:06
Copy link
Contributor

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Dan-Flores this is a great start! I left a lot of comments but most of them are nits and TODOs as follow-up.


// TODO-VideoEncoder: Reorder tensor if in NHWC format
int channelSize = inHeight_ * inWidth_;
// Reorder RGB -> GBR for AV_PIX_FMT_GBRP or AV_PIX_FMT_GBRAP formats
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a TODO to investigate this: ideally, we should be able to pass packed RGB data as RGBRGBRGB... without having to do anything fancy (like GBR). I'm not sure. I don't fully understand this yet.

Also let's remove the ref to AV_PIX_FMT_GBRAP as I don't think we'll want to support that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also a bit confused on the formats, here's why I think this is necessary:

  • The video decoder outputs a tensor in packed RGB24 pixel format, which gets converted in SingleStreamDecoder::maybePermuteHWC2CHW to NCHW planar format, with the channels in RGB order.
  • Rather than convert the planar data to packed data, this code sets the input pixel format to planar GBR format, and reorders the frame's data pointers to the correct channels (this is why my images were green earlier).

Comment on lines 770 to 777
av_packet_rescale_ts(
packet.get(),
avCodecContext_->time_base,
avFormatContext_->streams[streamIndex_]->time_base);
packet->stream_index = streamIndex_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you help me understand why that part is needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is leftover from debugging the frame rates in my test. Since we aren't modifying the frame_rate or time_base, I'll remove this.

@NicolasHug
Copy link
Contributor

On the Windows test failures with FFmpeg 6 https://github.com/pytorch/torchcodec/actions/runs/17468896896/job/49613138742?pr=866

I think we can safely just skip the test on Windows + FFmpeg 6 and make this is TODO

frames.sizes()[1] == 3,
"frame must have 3 channels (R, G, B), got ",
frames.sizes()[1]);
// TODO-VideoEncoder: Investigate if non-contiguous frames can be returned
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's about input, not output :)

Suggested change
// TODO-VideoEncoder: Investigate if non-contiguous frames can be returned
// TODO-VideoEncoder: Investigate if non-contiguous frames can be accepted

test/test_ops.py Outdated
class TestVideoEncoderOps:

def test_bad_input(self, tmp_path):
output_file = str(tmp_path / ".mp3")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably shouldn't be mp3 for video?

int frameIndex);
void encodeFrame(AutoAVPacket& autoAVPacket, const UniqueAVFrame& avFrame);
void flushBuffers();
void close_avio();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't declared but not defined - remove

int64_t frame_rate,
std::string_view file_name) {
VideoStreamOptions videoStreamOptions;
VideoEncoder(frames, frame_rate, file_name, videoStreamOptions).encode();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is still relevant

Comment on lines 728 to 731
// Reorder RGB -> GBR for AV_PIX_FMT_GBRP format
inputFrame->data[0] = tensorData + channelSize;
inputFrame->data[1] = tensorData + (2 * channelSize);
inputFrame->data[2] = tensorData;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with your assessment on #866 (comment), but I'm still a bit surprised there's no native for native RGBRGBRGB support in FFmpeg. It might be the case, I haven't checked, but I'd recommend leaving a TODO to investigate further

// TODO-VideoEncoder: Reorder tensor if in NHWC format
int channelSize = inHeight_ * inWidth_;
// Reorder RGB -> GBR for AV_PIX_FMT_GBRP format
// TODO-VideoEncoder: Determine if FFmpeg supports RGB input format directly
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From https://ffmpeg.org/ffmpeg-codecs.html#libx264_002c-libx264rgb: "the libx264rgb encoder is the same as libx264, except it accepts packed RGB pixel formats as input instead of YUV."

In general, it's encoder specific whether it accepts RGB input or not. You need to check with each encoder separately on that. Note that there are multiple implementations for each codec as soon as you consider HW accelerated ones. The story might be different for each of them.

Copy link
Contributor

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Dan-Flores !

@pytest.mark.parametrize("format", ("mov", "mp4", "avi"))
# TODO-VideoEncoder: enable additional formats (mkv, webm)
def test_video_encoder_test_round_trip(self, tmp_path, format):
asset = NASA_VIDEO
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's have a TODO to use FFmpeg's testsrc2 instead https://www.youtube.com/watch?v=dd7HvxZfIgU

@Dan-Flores Dan-Flores merged commit 67b4381 into meta-pytorch:main Sep 10, 2025
47 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Meta Open Source bot.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants