VideoEncoder first pass, round trip test #866

Dan-Flores · 2025-09-03T02:43:49Z

This PR implements the backend functions needed for video encoding.
The main changes are in Encoder.cpp/h and test_ops.py, the rest are scaffolding to allow the code to build / test.

Encoder.cpp implements primary functions:

initializeEncoder
convertTensorToAVFrame
encodeFrame
encode (the top level function)

One test is implemented in test_video_encoder_test_round_trip:

This test converts the nasa_13013.mp4 to mp4, mov, and avi, with a resulting peak-signal to noise ratio around 30 - 40.

Remaining items to implement are marked with TODO-VideoEncoder:

Remove hardcoding of H264 codec
Validate sample rate, similar to the AudioEncoder
Enable selection of VideoStreamOptions data to be set or remove them if unneeded
- height, width, bitrate, GOP size, max B-frames
test additional video container formats

NicolasHug

Thanks @Dan-Flores this is a great start! I left a lot of comments but most of them are nits and TODOs as follow-up.

src/torchcodec/_core/Encoder.h

src/torchcodec/_core/Encoder.cpp

src/torchcodec/_core/Encoder.h

test/test_ops.py

src/torchcodec/_core/Encoder.cpp

NicolasHug · 2025-09-05T15:07:25Z

src/torchcodec/_core/Encoder.cpp

+
+  // TODO-VideoEncoder: Reorder tensor if in NHWC format
+  int channelSize = inHeight_ * inWidth_;
+  // Reorder RGB -> GBR for AV_PIX_FMT_GBRP or AV_PIX_FMT_GBRAP formats


Let's add a TODO to investigate this: ideally, we should be able to pass packed RGB data as RGBRGBRGB... without having to do anything fancy (like GBR). I'm not sure. I don't fully understand this yet.

Also let's remove the ref to AV_PIX_FMT_GBRAP as I don't think we'll want to support that.

I was also a bit confused on the formats, here's why I think this is necessary:

The video decoder outputs a tensor in packed RGB24 pixel format, which gets converted in SingleStreamDecoder::maybePermuteHWC2CHW to NCHW planar format, with the channels in RGB order.

Rather than convert the planar data to packed data, this code sets the input pixel format to planar GBR format, and reorders the frame's data pointers to the correct channels (this is why my images were green earlier).

NicolasHug · 2025-09-05T15:12:02Z

src/torchcodec/_core/Encoder.cpp

+    av_packet_rescale_ts(
+        packet.get(),
+        avCodecContext_->time_base,
+        avFormatContext_->streams[streamIndex_]->time_base);
+    packet->stream_index = streamIndex_;


Can you help me understand why that part is needed?

I believe this is leftover from debugging the frame rates in my test. Since we aren't modifying the frame_rate or time_base, I'll remove this.

NicolasHug · 2025-09-05T15:14:02Z

On the Windows test failures with FFmpeg 6 https://github.com/pytorch/torchcodec/actions/runs/17468896896/job/49613138742?pr=866

I think we can safely just skip the test on Windows + FFmpeg 6 and make this is TODO

NicolasHug · 2025-09-09T10:41:15Z

src/torchcodec/_core/Encoder.cpp

+      frames.sizes()[1] == 3,
+      "frame must have 3 channels (R, G, B), got ",
+      frames.sizes()[1]);
+  // TODO-VideoEncoder: Investigate if non-contiguous frames can be returned


It's about input, not output :)

Suggested change

// TODO-VideoEncoder: Investigate if non-contiguous frames can be returned

// TODO-VideoEncoder: Investigate if non-contiguous frames can be accepted

NicolasHug · 2025-09-09T10:41:48Z

test/test_ops.py

+class TestVideoEncoderOps:
+
+    def test_bad_input(self, tmp_path):
+        output_file = str(tmp_path / ".mp3")


Probably shouldn't be mp3 for video?

NicolasHug · 2025-09-09T10:44:27Z

src/torchcodec/_core/Encoder.h

+      int frameIndex);
+  void encodeFrame(AutoAVPacket& autoAVPacket, const UniqueAVFrame& avFrame);
+  void flushBuffers();
+  void close_avio();


this isn't declared but not defined - remove

NicolasHug · 2025-09-09T10:45:29Z

src/torchcodec/_core/custom_ops.cpp

+    int64_t frame_rate,
+    std::string_view file_name) {
+  VideoStreamOptions videoStreamOptions;
+  VideoEncoder(frames, frame_rate, file_name, videoStreamOptions).encode();


I think this is still relevant

src/torchcodec/_core/Encoder.cpp

NicolasHug · 2025-09-09T10:49:50Z

src/torchcodec/_core/Encoder.cpp

+  // Reorder RGB -> GBR for AV_PIX_FMT_GBRP format
+  inputFrame->data[0] = tensorData + channelSize;
+  inputFrame->data[1] = tensorData + (2 * channelSize);
+  inputFrame->data[2] = tensorData;


I agree with your assessment on #866 (comment), but I'm still a bit surprised there's no native for native RGBRGBRGB support in FFmpeg. It might be the case, I haven't checked, but I'd recommend leaving a TODO to investigate further

dvrogozh · 2025-09-09T21:53:45Z

src/torchcodec/_core/Encoder.cpp

+  // TODO-VideoEncoder: Reorder tensor if in NHWC format
+  int channelSize = inHeight_ * inWidth_;
+  // Reorder RGB -> GBR for AV_PIX_FMT_GBRP format
+  // TODO-VideoEncoder: Determine if FFmpeg supports RGB input format directly


From https://ffmpeg.org/ffmpeg-codecs.html#libx264_002c-libx264rgb: "the libx264rgb encoder is the same as libx264, except it accepts packed RGB pixel formats as input instead of YUV."

In general, it's encoder specific whether it accepts RGB input or not. You need to check with each encoder separately on that. Note that there are multiple implementations for each codec as soon as you consider HW accelerated ones. The story might be different for each of them.

NicolasHug

Thanks @Dan-Flores !

NicolasHug · 2025-09-10T09:01:20Z

test/test_ops.py

+    @pytest.mark.parametrize("format", ("mov", "mp4", "avi"))
+    # TODO-VideoEncoder: enable additional formats (mkv, webm)
+    def test_video_encoder_test_round_trip(self, tmp_path, format):
+        asset = NASA_VIDEO


Let's have a TODO to use FFmpeg's testsrc2 instead https://www.youtube.com/watch?v=dd7HvxZfIgU

meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Sep 3, 2025

Dan-Flores marked this pull request as ready for review September 5, 2025 14:06

NicolasHug reviewed Sep 5, 2025

View reviewed changes

Daniel Flores added 10 commits September 8, 2025 00:42

VideoEncoder first pass, round trip test

f4fd9fd

remove debugging import

8578060

additional nits

040d4b2

fix cpp tests?

dc7319b

remove unused avioContextHolder_

dfde1cc

Assume NCHW

bc102f0

reorder test variables

c021019

reorder header

529ae61

add suggestions

1cca109

test input correctness

adbf151

Dan-Flores force-pushed the video_encoder branch from b853bca to adbf151 Compare September 8, 2025 15:09

Encoder adjustments to fix tests

08b5851

NicolasHug reviewed Sep 9, 2025

View reviewed changes

Daniel Flores added 2 commits September 9, 2025 10:32

add commented suggestions

b96bd56

validate int64 to int

68b9de6

dvrogozh reviewed Sep 9, 2025

View reviewed changes

NicolasHug approved these changes Sep 10, 2025

View reviewed changes

add testing todo

49ab01f

Dan-Flores merged commit 67b4381 into meta-pytorch:main Sep 10, 2025
47 checks passed

	// TODO-VideoEncoder: Investigate if non-contiguous frames can be returned
	// TODO-VideoEncoder: Investigate if non-contiguous frames can be accepted

VideoEncoder first pass, round trip test #866

VideoEncoder first pass, round trip test #866

Uh oh!

Conversation

Dan-Flores commented Sep 3, 2025

Uh oh!

NicolasHug left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

NicolasHug commented Sep 5, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

NicolasHug left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!