-
Notifications
You must be signed in to change notification settings - Fork 62
VideoEncoder first pass, round trip test #866
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Dan-Flores this is a great start! I left a lot of comments but most of them are nits and TODOs as follow-up.
src/torchcodec/_core/Encoder.cpp
Outdated
|
||
// TODO-VideoEncoder: Reorder tensor if in NHWC format | ||
int channelSize = inHeight_ * inWidth_; | ||
// Reorder RGB -> GBR for AV_PIX_FMT_GBRP or AV_PIX_FMT_GBRAP formats |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a TODO to investigate this: ideally, we should be able to pass packed RGB data as RGBRGBRGB... without having to do anything fancy (like GBR). I'm not sure. I don't fully understand this yet.
Also let's remove the ref to AV_PIX_FMT_GBRAP as I don't think we'll want to support that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was also a bit confused on the formats, here's why I think this is necessary:
- The video decoder outputs a tensor in packed RGB24 pixel format, which gets converted in
SingleStreamDecoder::maybePermuteHWC2CHW
to NCHW planar format, with the channels in RGB order. - Rather than convert the planar data to packed data, this code sets the input pixel format to planar GBR format, and reorders the frame's data pointers to the correct channels (this is why my images were green earlier).
src/torchcodec/_core/Encoder.cpp
Outdated
av_packet_rescale_ts( | ||
packet.get(), | ||
avCodecContext_->time_base, | ||
avFormatContext_->streams[streamIndex_]->time_base); | ||
packet->stream_index = streamIndex_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you help me understand why that part is needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is leftover from debugging the frame rates in my test. Since we aren't modifying the frame_rate
or time_base
, I'll remove this.
On the Windows test failures with FFmpeg 6 https://github.com/pytorch/torchcodec/actions/runs/17468896896/job/49613138742?pr=866 I think we can safely just skip the test on Windows + FFmpeg 6 and make this is TODO |
b853bca
to
adbf151
Compare
src/torchcodec/_core/Encoder.cpp
Outdated
frames.sizes()[1] == 3, | ||
"frame must have 3 channels (R, G, B), got ", | ||
frames.sizes()[1]); | ||
// TODO-VideoEncoder: Investigate if non-contiguous frames can be returned |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's about input, not output :)
// TODO-VideoEncoder: Investigate if non-contiguous frames can be returned | |
// TODO-VideoEncoder: Investigate if non-contiguous frames can be accepted |
test/test_ops.py
Outdated
class TestVideoEncoderOps: | ||
|
||
def test_bad_input(self, tmp_path): | ||
output_file = str(tmp_path / ".mp3") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably shouldn't be mp3 for video?
src/torchcodec/_core/Encoder.h
Outdated
int frameIndex); | ||
void encodeFrame(AutoAVPacket& autoAVPacket, const UniqueAVFrame& avFrame); | ||
void flushBuffers(); | ||
void close_avio(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this isn't declared but not defined - remove
src/torchcodec/_core/custom_ops.cpp
Outdated
int64_t frame_rate, | ||
std::string_view file_name) { | ||
VideoStreamOptions videoStreamOptions; | ||
VideoEncoder(frames, frame_rate, file_name, videoStreamOptions).encode(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is still relevant
// Reorder RGB -> GBR for AV_PIX_FMT_GBRP format | ||
inputFrame->data[0] = tensorData + channelSize; | ||
inputFrame->data[1] = tensorData + (2 * channelSize); | ||
inputFrame->data[2] = tensorData; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with your assessment on #866 (comment), but I'm still a bit surprised there's no native for native RGBRGBRGB support in FFmpeg. It might be the case, I haven't checked, but I'd recommend leaving a TODO to investigate further
src/torchcodec/_core/Encoder.cpp
Outdated
// TODO-VideoEncoder: Reorder tensor if in NHWC format | ||
int channelSize = inHeight_ * inWidth_; | ||
// Reorder RGB -> GBR for AV_PIX_FMT_GBRP format | ||
// TODO-VideoEncoder: Determine if FFmpeg supports RGB input format directly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From https://ffmpeg.org/ffmpeg-codecs.html#libx264_002c-libx264rgb: "the libx264rgb encoder is the same as libx264, except it accepts packed RGB pixel formats as input instead of YUV."
In general, it's encoder specific whether it accepts RGB input or not. You need to check with each encoder separately on that. Note that there are multiple implementations for each codec as soon as you consider HW accelerated ones. The story might be different for each of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Dan-Flores !
@pytest.mark.parametrize("format", ("mov", "mp4", "avi")) | ||
# TODO-VideoEncoder: enable additional formats (mkv, webm) | ||
def test_video_encoder_test_round_trip(self, tmp_path, format): | ||
asset = NASA_VIDEO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's have a TODO to use FFmpeg's testsrc2 instead https://www.youtube.com/watch?v=dd7HvxZfIgU
This PR implements the backend functions needed for video encoding.
The main changes are in
Encoder.cpp/h
andtest_ops.py
, the rest are scaffolding to allow the code to build / test.Encoder.cpp implements primary functions:
initializeEncoder
convertTensorToAVFrame
encodeFrame
encode
(the top level function)One test is implemented in
test_video_encoder_test_round_trip
:nasa_13013.mp4
tomp4
,mov
, andavi
, with a resulting peak-signal to noise ratio around 30 - 40.Remaining items to implement are marked with
TODO-VideoEncoder
: