Release Notes
This is a major release, focusing on smaller code size, fewer third-party dependencies (just Zstd), OpenCL support, faster ETC1S encoding, and fully multithreaded/parallel processing.
- ETC1S encoding is now approximately 30% faster. We added more optimizations to the encoder's backend and more SSE optimizations to the frontend.
- Optional OpenCL support has been added to the ETC1S encoder. To enable it, compile with "cmake . -DOPENCL=1 -DSSE=1", then use the -opencl command line option when you run basisu.exe. The command line tool will display "OpenCL support initialized successfully" if OpenCL support is available and enabled.
Note that OpenCL may not be any faster when encoding individual files - it highly depends on your hardware. OpenCL encoding is intended (but not required) to be used with parallel processing (see the -parallel option, next).
The BASISU_SUPPORT_OPENCL macro must be set to 1 to enable OpenCL usage in the encoder. Otherwise it's completely disabled.
OpenCL support has been tested under Windows using AMD/NVidia/Intel drivers, under Ubuntu Linux using NVidia drivers, and on an Intel OSX MacBook.
On very old NVidia GPUs the driver may crash during OpenCL encoding. The encoder should transparently fall back to CPU-only processing when this occurs. Also, very large/long ETC1S videos may require more GPU memory than is available, in which case the encoder will also fall back to CPU-only encoding.
- We added a new encoder option: "-parallel". Normally the encoder processes a single file at a time, on a single thread, where it spawns numerous jobs to parallelize specific stages of compression. This is very inefficient because the serial sections between those stages limit the overall speedup (Amdahl's law). "-parallel" causes the encoder to instead spawn a single job per input texture, which can be far more efficient. -opencl and -parallel can be used together for significantly faster ETC1S encoding.
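The job-per-texture idea behind -parallel can be sketched with standard C++ threads. This is a minimal illustration under stated assumptions, not the encoder's actual scheduler: `encode_all_parallel` and the `encode_texture` callback are hypothetical names, and the callback stands in for whatever compresses one file.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical callback type: compresses one texture, returns true on success.
using encode_fn = std::function<bool(const std::string&)>;

// Runs one job per input texture on up to max_threads worker threads.
// Each worker claims the next unprocessed index from a shared atomic counter,
// so a worker never waits on a serial stage of some other texture.
// Returns the number of textures that encoded successfully.
inline std::size_t encode_all_parallel(const std::vector<std::string>& files,
                                       const encode_fn& encode_texture,
                                       unsigned max_threads)
{
    assert(max_threads >= 1);

    std::atomic<std::size_t> next{0};
    std::atomic<std::size_t> succeeded{0};

    auto worker = [&]() {
        for (;;) {
            const std::size_t i = next.fetch_add(1);
            if (i >= files.size())
                break;
            if (encode_texture(files[i]))
                succeeded.fetch_add(1);
        }
    };

    // Never start more threads than there are textures.
    const std::size_t num_workers = std::min<std::size_t>(max_threads, files.size());

    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < num_workers; ++t)
        pool.emplace_back(worker);
    for (auto& th : pool)
        th.join();

    return succeeded.load();
}
```

Passing a callback that wraps a real per-file compression call, plus the desired thread count, would reproduce the overall shape of the option. Error reporting and per-file output naming are left out of the sketch.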
- The default behavior in v1.16 is now to process each source file individually, instead of always generating a texture array as in previous versions. To create a texture array, specify the "-tex_array" command line option.
- Encoder/transcoder test mode: Specify "-test" to have the command line tool invoke the ETC1S/UASTC encoders/transcoders on a known set of test files located in the "test_files" subdirectory. Specify "-test_dir X" to override this directory. The process exit status will be set to EXIT_SUCCESS (0) or EXIT_FAILURE (1).
- Other new command line options:
"-max_threads X" - Sets the maximum total number of threads to X if threading is enabled (also see -no_multithreading)
"-ktx_only" - Write only KTX files during unpacking (no PNGs)
"-split X.png" - Splits a PNG file into a 24bpp file and four grayscale PNG files (one for each component)
"-combine rgb.png a.png" - Combines two PNG files into a single 32bpp PNG file
- Some old/dead code was removed from the encoder and transcoder: transcoder/basisu_global_selector_palette.h. (The feature supported by the classes in this file was removed during KTX2 standardization.) This impacts the API: All pointers to "etc1_global_selector_codebook" have been removed. The very earliest versions of the ETC1S encoder could output .basis files that used global selector palettes. It's unlikely you have any files this old (we quickly disabled this feature because it was too slow), but if you do you'll have to re-encode them.
- A simplified C-style API has been added to the encoder: See basis_compress() and basis_parallel_compress() in encoder/basisu_comp.h.
- The webgl, webgl_videotest, and contrib directories have been updated and re-tested to reflect the API changes (the removal of the global selector codebook related pointers).
- Except for Zstandard, all 3rd party code dependencies have been removed from the repo to simplify certification and code licensing. PNG reading/writing is now handled by new code (pvpngreader.cpp) instead of lodepng, Android's ASTC block unpacker is no longer required to build the encoder (we instead unpack the UASTC pixels directly to 24/32bpp pixels), and BMP loading has been temporarily removed.
The removal of BMP reading is unfortunate - let us know if this is an issue for anyone.
- The -compare option now displays RGBA histograms.
- When compiling the encoder with emscripten, be sure to compile in the file encoder/basisu_resample.cpp, otherwise mipmapping will always fail. This is a known problem with emscripten (the link should fail with an undefined symbol, but it silently doesn't and the global g_num_resample_filters will be 0 which is invalid).
- We fixed a rare bug found by a user in the ETC1S encoder's backend that would cause files to fail CRC checks during compression. This would cause the command line tool to exit with an error, and the compressor's process() method to return an error code.
- The KTX2 file format and Zstandard are now fully supported throughout the system. By default the encoder still outputs .basis files - use the "-ktx2" command line option to output KTX2 format files. (Or, in basis_compressor_params in encoder/basisu_comp.h, set the m_create_ktx2_file member variable to true.)
Note that KTX2 1D textures are not supported yet - we're working on it. Importantly, if you use the official toktx tool from the KTX-Software repo, it defaults to writing 1D images if the input height is 1. You'll need to always specify the '-2d' option when using toktx if you intend on reading its files with Basis Universal's current transcoder.
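One way to catch such files before handing them to the transcoder is to inspect the KTX2 header. The sketch below assumes the KTX 2.0 header layout from the Khronos specification (a 12-byte identifier followed by little-endian 32-bit vkFormat, typeSize, pixelWidth, and pixelHeight fields), where a 1D texture is stored with pixelHeight equal to 0. The function and enum names are illustrative and are not part of Basis Universal; check the offsets against the specification before relying on them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reads a little-endian 32-bit value from p.
inline uint32_t read_le32(const uint8_t* p)
{
    assert(p != nullptr);
    return uint32_t(p[0]) | (uint32_t(p[1]) << 8) |
           (uint32_t(p[2]) << 16) | (uint32_t(p[3]) << 24);
}

enum class ktx2_dim { invalid, tex_1d, not_1d };

// Classifies a buffer by looking only at the start of the KTX2 header.
// Assumed offsets: identifier at 0, pixelWidth at 20, pixelHeight at 24.
inline ktx2_dim classify_ktx2(const uint8_t* data, std::size_t size)
{
    static const uint8_t k_ktx2_id[12] = {
        0xAB, 0x4B, 0x54, 0x58, 0x20, 0x32, 0x30, 0xBB, 0x0D, 0x0A, 0x1A, 0x0A
    };

    // Need the identifier plus four 32-bit fields (28 bytes) to read pixelHeight.
    if (!data || size < 28 || std::memcmp(data, k_ktx2_id, 12) != 0)
        return ktx2_dim::invalid;

    const uint32_t width = read_le32(data + 20);
    const uint32_t height = read_le32(data + 24);

    if (width == 0)
        return ktx2_dim::invalid;

    // A pixelHeight of 0 marks a 1D texture.
    return (height == 0) ? ktx2_dim::tex_1d : ktx2_dim::not_1d;
}
```

A loader could call `classify_ktx2` on the first bytes of a file and reject `tex_1d` results with a message suggesting the file be regenerated with '-2d'.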
IMPORTANT: By default KTX2 support is enabled when compiling basisu_transcoder.cpp. Because KTX2 utilizes Zstandard (for UASTC supercompression), by default this means the transcoder now depends on the Zstandard single file decoder in the "zstd" subdirectory (zstd/zstddeclib.c). (This is new for v1.15. Before, we had no dependencies on any other .C/CPP files.) The transcoder supports entirely disabling KTX2/Zstandard support at compile time: Set BASISD_SUPPORT_KTX2 to 0 to disable KTX2 and Zstandard entirely, and set BASISD_SUPPORT_KTX2_ZSTD to 0 to just disable Zstandard support. If you leave KTX2 support enabled, but disable Zstandard, the transcoder (and encoder) will not be able to transcode (or encode) supercompressed UASTC files, so this isn't recommended unless you're positive you won't be using or ever transcoding Zstandard compressed UASTC files.
When compiling the WASM transcoder and encoder, you can disable KTX2 and KTX2 Zstandard support by setting the "KTX2" and "KTX2_ZSTANDARD" CMake options to 0: "emcmake cmake ../ -DKTX2=0" etc. By default KTX2 and Zstandard support are enabled. (The transcoder supports disabling either KTX2 or Zstandard. The encoder always supports KTX2 but allows for Zstandard to be disabled.)
The encoder always supports generating KTX2 files. However, you can disable Zstandard compression support by setting BASISD_SUPPORT_KTX2_ZSTD to 0. By default the encoder now depends on the Zstandard compressor and decompressor.
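The way the two macros interact can be summarized in a few lines. This is only a sketch of the gating logic described above: the macro names come from these notes, but the "default to 1 unless the build overrides it" pattern and the helper function names are assumptions made for illustration, not a copy of the transcoder's source.

```cpp
// Assumed defaulting pattern: both features are on unless the build defines them as 0,
// e.g. by passing -DBASISD_SUPPORT_KTX2_ZSTD=0 to the compiler.
#ifndef BASISD_SUPPORT_KTX2
#define BASISD_SUPPORT_KTX2 1
#endif

#ifndef BASISD_SUPPORT_KTX2_ZSTD
#define BASISD_SUPPORT_KTX2_ZSTD 1
#endif

// KTX2 container support as a whole.
constexpr bool ktx2_enabled()
{
    return BASISD_SUPPORT_KTX2 != 0;
}

// Zstandard only matters when KTX2 itself is compiled in.
constexpr bool ktx2_zstd_enabled()
{
    return BASISD_SUPPORT_KTX2 != 0 && BASISD_SUPPORT_KTX2_ZSTD != 0;
}

// Supercompressed UASTC files need both KTX2 and Zstandard.
constexpr bool can_handle_supercompressed_uastc()
{
    return ktx2_zstd_enabled();
}
```

With both macros left at their defaults all three helpers return true. Building with BASISD_SUPPORT_KTX2_ZSTD set to 0 keeps `ktx2_enabled()` true while the other two become false, which is the configuration the notes advise against unless Zstandard compressed UASTC files will never be used.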
We have included the single source file versions of the Zstandard compressor/decompressor in the "zstd" subdirectory. If your project already links in Zstandard, you can modify your CMake files to not use the single file variants. We only depend on two Zstandard APIs (the ones for simple memory buffer compression/decompression). Notably, Zstandard uses the BSD license, while most of the files in the repo use the Apache 2.0 or zlib licenses.
The -unpack/-info/-validate options now support KTX2 files. Or you can just specify a KTX2 file on the command line and it'll be automatically unpacked. The unpack mode writes .PNG and .KTX1 format files (KTX1 because most tools don't support KTX2 yet).
The WASM JavaScript wrappers (in webgl/transcoder/basis_wrappers.cpp) now fully support KTX2. See the "webgl/ktx2_encode_test" sample for example code on how to encode and transcode KTX2 files in JavaScript.
The KTX2 transcoder does not support 1D files yet. Importantly, the "toktx2" tool, by default, outputs 1D textures if the height of the input is 1. We will be fixing this very soon (should be simple).
The official KTX2 validation tool (ktx2check) has a bug that doesn't account for mipPadding alignment on non-supercompressed UASTC files. To work around this bug, the encoder will sometimes add a small variable length key-value field into the output, consisting of all 0x7F bytes, in order to force the mip levels array to be aligned on a 16 byte boundary without having to use any mip padding bytes. We'll remove this workaround once the validator tool is fixed. (This is not an issue with the KTX2 specification itself. It's just a bug in the validation tool.)
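The arithmetic behind that workaround is ordinary alignment math. The helper below is a hypothetical sketch showing how many extra bytes are needed to push a file offset up to the next 16 byte boundary; it does not model the real key-value entry, which carries its own length and key overhead that the encoder has to account for.

```cpp
#include <cassert>
#include <cstdint>

// Returns how many bytes must be added to offset so that it becomes a multiple
// of alignment. Returns 0 when offset is already aligned.
inline uint64_t bytes_to_align(uint64_t offset, uint64_t alignment)
{
    assert(alignment != 0);
    return (alignment - (offset % alignment)) % alignment;
}
```

For example, if the mip level data would otherwise start at byte offset 100, `bytes_to_align(100, 16)` reports that 12 more bytes are needed ahead of it, so the data would begin at offset 112.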
- ETC1S encoder is now 3-4.5x faster when our built-in multithreading is disabled. With threading enabled, the speedup is lower (approximately 2-3x). If you care about lowest overall encoding time across a large set of textures, disable our multithreading and call our encoder across multiple threads in parallel with different textures. This is classic Amdahl's Law in action, because our built-in multithreading has large purely serial sections in between fork & join-style sections. These perf optimizations will also improve the WASM build.
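Amdahl's Law makes the trade-off concrete: if a fraction p of the work can run in parallel on n threads, the best possible speedup is 1 / ((1 - p) + p / n). The helper below evaluates that formula. The fractions used in the usage note are made-up illustrative values, not measurements of the encoder.

```cpp
#include <cassert>

// Upper bound on speedup from Amdahl's Law.
// parallel_fraction: share of the single-threaded run time that can be parallelized (0..1).
// threads: number of threads working on the parallel part.
inline double amdahl_speedup(double parallel_fraction, unsigned threads)
{
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    assert(threads >= 1);
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / threads);
}
```

With a hypothetical 60% parallel fraction, `amdahl_speedup(0.6, 8)` is only about 2.1, even on 8 threads. Encoding 8 independent textures on 8 threads has no shared serial section, which corresponds to `amdahl_speedup(1.0, 8)`, an upper bound of 8.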
- SSE 4.1 support is in. Enabled with "-D SSE=1" in cmake. Use -no_sse command line option to disable. Results in 15-30% faster encoding.
- The UASTC RDO encoder has been updated for higher quality per bit. Unfortunately the command line parameter which controls quality has changed. It's now "-uastc_rdo_l X". The higher the value, the lower the quality but the more compressible the file. Values close to 0 result in very little quality loss. We have more optimizations to RDO UASTC encoding coming in the next release.
- std::vector usage has been removed.
- cppspmd_fast is used for SSE 4.1 support. It uses the Apache 2.0 license.
- All encoder/compressor files moved to the "encoder" directory. The encoder can be easily placed into a library now.
- Encoder now supports being compiled to WebAssembly using emscripten. (Currently multithreading is disabled, but we hope to enable it soon once we figure out why std::function and lambdas are failing with a stack overflow.)
- Added the webgl/encode directory, which compiles the encoder and transcoder to WebAssembly.
- Added the webgl/encode_test sample, which shows how to use the compressor from JavaScript.
- Added new APIs to the JavaScript wrappers in webgl/transcoder/basis_wrappers.cpp. There are now JavaScript wrappers for compression, container independent transcoding, and .basis file information retrieval. Added lots of comments to basis_wrappers.cpp. Every codec feature is now available from JavaScript.
- Added fuzz-safe JPEG reading. We support fuzz-safe JPEG/BMP/TGA/PNG now.
- UASTC support is in. We have removed BC7 mode 6 support from the ETC1S transcoder to reduce its size, although the format enum still works (it aliases to BC7 mode 5). We are still updating the docs for UASTC.
- Adding fuzz-safe BMP support using apg_bmp. We will be adding fuzz-safe JPEG and TGA next.
- Automatic global selector palettes are disabled by default, because searching the virtual selector codebook is very slow. You can enable them by specifying -auto_global_sel_pal on the command line, for slightly smaller files on small textures/images.
- PVRTC2 RGB support added. This format looks great and transcoding is fast - approximately as good as BC1. It supports non-power of 2, non-square textures, and should be used instead of PVRTC1 whenever possible.
- PVRTC2 RGBA support added. This format looks OK if the texture has a very simple alpha channel (like a simple opacity mask). The texture should use premultiplied alpha, otherwise on alpha=0 pixels the color channel may slightly leak into the alpha channel due to issues with the PVRTC2 format itself. Transcoding is fast unless the texture's alpha channel is very complex. It's a tossup whether PVRTC1 or PVRTC2 would look better for alpha textures.
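For source images that are not already premultiplied, the conversion is a per-pixel multiply. The function below is a generic sketch of premultiplying 8-bit RGBA data in place before encoding; the function name is hypothetical and this is not code from the encoder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Premultiplies interleaved 8-bit RGBA pixels in place.
// Each color channel becomes round(c * a / 255); alpha itself is unchanged.
inline void premultiply_alpha(std::vector<uint8_t>& rgba)
{
    assert(rgba.size() % 4 == 0);

    for (std::size_t i = 0; i < rgba.size(); i += 4) {
        const uint32_t a = rgba[i + 3];
        for (int c = 0; c < 3; ++c) {
            const uint32_t v = rgba[i + c];
            rgba[i + c] = static_cast<uint8_t>((v * a + 127) / 255);
        }
    }
}
```

After this step every pixel with alpha=0 has black color channels, which removes the stray color values that the note above warns about.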
- ETC2 EAC R11/RG11 (unsigned) support checked in. Thanks to Juan Linietsky for suggesting it.
- The format enum names have changed, but I tried to keep compatibility with old code. The actual values haven't changed so JavaScript code should work without modifications.
- We're now using "enum class transcoder_texture_format" instead of "enum transcoder_texture_format" in basisu_transcoder.h
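For C++ callers, the practical effect of moving to a scoped enum is that values no longer convert implicitly to or from integers. The sketch below uses a stand-in enum with made-up names and values (it is not the real transcoder_texture_format) to show the explicit casts that code storing formats as integers would need.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a scoped format enum. Names and values are illustrative only.
enum class texture_format_sketch : uint32_t {
    format_a = 0,
    format_b = 1,
    format_c = 2
};

// Scoped enums need an explicit cast to reach the underlying integer.
constexpr uint32_t to_wire_value(texture_format_sketch f)
{
    return static_cast<uint32_t>(f);
}

// True if v maps to one of the enumerators above.
constexpr bool is_valid_wire_value(uint32_t v)
{
    return v <= static_cast<uint32_t>(texture_format_sketch::format_c);
}

// Converts an integer (for example one passed in from script code) back to the enum.
inline texture_format_sketch from_wire_value(uint32_t v)
{
    assert(is_valid_wire_value(v));
    return static_cast<texture_format_sketch>(v);
}
```

Validating the integer before the cast is a design choice in this sketch: a stale or renumbered value is rejected at the boundary instead of silently selecting the wrong format.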
- Fixed a couple of encoder bugs (one assert in basisu_enc.h), and an uninitialized variable issue in the frontend. Neither issue would cause corrupted files or artifacts.
- FXT1 RGB support is checked in, for Intel/3DFX GPUs. Mostly for completeness and to test block sizes other than 4x4.
- The PVRTC1 wrap vs. clamp flag has been removed from the entire codebase, because PVRTC1 always uses wrap addressing when fetching the adjacent blocks (even when the user selects clamp UV addressing).
- Beware that the "transcoder_texture_format" enum names and their values are in flux as we add new texture formats. This issue particularly affects JavaScript code. Passing the old enum values to the transcoder will cause bugs. We are adding a few more texture formats, renaming the enums and then stabilizing them on the next minor release (within a couple days or so).
- This is a major transcoder update. The encoder hasn't been modified at all. A minor update will be coming in a couple days which adds additional lower priority formats (notably PVRTC2 4bpp RGB) to the transcoder.
- When the "BASISD_SUPPORT_BC7" transcoder macro is set to 0, both mode 5 and mode 6 BC7 transcoders are disabled. When cross compiling the transcoder for Web use to WebAssembly/asm.js, be sure to set BASISD_SUPPORT_BC7=0. You can also just disable the mode 6 transcoder by just setting BASISD_SUPPORT_BC7_MODE6_OPAQUE_ONLY=0. The older BC7 mode-6 RGB function seriously bloats the transcoder's compiled size. (The mode-6 transcoder is of marginal value and might be disabled by default or just removed.) The new BC7 mode 5 RGB/RGBA transcoder uses substantially smaller lookup tables and provides basically the same quality as mode-6 for RGB (because we're starting with ETC1S texture data). Set BASISD_SUPPORT_BC7_MODE6_OPAQUE_ONLY to 0 when compiling on platforms which don't support BC7 well/at all, or if transcoder size is an issue.
- Added ATC RGB/RGBA, ASTC 4x4 L/LA/RGB/RGBA, BC7 mode 5 RGB/RGBA, and PVRTC1 4bpp RGBA support to the transcoder and KTX writer.
- Major perf. optimizations to all the transcoders. Transcoding to BC1 is approx. 2x faster when compiled native and executed on a Core i7. Similar perf. improvements should be seen when executed in WebAssembly. This was done by more closely coupling the .basis file decompression and format transcoding steps (before we unpacked to plain ETC1/ETC1S, then transcoded those bits, which was costly.)
- PVRTC1 4bpp RGB opaque is slightly higher quality.
- Added various uncompressed raster pixel formats to the transcoder. When outputting raw pixels, the transcoder writes to regular raster images, not blocks. No dithering or downsampling yet, but it's coming. A couple of the parameters to basisu_transcoder::transcode_image_level() and basisu_transcoder::transcode_slice() have new meanings when these methods are used with uncompressed raster pixel formats: "output_blocks_buf_size_in_blocks_or_pixels" and "output_row_pitch_in_blocks_or_pixels". There's also a new parameter, "output_rows_in_pixels". When transcoding to uncompressed raster pixel formats, these parameters are in pixels, not blocks. The output buffer is also treated as a plain raster image, not a 2D array of compressed blocks. These parameters are sanity checked, and if they look fishy the transcoder will return an error.
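The difference in units can be shown with a small calculation. The sketch below computes candidate values for the three parameters named above for a tightly packed output buffer, once in blocks and once in pixels. The struct and function names are hypothetical, and tight packing with no extra row padding is an assumption; the transcoder's own validation rules may be stricter or may accept other values.

```cpp
#include <cassert>
#include <cstdint>

// Candidate values for the size-related transcode parameters.
struct output_dims_sketch {
    uint32_t row_pitch;  // for output_row_pitch_in_blocks_or_pixels
    uint32_t rows;       // block rows, or output_rows_in_pixels for raster output
    uint64_t buf_size;   // for output_blocks_buf_size_in_blocks_or_pixels
};

// Compressed output: sizes are counted in blocks. Partial blocks at the right and
// bottom edges still occupy a whole block, so the division rounds up.
inline output_dims_sketch dims_for_blocks(uint32_t width, uint32_t height,
                                          uint32_t block_w = 4, uint32_t block_h = 4)
{
    assert(block_w != 0 && block_h != 0);
    const uint32_t blocks_x = (width + block_w - 1) / block_w;
    const uint32_t blocks_y = (height + block_h - 1) / block_h;
    return { blocks_x, blocks_y, uint64_t(blocks_x) * blocks_y };
}

// Uncompressed raster output: sizes are counted in pixels.
inline output_dims_sketch dims_for_raster(uint32_t width, uint32_t height)
{
    return { width, height, uint64_t(width) * height };
}
```

For a 10x6 image, `dims_for_blocks(10, 6)` gives a pitch of 3 blocks and a buffer of 6 blocks, while `dims_for_raster(10, 6)` gives a pitch of 10 pixels, 6 rows, and a buffer of 60 pixels. Passing block-based numbers when transcoding to a raster format is the kind of mismatch the sanity checks are meant to catch.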
- basisu command line tool's "-level" command line option changed to "-comp_level", to avoid confusion vs. the "-q" option. This option is NOT the same as the -q option, which directly controls the output quality. Most users shouldn't use this option. (See below.)