
x264 -> Version History

This is the version history page for x264.

Index:


Version r2851 Version r2833 Version r2762 (Jan 30) Version r2762 Version r2744 Version r2705 Version r2694 Version r2692 Version r2665 Version r2664 Version r2638 Version r2597 Version r2579 Version r2538 Version r2525 Version r2491 Version r2479 Version r2453 Version r2431 Version r2409 Version r2389 Version r2377 Version r2356 Version r2345
Version r2334
  • Release Date: May 22, 2013
  • Please refer to the changelog http://mirror03.x264.nl/x264/changelog.txt for a full list of changes
Version r2310
  • Release Date: May 6, 2013
  • Fix two bugs in slice-min-mbs and slices-max
  • Slices-max broke slice-max-size when slice-max wasn't used.
  • Slice-min-mbs broke in rare cases near the end of a threadslice.
Version r2309
  • Release Date: Apr 26, 2013
  • Please refer to the changelog http://mirror03.x264.nl/x264/changelog.txt for a full list of changes
Version r2273
  • Release Date: Mar 1, 2013
  • Please refer to the changelog http://mirror03.x264.nl/x264/changelog.txt for a full list of changes
Version r2245
  • Release Date: Jan 11, 2013
  • Please refer to the changelog http://mirror03.x264.nl/x264/changelog.txt for a full list of changes
Version r2230
  • Release Date: Nov 10, 2012
  • Please refer to the changelog http://mirror03.x264.nl/x264/changelog.txt for a full list of changes
Version r2216
  • Release Date: Sep 18, 2012
  • Please refer to the changelog http://mirror03.x264.nl/x264/changelog.txt for a full list of changes
Version r2208
  • Release Date: Jul 19, 2012
  • Revert r2204 - People don't seem to like this so I'm just going to get rid of it.
Version r2207
  • Release Date: Jul 18, 2012
  • Please refer to the changelog http://mirror03.x264.nl/x264/changelog.txt for a full list of changes
Version r2200
  • Release Date: May 23, 2012
  • Please refer to the changelog http://mirror03.x264.nl/x264/changelog.txt for a full list of changes
Version r2197
  • Release Date: Apr 26, 2012
  • Please refer to the changelog http://mirror03.x264.nl/x264/changelog.txt for a full list of changes
Version r2184
  • Release Date: Mar 13, 2012
  • Fix clobbering of mutex/cvs
  • Regression in r2183.
  • Bizarrely seemed to work on many platforms, but crashed on win64 and may have been slower.
  • Only affected sliced threads during encoding, but could cause crashes on x264 encoder close even without sliced threads.
Version r2183
  • Release Date: Mar 8, 2012
  • Please refer to the changelog http://mirror03.x264.nl/x264/changelog.txt for a full list of changes
Version r2146
  • Release Date: Feb 9, 2012
  • Please refer to the changelog http://mirror03.x264.nl/x264/changelog.txt for a full list of changes
Version r2145
  • Release Date: Jan 16, 2012
  • Please refer to the changelog http://mirror02.x264.nl/x264/changelog.txt for a full list of changes
Version r2120
  • Release Date: Dec 7, 2011
  • Fix regression in r2118
  • Broke trellis with i16x16 macroblocks.
Version r2119
  • Release Date: Dec 7, 2011
  • Please refer to the changelog http://mirror02.x264.nl/x264/changelog.txt for a full list of changes
Version r2106
  • Release Date: Oct 23, 2011
  • Please refer to the changelog http://mirror02.x264.nl/x264/changelog.txt for a full list of changes
Version r2085
  • Release Date: Sep 22, 2011
  • Please refer to the changelog http://mirror02.x264.nl/x264/changelog.txt for a full list of changes
Version r2074
  • Release Date: Aug 25, 2011
  • Please refer to the changelog http://mirror02.x264.nl/x264/changelog.txt for a full list of changes
Version r2057
  • Release Date: Aug 11, 2011
  • Please refer to the changelog http://mirror02.x264.nl/x264/changelog.txt for a full list of changes
Version r2044
  • Release Date: Jul 31, 2011
  • r2044: Use assembly versions of some deblocking functions in MBAFF
  • r2043: Move X264_VERSION / X264_POINTVER from config.h to x264_config.h
    This makes them available to external programs as part of the public API.
  • r2042: Fix padding bug in x264_expand_border_mbpair
  • r2041: Timecode parsing: Add missing initialization
    Fix crash when failed to parse timecode file before malloc pts.
    Fix detection of user timebase considered to be exceeding H.264 maximum.
  • r2040: Fix crash with high bitdepth 4:2:0 input
  • r2039: x86 asm cosmetics
    Use FDEC_STRIDEB where appropriate.
  • r2038: Fix a bug in lossless sub-8x8 RD
    Caused crashes in rare cases with lossless encoding. Regression in 4:4:4.
Version r2037
  • Release Date: Jul 23, 2011
  • Please refer to the changelog http://mirror02.x264.nl/x264/changelog.txt for a full list of changes
Version r2019
  • Release Date: Jul 11, 2011
  • Please refer to the changelog http://mirror02.x264.nl/x264/changelog.txt for a full list of changes
Version r2008
  • Release Date: Jun 15, 2011
  • Please refer to the changelog http://mirror02.x264.nl/x264/changelog.txt for a full list of changes
Version r1995
  • Release Date: May 13, 2011
  • Please refer to the changelog http://mirror02.x264.nl/x264/changelog.txt for a full list of changes
Version r1947
  • Release Date: Apr 27, 2011
  • r1947: Precalculate CABAC initialization contexts
    Slightly faster encoding with lots of slices.
  • r1946: Avoid redundant log2f calls in mv cost initialization
    Saves around 100 million clock cycles on x264 init.
  • r1945: CABAC residual: cleanup and optimizations
    Also kill all Hungarian notation while we're at it.
    Trim an instruction off cabac_encode_bypass.
  • r1944: Validate input parameters more carefully
    Get rid of redundant warnings upon encoder_reconfig calls.
    Also avoid encoder_reconfig turning off psy_rd/trellis.
  • r1943: Fix VFR MB-tree to work as intended
    Should improve quality with FPSs much larger or smaller than 25.
  • r1942: Support more recent GPAC versions
  • r1941: Fix decoder desync with positive --chroma-qp-offset and zones
  • r1940: Use AVMEDIA_TYPE_VIDEO instead of deprecated CODEC_TYPE_VIDEO
    Fixes build with lavf/lavc 53.
  • r1939: Force pic-struct for Blu-ray compat + fake-interlaced
  • r1938: Fix open-gop with no-psy
Version r1937
  • Release Date: Apr 15, 2011
  • Fix build with disabled asm
Version r1936
  • Release Date: Apr 15, 2011
  • r1936: Improve Blu-ray compliance
    Use dec_ref_pic_marking SEIs to repeat B-ref referencing information.
    Don't allow B-frames to reference frames outside their minigop.
  • r1935: Consolidate Blu-ray hacks into --bluray-compat
    This option is now required for Blu-ray compatibility.
    --open-gop bluray is now gone (using bluray-compat and open-gop implies a Blu-ray compatible open-gop).
    This option doesn't automatically enforce every aspect of Blu-ray compatibility (e.g. resolution, framerate, level, etc).
  • r1934: Add SSE support to rectangle.h for 16-byte stores
    Uses GCC vector intrinsics; may be suboptimal on particularly old GCC versions.
  • r1933: Do not force Intel Compiler to target pre-mmx architecture for x86
    Caused a speed penalty against gcc equivalents.
  • r1932: Warn users when using --(psnr|ssim) without --tune (psnr|ssim)
    This is a counter to the proliferation of incredibly stupid psnr/ssim "benchmarks" of x264 in which the benchmarker conveniently "forgot" --tune psnr/ssim, crippling x264 in the test.
  • r1931: Remove redundant mbcmp calls in weightp analysis
  • r1930: Use integer math for filler size calculation
  • r1929: Disable progress for FFMS input with --no-progress
  • r1928: Fix bug in intra-refresh ratecontrol
    Row SATDs were slightly incorrect.
  • r1927: Cosmetics: fix some signedness issues found by -Wsign-compare
  • r1926: Minor fixes
    Fix a comment typo.
    Align an array properly.
    Make x264_scan8 unsigned: saves a bunch of movsxd instructions on x86_64.
  • r1925: Improve C99 support checks in configure
    Fixes configuration with Intel compiler in some cases.
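The idea behind the r1932 warning can be sketched in a few lines. This is a simplified Python illustration of the rule as the changelog states it, not x264's actual check; the function name `metric_tune_warnings`, its arguments, and the message wording are all hypothetical.

```python
def metric_tune_warnings(psnr=False, ssim=False, tune=None):
    """Return warnings for metric reporting enabled without the matching tune.

    Hypothetical helper: it mirrors the rule described for r1932
    (--psnr/--ssim without --tune psnr/ssim), not x264's real code.
    """
    warnings = []
    for metric, enabled in (("psnr", psnr), ("ssim", ssim)):
        if enabled and tune != metric:
            warnings.append(
                f"--{metric} used without --tune {metric}: "
                f"the measured {metric} will understate what the encoder can do"
            )
    return warnings


if __name__ == "__main__":
    for line in metric_tune_warnings(psnr=True, tune="film"):
        print("warning:", line)
```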
Version r1924
  • Release Date: Mar 27, 2011
  • Please refer to the changelog http://mirror02.x264.nl/x264/changelog.txt for a full list of changes
Version r1913
  • Release Date: Feb 19, 2011
  • Please refer to the changelog http://mirror02.x264.nl/x264/changelog.txt for a full list of changes
Version r1900
  • Release Date: Feb 7, 2011
  • Please refer to the changelog http://mirror02.x264.nl/x264/changelog.txt for a full list of changes
Version r1884
  • Release Date: Jan 30, 2011
  • r1884: Hotfix for some bugs in VBV emergency
  • r1883: Fix warnings in cpu.c
Version r1882
  • Release Date: Jan 29, 2011
  • Please refer to the changelog http://mirror01.x264.nl/x264/changelog.txt for a full list of changes
Version r1867
  • Release Date: Jan 11, 2011
  • Please refer to the changelog http://mirror01.x264.nl/x264/changelog.txt for a full list of changes
Version r1834
  • Release Date: Dec 15, 2010
  • Please refer to the changelog http://mirror01.x264.nl/x264/changelog.txt for a full list of changes
Version r1820
  • Release Date: Dec 8, 2010
  • Please refer to the changelog http://mirror01.x264.nl/x264/changelog.txt for a full list of changes
Version r1804
  • Release Date: Nov 26, 2010
  • Please refer to the changelog http://mirror01.x264.nl/x264/changelog.txt for a full list of changes
Version r1790
  • Release Date: Nov 23, 2010
  • r1790: Fix resize filter rounding code
  • r1789: Fix regression in chroma weightp
    Missing cache calls could cause artifacts, encoder/decoder desync.
Version r1788
  • Release Date: Nov 21, 2010
  • Please refer to the changelog http://mirror01.x264.nl/x264/changelog.txt for a full list of changes
Version r1772
  • Release Date: Nov 15, 2010
  • r1772: Improve flash detection algorithm change in r1765
    Now disables scenecuts only near the real end of the video, not just prior to forced keyframes.
  • r1771: Update ffms2 support for its latest API break.
  • r1770: Modify the x264 header accordingly if --disable-gpl is used
  • r1769: Save a bit of memory with weightp + high bit depth
  • r1768: Fix bugs in qpfile parsing with omitted QPs
  • r1767: Fix HRD with intra-refresh
    x264 was incorrectly calculating cpb_removal_delay with respect to the first keyframe.
    It should have been calculating cpb_removal_delay with respect to the last keyframe.
Version r1766
  • Release Date: Nov 11, 2010
  • Please refer to the changelog http://mirror01.x264.nl/x264/changelog.txt for a full list of changes
Version r1745
  • Release Date: Oct 11, 2010
  • r1745: Address remaining cacheline split issues in avg2
    Slightly improved performance on core 2.
    Also fix profiling misattribution of w8/16/20 mmxext cacheline loops.
  • r1744: Trim a few bytes off some x86 intra pred functions
  • r1743: Move DTS compression from libx264 to x264cli
    DTS compression is an ugly stupid hack and starting to encroach on unrelated areas like VBV.
    Some people want it in the mp4 muxer for devices and/or splitters that don't support Edit Boxes.
    We just say "throw these broken devices out the window".
    DTS compression will remain as a muxer option, --dts-compress, at the user's own risk.
    This option is disabled by default.
  • r1742: Use a larger pic_init_qp with high bit depth
    Modify pic_init_qs for consistency.
  • r1741: Update some of the information in doc/
  • r1740: Update header in depth.c
  • r1739: Remove some old unused stuff in the build tree
    Regression test (hasn't been updated since svn).
    Doxy (was never used).
  • r1738: Various cosmetics
    Exorcise some CamelCase.
  • r1737: Add missing mod4 stack check to sse2_misalign mc_chroma
    Required for ICC compilation.
  • r1736: Fix 2pass ratecontrol with --nal-hrd cbr
  • r1735: Fix minor bug in intra pred with intra refresh
    i8x8 blocks didn't properly avoid predicting from top-right when necessary.
    This could cause intra refresh to not completely refresh the frame.
  • r1734: Fix filter parsing with --extra-cflags="-DNDEBUG"
  • r1733: Make sigint handler variable volatile
    Didn't actually cause any problems, but is necessary because it can be modified by another thread (the signal call).
Version r1732
  • Release Date: Sep 28, 2010
  • r1732: Add High 10 Intra profile support (AVC-Intra)
    x264 should now be able to encode compliant AVC-Intra 50.
    With a 10-bit-compiled version of x264, a sample commandline for 1080i25 might be:
    --interlaced --keyint 1 --vbv-bufsize 2000 --bitrate 50000 --vbv-maxrate 50000 --nal-hrd cbr

    Also print "Constrained Baseline" for baseline profile, since that's all x264 (and everything else in the world) supports.
    Also reorganize parameter validation a bit to reduce some spurious warnings.
  • r1731: Finish support for high-depth video throughout x264
    Add support for high depth input in libx264.
    Add support for 16-bit colorspaces in the filtering system.
    Add support for input bit depths in the interval [9,16] with the raw demuxer.
    Add a depth filter to dither input to x264.
  • r1730: Chroma mode decision/subpel for B-frames
    Improves compression ~0.4-1%. Helps more on videos with lots of chroma detail.
    Enabled at subme 9 (preset slower) and higher.
  • r1729: Various cosmetics
  • r1728: Make slice-max-size more aggressive in considering escape bytes
    The x264 assumption of randomly distributed escape bytes fails in the case of CABAC + an enormous number of identical macroblocks.
    This patch attempts to compensate for this.
    It is probably safe to assume in calling applications that x264 practically never violates the slice size limitation.
  • r1727: Add missing emms for dump-yuv
  • r1726: Fix CFR ratecontrol with timebase != 1/fps
    Fixes VBV + DTS compression, among other things.
  • r1725: Fix DTS/bitrate calculation if the first PTS wasn't zero
    Fix bitrate calculation with DTS compression.
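The "escape bytes" mentioned for r1728 are the H.264 emulation prevention bytes: inside a NAL unit, a 0x03 byte is inserted whenever two zero bytes would otherwise be followed by a byte in the range 0x00 to 0x03. The sketch below is a plain Python illustration of that rule (the name `nal_escape` is borrowed from the r1709 entry, but this is not x264's implementation). It shows why a long run of identical bytes can grow the slice far more than a random-distribution estimate would predict.

```python
def nal_escape(payload: bytes) -> bytes:
    """Insert H.264 emulation prevention bytes (0x03) into a NAL payload."""
    out = bytearray()
    zeros = 0  # number of consecutive zero bytes just written
    for b in payload:
        if zeros >= 2 and b <= 3:
            out.append(3)  # break up 00 00 0x so it cannot mimic a start code
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


if __name__ == "__main__":
    worst_case = bytes(600)  # all zeros: one escape byte per two payload bytes
    print(len(worst_case), "->", len(nal_escape(worst_case)))
```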
Version r1724
  • Release Date: Sep 20, 2010
  • r1724: Fix regression in r1716
  • r1723: Cosmetics in me.c and frame.c
Version r1722
  • Release Date: Sep 20, 2010
  • r1722: Add support for arbitrary user SEIs
    This allows calling applications to insert SEIs that x264 doesn't know about while maintaining HRD/VBV accuracy.
  • r1721: Add full chroma input flag to swscale
    Improves quality of colorspace conversions involving RGB(A).
  • r1720: Add --disable-gpl option to configure
    Used for commercially-licensed versions of x264.
    Doesn't currently change anything, but may be used to disable GPL-only CLI tools, such as video filters, in the future.
    Also print the x264 license and libavformat license in version info.
  • r1719: Update source file headers
    Update dates, improve file descriptions, make things more consistent.
    Also add information about commercial licensing.
  • r1718: Fix intra refresh to not exceed max recovery_frame_cnt
    The spec constrains recovery_frame_cnt to [0, MaxFrameNum-1].
    So make MaxFrameNum bigger in the case of intra refresh.
  • r1717: Make intra refresh finish one frame faster
    In some cases, the last frame of intra refresh was redundant.
    Saves a few bits.
  • r1716: Fix intra refresh to not predict from invalid pixels
    The blocks on the right side of the intra refresh column should not predict from top-right.
  • r1715: Add configure check for mingw64 prefixing
    This compensates for the inconsistent prefixing seen in different versions of the compiler.
  • r1714: Update some Altivec function prototypes
    Silences a lot of warnings.
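The constraint behind r1718 can be worked through numerically. In H.264, MaxFrameNum is 2 raised to log2_max_frame_num, which may range from 4 to 16. The sketch below picks the smallest exponent that keeps a given recovery_frame_cnt within [0, MaxFrameNum-1]. The function name is hypothetical and this is only an illustration of the arithmetic, not the logic x264 uses.

```python
def min_log2_max_frame_num(recovery_frame_cnt: int) -> int:
    """Smallest log2_max_frame_num (4..16) with recovery_frame_cnt <= MaxFrameNum - 1."""
    if recovery_frame_cnt < 0:
        raise ValueError("recovery_frame_cnt must be non-negative")
    for log2_max in range(4, 17):
        max_frame_num = 1 << log2_max
        if recovery_frame_cnt <= max_frame_num - 1:
            return log2_max
    raise ValueError("recovery_frame_cnt does not fit in any legal MaxFrameNum")


if __name__ == "__main__":
    for cnt in (15, 16, 250):
        print(cnt, "->", min_log2_max_frame_num(cnt))
```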
Version r1713
  • Release Date: Sep 4, 2010
  • r1713: Add support for level 1b
    This level is a stupid hack in the H.264 spec, so it's a stupid hack in x264 too.
    Since level is an integer, calling applications need to set level_idc=9 to use it.
    String-based option handling will accept "1b" just fine though, so CLI users don't have to worry.
  • r1712: Use smaller values for idr_pic_id
    Saves a few bits and fixes problems on certain fantastically terrible decoders, such as the Apple iPad.
  • r1711: Use POC type 2 for streams with no B-frames
    Saves a few bits per slice header.
  • r1710: Faster cabac_encode_ue_bypass
    Use CLZ + a lut instead of a loop.
  • r1709: Faster nal_escape asm
  • r1708: Allow --demuxer forcing with known extensions
  • r1707: Minor fixes/cosmetics in commandline parsing
  • r1706: Fix overflow in stats printing
  • r1705: Fix bug in 2pass if the first P-frames are all skip
    last_qscale_for was read before being initialized in this case, resulting in the value from the previous iteration being used instead.
  • r1704: Don't do deblock-aware RD if deblocking is off
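The level 1b handling described for r1713 amounts to a special case in front of the usual "level number times ten" mapping. A minimal Python sketch, assuming level strings such as "1b", "3" or "4.1" as input; `parse_level` is a hypothetical name and this is not x264's option parser.

```python
def parse_level(text: str) -> int:
    """Map a level string to level_idc, with "1b" mapped to 9 as in r1713."""
    if text == "1b":
        return 9  # level 1b has no decimal form, so it gets its own integer
    # Ordinary levels: level_idc is ten times the level number (4.1 -> 41).
    return int(float(text) * 10 + 0.5)


if __name__ == "__main__":
    for level in ("1b", "1.3", "3", "4.1"):
        print(level, "->", parse_level(level))
```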
Version r1703
  • Release Date: Aug 25, 2010
  • r1703: CAVLC "trellis"
    ~3-10% improved compression with CAVLC.
    --trellis is now a valid option with CAVLC.
    Perhaps more importantly, this means psy-trellis now works with CAVLC.

    This isn't a real trellis; it's actually just a simplified QNS.
    But it takes enough shortcuts that it's still roughly as fast as a trellis; just not quite optimal.
    Thus the name is a bit of a misnomer, but we're reusing the option name because it does the same thing.
    A real trellis would be better, but CAVLC is much harder to trellis than CABAC.
    I'm not aware of any published polynomial-time solutions that are significantly close to optimal.
  • r1702: Add global #define for maximum reference count
    This should make it easier to play around with reference frame counts that exceed the spec maximum.
  • r1701: Simplify addressing logic for interlaced-related arrays
    In progressive mode, just make [0] and [1] point to the same place.
  • r1700: Add missing emms to x264_nal_encode
    Only matters for applications using the low-latency callback feature.
  • r1699: Fix 2 bugs with slice-max-size
    Macroblock re-encoding didn't restore mv/tex bit counters (slightly inaccurate 2-pass).
    Bitstream buffer check didn't work correctly (insanely large frames could break encoding).
Version r1698
  • Release Date: Aug 16, 2010
  • r1698: NV12 version of Altivec chroma MC
  • r1697: Deblock-aware RD
    Small quality gain (~0.5%) at lower bitrates, potentially larger with QPRD.
    May help more with psy, maybe not.
    Enabled at subme >= 9. Small speed cost (a few %).
  • r1696: Correct X header path usage in configure
    Don't unconditionally set the header path for OpenBSD but do so if the --enable-visualize flag is specified.
  • r1695: Fix lavf input with delayed frames
  • r1694: Slightly improve the filtering section of x264 --help
  • r1693: Fix debug message typo with DTS compression
  • r1692: Try to guess input length for lavf input
    Allows printing of progress indicator when using lavf input.
  • r1691: Workaround bug in fps/timestamp handling with lavf input
    reordered_opaque in lavf doesn't work correctly in the identity case (no reordering).
    Fixes incorrect output for some file types (e.g. raw in mov).
  • r1690: Fix aspect ratio writing in the MKV muxer
    The braindead Matroska spec dictates aspect ratio to be measured in pixels instead of, well, an actual aspect ratio.
  • r1689: Add libavcore check in configure
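The r1690 note says Matroska wants the aspect ratio as display dimensions in pixels rather than as a ratio. One way to derive such dimensions from the coded size and the sample aspect ratio (SAR) is sketched below. It stretches whichever axis the SAR makes larger so that no dimension shrinks. This is an illustrative approach under that assumption, with a hypothetical function name, and is not taken from x264's muxer.

```python
def display_size(width: int, height: int, sar_w: int, sar_h: int):
    """Convert coded size plus sample aspect ratio into display width/height."""
    if sar_w <= 0 or sar_h <= 0 or sar_w == sar_h:
        return width, height  # square or unspecified pixels: nothing to do
    if sar_w > sar_h:
        # Wide pixels: stretch the width, rounding to the nearest integer.
        return (width * sar_w + sar_h // 2) // sar_h, height
    # Tall pixels: stretch the height instead.
    return width, (height * sar_h + sar_w // 2) // sar_w


if __name__ == "__main__":
    print(display_size(720, 576, 16, 15))   # anamorphic PAL-style input
    print(display_size(1440, 1080, 4, 3))   # anamorphic HD input
```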
Version r1688
  • Release Date: Jul 29, 2010
  • r1688: Improve quantizer distribution with sliced-threads+VBV
    Should help avoid cases of very uneven quantizer choice between slices.
  • r1687: Remove dead code in slicetype.c
  • r1686: Fix incorrect duration/framerate/bitrate in flv header
  • r1685: invalidate_reference fixes
    invalidate_reference didn't actually invalidate the immediate previous frame, only frames that came before that.
    Make sure that reordering is forced when invalidate_reference is used, so that the reference list is correct decoder-side.
  • r1684: Filtering system-related fixes
    Fix configure to check for outdated libavutil in resize filter support.
    Do not print an explicit error message in ffms when requesting a frame beyond the number of frames in the source.
    Mention in --*help that filtering options can be specified as name=value.
    Fix the shadowing warning in the resize filter on posix systems.
Version r1683
  • Release Date: Jul 23, 2010
  • r1683: Improve reference_invalid support
    Reference invalidation can now be used to invalidate multiple frames at a time, rather than being limited to one per encoder_encode call.
  • r1682: Eradicate all mention of SI/SP-frames
Version r1681
  • Release Date: Jul 22, 2010
  • Fix stack alignment with MB-tree
    Broke 2-pass with MB-tree when calling from compilers with broken stack alignment (e.g. MSVC).
Version r1680
  • Release Date: Jul 20, 2010
  • r1680: Avisynth 2.6 colorspace support
    Use a customized avisynth_c.h to detect the new planar colorspaces.
  • r1679: Prevent some cases of cache aliasing.
    Avoid cases where image strides were a large power of 2.
    Core 2: +3% speed at widths 898..960, +6% at widths 1922..1984, most other resolutions unaffected.
    Nehalem and AMD: similar amount of speedup, but fewer resolutions affected.
  • r1678: Fix stack alignment for adaptive quant
    Broke calls from compilers with broken stack alignment (e.g. MSVC).
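The cache aliasing fix in r1679 avoids strides that are a large power of 2. A common way to do this is to round the stride up to the required alignment and then add one more alignment unit if the result is a multiple of the "bad" size. The sketch below illustrates that idea. The default values of 64 and 1024 bytes are assumptions chosen for the example, and frame padding is not modeled.

```python
def align_stride(width: int, align: int = 64, disalign: int = 1024) -> int:
    """Round width up to `align`, but avoid multiples of `disalign`.

    Rows whose stride is a multiple of a large power of 2 map to the same
    cache sets, so vertically adjacent pixels evict each other.
    """
    stride = (width + align - 1) // align * align
    if stride % disalign == 0:
        stride += align  # nudge the stride off the aliasing boundary
    return stride


if __name__ == "__main__":
    for w in (960, 1000, 1024, 2048):
        print(w, "->", align_stride(w))
```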
Version r1677
  • Release Date: Jul 16, 2010
  • r1677: Fix compilation with shared ffmpeg libs
    lavf input uses libavutil functions, so it must request flags for libavutil from pkg-config.
  • r1676: Fix another PCM bug
    CABAC assumes that NNZ is 0 or 1, not the number of actual nonzero coefficients.
    Didn't actually break the output; only had a tiny effect on RD.
Version r1675
  • Release Date: Jul 15, 2010
  • Please refer to http://mirror01.x264.nl/x264/changelog.txt for a full list of changes
Version r1666
  • Release Date: Jul 5, 2010
  • r1666: Support for 9 and 10-bit encoding
    Output bit depth is specified at compile time via --bit-depth.
    There is currently almost no assembly code available for high-bit-depth modes, so encoding will be very slow.
    Input is still 8-bit only; this will change in the future.

    Note that very few H.264 decoders support >8 bit depth currently.
    Also note that the quantizer scale differs for higher bit depth. For example, for 10-bit, the quantizer (and crf) ranges from 0 to 63 instead of 0 to 51.
  • r1665: Support infinite keyint (--keyint infinite).
    This just means x264 won't insert non-scenecut keyframes.
    Useful for streaming when using interactive error recovery or some other mechanism that makes keyframes unnecessary.

    Also change POC logic to limit POC/framenum LSB size (to save bits per slice).
    Also fix a bug in the CPB underflow detection code (didn't affect the bitstream, just resulted in the failure to print certain warning messages).
  • r1664: Don't check i16x16 planar mode unless previous modes were useful
    Saves ~160 clocks per MB at subme=1, ~270 per MB at subme>1 (measured on Core i7).
    Negligible effect on compression.

    Also make a few more arrays static.
  • r1663: Centralize logging within x264cli
    x264cli messages will now respect the log level they pertain to.
    Slightly reduces binary size.
  • r1662: Make open-GOP Blu-ray compatible
    Blu-ray is even more braindamaged than we thought.
    Accordingly, open-gop options are now "normal" and "bluray", as opposed to display and coded.
    Normal should be used in all cases besides Blu-ray authoring.
  • r1661: Callback feature for low-latency per-slice output
    Add a callback to allow the calling application to send slices immediately after being encoded.
    Also add some extra information to the x264_nal_t structure to help inform such a calling application how the NAL units should be ordered.

    Full documentation is in x264.h.
  • r1660: Simplify pixel_ads
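The quantizer range quoted for r1666 (0 to 63 at 10-bit instead of 0 to 51) follows from H.264 extending the QP range by 6 for every bit of depth above 8. A short Python sketch of that arithmetic, with a hypothetical function name:

```python
def qp_max(bit_depth: int) -> int:
    """Upper end of the quantizer range for a given output bit depth."""
    if bit_depth < 8:
        raise ValueError("bit depth below 8 is not meaningful here")
    # Each extra bit of depth adds 6 to the QP range (QP doubles step size every 6).
    return 51 + 6 * (bit_depth - 8)


if __name__ == "__main__":
    for depth in (8, 9, 10):
        print(f"{depth}-bit: quantizer range 0..{qp_max(depth)}")
```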
Version r1659
  • Release Date: Jun 25, 2010
  • r1659: Interactive encoder control: error resilience
    In low-latency streaming with few clients, it is often feasible to modify encoder behavior in some fashion based on feedback from clients.
    One possible application of this is error resilience: if a packet is lost, mark the associated frame (and any referenced from it) as lost.
    This allows quick recovery from errors with minimal expense bit-wise.

    The new i_dpb_size parameter allows a calling application to tell x264 to use a larger DPB size than required by the number of reference frames.
    This lets x264 and the client keep a large buffer of old references to fall back to in case of lost frames.
    If no recovery is possible even with the available buffer, x264 will force a keyframe.

    This initial version does not support B-frames or intra refresh.
    Recommended usage is to set keyint to a very large value, so that keyframes do not occur except as necessary for extreme error recovery.

    Full documentation is in x264.h.

    Move DTS/PTS calculation to before encoding each frame instead of after.
    Improve documentation of x264_encoder_intra_refresh.
  • r1658: Lookaheadless MB-tree support
    Uses past motion information instead of future data from the lookahead.
    Not as accurate, but better than nothing in zero-latency compression when a lookahead isn't available.
    Currently resets on keyframes, so only available if intra-refresh is set, to avoid pops on non-scenecut keyframes.
    Not on by default with any preset/tune combination; must be enabled explicitly if --tune zerolatency is used.

    Also slightly modify encoding presets: disable rc-lookahead in the fastest presets.
    Enable MB-tree in "veryfast", albeit with a very short lookahead.
  • r1657: Open-GOP support
    Allows B-frames immediately prior to keyframes (in display order).
    This helps reduce keyframe popping and improve compression with short keyframe intervals.
    Due to a staggering display of braindamage in the Blu-ray spec, two open-GOP modes are available.
    The two modes calculate keyframe interval differently: one based on coded distance and one based on display distance.
    The latter is superior compression-wise, but for no comprehensible reason, Blu-ray requires the former if open-GOP is used.
  • r1656: Use threadpools to avoid unnecessary thread creation
    Tiny performance improvement with fast settings and lots of threads.
    May help more on some OSs with slow thread creation, like OS X.
    Unify inconsistent synchronized abbreviations to sync.
  • r1655: Improve 2-pass bitrate prediction
    Adapt based on distance to the end in bits, not in frames.
    Helps in videos with absurdly simple end sections, e.g. black frames.
  • r1654: SSE4 and SSSE3 versions of some intra_sad functions
    Primarily Nehalem-optimized.
  • r1653: Improve HRD accuracy
    In a staggering display of brain damage, the spec requires all HRD math to be done in infinite precision despite the output being of quite limited precision.
    Accordingly, convert buffer management to work in units of timescale.
    These accumulating rounding errors probably didn't cause any real problems, but might in theory cause issues in very picky muxers on extremely long-running streams.
  • r1652: Use -fno-tree-vectorize to avoid miscompilation
    Some versions of gcc have been reported to attempt (and fail) to vectorize a loop in plane_expand_border.
    This results in a segfault, so to limit the possible effects of gcc's utter incompetence, we're turning off vectorization entirely.
    It's not like it ever did anything useful to begin with.
  • r1651: Fix SIGPIPEs caused by is_regular_file checks
    Check to see if input file is a pipe without opening it.
  • r1650: Fix compilation on ARM w/ Apple ABI
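The approach described for r1651, checking whether the input is a pipe without opening it, can be illustrated with a stat-based test. Opening and then closing a pipe just to inspect it can make the writer on the other end receive SIGPIPE, whereas stat only reads file metadata. This is a Python sketch of the general technique, not x264's C code.

```python
import os
import stat


def is_regular_file(path: str) -> bool:
    """Report whether `path` is a regular file, using metadata only.

    The file is never opened, so a pipe on the other end is left undisturbed.
    """
    try:
        mode = os.stat(path).st_mode
    except OSError:
        return False  # missing or inaccessible: treat as not a regular file
    return stat.S_ISREG(mode)


if __name__ == "__main__":
    print(is_regular_file(os.getcwd()))  # a directory, so not a regular file
```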
Version r1649
  • Release Date: Jun 15, 2010
  • r1649: Faster mbtree_propagate asm
    Replace fp division by multiply with the reciprocal.
    Only ~12% faster on penryn, but over 80% faster on amd k8.
    Also make checkasm slightly more tolerant to rounding error.
  • r1648: Convert the OPT_ defines in x264.c to an enum
  • r1647: Don't allow baseline profile streams with fake-interlaced
    Indicate use of --fake-interlaced in encoding options SEI.
  • r1646: Allocate space for null terminator in param_apply_tune
  • r1645: Fix regression in r1501.
    Could cause slightly incorrect analysis in rare cases, but no serious encoding issues.
    Also shut up gcc warning about pels_v.
  • r1644: Fix crash with --subme 0 + --weightp > 0.
    Regression in r1535
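The r1649 entry replaces floating-point division with a multiply by the reciprocal and loosens checkasm's comparison to allow for rounding error. The sketch below shows the same trade-off in plain Python: the two results can differ in the last bits, so they are compared with a tolerance rather than for exact equality. The function names and the tolerance value are assumptions for the example, not x264's.

```python
import math


def scale_by_division(values, divisor):
    """Reference version: one division per element."""
    return [v / divisor for v in values]


def scale_by_reciprocal(values, divisor):
    """Faster pattern: compute the reciprocal once, then multiply."""
    recip = 1.0 / divisor
    return [v * recip for v in values]


def close_enough(a, b, rel_tol=1e-6):
    """Compare two result lists while tolerating small rounding differences."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b)
    )


if __name__ == "__main__":
    data = [1.0, 5.0, 7.0, 1000.0]
    ref = scale_by_division(data, 3.0)
    fast = scale_by_reciprocal(data, 3.0)
    print(close_enough(ref, fast))
```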
Version r1643
  • Release Date: Jun 10, 2010
  • Please refer to the changelogs for a full list of changes
Version r1629
  • Release Date: Jun 3, 2010
  • r1629: Fix no-mbtree + aq-mode=0
    Regression in r1618.
  • r1628: Add API function to fix x264_picture_t initialization
    Calling applications that do not use x264_picture_alloc need to use x264_picture_init to initialize x264_picture_t structures.
    Previously, if the calling application didn't zero x264_picture_t, Bad Things could happen.
Version r1627
  • Release Date: Jun 3, 2010
  • Please refer to the changelogs for a full list of changes
Version r1613
  • Release Date: May 27, 2010
  • Please refer to the changelogs for a full list of changes
Version r1602
  • Release Date: May 22, 2010
  • r1602: Fix performance regression in r1582
    Set the correct compiler flags.
  • r1601: Rewrite deblock strength calculation, add asm
    Rewrite is significantly slower, but is necessary to make asm possible.
    Similar concept to ffmpeg's deblock strength asm.
    Roughly one order of magnitude faster than C.
    Overall, with the asm, saves ~100-300 clocks in deblocking per MB.
  • r1600: Fix different output with differing sync-lookahead
    Also reduce memory consumption.
  • r1599: Mark Win32 executable as large address aware
  • r1598: Add "Fake interlaced" option
    This encodes all frames progressively yet flags the stream as interlaced.
    This makes it possible to encode valid 25p and 30p Blu-Ray streams.
    Also put the pulldown help section in a more appropriate place.
  • r1597: Modify version.sh to output to stdout.
    Update configure to match.
  • r1596: Set correct filesystem permissions for various files
  • r1595: Fix regression in r1566
    Intra stats need to be kept track of for fast intra decision.
  • r1594: Fix rc-lookahead in encoding options SEI in 2-pass with VBV
  • r1593: Reduce memory usage in 2-pass with b-adapt 2
Version r1592
  • Release Date: May 18, 2010
  • r1592: Overhaul CABAC: faster, less cache usage
    Horribly munge up the CABAC tables to allow deduplication of some data.
    Saves 256 bytes of L1d cache in non-RD, 512 bytes in RD.
    Add asm versions of bypass and terminal; save L1i cache by re-using putbyte code.
    Further optimize encode_decision.
    All 3 primary CABAC functions fit in under 256 bytes of code total on x86_64.
  • r1591: Fix typo in pulldown
  • r1590: Fix bitrate calculation in progress status
    Was slightly incorrect due to using pts, which is out of order.
  • r1589: Fix crash with sliced-threads on Phenom
  • r1588: Fix condition for printing rc=cbr in options SEI
    Also fix crf-max formatting.
  • r1587: Shrink even more constant arrays
  • r1586: Add API function to trigger intra refresh
    Useful for interactive applications where the encoder knows that packet loss has occurred on the client.
    Full documentation is in x264.h.
  • r1585: Fix intra refresh behavior with I-frames
    Intra refresh still allows I-frames (for scenecuts/etc).
    Now I-frames count as a full refresh, as opposed to instantly triggering a refresh.
  • r1584: More cosmetics
Version r1583
  • Release Date: May 7, 2010
  • Please refer to the changelogs for a full list of changes
Version r1570
  • Release Date: Apr 30, 2010
  • r1570: r1548 broke subme < 3 + p8x8/b8x8
    Caused significantly worse compression. Preset-wise, only affected veryfast.
    Fixed by not modifying mvc in-place.
  • r1569: More write-combining
  • r1568: Reduce lookahead memory usage, cache misses
    Merge lowres_types with lowres_costs.
  • r1567: Fix build on x86 with asm on but SSE off
  • r1566: Don't calculate ref/partition stats if not necessary
  • r1565: Split out MV prediction into mvpred.c
    Make common/macroblock.c a bit less gigantic.
Version r1564
  • Release Date: Apr 25, 2010
  • Fix mv predictor clipping on non-x86 (regression in r1548)
Version r1563
  • Release Date: Apr 24, 2010
  • Please refer to the changelogs for a full list of changes
Version r1542
  • Release Date: Apr 18, 2010
  • r1542: Fix various early terminations with slices
    Neighbouring type values (type_top, etc) are now loaded even if the MB isn't available for prediction.
    Significant overall performance increase (as high as 5-10%+) with lots of slices (e.g. with slice-max-size).
  • r1541: Enable --fast-pskip on fast firstpass
  • r1540: Make interlaced detection in avisynth only apply to field-based input
    Fixes improper flagging of progressive sources.
  • r1539: Set psy=0 in lossless mode
    Doesn't actually affect output, just what's written in the SEI.
Version r1538
  • Release Date: Apr 12, 2010
  • Please refer to the changelogs for a full list of changes
Version r1523
  • Release Date: Apr 8, 2010
  • Please refer to the changelogs for a full list of changes
Version r1510
  • Release Date: Mar 29, 2010
  • Please refer to the changelogs for a full list of changes
Version r1471
  • Release Date: Feb 28, 2010
  • Please refer to the changelogs for a full list of changes
Version r1462
  • Release Date: Feb 24, 2010
  • Please refer to the changelogs for a full list of changes
Version r1442
  • Release Date: Feb 15, 2010
  • Please refer to the changelogs for a full list of changes
Version r1416
  • Release Date: Jan 31, 2010
  • Please refer to the changelogs for a full list of changes
Version r1400
  • Release Date: Jan 21, 2010
  • r1400: Use cross-prefix properly with pkg-config for cross-compiling
  • r1399: Various performance optimizations
    Simplify and compact storage of direct motion vectors, faster --direct auto.
    Shrink various arrays to save a bit of cache.
    Simplify and reorganize B macroblock type writing in CABAC.
    Add some missing ALIGNED macros.
  • r1398: Fix crash on new AMD M300 and similar CPUs
    Apparently these CPUs have SSE4a, but not misaligned SSE.
  • r1397: Fix intra refresh with subme < 6
    Also improve the quality of intra masking.
  • r1396: Add support for multiple --tune options
    Tunes apply in the order they are listed in the case of conflicts.
    Psy tunings, i.e. film/animation/grain/psnr/ssim, cannot be combined.
    Also clarify --profile, which forces the limits of a profile, not the profile itself.
  • r1395: Various bugfixes and tweaks in analysis
    Fix the oldest-ever bug in x264: b16x8 analysis used the wrong width for predict_mv.
    Fix cache_ref calls for slightly better MV prediction in bsub16x16 analysis.
    Make B-partition analysis consider reference frame costs.
    Various other minor changes.
    Overall very slightly improved mode decision and motion search in B-frames.
  • r1394: More --me tesa optimizations
  • r1393: Fix typo in configure
  • r1392: Make --fps force CFR mode
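The ordering rule in the r1396 entry above (later tunes win on conflicts, and at most one psy tuning is allowed) can be illustrated with a small sketch. The option names and values in `TUNE_TABLE` are placeholders invented for this example and are not x264's real tune settings; only the list of psy tunings is taken from the entry itself.

```python
# Psy tunings named in the r1396 entry; only one of these may be used at a time.
PSY_TUNES = {"film", "animation", "grain", "psnr", "ssim"}

# Hypothetical per-tune overrides, for illustration only.
TUNE_TABLE = {
    "film": {"deblock": -1, "psy_rd": 1.0},
    "grain": {"deblock": -2, "psy_rd": 1.0},
    "fastdecode": {"deblock": 0, "cabac": False},
}

def apply_tunes(names, table=TUNE_TABLE):
    """Apply tunes in the order listed; a later tune overrides an earlier one."""
    psy = [name for name in names if name in PSY_TUNES]
    if len(psy) > 1:
        raise ValueError("only one psy tuning can be used: " + ", ".join(psy))
    params = {}
    for name in names:
        params.update(table[name])  # later entries overwrite conflicting keys
    return params

print(apply_tunes(["film", "fastdecode"]))  # fastdecode's deblock value wins
print(apply_tunes(["fastdecode", "film"]))  # film's deblock value wins
```

Swapping the order of the two tunes changes which `deblock` value survives, which is the behaviour the entry describes.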
Version r1391
  • Release Date: Jan 15, 2010
  • Please refer to the changelogs for a full list of changes
Version r1376
  • Release Date: Dec 16, 2009
  • r1376: Don't do sum/ssd analysis if weightp == 1
    Typo fixes in comments and help.
  • r1375: Fix two bugs in 2-pass ratecontrol
    last_qscale_for wasn't set during the 2pass init code.
    abr_buffer was way too small in the case of multiple threads, so accordingly increase its buffer size based on the number of threads.
    May significantly increase quality with many threads in 2-pass mode, especially in cases with extremely large I-frames, such as anime.
  • r1374: Avisynth-MT and 2.6 compatibility fixes
    Explain to the user why YV12 conversion is forced with Avisynth 2.6.
    Fix encoding with Avisynth-MT scripts by inserting the necessary Distributor() call; speeds such scripts back up to expected levels.
Version r1373
  • Release Date: Dec 11, 2009
  • r1373: Fix zone parsing on mingw
    Due to MinGW evidently being in the hands of a pack of phenomenal idiots, MinGW does not have strtok_r, a basic string function.
    As such, remove the dependency on strtok_r in zone parsing.
    Previously, using zones for anything other than ratecontrol failed.
  • r1372: More lookahead optimizations
    Under subme 1, don't do any qpel search at all and round temporal MVs accordingly.
    Drop internal subme with subme 1 to do fullpel predictor checks only.
    Other minor optimizations.
  • r1371: Various minor missing changes from previous commits
    Boolify sliced threads too
    Remove unused constants from dct-a.asm
    Fix a few typos/minor errors in preset documentation
  • r1370: Fix regression in direct=auto/temporal in r1364
    Bug caused rare race condition in frame reference handling.
    This resulted in invalid bitstreams in some B-frames and, very rarely, crashes.
Version r1369
  • Release Date: Dec 10, 2009
  • r1369: Add fast pskip to x264 SEI info header
  • r1368: Minor seeking fix with Avisynth input
    Seeking past the end of the input with --seek would result in the same frame being repeated over and over.
  • r1367: Add support for MB-tree + B-pyramid
    Modify B-adapt 2 to consider pyramid in its calculations.
    Generally results in many more B-frames being used when pyramid is on.
    Modify MB-tree statsfile reading to handle the reordering necessary.
    Make differing keyint or pyramid between passes into a fatal error.
  • r1366: Use aliasing-avoidance macros in array_non_zero
  • r1365: MMX version of 8x8 interlaced zigzag
    Just as fast as SSSE3 on Nehalem (and faster on Conroe/Penryn), so remove the SSSE3 version.
  • r1364: Bring back slice-based threading support
    Enabled with --sliced-threads
    Unlike normal threading, adds no encoding latency.
    Less efficient than normal threading, both performance and compression-wise.
    Useful for low-latency encoding environments where performance is still important, such as HD videoconferencing.
    Add --tune zerolatency, which eliminates all x264 encoder-side latency (no delayed frames at all).
    Some tweaks to VBV ratecontrol and lookahead (in addition to those required by sliced threading).
    Commit sponsored by a media streaming company that wishes to remain anonymous.
  • r1363: Add more detailed help for presets/tunes/profiles
    Shows what options they represent.
  • r1362: qpel RD no longer needs mbcmp_unaligned
Version r1361
  • Release Date: Dec 9, 2009
  • ensure that all boolean options are {0,1} so they print consistently in the options SEI
Version r1360
  • Release Date: Dec 6, 2009
  • r1360: Actually do r1356
    Somehow commit r1356 got lost in the ether. I'm not sure how, but now it's fixed.
  • r1359: Remove some unused code from x264.c
  • r1358: SSSE3 version of zigzag_8x8_field
    Slightly faster interlaced encoding with 8x8dct.
    Helps most on Nehalem, somewhat disappointing on Conroe/Penryn.
  • r1357: Fix crash in interlaced with >8 refs
    Crash introduced in weightp.
  • r1356: Significantly faster qpel-RD
    Cache the results of MC, like in bidir-RD.
    Slightly changes output due to the necessary reordering of satd/RD calls.
    5-10% faster qpel-RD.
  • r1355: Add x264 prefix to functions with ffmpeg equivalents
    Not important now, but will be when we add libav* input support.
Version r1354
  • Release Date: Nov 30, 2009
  • 10L in r1353
    Broke mp4 output.
Version r1353
  • Release Date: Nov 30, 2009
  • Enhanced Avisynth input support
    Requires avisynth_c.h from the Avisynth API headers.
    Reports errors properly from Avisynth script input.
    Automatically construct input scripts for almost any input file.
    Tries ffmpegsource2, DSS2, directshowsource, and many other sourcing methods, based on the input file extension.
    Automatically converts to YV12.
Version r1352
  • Release Date: Nov 28, 2009
  • r1352: Much faster weightp
    Move sum/ssd calculation out of lookahead and do it only once per frame.
    Also various minor optimizations, cosmetics, and cleanups.
  • r1351: Fix bugs in fps/timestamp handling in FLV muxer
  • r1350: Fix bug in weightp analysis
    Weights weren't reset upon early terminations, so old (wrong) weights could stick around.
    Small compression improvement.
  • r1349: Minor deblocking optimization, update comments
  • r1348: Fix weightb with delta_poc_bottom
    Has no effect yet, but will be required once we add TFF/BFF signalling support in interlaced mode.
    Gives 0.5-0.7% better compression with proper TFF/BFF signalling.
Version r1347
  • Release Date: Nov 24, 2009
  • r1347: Give more meaningful error if 1st/2nd pass resolution differ
  • r1346: Fix extremely rare deadlock with sync-lookahead
    Patch partially by Anton Mitrofanov.
  • r1345: Only print weightp stats if there were P-frames
  • r1344: Faster lookahead with subme=1
    If it hasn't been clear already, don't use subme=1 as a "fast first pass" option.
    Use subme=2 instead; 1 and below now enable a fast (lower quality) lookahead mode.
  • r1343: Faster weightp analysis
    Modify pixel_var slightly to return the necessary information and use it for weight analysis instead of sad/ssd.
    Various minor cosmetics.
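The r1343 entry above mentions using variance information for weight analysis. As a rough illustration of why per-frame mean and variance are enough to describe a fade, the sketch below fits `cur ≈ w * ref + o` by matching the first two moments. This is a textbook derivation under the assumption of a pure linear fade, not necessarily what x264 does internally, and the function name is made up for the example.

```python
import math
from statistics import mean, pvariance

def estimate_weight(ref, cur):
    """Estimate (scale, offset) such that cur is approximately scale * ref + offset.

    If cur = w * ref + o, then var(cur) = w**2 * var(ref) and
    mean(cur) = w * mean(ref) + o, which gives both unknowns.
    """
    var_ref = pvariance(ref)
    if var_ref == 0:
        # Flat reference: no scale can be recovered, fall back to offset only.
        return 1.0, mean(cur) - mean(ref)
    w = math.sqrt(pvariance(cur) / var_ref)
    o = mean(cur) - w * mean(ref)
    return w, o

ref = [10.0, 50.0, 90.0, 130.0, 200.0]
cur = [0.5 * p + 4.0 for p in ref]  # synthetic fade to half brightness
print(estimate_weight(ref, cur))
```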
Version r1342
  • Release Date: Nov 16, 2009
  • r1342: Fix two issues in weightp
    If analysis decided on an offset of -128, x264 would create non-compliant streams.
    Fix some cases with nearly all intra blocks where analysis could pick very weird weights.
    Also add some asserts to check compliancy.
  • r1341: Allow compilation with non-Apple GCC on OS X
  • r1340: Use __attribute__((may_alias)) for type-punning
    GCC thinks pointer casts to unions aren't valid with strict aliasing.
    See http://gcc.gnu.org/onlinedocs/gcc-4.4.2/gcc/Optimize-Options.html#Type_002dpunning.
    Also use M32() in y4m.c.
    Enable -Wstrict-aliasing again since all such warnings are fixed.
Version r1339
  • Release Date: Nov 16, 2009
  • r1339: 100l in deadlock fix
  • r1338: FLV muxing support
  • r1337: Fix rare deadlock introduced in weightp
Version r1336
  • Release Date: Nov 14, 2009
  • r1336: Actually add -Wno-strict-aliasing to configure
  • r1335: Various weightp fixes
    Make weightp results match in threaded vs non-threaded mode.
    Fix two-pass with slow-firstpass.
  • r1334: Fix all aliasing violations
    New type-punning macros perform write/read-combining without aliasing violations per the second-to-last part of 6.5.7 in the C99 specification.
    GCC 4.4, however, doesn't seem to have read this part of the spec and still warns about the violations.
    Regardless, it seems to fix all known aliasing miscompilations, so perhaps the GCC warning generator is just broken.
    As such, add -Wno-strict-aliasing to CFLAGS.
  • r1333: Fix 10l in weightp on ARM
Version r1332
  • Release Date: Nov 10, 2009
  • Fix one (of possibly many) miscompilations in weightp
    Use NOINLINE and some emms calls to fix emms reordering issues.
    This issue occurred with some GCC versions if threads > 1 and the phase of the moon was right.
    Also a cosmetic in x264.c.
Version r1331
  • Release Date: Nov 10, 2009
  • Fix pixel_ssd on win64
    Didn't preserve XMM registers, may or may not have caused problems.
Version r1330
  • Release Date: Nov 9, 2009
  • r1330: Fix weightp logfile parsing on MinGW
  • r1329: cosmetics
  • r1328: Fix weightp on ARM + PPC
    No ARM or PPC assembly yet though.
  • r1327: Weighted P-frame prediction
    Merge Dylan's Google Summer of Code 2009 tree.
    Detect fades and use weighted prediction to improve compression and quality.
    "Blind" mode provides a small overall quality increase by using a -1 offset without doing any analysis, as described in JVT-AB033.
    "Smart", the default mode, also performs fade detection and decides weights accordingly.
    MB-tree takes into account the effects of "smart" analysis in lookahead, even further improving quality in fades.
    If psy is on, mbtree is on, interlaced is off, and weightp is off, fade detection will still be performed.
    However, it will be used to adjust quality instead of creating actual weights.
    This will improve quality in fades when encoding in Baseline profile.

    Doesn't add support for interlaced encoding with weightp yet.
    Only adds support for luma weights, not chroma weights. Internal code for chroma weights is in, but there's no analysis yet.
    Baseline profile requires that weightp be off. All weightp modes may cause minor breakage in non-compliant decoders that take shortcuts in deblocking reference frame checks.
    "Smart" may cause serious breakage in non-compliant decoders that take shortcuts in handling of duplicate reference frames.

    Thanks to Google for sponsoring our most successful Summer of Code yet!
  • r1326: Fix assert failure in the case of forced i-frames
    Note that this applies to non-IDR i-frames, not IDR-frames.
    This fix is also required for future open-gop.
  • r1325: Fix issues relating to input/output files being pipes/FIFOs
  • r1324: Various ARM-related fixes
    Fix comment for mc_copy_neon.
    Fix memzero_aligned_neon prototype.
    Update NEON (i)dct_dc prototypes.
    Duplicate x86 behavior for global+hidden functions.
  • r1323: Fix miscompilation with gcc 4.3 on ARM
    Aliasing violation in spatial prediction caused nasty artifacts.
    Shut up two other GCC warnings while we're at it.
  • r1322: Fix extremely rare infinite loop in 2-pass VBV
    Implicit conversion from double->float lost enough precision to cause the loop termination condition to never trigger.
    Bug report by Tal Aloni.
  • r1321: Fix large file support, broken in r1302
  • r1320: Dramatically reduce size of pixel_ssd_* asm functions
    ~10k of code size eliminated.
  • r1319: fix bottom-right pixel of lowres planes, which was uninitialized.
    weirdly, valgrind reported this only with --no-asm.
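The r1322 entry above blames an implicit double->float conversion for a loop that never terminated. The sketch below reproduces that class of bug in isolation: a small increment survives in double precision but is rounded away once the value is narrowed to 32-bit float. The loop and its numbers are invented for the demonstration and are not taken from x264's ratecontrol code.

```python
import struct

def to_float32(x):
    """Round a Python float (double) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def steps_until(target, start, step, narrow, limit=10):
    """Count additions of `step` needed to reach `target`, or None if stuck."""
    x = start
    for n in range(limit):
        if x >= target:
            return n
        x = x + step
        if narrow:
            x = to_float32(x)  # simulates storing the result in a float
    return None

start = float(2 ** 24)  # above this, float32 cannot represent every integer
print(steps_until(start + 4, start, 1.0, narrow=False))  # reaches the target
print(steps_until(start + 4, start, 1.0, narrow=True))   # never makes progress
```

The `limit` argument exists only so the demonstration stops; without it the narrowed variant would spin forever, as the entry describes.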
Version r1318
  • Release Date: Oct 31, 2009
  • r1318: Further reduce code size in bime
    ~7-8 kilobytes saved, ~0.6% faster subme 9.
  • r1317: Fix case in which MB-tree didn't propagate all data correctly
    Should improve quality in all cases.
    Also some minor cosmetic improvements.
  • r1316: Take into account chroma MV offset during interlaced motion search
    Small improvement in interlaced compression.
  • r1315: Slightly faster ssse3 width4 chroma MC
    Cacheline-aware in the same fashion as width8, but not conditional.
  • r1314: Eliminate some rare cases where MB-tree gave incorrect results in B-frames
    Also get rid of some unnecessary memcpies.
  • r1313: Fix cases in which b-adapt 1 could result in AUTO-type frames.
    This didn't actually cause any issues, but it removes the need for the fixing-up code that prevented said issues.
  • r1312: Motion compensation optimizations
    Turning off inlining saves a whole boatload of code size for near-zero speed cost.
    Simplify offset calculation.
    Various other optimizations.
  • r1311: Minor CAVLC optimizations
Version r1310
  • Release Date: Oct 26, 2009
  • cosmetics
Version r1309
  • Release Date: Oct 26, 2009
  • r1309: ISC-license x86inc.asm
    As the assembly abstraction layer is very useful in non-x264 projects, it is now ISC (simplified BSD) so that others, even in commercial projects, can use it as well.
  • r1308: Various minor CABAC optimizations
  • r1307: Fix bug in b-pyramid strict
    Bug caused invalid streams in some situations.
  • r1306: Remove non-mod16 warning
    Compression only "suffers" by an extremely marginal amount and too many people misinterpret the warning.
  • r1305: Fix two warnings + some minor optimizations
  • r1304: Fix a typo in b-pyramid help
    And an errant space in common/macroblock.c
  • r1303: A bit more write-combining in macroblock_cache_load
Version r1302
  • Release Date: Oct 25, 2009
  • split muxers.c into one file per format
    simplify internal muxer API
Version r1301
  • Release Date: Oct 20, 2009
  • r1301: Update fprofile with the latest change to b-pyramid
  • r1300: Fix assertion fail and incorrect costs with pyramid+VBV
    Deal properly with QPfile'd B-refs. x264 should handle multiple B-refs per minigop now, though only via forced frametypes.
  • r1299: Improve CRF initial QP selection, fix get_qscale bug
    If qcomp=1 (as in mb-tree), we don't need ABR_INIT_QP.
    get_qscale could give slightly weird results with still images
  • r1298: Print more accurate error message if dump_yuv fails
  • r1297: Reduce memory usage of b-adapt 2 trellis
    Also fix a minor bug where the algorithm ignored the last frame in the trellis.
  • r1296: Make B-pyramid spec-compliant
    The rules of the specification with regard to picture buffering for pyramid coding are widely ignored.
    x264's b-pyramid implementation, despite being practically identical to that proposed by the original paper, was technically not compliant.
    Now it is.
    Two modes are now available:
    1) strict b-pyramid, while worse for compression, follows the rule mandated by Blu-ray (no P-frames can reference B-frames)
    2) normal b-pyramid, which is like the old mode except fully compliant.
    This patch also adds MMCO support (necessary for compliant pyramid in some cases).
    MB-tree still doesn't support b-pyramid (but will soon).
  • r1295: Add missing free for nal_buffer
    Fixes a memory leak.
Version r1294
  • Release Date: Oct 19, 2009
  • r1294: sync yasm macros to ffmpeg
  • r1293: eliminate some divisions
Version r1292
  • Release Date: Oct 13, 2009
  • r1292: Fix glitches with slow-firstpass + weightb + multiref + 2pass
    Bug in r1277
  • r1291: Simplify some code in b-adapt 2's trellis
  • r1290: Fix a very rare integer overflow in slicetype analysis
    Caused an assert failure when it occurred.
    Bug is as old as adaptive B-frames.
  • r1289: Reduce the aggressiveness of 2-pass VBV
    Now that B-frames are properly covered, we don't have to be as aggressive.
    This eliminates some issues with skyrocketing QPs in B-frames in 2-pass VBV.
Version r1288
  • Release Date: Oct 13, 2009
  • r1288: Fix regression: disable flash detection without B-frames
  • r1287: change all dct arrays to 1d.
    the C standard doesn't allow you to iterate 1-dimensionally over 2d arrays, and nothing other than the dsp functions themselves cares about the 2dness of dct.
    this fixes a miscompilation in x264_mb_optimize_chroma_dc.
  • r1286: Add row-based VBV for B-frames
    While B-frames still aren't explicitly covered by ratecontrol, this should resolve issues of VBV underflows due to larger-than-expected B-frames.
  • r1285: Improve VBV, fix bug in 2-pass VBV introduced in MB-tree
    Bug caused AQ'd row/frame costs to not be calculated (and thus caused underflows).
    Also make VBV more aggressive with more threads in 2-pass mode.
    Finally, --ratetol now affects VBV aggressiveness (higher is less aggressive).
  • r1284: Optimize exp2fix8
    Slightly faster and more accurate rounding.
  • r1283: Avoid scenecuts in flashes and similar situations
    "Flashes" are defined as any scene which lasts a very short period before a previous scene returns.
    A common example of this is of course a camera flash.
    Accordingly, look ahead during scenecut analysis and rule out the possibility of certain frames being scenecuts. Also handles cases of tons of short scenes in sequence and avoids making those scenecuts as well.
    Can only catch flashes of 1 frame in length with b-adapt 1.
    With b-adapt 2, can catch flashes of length --bframes. Speed cost should be negligible.
  • r1282: Fix bug where x264 generated non-compliant bitstreams with insane SAR values
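The flash rule in the r1283 entry above (a short scene followed by the return of the previous scene should not produce scenecuts) can be sketched as a lookahead filter. The sketch assumes each frame already carries a scene label, which is a simplification: a real encoder compares frame costs rather than labels. `max_flash_len` plays the role of the lookahead distance (1 for b-adapt 1, --bframes for b-adapt 2, per the entry).

```python
def real_scenecuts(scene_ids, max_flash_len):
    """Return frame indices that start a new scene, ignoring short flashes.

    A candidate cut at frame i is rejected when the scene shown before the
    cut reappears within max_flash_len frames after it.
    """
    cuts = []
    i = 1
    while i < len(scene_ids):
        if scene_ids[i] != scene_ids[i - 1]:
            previous = scene_ids[i - 1]
            lookahead = scene_ids[i + 1 : i + 1 + max_flash_len]
            if previous in lookahead:
                # Flash: resume just after the frame where the old scene returns.
                i += lookahead.index(previous) + 2
                continue
            cuts.append(i)
        i += 1
    return cuts

frames = list("AAAFAABBB")  # "F" is a one-frame flash inside scene A
print(real_scenecuts(frames, max_flash_len=1))
print(real_scenecuts(frames, max_flash_len=0))
```

With a lookahead of one frame only the A-to-B transition is kept; with no lookahead the flash produces two spurious cuts.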
Version r1281
  • Release Date: Oct 8, 2009
  • r1281: rm msvc project files and related ifdefs
  • r1280: SSE4 version of 4x4 idct
    27->24 clocks on Nehalem.
    This is really just an excuse to use "movsd" in a real function.
    Add some comments to subsum-related macros in x86util.
  • r1279: Constrained intra prediction support
    Enable with --constrained-intra. Significantly reduces compression, but required for the base layer of SVC encodes and maybe some other use-cases.
    Commit sponsored by a media streaming company that wishes to remain anonymous.
  • r1278: Slightly improve non-RD p8x8 mode decision
    Subpartition costs are effectively zero in CABAC if sub-8x8 search is off.
  • r1277: Reorder reference frames optimally on second pass
    About +0.1-0.2% compression at normal bitrates, up to +1% at very low bitrates.
    Only works if the first pass uses the same number of refs as the second (i.e. not with fast first pass).
    Thus, only worthwhile at insanely slow speeds: as such, enable slow-firstpass by default with preset placebo.
    Note that this changes the stats file format!
  • r1276: Fix typo in ratecontrol_summary
  • r1275: Clip log2_max_frame_num
    It's still much higher than it needs to be, but that will be fixed with the upcoming MMCO patch.
    Also make sure we don't write too large a frame_num or poc in slice header.
  • r1274: Fix some issues with 3-pass statsfile handling
    The value of i_frame during encoder_close was incorrect.
  • r1273: Fix ctrl-C termination message with few frames encoded
  • r1272: Add support for single-frame VBV, improve compliance
    This allows both constant-framesize and capped-framesize encoding.
    Literal constant framesize isn't actually supported yet due to the lack of filler support.
    Example with 30fps video: --vbv-bufsize 200 --vbv-maxrate 6000 will ensure that no frame is ever larger than 200 kilobits.
    One example use-case of this is for zero-delay streaming where bandwidth costs need to be minimized. If every frame is smaller than 200 kilobits and the client has a 6 megabit connection, every single frame can be instantly sent to the client and handled without any decoder-side buffer.
    Fix a mistake in VBV calculation--this may have caused the VBV to be slightly non-compliant in some situations without x264 realizing it.
    Add primitive prediction handling for rows with quantizers lower than their reference. This slightly improves VBV in CBR mode. Various other minor improvements to VBV, mostly to make single-frame VBV work.
    Commit sponsored by a media streaming company that wishes to remain anonymous.
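The arithmetic behind the r1272 example above can be checked in a few lines. The sketch assumes that "single-frame VBV" means the buffer holds no more than one frame interval's worth of data at the maximum rate; the function names are made up for the illustration.

```python
def per_frame_refill_kbit(vbv_maxrate_kbps, fps):
    """Kilobits that arrive at vbv-maxrate during one frame interval."""
    return vbv_maxrate_kbps / fps

def is_single_frame_vbv(vbv_bufsize_kbit, vbv_maxrate_kbps, fps):
    """True when the buffer cannot hold more than one frame interval of data."""
    return vbv_bufsize_kbit <= per_frame_refill_kbit(vbv_maxrate_kbps, fps)

# Values from the entry: 30fps, --vbv-bufsize 200 --vbv-maxrate 6000
print(per_frame_refill_kbit(6000, 30))     # kilobits delivered per frame interval
print(is_single_frame_vbv(200, 6000, 30))
```

At 30fps a 6000 kbit/s link delivers 200 kilobits per frame interval, which matches the 200 kilobit buffer, so every frame fits in what the link carries before the next frame is due.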
Version r1271
  • Release Date: Sep 25, 2009
  • r1271: Fix 10l in API change
    frame_num was set to 1, not 0, for the first frame. This broke spec compliance.
    Didn't actually seem to cause any problems though except for breaking decoding on Quicktime.
  • r1270: Allow user-set FPS for inputs other than YUV
  • r1269: Improve threaded frame handling
    Avoid unnecessary cond_wait
Version r1268
  • Release Date: Sep 24, 2009
  • r1268: Attempt to detect miscompilation due to bug in gcc 4.2
    I don't know if this bug still affects latest x264, but it can't hurt to try to detect it.
    Accordingly refuse to open the encoder if detected.
    Apparently VLC (on Windows) has been distributed for some time with a completely broken x264 due to the use of a completely broken compiler (gcc 4.2). In particular, the MV costs seem to be calculated incorrectly on win32 when linking from an application compiled without -ffast-math to an application with -ffast-math.
    I am not entirely certain why this occurs, but the result is, unsurprisingly, encoding quality that makes MPEG-2 look good, due to the motion search being completely broken.
  • r1267: Really fix encoder_close crash this time
    Not-entirely-fixed in r1253.
  • r1266: Check for 16x16 partitions masquerading as smaller ones
    Saves a few bits when using qpel-RD.
  • r1265: Update config.guess/sub; add Snow Leopard support
  • r1264: Fix integer overflow in 2-pass VBV
    Bug caused slight undersizing in 2-pass mode in some cases.
  • r1263: Fix bug with various bizarre commandline combinations and mbtree
    Second pass would have mbtree on even though the first pass didn't (and thus encoding would immediately fail).
  • r1262: Add intra prediction modes to output stats
    Also eliminate some NANs in stat output with intra-only encoding.
    Marginal speedup: disable stat calculation if log level is below X264_LOG_INFO.
    Various minor cosmetics.
  • r1261: Overhaul syntax in muxers.c/matroska.c
    The inconsistent syntax in these files has finally come to an end.
  • r1260: Major API change: encapsulate NALs within libx264
    libx264 now returns NAL units instead of raw data. x264_nal_encode is no longer a public function.
    See x264.h for full documentation of changes.
    New parameter: b_annexb, on by default. If disabled, startcodes are replaced by sizes as in mp4.
    x264's VBV now works on a NAL level, taking into account escape codes.
    VBV will also take into account the bit cost of SPS/PPS, but only if b_repeat_headers is set.
    Add an overhead tracking system to VBV to better predict the constant overhead of frames (headers, NALU overhead, etc).
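The b_annexb option in the r1260 entry above switches between start-code framing and size-prefixed framing. The sketch below shows the difference on a toy byte stream by converting one form to the other. It assumes a 4-byte big-endian size prefix, which is common in mp4 but is an assumption here, and the helper names are invented; it is not x264's code.

```python
import struct

START_CODE = b"\x00\x00\x01"

def split_annexb(stream):
    """Split an Annex B byte stream into NAL payloads with start codes removed."""
    nals = []
    start = None
    i = 0
    while i + 3 <= len(stream):
        if stream[i:i + 3] == START_CODE:
            if start is not None:
                end = i
                # Zero bytes right before a start code belong to the next
                # (4-byte) start code, not to the NAL payload.
                while end > start and stream[end - 1] == 0:
                    end -= 1
                nals.append(stream[start:end])
            start = i + 3
            i += 3
        else:
            i += 1
    if start is not None:
        nals.append(stream[start:])
    return nals

def to_length_prefixed(stream):
    """Replace each start code with a 4-byte big-endian NAL size."""
    return b"".join(struct.pack(">I", len(nal)) + nal for nal in split_annexb(stream))

annexb = b"\x00\x00\x00\x01\x67\xaa" + b"\x00\x00\x01\x68\xbb\xcc"
print(split_annexb(annexb))
print(to_length_prefixed(annexb).hex())
```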
Version r1259
  • Release Date: Sep 15, 2009
  • r1259: Add missing fclose for mbtree input statsfile on second pass
    Bug report by VFRmaniac
  • r1258: Improve progress indicator behavior
    Progress indicator will now indicate based on output frame, not input frame.
  • r1257: Update yasm configure check
    lzcnt apparently requires yasm 0.6.2.
  • r1256: Make MV costs global instead of static
    Fixes some extremely rare threading race conditions and makes the code cleaner.
    Downside: slightly higher memory usage when calling multiple encoders from the same application.
  • r1255: Don't print scenecut message multiple times in verbose mode
    Occurred mostly with b-adapt 2.
  • r1254: Optimize rounding of luma and chroma DC coefficients
    Reduce bitrate mostly-losslessly at low quantizers. In some rare cases, bitrate reduction may be as high as 10%.
    Luma rounding optimization (helps much less than chroma) requires trellis.
  • r1253: Fix crash if encoder_close is called before delayed frames are flushed
    Also no longer flush frames when ctrl-Cing x264, so x264 will close faster.
  • r1252: Improve x264 help
    Now has three help options: --help, --longhelp, and --fullhelp.
    --help only shows the most basic options; most users should not need more than these.
    Add usage examples.
    Fix typo in a comment.
Version r1251
  • Release Date: Sep 7, 2009
  • r1251: Factor out a redundant RD call in qpel-RD
    Fixes a problem that was supposed to be fully fixed in r1238 but wasn't.
  • r1250: Fix RD early-skip
    Small quality improvement and speedup, was broken by r1214.
  • r1249: Faster CAVLC mb header writing for B macroblocks
  • r1248: Compile fixes for pre-ARMv6T2 and/or PIC
Version r1247
  • Release Date: Sep 3, 2009
  • Change priority handling on some OSs
    Instead of setting the lookahead thread to max priority, lower all the other threads' priorities instead.
    This is particularly useful when the "max priority" is "realtime", as in Windows, which can cause some problems.
Version r1246
  • Release Date: Sep 2, 2009
  • r1246: Threaded lookahead
    Move lookahead into a separate thread, set to higher priority than the other threads, for optimal performance.
    Reduces the amount that lookahead bottlenecks encoding, greatly increasing performance with lookahead-intensive settings (e.g. b-adapt 2) on many-core CPUs.
    Buffer size can be controlled with --sync-lookahead, which defaults to auto (threads+bframes buffer size).
    Note that this buffer is separate from the rc-lookahead value.
    Note also that this does not split lookahead itself into multiple threads yet; this may be added in the future.
    Additionally, split frames into "fdec" and "fenc" frame types and keep the two separate.
    This split greatly reduces memory usage, which helps compensate for the larger lookahead size.
    Extremely special thanks to Michael Kazmier and Alex Giladi of Avail Media, the original authors of this patch.
  • r1245: Force a link error in case of incompatible API
    This is because the number of bug reports due to miscompiled ffmpeg builds is reaching critical mass.
    The name of x264_encoder_open is now #defined based on the current X264_BUILD.
    Note that this changes the calling convention required for dlopen, but not for ordinary calls to x264_encoder_open.
  • r1244: Get rid of "CBR" descriptor from qcomp
    Though technically accurate in some vague way, I have never actually seen this option used correctly; rather, it has been used by hundreds of people who can't read the documentation and believe that qcomp=0 is what should be used for CBR encoding.
Version r1243
  • Release Date: Sep 1, 2009
  • r1243: Faster me=tesa
    But it still spends all too much time in me_search_ref rather than asm.
  • r1242: Multi-slice encoding support
    Slicing support is available through three methods (which can be mixed):
    --slices sets a number of slices per frame and ensures rectangular slices (required for Blu-ray). Overridden by either of the following options:
    --slice-max-mbs sets a maximum number of macroblocks per slice.
    --slice-max-size sets a maximum slice size, in bytes (includes NAL overhead).
    Implement macroblock re-encoding support to allow highly accurate slice size limitation. Might be useful for other things in the future, too.
  • r1241: Fix a valgrind warning in b-adapt 2
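To make the --slice-max-mbs option from the multi-slice encoding entry above concrete, the sketch below counts 16x16 macroblocks for a frame and splits them into consecutive slices of at most a given size. It ignores --slices and --slice-max-size and is only an illustration of the partitioning idea; the function names are hypothetical.

```python
def macroblock_grid(width, height):
    """Frame size in 16x16 macroblocks, rounding partial blocks up."""
    return (width + 15) // 16, (height + 15) // 16

def slices_by_max_mbs(width, height, slice_max_mbs):
    """Return (first_mb, last_mb) pairs, each covering at most slice_max_mbs."""
    mb_w, mb_h = macroblock_grid(width, height)
    total = mb_w * mb_h
    return [
        (first, min(first + slice_max_mbs, total) - 1)
        for first in range(0, total, slice_max_mbs)
    ]

print(macroblock_grid(1280, 720))
print(slices_by_max_mbs(1280, 720, 1000))
```

A 1280x720 frame has 80x45 = 3600 macroblocks, so a limit of 1000 yields three full slices and a shorter final one.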
Version r1240
  • Release Date: Aug 30, 2009
  • fix asm symbols for oprofile (regression in r1221)
Version r1239
  • Release Date: Aug 29, 2009
  • r1239: Fix bug in intra analysis in B-frames
    i8x8/i4x4 never got analysed when fast_intra was toggled and RD was off; up to a 2-3% quality improvement in non-RD mode.
    With this bug dating back to r369, this is probably the second-oldest bug ever fixed in x264.
  • r1238: Fix bug in b16x16 qpel RD
    Incorrect cost was used to initialize the search.
  • r1237: Check minimum chroma QP in addition to luma QP during CQM init
    Correctly error out if the implied minimum chroma QP is too low.
    Add missing emms to checkasm macroblock_tree_propagate test.
  • r1236: Faster mbtree propagate and x264_log2, less memory usage
    Avoid an int->float conversion with a small table.
    Change lowres_inter_types to a bitfield; cut its size by 75%.
    Somewhat lower memory usage with lots of bframes.
    Make log2/exp2 tables global to avoid duplication.
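The 75% saving in the r1236 entry above is what one gets when a value that used to occupy a full byte is stored in 2 bits. The sketch below assumes exactly that (four 2-bit values per byte); the entry does not state the actual field width, so treat the layout as an assumption.

```python
def pack2(values):
    """Pack values in the range 0..3 four to a byte, lowest bits first."""
    out = bytearray((len(values) + 3) // 4)
    for i, v in enumerate(values):
        if not 0 <= v <= 3:
            raise ValueError("value does not fit in 2 bits: %r" % (v,))
        out[i >> 2] |= v << ((i & 3) * 2)
    return bytes(out)

def unpack2(packed, i):
    """Read back the i-th 2-bit value."""
    return (packed[i >> 2] >> ((i & 3) * 2)) & 3

types = [0, 1, 2, 3, 3, 2, 1, 0]
packed = pack2(types)
print(len(types), "values stored in", len(packed), "bytes")
print([unpack2(packed, i) for i in range(len(types))])
```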
Version r1235
  • Release Date: Aug 27, 2009
  • r1235: Fix keyint=1 + VBV + rc-lookahead
  • r1234: Faster x264_exp2fix
    22->13 cycles on Core 2 with -mfpmath=sse
  • r1233: compile x86 with -mfpmath=sse by default
Version r1232
  • Release Date: Aug 25, 2009
  • r1232: ARM configure: enable NEON-related options by default
    When compiling for ARM, x264 will compile by default for Cortex A8 unless specified otherwise.
    To compile for pre-ARMv6, --disable-asm is required.
  • r1231: 2-pass VBV fixes
    Properly run slicetype frame cost with 2pass + MB-tree.
    Slash the VBV rate tolerance in 2-pass mode; increasing it made sense for the highly reactive 1-pass VBV algorithm, but not for 2-pass.
    2-pass's planned frame sizes are guaranteed to be reasonable, since they are based on a real first pass, while 1-pass's, based on lookahead SATD, cannot always be trusted.
  • r1230: GSOC merge part 8: ARM NEON intra prediction assembly functions (partial)
    4x4 dc/h/ddr/ddl, 8x8 dc/h, 8x8c h/v, 16x16 dc/h/v
  • r1229: GSOC merge part 7: ARM NEON deblock assembly functions (partial)
    Originally written for ffmpeg by Mans Rullgard; ported by David.
    Luma and chroma inter deblocking; no intra yet.
  • r1228: GSOC merge part 6: ARM NEON quant assembly functions (partial)
    (de)quant 4x4, (de)quant 8x8, (de)quant DC, coeff_last
  • r1227: GSOC merge part 5: ARM NEON dct assembly functions
    (i)dct4x4dc, (i)dct4x4, (i)dct8x8, (i)dct_dc, zigzag_scan_frame_4x4
  • r1226: GSOC merge part 4: ARM NEON mc assembly functions
    prefetch, memcpy_aligned, memzero_aligned, avg, mc_luma, get_ref, mc_chroma, hpel_filter, frame_init_lowres
  • r1225: GSOC merge part 3: ARM NEON pixel assembly functions
    SAD, SADX3/X4, SSD, SATD, SA8D, Hadamard_AC, VAR, VAR2, SSIM
  • r1224: GSOC merge part 2: ARM stack alignment
    Neither GCC nor ARMCC supports 16-byte stack alignment, despite the fact that NEON loads require it.
    These macros only work for arrays, but fortunately that covers almost all instances of stack alignment in x264.
  • r1223: Fix unaligned accesses in bitstream writer
    Fixes x264 on CPUs with no unaligned access support (e.g. SPARC).
    Improves performance marginally on CPUs with penalties for unaligned stores (e.g. some x86).
Version r1222
  • Release Date: Aug 21, 2009
  • r1222: Fix bug in calculation of I-frame costs with AQ.
  • r1221: GSOC merge part 1: Framework for ARM assembly optimizations
    x264 will detect which ARM core it's building for and only build NEON asm if the target is ARMv6 or above, then enable NEON at runtime.
  • r1220: Fix a bug in checkasm and two OSX fixes
    MC chroma checkasm test could crash in some situations
    Remove -lmx, as it's not needed and the iPhone doesn't have it.
    Remove unused sqrtf emulation; it breaks if math.h is included.
  • r1219: Improve QPRD
    Always check the last macroblock's QP, even if the normal search doesn't reach it.
    Raise the failure threshold when moving towards the last macroblock's QP.
    0.2-1% improved compression.
  • r1218: Fix MB-tree with keyint<3
    Also slightly improve VBV keyint handling.
Version r1217
  • Release Date: Aug 19, 2009
  • r1217: Fix bug in VBV lookahead + no MB-tree
    I-frames need to have VBV lookahead run on them as well.
  • r1216: Add support for frame-accurate parameter changes
    Parameter structs can now be passed with individual frames.
    The previous method would only change the parameter of what was currently being encoded, which due to delay might be very far from an intended exact frame.
    Also add support for changing aspect ratio. Only works in a stream with repeating headers and requires the caller to force an IDR to ensure instant effect.
  • r1215: Fix x264_encoder_reconfig with multithreading
    New behavior: reconfigging the encoder will result in changes being applied
    to each of the encoding threads as they finish encoding the current frame.
Version r1214
  • Release Date: Aug 18, 2009
  • r1214: Fix two bugs in QPRD
    QPRD could in some cases force blocks to skip when they shouldn't be skipped (~+0.01db).
    Force QPRD to abide by qpmin/qpmax restrictions.
  • r1213: Lookahead VBV
    Use the large-scale lookahead capability introduced in MB-tree for ratecontrol purposes.
    (Does not require MB-tree, however.)
    Greatly improved quality and compliance in 1-pass VBV mode, especially in CBR; +2db OPSNR or more in some cases.
    Fix some other bugs in VBV, which should improve non-lookahead mode as well.
    Change the tolerance algorithm in row VBV to allow for more significant mispredictions when buffer is nearly full.
    Note that due to the fixing of an extremely long-standing bug (>1 year), bitrates may change by nontrivial amounts in CRF without MB-tree.
  • r1212: Fix bug in b-adapt 1
    B-adapt 1 didn't use more than MAX(1,bframes-1) B-frames when MB-tree was off.
  • r1211: Fix a potential failure in VBV
    If VBV does underflow, ratecontrol could be permanently broken for the rest of the clip.
    Revert part of the previous VBV changes to fix this.
Version r1210
  • Release Date: Aug 14, 2009
  • r1210: new API function x264_encoder_delayed_frames.
    fix x264cli on streams whose total length is less than the encoder latency.
  • r1209: Add no-mbtree to fprofile (and fix pyramid in fprofile)
  • r1208: Don't print a warning about direct=auto in 2pass when B-frames are off
  • r1207: fix lowres padding, which failed to extrapolate the right side for some resolutions.
    fix a buffer overread in x264_mbtree_propagate_cost_sse2. no effect on actual behavior, only theoretical correctness.
    fix x264_slicetype_frame_cost_recalculate on I-frames, which previously used all 0 mb costs.
    shut up a valgrind warning in predict_8x8_filter_mmx.
Version r1206
  • Release Date: Aug 10, 2009
  • r1206: simd part of x264_macroblock_tree_propagate.
    1.6x faster on conroe.
  • r1205: MB-tree fixes:
    AQ was applied inconsistently, with some AQed costs compared to other non-AQed costs. Strangely enough, fixing this increases SSIM on some sources but decreases it on others. More investigation needed.
    Account for weighted bipred.
    Reduce memory, increase precision, simplify, and early terminate.
Version r1204
  • Release Date: Aug 9, 2009
  • r1204: Add missing free()s for new data allocated for MB-tree
    Eliminates a memory leak.
  • r1203: Fix keyframe insertion with MB-tree and no B-frames
  • r1202: Fix MP4 output (bug in malloc checking patch)
Version r1201
  • Release Date: Aug 8, 2009
  • r1201: Gracefully terminate in the case of a malloc failure
    Fuzz tests show that all mallocs appear to be checked correctly now.
  • r1200: Fix a potential infinite loop in QPfile parsing on Windows
    ftell doesn't seem to work properly on Windows in text mode.
  • r1199: Fix delay calculation with multiple threads
    Delay frames for threading don't actually count as part of lookahead.
Version r1198
  • Release Date: Aug 7, 2009
  • r1198: Add "veryslow" preset
    Apparently some people are actually *using* placebo, so I've added this preset to bridge the gap.
  • r1197: Macroblock-tree ratecontrol
    On by default; can be turned off with --no-mbtree.
    Uses a large lookahead to track temporal propagation of data and weight quality accordingly.
    Requires a very large separate statsfile (2 bytes per macroblock) in multi-pass mode.
    Doesn't work with b-pyramid yet.
    Note that MB-tree inherently measures quality differently from the standard qcomp method, so bitrates produced by CRF may change somewhat.
    This makes the "medium" preset a bit slower.
    Accordingly, make "fast" slower as well, and introduce a new preset "faster" between "fast" and "veryfast".
    All presets "fast" and above will have MB-tree on. Add a new option, --rc-lookahead, to control the distance MB tree looks ahead to perform propagation analysis.
    Default is 40; larger values will be slower and require more memory but give more accurate results.
    This value will be used in the future to control ratecontrol lookahead (VBV).
    Add a new option, --no-psy, to disable all psy optimizations that don't improve PSNR or SSIM.
    This disables psy-RD/trellis, but also other more subtle internal psy optimizations that can't be controlled directly via external parameters.
    Quality improvement from MB-tree is about 2-70% depending on content.
    Strength of MB-tree adjustments can be tweaked using qcompress; higher values mean lower MB-tree strength.
    Note that MB-tree may perform slightly suboptimally on fades; this will be fixed by weighted prediction, which is coming soon.
  • r1196: Various 1-pass VBV tweaks
    Make predictors have an offset in addition to a multiplier.
    This primarily fixes issues in sources with lots of extremely static scenes, such as anime and CGI.
    We tried linear regressions, but they were very unreliable as predictors.
    Also allow VBV to be slightly more aggressive in raising QPs to avoid not having enough bits left in some situations.
    Up to 1db improvement on some clips.
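The r1197 entry mentions a statsfile of 2 bytes per macroblock in multi-pass mode. As a rough worked example, the sketch below estimates an upper bound on that file's size. It assumes 16x16 macroblocks with dimensions rounded up, and one record for every frame, which may overstate the real size if not every frame is stored. `mbtree_stats_bytes` is a hypothetical helper, not an x264 function.

```python
def mbtree_stats_bytes(width, height, frames, bytes_per_mb=2):
    """Estimate MB-tree statsfile size in bytes (upper bound).

    Assumes 16x16 macroblocks and one record per frame.
    """
    mb_w = (width + 15) // 16   # macroblock columns, rounded up
    mb_h = (height + 15) // 16  # macroblock rows, rounded up
    return mb_w * mb_h * bytes_per_mb * frames


per_frame_1080p = mbtree_stats_bytes(1920, 1080, frames=1)
thousand_frames_1080p = mbtree_stats_bytes(1920, 1080, frames=1000)
```

For 1920x1080 this gives 120 x 68 = 8160 macroblocks, or 16320 bytes per frame, so about 16 MB per 1000 frames under these assumptions.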
Version r1195
  • Release Date: Jul 29, 2009
  • r1195: Fix another 10L in QPRD
    An entry in subpel_iterations was missing.
    I have no idea how QPRD was working at all without this change.
  • r1194: Update help and cleanup in ratecontrol.c
    Deal with some out-of-date information.
  • r1193: 15% faster refine_bidir_satd, 10% faster refine_bidir_rd (or less with trellis=2)
    re-roll a loop (saves 44KB code size, which is the cause of most of this speed gain)
    don't re-mc mvs that haven't changed
Version r1192
  • Release Date: Jul 28, 2009
  • r1192: Faster bidir_rd plus some bugfixes
    Cache chroma MC during refine_bidir_rd and use both the luma and chroma caches to skip MC in macroblock_encode.
    Fix incorrect call to rd_cost_part; refine_bidir_rd output was incorrect for i8>0.
    Remove some redundant clips.
    ~12% faster refine_bidir_rd.
  • r1191: Add "fastdecode" tune option
    It does what it says it does.
Version r1190
  • Release Date: Jul 28, 2009
  • Fix two bugs in QPRD
    fprofile settings now actually fprofile QPRD.
    Don't use i_mbrd before initializing it.
Version r1189
  • Release Date: Jul 27, 2009
  • r1189: Fix 10l in QPRD
    Trellis used wrong lambda with trellis=1
  • r1188: Fix a nondeterminism with threads and subme>7
    Also add a few more checks to eliminate the need for spel_border.
  • r1187: Add QPRD support as subme=10
    Refactor trellis lambda selection to be done in analyse_init instead of in trellis.
    This will allow for easier adaptation of lambda later on; for now it allows constant lambda across variable QPs.
    QPRD is only available with adaptive quantization enabled and generally improves SSIM and visual quality.
    Additionally, weight the SSD values from RD based on the relative QP offset for chroma; helps visually at high QPs where chroma has a lower QP than luma.
    This fixes some visual artifacts created by QPRD at high QPs.
    Note that this generally hurts PSNR and SSIM, and so is only on when psy-RD is on.
  • r1186: SSSE3 cachesplit workaround for avg2_w16
    Palignr-based solution for the most commonly used qpel function.
    1-1.5% faster overall on Core 2 chips.
Version r1185
  • Release Date: Jul 23, 2009
  • shut up valgrind warnings in trellis
Version r1184
  • Release Date: Jul 20, 2009
  • New AQ algorithm option
    "Auto-variance" uses log(var)^2 instead of log(var) and attempts to adapt strength per-frame.
    Generates significantly better SSIM; on by default with --tune ssim.
    Whether it generates visually better quality is still up for debate.
    Available as --aq-mode 2.
Version r1183
  • Release Date: Jul 17, 2009
  • r1183: Cacheline-split SSSE3 chroma MC
    ~70% faster chroma MC on 32-bit Conroe
    Also slightly faster SSSE3 intra_sad_8x8c
  • r1182: Improve documentation of qp/crf options
Version r1181
  • Release Date: Jul 11, 2009
  • r1181: Merge array_non_zero into zigzag_sub
    Faster lossless, cleaner code.
    SSSE3 version of zigzag_sub_4x4_field, faster lossless interlaced coding.
  • r1180: Fix bug in reference frame autoadjustment
    For some types of input file, x264 did the adjustment before width/height were known.
Version r1179
  • Release Date: Jul 9, 2009
  • r1179: Fix fprofile settings to match changes in defaults
    Also add b-adapt 2 to fprofile.
  • r1178: Slightly faster dequant_flat assembly
    Eliminate some redundant shifts.
  • r1177: Totally new preset system for x264.c (not libx264), new defaults
    Other new features include "tune" and "profile" settings; see --help for more details.
    Unlike most other settings, "preset" and "tune" act before all other options.
    However, "profile" acts afterwards, overriding all other options.
    Our defaults have also changed: new defaults are --subme 7 --bframes 3 --8x8dct --no-psnr --no-ssim --threads auto --ref 3 --mixed-refs --trellis 1 --weightb --crf 23 --progress.
    Users will hopefully find these changes to greatly improve usability.
  • r1176: Update Gabriel's email address in AUTHORS
  • r1175: Early termination for chroma encoding
    Faster chroma encoding by terminating early if heuristics indicate that the block will be DC-only.
    This works because the vast majority of inter chroma blocks have no coefficients at all, and those that do are almost always DC-only.
    Add two new helper DSP functions for this: dct_dc_8x8 and var2_8x8. mmx/sse2/ssse3 versions of each.
    Early termination is disabled at very low QPs due to it not being useful there.
    Performance increase is ~1-2% without trellis, up to 5-6% with trellis=2.
    Increase is greater with lower bitrates.
  • r1174: Fix bug in checkasm
    frame_init_lowres_core check didn't check the C plane.
    However, all x86 and PPC assembly was correct regardless of the unit test being incorrect.
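The r1177 entry describes an ordering: "preset" and "tune" act before all other options, while "profile" acts afterwards and overrides them. A minimal sketch of that layering follows. The default values are taken from the r1177 note; the preset, tune, and profile contents are illustrative placeholders, not x264's real tables, and `resolve_settings` is a hypothetical name.

```python
def resolve_settings(defaults, preset, tune, user_options, profile):
    """Apply option layers in order; later layers override earlier ones."""
    settings = dict(defaults)
    for layer in (preset, tune, user_options, profile):
        settings.update(layer)
    return settings


# Defaults listed in the r1177 changelog entry (subset).
defaults = {"subme": 7, "bframes": 3, "8x8dct": True, "trellis": 1}

# Placeholder layers, for illustration only.
preset = {"subme": 2, "trellis": 0}          # hypothetical fast preset
tune = {}                                    # no tune selected
user_options = {"subme": 9, "bframes": 5}    # explicit command-line options
profile = {"bframes": 0, "8x8dct": False}    # hypothetical restrictive profile

result = resolve_settings(defaults, preset, tune, user_options, profile)
```

In this sketch the explicit `subme` wins over the preset, but the profile still forces `bframes` and `8x8dct` regardless of what the user asked for.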
Version r1173
  • Release Date: Jun 26, 2009
  • r1173: Add subpartition cost for sub-8x8 blocks
    Improves sub-p8x8 mode decision.
  • r1172: Yet more CABAC and CAVLC optimizations
    Also clean up a lot of pointless code duplication in CAVLC MV coding.
Version r1171
  • Release Date: Jun 23, 2009
  • Various CABAC optimizations and cleanups
  • Faster CABAC CBF context calculation for inter blocks.
  • Add x264_constant_p(), will probably be useful in the future as well.
  • Simpler subpartition functions.
  • Clean up and optimize mvd_cpn a bit more.
  • Various other minor optimizations.
Version r1170
  • Release Date: Jun 21, 2009
  • AltiVec version of frame_init_lowres_core. 22.4x faster than C on PPC7450 and 25x on PPC970MP.
Version r1169
  • Release Date: Jun 20, 2009
  • r1169: MMX CABAC mvd sum calculation
    Faster CABAC mvd coding.
  • r1168: Faster MV prediction
    Smaller code size, plus I get to use goto.
  • r1167: Fix potential crash in checkasm
    ssim_end4_sse2 requires aligned sums
  • r1166: SSSE3, faster SSE2/MMX integral_init4v
    The real reason I wrote this was an excuse to use shufpd.
Version r1165
  • Release Date: Jun 11, 2009
  • r1165: configure check for uclinux
  • r1164: fix a crash on frame width <= 48 pixels
Version r1163
  • Release Date: May 29, 2009
  • r1163: configure check for cc, rather than reporting lack of compiler as an asm error.
    configure check for -mno-cygwin, since it's removed from gcc4.
  • r1162: a better way to keep track of mv candidates.
    2-4% faster dia, hex, and umh.
  • r1161: reorder some motion estimation patterns.
    this change is useless on its own, but segregates the bitstream-changing part out of my next optimization.
Version r1160
  • Release Date: May 27, 2009
  • Fix VBV warning broken in r915
    x264 will now correctly warn about maxrate specified without bufsize even when a level is not set.
Version r1159
  • Release Date: May 25, 2009
  • r1159: configure check for ssse3-capable binutils
  • r1158: Fix 10L in r1155
    Broke --me esa/tesa due to forgetting to add handling for x264_cost_mv_fpel.
  • r1157: Fix bug where satd was incorrectly used with subme<=1
    Faster subme<=1 with i4x4 enabled.
  • r1156: Remove some pointless error handling code in cabac/cavlc
  • r1155: Save some memory on mv cost arrays
    Have quantizers that use the same lambda share the same cost array.
  • r1154: Various CABAC and CAVLC optimizations
    Backport CAVLC partial-inlining early termination to CABAC (~2-4% faster CABAC residual coding)
Version r1153
  • Release Date: May 19, 2009
  • r1153: fix a race condition at the end of thread_input
  • r1152: Various trellis speed optimizations
  • r1151: Make i686 the default arch on x86_32
    Disabling asm will default to a generic arch.
    Also fix configure for gcc 4.4.
  • r1150: Faster signed golomb coding
    3% faster CAVLC RDO and bitstream writing.
  • r1149: Faster spatial direct MV prediction
    unroll/tweak col_zero_flag
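The r1150 entry does not describe how signed Golomb coding was sped up. For readers unfamiliar with the code being written, the sketch below shows the plain signed Exp-Golomb mapping as H.264 defines it, using bit strings for clarity. It is a reference illustration only, not x264's optimized routine, and the function names `ue` and `se` simply mirror the descriptors used in the standard.

```python
def ue(code_num):
    """Unsigned Exp-Golomb: binary of (code_num + 1), prefixed by len-1 zeros."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    bits = bin(code_num + 1)[2:]
    return "0" * (len(bits) - 1) + bits


def se(value):
    """Signed Exp-Golomb: positive v maps to 2v - 1, non-positive v to -2v."""
    code_num = 2 * value - 1 if value > 0 else -2 * value
    return ue(code_num)


codes = {v: se(v) for v in (0, 1, -1, 2, -2)}
```

Small magnitudes get the shortest codes, which is why this coding suits values such as motion vector differences that cluster around zero.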
Version r1148
  • Release Date: May 10, 2009
  • r1148: More CABAC and CAVLC optimizations
    Simplified function calling for block_residual_write_(cabac|cavlc) and improved sigmap coding.
    Tried making 0/1-bit specific versions of CABAC asm, but benefit was minimal under GCC 4.3.
    Helped a decent bit under 3.4, but you shouldn't be using such old versions anyways.
  • r1147: Various optimizations in frametype lookahead
  • r1146: Some cosmetics/cleanup
    Move some macros to x86util.asm that should have been there to begin with.
    Fix a typo that didn't cause any issues.
Version r1145
  • Release Date: Apr 22, 2009
  • r1145: fix "incompatible types in initialization" compilation issues with GCC 4.3 (which is stricter than previous compiler version)
  • r1144: fix "conversions between vectors with differing element types or numbers of subparts" errors
Version r1143
  • Release Date: Apr 20, 2009
  • r1143: Add "coded blocks" stat to output information.
    This measures the total percentage of blocks, intra and inter, which have nonzero coefficients.
    "y,uvDC,uvAC" refers to luma, chroma DC, and chroma AC blocks.
    Note that skip blocks are included in this stat.
  • r1142: Enable asm predict_8x8_filter
    I'm not entirely sure how this snuck its way out of holger's intra pred patch.
  • r1141: Remove various bits of dead code found by CLANG.
Version r1140
  • Release Date: Apr 15, 2009
  • Slightly faster SSE4 SA8D, SSE4 Hadamard_AC, SSE2 SSIM
    shufps is the most underrated SSE instruction on x86.
Version r1139
  • Release Date: Apr 10, 2009
  • r1139: Various CABAC optimizations
    Move calculation of b_intra out of the core residual loop and hardcode it where applicable.
    Inlining cabac_mb_mvd was unnecessary and wasted tremendous amounts of code size. Inlining only cache_mvd is faster and significantly smaller.
  • r1138: CAVLC optimizations
    faster bs_write_te, port CABAC context selection optimization to CAVLC.
Version r1137
  • Release Date: Apr 7, 2009
  • Faster CABAC RDO
    Since the bypass case is quite unlikely, especially when doing merged sigmap/level coding, it's faster to use a branch than a cmov.
Version r1136
  • Release Date: Apr 5, 2009
  • Activate intra_sad_x3_8x8c in lookahead
Version r1134
  • Release Date: Apr 1, 2009
  • r1134: intra_sad_x3_8x8 assembly
  • r1133: intra_sad_x3_4x4 assembly
  • r1132: intra_sad_x3_8x8c assembly
    Also fix intra_sad_x3_16x16's use of "n" as a loop variable (broke SWAP)
  • r1131: Shave one instruction off CABAC encode_decision
    range_lps>>6 ranges from 4-7, so (range_lps>>6)-4 == (range_lps>>6) & 3
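The identity quoted for r1131 can be checked exhaustively. The sketch below assumes the range value is a 9-bit quantity with its top bit set (256 to 511), which is what makes the shifted value fall in 4 to 7; the helper names are hypothetical.

```python
def index_by_subtract(r):
    """Table index computed with a shift and a subtraction."""
    return (r >> 6) - 4


def index_by_mask(r):
    """Same index computed with a shift and a mask."""
    return (r >> 6) & 3


# Assumed domain: 9-bit range values with the top bit set.
domain = range(256, 512)
identity_holds = all(index_by_subtract(r) == index_by_mask(r) for r in domain)
indices_seen = sorted({index_by_mask(r) for r in domain})
```

Outside that domain the two expressions diverge (for example at 128, where the subtraction goes negative), so the trick depends on the range invariant holding.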
Version r1130
  • Release Date: Mar 27, 2009
  • Faster probe_skip
    Add a second chroma threshold after the DC transform.
Version r1129
  • Release Date: Mar 21, 2009
  • Add missing "static" qualifier to two arrays
    Should slightly improve performance.
Version r1128
  • Release Date: Mar 19, 2009
  • SSE2 zigzag_interleave
    Replace PHADD with FastShuffle (more accurate naming).
    This flag represents asm functions that rely on fast SSE2 shuffle units, and thus are only faster on Phenom, Nehalem, and Penryn CPUs.
Version r1127
  • Release Date: Mar 11, 2009
  • r1127: Faster integral_init
    palignr to avoid unaligned loads is worth it in inith, but not initv.
  • r1126: Faster SSSE3 hpel_filter_v
    ~10% faster hpel_filter on 64-bit Penryn.
    32-bit version by Jason Garrett-Glaser.
Version r1125
  • Release Date: Mar 9, 2009
  • r1125: Faster SSE2 pixel_var
    Optimized using the DEINTB method from r1122. ~32% faster var_16x16 on Conroe.
  • r1124: SSSE3 hpel_filter_v
    Optimized using the same method as in r1122. Patch partially by Holger.
    ~8% faster hpel filter on 64-bit Nehalem
Version r1123
  • Release Date: Mar 7, 2009
  • r1123: Update some asm copyright headers
  • r1122: Vastly faster SATD/SA8D/Hadamard_AC/SSD/DCT/IDCT
    Heavily optimized for Core 2 and Nehalem, but performance should improve on all modern x86 CPUs.
    16x16 SATD: +18% speed on K8(64bit), +22% on K10(32bit), +42% on Penryn(64bit), +44% on Nehalem(64bit), +50% on P4(32bit), +98% on Conroe(64bit)
    Similar performance boosts in SATD-like functions (SA8D, hadamard_ac) and somewhat less in DCT/IDCT/SSD.
    Overall performance boost is up to ~15% on 64-bit Conroe.
  • r1121: Update x264 copyright date
Version r1120
  • Release Date: Mar 6, 2009
  • Remove pre-scenecut from fprofile commands as well
    Also add psy-trellis to fprofile
Version r1119
  • Release Date: Mar 4, 2009
  • r1119: Slightly faster 8x16 SAD on Penryn Core 2
    Same as MMX 8x16 cacheline SAD, but calls SSE2 8x16 SAD in non-cacheline case.
    Only Nehalem benefits from sizes smaller than 8x16, and Nehalem doesn't use cacheline functions, so no smaller versions are included.
  • r1118: Fix scenecut and VBV with videos of width/height <= 32
    Also remove an unused variable
  • r1117: Remove non-pre scenecut
    Add support for no-b-adapt + pre-scenecut (patch by BugMaster)
    Pre-scenecut was generally better than regular scenecut in terms of accuracy and regular scenecut didn't work in threaded mode anyways.
    Add no-scenecut option (scenecut=0 is now no scenecut; previously it was -1)
    Fix an incorrect bias towards P-frames near scenecuts with B-adapt 2.
    Simplify pre-scenecut code.
  • r1116: Add AltiVec version of hadamard_ac. 2.4x faster than the C version.
    Note that this implementation is pretty naive and should be improved
    by implementing what's discussed in this ML thread:
    date: Mon, Feb 2, 2009 at 6:58 PM
    subject: Re: [x264-devel] [PATCH] AltiVec implementation of hadamard_ac routines
Version r1115
  • Release Date: Feb 27, 2009
  • Fix regression in r1085
    Deblocking was very slightly incorrect with partitions=all.
    Bug found by BugMaster.
Version r1114
  • Release Date: Feb 17, 2009
  • Optimize neighbor CBP calculation and fix related regression
  • r1105 introduced array overflow in cbp handling
Version r1113
  • Release Date: Feb 14, 2009
  • Show FPS when importing a raw YUV file
Version r1112
  • Release Date: Feb 13, 2009
  • r1112: Windows 64-bit support
    A "make distclean" is probably required after updating to this revision.
  • r1111: Minor fixes and cosmetics
    Suppress a GCC warning, fix a non-problematic array overflow, one REP->REP_RET.
Version r1110
  • Release Date: Feb 12, 2009
  • fix 10l in 75b495f2723fcb77f
    Spare a vec_perm and a vec_mergeh through using a LUT of permutation vectors. (Guillaume Poirier)
Version r1109
  • Release Date: Feb 10, 2009
  • r1109: Spare a vec_perm and a vec_mergeh through using a LUT of permutation vectors.
  • r1108: Promote chroma planes to 16 byte alignment.
    This will allow simplifying vectors loads that can only load 16-bytes aligned data (such as AltiVec).
  • r1107: Fix 10L in intra pred
    Forgetting a %define resulted in SIGILL on 32-bit systems without SSE (e.g. Athlon XP).
Version r1106
  • Release Date: Feb 9, 2009
  • r1106: Add decimation in i16x16 blocks
    Up to +0.04db with CAVLC, generally a lot less with CABAC.
  • r1105: Much faster CABAC residual context selection
    Up to ~17% faster CABAC RDO, ~36% faster intra-only CABAC RDO.
    Up to 7% faster overall in extreme cases.
  • r1104: Faster coeff_last64 on 32-bit
  • r1103: More intra pred asm optimizations
    SSSE3 version of predict_8x8_hu
    SSE2 version of predict_8x8c_p
    SSSE3 versions of both planar prediction functions
    Optimizations to predict_16x16_p_sse2
    Some unnecessary REP_RETs -> RETs.
    SSE2 version of predict_8x8_vr by Holger.
    SSE2 version of predict_8x8_hd.
    Don't compile MMX versions of some of the pred functions on x86_64.
    Remove now-useless x86_64 C versions of 4x4 pred functions.
    Rewrite some of the x86_64-only C functions in asm.
Version r1102
  • Release Date: Feb 9, 2009
  • Speed-up mc_chroma_altivec by using vec_mladd cleverly, and unrolling.
    Also put width == 2 variant in its own scalar function because it's faster than a vectorized one.
Version r1101
  • Release Date: Feb 5, 2009
  • r1101: Merging Holger's GSOC branch part 2: intra prediction
    Assembly versions of most remaining 4x4 and 8x8 intra pred functions.
    Assembly version of predict_8x8_filter.
    A few other optimizations.
    Primarily Core 2-optimized.
  • r1100: 10l: fix compilation with GCC 4.3+
Version r1099
  • Release Date: Feb 4, 2009
  • r1099: Faster 8x8dct+CAVLC interleave
    Integrate array_non_zero with the CAVLC 8x8dct interleave function.
    Roughly 1.5-2x faster than the original separate array_non_zero method.
  • r1098: Measure CBP cost in i8x8 RD refinement
    ~0.02-0.05db PSNR gain at high quants in intra-only encoding, pretty small otherwise.
    Allows a small optimization in i8x8 encoding.
Version r1097
  • Release Date: Feb 2, 2009
  • Take advantage of saturated signed horizontal sum instructions in the variance computation epilogue, since no overflow can occur there to trigger saturation.
    Suggested by Loren Merritt
Version r1096
  • Release Date: Jan 31, 2009
  • Massive overhaul of nnz/cbp calculation
    Modify quantization to also calculate array_non_zero. PPC assembly changes by gpoirier.
    New quant asm includes some small tweaks to quant and SSE4 versions using ptest for the array_non_zero.
    Use this new feature of quant to merge nnz/cbp calculation directly with encoding and avoid many unnecessary calls to dequant/zigzag/decimate/etc.
    Also add new i16x16 DC-only iDCT with asm.
    Since intra encoding now directly calculates nnz, skip_intra now backs up nnz/cbp as well.
    Output should be equivalent except when using p4x4+RDO because of a subtlety involving old nnz values lying around.
    Performance increase in macroblock_encode: ~18% with dct-decimate, 30% without at CRF 25.
    Overall performance increase 0-6% depending on encoding settings.
Version r1095
  • Release Date: Jan 30, 2009
  • r1095: Add PowerPC support for "checkasm --bench", reading the time base register.
    This isn't ideal since the "time base" register runs at a fraction of the processor cycle speed, so the measurement isn't as precise as x86's rdtsc.
    It's better than nothing though...
  • r1094: fix detection of pthread and isfinite on OpenBSD
Version r1093
  • Release Date: Jan 29, 2009
  • r1093: remove $ECHON kludge, which broke on SunOS. bring back `gcc -MT`.
    remove auto-reconfigure on svn update, which has done nothing since we stopped using svn.
    fix $AS on sparc (was disabled by mmx check).
    fix --extra-asflags (was ignored).
    mark bash scripts as bash, not sh
    patch partly by Greg Robinson and Jugdish.
  • r1092: 1.6x faster satd_c (and sa8d and hadamard_ac) with pseudo-simd.
    60KB smaller binary.
  • r1091: Hack around a potential failure point in VBV
    pred_b_from_p can become absurdly large in static scenes, leading to rare collapses of quality with VBV+B-frames+threads.
    This isn't a final fix, but should resolve the problem in most cases in the meantime.
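The r1092 entry credits "pseudo-simd" for the faster C SATD without describing it. One common reading is SWAR: packing two 16-bit values into a single 32-bit integer so that one addition updates both. The sketch below illustrates that general idea only, under the assumption that neither lane overflows 16 bits; it is not taken from x264's source, and all names are hypothetical.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def pack(lo, hi):
    """Place two 16-bit values into the low and high halves of a 32-bit word."""
    return ((hi & MASK16) << 16) | (lo & MASK16)


def unpack(word):
    """Split a 32-bit word back into its (low, high) 16-bit lanes."""
    return word & MASK16, (word >> 16) & MASK16


def packed_add(a, b):
    """Add both lanes with one integer addition.

    Only valid while the low lane's sum stays below 2**16; otherwise the
    carry spills into the high lane.
    """
    return (a + b) & MASK32


total = unpack(packed_add(pack(100, 200), pack(7, 9)))
```

The appeal is that each arithmetic operation does the work of two, at the cost of having to guarantee that intermediate sums stay within a lane.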
Version r1090
  • Release Date: Jan 28, 2009
  • Much faster chroma encoding and other opts
    ~15% faster chroma encode by reorganizing CBP calculation and adding special-case idct_dc function, since most coded chroma blocks are DC-only.
    Small optimization in cache_save (skip_bp)
    Fix array_non_zero to not violate strict aliasing (should eliminate miscompilation issues in the future)
    Add in automatic substitutions for some asm instructions that have an equivalent smaller representation.
Version r1089
  • Release Date: Jan 27, 2009
  • add AltiVec implementation of x264_mc_copy_w16_aligned
Version r1088
  • Release Date: Jan 24, 2009
  • r1088: add AltiVec implementation of x264_pixel_var_16x16 and x264_pixel_var_8x8
  • r1087: add AltiVec 16 <-> 32 bits conversions macros
Version r1086
  • Release Date: Jan 20, 2009
  • Replace 16x16=>32 mul + pack + add by a simple 16x16=>16 multiply-add.
    Suggested by Loren.
Version r1085
  • Release Date: Jan 20, 2009
  • r1085: Eliminate support for direct_8x8_inference=0
    The benefit in the most extreme contrived situation was at most 0.001db PSNR, at the cost of slower decoding.
    As this option was basically useless, it was a waste of code and prevented some other useful optimizations.
    Remove some unused mc code related to sub-8x8 partitions.
    Small deblocking speedup when p4x4 is used.
    Also remove unused x264_nal_decode prototype from x264.h.
  • r1084: Add AltiVec and CPU numbers detection on OpenBSD.
Version r1083
  • Release Date: Jan 19, 2009
  • r1083: Add AltiVec implementation of predict_8x8c_p. 2.6x faster than scalar C.
  • r1082: Warn if direct auto wasn't set on the first pass
    And, if it wasn't, run direct auto as if it was the first pass, rather than simply forcing temporal direct mode on all frames.
    Also a small tweak to coeff_level_run asm.
Version r1081
  • Release Date: Jan 18, 2009
  • Changes the PowerPC ppccommon.h header so it no longer checks for a particular
    OS such as Linux but instead looks for HAVE_ALTIVEC_H being set.
    Fixes all *BSD/PowerPC builds.
Version r1080
  • Release Date: Jan 15, 2009
  • r1080: update x264_hpel_filter_altivec's prototype to match the one of the C version.
    It changed in commit 045ae4045a1827555b3eaab4fbf3c9809e98c58f (factorization of mallocs)
    (NB: Altivec implementation wasn't allocating and writing to any scratch memory.)
  • r1079: rename vector+array unions to more closely match the vector typedef names.
  • r1078: Add Altivec implementation of all the remaining 16x16 predict routines.
Version r1077
  • Release Date: Jan 14, 2009
  • r1077: Cache ref costs and use more accurate MV costs
    New MV costs should improve quality slightly by improving the smoothness of the field of MV costs (and they're closer to CABAC's actual costs).
    Despite being optimized for CABAC, they still help under CAVLC, albeit less.
    MV cost change by Loren Merritt
  • r1076: Support forced frametypes with scenecut/b-adapt
    This allows an input qpfile to be used to force I-frames, for example.
    The same can be done through the library interface.
    Document the format of the qpfile in --longhelp and the forcing of frametypes in x264.h
    Note that forcing B-frames and B-refs may not always have the intended result.
    Patch partially by Steven Walters
  • r1075: Remove an IDIV from i8x8 analysis
    Only one IDIV is left in macroblock level code (transform_rd)
Version r1074
  • Release Date: Jan 9, 2009
  • Fix regression in r1066
    With some combinations of video width and other settings, the scratch buffer was slightly too small.
    This caused heap corruption on some systems.
Version r1073
  • Release Date: Jan 7, 2009
  • r1073: Disable B-frames in lossless mode
    They hurt compression anyways, and direct auto was bugged with lossless.
  • r1072: Factorize in ppccommon.h the conditional inclusion of altivec.h on Linux systems.
Version r1071
  • Release Date: Jan 2, 2009
  • r1071: Small tweaks to coeff asm
    Factor out a few redundant pxors
    Related cosmetics
  • r1070: Fix C99ism in r1066
Version r1069
  • Release Date: Jan 1, 2009
  • r1069: Use the correct strtok under MSVC
    Also change one malloc -> x264_malloc
  • r1068: Add stack alignment for lookahead functions
    Should allow libx264 to be called from non-gcc-compiled applications without adding force_align_arg_pointer.
  • r1067: Add support for SSE4a (Phenom) LZCNT instruction
    Significantly speeds up coeff_last and coeff_level_run on Phenom CPUs for faster CAVLC and CABAC.
    Also a small tweak to coeff_level_run asm.
  • r1066: factor mallocs out of hpel, ssim, and esa.
    there should now be no memory allocation outside of init-time.
Version r1065
  • Release Date: Dec 30, 2008
  • r1065: Much faster CAVLC RDO and bitstream writing
    Pure asm version of level/run coding. Over 2x faster than C.
    Up to 40% faster CAVLC RDO. Overall benefit up to ~7.5% with RDO or ~5% with fast encoding settings.
  • r1064: Cosmetics: cleaner syntax for defining temporary registers in asm
    Globally define t#[qdwb], so that only t# needs to be locally defined when reorganizing registers
Version r1063
  • Release Date: Dec 29, 2008
  • Much faster CABAC RDO
    Since RDO doesn't care about what order bit costs are calculated, merge sigmap and level coding into the same loop in RDO.
    This is bit-exact for 4x4dct but slightly incorrect for 8x8dct due to the sigmap containing duplicated contexts.
    However, the PSNR penalty of this is extremely small (~0.001db).
    Speed benefit is about 15% in 4x4dct and 30% in 8x8dct residual bit cost calculation at QP20.
    Overall encoding speed benefit is up to 5%, depending on encoding settings.
    Also remove an old unnecessary CABAC table that hasn't been used for years.
Version r1062
  • Release Date: Dec 27, 2008
  • VLC table optimizations
  • Slightly reorganize VLC tables for ~2% faster block_residual_write_cavlc.
  • Also a small optimization in p8x8 CAVLC.
Version r1061
  • Release Date: Dec 25, 2008
  • r1061: Fix crash in --me esa/tesa introduced in r1058
    Also suppress the last mingw warning message
  • r1060: Optimize variance asm + minor changes
    Remove SAD argument from var, not needed anymore.
    Speed up var asm a bit by eliminating psadbw and instead HADDWing at end.
    Eliminate all remaining warnings on gcc 3.4 on cygwin
    Port another minor optimization from lavc (pskip)
  • r1059: Minor CABAC cleanups and related optimizations
    Merge the two list tables to allow cleaner MC/CABAC/CAVLC code
    Remove lots of unnecessary {s
    Port some very minor opts from lavc
  • r1058: faster ESA init
    reduce memory if using ESA and not p4x4
Version r1057
  • Release Date: Dec 16, 2008
  • r1057: More macroblock_cache optimizations
    Patch partially by Loren Merritt
  • r1056: Faster macroblock_cache_rect
    Explicit loop unrolling
Version r1055
  • Release Date: Dec 15, 2008
  • Optimizations in predict_mv_direct
    Add some early terminations and minor optimizations
    This change may also fix the extremely rare direct+threading MV bug.
Version r1054
  • Release Date: Dec 15, 2008
  • Fix visual corruption when picture width was not mod 32.
    The previous Altivec implementation of mc_chroma assumed that i_src_stride was always mod 16.
Version r1053
  • Release Date: Dec 14, 2008
  • r1053: Add support for FSF GCC version >= 4.3 on OSX.
    So far, only Apple GCC version was supported.
  • r1052: More accurate refcost for p8x8 CAVLC
    Slightly better quality, especially in non-RD mode, with CAVLC.
Version r1051
  • Release Date: Dec 12, 2008
  • r1051: use lookup tables instead of actual exp/pow for AQ
    Significant speed boost, especially on CPUs with atrociously slow floating point units (e.g. Pentium 4 saves 800 clocks per MB with this change).
    Add x264_clz function as part of the LUT system: this may be useful later.
    Note this changes output somewhat as the numbers from the lookup table are not exact.
  • r1050: Suppress saveptr warnings on Windows GCC
  • r1049: More small speed tweaks to macroblock.c
  • r1048: Much faster CAVLC residual coding
    7 due to different nonzero counts being stored during qpel RD.
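The r1051 entry above replaces exp/pow calls with table lookups. A minimal sketch of the general idea, not x264's actual tables: split the exponent into integer and fractional parts, look up the fractional power of two in a small precomputed table, and scale by the integer power. The table size, the names `exp2_lut` and `clz32`, and the truncating index are assumptions made for illustration. As the entry notes, the result is approximate rather than exact.

```python
import math

FRAC_BITS = 6                       # assumed table resolution: 64 entries
TABLE_SIZE = 1 << FRAC_BITS
# Precompute 2**(i/64) for the fractional part of the exponent.
EXP2_TABLE = [2.0 ** (i / TABLE_SIZE) for i in range(TABLE_SIZE)]

def exp2_lut(x: float) -> float:
    """Approximate 2**x using a table for the fractional part (hypothetical helper)."""
    ipart = math.floor(x)
    # Truncate to a table index; clamp in case x - ipart rounds up to 1.0.
    frac_index = min(TABLE_SIZE - 1, int((x - ipart) * TABLE_SIZE))
    return math.ldexp(EXP2_TABLE[frac_index], ipart)

def clz32(v: int) -> int:
    """Count leading zero bits of a 32-bit unsigned value (hypothetical helper)."""
    assert 0 <= v < (1 << 32)
    return 32 - v.bit_length()
```

With 64 entries and a truncating index, the relative error stays below about 1.1%, which illustrates why output changes slightly when such a table replaces the exact functions.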
Version r1047
  • Release Date: Dec 6, 2008
  • r1047: fix compilation with GCC-4.3+
  • r1046: High Profile allows 25% higher maxbitrate/cpb
    Correct level detection to take this into account.
Version r1046
  • Release Date: Dec 1, 2008
  • High Profile allows 25% higher maxbitrate/cpb
    Correct level detection to take this into account.
Version r1045
  • Release Date: Nov 30, 2008
  • r1045: s/nasm/yasm in VS project file
  • r1044: Cosmetic: update various file headers.
  • r1043: add date and compiler to `x264 --version`
Version r1042
  • Release Date: Nov 29, 2008
  • r1042: 10L in r1041
  • r1041: Significantly faster CABAC and CAVLC residual coding and bit cost calculation
    Early-terminate in residual writing using stored nnz counts
    To allow the above, store nnz counts for luma and chroma DC
    Add assembly functions to find the last nonzero coefficient in a block
    Overall ~1.9% faster at subme9+8x8dct+qp25 with CAVLC, ~0.7% faster with CABAC
    Note this changes output slightly with CABAC RDO because it requires always storing correct nnz values during RDO, which wasn't done before in cases it wasn't useful.
    CAVLC output should be equivalent.
Version r1040
  • Release Date: Nov 27, 2008
  • r1040: dequant_4x4_dc assembly
    About 3.5x faster DC dequant on Conroe
  • r1039: fix an overflow in dct4x4dc_mmx
    (unlikely to have occurred in any real video)
Version r1038
  • Release Date: Nov 26, 2008
  • Remove nasm support
    Nasm won't correctly parse the SSE4 code introduced a few revisions ago, so we're removing support.
    Users should upgrade to yasm 0.6.1 or later.
Version r1037
  • Release Date: Nov 26, 2008
  • r1037: Fix rare warning messages in ratecontrol due to r1020
  • r1036: Fix MSVC compilation and clean up MSVC build file
    Remove Release64 which never worked anyways.
  • r1035: Faster width4 SSD+SATD, SSE4 optimizations
    Do satd 4x8 by transposing the two blocks' positions and running satd 8x4.
    Use pinsrd (SSE4) for faster width4 SSD
    Globally replace movlhps with punpcklqdq (it seems to be faster on Conroe)
    Move mask_misalign declaration to cpu.h to avoid warning in encoder.c.
    These optimizations help on Nehalem, Phenom, and Penryn CPUs.
  • r1034: fix indentation, whitespace cleanup, more consistent indentation of macro backslashes
  • r1033: Change some macros to be more sensitive to memory alignment, thus avoiding useless loads/stores and calculations of permutation vectors.
    Affected functions are all of mc_luma, mc_chroma, 'get_ref', SATD, SA8D and deblock.
    Overall gains vary from ~5% to 15%, depending on settings, on a 1.42 GHz G4.
Version r1032
  • Release Date: Nov 25, 2008
  • r1032: refactor satd. 20KB smaller binary.
    refactor sa8d. slightly faster.
    more checkasm for hadamard.
  • r1031: Fix crash with threads and SSEMisalign on Phenom
    Misalign mask needed to be set separately for each encoding thread.
Version r1030
  • Release Date: Nov 25, 2008
  • Phenom CPU optimizations
  • Faster hpel_filter by using unaligned loads instead of emulated PALIGNR
  • Faster hpel_filter on 64-bit by using the 32-bit version (the cost of emulated PALIGNR is high enough that the savings from caching intermediate values is not worth it).
  • Add support for misaligned_mask on Phenom: ~2% faster hpel_filter, ~4% faster width16 multisad, 7% faster width20 get_ref.
  • Replace width12 mmx with width16 sse on Phenom and Nehalem: 32% faster width12 get_ref on Phenom.
  • Merge cpu-32.asm and cpu-64.asm
  • Thanks to Easy123 for contributing a Phenom box for a weekend so I could write these optimizations.
Version r1029
  • Release Date: Nov 21, 2008
  • A few tweaks to decimate asm
  • A little bit faster on both 32-bit and 64-bit
Version r1028
  • Release Date: Nov 14, 2008
  • r1028: Nehalem optimization part 2: SSE2 width-8 SAD
    Helps a bit on Phenom as well
    ~25% faster width8 multiSAD on Nehalem
  • Add subme=0 (fullpel motion estimation only)
    Only for experimental purposes and ultra-fast encoding. Probably not a good idea for firstpass.
Version r1026
  • Release Date: Nov 11, 2008
  • r1026: Fix minor memory leak in r1022
  • r1025: r1024 borked checkasm
    Remove idct/dct2x2 from checkasm as they are no longer in dctf
Version r1024
  • Release Date: Nov 11, 2008
  • r1024: Faster chroma encoding
    9-12% faster chroma encode.
    Move all functions for handling chroma DC that don't have assembly versions to macroblock.c and inline them, along with a few other tweaks.
  • r1023: Various cosmetics and minor fixes
    Disable hadamard_ac sse2/ssse3 under stack_mod4
    Fix one MSVC compilation warning
    Fix compilation in debug mode in certain cases on x64
    Remove eval.c from MSVC project
    Fix crash when VBV is used in CQP mode
    Patches by MasterNobody
  • r1022: Faster b-adapt + adaptive quantization
    Factor out pow to be only called once per macroblock. Speeds up b-adapt, especially b-adapt 2, considerably.
    Speed boost is as high as 24% with b-adapt 2 + b-frames 1
  • r1021: Faster CABAC residual encoding
    6% faster block_residual_write_cabac in RD mode.
  • r1020: Fix potential crash in the case that the input statsfile is too short
    Also resolve various other potential weirdness (such as multiple copies of the same error message in threaded mode).
Version r1019
  • Release Date: Nov 6, 2008
  • r1019: Initial Nehalem CPU optimizations
    movaps/movups are no longer equivalent to their integer equivalents on the Nehalem, so that substitution is removed.
    Nehalem has a much lower cacheline split penalty than previous Intel CPUs, so cacheline workarounds are no longer necessary.
    Thanks to Intel for providing Avail Media with the pre-release Nehalem CPU needed to prepare these (and other not-yet-committed) optimizations.
    Overall speed improvement with Nehalem vs Penryn at the same clock speed is around 40%.
  • r1018: Fix potential infinite loop in VBV under GCC 4.
  • r1017: Encoder_reconfig: esa/tesa can only be enabled if they were on to begin with
    Bug report by kemuri-_9.
Version r1016
  • Release Date: Nov 1, 2008
  • Please refer to this page for a full list of changes
Version r999
  • Release Date: Oct 3, 2008
  • r999: rm gtk, avc2avi.
    I don't remember why I allowed a gui into the repository in the first place. There's nothing that makes this one special relative to all the other x264 guis.
    avc2avi doesn't compile since we removed the bitstream reader. And avc doesn't belong in avi.
  • r998: Resolve quality regression in r996
    Accidentally removed the wrong line of code. I think this classifies as a "10l".
    Thanks to techouse for initial bug report and skystrife for helping me find it.
  • r997: Fix minor memory leak accidentally added with the addition of b-adapt 2
Version r996
  • Release Date: Oct 2, 2008
  • Rework subme system, add RD refinement in B-frames
    The new system is as follows: subme6 is RD in I/P frames, subme7 is RD in all frames, subme8 is RD refinement in I/P frames, and subme9 is RD refinement in all frames.
    subme6 == old subme6, subme7 == old subme6+brdo, subme8 == old subme7+brdo, subme9 == no equivalent
    --b-rdo has, accordingly, been removed. --bime has also been removed, and instead enabled automatically at subme >= 5.
    RD refinement in B-frames (subme9) includes both qpel-RD and an RD version of bime.
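The renumbering described in r996 can be written down as a small lookup. This is only a restatement of the entry above as data; the names `NEW_SUBME`, `OLD_TO_NEW`, `translate_subme`, and `bime_enabled` are made up for illustration and do not exist in x264.

```python
# New subme levels introduced in r996, as described in the entry above.
NEW_SUBME = {
    6: "RD in I/P frames",
    7: "RD in all frames",
    8: "RD refinement in I/P frames",
    9: "RD refinement in all frames",
}

# (old subme, old --b-rdo flag) -> new subme, per the equivalences listed above.
# New subme 9 has no old equivalent, so nothing maps to it.
OLD_TO_NEW = {
    (6, False): 6,
    (6, True): 7,
    (7, True): 8,
}

def translate_subme(old_subme: int, b_rdo: bool) -> int:
    """Map a pre-r996 (subme, b-rdo) pair to the new subme level.

    Raises KeyError for combinations the changelog does not list.
    """
    return OLD_TO_NEW[(old_subme, b_rdo)]

def bime_enabled(subme: int) -> bool:
    """--bime was removed; per the entry it is now implied at subme >= 5."""
    return subme >= 5
```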
Version r995
  • Release Date: Sep 29, 2008
  • Fix potential miscompilation of some inline asm
    Caused problems under some gcc 4.x versions with predictive lossless
Version r994
  • Release Date: Sep 29, 2008
  • r994: Replace High 4:4:4 profile lossless with High 4:4:4 Predictive.
    This improves lossless compression by about 4-25% depending on source.
    The benefit is generally higher for intra-only compression.
    Also add support for 8x8dct and i8x8 blocks in lossless mode; this improves compression very slightly.
    In some rare cases 8x8dct can hurt compression in lossless mode, but it's usually helpful, albeit marginally.
    Note that 8x8dct is only available with CABAC as it is never useful with CAVLC.
    High 4:4:4 Predictive replaced the previous profile in a 2007 revision to the H.264 standard.
    The only known compliant decoder for this profile is the latest version of CoreAVC.
    As I write this, JM does not actually correctly decode this profile.
    Hopefully this lack of support will soon change with this commit, as x264 will be (to my knowledge) the first compliant encoder.
  • r993: Fix typo in progress indicator when using piped input
Version r992
  • Release Date: Sep 28, 2008
  • r992: avg_weight_ssse3
  • r991: fix bitstream writer on bigendian 64bit (regression in r903)
  • r990: remove authors whose code no longer exists
  • r989: more diagnostics when configure finds an unsuitable assembler
Version r988
  • Release Date: Sep 27, 2008
  • Make x264 progress indicator more concise
    Now the % indicator should be readable on the header of a minimized window on Windows systems.
Version full rev. 987
  • Release Date: Sep 23, 2008
  • Fix deblocking + threads + AQ bug
    At low QPs, with threads and deblocking on, deblocking could be improperly disabled.
    Revision in which this bug was introduced is unknown; it may be as old as b_variable_qp in x264 itself.
Version full rev. 986
  • Release Date: Sep 22, 2008
  • Resolve possible crash in bime, improve the fix in r985
Version full rev. 985
  • Release Date: Sep 21, 2008
  • r985: Fix rare crash issue in b-adapt
    Regression *probably* in r979
  • r984: Merging Holger's GSOC branch part 1: hpel_filter speedups
  • r983: r980 borked weighted bime
  • r982: Disable I_PCM with psy-RD
    psy-RD seems to put the PCM threshold a bit lower than it should be, so PCM is now disabled under psy-RD.
  • r981: Merge avg and avg_weight
    avg_weight no longer has to be special-cased in the code; faster weightb
  • r980: Rewrite avg/avg_weight to take two source pointers
    This allows the use of get_ref instead of mc_luma almost everywhere for bipred
Version full rev. 979
  • Release Date: Sep 17, 2008
  • r979: Use low-resolution lookahead motion vectors as an extra predictor
    Improves quality considerably (0-5%) in 1pass/CRF mode, especially with lower --me values and complex motion.
    Reverses the order of lowres lookahead search to improve the usefulness of the extra predictors.
  • r978: Add missing free() for f_qp_offset in frame.c
Version full rev. 977
  • Release Date: Sep 17, 2008
  • r977: Correct misprediction of bitrate in threaded mode
    Improves bitrate accuracy in cases with large numbers of threads.
    Loosely based on a patch by BugMaster.
  • r976: Fix a case in which VBV underflows can occur
    Fix a potential case where a frame might be initially allocated too low a QP, which would then have to be raised a lot during row-based ratecontrol.
    In some cases, this could even produce VBV underflows in 2pass mode.
  • r975: Use correct format specifier for uint64_t
Version full rev. 974
  • Release Date: Sep 16, 2008
  • Correct misprediction of bitrate in threaded mode
    Improves bitrate accuracy in cases with large numbers of threads.
    Loosely based on a patch by BugMaster.
Version full rev. 973
  • Release Date: Sep 16, 2008
  • r973: Fix regression in b-adapt patch: encoder_open failed for multipass encodes without bframes.
  • r972: Stop SAR in y4m input from overriding --sar on commandline
  • r971: hadamard_ac for psy-rd
    c version is 1.7x faster than satd+sa8d+sad
    ssse3 version is 2.3x faster than satd+sa8d+sad
  • r970: Psychovisually optimized rate-distortion optimization and trellis
    The latter, psy-trellis, is disabled by default and is reserved as experimental; your mileage may vary.
    Default subme is raised to 6 so that psy RD is on by default.
Version full rev. 969
  • Release Date: Sep 15, 2008
  • Add optional more optimal B-frame decision method
    This method (--b-adapt 2) uses a Viterbi algorithm somewhat similar to that used in trellis quantization.
    Note that it is not fully optimized and is very slow with large --bframes values.
    It also takes into account weightb, which should improve fade detection.
    Additionally, changes were made to cache lowres intra results for each frame to avoid recalculating them. This should improve performance in both B-frame decision methods.
    This can also be done for motion vectors, which will dramatically improve b-adapt 2 performance when it is complete.
    This patch also reads b_adapt and scenecut settings from the first pass so that the x264 header information in the output file will have correct information (since frametype decision is only done on the first pass).
Version full rev. 968
  • Release Date: Sep 14, 2008
  • Move adaptive quantization to before ratecontrol, eliminate qcomp bias
    This change improves VBV accuracy and improves bit distribution in CRF and 2pass.
    Instead of being applied after ratecontrol, AQ becomes part of the complexity measure that ratecontrol uses.
    This allows for modularity for changes to AQ; a new AQ algorithm can be introduced simply by introducing a new aq_mode and a corresponding if in adaptive_quant_frame.
    This also allows quantizer field smoothing, since quantizers are calculated beforehand rather than during encoding.
    Since there is no more reason for it, aq_mode 1 is removed. The new mode 1 is in a sense a merger of the old modes 1 and 2.
    WARNING: This change redefines CRF when using AQ, so output bitrate for a given CRF may be significantly different from before this change!
Version full rev. 967
  • Release Date: Sep 10, 2008
  • r967: Fix crash when using b-adapt at resolutions 32x32 or below.
    Original patch by BugMaster, but was mostly rewritten in order to make b-adapt actually *work* at such resolutions, not merely stop crashing.
  • r966: Add title-bar progress indicator under WIN32
    Also add bitrate-so-far output when piping data to x264 (total frames not known)
    Patch mostly by recover from Doom9.
Version full rev. 965
  • Release Date: Sep 7, 2008
  • Revert part of r963
    In some rare (but significant) cases, the optimized nal_encode algorithm gave incorrect results.
Version full rev. 964
  • Release Date: Sep 6, 2008
  • r964: Predict 4x4_DC asm
    Also remove 5-year-old unnecessary #define that reduced speed unnecessarily under MSVC-compiled builds
  • r963: Faster NAL unit encoding and remove unused nal_decode
    Small speedup at very high bitrates
  • r962: CAVLC cleanup and optimizations
    Also move some small functions in macroblock.c to a .h file so they can be inlined.
  • r961: Faster avg_weight assembly
    Unrolling the loop a bit improves performance
  • r960: Faster H asm intra prediction functions
    Take advantage of the H prediction method invented for merged intra SAD and apply it to regular prediction, too.
  • r959: Add merged SAD for i16x16 analysis
    Roughly 30% faster i16x16 analysis under subme=1
  • r958: Add sad_aligned for faster subme=1 mbcmp
    Distinguish between unaligned and aligned uses of mbcmp
    SAD_aligned, for MMX SADs, uses non-cacheline SADs.
Version full rev. 957
  • Release Date: Sep 3, 2008
  • Improve progress indicator
    Show average bitrate so far during encoding
    Decrease update interval for longer encodes (max of 10 frames encoded between updates)
Version full rev. 956
  • Release Date: Sep 2, 2008
  • Fix speed regression in r951
    Row SATDs are only necessary in VBV mode, so don't need to be checked if VBV is off.
Version full rev. 955
  • Release Date: Sep 1, 2008
  • r955: zigzag asm
  • r954: fix SOFLAGS used when building gtk frontend
    patch by Markus Kanet %darkvision A gmx P eu%
Version full rev. 953
  • Release Date: Aug 30, 2008
  • r953: remove the distinction between itex and ptex
    (changes 2pass statsfile format)
  • r952: hardcode the ratecontrol equation, and remove the rceq option
  • r951: Fix some uses of uninitialized row_satd values in VBV
    Resolves some issues with QP51 in I-frames with scenecut
Version full rev. 950
  • Release Date: Aug 27, 2008
  • r950: Activate trellis in p8x8 qpel RD
    Also clean up macroblock.c with some refactoring
    Note that this change significantly reduces subme7+trellis2 performance, but improves quality.
    Issue originally reported by Alex_W.
  • r949: Improve VBV accuracy
    Don't use the previous frame's row SATD as a predictor if it is too different from this frame's row SATD.
Version full rev. 948
  • Release Date: Aug 23, 2008
  • improve generation of Darwin libraries
    Patch by vmrsss %vmrsss A gmail P com%
Version full rev. 947
  • Release Date: Aug 22, 2008
  • r947: Fix compilation in gcc 3.4.x (issue in r946)
    Due to a bug in gcc 3.4.x, in certain cases of inlining, the array_non_zero_int_mmx inline assembly is miscompiled and causes a crash with --subme 7 --8x8dct.
    This minor hack fixes this issue.
  • r946: shut up various gcc warnings
  • r945: fix a crash with invalid args and --thread-input (introduced in r921)
  • r944: drop support for x86_32 PIC.
  • r943: use permute macros in satd
    move some more shared macros to x264util.asm
Version full rev. 942
  • Release Date: Aug 21, 2008
  • r942: cosmetics
  • r941: r940 broke threads
  • r940: Cleanups in macroblock_cache_save/load
    A bit more loop unrolling, and moving some constant code to the global init function
  • r939: Deblocking code cleanup and cosmetics
    Convert the style of the deblocking code to the standard x264 style
    Eliminate some trailing whitespace
Version full rev. 938
  • Release Date: Aug 19, 2008
  • 4% faster deblock: special-case macroblock edges
    Along with a bit of related code reorganization and macroification
Version full rev. 937
  • Release Date: Aug 17, 2008
  • Add dedicated variance function instead of using SAD+SSD
    Faster variance calculation
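A rough sketch of what a dedicated variance function can look like: one pass that accumulates the sum and the sum of squares, then combines them. Reading the old "SAD+SSD" approach as a SAD and an SSD against an all-zero block is an interpretation, and `block_variance` is a hypothetical name, not x264's function.

```python
def block_variance(pixels):
    """Population variance of a flat list of non-negative pixel values, in one pass."""
    n = len(pixels)
    total = 0      # equals a SAD against an all-zero block
    total_sq = 0   # equals an SSD against an all-zero block
    for p in pixels:
        total += p
        total_sq += p * p
    # var = E[x^2] - E[x]^2, written with a single division of the squared sum
    return (total_sq - total * total / n) / n
```

A flat block has zero variance, and `[1, 2, 3, 4]` gives 1.25.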
Version full rev. 936
  • Release Date: Aug 16, 2008
  • r936: 6% faster deblock: remove some clips, earlier termination on low qps.
  • r935: Faster deblocking
    Early termination for bS=0, alpha=0, beta=0
    Refactoring, various other optimizations
    About 30% faster deblocking overall.
Version full rev. 934
  • Release Date: Aug 12, 2008
  • r934: asm cosmetics
  • r933: yet another posix-emulating define on solaris
  • r932: update msvc projectfile
  • r931: drop support for msvc6
Version full rev. 930
  • Release Date: Aug 10, 2008
  • r930: Prevent VBV from lowering quantizer too much
    This code seemed to act up unexpectedly sometimes, creating a situation where in 1-pass VBV mode, a frame's quantizer would drop all the way to qpmin and then shoot back upwards to qpmax, causing serious visual issues.
    This change may decrease bitrate in VBV mode, but that is preferable to the artifacting produced by this code.
  • r929: Improve subme7 at low QPs and add subme7 support in lossless mode
Version full rev. 928
  • Release Date: Jul 31, 2008
  • r928: cosmetics: merge x86inc*.asm
  • r927: Add missing x264util.asm
  • r926: Basic sanity checking of qpmax/qpmin options
  • r925: Fix regression in r922
    set the chroma DC coefficients to zero for residual coding in qpel-rd
    fix C99ism
  • r924: Refactor asm macros part 2: DCT
  • r923: Refactor asm macros part 1: DCT
Version full rev. 922
  • Release Date: Jul 30, 2008
  • r922: Improve intra RD refine, speed up residual_write_cabac
    a do/while loop can be used for residual_write, but i8x8 had to be fixed so that it wouldn't call residual_write with zero coeffs
    proper nnz handling added to cabac intra rd refine
    chroma cbp added to 8x8 chroma rd
    cbp was tested, but wasn't useful
  • r921: Fix a few more minor memleaks
Version full rev. 920
  • Release Date: Jul 26, 2008
  • r920: stats summary: print distribution of numbers of consecutive B-frames
  • r919: add interlacing to the list of stuff checked by x264_validate_levels
Version full rev. 918
  • Release Date: Jul 25, 2008
  • r918: Fix C99-ism in r907
  • r917: Faster temporal predictor calculation
    Split into a separate commit because this changes rounding, and thus changes output slightly.
  • r916: Align lowres planes for improved cacheline split performance
Version full rev. 915
  • Release Date: Jul 21, 2008
  • autodetect level based on resolution/bitrate/refs/etc, rather than defaulting to L5.1
    if vbv is not enabled (and especially in crf/cqp), we have to guess max bitrate, so we might underestimate the required level.
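As an illustration of level autodetection, the sketch below picks the lowest level whose limits cover a stream. It is not x264's logic: the table is only a small subset of the H.264 Annex A level limits (with Baseline/Main bitrates), `pick_level` is a hypothetical name, and checks such as DPB size, reference count, and CPB size are left out.

```python
# (level, max macroblocks/second, max frame size in macroblocks, max bitrate in kbit/s)
# Subset of the H.264 Annex A level limits; bitrates are the Baseline/Main values.
LEVELS = [
    ("3",   40500,  1620,  10000),
    ("3.1", 108000, 3600,  14000),
    ("4",   245760, 8192,  20000),
    ("4.1", 245760, 8192,  50000),
    ("5.1", 983040, 36864, 240000),
]

def pick_level(width, height, fps, bitrate_kbps):
    """Return the lowest listed level that fits, or None if none does."""
    mb_w = (width + 15) // 16      # frame width in 16x16 macroblocks, rounded up
    mb_h = (height + 15) // 16
    frame_mbs = mb_w * mb_h
    mbps = frame_mbs * fps
    for name, max_mbps, max_fs, max_br in LEVELS:
        if frame_mbs <= max_fs and mbps <= max_mbps and bitrate_kbps <= max_br:
            return name
    return None
```

When the bitrate is only a guess, as in CRF or CQP mode, a low guess makes `bitrate_kbps <= max_br` pass too early, which is the underestimation risk the entry mentions.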
Version full rev. 914
  • Release Date: Jul 19, 2008
  • fix bs_write_ue_big for values >= 0x10000.
    (no immediate effect, since nothing writes such values yet)
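For context, ue(v) is the unsigned Exp-Golomb code used throughout the H.264 bitstream. The sketch below builds the code as a bit string so its length is easy to inspect; `ue_bits` is a hypothetical helper, not x264's bs_write_ue_big.

```python
def ue_bits(value: int) -> str:
    """Return the ue(v) Exp-Golomb code for a non-negative integer as a bit string."""
    if value < 0:
        raise ValueError("ue(v) encodes non-negative integers only")
    code = value + 1
    nbits = code.bit_length()
    # nbits - 1 leading zeros, followed by value + 1 in binary
    return "0" * (nbits - 1) + format(code, "b")
```

Small values stay short (`ue_bits(3)` is `00100`), while 0x10000 needs 33 bits, more than a single 32-bit write can hold. Whether that is the exact cause of the bug fixed here is an inference, not something the changelog states.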
Version full rev. 913
  • Release Date: Jul 17, 2008
  • Fix lossless mode borked in r901
Version full rev. 912
  • Release Date: Jul 13, 2008
  • r912: Relax QPfile restrictions
    Allow a QPfile to contain fewer frames than the total number of frames in the video and have ratecontrol fill in the rest.
    Patch by kemuri9.
  • r911: Limit MVrange correctly in interlaced mode
    Bug report by Sigma Designs, Inc.
Version full rev. 910
  • Release Date: Jul 12, 2008
  • r910: Fix bug with PCM and adaptive quantization
    In rare cases CABAC desync could occur, causing bitstream corruption
  • r909: Fix memory leak upon x264 closing
    Doesn't affect the CLI, but potentially important for programs which call x264 as a shared library.
  • r908: Fix compilation on PPC systems (borked in r903)
    Bigendian systems didn't have endian_fix32 defined
  • r907: Add L1 reflist and B macroblock types to x264 info
    Also remove display of "PCM" if PCM mode is never used in the encode.
    L1 reflist information will only show if pyramid coding is used.
Version full rev. 906
  • Release Date: Jul 11, 2008
  • r906: Fix and enable I_PCM macroblock support
    In RD mode, always consider PCM as a macroblock mode possibility
    Fix bitstream writing for PCM blocks in CAVLC and CABAC, and a few other minor changes to make PCM work.
    PCM macroblocks improve compression at very low QPs (1-5) and in lossless mode.
  • r905: de-duplicate vlc tables
  • r904: faster ue/se/te write
  • r903: faster bs_write
  • r902: cosmetics in ssd asm
Version full rev. 901
  • Release Date: Jul 7, 2008
  • r901: Various optimizations and cosmetics
    Update AUTHORS file with Gabriel and me
    update XCHG macro to work correctly in if statements
    Add new lookup tables for block_idx and fdec/fenc addresses
    Slightly faster array_non_zero_count_mmx (patch by holger)
    Eliminate branch in analyse_intra
    Unroll loops in and clean up chroma encode
    Convert some for loops to do/while loops for speed improvement
    Do explicit write-combining on --me tesa mvsad_t struct
    Shrink --me esa zero[] array
    Speed up bime by reducing size of visited[][][] array
  • r900: Resolve floating point exception with frame_init_lowres mmx
    In some cases, the mmx version of frame_init_lowres could leave the FPU uninitialized for use in ratecontrol, resulting in floating point exceptions.
    Since frame_init_lowres is such a time-consuming function, an emms was just put at the end, since it costs almost nothing compared to the total time of frame_init_lowres.
Version full rev. 899
  • Release Date: Jul 5, 2008
  • Update my email address
Version full rev. 898
  • Release Date: Jul 4, 2008
  • Update file headers throughout x264
    Update "Authors" lists based on actual authorship; highest is most important
    Update copyright notices and remove old CVS tags from file headers
    Add file headers to GTK and other sections missing them
    Update FSF address
    Other header-related cosmetics
Version full rev. 897
  • Release Date: Jul 3, 2008
  • r897: denoise_dct asm
  • r896: cosmetics in permutation macros
    SWAP can now take mmregs directly, rather than just their numbers
Version full rev. 895
  • Release Date: Jul 3, 2008
  • r895: Fix bug in adaptive quantization
    In some cases adaptive quantization did not correctly calculate the variance.
    Bug reported by MasterNobody
  • r894: lowres_init asm
    rounding is changed for asm convenience. this makes the c version slower, but there's no way around that if all the implementations are to have the same results.
  • r893: Optimizations and cosmetics in macroblock.c
    If an i4x4 dct block has no coefficients, don't bother with dequant/zigzag/idct. Not useful for larger sizes because the odds of an empty block are much lower.
    Cosmetics in i16x16 to be more consistent with other similar functions.
    Add an SSD threshold for chroma in probe_skip to improve speed and minimize time spent on chroma skip analysis.
    Rename lambda arrays to lambda_tab for consistency.
Version full rev. 892
  • Release Date: Jun 30, 2008
  • some asm functions require aligned stack. disable these when compiling with msvc/icc.
Version full rev. 891
  • Release Date: Jun 25, 2008
  • r891: Move bitstream end check to macroblock level
  • r891: Additionally, instead of silently truncating the frame upon reaching the end of the buffer, reallocate a larger buffer instead.
  • r890: Convert NNZ to raster order and other optimizations
  • r890: Converting NNZ to raster order simplifies a lot of the load/store code and allows more use of write-combining.
  • r890: More use of write-combining throughout load/save code in common/macroblock.c
  • r890: GCC has aliasing issues in the case of stores to 8-bit heap-allocated arrays; dereferencing the pointer once avoids this problem and significantly increases performance.
  • r890: More manual loop unrolling and such.
  • r890: Move all packXtoY functions to macroblock.h so any function can use them.
  • r890: Add pack8to32.
  • r890: Minor optimizations to encoder/macroblock.c
Version full rev. 889
  • Release Date: Jun 19, 2008
  • r889: mc_chroma_sse2/ssse3
  • r888: checkasm --bench=function_name
  • r887: interleave psnr/ssim computation with reference frame filtering, to improve cache coherency
Version full rev. 886
  • Release Date: Jun 16, 2008
  • r886: Add more inline asm and a runtime check for MMXEXT support
    x264 will now terminate gracefully rather than SIGILL when run on a machine with no MMXEXT support.
    A configure option is now available to build x264 without assembly support for support on such old CPUs as the Pentium 2, K6, etc.
  • r885: Use aligned memcpy for x264_me_t struct and cosmetics
  • r884: Cosmetics and loop unrolling
    GCC is not very good at loop unrolling in cases where it can perform constant propagation, so the unrolling unfortunately has to be done manually.
Version full rev. 883
  • Release Date: Jun 13, 2008
  • r883: Fix regression in 64-bit in r882
    i_mvc needs to be 64-bit when used with a 64-bit memory pointer
  • r882: More tweaks to me.c
  • r882: Added inline MMX version of UMH's predictor difference test
  • r882: Various cosmetics throughout me.c
  • r882: Removed a C99-ism introduced in r878.
Version full rev. 881
  • Release Date: Jun 12, 2008
  • Fix regression in r736
    r736 added intra RD refinement to B-frames; however, it is possible for subme=7 to be used without b-rdo.
    This means intra RD isn't run, and therefore it is possible for intra chroma analysis to not have been run, since update_cache was never called for an intra block, and chroma ME is not required even at subme=7.
    r801, which removed a memset, made this worse because previously the chroma prediction mode was at least initialized to zero; now it was not initialized at all.
    Therefore, --no-chroma-me, --subme 7, and no --b-rdo had the potential to crash.
    This change restricts intra RD refinement to only be run when --b-rdo is enabled (sensible to begin with), thus preventing a crash in this case.
Version full rev. 880
  • Release Date: Jun 11, 2008
  • r880: Fix regression in r850
    Bug resulted in rare incorrect chroma encoding
  • r879: Cosmetics in VBV handling
  • r878: Tweaks and cosmetics in me.c
    Use write-combining for predictor checking and other tweaks.
Version full rev. 877
  • Release Date: Jun 8, 2008
  • r877: Partially inline trellis quantization
    Inlining trellis into the 4x4/8x8 trellis wrappers increases trellis speed by about 5-10% through constant propagation.
  • r876: Various cosmetic changes.
  • r875: avg_weight_sse2
  • r874: many changes to which asm functions are enabled on which cpus.
    with Phenom, 3dnow is no longer equivalent to "sse2 is slow", so make a new flag for that.
    some sse2 functions are useful only on Core2 and Phenom, so make a "sse2 is fast" flag for that.
    some ssse3 instructions didn't become useful until Penryn, so yet another flag.
    disable sse2 completely on Pentium M and Core1, because it's uniformly slower than mmx.
    enable some sse2 functions on Athlon64 that always were faster and we just didn't notice.
    remove mc_luma_sse3, because the only cpu that has lddqu (namely Pentium 4D) doesn't have "sse2 is fast".
    don't print mmx1, sse1, nor 3dnow in the detected cpuflags, since we don't really have any such functions.
    likewise don't print sse3 unless it's used (Pentium 4D).
  • r873: enable ssse3 phadd satd on Penryn.
  • r872: benchmark most of the asm functions (checkasm --bench).
Version full rev. 871
  • Release Date: Jun 6, 2008
  • Cosmetic: fix C99-ism
Version full rev. 870
  • Release Date: Jun 6, 2008
  • Use a gaussian window for cplxblur
    Cplxblur was originally intended to use a gaussian window, but in its current form did not. This change provides a tiny improvement to 2pass ratecontrol.
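To illustrate what a gaussian window means in the r870 entry above, here is a minimal sketch of a Gaussian-weighted blur over a sequence of per-frame complexity values. The function name, the `sigma` default, and the 3-sigma truncation are assumptions made for illustration; this is not x264's ratecontrol code.

```python
import math

def gaussian_blur(values, sigma=2.0):
    """Blur a sequence with a normalized Gaussian window (hypothetical helper)."""
    radius = int(3 * sigma)  # assumption: truncate the window at 3 sigma
    out = []
    for i in range(len(values)):
        weight_sum = 0.0
        acc = 0.0
        for j in range(-radius, radius + 1):
            k = i + j
            if 0 <= k < len(values):
                # Weight falls off with the squared distance from frame i.
                w = math.exp(-(j * j) / (2.0 * sigma * sigma))
                weight_sum += w
                acc += w * values[k]
        # Normalize by the weights actually used, so edges are not darkened.
        out.append(acc / weight_sum)
    return out
```

A constant input stays constant, while an isolated spike is spread over its neighbors with weights that decrease smoothly with distance.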
Version full rev. 869
  • Release Date: Jun 4, 2008
  • r869: cosmetics
  • r868: nasm compatible NX stack
  • r867: CQP is incompatible with AQ
  • r866: memzero_aligned_mmx
  • r865: binmode stdin on mingw, not just msvc
  • r864: omit redundant mc after non-rdo dct size decision, and in b-direct rdo
  • r863: allow fractional CRF values with AQ.
  • r862: fix some uninitialized partitions in rdo
Version full rev. 861
  • Release Date: Jun 3, 2008
  • r861: 2-pass VBV support and improved VBV handling
    Dramatically improves 1-pass VBV ratecontrol (especially CBR) and provides support for VBV in 2-pass mode. This consists of a series of functions that attempts to find overflows and underflows in the VBV from the first-pass statsfile and fix them before encoding.
    1-pass VBV code partially by Dark Shikari.
  • r860: Fix noise reduction in threaded mode.
    Previously enabling noise reduction with threads had no effect.
    Note that this is not an optimal solution; each thread still tracks noise reduction separately (unlike in single-threaded mode).
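The r861 entry above describes scanning first-pass statistics for VBV overflows and underflows. A rough sketch of that detection step, assuming a simple leaky-bucket model with a constant number of bits arriving per frame, could look like this. All names and the model itself are hypothetical, and the step that fixes the violations is not shown.

```python
def find_vbv_violations(frame_bits, bufsize, bits_in_per_frame, init_fill):
    """Simulate a leaky-bucket buffer; return (underflow_frames, overflow_frames)."""
    fill = init_fill
    underflows, overflows = [], []
    for i, bits in enumerate(frame_bits):
        fill -= bits                 # the coded frame is removed from the buffer
        if fill < 0:
            underflows.append(i)     # frame was larger than what was buffered
            fill = 0
        fill += bits_in_per_frame    # the channel delivers more bits
        if fill > bufsize:
            overflows.append(i)      # buffer capacity exceeded
            fill = bufsize
    return underflows, overflows
```

For example, with sizes `[100, 900, 10, 10, 10, 10, 10]`, a 600-bit buffer, 200 bits arriving per frame and an initial fill of 500, frame 1 underflows and frames 4 to 6 overflow.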
Version full rev. 859
  • Release Date: May 21, 2008
  • r859: fix a crash on win32 with threads.
    r852 introduced an assumption in deblock that the stack is aligned.
  • r858: remove nasm version check. a feature check is all that's needed.
    silence stderr in yasm version check.
  • r857: cosmetics in cabac
  • r856: faster residual_write_cabac
  • r855: change DEBUG_DUMP_FRAME to run-time --dump-yuv
  • r854: x264_median_mv_mmxext
    this is the first non-runtime-detected use of mmxext, but it has to be inlined
  • r853: factor duplicated code out of deblock chroma mmx
  • r852: deblock_luma_intra_mmx
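For readers unfamiliar with what the median mv function added in r854 computes: H.264 motion vector prediction takes the component-wise median of three neighboring vectors. The plain-Python sketch below shows that operation; the helper names are illustrative, and it says nothing about how the mmxext version is implemented.

```python
def median3(a, b, c):
    """Median of three numbers using only min/max."""
    return max(min(a, b), min(max(a, b), c))

def median_mv(mv_a, mv_b, mv_c):
    """Component-wise median of three (x, y) motion vectors."""
    return (median3(mv_a[0], mv_b[0], mv_c[0]),
            median3(mv_a[1], mv_b[1], mv_c[1]))
```

The min/max formulation avoids branches, which is one reason this operation maps well to SIMD instructions.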
Version full rev. 851
  • Release Date: May 17, 2008
  • r851: write aspect ratio in mp4
  • r850: omit delta_quant in i16x16 blocks with no residual
    (all other block types were already covered, but i16x16 cbp is special)
  • r849: explicit write combining, because gcc fails at optimizing consecutive memory accesses
  • r848: force unroll macroblock_load_pic_pointers and a few other minor optimizations
  • r847: quant_2x2_dc_ssse3
  • r846: r836 borked lossless cabac nnz
Version full rev. 845
  • Release Date: May 15, 2008
  • r845: use elf instead of a.out on netbsd
  • r844: fix x264_realloc when not using libc realloc.
  • r843: don't pretend to support win64. remove all related code.
    it hasn't worked since probably some time in 2005, and won't ever be fixed unless someone steps up to maintain it.
  • r842: cosmetics: replace last instances of parm# asm macros with r#
  • r841: remove DEBUG_BENCHMARK
  • r840: faster probe_skip
Version full rev. 839
  • Release Date: Apr 28, 2008
  • r839: drop support for pre-SSE3 assemblers
  • r838: s/x264_cpu_restore/x264_emms/
    no point in giving it a generic name when it's not generic
  • r837: faster cabac_mb_cbp_luma
    ported from ffmpeg
  • r836: remove some redundant nnz counts
    move some nnz counts from macroblock_encode to cavlc if cabac doesn't need them
  • r835: compute missing nnz count in subme7 cavlc
  • r834: remove a division in macroblock-level bookkeeping
  • r833: omit P/B-skip mc from macroblock_encode if the pixels haven't been overwritten since probe_skip
  • r832: earlier termination in SEA if mvcost exceeds residual
  • r831: remove void* arithmetic from r821
Version full rev. 830
  • Release Date: Apr 26, 2008
  • r830: Fix define of illegal function identifiers (as defined in section "7.1.3 Reserved identifiers" of C99 spec)
  • r829: Fix define of illegal identifier (as defined in section "7.1.3 Reserved identifiers" of C99 spec) "__UNUSED__", and use the one defined in common/osdep.h, i.e. "UNUSED"
    based on a patch by Diego Biurrun
Version full rev. 828
  • Release Date: Apr 25, 2008
  • r828: more consistent include name (in line with other PPC includes)
  • r827: fix illegal identifiers in multiple inclusion guards
    patch by Diego Biurrun % diego A biurrun P de %
Version full rev. 826
  • Release Date: Apr 22, 2008
  • r826: AQ now treats perfectly flat blocks as low energy, rather than retaining previous block's QP.
  • r826: fixes occasional blocking in fades.
  • r825: checkasm cabac
  • r824: s/movdqa/movaps/g
  • r823: --asm to allow testing of different versions of asm without recompile
  • r822: copy left neighbor pixels directly from previous mb instead of main plane
Version full rev. 821
  • Release Date: Apr 17, 2008
  • r821: cacheline split workaround for mc_luma
  • r820: add "SECTION_RODATA" before "SECTION .text" to setup the fakegot label used in macho binaries.
    This fixes compilation with --enable-pic
    Requires Yasm 0.7.0 or newer
    Patch by Dave Lee % davelee P com A gmail P com %
Version full rev. 819
  • Release Date: Apr 15, 2008
  • more hpel fixes
Version full rev. 818
  • Release Date: Apr 12, 2008
  • r818: update msvc projectfile
  • r817: r810 borked hpel_filter_sse2 on unaligned buffers
Version full rev. 816
  • Release Date: Apr 10, 2008
  • r816: threads=auto on multicore now implies thread input, just like explicit thread numbers already did
  • r815: dct4 sse2
  • r814: faster x86_32 dct8
  • r813: macros to deal with macros that permute their arguments
  • r812: mmx cachesplit sad of non-square sizes checked height instead of width
  • r811: sfence after nontemporal stores
  • r810: simplify hpel filter asm (move control flow to C) and add sse2, ssse3 versions
  • r809: more mmx/xmm macros (mova, movu, movh)
Version full rev. 808
  • Release Date: Apr 1, 2008
  • r808: improve handling of cavlc dct coef overflows: support large coefs in high profile, and clip to allowed range in baseline/main
  • r807: fix shared libs on MacOSX
    based on a patch by İsmail Dönmez
  • r806: typo in r803
Version full rev. 805
  • Release Date: Mar 31, 2008
  • r805: fix a crash on mp4 muxing with invalid params
  • r804: variance-based psy adaptive quantization
  • r804: new options: --aq-mode --aq-strength
  • r804: AQ is enabled by default
  • r803: fix naming of .dll on mingw
  • r802: don't distinguish between mingw and cygwin
  • r801: remove a memset
  • r800: typo. don't evaluate rd pskip when p16x16 found ref>0
  • r799: r784 borked lossless dc zigzag
Version full rev. 798
  • Release Date: Mar 26, 2008
  • r798: fix an arithmetic overflow that disabled SEA threshold after finding a mv with SAD < mvcost.
  • r797: fix hpel_filter_altivec picked up by checkasm
    Patch by Manuel %maaanuuu A gmx.net % and Noboru Asai % noboru P asai A gmail P com %
Version full rev. 796
  • Release Date: Mar 25, 2008
  • r796: faster residual
  • r795: nasm doesn't like align(nop) in structs
  • r794: reduce the size of some cabac arrays
  • r793: use cabac context transition table from trellis in normal residual coding too
  • r792: rearrange cabac struct to reduce code size
Version full rev. 791
  • Release Date: Mar 25, 2008
  • r791: higher precision RD lambda
    improves quality at QP<=12.
  • r790: faster cabac_encode_ue_bypass
  • r789: cabac asm.
    mostly because gcc refuses to use cmov.
    28% faster than c on core2, 11% on k8, 6% on p4.
  • r788: cosmetics in cabac
  • r787: inline cabac_size_decision
Version full rev. 786
  • Release Date: Mar 23, 2008
  • r786: cosmetics in DECLARE_ALIGNED
  • r785: don't distinguish between luma4x4 and luma4x4ac
  • r784: faster lossless zigzag
  • r783: more alignment
Version full rev. 782
  • Release Date: Mar 22, 2008
  • r782: add tesa and lossless to fprofile
  • r781: cosmetics in residual_write
  • r780: remove unused bitstream reader
  • r779: cosmetics in quant asm
  • r778: special case dequant for flat matrix
Version full rev. 777
  • Release Date: Mar 21, 2008
  • r777: faster dequant
  • r776: simplify hpel_filter_c
  • r775: use x264_mc_copy_w16_sse2 in mc.copy, it was previously only in mc_luma
  • r774: new ssd_8x*_sse2
  • r774: align ssd_16x*_sse2
  • r774: unroll ssd_4x*_mmx
  • r773: update altivec zigzags
  • r772: r768 borked cavlc
Version full rev. 771
  • Release Date: Mar 20, 2008
  • r771: cosmetics in intra predict
  • r770: faster intra predict 8x8 hu/hd
  • r769: reduce zigzag arrays from int to int16_t
  • r768: reduce the size of some arrays
  • r767: skip intra pred+dct+quant in cases where it's redundant (analyse vs encode)
    large speedup with trellis=2, small speedup with trellis=0 and/or subme>=6
  • r766: cosmetics in asm
  • r765: satd_4x4_ssse3
  • r764: get_ref_sse2
Version full rev. 763
  • Release Date: Mar 19, 2008
  • r763: continue instead of crash when the threading mv constraint is violated.
    doesn't fix the underlying bug, but hopefully less annoying until we find it.
  • r762: remove remaining reference to clip1.h
  • r761: fix name mangling again.
    apparently it's not just a convention, dll build fails if you try to export a non-prefixed name.
  • r760: update msvc projectfile
  • r759: missing #ifdef HAVE_SSE3
  • r758: don't define offsetof since it's standard
  • r757: shut up gcc warning in offsetof
Version full rev. 756
  • Release Date: Mar 17, 2008
  • r756: increase alignment of mv arrays
  • r755: memcpy_aligned_sse2
  • r754: checkasm check whether callee-saved regs are correctly saved
    x86_32 only for now since x86_64 varargs are annoying
  • r753: fix x86_32 ads which failed to preserve a register
  • r752: fix some name mangling issues introduced by the merge
  • r751: remove x264_mc_clip1.
    it's wrong for sufficiently perverse inputs, and clip_uint8 is faster anyway.
  • r750: merge x86_32 and x86_64 asm, with macros to abstract calling convention and register names
Version full rev. 749
  • Release Date: Mar 12, 2008
  • git compatible version script
Version full rev. 748
  • Release Date: Mar 8, 2008
  • check for broken versions of yasm
Version full rev. 747
  • Release Date: Mar 7, 2008
  • Rev. 746: .gitignore
  • Rev. 747: increase the alignment of the i8x8 edge cache, needed for sse2 intra prediction.
    patch by Alexander Strange.
Version full rev. 745
  • Release Date: Mar 2, 2008
  • Rev. 745: pic macros now keep track of which register holds the GOT, so variable access doesn't have to care
  • Rev. 744: remove x86_64 predict_8x8_ddl_mmxext because sse2 is faster even on amd
  • Rev. 743: cosmetics in dsp init
  • Rev. 742: sse2 16x16 intra pred.
  • Rev. 742: port the remaining intra pred functions from x86_64 to x86_32.
    patch by Dark Shikari.
  • Rev. 742: some simplifications to mmx intra pred that should have been done way back when we switched to constant fdec_stride.
  • Rev. 742: and remove pic spills in functions that have a free caller-saved reg.
    patch partly by Dark Shikari.
  • Rev. 740: faster array_non_zero
  • Rev. 739: x86_32 sse2 idct8
    ported from ffmpeg by Dark Shikari
  • Rev. 738: checkasm: relax the threshold for floating-point ssim
  • Rev. 737: checkasm: test idct with the range of coefficients that can really be encountered, as opposed to random numbers which might overflow.
Version full rev. 736
  • Release Date: Jan 29, 2008
  • intra_rd_refine in B-frames
Version full rev. 735
  • Release Date: Jan 28, 2008
  • print average of macroblock QPs instead of frame's nominal QP
  • update date
  • remove colorspace conversion support, because it has no business in any codec
  • misc fixes in checkasm
  • remove a useless bit of me=umh (originally copied from JM, where it was used for something)
  • fix a memleak in cqm
  • fix a memleak in mkv muxer
    patch by saintdev
  • satd exhaustive motion search (--me tesa)
  • fix cabac context for nonzero delta_qp of the 2nd mb of a frame in interlaced mode
  • fix mapping of mvs to partitions in p4x4_chroma
    patch by Noboru Asai
  • fix mvp for b16x8 and b8x16 L1 search
    patch by Wei-Yin Chen
  • shave a couple cycles off cabac functions
  • faster and smaller x264_macroblock_cache_mv etc
  • configure test for endianness
Version full rev. 721
  • Release Date: Jan 18, 2008
  • change the meaning of --ref: it now selects DPB size (including B-frames), rather than L0 size (which B-frames are added to)
Version full rev. 720
  • Release Date: Jan 15, 2008
  • add / fix support for FreeBSD, based on a patch by Igor Mozolevsky % igor A hybrid-lab P co P uk %
Version full rev. 719
  • Release Date: Jan 10, 2008
  • shut up some valgrind warnings
  • slightly wrong memory allocation in r717, fixes a potential crash with merange>32
Version full rev. 717
  • Release Date: Jan 7, 2008
  • convert absolute difference of sums from mmx to sse2
  • convert mv bits cost and ads threshold from C to sse2
  • convert bytemask-to-list from C to scalar asm
    1.6x faster me=esa (x86_64) or 1.3x faster (x86_32).
    (times consider only motion estimation. overall encode speedup may vary.)
  • round esa range to a multiple of 4
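The "absolute difference of sums" (ADS) in the r717 entry above is the test behind successive elimination in exhaustive search: |sum(cur) - sum(ref)| can never exceed the SAD of the two blocks, so a candidate whose ADS already reaches the best cost can be skipped without computing its SAD. The sketch below shows the idea on flat lists of pixel values. The function names are hypothetical, and unlike the real encoder it ignores motion vector cost and sub-block sums.

```python
def sad(cur, ref):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(c - r) for c, r in zip(cur, ref))

def ads(cur, ref):
    """Absolute difference of sums: a cheap lower bound on sad(cur, ref)."""
    return abs(sum(cur) - sum(ref))

def best_candidate(cur, candidates):
    """Return (index, cost, number_of_sads_computed) for the best match."""
    best_idx, best_cost, sads_computed = -1, float("inf"), 0
    cur_sum = sum(cur)
    for idx, ref in enumerate(candidates):
        # If the lower bound cannot beat the best cost, skip the full SAD.
        if abs(cur_sum - sum(ref)) >= best_cost:
            continue
        sads_computed += 1
        cost = sad(cur, ref)
        if cost < best_cost:
            best_idx, best_cost = idx, cost
    return best_idx, best_cost, sads_computed
```

In practice the block sums of the reference frame can be precomputed once, so the bound costs far less per candidate than a full SAD.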
Version full rev. 715
  • Release Date: Jan 4, 2008
  • use define _WIN32 instead of __WIN32__ or WIN32 defines.
    MSDN reference: http://msdn2.microsoft.com/en-us/library/b0084kay(VS.80).aspx
    Patch by BugMaster %BugMaster A narod P ru%
    Original thread:
    date: Dec 27, 2007 3:18 AM
    subject: [x264-devel] VS2008 compilation error (need of replacement __WIN32__ with _WIN32)
Version full rev. 714
  • Release Date: Dec 21, 2007
  • tweak x264_pixel_sad_x4_16x16_sse2 horizontal sum. 168 -> 166 cycles on core2.
Version full rev. 713
  • Release Date: Dec 21, 2007
  • fix a nondeterminism involving 8x8dct, rdo, and threads.
Version full rev. 712
  • Release Date: Dec 14, 2007
  • also test arch-specific x264_zigzag_* implementations in checkasm.c
    Patch by Noboru Asai % noboru P asai A gmail P com%
Version full rev. 711
  • Release Date: Dec 11, 2007
  • Add AltiVec implementation of
    • x264_zigzag_scan_4x4_frame_altivec()
    • x264_zigzag_scan_4x4ac_frame_altivec()
    • x264_zigzag_scan_4x4_field_altivec()
    • x264_zigzag_scan_4x4ac_field_altivec()
    each around 1.3 to 1.8x faster than C version
    Patch by Noboru Asai % noboru P asai A gmail P com%
Version full rev. 710
  • Release Date: Dec 10, 2007
  • adds AltiVec implementation of predict_16x16_p(), over 4x faster than C version
Version full rev. 709
  • Release Date: Dec 7, 2007
  • revert the x86_32 part of r708. elf shared libraries aren't important enough to be worth the extra lines of code to check for nasm.
Version full rev. 708
  • Release Date: Dec 4, 2007
  • Rev. 708: mark asm functions as hidden
  • Rev. 707: check whether ld supports -Bsymbolic before using it
Version full rev. 706
  • Release Date: Dec 3, 2007
  • reduce the data type used in some tables. 16KB smaller exe.
Version full rev. 705
  • Release Date: Dec 2, 2007
  • Rev. 705: faster removal of duplicate mv predictors
  • Rev. 704: avoid a division in x264_mb_predict_mv_ref16x16.
    patch by Dark Shikari.
  • Rev. 703: avoid a division in umh.
    patch by Dark Shikari.
Version full rev. 702
  • Release Date: Nov 27, 2007
  • fix a memleak in h->mb.mvr
Version full rev. 701
  • Release Date: Nov 26, 2007
  • fix compilation as a shared library on x86_64 (regression in r696)
Version full rev. 700
  • Release Date: Nov 22, 2007
  • Rev. 700: add support for x86_64 on Darwin9.0 (Mac OS X 10.5, aka Leopard)
    Patch by Antoine Gerschenfeld %gerschen A clipper P ens P fr%
  • Rev. 699: cover some more options in fprofile. (esa, bime, cqm, nr, no-dct-decimate, trellis2) previously, esa was slower with fprofile than without, since gcc thought it wasn't important. now esa benefits like anything else.
Version full rev. 698
  • Release Date: Nov 21, 2007
  • Rev. 698: Add AltiVec implementation of x264_pixel_ssd_8x8, 3x faster than C version
    Overall speed-up: 0.7% with --bframes 3 --ref 5 -m 7 --b-rdo
    Patch by Noboru Asai %noboru P asai A gmail P com%
  • Rev. 697: limit mvs to [-512,511.75] instead of [-512,512]
  • Rev. 696: avoid memory loads that span the border between two cachelines.
    on core2 this makes x264_pixel_sad an average of 2x faster. other intel cpus gain various amounts. amd are unaffected.
    overall speedup: 1-10%, depending on how much time is spent in fullpel motion estimation.
  • Rev. 695: add cache info to cpu_detect. also print sse3.
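The r697 entry above tightens the motion vector limit from [-512,512] to [-512,511.75]. Assuming vectors are stored as integers in quarter-pixel units (as H.264 luma vectors are), those bounds become -2048 and 2047, which is exactly the range of a signed 12-bit integer. A small sketch, with constant and function names chosen for illustration only:

```python
QPEL = 4  # quarter-pixel units per full pixel

MV_MIN = -512 * QPEL         # -2048
MV_MAX = int(511.75 * QPEL)  # 2047

def clamp_mv(mv_qpel):
    """Clamp one motion vector component, given in quarter-pixel units."""
    return max(MV_MIN, min(MV_MAX, mv_qpel))
```

With the old upper bound of 512 pixels the largest value would have been 2048, one step outside that 12-bit range.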
Version full rev. 694
  • Release Date: Nov 20, 2007
  • Rev. 694: cosmetics: reorder mc_luma/mc_chroma/get_ref arguments for consistency with other functions
  • Rev. 693: separate pixel_avg into cases for mc and for bipred
Version full rev. 692
  • Release Date: Nov 19, 2007
  • Rev. 692: add AltiVec implementation of ssim_4x4x2_core, about 4x faster than C version.
    Overall: 0.1-0.2% faster with default encoding settings
    Patch by Noboru Asai %noboru P asai A gmail P com%
  • Rev. 691: Add AltiVec implementation of x264_hpel_filter. Provides a 10-11% overall speed-up with default encoding options
    Patch by Noboru Asai %noboru P asai A gmail P com%
Version full rev. 690
  • Release Date: Nov 18, 2007
  • cosmetics in dsp function selection
Version full rev. 689
  • Release Date: Nov 18, 2007
  • remove sad_pde. it's been unused ever since successive elimination replaced it.
Version full rev. 688
  • Release Date: Nov 17, 2007
  • Rev. 688: cosmetics: use symbolic constants for frame padding radius
  • Rev. 687: move hpel_filter cpu detection to a function pointer like everything else
Version full rev. 686
  • Release Date: Nov 16, 2007
  • cosmetics: use separate variables for frame width and stride
Version full rev. 685
  • Release Date: Nov 14, 2007
  • rev. 685: Add AltiVec implementation of add4x4_idct, add8x8_idct, add16x16_idct: 3.2x faster on average, 1.05x faster overall with default encoding options
    Patch by Noboru Asai % noboru DD asai AA gmail DD com %
  • rev. 684: add AltiVec implementation of dequant_4x4 and dequant_8x8, 2.8x faster than C, 1.01x faster than previous revision with default encoding options
    Patch by Noboru Asai % noboru DD asai AA gmail DD com %
Version full rev. 683
  • Release Date: Nov 13, 2007
  • Add AltiVec implementation of quant_2x2_dc, fix Altivec implementation of quant_(4x4|8x8)(|_dc) wrt current C implementation
    Patch by Noboru Asai % noboru DD asai AA gmail DD com %
Version full rev. 682
  • Release Date: Nov 2, 2007
  • fix a possible nondeterminism with me=umh + threads.
Version full rev. 681
  • Release Date: Oct 31, 2007
  • use hex instead of dia for rdo mv refinement. ~0.5% lower bitrate at subme=7.
    patch by Dark Shikari.
Version full rev. 680
  • Release Date: Sep 25, 2007
  • port sad_*_x3_sse2 to x86_64
  • don't overwrite pthread* namespace, because system headers might define those functions even if we don't want them
Version full rev. 678
  • Release Date: Sep 22, 2007
  • faster 4x4 sad
Version full rev. 677
  • Release Date: Sep 21, 2007
  • fix an arithmetic overflow in trellis at high qp.
Version full rev. 676
  • Release Date: Sep 16, 2007
  • implement multithreaded me=esa
Version full rev. 675
  • Release Date: Sep 13, 2007
  • fix some integer overflows. now vbv size can exceed 2 Gbit.
Version full rev. 674
  • Release Date: Sep 10, 2007
  • allow --vbv-init to take absolute values (in kbit), in addition to the previous fractions of vbv-bufsize.
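The r674 entry above means --vbv-init can be read two ways: as a fraction of vbv-bufsize or as an absolute amount in kbit. One plausible way to tell them apart is to treat values up to 1 as fractions and anything larger as kbit. That threshold, the clipping, and the function name are assumptions for this sketch, not a statement of how x264 decides.

```python
def vbv_init_fraction(vbv_init, vbv_bufsize_kbit):
    """Normalize a vbv-init value to a fraction of the buffer size."""
    if vbv_init <= 1.0:
        # Assumed: values up to 1 are already a fraction of the buffer.
        return max(0.0, vbv_init)
    # Assumed: larger values are absolute kbit; convert and cap at a full buffer.
    return min(1.0, vbv_init / vbv_bufsize_kbit)
```

Under these assumptions, `0.9` with a 1000 kbit buffer stays `0.9`, while `500` becomes `0.5`.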
Version full rev. 673
  • Release Date: Sep 9, 2007
  • remove a bashism
Version full rev. 672
  • Release Date: Sep 4, 2007
  • reorder headers so that largefile support is defined before the first copy of stdio
Version full rev. 671
  • Release Date: Aug 21, 2007
  • regression in r669: broke saving of configure args if make has to re-run configure
Version full rev. 670
  • Release Date: Aug 18, 2007
  • regression in r669: --enable-shared should imply --enable-pic on some archs.
Version full rev. 669
  • Release Date: Aug 14, 2007
  • Add a --host flag to allow overriding config.guess; this is particularly useful with a 64-bit kernel running a 32-bit userland to build 32-bit apps.
  • Normalize any host triplet into a quadruplet via config.sub.
  • Move option parsing before any use of architecture information.
  • Update config.guess.
Version full rev. 667
  • Release Date: Jul 18, 2007
  • mingw doesn't have strtok_r
  • move os/compiler specific defines to their own header
  • extend zones to support (some) encoding parameters in addition to ratecontrol.
Version full rev. 664
  • Release Date: Jul 7, 2007
  • cosmetics
Version full rev. 663
  • Release Date: Jun 29, 2007
  • limit vertical motion vectors to +/-512, since some decoders actually depend on that limit.
Version full rev. 662
  • Release Date: Jun 23, 2007
  • Add vertical and horizontal luma deblocking accelerated with Altivec, based on Graham Booker's code written for FFmpeg with slight modifications to re-use x264's macros
Version full rev. 661
  • Release Date: Jun 16, 2007
  • cosmetics in cpu detection
  • fix compilation without asm on x86_32 (r658 worked only on x86_64).
Version full rev. 659
  • Release Date: Jun 11, 2007
  • exempt 1080p from the non-mod16 warning
Version full rev. 658
  • Release Date: Jun 6, 2007
  • allow compiling without yasm/nasm on x86 and x86-64 platforms
  • updated MS VC8/VC7 build, patch by Gabriel Bouvigne
Version full rev. 656
  • Release Date: May 26, 2007
  • replace alloca with malloc everywhere. per manpage, use of alloca is discouraged. this may have a minor effect on the speed of ssim and esa, but that appears too small to measure.
Version full rev. 655
  • Release Date: May 3, 2007
  • require a ratecontrol method to be specified, it no longer defaults to cqp=26.
Version full rev. 654
  • Release Date: Apr 23, 2007
  • fix nnz computation in cavlc+8x8dct+deblock. (regression in r607)
  • fix the computation of bits used for vbv. (regression in r651)
Version full rev. 652
  • Release Date: Apr 22, 2007
  • c89 compile fix
Version full rev. 651
  • Release Date: Apr 22, 2007
  • cabac: use bytestream instead of bitstream. 35% faster cabac, 20% faster overall lossless, ~1% faster overall at normal bitrates.
Version full rev. 650
  • Release Date: Apr 13, 2007
  • remove the restriction on number of threads as a function of resolution (it was wrong anyway in the presence of B-frames), and raise the max number of threads in general (though more will have to be done before it can really scale to lots of cores).
Version full rev. 649
  • Release Date: Apr 11, 2007
  • tweak ssse3 quant
Version full rev. 648
  • Release Date: Apr 8, 2007
  • change some tables from int to int8_t. 13KB smaller executable.
  • faster cabac rdo. up to 10% faster at q0, but negligible at normal bitrates.
  • workaround gcc's inability to align variables on the stack.
    this crash was introduced in r642, but only because previous versions didn't use sse2 on the stack.
Version full rev. 645
  • Release Date: Apr 6, 2007
  • 32bit version of ssse3 satd. switch default assembler to yasm. it will still fallback to nasm if you don't have yasm.
  • simplify trellis
  • fix an arithmetic overflow in trellis with QP >= 42
  • 2x faster quant. 2% overall.
    side effects:
    not bit-identical to the previous algorithm. while the new algorithm covers a wider range of cqms than the previous one did, I couldn't find a good way to fallback to a general version for the extreme cqms. so now it refuses to encode extreme cqms instead of just being slower. lays a framework for custom deadzone matrices, though I didn't add an api.
  • when encoding with a cqm, probe_skip now also uses the cqm, instead of the flat matrix
Version full rev. 640
  • Release Date: Apr 4, 2007
  • cosmetics in asm macros
  • use only c-style comments in public header (patch by Vincent Torres)
Version full rev. 638
  • Release Date: Apr 3, 2007
  • in hpel search, merge two 16x16 mc calls into one 16x17. 15% faster hpel, .3% overall.
  • Compile fix
Version full rev. 636
  • Release Date: Apr 1, 2007
  • remove private stuff from public headers. no more need for -D__X264__
Version full rev. 635
  • Release Date: Mar 25, 2007
  • adjust bitstream buffer sizes for very large frames
Version full rev. 634
  • Release Date: Mar 15, 2007
  • rev. 634: conflate HAVE_MMXEXT with HAVE_SSE2, since they were never used distinctly.
  • rev. 633: Made -DNEED_ALTIVEC unnecessary, thanks to Guillaume Poirier.
  • rev. 632: check x264_cpu_detect() before calling AltiVec functions.
  • rev. 631: ssse3 detection. x86_64 ssse3 satd and quant.
  • rev. 631: requires yasm >= 0.6.0
  • rev. 630: Use -maltivec when building dependencies, or altivec.h cannot be used.
  • rev. 630: Do not declare vectors in non-AltiVec files.
  • rev. 629: common/cpu.c: runtime AltiVec autodetection on Linux.
  • rev. 629: configure, Makefile: do not build the whole project with -maltivec because it generates AltiVec code in weird places.
Version full rev. 628
  • Release Date: Mar 6, 2007
  • fix a small memleak. patch by Limin Wang.
Version full rev. 627
  • Release Date: Mar 4, 2007
  • compile fix for GCC-3.3 on OSX, based on a patch by Patrice Bensoussan % patrice P bensoussan A free P fr%. Note: regression tests still do not pass with GCC-3.3, but they never did as far as I can remember.
  • cosmetics in regression test
  • regression testing, run similar to fprofiled: VIDS='vid_720x480.yuv' make test
Version full rev. 624
  • Release Date: Mar 1, 2007
  • add ability to generate doxygen documentation; make dox
Version full rev. 623
  • Release Date: Feb 23, 2007
  • oops, scenecut detection failed to activate when using threads and not using B-frames
Version full rev. 622
  • Release Date: Jan 30, 2007
  • extras/getopt.c was BSD licensed. replace with a LGPL version (from glibc).
Version full rev. 621
  • Release Date: Jan 26, 2007
  • Fix build issues on Linux. Only gcc-4.x is supported, as on OSX.
  • Cleans up a few inconsistencies in the code too.
Version full rev. 620
  • Release Date: Jan 22, 2007
  • tweak block_residual_write_cavlc.
  • up to 1% faster lossless, no difference at normal bitrates.
Version full rev. 619
  • Release Date: Jan 21, 2007
  • don't assume int is exactly 4 bytes
Version full rev. 618
  • Release Date: Jan 12, 2007
  • make array_non_zero() compatible with -fstrict-aliasing
Version full rev. 617
  • Release Date: Jan 10, 2007
  • Honor CFLAGS and LDFLAGS set by the user
Version full rev. 616
  • Release Date: Jan 3, 2007
  • Check whether 'echo -n' works, otherwise try printf (fixes build on current OS X 10.5)
Version full rev. 615
  • Release Date: Jan 2, 2007
  • Check version of nasm on OS X / Intel
Version full rev. 614
  • Release Date: Dec 21, 2006
  • wrong reference frames were used with refs>=14 + pyramid (regression in r607)
  • enable thread synchronization primitives on linux too
Version full rev. 612
  • Release Date: Dec 20, 2006
  • fix a crash with x264_encoder_headers() + threads
Version full rev. 611
  • Release Date: Dec 16, 2006
  • don't skip autodetection on configure --enable-pthread
  • more win32threads -> pthreads
  • cosmetics: rename list operators to be consistent with Perl, and move them to common/
  • win32: use pthreads instead of win32threads. for some reason, pthreads is much faster.
  • New threading method:
    Encode multiple frames in parallel instead of dividing each frame into slices. Improves speed, and reduces the bitrate penalty of threading.

    Side effects: It is no longer possible to re-encode a frame, so threaded scenecut detection must run in the pre-me pass, which is faster but less precise.
    It is now useful to use more threads than you have cpus. --threads=auto has been updated to use cpus*1.5.
    Minor changes to ratecontrol.
  • New options: --pre-scenecut, --mvrange-thread, --non-deterministic
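The new threading method above notes that --threads=auto now uses cpus*1.5. As a quick illustration of that rule, here is a sketch that derives a thread count from the detected CPU count. Rounding down and the minimum of one thread are assumptions, and the function name is hypothetical.

```python
import os

def auto_thread_count(cpus=None):
    """Return cpus * 1.5, rounded down, with at least one thread."""
    if cpus is None:
        cpus = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, cpus * 3 // 2)
```

On a 4-core machine this gives 6 threads, reflecting the entry's point that frame-level threading benefits from more threads than CPUs.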
Version full rev. 606
  • Release Date: Dec 13, 2006
  • Do not assume anything about sizeof(cpu_set_t).
Version full rev. 605
  • Release Date: Dec 12, 2006
  • Add support for kFreeBSD (FreeBSD kernel with GNU userland).
Version full rev. 604
  • Release Date: Nov 28, 2006
  • Add Altivec implementations of add8x8_idct8, add16x16_idct8, sa8d_8x8 and sa8d_16x16
    Note: doesn't take advantage of some possible aligned memory accesses, so there's still room for improvement
Version full rev. 603
  • Release Date: Nov 26, 2006
  • Force alignment of the fake .rodata on MacIntel
Version full rev. 602
  • Release Date: Nov 23, 2006
  • don't treat vbv_maxrate as a minrate too if it's higher than target average bitrate.
Version full rev. 601
  • Release Date: Nov 19, 2006
  • Merges Guillaume Poirier's AltiVec changes:
    • Adds optimized quant and sub*dct8 routines
    • Faster sub*dct routines
  • ~8% overall speed-up with default settings
Version full rev. 600
  • Release Date: Nov 7, 2006
  • 10% faster deblock mmx functions. ported from ffmpeg.
  • checkasm: ignore insignificant differences in floating-point ssim
Version full rev. 598
  • Release Date: Oct 31, 2006
  • display final ratefactor in abr when a loose vbv is applied. (still disabled in true cbr)
Version full rev. 597
  • Release Date: Oct 30, 2006
  • fix parsing of --deblock %d,%d (beta was ignored)
  • compute chroma_qp only once per mb
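The --deblock fix in this release concerns a two-value option of the form `%d,%d`, where the second value (beta) was being dropped. A minimal parser for that form might look like the sketch below. The function name is hypothetical, and falling back to alpha when beta is omitted is an assumption rather than documented behavior.

```python
def parse_deblock(value):
    """Parse 'alpha,beta' into a pair of ints (hypothetical helper)."""
    parts = value.split(",")
    alpha = int(parts[0])
    # Assumption: a single value applies to both alpha and beta.
    beta = int(parts[1]) if len(parts) > 1 else alpha
    return alpha, beta
```

A regression test for this kind of bug only needs an input whose two values differ, such as `"1,-2"`.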
Version full rev. 595
  • Release Date: Oct 29, 2006
  • rd refinement of intra chroma direction (enabled in --subme 7)
    patch by Alex Wright.
Version full rev. 594
  • Release Date: Oct 19, 2006
  • fix a crash in avc2avi
Version full rev. 593
  • Release Date: Oct 17, 2006
  • skip deblocking and motion interpolation when using only I-frames
Version full rev. 592
  • Release Date: Oct 14, 2006
  • cosmetics
  • allow fractional values of crf
Version full rev. 590
  • Release Date: Oct 11, 2006
  • prefetch pixels for motion compensation and deblocking.
  • fix a crash on interlace + >8 reference frames
  • no more decoder. it never worked anyway, and the presence of defunct code was confusing people.
Version full rev. 587
  • Release Date: Oct 10, 2006
  • compute pskip_mv only once per macroblock, and store it
  • slightly faster chroma_mc_mmx
  • missing emms in plane_copy_mmx
Version full rev. 584
  • Release Date: Oct 7, 2006
  • merge center_filter_mmx with horizontal_filter_mmx
  • 1.5x faster center_filter_mmx (amd64)
Version full rev. 582
  • Release Date: Oct 6, 2006
  • mmx/prefetch implementation of plane_copy
  • no more vfw
  • gtk fixes:
    • in Makefile
      • fix datadir for mingw users
      • remove the shared lib during the clean rule
      • use $(ENCODE_BIN) instead of x264_gtk_encode
      • add some $(DESTDIR) and create some directories when necessary
      • remove -lintl
    • statfile_length -> statsfile_length
    • fix the "sensitivity" of the widget of update_statfile
    • the logo is now handled correctly on windows
  • added: beginning of multipass support
  • patch by Vincent Torri.
Version full rev. 579
  • Release Date: Oct 5, 2006
  • accept mencoder's option names as synonyms (api only, not in x264cli)
Version full rev. 578
  • Release Date: Oct 3, 2006
  • simplify satd_sse2
  • better error checking in x264_param_parse.
  • add synonyms for a few options.
Version full rev. 576
  • Release Date: Oct 2, 2006
  • fix some strides that weren't a multiple of 16.
  • tweak motion compensation amd64 asm. 0.3% overall speedup.
  • strip local symbols from asm .o files, since they confuse oprofile
  • add an option to control direct_8x8_inference_flag, default to enabled. this gives slightly faster encoding and decoding of p4x4 + B-frames, and is needed for strict Levels compliance.
Version full rev. 572
  • Release Date: Oct 1, 2006
  • allow custom deadzones for non-trellis quantization. patch by Alex Wright.
  • move zigzag scan functions to dsp function pointers.
  • mmx implementation of interlaced zigzag.
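
The "custom deadzones" entry above refers to the zone around zero in which a transform coefficient is quantized to 0. As a rough illustration only (a generic uniform quantizer, not x264's code; the function name `quantize` and the `deadzone_bias` parameter are hypothetical), lowering the rounding bias widens that zone:

```python
def quantize(coef, qstep, deadzone_bias=0.5):
    """Generic uniform quantizer with an adjustable rounding bias.

    deadzone_bias = 0.5 is plain round-to-nearest; smaller values widen
    the interval around zero that maps to level 0 (the deadzone).
    Names and scale are illustrative assumptions, not x264's.
    """
    sign = -1 if coef < 0 else 1
    level = int(abs(coef) / qstep + deadzone_bias)  # truncate toward zero
    return sign * level


# Same coefficient, two biases: the smaller bias zeroes it out.
print(quantize(6, 10, 0.5))    # rounds to the nearest level
print(quantize(6, 10, 0.25))   # falls inside the wider deadzone
```

A wider deadzone discards more small coefficients, which saves bits at the cost of fine detail; making it configurable lets the user choose that trade-off.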
Version full rev. 570
  • Release Date: Oct 1, 2006
  • support interlace. uses MBAFF syntax, but is not adaptive yet.
Version full rev. 569
  • Release Date: Sep 28, 2006
  • allow --zones in cqp encodes
Version full rev. 568
  • Release Date: Sep 27, 2006
  • cli: fix some typos in vui parameters from r542. patch by Foxy Shadis.
Version full rev. 567
  • Release Date: Sep 26, 2006
  • Add an "all" rule to the Makefile. Ideally "default" should be renamed, but I don't want to break existing scripts.
Version full rev. 566
  • Release Date: Sep 25, 2006
  • workaround: on some systems, alloca() isn't aligned
Version full rev. 565
  • Release Date: Sep 23, 2006
  • missing picpop
Version full rev. 564
  • Release Date: Sep 14, 2006
  • fix a buffer overread from r540
Version full rev. 563
  • Release Date: Sep 13, 2006
  • cosmetics (spelling)
  • faster ESA
Version full rev. 560
  • Release Date: Sep 11, 2006
  • Use the autotool's config.guess script instead of uname to check the system and CPU types, to avoid issues when using for instance a 32-bit userland on top of a 64-bit kernel.
  • Add the autotool's config.guess script so that we can use it instead of uname in the configure script.
Version full rev. 558
  • Release Date: Aug 23, 2006
  • 10l in r553
Version full rev. 557
  • Release Date: Aug 21, 2006
  • ssim broke on amd64 w/ pic.
Version full rev. 556
  • Release Date: Aug 19, 2006
  • MSVC compatibility fix from Haali
Version full rev. 555
  • Release Date: Aug 18, 2006
  • support changing some more parameters in x264_encoder_reconfig()
  • SSIM computation. (default on, disable by --no-ssim)
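
For readers unfamiliar with the metric named in the SSIM entry above, here is a minimal sketch of the standard SSIM formula for a single window of pixels. It is not x264's implementation; the function name `ssim_window` and the constants (the commonly used 0.01 and 0.03 factors) are assumptions for illustration.

```python
def ssim_window(x, y, bit_depth=8):
    """SSIM of two equal-length lists of pixel values taken from one window.

    Uses the textbook formula with the usual stabilizing constants;
    this is an illustrative sketch, not x264's code.
    """
    n = len(x)
    peak = (1 << bit_depth) - 1
    c1 = (0.01 * peak) ** 2
    c2 = (0.03 * peak) ** 2
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n                    # variance of x
    vy = sum((b - my) ** 2 for b in y) / n                    # variance of y
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n  # covariance
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx * mx + my * my + c1) * (vx + vy + c2)
    return num / den


original = [10, 20, 30, 40]
print(ssim_window(original, original))          # identical windows score ~1
print(ssim_window(original, original[::-1]))    # reversed structure scores lower
```

An encoder would typically average such per-window scores over the whole frame to report one number.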
Version full rev. 553
  • Release Date: Aug 17, 2006
  • configure: --enable-debug reduces optimization to -O1
  • cosmetics
Version full rev. 551
  • Release Date: Aug 4, 2006
  • gcc -fprofile-generate isn't threadsafe
  • cli: move some options from --help to --longhelp
  • cli: don't try to get resolution from filename unless input is rawyuv
  • r542 broke --visualize
Version full rev. 547
  • Release Date: Aug 3, 2006
  • Nicer OS X x264_cpu_num_processors (thanks David)
  • Support OS X and BeOS in x264_cpu_num_processors
  • Fixes contexts allocation with threads=auto
  • select initial qp for abr and cbr based on satd and bitrate, rather than cq24.
  • --threads=auto to detect number of cpus
  • api addition: x264_param_parse() to set options by name
  • fix a rare NaN in ratecontrol
  • move quant_mf[] from x264_t to the heap, and merge duplicate entries
  • GTK update. patch by Vincent Torri.
    • fixed:
      • cleaning of Makefile
      • time elapsed seems broken ('total time' label replaced by 'time remaining')
      • text entries of the status window are now not editable
    • added:
      • compilation from x264/ (add --enable-gtk option to configure)
      • shared lib creation if --enable-shared is passed to configure
      • x264gtk.pc
      • --b-rdo, --no-dct-decimate
  • new option: --qpfile forces frames types and QPs. (intended for ratecontrol experiments, not for real encodes)
Version full rev. 537
  • Release Date: Jul 18, 2006
  • api change: select ratecontrol method with an enum (param.rc.i_rc_method) instead of a bunch of booleans.
Version full rev. 536
  • Release Date: Jul 17, 2006
  • slightly faster mmx dct
  • OpenBSD build fixes.
  • patch by Vizeli Pascal (pvizeli at yahoo dot de)
Version full rev. 534
  • Release Date: Jul 9, 2006
  • mc_chroma width2 mmx
Version full rev. 533
  • Release Date: Jun 29, 2006
  • make libx264.so symlink relative
Version full rev. 532
  • Release Date: Jun 13, 2006
  • added:
    • direct=auto
    • no-fast-pskip
    • vbv
    • cqm
    • tooltips (without descriptions yet)
    • translations
    • `make clean` for .exe
    • when file exists, ask for override
  • fixes:
    • debug level bug
    • bitrate slider bug
    • mixed-refs can be set only if ref>1
    • i8x8 can be set only if 8x8 transform is enabled
    • # of threads capped at 4
    • fourcc can't be removed
    • cosmetics
Version full rev. 531
  • Release Date: Jun 1, 2006
  • vfw installer: tweak nsis compression. patch by Francesco Corriga.
Version full rev. 530
  • Release Date: May 31, 2006
  • Fixed typo that caused x264_encoder_open to always fail
Version full rev. 529
  • Release Date: May 30, 2006
  • check some mallocs' return value
  • make -> $(MAKE)
Version full rev. 527
  • Release Date: May 24, 2006
  • convert non-fatal errors to message level "warning".
Version full rev. 526
  • Release Date: May 23, 2006
  • fix a memory alignment. (no effect on x86, but might be needed for other simd)
Version full rev. 525
  • Release Date: May 21, 2006
  • when using DEBUG_DUMP_FRAME, write decoded pictures in display order. patch by Loic Le Loarer.
  • non-referenced B-frames should have the same frame_num as the following ref frame, not the previous. patch by Loic Le Loarer.
Version full rev. 523
  • Release Date: May 12, 2006
  • set the SPS constraint_set[01]_flag based on the profile in use, just in case some decoder cares
Version full rev. 522
  • Release Date: May 11, 2006
  • msvc doesn't like C99 named array initializers
  • allow sar=1/1.
    patch by Loic Le Loarer.
  • faster intra search: filter i8x8 edges only once, and reuse for multiple predictions.
Version full rev. 519
  • Release Date: May 10, 2006
  • faster intra search: some prediction modes don't have to compute a full hadamard transform. x86 and amd64 asm.
Version full rev. 518
  • Release Date: May 8, 2006
  • --sps-id, to allow concatenating streams with different settings.
Version full rev. 517
  • Release Date: May 4, 2006
  • typo in expand_border_mod16
Version full rev. 516
  • Release Date: Apr 30, 2006
  • typo impaired 2pass bitrate prediction.
Version full rev. 515
  • Release Date: Apr 29, 2006
  • Let the user choose the compiler with "CC=xxx ./configure"
Version full rev. 514
  • Release Date: Apr 29, 2006
  • More vector types fixes for gcc 3.3
Version full rev. 513
  • Release Date: Apr 29, 2006
  • More vector casts to try and make compilers happier
Version full rev. 512
  • Release Date: Apr 25, 2006
  • Use sa8d instead of satd for i8x8 search. +.01 dB, -.5% speed
Version full rev. 511
  • Release Date: Apr 25, 2006
  • Before evaluating the RD score of any mode, check satd and abort if it's much worse than some other mode.
  • Also apply more early termination to intra search. speed at -m1:+1%, -m4:+3%, -m6:+8%, -m7:+20%
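
The early-termination idea in the entry above can be sketched as follows: compute a cheap SATD score for every candidate mode first, then run the expensive RD evaluation only on candidates whose SATD is not much worse than the best one. This is a simplified illustration, not x264's code; the names `satd4x4` and `choose_mode`, the 4x4 block size, and the `threshold` value of 1.25 are all assumptions.

```python
def hadamard4(v):
    """4-point Hadamard transform (output order does not matter for SATD)."""
    a, b, c, d = v
    s0, s1, d0, d1 = a + b, c + d, a - b, c - d
    return [s0 + s1, d0 + d1, s0 - s1, d0 - d1]


def satd4x4(src, pred):
    """Unnormalized sum of absolute Hadamard-transformed differences."""
    diff = [[s - p for s, p in zip(rs, rp)] for rs, rp in zip(src, pred)]
    rows = [hadamard4(r) for r in diff]
    cols = [hadamard4(list(c)) for c in zip(*rows)]
    return sum(abs(v) for c in cols for v in c)


def choose_mode(src, candidates, rd_cost, threshold=1.25):
    """Pick the lowest-RD-cost mode, skipping modes with poor SATD.

    candidates: dict of mode name -> 4x4 prediction block.
    rd_cost: callable(mode name) -> cost; stands in for the expensive step.
    threshold: hypothetical cutoff relative to the best SATD.
    """
    satd = {name: satd4x4(src, pred) for name, pred in candidates.items()}
    best_satd = min(satd.values())
    best_name, best_cost, evaluated = None, None, []
    for name in candidates:
        if satd[name] > best_satd * threshold:
            continue  # much worse than some other mode: skip the RD check
        cost = rd_cost(name)
        evaluated.append(name)
        if best_cost is None or cost < best_cost:
            best_name, best_cost = name, cost
    return best_name, evaluated
```

The speedup comes from the skipped `rd_cost` calls. The price is that a mode with a poor SATD but a good RD score is never considered, which is why the cutoff has to be tuned.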
Version full rev. 510
  • Release Date: Apr 25, 2006
  • common/ppc/pixel.c: fixed illegal implicit casts of vector types
Version full rev. 509
  • Release Date: Apr 25, 2006
  • Added %$#@#$! support for #@%$!#@ armv4l CPU.
Version full rev. 508
  • Release Date: Apr 24, 2006
  • When evaluating predictors to start fullpel motion search, use subpel positions instead of rounding to fullpel. about +.02 dB, -1.6% speed at subme>=3. patch by Alex Wright.
Version full rev. 507
  • Release Date: Apr 24, 2006
  • mmx implementation of x264_pixel_sa8d
Version full rev. 506
  • Release Date: Apr 21, 2006
  • 10l in r463 (q0 i16x16 dc was permuted)
Version full rev. 505
  • Release Date: Apr 20, 2006
  • typo in r504
Version full rev. 504
  • Release Date: Apr 20, 2006
  • update msvc project files.
  • patch by anonymous.
Version full rev. 503
  • Release Date: Apr 19, 2006
  • Before, we eliminated dct blocks containing only a small single coefficient. Now that behavior is optional, by --no-dct-decimate. based on a patch by Alex Wright.
Version full rev. 502
  • Release Date: Apr 17, 2006
  • Enables more aggressive optimizations (-fastf -mcpu=G4) on OS X.
  • Adds AltiVec interleaved SAD and SSD16x16.
  • Overall speedup up to 20%.
Version full rev. 501
  • Release Date: Apr 17, 2006
  • faster cabac_encode_bypass
Version full rev. 500
  • Release Date: Apr 16, 2006
  • restored AltiVec dct
Version full rev. 499
  • Release Date: Apr 16, 2006
  • more AltiVec mc, ~4.5% overall speedup
Version full rev. 498
  • Release Date: Apr 12, 2006
  • slightly faster loopfilter
Version full rev. 496
  • Release Date: Apr 12, 2006
  • cosmetics in sad/ssd/satd mmx
Version full rev. 497
  • Release Date: Apr 12, 2006
  • 3% faster satd_mmx
Version full rev. 495
  • Release Date: Apr 11, 2006
  • store quoted configure options. needed e.g. for multiple args under --extra-cflags.
Version full rev. 494
  • Release Date: Apr 11, 2006
  • fix a yasm-incompatible syntax in x86 asm
Version full rev. 493
  • Release Date: Apr 11, 2006
  • yasm noexec stack
Version full rev. 492
  • Release Date: Apr 10, 2006
  • more interleaved SAD.
  • 25% faster halfpel.
Version full rev. 491
  • Release Date: Apr 10, 2006
  • more interleaved SAD.
  • 1% faster umh, 6% faster esa.
Version full rev. 489
  • Release Date: Apr 10, 2006
  • Added support for ppc64. I'm really f***ing tired of having to do this.
Version full rev. 490
  • Release Date: Apr 10, 2006
  • interleave multiple calls to SAD.
  • 15% faster fullpel motion estimation.
Version full rev. 488
  • Release Date: Apr 8, 2006
  • use LDFLAGS when linking shared lib
Version full rev. 487
  • Release Date: Mar 29, 2006
  • compilation fix for mingw, darwin (off_t was undefined)
Version full rev. 486
  • Release Date: Mar 28, 2006
  • (r486) GTK: support yuv4mpeg input. patch by Vincent Torri.
  • (r485) GTK: fix avs input. patch by Vincent Torri.
  • (r484) cli: support yuv4mpeg input. patch by anonymous.
  • (r483) GTK: compilation fixes
Version full rev. 477
  • Release Date: Mar 23, 2006
  • 10l in r473 and stdin
  • RD subpel motion estimation (--subme 7)
  • cosmetics in cabac_mb_cbf
Version full rev. 451
  • Release Date: Mar 4, 2006
  • 10l in r443 (p4x4 chroma)
  • common/i386/i386inc.asm: tell the ELF linker about our stack properties so that it does not assume the stack has to be executable.
  • configure, common/i386/i386inc.asm: got rid of -DFORMAT_* nasm flags and use built-in preprocessor tests instead.
  • common/i386: factored the .rodata section declaration into i386inc.asm.
  • configure: activate minor nasm optimisations, such as assembling "add eax, 8" as "add eax, byte 8".
  • common/i386/*.asm: don't use the "GLOBAL" reserved word, some versions of NASM complain about it. Replaced it with "GOT_ebx".
