Title: x264
Author/Publisher: Official Website
Ease of use: Not Rated
Latest Version: full rev. 451
OS Support: Windows 2000, Windows XP
License: Freeware
Date Added: Aug 15, 2005
Last Updated: Mar 4, 2006
Page Viewed: 136399 times


Revision History:

Version r1173
  • Release Date: Jun 26, 2009
  • r1173: Add subpartition cost for sub-8x8 blocks
    Improves sub-p8x8 mode decision.
  • r1172: Yet more CABAC and CAVLC optimizations
    Also clean up a lot of pointless code duplication in CAVLC MV coding.
Version r1171
  • Release Date: Jun 23, 2009
  • Various CABAC optimizations and cleanups
  • Faster CABAC CBF context calculation for inter blocks.
  • Add x264_constant_p(), will probably be useful in the future as well.
  • Simpler subpartition functions.
  • Clean up and optimize mvd_cpn a bit more.
  • Various other minor optimizations.
Version r1170
  • Release Date: Jun 21, 2009
  • AltiVec version of frame_init_lowres_core. 22.4x faster than C on PPC7450 and 25x on PPC970MP.
Version r1169
  • Release Date: Jun 20, 2009
  • r1169: MMX CABAC mvd sum calculation
    Faster CABAC mvd coding.
  • r1168: Faster MV prediction
    Smaller code size, plus I get to use goto.
  • r1167: Fix potential crash in checkasm
    ssim_end4_sse2 requires aligned sums
  • r1166: SSSE3, faster SSE2/MMX integral_init4v
    The real reason I wrote this was an excuse to use shufpd.
Version r1165
Version r1163
  • Release Date: May 29, 2009
  • r1163: configure check for cc, rather than reporting lack of compiler as an asm error.
    configure check for -mno-cygwin, since it's removed from gcc4.
  • r1162: a better way to keep track of mv candidates.
    2-4% faster dia, hex, and umh.
  • r1161: reorder some motion estimation patterns.
    this change is useless on its own, but segregates the bitstream-changing part out of my next optimization.
Version r1160
  • Release Date: May 27, 2009
  • Fix VBV warning broken in r915
    x264 will now correctly warn about maxrate specified without bufsize even when a level is not set.
Version r1159
  • Release Date: May 25, 2009
  • r1159: configure check for ssse3-capable binutils
  • r1158: Fix 10L in r1155
    Broke --me esa/tesa due to forgetting to add handling for x264_cost_mv_fpel.
  • r1157: Fix bug where satd was incorrectly used with subme<=1
    Faster subme<=1 with i4x4 enabled.
  • r1156: Remove some pointless error handling code in cabac/cavlc
  • r1155: Save some memory on mv cost arrays
    Have quantizers that use the same lambda share the same cost array.
  • r1154: Various CABAC and CAVLC optimizations
    Backport CAVLC partial-inlining early termination to CABAC (~2-4% faster CABAC residual coding)
Version r1153
  • Release Date: May 19, 2009
  • r1153: fix a race condition at the end of thread_input
  • r1152: Various trellis speed optimizations
  • r1151: Make i686 the default arch on x86_32
    Disabling asm will default to a generic arch.
    Also fix configure for gcc 4.4.
  • r1150: Faster signed golomb coding
    3% faster CAVLC RDO and bitstream writing.
  • r1149: Faster spatial direct MV prediction
    unroll/tweak col_zero_flag
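
For readers unfamiliar with the "signed golomb coding" mentioned in r1150, the following is a minimal Python sketch of signed Exp-Golomb coding as H.264 defines it. It is a reference illustration only, not x264's optimized routine; the function names `ue_bits` and `se_bits` are made up for this example.

```python
def ue_bits(k: int) -> str:
    """Unsigned Exp-Golomb code for k >= 0, returned as a bit string."""
    if k < 0:
        raise ValueError("ue(v) is defined for non-negative integers only")
    body = bin(k + 1)[2:]                 # binary form of k+1, no leading zeros
    return "0" * (len(body) - 1) + body   # prefix of (length-1) zero bits


def se_bits(v: int) -> str:
    """Signed Exp-Golomb: positive v maps to 2v-1, non-positive v to -2v."""
    k = 2 * v - 1 if v > 0 else -2 * v
    return ue_bits(k)


if __name__ == "__main__":
    for v in (0, 1, -1, 2, -2):
        print(v, se_bits(v))
    # 0 1
    # 1 010
    # -1 011
    # 2 00100
    # -2 00101
```

Small magnitudes get short codes, which is why this code suits values such as motion vector differences that cluster around zero.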
Version r1148
  • Release Date: May 10, 2009
  • r1148: More CABAC and CAVLC optimizations
    Simplified function calling for block_residual_write_(cabac|cavlc) and improved sigmap coding.
    Tried making 0/1-bit specific versions of CABAC asm, but benefit was minimal under GCC 4.3.
    Helped a decent bit under 3.4, but you shouldn't be using such old versions anyways.
  • r1147: Various optimizations in frametype lookahead
  • r1146: Some cosmetics/cleanup
    Move some macros to x86util.asm that should have been there to begin with.
    Fix a typo that didn't cause any issues.
Version r1145
  • Release Date: Apr 22, 2009
  • r1145: fix "incompatible types in initialization" compilation issues with GCC 4.3 (which is stricter than previous compiler versions)
  • r1144: fix "conversions between vectors with differing element types or numbers of subparts" errors
Version r1143
  • Release Date: Apr 20, 2009
  • r1143: Add "coded blocks" stat to output information.
    This measures the total percentage of blocks, intra and inter, which have nonzero coefficients.
    "y,uvAC,uvDC" refers to luma, chroma DC, and chroma AC blocks.
    Note that skip blocks are included in this stat.
  • r1142: Enable asm predict_8x8_filter
    I'm not entirely sure how this snuck its way out of holger's intra pred patch.
  • r1141: Remove various bits of dead code found by CLANG.
Version r1140
  • Release Date: Apr 15, 2009
  • Slightly faster SSE4 SA8D, SSE4 Hadamard_AC, SSE2 SSIM
    shufps is the most underrated SSE instruction on x86.
Version r1139
  • Release Date: Apr 10, 2009
  • r1139: Various CABAC optimizations
    Move calculation of b_intra out of the core residual loop and hardcode it where applicable.
    Inlining cabac_mb_mvd was unnecessary and wasted tremendous amounts of code size. Inlining only cache_mvd is faster and significantly smaller.
  • r1138: CAVLC optimizations
    faster bs_write_te, port CABAC context selection optimization to CAVLC.
Version r1137
  • Release Date: Apr 7, 2009
  • Faster CABAC RDO
    Since the bypass case is quite unlikely, especially when doing merged sigmap/level coding, it's faster to use a branch than a cmov.
Version r1136
  • Release Date: Apr 5, 2009
  • Activate intra_sad_x3_8x8c in lookahead
Version r1134
  • Release Date: Apr 1, 2009
  • r1134: intra_sad_x3_8x8 assembly
  • r1133: intra_sad_x3_4x4 assembly
  • r1132: intra_sad_x3_8x8c assembly
    Also fix intra_sad_x3_16x16's use of "n" as a loop variable (broke SWAP)
  • r1131: Shave one instruction off CABAC encode_decision
    range_lps>>6 ranges from 4-7, so (range_lps>>6)-4 == (range_lps>>6) & 3
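
The identity quoted for r1131 can be checked exhaustively. The sketch below assumes only what the entry states, namely that the value shifted right by 6 always lies in 4..7 (so the input is a 9-bit value from 256 to 511); the function names are hypothetical.

```python
def index_by_subtract(range_lps: int) -> int:
    """Table index computed with a shift and a subtraction."""
    return (range_lps >> 6) - 4


def index_by_mask(range_lps: int) -> int:
    """Same index computed with a shift and a bit mask."""
    return (range_lps >> 6) & 3


if __name__ == "__main__":
    # 4..7 is 0b100..0b111, so subtracting 4 just clears bit 2,
    # which is exactly what masking with 3 does.
    assert all(index_by_subtract(r) == index_by_mask(r) for r in range(256, 512))
    print("identity holds for every value in 256..511")
```

The two forms only agree inside that range: for an input below 256 the subtraction would go negative while the mask would not.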
Version r1130
  • Release Date: Mar 27, 2009
  • Faster probe_skip
    Add a second chroma threshold after the DC transform.
Version r1129
  • Release Date: Mar 21, 2009
  • Add missing "static" qualifier to two arrays
    Should slightly improve performance.
Version r1128
  • Release Date: Mar 19, 2009
  • SSE2 zigzag_interleave
    Replace PHADD with FastShuffle (more accurate naming).
    This flag represents asm functions that rely on fast SSE2 shuffle units, and thus are only faster on Phenom, Nehalem, and Penryn CPUs.
Version r1127
  • Release Date: Mar 11, 2009
  • r1127: Faster integral_init
    palignr to avoid unaligned loads is worth it in inith, but not initv.
  • r1126: Faster SSSE3 hpel_filter_v
    ~10% faster hpel_filter on 64-bit Penryn.
    32-bit version by Jason Garrett-Glaser.
Version r1125
  • Release Date: Mar 9, 2009
  • r1125: Faster SSE2 pixel_var
    Optimized using the DEINTB method from r1122. ~32% faster var_16x16 on Conroe.
  • r1124: SSSE3 hpel_filter_v
    Optimized using the same method as in r1122. Patch partially by Holger.
    ~8% faster hpel filter on 64-bit Nehalem
Version r1123
  • Release Date: Mar 7, 2009
  • r1123: Update some asm copyright headers
  • r1122: Vastly faster SATD/SA8D/Hadamard_AC/SSD/DCT/IDCT
    Heavily optimized for Core 2 and Nehalem, but performance should improve on all modern x86 CPUs.
    16x16 SATD: +18% speed on K8(64bit), +22% on K10(32bit), +42% on Penryn(64bit), +44% on Nehalem(64bit), +50% on P4(32bit), +98% on Conroe(64bit)
    Similar performance boosts in SATD-like functions (SA8D, hadamard_ac) and somewhat less in DCT/IDCT/SSD.
    Overall performance boost is up to ~15% on 64-bit Conroe.
  • r1121: Update x264 copyright date
Version r1120
  • Release Date: Mar 6, 2009
  • Remove pre-scenecut from fprofile commands as well
    Also add psy-trellis to fprofile
Version r1119
  • Release Date: Mar 4, 2009
  • r1119: Slightly faster 8x16 SAD on Penryn Core 2
    Same as MMX 8x16 cacheline SAD, but calls SSE2 8x16 SAD in non-cacheline case.
    Only Nehalem benefits from sizes smaller than 8x16, and Nehalem doesn't use cacheline functions, so no smaller versions are included.
  • r1118: Fix scenecut and VBV with videos of width/height <= 32
    Also remove an unused variable
  • r1117: Remove non-pre scenecut
    Add support for no-b-adapt + pre-scenecut (patch by BugMaster)
    Pre-scenecut was generally better than regular scenecut in terms of accuracy and regular scenecut didn't work in threaded mode anyways.
    Add no-scenecut option (scenecut=0 is now no scenecut; previously it was -1)
    Fix an incorrect bias towards P-frames near scenecuts with B-adapt 2.
    Simplify pre-scenecut code.
  • r1116: Add AltiVec version of hadamard_ac. 2.4x faster than the C version.
    Note that this implementation is pretty naive and should be improved
    by implementing what's discussed in this ML thread:
    date: Mon, Feb 2, 2009 at 6:58 PM
    subject: Re: [x264-devel] [PATCH] AltiVec implementation of hadamard_ac routines
Version r1115
  • Release Date: Feb 27, 2009
  • Fix regression in r1085
    Deblocking was very slightly incorrect with partitions=all.
    Bug found by BugMaster.
Version r1114
  • Release Date: Feb 17, 2009
  • Optimize neighbor CBP calculation and fix related regression
  • r1105 introduced array overflow in cbp handling
Version r1113
  • Release Date: Feb 14, 2009
  • Show FPS when importing a raw YUV file
Version r1112
  • Release Date: Feb 13, 2009
  • r1112: Windows 64-bit support
    A "make distclean" is probably required after updating to this revision.
  • r1111: Minor fixes and cosmetics
    Suppress a GCC warning, fix a non-problematic array overflow, one REP->REP_RET.
Version r1110
  • Release Date: Feb 12, 2009
  • fix 10l in 75b495f2723fcb77f
    Spare a vec_perm and a vec_mergeh through using a LUT of permutation vectors. (Guillaume Poirier)
Version r1109
  • Release Date: Feb 10, 2009
  • r1109: Spare a vec_perm and a vec_mergeh through using a LUT of permutation vectors.
  • r1108: Promote chroma planes to 16 byte alignment.
    This will allow simplifying vectors loads that can only load 16-bytes aligned data (such as AltiVec).
  • r1107: Fix 10L in intra pred
    Forgetting a %define resulted in SIGILL on 32-bit systems without SSE (e.g. Athlon XP).
Version r1106
  • Release Date: Feb 9, 2009
  • r1106: Add decimation in i16x16 blocks
    Up to +0.04db with CAVLC, generally a lot less with CABAC.
  • r1105: Much faster CABAC residual context selection
    Up to ~17% faster CABAC RDO, ~36% faster intra-only CABAC RDO.
    Up to 7% faster overall in extreme cases.
  • r1104: Faster coeff_last64 on 32-bit
  • r1103: More intra pred asm optimizations
    SSSE3 version of predict_8x8_hu
    SSE2 version of predict_8x8c_p
    SSSE3 versions of both planar prediction functions
    Optimizations to predict_16x16_p_sse2
    Some unnecessary REP_RETs -> RETs.
    SSE2 version of predict_8x8_vr by Holger.
    SSE2 version of predict_8x8_hd.
    Don't compile MMX versions of some of the pred functions on x86_64.
    Remove now-useless x86_64 C versions of 4x4 pred functions.
    Rewrite some of the x86_64-only C functions in asm.
Version r1102
  • Release Date: Feb 9, 2009
  • Speed-up mc_chroma_altivec by using vec_mladd cleverly, and unrolling.
    Also put width == 2 variant in its own scalar function because it's faster than a vectorized one.
Version r1101
  • Release Date: Feb 5, 2009
  • r1101: Merging Holger's GSOC branch part 2: intra prediction
    Assembly versions of most remaining 4x4 and 8x8 intra pred functions.
    Assembly version of predict_8x8_filter.
    A few other optimizations.
    Primarily Core 2-optimized.
  • r1100: 10l: fix compilation with GCC 4.3+
Version r1099
  • Release Date: Feb 4, 2009
  • r1099: Faster 8x8dct+CAVLC interleave
    Integrate array_non_zero with the CAVLC 8x8dct interleave function.
    Roughly 1.5-2x faster than the original separate array_non_zero method.
  • r1098: Measure CBP cost in i8x8 RD refinement
    ~0.02-0.05db PSNR gain at high quants in intra-only encoding, pretty small otherwise.
    Allows a small optimization in i8x8 encoding.
Version r1097
  • Release Date: Feb 2, 2009
  • Take advantage of saturated signed horizontal sum instructions in the variance computation epilogue, since no overflow can occur there to trigger saturation.
    Suggested by Loren Merritt
Version r1096
  • Release Date: Jan 31, 2009
  • Massive overhaul of nnz/cbp calculation
    Modify quantization to also calculate array_non_zero. PPC assembly changes by gpoirier.
    New quant asm includes some small tweaks to quant and SSE4 versions using ptest for the array_non_zero.
    Use this new feature of quant to merge nnz/cbp calculation directly with encoding and avoid many unnecessary calls to dequant/zigzag/decimate/etc.
    Also add new i16x16 DC-only iDCT with asm.
    Since intra encoding now directly calculates nnz, skip_intra now backs up nnz/cbp as well.
    Output should be equivalent except when using p4x4+RDO because of a subtlety involving old nnz values lying around.
    Performance increase in macroblock_encode: ~18% with dct-decimate, 30% without at CRF 25.
    Overall performance increase 0-6% depending on encoding settings.
Version r1095
  • Release Date: Jan 30, 2009
  • r1095: Add PowerPC support for "checkasm --bench", reading the time base register.
    This isn't ideal since the "time base" register runs at a fraction of the processor cycle speed, so the measurement isn't as precise as x86's rdtsc.
    It's better than nothing though...
  • r1094: fix detection of pthread and isfinite on OpenBSD
Version r1093
  • Release Date: Jan 29, 2009
  • r1093: remove $ECHON kludge, which broke on SunOS. bring back `gcc -MT`.
    remove auto-reconfigure on svn update, which has done nothing since we stopped using svn.
    fix $AS on sparc (was disabled by mmx check).
    fix --extra-asflags (was ignored).
    mark bash scripts as bash, not sh
    patch partly by Greg Robinson and Jugdish.
  • r1092: 1.6x faster satd_c (and sa8d and hadamard_ac) with pseudo-simd.
    60KB smaller binary.
  • r1091: Hack around a potential failure point in VBV
    pred_b_from_p can become absurdly large in static scenes, leading to rare collapses of quality with VBV+B-frames+threads.
    This isn't a final fix, but should resolve the problem in most cases in the meantime.
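
The r1092 entry does not say how its "pseudo-simd" works. One common meaning of the term is packing two 16-bit lanes into a single 32-bit integer so that one ordinary addition updates both lanes at once. The Python sketch below illustrates only that general idea, under the assumption that every lane value and every lane sum stays within 0..65535; the helper names are hypothetical and none of this is taken from x264's source.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def pack2(lo: int, hi: int) -> int:
    """Pack two unsigned 16-bit lanes into one 32-bit word."""
    if not (0 <= lo <= MASK16 and 0 <= hi <= MASK16):
        raise ValueError("lanes must fit in 16 bits")
    return (hi << 16) | lo


def unpack2(word: int) -> tuple:
    """Split a 32-bit word back into its (low, high) 16-bit lanes."""
    return word & MASK16, (word >> 16) & MASK16


def packed_add(a: int, b: int) -> int:
    """Add both lanes with a single integer addition.

    Only correct while the low-lane sum does not carry into the high lane.
    """
    return (a + b) & MASK32


if __name__ == "__main__":
    total = packed_add(pack2(3, 5), pack2(10, 20))
    print(unpack2(total))  # (13, 25)
```

The trade-off is that the caller must guarantee the value range, since a carry out of the low lane silently corrupts the high lane.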
Version r1090
  • Release Date: Jan 28, 2009
  • Much faster chroma encoding and other opts
    ~15% faster chroma encode by reorganizing CBP calculation and adding special-case idct_dc function, since most coded chroma blocks are DC-only.
    Small optimization in cache_save (skip_bp)
    Fix array_non_zero to not violate strict aliasing (should eliminate miscompilation issues in the future)
    Add in automatic substitutions for some asm instructions that have an equivalent smaller representation.
Version r1089
  • Release Date: Jan 27, 2009
  • add AltiVec implementation of x264_mc_copy_w16_aligned
Version r1088
  • Release Date: Jan 24, 2009
  • r1088: add AltiVec implementation of x264_pixel_var_16x16 and x264_pixel_var_8x8
  • r1087: add AltiVec 16 <-> 32 bits conversions macros
Version r1086
  • Release Date: Jan 20, 2009
  • Replace 16x16=>32 mul + pack + add by a simple 16x16=>16 multiply-add.
    Suggested by Loren.
Version r1085
  • Release Date: Jan 20, 2009
  • r1085: Eliminate support for direct_8x8_inference=0
    The benefit in the most extreme contrived situation was at most 0.001db PSNR, at the cost of slower decoding.
    As this option was basically useless, it was a waste of code and prevented some other useful optimizations.
    Remove some unused mc code related to sub-8x8 partitions.
    Small deblocking speedup when p4x4 is used.
    Also remove unused x264_nal_decode prototype from x264.h.
  • r1084: Add AltiVec and CPU numbers detection on OpenBSD.
Version r1083
  • Release Date: Jan 19, 2009
  • r1083: Add AltiVec implementation of predict_8x8c_p. 2.6x faster than scalar C.
  • r1082: Warn if direct auto wasn't set on the first pass
    And, if it wasn't, run direct auto as if it was the first pass, rather than simply forcing temporal direct mode on all frames.
    Also a small tweak to coeff_level_run asm.
Version r1081
  • Release Date: Jan 18, 2009
  • Changes the PowerPC ppccommon.h header so it no longer checks for a particular
    OS such as Linux but instead looks for HAVE_ALTIVEC_H being set.
    Fixes all *BSD/PowerPC builds.
Version r1080
  • Release Date: Jan 15, 2009
  • r1080: update x264_hpel_filter_altivec's prototype to match the one of the C version.
    It changed in commit 045ae4045a1827555b3eaab4fbf3c9809e98c58f (factorization of mallocs)
    (NB: Altivec implementation wasn't allocating and writing to any scratch memory.)
  • r1079: rename vector+array unions to closer match the vector typedefs names.
  • r1078: Add Altivec implementation of all the remaining 16x16 predict routines.
Version r1077
  • Release Date: Jan 14, 2009
  • r1077: Cache ref costs and use more accurate MV costs
    New MV costs should improve quality slightly by improving the smoothness of the field of MV costs (and they're closer to CABAC's actual costs).
    Despite being optimized for CABAC, they still help under CAVLC, albeit less.
    MV cost change by Loren Merritt
  • r1076: Support forced frametypes with scenecut/b-adapt
    This allows an input qpfile to be used to force I-frames, for example.
    The same can be done through the library interface.
    Document the format of the qpfile in --longhelp and the forcing of frametypes in x264.h
    Note that forcing B-frames and B-refs may not always have the intended result.
    Patch partially by Steven Walters
  • r1075: Remove an IDIV from i8x8 analysis
    Only one IDIV is left in macroblock level code (transform_rd)
Version r1074
  • Release Date: Jan 9, 2009
  • Fix regression in r1066
    With some combinations of video width and other settings, the scratch buffer was slightly too small.
    This caused heap corruption on some systems.
Version r1073
  • Release Date: Jan 7, 2009
  • r1073: Disable B-frames in lossless mode
    They hurt compression anyways, and direct auto was bugged with lossless.
  • r1072: Factorize in ppccommon.h the conditional inclusion of altivec.h on Linux systems.
Version r1071
  • Release Date: Jan 2, 2009
  • r1071: Small tweaks to coeff asm
    Factor out a few redundant pxors
    Related cosmetics
  • r1070: Fix C99ism in r1066
Version r1069
  • Release Date: Jan 1, 2009
  • r1069: Use the correct strtok under MSVC
    Also change one malloc -> x264_malloc
  • r1068: Add stack alignment for lookahead functions
    Should allow libx264 to be called from non-gcc-compiled applications without adding force_align_arg_pointer.
  • r1067: Add support for SSE4a (Phenom) LZCNT instruction
    Significantly speeds up coeff_last and coeff_level_run on Phenom CPUs for faster CAVLC and CABAC.
    Also a small tweak to coeff_level_run asm.
  • r1066: factor mallocs out of hpel, ssim, and esa.
    there should now be no memory allocation outside of init-time.
Version r1065
  • Release Date: Dec 30, 2008
  • r1065: Much faster CAVLC RDO and bitstream writing
  • r1065: Pure asm version of level/run coding. Over 2x faster than C.
  • r1065: Up to 40% faster CAVLC RDO. Overall benefit up to ~7.5% with RDO or ~5% with fast encoding settings.
  • r1064: Cosmetics: cleaner syntax for defining temporary registers in asm
  • r1064: Globally define t#[qdwb], so that only t# needs to be locally defined when reorganizing registers
Version r1063
  • Release Date: Dec 29, 2008
  • Much faster CABAC RDO
    Since RDO doesn't care about what order bit costs are calculated, merge sigmap and level coding into the same loop in RDO.
    This is bit-exact for 4x4dct but slightly incorrect for 8x8dct due to the sigmap containing duplicated contexts.
    However, the PSNR penalty of this is extremely small (~0.001db).
    Speed benefit is about 15% in 4x4dct and 30% in 8x8dct residual bit cost calculation at QP20.
    Overall encoding speed benefit is up to 5%, depending on encoding settings.
    Also remove an old unnecessary CABAC table that hasn't been used for years.
Version r1062
  • Release Date: Dec 27, 2008
  • VLC table optimizations
  • Slightly reorganize VLC tables for ~2% faster block_residual_write_cavlc.
  • Also a small optimization in p8x8 CAVLC.
Version r1061
  • Release Date: Dec 25, 2008
  • r1061: Fix crash in --me esa/tesa introduced in r1058
    Also suppress the last mingw warning message
  • r1060: Optimize variance asm + minor changes
    Remove SAD argument from var, not needed anymore.
    Speed up var asm a bit by eliminating psadbw and instead HADDWing at end.
    Eliminate all remaining warnings on gcc 3.4 on cygwin
    Port another minor optimization from lavc (pskip)
  • r1059: Minor CABAC cleanups and related optimizations
    Merge the two list tables to allow cleaner MC/CABAC/CAVLC code
    Remove lots of unnecessary {s
    Port some very minor opts from lavc
  • r1058: faster ESA init
    reduce memory if using ESA and not p4x4
Version r1057
  • Release Date: Dec 16, 2008
  • r1057: More macroblock_cache optimizations
    Patch partially by Loren Merritt
  • r1056: Faster macroblock_cache_rect
    Explicit loop unrolling
Version r1055
  • Release Date: Dec 15, 2008
  • Optimizations in predict_mv_direct
    Add some early terminations and minor optimizations
    This change may also fix the extremely rare direct+threading MV bug.
Version r1054
  • Release Date: Dec 15, 2008
  • Fix visual corruption when picture width was not mod 32.
    The previous Altivec implementation of mc_chroma assumed that i_src_stride was always mod 16.
Version r1053
  • Release Date: Dec 14, 2008
  • r1053: Add support for FSF GCC version >= 4.3 on OSX.
    So far, only Apple GCC version was supported.
  • r1052: More accurate refcost for p8x8 CAVLC
    Slightly better quality, especially in non-RD mode, with CAVLC.
Version r1051
  • Release Date: Dec 12, 2008
  • r1051: use lookup tables instead of actual exp/pow for AQ
    Significant speed boost, especially on CPUs with atrociously slow floating point units (e.g. Pentium 4 saves 800 clocks per MB with this change).
    Add x264_clz function as part of the LUT system: this may be useful later.
    Note this changes output somewhat as the numbers from the lookup table are not exact.
  • r1050: Suppress saveptr warnings on Windows GCC
  • r1049: More small speed tweaks to macroblock.c
  • r1048: Much faster CAVLC residual coding
    7 due to different nonzero counts being stored during qpel RD.
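
The trade-off described in r1051 (a table lookup in place of a real exp/pow call, at the cost of slightly inexact results) can be sketched as follows. This is a generic illustration in Python, not x264's actual table: the table size, the name `exp2_lut`, and the choice of base 2 are all assumptions made for the example.

```python
import math

FRAC_BITS = 6                      # assumed table resolution: 64 entries
TABLE_SIZE = 1 << FRAC_BITS
# Precompute 2**f for f = 0, 1/64, 2/64, ..., 63/64 once, at init time.
EXP2_LUT = [2.0 ** (i / TABLE_SIZE) for i in range(TABLE_SIZE)]


def exp2_lut(x: float) -> float:
    """Approximate 2**x using the table plus an exact power-of-two scale."""
    fixed = math.floor(x * TABLE_SIZE)      # x in fixed point, rounded down
    whole = fixed >> FRAC_BITS              # integer part (floors for negatives)
    frac = fixed & (TABLE_SIZE - 1)         # fractional part as a table index
    return math.ldexp(EXP2_LUT[frac], whole)


if __name__ == "__main__":
    for x in (-1.5, 0.0, 0.3, 1.0, 4.77):
        approx, exact = exp2_lut(x), 2.0 ** x
        print(f"x={x:5.2f}  lut={approx:.5f}  exact={exact:.5f}")
```

Because the fractional part is rounded down to the nearest 1/64, the result can be low by up to about 1.1%, which mirrors the entry's note that output changes because the table values are not exact.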
Version r1047
  • Release Date: Dec 6, 2008
  • r1047: fix compilation with GCC-4.3+
  • r1046: High Profile allows 25% higher maxbitrate/cpb
    Correct level detection to take this into account.
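
As a worked example of the 25% allowance in r1046, the sketch below scales a level's base bitrate limit by 5/4 for High Profile. The base figures are the MaxBR values for a few levels as listed in Table A-1 of the H.264 specification and should be checked against the standard before being relied on; the dictionary and function names are hypothetical and this is not x264's level-detection code.

```python
# Base MaxBR per level, in kbit/s, for Baseline/Main (assumed from Table A-1).
BASE_MAX_BITRATE_KBPS = {
    "3.0": 10000,
    "3.1": 14000,
    "4.0": 20000,
    "4.1": 50000,
}


def max_bitrate_kbps(level: str, profile: str) -> int:
    """Return the bitrate cap for a level, with High Profile getting 25% more."""
    base = BASE_MAX_BITRATE_KBPS[level]
    if profile.lower() == "high":
        return base * 5 // 4        # 1.25x, kept in integer arithmetic
    return base


if __name__ == "__main__":
    print(max_bitrate_kbps("4.0", "main"))  # 20000
    print(max_bitrate_kbps("4.0", "high"))  # 25000
```

A stream at 24000 kbit/s therefore fits Level 4.0 in High Profile but would need a higher level in Main, which is the kind of case the corrected level detection has to account for.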
Version r1046
  • Release Date: Dec 1, 2008
  • High Profile allows 25% higher maxbitrate/cpb
    Correct level detection to take this into account.
Version r1045
  • Release Date: Nov 30, 2008
  • r1045: s/nasm/yasm in VS project file
  • r1044: Cosmetic: update various file headers.
  • r1043: add date and compiler to `x264 --version`
Version r1042
  • Release Date: Nov 29, 2008
  • r1042: 10L in r1041
  • r1041: Significantly faster CABAC and CAVLC residual coding and bit cost calculation
    Early-terminate in residual writing using stored nnz counts
    To allow the above, store nnz counts for luma and chroma DC
    Add assembly functions to find the last nonzero coefficient in a block
    Overall ~1.9% faster at subme9+8x8dct+qp25 with CAVLC, ~0.7% faster with CABAC
    Note this changes output slightly with CABAC RDO because it requires always storing correct nnz values during RDO, which wasn't done before in cases it wasn't useful.
    CAVLC output should be equivalent.
Version r1040
  • Release Date: Nov 27, 2008
  • r1040: dequant_4x4_dc assembly
    About 3.5x faster DC dequant on Conroe
  • r1039: fix an overflow in dct4x4dc_mmx
    (unlikely to have occurred in any real video)
Version r1038
  • Release Date: Nov 26, 2008
  • Remove nasm support
    Nasm won't correctly parse the SSE4 code introduced a few revisions ago, so we're removing support.
    Users should upgrade to yasm 0.6.1 or later.
Version r1037
  • Release Date: Nov 26, 2008
  • r1037: Fix rare warning messages in ratecontrol due to r1020
  • r1036: Fix MSVC compilation and clean up MSVC build file
    Remove Release64 which never worked anyways.
  • r1035: Faster width4 SSD+SATD, SSE4 optimizations
    Do satd 4x8 by transposing the two blocks' positions and running satd 8x4.
    Use pinsrd (SSE4) for faster width4 SSD
    Globally replace movlhps with punpcklqdq (it seems to be faster on Conroe)
    Move mask_misalign declaration to cpu.h to avoid warning in encoder.c.
    These optimizations help on Nehalem, Phenom, and Penryn CPUs.
  • r1034: fix indentation, whitespace cleanup, more consistent indentation of macro backslashes
  • r1033: Change some macros to be more sensitive to memory alignment, thus avoiding
    useless loads/stores and calculations of permutation vectors.
    Affected functions are all of mc_luma, mc_chroma, 'get_ref', SATD, SA8D and deblock.
    Overall gains vary from ~5% to 15%, depending on settings, on a 1.42 GHz G4.
Version r1032
  • Release Date: Nov 25, 2008
  • r1032: refactor satd. 20KB smaller binary.
    refactor sa8d. slightly faster.
    more checkasm for hadamard.
  • r1031: Fix crash with threads and SSEMisalign on Phenom
    Misalign mask needed to be set separately for each encoding thread.
Version r1030
  • Release Date: Nov 25, 2008
  • Phenom CPU optimizations
  • Faster hpel_filter by using unaligned loads instead of emulated PALIGNR
  • Faster hpel_filter on 64-bit by using the 32-bit version (the cost of emulated PALIGNR is high enough that the savings from caching intermediate values is not worth it).
  • Add support for misaligned_mask on Phenom: ~2% faster hpel_filter, ~4% faster width16 multisad, 7% faster width20 get_ref.
  • Replace width12 mmx with width16 sse on Phenom and Nehalem: 32% faster width12 get_ref on Phenom.
  • Merge cpu-32.asm and cpu-64.asm
  • Thanks to Easy123 for contributing a Phenom box for a weekend so I could write these optimizations.
Version r1029
  • Release Date: Nov 21, 2008
  • A few tweaks to decimate asm
  • A little bit faster on both 32-bit and 64-bit
Version r1028
  • Release Date: Nov 14, 2008
  • r1028: Nehalem optimization part 2: SSE2 width-8 SAD
    Helps a bit on Phenom as well
    ~25% faster width8 multiSAD on Nehalem
  • Add subme=0 (fullpel motion estimation only)
    Only for experimental purposes and ultra-fast encoding. Probably not a good idea for firstpass.
Version r1026
  • Release Date: Nov 11, 2008
  • r1026: Fix minor memory leak in r1022
  • r1025: r1024 borked checkasm
    Remove idct/dct2x2 from checkasm as they are no longer in dctf
Version r1024
  • Release Date: Nov 11, 2008
  • r1024: Faster chroma encoding
    9-12% faster chroma encode.
    Move all functions for handling chroma DC that don't have assembly versions to macroblock.c and inline them, along with a few other tweaks.
  • r1023: Various cosmetics and minor fixes
    Disable hadamard_ac sse2/ssse3 under stack_mod4
    Fix one MSVC compilation warning
    Fix compilation in debug mode in certain cases on x64
    Remove eval.c from MSVC project
    Fix crash when VBV is used in CQP mode
    Patches by MasterNobody
  • r1022: Faster b-adapt + adaptive quantization
    Factor out pow to be only called once per macroblock. Speeds up b-adapt, especially b-adapt 2, considerably.
    Speed boost is as high as 24% with b-adapt 2 + b-frames 1
  • r1021: Faster CABAC residual encoding
    6% faster block_residual_write_cabac in RD mode.
  • r1020: Fix potential crash in the case that the input statsfile is too short
    Also resolve various other potential weirdness (such as multiple copies of the same error message in threaded mode).
Version r1019
  • Release Date: Nov 6, 2008
  • r1019: Initial Nehalem CPU optimizations
    movaps/movups are no longer equivalent to their integer equivalents on the Nehalem, so that substitution is removed.
    Nehalem has a much lower cacheline split penalty than previous Intel CPUs, so cacheline workarounds are no longer necessary.
    Thanks to Intel for providing Avail Media with the pre-release Nehalem CPU needed to prepare these (and other not-yet-committed) optimizations.
    Overall speed improvement with Nehalem vs Penryn at the same clock speed is around 40%.
  • r1018: Fix potential infinite loop in VBV under GCC 4.
  • r1017: Encoder_reconfig: esa/tesa can only be enabled if they were on to begin with
    Bug report by kemuri-_9.
Version r1016
  • Release Date: Nov 1, 2008
  • Please refer to this page for a full list of changes
Version r999
  • Release Date: Oct 3, 2008
  • r999: rm gtk, avc2avi.
    I don't remember why I allowed a gui into the repository in the first place. There's nothing that makes this one special relative to all the other x264 guis.
    avc2avi doesn't compile since we removed the bitstream reader. And avc doesn't belong in avi.
  • r998: Resolve quality regression in r996
    Accidentally removed the wrong line of code. I think this classifies as a "10l".
    Thanks to techouse for initial bug report and skystrife for helping me find it.
  • r997: Fix minor memory leak accidentally added with the addition of b-adapt 2
Version r996
  • Release Date: Oct 2, 2008
  • Rework subme system, add RD refinement in B-frames
    The new system is as follows: subme6 is RD in I/P frames, subme7 is RD in all frames, subme8 is RD refinement in I/P frames, and subme9 is RD refinement in all frames.
    subme6 == old subme6, subme7 == old subme6+brdo, subme8 == old subme7+brdo, subme9 == no equivalent
    --b-rdo has, accordingly, been removed. --bime has also been removed, and instead enabled automatically at subme >= 5.
    RD refinement in B-frames (subme9) includes both qpel-RD and an RD version of bime.
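
The subme renumbering described in r996 can be written down as a small lookup, which may help when translating old command lines. The Python sketch below encodes only what the entry above states; the dictionary and function names are hypothetical.

```python
# Meaning of each reworked subme level, as listed in the r996 entry.
NEW_SUBME = {
    6: "RD in I/P frames",
    7: "RD in all frames",
    8: "RD refinement in I/P frames",
    9: "RD refinement in all frames",
}

# Old settings that each new level corresponds to (None: no old equivalent).
OLD_EQUIVALENT = {
    6: "subme 6",
    7: "subme 6 + --b-rdo",
    8: "subme 7 + --b-rdo",
    9: None,
}


def bime_enabled(subme: int) -> bool:
    """--bime was removed; per the entry it is now automatic at subme >= 5."""
    return subme >= 5


if __name__ == "__main__":
    for level in sorted(NEW_SUBME):
        old = OLD_EQUIVALENT[level] or "no equivalent"
        print(f"subme {level}: {NEW_SUBME[level]} (old: {old})")
```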
Version r995
  • Release Date: Sep 29, 2008
  • Fix potential miscompilation of some inline asm
    Caused problems under some gcc 4.x versions with predictive lossless
Version r994
  • Release Date: Sep 29, 2008
  • r994: Replace High 4:4:4 profile lossless with High 4:4:4 Predictive.
    This improves lossless compression by about 4-25% depending on source.
    The benefit is generally higher for intra-only compression.
    Also add support for 8x8dct and i8x8 blocks in lossless mode; this improves compression very slightly.
    In some rare cases 8x8dct can hurt compression in lossless mode, but it's usually helpful, albeit marginally.
    Note that 8x8dct is only available with CABAC as it is never useful with CAVLC.
    High 4:4:4 Predictive replaced the previous profile in a 2007 revision to the H.264 standard.
    The only known compliant decoder for this profile is the latest version of CoreAVC.
    As I write this, JM does not actually correctly decode this profile.
    Hopefully this lack of support will soon change with this commit, as x264 will be (to my knowledge) the first compliant encoder.
  • r993: Fix typo in progress indicator when using piped input
Version r992
  • Release Date: Sep 28, 2008
  • r992: avg_weight_ssse3
  • r991: fix bitstream writer on bigendian 64bit (regression in r903)
  • r990: remove authors whose code no longer exists
  • r989: more diagnostics when configure finds an unsuitable assembler
Version r988
  • Release Date: Sep 27, 2008
  • Make x264 progress indicator more concise
    Now the % indicator should be readable on the header of a minimized window on Windows systems.
Version full rev. 987
  • Release Date: Sep 23, 2008
  • Fix deblocking + threads + AQ bug
    At low QPs, with threads and deblocking on, deblocking could be improperly disabled.
    Revision in which this bug was introduced is unknown; it may be as old as b_variable_qp in x264 itself.
Version full rev. 986
  • Release Date: Sep 22, 2008
  • Resolve possible crash in bime, improve the fix in r985
Version full rev. 985
  • Release Date: Sep 21, 2008
  • r985: Fix rare crash issue in b-adapt
    Regression *probably* in r979
  • r984: Merging Holger's GSOC branch part 1: hpel_filter speedups
  • r983: r980 borked weighted bime
  • r982: Disable I_PCM with psy-RD
    psy-RD seems to put the PCM threshold a bit lower than it should be, so PCM is now disabled under psy-RD.
  • r981: Merge avg and avg_weight
    avg_weight no longer has to be special-cased in the code; faster weightb
  • r980: Rewrite avg/avg_weight to take two source pointers
    This allows the use of get_ref instead of mc_luma almost everywhere for bipred
Version full rev. 979
  • Release Date: Sep 17, 2008
  • r979: Use low-resolution lookahead motion vectors as an extra predictor
    Improves quality considerably (0-5%) in 1pass/CRF mode, especially with lower --me values and complex motion.
    Reverses the order of lowres lookahead search to improve the usefulness of the extra predictors.
  • r978: Add missing free() for f_qp_offset in frame.c
Version full rev. 977
  • Release Date: Sep 17, 2008
  • r977: Correct misprediction of bitrate in threaded mode
    Improves bitrate accuracy in cases with large numbers of threads.
    Loosely based on a patch by BugMaster.
  • r976: Fix a case in which VBV underflows can occur
    Fix a potential case where a frame might be initially allocated too low a QP, which would then have to be raised a lot during row-based ratecontrol.
    In some cases, this could even produce VBV underflows in 2pass mode.
  • r975: Use correct format specifier for uint64_t
Version full rev. 974
  • Release Date: Sep 16, 2008
  • Correct misprediction of bitrate in threaded mode
    Improves bitrate accuracy in cases with large numbers of threads.
    Loosely based on a patch by BugMaster.
Version full rev. 973
  • Release Date: Sep 16, 2008
  • r973: Fix regression in b-adapt patch: encoder_open failed for multipass encodes without bframes.
  • r972: Stop SAR in y4m input from overriding --sar on commandline
  • r971: hadamard_ac for psy-rd
    c version is 1.7x faster than satd+sa8d+sad
    ssse3 version is 2.3x faster than satd+sa8d+sad
  • r970: Psychovisually optimized rate-distortion optimization and trellis
    The latter, psy-trellis, is disabled by default and is reserved as experimental; your mileage may vary.
    Default subme is raised to 6 so that psy RD is on by default.
Version full rev. 969
  • Release Date: Sep 15, 2008
  • Add optional more optimal B-frame decision method
    This method (--b-adapt 2) uses a Viterbi algorithm somewhat similar to that used in trellis quantization.
    Note that it is not fully optimized and is very slow with large --bframes values.
    It also takes into account weightb, which should improve fade detection.
    Additionally, changes were made to cache lowres intra results for each frame to avoid recalculating them. This should improve performance in both B-frame decision methods.
    This can also be done for motion vectors, which will dramatically improve b-adapt 2 performance when it is complete.
    This patch also reads b_adapt and scenecut settings from the first pass so that the x264 header information in the output file will have correct information (since frametype decision is only done on the first pass).
Version full rev. 968
  • Release Date: Sep 14, 2008
  • Move adaptive quantization to before ratecontrol, eliminate qcomp bias
    This change improves VBV accuracy and improves bit distribution in CRF and 2pass.
    Instead of being applied after ratecontrol, AQ becomes part of the complexity measure that ratecontrol uses.
    This allows for modularity for changes to AQ; a new AQ algorithm can be introduced simply by introducing a new aq_mode and a corresponding if in adaptive_quant_frame.
    This also allows quantizer field smoothing, since quantizers are calculated beforehand rather than during encoding.
    Since there is no more reason for it, aq_mode 1 is removed. The new mode 1 is in a sense a merger of the old modes 1 and 2.
    WARNING: This change redefines CRF when using AQ, so output bitrate for a given CRF may be significantly different from before this change!
Version full rev. 967
  • Release Date: Sep 10, 2008
  • r967: Fix crash when using b-adapt at resolutions 32x32 or below.
    Original patch by BugMaster, but was mostly rewritten in order to make b-adapt actually *work* at such resolutions, not merely stop crashing.
  • r966: Add title-bar progress indicator under WIN32
    Also add bitrate-so-far output when piping data to x264 (total frames not known)
    Patch mostly by recover from Doom9.
Version full rev. 965
  • Release Date: Sep 7, 2008
  • Revert part of r963
    In some rare (but significant) cases, the optimized nal_encode algorithm gave incorrect results.
Version full rev. 964
  • Release Date: Sep 6, 2008
  • r964: Predict 4x4_DC asm
    Also remove 5-year-old unnecessary #define that reduced speed unnecessarily under MSVC-compiled builds
  • r963: Faster NAL unit encoding and remove unused nal_decode
    Small speedup at very high bitrates
  • r962: CAVLC cleanup and optimizations
    Also move some small functions in macroblock.c to a .h file so they can be inlined.
  • r961: Faster avg_weight assembly
    Unrolling the loop a bit improves performance
  • r960: Faster H asm intra prediction functions
    Take advantage of the H prediction method invented for merged intra SAD and apply it to regular prediction, too.
  • r959: Add merged SAD for i16x16 analysis
    Roughly 30% faster i16x16 analysis under subme=1
  • r958: Add sad_aligned for faster subme=1 mbcmp
    Distinguish between unaligned and aligned uses of mbcmp
    SAD_aligned, for MMX SADs, uses non-cacheline SADs.
Version full rev. 957
  • Release Date: Sep 3, 2008
  • Improve progress indicator
    Show average bitrate so far during encoding
    Decrease update interval for longer encodes (max of 10 frames encoded between updates)
Version full rev. 956
  • Release Date: Sep 2, 2008
  • Fix speed regression in r951
    Row SATDs are only necessary in VBV mode, so don't need to be checked if VBV is off.
Version full rev. 955
  • Release Date: Sep 1, 2008
  • r955: zigzag asm
  • r954: fix SOFLAGS used when building gtk frontend
    patch by Markus Kanet %darkvision A gmx P eu%
Version full rev. 953
  • Release Date: Aug 30, 2008
  • r953: remove the distinction between itex and ptex
    (changes 2pass statsfile format)
  • r952: hardcode the ratecontrol equation, and remove the rceq option
  • r951: Fix some uses of uninitialized row_satd values in VBV
    Resolves some issues with QP51 in I-frames with scenecut
Version full rev. 950
  • Release Date: Aug 27, 2008
  • r950: Activate trellis in p8x8 qpel RD
    Also clean up macroblock.c with some refactoring
    Note that this change significantly reduces subme7+trellis2 performance, but improves quality.
    Issue originally reported by Alex_W.
  • r949: Improve VBV accuracy
    Don't use the previous frame's row SATD as a predictor if it is too different from this frame's row SATD.
Version full rev. 948
  • Release Date: Aug 23, 2008
  • improve generation of Darwin libraries
    Patch by vmrsss %vmrsss A gmail P com%
Version full rev. 947
  • Release Date: Aug 22, 2008
  • r947: Fix compilation in gcc 3.4.x (issue in r946)
    Due to a bug in gcc 3.4.x, in certain cases of inlining, the array_non_zero_int_mmx inline assembly is miscompiled and causes a crash with --subme 7 --8x8dct.
    This minor hack fixes this issue.
  • r946: shut up various gcc warnings
  • r945: fix a crash with invalid args and --thread-input (introduced in r921)
  • r944: drop support for x86_32 PIC.
  • r943: use permute macros in satd
    move some more shared macros to x264util.asm
Version full rev. 942
  • Release Date: Aug 21, 2008
  • r942: cosmetics
  • r941: r940 broke threads
  • r940: Cleanups in macroblock_cache_save/load
    A bit more loop unrolling, and moving some constant code to the global init function
  • r939: Deblocking code cleanup and cosmetics
    Convert the style of the deblocking code to the standard x264 style
    Eliminate some trailing whitespace
Version full rev. 938
  • Release Date: Aug 19, 2008
  • 4% faster deblock: special-case macroblock edges
    Along with a bit of related code reorganization and macroification
Version full rev. 937
  • Release Date: Aug 17, 2008
  • Add dedicated variance function instead of using SAD+SSD
    Faster variance calculation
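    As a rough illustration of the arithmetic a dedicated variance function can use (a sketch, not x264's implementation), the sum of squared deviations from the mean follows from the pixel sum and the sum of squares gathered in a single pass. The function name block_variance_sum is hypothetical.

```python
def block_variance_sum(pixels):
    """Return N * variance for a flat list of pixel values, using one pass."""
    n = len(pixels)
    total = 0
    total_sq = 0
    for p in pixels:
        total += p
        total_sq += p * p
    # Identity: sum((p - mean)^2) == sum(p^2) - (sum(p))^2 / N
    return total_sq - (total * total) / n
```

    A flat block gives zero, and the two-pixel block [0, 2] gives 2, which is the sum of squared deviations from the mean of 1.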
Version full rev. 936
  • Release Date: Aug 16, 2008
  • r936: 6% faster deblock: remove some clips, earlier termination on low qps.
  • r935: Faster deblocking
    Early termination for bS=0, alpha=0, beta=0
    Refactoring, various other optimizations
    About 30% faster deblocking overall.
Version full rev. 934
  • Release Date: Aug 12, 2008
  • r934: asm cosmetics
  • r933: yet another posix-emulating define on solaris
  • r932: update msvc projectfile
  • r931: drop support for msvc6
Version full rev. 930
  • Release Date: Aug 10, 2008
  • r930: Prevent VBV from lowering quantizer too much
    This code seemed to act up unexpectedly sometimes, creating a situation where in 1-pass VBV mode, a frame's quantizer would drop all the way to qpmin and then shoot back upwards to qpmax, causing serious visual issues.
    This change may decrease bitrate in VBV mode, but that is preferable to the artifacting produced by this code.
  • r929: Improve subme7 at low QPs and add subme7 support in lossless mode
Version full rev. 928
  • Release Date: Jul 31, 2008
  • r928: cosmetics: merge x86inc*.asm
  • r927: Add missing x264util.asm
  • r926: Basic sanity checking of qpmax/qpmin options
  • r925: Fix regression in r922
    set the chroma DC coefficients to zero for residual coding in qpel-rd
    fix C99ism
  • r924: Refactor asm macros part 2: DCT
  • r923: Refactor asm macros part 1: DCT
Version full rev. 922
  • Release Date: Jul 30, 2008
  • r922: Improve intra RD refine, speed up residual_write_cabac
    a do/while loop can be used for residual_write, but i8x8 had to be fixed so that it wouldn't call residual_write with zero coeffs
    proper nnz handling added to cabac intra rd refine chroma cbp added to 8x8 chroma rd cbp was tested, but wasn't useful
  • r921: Fix a few more minor memleaks
Version full rev. 920
  • Release Date: Jul 26, 2008
  • r920: stats summary: print distribution of numbers of consecutive B-frames
  • r919: add interlacing to the list of stuff checked by x264_validate_levels
Version full rev. 918
  • Release Date: Jul 25, 2008
  • r918: Fix C99-ism in r907
  • r917: Faster temporal predictor calculation
    Split into a separate commit because this changes rounding, and thus changes output slightly.
  • r916: Align lowres planes for improved cacheline split performance
Version full rev. 915
  • Release Date: Jul 21, 2008
  • autodetect level based on resolution/bitrate/refs/etc, rather than defaulting to L5.1
    if vbv is not enabled (and especially in crf/cqp), we have to guess max bitrate, so we might underestimate the required level.
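    A simplified sketch of the idea, not x264's actual logic: pick the lowest level whose frame-size and macroblock-rate limits fit the stream. It deliberately ignores bitrate, reference count and DPB size, which the entry says are also considered. The function name guess_level is hypothetical, and the limits are the MaxMBPS/MaxFS values as recalled from Table A-1 of the H.264 standard (level 1b omitted), so check them against the standard before relying on them.

```python
# (level, MaxMBPS in macroblocks per second, MaxFS in macroblocks per frame)
LEVELS = [
    ("1", 1485, 99), ("1.1", 3000, 396), ("1.2", 6000, 396),
    ("1.3", 11880, 396), ("2", 11880, 396), ("2.1", 19800, 792),
    ("2.2", 20250, 1620), ("3", 40500, 1620), ("3.1", 108000, 3600),
    ("3.2", 216000, 5120), ("4", 245760, 8192), ("4.1", 245760, 8192),
    ("4.2", 522240, 8704), ("5", 589824, 22080), ("5.1", 983040, 36864),
]


def guess_level(width, height, fps):
    """Return the lowest level that fits the frame size and macroblock rate."""
    mb_w = (width + 15) // 16   # width in 16x16 macroblocks, rounded up
    mb_h = (height + 15) // 16  # height in 16x16 macroblocks, rounded up
    frame_mbs = mb_w * mb_h
    mbs_per_second = frame_mbs * fps
    for name, max_mbps, max_fs in LEVELS:
        if frame_mbs <= max_fs and mbs_per_second <= max_mbps:
            return name
    return None  # larger than anything in the table above
```

    With these limits, 1280x720 at 30 fps lands on level 3.1 and 1920x1080 at 30 fps on level 4, well below a blanket default of 5.1.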
Version full rev. 914
  • Release Date: Jul 19, 2008
  • fix bs_write_ue_big for values >= 0x10000.
    (no immediate effect, since nothing writes such values yet)
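    For context, ue(v) values in H.264 are written as unsigned Exp-Golomb codes, whose length grows with the value. The sketch below only illustrates that code; it is not the body of bs_write_ue_big, and the helper name ue_bits is hypothetical.

```python
def ue_bits(value):
    """Return the unsigned Exp-Golomb code for value as a string of '0' and '1'."""
    code_num = value + 1
    n = code_num.bit_length()
    # (n - 1) leading zeros, then code_num itself in n bits
    return "0" * (n - 1) + format(code_num, "b")
```

    The code for 0x10000 is 33 bits long, so values in this range no longer fit in a single 32-bit write.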
Version full rev. 913
  • Release Date: Jul 17, 2008
  • Fix lossless mode borked in r901
Version full rev. 912
  • Release Date: Jul 13, 2008
  • r912: Relax QPfile restrictions
    Allow a QPfile to contain fewer frames than the total number of frames in the video and have ratecontrol fill in the rest.
    Patch by kemuri9.
  • r911: Limit MVrange correctly in interlaced mode
    Bug report by Sigma Designs, Inc.
Version full rev. 910
  • Release Date: Jul 12, 2008
  • r910: Fix bug with PCM and adaptive quantization
    In rare cases CABAC desync could occur, causing bitstream corruption
  • r909: Fix memory leak upon x264 closing
    Doesn't affect the CLI, but potentially important for programs which call x264 as a shared library.
  • r908: Fix compilation on PPC systems (borked in r903)
    Bigendian systems didn't have endian_fix32 defined
  • r907: Add L1 reflist and B macroblock types to x264 info
    Also remove display of "PCM" if PCM mode is never used in the encode.
    L1 reflist information will only show if pyramid coding is used.
Version full rev. 906
  • Release Date: Jul 11, 2008
  • r906: Fix and enable I_PCM macroblock support
    In RD mode, always consider PCM as a macroblock mode possibility
    Fix bitstream writing for PCM blocks in CAVLC and CABAC, and a few other minor changes to make PCM work.
    PCM macroblocks improve compression at very low QPs (1-5) and in lossless mode.
  • r905: de-duplicate vlc tables
  • r904: faster ue/se/te write
  • r903: faster bs_write
  • r902: cosmetics in ssd asm
Version full rev. 901
  • Release Date: Jul 7, 2008
  • r901: Various optimizations and cosmetics
    Update AUTHORS file with Gabriel and me
    update XCHG macro to work correctly in if statements
    Add new lookup tables for block_idx and fdec/fenc addresses
    Slightly faster array_non_zero_count_mmx (patch by holger)
    Eliminate branch in analyse_intra
    Unroll loops in and clean up chroma encode
    Convert some for loops to do/while loops for speed improvement
    Do explicit write-combining on --me tesa mvsad_t struct
    Shrink --me esa zero[] array
    Speed up bime by reducing size of visited[][][] array
  • r900: Resolve floating point exception with frame_init_lowres mmx
    In some cases, the mmx version of frame_init_lowres could leave the FPU uninitialized for use in ratecontrol, resulting in floating point exceptions.
    Since frame_init_lowres is such a time-consuming function, an emms was just put at the end, since it costs almost nothing compared to the total time of frame_init_lowres.
Version full rev. 899
  • Release Date: Jul 5, 2008
  • Update my email address
Version full rev. 898
  • Release Date: Jul 4, 2008
  • Update file headers throughout x264
    Update "Authors" lists based on actual authorship; highest is most important
    Update copyright notices and remove old CVS tags from file headers
    Add file headers to GTK and other sections missing them
    Update FSF address
    Other header-related cosmetics
Version full rev. 897
  • Release Date: Jul 3, 2008
  • r897: denoise_dct asm
  • r896: cosmetics in permutation macros
    SWAP can now take mmregs directly, rather than just their numbers
Version full rev. 895
  • Release Date: Jul 3, 2008
  • r895: Fix bug in adaptive quantization
    In some cases adaptive quantization did not correctly calculate the variance.
    Bug reported by MasterNobody
  • r894: lowres_init asm
    rounding is changed for asm convenience. this makes the c version slower, but there's no way around that if all the implementations are to have the same results.
  • r893: Optimizations and cosmetics in macroblock.c
    If an i4x4 dct block has no coefficients, don't bother with dequant/zigzag/idct. Not useful for larger sizes because the odds of an empty block are much lower.
    Cosmetics in i16x16 to be more consistent with other similar functions.
    Add an SSD threshold for chroma in probe_skip to improve speed and minimize time spent on chroma skip analysis.
    Rename lambda arrays to lambda_tab for consistency.
Version full rev. 892
  • Release Date: Jun 30, 2008
  • some asm functions require aligned stack. disable these when compiling with msvc/icc.
Version full rev. 891
  • Release Date: Jun 25, 2008
  • r891: Move bitstream end check to macroblock level
  • r891: Additionally, instead of silently truncating the frame upon reaching the end of the buffer, reallocate a larger buffer instead.
  • r890: Convert NNZ to raster order and other optimizations
  • r890: Converting NNZ to raster order simplifies a lot of the load/store code and allows more use of write-combining.
  • r890: More use of write-combining throughout load/save code in common/macroblock.c
  • r890: GCC has aliasing issues in the case of stores to 8-bit heap-allocated arrays; dereferencing the pointer once avoids this problem and significantly increases performance.
  • r890: More manual loop unrolling and such.
  • r890: Move all packXtoY functions to macroblock.h so any function can use them.
  • r890: Add pack8to32.
  • r890: Minor optimizations to encoder/macroblock.c
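    To picture what raster order means for the sixteen 4x4 luma blocks of a macroblock: H.264 numbers them in a nested 2x2 scan, while raster order simply goes row by row. The sketch below converts between the two. It assumes the NNZ values were previously indexed in that block scan order, which the entry implies but does not state; the function names are hypothetical.

```python
def block_idx_to_xy(idx):
    """Map an H.264 4x4 luma block index (0..15) to (x, y) in 4x4-block units."""
    x = (idx & 1) | ((idx >> 1) & 2)
    y = ((idx >> 1) & 1) | ((idx >> 2) & 2)
    return x, y


def block_idx_to_raster(idx):
    """Map a block-scan index to its row-by-row (raster) position."""
    x, y = block_idx_to_xy(idx)
    return y * 4 + x
```

    In raster order the four blocks of one row sit next to each other in memory, which is what makes wider combined loads and stores possible.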
Version full rev. 889
  • Release Date: Jun 19, 2008
  • r889: mc_chroma_sse2/ssse3
  • r888: checkasm --bench=function_name
  • r887: interleave psnr/ssim computation with reference frame filtering, to improve cache coherency
Version full rev. 886
  • Release Date: Jun 16, 2008
  • r886: Add more inline asm and a runtime check for MMXEXT support
    x264 will now terminate gracefully rather than SIGILL when run on a machine with no MMXEXT support.
    A configure option is now available to build x264 without assembly support for support on such old CPUs as the Pentium 2, K6, etc.
  • r885: Use aligned memcpy for x264_me_t struct and cosmetics
  • r884: Cosmetics and loop unrolling
    GCC is not very good at loop unrolling in cases where it can perform constant propagation, so the unrolling unfortunately has to be done manually.
Version full rev. 883
  • Release Date: Jun 13, 2008
  • r883: Fix regression in 64-bit in r882
    i_mvc needs to be 64-bit when used with a 64-bit memory pointer
  • r882: More tweaks to me.c
  • r882: Added inline MMX version of UMH's predictor difference test
  • r882: Various cosmetics throughout me.c
  • r882: Removed a C99-ism introduced in r878.
Version full rev. 881
  • Release Date: Jun 12, 2008
  • Fix regression in r736
    r736 added intra RD refinement to B-frames; however, it is possible for subme=7 to be used without b-rdo.
    This means intra RD isn't run, and therefore it is possible for intra chroma analysis to not have been run, since update_cache was never called for an intra block, and chroma ME is not required even at subme=7.
    r801, which removed a memset, made this worse because previously the chroma prediction mode was at least initialized to zero; now it was not initialized at all.
    Therefore, --no-chroma-me, --subme 7, and no --b-rdo had the potential to crash.
    This change restricts intra RD refinement to only be run when --b-rdo is enabled (sensible to begin with), thus preventing a crash in this case.
Version full rev. 880
  • Release Date: Jun 11, 2008
  • r880: Fix regression in r850
    Bug resulted in rare incorrect chroma encoding
  • r879: Cosmetics in VBV handling
  • r878: Tweaks and cosmetics in me.c
    Use write-combining for predictor checking and other tweaks.
Version full rev. 877
  • Release Date: Jun 8, 2008
  • r877: Partially inline trellis quantization
    Inlining trellis into the 4x4/8x8 trellis wrappers increases trellis speed by about 5-10% through constant propagation.
  • r876: Various cosmetic changes.
  • r875: avg_weight_sse2
  • r874: many changes to which asm functions are enabled on which cpus.
    with Phenom, 3dnow is no longer equivalent to "sse2 is slow", so make a new flag for that.
    some sse2 functions are useful only on Core2 and Phenom, so make a "sse2 is fast" flag for that.
    some ssse3 instructions didn't become useful until Penryn, so yet another flag.
    disable sse2 completely on Pentium M and Core1, because it's uniformly slower than mmx.
    enable some sse2 functions on Athlon64 that always were faster and we just didn't notice.
    remove mc_luma_sse3, because the only cpu that has lddqu (namely Pentium 4D) doesn't have "sse2 is fast".
    don't print mmx1, sse1, nor 3dnow in the detected cpuflags, since we don't really have any such functions. likewise don't print sse3 unless it's used (Pentium 4D).
  • r873: enable ssse3 phadd satd on Penryn.
  • r872: benchmark most of the asm functions (checkasm --bench).
Version full rev. 871
  • Release Date: Jun 6, 2008
  • Cosmetic: fix C99-ism
Version full rev. 870
  • Release Date: Jun 6, 2008
  • Use a gaussian window for cplxblur
    Cplxblur was originally intended to use a gaussian window, but in its current form did not. This change provides a tiny improvement to 2pass ratecontrol.
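    A generic sketch of blurring a per-frame complexity series with a Gaussian window follows. It is not x264's ratecontrol code: the function name gaussian_blur, the sigma parameter and its relation to --cplxblur are assumptions made for illustration.

```python
import math


def gaussian_blur(values, sigma):
    """Blur a 1-D series with Gaussian weights, renormalized at the ends."""
    out = []
    for i in range(len(values)):
        weight_sum = 0.0
        acc = 0.0
        for j, v in enumerate(values):
            # Weight falls off smoothly with distance from frame i
            w = math.exp(-((j - i) ** 2) / (2.0 * sigma * sigma))
            weight_sum += w
            acc += w * v
        out.append(acc / weight_sum)
    return out
```

    Nearby frames contribute most and distant frames fade out gradually, instead of every frame in the window counting equally.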
Version full rev. 869
  • Release Date: Jun 4, 2008
  • r869: cosmetics
  • r868: nasm compatible NX stack
  • r867: CQP is incompatible with AQ
  • r866: memzero_aligned_mmx
  • r865: binmode stdin on mingw, not just msvc
  • r864: omit redundant mc after non-rdo dct size decision, and in b-direct rdo
  • r863: allow fractional CRF values with AQ.
  • r862: fix some uninitialized partitions in rdo
Version full rev. 861
  • Release Date: Jun 3, 2008
  • r861: 2-pass VBV support and improved VBV handling
    Dramatically improves 1-pass VBV ratecontrol (especially CBR) and provides support for VBV in 2-pass mode. This consists of a series of functions that attempts to find overflows and underflows in the VBV from the first-pass statsfile and fix them before encoding.
    1-pass VBV code partially by Dark Shikari.
  • r860: Fix noise reduction in threaded mode.
    Previously enabling noise reduction with threads had no effect.
    Note that this is not an optimal solution; each thread still tracks noise reduction separately (unlike in single-threaded mode).
Version full rev. 859
  • Release Date: May 21, 2008
  • r859: fix a crash on win32 with threads.
    r852 introduced an assumption in deblock that the stack is aligned.
  • r858: remove nasm version check. a feature check is all that's needed.
    silence stderr in yasm version check.
  • r857: cosmetics in cabac
  • r856: faster residual_write_cabac
  • r855: change DEBUG_DUMP_FRAME to run-time --dump-yuv
  • r854: x264_median_mv_mmxext
    this is the first non-runtime-detected use of mmxext, but it has to be inlined
  • r853: factor duplicated code out of deblock chroma mmx
  • r852: deblock_luma_intra_mmx
Version full rev. 851
  • Release Date: May 17, 2008
  • r851: write aspect ratio in mp4
  • r850: omit delta_quant in i16x16 blocks with no residual
    (all other block types were already covered, but i16x16 cbp is special)
  • r849: explicit write combining, because gcc fails at optimizing consecutive memory accesses
  • r848: force unroll macroblock_load_pic_pointers and a few other minor optimizations
  • r847: quant_2x2_dc_ssse3
  • r846: r836 borked lossless cabac nnz
Version full rev. 845
  • Release Date: May 15, 2008
  • r845: use elf instead of a.out on netbsd
  • r844: fix x264_realloc when not using libc realloc.
  • r843: don't pretend to support win64. remove all related code.
    it hasn't worked since probably some time in 2005, and won't ever be fixed unless someone steps up to maintain it.
  • r842: cosmetics: replace last instances of parm# asm macros with r#
  • r841: remove DEBUG_BENCHMARK
  • r840: faster probe_skip
Version full rev. 839
  • Release Date: Apr 28, 2008
  • r839: drop support for pre-SSE3 assemblers
  • r838: s/x264_cpu_restore/x264_emms/
    no point in giving it a generic name when it's not generic
  • r837: faster cabac_mb_cbp_luma
    ported from ffmpeg
  • r836: remove some redundant nnz counts
    move some nnz counts from macroblock_encode to cavlc if cabac doesn't need them
  • r835: compute missing nnz count in subme7 cavlc
  • r834: remove a division in macroblock-level bookkeeping
  • r833: omit P/B-skip mc from macroblock_encode if the pixels haven't been overwritten since probe_skip
  • r832: earlier termination in SEA if mvcost exceeds residual
  • r831: remove void* arithmetic from r821
Version full rev. 830
  • Release Date: Apr 26, 2008
  • r830: Fix define of illegal function identifiers (as defined in section "7.1.3 Reserved identifiers" of C99 spec)
  • r829: Fix define of illegal identifier (as defined in section "7.1.3 Reserved identifiers" of C99 spec) "__UNUSED__", and use the one defined in common/osdep.h, i.e. "UNUSED"
    based on a patch by Diego Biurrun
Version full rev. 828
  • Release Date: Apr 25, 2008
  • r828: more consistent include name (in line with other PPC includes)
  • r827: fix illegal identifiers in multiple inclusion guards
    patch by Diego Biurrun % diego A biurrun P de %
Version full rev. 826
  • Release Date: Apr 22, 2008
  • r826: AQ now treats perfectly flat blocks as low energy, rather than retaining previous block's QP.
  • r826: fixes occasional blocking in fades.
  • r825: checkasm cabac
  • r824: s/movdqa/movaps/g
  • r823: --asm to allow testing of different versions of asm without recompile
  • r822: copy left neighbor pixels directly from previous mb instead of main plane
Version full rev. 821
  • Release Date: Apr 17, 2008
  • r821: cacheline split workaround for mc_luma
  • r820: add "SECTION_RODATA" before "SECTION .text" to setup the fakegot label used in macho binaries.
    This fixes compilation with --enable-pic
    Requires Yasm 0.7.0 or newer
    Patch by Dave Lee % davelee P com A gmail P com %
Version full rev. 819
  • Release Date: Apr 15, 2008
  • more hpel fixes
Version full rev. 818
  • Release Date: Apr 12, 2008
  • r818: update msvc projectfile
  • r817: r810 borked hpel_filter_sse2 on unaligned buffers
Version full rev. 816
  • Release Date: Apr 10, 2008
  • r816: threads=auto on multicore now implies thread input, just like explicit thread numbers already did
  • r815: dct4 sse2
  • r814: faster x86_32 dct8
  • r813: macros to deal with macros that permute their arguments
  • r812: mmx cachesplit sad of non-square sizes checked height instead of width
  • r811: sfence after nontemporal stores
  • r810: simplify hpel filter asm (move control flow to C) and add sse2, ssse3 versions
  • r809: more mmx/xmm macros (mova, movu, movh)
Version full rev. 808
  • Release Date: Apr 1, 2008
  • r808: improve handling of cavlc dct coef overflows
    support large coefs in high profile, and clip to allowed range in baseline/main
  • r807: fix shared libs on MacOSX
    based on a patch by İsmail Dönmez
  • r806: typo in r803
Version full rev. 805
  • Release Date: Mar 31, 2008
  • r805: fix a crash on mp4 muxing with invalid params
  • r804: variance-based psy adaptive quantization
  • r804: new options: --aq-mode --aq-strength
  • r804: AQ is enabled by default
  • r803: fix naming of .dll on mingw
  • r802: don't distinguish between mingw and cygwin
  • r801: remove a memset
  • r800: typo. don't evaluate rd pskip when p16x16 found ref>0
  • r799: r784 borked lossless dc zigzag
Version full rev. 798
  • Release Date: Mar 26, 2008
  • r798: fix an arithmetic overflow that disabled SEA threshold after finding a mv with SAD < mvcost.
  • r797: fix hpel_filter_altivec picked up by checkasm
    Patch by Manuel %maaanuuu A gmx.net % and Noboru Asai % noboru P asai A gmail P com %
Version full rev. 796
  • Release Date: Mar 25, 2008
  • r796: faster residual
  • r795: nasm doesn't like align(nop) in structs
  • r794: reduce the size of some cabac arrays
  • r793: use cabac context transition table from trellis in normal residual coding too
  • r792: rearrange cabac struct to reduce code size
Version full rev. 791
  • Release Date: Mar 25, 2008
  • r791: higher precision RD lambda
    improves quality at QP<=12.
  • r790: faster cabac_encode_ue_bypass
  • r789: cabac asm.
    mostly because gcc refuses to use cmov.
    28% faster than c on core2, 11% on k8, 6% on p4.
  • r788: cosmetics in cabac
  • r787: inline cabac_size_decision
Version full rev. 786
  • Release Date: Mar 23, 2008
  • r786: cosmetics in DECLARE_ALIGNED
  • r785: don't distinguish between luma4x4 and luma4x4ac
  • r784: faster lossless zigzag
  • r783: more alignment
Version full rev. 782
  • Release Date: Mar 22, 2008
  • r782: add tesa and lossless to fprofile
  • r781: cosmetics in residual_write
  • r780: remove unused bitstream reader
  • r779: cosmetics in quant asm
  • r778: special case dequant for flat matrix
Version full rev. 777
  • Release Date: Mar 21, 2008
  • r777: faster dequant
  • r776: simplify hpel_filter_c
  • r775: use x264_mc_copy_w16_sse2 in mc.copy, it was previously only in mc_luma
  • r774: new ssd_8x*_sse2
  • r774: align ssd_16x*_sse2
  • r774: unroll ssd_4x*_mmx
  • r773: update altivec zigzags
  • r772: r768 borked cavlc
Version full rev. 771
  • Release Date: Mar 20, 2008
  • r771: cosmetics in intra predict
  • r770: faster intra predict 8x8 hu/hd
  • r769: reduce zigzag arrays from int to int16_t
  • r768: reduce the size of some arrays
  • r767: skip intra pred+dct+quant in cases where it's redundant (analyse vs encode)
    large speedup with trellis=2, small speedup with trellis=0 and/or subme>=6
  • r766: cosmetics in asm
  • r765: satd_4x4_ssse3
  • r764: get_ref_sse2
Version full rev. 763
  • Release Date: Mar 19, 2008
  • r763: continue instead of crash when the threading mv constraint is violated.
    doesn't fix the underlying bug, but hopefully less annoying until we find it.
  • r762: remove remaining reference to clip1.h
  • r761: fix name mangling again.
    apparently it's not just a convention, dll build fails if you try to export a non-prefixed name.
  • r760: update msvc projectfile
  • r759: missing #ifdef HAVE_SSE3
  • r758: don't define offsetof since it's standard
  • r757: shut up gcc warning in offsetof
Version full rev. 756
  • Release Date: Mar 17, 2008
  • r756: increase alignment of mv arrays
  • r755: memcpy_aligned_sse2
  • r754: checkasm check whether callee-saved regs are correctly saved
    x86_32 only for now since x86_64 varargs are annoying
  • r753: fix x86_32 ads which failed to preserve a register
  • r752: fix some name mangling issues introduced by the merge
  • r751: remove x264_mc_clip1.
    it's wrong for sufficiently perverse inputs, and clip_uint8 is faster anyway.
  • r750: merge x86_32 and x86_64 asm, with macros to abstract calling convention and register names
Version full rev. 749
  • Release Date: Mar 12, 2008
  • git compatible version script
Version full rev. 748
  • Release Date: Mar 8, 2008
  • check for broken versions of yasm
Version full rev. 747
  • Release Date: Mar 7, 2008
  • Rev. 746: .gitignore
  • Rev. 747: increase the alignment of the i8x8 edge cache, needed for sse2 intra prediction.
    patch by Alexander Strange.
Version full rev. 745
  • Release Date: Mar 2, 2008
  • Rev. 745: pic macros now keep track of which register holds the GOT, so variable access doesn't have to care
  • Rev. 744: remove x86_64 predict_8x8_ddl_mmxext because sse2 is faster even on amd
  • Rev. 743: cosmetics in dsp init
  • Rev. 742: sse2 16x16 intra pred.
  • Rev. 742: port the remaining intra pred functions from x86_64 to x86_32.
    patch by Dark Shikari.
  • Rev. 742: some simplifications to mmx intra pred that should have been done way back when we switched to constant fdec_stride.
  • Rev. 742: and remove pic spills in functions that have a free caller-saved reg.
    patch partly by Dark Shikari.
  • Rev. 740: faster array_non_zero
  • Rev. 739: x86_32 sse2 idct8
    ported from ffmpeg by Dark Shikari
  • Rev. 738: checkasm: relax the threshold for floating-point ssim
  • Rev. 737: checkasm: test idct with the range of coefficients that can really be encountered, as opposed to random numbers which might overflow.
Version full rev. 736
  • Release Date: Jan 29, 2008
  • intra_rd_refine in B-frames
Version full rev. 735
  • Release Date: Jan 28, 2008
  • print average of macroblock QPs instead of frame's nominal QP
  • update date
  • remove colorspace conversion support, because it has no business in any codec
  • misc fixes in checkasm
  • remove a useless bit of me=umh (originally copied from JM, where it was used for something)
  • fix a memleak in cqm
  • fix a memleak in mkv muxer
    patch by saintdev
  • satd exhaustive motion search (--me tesa)
  • fix cabac context for nonzero delta_qp of the 2nd mb of a frame in interlaced mode
  • fix mapping of mvs to partitions in p4x4_chroma
    patch by Noboru Asai
  • fix mvp for b16x8 and b8x16 L1 search
    patch by Wei-Yin Chen
  • shave a couple cycles off cabac functions
  • faster and smaller x264_macroblock_cache_mv etc
  • configure test for endianness
Version full rev. 721
  • Release Date: Jan 18, 2008
  • change the meaning of --ref: it now selects DPB size (including B-frames), rather than L0 size (which B-frames are added to)
Version full rev. 720
  • Release Date: Jan 15, 2008
  • add / fix support for FreeBSD, based on a patch by Igor Mozolevsky % igor A hybrid-lab P co P uk %
Version full rev. 719
  • Release Date: Jan 10, 2008
  • shut up some valgrind warnings
  • slightly wrong memory allocation in r717, fixes a potential crash with merange>32
Version full rev. 717
  • Release Date: Jan 7, 2008
  • convert absolute difference of sums from mmx to sse2
  • convert mv bits cost and ads threshold from C to sse2
  • convert bytemask-to-list from C to scalar asm
    1.6x faster me=esa (x86_64) or 1.3x faster (x86_32).
    (times consider only motion estimation. overall encode speedup may vary.)
  • round esa range to a multiple of 4
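The rounding in the last item can be sketched as below. The changelog does not say in which direction the range is rounded; rounding up is an assumption here, and `round_up_to_4` is a hypothetical helper name, not an x264 function.

```c
#include <assert.h>

/* Round a non-negative search width up to the next multiple of 4.
 * Rounding up (rather than down or to nearest) is an assumption. */
static int round_up_to_4(int width)
{
    assert(width >= 0);
    return (width + 3) & ~3;   /* add 3, then clear the two low bits */
}
```

For example, a width of 17 would become 20, while 16 stays 16.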
Version full rev. 715
  • Release Date: Jan 4, 2008
  • use define _WIN32 instead of __WIN32__ or WIN32 defines.
    MSDN reference: http://msdn2.microsoft.com/en-us/library/b0084kay(VS.80).aspx
    Patch by BugMaster %BugMaster A narod P ru%
    Original thread:
    date: Dec 27, 2007 3:18 AM
    subject: [x264-devel] VS2008 compilation error (need of replacement __WIN32__ with _WIN32)
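A minimal sketch of the kind of check this change implies: `_WIN32` is predefined by Microsoft's compilers (and by MinGW) for Windows targets, whereas `__WIN32__` and `WIN32` are not guaranteed. `PLATFORM_NAME` and `platform_name` are hypothetical names used only for illustration.

```c
#include <assert.h>   /* used by the usage check below */
#include <string.h>

/* Test the compiler-provided _WIN32 macro rather than __WIN32__ or WIN32,
 * which depend on the toolchain or on build flags. */
#ifdef _WIN32
#define PLATFORM_NAME "windows"
#else
#define PLATFORM_NAME "other"
#endif

/* Hypothetical helper: report which branch was compiled in. */
static const char *platform_name(void)
{
    return PLATFORM_NAME;
}
```

A caller could check the result with `strcmp(platform_name(), "windows") == 0`.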
Version full rev. 714
  • Release Date: Dec 21, 2007
  • tweak x264_pixel_sad_x4_16x16_sse2 horizontal sum. 168 -> 166 cycles on core2.
Version full rev. 713
  • Release Date: Dec 21, 2007
  • fix a nondeterminism involving 8x8dct, rdo, and threads.
Version full rev. 712
  • Release Date: Dec 14, 2007
  • also test arch-specific x264_zigzag_* implementations in checkasm.c
    Patch by Noboru Asai % noboru P asai A gmail P com%
Version full rev. 711
  • Release Date: Dec 11, 2007
  • Add AltiVec implementation of
    • x264_zigzag_scan_4x4_frame_altivec()
    • x264_zigzag_scan_4x4ac_frame_altivec()
    • x264_zigzag_scan_4x4_field_altivec()
    • x264_zigzag_scan_4x4ac_field_altivec()
    each around 1.3 to 1.8x faster than C version
    Patch by Noboru Asai % noboru P asai A gmail P com%
Version full rev. 710
  • Release Date: Dec 10, 2007
  • adds AltiVec implementation of predict_16x16_p(), over 4x faster than C version
Version full rev. 709
  • Release Date: Dec 7, 2007
  • revert the x86_32 part of r708. elf shared libraries aren't important enough to be worth the extra lines of code to check for nasm.
Version full rev. 708
  • Release Date: Dec 4, 2007
  • Rev. 708: mark asm functions as hidden
  • Rev. 707: check whether ld supports -Bsymbolic before using it
Version full rev. 706
  • Release Date: Dec 3, 2007
  • reduce the data type used in some tables. 16KB smaller exe.
Version full rev. 705
  • Release Date: Dec 2, 2007
  • Rev. 705: faster removal of duplicate mv predictors
  • Rev. 704: avoid a division in x264_mb_predict_mv_ref16x16.
    patch by Dark Shikari.
  • Rev. 703: avoid a division in umh.
    patch by Dark Shikari.
Version full rev. 702
  • Release Date: Nov 27, 2007
  • fix a memleak in h->mb.mvr
Version full rev. 701
  • Release Date: Nov 26, 2007
  • fix compilation as a shared library on x86_64 (regression in r696)
Version full rev. 700
  • Release Date: Nov 22, 2007
  • Rev. 700: add support for x86_64 on Darwin9.0 (Mac OS X 10.5, aka Leopard)
    Patch by Antoine Gerschenfeld %gerschen A clipper P ens P fr%
  • Rev. 699: cover some more options in fprofile. (esa, bime, cqm, nr, no-dct-decimate, trellis2) previously, esa was slower with fprofile than without, since gcc thought it wasn't important. now esa benefits like anything else.
Version full rev. 698
  • Release Date: Nov 21, 2007
  • Rev. 698: Add AltiVec implementation of x264_pixel_ssd_8x8, 3x faster than C version
    Overall speed-up: 0.7% with --bframes 3 --ref 5 -m 7 --b-rdo
    Patch by Noboru Asai %noboru P asai A gmail P com%
  • Rev. 697: limit mvs to [-512,511.75] instead of [-512,512]
  • Rev. 696: avoid memory loads that span the border between two cachelines.
    on core2 this makes x264_pixel_sad an average of 2x faster. other intel cpus gain various amounts. amd are unaffected.
    overall speedup: 1-10%, depending on how much time is spent in fullpel motion estimation.
  • Rev. 695: add cache info to cpu_detect. also print sse3.
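The motion vector limit from Rev. 697 can be illustrated with a small clamp. H.264 luma motion vectors have quarter-pixel precision, so 511.75 pixels corresponds to 2047 quarter-pel units. The names below are hypothetical, and storing vectors as plain `int` quarter-pel values is an assumption of this sketch.

```c
#include <assert.h>

#define MV_MIN_QPEL (-512 * 4)     /* -512 pixels in quarter-pel units   */
#define MV_MAX_QPEL (512 * 4 - 1)  /* 511.75 pixels in quarter-pel units */

/* Clamp one motion vector component to [-512, 511.75] pixels. */
static int clamp_mv_qpel(int mv)
{
    assert(MV_MIN_QPEL <= MV_MAX_QPEL);
    if (mv < MV_MIN_QPEL) return MV_MIN_QPEL;
    if (mv > MV_MAX_QPEL) return MV_MAX_QPEL;
    return mv;
}
```

With the earlier [-512,512] limit the upper bound would have been 2048 quarter-pel units; the new bound stops one step short of that.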
Version full rev. 694
  • Release Date: Nov 20, 2007
  • Rev. 694: cosmetics: reorder mc_luma/mc_chroma/get_ref arguments for consistency with other functions
  • Rev. 693: separate pixel_avg into cases for mc and for bipred
Version full rev. 692
  • Release Date: Nov 19, 2007
  • Rev. 692: add AltiVec implementation of ssim_4x4x2_core, about 4x faster than C version.
    Overall: 0.1-0.2% faster with default encoding settings
    Patch by Noboru Asai %noboru P asai A gmail P com%
  • Rev. 691: Add AltiVec implementation of x264_hpel_filter. Provides a 10-11% overall speed-up with default encoding options
    Patch by Noboru Asai %noboru P asai A gmail P com%
Version full rev. 690
  • Release Date: Nov 18, 2007
  • cosmetics in dsp function selection
Version full rev. 689
  • Release Date: Nov 18, 2007
  • remove sad_pde. it's been unused ever since successive elimination replaced it.
Version full rev. 688
  • Release Date: Nov 17, 2007
  • Rev. 688: cosmetics: use symbolic constants for frame padding radius
  • Rev. 687: move hpel_filter cpu detection to a function pointer like everything else
Version full rev. 686
  • Release Date: Nov 16, 2007
  • cosmetics: use separate variables for frame width and stride
Version full rev. 685
  • Release Date: Nov 14, 2007
  • rev. 685: Add AltiVec implementation of add4x4_idct, add8x8_idct, add16x16_idct: 3.2x faster on average, 1.05x faster overall with default encoding options
    Patch by Noboru Asai % noboru DD asai AA gmail DD com %
  • rev. 684: add AltiVec implementation of dequant_4x4 and dequant_8x8, 2.8x faster than C, 1.01x faster than previous revision with default encoding options
    Patch by Noboru Asai % noboru DD asai AA gmail DD com %
Version full rev. 683
  • Release Date: Nov 13, 2007
  • Add AltiVec implementation of quant_2x2_dc, fix Altivec implementation of quant_(4x4|8x8)(|_dc) wrt current C implementation
    Patch by Noboru Asai % noboru DD asai AA gmail DD com %
Version full rev. 682
  • Release Date: Nov 2, 2007
  • fix a possible nondeterminism with me=umh + threads.
Version full rev. 681
  • Release Date: Oct 31, 2007
  • use hex instead of dia for rdo mv refinement. ~0.5% lower bitrate at subme=7.
    patch by Dark Shikari.
Version full rev. 680
  • Release Date: Sep 25, 2007
  • port sad_*_x3_sse2 to x86_64
  • don't overwrite pthread* namespace, because system headers might define those functions even if we don't want them
Version full rev. 678
  • Release Date: Sep 22, 2007
  • faster 4x4 sad
Version full rev. 677
  • Release Date: Sep 21, 2007
  • fix an arithmetic overflow in trellis at high qp.
Version full rev. 676
  • Release Date: Sep 16, 2007
  • implement multithreaded me=esa
Version full rev. 675
  • Release Date: Sep 13, 2007
  • fix some integer overflows. now vbv size can exceed 2 Gbit.
Version full rev. 674
  • Release Date: Sep 10, 2007
  • allow --vbv-init to take absolute values (in kbit), in addition to the previous fractions of vbv-bufsize.
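One way such a dual-meaning option could be interpreted is sketched below. The rule that values up to 1 are fractions and larger values are absolute kbit is an assumption of this sketch, not something the changelog states, and `vbv_init_fraction` is a hypothetical helper.

```c
#include <assert.h>

/* Return the initial buffer fill as a fraction in [0, 1].
 * Assumption: vbv_init <= 1 is already a fraction of the buffer size;
 * anything larger is an absolute amount in kbit. */
static double vbv_init_fraction(double vbv_init, int bufsize_kbit)
{
    assert(bufsize_kbit > 0);
    double f = vbv_init > 1.0 ? vbv_init / bufsize_kbit : vbv_init;
    if (f < 0.0) f = 0.0;   /* clamp to a valid fill level */
    if (f > 1.0) f = 1.0;
    return f;
}
```

Under that assumption, 0.9 and 900 would mean the same thing for a 1000 kbit buffer.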
Version full rev. 673
  • Release Date: Sep 9, 2007
  • remove a bashism
Version full rev. 672
  • Release Date: Sep 4, 2007
  • reorder headers so that largefile support is defined before the first copy of stdio
Version full rev. 671
  • Release Date: Aug 21, 2007
  • regression in r669: broke saving of configure args if make has to re-run configure
Version full rev. 670
  • Release Date: Aug 18, 2007
  • regression in r669: --enable-shared should imply --enable-pic on some archs.
Version full rev. 669
  • Release Date: Aug 14, 2007
  • Add a --host flag to allow overriding config.guess; this is particularly useful with a 64-bits kernel running a 32-bits userland to build 32-bits apps.
  • Normalize any host triplet into a quadruplet via config.sub.
  • Move option parsing before any use of architecture information.
  • Update config.guess.
Version full rev. 667
  • Release Date: Jul 18, 2007
  • mingw doesn't have strtok_r
  • move os/compiler specific defines to their own header
  • extend zones to support (some) encoding parameters in addition to ratecontrol.
Version full rev. 664
  • Release Date: Jul 7, 2007
  • cosmetics
Version full rev. 663
  • Release Date: Jun 29, 2007
  • limit vertical motion vectors to +/-512, since some decoders actually depend on that limit.
Version full rev. 662
  • Release Date: Jun 23, 2007
  • Add vertical and horizontal luma deblocking accelerated with Altivec, based on Graham Booker's code written for FFmpeg with slight modifications to re-use x264's macros
Version full rev. 661
  • Release Date: Jun 16, 2007
  • cosmetics in cpu detection
  • fix compilation without asm on x86_32 (r658 worked only on x86_64).
Version full rev. 659
  • Release Date: Jun 11, 2007
  • exempt 1080p from the non-mod16 warning
Version full rev. 658
  • Release Date: Jun 6, 2007
  • allow compiling without yasm/nasm on x86 and x86-64 platforms
  • updated MS VC8/VC7 build, patch by Gabriel Bouvigne
Version full rev. 656
  • Release Date: May 26, 2007
  • replace alloca with malloc everywhere. per manpage, use of alloca is discouraged. this may have a minor effect on the speed of ssim and esa, but that appears too small to measure.
Version full rev. 655
  • Release Date: May 3, 2007
  • require a ratecontrol method to be specified, it no longer defaults to cqp=26.
Version full rev. 654
  • Release Date: Apr 23, 2007
  • fix nnz computation in cavlc+8x8dct+deblock. (regression in r607)
  • fix the computation of bits used for vbv. (regression in r651)
Version full rev. 652
  • Release Date: Apr 22, 2007
  • c89 compile fix
Version full rev. 651
  • Release Date: Apr 22, 2007
  • cabac: use bytestream instead of bitstream. 35% faster cabac, 20% faster overall lossless, ~1% faster overall at normal bitrates.
Version full rev. 650
  • Release Date: Apr 13, 2007
  • remove the restriction on number of threads as a function of resolution (it was wrong anyway in the presence of B-frames), and raise the max number of threads in general (though more will have to be done before it can really scale to lots of cores).
Version full rev. 649
  • Release Date: Apr 11, 2007
  • tweak ssse3 quant
Version full rev. 648
  • Release Date: Apr 8, 2007
  • change some tables from int to int8_t. 13KB smaller executable.
  • faster cabac rdo. up to 10% faster at q0, but negligible at normal bitrates.
  • workaround gcc's inability to align variables on the stack.
    this crash was introduced in r642, but only because previous versions didn't use sse2 on the stack.
Version full rev. 645
  • Release Date: Apr 6, 2007
  • 32bit version of ssse3 satd. switch default assembler to yasm. it will still fallback to nasm if you don't have yasm.
  • simplify trellis
  • fix an arithmetic overflow in trellis with QP >= 42
  • 2x faster quant. 2% overall.
    side effects:
    not bit-identical to the previous algorithm. while the new algorithm covers a wider range of cqms than the previous one did, I couldn't find a good way to fallback to a general version for the extreme cqms. so now it refuses to encode extreme cqms instead of just being slower. lays a framework for custom deadzone matrices, though I didn't add an api.
  • when encoding with a cqm, probe_skip now also uses the cqm, instead of the flat matrix
Version full rev. 640
  • Release Date: Apr 4, 2007
  • cosmetics in asm macros
  • use only c-style comments in public header (patch by Vincent Torri)
Version full rev. 638
  • Release Date: Apr 3, 2007
  • in hpel search, merge two 16x16 mc calls into one 16x17. 15% faster hpel, .3% overall.
  • Compile fix
Version full rev. 636
  • Release Date: Apr 1, 2007
  • remove private stuff from public headers. no more need for -D__X264__
Version full rev. 635
  • Release Date: Mar 25, 2007
  • adjust bitstream buffer sizes for very large frames
Version full rev. 634
  • Release Date: Mar 15, 2007
  • rev. 634: conflate HAVE_MMXEXT with HAVE_SSE2, since they were never used distinctly.
  • rev. 633: Made -DNEED_ALTIVEC unnecessary, thanks to Guillaume Poirier.
  • rev. 632: check x264_cpu_detect() before calling AltiVec functions.
  • rev. 631: ssse3 detection. x86_64 ssse3 satd and quant.
  • rev. 631: requires yasm >= 0.6.0
  • rev. 630: Use -maltivec when building dependencies, or <altivec.h> cannot be used.
  • rev. 630: Do not declare vectors in non-AltiVec files.
  • rev. 629: common/cpu.c: runtime AltiVec autodetection on Linux.
  • rev. 629: configure, Makefile: do not build the whole project with -maltivec because it generates AltiVec code in weird places.
Version full rev. 628
  • Release Date: Mar 6, 2007
  • fix a small memleak. patch by Limin Wang.
Version full rev. 627
  • Release Date: Mar 4, 2007
  • compile fix for GCC-3.3 on OSX, based on a patch by Patrice Bensoussan % patrice P bensoussan A free P fr%
    Note: regression tests still do not pass with GCC-3.3, but they never did as far as I can remember.
  • cosmetics in regression test
  • regression testing, run similar to fprofiled: VIDS='vid_720x480.yuv' make test
Version full rev. 624
  • Release Date: Mar 1, 2007
  • add ability to generate doxygen documentation; make dox
Version full rev. 623
  • Release Date: Feb 23, 2007
  • oops, scenecut detection failed to activate when using threads and not using B-frames
Version full rev. 622
  • Release Date: Jan 30, 2007
  • extras/getopt.c was BSD licensed. replace with a LGPL version (from glibc).
Version full rev. 621
  • Release Date: Jan 26, 2007
  • Fix build issues on Linux. Only gcc-4.x is supported, as on OSX.
  • Cleans up a few inconsistencies in the code too.
Version full rev. 620
  • Release Date: Jan 22, 2007
  • tweak block_residual_write_cavlc.
  • up to 1% faster lossless, no difference at normal bitrates.
Version full rev. 619
  • Release Date: Jan 21, 2007
  • don't assume int is exactly 4 bytes
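The entry does not say which code made the assumption. As a general illustration, code that needs exactly four bytes can use the fixed-width types from `<stdint.h>` instead of `int`. The helpers below are hypothetical and not taken from x264.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read four bytes as a fixed-width 32-bit value (host byte order).
 * memcpy avoids alignment and aliasing problems. */
static uint32_t load_u32(const void *src)
{
    uint32_t v;
    memcpy(&v, src, sizeof v);
    return v;
}

/* Return 1 if any of the four bytes at p is non-zero, else 0. */
static int any_nonzero4(const uint8_t *p)
{
    assert(p != NULL);
    return load_u32(p) != 0;
}
```

Using `int` here would silently read a different number of bytes on a platform where `int` is not 32 bits wide.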
Version full rev. 618
  • Release Date: Jan 12, 2007
  • make array_non_zero() compatible with -fstrict-aliasing
Version full rev. 617
  • Release Date: Jan 10, 2007
  • Honor CFLAGS and LDFLAGS set by the user
Version full rev. 616
  • Release Date: Jan 3, 2007
  • Check whether 'echo -n' works, otherwise try printf (fixes build on current OS X 10.5)
Version full rev. 615
  • Release Date: Jan 2, 2007
  • Check version of nasm on OS X / Intel
Version full rev. 614
  • Release Date: Dec 21, 2006
  • wrong reference frames were used with refs>=14 + pyramid (regression in r607)
  • enable thread synchronization primitives on linux too
Version full rev. 612
  • Release Date: Dec 20, 2006
  • fix a crash with x264_encoder_headers() + threads
Version full rev. 611
  • Release Date: Dec 16, 2006
  • don't skip autodetection on configure --enable-pthread
  • more win32threads -> pthreads
  • cosmetics: rename list operators to be consistent with Perl, and move them to common/
  • win32: use pthreads instead of win32threads. for some reason, pthreads is much faster.
  • New threading method:
    Encode multiple frames in parallel instead of dividing each frame into slices. Improves speed, and reduces the bitrate penalty of threading.

    Side effects: It is no longer possible to re-encode a frame, so threaded scenecut detection must run in the pre-me pass, which is faster but less precise.
    It is now useful to use more threads than you have cpus. --threads=auto has been updated to use cpus*1.5.
    Minor changes to ratecontrol.
  • New options: --pre-scenecut, --mvrange-thread, --non-deterministic
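The cpus*1.5 rule for --threads=auto can be written as a one-line helper. The function name is hypothetical, and rounding down for odd CPU counts is an assumption of this sketch.

```c
#include <assert.h>

/* Thread count for --threads=auto: 1.5 threads per CPU.
 * Integer arithmetic rounds down (an assumption of this sketch). */
static int auto_thread_count(int cpus)
{
    assert(cpus > 0);
    return cpus * 3 / 2;
}
```

So a 4-CPU machine would get 6 threads and a 2-CPU machine would get 3.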
Version full rev. 606
  • Release Date: Dec 13, 2006
  • Do not assume anything about sizeof(cpu_set_t).
Version full rev. 605
  • Release Date: Dec 12, 2006
  • Add support for kFreeBSD (FreeBSD kernel with GNU userland).
Version full rev. 604
  • Release Date: Nov 28, 2006
  • Add Altivec implementations of add8x8_idct8, add16x16_idct8, sa8d_8x8 and sa8d_16x16
    Note: doesn't take advantage of some possible aligned memory accesses, so there's still room for improvement
Version full rev. 603
  • Release Date: Nov 26, 2006
  • Force alignment of the fake .rodata on MacIntel
Version full rev. 602
  • Release Date: Nov 23, 2006
  • don't treat vbv_maxrate as a minrate too if it's higher than target average bitrate.
Version full rev. 601
  • Release Date: Nov 19, 2006
  • Merges Guillaume Poirier's AltiVec changes:
    • Adds optimized quant and sub*dct8 routines
    • Faster sub*dct routines
  • ~8% overall speed-up with default settings
Version full rev. 600
  • Release Date: Nov 7, 2006
  • 10% faster deblock mmx functions. ported from ffmpeg.
  • checkasm: ignore insignificant differences in floating-point ssim
Version full rev. 598
  • Release Date: Oct 31, 2006
  • display final ratefactor in abr when a loose vbv is applied. (still disabled in true cbr)
Version full rev. 597
  • Release Date: Oct 30, 2006
  • fix parsing of --deblock %d,%d (beta was ignored)
  • compute chroma_qp only once per mb
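The `%d,%d` form of --deblock can be parsed with `sscanf`, whose return value tells how many fields were read. This is a sketch of the general technique, not x264's parser; `parse_deblock` is a hypothetical name, and leaving beta untouched when only one value is given is an assumption.

```c
#include <assert.h>
#include <stdio.h>

/* Parse "alpha,beta" or just "alpha".
 * Returns the number of values stored (0, 1 or 2).
 * If only alpha is present, *beta is left unchanged. */
static int parse_deblock(const char *arg, int *alpha, int *beta)
{
    assert(arg && alpha && beta);
    int a, b;
    int n = sscanf(arg, "%d,%d", &a, &b);
    if (n >= 1) *alpha = a;
    if (n == 2) *beta = b;   /* only store beta when it was actually read */
    return n < 0 ? 0 : n;    /* sscanf returns EOF on empty input */
}
```

Checking for a return value of 2 is what distinguishes a full `alpha,beta` pair from a lone alpha.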
Version full rev. 595
  • Release Date: Oct 29, 2006
  • rd refinement of intra chroma direction (enabled in --subme 7)
    patch by Alex Wright.
Version full rev. 594
  • Release Date: Oct 19, 2006
  • fix a crash in avc2avi
Version full rev. 593
  • Release Date: Oct 17, 2006
  • skip deblocking and motion interpolation when using only I-frames
Version full rev. 592
  • Release Date: Oct 14, 2006
  • cosmetics
  • allow fractional values of crf
Version full rev. 590
  • Release Date: Oct 11, 2006
  • prefetch pixels for motion compensation and deblocking.
  • fix a crash on interlace + >8 reference frames
  • no more decoder. it never worked anyway, and the presence of defunct code was confusing people.
Version full rev. 587
  • Release Date: Oct 10, 2006
  • compute pskip_mv only once per macroblock, and store it
  • slightly faster chroma_mc_mmx
  • missing emms in plane_copy_mmx
Version full rev. 584
  • Release Date: Oct 7, 2006
  • merge center_filter_mmx with horizontal_filter_mmx
  • 1.5x faster center_filter_mmx (amd64)
Version full rev. 582
  • Release Date: Oct 6, 2006
  • mmx/prefetch implementation of plane_copy
  • no more vfw
  • gtk fixes:
    • in Makefile
      • fix datadir for mingw users
      • remove the shared lib during the clean rule
      • use $(ENCODE_BIN) instead of x264_gtk_encode
      • add some $(DESTDIR) and create some directories when necessary
      • remove -lintl
    • statfile_length -> statsfile_length
    • fix the "sensitivity" of the widget of update_statfile
    • the logo is now handled correctly on windows
  • added: beginning of multipass support
  • patch by Vincent Torri.
Version full rev. 579
  • Release Date: Oct 5, 2006
  • accept mencoder's option names as synonyms (api only, not in x264cli)
Version full rev. 578
  • Release Date: Oct 3, 2006
  • simplify satd_sse2
  • better error checking in x264_param_parse.
  • add synonyms for a few options.
Version full rev. 576
  • Release Date: Oct 2, 2006
  • fix some strides that weren't a multiple of 16.
  • tweak motion compensation amd64 asm. 0.3% overall speedup.
  • strip local symbols from asm .o files, since they confuse oprofile
  • add an option to control direct_8x8_inference_flag, default to enabled.
  • slightly faster encoding and decoding of p4x4 + B-frames, and is needed for strict Levels compliance.
Version full rev. 572
  • Release Date: Oct 1, 2006
  • allow custom deadzones for non-trellis quantization. patch by Alex Wright.
  • move zigzag scan functions to dsp function pointers.
  • mmx implementation of interlaced zigzag.
Version full rev. 570
  • Release Date: Oct 1, 2006
  • support interlace. uses MBAFF syntax, but is not adaptive yet.
Version full rev. 569
  • Release Date: Sep 28, 2006
  • allow --zones in cqp encodes
Version full rev. 568
  • Release Date: Sep 27, 2006
  • cli: fix some typos in vui parameters from r542. patch by Foxy Shadis.
Version full rev. 567
  • Release Date: Sep 26, 2006
  • Add an "all" rule to the Makefile. Ideally "default" should be renamed, but I don't want to break existing scripts.
Version full rev. 566
  • Release Date: Sep 25, 2006
  • workaround: on some systems, alloca() isn't aligned
Version full rev. 565
  • Release Date: Sep 23, 2006
  • missing picpop
Version full rev. 564
  • Release Date: Sep 14, 2006
  • fix a buffer overread from r540
Version full rev. 563
  • Release Date: Sep 13, 2006
  • cosmetics (spelling)
  • faster ESA
Version full rev. 560
  • Release Date: Sep 11, 2006
  • Use the autotool's config.guess script instead of uname to check the system and CPU types, to avoid issues when using for instance a 32-bit userland on top of a 64-bit kernel.
  • Add the autotool's config.guess script so that we can use it instead of uname in the configure script.
Version full rev. 558
  • Release Date: Aug 23, 2006
  • 10l in r553
Version full rev. 557
  • Release Date: Aug 21, 2006
  • ssim broke on amd64 w/ pic.
Version full rev. 556
  • Release Date: Aug 19, 2006
  • MSVC compatibility fix from Haali
Version full rev. 555
  • Release Date: Aug 18, 2006
  • support changing some more parameters in x264_encoder_reconfig()
  • SSIM computation. (default on, disable by --no-ssim)
Version full rev. 553
  • Release Date: Aug 17, 2006
  • configure: --enable-debug reduces optimization to -O1
  • cosmetics
Version full rev. 551
  • Release Date: Aug 4, 2006
  • gcc -fprofile-generate isn't threadsafe
  • cli: move some options from --help to --longhelp
  • cli: don't try to get resolution from filename unless input is rawyuv
  • r542 broke --visualize
Version full rev. 547
  • Release Date: Aug 3, 2006
  • Nicer OS X x264_cpu_num_processors (thanks David)
  • Support OS X and BeOS in x264_cpu_num_processors
  • Fixes contexts allocation with threads=auto
  • select initial qp for abr and cbr based on satd and bitrate, rather than cq24.
  • --threads=auto to detect number of cpus
  • api addition: x264_param_parse() to set options by name
  • fix a rare NaN in ratecontrol
  • move quant_mf[] from x264_t to the heap, and merge duplicate entries
  • GTK update. patch by Vincent Torri.
    • fixed:
      • cleaning of Makefile
      • time elapsed seems broken ('total time' label replaced by 'time remaining')
      • text entries of the status window are now not editable
    • added:
      • compilation from x264/ (add --enable-gtk option to configure)
      • shared lib creation if --enable-shared is passed to configure
      • x264gtk.pc
      • --b-rdo, --no-dct-decimate
  • new option: --qpfile forces frames types and QPs. (intended for ratecontrol experiments, not for real encodes)
Version full rev. 537
  • Release Date: Jul 18, 2006
  • api change: select ratecontrol method with an enum (param.rc.i_rc_method) instead of a bunch of booleans.
Version full rev. 536
  • Release Date: Jul 17, 2006
  • slightly faster mmx dct
  • OpenBSD build fixes.
  • patch by Vizeli Pascal (pvizeli at yahoo dot de)
Version full rev. 534
  • Release Date: Jul 9, 2006
  • mc_chroma width2 mmx
Version full rev. 533
  • Release Date: Jun 29, 2006
  • make libx264.so symlink relative
Version full rev. 532
  • Release Date: Jun 13, 2006
  • added:
    • direct=auto
    • no-fast-pskip
    • vbv
    • cqm
    • tooltips (without descriptions yet)
    • translations
    • `make clean` for .exe
    • when file exists, ask for override
  • fixes:
    • debug level bug
    • bitrate slider bug
    • mixed-refs can be set only if ref>1
    • i8x8 can be set only if 8x8 transform is enabled
    • # of threads capped at 4
    • fourcc can't be removed
    • cosmetics
Version full rev. 531
  • Release Date: Jun 1, 2006
  • vfw installer: tweak nsis compression. patch by Francesco Corriga.
Version full rev. 530
  • Release Date: May 31, 2006
  • Fixed typo that caused x264_encoder_open to always fail
Version full rev. 529
  • Release Date: May 30, 2006
  • check some mallocs' return value
  • make -> $(MAKE)
Version full rev. 527
  • Release Date: May 24, 2006
  • convert non-fatal errors to message level "warning".
Version full rev. 526
  • Release Date: May 23, 2006
  • fix a memory alignment. (no effect on x86, but might be needed for other simd)
Version full rev. 525
  • Release Date: May 21, 2006
  • when using DEBUG_DUMP_FRAME, write decoded pictures in display order. patch by Loic Le Loarer.
  • non-referenced B-frames should have the same frame_num as the following ref frame, not the previous. patch by Loic Le Loarer.
Version full rev. 523
  • Release Date: May 12, 2006
  • set the SPS constraint_set[01]_flag based on the profile in use, just in case some decoder cares
Version full rev. 522
  • Release Date: May 11, 2006
  • msvc doesn't like C99 named array initializers
  • allow sar=1/1.
    patch by Loic Le Loarer.
  • faster intra search: filter i8x8 edges only once, and reuse for multiple predictions.
Version full rev. 519
  • Release Date: May 10, 2006
  • faster intra search: some prediction modes don't have to compute a full hadamard transform. x86 and amd64 asm.
Version full rev. 518
  • Release Date: May 8, 2006
  • --sps-id, to allow concatenating streams with different settings.
Version full rev. 517
  • Release Date: May 4, 2006
  • typo in expand_border_mod16
Version full rev. 516
  • Release Date: Apr 30, 2006
  • typo impaired 2pass bitrate prediction.
Version full rev. 515
  • Release Date: Apr 29, 2006
  • Let the user choose the compiler with "CC=xxx ./configure"
Version full rev. 514
  • Release Date: Apr 29, 2006
  • More vector types fixes for gcc 3.3
Version full rev. 513
  • Release Date: Apr 29, 2006
  • More vector casts to try and make compilers happier
Version full rev. 512
  • Release Date: Apr 25, 2006
  • Use sa8d instead of satd for i8x8 search. +.01 dB, -.5% speed
Version full rev. 511
  • Release Date: Apr 25, 2006
  • Before evaluating the RD score of any mode, check satd and abort if it's much worse than some other mode.
  • Also apply more early termination to intra search. speed at -m1:+1%, -m4:+3%, -m6:+8%, -m7:+20%
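The early-termination idea can be sketched as a filter on cheap SATD costs before the expensive RD evaluation. The 25% margin and the function name are hypothetical; the changelog only says a mode is skipped when its SATD is "much worse" than another mode's.

```c
#include <assert.h>

/* Count the modes that would still get a full RD evaluation.
 * A mode is kept only if its SATD cost is within 25% of the best SATD;
 * the 25% margin is an illustrative assumption. */
static int count_rd_candidates(const int *satd, int n_modes)
{
    assert(satd && n_modes > 0);
    int best = satd[0];
    for (int i = 1; i < n_modes; i++)
        if (satd[i] < best)
            best = satd[i];

    int kept = 0;
    for (int i = 0; i < n_modes; i++)
        if (satd[i] * 4 <= best * 5)   /* satd <= 1.25 * best */
            kept++;
    return kept;
}
```

With SATD costs of 100, 120, 130 and 400, only the first two modes would be passed on to RD under this margin.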
Version full rev. 510
  • Release Date: Apr 25, 2006
  • common/ppc/pixel.c: fixed illegal implicit casts of vector types
Version full rev. 509
  • Release Date: Apr 25, 2006
  • Added %$#@#$! support for #@%$!#@ armv4l CPU.
Version full rev. 508
  • Release Date: Apr 24, 2006
  • When evaluating predictors to start fullpel motion search, use subpel positions instead of rounding to fullpel. about +.02 dB, -1.6% speed at subme>=3.
    patch by Alex Wright.
Version full rev. 507
  • Release Date: Apr 24, 2006
  • mmx implementation of x264_pixel_sa8d
Version full rev. 506
  • Release Date: Apr 21, 2006
  • 10l in r463 (q0 i16x16 dc was permuted)
Version full rev. 505
  • Release Date: Apr 20, 2006
  • typo in r504
Version full rev. 504
  • Release Date: Apr 20, 2006
  • update msvc project files.
  • patch by anonymous.
Version full rev. 503
  • Release Date: Apr 19, 2006
  • Before, we eliminated dct blocks containing only a small single coefficient. Now that behavior is optional, by --no-dct-decimate. based on a patch by Alex Wright.
Version full rev. 502
  • Release Date: Apr 17, 2006
  • Enables more aggressive optimizations (-fastf -mcpu=G4) on OS X.
  • Adds AltiVec interleaved SAD and SSD16x16.
  • Overall speedup up to 20%.
Version full rev. 501
  • Release Date: Apr 17, 2006
  • faster cabac_encode_bypass
Version full rev. 500
  • Release Date: Apr 16, 2006
  • restored AltiVec dct
Version full rev. 499
  • Release Date: Apr 16, 2006
  • more AltiVec mc, ~4.5% overall speedup
Version full rev. 498
  • Release Date: Apr 12, 2006
  • slightly faster loopfilter
Version full rev. 496
  • Release Date: Apr 12, 2006
  • cosmetics in sad/ssd/satd mmx
Version full rev. 497
  • Release Date: Apr 12, 2006
  • 3% faster satd_mmx
Version full rev. 495
  • Release Date: Apr 11, 2006
  • store quoted configure options. needed e.g. for multiple args under --extra-cflags.
Version full rev. 494
  • Release Date: Apr 11, 2006
  • fix a yasm-incompatible syntax in x86 asm
Version full rev. 493
  • Release Date: Apr 11, 2006
  • yasm noexec stack
Version full rev. 492
  • Release Date: Apr 10, 2006
  • more interleaved SAD.
  • 25% faster halfpel.
Version full rev. 491
  • Release Date: Apr 10, 2006
  • more interleaved SAD.
  • 1% faster umh, 6% faster esa.
Version full rev. 489
  • Release Date: Apr 10, 2006
  • Added support for ppc64. I'm really f***ing tired of having to do this.
Version full rev. 490
  • Release Date: Apr 10, 2006
  • interleave multiple calls to SAD.
  • 15% faster fullpel motion estimation.
Version full rev. 488
  • Release Date: Apr 8, 2006
  • use LDFLAGS when linking shared lib
Version full rev. 487
  • Release Date: Mar 29, 2006
  • compilation fix for mingw, darwin (off_t was undefined)
Version full rev. 486
  • Release Date: Mar 28, 2006
  • (r486) GTK: support yuv4mpeg input. patch by Vincent Torri.
  • (r485) GTK: fix avs input. patch by Vincent Torri.
  • (r484) cli: support yuv4mpeg input. patch by anonymous.
  • (r483) GTK: compilation fixes
Version full rev. 477
  • Release Date: Mar 23, 2006
  • 10l in r473 and stdin
  • RD subpel motion estimation (--subme 7)
  • cosmetics in cabac_mb_cbf
Version full rev. 451
  • Release Date: Mar 4, 2006
  • 10l in r443 (p4x4 chroma)
  • common/i386/i386inc.asm: tell the ELF linker about our stack properties so that it does not assume the stack has to be executable.
  • configure common/i386/i386inc.asm: got rid of -DFORMAT_* nasm flags and use built-in preprocessor tests instead.
  • common/i386: factored the .rodata section declaration into i386inc.asm.
  • configure: activate minor nasm optimisations, such as assembling "add eax, 8" as "add eax, byte 8".
  • common/i386/*.asm: don't use the "GLOBAL" reserved word, some versions of NASM complain about it. Replaced it with "GOT_ebx".