Version full rev. 899
- Release Date: Jul 5, 2008
- Update my email address
Version full rev. 898
- Release Date: Jul 4, 2008
- Update file headers throughout x264
Update "Authors" lists based on actual authorship; highest is most important
Update copyright notices and remove old CVS tags from file headers
Add file headers to GTK and other sections missing them
Update FSF address
Other header-related cosmetics
Version full rev. 897
- Release Date: Jul 3, 2008
- r897: denoise_dct asm
- r896: cosmetics in permutation macros
SWAP can now take mmregs directly, rather than just their numbers
Version full rev. 895
- Release Date: Jul 3, 2008
- r895: Fix bug in adaptive quantization
In some cases adaptive quantization did not correctly calculate the variance.
Bug reported by MasterNobody
- r894: lowres_init asm
rounding is changed for asm convenience. this makes the c version slower, but there's no way around that if all the implementations are to have the same results.
- r893: Optimizations and cosmetics in macroblock.c
If an i4x4 dct block has no coefficients, don't bother with dequant/zigzag/idct. Not useful for larger sizes because the odds of an empty block are much lower.
Cosmetics in i16x16 to be more consistent with other similar functions.
Add an SSD threshold for chroma in probe_skip to improve speed and minimize time spent on chroma skip analysis.
Rename lambda arrays to lambda_tab for consistency.
Version full rev. 892
- Release Date: Jun 30, 2008
- some asm functions require aligned stack. disable these when compiling with msvc/icc.
Version full rev. 891
- Release Date: Jun 25, 2008
- r891: Move bitstream end check to macroblock level
- r891: Additionally, instead of silently truncating the frame upon reaching the end of the buffer, reallocate a larger buffer instead.
- r890: Convert NNZ to raster order and other optimizations
- r890: Converting NNZ to raster order simplifies a lot of the load/store code and allows more use of write-combining.
- r890: More use of write-combining throughout load/save code in common/macroblock.c
- r890: GCC has aliasing issues in the case of stores to 8-bit heap-allocated arrays; dereferencing the pointer once avoids this problem and significantly increases performance.
- r890: More manual loop unrolling and such.
- r890: Move all packXtoY functions to macroblock.h so any function can use them.
- r890: Add pack8to32.
- r890: Minor optimizations to encoder/macroblock.c
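The write-combining idea behind r890 can be sketched as follows: packing four 8-bit values into one 32-bit word lets a single store update four adjacent array entries. This is an illustration only, not x264's actual code; `pack8to32_sketch` and `store4_sketch` are hypothetical names, and the byte order shown (first argument in the lowest byte) matches memory order only on little-endian targets.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: pack four bytes into one word, with 'a' in the
 * lowest byte (memory order on little-endian machines). */
uint32_t pack8to32_sketch(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return (uint32_t)a | ((uint32_t)b << 8) | ((uint32_t)c << 16) | ((uint32_t)d << 24);
}

/* Write four 8-bit entries with one 32-bit store instead of four byte
 * stores. memcpy keeps the access valid under strict aliasing; compilers
 * typically lower it to a single move. */
void store4_sketch(uint8_t *dst, uint32_t packed)
{
    assert(dst != NULL);
    memcpy(dst, &packed, sizeof packed);
}
```

A caller would fill one row of a raster-ordered 8-bit array with `store4_sketch(row, pack8to32_sketch(n0, n1, n2, n3))`.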
Version full rev. 889
- Release Date: Jun 19, 2008
- r889: mc_chroma_sse2/ssse3
- r888: checkasm --bench=function_name
- r887: interleave psnr/ssim computation with reference frame filtering, to improve cache coherency
Version full rev. 886
- Release Date: Jun 16, 2008
- r886: Add more inline asm and a runtime check for MMXEXT support
x264 will now terminate gracefully rather than SIGILL when run on a machine with no MMXEXT support.
A configure option is now available to build x264 without assembly support for support on such old CPUs as the Pentium 2, K6, etc.
- r885: Use aligned memcpy for x264_me_t struct and cosmetics
- r884: Cosmetics and loop unrolling
GCC is not very good at loop unrolling in cases where it can perform constant propagation, so the unrolling unfortunately has to be done manually.
Version full rev. 883
- Release Date: Jun 13, 2008
- r883: Fix regression in 64-bit in r882
i_mvc needs to be 64-bit when used with a 64-bit memory pointer
- r882: More tweaks to me.c
- r882: Added inline MMX version of UMH's predictor difference test
- r882: Various cosmetics throughout me.c
- r882: Removed a C99-ism introduced in r878.
Version full rev. 881
- Release Date: Jun 12, 2008
- Fix regression in r736
r736 added intra RD refinement to B-frames; however, it is possible for subme=7 to be used without b-rdo.
This means intra RD isn't run, and therefore it is possible for intra chroma analysis to not have been run, since update_cache was never called for an intra block, and chroma ME is not required even at subme=7.
r801, which removed a memset, made this worse because previously the chroma prediction mode was at least initialized to zero; now it was not initialized at all.
Therefore, --no-chroma-me, --subme 7, and no --b-rdo had the potential to crash.
This change restricts intra RD refinement to only be run when --b-rdo is enabled (sensible to begin with), thus preventing a crash in this case.
Version full rev. 880
- Release Date: Jun 11, 2008
- r880: Fix regression in r850
Bug resulted in rare incorrect chroma encoding
- r879: Cosmetics in VBV handling
- r878: Tweaks and cosmetics in me.c
Use write-combining for predictor checking and other tweaks.
Version full rev. 877
- Release Date: Jun 8, 2008
- r877: Partially inline trellis quantization
Inlining trellis into the 4x4/8x8 trellis wrappers increases trellis speed by about 5-10% through constant propagation.
- r876: Various cosmetic changes.
- r875: avg_weight_sse2
- r874: many changes to which asm functions are enabled on which cpus.
with Phenom, 3dnow is no longer equivalent to "sse2 is slow", so make a new flag for that.
some sse2 functions are useful only on Core2 and Phenom, so make a "sse2 is fast" flag for that.
some ssse3 instructions didn't become useful until Penryn, so yet another flag.
disable sse2 completely on Pentium M and Core1, because it's uniformly slower than mmx.
enable some sse2 functions on Athlon64 that always were faster and we just didn't notice.
remove mc_luma_sse3, because the only cpu that has lddqu (namely Pentium 4D) doesn't have "sse2 is fast".
don't print mmx1, sse1, nor 3dnow in the detected cpuflags, since we don't really have any such functions. likewise don't print sse3 unless it's used (Pentium 4D).
- r873: enable ssse3 phadd satd on Penryn.
- r872: benchmark most of the asm functions (checkasm --bench).
Version full rev. 871
- Release Date: Jun 6, 2008
- Cosmetic: fix C99-ism
Version full rev. 870
- Release Date: Jun 6, 2008
- Use a gaussian window for cplxblur
Cplxblur was originally intended to use a gaussian window, but in its current form did not. This change provides a tiny improvement to 2pass ratecontrol.
Version full rev. 869
- Release Date: Jun 4, 2008
- r869: cosmetics
- r868: nasm compatible NX stack
- r867: CQP is incompatible with AQ
- r866: memzero_aligned_mmx
- r865: binmode stdin on mingw, not just msvc
- r864: omit redundant mc after non-rdo dct size decision, and in b-direct rdo
- r863: allow fractional CRF values with AQ.
- r862: fix some uninitialized partitions in rdo
Version full rev. 861
- Release Date: Jun 3, 2008
- r861: 2-pass VBV support and improved VBV handling
Dramatically improves 1-pass VBV ratecontrol (especially CBR) and provides support for VBV in 2-pass mode. This consists of a series of functions that attempts to find overflows and underflows in the VBV from the first-pass statsfile and fix them before encoding.
1-pass VBV code partially by Dark Shikari.
- r860: Fix noise reduction in threaded mode.
Previously enabling noise reduction with threads had no effect.
Note that this is not an optimal solution; each thread still tracks noise reduction separately (unlike in single-threaded mode).
Version full rev. 859
- Release Date: May 21, 2008
- r859: fix a crash on win32 with threads.
r852 introduced an assumption in deblock that the stack is aligned.
- r858: remove nasm version check. a feature check is all that's needed.
silence stderr in yasm version check.
- r857: cosmetics in cabac
- r856: faster residual_write_cabac
- r855: change DEBUG_DUMP_FRAME to run-time --dump-yuv
- r854: x264_median_mv_mmxext
this is the first non-runtime-detected use of mmxext, but it has to be inlined
- r853: factor duplicated code out of deblock chroma mmx
- r852: deblock_luma_intra_mmx
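For reference, the operation behind r854's x264_median_mv_mmxext is a component-wise median of three motion vectors. Below is a minimal plain-C sketch of that operation, not the MMXEXT code itself; the names `median3_sketch` and `median_mv_sketch` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Median of three values built only from min/max steps, the form that
 * maps naturally onto packed min/max instructions. */
int median3_sketch(int a, int b, int c)
{
    int lo = a < b ? a : b;      /* min(a, b) */
    int hi = a < b ? b : a;      /* max(a, b) */
    int t  = hi < c ? hi : c;    /* min(max(a, b), c) */
    return lo > t ? lo : t;      /* max(min(a, b), t) */
}

/* Component-wise median of three motion vectors stored as (x, y) pairs. */
void median_mv_sketch(int16_t dst[2], const int16_t a[2],
                      const int16_t b[2], const int16_t c[2])
{
    assert(dst && a && b && c);
    dst[0] = (int16_t)median3_sketch(a[0], b[0], c[0]);
    dst[1] = (int16_t)median3_sketch(a[1], b[1], c[1]);
}
```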
Version full rev. 851
- Release Date: May 17, 2008
- r851: write aspect ratio in mp4
- r850: omit delta_quant in i16x16 blocks with no residual
(all other block types were already covered, but i16x16 cbp is special)
- r849: explicit write combining, because gcc fails at optimizing consecutive memory accesses
- r848: force unroll macroblock_load_pic_pointers
and a few other minor optimizations
- r847: quant_2x2_dc_ssse3
- r846: r836 borked lossless cabac nnz
Version full rev. 845
- Release Date: May 15, 2008
- r845: use elf instead of a.out on netbsd
- r844: fix x264_realloc when not using libc realloc.
- r843: don't pretend to support win64. remove all related code.
it hasn't worked since probably some time in 2005, and won't ever be fixed unless someone steps up to maintain it.
- r842: cosmetics: replace last instances of parm# asm macros with r#
- r841: remove DEBUG_BENCHMARK
- r840: faster probe_skip
Version full rev. 839
- Release Date: Apr 28, 2008
- r839: drop support for pre-SSE3 assemblers
- r838: s/x264_cpu_restore/x264_emms/
no point in giving it a generic name when it's not generic
- r837: faster cabac_mb_cbp_luma
ported from ffmpeg
- r836: remove some redundant nnz counts
move some nnz counts from macroblock_encode to cavlc if cabac doesn't need them
- r835: compute missing nnz count in subme7 cavlc
- r834: remove a division in macroblock-level bookkeeping
- r833: omit P/B-skip mc from macroblock_encode if the pixels haven't been overwritten since probe_skip
- r832: earlier termination in SEA if mvcost exceeds residual
- r831: remove void* arithmetic from r821
Version full rev. 830
- Release Date: Apr 26, 2008
- r830: Fix define of illegal function identifiers (as defined in section "7.1.3 Reserved identifiers" of C99 spec)
- r829: Fix define of illegal identifier (as defined in section "7.1.3 Reserved identifiers" of C99 spec) "__UNUSED__", and use the one defined in common/osdep.h, i.e. "UNUSED"
based on a patch by Diego Biurrun
Version full rev. 828
- Release Date: Apr 25, 2008
- r828: more consistent include name (in line with other PPC includes)
- r827: fix illegal identifiers in multiple inclusion guards
patch by Diego Biurrun % diego A biurrun P de %
Version full rev. 826
- Release Date: Apr 22, 2008
- r826: AQ now treats perfectly flat blocks as low energy, rather than retaining previous block's QP.
- r826: fixes occasional blocking in fades.
- r825: checkasm cabac
- r824: s/movdqa/movaps/g
- r823: --asm to allow testing of different versions of asm without recompile
- r822: copy left neighbor pixels directly from previous mb instead of main plane
Version full rev. 821
- Release Date: Apr 17, 2008
- r821: cacheline split workaround for mc_luma
- r820: add "SECTION_RODATA" before "SECTION .text" to setup the fakegot label used in macho binaries.
This fixes compilation with --enable-pic
Requires Yasm 0.7.0 or newer
Patch by Dave Lee % davelee P com A gmail P com %
Version full rev. 819
- Release Date: Apr 15, 2008
- more hpel fixes
Version full rev. 818
- Release Date: Apr 12, 2008
- r818: update msvc projectfile
- r817: r810 borked hpel_filter_sse2 on unaligned buffers
Version full rev. 816
- Release Date: Apr 10, 2008
- r816: threads=auto on multicore now implies thread input, just like explicit thread numbers already did
- r815: dct4 sse2
- r814: faster x86_32 dct8
- r813: macros to deal with macros that permute their arguments
- r812: mmx cachesplit sad of non-square sizes checked height instead of width
- r811: sfence after nontemporal stores
- r810: simplify hpel filter asm (move control flow to C) and add sse2, ssse3 versions
- r809: more mmx/xmm macros (mova, movu, movh)
Version full rev. 808
- Release Date: Apr 1, 2008
- r808: improve handling of cavlc dct coef overflows
support large coefs in high profile, and clip to allowed range in baseline/main
- r807: fix shared libs on MacOSX
based on a patch by İsmail Dönmez
- r806: typo in r803
Version full rev. 805
- Release Date: Mar 31, 2008
- r805: fix a crash on mp4 muxing with invalid params
- r804: variance-based psy adaptive quantization
- r804: new options: --aq-mode --aq-strength
- r804: AQ is enabled by default
- r803: fix naming of .dll on mingw
- r802: don't distinguish between mingw and cygwin
- r801: remove a memset
- r800: typo. don't evaluate rd pskip when p16x16 found ref>0
- r799: r784 borked lossless dc zigzag
Version full rev. 798
- Release Date: Mar 26, 2008
- r798: fix an arithmetic overflow that disabled SEA threshold after finding a mv with SAD < mvcost.
- r797: fix hpel_filter_altivec picked up by checkasm
Patch by Manuel %maaanuuu A gmx.net % and Noboru Asai % noboru P asai A gmail P com %
Version full rev. 796
- Release Date: Mar 25, 2008
- r796: faster residual
- r795: nasm doesn't like align(nop) in structs
- r794: reduce the size of some cabac arrays
- r793: use cabac context transition table from trellis in normal residual coding too
- r792: rearrange cabac struct to reduce code size
Version full rev. 791
- Release Date: Mar 25, 2008
- r791: higher precision RD lambda
improves quality at QP<=12.
- r790: faster cabac_encode_ue_bypass
- r789: cabac asm.
mostly because gcc refuses to use cmov.
28% faster than c on core2, 11% on k8, 6% on p4.
- r788: cosmetics in cabac
- r787: inline cabac_size_decision
Version full rev. 786
- Release Date: Mar 23, 2008
- r786: cosmetics in DECLARE_ALIGNED
- r785: don't distinguish between luma4x4 and luma4x4ac
- r784: faster lossless zigzag
- r783: more alignment
Version full rev. 782
- Release Date: Mar 22, 2008
- r782: add tesa and lossless to fprofile
- r781: cosmetics in residual_write
- r780: remove unused bitstream reader
- r779: cosmetics in quant asm
- r778: special case dequant for flat matrix
Version full rev. 777
- Release Date: Mar 21, 2008
- r777: faster dequant
- r776: simplify hpel_filter_c
- r775: use x264_mc_copy_w16_sse2 in mc.copy, it was previously only in mc_luma
- r774: new ssd_8x*_sse2
- r774: align ssd_16x*_sse2
- r774: unroll ssd_4x*_mmx
- r773: update altivec zigzags
- r772: r768 borked cavlc
Version full rev. 771
- Release Date: Mar 20, 2008
- r771: cosmetics in intra predict
- r770: faster intra predict 8x8 hu/hd
- r769: reduce zigzag arrays from int to int16_t
- r768: reduce the size of some arrays
- r767: skip intra pred+dct+quant in cases where it's redundant (analyse vs encode)
large speedup with trellis=2, small speedup with trellis=0 and/or subme>=6
- r766: cosmetics in asm
- r765: satd_4x4_ssse3
- r764: get_ref_sse2
Version full rev. 763
- Release Date: Mar 19, 2008
- r763: continue instead of crash when the threading mv constraint is violated.
doesn't fix the underlying bug, but hopefully less annoying until we find it.
- r762: remove remaining reference to clip1.h
- r761: fix name mangling again.
apparently it's not just a convention, dll build fails if you try to export a non-prefixed name.
- r760: update msvc projectfile
- r759: missing #ifdef HAVE_SSE3
- r758: don't define offsetof since it's standard
- r757: shut up gcc warning in offsetof
Version full rev. 756
- Release Date: Mar 17, 2008
- r756: increase alignment of mv arrays
- r755: memcpy_aligned_sse2
- r754: checkasm check whether callee-saved regs are correctly saved
x86_32 only for now since x86_64 varargs are annoying
- r753: fix x86_32 ads which failed to preserve a register
- r752: fix some name mangling issues introduced by the merge
- r751: remove x264_mc_clip1.
it's wrong for sufficiently perverse inputs, and clip_uint8 is faster anyway.
- r750: merge x86_32 and x86_64 asm, with macros to abstract calling convention and register names
Version full rev. 749
- Release Date: Mar 12, 2008
- git compatible version script
Version full rev. 748
- Release Date: Mar 8, 2008
- check for broken versions of yasm
Version full rev. 747
- Release Date: Mar 7, 2008
- Rev. 746: .gitignore
- Rev. 747: increase the alignment of the i8x8 edge cache, needed for sse2 intra prediction.
patch by Alexander Strange.
Version full rev. 745
- Release Date: Mar 2, 2008
- Rev. 745: pic macros now keep track of which register holds the GOT, so variable access doesn't have to care
- Rev. 744: remove x86_64 predict_8x8_ddl_mmxext because sse2 is faster even on amd
- Rev. 743: cosmetics in dsp init
- Rev. 742: sse2 16x16 intra pred.
- Rev. 742: port the remaining intra pred functions from x86_64 to x86_32.
patch by Dark Shikari.
- Rev. 742: some simplifications to mmx intra pred that should have been done way back when we switched to constant fdec_stride.
- Rev. 742: and remove pic spills in functions that have a free caller-saved reg.
patch partly by Dark Shikari.
- Rev. 740: faster array_non_zero
- Rev. 739: x86_32 sse2 idct8
ported from ffmpeg by Dark Shikari
- Rev. 738: checkasm: relax the threshold for floating-point ssim
- Rev. 737: checkasm: test idct with the range of coefficients that can really be encountered, as opposed to random numbers which might overflow.
Version full rev. 736
- Release Date: Jan 29, 2008
- intra_rd_refine in B-frames
Version full rev. 735
- Release Date: Jan 28, 2008
- print average of macroblock QPs instead of frame's nominal QP
- update date
- remove colorspace conversion support, because it has no business in any codec
- misc fixes in checkasm
- remove a useless bit of me=umh (originally copied from JM, where it was used for something)
- fix a memleak in cqm
- fix a memleak in mkv muxer
patch by saintdev
- satd exhaustive motion search (--me tesa)
- fix cabac context for nonzero delta_qp of the 2nd mb of a frame in interlaced mode
- fix mapping of mvs to partitions in p4x4_chroma
patch by Noboru Asai
- fix mvp for b16x8 and b8x16 L1 search
patch by Wei-Yin Chen
- shave a couple cycles off cabac functions
- faster and smaller x264_macroblock_cache_mv etc
- configure test for endianness
Version full rev. 721
- Release Date: Jan 18, 2008
- change the meaning of --ref: it now selects DPB size (including B-frames), rather than L0 size (which B-frames are added to)
Version full rev. 720
- Release Date: Jan 15, 2008
- add / fix support for FreeBSD, based on a patch by Igor Mozolevsky % igor A hybrid-lab P co P uk %
Version full rev. 719
- Release Date: Jan 10, 2008
- shut up some valgrind warnings
- slightly wrong memory allocation in r717, fixes a potential crash with merange>32
Version full rev. 717
- Release Date: Jan 7, 2008
- convert absolute difference of sums from mmx to sse2
- convert mv bits cost and ads threshold from C to sse2
- convert bytemask-to-list from C to scalar asm
1.6x faster me=esa (x86_64) or 1.3x faster (x86_32). (times consider only motion estimation. overall encode speedup may vary.)
- round esa range to a multiple of 4
Version full rev. 715
- Release Date: Jan 4, 2008
- use define _WIN32 instead of __WIN32__ or WIN32 defines.
MSDN reference: http://msdn2.microsoft.com/en-us/library/b0084kay(VS.80).aspx
Patch by BugMaster %BugMaster A narod P ru%
Original thread:
date: Dec 27, 2007 3:18 AM
subject: [x264-devel] VS2008 compilation error (need of replacement __WIN32__ with _WIN32)
Version full rev. 714
- Release Date: Dec 21, 2007
- tweak x264_pixel_sad_x4_16x16_sse2 horizontal sum. 168 -> 166 cycles on core2.
Version full rev. 713
- Release Date: Dec 21, 2007
- fix a nondeterminism involving 8x8dct, rdo, and threads.
Version full rev. 712
- Release Date: Dec 14, 2007
- also test arch-specific x264_zigzag_* implementations in checkasm.c
Patch by Noboru Asai % noboru P asai A gmail P com%
Version full rev. 711
- Release Date: Dec 11, 2007
- Add AltiVec implementation of
- x264_zigzag_scan_4x4_frame_altivec()
- x264_zigzag_scan_4x4ac_frame_altivec()
- x264_zigzag_scan_4x4_field_altivec()
- x264_zigzag_scan_4x4ac_field_altivec()
each around 1.3 to 1.8x faster than C version
Patch by Noboru Asai % noboru P asai A gmail P com%
Version full rev. 710
- Release Date: Dec 10, 2007
- adds AltiVec implementation of predict_16x16_p()
over 4x faster than C version
Version full rev. 709
- Release Date: Dec 7, 2007
- revert the x86_32 part of r708. elf shared libraries aren't important enough to be worth the extra lines of code to check for nasm.
Version full rev. 708
- Release Date: Dec 4, 2007
- Rev. 708: mark asm functions as hidden
- Rev. 707: check whether ld supports -Bsymbolic before using it
Version full rev. 706
- Release Date: Dec 3, 2007
- reduce the data type used in some tables. 16KB smaller exe.
Version full rev. 705
- Release Date: Dec 2, 2007
- Rev. 705: faster removal of duplicate mv predictors
- Rev. 704: avoid a division in x264_mb_predict_mv_ref16x16.
patch by Dark Shikari.
- Rev. 703: avoid a division in umh.
patch by Dark Shikari.
Version full rev. 702
- Release Date: Nov 27, 2007
- fix a memleak in h->mb.mvr
Version full rev. 701
- Release Date: Nov 26, 2007
- fix compilation as a shared library on x86_64 (regression in r696)
Version full rev. 700
- Release Date: Nov 22, 2007
- Rev. 700: add support for x86_64 on Darwin9.0 (Mac OS X 10.5, aka Leopard)
Patch by Antoine Gerschenfeld %gerschen A clipper P ens P fr%
- Rev. 699: cover some more options in fprofile. (esa, bime, cqm, nr, no-dct-decimate, trellis2)
previously, esa was slower with fprofile than without, since gcc thought it wasn't important. now esa benefits like anything else.
Version full rev. 698
- Release Date: Nov 21, 2007
- Rev. 698: Add AltiVec implementation of x264_pixel_ssd_8x8, 3x faster than C version
Overall speed-up: 0.7% with --bframes 3 --ref 5 -m 7 --b-rdo
Patch by Noboru Asai %noboru P asai A gmail P com%
- Rev. 697: limit mvs to [-512,511.75] instead of [-512,512]
- Rev. 696: avoid memory loads that span the border between two cachelines.
on core2 this makes x264_pixel_sad an average of 2x faster. other intel cpus gain various amounts. amd are unaffected.
overall speedup: 1-10%, depending on how much time is spent in fullpel motion estimation.
- Rev. 695: add cache info to cpu_detect. also print sse3.
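The range change in Rev. 697 is easiest to see in integer form. Assuming motion vectors are stored in quarter-pel units (an assumption of this sketch), [-512, 511.75] pixels becomes [-2048, 2047], which is exactly the range of a signed 12-bit integer. The helper name `clamp_mv_qpel` is hypothetical.

```c
#include <assert.h>

/* Assumption: motion vectors are in quarter-pel units, so -512 pixels is
 * -2048 and 511.75 pixels is 2047. */
enum { MV_MIN_QPEL = -512 * 4, MV_MAX_QPEL = 512 * 4 - 1 };

int clamp_mv_qpel(int mv)
{
    int r = mv < MV_MIN_QPEL ? MV_MIN_QPEL
          : mv > MV_MAX_QPEL ? MV_MAX_QPEL
          : mv;
    assert(r >= -2048 && r <= 2047); /* fits in 12 signed bits */
    return r;
}
```

With the old upper bound of 512 pixels, the quarter-pel value 2048 would have fallen one step outside that 12-bit range.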
Version full rev. 694
- Release Date: Nov 20, 2007
- Rev. 694: cosmetics: reorder mc_luma/mc_chroma/get_ref arguments for consistency with other functions
- Rev. 693: separate pixel_avg into cases for mc and for bipred
Version full rev. 692
- Release Date: Nov 19, 2007
- Rev. 692: add AltiVec implementation of ssim_4x4x2_core, about 4x faster than C version.
Overall: 0.1-0.2% faster with default encoding settings
Patch by Noboru Asai %noboru P asai A gmail P com%
- Rev. 691: Add AltiVec implementation of x264_hpel_filter. Provides a 10-11% overall speed-up with default encoding options
Patch by Noboru Asai %noboru P asai A gmail P com%
Version full rev. 690
- Release Date: Nov 18, 2007
- cosmetics in dsp function selection
Version full rev. 689
- Release Date: Nov 18, 2007
- remove sad_pde. it's been unused ever since successive elimination replaced it.
Version full rev. 688
- Release Date: Nov 17, 2007
- Rev. 688: cosmetics: use symbolic constants for frame padding radius
- Rev. 687: move hpel_filter cpu detection to a function pointer like everything else
Version full rev. 686
- Release Date: Nov 16, 2007
- cosmetics: use separate variables for frame width and stride
Version full rev. 685
- Release Date: Nov 14, 2007
- rev. 685: Add AltiVec implementation of add4x4_idct, add8x8_idct, add16x16_idct, 3.2x faster on average
1.05x faster overall with default encoding options
Patch by Noboru Asai % noboru DD asai AA gmail DD com %
- rev. 684: add AltiVec implementation of dequant_4x4 and dequant_8x8, 2.8x faster than C,
1.01x faster than previous revision with default encoding options
Patch by Noboru Asai % noboru DD asai AA gmail DD com %
Version full rev. 683
- Release Date: Nov 13, 2007
- Add AltiVec implementation of quant_2x2_dc,
fix Altivec implementation of quant_(4x4|8x8)(|_dc) wrt current C implementation
Patch by Noboru Asai % noboru DD asai AA gmail DD com %
Version full rev. 682
- Release Date: Nov 2, 2007
- fix a possible nondeterminism with me=umh + threads.
Version full rev. 681
- Release Date: Oct 31, 2007
- use hex instead of dia for rdo mv refinement. ~0.5% lower bitrate at subme=7.
patch by Dark Shikari.
Version full rev. 680
- Release Date: Sep 25, 2007
- port sad_*_x3_sse2 to x86_64
- don't overwrite pthread* namespace, because system headers might define those functions even if we don't want them
Version full rev. 678
- Release Date: Sep 22, 2007
- faster 4x4 sad
Version full rev. 677
- Release Date: Sep 21, 2007
- fix an arithmetic overflow in trellis at high qp.
Version full rev. 676
- Release Date: Sep 16, 2007
- implement multithreaded me=esa
Version full rev. 675
- Release Date: Sep 13, 2007
- fix some integer overflows. now vbv size can exceed 2 Gbit.
Version full rev. 674
- Release Date: Sep 10, 2007
- allow --vbv-init to take absolute values (in kbit), in addition to the previous fractions of vbv-bufsize.
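One way to accept both forms of --vbv-init is sketched below. It assumes that values above 1 are read as kbit and values up to 1 as a fraction of the buffer size; that threshold, the clamping, and the name `vbv_init_fraction` are assumptions of this example rather than a description of the actual implementation.

```c
#include <assert.h>

/* Hypothetical helper: returns the initial buffer fill as a fraction in
 * [0, 1]. Values above 1 are interpreted as kbit and divided by the
 * buffer size; smaller values are taken as a fraction already. */
double vbv_init_fraction(double vbv_init, int vbv_bufsize_kbit)
{
    assert(vbv_bufsize_kbit > 0);
    double f = vbv_init > 1.0 ? vbv_init / vbv_bufsize_kbit : vbv_init;
    if (f < 0.0) f = 0.0;
    if (f > 1.0) f = 1.0;
    return f;
}
```

Under these assumptions, `--vbv-init 500` with a 2000 kbit buffer gives the same starting fill as `--vbv-init 0.25`.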
Version full rev. 673
- Release Date: Sep 9, 2007
- remove a bashism
Version full rev. 672
- Release Date: Sep 4, 2007
- reorder headers so that largefile support is defined before the first copy of stdio
Version full rev. 671
- Release Date: Aug 21, 2007
- regression in r669: broke saving of configure args if make has to re-run configure
Version full rev. 670
- Release Date: Aug 18, 2007
- regression in r669: --enable-shared should imply --enable-pic on some archs.
Version full rev. 669
- Release Date: Aug 14, 2007
- Add a --host flag to allow overriding config.guess; this is particularly
useful with a 64-bits kernel running a 32-bits userland to build 32-bits
apps.
- Normalize any host triplet into a quadruplet via config.sub.
- Move option parsing before any use of architecture information.
- Update config.guess.
Version full rev. 667
- Release Date: Jul 18, 2007
- mingw doesn't have strtok_r
- move os/compiler specific defines to their own header
- extend zones to support (some) encoding parameters in addition to ratecontrol.
Version full rev. 664
- Release Date: Jul 7, 2007
- cosmetics
Version full rev. 663
- Release Date: Jun 29, 2007
- limit vertical motion vectors to +/-512, since some decoders actually depend on that limit.
Version full rev. 662
- Release Date: Jun 23, 2007
- Add vertical and horizontal luma deblocking accelerated with Altivec, based on Graham Booker's code written for FFmpeg with slight modifications to re-use x264's macros
Version full rev. 661
- Release Date: Jun 16, 2007
- cosmetics in cpu detection
- fix compilation without asm on x86_32 (r658 worked only on x86_64).
Version full rev. 659
- Release Date: Jun 11, 2007
- exempt 1080p from the non-mod16 warning
Version full rev. 658
- Release Date: Jun 6, 2007
- allow compiling without yasm/nasm on x86 and x86-64 platforms
- updated MS VC8/VC7 build, patch by Gabriel Bouvigne
Version full rev. 656
- Release Date: May 26, 2007
- replace alloca with malloc everywhere. per manpage, use of alloca is discouraged. this may have a minor effect on the speed of ssim and esa, but that appears too small to measure.
Version full rev. 655
- Release Date: May 3, 2007
- require a ratecontrol method to be specified, it no longer defaults to cqp=26.
Version full rev. 654
- Release Date: Apr 23, 2007
- fix nnz computation in cavlc+8x8dct+deblock. (regression in r607)
- fix the computation of bits used for vbv. (regression in r651)
Version full rev. 652
- Release Date: Apr 22, 2007
- c89 compile fix
Version full rev. 651
- Release Date: Apr 22, 2007
- cabac: use bytestream instead of bitstream.
35% faster cabac, 20% faster overall lossless, ~1% faster overall at normal bitrates.
Version full rev. 650
- Release Date: Apr 13, 2007
- remove the restriction on number of threads as a function of resolution (it was wrong anyway in the presence of B-frames), and raise the max number of threads in general (though more will have to be done before it can really scale to lots of cores).
Version full rev. 649
- Release Date: Apr 11, 2007
- tweak ssse3 quant
Version full rev. 648
- Release Date: Apr 8, 2007
- change some tables from int to int8_t. 13KB smaller executable.
- faster cabac rdo. up to 10% faster at q0, but negligible at normal bitrates.
- workaround gcc's inability to align variables on the stack.
this crash was introduced in r642, but only because previous versions didn't use sse2 on the stack.
Version full rev. 645
- Release Date: Apr 6, 2007
- 32bit version of ssse3 satd.
switch default assembler to yasm. it will still fallback to nasm if you don't have yasm.
- simplify trellis
- fix an arithmetic overflow in trellis with QP >= 42
- 2x faster quant. 2% overall.
side effects:
not bit-identical to the previous algorithm.
while the new algorithm covers a wider range of cqms than the previous one did,
I couldn't find a good way to fallback to a general version for the extreme
cqms. so now it refuses to encode extreme cqms instead of just being slower.
lays a framework for custom deadzone matrices, though I didn't add an api.
- when encoding with a cqm, probe_skip now also uses the cqm, instead of the flat matrix
Version full rev. 640
- Release Date: Apr 4, 2007
- cosmetics in asm macros
- use only c-style comments in public header (patch by Vincent Torres)
Version full rev. 638
- Release Date: Apr 3, 2007
- in hpel search, merge two 16x16 mc calls into one 16x17. 15% faster hpel, .3% overall.
- Compile fix
Version full rev. 636
- Release Date: Apr 1, 2007
- remove private stuff from public headers. no more need for -D__X264__
Version full rev. 635
- Release Date: Mar 25, 2007
- adjust bitstream buffer sizes for very large frames
Version full rev. 634
- Release Date: Mar 15, 2007
- rev. 634: conflate HAVE_MMXEXT with HAVE_SSE2, since they were never used distinctly.
- rev. 633: Made -DNEED_ALTIVEC unnecessary, thanks to Guillaume Poirier.
- rev. 632: check x264_cpu_detect() before calling AltiVec functions.
- rev. 631: ssse3 detection. x86_64 ssse3 satd and quant.
- rev. 631: requires yasm >= 0.6.0
- rev. 630: Use -maltivec when building dependencies, or <altivec.h> cannot be used.
- rev. 630: Do not declare vectors in non-AltiVec files.
- rev. 629: common/cpu.c: runtime AltiVec autodetection on Linux.
- rev. 629: configure, Makefile: do not build the whole project with -maltivec because
it generates AltiVec code in weird places.
Version full rev. 628
- Release Date: Mar 6, 2007
- fix a small memleak. patch by Limin Wang.
Version full rev. 627
- Release Date: Mar 4, 2007
- compile fix for GCC-3.3 on OSX, based on a patch by
Patrice Bensoussan % patrice P bensoussan A free P fr%
Note: regression tests still do not pass with GCC-3.3,
but they never did as far as I can remember.
- cosmetics in regression test
- regression testing, run similar to fprofiled: VIDS='vid_720x480.yuv' make test
Version full rev. 624
- Release Date: Mar 1, 2007
- add ability to generate doxygen documentation; make dox
Version full rev. 623
- Release Date: Feb 23, 2007
- oops, scenecut detection failed to activate when using threads and not using B-frames
Version full rev. 622
- Release Date: Jan 30, 2007
- extras/getopt.c was BSD licensed. replace with a LGPL version (from glibc).
Version full rev. 621
- Release Date: Jan 26, 2007
- Fix build issues on Linux. Only gcc-4.x is supported, as on OSX.
- Cleans up a few inconsistencies in the code too.
Version full rev. 620
- Release Date: Jan 22, 2007
- tweak block_residual_write_cavlc.
- up to 1% faster lossless, no difference at normal bitrates.
Version full rev. 619
- Release Date: Jan 21, 2007
- don't assume int is exactly 4 bytes
Version full rev. 618
- Release Date: Jan 12, 2007
- make array_non_zero() compatible with -fstrict-aliasing
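A common way to make this kind of check safe under -fstrict-aliasing is to read wide words through memcpy instead of casting the array pointer to a wider integer type. The sketch below shows that technique under stated assumptions: the element type (int16_t), the multiple-of-4 length requirement, and the name `array_non_zero_sketch` are all choices of this example, not the actual x264 function.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if any of the n coefficients is nonzero, 0 otherwise.
 * Eight bytes are read at a time via memcpy; casting v to uint64_t*
 * instead would break the strict-aliasing rules. n must be a multiple
 * of 4 in this sketch. */
int array_non_zero_sketch(const int16_t *v, int n)
{
    assert(v != NULL && n >= 0 && n % 4 == 0);
    for (int i = 0; i < n; i += 4) {
        uint64_t w;
        memcpy(&w, v + i, sizeof w);
        if (w)
            return 1;
    }
    return 0;
}
```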
Version full rev. 617
- Release Date: Jan 10, 2007
- Honor CFLAGS and LDFLAGS set by the user
Version full rev. 616
- Release Date: Jan 3, 2007
- Check whether 'echo -n' works, otherwise try printf (fixes build on current OS X 10.5)
Version full rev. 615
- Release Date: Jan 2, 2007
- Check version of nasm on OS X / Intel
Version full rev. 614
- Release Date: Dec 21, 2006
- wrong reference frames were used with refs>=14 + pyramid (regression in r607)
- enable thread synchronization primitives on linux too
Version full rev. 612
- Release Date: Dec 20, 2006
- fix a crash with x264_encoder_headers() + threads
Version full rev. 611
- Release Date: Dec 16, 2006
- don't skip autodetection on configure --enable-pthread
- more win32threads -> pthreads
- cosmetics: rename list operators to be consistent with Perl, and move them to common/
- win32: use pthreads instead of win32threads. for some reason, pthreads is much faster.
- New threading method:
Encode multiple frames in parallel instead of dividing each frame into slices. Improves speed, and reduces the bitrate penalty of threading.
Side effects: It is no longer possible to re-encode a frame, so threaded scenecut detection must run in the pre-me pass, which is faster but less precise.
It is now useful to use more threads than you have cpus. --threads=auto has been updated to use cpus*1.5.
Minor changes to ratecontrol.
- New options: --pre-scenecut, --mvrange-thread, --non-deterministic
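The cpus*1.5 rule for --threads=auto mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the x264 source: the function name is hypothetical, and both the fallback for a failed CPU count and the round-down behaviour are assumptions.

```c
/* Hypothetical sketch of the cpus*1.5 rule; not taken from x264. */
int auto_thread_count(int cpus)
{
    if (cpus < 1)
        cpus = 1;           /* assumption: treat a failed detection as one CPU */
    return cpus * 3 / 2;    /* integer form of cpus * 1.5, rounded down */
}
```

With this rounding, a dual-core machine would get 3 threads and a quad-core machine 6.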
Version full rev. 606
- Release Date: Dec 13, 2006
- Do not assume anything about sizeof(cpu_set_t).
Version full rev. 605
- Release Date: Dec 12, 2006
- Add support for kFreeBSD (FreeBSD kernel with GNU userland).
Version full rev. 604
- Release Date: Nov 28, 2006
- Add Altivec implementations of add8x8_idct8, add16x16_idct8, sa8d_8x8 and sa8d_16x16
Note: doesn't take advantage of some possible aligned memory accesses, so there's still room for improvement
Version full rev. 603
- Release Date: Nov 26, 2006
- Force alignment of the fake .rodata on MacIntel
Version full rev. 602
- Release Date: Nov 23, 2006
- don't treat vbv_maxrate as a minrate too if it's higher than target average bitrate.
Version full rev. 601
- Release Date: Nov 19, 2006
- Merges Guillaume Poirier's AltiVec changes:
- Adds optimized quant and sub*dct8 routines
- Faster sub*dct routines
- ~8% overall speed-up with default settings
Version full rev. 600
- Release Date: Nov 7, 2006
- 10% faster deblock mmx functions. ported from ffmpeg.
- checkasm: ignore insignificant differences in floating-point ssim
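Ignoring insignificant floating-point differences usually means comparing against a tolerance rather than testing for exact equality. A minimal sketch, assuming an absolute tolerance; the function name and the idea of passing eps as a parameter are hypothetical and not taken from checkasm.

```c
/* Hypothetical comparison: treats two SSIM values as equal when they
 * differ by no more than eps. */
int ssim_close_enough(float ref, float test, float eps)
{
    float diff = ref - test;
    if (diff < 0)
        diff = -diff;       /* absolute difference without needing libm */
    return diff <= eps;
}
```

A C reference and an asm implementation may accumulate sums in a different order, so their results can differ in the last bits without either being wrong.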
Version full rev. 598
- Release Date: Oct 31, 2006
- display final ratefactor in abr when a loose vbv is applied. (still disabled in true cbr)
Version full rev. 597
- Release Date: Oct 30, 2006
- fix parsing of --deblock %d,%d (beta was ignored)
- compute chroma_qp only once per mb
Version full rev. 595
- Release Date: Oct 29, 2006
- rd refinement of intra chroma direction (enabled in --subme 7)
patch by Alex Wright.
Version full rev. 594
- Release Date: Oct 19, 2006
- fix a crash in avc2avi
Version full rev. 593
- Release Date: Oct 17, 2006
- skip deblocking and motion interpolation when using only I-frames
Version full rev. 592
- Release Date: Oct 14, 2006
- cosmetics
- allow fractional values of crf
Version full rev. 590
- Release Date: Oct 11, 2006
- prefetch pixels for motion compensation and deblocking.
- fix a crash on interlace + >8 reference frames
- no more decoder. it never worked anyway, and the presence of defunct code was confusing people.
Version full rev. 587
- Release Date: Oct 10, 2006
- compute pskip_mv only once per macroblock, and store it
- slightly faster chroma_mc_mmx
- missing emms in plane_copy_mmx
Version full rev. 584
- Release Date: Oct 7, 2006
- merge center_filter_mmx with horizontal_filter_mmx
- 1.5x faster center_filter_mmx (amd64)
Version full rev. 582
- Release Date: Oct 6, 2006
- mmx/prefetch implementation of plane_copy
- no more vfw
- gtk fixes:
- in Makefile
- fix datadir for mingw users
- remove the shared lib during the clean rule
- use $(ENCODE_BIN) instead of x264_gtk_encode
- add some $(DESTDIR) and create some directories when necessary
- remove -lintl
- statfile_length -> statsfile_length
- fix the "sensitivity" of the widget of update_statfile
- the logo is now handled correctly on windows
- added: beginning of multipass support
- patch by Vincent Torri.
Version full rev. 579
- Release Date: Oct 5, 2006
- accept mencoder's option names as synonyms (api only, not in x264cli)
Version full rev. 578
- Release Date: Oct 3, 2006
- simplify satd_sse2
- better error checking in x264_param_parse.
- add synonyms for a few options.
Version full rev. 576
- Release Date: Oct 2, 2006
- fix some strides that weren't a multiple of 16.
- tweak motion compensation amd64 asm. 0.3% overall speedup.
- strip local symbols from asm .o files, since they confuse oprofile
- add an option to control direct_8x8_inference_flag, default to enabled.
- slightly faster encoding and decoding of p4x4 + B-frames,
and is needed for strict Levels compliance.
Version full rev. 572
- Release Date: Oct 1, 2006
- allow custom deadzones for non-trellis quantization.
patch by Alex Wright.
- move zigzag scan functions to dsp function pointers.
- mmx implementation of interlaced zigzag.
Version full rev. 570
- Release Date: Oct 1, 2006
- support interlace. uses MBAFF syntax, but is not adaptive yet.
Version full rev. 569
- Release Date: Sep 28, 2006
- allow --zones in cqp encodes
Version full rev. 568
- Release Date: Sep 27, 2006
- cli: fix some typos in vui parameters from r542.
patch by Foxy Shadis.
Version full rev. 567
- Release Date: Sep 26, 2006
- Add an "all" rule to the Makefile. Ideally "default" should be renamed, but I don't want to break existing scripts.
Version full rev. 566
- Release Date: Sep 25, 2006
- workaround: on some systems, alloca() isn't aligned
Version full rev. 565
- Release Date: Sep 23, 2006
- missing picpop
Version full rev. 564
- Release Date: Sep 14, 2006
- fix a buffer overread from r540
Version full rev. 563
- Release Date: Sep 13, 2006
- cosmetics (spelling)
- faster ESA
Version full rev. 560
- Release Date: Sep 11, 2006
- Use the autotools' config.guess script instead of uname to check the
system and CPU types, to avoid issues when using for instance a 32-bit
userland on top of a 64-bit kernel.
- Add the autotools' config.guess script so that we can use it instead
of uname in the configure script.
Version full rev. 558
- Release Date: Aug 23, 2006
- 10l in r553
Version full rev. 557
- Release Date: Aug 21, 2006
- ssim broke on amd64 w/ pic.
Version full rev. 556
- Release Date: Aug 19, 2006
- MSVC compatibility fix from Haali
Version full rev. 555
- Release Date: Aug 18, 2006
- support changing some more parameters in x264_encoder_reconfig()
- SSIM computation. (default on, disable by --no-ssim)
Version full rev. 553
- Release Date: Aug 17, 2006
- configure: --enable-debug reduces optimization to -O1
- cosmetics
Version full rev. 551
- Release Date: Aug 4, 2006
- gcc -fprofile-generate isn't threadsafe
- cli: move some options from --help to --longhelp
- cli: don't try to get resolution from filename unless input is rawyuv
- r542 broke --visualize
Version full rev. 547
- Release Date: Aug 3, 2006
- Nicer OS X x264_cpu_num_processors (thanks David)
- Support OS X and BeOS in x264_cpu_num_processors
- Fixes contexts allocation with threads=auto
- select initial qp for abr and cbr based on satd and bitrate, rather than cq24.
- --threads=auto to detect number of cpus
- api addition: x264_param_parse() to set options by name
- fix a rare NaN in ratecontrol
- move quant_mf[] from x264_t to the heap, and merge duplicate entries
- GTK update. patch by Vincent Torri.
fixed:
cleaning of Makefile
time elapsed seems broken ('total time' label replaced by 'time remaining')
text entries of the status window are now not editable
added:
compilation from x264/ (add --enable-gtk option to configure)
shared lib creation if --enable-shared is passed to configure
x264gtk.pc
--b-rdo, --no-dct-decimate
- new option: --qpfile forces frame types and QPs. (intended for ratecontrol experiments, not for real encodes)
Version full rev. 537
- Release Date: Jul 18, 2006
- api change: select ratecontrol method with an enum (param.rc.i_rc_method) instead of a bunch of booleans.
Version full rev. 536
- Release Date: Jul 17, 2006
- slightly faster mmx dct
- OpenBSD build fixes.
- patch by Vizeli Pascal (pvizeli at yahoo dot de)
Version full rev. 534
- Release Date: Jul 9, 2006
- mc_chroma width2 mmx
Version full rev. 533
- Release Date: Jun 29, 2006
- make libx264.so symlink relative
Version full rev. 532
- Release Date: Jun 13, 2006
- added:
- direct=auto
- no-fast-pskip
- vbv
- cqm
- tooltips (without descriptions yet)
- translations
- `make clean` for .exe
- when file exists, ask before overwriting
- fixes:
- debug level bug
- bitrate slider bug
- mixed-refs can be set only if ref>1
- i8x8 can be set only if 8x8 transform is enabled
- # of threads capped at 4
- fourcc can't be removed
- cosmetics
Version full rev. 531
- Release Date: Jun 1, 2006
- vfw installer: tweak nsis compression.
patch by Francesco Corriga.
Version full rev. 530
- Release Date: May 31, 2006
- Fixed typo that caused x264_encoder_open to always fail
Version full rev. 529
- Release Date: May 30, 2006
- check some mallocs' return value
- make -> $(MAKE)
Version full rev. 527
- Release Date: May 24, 2006
- convert non-fatal errors to message level "warning".
Version full rev. 526
- Release Date: May 23, 2006
- fix a memory alignment. (no effect on x86, but might be needed for other simd)
Version full rev. 525
- Release Date: May 21, 2006
- when using DEBUG_DUMP_FRAME, write decoded pictures in display order.
patch by Loic Le Loarer.
- non-referenced B-frames should have the same frame_num as the following ref frame, not the previous.
patch by Loic Le Loarer.
Version full rev. 523
- Release Date: May 12, 2006
- set the SPS constraint_set[01]_flag based on the profile in use, just in case some decoder cares
Version full rev. 522
- Release Date: May 11, 2006
- msvc doesn't like C99 named array initializers
- allow sar=1/1.
patch by Loic Le Loarer.
- faster intra search: filter i8x8 edges only once, and reuse for multiple predictions.
Version full rev. 519
- Release Date: May 10, 2006
- faster intra search: some prediction modes don't have to compute a full hadamard transform.
x86 and amd64 asm.
Version full rev. 518
- Release Date: May 8, 2006
- --sps-id, to allow concatenating streams with different settings.
Version full rev. 517
- Release Date: May 4, 2006
- typo in expand_border_mod16
Version full rev. 516
- Release Date: Apr 30, 2006
- typo impaired 2pass bitrate prediction.
Version full rev. 515
- Release Date: Apr 29, 2006
- Let the user choose the compiler with "CC=xxx ./configure"
Version full rev. 514
- Release Date: Apr 29, 2006
- More vector types fixes for gcc 3.3
Version full rev. 513
- Release Date: Apr 29, 2006
- More vector casts to try and make compilers happier
Version full rev. 512
- Release Date: Apr 25, 2006
- Use sa8d instead of satd for i8x8 search.
+.01 dB, -.5% speed
Version full rev. 511
- Release Date: Apr 25, 2006
- Before evaluating the RD score of any mode, check satd and abort if it's much worse than some other mode.
- Also apply more early termination to intra search.
speed at -m1:+1%, -m4:+3%, -m6:+8%, -m7:+20%
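The early-termination idea above can be sketched as a cheap SATD gate in front of the expensive RD evaluation. This is an illustration only: the function name is hypothetical, and the threshold ratio num/den is an assumed parameter, since the entry does not say how "much worse" is defined.

```c
/* Hypothetical gate: returns 1 if a mode deserves a full RD evaluation,
 * 0 if its SATD is already too far above the best SATD seen so far.
 * The mode is kept while mode_satd <= best_satd * num / den. */
int worth_rd_check(int mode_satd, int best_satd, int num, int den)
{
    /* Cross-multiply in 64 bits to avoid division and overflow. */
    return (long long)mode_satd * den <= (long long)best_satd * num;
}
```

For example, with num/den = 5/4 a mode is skipped once its SATD is more than 25% above the best one.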
Version full rev. 510
- Release Date: Apr 25, 2006
- common/ppc/pixel.c: fixed illegal implicit casts of vector types
Version full rev. 509
- Release Date: Apr 25, 2006
- Added %$#@#$! support for #@%$!#@ armv4l CPU.
Version full rev. 508
- Release Date: Apr 24, 2006
- When evaluating predictors to start fullpel motion search, use subpel positions instead of rounding to fullpel.
about +.02 dB, -1.6% speed at subme>=3
patch by Alex Wright.
Version full rev. 507
- Release Date: Apr 24, 2006
- mmx implementation of x264_pixel_sa8d
Version full rev. 506
- Release Date: Apr 21, 2006
- 10l in r463 (q0 i16x16 dc was permuted)
Version full rev. 505
- Release Date: Apr 20, 2006
- typo in r504
Version full rev. 504
- Release Date: Apr 20, 2006
- update msvc project files.
- patch by anonymous.
Version full rev. 503
- Release Date: Apr 19, 2006
- Before, we eliminated dct blocks containing only a small single coefficient. Now that behavior is optional, by --no-dct-decimate.
based on a patch by Alex Wright.
Version full rev. 502
- Release Date: Apr 17, 2006
- Enables more aggressive optimizations (-fastf -mcpu=G4) on OS X.
- Adds AltiVec interleaved SAD and SSD16x16.
- Overall speedup up to 20%.
Version full rev. 501
- Release Date: Apr 17, 2006
- faster cabac_encode_bypass
Version full rev. 500
- Release Date: Apr 16, 2006
- restored AltiVec dct
Version full rev. 499
- Release Date: Apr 16, 2006
- more AltiVec mc, ~4.5% overall speedup
Version full rev. 498
- Release Date: Apr 12, 2006
- slightly faster loopfilter
Version full rev. 497
- Release Date: Apr 12, 2006
- 3% faster satd_mmx
Version full rev. 496
- Release Date: Apr 12, 2006
- cosmetics in sad/ssd/satd mmx
Version full rev. 495
- Release Date: Apr 11, 2006
- store quoted configure options. needed e.g. for multiple args under --extra-cflags.
Version full rev. 494
- Release Date: Apr 11, 2006
- fix a yasm-incompatible syntax in x86 asm
Version full rev. 493
- Release Date: Apr 11, 2006
- yasm noexec stack
Version full rev. 492
- Release Date: Apr 10, 2006
- more interleaved SAD.
- 25% faster halfpel.
Version full rev. 491
- Release Date: Apr 10, 2006
- more interleaved SAD.
- 1% faster umh, 6% faster esa.
Version full rev. 490
- Release Date: Apr 10, 2006
- interleave multiple calls to SAD.
- 15% faster fullpel motion estimation.
Version full rev. 489
- Release Date: Apr 10, 2006
- Added support for ppc64. I'm really f***ing tired of having to do this.
Version full rev. 488
- Release Date: Apr 8, 2006
- use LDFLAGS when linking shared lib
Version full rev. 487
- Release Date: Mar 29, 2006
- compilation fix for mingw, darwin (off_t was undefined)
Version full rev. 486
- Release Date: Mar 28, 2006
- (r486) GTK: support yuv4mpeg input. patch by Vincent Torri.
- (r485) GTK: fix avs input. patch by Vincent Torri.
- (r484) cli: support yuv4mpeg input. patch by anonymous.
- (r483) GTK: compilation fixes
Version full rev. 477
- Release Date: Mar 23, 2006
- 10l in r473 and stdin
- RD subpel motion estimation (--subme 7)
- cosmetics in cabac_mb_cbf
Version full rev. 451
- Release Date: Mar 4, 2006
- 10l in r443 (p4x4 chroma)
- common/i386/i386inc.asm: tell the ELF linker about our stack properties so that it does not assume the stack has to be executable.
- configure, common/i386/i386inc.asm: got rid of -DFORMAT_* nasm flags and use built-in preprocessor tests instead.
- common/i386: factored the .rodata section declaration into i386inc.asm.
- configure: activate minor nasm optimisations, such as assembling "add eax, 8" as "add eax, byte 8".
- common/i386/*.asm: don't use the "GLOBAL" reserved word, some versions of NASM complain about it. Replaced it with "GOT_ebx".