
A Brief Analysis of the x264 Source Code: Encoder Core (Part 1)

=====================================================

List of H.264 source-code analysis articles in this series:

[Encoding: x264]

A Brief Analysis of the x264 Source Code: Overview

A Brief Analysis of the x264 Source Code: The x264 Command-Line Tool (x264.exe)

A Brief Analysis of the x264 Source Code: Encoder Core (Part 1)

A Brief Analysis of the x264 Source Code: Encoder Core (Part 2)

A Brief Analysis of the x264 Source Code: x264_slice_write()

A Brief Analysis of the x264 Source Code: The Filter Part

A Brief Analysis of the x264 Source Code: Macroblock Analysis - Intra Macroblocks

A Brief Analysis of the x264 Source Code: Macroblock Analysis - Inter Macroblocks

A Brief Analysis of the x264 Source Code: Macroblock Encoding

A Brief Analysis of the x264 Source Code: Entropy Encoding

A Brief Analysis of the FFmpeg-libx264 Interface Source Code

[Decoding: the libavcodec H.264 decoder]

A Brief Analysis of FFmpeg's H.264 Decoder Source Code: Overview

A Brief Analysis of FFmpeg's H.264 Decoder Source Code: The Parser

A Brief Analysis of FFmpeg's H.264 Decoder Source Code: Decoder Core

A Brief Analysis of FFmpeg's H.264 Decoder Source Code: Entropy Decoding

A Brief Analysis of FFmpeg's H.264 Decoder Source Code: Macroblock Decoding - Intra Macroblocks

A Brief Analysis of FFmpeg's H.264 Decoder Source Code: Macroblock Decoding - Inter Macroblocks

A Brief Analysis of FFmpeg's H.264 Decoder Source Code: The Loop Filter

=====================================================


This article analyzes the source code of the x264 encoder's core. By "core" I mean the most central API function in libx264 — x264_encoder_encode() — together with the related API functions x264_encoder_open(), x264_encoder_headers(), and x264_encoder_close(). This part of the code is fairly complex; even after studying it at length, several places are still unclear to me, so for now I am writing up what I have understood and will fill in the rest later. Since the core covers a lot of ground, the write-up is split into two articles: the first covers x264_encoder_open(), x264_encoder_headers(), and x264_encoder_close(); the second covers x264_encoder_encode().



Function Call Graph

The figure below shows where the encoder-core source code sits within x264 as a whole.

The figure below shows the call relationships among the functions of the encoder core.

 

As the figure shows, the most complex function in the x264 core is x264_encoder_encode(), which encodes one YUV frame into an H.264 bitstream. Working alongside it are x264_encoder_open(), which opens the encoder; x264_encoder_close(), which closes it; and x264_encoder_headers(), which outputs header information such as the SPS/PPS/SEI.

x264_encoder_open() opens the encoder and initializes the various variables that libx264 needs for encoding. It calls the following functions:
x264_validate_parameters(): checks the input parameters (for example, that the input width and height are positive).
x264_predict_16x16_init(): initializes the assembly-optimized Intra16x16 intra-prediction functions.
x264_predict_4x4_init(): initializes the assembly-optimized Intra4x4 intra-prediction functions.
x264_pixel_init(): initializes the assembly-optimized pixel-metric functions (SAD, SATD, SSD, etc.).
x264_dct_init(): initializes the assembly-optimized DCT and inverse-DCT functions.
x264_mc_init(): initializes the assembly-optimized motion-compensation functions.
x264_quant_init(): initializes the assembly-optimized quantization and dequantization functions.
x264_deblock_init(): initializes the assembly-optimized deblocking-filter functions.
x264_lookahead_init(): initializes the lookahead state.
x264_ratecontrol_new(): initializes the rate-control state.

x264_encoder_headers() outputs the H.264 stream headers (SPS/PPS/SEI). It calls the following functions:
x264_sps_write(): writes the SPS.
x264_pps_write(): writes the PPS.
x264_sei_version_write(): writes the SEI.

x264_encoder_encode() encodes one YUV frame into an H.264 bitstream. It calls the following functions:
x264_frame_pop_unused(): obtains an x264_frame_t, fenc. If the frames.unused[] queue is non-empty, it calls x264_frame_pop() to reuse an existing frame; otherwise it calls x264_frame_new() to allocate a new one.
x264_frame_copy_picture(): copies the input picture data into fenc.
x264_lookahead_put_frame(): puts fenc into the lookahead.next.list[] queue to await the frame-type decision.
x264_lookahead_get_frames(): determines frame types via the lookahead. It calls x264_slicetype_decide(), x264_slicetype_analyse(), x264_slicetype_frame_cost(), and others; once the analysis has fixed the frame type, the frame is moved to the frames.current[] queue.
x264_frame_shift(): takes one frame from frames.current[] for encoding.
x264_reference_update(): updates the reference-frame list.
x264_reference_reset(): called for an IDR frame to empty the reference-frame list.
x264_reference_hierarchy_reset(): called for a non-IDR I frame, a P frame, or a B frame that can serve as a reference.
x264_reference_build_list(): builds the reference lists list0 and list1.
x264_ratecontrol_start(): starts rate control for the frame.
x264_slice_init(): builds the slice header.
x264_slices_write(): encodes the data (the key step). It calls x264_slice_write(), which does the actual encoding (note that "x264_slices_write()" and "x264_slice_write()" differ by only an "s").
x264_encoder_frame_end(): post-processing after encoding, such as recording statistics. It calls x264_frame_push_unused() to return fenc to the frames.unused[] queue, and x264_ratecontrol_end() to finish rate control for the frame.

x264_encoder_close() closes the encoder and outputs some statistics. It calls the following functions:
x264_lookahead_delete(): frees the lookahead state.
x264_ratecontrol_summary(): summarizes the rate-control statistics.
x264_ratecontrol_delete(): shuts down rate control.


This article covers the source code of x264_encoder_open(), x264_encoder_headers(), and x264_encoder_close(); the next article covers x264_encoder_encode().



x264_encoder_open()

x264_encoder_open() is a libx264 API function. It opens the encoder and initializes the various variables that libx264 needs for encoding. Its declaration is shown below.

/* x264_encoder_open:
 *      create a new encoder handler, all parameters from x264_param_t are copied */
x264_t *x264_encoder_open( x264_param_t * );
x264_encoder_open() is defined in encoder\encoder.c, as shown below.
/****************************************************************************
 * x264_encoder_open:
 * Annotations by Lei Xiaohua (雷霄骅)
 * http://blog.csdn.net/leixiaohua1020
 * leixiaohua1020@126.com
 ****************************************************************************/
// Open the encoder
x264_t *x264_encoder_open( x264_param_t *param )
{
    x264_t *h;
    char buf[1000], *p;
    int qp, i_slicetype_length;

    CHECKED_MALLOCZERO( h, sizeof(x264_t) );

    /* Create a copy of param */
    // Copy the parameters in
    memcpy( &h->param, param, sizeof(x264_param_t) );
    if( param->param_free )
        param->param_free( param );

    if( x264_threading_init() )
    {
        x264_log( h, X264_LOG_ERROR, "unable to initialize threading\n" );
        goto fail;
    }
    // Validate input parameters
    if( x264_validate_parameters( h, 1 ) < 0 )
        goto fail;

    if( h->param.psz_cqm_file )
        if( x264_cqm_parse_file( h, h->param.psz_cqm_file ) < 0 )
            goto fail;

    if( h->param.rc.psz_stat_out )
        h->param.rc.psz_stat_out = strdup( h->param.rc.psz_stat_out );
    if( h->param.rc.psz_stat_in )
        h->param.rc.psz_stat_in = strdup( h->param.rc.psz_stat_in );

    x264_reduce_fraction( &h->param.i_fps_num, &h->param.i_fps_den );
    x264_reduce_fraction( &h->param.i_timebase_num, &h->param.i_timebase_den );

    /* Init x264_t */
    h->i_frame = -1;
    h->i_frame_num = 0;

    if( h->param.i_avcintra_class )
        h->i_idr_pic_id = 5;
    else
        h->i_idr_pic_id = 0;

    if( (uint64_t)h->param.i_timebase_den * 2 > UINT32_MAX )
    {
        x264_log( h, X264_LOG_ERROR, "Effective timebase denominator %u exceeds H.264 maximum\n", h->param.i_timebase_den );
        goto fail;
    }

    x264_set_aspect_ratio( h, &h->param, 1 );
    // Initialize the SPS and PPS
    x264_sps_init( h->sps, h->param.i_sps_id, &h->param );
    x264_pps_init( h->pps, h->param.i_sps_id, &h->param, h->sps );
    // Validate the Level (macroblock count, etc.)
    x264_validate_levels( h, 1 );

    h->chroma_qp_table = i_chroma_qp_table + 12 + h->pps->i_chroma_qp_index_offset;

    if( x264_cqm_init( h ) < 0 )
        goto fail;
    // Assorted assignments
    h->mb.i_mb_width = h->sps->i_mb_width;
    h->mb.i_mb_height = h->sps->i_mb_height;
    h->mb.i_mb_count = h->mb.i_mb_width * h->mb.i_mb_height;
    h->mb.chroma_h_shift = CHROMA_FORMAT == CHROMA_420 || CHROMA_FORMAT == CHROMA_422;
    h->mb.chroma_v_shift = CHROMA_FORMAT == CHROMA_420;

    /* Adaptive MBAFF and subme 0 are not supported as we require halving motion
     * vectors during prediction, resulting in hpel mvs.
     * The chosen solution is to make MBAFF non-adaptive in this case. */
    h->mb.b_adaptive_mbaff = PARAM_INTERLACED && h->param.analyse.i_subpel_refine;

    /* Init frames. */
    if( h->param.i_bframe_adaptive == X264_B_ADAPT_TRELLIS && !h->param.rc.b_stat_read )
        h->frames.i_delay = X264_MAX(h->param.i_bframe,3)*4;
    else
        h->frames.i_delay = h->param.i_bframe;
    if( h->param.rc.b_mb_tree || h->param.rc.i_vbv_buffer_size )
        h->frames.i_delay = X264_MAX( h->frames.i_delay, h->param.rc.i_lookahead );
    i_slicetype_length = h->frames.i_delay;
    h->frames.i_delay += h->i_thread_frames - 1;
    h->frames.i_delay += h->param.i_sync_lookahead;
    h->frames.i_delay += h->param.b_vfr_input;
    h->frames.i_bframe_delay = h->param.i_bframe ? (h->param.i_bframe_pyramid ? 2 : 1) : 0;

    h->frames.i_max_ref0 = h->param.i_frame_reference;
    h->frames.i_max_ref1 = X264_MIN( h->sps->vui.i_num_reorder_frames, h->param.i_frame_reference );
    h->frames.i_max_dpb  = h->sps->vui.i_max_dec_frame_buffering;
    h->frames.b_have_lowres = !h->param.rc.b_stat_read
        && ( h->param.rc.i_rc_method == X264_RC_ABR
          || h->param.rc.i_rc_method == X264_RC_CRF
          || h->param.i_bframe_adaptive
          || h->param.i_scenecut_threshold
          || h->param.rc.b_mb_tree
          || h->param.analyse.i_weighted_pred );
    h->frames.b_have_lowres |= h->param.rc.b_stat_read && h->param.rc.i_vbv_buffer_size > 0;
    h->frames.b_have_sub8x8_esa = !!(h->param.analyse.inter & X264_ANALYSE_PSUB8x8);

    h->frames.i_last_idr =
    h->frames.i_last_keyframe = - h->param.i_keyint_max;
    h->frames.i_input    = 0;
    h->frames.i_largest_pts = h->frames.i_second_largest_pts = -1;
    h->frames.i_poc_last_open_gop = -1;

    // CHECKED_MALLOCZERO(var, size): malloc() the memory, then memset() it to zero
    CHECKED_MALLOCZERO( h->frames.unused[0], (h->frames.i_delay + 3) * sizeof(x264_frame_t *) );
    /* Allocate room for max refs plus a few extra just in case. */
    CHECKED_MALLOCZERO( h->frames.unused[1], (h->i_thread_frames + X264_REF_MAX + 4) * sizeof(x264_frame_t *) );
    CHECKED_MALLOCZERO( h->frames.current, (h->param.i_sync_lookahead + h->param.i_bframe
                        + h->i_thread_frames + 3) * sizeof(x264_frame_t *) );
    if( h->param.analyse.i_weighted_pred > 0 )
        CHECKED_MALLOCZERO( h->frames.blank_unused, h->i_thread_frames * 4 * sizeof(x264_frame_t *) );
    h->i_ref[0] = h->i_ref[1] = 0;
    h->i_cpb_delay = h->i_coded_fields = h->i_disp_fields = 0;
    h->i_prev_duration = ((uint64_t)h->param.i_fps_den * h->sps->vui.i_time_scale) / ((uint64_t)h->param.i_fps_num * h->sps->vui.i_num_units_in_tick);
    h->i_disp_fields_last_frame = -1;
    // RDO initialization
    x264_rdo_init();

    /* init CPU functions */
    // Initialize the assembly-optimized functions
    // Intra prediction
    x264_predict_16x16_init( h->param.cpu, h->predict_16x16 );
    x264_predict_8x8c_init( h->param.cpu, h->predict_8x8c );
    x264_predict_8x16c_init( h->param.cpu, h->predict_8x16c );
    x264_predict_8x8_init( h->param.cpu, h->predict_8x8, &h->predict_8x8_filter );
    x264_predict_4x4_init( h->param.cpu, h->predict_4x4 );
    // Pixel-metric functions (SAD, SATD, SSD, ...)
    x264_pixel_init( h->param.cpu, &h->pixf );
    // DCT
    x264_dct_init( h->param.cpu, &h->dctf );
    // Zigzag scan
    x264_zigzag_init( h->param.cpu, &h->zigzagf_progressive, &h->zigzagf_interlaced );
    memcpy( &h->zigzagf, PARAM_INTERLACED ? &h->zigzagf_interlaced : &h->zigzagf_progressive, sizeof(h->zigzagf) );
    // Motion compensation
    x264_mc_init( h->param.cpu, &h->mc, h->param.b_cpu_independent );
    // Quantization
    x264_quant_init( h, h->param.cpu, &h->quantf );
    // Deblocking filter
    x264_deblock_init( h->param.cpu, &h->loopf, PARAM_INTERLACED );
    x264_bitstream_init( h->param.cpu, &h->bsf );
    // Initialize CABAC or CAVLC
    if( h->param.b_cabac )
        x264_cabac_init( h );
    else
        x264_stack_align( x264_cavlc_init, h );
    // Decides whether pixel comparison uses SAD or SATD
    mbcmp_init( h );
    chroma_dsp_init( h );
    // CPU capabilities
    p = buf + sprintf( buf, "using cpu capabilities:" );
    for( int i = 0; x264_cpu_names[i].flags; i++ )
    {
        if( !strcmp(x264_cpu_names[i].name, "SSE")
            && h->param.cpu & (X264_CPU_SSE2) )
            continue;
        if( !strcmp(x264_cpu_names[i].name, "SSE2")
            && h->param.cpu & (X264_CPU_SSE2_IS_FAST|X264_CPU_SSE2_IS_SLOW) )
            continue;
        if( !strcmp(x264_cpu_names[i].name, "SSE3")
            && (h->param.cpu & X264_CPU_SSSE3 || !(h->param.cpu & X264_CPU_CACHELINE_64)) )
            continue;
        if( !strcmp(x264_cpu_names[i].name, "SSE4.1")
            && (h->param.cpu & X264_CPU_SSE42) )
            continue;
        if( !strcmp(x264_cpu_names[i].name, "BMI1")
            && (h->param.cpu & X264_CPU_BMI2) )
            continue;
        if( (h->param.cpu & x264_cpu_names[i].flags) == x264_cpu_names[i].flags
            && (!i || x264_cpu_names[i].flags != x264_cpu_names[i-1].flags) )
            p += sprintf( p, " %s", x264_cpu_names[i].name );
    }
    if( !h->param.cpu )
        p += sprintf( p, " none!" );
    x264_log( h, X264_LOG_INFO, "%s\n", buf );

    float *logs = x264_analyse_prepare_costs( h );
    if( !logs )
        goto fail;
    for( qp = X264_MIN( h->param.rc.i_qp_min, QP_MAX_SPEC ); qp <= h->param.rc.i_qp_max; qp++ )
        if( x264_analyse_init_costs( h, logs, qp ) )
            goto fail;
    if( x264_analyse_init_costs( h, logs, X264_LOOKAHEAD_QP ) )
        goto fail;
    x264_free( logs );

    static const uint16_t cost_mv_correct[7] = { 24, 47, 95, 189, 379, 757, 1515 };
    /* Checks for known miscompilation issues. */
    if( h->cost_mv[X264_LOOKAHEAD_QP][2013] != cost_mv_correct[BIT_DEPTH-8] )
    {
        x264_log( h, X264_LOG_ERROR, "MV cost test failed: x264 has been miscompiled!\n" );
        goto fail;
    }

    /* Must be volatile or else GCC will optimize it out. */
    volatile int temp = 392;
    if( x264_clz( temp ) != 23 )
    {
        x264_log( h, X264_LOG_ERROR, "CLZ test failed: x264 has been miscompiled!\n" );
#if ARCH_X86 || ARCH_X86_64
        x264_log( h, X264_LOG_ERROR, "Are you attempting to run an SSE4a/LZCNT-targeted build on a CPU that\n" );
        x264_log( h, X264_LOG_ERROR, "doesn't support it?\n" );
#endif
        goto fail;
    }

    h->out.i_nal = 0;
    h->out.i_bitstream = X264_MAX( 1000000, h->param.i_width * h->param.i_height * 4
        * ( h->param.rc.i_rc_method == X264_RC_ABR ? pow( 0.95, h->param.rc.i_qp_min )
          : pow( 0.95, h->param.rc.i_qp_constant ) * X264_MAX( 1, h->param.rc.f_ip_factor )));

    h->nal_buffer_size = h->out.i_bitstream * 3/2 + 4 + 64; /* +4 for startcode, +64 for nal_escape assembly padding */
    CHECKED_MALLOC( h->nal_buffer, h->nal_buffer_size );

    CHECKED_MALLOC( h->reconfig_h, sizeof(x264_t) );

    if( h->param.i_threads > 1 &&
        x264_threadpool_init( &h->threadpool, h->param.i_threads, (void*)x264_encoder_thread_init, h ) )
        goto fail;
    if( h->param.i_lookahead_threads > 1 &&
        x264_threadpool_init( &h->lookaheadpool, h->param.i_lookahead_threads, NULL, NULL ) )
        goto fail;

#if HAVE_OPENCL
    if( h->param.b_opencl )
    {
        h->opencl.ocl = x264_opencl_load_library();
        if( !h->opencl.ocl )
        {
            x264_log( h, X264_LOG_WARNING, "failed to load OpenCL\n" );
            h->param.b_opencl = 0;
        }
    }
#endif

    h->thread[0] = h;
    for( int i = 1; i < h->param.i_threads + !!h->param.i_sync_lookahead; i++ )
        CHECKED_MALLOC( h->thread[i], sizeof(x264_t) );
    if( h->param.i_lookahead_threads > 1 )
        for( int i = 0; i < h->param.i_lookahead_threads; i++ )
        {
            CHECKED_MALLOC( h->lookahead_thread[i], sizeof(x264_t) );
            *h->lookahead_thread[i] = *h;
        }
    *h->reconfig_h = *h;

    for( int i = 0; i < h->param.i_threads; i++ )
    {
        int init_nal_count = h->param.i_slice_count + 3;
        int allocate_threadlocal_data = !h->param.b_sliced_threads || !i;
        if( i > 0 )
            *h->thread[i] = *h;

        if( x264_pthread_mutex_init( &h->thread[i]->mutex, NULL ) )
            goto fail;
        if( x264_pthread_cond_init( &h->thread[i]->cv, NULL ) )
            goto fail;

        if( allocate_threadlocal_data )
        {
            h->thread[i]->fdec = x264_frame_pop_unused( h, 1 );
            if( !h->thread[i]->fdec )
                goto fail;
        }
        else
            h->thread[i]->fdec = h->thread[0]->fdec;

        CHECKED_MALLOC( h->thread[i]->out.p_bitstream, h->out.i_bitstream );
        /* Start each thread with room for init_nal_count NAL units; it'll realloc later if needed. */
        CHECKED_MALLOC( h->thread[i]->out.nal, init_nal_count*sizeof(x264_nal_t) );
        h->thread[i]->out.i_nals_allocated = init_nal_count;

        if( allocate_threadlocal_data && x264_macroblock_cache_allocate( h->thread[i] ) < 0 )
            goto fail;
    }

#if HAVE_OPENCL
    if( h->param.b_opencl && x264_opencl_lookahead_init( h ) < 0 )
        h->param.b_opencl = 0;
#endif
    // Initialize the lookahead
    if( x264_lookahead_init( h, i_slicetype_length ) )
        goto fail;

    for( int i = 0; i < h->param.i_threads; i++ )
        if( x264_macroblock_thread_allocate( h->thread[i], 0 ) < 0 )
            goto fail;
    // Create the rate control
    if( x264_ratecontrol_new( h ) < 0 )
        goto fail;

    if( h->param.i_nal_hrd )
    {
        x264_log( h, X264_LOG_DEBUG, "HRD bitrate: %i bits/sec\n", h->sps->vui.hrd.i_bit_rate_unscaled );
        x264_log( h, X264_LOG_DEBUG, "CPB size: %i bits\n", h->sps->vui.hrd.i_cpb_size_unscaled );
    }

    if( h->param.psz_dump_yuv )
    {
        /* create or truncate the reconstructed video file */
        FILE *f = x264_fopen( h->param.psz_dump_yuv, "w" );
        if( !f )
        {
            x264_log( h, X264_LOG_ERROR, "dump_yuv: can't write to %s\n", h->param.psz_dump_yuv );
            goto fail;
        }
        else if( !x264_is_regular_file( f ) )
        {
            x264_log( h, X264_LOG_ERROR, "dump_yuv: incompatible with non-regular file %s\n", h->param.psz_dump_yuv );
            goto fail;
        }
        fclose( f );
    }

    // Pick the human-readable profile name (one long chained conditional)
    const char *profile = h->sps->i_profile_idc == PROFILE_BASELINE ? "Constrained Baseline" :
                          h->sps->i_profile_idc == PROFILE_MAIN ? "Main" :
                          h->sps->i_profile_idc == PROFILE_HIGH ? "High" :
                          h->sps->i_profile_idc == PROFILE_HIGH10 ? (h->sps->b_constraint_set3 == 1 ? "High 10 Intra" : "High 10") :
                          h->sps->i_profile_idc == PROFILE_HIGH422 ? (h->sps->b_constraint_set3 == 1 ? "High 4:2:2 Intra" : "High 4:2:2") :
                          h->sps->b_constraint_set3 == 1 ? "High 4:4:4 Intra" : "High 4:4:4 Predictive";
    char level[4];
    snprintf( level, sizeof(level), "%d.%d", h->sps->i_level_idc/10, h->sps->i_level_idc%10 );
    if( h->sps->i_level_idc == 9 || ( h->sps->i_level_idc == 11 && h->sps->b_constraint_set3 &&
        (h->sps->i_profile_idc == PROFILE_BASELINE || h->sps->i_profile_idc == PROFILE_MAIN) ) )
        strcpy( level, "1b" );
    // Log the profile and level
    if( h->sps->i_profile_idc < PROFILE_HIGH10 )
    {
        x264_log( h, X264_LOG_INFO, "profile %s, level %s\n",
            profile, level );
    }
    else
    {
        static const char * const subsampling[4] = { "4:0:0", "4:2:0", "4:2:2", "4:4:4" };
        x264_log( h, X264_LOG_INFO, "profile %s, level %s, %s %d-bit\n",
            profile, level, subsampling[CHROMA_FORMAT], BIT_DEPTH );
    }

    return h;
fail:
    // Free on failure
    x264_free( h );
    return NULL;
}

Since the source code above is already commented in some detail, I will not go over it again line by line. Instead, following the order of the calls, the next sections look at these functions invoked by x264_encoder_open():
x264_sps_init(): generates the H.264 SPS from the input parameters.
x264_pps_init(): generates the H.264 PPS from the input parameters.
x264_predict_16x16_init(): initializes the assembly-optimized Intra16x16 intra-prediction functions.
x264_predict_4x4_init(): initializes the assembly-optimized Intra4x4 intra-prediction functions.
x264_pixel_init(): initializes the assembly-optimized pixel-metric functions (SAD, SATD, SSD, etc.).
x264_dct_init(): initializes the assembly-optimized DCT and inverse-DCT functions.
x264_mc_init(): initializes the assembly-optimized motion-compensation functions.
x264_quant_init(): initializes the assembly-optimized quantization and dequantization functions.
x264_deblock_init(): initializes the assembly-optimized deblocking-filter functions.
mbcmp_init(): decides whether pixel comparison uses SAD or SATD.

x264_sps_init()

x264_sps_init() generates the H.264 SPS (Sequence Parameter Set) from the input parameters. It is defined in encoder\set.c, as shown below.
// Initialize the SPS
void x264_sps_init( x264_sps_t *sps, int i_id, x264_param_t *param )
{
    int csp = param->i_csp & X264_CSP_MASK;

    sps->i_id = i_id;
    // Width in macroblocks
    sps->i_mb_width = ( param->i_width + 15 ) / 16;
    // Height in macroblocks
    sps->i_mb_height= ( param->i_height + 15 ) / 16;
    // Chroma sampling format
    sps->i_chroma_format_idc = csp >= X264_CSP_I444 ? CHROMA_444 :
                               csp >= X264_CSP_I422 ? CHROMA_422 : CHROMA_420;

    sps->b_qpprime_y_zero_transform_bypass = param->rc.i_rc_method == X264_RC_CQP && param->rc.i_qp_constant == 0;
    // Profile
    if( sps->b_qpprime_y_zero_transform_bypass || sps->i_chroma_format_idc == CHROMA_444 )
        sps->i_profile_idc  = PROFILE_HIGH444_PREDICTIVE; // for YUV444
    else if( sps->i_chroma_format_idc == CHROMA_422 )
        sps->i_profile_idc  = PROFILE_HIGH422;
    else if( BIT_DEPTH > 8 )
        sps->i_profile_idc  = PROFILE_HIGH10;
    else if( param->analyse.b_transform_8x8 || param->i_cqm_preset != X264_CQM_FLAT )
        sps->i_profile_idc  = PROFILE_HIGH; // High Profile, the most common today
    else if( param->b_cabac || param->i_bframe > 0 || param->b_interlaced || param->b_fake_interlaced || param->analyse.i_weighted_pred > 0 )
        sps->i_profile_idc  = PROFILE_MAIN; // Main Profile
    else
        sps->i_profile_idc  = PROFILE_BASELINE; // Baseline Profile

    sps->b_constraint_set0  = sps->i_profile_idc == PROFILE_BASELINE;
    /* x264 doesn't support the features that are in Baseline and not in Main,
     * namely arbitrary_slice_order and slice_groups. */
    sps->b_constraint_set1  = sps->i_profile_idc <= PROFILE_MAIN;
    /* Never set constraint_set2, it is not necessary and not used in real world. */
    sps->b_constraint_set2  = 0;
    sps->b_constraint_set3  = 0;
    // Level
    sps->i_level_idc = param->i_level_idc;
    if( param->i_level_idc == 9 && ( sps->i_profile_idc == PROFILE_BASELINE || sps->i_profile_idc == PROFILE_MAIN ) )
    {
        sps->b_constraint_set3 = 1; /* level 1b with Baseline or Main profile is signalled via constraint_set3 */
        sps->i_level_idc      = 11;
    }
    /* Intra profiles */
    if( param->i_keyint_max == 1 && sps->i_profile_idc > PROFILE_HIGH )
        sps->b_constraint_set3 = 1;

    sps->vui.i_num_reorder_frames = param->i_bframe_pyramid ? 2 : param->i_bframe ? 1 : 0;
    /* extra slot with pyramid so that we don't have to override the
     * order of forgetting old pictures */
    // Number of reference frames
    sps->vui.i_max_dec_frame_buffering =
    sps->i_num_ref_frames = X264_MIN(X264_REF_MAX, X264_MAX4(param->i_frame_reference, 1 + sps->vui.i_num_reorder_frames,
                            param->i_bframe_pyramid ? 4 : 1, param->i_dpb_size));
    sps->i_num_ref_frames -= param->i_bframe_pyramid == X264_B_PYRAMID_STRICT;
    if( param->i_keyint_max == 1 )
    {
        sps->i_num_ref_frames = 0;
        sps->vui.i_max_dec_frame_buffering = 0;
    }

    /* number of refs + current frame */
    int max_frame_num = sps->vui.i_max_dec_frame_buffering * (!!param->i_bframe_pyramid+1) + 1;
    /* Intra refresh cannot write a recovery time greater than max frame num-1 */
    if( param->b_intra_refresh )
    {
        int time_to_recovery = X264_MIN( sps->i_mb_width - 1, param->i_keyint_max ) + param->i_bframe - 1;
        max_frame_num = X264_MAX( max_frame_num, time_to_recovery+1 );
    }

    sps->i_log2_max_frame_num = 4;
    while( (1 << sps->i_log2_max_frame_num) <= max_frame_num )
        sps->i_log2_max_frame_num++;
    // POC type
    sps->i_poc_type = param->i_bframe || param->b_interlaced ? 0 : 2;
    if( sps->i_poc_type == 0 )
    {
        int max_delta_poc = (param->i_bframe + 2) * (!!param->i_bframe_pyramid + 1) * 2;
        sps->i_log2_max_poc_lsb = 4;
        while( (1 << sps->i_log2_max_poc_lsb) <= max_delta_poc * 2 )
            sps->i_log2_max_poc_lsb++;
    }

    sps->b_vui = 1;

    sps->b_gaps_in_frame_num_value_allowed = 0;
    sps->b_frame_mbs_only = !(param->b_interlaced || param->b_fake_interlaced);
    if( !sps->b_frame_mbs_only )
        sps->i_mb_height = ( sps->i_mb_height + 1 ) & ~1;
    sps->b_mb_adaptive_frame_field = param->b_interlaced;
    sps->b_direct8x8_inference = 1;

    sps->crop.i_left   = param->crop_rect.i_left;
    sps->crop.i_top    = param->crop_rect.i_top;
    sps->crop.i_right  = param->crop_rect.i_right + sps->i_mb_width*16 - param->i_width;
    sps->crop.i_bottom = (param->crop_rect.i_bottom + sps->i_mb_height*16 - param->i_height) >> !sps->b_frame_mbs_only;
    sps->b_crop = sps->crop.i_left || sps->crop.i_top ||
                  sps->crop.i_right || sps->crop.i_bottom;

    sps->vui.b_aspect_ratio_info_present = 0;
    if( param->vui.i_sar_width > 0 && param->vui.i_sar_height > 0 )
    {
        sps->vui.b_aspect_ratio_info_present = 1;
        sps->vui.i_sar_width = param->vui.i_sar_width;
        sps->vui.i_sar_height= param->vui.i_sar_height;
    }

    sps->vui.b_overscan_info_present = param->vui.i_overscan > 0 && param->vui.i_overscan <= 2;
    if( sps->vui.b_overscan_info_present )
        sps->vui.b_overscan_info = ( param->vui.i_overscan == 2 ? 1 : 0 );

    sps->vui.b_signal_type_present = 0;
    sps->vui.i_vidformat = ( param->vui.i_vidformat >= 0 && param->vui.i_vidformat <= 5 ? param->vui.i_vidformat : 5 );
    sps->vui.b_fullrange = ( param->vui.b_fullrange >= 0 && param->vui.b_fullrange <= 1 ? param->vui.b_fullrange :
                           ( csp >= X264_CSP_BGR ? 1 : 0 ) );
    sps->vui.b_color_description_present = 0;
    sps->vui.i_colorprim = ( param->vui.i_colorprim >= 0 && param->vui.i_colorprim <=  9 ? param->vui.i_colorprim : 2 );
    sps->vui.i_transfer  = ( param->vui.i_transfer  >= 0 && param->vui.i_transfer  <= 15 ? param->vui.i_transfer  : 2 );
    sps->vui.i_colmatrix = ( param->vui.i_colmatrix >= 0 && param->vui.i_colmatrix <= 10 ? param->vui.i_colmatrix :
                           ( csp >= X264_CSP_BGR ? 0 : 2 ) );
    if( sps->vui.i_colorprim != 2 ||
        sps->vui.i_transfer  != 2 ||
        sps->vui.i_colmatrix != 2 )
    {
        sps->vui.b_color_description_present = 1;
    }

    if( sps->vui.i_vidformat != 5 ||
        sps->vui.b_fullrange ||
        sps->vui.b_color_description_present )
    {
        sps->vui.b_signal_type_present = 1;
    }

    /* FIXME: not sufficient for interlaced video */
    sps->vui.b_chroma_loc_info_present = param->vui.i_chroma_loc > 0 && param->vui.i_chroma_loc <= 5 &&
                                         sps->i_chroma_format_idc == CHROMA_420;
    if( sps->vui.b_chroma_loc_info_present )
    {
        sps->vui.i_chroma_loc_top = param->vui.i_chroma_loc;
        sps->vui.i_chroma_loc_bottom = param->vui.i_chroma_loc;
    }

    sps->vui.b_timing_info_present = param->i_timebase_num > 0 && param->i_timebase_den > 0;
    if( sps->vui.b_timing_info_present )
    {
        sps->vui.i_num_units_in_tick = param->i_timebase_num;
        sps->vui.i_time_scale = param->i_timebase_den * 2;
        sps->vui.b_fixed_frame_rate = !param->b_vfr_input;
    }

    sps->vui.b_vcl_hrd_parameters_present = 0; // we don't support VCL HRD
    sps->vui.b_nal_hrd_parameters_present = !!param->i_nal_hrd;
    sps->vui.b_pic_struct_present = param->b_pic_struct;

    // NOTE: HRD related parts of the SPS are initialised in x264_ratecontrol_init_reconfigurable

    sps->vui.b_bitstream_restriction = param->i_keyint_max > 1;
    if( sps->vui.b_bitstream_restriction )
    {
        sps->vui.b_motion_vectors_over_pic_boundaries = 1;
        sps->vui.i_max_bytes_per_pic_denom = 0;
        sps->vui.i_max_bits_per_mb_denom = 0;
        sps->vui.i_log2_max_mv_length_horizontal =
        sps->vui.i_log2_max_mv_length_vertical = (int)log2f( X264_MAX( 1, param->analyse.i_mv_range*4-1 ) ) + 1;
    }
}

As the source shows, x264_sps_init() initializes the members of the SPS structure from the information in the input parameter set x264_param_t. For the exact meaning of these syntax elements, refer to the H.264 standard.

x264_pps_init()

x264_pps_init() generates the H.264 PPS (Picture Parameter Set) from the input parameters. It is defined in encoder\set.c, as shown below.
// Initialize the PPS
void x264_pps_init( x264_pps_t *pps, int i_id, x264_param_t *param, x264_sps_t *sps )
{
    pps->i_id = i_id;
    // The SPS this PPS belongs to
    pps->i_sps_id = sps->i_id;
    // Use CABAC?
    pps->b_cabac = param->b_cabac;

    pps->b_pic_order = !param->i_avcintra_class && param->b_interlaced;
    pps->i_num_slice_groups = 1;

    // Current length of the reference lists.
    // Note that this is the number of references actually present in the
    // list, as the word "active" in the field name suggests.
    pps->i_num_ref_idx_l0_default_active = param->i_frame_reference;
    pps->i_num_ref_idx_l1_default_active = 1;
    // Weighted prediction
    pps->b_weighted_pred = param->analyse.i_weighted_pred > 0;
    pps->b_weighted_bipred = param->analyse.b_weighted_bipred ? 2 : 0;
    // Initial value of the quantization parameter QP
    pps->i_pic_init_qp = param->rc.i_rc_method == X264_RC_ABR || param->b_stitchable ? 26 + QP_BD_OFFSET : SPEC_QP( param->rc.i_qp_constant );
    pps->i_pic_init_qs = 26 + QP_BD_OFFSET;

    pps->i_chroma_qp_index_offset = param->analyse.i_chroma_qp_offset;
    pps->b_deblocking_filter_control = 1;
    pps->b_constrained_intra_pred = param->b_constrained_intra;
    pps->b_redundant_pic_cnt = 0;

    pps->b_transform_8x8_mode = param->analyse.b_transform_8x8 ? 1 : 0;

    pps->i_cqm_preset = param->i_cqm_preset;

    switch( pps->i_cqm_preset )
    {
    case X264_CQM_FLAT:
        for( int i = 0; i < 8; i++ )
            pps->scaling_list[i] = x264_cqm_flat16;
        break;
    case X264_CQM_JVT:
        for( int i = 0; i < 8; i++ )
            pps->scaling_list[i] = x264_cqm_jvt[i];
        break;
    case X264_CQM_CUSTOM:
        /* match the transposed DCT & zigzag */
        transpose( param->cqm_4iy, 4 );
        transpose( param->cqm_4py, 4 );
        transpose( param->cqm_4ic, 4 );
        transpose( param->cqm_4pc, 4 );
        transpose( param->cqm_8iy, 8 );
        transpose( param->cqm_8py, 8 );
        transpose( param->cqm_8ic, 8 );
        transpose( param->cqm_8pc, 8 );
        pps->scaling_list[CQM_4IY] = param->cqm_4iy;
        pps->scaling_list[CQM_4PY] = param->cqm_4py;
        pps->scaling_list[CQM_4IC] = param->cqm_4ic;
        pps->scaling_list[CQM_4PC] = param->cqm_4pc;
        pps->scaling_list[CQM_8IY+4] = param->cqm_8iy;
        pps->scaling_list[CQM_8PY+4] = param->cqm_8py;
        pps->scaling_list[CQM_8IC+4] = param->cqm_8ic;
        pps->scaling_list[CQM_8PC+4] = param->cqm_8pc;
        for( int i = 0; i < 8; i++ )
            for( int j = 0; j < (i < 4 ? 16 : 64); j++ )
                if( pps->scaling_list[i][j] == 0 )
                    pps->scaling_list[i] = x264_cqm_jvt[i];
        break;
    }
}

As the source shows, x264_pps_init() initializes the members of the PPS structure from the information in the input parameter set x264_param_t. For the exact meaning of these syntax elements, refer to the H.264 standard.

x264_predict_16x16_init()

x264_predict_16x16_init() initializes the assembly-optimized Intra16x16 intra-prediction functions. It is defined in common\predict.c, as shown below.
// Initialize the Intra16x16 intra-prediction functions
void x264_predict_16x16_init( int cpu, x264_predict_t pf[7] )
{
    // C versions
    //================================================
    // Vertical
    pf[I_PRED_16x16_V ] = x264_predict_16x16_v_c;
    // Horizontal
    pf[I_PRED_16x16_H ] = x264_predict_16x16_h_c;
    // DC
    pf[I_PRED_16x16_DC] = x264_predict_16x16_dc_c;
    // Plane
    pf[I_PRED_16x16_P ] = x264_predict_16x16_p_c;
    // DC variants used when some neighbors are unavailable
    // (left-only, top-only, or the constant 128)
    pf[I_PRED_16x16_DC_LEFT]= x264_predict_16x16_dc_left_c;
    pf[I_PRED_16x16_DC_TOP ]= x264_predict_16x16_dc_top_c;
    pf[I_PRED_16x16_DC_128 ]= x264_predict_16x16_dc_128_c;
    //================================================
    // MMX versions
#if HAVE_MMX
    x264_predict_16x16_init_mmx( cpu, pf );
#endif
    // AltiVec versions
#if HAVE_ALTIVEC
    if( cpu&X264_CPU_ALTIVEC )
        x264_predict_16x16_init_altivec( pf );
#endif
    // ARMv6 versions
#if HAVE_ARMV6
    x264_predict_16x16_init_arm( cpu, pf );
#endif
    // AArch64 versions
#if ARCH_AARCH64
    x264_predict_16x16_init_aarch64( cpu, pf );
#endif
}

As the source shows, x264_predict_16x16_init() first fills the intra-prediction function-pointer array x264_predict_t[] with the C implementations x264_predict_16x16_v_c(), x264_predict_16x16_h_c(), x264_predict_16x16_dc_c(), and x264_predict_16x16_p_c(); it then checks the platform's capabilities and, where supported, calls x264_predict_16x16_init_mmx(), x264_predict_16x16_init_arm(), and so on to overwrite those entries with assembly-optimized versions. A few of these functions are examined below.

A Brief Note on the Background

A quick recap of intra prediction: it extrapolates the pixels inside a macroblock from the boundary pixels to its left and above. The figure below illustrates the effect: the left image is the original picture; the right image is the intra-predicted picture without the residual added back.


H.264 has two intra-prediction block sizes for luma: 16x16 and 4x4. There are four 16x16 intra-prediction modes, shown in the figure below.
 
The four modes are listed below.

Mode         Description
Vertical     each pixel is predicted from the pixel directly above it
Horizontal   each pixel is predicted from the pixel directly to its left
DC           all pixels are predicted from the average of the pixels above and to the left
Plane        pixels are predicted from the pixels above and to the left via a linear plane fit


There are nine 4x4 intra-prediction modes, shown in the figure below.
 
The code for the Intra4x4 prediction modes will be covered in a later article. As an example, let us look at x264_predict_16x16_v_c(), which implements the Vertical mode of Intra16x16 prediction.

x264_predict_16x16_v_c()

x264_predict_16x16_v_c() implements the Vertical mode of Intra16x16 prediction. It is defined in common\predict.c, as shown below.
// 16x16 intra prediction
// Vertical mode
void x264_predict_16x16_v_c( pixel *src )
{
    /*
     * Vertical prediction:
     *   |X1 X2 X3 X4
     * --+-----------
     *   |X1 X2 X3 X4
     *   |X1 X2 X3 X4
     *   |X1 X2 X3 X4
     *   |X1 X2 X3 X4
     */
    /*
     * [Expanding the macro]
     * uint32_t v0 = ((x264_union32_t*)(&src[ 0-FDEC_STRIDE]))->i;
     * uint32_t v1 = ((x264_union32_t*)(&src[ 4-FDEC_STRIDE]))->i;
     * uint32_t v2 = ((x264_union32_t*)(&src[ 8-FDEC_STRIDE]))->i;
     * uint32_t v3 = ((x264_union32_t*)(&src[12-FDEC_STRIDE]))->i;
     * which here is equivalent to:
     * uint32_t v0 = *((uint32_t*)(&src[ 0-FDEC_STRIDE]));
     * uint32_t v1 = *((uint32_t*)(&src[ 4-FDEC_STRIDE]));
     * uint32_t v2 = *((uint32_t*)(&src[ 8-FDEC_STRIDE]));
     * uint32_t v3 = *((uint32_t*)(&src[12-FDEC_STRIDE]));
     * i.e. four loads of 4 pixels each (16 pixels in total) into v0..v3,
     * taken from the row of pixels directly above the 16x16 block:
     *     0    4    8    12   16
     *     | v0 | v1 | v2 | v3 |
     * ----++====+====+====+====+
     *     ||
     *     ||   (the 16x16 block)
     *     ||
     */
    // pixel4 is a uint32_t (32 bits) holding 4 pixels (8 bits each)
    pixel4 v0 = MPIXEL_X4( &src[ 0-FDEC_STRIDE] );
    pixel4 v1 = MPIXEL_X4( &src[ 4-FDEC_STRIDE] );
    pixel4 v2 = MPIXEL_X4( &src[ 8-FDEC_STRIDE] );
    pixel4 v3 = MPIXEL_X4( &src[12-FDEC_STRIDE] );
    // Write the same values into all 16 rows
    for( int i = 0; i < 16; i++ )
    {
        // [Expanding the macro]
        // (((x264_union32_t*)(src+ 0))->i) = v0;
        // (((x264_union32_t*)(src+ 4))->i) = v1;
        // (((x264_union32_t*)(src+ 8))->i) = v2;
        // (((x264_union32_t*)(src+12))->i) = v3;
        // i.e. four stores of 4 pixels each
        MPIXEL_X4( src+ 0 ) = v0;
        MPIXEL_X4( src+ 4 ) = v1;
        MPIXEL_X4( src+ 8 ) = v2;
        MPIXEL_X4( src+12 ) = v3;
        // Next row
        // FDEC_STRIDE = 32, the width of one row of the reconstruction buffer fdec_buf
        src += FDEC_STRIDE;
    }
}

As the code shows, x264_predict_16x16_v_c() first reads the 16 pixels of the row above the 16x16 block into the four variables v0, v1, v2 and v3 (4 packed pixels each), then loops 16 times assigning v0..v3 to each of the block's 16 rows.

Having seen the C implementation of the Intra16x16 Vertical mode, we can look at its assembly counterparts. As the initialization code showed, x264_predict_16x16_init_mmx() registers the x86-optimized functions when x86 SIMD is available, and x264_predict_16x16_init_arm() registers the ARM-optimized ones on ARM.

x264_predict_16x16_init_mmx()

x264_predict_16x16_init_mmx() initializes the x86-optimized Intra16x16 prediction functions. It is defined in common\x86\predict-c.c (in the "x86" subdirectory), as shown below.
  1. //Intra16x16帧内预测汇编函数-MMX版本
  2. void x264_predict_16x16_init_mmx( int cpu, x264_predict_t pf[7] )
  3. {
  4. if( !(cpu&X264_CPU_MMX2) )
  5. return;
  6. pf[I_PRED_16x16_DC] = x264_predict_16x16_dc_mmx2;
  7. pf[I_PRED_16x16_DC_TOP] = x264_predict_16x16_dc_top_mmx2;
  8. pf[I_PRED_16x16_DC_LEFT] = x264_predict_16x16_dc_left_mmx2;
  9. pf[I_PRED_16x16_V] = x264_predict_16x16_v_mmx2;
  10. pf[I_PRED_16x16_H] = x264_predict_16x16_h_mmx2;
  11. #if HIGH_BIT_DEPTH
  12. if( !(cpu&X264_CPU_SSE) )
  13. return;
  14. pf[I_PRED_16x16_V] = x264_predict_16x16_v_sse;
  15. if( !(cpu&X264_CPU_SSE2) )
  16. return;
  17. pf[I_PRED_16x16_DC] = x264_predict_16x16_dc_sse2;
  18. pf[I_PRED_16x16_DC_TOP] = x264_predict_16x16_dc_top_sse2;
  19. pf[I_PRED_16x16_DC_LEFT] = x264_predict_16x16_dc_left_sse2;
  20. pf[I_PRED_16x16_H] = x264_predict_16x16_h_sse2;
  21. pf[I_PRED_16x16_P] = x264_predict_16x16_p_sse2;
  22. if( !(cpu&X264_CPU_AVX) )
  23. return;
  24. pf[I_PRED_16x16_V] = x264_predict_16x16_v_avx;
  25. if( !(cpu&X264_CPU_AVX2) )
  26. return;
  27. pf[I_PRED_16x16_H] = x264_predict_16x16_h_avx2;
  28. #else
  29. #if !ARCH_X86_64
  30. pf[I_PRED_16x16_P] = x264_predict_16x16_p_mmx2;
  31. #endif
  32. if( !(cpu&X264_CPU_SSE) )
  33. return;
  34. pf[I_PRED_16x16_V] = x264_predict_16x16_v_sse;
  35. if( !(cpu&X264_CPU_SSE2) )
  36. return;
  37. pf[I_PRED_16x16_DC] = x264_predict_16x16_dc_sse2;
  38. if( cpu&X264_CPU_SSE2_IS_SLOW )
  39. return;
  40. pf[I_PRED_16x16_DC_TOP] = x264_predict_16x16_dc_top_sse2;
  41. pf[I_PRED_16x16_DC_LEFT] = x264_predict_16x16_dc_left_sse2;
  42. pf[I_PRED_16x16_P] = x264_predict_16x16_p_sse2;
  43. if( !(cpu&X264_CPU_SSSE3) )
  44. return;
  45. if( !(cpu&X264_CPU_SLOW_PSHUFB) )
  46. pf[I_PRED_16x16_H] = x264_predict_16x16_h_ssse3;
  47. #if HAVE_X86_INLINE_ASM
  48. pf[I_PRED_16x16_P] = x264_predict_16x16_p_ssse3;
  49. #endif
  50. if( !(cpu&X264_CPU_AVX) )
  51. return;
  52. pf[I_PRED_16x16_P] = x264_predict_16x16_p_avx;
  53. #endif // HIGH_BIT_DEPTH
  54. if( cpu&X264_CPU_AVX2 )
  55. {
  56. pf[I_PRED_16x16_P] = x264_predict_16x16_p_avx2;
  57. pf[I_PRED_16x16_DC] = x264_predict_16x16_dc_avx2;
  58. pf[I_PRED_16x16_DC_TOP] = x264_predict_16x16_dc_top_avx2;
  59. pf[I_PRED_16x16_DC_LEFT] = x264_predict_16x16_dc_left_avx2;
  60. }
  61. }

As the code shows, for the Intra16x16 Vertical mode x264_predict_16x16_init_mmx() registers one of two functions depending on CPU features: x264_predict_16x16_v_mmx2() when only MMX2 is available, and x264_predict_16x16_v_sse() when SSE is also supported. Their code is examined next.
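The overall pattern here, used throughout x264, is worth isolating: a table of function pointers is first filled with portable C implementations and then progressively overwritten with the best implementation the detected CPU supports. A minimal sketch of the idiom (hypothetical names, not x264 code):

```c
typedef int (*op_fn)(void);

/* hypothetical CPU capability flags */
enum { CPU_MMX2 = 1, CPU_SSE = 2 };

static int op_c(void)    { return 0; }  /* portable fallback     */
static int op_mmx2(void) { return 1; }  /* stand-in for asm code */
static int op_sse(void)  { return 2; }  /* stand-in for asm code */

/* Fill the pointer with the C version, then override in order of
 * increasing preference -- the way x264_predict_16x16_init() and
 * x264_predict_16x16_init_mmx() cooperate. */
static op_fn select_op(int cpu)
{
    op_fn pf = op_c;
    if (cpu & CPU_MMX2) pf = op_mmx2;
    if (cpu & CPU_SSE)  pf = op_sse;
    return pf;
}
```

The early-return chains in the real initializer have the same effect: each later block only runs if the stronger instruction set is present.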

x264_predict_16x16_v_mmx2()

x264_predict_16x16_v_sse()

In x264 these two functions, x264_predict_16x16_v_mmx2() and x264_predict_16x16_v_sse(), share a single definition. It is located in common\x86\predict-a.asm, as shown below.
;-----------------------------------------------------------------------------
; void predict_16x16_v( pixel *src )
; Intra16x16帧内预测Vertical模式
;-----------------------------------------------------------------------------
;SIZEOF_PIXEL取值为1
;FDEC_STRIDEB为重建宏块缓存fdec_buf一行像素的大小,取值为32
;
;平台相关的信息位于x86inc.asm
;INIT_MMX中
;  mmsize为8
;  mova为movq
;INIT_XMM中:
;  mmsize为16
;  mova为movdqa
;
;STORE16的定义在前面,用于循环16行存储数据

%macro PREDICT_16x16_V 0
cglobal predict_16x16_v, 1,2
%assign %%i 0
%rep 16*SIZEOF_PIXEL/mmsize                         ;rep循环执行,拷贝16x16块上方的1行像素数据至m0,m1...
                                                    ;mmssize为指令1次处理比特数
    mova m %+ %%i, [r0-FDEC_STRIDEB+%%i*mmsize]     ;移入m0,m1...
%assign %%i %%i+1
%endrep
%if 16*SIZEOF_PIXEL/mmsize == 4                     ;1行需要处理4次
    STORE16 m0, m1, m2, m3                          ;循环存储16行,每次存储4个寄存器
%elif 16*SIZEOF_PIXEL/mmsize == 2                   ;1行需要处理2次
    STORE16 m0, m1                                  ;循环存储16行,每次存储2个寄存器
%else                                               ;1行需要处理1次
    STORE16 m0                                      ;循环存储16行,每次存储1个寄存器
%endif
    RET
%endmacro

INIT_MMX mmx2
PREDICT_16x16_V
INIT_XMM sse
PREDICT_16x16_V

As the assembly shows, x264_predict_16x16_v_mmx2() and x264_predict_16x16_v_sse() have identical logic; they differ only in how much data one instruction moves. Under MMX, MOVA expands to MOVQ, which handles 8 bytes (8 pixels) at a time; under SSE it expands to MOVDQA, which handles 16 bytes at a time (16 pixels, exactly one row of the 16x16 block).
For comparison, we can look at the ARM assembly versions of the Intra16x16 prediction functions, whose initializer is x264_predict_16x16_init_arm().

x264_predict_16x16_init_arm()

x264_predict_16x16_init_arm() initializes the ARM-optimized Intra16x16 prediction functions. It is defined in common\arm\predict-c.c (in the "arm" subdirectory), as shown below.
  1. void x264_predict_16x16_init_arm( int cpu, x264_predict_t pf[7] )
  2. {
  3. if (!(cpu&X264_CPU_NEON))
  4. return;
  5. #if !HIGH_BIT_DEPTH
  6. pf[I_PRED_16x16_DC ] = x264_predict_16x16_dc_neon;
  7. pf[I_PRED_16x16_DC_TOP] = x264_predict_16x16_dc_top_neon;
  8. pf[I_PRED_16x16_DC_LEFT]= x264_predict_16x16_dc_left_neon;
  9. pf[I_PRED_16x16_H ] = x264_predict_16x16_h_neon;
  10. pf[I_PRED_16x16_V ] = x264_predict_16x16_v_neon;
  11. pf[I_PRED_16x16_P ] = x264_predict_16x16_p_neon;
  12. #endif // !HIGH_BIT_DEPTH
  13. }

As the code shows, for the Vertical mode x264_predict_16x16_init_arm() registers the NEON-optimized function x264_predict_16x16_v_neon().

x264_predict_16x16_v_neon()

x264_predict_16x16_v_neon() is defined in common\arm\predict-a.S, as shown below.
/*
 * Intra16x16帧内预测Vertical模式-NEON
 *
 */
 /* FDEC_STRIDE=32Bytes,为重建宏块一行像素的大小 */
 /* R0存储16x16像素块地址 */
function x264_predict_16x16_v_neon
    sub         r0, r0, #FDEC_STRIDE     /* r0=r0-FDEC_STRIDE */
    mov         ip, #FDEC_STRIDE         /* ip=32 */
                                         /* VLD向量加载: 内存->NEON寄存器 */
                                         /* d0,d164bit双字寄存器,共16Byte,在这里存储16x16块上方一行像素 */
    vld1.64     {d0-d1}, [r0,:128], ip   /* 将R0指向的数据从内存加载到d0和d1寄存器(64bit*/
                                         /* r0=r0+ip */
.rept 16                                 /* 循环16次,一次处理1*/
                                         /* VST向量存储: NEON寄存器->内存 */
    vst1.64     {d0-d1}, [r0,:128], ip   /* 将d0和d1寄存器中的数据传递给R0指向的内存 */
                                         /* r0=r0+ip */
.endr
    bx          lr                       /* 子程序返回 */
endfunc

As the code shows, x264_predict_16x16_v_neon() loads the row of pixels above the 16x16 block with a vld1.64 instruction, then stores it into each of the block's 16 rows with vst1.64 inside a 16-iteration loop.
That concludes the walkthrough of the Intra16x16 Vertical mode. For brevity, the rest of this article discusses only the C implementations of each function.

x264_predict_4x4_init()

x264_predict_4x4_init() initializes the Intra4x4 prediction functions. It is defined in common\predict.c, as shown below.
  1. //Intra4x4帧内预测汇编函数初始化
  2. void x264_predict_4x4_init( int cpu, x264_predict_t pf[12] )
  3. {
  4. //9种Intra4x4预测方式
  5. pf[I_PRED_4x4_V] = x264_predict_4x4_v_c;
  6. pf[I_PRED_4x4_H] = x264_predict_4x4_h_c;
  7. pf[I_PRED_4x4_DC] = x264_predict_4x4_dc_c;
  8. pf[I_PRED_4x4_DDL] = x264_predict_4x4_ddl_c;
  9. pf[I_PRED_4x4_DDR] = x264_predict_4x4_ddr_c;
  10. pf[I_PRED_4x4_VR] = x264_predict_4x4_vr_c;
  11. pf[I_PRED_4x4_HD] = x264_predict_4x4_hd_c;
  12. pf[I_PRED_4x4_VL] = x264_predict_4x4_vl_c;
  13. pf[I_PRED_4x4_HU] = x264_predict_4x4_hu_c;
  14. //这些是?
  15. pf[I_PRED_4x4_DC_LEFT]= x264_predict_4x4_dc_left_c;
  16. pf[I_PRED_4x4_DC_TOP] = x264_predict_4x4_dc_top_c;
  17. pf[I_PRED_4x4_DC_128] = x264_predict_4x4_dc_128_c;
  18. #if HAVE_MMX
  19. x264_predict_4x4_init_mmx( cpu, pf );
  20. #endif
  21. #if HAVE_ARMV6
  22. x264_predict_4x4_init_arm( cpu, pf );
  23. #endif
  24. #if ARCH_AARCH64
  25. x264_predict_4x4_init_aarch64( cpu, pf );
  26. #endif
  27. }

As the code shows, x264_predict_4x4_init() first assigns the C implementations of the 9 Intra4x4 modes (x264_predict_4x4_v_c(), x264_predict_4x4_h_c(), x264_predict_4x4_dc_c(), x264_predict_4x4_ddl_c() and so on) to the function-pointer array x264_predict_t[]. The three extra entries (I_PRED_4x4_DC_LEFT, I_PRED_4x4_DC_TOP, I_PRED_4x4_DC_128) are DC fallback modes used when the top and/or left neighboring pixels are unavailable, for example at picture or slice borders; DC_128 simply predicts the mid-level value 128 when neither neighbor exists. The function then checks platform capabilities and calls x264_predict_4x4_init_mmx(), x264_predict_4x4_init_arm() etc. to overwrite the entries with assembly-optimized versions. As an example, the C implementation of the Intra4x4 Vertical mode is examined below.
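How the three DC variants come into play can be sketched as follows. This is an illustrative simplification (hypothetical function, not x264 source): depending on which neighbors exist, a fallback DC mode is substituted for the true DC mode.

```c
/* hypothetical mode ids mirroring x264's I_PRED_4x4_DC* entries */
enum { DC = 0, DC_LEFT = 1, DC_TOP = 2, DC_128 = 3 };

/* Pick the DC variant matching neighbor availability:
 * both neighbors -> true DC; one neighbor -> average only that side;
 * no neighbors   -> predict the mid-level constant 128. */
static int fix_dc_mode(int has_top, int has_left)
{
    if (has_top && has_left) return DC;
    if (has_left)            return DC_LEFT;
    if (has_top)             return DC_TOP;
    return DC_128;
}
```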

Background

There are 9 Intra4x4 prediction modes, shown in the figure below.

The Vertical, Horizontal and DC modes work just like their Intra16x16 counterparts. The remaining modes add diagonal and shallower prediction directions: the first diagonal arrows run along the diagonals of a square (45 degrees), while the extra modes run along the diagonals of a 1x2 rectangle (about 26.6 degrees).
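As an example of a diagonal mode, the Diagonal Down-Left prediction can be sketched in plain C. This follows the H.264 specification's formula rather than x264's macro-based source; `top` holds the 8 reconstructed pixels above and above-right of the 4x4 block.

```c
#include <stdint.h>

/* Diagonal Down-Left (mode 3) per the H.264 spec, simplified:
 * pred[y][x] = (top[x+y] + 2*top[x+y+1] + top[x+y+2] + 2) >> 2,
 * except the bottom-right corner, which reuses the last sample:
 * pred[3][3] = (top[6] + 3*top[7] + 2) >> 2. */
static void predict_4x4_ddl_sketch(uint8_t pred[4][4], const uint8_t top[8])
{
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++) {
            if (x == 3 && y == 3)
                pred[y][x] = (uint8_t)((top[6] + 3 * top[7] + 2) >> 2);
            else
                pred[y][x] = (uint8_t)((top[x + y] + 2 * top[x + y + 1]
                                        + top[x + y + 2] + 2) >> 2);
        }
}

/* Self-test on a linear ramp top[i] = 4*i: the 3-tap filter then yields
 * pred[y][x] = 4*(x+y) + 4 at the regular positions, 27 at the corner. */
static int demo_ddl(void)
{
    uint8_t pred[4][4];
    uint8_t top[8];
    for (int i = 0; i < 8; i++) top[i] = (uint8_t)(4 * i);
    predict_4x4_ddl_sketch(pred, top);
    return pred[0][0] * 1000 + pred[3][3];  /* pack two probe values */
}
```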

x264_predict_4x4_v_c()

x264_predict_4x4_v_c() implements the Intra4x4 Vertical mode. It is defined in common\predict.c, as shown below.
  1. void x264_predict_4x4_v_c( pixel *src )
  2. {
  3. /*
  4. * Vertical预测方式
  5. * |X1 X2 X3 X4
  6. * --+-----------
  7. * |X1 X2 X3 X4
  8. * |X1 X2 X3 X4
  9. * |X1 X2 X3 X4
  10. * |X1 X2 X3 X4
  11. *
  12. */
  13. /*
  14. * 宏展开后的结果如下所示
  15. * 注:重建宏块缓存fdec_buf一行的数据量为32Byte
  16. *
  17. * (((x264_union32_t*)(&src[(0)+(0)*32]))->i) =
  18. * (((x264_union32_t*)(&src[(0)+(1)*32]))->i) =
  19. * (((x264_union32_t*)(&src[(0)+(2)*32]))->i) =
  20. * (((x264_union32_t*)(&src[(0)+(3)*32]))->i) = (((x264_union32_t*)(&src[(0)+(-1)*32]))->i);
  21. */
  22. PREDICT_4x4_DC(SRC_X4(0,-1));
  23. }

The body of x264_predict_4x4_v_c() is a single macro invocation, "PREDICT_4x4_DC(SRC_X4(0,-1));". Expanding it shows that the 4 packed pixels of the row above the 4x4 block are read once and then assigned to each of the block's 4 rows.
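The macro expansion can be restated as an equivalent plain loop. This sketch assumes an 8-bit build with FDEC_STRIDE = 32 and copies the 4 pixels above the block into each of its 4 rows byte by byte (x264 does the same with one packed 32-bit store per row):

```c
#include <stdint.h>

#define FDEC_STRIDE 32  /* bytes per row of the reconstruction buffer */

/* Behavioral equivalent of x264_predict_4x4_v_c() without the
 * packed-word trick: replicate the row above into all 4 rows. */
static void predict_4x4_v_sketch(uint8_t *src)
{
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            src[x + y * FDEC_STRIDE] = src[x - FDEC_STRIDE];
}

/* Self-test: top row = {1,2,3,4} must be replicated into all 4 rows. */
static int demo_v4(void)
{
    static uint8_t buf[FDEC_STRIDE * 5];
    uint8_t *src = buf + FDEC_STRIDE;   /* row above lives at src-32 */
    for (int x = 0; x < 4; x++)
        src[x - FDEC_STRIDE] = (uint8_t)(x + 1);
    predict_4x4_v_sketch(src);
    return src[0] + src[1 + FDEC_STRIDE]
         + src[2 + 2 * FDEC_STRIDE] + src[3 + 3 * FDEC_STRIDE];
}
```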

x264_pixel_init()

x264_pixel_init() initializes the pixel-metric functions (SAD, SATD, SSD and so on). It is defined in common\pixel.c, as shown below.
  1. /****************************************************************************
  2. * x264_pixel_init:
  3. ****************************************************************************/
  4. //SAD等和像素计算有关的函数
  5. void x264_pixel_init( int cpu, x264_pixel_function_t *pixf )
  6. {
  7. memset( pixf, 0, sizeof(*pixf) );
  8. //初始化2个函数-16x16,16x8
  9. #define INIT2_NAME( name1, name2, cpu ) \
  10. pixf->name1[PIXEL_16x16] = x264_pixel_##name2##_16x16##cpu;\
  11. pixf->name1[PIXEL_16x8] = x264_pixel_##name2##_16x8##cpu;
  12. //初始化4个函数-(16x16,16x8),8x16,8x8
  13. #define INIT4_NAME( name1, name2, cpu ) \
  14. INIT2_NAME( name1, name2, cpu ) \
  15. pixf->name1[PIXEL_8x16] = x264_pixel_##name2##_8x16##cpu;\
  16. pixf->name1[PIXEL_8x8] = x264_pixel_##name2##_8x8##cpu;
  17. //初始化5个函数-(16x16,16x8,8x16,8x8),8x4
  18. #define INIT5_NAME( name1, name2, cpu ) \
  19. INIT4_NAME( name1, name2, cpu ) \
  20. pixf->name1[PIXEL_8x4] = x264_pixel_##name2##_8x4##cpu;
  21. //初始化6个函数-(16x16,16x8,8x16,8x8,8x4),4x8
  22. #define INIT6_NAME( name1, name2, cpu ) \
  23. INIT5_NAME( name1, name2, cpu ) \
  24. pixf->name1[PIXEL_4x8] = x264_pixel_##name2##_4x8##cpu;
  25. //初始化7个函数-(16x16,16x8,8x16,8x8,8x4,4x8),4x4
  26. #define INIT7_NAME( name1, name2, cpu ) \
  27. INIT6_NAME( name1, name2, cpu ) \
  28. pixf->name1[PIXEL_4x4] = x264_pixel_##name2##_4x4##cpu;
  29. #define INIT8_NAME( name1, name2, cpu ) \
  30. INIT7_NAME( name1, name2, cpu ) \
  31. pixf->name1[PIXEL_4x16] = x264_pixel_##name2##_4x16##cpu;
  32. //重新起个名字
  33. #define INIT2( name, cpu ) INIT2_NAME( name, name, cpu )
  34. #define INIT4( name, cpu ) INIT4_NAME( name, name, cpu )
  35. #define INIT5( name, cpu ) INIT5_NAME( name, name, cpu )
  36. #define INIT6( name, cpu ) INIT6_NAME( name, name, cpu )
  37. #define INIT7( name, cpu ) INIT7_NAME( name, name, cpu )
  38. #define INIT8( name, cpu ) INIT8_NAME( name, name, cpu )
  39. #define INIT_ADS( cpu ) \
  40. pixf->ads[PIXEL_16x16] = x264_pixel_ads4##cpu;\
  41. pixf->ads[PIXEL_16x8] = x264_pixel_ads2##cpu;\
  42. pixf->ads[PIXEL_8x8] = x264_pixel_ads1##cpu;
  43. //8个sad函数
  44. INIT8( sad, );
  45. INIT8_NAME( sad_aligned, sad, );
  46. //7个sad函数-一次性计算3次
  47. INIT7( sad_x3, );
  48. //7个sad函数-一次性计算4次
  49. INIT7( sad_x4, );
  50. //8个ssd函数
  51. //ssd可以用来计算PSNR
  52. INIT8( ssd, );
  53. //8个satd函数
  54. //satd计算的是经过Hadamard变换后的值
  55. INIT8( satd, );
  56. //8个satd函数-一次性计算3次
  57. INIT7( satd_x3, );
  58. //8个satd函数-一次性计算4次
  59. INIT7( satd_x4, );
  60. INIT4( hadamard_ac, );
  61. INIT_ADS( );
  62. pixf->sa8d[PIXEL_16x16] = x264_pixel_sa8d_16x16;
  63. pixf->sa8d[PIXEL_8x8] = x264_pixel_sa8d_8x8;
  64. pixf->var[PIXEL_16x16] = x264_pixel_var_16x16;
  65. pixf->var[PIXEL_8x16] = x264_pixel_var_8x16;
  66. pixf->var[PIXEL_8x8] = x264_pixel_var_8x8;
  67. pixf->var2[PIXEL_8x16] = x264_pixel_var2_8x16;
  68. pixf->var2[PIXEL_8x8] = x264_pixel_var2_8x8;
  69. //计算UV的
  70. pixf->ssd_nv12_core = pixel_ssd_nv12_core;
  71. //计算SSIM
  72. pixf->ssim_4x4x2_core = ssim_4x4x2_core;
  73. pixf->ssim_end4 = ssim_end4;
  74. pixf->vsad = pixel_vsad;
  75. pixf->asd8 = pixel_asd8;
  76. pixf->intra_sad_x3_4x4 = x264_intra_sad_x3_4x4;
  77. pixf->intra_satd_x3_4x4 = x264_intra_satd_x3_4x4;
  78. pixf->intra_sad_x3_8x8 = x264_intra_sad_x3_8x8;
  79. pixf->intra_sa8d_x3_8x8 = x264_intra_sa8d_x3_8x8;
  80. pixf->intra_sad_x3_8x8c = x264_intra_sad_x3_8x8c;
  81. pixf->intra_satd_x3_8x8c = x264_intra_satd_x3_8x8c;
  82. pixf->intra_sad_x3_8x16c = x264_intra_sad_x3_8x16c;
  83. pixf->intra_satd_x3_8x16c = x264_intra_satd_x3_8x16c;
  84. pixf->intra_sad_x3_16x16 = x264_intra_sad_x3_16x16;
  85. pixf->intra_satd_x3_16x16 = x264_intra_satd_x3_16x16;
  86. //后面的初始化基本上都是汇编优化过的函数
  87. #if HIGH_BIT_DEPTH
  88. #if HAVE_MMX
  89. if( cpu&X264_CPU_MMX2 )
  90. {
  91. INIT7( sad, _mmx2 );
  92. INIT7_NAME( sad_aligned, sad, _mmx2 );
  93. INIT7( sad_x3, _mmx2 );
  94. INIT7( sad_x4, _mmx2 );
  95. INIT8( satd, _mmx2 );
  96. INIT7( satd_x3, _mmx2 );
  97. INIT7( satd_x4, _mmx2 );
  98. INIT4( hadamard_ac, _mmx2 );
  99. INIT8( ssd, _mmx2 );
  100. INIT_ADS( _mmx2 );
  101. pixf->ssd_nv12_core = x264_pixel_ssd_nv12_core_mmx2;
  102. pixf->var[PIXEL_16x16] = x264_pixel_var_16x16_mmx2;
  103. pixf->var[PIXEL_8x8] = x264_pixel_var_8x8_mmx2;
  104. #if ARCH_X86
  105. pixf->var2[PIXEL_8x8] = x264_pixel_var2_8x8_mmx2;
  106. pixf->var2[PIXEL_8x16] = x264_pixel_var2_8x16_mmx2;
  107. #endif
  108. pixf->intra_sad_x3_4x4 = x264_intra_sad_x3_4x4_mmx2;
  109. pixf->intra_satd_x3_4x4 = x264_intra_satd_x3_4x4_mmx2;
  110. pixf->intra_sad_x3_8x8 = x264_intra_sad_x3_8x8_mmx2;
  111. pixf->intra_sad_x3_8x8c = x264_intra_sad_x3_8x8c_mmx2;
  112. pixf->intra_satd_x3_8x8c = x264_intra_satd_x3_8x8c_mmx2;
  113. pixf->intra_sad_x3_8x16c = x264_intra_sad_x3_8x16c_mmx2;
  114. pixf->intra_satd_x3_8x16c = x264_intra_satd_x3_8x16c_mmx2;
  115. pixf->intra_sad_x3_16x16 = x264_intra_sad_x3_16x16_mmx2;
  116. pixf->intra_satd_x3_16x16 = x264_intra_satd_x3_16x16_mmx2;
  117. }
  118. if( cpu&X264_CPU_SSE2 )
  119. {
  120. INIT4_NAME( sad_aligned, sad, _sse2_aligned );
  121. INIT5( ssd, _sse2 );
  122. INIT6( satd, _sse2 );
  123. pixf->satd[PIXEL_4x16] = x264_pixel_satd_4x16_sse2;
  124. pixf->sa8d[PIXEL_16x16] = x264_pixel_sa8d_16x16_sse2;
  125. pixf->sa8d[PIXEL_8x8] = x264_pixel_sa8d_8x8_sse2;
  126. #if ARCH_X86_64
  127. pixf->intra_sa8d_x3_8x8 = x264_intra_sa8d_x3_8x8_sse2;
  128. pixf->sa8d_satd[PIXEL_16x16] = x264_pixel_sa8d_satd_16x16_sse2;
  129. #endif
  130. pixf->intra_sad_x3_4x4 = x264_intra_sad_x3_4x4_sse2;
  131. pixf->ssd_nv12_core = x264_pixel_ssd_nv12_core_sse2;
  132. pixf->ssim_4x4x2_core = x264_pixel_ssim_4x4x2_core_sse2;
  133. pixf->ssim_end4 = x264_pixel_ssim_end4_sse2;
  134. pixf->var[PIXEL_16x16] = x264_pixel_var_16x16_sse2;
  135. pixf->var[PIXEL_8x8] = x264_pixel_var_8x8_sse2;
  136. pixf->var2[PIXEL_8x8] = x264_pixel_var2_8x8_sse2;
  137. pixf->var2[PIXEL_8x16] = x264_pixel_var2_8x16_sse2;
  138. pixf->intra_sad_x3_8x8 = x264_intra_sad_x3_8x8_sse2;
  139. }
  140. //此处省略大量的X86、ARM等平台的汇编函数初始化代码
  141. }

The source of x264_pixel_init() is very long, mainly because the C implementations and the assembly versions for every platform are all registered in one place (I have not checked whether the latest version still does this). It covers a large set of pixel metrics: SAD, SATD, SSD, SSIM and so on. Its parameter x264_pixel_function_t is a struct holding the interfaces of all these pixel computations; its definition is shown below.
  1. typedef struct
  2. {
  3. x264_pixel_cmp_t sad[8];
  4. x264_pixel_cmp_t ssd[8];
  5. x264_pixel_cmp_t satd[8];
  6. x264_pixel_cmp_t ssim[7];
  7. x264_pixel_cmp_t sa8d[4];
  8. x264_pixel_cmp_t mbcmp[8]; /* either satd or sad for subpel refine and mode decision */
  9. x264_pixel_cmp_t mbcmp_unaligned[8]; /* unaligned mbcmp for subpel */
  10. x264_pixel_cmp_t fpelcmp[8]; /* either satd or sad for fullpel motion search */
  11. x264_pixel_cmp_x3_t fpelcmp_x3[7];
  12. x264_pixel_cmp_x4_t fpelcmp_x4[7];
  13. x264_pixel_cmp_t sad_aligned[8]; /* Aligned SAD for mbcmp */
  14. int (*vsad)( pixel *, intptr_t, int );
  15. int (*asd8)( pixel *pix1, intptr_t stride1, pixel *pix2, intptr_t stride2, int height );
  16. uint64_t (*sa8d_satd[1])( pixel *pix1, intptr_t stride1, pixel *pix2, intptr_t stride2 );
  17. uint64_t (*var[4])( pixel *pix, intptr_t stride );
  18. int (*var2[4])( pixel *pix1, intptr_t stride1,
  19. pixel *pix2, intptr_t stride2, int *ssd );
  20. uint64_t (*hadamard_ac[4])( pixel *pix, intptr_t stride );
  21. void (*ssd_nv12_core)( pixel *pixuv1, intptr_t stride1,
  22. pixel *pixuv2, intptr_t stride2, int width, int height,
  23. uint64_t *ssd_u, uint64_t *ssd_v );
  24. void (*ssim_4x4x2_core)( const pixel *pix1, intptr_t stride1,
  25. const pixel *pix2, intptr_t stride2, int sums[2][4] );
  26. float (*ssim_end4)( int sum0[5][4], int sum1[5][4], int width );
  27. /* multiple parallel calls to cmp. */
  28. x264_pixel_cmp_x3_t sad_x3[7];
  29. x264_pixel_cmp_x4_t sad_x4[7];
  30. x264_pixel_cmp_x3_t satd_x3[7];
  31. x264_pixel_cmp_x4_t satd_x4[7];
  32. /* abs-diff-sum for successive elimination.
  33. * may round width up to a multiple of 16. */
  34. int (*ads[7])( int enc_dc[4], uint16_t *sums, int delta,
  35. uint16_t *cost_mvx, int16_t *mvs, int width, int thresh );
  36. /* calculate satd or sad of V, H, and DC modes. */
  37. void (*intra_mbcmp_x3_16x16)( pixel *fenc, pixel *fdec, int res[3] );
  38. void (*intra_satd_x3_16x16) ( pixel *fenc, pixel *fdec, int res[3] );
  39. void (*intra_sad_x3_16x16) ( pixel *fenc, pixel *fdec, int res[3] );
  40. void (*intra_mbcmp_x3_4x4) ( pixel *fenc, pixel *fdec, int res[3] );
  41. void (*intra_satd_x3_4x4) ( pixel *fenc, pixel *fdec, int res[3] );
  42. void (*intra_sad_x3_4x4) ( pixel *fenc, pixel *fdec, int res[3] );
  43. void (*intra_mbcmp_x3_chroma)( pixel *fenc, pixel *fdec, int res[3] );
  44. void (*intra_satd_x3_chroma) ( pixel *fenc, pixel *fdec, int res[3] );
  45. void (*intra_sad_x3_chroma) ( pixel *fenc, pixel *fdec, int res[3] );
  46. void (*intra_mbcmp_x3_8x16c) ( pixel *fenc, pixel *fdec, int res[3] );
  47. void (*intra_satd_x3_8x16c) ( pixel *fenc, pixel *fdec, int res[3] );
  48. void (*intra_sad_x3_8x16c) ( pixel *fenc, pixel *fdec, int res[3] );
  49. void (*intra_mbcmp_x3_8x8c) ( pixel *fenc, pixel *fdec, int res[3] );
  50. void (*intra_satd_x3_8x8c) ( pixel *fenc, pixel *fdec, int res[3] );
  51. void (*intra_sad_x3_8x8c) ( pixel *fenc, pixel *fdec, int res[3] );
  52. void (*intra_mbcmp_x3_8x8) ( pixel *fenc, pixel edge[36], int res[3] );
  53. void (*intra_sa8d_x3_8x8) ( pixel *fenc, pixel edge[36], int res[3] );
  54. void (*intra_sad_x3_8x8) ( pixel *fenc, pixel edge[36], int res[3] );
  55. /* find minimum satd or sad of all modes, and set fdec.
  56. * may be NULL, in which case just use pred+satd instead. */
  57. int (*intra_mbcmp_x9_4x4)( pixel *fenc, pixel *fdec, uint16_t *bitcosts );
  58. int (*intra_satd_x9_4x4) ( pixel *fenc, pixel *fdec, uint16_t *bitcosts );
  59. int (*intra_sad_x9_4x4) ( pixel *fenc, pixel *fdec, uint16_t *bitcosts );
  60. int (*intra_mbcmp_x9_8x8)( pixel *fenc, pixel *fdec, pixel edge[36], uint16_t *bitcosts, uint16_t *satds );
  61. int (*intra_sa8d_x9_8x8) ( pixel *fenc, pixel *fdec, pixel edge[36], uint16_t *bitcosts, uint16_t *satds );
  62. int (*intra_sad_x9_8x8) ( pixel *fenc, pixel *fdec, pixel edge[36], uint16_t *bitcosts, uint16_t *satds );
  63. } x264_pixel_function_t;
x264_pixel_init() defines several macros that assign the function interfaces in x264_pixel_function_t. For example, "INIT8( sad, )" fills in sad[8]; expanded, it reads as follows.
  1. pixf->sad[PIXEL_16x16] = x264_pixel_sad_16x16;
  2. pixf->sad[PIXEL_16x8] = x264_pixel_sad_16x8;
  3. pixf->sad[PIXEL_8x16] = x264_pixel_sad_8x16;
  4. pixf->sad[PIXEL_8x8] = x264_pixel_sad_8x8;
  5. pixf->sad[PIXEL_8x4] = x264_pixel_sad_8x4;
  6. pixf->sad[PIXEL_4x8] = x264_pixel_sad_4x8;
  7. pixf->sad[PIXEL_4x4] = x264_pixel_sad_4x4;
  8. pixf->sad[PIXEL_4x16] = x264_pixel_sad_4x16;
"INIT8( ssd, )" fills in ssd[8]; expanded, it reads as follows.
  1. pixf->ssd[PIXEL_16x16] = x264_pixel_ssd_16x16;
  2. pixf->ssd[PIXEL_16x8] = x264_pixel_ssd_16x8;
  3. pixf->ssd[PIXEL_8x16] = x264_pixel_ssd_8x16;
  4. pixf->ssd[PIXEL_8x8] = x264_pixel_ssd_8x8;
  5. pixf->ssd[PIXEL_8x4] = x264_pixel_ssd_8x4;
  6. pixf->ssd[PIXEL_4x8] = x264_pixel_ssd_4x8;
  7. pixf->ssd[PIXEL_4x4] = x264_pixel_ssd_4x4;
  8. pixf->ssd[PIXEL_4x16] = x264_pixel_ssd_4x16;
"INIT8( satd, )" fills in satd[8]; expanded, it reads as follows.
  1. pixf->satd[PIXEL_16x16] = x264_pixel_satd_16x16;
  2. pixf->satd[PIXEL_16x8] = x264_pixel_satd_16x8;
  3. pixf->satd[PIXEL_8x16] = x264_pixel_satd_8x16;
  4. pixf->satd[PIXEL_8x8] = x264_pixel_satd_8x8;
  5. pixf->satd[PIXEL_8x4] = x264_pixel_satd_8x4;
  6. pixf->satd[PIXEL_4x8] = x264_pixel_satd_4x8;
  7. pixf->satd[PIXEL_4x4] = x264_pixel_satd_4x4;
  8. pixf->satd[PIXEL_4x16] = x264_pixel_satd_4x16;
The following sections examine the SAD, SSD and SATD functions x264_pixel_sad_4x4(), x264_pixel_ssd_4x4() and x264_pixel_satd_4x4(), plus one function that batches four SAD computations at once, x264_pixel_sad_x4_4x4().

Background

A few concepts used in pixel comparison. SAD and SATD are used mainly to decide between intra and inter prediction modes. The three metrics are defined as follows:
SAD (Sum of Absolute Difference), also called SAE (Sum of Absolute Error): subtract the two blocks pixel by pixel, take the absolute value of each difference, and accumulate.
SATD (Sum of Absolute Transformed Difference): apply a Hadamard transform to the difference block first, then sum the absolute values. It differs from SAD only by that extra transform.
SSD (Sum of Squared Difference), also called SSE (Sum of Squared Error): like SAD, but the differences are squared before being accumulated.

H.264 encoders use SAD or SATD for macroblock mode decision. Early encoders used SAD; most modern encoders use SATD. Why SATD rather than SAD? Because the size of the encoded stream correlates closely with the frequency-domain coefficients after the DCT, and only loosely with the time-domain residual. SAD reflects only the time domain; SATD, a simple time-to-frequency transform, reflects the frequency domain at a much lower cost than an actual DCT, which makes it a better basis for mode selection.

An example of mode selection with SAD: the figure below shows the pixels of an ordinary Intra16x16 macroblock, together with the predictions produced by the Vertical, Horizontal, DC and Plane modes. The SAD (SAE) between each prediction and the original block works out to 3985, 5097, 4991 and 2539 respectively. Plane has the smallest SAD, so Plane is the best intra mode for this macroblock.
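The selection step in this example reduces to an argmin over the per-mode costs. A trivial sketch (hypothetical helper, using the SAD values quoted above):

```c
/* Return the index of the smallest cost -- the winning prediction mode. */
static int best_mode(const int *cost, int n)
{
    int best = 0;
    for (int i = 1; i < n; i++)
        if (cost[i] < cost[best])
            best = i;
    return best;
}

static int demo_best_mode(void)
{
    /* V, H, DC, Plane SADs from the example above */
    int sad[4] = { 3985, 5097, 4991, 2539 };
    return best_mode(sad, 4);   /* Plane (index 3) has the lowest SAD */
}
```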




x264_pixel_sad_4x4()

x264_pixel_sad_4x4() computes the SAD of a 4x4 block. It is defined in common\pixel.c, as shown below.
  1. static int x264_pixel_sad_4x4( pixel *pix1, intptr_t i_stride_pix1,
  2. pixel *pix2, intptr_t i_stride_pix2 )
  3. {
  4. int i_sum = 0;
  5. for( int y = 0; y < 4; y++ ) //4个像素
  6. {
  7. for( int x = 0; x < 4; x++ ) //4个像素
  8. {
  9. i_sum += abs( pix1[x] - pix2[x] );//相减之后求绝对值,然后累加
  10. }
  11. pix1 += i_stride_pix1;
  12. pix2 += i_stride_pix2;
  13. }
  14. return i_sum;
  15. }
As the code shows, x264_pixel_sad_4x4() subtracts the two 4x4 blocks pixel by pixel, takes the absolute value with abs(), and accumulates the result into i_sum.

x264_pixel_sad_x4_4x4()

x264_pixel_sad_x4_4x4() computes the SADs of four 4x4 blocks at once. It is defined in common\pixel.c, as shown below.
  1. static void x264_pixel_sad_x4_4x4( pixel *fenc, pixel *pix0, pixel *pix1,pixel *pix2, pixel *pix3,
  2. intptr_t i_stride, int scores[4] )
  3. {
  4. scores[0] = x264_pixel_sad_4x4( fenc, 16, pix0, i_stride );
  5. scores[1] = x264_pixel_sad_4x4( fenc, 16, pix1, i_stride );
  6. scores[2] = x264_pixel_sad_4x4( fenc, 16, pix2, i_stride );
  7. scores[3] = x264_pixel_sad_4x4( fenc, 16, pix3, i_stride );
  8. }
As the code shows, x264_pixel_sad_x4_4x4() computes the SAD between fenc and each of the four 4x4 blocks starting at pix0, pix1, pix2 and pix3, storing the results in the scores[4] array.

x264_pixel_ssd_4x4()

x264_pixel_ssd_4x4() computes the SSD of a 4x4 block. It is defined in common\pixel.c, as shown below.
  1. static int x264_pixel_ssd_4x4( pixel *pix1, intptr_t i_stride_pix1,
  2. pixel *pix2, intptr_t i_stride_pix2 )
  3. {
  4. int i_sum = 0;
  5. for( int y = 0; y < 4; y++ ) //4个像素
  6. {
  7. for( int x = 0; x < 4; x++ ) //4个像素
  8. {
  9. int d = pix1[x] - pix2[x]; //相减
  10. i_sum += d*d; //平方之后,累加
  11. }
  12. pix1 += i_stride_pix1;
  13. pix2 += i_stride_pix2;
  14. }
  15. return i_sum;
  16. }
As the code shows, x264_pixel_ssd_4x4() subtracts the two 4x4 blocks pixel by pixel, squares each difference, and accumulates the result into i_sum.

x264_pixel_satd_4x4()

x264_pixel_satd_4x4() computes the SATD of a 4x4 block. It is defined in common\pixel.c, as shown below.
  1. //SAD(Sum of Absolute Difference)=SAE(Sum of Absolute Error)即绝对误差和
  2. //SATD(Sum of Absolute Transformed Difference)即hadamard变换后再绝对值求和
  3. //
  4. //为什么帧内模式选择要用SATD?
  5. //SAD即绝对误差和,仅反映残差时域差异,影响PSNR值,不能有效反映码流的大小。
  6. //SATD即将残差经哈德曼变换的4x4块的预测残差绝对值总和,可以将其看作简单的时频变换,其值在一定程度上可以反映生成码流的大小。
  7. //4x4的SATD
  8. static NOINLINE int x264_pixel_satd_4x4( pixel *pix1, intptr_t i_pix1, pixel *pix2, intptr_t i_pix2 )
  9. {
  10. sum2_t tmp[4][2];
  11. sum2_t a0, a1, a2, a3, b0, b1;
  12. sum2_t sum = 0;
  13. for( int i = 0; i < 4; i++, pix1 += i_pix1, pix2 += i_pix2 )
  14. {
  15. a0 = pix1[0] - pix2[0];
  16. a1 = pix1[1] - pix2[1];
  17. b0 = (a0+a1) + ((a0-a1)<<BITS_PER_SUM);
  18. a2 = pix1[2] - pix2[2];
  19. a3 = pix1[3] - pix2[3];
  20. b1 = (a2+a3) + ((a2-a3)<<BITS_PER_SUM);
  21. tmp[i][0] = b0 + b1;
  22. tmp[i][1] = b0 - b1;
  23. }
  24. for( int i = 0; i < 2; i++ )
  25. {
  26. HADAMARD4( a0, a1, a2, a3, tmp[0][i], tmp[1][i], tmp[2][i], tmp[3][i] );
  27. a0 = abs2(a0) + abs2(a1) + abs2(a2) + abs2(a3);
  28. sum += ((sum_t)a0) + (a0>>BITS_PER_SUM);
  29. }
  30. return sum >> 1;
  31. }
The Hadamard transform used in x264_pixel_satd_4x4() is discussed together with the DCT below. As the code shows, the function uses the HADAMARD4() macro to perform the transform, then accumulates the absolute values of the transformed differences into sum. (Note that the code packs two 16-bit partial sums into each sum2_t value, which is why the shifts by BITS_PER_SUM appear.)
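To make the packed arithmetic above easier to verify, here is a straightforward reference SATD that applies the 4x4 Hadamard matrix explicitly to the difference block and halves the absolute sum, mirroring the `sum >> 1` at the end of x264's version. This is a readable reference sketch, not x264 source:

```c
#include <stdlib.h>

/* Reference 4x4 SATD: 2-D Hadamard transform of the difference block,
 * then sum of absolute transformed coefficients, halved. */
static int satd_4x4_ref(const int a[4][4], const int b[4][4])
{
    static const int h[4][4] = {
        { 1,  1,  1,  1 },
        { 1,  1, -1, -1 },
        { 1, -1, -1,  1 },
        { 1, -1,  1, -1 },
    };
    int d[4][4], t[4][4], m[4][4], sum = 0;
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            d[y][x] = a[y][x] - b[y][x];
    for (int y = 0; y < 4; y++)          /* t = H * d  (columns)   */
        for (int x = 0; x < 4; x++) {
            t[y][x] = 0;
            for (int k = 0; k < 4; k++)
                t[y][x] += h[y][k] * d[k][x];
        }
    for (int y = 0; y < 4; y++)          /* m = t * H  (rows; H is symmetric) */
        for (int x = 0; x < 4; x++) {
            m[y][x] = 0;
            for (int k = 0; k < 4; k++)
                m[y][x] += t[y][k] * h[k][x];
        }
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            sum += abs(m[y][x]);
    return sum >> 1;
}

/* Self-test: blocks differing by a constant 2 everywhere transform to a
 * single DC coefficient 16*2 = 32, so SATD = 32 >> 1 = 16. */
static int demo_satd(void)
{
    int a[4][4], b[4][4];
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++) {
            a[y][x] = 102;
            b[y][x] = 100;
        }
    return satd_4x4_ref(a, b);
}
```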

x264_dct_init()

x264_dct_init() initializes the functions for the forward and inverse DCT. It is defined in common\dct.c, as shown below.
  1. /****************************************************************************
  2. * x264_dct_init:
  3. ****************************************************************************/
  4. void x264_dct_init( int cpu, x264_dct_function_t *dctf )
  5. {
  6. //C语言版本
  7. //4x4DCT变换
  8. dctf->sub4x4_dct = sub4x4_dct;
  9. dctf->add4x4_idct = add4x4_idct;
  10. //8x8块:分解成4个4x4DCT变换,调用4次sub4x4_dct()
  11. dctf->sub8x8_dct = sub8x8_dct;
  12. dctf->sub8x8_dct_dc = sub8x8_dct_dc;
  13. dctf->add8x8_idct = add8x8_idct;
  14. dctf->add8x8_idct_dc = add8x8_idct_dc;
  15. dctf->sub8x16_dct_dc = sub8x16_dct_dc;
  16. //16x16块:分解成4个8x8块,调用4次sub8x8_dct()
  17. //实际上每个sub8x8_dct()又分解成4个4x4DCT变换,调用4次sub4x4_dct()
  18. dctf->sub16x16_dct = sub16x16_dct;
  19. dctf->add16x16_idct = add16x16_idct;
  20. dctf->add16x16_idct_dc = add16x16_idct_dc;
  21. //8x8DCT,注意:后缀是_dct8
  22. dctf->sub8x8_dct8 = sub8x8_dct8;
  23. dctf->add8x8_idct8 = add8x8_idct8;
  24. dctf->sub16x16_dct8 = sub16x16_dct8;
  25. dctf->add16x16_idct8 = add16x16_idct8;
  26. //Hadamard变换
  27. dctf->dct4x4dc = dct4x4dc;
  28. dctf->idct4x4dc = idct4x4dc;
  29. dctf->dct2x4dc = dct2x4dc;
  30. #if HIGH_BIT_DEPTH
  31. #if HAVE_MMX
  32. if( cpu&X264_CPU_MMX )
  33. {
  34. dctf->sub4x4_dct = x264_sub4x4_dct_mmx;
  35. dctf->sub8x8_dct = x264_sub8x8_dct_mmx;
  36. dctf->sub16x16_dct = x264_sub16x16_dct_mmx;
  37. }
  38. if( cpu&X264_CPU_SSE2 )
  39. {
  40. dctf->add4x4_idct = x264_add4x4_idct_sse2;
  41. dctf->dct4x4dc = x264_dct4x4dc_sse2;
  42. dctf->idct4x4dc = x264_idct4x4dc_sse2;
  43. dctf->sub8x8_dct8 = x264_sub8x8_dct8_sse2;
  44. dctf->sub16x16_dct8 = x264_sub16x16_dct8_sse2;
  45. dctf->add8x8_idct = x264_add8x8_idct_sse2;
  46. dctf->add16x16_idct = x264_add16x16_idct_sse2;
  47. dctf->add8x8_idct8 = x264_add8x8_idct8_sse2;
  48. dctf->add16x16_idct8 = x264_add16x16_idct8_sse2;
  49. dctf->sub8x8_dct_dc = x264_sub8x8_dct_dc_sse2;
  50. dctf->add8x8_idct_dc = x264_add8x8_idct_dc_sse2;
  51. dctf->sub8x16_dct_dc = x264_sub8x16_dct_dc_sse2;
  52. dctf->add16x16_idct_dc= x264_add16x16_idct_dc_sse2;
  53. }
  54. if( cpu&X264_CPU_SSE4 )
  55. {
  56. dctf->sub8x8_dct8 = x264_sub8x8_dct8_sse4;
  57. dctf->sub16x16_dct8 = x264_sub16x16_dct8_sse4;
  58. }
  59. if( cpu&X264_CPU_AVX )
  60. {
  61. dctf->add4x4_idct = x264_add4x4_idct_avx;
  62. dctf->dct4x4dc = x264_dct4x4dc_avx;
  63. dctf->idct4x4dc = x264_idct4x4dc_avx;
  64. dctf->sub8x8_dct8 = x264_sub8x8_dct8_avx;
  65. dctf->sub16x16_dct8 = x264_sub16x16_dct8_avx;
  66. dctf->add8x8_idct = x264_add8x8_idct_avx;
  67. dctf->add16x16_idct = x264_add16x16_idct_avx;
  68. dctf->add8x8_idct8 = x264_add8x8_idct8_avx;
  69. dctf->add16x16_idct8 = x264_add16x16_idct8_avx;
  70. dctf->add8x8_idct_dc = x264_add8x8_idct_dc_avx;
  71. dctf->sub8x16_dct_dc = x264_sub8x16_dct_dc_avx;
  72. dctf->add16x16_idct_dc= x264_add16x16_idct_dc_avx;
  73. }
  74. #endif // HAVE_MMX
  75. #else // !HIGH_BIT_DEPTH
  76. //MMX版本
  77. #if HAVE_MMX
  78. if( cpu&X264_CPU_MMX )
  79. {
  80. dctf->sub4x4_dct = x264_sub4x4_dct_mmx;
  81. dctf->add4x4_idct = x264_add4x4_idct_mmx;
  82. dctf->idct4x4dc = x264_idct4x4dc_mmx;
  83. dctf->sub8x8_dct_dc = x264_sub8x8_dct_dc_mmx2;
  84. //此处省略大量的X86、ARM等平台的汇编函数初始化代码
  85. }

As the code shows, x264_dct_init() initializes a family of DCT functions whose names follow these conventions:
(1) A leading "sub" means the function first subtracts two pixel blocks to form a residual, then applies the forward DCT.
(2) A leading "add" means the function applies the inverse DCT and adds the resulting residual back onto the prediction.
(3) A trailing "dct8" means the function uses the 8x8 DCT; all other functions use the 4x4 DCT.
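The sub/add naming mirrors the two ends of the coding loop: "sub" functions run on the analysis side (source minus prediction gives the residual), "add" functions on the reconstruction side (prediction plus decoded residual gives the reconstructed pixels). Stripped of the transform itself, the contract is simply the following sketch (hypothetical helpers, not x264 code):

```c
#include <stdint.h>

/* Encode side: residual = source - prediction (the "sub" half). */
static void residual_sub(int *res, const uint8_t *src,
                         const uint8_t *pred, int n)
{
    for (int i = 0; i < n; i++)
        res[i] = src[i] - pred[i];
}

/* Decode side: reconstruction = prediction + residual (the "add" half),
 * clipped to the 8-bit pixel range as x264's add*_idct functions do. */
static void residual_add(uint8_t *dst, const uint8_t *pred,
                         const int *res, int n)
{
    for (int i = 0; i < n; i++) {
        int v = pred[i] + res[i];
        dst[i] = (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
    }
}

/* With a lossless transform in between, add(sub(x)) returns x. */
static int demo_roundtrip(void)
{
    uint8_t src[4]  = { 10, 200, 0, 255 };
    uint8_t pred[4] = { 12, 190, 3, 250 };
    uint8_t out[4];
    int res[4];
    residual_sub(res, src, pred, 4);
    residual_add(out, pred, res, 4);
    return out[0] + out[1] + out[2] + out[3];
}
```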

The parameter x264_dct_function_t of x264_dct_init() is a struct containing the interfaces of the various DCT functions. Its definition is shown below.

  1. typedef struct
  2. {
  3. // pix1 stride = FENC_STRIDE
  4. // pix2 stride = FDEC_STRIDE
  5. // p_dst stride = FDEC_STRIDE
  6. void (*sub4x4_dct) ( dctcoef dct[16], pixel *pix1, pixel *pix2 );
  7. void (*add4x4_idct) ( pixel *p_dst, dctcoef dct[16] );
  8. void (*sub8x8_dct) ( dctcoef dct[4][16], pixel *pix1, pixel *pix2 );
  9. void (*sub8x8_dct_dc)( dctcoef dct[4], pixel *pix1, pixel *pix2 );
  10. void (*add8x8_idct) ( pixel *p_dst, dctcoef dct[4][16] );
  11. void (*add8x8_idct_dc) ( pixel *p_dst, dctcoef dct[4] );
  12. void (*sub8x16_dct_dc)( dctcoef dct[8], pixel *pix1, pixel *pix2 );
  13. void (*sub16x16_dct) ( dctcoef dct[16][16], pixel *pix1, pixel *pix2 );
  14. void (*add16x16_idct)( pixel *p_dst, dctcoef dct[16][16] );
  15. void (*add16x16_idct_dc) ( pixel *p_dst, dctcoef dct[16] );
  16. void (*sub8x8_dct8) ( dctcoef dct[64], pixel *pix1, pixel *pix2 );
  17. void (*add8x8_idct8) ( pixel *p_dst, dctcoef dct[64] );
  18. void (*sub16x16_dct8) ( dctcoef dct[4][64], pixel *pix1, pixel *pix2 );
  19. void (*add16x16_idct8)( pixel *p_dst, dctcoef dct[4][64] );
  20. void (*dct4x4dc) ( dctcoef d[16] );
  21. void (*idct4x4dc)( dctcoef d[16] );
  22. void (*dct2x4dc)( dctcoef dct[8], dctcoef dct4x4[8][16] );
  23. } x264_dct_function_t;
The job of x264_dct_init() is to fill in the function pointers of x264_dct_function_t. There are too many DCT functions to study one by one, so the following sections only analyze a few representative 4x4 cases: the forward transform sub4x4_dct(), the inverse transform add4x4_idct(), the 8x8-block wrapper sub8x8_dct(), the 16x16-block wrapper sub16x16_dct(), and the 4x4 Hadamard transform dct4x4dc().

Background

A brief note on the DCT. Its core idea is to concentrate an image's low-frequency information (large flat areas) in the top-left corner of the coefficient matrix and the high-frequency information in the bottom-right corner, so that quantization can discard the high-frequency coefficients the human eye is least sensitive to. The classic illustration uses a two-dimensional 8x8 transform.

Early codecs used an 8x8 DCT with fractional coefficients. H.264 introduced a 4x4 transform whose coefficients are all integers, which both avoids drift from floating-point rounding and lends itself to fast implementations. (The 4x4 Hadamard transform is usually shown alongside it for comparison.)

The 4x4 integer forward transform is Y = Cf * X * Cf^T, with

Cf = | 1  1  1  1 |
     | 2  1 -1 -2 |
     | 1 -1 -1  1 |
     | 1 -2  2 -1 |

The matrix products can be computed as two passes of a 1-D transform: first a 1-D DCT over each row of the 4x4 block, then a 1-D DCT over each column. Each 1-D transform can in turn be rearranged into a butterfly fast algorithm.

The inverse DCT undoes the forward transform (up to scaling factors that are folded into quantization and dequantization). Its matrix products can likewise be split into two passes of a 1-D inverse transform, row-wise then column-wise, each with its own butterfly structure.

Besides the 4x4 DCT, later revisions of the H.264 standard also added an 8x8 DCT. I have not yet studied that transform, so it is not covered here.

sub4x4_dct()

sub4x4_dct() subtracts two 4x4 blocks to obtain the residual and then applies the forward DCT to it. It is defined in common\dct.c, as shown below.
  1. /*
  2. * 求残差用
  3. * 注意求的是一个“方块”形像素
  4. *
  5. * 参数的含义如下:
  6. * diff:输出的残差数据
  7. * i_size:方块的大小
  8. * pix1:输入数据1
  9. * i_pix1:输入数据1一行像素大小(stride)
  10. * pix2:输入数据2
  11. * i_pix2:输入数据2一行像素大小(stride)
  12. *
  13. */
  14. static inline void pixel_sub_wxh( dctcoef *diff, int i_size,
  15. pixel *pix1, int i_pix1, pixel *pix2, int i_pix2 )
  16. {
  17. for( int y = 0; y < i_size; y++ )
  18. {
  19. for( int x = 0; x < i_size; x++ )
  20. diff[x + y*i_size] = pix1[x] - pix2[x];//求残差
  21. pix1 += i_pix1;//前进到下一行
  22. pix2 += i_pix2;
  23. }
  24. }
  25. //4x4DCT变换
  26. //注意首先获取pix1和pix2两块数据的残差,然后再进行变换
  27. //返回dct[16]
  28. static void sub4x4_dct( dctcoef dct[16], pixel *pix1, pixel *pix2 )
  29. {
  30. dctcoef d[16];
  31. dctcoef tmp[16];
  32. //获取残差数据,存入d[16]
  33. //pix1一般为编码帧(enc)
  34. //pix2一般为重建帧(dec)
  35. pixel_sub_wxh( d, 4, pix1, FENC_STRIDE, pix2, FDEC_STRIDE );
  36. //处理残差d[16]
  37. //蝶形算法:横向4个像素
  38. for( int i = 0; i < 4; i++ )
  39. {
  40. int s03 = d[i*4+0] + d[i*4+3];
  41. int s12 = d[i*4+1] + d[i*4+2];
  42. int d03 = d[i*4+0] - d[i*4+3];
  43. int d12 = d[i*4+1] - d[i*4+2];
  44. tmp[0*4+i] = s03 + s12;
  45. tmp[1*4+i] = 2*d03 + d12;
  46. tmp[2*4+i] = s03 - s12;
  47. tmp[3*4+i] = d03 - 2*d12;
  48. }
  49. //蝶形算法:纵向
  50. for( int i = 0; i < 4; i++ )
  51. {
  52. int s03 = tmp[i*4+0] + tmp[i*4+3];
  53. int s12 = tmp[i*4+1] + tmp[i*4+2];
  54. int d03 = tmp[i*4+0] - tmp[i*4+3];
  55. int d12 = tmp[i*4+1] - tmp[i*4+2];
  56. dct[i*4+0] = s03 + s12;
  57. dct[i*4+1] = 2*d03 + d12;
  58. dct[i*4+2] = s03 - s12;
  59. dct[i*4+3] = d03 - 2*d12;
  60. }
  61. }
从源代码可以看出,sub4x4_dct()首先调用pixel_sub_wxh()求出两个输入图像块的残差,然后使用蝶形快速算法计算残差图像的DCT系数。
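为了直观地验证上述蝶形算法,下面给出一段可以独立编译的小例子(示例代码,并非x264源码,函数名dct4x4_demo等为本文自拟):对一个残差全为1的4x4块做上述变换,每一维的DC增益为4,因此DC系数应为16,其余AC系数应为0。

```c
#include <assert.h>

typedef short dctcoef;

/* 与sub4x4_dct()中相同的两次一维蝶形运算(这里直接输入残差d[16]) */
static void dct4x4_demo(dctcoef dct[16], const dctcoef d[16])
{
    dctcoef tmp[16];
    for (int i = 0; i < 4; i++)      /* 横向蝶形 */
    {
        int s03 = d[i*4+0] + d[i*4+3];
        int s12 = d[i*4+1] + d[i*4+2];
        int d03 = d[i*4+0] - d[i*4+3];
        int d12 = d[i*4+1] - d[i*4+2];
        tmp[0*4+i] = s03 + s12;
        tmp[1*4+i] = 2*d03 + d12;
        tmp[2*4+i] = s03 - s12;
        tmp[3*4+i] = d03 - 2*d12;
    }
    for (int i = 0; i < 4; i++)      /* 纵向蝶形 */
    {
        int s03 = tmp[i*4+0] + tmp[i*4+3];
        int s12 = tmp[i*4+1] + tmp[i*4+2];
        int d03 = tmp[i*4+0] - tmp[i*4+3];
        int d12 = tmp[i*4+1] - tmp[i*4+2];
        dct[i*4+0] = s03 + s12;
        dct[i*4+1] = 2*d03 + d12;
        dct[i*4+2] = s03 - s12;
        dct[i*4+3] = d03 - 2*d12;
    }
}

/* 全1残差块变换后:AC系数应全为0,返回DC系数以便验证 */
int dct4x4_demo_dc_of_ones(void)
{
    dctcoef d[16], dct[16];
    for (int i = 0; i < 16; i++) d[i] = 1;
    dct4x4_demo(dct, d);
    for (int i = 1; i < 16; i++) assert(dct[i] == 0);
    return dct[0];
}
```

该变换省略了归一化缩放,归一化因子被合并进了后续的量化步骤。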

add4x4_idct()

add4x4_idct()可以将残差数据进行DCT反变换,并将变换后得到的残差像素数据叠加到预测数据上。该函数的定义位于common\dct.c,如下所示。
  1. //4x4DCT反变换(“add”代表叠加到已有的像素上)
  2. static void add4x4_idct( pixel *p_dst, dctcoef dct[16] )
  3. {
  4. dctcoef d[16];
  5. dctcoef tmp[16];
  6. for( int i = 0; i < 4; i++ )
  7. {
  8. int s02 = dct[0*4+i] + dct[2*4+i];
  9. int d02 = dct[0*4+i] - dct[2*4+i];
  10. int s13 = dct[1*4+i] + (dct[3*4+i]>>1);
  11. int d13 = (dct[1*4+i]>>1) - dct[3*4+i];
  12. tmp[i*4+0] = s02 + s13;
  13. tmp[i*4+1] = d02 + d13;
  14. tmp[i*4+2] = d02 - d13;
  15. tmp[i*4+3] = s02 - s13;
  16. }
  17. for( int i = 0; i < 4; i++ )
  18. {
  19. int s02 = tmp[0*4+i] + tmp[2*4+i];
  20. int d02 = tmp[0*4+i] - tmp[2*4+i];
  21. int s13 = tmp[1*4+i] + (tmp[3*4+i]>>1);
  22. int d13 = (tmp[1*4+i]>>1) - tmp[3*4+i];
  23. d[0*4+i] = ( s02 + s13 + 32 ) >> 6;
  24. d[1*4+i] = ( d02 + d13 + 32 ) >> 6;
  25. d[2*4+i] = ( d02 - d13 + 32 ) >> 6;
  26. d[3*4+i] = ( s02 - s13 + 32 ) >> 6;
  27. }
  28. for( int y = 0; y < 4; y++ )
  29. {
  30. for( int x = 0; x < 4; x++ )
  31. p_dst[x] = x264_clip_pixel( p_dst[x] + d[y*4+x] );
  32. p_dst += FDEC_STRIDE;
  33. }
  34. }
从源代码可以看出,add4x4_idct()首先采用快速蝶形算法对DCT系数进行DCT反变换后得到残差像素数据,然后再将残差数据叠加到p_dst指向的像素上。需要注意这里是“叠加”而不是“赋值”。
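需要注意sub4x4_dct()与add4x4_idct()并不是直接互逆的:正变换省略的缩放因子被合并进了量化/反量化表。下面的小例子(示例代码,非x264源码,函数名为本文自拟,省略了像素限幅)演示反变换的“叠加”行为:预测像素全为100、系数只有DC=64时,反变换得到的残差恰好全为1,叠加后像素全部变为101。

```c
#include <assert.h>

typedef short dctcoef;
typedef unsigned char pixel;

/* 与add4x4_idct()相同的反变换,dst为4x4、行宽stride的像素块 */
static void idct4x4_demo(pixel *dst, int stride, const dctcoef dct[16])
{
    int d[16], tmp[16];
    for (int i = 0; i < 4; i++)
    {
        int s02 = dct[0*4+i] + dct[2*4+i];
        int d02 = dct[0*4+i] - dct[2*4+i];
        int s13 = dct[1*4+i] + (dct[3*4+i] >> 1);
        int d13 = (dct[1*4+i] >> 1) - dct[3*4+i];
        tmp[i*4+0] = s02 + s13;
        tmp[i*4+1] = d02 + d13;
        tmp[i*4+2] = d02 - d13;
        tmp[i*4+3] = s02 - s13;
    }
    for (int i = 0; i < 4; i++)
    {
        int s02 = tmp[0*4+i] + tmp[2*4+i];
        int d02 = tmp[0*4+i] - tmp[2*4+i];
        int s13 = tmp[1*4+i] + (tmp[3*4+i] >> 1);
        int d13 = (tmp[1*4+i] >> 1) - tmp[3*4+i];
        d[0*4+i] = (s02 + s13 + 32) >> 6;
        d[1*4+i] = (d02 + d13 + 32) >> 6;
        d[2*4+i] = (d02 - d13 + 32) >> 6;
        d[3*4+i] = (s02 - s13 + 32) >> 6;
    }
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            dst[y*stride+x] = (pixel)(dst[y*stride+x] + d[y*4+x]); /* “叠加”而非“赋值” */
}

/* 预测值全为100,系数只有DC=64:反变换后所有像素应为101 */
int idct4x4_demo_dc64(void)
{
    pixel dst[16];
    dctcoef dct[16] = { 64 }; /* 其余元素初始化为0 */
    for (int i = 0; i < 16; i++) dst[i] = 100;
    idct4x4_demo(dst, 4, dct);
    for (int i = 1; i < 16; i++) assert(dst[i] == dst[0]);
    return dst[0];
}
```

对比上一个例子可以看出,全1残差的正变换DC为16,而反变换需要DC=64才能还原出全1的残差——差出的4倍正是由反量化表补回的。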

sub8x8_dct()

sub8x8_dct()可以将两块8x8的图像相减求残差后,进行4x4DCT变换。该函数的定义位于common\dct.c,如下所示。
  1. //8x8块:分解成4个4x4DCT变换,调用4次sub4x4_dct()
  2. //返回dct[4][16]
  3. static void sub8x8_dct( dctcoef dct[4][16], pixel *pix1, pixel *pix2 )
  4. {
  5. /*
  6. * 8x8 宏块被划分为4个4x4子块
  7. *
  8. * +---+---+
  9. * | 0 | 1 |
  10. * +---+---+
  11. * | 2 | 3 |
  12. * +---+---+
  13. *
  14. */
  15. sub4x4_dct( dct[0], &pix1[0], &pix2[0] );
  16. sub4x4_dct( dct[1], &pix1[4], &pix2[4] );
  17. sub4x4_dct( dct[2], &pix1[4*FENC_STRIDE+0], &pix2[4*FDEC_STRIDE+0] );
  18. sub4x4_dct( dct[3], &pix1[4*FENC_STRIDE+4], &pix2[4*FDEC_STRIDE+4] );
  19. }
从源代码可以看出,sub8x8_dct()将8x8的图像块分成4个4x4的图像块,对每个子块分别调用了一次sub4x4_dct()。

sub16x16_dct()

sub16x16_dct()可以将两块16x16的图像相减求残差后,进行4x4DCT变换。该函数的定义位于common\dct.c,如下所示。
  1. //16x16块:分解成4个8x8的块做DCT变换,调用4次sub8x8_dct()
  2. //返回dct[16][16]
  3. static void sub16x16_dct( dctcoef dct[16][16], pixel *pix1, pixel *pix2 )
  4. {
  5. /*
  6. * 16x16 宏块被划分为4个8x8子块
  7. *
  8. * +--------+--------+
  9. * | | |
  10. * | 0 | 1 |
  11. * | | |
  12. * +--------+--------+
  13. * | | |
  14. * | 2 | 3 |
  15. * | | |
  16. * +--------+--------+
  17. *
  18. */
  19. sub8x8_dct( &dct[ 0], &pix1[0], &pix2[0] ); //0
  20. sub8x8_dct( &dct[ 4], &pix1[8], &pix2[8] ); //1
  21. sub8x8_dct( &dct[ 8], &pix1[8*FENC_STRIDE+0], &pix2[8*FDEC_STRIDE+0] ); //2
  22. sub8x8_dct( &dct[12], &pix1[8*FENC_STRIDE+8], &pix2[8*FDEC_STRIDE+8] ); //3
  23. }
从源代码可以看出,sub16x16_dct()将16x16的图像块分成4个8x8的图像块,分别调用了sub8x8_dct()。而sub8x8_dct()实际上又调用了4次sub4x4_dct()。所以可以得知,不论sub16x16_dct()、sub8x8_dct()还是sub4x4_dct(),本质都是进行4x4DCT。

dct4x4dc()

dct4x4dc()可以将输入的4x4图像块进行Hadamard变换。该函数的定义位于common\dct.c,如下所示。
  1. //Hadamard变换
  2. static void dct4x4dc( dctcoef d[16] )
  3. {
  4. dctcoef tmp[16];
  5. //蝶形算法:横向的4个像素
  6. for( int i = 0; i < 4; i++ )
  7. {
  8. int s01 = d[i*4+0] + d[i*4+1];
  9. int d01 = d[i*4+0] - d[i*4+1];
  10. int s23 = d[i*4+2] + d[i*4+3];
  11. int d23 = d[i*4+2] - d[i*4+3];
  12. tmp[0*4+i] = s01 + s23;
  13. tmp[1*4+i] = s01 - s23;
  14. tmp[2*4+i] = d01 - d23;
  15. tmp[3*4+i] = d01 + d23;
  16. }
  17. //蝶形算法:纵向
  18. for( int i = 0; i < 4; i++ )
  19. {
  20. int s01 = tmp[i*4+0] + tmp[i*4+1];
  21. int d01 = tmp[i*4+0] - tmp[i*4+1];
  22. int s23 = tmp[i*4+2] + tmp[i*4+3];
  23. int d23 = tmp[i*4+2] - tmp[i*4+3];
  24. d[i*4+0] = ( s01 + s23 + 1 ) >> 1;
  25. d[i*4+1] = ( s01 - s23 + 1 ) >> 1;
  26. d[i*4+2] = ( d01 - d23 + 1 ) >> 1;
  27. d[i*4+3] = ( d01 + d23 + 1 ) >> 1;
  28. }
  29. }
从源代码可以看出,dct4x4dc()实现了Hadamard快速蝶形算法。
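用一个小例子可以验证该变换的增益(示例代码,非x264源码,函数名为本文自拟):输入全为1时,两维±1蝶形的增益为16,而第二遍蝶形末尾的“(…+1)>>1”相当于除以2,因此DC输出为16/2=8,其余系数为0。

```c
#include <assert.h>

typedef short dctcoef;

/* 与dct4x4dc()相同的Hadamard蝶形运算(原地变换) */
static void hadamard4x4dc_demo(dctcoef d[16])
{
    dctcoef tmp[16];
    for (int i = 0; i < 4; i++)      /* 横向蝶形 */
    {
        int s01 = d[i*4+0] + d[i*4+1];
        int d01 = d[i*4+0] - d[i*4+1];
        int s23 = d[i*4+2] + d[i*4+3];
        int d23 = d[i*4+2] - d[i*4+3];
        tmp[0*4+i] = s01 + s23;
        tmp[1*4+i] = s01 - s23;
        tmp[2*4+i] = d01 - d23;
        tmp[3*4+i] = d01 + d23;
    }
    for (int i = 0; i < 4; i++)      /* 纵向蝶形,带(…+1)>>1归一化 */
    {
        int s01 = tmp[i*4+0] + tmp[i*4+1];
        int d01 = tmp[i*4+0] - tmp[i*4+1];
        int s23 = tmp[i*4+2] + tmp[i*4+3];
        int d23 = tmp[i*4+2] - tmp[i*4+3];
        d[i*4+0] = (s01 + s23 + 1) >> 1;
        d[i*4+1] = (s01 - s23 + 1) >> 1;
        d[i*4+2] = (d01 - d23 + 1) >> 1;
        d[i*4+3] = (d01 + d23 + 1) >> 1;
    }
}

/* 全1输入:AC应全为0,返回DC以便验证 */
int hadamard_demo_dc_of_ones(void)
{
    dctcoef d[16];
    for (int i = 0; i < 16; i++) d[i] = 1;
    hadamard4x4dc_demo(d);
    for (int i = 1; i < 16; i++) assert(d[i] == 0);
    return d[0];
}
```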

x264_mc_init()

x264_mc_init()用于初始化运动补偿相关的汇编函数。该函数的定义位于common\mc.c,如下所示。
  1. //运动补偿
  2. void x264_mc_init( int cpu, x264_mc_functions_t *pf, int cpu_independent )
  3. {
  4. //亮度运动补偿
  5. pf->mc_luma = mc_luma;
  6. //获得匹配块
  7. pf->get_ref = get_ref;
  8. pf->mc_chroma = mc_chroma;
  9. //求平均
  10. pf->avg[PIXEL_16x16]= pixel_avg_16x16;
  11. pf->avg[PIXEL_16x8] = pixel_avg_16x8;
  12. pf->avg[PIXEL_8x16] = pixel_avg_8x16;
  13. pf->avg[PIXEL_8x8] = pixel_avg_8x8;
  14. pf->avg[PIXEL_8x4] = pixel_avg_8x4;
  15. pf->avg[PIXEL_4x16] = pixel_avg_4x16;
  16. pf->avg[PIXEL_4x8] = pixel_avg_4x8;
  17. pf->avg[PIXEL_4x4] = pixel_avg_4x4;
  18. pf->avg[PIXEL_4x2] = pixel_avg_4x2;
  19. pf->avg[PIXEL_2x8] = pixel_avg_2x8;
  20. pf->avg[PIXEL_2x4] = pixel_avg_2x4;
  21. pf->avg[PIXEL_2x2] = pixel_avg_2x2;
  22. //加权相关
  23. pf->weight = x264_mc_weight_wtab;
  24. pf->offsetadd = x264_mc_weight_wtab;
  25. pf->offsetsub = x264_mc_weight_wtab;
  26. pf->weight_cache = x264_weight_cache;
  27. //拷贝像素-只包含了方形的
  28. pf->copy_16x16_unaligned = mc_copy_w16;
  29. pf->copy[PIXEL_16x16] = mc_copy_w16;
  30. pf->copy[PIXEL_8x8] = mc_copy_w8;
  31. pf->copy[PIXEL_4x4] = mc_copy_w4;
  32. pf->store_interleave_chroma = store_interleave_chroma;
  33. pf->load_deinterleave_chroma_fenc = load_deinterleave_chroma_fenc;
  34. pf->load_deinterleave_chroma_fdec = load_deinterleave_chroma_fdec;
  35. //拷贝像素-不论像素块大小
  36. pf->plane_copy = x264_plane_copy_c;
  37. pf->plane_copy_interleave = x264_plane_copy_interleave_c;
  38. pf->plane_copy_deinterleave = x264_plane_copy_deinterleave_c;
  39. pf->plane_copy_deinterleave_rgb = x264_plane_copy_deinterleave_rgb_c;
  40. pf->plane_copy_deinterleave_v210 = x264_plane_copy_deinterleave_v210_c;
  41. //关键:半像素内插
  42. pf->hpel_filter = hpel_filter;
  43. //几个空函数
  44. pf->prefetch_fenc_420 = prefetch_fenc_null;
  45. pf->prefetch_fenc_422 = prefetch_fenc_null;
  46. pf->prefetch_ref = prefetch_ref_null;
  47. pf->memcpy_aligned = memcpy;
  48. pf->memzero_aligned = memzero_aligned;
  49. //降低分辨率-线性内插(不是半像素内插)
  50. pf->frame_init_lowres_core = frame_init_lowres_core;
  51. pf->integral_init4h = integral_init4h;
  52. pf->integral_init8h = integral_init8h;
  53. pf->integral_init4v = integral_init4v;
  54. pf->integral_init8v = integral_init8v;
  55. pf->mbtree_propagate_cost = mbtree_propagate_cost;
  56. pf->mbtree_propagate_list = mbtree_propagate_list;
  57. //各种汇编版本
  58. #if HAVE_MMX
  59. x264_mc_init_mmx( cpu, pf );
  60. #endif
  61. #if HAVE_ALTIVEC
  62. if( cpu&X264_CPU_ALTIVEC )
  63. x264_mc_altivec_init( pf );
  64. #endif
  65. #if HAVE_ARMV6
  66. x264_mc_init_arm( cpu, pf );
  67. #endif
  68. #if ARCH_AARCH64
  69. x264_mc_init_aarch64( cpu, pf );
  70. #endif
  71. if( cpu_independent )
  72. {
  73. pf->mbtree_propagate_cost = mbtree_propagate_cost;
  74. pf->mbtree_propagate_list = mbtree_propagate_list;
  75. }
  76. }

从源代码可以看出,x264_mc_init()中包含了大量的像素内插、拷贝、求平均的函数。这些函数都是用于在H.264编码过程中进行运动估计和运动补偿的。x264_mc_init()的参数x264_mc_functions_t是一个结构体,其中包含了运动补偿函数相关的函数接口。x264_mc_functions_t的定义如下。
  1. typedef struct
  2. {
  3. void (*mc_luma)( pixel *dst, intptr_t i_dst, pixel **src, intptr_t i_src,
  4. int mvx, int mvy, int i_width, int i_height, const x264_weight_t *weight );
  5. /* may round up the dimensions if they're not a power of 2 */
  6. pixel* (*get_ref)( pixel *dst, intptr_t *i_dst, pixel **src, intptr_t i_src,
  7. int mvx, int mvy, int i_width, int i_height, const x264_weight_t *weight );
  8. /* mc_chroma may write up to 2 bytes of garbage to the right of dst,
  9. * so it must be run from left to right. */
  10. void (*mc_chroma)( pixel *dstu, pixel *dstv, intptr_t i_dst, pixel *src, intptr_t i_src,
  11. int mvx, int mvy, int i_width, int i_height );
  12. void (*avg[12])( pixel *dst, intptr_t dst_stride, pixel *src1, intptr_t src1_stride,
  13. pixel *src2, intptr_t src2_stride, int i_weight );
  14. /* only 16x16, 8x8, and 4x4 defined */
  15. void (*copy[7])( pixel *dst, intptr_t dst_stride, pixel *src, intptr_t src_stride, int i_height );
  16. void (*copy_16x16_unaligned)( pixel *dst, intptr_t dst_stride, pixel *src, intptr_t src_stride, int i_height );
  17. void (*store_interleave_chroma)( pixel *dst, intptr_t i_dst, pixel *srcu, pixel *srcv, int height );
  18. void (*load_deinterleave_chroma_fenc)( pixel *dst, pixel *src, intptr_t i_src, int height );
  19. void (*load_deinterleave_chroma_fdec)( pixel *dst, pixel *src, intptr_t i_src, int height );
  20. void (*plane_copy)( pixel *dst, intptr_t i_dst, pixel *src, intptr_t i_src, int w, int h );
  21. void (*plane_copy_interleave)( pixel *dst, intptr_t i_dst, pixel *srcu, intptr_t i_srcu,
  22. pixel *srcv, intptr_t i_srcv, int w, int h );
  23. /* may write up to 15 pixels off the end of each plane */
  24. void (*plane_copy_deinterleave)( pixel *dstu, intptr_t i_dstu, pixel *dstv, intptr_t i_dstv,
  25. pixel *src, intptr_t i_src, int w, int h );
  26. void (*plane_copy_deinterleave_rgb)( pixel *dsta, intptr_t i_dsta, pixel *dstb, intptr_t i_dstb,
  27. pixel *dstc, intptr_t i_dstc, pixel *src, intptr_t i_src, int pw, int w, int h );
  28. void (*plane_copy_deinterleave_v210)( pixel *dsty, intptr_t i_dsty,
  29. pixel *dstc, intptr_t i_dstc,
  30. uint32_t *src, intptr_t i_src, int w, int h );
  31. void (*hpel_filter)( pixel *dsth, pixel *dstv, pixel *dstc, pixel *src,
  32. intptr_t i_stride, int i_width, int i_height, int16_t *buf );
  33. /* prefetch the next few macroblocks of fenc or fdec */
  34. void (*prefetch_fenc) ( pixel *pix_y, intptr_t stride_y, pixel *pix_uv, intptr_t stride_uv, int mb_x );
  35. void (*prefetch_fenc_420)( pixel *pix_y, intptr_t stride_y, pixel *pix_uv, intptr_t stride_uv, int mb_x );
  36. void (*prefetch_fenc_422)( pixel *pix_y, intptr_t stride_y, pixel *pix_uv, intptr_t stride_uv, int mb_x );
  37. /* prefetch the next few macroblocks of a hpel reference frame */
  38. void (*prefetch_ref)( pixel *pix, intptr_t stride, int parity );
  39. void *(*memcpy_aligned)( void *dst, const void *src, size_t n );
  40. void (*memzero_aligned)( void *dst, size_t n );
  41. /* successive elimination prefilter */
  42. void (*integral_init4h)( uint16_t *sum, pixel *pix, intptr_t stride );
  43. void (*integral_init8h)( uint16_t *sum, pixel *pix, intptr_t stride );
  44. void (*integral_init4v)( uint16_t *sum8, uint16_t *sum4, intptr_t stride );
  45. void (*integral_init8v)( uint16_t *sum8, intptr_t stride );
  46. void (*frame_init_lowres_core)( pixel *src0, pixel *dst0, pixel *dsth, pixel *dstv, pixel *dstc,
  47. intptr_t src_stride, intptr_t dst_stride, int width, int height );
  48. weight_fn_t *weight;
  49. weight_fn_t *offsetadd;
  50. weight_fn_t *offsetsub;
  51. void (*weight_cache)( x264_t *, x264_weight_t * );
  52. void (*mbtree_propagate_cost)( int16_t *dst, uint16_t *propagate_in, uint16_t *intra_costs,
  53. uint16_t *inter_costs, uint16_t *inv_qscales, float *fps_factor, int len );
  54. void (*mbtree_propagate_list)( x264_t *h, uint16_t *ref_costs, int16_t (*mvs)[2],
  55. int16_t *propagate_amount, uint16_t *lowres_costs,
  56. int bipred_weight, int mb_y, int len, int list );
  57. } x264_mc_functions_t;
x264_mc_init()的工作就是对x264_mc_functions_t中的函数指针进行赋值。由于运动估计和运动补偿在x264中属于相对复杂的环节,其中许多函数的作用很难三言两语表述出来,因此只举一个相对简单的例子——半像素内插函数hpel_filter()。

相关知识简述

简单记录一下半像素插值的知识。《H.264标准》中规定,运动估计为1/4像素精度。因此在H.264编码和解码的过程中,需要将画面中的像素进行插值——简单地说就是把原先的1个像素点拓展成4x4一共16个点。下图显示了H.264编码和解码过程中像素插值情况。可以看出原先的G点的右下方通过插值的方式产生了a、b、c、d等一共16个点。
如图所示,1/4像素内插一般分成两步:
(1)半像素内插。这一步通过6抽头滤波器获得5个半像素点。
(2)线性内插。这一步通过简单的线性内插获得剩余的1/4像素点。

图中半像素内插点为b、m、h、s、j五个点。半像素内插方法是对整像素点进行6 抽头滤波得出,滤波器的权重为(1/32, -5/32, 5/8, 5/8, -5/32, 1/32)。例如b的计算公式为:

b=round( (E - 5F + 20G + 20H - 5I + J ) / 32)

剩下几个半像素点的计算关系如下:
m:由B、D、H、N、S、U计算
h:由A、C、G、M、R、T计算
s:由K、L、M、N、P、Q计算
j:由cc、dd、h、m、ee、ff计算。需要注意j点的运算量比较大,因为cc、dd、ee、ff都需要通过半像素内插方法进行计算。
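根据上面b点的公式可以写一个独立的小函数验证6抽头滤波的行为(示例代码,函数名halfpel_6tap为本文自拟,对应x264中的TAPFILTER再加舍入与限幅):

```c
#include <assert.h>

/* b = Clip( (E - 5F + 20G + 20H - 5I + J + 16) >> 5 ) */
int halfpel_6tap(int E, int F, int G, int H, int I, int J)
{
    int v = (E - 5*F + 20*G + 20*H - 5*I + J + 16) >> 5;
    if (v < 0)   v = 0;      /* 限幅到8bit像素范围 */
    if (v > 255) v = 255;
    return v;
}
```

平坦区域(6个点全为100)的插值结果仍为100;线性渐变的输入(90,95,100,105,110,115)得到103,非常接近相邻两点的均值102.5。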

在获得半像素点之后,就可以通过简单的线性内插获得1/4像素内插点了。1/4像素内插的方式如下图所示。例如图中a点的计算公式如下:

a=round( (G+b)/2 )

在这里有一点需要注意:位于4个角的e、g、p、r四个点并不是通过j点计算的,而是通过b、h、s、m四个半像素点计算的。



hpel_filter()

hpel_filter()用于进行半像素插值。该函数的定义位于common\mc.c,如下所示。
  1. //半像素插值公式
  2. //b= (E - 5F + 20G + 20H - 5I + J)/32
  3. // x
  4. //d取1,水平滤波器;d取stride,垂直滤波器(这里没有除以32)
  5. #define TAPFILTER(pix, d) ((pix)[x-2*d] + (pix)[x+3*d] - 5*((pix)[x-d] + (pix)[x+2*d]) + 20*((pix)[x] + (pix)[x+d]))
  6. /*
  7. * 半像素插值
  8. * dsth:水平滤波得到的半像素点(aa,bb,b,s,gg,hh)
  9. * dstv:垂直滤波得到的半像素点(cc,dd,h,m,ee,ff)
  10. * dstc:“水平+垂直”滤波得到的位于4个像素中间的半像素点(j)
  11. *
  12. * 半像素插值示意图如下:
  13. *
  14. * A aa B
  15. *
  16. * C bb D
  17. *
  18. * E F G b H I J
  19. *
  20. * cc dd h j m ee ff
  21. *
  22. * K L M s N P Q
  23. *
  24. * R gg S
  25. *
  26. * T hh U
  27. *
  28. * 计算公式如下:
  29. * b=round( (E - 5F + 20G + 20H - 5I + J ) / 32)
  30. *
  31. * 剩下几个半像素点的计算关系如下:
  32. * m:由B、D、H、N、S、U计算
  33. * h:由A、C、G、M、R、T计算
  34. * s:由K、L、M、N、P、Q计算
  35. * j:由cc、dd、h、m、ee、ff计算。需要注意j点的运算量比较大,因为cc、dd、ee、ff都需要通过半像素内插方法进行计算。
  36. *
  37. */
  38. static void hpel_filter( pixel *dsth, pixel *dstv, pixel *dstc, pixel *src,
  39. intptr_t stride, int width, int height, int16_t *buf )
  40. {
  41. const int pad = (BIT_DEPTH > 9) ? (-10 * PIXEL_MAX) : 0;
  42. /*
  43. * 几种半像素点之间的位置关系
  44. *
  45. * X: 像素点
  46. * H:水平滤波半像素点
  47. * V:垂直滤波半像素点
  48. * C: 中间位置半像素点
  49. *
  50. * X H X X X
  51. *
  52. * V C
  53. *
  54. * X X X X
  55. *
  56. *
  57. *
  58. * X X X X
  59. *
  60. */
  61. //一行一行处理
  62. for( int y = 0; y < height; y++ )
  63. {
  64. //一个一个点处理
  65. //每个整像素点都对应h,v,c三个半像素点
  66. //v
  67. for( int x = -2; x < width+3; x++ )//垂直滤波点(cc,dd,h,m,ee,ff),结果存入buf
  68. {
  69. //垂直滤波半像素点
  70. int v = TAPFILTER(src,stride);
  71. dstv[x] = x264_clip_pixel( (v + 16) >> 5 );
  72. /* transform v for storage in a 16-bit integer */
  73. //保存未经舍入的垂直滤波结果,供后面计算dstc(j点)时再做一次水平滤波使用
  74. buf[x+2] = v + pad;
  75. }
  76. //c
  77. for( int x = 0; x < width; x++ )
  78. dstc[x] = x264_clip_pixel( (TAPFILTER(buf+2,1) - 32*pad + 512) >> 10 );//四个相邻像素中间的半像素点
  79. //h
  80. for( int x = 0; x < width; x++ )
  81. dsth[x] = x264_clip_pixel( (TAPFILTER(src,1) + 16) >> 5 );//水平滤波半像素点
  82. dsth += stride;
  83. dstv += stride;
  84. dstc += stride;
  85. src += stride;
  86. }
  87. }

从源代码可以看出,hpel_filter()中包含了一个宏TAPFILTER()用来完成半像素点像素值的计算。在完成半像素插值工作后,dsth中存储的是经过水平插值后的半像素点,dstv中存储的是经过垂直插值后的半像素点,dstc中存储的是位于4个相邻像素点中间位置的半像素点。这三块内存中的点的位置关系如下图所示(灰色的点是整像素点)。



x264_quant_init()

x264_quant_init()初始化量化和反量化相关的汇编函数。该函数的定义位于common\quant.c,如下所示。
  1. //量化
  2. void x264_quant_init( x264_t *h, int cpu, x264_quant_function_t *pf )
  3. {
  4. //这个好像是针对8x8DCT的
  5. pf->quant_8x8 = quant_8x8;
  6. //量化4x4=16个
  7. pf->quant_4x4 = quant_4x4;
  8. //注意:处理4个4x4的块
  9. pf->quant_4x4x4 = quant_4x4x4;
  10. //Intra16x16中,16个DC系数做Hadamard变换后再对它们量化
  11. pf->quant_4x4_dc = quant_4x4_dc;
  12. pf->quant_2x2_dc = quant_2x2_dc;
  13. //反量化4x4=16个
  14. pf->dequant_4x4 = dequant_4x4;
  15. pf->dequant_4x4_dc = dequant_4x4_dc;
  16. pf->dequant_8x8 = dequant_8x8;
  17. pf->idct_dequant_2x4_dc = idct_dequant_2x4_dc;
  18. pf->idct_dequant_2x4_dconly = idct_dequant_2x4_dconly;
  19. pf->optimize_chroma_2x2_dc = optimize_chroma_2x2_dc;
  20. pf->optimize_chroma_2x4_dc = optimize_chroma_2x4_dc;
  21. pf->denoise_dct = x264_denoise_dct;
  22. pf->decimate_score15 = x264_decimate_score15;
  23. pf->decimate_score16 = x264_decimate_score16;
  24. pf->decimate_score64 = x264_decimate_score64;
  25. pf->coeff_last4 = x264_coeff_last4;
  26. pf->coeff_last8 = x264_coeff_last8;
  27. pf->coeff_last[ DCT_LUMA_AC] = x264_coeff_last15;
  28. pf->coeff_last[ DCT_LUMA_4x4] = x264_coeff_last16;
  29. pf->coeff_last[ DCT_LUMA_8x8] = x264_coeff_last64;
  30. pf->coeff_level_run4 = x264_coeff_level_run4;
  31. pf->coeff_level_run8 = x264_coeff_level_run8;
  32. pf->coeff_level_run[ DCT_LUMA_AC] = x264_coeff_level_run15;
  33. pf->coeff_level_run[ DCT_LUMA_4x4] = x264_coeff_level_run16;
  34. #if HIGH_BIT_DEPTH
  35. #if HAVE_MMX
  36. INIT_TRELLIS( sse2 );
  37. if( cpu&X264_CPU_MMX2 )
  38. {
  39. #if ARCH_X86
  40. pf->denoise_dct = x264_denoise_dct_mmx;
  41. pf->decimate_score15 = x264_decimate_score15_mmx2;
  42. pf->decimate_score16 = x264_decimate_score16_mmx2;
  43. pf->decimate_score64 = x264_decimate_score64_mmx2;
  44. pf->coeff_last8 = x264_coeff_last8_mmx2;
  45. pf->coeff_last[ DCT_LUMA_AC] = x264_coeff_last15_mmx2;
  46. pf->coeff_last[ DCT_LUMA_4x4] = x264_coeff_last16_mmx2;
  47. pf->coeff_last[ DCT_LUMA_8x8] = x264_coeff_last64_mmx2;
  48. pf->coeff_level_run8 = x264_coeff_level_run8_mmx2;
  49. pf->coeff_level_run[ DCT_LUMA_AC] = x264_coeff_level_run15_mmx2;
  50. pf->coeff_level_run[ DCT_LUMA_4x4] = x264_coeff_level_run16_mmx2;
  51. #endif
  52. pf->coeff_last4 = x264_coeff_last4_mmx2;
  53. pf->coeff_level_run4 = x264_coeff_level_run4_mmx2;
  54. if( cpu&X264_CPU_LZCNT )
  55. pf->coeff_level_run4 = x264_coeff_level_run4_mmx2_lzcnt;
  56. }
  57. //此处省略大量的X86、ARM等平台的汇编函数初始化代码
  58. }

从源代码可以看出,x264_quant_init()初始化了一系列量化相关的函数。它的输入参数x264_quant_function_t是一个结构体,其中包含了和量化相关的各种函数指针。x264_quant_function_t的定义如下所示。
  1. typedef struct
  2. {
  3. int (*quant_8x8) ( dctcoef dct[64], udctcoef mf[64], udctcoef bias[64] );
  4. int (*quant_4x4) ( dctcoef dct[16], udctcoef mf[16], udctcoef bias[16] );
  5. int (*quant_4x4x4)( dctcoef dct[4][16], udctcoef mf[16], udctcoef bias[16] );
  6. int (*quant_4x4_dc)( dctcoef dct[16], int mf, int bias );
  7. int (*quant_2x2_dc)( dctcoef dct[4], int mf, int bias );
  8. void (*dequant_8x8)( dctcoef dct[64], int dequant_mf[6][64], int i_qp );
  9. void (*dequant_4x4)( dctcoef dct[16], int dequant_mf[6][16], int i_qp );
  10. void (*dequant_4x4_dc)( dctcoef dct[16], int dequant_mf[6][16], int i_qp );
  11. void (*idct_dequant_2x4_dc)( dctcoef dct[8], dctcoef dct4x4[8][16], int dequant_mf[6][16], int i_qp );
  12. void (*idct_dequant_2x4_dconly)( dctcoef dct[8], int dequant_mf[6][16], int i_qp );
  13. int (*optimize_chroma_2x2_dc)( dctcoef dct[4], int dequant_mf );
  14. int (*optimize_chroma_2x4_dc)( dctcoef dct[8], int dequant_mf );
  15. void (*denoise_dct)( dctcoef *dct, uint32_t *sum, udctcoef *offset, int size );
  16. int (*decimate_score15)( dctcoef *dct );
  17. int (*decimate_score16)( dctcoef *dct );
  18. int (*decimate_score64)( dctcoef *dct );
  19. int (*coeff_last[14])( dctcoef *dct );
  20. int (*coeff_last4)( dctcoef *dct );
  21. int (*coeff_last8)( dctcoef *dct );
  22. int (*coeff_level_run[13])( dctcoef *dct, x264_run_level_t *runlevel );
  23. int (*coeff_level_run4)( dctcoef *dct, x264_run_level_t *runlevel );
  24. int (*coeff_level_run8)( dctcoef *dct, x264_run_level_t *runlevel );
  25. #define TRELLIS_PARAMS const int *unquant_mf, const uint8_t *zigzag, int lambda2,\
  26. int last_nnz, dctcoef *coefs, dctcoef *quant_coefs, dctcoef *dct,\
  27. uint8_t *cabac_state_sig, uint8_t *cabac_state_last,\
  28. uint64_t level_state0, uint16_t level_state1
  29. int (*trellis_cabac_4x4)( TRELLIS_PARAMS, int b_ac );
  30. int (*trellis_cabac_8x8)( TRELLIS_PARAMS, int b_interlaced );
  31. int (*trellis_cabac_4x4_psy)( TRELLIS_PARAMS, int b_ac, dctcoef *fenc_dct, int psy_trellis );
  32. int (*trellis_cabac_8x8_psy)( TRELLIS_PARAMS, int b_interlaced, dctcoef *fenc_dct, int psy_trellis );
  33. int (*trellis_cabac_dc)( TRELLIS_PARAMS, int num_coefs );
  34. int (*trellis_cabac_chroma_422_dc)( TRELLIS_PARAMS );
  35. } x264_quant_function_t;
x264_quant_init()的工作就是对x264_quant_function_t中的函数指针进行赋值。下文举例分析其中2个函数:4x4矩阵量化函数quant_4x4(),4个4x4矩阵量化函数quant_4x4x4()。

相关知识简述

简单记录一下量化的概念。量化是H.264视频压缩编码中对视频质量影响最大的地方,也是会导致“信息丢失”的地方。量化的原理可以表示为下面公式:

FQ=round(y/Qstep)

其中,y 为输入样本点编码,Qstep为量化步长,FQ 为y 的量化值,round()为取整函数(其输出为与输入实数最近的整数)。其相反过程,即反量化为:

y’=FQ*Qstep
如果Qstep较大,则量化值FQ取值较小,其相应的编码长度较小,但反量化时会损失较多的图像细节信息。简而言之,Qstep越大,视频压缩编码后体积越小,视频质量越差。
在H.264 中,量化步长Qstep 共有52 个值,如下表所示。其中QP 是量化参数,是量化步长的序号。当QP 取最小值0 时代表最精细的量化,当QP 取最大值51 时代表最粗糙的量化。QP 每增加6,Qstep 增加一倍。
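“QP每增加6,Qstep增加一倍”的关系可以用一个小函数验证(示例代码,函数名为本文自拟;QP=0..5对应的Qstep取值0.625、0.6875、0.8125、0.875、1、1.125来自《H.264标准》):

```c
#include <assert.h>

/* QP=0..5对应的Qstep,放大16倍存为整数:0.625,0.6875,0.8125,0.875,1.0,1.125 */
static const int qstep_x16[6] = { 10, 11, 13, 14, 16, 18 };

/* 返回Qstep*16:QP对6取余查表,QP每增加6左移1位(即翻倍) */
int qstep_times16(int qp)
{
    return qstep_x16[qp % 6] << (qp / 6);
}
```

例如qstep_times16(0)=10,即Qstep=0.625;qstep_times16(51)=14<<8=3584,即QP取最大值51时Qstep=224。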
 

《H.264标准》中规定,量化过程除了完成本职工作外,还需要完成它前一步DCT变换中“系数相乘”的工作。这一步骤的推导过程不再记录,直接给出最终的公式(这个公式完全为整数运算,同时避免了除法的使用):

|Zij| = (|Wij|*MF + f)>>qbits

sign(Zij) = sign (Wij)

其中:
sign()为符号函数。
Wij为DCT变换后的系数。
MF的值如下表所示。表中只列出对应QP 值为0 到5 的MF 值。QP大于6之后,将QP实行对6取余数操作,再找到MF的值。
qbits计算公式为“qbits = 15 + floor(QP/6)”。即它的值随QP 值每增加6 而增加1。
f 是偏移量(用于改善恢复图像的视觉效果)。对帧内预测图像块取2^qbits/3,对帧间预测图像块取2^qbits/6。
 
为了更形象的显示MF的取值,做了下面一张示意图。图中深蓝色代表MF取值较大的点,而浅蓝色代表MF取值较小的点。
 
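以QP=0、矩阵(0,0)位置为例做一次完整的计算(示例代码,函数名quant_demo为本文自拟;MF=13107、qbits=15取自《H.264标准》的表格,帧内块的f=2^15/3≈10922):

```c
#include <assert.h>

/* |Z| = (|W|*MF + f) >> qbits,sign(Z) = sign(W) */
int quant_demo(int W, int MF, int f, int qbits)
{
    int sign = W < 0 ? -1 : 1;
    int absW = W < 0 ? -W : W;
    return sign * ((absW * MF + f) >> qbits);
}
```

quant_demo(100, 13107, 10922, 15)的结果为40。可以用实数运算验证:(0,0)位置DCT的归一化因子为1/4,QP=0时Qstep=0.625,100×0.25÷0.625=40,与整数公式的结果一致。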

quant_4x4()

quant_4x4()用于对4x4的DCT残差矩阵进行量化。该函数的定义位于common\quant.c,如下所示。
  1. //4x4量化
  2. //输入输出都是dct[16]
  3. static int quant_4x4( dctcoef dct[16], udctcoef mf[16], udctcoef bias[16] )
  4. {
  5. int nz = 0;
  6. //循环16个元素
  7. for( int i = 0; i < 16; i++ )
  8. QUANT_ONE( dct[i], mf[i], bias[i] );
  9. return !!nz;
  10. }
可以看出quant_4x4()循环16次调用QUANT_ONE()完成量化工作,并将DCT系数值、MF值、bias偏移值直接传递给该宏。

QUANT_ONE()

QUANT_ONE()完成了一个DCT系数的量化工作,它的定义如下。
  1. //量化1个元素
  2. #define QUANT_ONE( coef, mf, f ) \
  3. { \
  4. if( (coef) > 0 ) \
  5. (coef) = (f + (coef)) * (mf) >> 16; \
  6. else \
  7. (coef) = - ((f - (coef)) * (mf) >> 16); \
  8. nz |= (coef); \
  9. }
从QUANT_ONE()的定义可以看出,它实现了上文提到的H.264标准中的量化公式。
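QUANT_ONE()中固定右移16位,说明x264把MF和偏移量都预先缩放成了16bit定点数(因此这里的mf、f并不直接等于前文公式中的MF、f)。下面的小例子(示例代码,函数名为本文自拟)取mf=1<<16(相当于乘1)来观察该宏对正负系数的对称处理:

```c
#include <assert.h>

/* 与QUANT_ONE()相同的量化运算:量化作用在幅度上,符号单独保留 */
int quant_one_demo(int coef, int mf, int f)
{
    if (coef > 0)
        coef = (f + coef) * mf >> 16;
    else
        coef = -((f - coef) * mf >> 16);
    return coef;
}
```

例如quant_one_demo(7, 1<<16, 0)为7,quant_one_demo(-7, 1<<16, 0)为-7。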

quant_4x4x4()

quant_4x4x4()用于对4个4x4的DCT残差矩阵进行量化。该函数的定义位于common\quant.c,如下所示。
  1. //处理4个4x4量化
  2. //输入输出都是dct[4][16]
  3. static int quant_4x4x4( dctcoef dct[4][16], udctcoef mf[16], udctcoef bias[16] )
  4. {
  5. int nza = 0;
  6. //处理4个
  7. for( int j = 0; j < 4; j++ )
  8. {
  9. int nz = 0;
  10. //量化
  11. for( int i = 0; i < 16; i++ )
  12. QUANT_ONE( dct[j][i], mf[i], bias[i] );
  13. nza |= (!!nz)<<j;
  14. }
  15. return nza;
  16. }
从quant_4x4x4()的定义可以看出,该函数相当于调用了4次quant_4x4()函数。

x264_deblock_init()

x264_deblock_init()用于初始化去块效应滤波器相关的汇编函数。该函数的定义位于common\deblock.c,如下所示。
  1. //去块效应滤波
  2. void x264_deblock_init( int cpu, x264_deblock_function_t *pf, int b_mbaff )
  3. {
  4. //注意:标记“v”的垂直滤波器是处理水平边界用的
  5. //亮度-普通滤波器-边界强度Bs=1,2,3
  6. pf->deblock_luma[1] = deblock_v_luma_c;
  7. pf->deblock_luma[0] = deblock_h_luma_c;
  8. //色度的
  9. pf->deblock_chroma[1] = deblock_v_chroma_c;
  10. pf->deblock_h_chroma_420 = deblock_h_chroma_c;
  11. pf->deblock_h_chroma_422 = deblock_h_chroma_422_c;
  12. //亮度-强滤波器-边界强度Bs=4
  13. pf->deblock_luma_intra[1] = deblock_v_luma_intra_c;
  14. pf->deblock_luma_intra[0] = deblock_h_luma_intra_c;
  15. pf->deblock_chroma_intra[1] = deblock_v_chroma_intra_c;
  16. pf->deblock_h_chroma_420_intra = deblock_h_chroma_intra_c;
  17. pf->deblock_h_chroma_422_intra = deblock_h_chroma_422_intra_c;
  18. pf->deblock_luma_mbaff = deblock_h_luma_mbaff_c;
  19. pf->deblock_chroma_420_mbaff = deblock_h_chroma_mbaff_c;
  20. pf->deblock_luma_intra_mbaff = deblock_h_luma_intra_mbaff_c;
  21. pf->deblock_chroma_420_intra_mbaff = deblock_h_chroma_intra_mbaff_c;
  22. pf->deblock_strength = deblock_strength_c;
  23. #if HAVE_MMX
  24. if( cpu&X264_CPU_MMX2 )
  25. {
  26. #if ARCH_X86
  27. pf->deblock_luma[1] = x264_deblock_v_luma_mmx2;
  28. pf->deblock_luma[0] = x264_deblock_h_luma_mmx2;
  29. pf->deblock_chroma[1] = x264_deblock_v_chroma_mmx2;
  30. pf->deblock_h_chroma_420 = x264_deblock_h_chroma_mmx2;
  31. pf->deblock_chroma_420_mbaff = x264_deblock_h_chroma_mbaff_mmx2;
  32. pf->deblock_h_chroma_422 = x264_deblock_h_chroma_422_mmx2;
  33. pf->deblock_h_chroma_422_intra = x264_deblock_h_chroma_422_intra_mmx2;
  34. pf->deblock_luma_intra[1] = x264_deblock_v_luma_intra_mmx2;
  35. pf->deblock_luma_intra[0] = x264_deblock_h_luma_intra_mmx2;
  36. pf->deblock_chroma_intra[1] = x264_deblock_v_chroma_intra_mmx2;
  37. pf->deblock_h_chroma_420_intra = x264_deblock_h_chroma_intra_mmx2;
  38. pf->deblock_chroma_420_intra_mbaff = x264_deblock_h_chroma_intra_mbaff_mmx2;
  39. #endif
  40. //此处省略大量的X86、ARM等平台的汇编函数初始化代码
  41. }

从源代码可以看出,x264_deblock_init()中初始化了一系列环路滤波函数。这些函数名称的规则如下:
(1)包含“v”的是垂直滤波器,用于处理水平边界;包含“h”的是水平滤波器,用于处理垂直边界。
(2)包含“luma”的是亮度滤波器,包含“chroma”的是色度滤波器。
(3)包含“intra”的是处理边界强度Bs为4的强滤波器,不包含“intra”的是普通滤波器。

x264_deblock_init()的输入参数x264_deblock_function_t是一个结构体,其中包含了环路滤波器相关的函数指针。x264_deblock_function_t的定义如下所示。

  1. typedef struct
  2. {
  3. x264_deblock_inter_t deblock_luma[2];
  4. x264_deblock_inter_t deblock_chroma[2];
  5. x264_deblock_inter_t deblock_h_chroma_420;
  6. x264_deblock_inter_t deblock_h_chroma_422;
  7. x264_deblock_intra_t deblock_luma_intra[2];
  8. x264_deblock_intra_t deblock_chroma_intra[2];
  9. x264_deblock_intra_t deblock_h_chroma_420_intra;
  10. x264_deblock_intra_t deblock_h_chroma_422_intra;
  11. x264_deblock_inter_t deblock_luma_mbaff;
  12. x264_deblock_inter_t deblock_chroma_mbaff;
  13. x264_deblock_inter_t deblock_chroma_420_mbaff;
  14. x264_deblock_inter_t deblock_chroma_422_mbaff;
  15. x264_deblock_intra_t deblock_luma_intra_mbaff;
  16. x264_deblock_intra_t deblock_chroma_intra_mbaff;
  17. x264_deblock_intra_t deblock_chroma_420_intra_mbaff;
  18. x264_deblock_intra_t deblock_chroma_422_intra_mbaff;
  19. void (*deblock_strength) ( uint8_t nnz[X264_SCAN8_SIZE], int8_t ref[2][X264_SCAN8_LUMA_SIZE],
  20. int16_t mv[2][X264_SCAN8_LUMA_SIZE][2], uint8_t bs[2][8][4], int mvy_limit,
  21. int bframe );
  22. } x264_deblock_function_t;
x264_deblock_init()的工作就是对x264_deblock_function_t中的函数指针进行赋值。可以看出x264_deblock_function_t中很多的元素是一个包含2个元素的数组,例如deblock_luma[2],deblock_luma_intra[2]等。这些数组中的元素[0]一般是水平滤波器,而元素[1]是垂直滤波器。下文将会举例分析一个普通边界的亮度垂直滤波器函数deblock_v_luma_c()。

相关知识简述

简单记录一下环路滤波(去块效应滤波)的知识。X264的重建帧(通过解码得到)一般情况下会出现方块效应。产生这种效应的原因主要有两个:
(1)DCT变换后的量化造成误差(主要原因)。
(2)运动补偿
正是由于这种块效应的存在,才需要添加环路滤波器调整相邻的“块”边缘上的像素值以减轻这种视觉上的不连续感。下面一张图显示了环路滤波的效果。图中左边的图没有使用环路滤波,而右边的图使用了环路滤波。
 

环路滤波分类
环路滤波器根据滤波的强度可以分为两种:
(1)普通滤波器。针对边界的Bs(边界强度)为1、2、3的滤波器。此时环路滤波涉及到方块边界周围的6个点(边界两边各3个点):p2,p1,p0,q0,q1,q2。需要处理4个点(边界两边各2个点,只以p点为例):
p0’ = p0 + (((q0 - p0 ) << 2) + (p1 - q1) + 4) >> 3
p1’ = p1 + ( ( p2 + ( ( p0 + q0 + 1 ) >> 1 ) – 2p1 ) >> 1 )
(2)强滤波器。针对边界的Bs(边界强度)为4的滤波器。此时环路滤波涉及到方块边界周围的8个点(边界两边各4个点):p3,p2,p1,p0,q0,q1,q2,q3。需要处理6个点(边界两边各3个点,只以p点为例):
p0’ = ( p2 + 2*p1 + 2*p0 + 2*q0 + q1 + 4 ) >> 3
p1’ = ( p2 + p1 + p0 + q0 + 2 ) >> 2
p2’ = ( 2*p3 + 3*p2 + p1 + p0 + q0 + 4 ) >> 3
其中上文中提到的边界强度Bs的判定方式如下。

条件(针对两边的图像块)

Bs

有一个块为帧内预测 + 边界为宏块边界

4

有一个块为帧内预测

3

有一个块对残差编码

2

运动矢量差不小于1像素

1

运动补偿参考帧不同

1

其它

0


总体说来,与帧内预测相关的图像块(帧内预测块)的边界强度比较大,取值为3或者4;与运动补偿相关的图像块(帧间预测块)的边界强度比较小,取值为1。

环路滤波的门限
并不是所有的块的边界处都需要环路滤波。例如画面中物体的边界正好和块的边界重合的话,就不能进行滤波,否则会使画面中物体的边界变模糊。因此需要区别开物体边界和块效应边界。一般情况下,物体边界两边的像素值差别很大,而块效应边界两边像素值差别比较小。《H.264标准》以这个特点定义了2个变量alpha和beta来判决边界是否需要进行环路滤波。只有满足下面三个条件的时候才能进行环路滤波:
| p0 - q0 | < alpha
| p1 – p0 | < beta
| q1 - q0 | < beta
简而言之,就是边界两侧相邻两点(p0和q0)的像素值之差不能太大,即不能超过alpha;边界同一侧头两个点之间(p1和p0,或q1和q0)的像素值之差也不能太大,即不能超过beta。其中alpha和beta是根据量化参数QP推算出来的(具体方法不再记录)。总体说来QP越大,alpha和beta的值也越大,也就越容易触发环路滤波。由于QP越大表明压缩的程度越大,所以也可以得知高压缩比的情况下更需要进行环路滤波。
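上述三个判决条件可以直接写成一个小函数(示例代码,函数名need_deblock为本文自拟):

```c
#include <assert.h>
#include <stdlib.h>

/* 返回1表示需要滤波:边界两侧差值较小,判定为块效应而非物体边界 */
int need_deblock(int p1, int p0, int q0, int q1, int alpha, int beta)
{
    return abs(p0 - q0) < alpha && abs(p1 - p0) < beta && abs(q1 - q0) < beta;
}
```

例如need_deblock(60,60,70,70,20,4)返回1(小台阶,判定为块效应,需要滤波);need_deblock(60,60,120,120,20,4)返回0(大幅跳变被判定为物体边界,不滤波)。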

deblock_v_luma_c()

deblock_v_luma_c()是一个普通强度的垂直滤波器,用于处理边界强度Bs为1,2,3的水平边界。该函数的定义位于common\deblock.c,如下所示。
  1. //去块效应滤波-普通滤波,Bs为1,2,3
  2. //垂直(Vertical)滤波器
  3. // 边界
  4. // x
  5. // x
  6. // 边界----------
  7. // x
  8. // x
  9. //
  10. //
  11. static void deblock_v_luma_c( pixel *pix, intptr_t stride, int alpha, int beta, int8_t *tc0 )
  12. {
  13. //xstride=stride(用于选择滤波的像素)
  14. //ystride=1
  15. deblock_luma_c( pix, stride, 1, alpha, beta, tc0 );
  16. }
可以看出deblock_v_luma_c()调用了另一个函数deblock_luma_c()。需要注意deblock_luma_c()是一个水平滤波器和垂直滤波器都会调用的“通用”滤波器函数。在这里传递给deblock_luma_c()的第二个参数xstride的值为stride,第三个参数ystride的值为1。

deblock_luma_c()

deblock_luma_c()是一个通用的滤波器函数,定义如下所示。
  1. //去块效应滤波-普通滤波,Bs为1,2,3
  2. static inline void deblock_luma_c( pixel *pix, intptr_t xstride, intptr_t ystride, int alpha, int beta, int8_t *tc0 )
  3. {
  4. for( int i = 0; i < 4; i++ )
  5. {
  6. if( tc0[i] < 0 )
  7. {
  8. pix += 4*ystride;
  9. continue;
  10. }
  11. //滤4个像素
  12. for( int d = 0; d < 4; d++, pix += ystride )
  13. deblock_edge_luma_c( pix, xstride, alpha, beta, tc0[i] );
  14. }
  15. }
从源代码中可以看出,具体的滤波在deblock_edge_luma_c()中完成。处理完一个像素后,会继续处理与当前像素距离为ystride的像素。

deblock_edge_luma_c()

deblock_edge_luma_c()用于完成具体的滤波工作。该函数的定义如下所示。
  1. /* From ffmpeg */
  2. //去块效应滤波-普通滤波,Bs为1,2,3
  3. //从FFmpeg复制过来的?
  4. static ALWAYS_INLINE void deblock_edge_luma_c( pixel *pix, intptr_t xstride, int alpha, int beta, int8_t tc0 )
  5. {
  6. //p和q
  7. //如果xstride=stride,ystride=1
  8. //就是处理纵向的6个像素
  9. //对应的是方块的横向边界的滤波,即如下所示:
  10. // p2
  11. // p1
  12. // p0
  13. //=====图像边界=====
  14. // q0
  15. // q1
  16. // q2
  17. //
  18. //如果xstride=1,ystride=stride
  19. //就是处理横向的6个像素
  20. //对应的是方块的纵向(垂直)边界的滤波,即如下所示:
  21. // ||
  22. // p2 p1 p0 || q0 q1 q2
  23. // ||
  24. // 边界
  25. //注意:这里乘的是xstride
  26. int p2 = pix[-3*xstride];
  27. int p1 = pix[-2*xstride];
  28. int p0 = pix[-1*xstride];
  29. int q0 = pix[ 0*xstride];
  30. int q1 = pix[ 1*xstride];
  31. int q2 = pix[ 2*xstride];
  32. //计算方法参考相关的标准
  33. //alpha和beta是用于检查图像内容的2个参数
  34. //只有满足if()里面3个取值条件的时候(只涉及边界旁边的4个点),才会滤波
  35. if( abs( p0 - q0 ) < alpha && abs( p1 - p0 ) < beta && abs( q1 - q0 ) < beta )
  36. {
  37. int tc = tc0;
  38. int delta;
  39. //上面2个点(p0,p2)满足条件的时候,滤波p1
  40. //int x264_clip3( int v, int i_min, int i_max )用于限幅
  41. if( abs( p2 - p0 ) < beta )
  42. {
  43. if( tc0 )
  44. pix[-2*xstride] = p1 + x264_clip3( (( p2 + ((p0 + q0 + 1) >> 1)) >> 1) - p1, -tc0, tc0 );
  45. tc++;
  46. }
  47. //下面2个点(q0,q2)满足条件的时候,滤波q1
  48. if( abs( q2 - q0 ) < beta )
  49. {
  50. if( tc0 )
  51. pix[ 1*xstride] = q1 + x264_clip3( (( q2 + ((p0 + q0 + 1) >> 1)) >> 1) - q1, -tc0, tc0 );
  52. tc++;
  53. }
  54. delta = x264_clip3( (((q0 - p0 ) << 2) + (p1 - q1) + 4) >> 3, -tc, tc );
  55. //p0
  56. pix[-1*xstride] = x264_clip_pixel( p0 + delta ); /* p0' */
  57. //q0
  58. pix[ 0*xstride] = x264_clip_pixel( q0 - delta ); /* q0' */
  59. }
  60. }
从源代码可以看出,deblock_edge_luma_c()实现了前文记录的滤波公式。
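用一组具体数值走一遍上述滤波流程会更直观。下面的示例代码(逻辑与deblock_edge_luma_c()一致,但省略了最终的像素限幅,函数名为本文自拟)对边界两侧为(60,60,60 | 70,70,70)、tc0=2的情况滤波:6个点变为(60,62,64 | 66,68,70),边界处的台阶被平滑成了斜坡。

```c
#include <assert.h>
#include <stdlib.h>

static int clip3(int v, int lo, int hi) { return v < lo ? lo : (v > hi ? hi : v); }

/* pix[0..5] = {p2,p1,p0,q0,q1,q2},原地滤波,逻辑同deblock_edge_luma_c() */
void deblock_demo(int pix[6], int alpha, int beta, int tc0)
{
    int p2 = pix[0], p1 = pix[1], p0 = pix[2];
    int q0 = pix[3], q1 = pix[4], q2 = pix[5];
    if (abs(p0-q0) < alpha && abs(p1-p0) < beta && abs(q1-q0) < beta)
    {
        int tc = tc0, delta;
        if (abs(p2-p0) < beta)   /* 滤波p1 */
        {
            pix[1] = p1 + clip3(((p2 + ((p0+q0+1) >> 1)) >> 1) - p1, -tc0, tc0);
            tc++;
        }
        if (abs(q2-q0) < beta)   /* 滤波q1 */
        {
            pix[4] = q1 + clip3(((q2 + ((p0+q0+1) >> 1)) >> 1) - q1, -tc0, tc0);
            tc++;
        }
        delta = clip3((((q0-p0) << 2) + (p1-q1) + 4) >> 3, -tc, tc);
        pix[2] = p0 + delta;   /* p0' */
        pix[3] = q0 - delta;   /* q0' */
    }
}

/* 对(60,60,60|70,70,70)滤波,把中间4个点打包成整数返回以便验证 */
int deblock_demo_check(void)
{
    int pix[6] = { 60, 60, 60, 70, 70, 70 };
    deblock_demo(pix, 20, 4, 2);
    /* 期望结果:{60,62,64,66,68,70} */
    return pix[1]*1000000 + pix[2]*10000 + pix[3]*100 + pix[4];
}
```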

deblock_h_luma_c()

deblock_h_luma_c()是一个普通强度的水平滤波器,用于处理边界强度Bs为1,2,3的垂直边界。该函数的定义如下所示。
  1. //去块效应滤波-普通滤波,Bs为1,2,3
  2. //水平(Horizontal)滤波器
  3. // 边界
  4. // |
  5. // x x x | x x x
  6. // |
  7. static void deblock_h_luma_c( pixel *pix, intptr_t stride, int alpha, int beta, int8_t *tc0 )
  8. {
  9. //xstride=1(用于选择滤波的像素)
  10. //ystride=stride
  11. deblock_luma_c( pix, 1, stride, alpha, beta, tc0 );
  12. }
从源代码可以看出,和deblock_v_luma_c()类似,deblock_h_luma_c()同样调用了deblock_luma_c()函数。唯一的不同在于它传递给deblock_luma_c()的第2个参数xstride为1,第3个参数ystride为stride。

mbcmp_init()

mbcmp_init()函数决定了x264_pixel_function_t中的像素比较的一系列函数(mbcmp[])使用SAD还是SATD。该函数的定义位于encoder\encoder.c,如下所示。
  1. //决定了像素比较的时候用SAD还是SATD
  2. static void mbcmp_init( x264_t *h )
  3. {
  4. //b_lossless一般为0
  5. //主要看i_subpel_refine,大于1的话就使用SATD
  6. int satd = !h->mb.b_lossless && h->param.analyse.i_subpel_refine > 1;
  7. //sad或者satd赋值给mbcmp
  8. memcpy( h->pixf.mbcmp, satd ? h->pixf.satd : h->pixf.sad_aligned, sizeof(h->pixf.mbcmp) );
  9. memcpy( h->pixf.mbcmp_unaligned, satd ? h->pixf.satd : h->pixf.sad, sizeof(h->pixf.mbcmp_unaligned) );
  10. h->pixf.intra_mbcmp_x3_16x16 = satd ? h->pixf.intra_satd_x3_16x16 : h->pixf.intra_sad_x3_16x16;
  11. h->pixf.intra_mbcmp_x3_8x16c = satd ? h->pixf.intra_satd_x3_8x16c : h->pixf.intra_sad_x3_8x16c;
  12. h->pixf.intra_mbcmp_x3_8x8c = satd ? h->pixf.intra_satd_x3_8x8c : h->pixf.intra_sad_x3_8x8c;
  13. h->pixf.intra_mbcmp_x3_8x8 = satd ? h->pixf.intra_sa8d_x3_8x8 : h->pixf.intra_sad_x3_8x8;
  14. h->pixf.intra_mbcmp_x3_4x4 = satd ? h->pixf.intra_satd_x3_4x4 : h->pixf.intra_sad_x3_4x4;
  15. h->pixf.intra_mbcmp_x9_4x4 = h->param.b_cpu_independent || h->mb.b_lossless ? NULL
  16. : satd ? h->pixf.intra_satd_x9_4x4 : h->pixf.intra_sad_x9_4x4;
  17. h->pixf.intra_mbcmp_x9_8x8 = h->param.b_cpu_independent || h->mb.b_lossless ? NULL
  18. : satd ? h->pixf.intra_sa8d_x9_8x8 : h->pixf.intra_sad_x9_8x8;
  19. satd &= h->param.analyse.i_me_method == X264_ME_TESA;
  20. memcpy( h->pixf.fpelcmp, satd ? h->pixf.satd : h->pixf.sad, sizeof(h->pixf.fpelcmp) );
  21. memcpy( h->pixf.fpelcmp_x3, satd ? h->pixf.satd_x3 : h->pixf.sad_x3, sizeof(h->pixf.fpelcmp_x3) );
  22. memcpy( h->pixf.fpelcmp_x4, satd ? h->pixf.satd_x4 : h->pixf.sad_x4, sizeof(h->pixf.fpelcmp_x4) );
  23. }

从mbcmp_init()的源代码可以看出:只有在非无损模式(b_lossless为0)且i_subpel_refine大于1的时候,satd变量才为1,此时赋值给mbcmp[]相关一系列函数指针的是SATD函数;否则satd变量为0,赋值的是SAD函数。此外,整像素搜索用的fpelcmp[]系列函数指针只有在运动估计方法为TESA的时候才会使用SATD。
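为了直观理解SAD与SATD的区别以及mbcmp_init()的选择逻辑,下面给出一个简化的C示例(sad_4x4()/satd_4x4()/pick_mbcmp()均为示意实现,并非x264的汇编优化版本):SAD是逐点绝对差之和,SATD是对残差做4x4 Hadamard变换后再求绝对值之和(与x264一致,结果减半)。

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef int (*cmp_fn)(const uint8_t *a, intptr_t sa, const uint8_t *b, intptr_t sb);

/* 4x4块的SAD:逐点绝对差求和 */
static int sad_4x4(const uint8_t *a, intptr_t sa, const uint8_t *b, intptr_t sb)
{
    int sum = 0;
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            sum += abs(a[y * sa + x] - b[y * sb + x]);
    return sum;
}

/* 4x4块的SATD:残差经水平、垂直两次4点Hadamard变换后求绝对值之和 */
static int satd_4x4(const uint8_t *a, intptr_t sa, const uint8_t *b, intptr_t sb)
{
    int d[4][4], t[4], sum = 0;
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            d[y][x] = a[y * sa + x] - b[y * sb + x];
    for (int y = 0; y < 4; y++) {      /* 水平方向 */
        t[0] = d[y][0] + d[y][1]; t[1] = d[y][0] - d[y][1];
        t[2] = d[y][2] + d[y][3]; t[3] = d[y][2] - d[y][3];
        d[y][0] = t[0] + t[2]; d[y][2] = t[0] - t[2];
        d[y][1] = t[1] + t[3]; d[y][3] = t[1] - t[3];
    }
    for (int x = 0; x < 4; x++) {      /* 垂直方向,同时累加绝对值 */
        t[0] = d[0][x] + d[1][x]; t[1] = d[0][x] - d[1][x];
        t[2] = d[2][x] + d[3][x]; t[3] = d[2][x] - d[3][x];
        sum += abs(t[0] + t[2]) + abs(t[0] - t[2])
             + abs(t[1] + t[3]) + abs(t[1] - t[3]);
    }
    return sum >> 1;                   /* 与x264一致:结果减半 */
}

/* 示意版mbcmp_init:按i_subpel_refine与b_lossless在SAD/SATD之间选择 */
static cmp_fn pick_mbcmp(int i_subpel_refine, int b_lossless)
{
    int satd = !b_lossless && i_subpel_refine > 1;
    return satd ? satd_4x4 : sad_4x4;
}
```

当两个块逐点相差常数1时,SAD为16,而SATD由于能量集中到直流系数上,结果为8;SATD对结构性失真更敏感,所以在subme较高时用它做更精细的代价估计。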

至此x264_encoder_open()的源代码就分析完毕了。下文继续分析x264_encoder_headers()和x264_encoder_close()函数。


x264_encoder_headers()

x264_encoder_headers()是libx264的一个API函数,用于输出SPS/PPS/SEI这些H.264码流的头信息。该函数的声明如下。
  1. /* x264_encoder_headers:
  2. * return the SPS and PPS that will be used for the whole stream.
  3. * *pi_nal is the number of NAL units outputted in pp_nal.
  4. * returns the number of bytes in the returned NALs.
  5. * returns negative on error.
  6. * the payloads of all output NALs are guaranteed to be sequential in memory. */
  7. int x264_encoder_headers( x264_t *, x264_nal_t **pp_nal, int *pi_nal );
x264_encoder_headers()的定义位于encoder\encoder.c,如下所示。
  1. /****************************************************************************
  2. * x264_encoder_headers:
  3. * 注释和处理:雷霄骅
  4. * http://blog.csdn.net/leixiaohua1020
  5. * leixiaohua1020@126.com
  6. ****************************************************************************/
  7. //输出文件头(SPS、PPS、SEI)
  8. int x264_encoder_headers( x264_t *h, x264_nal_t **pp_nal, int *pi_nal )
  9. {
  10. int frame_size = 0;
  11. /* init bitstream context */
  12. h->out.i_nal = 0;
  13. bs_init( &h->out.bs, h->out.p_bitstream, h->out.i_bitstream );
  14. /* Write SEI, SPS and PPS. */
  15. /* generate sequence parameters */
  16. //输出SPS
  17. x264_nal_start( h, NAL_SPS, NAL_PRIORITY_HIGHEST );
  18. x264_sps_write( &h->out.bs, h->sps );
  19. if( x264_nal_end( h ) )
  20. return -1;
  21. /* generate picture parameters */
  22. x264_nal_start( h, NAL_PPS, NAL_PRIORITY_HIGHEST );
  23. //输出PPS
  24. x264_pps_write( &h->out.bs, h->sps, h->pps );
  25. if( x264_nal_end( h ) )
  26. return -1;
  27. /* identify ourselves */
  28. x264_nal_start( h, NAL_SEI, NAL_PRIORITY_DISPOSABLE );
  29. //输出SEI(其中包含了配置信息)
  30. if( x264_sei_version_write( h, &h->out.bs ) )
  31. return -1;
  32. if( x264_nal_end( h ) )
  33. return -1;
  34. frame_size = x264_encoder_encapsulate_nals( h, 0 );
  35. if( frame_size < 0 )
  36. return -1;
  37. /* now set output*/
  38. *pi_nal = h->out.i_nal;
  39. *pp_nal = &h->out.nal[0];
  40. h->out.i_nal = 0;
  41. return frame_size;
  42. }

从源代码可以看出,x264_encoder_headers()分别调用了x264_sps_write(),x264_pps_write(),x264_sei_version_write()输出了SPS,PPS,和SEI信息。在输出每个NALU之前,需要调用x264_nal_start(),在输出NALU之后,需要调用x264_nal_end()。下文继续分析上述三个函数。
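x264_encoder_encapsulate_nals()会把各个NALU连续地封装到同一块内存中;在Annex-B字节流格式下,每个NALU前要加上起始码。下面是一个示意性的起始码封装小例子(wrap_annexb()为自拟函数,且为简化起见未做防竞争字节0x03的插入):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 把payload封装为Annex-B形式:起始码00 00 00 01 + 数据,返回写入的字节数 */
static size_t wrap_annexb(uint8_t *dst, const uint8_t *payload, size_t len)
{
    static const uint8_t startcode[4] = { 0x00, 0x00, 0x00, 0x01 };
    memcpy(dst, startcode, 4);
    memcpy(dst + 4, payload, len);
    return 4 + len;
}
```

把SPS、PPS、SEI依次用这种方式写入同一缓冲区,就得到了x264_encoder_headers()返回的那段"在内存中连续"的头信息数据。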

x264_sps_write()

x264_sps_write()用于输出SPS。该函数的定义位于encoder\set.c,如下所示。
  1. //输出SPS
  2. void x264_sps_write( bs_t *s, x264_sps_t *sps )
  3. {
  4. bs_realign( s );
  5. //型profile,8bit
  6. bs_write( s, 8, sps->i_profile_idc );
  7. bs_write1( s, sps->b_constraint_set0 );
  8. bs_write1( s, sps->b_constraint_set1 );
  9. bs_write1( s, sps->b_constraint_set2 );
  10. bs_write1( s, sps->b_constraint_set3 );
  11. bs_write( s, 4, 0 ); /* reserved */
  12. //级level,8bit
  13. bs_write( s, 8, sps->i_level_idc );
  14. //本SPS的 id号
  15. bs_write_ue( s, sps->i_id );
  16. if( sps->i_profile_idc >= PROFILE_HIGH )
  17. {
  18. //色度取样格式
  19. //0代表单色
  20. //1代表4:2:0
  21. //2代表4:2:2
  22. //3代表4:4:4
  23. bs_write_ue( s, sps->i_chroma_format_idc );
  24. if( sps->i_chroma_format_idc == CHROMA_444 )
  25. bs_write1( s, 0 ); // separate_colour_plane_flag
  26. //亮度
  27. //颜色位深=bit_depth_luma_minus8+8
  28. bs_write_ue( s, BIT_DEPTH-8 ); // bit_depth_luma_minus8
  29. //色度与亮度一样
  30. bs_write_ue( s, BIT_DEPTH-8 ); // bit_depth_chroma_minus8
  31. bs_write1( s, sps->b_qpprime_y_zero_transform_bypass );
  32. bs_write1( s, 0 ); // seq_scaling_matrix_present_flag
  33. }
  34. //log2_max_frame_num_minus4主要是为读取另一个句法元素frame_num服务的
  35. //frame_num 是最重要的句法元素之一
  36. //这个句法元素指明了frame_num的所能达到的最大值:
  37. //MaxFrameNum = 2^( log2_max_frame_num_minus4 + 4 )
  38. bs_write_ue( s, sps->i_log2_max_frame_num - 4 );
  39. //pic_order_cnt_type 指明了poc (picture order count) 的编码方法
  40. //poc标识图像的播放顺序。
  41. //由于H.264使用了B帧预测,使得图像的解码顺序并不一定等于播放顺序,但它们之间存在一定的映射关系
  42. //poc 可以由frame-num 通过映射关系计算得来,也可以索性由编码器显式地传送。
  43. //H.264 中一共定义了三种poc 的编码方法
  44. bs_write_ue( s, sps->i_poc_type );
  45. if( sps->i_poc_type == 0 )
  46. bs_write_ue( s, sps->i_log2_max_poc_lsb - 4 );
  47. //num_ref_frames 指定参考帧队列可能达到的最大长度,解码器依照这个句法元素的值开辟存储区,这个存储区用于存放已解码的参考帧,
  48. //H.264 规定最多可用16 个参考帧,因此最大值为16。
  49. bs_write_ue( s, sps->i_num_ref_frames );
  50. bs_write1( s, sps->b_gaps_in_frame_num_value_allowed );
  51. //pic_width_in_mbs_minus1加1后为图像宽(以宏块为单位):
  52. // PicWidthInMbs = pic_width_in_mbs_minus1 + 1
  53. //以像素为单位图像宽度(亮度):width=PicWidthInMbs*16
  54. bs_write_ue( s, sps->i_mb_width - 1 );
  55. //pic_height_in_map_units_minus1加1后指明图像高度(以宏块为单位)
  56. bs_write_ue( s, (sps->i_mb_height >> !sps->b_frame_mbs_only) - 1);
  57. bs_write1( s, sps->b_frame_mbs_only );
  58. if( !sps->b_frame_mbs_only )
  59. bs_write1( s, sps->b_mb_adaptive_frame_field );
  60. bs_write1( s, sps->b_direct8x8_inference );
  61. bs_write1( s, sps->b_crop );
  62. if( sps->b_crop )
  63. {
  64. int h_shift = sps->i_chroma_format_idc == CHROMA_420 || sps->i_chroma_format_idc == CHROMA_422;
  65. int v_shift = sps->i_chroma_format_idc == CHROMA_420;
  66. bs_write_ue( s, sps->crop.i_left >> h_shift );
  67. bs_write_ue( s, sps->crop.i_right >> h_shift );
  68. bs_write_ue( s, sps->crop.i_top >> v_shift );
  69. bs_write_ue( s, sps->crop.i_bottom >> v_shift );
  70. }
  71. bs_write1( s, sps->b_vui );
  72. if( sps->b_vui )
  73. {
  74. bs_write1( s, sps->vui.b_aspect_ratio_info_present );
  75. if( sps->vui.b_aspect_ratio_info_present )
  76. {
  77. int i;
  78. static const struct { uint8_t w, h, sar; } sar[] =
  79. {
  80. // aspect_ratio_idc = 0 -> unspecified
  81. { 1, 1, 1 }, { 12, 11, 2 }, { 10, 11, 3 }, { 16, 11, 4 },
  82. { 40, 33, 5 }, { 24, 11, 6 }, { 20, 11, 7 }, { 32, 11, 8 },
  83. { 80, 33, 9 }, { 18, 11, 10}, { 15, 11, 11}, { 64, 33, 12},
  84. {160, 99, 13}, { 4, 3, 14}, { 3, 2, 15}, { 2, 1, 16},
  85. // aspect_ratio_idc = [17..254] -> reserved
  86. { 0, 0, 255 }
  87. };
  88. for( i = 0; sar[i].sar != 255; i++ )
  89. {
  90. if( sar[i].w == sps->vui.i_sar_width &&
  91. sar[i].h == sps->vui.i_sar_height )
  92. break;
  93. }
  94. bs_write( s, 8, sar[i].sar );
  95. if( sar[i].sar == 255 ) /* aspect_ratio_idc (extended) */
  96. {
  97. bs_write( s, 16, sps->vui.i_sar_width );
  98. bs_write( s, 16, sps->vui.i_sar_height );
  99. }
  100. }
  101. bs_write1( s, sps->vui.b_overscan_info_present );
  102. if( sps->vui.b_overscan_info_present )
  103. bs_write1( s, sps->vui.b_overscan_info );
  104. bs_write1( s, sps->vui.b_signal_type_present );
  105. if( sps->vui.b_signal_type_present )
  106. {
  107. bs_write( s, 3, sps->vui.i_vidformat );
  108. bs_write1( s, sps->vui.b_fullrange );
  109. bs_write1( s, sps->vui.b_color_description_present );
  110. if( sps->vui.b_color_description_present )
  111. {
  112. bs_write( s, 8, sps->vui.i_colorprim );
  113. bs_write( s, 8, sps->vui.i_transfer );
  114. bs_write( s, 8, sps->vui.i_colmatrix );
  115. }
  116. }
  117. bs_write1( s, sps->vui.b_chroma_loc_info_present );
  118. if( sps->vui.b_chroma_loc_info_present )
  119. {
  120. bs_write_ue( s, sps->vui.i_chroma_loc_top );
  121. bs_write_ue( s, sps->vui.i_chroma_loc_bottom );
  122. }
  123. bs_write1( s, sps->vui.b_timing_info_present );
  124. if( sps->vui.b_timing_info_present )
  125. {
  126. bs_write32( s, sps->vui.i_num_units_in_tick );
  127. bs_write32( s, sps->vui.i_time_scale );
  128. bs_write1( s, sps->vui.b_fixed_frame_rate );
  129. }
  130. bs_write1( s, sps->vui.b_nal_hrd_parameters_present );
  131. if( sps->vui.b_nal_hrd_parameters_present )
  132. {
  133. bs_write_ue( s, sps->vui.hrd.i_cpb_cnt - 1 );
  134. bs_write( s, 4, sps->vui.hrd.i_bit_rate_scale );
  135. bs_write( s, 4, sps->vui.hrd.i_cpb_size_scale );
  136. bs_write_ue( s, sps->vui.hrd.i_bit_rate_value - 1 );
  137. bs_write_ue( s, sps->vui.hrd.i_cpb_size_value - 1 );
  138. bs_write1( s, sps->vui.hrd.b_cbr_hrd );
  139. bs_write( s, 5, sps->vui.hrd.i_initial_cpb_removal_delay_length - 1 );
  140. bs_write( s, 5, sps->vui.hrd.i_cpb_removal_delay_length - 1 );
  141. bs_write( s, 5, sps->vui.hrd.i_dpb_output_delay_length - 1 );
  142. bs_write( s, 5, sps->vui.hrd.i_time_offset_length );
  143. }
  144. bs_write1( s, sps->vui.b_vcl_hrd_parameters_present );
  145. if( sps->vui.b_nal_hrd_parameters_present || sps->vui.b_vcl_hrd_parameters_present )
  146. bs_write1( s, 0 ); /* low_delay_hrd_flag */
  147. bs_write1( s, sps->vui.b_pic_struct_present );
  148. bs_write1( s, sps->vui.b_bitstream_restriction );
  149. if( sps->vui.b_bitstream_restriction )
  150. {
  151. bs_write1( s, sps->vui.b_motion_vectors_over_pic_boundaries );
  152. bs_write_ue( s, sps->vui.i_max_bytes_per_pic_denom );
  153. bs_write_ue( s, sps->vui.i_max_bits_per_mb_denom );
  154. bs_write_ue( s, sps->vui.i_log2_max_mv_length_horizontal );
  155. bs_write_ue( s, sps->vui.i_log2_max_mv_length_vertical );
  156. bs_write_ue( s, sps->vui.i_num_reorder_frames );
  157. bs_write_ue( s, sps->vui.i_max_dec_frame_buffering );
  158. }
  159. }
  160. //RBSP拖尾
  161. //无论比特流当前位置是否字节对齐 , 都向其中写入一个比特1及若干个(0~7个)比特0 , 使其字节对齐
  162. bs_rbsp_trailing( s );
  163. bs_flush( s );
  164. }

可以看出x264_sps_write()将x264_sps_t结构体中的各个字段依次写入比特流,形成了一个SPS NALU的主体数据。有关SPS各字段的含义可以参考《H.264标准》。
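SPS中的大量字段是通过bs_write_ue()以无符号指数哥伦布码ue(v)写入的:码字由M个前导0、1个分隔位1和M位信息位组成,其中M=floor(log2(v+1)),总码长为2M+1比特。下面用一个简化函数验证这个码长规律(ue_bits()为自拟函数,并非x264的bs_write_ue()本身):

```c
#include <assert.h>
#include <stdint.h>

/* 返回ue(v)码字占用的比特数:2*floor(log2(v+1)) + 1 */
static int ue_bits(uint32_t v)
{
    int m = 0;
    uint32_t x = v + 1;
    while (x >> (m + 1))   /* 求floor(log2(v+1)) */
        m++;
    return 2 * m + 1;
}
```

例如v=0编码为"1"(1比特),v=1编码为"010"、v=2编码为"011"(3比特),v=3编码为"00100"(5比特)——值越小码字越短,这正是SPS/PPS中各种"minus1"字段存在的原因之一。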

x264_pps_write()

x264_pps_write()用于输出PPS。该函数的定义位于encoder\set.c,如下所示。
  1. //输出PPS
  2. void x264_pps_write( bs_t *s, x264_sps_t *sps, x264_pps_t *pps )
  3. {
  4. bs_realign( s );
  5. //PPS的ID
  6. bs_write_ue( s, pps->i_id );
  7. //该PPS引用的SPS的ID
  8. bs_write_ue( s, pps->i_sps_id );
  9. //entropy_coding_mode_flag
  10. //0表示熵编码使用CAVLC,1表示熵编码使用CABAC
  11. bs_write1( s, pps->b_cabac );
  12. bs_write1( s, pps->b_pic_order );
  13. bs_write_ue( s, pps->i_num_slice_groups - 1 );
  14. bs_write_ue( s, pps->i_num_ref_idx_l0_default_active - 1 );
  15. bs_write_ue( s, pps->i_num_ref_idx_l1_default_active - 1 );
  16. //P Slice 是否使用加权预测?
  17. bs_write1( s, pps->b_weighted_pred );
  18. //B Slice 是否使用加权预测?
  19. bs_write( s, 2, pps->b_weighted_bipred );
  20. //pic_init_qp_minus26加26后用以指明亮度分量的QP的初始值。
  21. bs_write_se( s, pps->i_pic_init_qp - 26 - QP_BD_OFFSET );
  22. bs_write_se( s, pps->i_pic_init_qs - 26 - QP_BD_OFFSET );
  23. bs_write_se( s, pps->i_chroma_qp_index_offset );
  24. bs_write1( s, pps->b_deblocking_filter_control );
  25. bs_write1( s, pps->b_constrained_intra_pred );
  26. bs_write1( s, pps->b_redundant_pic_cnt );
  27. if( pps->b_transform_8x8_mode || pps->i_cqm_preset != X264_CQM_FLAT )
  28. {
  29. bs_write1( s, pps->b_transform_8x8_mode );
  30. bs_write1( s, (pps->i_cqm_preset != X264_CQM_FLAT) );
  31. if( pps->i_cqm_preset != X264_CQM_FLAT )
  32. {
  33. scaling_list_write( s, pps, CQM_4IY );
  34. scaling_list_write( s, pps, CQM_4IC );
  35. bs_write1( s, 0 ); // Cr = Cb
  36. scaling_list_write( s, pps, CQM_4PY );
  37. scaling_list_write( s, pps, CQM_4PC );
  38. bs_write1( s, 0 ); // Cr = Cb
  39. if( pps->b_transform_8x8_mode )
  40. {
  41. if( sps->i_chroma_format_idc == CHROMA_444 )
  42. {
  43. scaling_list_write( s, pps, CQM_8IY+4 );
  44. scaling_list_write( s, pps, CQM_8IC+4 );
  45. bs_write1( s, 0 ); // Cr = Cb
  46. scaling_list_write( s, pps, CQM_8PY+4 );
  47. scaling_list_write( s, pps, CQM_8PC+4 );
  48. bs_write1( s, 0 ); // Cr = Cb
  49. }
  50. else
  51. {
  52. scaling_list_write( s, pps, CQM_8IY+4 );
  53. scaling_list_write( s, pps, CQM_8PY+4 );
  54. }
  55. }
  56. }
  57. bs_write_se( s, pps->i_chroma_qp_index_offset );
  58. }
  59. //RBSP拖尾
  60. //无论比特流当前位置是否字节对齐 , 都向其中写入一个比特1及若干个(0~7个)比特0 , 使其字节对齐
  61. bs_rbsp_trailing( s );
  62. bs_flush( s );
  63. }

可以看出x264_pps_write()将x264_pps_t结构体中的各个字段依次写入比特流,形成了一个PPS NALU的主体数据。
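PPS中pic_init_qp_minus26、i_chroma_qp_index_offset等有符号字段是通过bs_write_se()以有符号指数哥伦布码se(v)写入的:先把有符号数映射为无符号的codeNum(v>0映射为2v-1,v<=0映射为-2v),再按ue(v)编码。下面用一个小例子验证这个映射(se_to_codenum()为自拟函数):

```c
#include <assert.h>
#include <stdint.h>

/* se(v)的符号映射:0, 1, -1, 2, -2, ... 依次映射为 0, 1, 2, 3, 4, ... */
static uint32_t se_to_codenum(int32_t v)
{
    return v > 0 ? (uint32_t)(2 * v - 1) : (uint32_t)(-2 * v);
}
```

这样绝对值小的数(正负交替)得到小的codeNum,从而获得短码字。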

x264_sei_version_write()

x264_sei_version_write()用于输出SEI。SEI中一般存储了H.264码流的一些附加信息,例如下图中红色方框中的文字就是x264存储在SEI中的信息。
 
x264_sei_version_write()的定义位于encoder\set.c,如下所示。
  1. //输出SEI(其中包含了配置信息)
  2. int x264_sei_version_write( x264_t *h, bs_t *s )
  3. {
  4. // random ID number generated according to ISO-11578
  5. static const uint8_t uuid[16] =
  6. {
  7. 0xdc, 0x45, 0xe9, 0xbd, 0xe6, 0xd9, 0x48, 0xb7,
  8. 0x96, 0x2c, 0xd8, 0x20, 0xd9, 0x23, 0xee, 0xef
  9. };
  10. //把设置信息转换为字符串
  11. char *opts = x264_param2string( &h->param, 0 );
  12. char *payload;
  13. int length;
  14. if( !opts )
  15. return -1;
  16. CHECKED_MALLOC( payload, 200 + strlen( opts ) );
  17. memcpy( payload, uuid, 16 );
  18. //配置信息的内容
  19. //opts字符串内容还是挺多的
  20. sprintf( payload+16, "x264 - core %d%s - H.264/MPEG-4 AVC codec - "
  21. "Copy%s 2003-2014 - http://www.videolan.org/x264.html - options: %s",
  22. X264_BUILD, X264_VERSION, HAVE_GPL?"left":"right", opts );
  23. length = strlen(payload)+1;
  24. //输出SEI
  25. //数据类型为USER_DATA_UNREGISTERED
  26. x264_sei_write( s, (uint8_t *)payload, length, SEI_USER_DATA_UNREGISTERED );
  27. x264_free( opts );
  28. x264_free( payload );
  29. return 0;
  30. fail:
  31. x264_free( opts );
  32. return -1;
  33. }

从源代码可以看出,x264_sei_version_write()首先调用了x264_param2string()将当前的配置参数保存到字符串opts[]中,然后调用sprintf()结合opts[]生成完整的SEI载荷,最后调用x264_sei_write()输出SEI信息。在这个过程中涉及到一个libx264的API函数x264_param2string()。
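源代码中sprintf()从payload+16处开始写入,是因为user_data_unregistered类型SEI的载荷格式为:16字节UUID在前,任意用户数据在后。下面用一个小例子示意这种载荷布局(build_sei_payload()为自拟函数):

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* 构造user_data_unregistered载荷:16字节UUID + 字符串(含结尾'\0')
 * 返回载荷总长度 */
static int build_sei_payload(uint8_t *payload, const uint8_t uuid[16],
                             const char *text)
{
    memcpy(payload, uuid, 16);                              /* UUID在前 */
    return 16 + sprintf((char *)payload + 16, "%s", text) + 1; /* +1计入'\0' */
}
```

x264_sei_version_write()中length取strlen(payload)+1也是同样的道理:结尾的'\0'一并计入载荷长度。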

x264_param2string()

x264_param2string()用于将当前设置转换为字符串输出出来。该函数的声明如下。
  1. /* x264_param2string: return a (malloced) string containing most of
  2. * the encoding options */
  3. char *x264_param2string( x264_param_t *p, int b_res );
x264_param2string()的定义位于common\common.c,如下所示。
  1. /****************************************************************************
  2. * x264_param2string:
  3. ****************************************************************************/
  4. //把设置信息转换为字符串
  5. char *x264_param2string( x264_param_t *p, int b_res )
  6. {
  7. int len = 1000;
  8. char *buf, *s;
  9. if( p->rc.psz_zones )
  10. len += strlen(p->rc.psz_zones);
  11. //1000字节?
  12. buf = s = x264_malloc( len );
  13. if( !buf )
  14. return NULL;
  15. if( b_res )
  16. {
  17. s += sprintf( s, "%dx%d ", p->i_width, p->i_height );
  18. s += sprintf( s, "fps=%u/%u ", p->i_fps_num, p->i_fps_den );
  19. s += sprintf( s, "timebase=%u/%u ", p->i_timebase_num, p->i_timebase_den );
  20. s += sprintf( s, "bitdepth=%d ", BIT_DEPTH );
  21. }
  22. if( p->b_opencl )
  23. s += sprintf( s, "opencl=%d ", p->b_opencl );
  24. s += sprintf( s, "cabac=%d", p->b_cabac );
  25. s += sprintf( s, " ref=%d", p->i_frame_reference );
  26. s += sprintf( s, " deblock=%d:%d:%d", p->b_deblocking_filter,
  27. p->i_deblocking_filter_alphac0, p->i_deblocking_filter_beta );
  28. s += sprintf( s, " analyse=%#x:%#x", p->analyse.intra, p->analyse.inter );
  29. s += sprintf( s, " me=%s", x264_motion_est_names[ p->analyse.i_me_method ] );
  30. s += sprintf( s, " subme=%d", p->analyse.i_subpel_refine );
  31. s += sprintf( s, " psy=%d", p->analyse.b_psy );
  32. if( p->analyse.b_psy )
  33. s += sprintf( s, " psy_rd=%.2f:%.2f", p->analyse.f_psy_rd, p->analyse.f_psy_trellis );
  34. s += sprintf( s, " mixed_ref=%d", p->analyse.b_mixed_references );
  35. s += sprintf( s, " me_range=%d", p->analyse.i_me_range );
  36. s += sprintf( s, " chroma_me=%d", p->analyse.b_chroma_me );
  37. s += sprintf( s, " trellis=%d", p->analyse.i_trellis );
  38. s += sprintf( s, " 8x8dct=%d", p->analyse.b_transform_8x8 );
  39. s += sprintf( s, " cqm=%d", p->i_cqm_preset );
  40. s += sprintf( s, " deadzone=%d,%d", p->analyse.i_luma_deadzone[0], p->analyse.i_luma_deadzone[1] );
  41. s += sprintf( s, " fast_pskip=%d", p->analyse.b_fast_pskip );
  42. s += sprintf( s, " chroma_qp_offset=%d", p->analyse.i_chroma_qp_offset );
  43. s += sprintf( s, " threads=%d", p->i_threads );
  44. s += sprintf( s, " lookahead_threads=%d", p->i_lookahead_threads );
  45. s += sprintf( s, " sliced_threads=%d", p->b_sliced_threads );
  46. if( p->i_slice_count )
  47. s += sprintf( s, " slices=%d", p->i_slice_count );
  48. if( p->i_slice_count_max )
  49. s += sprintf( s, " slices_max=%d", p->i_slice_count_max );
  50. if( p->i_slice_max_size )
  51. s += sprintf( s, " slice_max_size=%d", p->i_slice_max_size );
  52. if( p->i_slice_max_mbs )
  53. s += sprintf( s, " slice_max_mbs=%d", p->i_slice_max_mbs );
  54. if( p->i_slice_min_mbs )
  55. s += sprintf( s, " slice_min_mbs=%d", p->i_slice_min_mbs );
  56. s += sprintf( s, " nr=%d", p->analyse.i_noise_reduction );
  57. s += sprintf( s, " decimate=%d", p->analyse.b_dct_decimate );
  58. s += sprintf( s, " interlaced=%s", p->b_interlaced ? p->b_tff ? "tff" : "bff" : p->b_fake_interlaced ? "fake" : "0" );
  59. s += sprintf( s, " bluray_compat=%d", p->b_bluray_compat );
  60. if( p->b_stitchable )
  61. s += sprintf( s, " stitchable=%d", p->b_stitchable );
  62. s += sprintf( s, " constrained_intra=%d", p->b_constrained_intra );
  63. s += sprintf( s, " bframes=%d", p->i_bframe );
  64. if( p->i_bframe )
  65. {
  66. s += sprintf( s, " b_pyramid=%d b_adapt=%d b_bias=%d direct=%d weightb=%d open_gop=%d",
  67. p->i_bframe_pyramid, p->i_bframe_adaptive, p->i_bframe_bias,
  68. p->analyse.i_direct_mv_pred, p->analyse.b_weighted_bipred, p->b_open_gop );
  69. }
  70. s += sprintf( s, " weightp=%d", p->analyse.i_weighted_pred > 0 ? p->analyse.i_weighted_pred : 0 );
  71. if( p->i_keyint_max == X264_KEYINT_MAX_INFINITE )
  72. s += sprintf( s, " keyint=infinite" );
  73. else
  74. s += sprintf( s, " keyint=%d", p->i_keyint_max );
  75. s += sprintf( s, " keyint_min=%d scenecut=%d intra_refresh=%d",
  76. p->i_keyint_min, p->i_scenecut_threshold, p->b_intra_refresh );
  77. if( p->rc.b_mb_tree || p->rc.i_vbv_buffer_size )
  78. s += sprintf( s, " rc_lookahead=%d", p->rc.i_lookahead );
  79. s += sprintf( s, " rc=%s mbtree=%d", p->rc.i_rc_method == X264_RC_ABR ?
  80. ( p->rc.b_stat_read ? "2pass" : p->rc.i_vbv_max_bitrate == p->rc.i_bitrate ? "cbr" : "abr" )
  81. : p->rc.i_rc_method == X264_RC_CRF ? "crf" : "cqp", p->rc.b_mb_tree );
  82. if( p->rc.i_rc_method == X264_RC_ABR || p->rc.i_rc_method == X264_RC_CRF )
  83. {
  84. if( p->rc.i_rc_method == X264_RC_CRF )
  85. s += sprintf( s, " crf=%.1f", p->rc.f_rf_constant );
  86. else
  87. s += sprintf( s, " bitrate=%d ratetol=%.1f",
  88. p->rc.i_bitrate, p->rc.f_rate_tolerance );
  89. s += sprintf( s, " qcomp=%.2f qpmin=%d qpmax=%d qpstep=%d",
  90. p->rc.f_qcompress, p->rc.i_qp_min, p->rc.i_qp_max, p->rc.i_qp_step );
  91. if( p->rc.b_stat_read )
  92. s += sprintf( s, " cplxblur=%.1f qblur=%.1f",
  93. p->rc.f_complexity_blur, p->rc.f_qblur );
  94. if( p->rc.i_vbv_buffer_size )
  95. {
  96. s += sprintf( s, " vbv_maxrate=%d vbv_bufsize=%d",
  97. p->rc.i_vbv_max_bitrate, p->rc.i_vbv_buffer_size );
  98. if( p->rc.i_rc_method == X264_RC_CRF )
  99. s += sprintf( s, " crf_max=%.1f", p->rc.f_rf_constant_max );
  100. }
  101. }
  102. else if( p->rc.i_rc_method == X264_RC_CQP )
  103. s += sprintf( s, " qp=%d", p->rc.i_qp_constant );
  104. if( p->rc.i_vbv_buffer_size )
  105. s += sprintf( s, " nal_hrd=%s filler=%d", x264_nal_hrd_names[p->i_nal_hrd], p->rc.b_filler );
  106. if( p->crop_rect.i_left | p->crop_rect.i_top | p->crop_rect.i_right | p->crop_rect.i_bottom )
  107. s += sprintf( s, " crop_rect=%u,%u,%u,%u", p->crop_rect.i_left, p->crop_rect.i_top,
  108. p->crop_rect.i_right, p->crop_rect.i_bottom );
  109. if( p->i_frame_packing >= 0 )
  110. s += sprintf( s, " frame-packing=%d", p->i_frame_packing );
  111. if( !(p->rc.i_rc_method == X264_RC_CQP && p->rc.i_qp_constant == 0) )
  112. {
  113. s += sprintf( s, " ip_ratio=%.2f", p->rc.f_ip_factor );
  114. if( p->i_bframe && !p->rc.b_mb_tree )
  115. s += sprintf( s, " pb_ratio=%.2f", p->rc.f_pb_factor );
  116. s += sprintf( s, " aq=%d", p->rc.i_aq_mode );
  117. if( p->rc.i_aq_mode )
  118. s += sprintf( s, ":%.2f", p->rc.f_aq_strength );
  119. if( p->rc.psz_zones )
  120. s += sprintf( s, " zones=%s", p->rc.psz_zones );
  121. else if( p->rc.i_zones )
  122. s += sprintf( s, " zones" );
  123. }
  124. return buf;
  125. }

可以看出x264_param2string()几乎遍历了libx264的所有设置选项,使用“s += sprintf()”的形式将它们连接成一个很长的字符串,并最终将该字符串返回。
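"s += sprintf(s, ...)"这种写法利用了sprintf()返回本次写入字符数的特性,使指针s始终停在已写内容的末尾,从而一趟完成长字符串的拼接。下面是该模式的一个最小示例(build_opts()为自拟函数,字段内容仅作演示):

```c
#include <stdio.h>
#include <string.h>

/* 演示指针递进式sprintf拼接:每次从上一次写入的末尾继续 */
static void build_opts(char *buf)
{
    char *s = buf;
    s += sprintf(s, "cabac=%d", 1);
    s += sprintf(s, " ref=%d", 3);
    s += sprintf(s, " me=%s", "hex");
}
```

相比每次都用strcat()(需要反复扫描到字符串末尾),这种写法在拼接大量字段时更高效。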



x264_encoder_close()

x264_encoder_close()是libx264的一个API函数。该函数用于关闭编码器,同时输出一些统计信息。该函数执行的时候输出的统计信息如下图所示。

 
x264_encoder_close()的声明如下所示。
  1. /* x264_encoder_close:
  2. * close an encoder handler */
  3. void x264_encoder_close ( x264_t * );
x264_encoder_close()的定义位于encoder\encoder.c,如下所示。
  1. /****************************************************************************
  2. * x264_encoder_close:
  3. * 注释和处理:雷霄骅
  4. * http://blog.csdn.net/leixiaohua1020
  5. * leixiaohua1020@126.com
  6. ****************************************************************************/
  7. void x264_encoder_close ( x264_t *h )
  8. {
  9. int64_t i_yuv_size = FRAME_SIZE( h->param.i_width * h->param.i_height );
  10. int64_t i_mb_count_size[2][7] = {{0}};
  11. char buf[200];
  12. int b_print_pcm = h->stat.i_mb_count[SLICE_TYPE_I][I_PCM]
  13. || h->stat.i_mb_count[SLICE_TYPE_P][I_PCM]
  14. || h->stat.i_mb_count[SLICE_TYPE_B][I_PCM];
  15. x264_lookahead_delete( h );
  16. #if HAVE_OPENCL
  17. x264_opencl_lookahead_delete( h );
  18. x264_opencl_function_t *ocl = h->opencl.ocl;
  19. #endif
  20. if( h->param.b_sliced_threads )
  21. x264_threadpool_wait_all( h );
  22. if( h->param.i_threads > 1 )
  23. x264_threadpool_delete( h->threadpool );
  24. if( h->param.i_lookahead_threads > 1 )
  25. x264_threadpool_delete( h->lookaheadpool );
  26. if( h->i_thread_frames > 1 )
  27. {
  28. for( int i = 0; i < h->i_thread_frames; i++ )
  29. if( h->thread[i]->b_thread_active )
  30. {
  31. assert( h->thread[i]->fenc->i_reference_count == 1 );
  32. x264_frame_delete( h->thread[i]->fenc );
  33. }
  34. x264_t *thread_prev = h->thread[h->i_thread_phase];
  35. x264_thread_sync_ratecontrol( h, thread_prev, h );
  36. x264_thread_sync_ratecontrol( thread_prev, thread_prev, h );
  37. h->i_frame = thread_prev->i_frame + 1 - h->i_thread_frames;
  38. }
  39. h->i_frame++;
  40. /*
  41. * x264控制台输出示例
  42. *
  43. * x264 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX
  44. * x264 [info]: profile High, level 2.1
  45. * x264 [info]: frame I:2 Avg QP:20.51 size: 20184 PSNR Mean Y:45.32 U:47.54 V:47.62 Avg:45.94 Global:45.52
  46. * x264 [info]: frame P:33 Avg QP:23.08 size: 3230 PSNR Mean Y:43.23 U:47.06 V:46.87 Avg:44.15 Global:44.00
  47. * x264 [info]: frame B:65 Avg QP:27.87 size: 352 PSNR Mean Y:42.76 U:47.21 V:47.05 Avg:43.79 Global:43.65
  48. * x264 [info]: consecutive B-frames: 3.0% 10.0% 63.0% 24.0%
  49. * x264 [info]: mb I I16..4: 15.3% 37.5% 47.3%
  50. * x264 [info]: mb P I16..4: 0.6% 0.4% 0.2% P16..4: 34.6% 21.2% 12.7% 0.0% 0.0% skip:30.4%
  51. * x264 [info]: mb B I16..4: 0.0% 0.0% 0.0% B16..8: 21.2% 4.1% 0.7% direct: 0.8% skip:73.1% L0:28.7% L1:53.0% BI:18.3%
  52. * x264 [info]: 8x8 transform intra:37.1% inter:51.0%
  53. * x264 [info]: coded y,uvDC,uvAC intra: 74.1% 83.3% 58.9% inter: 10.4% 6.6% 0.4%
  54. * x264 [info]: i16 v,h,dc,p: 21% 25% 7% 48%
  55. * x264 [info]: i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 25% 23% 13% 6% 5% 5% 6% 8% 10%
  56. * x264 [info]: i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 22% 20% 9% 7% 7% 8% 8% 7% 12%
  57. * x264 [info]: i8c dc,h,v,p: 43% 20% 27% 10%
  58. * x264 [info]: Weighted P-Frames: Y:0.0% UV:0.0%
  59. * x264 [info]: ref P L0: 62.5% 19.7% 13.8% 4.0%
  60. * x264 [info]: ref B L0: 88.8% 9.4% 1.9%
  61. * x264 [info]: ref B L1: 92.6% 7.4%
  62. * x264 [info]: PSNR Mean Y:42.967 U:47.163 V:47.000 Avg:43.950 Global:43.796 kb/s:339.67
  63. *
  64. * encoded 100 frames, 178.25 fps, 339.67 kb/s
  65. *
  66. */
  67. /* Slices used and PSNR */
  68. /* 示例
  69. * x264 [info]: frame I:2 Avg QP:20.51 size: 20184 PSNR Mean Y:45.32 U:47.54 V:47.62 Avg:45.94 Global:45.52
  70. * x264 [info]: frame P:33 Avg QP:23.08 size: 3230 PSNR Mean Y:43.23 U:47.06 V:46.87 Avg:44.15 Global:44.00
  71. * x264 [info]: frame B:65 Avg QP:27.87 size: 352 PSNR Mean Y:42.76 U:47.21 V:47.05 Avg:43.79 Global:43.65
  72. */
  73. for( int i = 0; i < 3; i++ )
  74. {
  75. static const uint8_t slice_order[] = { SLICE_TYPE_I, SLICE_TYPE_P, SLICE_TYPE_B };
  76. int i_slice = slice_order[i];
  77. if( h->stat.i_frame_count[i_slice] > 0 )
  78. {
  79. int i_count = h->stat.i_frame_count[i_slice];
  80. double dur = h->stat.f_frame_duration[i_slice];
  81. if( h->param.analyse.b_psnr )
  82. {
  83. //输出统计信息-包含PSNR
  84. //注意PSNR都是通过SSD换算过来的,换算方法就是调用x264_psnr()方法
  85. x264_log( h, X264_LOG_INFO,
  86. "frame %c:%-5d Avg QP:%5.2f size:%6.0f PSNR Mean Y:%5.2f U:%5.2f V:%5.2f Avg:%5.2f Global:%5.2f\n",
  87. slice_type_to_char[i_slice],
  88. i_count,
  89. h->stat.f_frame_qp[i_slice] / i_count,
  90. (double)h->stat.i_frame_size[i_slice] / i_count,
  91. h->stat.f_psnr_mean_y[i_slice] / dur, h->stat.f_psnr_mean_u[i_slice] / dur, h->stat.f_psnr_mean_v[i_slice] / dur,
  92. h->stat.f_psnr_average[i_slice] / dur,
  93. x264_psnr( h->stat.f_ssd_global[i_slice], dur * i_yuv_size ) );
  94. }
  95. else
  96. {
  97. //输出统计信息-不包含PSNR
  98. x264_log( h, X264_LOG_INFO,
  99. "frame %c:%-5d Avg QP:%5.2f size:%6.0f\n",
  100. slice_type_to_char[i_slice],
  101. i_count,
  102. h->stat.f_frame_qp[i_slice] / i_count,
  103. (double)h->stat.i_frame_size[i_slice] / i_count );
  104. }
  105. }
  106. }
  107. /* 示例
  108. * x264 [info]: consecutive B-frames: 3.0% 10.0% 63.0% 24.0%
  109. *
  110. */
  111. if( h->param.i_bframe && h->stat.i_frame_count[SLICE_TYPE_B] )
  112. {
  113. //B帧相关信息
  114. char *p = buf;
  115. int den = 0;
  116. // weight by number of frames (including the I/P-frames) that are in a sequence of N B-frames
  117. for( int i = 0; i <= h->param.i_bframe; i++ )
  118. den += (i+1) * h->stat.i_consecutive_bframes[i];
  119. for( int i = 0; i <= h->param.i_bframe; i++ )
  120. p += sprintf( p, " %4.1f%%", 100. * (i+1) * h->stat.i_consecutive_bframes[i] / den );
  121. x264_log( h, X264_LOG_INFO, "consecutive B-frames:%s\n", buf );
  122. }
  123. for( int i_type = 0; i_type < 2; i_type++ )
  124. for( int i = 0; i < X264_PARTTYPE_MAX; i++ )
  125. {
  126. if( i == D_DIRECT_8x8 ) continue; /* direct is counted as its own type */
  127. i_mb_count_size[i_type][x264_mb_partition_pixel_table[i]] += h->stat.i_mb_partition[i_type][i];
  128. }
  129. /* MB types used */
  130. /* 示例
  131. * x264 [info]: mb I I16..4: 15.3% 37.5% 47.3%
  132. * x264 [info]: mb P I16..4: 0.6% 0.4% 0.2% P16..4: 34.6% 21.2% 12.7% 0.0% 0.0% skip:30.4%
  133. * x264 [info]: mb B I16..4: 0.0% 0.0% 0.0% B16..8: 21.2% 4.1% 0.7% direct: 0.8% skip:73.1% L0:28.7% L1:53.0% BI:18.3%
  134. */
  135. if( h->stat.i_frame_count[SLICE_TYPE_I] > 0 )
  136. {
  137. int64_t *i_mb_count = h->stat.i_mb_count[SLICE_TYPE_I];
  138. double i_count = h->stat.i_frame_count[SLICE_TYPE_I] * h->mb.i_mb_count / 100.0;
  139. //Intra宏块信息-存于buf
  140. //从左到右3个信息,依次为I16x16,I8x8,I4x4
  141. x264_print_intra( i_mb_count, i_count, b_print_pcm, buf );
  142. x264_log( h, X264_LOG_INFO, "mb I %s\n", buf );
  143. }
  144. if( h->stat.i_frame_count[SLICE_TYPE_P] > 0 )
  145. {
  146. int64_t *i_mb_count = h->stat.i_mb_count[SLICE_TYPE_P];
  147. double i_count = h->stat.i_frame_count[SLICE_TYPE_P] * h->mb.i_mb_count / 100.0;
  148. int64_t *i_mb_size = i_mb_count_size[SLICE_TYPE_P];
  149. //Intra宏块信息-存于buf
  150. x264_print_intra( i_mb_count, i_count, b_print_pcm, buf );
  151. //Intra宏块信息-放在最前面
  152. //后面添加P宏块信息
  153. //从左到右6个信息,依次为P16x16, P16x8+P8x16, P8x8, P8x4+P4x8, P4x4, PSKIP
  154. x264_log( h, X264_LOG_INFO,
  155. "mb P %s P16..4: %4.1f%% %4.1f%% %4.1f%% %4.1f%% %4.1f%% skip:%4.1f%%\n",
  156. buf,
  157. i_mb_size[PIXEL_16x16] / (i_count*4),
  158. (i_mb_size[PIXEL_16x8] + i_mb_size[PIXEL_8x16]) / (i_count*4),
  159. i_mb_size[PIXEL_8x8] / (i_count*4),
  160. (i_mb_size[PIXEL_8x4] + i_mb_size[PIXEL_4x8]) / (i_count*4),
  161. i_mb_size[PIXEL_4x4] / (i_count*4),
  162. i_mb_count[P_SKIP] / i_count );
  163. }
  164. if( h->stat.i_frame_count[SLICE_TYPE_B] > 0 )
  165. {
  166. int64_t *i_mb_count = h->stat.i_mb_count[SLICE_TYPE_B];
  167. double i_count = h->stat.i_frame_count[SLICE_TYPE_B] * h->mb.i_mb_count / 100.0;
  168. double i_mb_list_count;
  169. int64_t *i_mb_size = i_mb_count_size[SLICE_TYPE_B];
  170. int64_t list_count[3] = {0}; /* 0 == L0, 1 == L1, 2 == BI */
  171. //Intra宏块信息
  172. x264_print_intra( i_mb_count, i_count, b_print_pcm, buf );
  173. for( int i = 0; i < X264_PARTTYPE_MAX; i++ )
  174. for( int j = 0; j < 2; j++ )
  175. {
  176. int l0 = x264_mb_type_list_table[i][0][j];
  177. int l1 = x264_mb_type_list_table[i][1][j];
  178. if( l0 || l1 )
  179. list_count[l1+l0*l1] += h->stat.i_mb_count[SLICE_TYPE_B][i] * 2;
  180. }
  181. list_count[0] += h->stat.i_mb_partition[SLICE_TYPE_B][D_L0_8x8];
  182. list_count[1] += h->stat.i_mb_partition[SLICE_TYPE_B][D_L1_8x8];
  183. list_count[2] += h->stat.i_mb_partition[SLICE_TYPE_B][D_BI_8x8];
  184. i_mb_count[B_DIRECT] += (h->stat.i_mb_partition[SLICE_TYPE_B][D_DIRECT_8x8]+2)/4;
  185. i_mb_list_count = (list_count[0] + list_count[1] + list_count[2]) / 100.0;
  186. //Intra宏块信息-放在最前面
  187. //后面添加B宏块信息
  188. //从左到右5个信息,依次为B16x16, B16x8+B8x16, B8x8, BDIRECT, BSKIP
  189. //
  190. //SKIP和DIRECT区别
  191. //P_SKIP的CBP为0,无像素残差,无运动矢量残
  192. //B_SKIP宏块的模式为B_DIRECT且CBP为0,无像素残差,无运动矢量残
  193. //B_DIRECT的CBP不为0,有像素残差,无运动矢量残
  194. sprintf( buf + strlen(buf), " B16..8: %4.1f%% %4.1f%% %4.1f%% direct:%4.1f%% skip:%4.1f%%",
  195. i_mb_size[PIXEL_16x16] / (i_count*4),
  196. (i_mb_size[PIXEL_16x8] + i_mb_size[PIXEL_8x16]) / (i_count*4),
  197. i_mb_size[PIXEL_8x8] / (i_count*4),
  198. i_mb_count[B_DIRECT] / i_count,
  199. i_mb_count[B_SKIP] / i_count );
  200. if( i_mb_list_count != 0 )
  201. sprintf( buf + strlen(buf), " L0:%4.1f%% L1:%4.1f%% BI:%4.1f%%",
  202. list_count[0] / i_mb_list_count,
  203. list_count[1] / i_mb_list_count,
  204. list_count[2] / i_mb_list_count );
  205. x264_log( h, X264_LOG_INFO, "mb B %s\n", buf );
  206. }
    //Rate control information
    /* Example:
     * x264 [info]: final ratefactor: 20.01
     */
    x264_ratecontrol_summary( h );

    if( h->stat.i_frame_count[SLICE_TYPE_I] + h->stat.i_frame_count[SLICE_TYPE_P] + h->stat.i_frame_count[SLICE_TYPE_B] > 0 )
    {
#define SUM3(p) (p[SLICE_TYPE_I] + p[SLICE_TYPE_P] + p[SLICE_TYPE_B])
#define SUM3b(p,o) (p[SLICE_TYPE_I][o] + p[SLICE_TYPE_P][o] + p[SLICE_TYPE_B][o])
        int64_t i_i8x8 = SUM3b( h->stat.i_mb_count, I_8x8 );
        int64_t i_intra = i_i8x8 + SUM3b( h->stat.i_mb_count, I_4x4 )
                                 + SUM3b( h->stat.i_mb_count, I_16x16 );
        int64_t i_all_intra = i_intra + SUM3b( h->stat.i_mb_count, I_PCM);
        int64_t i_skip = SUM3b( h->stat.i_mb_count, P_SKIP )
                       + SUM3b( h->stat.i_mb_count, B_SKIP );
        const int i_count = h->stat.i_frame_count[SLICE_TYPE_I] +
                            h->stat.i_frame_count[SLICE_TYPE_P] +
                            h->stat.i_frame_count[SLICE_TYPE_B];
        int64_t i_mb_count = (int64_t)i_count * h->mb.i_mb_count;
        int64_t i_inter = i_mb_count - i_skip - i_intra;
        const double duration = h->stat.f_frame_duration[SLICE_TYPE_I] +
                                h->stat.f_frame_duration[SLICE_TYPE_P] +
                                h->stat.f_frame_duration[SLICE_TYPE_B];
        //Total bytes / duration / 125: bytes per second converted to kbit/s
        float f_bitrate = SUM3(h->stat.i_frame_size) / duration / 125;
        //Interlaced statistics
        if( PARAM_INTERLACED )
        {
            char *fieldstats = buf;
            fieldstats[0] = 0;
            if( i_inter )
                fieldstats += sprintf( fieldstats, " inter:%.1f%%", h->stat.i_mb_field[1] * 100.0 / i_inter );
            if( i_skip )
                fieldstats += sprintf( fieldstats, " skip:%.1f%%", h->stat.i_mb_field[2] * 100.0 / i_skip );
            x264_log( h, X264_LOG_INFO, "field mbs: intra: %.1f%%%s\n",
                      h->stat.i_mb_field[0] * 100.0 / i_intra, buf );
        }

        //8x8 DCT information
        if( h->pps->b_transform_8x8_mode )
        {
            buf[0] = 0;
            if( h->stat.i_mb_count_8x8dct[0] )
                sprintf( buf, " inter:%.1f%%", 100. * h->stat.i_mb_count_8x8dct[1] / h->stat.i_mb_count_8x8dct[0] );
            x264_log( h, X264_LOG_INFO, "8x8 transform intra:%.1f%%%s\n", 100. * i_i8x8 / i_intra, buf );
        }
        if( (h->param.analyse.i_direct_mv_pred == X264_DIRECT_PRED_AUTO ||
            (h->stat.i_direct_frames[0] && h->stat.i_direct_frames[1]))
            && h->stat.i_frame_count[SLICE_TYPE_B] )
        {
            x264_log( h, X264_LOG_INFO, "direct mvs  spatial:%.1f%% temporal:%.1f%%\n",
                      h->stat.i_direct_frames[1] * 100. / h->stat.i_frame_count[SLICE_TYPE_B],
                      h->stat.i_direct_frames[0] * 100. / h->stat.i_frame_count[SLICE_TYPE_B] );
        }

        buf[0] = 0;
        int csize = CHROMA444 ? 4 : 1;
        if( i_mb_count != i_all_intra )
            sprintf( buf, " inter: %.1f%% %.1f%% %.1f%%",
                     h->stat.i_mb_cbp[1] * 100.0 / ((i_mb_count - i_all_intra)*4),
                     h->stat.i_mb_cbp[3] * 100.0 / ((i_mb_count - i_all_intra)*csize),
                     h->stat.i_mb_cbp[5] * 100.0 / ((i_mb_count - i_all_intra)*csize) );
        /*
         * Example:
         * x264 [info]: coded y,uvDC,uvAC intra: 74.1% 83.3% 58.9% inter: 10.4% 6.6% 0.4%
         */
        x264_log( h, X264_LOG_INFO, "coded y,%s,%s intra: %.1f%% %.1f%% %.1f%%%s\n",
                  CHROMA444?"u":"uvDC", CHROMA444?"v":"uvAC",
                  h->stat.i_mb_cbp[0] * 100.0 / (i_all_intra*4),
                  h->stat.i_mb_cbp[2] * 100.0 / (i_all_intra*csize),
                  h->stat.i_mb_cbp[4] * 100.0 / (i_all_intra*csize), buf );
        /*
         * Intra prediction information
         * Rows, top to bottom: I16x16, I8x8, I4x4
         * Columns, left to right: Vertical, Horizontal, DC, Plane ...
         *
         * Example:
         *
         * x264 [info]: i16 v,h,dc,p: 21% 25%  7% 48%
         * x264 [info]: i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 25% 23% 13%  6%  5%  5%  6%  8% 10%
         * x264 [info]: i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 22% 20%  9%  7%  7%  8%  8%  7% 12%
         * x264 [info]: i8c dc,h,v,p: 43% 20% 27% 10%
         */
        int64_t fixed_pred_modes[4][9] = {{0}};
        int64_t sum_pred_modes[4] = {0};
        for( int i = 0; i <= I_PRED_16x16_DC_128; i++ )
        {
            fixed_pred_modes[0][x264_mb_pred_mode16x16_fix[i]] += h->stat.i_mb_pred_mode[0][i];
            sum_pred_modes[0] += h->stat.i_mb_pred_mode[0][i];
        }
        if( sum_pred_modes[0] )
            x264_log( h, X264_LOG_INFO, "i16 v,h,dc,p: %2.0f%% %2.0f%% %2.0f%% %2.0f%%\n",
                      fixed_pred_modes[0][0] * 100.0 / sum_pred_modes[0],
                      fixed_pred_modes[0][1] * 100.0 / sum_pred_modes[0],
                      fixed_pred_modes[0][2] * 100.0 / sum_pred_modes[0],
                      fixed_pred_modes[0][3] * 100.0 / sum_pred_modes[0] );
        for( int i = 1; i <= 2; i++ )
        {
            for( int j = 0; j <= I_PRED_8x8_DC_128; j++ )
            {
                fixed_pred_modes[i][x264_mb_pred_mode4x4_fix(j)] += h->stat.i_mb_pred_mode[i][j];
                sum_pred_modes[i] += h->stat.i_mb_pred_mode[i][j];
            }
            if( sum_pred_modes[i] )
                x264_log( h, X264_LOG_INFO, "i%d v,h,dc,ddl,ddr,vr,hd,vl,hu: %2.0f%% %2.0f%% %2.0f%% %2.0f%% %2.0f%% %2.0f%% %2.0f%% %2.0f%% %2.0f%%\n", (3-i)*4,
                          fixed_pred_modes[i][0] * 100.0 / sum_pred_modes[i],
                          fixed_pred_modes[i][1] * 100.0 / sum_pred_modes[i],
                          fixed_pred_modes[i][2] * 100.0 / sum_pred_modes[i],
                          fixed_pred_modes[i][3] * 100.0 / sum_pred_modes[i],
                          fixed_pred_modes[i][4] * 100.0 / sum_pred_modes[i],
                          fixed_pred_modes[i][5] * 100.0 / sum_pred_modes[i],
                          fixed_pred_modes[i][6] * 100.0 / sum_pred_modes[i],
                          fixed_pred_modes[i][7] * 100.0 / sum_pred_modes[i],
                          fixed_pred_modes[i][8] * 100.0 / sum_pred_modes[i] );
        }
        for( int i = 0; i <= I_PRED_CHROMA_DC_128; i++ )
        {
            fixed_pred_modes[3][x264_mb_chroma_pred_mode_fix[i]] += h->stat.i_mb_pred_mode[3][i];
            sum_pred_modes[3] += h->stat.i_mb_pred_mode[3][i];
        }
        if( sum_pred_modes[3] && !CHROMA444 )
            x264_log( h, X264_LOG_INFO, "i8c dc,h,v,p: %2.0f%% %2.0f%% %2.0f%% %2.0f%%\n",
                      fixed_pred_modes[3][0] * 100.0 / sum_pred_modes[3],
                      fixed_pred_modes[3][1] * 100.0 / sum_pred_modes[3],
                      fixed_pred_modes[3][2] * 100.0 / sum_pred_modes[3],
                      fixed_pred_modes[3][3] * 100.0 / sum_pred_modes[3] );
        if( h->param.analyse.i_weighted_pred >= X264_WEIGHTP_SIMPLE && h->stat.i_frame_count[SLICE_TYPE_P] > 0 )
            x264_log( h, X264_LOG_INFO, "Weighted P-Frames: Y:%.1f%% UV:%.1f%%\n",
                      h->stat.i_wpred[0] * 100.0 / h->stat.i_frame_count[SLICE_TYPE_P],
                      h->stat.i_wpred[1] * 100.0 / h->stat.i_frame_count[SLICE_TYPE_P] );
        /*
         * Reference frame information
         * Columns, left to right: reference frames with increasing index
         *
         * Example:
         *
         * x264 [info]: ref P L0: 62.5% 19.7% 13.8%  4.0%
         * x264 [info]: ref B L0: 88.8%  9.4%  1.9%
         * x264 [info]: ref B L1: 92.6%  7.4%
         */
        for( int i_list = 0; i_list < 2; i_list++ )
            for( int i_slice = 0; i_slice < 2; i_slice++ )
            {
                char *p = buf;
                int64_t i_den = 0;
                int i_max = 0;
                for( int i = 0; i < X264_REF_MAX*2; i++ )
                    if( h->stat.i_mb_count_ref[i_slice][i_list][i] )
                    {
                        i_den += h->stat.i_mb_count_ref[i_slice][i_list][i];
                        i_max = i;
                    }
                if( i_max == 0 )
                    continue;
                for( int i = 0; i <= i_max; i++ )
                    p += sprintf( p, " %4.1f%%", 100. * h->stat.i_mb_count_ref[i_slice][i_list][i] / i_den );
                x264_log( h, X264_LOG_INFO, "ref %c L%d:%s\n", "PB"[i_slice], i_list, buf );
            }

        if( h->param.analyse.b_ssim )
        {
            float ssim = SUM3( h->stat.f_ssim_mean_y ) / duration;
            x264_log( h, X264_LOG_INFO, "SSIM Mean Y:%.7f (%6.3fdb)\n", ssim, x264_ssim( ssim ) );
        }
        /*
         * Example:
         *
         * x264 [info]: PSNR Mean Y:42.967 U:47.163 V:47.000 Avg:43.950 Global:43.796 kb/s:339.67
         */
        if( h->param.analyse.b_psnr )
        {
            x264_log( h, X264_LOG_INFO,
                      "PSNR Mean Y:%6.3f U:%6.3f V:%6.3f Avg:%6.3f Global:%6.3f kb/s:%.2f\n",
                      SUM3( h->stat.f_psnr_mean_y ) / duration,
                      SUM3( h->stat.f_psnr_mean_u ) / duration,
                      SUM3( h->stat.f_psnr_mean_v ) / duration,
                      SUM3( h->stat.f_psnr_average ) / duration,
                      x264_psnr( SUM3( h->stat.f_ssd_global ), duration * i_yuv_size ),
                      f_bitrate );
        }
        else
            x264_log( h, X264_LOG_INFO, "kb/s:%.2f\n", f_bitrate );
    }
    //Release everything
    /* rc */
    x264_ratecontrol_delete( h );

    /* param */
    if( h->param.rc.psz_stat_out )
        free( h->param.rc.psz_stat_out );
    if( h->param.rc.psz_stat_in )
        free( h->param.rc.psz_stat_in );

    x264_cqm_delete( h );
    x264_free( h->nal_buffer );
    x264_free( h->reconfig_h );
    x264_analyse_free_costs( h );

    if( h->i_thread_frames > 1 )
        h = h->thread[h->i_thread_phase];

    /* frames */
    x264_frame_delete_list( h->frames.unused[0] );
    x264_frame_delete_list( h->frames.unused[1] );
    x264_frame_delete_list( h->frames.current );
    x264_frame_delete_list( h->frames.blank_unused );

    h = h->thread[0];

    for( int i = 0; i < h->i_thread_frames; i++ )
        if( h->thread[i]->b_thread_active )
            for( int j = 0; j < h->thread[i]->i_ref[0]; j++ )
                if( h->thread[i]->fref[0][j] && h->thread[i]->fref[0][j]->b_duplicate )
                    x264_frame_delete( h->thread[i]->fref[0][j] );

    if( h->param.i_lookahead_threads > 1 )
        for( int i = 0; i < h->param.i_lookahead_threads; i++ )
            x264_free( h->lookahead_thread[i] );

    for( int i = h->param.i_threads - 1; i >= 0; i-- )
    {
        x264_frame_t **frame;

        if( !h->param.b_sliced_threads || i == 0 )
        {
            //Release reference frames: free a frame only when its reference count drops to zero
            for( frame = h->thread[i]->frames.reference; *frame; frame++ )
            {
                assert( (*frame)->i_reference_count > 0 );
                (*frame)->i_reference_count--;
                if( (*frame)->i_reference_count == 0 )
                    x264_frame_delete( *frame );
            }
            frame = &h->thread[i]->fdec;
            if( *frame )
            {
                assert( (*frame)->i_reference_count > 0 );
                (*frame)->i_reference_count--;
                if( (*frame)->i_reference_count == 0 )
                    x264_frame_delete( *frame );
            }
            x264_macroblock_cache_free( h->thread[i] );
        }
        x264_macroblock_thread_free( h->thread[i], 0 );
        x264_free( h->thread[i]->out.p_bitstream );
        x264_free( h->thread[i]->out.nal );
        x264_pthread_mutex_destroy( &h->thread[i]->mutex );
        x264_pthread_cond_destroy( &h->thread[i]->cv );
        x264_free( h->thread[i] );
    }
#if HAVE_OPENCL
    x264_opencl_close_library( ocl );
#endif
}
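The release loop near the end of x264_encoder_close() decrements each frame's i_reference_count and calls x264_frame_delete() only when the count reaches zero, because with frame threading several threads may hold the same frame object. The following is a minimal self-contained sketch of that manual reference-counting pattern; the type frame_t and the functions frame_new()/frame_release() are illustrative stand-ins, not x264's actual structures.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative stand-in for x264_frame_t: a buffer shared by several owners. */
typedef struct
{
    int  i_reference_count; /* how many owners still hold this frame */
    int *plane;             /* pretend pixel data */
} frame_t;

static frame_t *frame_new( void )
{
    frame_t *f = malloc( sizeof(*f) );
    f->i_reference_count = 1;               /* the creator holds the first reference */
    f->plane = malloc( 64 * sizeof(int) );
    return f;
}

/* Same shape as the release step in x264_encoder_close():
 * drop one reference, free only when nobody holds the frame any more. */
static void frame_release( frame_t *f )
{
    assert( f->i_reference_count > 0 );
    if( --f->i_reference_count == 0 )
    {
        free( f->plane );
        free( f );
    }
}
```

Each owner (in x264: each thread's reference list, plus fdec) calls the release function exactly once; the last release performs the actual free, which is why the loop above can safely walk every thread's lists without double-freeing.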

As the source code shows, this part of x264_encoder_close() is mainly devoted to printing encoding statistics. The code above is already fairly thoroughly commented, so it is not described line by line here. Note that the statistics are printed through libx264's logging API function x264_log(), which is examined below.
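One detail worth spelling out is the bitrate line: f_bitrate = SUM3(h->stat.i_frame_size) / duration / 125. Dividing bytes-per-second by 125 is the same as multiplying by 8 and dividing by 1000, i.e. it converts bytes/s to kbit/s. A tiny sketch of just that conversion (the function name kbps is illustrative):

```c
/* bytes/s -> kbit/s: /125 == *8/1000, matching x264's
 * "SUM3(h->stat.i_frame_size) / duration / 125". */
static double kbps( double i_total_bytes, double f_duration_sec )
{
    return i_total_bytes / f_duration_sec / 125;
}
```

For example, 125000 bytes encoded over one second is exactly 1000 kbit/s.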

x264_log()

x264_log() is the log output function. It is defined in common\common.c, as shown below.
/****************************************************************************
 * x264_log:
 ****************************************************************************/
//Log output function
void x264_log( x264_t *h, int i_level, const char *psz_fmt, ... )
{
    if( !h || i_level <= h->param.i_log_level )
    {
        va_list arg;
        va_start( arg, psz_fmt );
        if( !h )
            x264_log_default( NULL, i_level, psz_fmt, arg );//default log output function
        else
            h->param.pf_log( h->param.p_log_private, i_level, psz_fmt, arg );
        va_end( arg );
    }
}

As can be seen, x264_log() first performs a check: a message is printed only if its level i_level is less than or equal to the configured log level param.i_log_level. libx264 defines the following log levels; the smaller the value, the more urgent the message.
/* Log level */
#define X264_LOG_NONE          (-1)
#define X264_LOG_ERROR          0
#define X264_LOG_WARNING        1
#define X264_LOG_INFO           2
#define X264_LOG_DEBUG          3
Next, x264_log() decides, based on whether the x264_t handle is NULL, to call either x264_log_default() or the param.pf_log() callback stored in x264_t. With the default configuration, param.pf_log() is also set to point to x264_log_default() inside x264_param_default(), so we can go straight on to x264_log_default().
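Because param.pf_log is a plain function pointer in x264_param_t, an application can redirect x264's log output into its own logging system by assigning pf_log and p_log_private before opening the encoder. The dispatch mechanism itself is just a level check plus a call through the function pointer; the following self-contained sketch reproduces that mechanism with illustrative names (logger_t, do_log, my_log_cb are not part of the x264 API).

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define LOG_ERROR   0
#define LOG_WARNING 1
#define LOG_INFO    2
#define LOG_DEBUG   3

/* Mirrors the relevant x264_param_t fields: i_log_level, pf_log, p_log_private. */
typedef struct
{
    int   i_log_level;
    void (*pf_log)( void *opaque, int i_level, const char *fmt, va_list args );
    void *p_log_private;
} logger_t;

/* A user-supplied callback: counts messages and formats the last one into a buffer. */
typedef struct { int n; char last[128]; } sink_t;

static void my_log_cb( void *opaque, int i_level, const char *fmt, va_list args )
{
    sink_t *sink = opaque;
    sink->n++;
    vsnprintf( sink->last, sizeof(sink->last), fmt, args );
}

/* Same shape as x264_log(): filter by level, then forward through the pointer. */
static void do_log( logger_t *l, int i_level, const char *fmt, ... )
{
    if( i_level <= l->i_log_level )
    {
        va_list args;
        va_start( args, fmt );
        l->pf_log( l->p_log_private, i_level, fmt, args );
        va_end( args );
    }
}
```

With the level set to LOG_INFO, a LOG_DEBUG message is filtered out before the callback is ever invoked, while LOG_WARNING and more urgent messages pass through.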

x264_log_default()

x264_log_default() is libx264's default log output function. It is defined as follows.
//Default log output function
static void x264_log_default( void *p_unused, int i_level, const char *psz_fmt, va_list arg )
{
    char *psz_prefix;
    //Pick a prefix string according to the log level
    switch( i_level )
    {
        case X264_LOG_ERROR:
            psz_prefix = "error";
            break;
        case X264_LOG_WARNING:
            psz_prefix = "warning";
            break;
        case X264_LOG_INFO:
            psz_prefix = "info";
            break;
        case X264_LOG_DEBUG:
            psz_prefix = "debug";
            break;
        default:
            psz_prefix = "unknown";
            break;
    }
    //Wrap the level name in "[]" and print to stderr
    fprintf( stderr, "x264 [%s]: ", psz_prefix );
    x264_vfprintf( stderr, psz_fmt, arg );
}

As the source code shows, x264_log_default() prepends a tag of the form "x264 [level]: " to the message and then writes the result to stderr.
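Note why x264_log_default() takes a va_list rather than "...": a variadic function cannot simply forward its own "..." arguments to another variadic function, so the v-variant (vfprintf-style) is required for the hand-off from x264_log(). The sketch below reproduces the same prefix-then-forward pattern, formatting into a buffer instead of stderr so the result can be checked; the names log_to_buffer and log_to_buffer_v are illustrative, not x264 functions.

```c
#include <stdarg.h>
#include <stdio.h>

/* Formats "x264 [prefix]: message" into out, mirroring what
 * x264_log_default() prints to stderr. The va_list variant does the work. */
static void log_to_buffer_v( char *out, size_t n, const char *psz_prefix,
                             const char *psz_fmt, va_list arg )
{
    int len = snprintf( out, n, "x264 [%s]: ", psz_prefix );
    if( len >= 0 && (size_t)len < n )
        vsnprintf( out + len, n - len, psz_fmt, arg );
}

/* The "..." wrapper converts its variable arguments into a va_list
 * and hands them to the v-variant - the same split as x264_log()/
 * x264_log_default(). */
static void log_to_buffer( char *out, size_t n, const char *psz_prefix,
                           const char *psz_fmt, ... )
{
    va_list arg;
    va_start( arg, psz_fmt );
    log_to_buffer_v( out, n, psz_prefix, psz_fmt, arg );
    va_end( arg );
}
```

This two-layer structure (a "..." front end plus a va_list back end) is the standard C idiom for forwarding variadic arguments, and it is exactly why the x264_param_t pf_log callback is declared with a va_list parameter.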


This concludes the analysis of the three functions x264_encoder_open(), x264_encoder_headers() and x264_encoder_close(). The next article continues with the core of the encoder's main path, x264_encoder_encode().



雷霄骅
leixiaohua1020@126.com
http://blog.csdn.net/leixiaohua1020

