Profiling x264

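Before porting x264 to the SPU, I took a profile to find out where the bottleneck is. For some reason the 32-bit build produces strange output, so I built it as 64-bit instead. Static functions do not seem to show up properly, so I commented out the static declarations where needed.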
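Rather than commenting out each static by hand, one alternative is to hide the keyword behind a macro. The following is only a sketch: `PROFILE_STATIC`, `PROFILE_BUILD` and the `sad_4` function are names made up for illustration and are not part of x264. Building with `-DPROFILE_BUILD` removes the `static`, so the function gets an external symbol in the profiling build; a normal build keeps it static.

```c
/* Hypothetical helper macro (not part of x264):
 * expands to nothing in a profiling build, to `static` otherwise. */
#ifdef PROFILE_BUILD
#define PROFILE_STATIC
#else
#define PROFILE_STATIC static
#endif

/* Illustrative stand-in for an internal helper:
 * sum of absolute differences over 4 pixels. */
PROFILE_STATIC int sad_4(const unsigned char *a, const unsigned char *b)
{
    int sum = 0;
    for (int i = 0; i < 4; i++) {
        int d = (int)a[i] - (int)b[i];
        sum += (d < 0) ? -d : d;
    }
    return sum;
}
```

The function behaves the same either way; only its linkage changes, so the edit is easy to undo once profiling is done.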

Here is the gprof output.

		     Call graph (explanation follows)


granularity: each sample hit covers 2 byte(s) for 0.00% of 223.10 seconds

index % time    self  children    called     name
                                                 <spontaneous>
[1]     99.4    0.00  221.74                 .main [1]
                0.00  221.74    1793/1793        .__gmon_start__ [2]
                0.00    0.00       1/8           .x264_param_default [116]
                0.00    0.00       1/1           .x264_encoder_close [119]
                0.00    0.00       1/1           .x264_encoder_open [120]
                0.00    0.00    1792/1792        .read_frame_y4m [132]
                0.00    0.00       2/2           .x264_mdate [144]
                0.00    0.00       1/1           .open_file_bsf [149]
                0.00    0.00       1/1           .open_file_y4m [150]
                0.00    0.00       1/1           .get_frame_total_y4m [148]
                0.00    0.00       1/1           .set_param_bsf [152]
                0.00    0.00       1/1           .x264_picture_alloc [161]
                0.00    0.00       1/1           .x264_picture_clean [162]
                0.00    0.00       1/1           .close_file_y4m [147]
                0.00    0.00       1/1           .close_file_bsf [146]
-----------------------------------------------
                0.00  221.74    1793/1793        .main [1]
[2]     99.4    0.00  221.74    1793         .__gmon_start__ [2]
                0.00  221.69    1793/1793        .x264_encoder_encode [3]
                0.05    0.00    1809/1809        .x264_nal_encode [92]
                0.00    0.00    1809/1809        .write_nalu_bsf [127]
                0.00    0.00    1792/1792        .set_eop_bsf [133]
-----------------------------------------------
                0.00  221.69    1793/1793        .__gmon_start__ [2]
[3]     99.4    0.00  221.69    1793         .x264_encoder_encode [3]
                0.32  205.62    1793/1793        .x264_slice_write [4]
                0.02   14.21    1793/1793        .x264_encoder_frame_end [16]
                0.00    1.49    1793/55583       .x264_fdec_filter_row [8]
                0.00    0.02    1792/1792        .x264_frame_copy_picture [109]
                0.01    0.00    3584/3585        .x264_frame_pop_unused [111]
                0.00    0.00    1793/2422343     .x264_ratecontrol_qp [98]
                0.00    0.00       1/1800        .x264_log [118]
                0.00    0.00    5376/5376        .x264_frame_push [124]
                0.00    0.00    5368/5368        .x264_frame_shift [125]
                0.00    0.00    1793/1793        .x264_ratecontrol_start [130]
                0.00    0.00    1793/1793        .x264_macroblock_slice_init [129]
                0.00    0.00    1793/10755       .x264_cpu_restore [123]
                0.00    0.00    1792/1792        .x264_slicetype_decide [136]
                0.00    0.00    1791/3583        .x264_frame_push_unused [126]
                0.00    0.00       8/8           .x264_sps_write [140]
                0.00    0.00       8/8           .x264_pps_write [139]
                0.00    0.00       7/7           .x264_frame_pop [141]
                0.00    0.00       1/1           .x264_sei_version_write [174]
-----------------------------------------------
                0.32  205.62    1793/1793        .x264_encoder_encode [3]
[4]     92.3    0.32  205.62    1793         .x264_slice_write [4]
                0.68  129.50 2420550/2420550     .x264_macroblock_analyse [5]
                0.04   44.67   53790/55583       .x264_fdec_filter_row [8]
                2.47   12.69 2420550/2420550     .x264_macroblock_encode [15]
                3.72    1.65 2420550/2420550     .x264_macroblock_cache_load [28]
                0.50    4.86  815913/815913      .x264_macroblock_write_cabac [29]
                1.80    1.59 2420550/2420550     .x264_macroblock_cache_save [36]
                0.47    0.84 2408400/10299344     .x264_cabac_mb_skip [27]
                0.08    0.00 2420550/2533996     .x264_cabac_encode_terminal [85]
                0.03    0.00    1793/1793        .x264_cabac_context_init [99]
                0.03    0.00    1793/1793        .x264_cabac_encode_flush [100]
                0.00    0.00    1793/1793        .x264_slice_header_write [131]
                0.00    0.00    1793/1793        .x264_cabac_encode_init [128]
-----------------------------------------------
                0.68  129.50 2420550/2420550     .x264_slice_write [4]
[5]     58.4    0.68  129.50 2420550         .x264_macroblock_analyse [5]
                0.61   40.19 2408400/2408400     .x264_mb_analyse_inter_p16x16 [9]
                0.32   38.83  828509/828509      .x264_mb_analyse_inter_p8x8 [10]
                2.53   16.83  840659/840659      .x264_mb_analyse_intra [14]
                0.72   10.30 1273855/8423971     .refine_subpel [7]
                0.10    8.36  356920/356920      .x264_mb_analyse_inter_p16x8 [23]
                0.10    8.36  356920/356920      .x264_mb_analyse_inter_p8x16 [24]
                0.17    1.18  828509/1056344     .x264_mb_analyse_intra_chroma [46]
                0.49    0.00 2420550/2420550     .x264_mb_analyse_init [67]
                0.16    0.13  840659/2420550     .x264_analyse_update_cache [59]
                0.06    0.00 1273855/1273855     .x264_me_refine_qpel [90]
                0.04    0.00 4816800/4816800     .prefetch_ref_null [96]
                0.03    0.00 2420550/2422343     .x264_ratecontrol_qp [98]
                0.00    0.00       1/74          .x264_malloc [138]
-----------------------------------------------
                0.42    7.87  713840/7150116     .x264_mb_analyse_inter_p8x16 [24]
                0.42    7.87  713840/7150116     .x264_mb_analyse_inter_p16x8 [23]
                1.43   26.55 2408400/7150116     .x264_mb_analyse_inter_p16x16 [9]
                1.96   36.53 3314036/7150116     .x264_mb_analyse_inter_p8x8 [10]
[6]     37.2    4.24   78.81 7150116         .x264_me_search_ref [6]
                4.06   57.79 7150116/8423971     .refine_subpel [7]
                4.56    0.00 20729236/89686850     .get_ref_altivec [13]
                2.43    0.00 12254405/12254405     .pixel_sad_16x16_altivec [40]
                2.09    0.00 4816800/7225200     .pixel_sad_x4_16x16_altivec [37]
                1.93    0.00 5211876/5211876     .pixel_sad_x3_16x16_altivec [43]
                1.30    0.00 14967199/14967199     .pixel_sad_8x8_altivec [51]
                1.23    0.00 7421855/7421855     .pixel_sad_x3_8x8_altivec [52]
                1.18    0.00 6628072/9942108     .pixel_sad_x4_8x8_altivec [45]
                0.67    0.00 3929056/3929056     .pixel_sad_8x16_altivec [61]
                0.45    0.00 3878808/3878808     .pixel_sad_16x8_altivec [69]
                0.32    0.00 1427680/2141520     .pixel_sad_x4_8x16_altivec [68]
                0.29    0.00 1605798/1605798     .pixel_sad_x3_8x16_altivec [74]
                0.26    0.00 1605758/1605758     .pixel_sad_x3_16x8_altivec [75]
                0.25    0.00 1427680/2141520     .pixel_sad_x4_16x8_altivec [71]
-----------------------------------------------
                0.72   10.30 1273855/8423971     .x264_macroblock_analyse [5]
                4.06   57.79 7150116/8423971     .x264_me_search_ref [6]
[7]     32.7    4.79   68.08 8423971         .refine_subpel [7]
               23.59    0.00 87259249/95397012     .mc_chroma_altivec [12]
               15.18    0.00 68957614/89686850     .get_ref_altivec [13]
                9.80    0.00 52182329/58895403     .pixel_satd_8x8_altivec [19]
                7.58    0.00 16152488/19509025     .pixel_satd_16x16_altivec [22]
                3.24    0.00 37295461/110035475     .pixel_satd_4x4_altivec [21]
                2.38    0.00 5687964/5687964     .pixel_satd_8x16_altivec [41]
                1.98    0.00 9845655/9845655     .pixel_satd_4x8_altivec [42]
                1.56    0.00 5804230/5804230     .pixel_satd_16x8_altivec [49]
                1.04    0.00 2408400/7225200     .pixel_sad_x4_16x16_altivec [37]
                0.86    0.00 10066684/10066684     .pixel_satd_8x4_altivec [57]
                0.59    0.00 3314036/9942108     .pixel_sad_x4_8x8_altivec [45]
                0.16    0.00  713840/2141520     .pixel_sad_x4_8x16_altivec [68]
                0.13    0.00  713840/2141520     .pixel_sad_x4_16x8_altivec [71]
-----------------------------------------------
                0.00    1.49    1793/55583       .x264_encoder_encode [3]
                0.04   44.67   53790/55583       .x264_slice_write [4]
[8]     20.7    0.04   46.16   55583         .x264_fdec_filter_row [8]
               38.73    0.00   53790/53790       .x264_frame_filter [11]
                2.95    4.35   53790/53790       .x264_frame_deblock_row [25]
                0.02    0.05   53790/53790       .x264_frame_expand_border [87]
                0.01    0.03   53790/53790       .x264_frame_expand_border_filtered [95]
                0.02    0.00   53790/322740      .plane_expand_border [83]
-----------------------------------------------
                0.61   40.19 2408400/2408400     .x264_macroblock_analyse [5]
[9]     18.3    0.61   40.19 2408400         .x264_mb_analyse_inter_p16x16 [9]
                1.43   26.55 2408400/7150116     .x264_me_search_ref [6]
                3.08    7.92 1744757/1744757     .x264_macroblock_probe_skip [20]
                0.30    0.24 1579891/2420550     .x264_analyse_update_cache [59]
                0.50    0.00 2408400/2408400     .x264_mb_predict_mv_ref16x16 [66]
                0.18    0.00 2408400/3152617     .x264_mb_predict_mv_16x16 [78]
-----------------------------------------------
                0.32   38.83  828509/828509      .x264_macroblock_analyse [5]
[10]    17.5    0.32   38.83  828509         .x264_mb_analyse_inter_p8x8 [10]
                1.96   36.53 3314036/7150116     .x264_me_search_ref [6]
                0.34    0.00 3314036/5610143     .x264_mb_predict_mv [63]
-----------------------------------------------
               38.73    0.00   53790/53790       .x264_fdec_filter_row [8]
[11]    17.4   38.73    0.00   53790         .x264_frame_filter [11]
-----------------------------------------------

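x264_slice_write looks like the main workhorse. On a multi-CPU machine it seems to be the part that is run across multiple threads. Ideally the SPU would run everything from this function down, but it probably won't fit in the LS (local store). Using this profile as a reference, let's look at which parts are likely to fit.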
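As a rough way to read the table when picking candidates, the "% time" column (self plus children) can be plugged into Amdahl's law to get an upper bound on the overall gain from offloading one function. This is only a back-of-the-envelope sketch: the function name `amdahl_speedup` and any speedup factor passed to it are assumptions, not measurements, and it ignores the cost of moving data to and from the SPU.

```c
#include <assert.h>

/* Upper bound on whole-program speedup when a part that takes
 * `fraction` of the run time (0..1) is made `factor` times faster.
 * Hypothetical helper for reading the profile; not part of x264. */
double amdahl_speedup(double fraction, double factor)
{
    assert(fraction >= 0.0 && fraction <= 1.0);
    assert(factor > 0.0);
    return 1.0 / ((1.0 - fraction) + fraction / factor);
}
```

For example, x264_frame_filter accounts for 17.4% of the time, so even with an assumed 4x speedup on that function, `amdahl_speedup(0.174, 4.0)` gives only about 1.15x overall. The subtree under x264_me_search_ref (37.2%) has more to offer, provided it fits.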