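Profiling x264
Before porting x264 to the SPU, I took a profile to find out where the bottlenecks are. For some reason the 32-bit build produces odd output, so I built it as 64-bit instead. Static functions do not seem to show up properly in the profile, so I commented out the static declarations where needed.
Here is the gprof result.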
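As an alternative to commenting out each static by hand, the linkage could be routed through a macro that is switched off only for profiling builds. This is a minimal sketch of that idea, not what was actually done here: PROFILE_STATIC, PROFILE_BUILD and sum_abs_diff are hypothetical names and are not part of x264. Whether gprof then reports the function may also depend on whether the compiler still inlines it.

```c
/* Hypothetical macro, not part of x264: in a profiling build it expands to
   nothing, so the function gets external linkage and its own symbol. */
#ifdef PROFILE_BUILD
#define PROFILE_STATIC
#else
#define PROFILE_STATIC static
#endif

/* Stand-in for an internal helper; the name is illustrative only.
   Returns the sum of absolute differences over n bytes. */
PROFILE_STATIC int sum_abs_diff(const unsigned char *a,
                                const unsigned char *b, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {
        int d = a[i] - b[i];
        sum += d < 0 ? -d : d;
    }
    return sum;
}
```

Compiling with -DPROFILE_BUILD together with gcc's -pg would then give the profiling variant, and a normal build keeps the original static linkage.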
Call graph (explanation follows)

granularity: each sample hit covers 2 byte(s) for 0.00% of 223.10 seconds

index % time    self children             called  name
                                                      <spontaneous>
[1]     99.4    0.00   221.74                     .main [1]
                0.00   221.74          1793/1793      .__gmon_start__ [2]
                0.00     0.00                1/8      .x264_param_default [116]
                0.00     0.00                1/1      .x264_encoder_close [119]
                0.00     0.00                1/1      .x264_encoder_open [120]
                0.00     0.00          1792/1792      .read_frame_y4m [132]
                0.00     0.00                2/2      .x264_mdate [144]
                0.00     0.00                1/1      .open_file_bsf [149]
                0.00     0.00                1/1      .open_file_y4m [150]
                0.00     0.00                1/1      .get_frame_total_y4m [148]
                0.00     0.00                1/1      .set_param_bsf [152]
                0.00     0.00                1/1      .x264_picture_alloc [161]
                0.00     0.00                1/1      .x264_picture_clean [162]
                0.00     0.00                1/1      .close_file_y4m [147]
                0.00     0.00                1/1      .close_file_bsf [146]
-----------------------------------------------
                0.00   221.74          1793/1793      .main [1]
[2]     99.4    0.00   221.74               1793  .__gmon_start__ [2]
                0.00   221.69          1793/1793      .x264_encoder_encode [3]
                0.05     0.00          1809/1809      .x264_nal_encode [92]
                0.00     0.00          1809/1809      .write_nalu_bsf [127]
                0.00     0.00          1792/1792      .set_eop_bsf [133]
-----------------------------------------------
                0.00   221.69          1793/1793      .__gmon_start__ [2]
[3]     99.4    0.00   221.69               1793  .x264_encoder_encode [3]
                0.32   205.62          1793/1793      .x264_slice_write [4]
                0.02    14.21          1793/1793      .x264_encoder_frame_end [16]
                0.00     1.49         1793/55583      .x264_fdec_filter_row [8]
                0.00     0.02          1792/1792      .x264_frame_copy_picture [109]
                0.01     0.00          3584/3585      .x264_frame_pop_unused [111]
                0.00     0.00       1793/2422343      .x264_ratecontrol_qp [98]
                0.00     0.00             1/1800      .x264_log [118]
                0.00     0.00          5376/5376      .x264_frame_push [124]
                0.00     0.00          5368/5368      .x264_frame_shift [125]
                0.00     0.00          1793/1793      .x264_ratecontrol_start [130]
                0.00     0.00          1793/1793      .x264_macroblock_slice_init [129]
                0.00     0.00         1793/10755      .x264_cpu_restore [123]
                0.00     0.00          1792/1792      .x264_slicetype_decide [136]
                0.00     0.00          1791/3583      .x264_frame_push_unused [126]
                0.00     0.00                8/8      .x264_sps_write [140]
                0.00     0.00                8/8      .x264_pps_write [139]
                0.00     0.00                7/7      .x264_frame_pop [141]
                0.00     0.00                1/1      .x264_sei_version_write [174]
-----------------------------------------------
                0.32   205.62          1793/1793      .x264_encoder_encode [3]
[4]     92.3    0.32   205.62               1793  .x264_slice_write [4]
                0.68   129.50    2420550/2420550      .x264_macroblock_analyse [5]
                0.04    44.67        53790/55583      .x264_fdec_filter_row [8]
                2.47    12.69    2420550/2420550      .x264_macroblock_encode [15]
                3.72     1.65    2420550/2420550      .x264_macroblock_cache_load [28]
                0.50     4.86      815913/815913      .x264_macroblock_write_cabac [29]
                1.80     1.59    2420550/2420550      .x264_macroblock_cache_save [36]
                0.47     0.84   2408400/10299344      .x264_cabac_mb_skip [27]
                0.08     0.00    2420550/2533996      .x264_cabac_encode_terminal [85]
                0.03     0.00          1793/1793      .x264_cabac_context_init [99]
                0.03     0.00          1793/1793      .x264_cabac_encode_flush [100]
                0.00     0.00          1793/1793      .x264_slice_header_write [131]
                0.00     0.00          1793/1793      .x264_cabac_encode_init [128]
-----------------------------------------------
                0.68   129.50    2420550/2420550      .x264_slice_write [4]
[5]     58.4    0.68   129.50            2420550  .x264_macroblock_analyse [5]
                0.61    40.19    2408400/2408400      .x264_mb_analyse_inter_p16x16 [9]
                0.32    38.83      828509/828509      .x264_mb_analyse_inter_p8x8 [10]
                2.53    16.83      840659/840659      .x264_mb_analyse_intra [14]
                0.72    10.30    1273855/8423971      .refine_subpel [7]
                0.10     8.36      356920/356920      .x264_mb_analyse_inter_p16x8 [23]
                0.10     8.36      356920/356920      .x264_mb_analyse_inter_p8x16 [24]
                0.17     1.18     828509/1056344      .x264_mb_analyse_intra_chroma [46]
                0.49     0.00    2420550/2420550      .x264_mb_analyse_init [67]
                0.16     0.13     840659/2420550      .x264_analyse_update_cache [59]
                0.06     0.00    1273855/1273855      .x264_me_refine_qpel [90]
                0.04     0.00    4816800/4816800      .prefetch_ref_null [96]
                0.03     0.00    2420550/2422343      .x264_ratecontrol_qp [98]
                0.00     0.00               1/74      .x264_malloc [138]
-----------------------------------------------
                0.42     7.87     713840/7150116      .x264_mb_analyse_inter_p8x16 [24]
                0.42     7.87     713840/7150116      .x264_mb_analyse_inter_p16x8 [23]
                1.43    26.55    2408400/7150116      .x264_mb_analyse_inter_p16x16 [9]
                1.96    36.53    3314036/7150116      .x264_mb_analyse_inter_p8x8 [10]
[6]     37.2    4.24    78.81            7150116  .x264_me_search_ref [6]
                4.06    57.79    7150116/8423971      .refine_subpel [7]
                4.56     0.00  20729236/89686850      .get_ref_altivec [13]
                2.43     0.00  12254405/12254405      .pixel_sad_16x16_altivec [40]
                2.09     0.00    4816800/7225200      .pixel_sad_x4_16x16_altivec [37]
                1.93     0.00    5211876/5211876      .pixel_sad_x3_16x16_altivec [43]
                1.30     0.00  14967199/14967199      .pixel_sad_8x8_altivec [51]
                1.23     0.00    7421855/7421855      .pixel_sad_x3_8x8_altivec [52]
                1.18     0.00    6628072/9942108      .pixel_sad_x4_8x8_altivec [45]
                0.67     0.00    3929056/3929056      .pixel_sad_8x16_altivec [61]
                0.45     0.00    3878808/3878808      .pixel_sad_16x8_altivec [69]
                0.32     0.00    1427680/2141520      .pixel_sad_x4_8x16_altivec [68]
                0.29     0.00    1605798/1605798      .pixel_sad_x3_8x16_altivec [74]
                0.26     0.00    1605758/1605758      .pixel_sad_x3_16x8_altivec [75]
                0.25     0.00    1427680/2141520      .pixel_sad_x4_16x8_altivec [71]
-----------------------------------------------
                0.72    10.30    1273855/8423971      .x264_macroblock_analyse [5]
                4.06    57.79    7150116/8423971      .x264_me_search_ref [6]
[7]     32.7    4.79    68.08            8423971  .refine_subpel [7]
               23.59     0.00  87259249/95397012      .mc_chroma_altivec [12]
               15.18     0.00  68957614/89686850      .get_ref_altivec [13]
                9.80     0.00  52182329/58895403      .pixel_satd_8x8_altivec [19]
                7.58     0.00  16152488/19509025      .pixel_satd_16x16_altivec [22]
                3.24     0.00 37295461/110035475      .pixel_satd_4x4_altivec [21]
                2.38     0.00    5687964/5687964      .pixel_satd_8x16_altivec [41]
                1.98     0.00    9845655/9845655      .pixel_satd_4x8_altivec [42]
                1.56     0.00    5804230/5804230      .pixel_satd_16x8_altivec [49]
                1.04     0.00    2408400/7225200      .pixel_sad_x4_16x16_altivec [37]
                0.86     0.00  10066684/10066684      .pixel_satd_8x4_altivec [57]
                0.59     0.00    3314036/9942108      .pixel_sad_x4_8x8_altivec [45]
                0.16     0.00     713840/2141520      .pixel_sad_x4_8x16_altivec [68]
                0.13     0.00     713840/2141520      .pixel_sad_x4_16x8_altivec [71]
-----------------------------------------------
                0.00     1.49         1793/55583      .x264_encoder_encode [3]
                0.04    44.67        53790/55583      .x264_slice_write [4]
[8]     20.7    0.04    46.16              55583  .x264_fdec_filter_row [8]
               38.73     0.00        53790/53790      .x264_frame_filter [11]
                2.95     4.35        53790/53790      .x264_frame_deblock_row [25]
                0.02     0.05        53790/53790      .x264_frame_expand_border [87]
                0.01     0.03        53790/53790      .x264_frame_expand_border_filtered [95]
                0.02     0.00       53790/322740      .plane_expand_border [83]
-----------------------------------------------
                0.61    40.19    2408400/2408400      .x264_macroblock_analyse [5]
[9]     18.3    0.61    40.19            2408400  .x264_mb_analyse_inter_p16x16 [9]
                1.43    26.55    2408400/7150116      .x264_me_search_ref [6]
                3.08     7.92    1744757/1744757      .x264_macroblock_probe_skip [20]
                0.30     0.24    1579891/2420550      .x264_analyse_update_cache [59]
                0.50     0.00    2408400/2408400      .x264_mb_predict_mv_ref16x16 [66]
                0.18     0.00    2408400/3152617      .x264_mb_predict_mv_16x16 [78]
-----------------------------------------------
                0.32    38.83      828509/828509      .x264_macroblock_analyse [5]
[10]    17.5    0.32    38.83             828509  .x264_mb_analyse_inter_p8x8 [10]
                1.96    36.53    3314036/7150116      .x264_me_search_ref [6]
                0.34     0.00    3314036/5610143      .x264_mb_predict_mv [63]
-----------------------------------------------
               38.73     0.00        53790/53790      .x264_fdec_filter_row [8]
[11]    17.4   38.73     0.00              53790  .x264_frame_filter [11]
-----------------------------------------------
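x264_slice_write looks like the main routine: together with its children it accounts for 92.3% of the run time. On a multi-CPU machine it appears to be run across multiple threads. Having the SPU execute everything from here down would be ideal, but it probably won't all fit in the LS (local store). Let's use this profile as a guide and look at which parts are likely to fit.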
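The "will it fit" question can be framed as a simple budget against the 256 KiB of local store that each SPE has, which code, data and stack all share. The sketch below only does that arithmetic; ls_budget is a hypothetical helper, and the sizes passed to it have to be measured for each candidate function (for example from the text and data sizes reported for the SPU object file). No real x264 sizes are assumed here.

```c
#include <stddef.h>

/* Each SPE has 256 KiB of local store shared by code, data and stack. */
#define LS_BYTES ((size_t)256 * 1024)

/* Hypothetical helper: returns 1 and stores the leftover bytes in *left
   if code, data and stack all fit in the local store, otherwise returns 0
   and leaves *left untouched. */
static int ls_budget(size_t code, size_t data, size_t stack, size_t *left)
{
    size_t free_bytes = LS_BYTES;
    const size_t parts[3] = { code, data, stack };

    for (int i = 0; i < 3; i++) {
        if (parts[i] > free_bytes)
            return 0;              /* this part alone overflows what is left */
        free_bytes -= parts[i];
    }
    *left = free_bytes;            /* room remaining, e.g. for DMA buffers */
    return 1;
}
```

Whatever is left over after a candidate passes this check is the space available for the frame data that has to be transferred in and out by DMA, so a function that barely fits is probably still a poor candidate.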