Efficient Substring Functions for Chinese Strings

xiaoxiao2021-03-06  83

Author: Xu Zuning

When a string contains Chinese characters, PHP's traditional substr function can cut a character in half. If the PHP extension library is available, mb_substr can be used instead. However, the extension is not always easy to install: on Linux, PHP has to be recompiled, which is sometimes impossible, and the extension also brings in many functions that are not needed.

Many functions that implement this feature can be found online. Most of them, however, examine the string in a loop, which is extremely slow when the string is long. Two efficient functions are therefore introduced here: c_substr and m_substr. They are used exactly like substr and mb_substr. The difference is that c_substr counts in bytes, so one Chinese character has length 2, while m_substr counts in characters, so one Chinese character has length 1. Choose whichever fits your needs.
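To make the byte-versus-character distinction concrete, here is a minimal sketch. It assumes the text is GBK-encoded (the article only says that a Chinese character is two bytes long); the sample string and variable names are illustrative and do not come from the original.

```php
<?php
// "中文" written as raw GBK bytes (assumed encoding): two characters, two bytes each.
$gbk = "\xD6\xD0\xCE\xC4";

// Byte-based view: strlen() and substr() treat the string as four separate bytes.
echo strlen($gbk), "\n";                  // 4
echo bin2hex(substr($gbk, 0, 3)), "\n";   // d6d0ce (the trailing ce is half a character)

// Character-based view: pair each high byte with the byte that follows it.
$count = preg_match_all("/[\x80-\xff]?./s", $gbk, $chars);
echo $count, "\n";                        // 2
echo bin2hex($chars[0][0]), "\n";         // d6d0
```

Cutting at byte offset 3 leaves a lone lead byte, which is the kind of truncation that c_substr and m_substr are meant to avoid.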

    function c_substr($str, $start = 0) {
        $ch = chr(127);
        // Patterns: complete double-byte characters, then single-byte characters.
        $p = array("/[\x81-\xfe]([\x81-\xfe]|[\x40-\xfe])/", "/[\x01-\x7f]/");
        $r = array("", "");
        if (func_num_args() > 2)
            $end = func_get_arg(2);
        else
            $end = strlen($str);
        if ($start < 0)
            $start += $end;

        if ($start > 0) {
            // If the prefix ends on a high byte, strip every complete character.
            // Any byte left over is half of a character, so move past it.
            $s = substr($str, 0, $start);
            if ($s[strlen($s) - 1] > $ch) {
                $s = preg_replace($p, $r, $s);
                $start += strlen($s);
            }
        }
        $s = substr($str, $start, $end);
        $end = strlen($s);
        // Same check at the tail: extend the length to complete a split character.
        if ($end > 0 && $s[$end - 1] > $ch) {
            $s = preg_replace($p, $r, $s);
            $end += strlen($s);
        }
        return substr($str, $start, $end);
    }

    function m_substr($str, $start) {
        // Split into characters: an optional high byte plus the byte after it.
        preg_match_all("/[\x80-\xff]?./", $str, $ar);
        if (func_num_args() >= 3) {
            $end = func_get_arg(2);
            return join("", array_slice($ar[0], $start, $end));
        } else
            return join("", array_slice($ar[0], $start));
    }

Performance test:

1. PEAR's Benchmark_Iterate class is used as the timer.

2. The loop-based function used for comparison:

    function trimChinese($str, $len) {
        $r_str = "";
        $i = 0;
        while ($i < $len) {
            $ch = substr($str, $i, 1);
            // A high byte starts a two-byte character, so skip its second byte.
            if (ord($ch) > 0x80) $i++;
            $i++;
        }
        $r_str = substr($str, 0, $i);
        return $r_str;
    }

3. Test environment: P2/166, NT4, IIS4, PHP 4.3.1

4. Test code:

    require_once "Benchmark/Iterate.php";
    $benchmark = new Benchmark_Iterate;

    $benchmark->run(100, "trimChinese", $str, 1000);
    $result = $benchmark->get();
    echo "trimChinese: " . $result['mean'] . "\n";

    $benchmark->run(100, "c_substr", $str, 3, 1000);
    $result = $benchmark->get();
    echo "c_substr: " . $result['mean'] . "\n";

    $benchmark->run(100, "m_substr", $str, 3, 1000);
    $result = $benchmark->get();
    echo "m_substr: " . $result['mean'] . "\n";

    $benchmark->run(100, "mb_substr", $str, 3, 1000);
    $result = $benchmark->get();
    echo "mb_substr: " . $result['mean'] . "\n";

5. Test text: this article

6. Test results (in seconds):

    trimChinese: 0.058972
    c_substr:    0.000809
    m_substr:    0.000666
    mb_substr:   0.000458

Author's blog:

http://blog.9cbs.net/xuzuning/
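For readers without PEAR, the timing approach of the performance test can be approximated with microtime(). This is only a sketch under stated assumptions: mean_seconds and split_substr are hypothetical names introduced here, the test string is synthetic GBK-style data rather than the article text, and the figures it prints will differ from the results listed above.

```php
<?php
// Hypothetical helper: mean wall-clock seconds per call over $iterations runs.
function mean_seconds(callable $fn, int $iterations, ...$args): float
{
    $begin = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn(...$args);
    }
    return (microtime(true) - $begin) / $iterations;
}

// Character-based slice in the style of m_substr (assumes two-byte GBK-style input).
function split_substr(string $str, int $start, int $length): string
{
    // An optional high byte plus the following byte counts as one character.
    preg_match_all("/[\x80-\xff]?./s", $str, $ar);
    return implode("", array_slice($ar[0], $start, $length));
}

// Synthetic test text: "中文" as GBK bytes followed by ASCII, repeated.
$str = str_repeat("\xD6\xD0\xCE\xC4abc", 2000);

printf("substr:       %.6f\n", mean_seconds('substr', 100, $str, 3, 1000));
printf("split_substr: %.6f\n", mean_seconds('split_substr', 100, $str, 3, 1000));
```

Unlike Benchmark_Iterate, which times each iteration separately, this helper divides the total elapsed time by the iteration count, so it gives a mean only.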

When reposting, please cite the original address: https://www.9cbs.com/read-106887.html
