Efficient Chinese substring functions

zhaozj  2021-02-16

Author: Xu Zuning

PHP's traditional substring functions can cut a Chinese character in half when they are applied to a string that contains Chinese text. If a PHP extension can be used, mb_substr (from the mbstring extension) is a ready replacement. Enabling the extension is not always easy, though. On Linux, PHP has to be recompiled, which is sometimes impossible, and the extension also brings in many functions you do not need. Many functions that implement this feature can be found online, but most of them examine the string byte by byte in a loop, which is extremely slow when the string is large. This article therefore introduces two efficient functions, c_substr and m_substr. They are used exactly like substr and mb_substr respectively. The difference is that c_substr counts in bytes, so one Chinese character has a length of 2 (in a double-byte encoding such as GB2312), while m_substr counts in characters, so one Chinese character has a length of 1. Choose whichever suits your needs.
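
To make the byte/character distinction concrete, here is a small sketch (not from the original article) showing how plain substr() splits a double-byte character. It assumes GB2312 text; the bytes are written as escapes so the example does not depend on the encoding of the source file.

```php
<?php
// "ab" followed by two GB2312 double-byte characters (0xD6D0 0xCEC4, "中文"),
// written as escapes so the example does not depend on the file's encoding.
$s = "ab\xd6\xd0\xce\xc4";

// substr() counts bytes: the string is 6 bytes long but holds 4 characters.
echo strlen($s), "\n";                 // 6

// Cutting at 3 bytes keeps only the lead byte of the first Chinese character,
// leaving an orphaned half character at the end.
$cut = substr($s, 0, 3);
echo bin2hex($cut), "\n";              // 6162d6

// Cutting at 4 bytes happens to land on a character boundary.
echo bin2hex(substr($s, 0, 4)), "\n";  // 6162d6d0
```

A byte count that lands between a lead byte and its trail byte is exactly the case both functions below are designed to handle.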

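    function c_substr($str, $start = 0) {
        $ch = chr(127);
        // Pattern 1 deletes complete double-byte characters (lead byte 0x81-0xFE,
        // trail byte 0x40-0xFE); pattern 2 deletes single-byte ASCII characters.
        // Whatever is left over is an orphaned lead byte.
        $p = array("/[\x81-\xfe][\x40-\xfe]/", "/[\x01-\x7f]/");
        $r = array("", "");
        if (func_num_args() > 2)
            $end = func_get_arg(2);      // length in bytes, as with substr()
        else
            $end = strlen($str);
        if ($start < 0)
            $start += strlen($str);      // negative start counts from the end

        if ($start > 0) {
            $s = substr($str, 0, $start);
            // Only a prefix ending in a high byte can end in half a character
            if (strlen($s) > 0 && $s[strlen($s) - 1] > $ch) {
                $s = preg_replace($p, $r, $s);
                $start += strlen($s);    // skip the rest of the split character
            }
        }
        $s = substr($str, $start, $end);
        $end = strlen($s);
        if ($end > 0 && $s[$end - 1] > $ch) {
            $s = preg_replace($p, $r, $s);
            $end += strlen($s);          // extend to include the whole character
        }
        return substr($str, $start, $end);
    }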
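
The core trick of c_substr is the two-pass preg_replace: after deleting every complete double-byte character and then every ASCII byte, any byte that remains must be a lead byte whose partner was cut off. The sketch below isolates that step so it can be tried on its own. gbk_orphan_bytes is a hypothetical name, not part of the original code, and the sketch assumes well-formed GB2312/GBK input.

```php
<?php
// Hypothetical helper (not in the original article) isolating c_substr()'s
// trick: delete complete double-byte characters, then single-byte ASCII;
// the number of bytes left is the number of orphaned lead bytes.
function gbk_orphan_bytes($s) {
    $p = array("/[\x81-\xfe][\x40-\xfe]/", "/[\x01-\x7f]/");
    return strlen(preg_replace($p, array("", ""), $s));
}

$s = "ab\xd6\xd0\xce\xc4";                      // "ab" + two GB2312 characters
echo gbk_orphan_bytes(substr($s, 0, 3)), "\n";  // 1: cut inside a character
echo gbk_orphan_bytes(substr($s, 0, 4)), "\n";  // 0: cut on a character boundary
```

Because the first pattern always consumes a lead byte together with the byte after it, a GBK trail byte in the ASCII range (0x40-0x7E) is removed as part of its character and is never mistaken for a standalone ASCII character.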

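    function m_substr($str, $start) {
        // Each match is one character: an optional high byte followed by any byte.
        // The /s modifier lets "." match newlines, which would otherwise be dropped.
        preg_match_all("/[\x80-\xff]?./s", $str, $ar);
        if (func_num_args() >= 3) {
            $end = func_get_arg(2);      // length in characters, as with mb_substr()
            return join("", array_slice($ar[0], $start, $end));
        } else
            return join("", array_slice($ar[0], $start));
    }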
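
m_substr works because the regex splits the string into an array with one element per character, after which array_slice does the counting. The same idea yields a character count. m_strlen below is a hypothetical companion function, not part of the original article, and like m_substr it assumes a double-byte encoding such as GB2312.

```php
<?php
// Hypothetical companion to m_substr() (not in the original article): count
// characters with the same regex, so that one Chinese character counts as 1.
function m_strlen($str) {
    // Optional high byte plus any byte = one character; /s makes "." match "\n".
    return preg_match_all("/[\x80-\xff]?./s", $str, $ar);
}

$s = "ab\xd6\xd0\xce\xc4\n";   // "ab", two GB2312 characters, a newline
echo strlen($s), "\n";         // 7 (bytes)
echo m_strlen($s), "\n";       // 5 (characters)
```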

Performance test:

1. Timer: PEAR's Benchmark_Iterate class.

2. Loop-based function used for comparison:

    function trimchinese($str, $len) {
        $r_str = "";
        $i = 0;
        while ($i < $len) {
            $ch = substr($str, $i, 1);
            // A high byte starts a double-byte character: skip its second byte too
            if (ord($ch) > 0x80) $i++;
            $i++;
        }
        $r_str = substr($str, 0, $i);
        return $r_str;
    }

3. Test environment: P2/166, NT4, IIS4, PHP 4.3.1.

4. Test code:

    require_once "Benchmark/Iterate.php";
    // c_substr(), m_substr() and trimchinese() are assumed to be defined as above
    $str = file_get_contents("test.txt"); // hypothetical file holding the test text
    $benchmark = new Benchmark_Iterate;

    $benchmark->run(100, "trimchinese", $str, 1000);
    $result = $benchmark->get();
    echo "trimchinese: " . $result['mean'] . "\n";

    $benchmark->run(100, "c_substr", $str, 3, 1000);
    $result = $benchmark->get();
    echo "c_substr: " . $result['mean'] . "\n";

    $benchmark->run(100, "m_substr", $str, 3, 1000);
    $result = $benchmark->get();
    echo "m_substr: " . $result['mean'] . "\n";

    $benchmark->run(100, "mb_substr", $str, 3, 1000);
    $result = $benchmark->get();
    echo "mb_substr: " . $result['mean'] . "\n";

5. Test text: this article.

6. Test results (mean time per call, in seconds):

    trimchinese   0.058972
    c_substr      0.000809
    m_substr      0.000666
    mb_substr     0.000458

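For readers without PEAR, the mean time per call that Benchmark_Iterate reports can be approximated with microtime(). The sketch below is a stand-in under stated assumptions, not the setup used for the results above: mean_time is a hypothetical name, it needs PHP 5.6 or later (variadic arguments, microtime(true)), and its numbers will not be comparable with the PHP 4.3.1 figures.

```php
<?php
// Hypothetical stand-in for PEAR's Benchmark_Iterate (not used in the original
// test): call $fn $n times with the given arguments and return the mean
// wall-clock time per call in seconds.
function mean_time($n, $fn, ...$args) {
    $total = 0.0;
    for ($i = 0; $i < $n; $i++) {
        $t0 = microtime(true);
        call_user_func_array($fn, $args);
        $total += microtime(true) - $t0;
    }
    return $total / $n;
}

// Example: time substr() on a 10,000-byte string over 100 iterations.
$str = str_repeat("ab\xd6\xd0", 2500);
printf("substr: %.6f\n", mean_time(100, "substr", $str, 3, 1000));
```

Timing each call separately mirrors how the per-iteration mean is reported above; for very fast functions, timing the whole loop once and dividing by $n would reduce the overhead of the microtime() calls themselves.
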
When reposting, please credit the original address: https://www.9cbs.com/read-23742.html
