Use the sort rule characteristics to calculate the Chinese character strokes and get the first letters of the pinyin

zhaozj2021-02-16  52

SQL Server's sorting rules are not a lot of usage, perhaps many beginners are more unfamiliar, but there is an error to come to: SQL Server database, when the query is connected to the library multi-table, if the two database default character sets The system returns such an error: "Unable to solve the rule of the rule of the Equal TO operation."

I. Error analysis: This error is caused by the unanimous sort rules, we do a test, such as: Create Table # T1 (Name Varchar (20) Collate Albanian_CI_AI_WS, VALUE INT)

Create Table # T2 (Name Varchar (20) Collate Chinese_PRC_CI_AI_WS, VALUE INT)

After the table is built, the connection query is performed:

Select * from # T1 a inner join # T2 b on a.name = B.Name

In this way, the error will appear:

Server: Message 446, Level 16, Status 9, Row 1 Unable to resolve the sorting rule conflicts of the Equal TO operation. To exclude this error, the easiest way is to specify its sort rules when the table is connected, so that the error will no longer appear. Write this sentence:

Select * from # T1 a inner join # T2 b on a.name = B.NAME collate chinese_prc_ci_ai_ws

2. Sort Rules Description:

What is the sort rule? MS is described in this: "In Microsoft SQL Server 2000, the physical storage of strings is controlled by the sort rule. Sorting rules specify the rules indicating the bit mode of each character and storage and comparison characters." In the query analyzer Execute the following statement, you can get all the sorting rules supported by SQL Server.

Select * from :: fn_helpcollations ()

The sorting rule name consists of two parts, and the first half refers to the character set supported by this sorting rule. As: Chinese_PRC_CS_AI_WS, the first half of the first half: refers to the Unicode character set, Chinese_PRC_ refers to the sorting rules for continental Simplifiedin Unicode. The second half of the sorting rules is the suffix meaning: _bin binary sort _CI (CS) is case sensitive, CI does not distinguish, whether the CS distinguishing _ai (AS) is distinguished, AI does not distinguish, AS distinguishing _ki (KS) Whether to distinguish between a false name, Ki does not distinguish, whether the KS distinguished_wi (ws) is different from the width Wi is different, WS distinguish

Case sensitivity: If you want to make comparison, you should choose the uppercase letters and lowercase letters, please select this option. Distinguish the key: If you want to make the more accent and non-reconounced letters don't wait, select this option. If this option is selected, the letter is also considered to be different. Distinguish the fake name: If you want to make the comparison to deepen the slice of pseudonym, please don't wait, select this option. Distinguishing the width: If you want to make the comparison to treat the half-angle character and full range of characters, please select this option.

III. Application of Sort Rules: SQL Server provides a large number of Windows and SQLServer dedicated sort rules, but its applications are often ignored by developers. In fact, it is useful in practice.

Example 1: Let the contents of the table Name column are sorted by Pinyin:

Create Table #t (ID INT, Name Varchar (20)) Insert #t SELECT 1, 'China Union All Select 2,' National 'Union All Select 3,' People 'Union All Select 4,' A'SELECT * FROM #t ORDER BY Name Collate Chinese_PRC_CS_AS_KS_WS DROP TABLE # T / * Results: Id Name ------------------------------ 4 A 2 countries 3 people 1 * /

Example 2: Let the contents of the table name list according to the last name stroke:

Create Table #t (ID INT, Name Varchar (20))

INSERT #T SELECT 1, 'Three' Union All Select 2, 'B' Union All Select 3, 'Second' Union All Select 4, 'One' Union All Select 5, 'Ten' SELECT * FROM #t Order by Name Collate Chinese_prc_stroke_cs_as_ks_ws drop Table # T / * Results: ID Name ---------------------------- 4 1 2 B 3 2 5 1 three * /

IV. Extended SQL Server Chinese characters in practice SQL Server Chinese characters can be sorted according to pinyin, strokes, so how do we use this feature to handle some of the challenges of Chinese characters? I will give an example now:

Calculate the Chinese character stroke with the characteristics of the sort rules

To calculate the Chinese character stroke, we have to prepare for work first, we know, Windows multi-country Chinese characters, Unicode is currently included in Chinese characters. The Simplified GBK code Chinese characters unicode values ​​start from 19968. First, let's get all Chinese characters in the SQLServer method, don't have a dictionary, we can get the SQL statement you can get:

Select Top 20902 Code = Id, 19968, 1) INTO #T from syscolumns a, syscolumns b

Use the following statement, we get all Chinese characters, which is in order to follow the unicode value:

Select code, nchar (code) as cnword from #t

Then we use the SELECT statement to order it according to the stroke.

Select code, nchar (code) as cnword from #t Order by nchar (code) collate chinese_prc_stroke_cs_as_ks_ws, code

RESULTS: CODE CNWORD ---------------- 19968 1 20008 丨 20022 丶 20031 丿 20032 乀 20033 乁 20057 200 20058 乚 20059 乛 20101 亅 19969 Ding ....... ...

From the above results, we can clearly see that a Chinese character, code is from 19968 to 20101, from small to the big row, but to the first word "Ding" of the two Chinese characters, Code is 19969, no The order is restarted. With this result, we can easily use the SQL statement to get the first or last Chinese character classified by each stroke Chinese character. The following uses the statement to get the last Chinese character: Create Table # T1 (ID Int Id Id Int Int, Cnword Nvarchar (2))

INSERT # T1 (CNWORD) Select code, nchar (code) as cnword from #t ORDER BY nchar (code) collate Chinese_prc_stroke_cs_as_ks_ws, code

Select a.cnword from # T1 a left join # T1 b on a.id = B.ID-1 and a.code

Get 36 Chinese characters, each Chinese character is the last Chinese character sorted by chinese_prc_stroke_cs_as_ks_ws sequencing rules:

亅阝马 风龙 鸩龀 鸩龀 鸩龀 龂 龂 龊 龊 龠厐 龠厐 麷 麷 龝齹 龝齹 龝齹 龝齹 齈 齈 齈

It can be seen above: "亅" is the last word after all Chinese characters sort, "阝" is the last word after all two Chinese characters ... Wait. But at the same time, it also found that the strokes behind the 33th Chinese characters "龗 (33)" are somewhat chaos, incorrect. But there is no relationship, there are only four Chinese characters than "龗" strokes, we hand together: 齾 35, 齉 36, 靐 39, 龘 64

Building a Chinese character stroke table (tab_hzbh): Create Table Tab_hzbh (ID Int IDITENTITY, CNWORD NCHAR (1)) - Put the first 33 Chinese character insert Tab_hzbhselect top 33 a.cnword from # T1 b on A.ID = B.ID-1 and a.code

So, we can get the result, such as we want to get the brush of the Chinese character "country":

Declare @A nchar (1) set @ a = 'country' select top 1 id from tab_hzbh where cnword> = @ a collate chinese_prc_stroke_cs_as_ks_wsorder by ID

ID ----------- 8 (Result: Chinese characters "national" strokes are 8)

All of the above preparations, just to write the following functions, this function opens all temporary tables and fixed tables above, and writes the contents of the table tab_hzbh in the statement for universal and code transfer, and then calculates the user into a string of Chinese characters. Totual stroke: Create function fun_getbh (@str nvarchar (4000)) Returns Intasbegindeclare @Word nchar (1), @ n int, @ i intset @ i = 1set @ n = 0WHILE SUBSTRING (@ Str, @ i, 1) <> '' or @i <= len (@str) beginset @ word = substring (@ Str, @ i, 1) - if not Chinese characters, strokes 0 meter set @ n = @ n (Case When Unicode (@Word) Between 19968 and 19968 20901Then (SELECT TOP 1 ID, N '亅' AS Word Union All Select 2, N '阝' Union All SELECT 3, N 'Horse' Union All Slect 4, N 'Wind 'Union All Select 5, N' Dragon 'Union All Select 6, N' 'Union All Select 7, N' Turtle 'Union All Select 8, N' Tooth 'Union All Slect 9, N' 鸩 'Union All Select 10 , N '龀' union all select 12, n '龂' union all select 13, n '龆' union all SELECT 14, N 'gaven' union all Select 15, n '龊' Union All Select 16, N 'Dragon' Union All Select 17, N '龠' Union All Select 18, N '厐' Union All Select 19, N 'Poun' Union All Select 21, N '龑' Union All SELECT 21, N '龡' union all SELECT 22, N 'and' Union All SELECT 23, N '龝' Union All Select 24, N '齹' Union All Se Lect 25, N '龣' Union All Select 26, N '龥' Union All SELECT 27, N '齈' Union All Select 28, N '龞' Union All Select 29, N '麷' Union All SELECT 30, N '鸾 'Union All SELECT 31, N' 麣 'Union All SELECT 32, N' 龖 'Union All Select 33, N' 龗 'Union All Select 35, N' 齾 'Union All Select 36, N' 齉 'Union All SELECT 39, N '靐'

Union All SELECT 64, N '龘') T Where Word> = @ Word Collate Chinese_PRC_STROKE_CS_AS_KS_WSORDER BY ID ASC) ELSE 0 END) SET @ i = @ i 1ENDRETURN @ NEND - Function Call instance: SELECT DBO.FUN_GETBH ('China People '), DBO.FUN_GETBH (' "People's Republic '') Executive: The total number of strokes is 39 and 46, respectively.

Of course, you can also exist in a fixed table in the Chinese characters and strokes in the "Union All" above, and the column sort rules are set to: chinese_prc_stroke_cs_as_ks_ws is faster. If you use a BIG5 code operating system, you have to generate Chinese characters, the method is the same. But one thing to remember: These Chinese characters are from SQL statement SELECT, not manual input, not to check the word code, because Xinhua Dictionary is different from Unicode character set, and the quotation of the dictionary will be incorrect.

Use the characteristics of the sort rules to get Chinese characters Pinyin

Using the same method of the total number of strokes, we can also write a function of seeking Chinese characters in the first letters. as follows:

Create function fun_getpy (@str nvarchar (4000)) Returns nvarchar (4000) asbegindeclare @Word nchar (1), @py nvarchar (4000), @ i = 1WHile (Substring (@str, @ i, 1) <> 'or @i <= len (@str)) beginset @ Word = Substring (@ Str, @ i, 1) - If the Chinese characters, return to the original characters set @ py = @ py (Case When Unicode (@Word) Between 19968 and 19968 20901Then (SELECT TOP 1 PY from (SELECT 'A' AS PY, N '骜' AS Wordunion All Select 'B', N 'Book' Union All Select 'C' , N 'wrong' union all select 'd', n '鵽' union all select 'e', ​​n '樲' union all select 'f', n '鳆' union all select 'g', n '腂' union All select 'h', n '夻' union all select 'j', n '攈' union all select 'k', n '穒' union all select 'L', n '鱳' union all succ 'M', N '旀' union all select 'n', n '桛' union all select 'o', n '沤' union all succ 'p', n 'exposure' union all select 'q', n '囕' union all SELECT 'R', N '鶸' Union All Select 'S', N '蜶' Union All Select 'T', N '箨' Union All Select 'W', N '鹜' Union All Select 'X', N '鑂' union all select 'y', n 'rhyme' union all select 'z', n '咗') T where word> = @ word collate Chinese_prc_cs_as_ks_ws order by py asc) else @Word End) Set @ i = @ i 1ndreturn @ pyend - function call instance: select dbo.fun_getpy ('"People's Republic'), DBO.FUN_GETPY ('" People's Republic') results are: zhrmghg

转载请注明原文地址:https://www.9cbs.com/read-25887.html

New Post(0)