When I started using XHTML 1.1, I never knew what to write in xml:lang. I wanted to declare Chinese, but should the value be zh, zh-CN / en-CN, GB2312 / GBK / GB18030, or UTF8? When I run into a problem I usually search Google in Chinese first, but this time I could not find the answer. I saw GB2312 used on some authoritative websites and almost believed it, but my experience setting the language on Linux told me intuitively that this was wrong. So I narrowed my Google search to the W3C site and found Tutorial: Using Language Information in XHTML, HTML and CSS (Draft). After reading it I finally cleared up my misunderstanding, and I am sharing it with you here.
This is again a translation, but the original article is very long and contains a lot of information we do not need, so this time I have selected only parts of it. I hope it explains the problem clearly.
Declaring document and text language
Why declare the language
Information about the language of a document is extremely important for screen readers and other accessibility tools, and it is worth providing from the start. These programs need to know whether they can produce output from the text, or whether they need to switch to a different language mode.
Marking language information also helps in applying appropriate styling. For example, you may need to change fonts for different scripts, or generate different quotation marks depending on the language.
Some browsers use language information to choose suitable fonts for Simplified Chinese, Traditional Chinese, Japanese and Korean. In a page encoded in Unicode, these languages may share the same code point for a character, yet readers of each language may expect small differences in how the character is drawn. The original tutorial illustrates this with text rendered in Mozilla where only the language tag is changed.
Marking language information also lets you extract the elements in a given language using scripts. For example, you can use the XSLT lang() function to extract the text in a given language from a file, or apply language-specific styling during XSL-FO conversion.
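To make the extraction idea concrete, here is a minimal Python sketch of what a script could do with xml:lang. It is not XSLT; it only imitates the matching rule of the lang() function (case-insensitive, and "fr" also matches "fr-CA") and lets the language be inherited from the nearest ancestor that declares it. The helper names lang_matches and text_in_language and the sample markup are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def lang_matches(declared, wanted):
    """True if `declared` equals `wanted` or is a sub-language of it.

    Imitates the XSLT/XPath lang() rule: case-insensitive,
    and "en" also matches "en-US".
    """
    declared, wanted = declared.lower(), wanted.lower()
    return declared == wanted or declared.startswith(wanted + "-")


def text_in_language(root, wanted):
    """Collect text whose nearest xml:lang declaration matches `wanted`."""
    found = []

    def walk(element, inherited):
        # A declaration on the element overrides the inherited language.
        current = element.get(XML_LANG, inherited)
        matches = current is not None and lang_matches(current, wanted)
        if matches and element.text and element.text.strip():
            found.append(element.text.strip())
        for child in element:
            walk(child, current)
            # Tail text follows the child but belongs to this element.
            if matches and child.tail and child.tail.strip():
                found.append(child.tail.strip())

    walk(root, None)
    return found


# Hypothetical sample: an English paragraph containing one French word.
SAMPLE = (
    '<p xml:lang="en">The French for <em>cat</em> is '
    '<em xml:lang="fr-CA">chat</em>.</p>'
)
root = ET.fromstring(SAMPLE)
french = text_in_language(root, "fr")
english = text_in_language(root, "en")
```

Here french holds only the text of the element marked fr-CA, while english holds the remaining pieces of the paragraph. Without the language markup a script would have no reliable way to separate the two.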
In many cases you may not be aware of the importance of these applications while you are developing content. Language information is generally easiest to add when the content is created; adding it later, when it turns out to be needed, is more trouble.
In addition, some applications of language tags are still at an early stage of development or do not exist yet. Even so, you should add language information to your content now, so that you can reap the benefits as the technology matures.
Always declare the language in the <html> tag
The language of an HTML document should be declared in the document itself, by adding the lang attribute to the <html> tag. For example, for a document in Canadian French:
<html lang="fr-CA">
We will explain how to choose values for the language attributes later.
When XHTML is served as text/html, you should use both the lang attribute and the xml:lang attribute on the html element. The xml:lang attribute is the standard way to identify language information in XML. For XHTML 1.0 served as text/html, the previous example would be marked up like this:
<html lang="fr-CA" xml:lang="fr-CA" xmlns="http://www.w3.org/1999/xhtml">
The xml:lang attribute is not actually used when the file is handled as HTML, but it takes over from the lang attribute whenever the document is processed as XML, for example by a script or a validator.
If you serve XHTML as XML (for example, with a MIME type such as application/xhtml+xml), or if you use XHTML 1.1, you no longer need the lang attribute, which belongs to HTML. The xml:lang attribute alone is enough.
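The rule in the last few paragraphs can be summarised as a small Python sketch. The function names language_attributes and html_start_tag and the served_as_xml flag are hypothetical, introduced only for illustration; the sketch assumes the tag value is a plain language tag that needs no escaping.

```python
def language_attributes(tag, served_as_xml):
    """Return the language attributes to write on an element.

    served_as_xml=False: XHTML 1.0 served as text/html, so both
    lang and xml:lang are written.
    served_as_xml=True: XHTML served as XML (or XHTML 1.1), so
    xml:lang alone is enough.
    """
    if served_as_xml:
        return {"xml:lang": tag}
    return {"lang": tag, "xml:lang": tag}


def html_start_tag(tag, served_as_xml):
    """Render an <html> start tag carrying the language attributes."""
    rendered = "".join(
        f' {name}="{value}"'
        for name, value in language_attributes(tag, served_as_xml).items()
    )
    return f'<html xmlns="http://www.w3.org/1999/xhtml"{rendered}>'


as_html = html_start_tag("fr-CA", served_as_xml=False)
as_xml = html_start_tag("fr-CA", served_as_xml=True)
```

With these inputs, as_html carries both lang and xml:lang, while as_xml carries xml:lang only.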
Always declare language changes in the text
Where a piece of text is in a different language from the primary language of the content, you should indicate the language of that text. The method is the same as for the document as a whole: use the lang or xml:lang attribute on the enclosing element. For example, in HTML you can write:
<p>The French for <em>cat</em> is <em lang="fr">chat</em>.</p>
The lang attribute can be used on any HTML element except applet, base, basefont, br, frame, frameset, iframe, param and script.
Also, for XHTML 1.0 served as text/html, you can use the two attributes together, for example:
<p>The title in Chinese is <span lang="zh-CN" xml:lang="zh-CN">Chinese Academy of Sciences Document Information Center</span>.</p>
Note that in the last example no existing element surrounded the Chinese text that could carry the language information, so a span element was introduced for the purpose. (Please check the source code of this paragraph. Translator's note) If you serve the document as XML, you should use only the xml:lang attribute, as described in the previous section.
Specifying the value of the language attribute
Use the RFC 3066 rules
RFC 3066 is a standard that defines how to identify languages using language tags. A language tag consists of a primary subtag followed by zero or more subsequent subtags, separated by hyphens. The primary subtag represents a language (with two exceptions, i- and x-, discussed below), and any subsequent subtag narrows it to a dialect or usage of that language. Subsequent subtags generally represent a country, a dialect or a script. The tag en-US, for example, says that the document is not just in English but specifically in American English.
Subtags are not case-sensitive. They may contain only the letters A to Z and a to z and the digits 0 to 9, and they cannot be longer than 8 characters.
Note that the HTML specification still refers to RFC 1766 for identifying languages. RFC 3066 is an update of RFC 1766 that supersedes it, and an erratum is planned for the HTML specification, so for now you should read RFC 3066 wherever the HTML specification refers to RFC 1766.
Primary subtag
Every primary subtag must be 1, 2 or 3 letters long. All 2-letter and 3-letter subtags are language codes defined in ISO 639 (2-letter codes in part 1, 3-letter codes in part 2). The 1-letter subtags are the i- and x- prefixes, which are described later. Although the codes are not case-sensitive, they are usually written in lowercase; this is only a convention.
Note that when ISO provides both a 2-letter and a 3-letter code for a language, you should choose the 2-letter code.
This ensures that a single code is used wherever possible, and that the 2-letter codes established under RFC 1766 (which did not allow 3-letter codes) do not change. It also avoids the problem of alternative 3-letter codes, because the few languages that have two different 3-letter codes all have a 2-letter code as well.
Subsequent subtags
Subsequent subtags can represent geographic regions, dialects, scripts, or other refinements of the primary (language) subtag. The primary subtag can be followed by any number of subtags, although more than one is uncommon. RFC 3066 specifies that any 2-letter subtag in the second position is an ISO 3166 country code. There are no rules for subtags in the third or later positions. The 2-letter ISO country codes are usually written in uppercase, but this is only a convention.
Special primary subtags
RFC 3066 defines some tags that do not begin with an ISO language code. Language tags beginning with i- are reserved for language tags registered with IANA. Some examples: i-mingo, i-klingon, i-tao.
Language tags beginning with x- provide for user-defined language tags. The subtag in the second position must be longer than one letter and must not be one of the following reserved subtags: AA, QM-QZ, XA-XZ and ZZ.
Of course, these forms are not needed when a 2-letter or 3-letter ISO code is available. Their use should be limited, to prevent confusion and interoperability problems.
IANA-registered language tags
IANA language tags can be registered through the email submission procedure described in RFC 3066. These tags can have a second subtag that is 3 to 8 letters long. Registering a code with IANA is better than using a user-defined code, because it minimizes the possibility of confusion: registered codes are publicly listed. On the other hand, an IANA tag can be superseded by a new code approved in an ISO standard.
Deprecated IANA tags include no-bok (Norwegian "Book Language": use ISO 639 nb), i-navajo (Navajo: use ISO 639 nv), i-lux (Luxembourgish: use ISO 639 lb), and more. For this reason, IANA-registered codes should only be used to fill gaps in the ISO codes.
Although the i- prefix is reserved for IANA codes, not all IANA codes begin with it. For example, many Chinese dialects have registered IANA codes, including zh-guoyu (Mandarin; why not Putonghua, I wonder?), zh-hakka (Hakka), zh-min (Min), zh-min-nan (Southern Min), zh-wuu (Wu), and so on.
IANA codes have also been registered that let you specify Traditional or Simplified Chinese. In the past, this had to be done by using zh-CN (mainland China) for Simplified Chinese and zh-TW (Taiwan, China) for Traditional Chinese. But you cannot guarantee that other people will understand, let alone follow, this practice. For example, some people use zh-HK to indicate Traditional Chinese. IANA now provides the codes zh-Hans and zh-Hant to specify Simplified Chinese and Traditional Chinese. The following two paragraphs illustrate the use of these two codes:
<p lang="zh-Hans">When the world needs to communicate, please use Unicode!</p>
<p lang="zh-Hant">When the world needs to communicate, please use unified code!</p>
Other issues with language tags
Although RFC 3066 language tags work well in most cases, some problems remain.
More codes are needed than ISO provides, in order to cover those of the world's nearly 6000 languages that do not have a code yet.
Codes are needed to express general regions. For example, there is still no code for Latin American Spanish, which is what many organizations create their Spanish content in.
The distinction between a language tag value and a locale is still not clear enough.
A locale combines a language with a geographic region, and is usually used in software to determine things such as date and time formats.
Sometimes you really need to distinguish the script in which a language is written. For example, Mongolian may be written in Mongolian or Cyrillic script, and Croatian may be written in Latin or Cyrillic script.
Staff from ISO TC37, SIL and W3C are working on these problems. In the meantime, remember that you can always register the language tags you need with IANA.
Further reading
Original: Tutorial: Using Language Information in XHTML, HTML and CSS (Draft). This article is selected from some of its chapters.
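As a recap of the syntax rules from the RFC 3066 sections above, here is a minimal Python sketch of a checker. It covers only the rules stated in this article (subtag characters and length, a primary subtag of 1 to 3 letters, i- and x- as the only 1-letter primary subtags, and the restrictions on the second subtag). It does not look codes up in ISO 639, ISO 3166 or the IANA registry, so passing the check does not mean a tag is registered. The names check_tag, conventional_case and is_reserved_country are assumptions made for illustration.

```python
import re

# Every subtag: 1 to 8 ASCII letters or digits.
SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")
# Primary subtag as described in the article: 1 to 3 letters.
PRIMARY = re.compile(r"[A-Za-z]{1,3}")


def is_reserved_country(code):
    """True for the reserved 2-letter codes AA, QM-QZ, XA-XZ and ZZ."""
    code = code.upper()
    return (
        code in ("AA", "ZZ")
        or (code[0] == "Q" and "M" <= code[1] <= "Z")
        or (code[0] == "X" and "A" <= code[1] <= "Z")
    )


def check_tag(tag):
    """Return a list of problems found in `tag`; an empty list means none."""
    problems = []
    subtags = tag.split("-")
    primary, rest = subtags[0], subtags[1:]

    for subtag in subtags:
        if not SUBTAG.fullmatch(subtag):
            problems.append(f"bad subtag {subtag!r}")

    if not PRIMARY.fullmatch(primary):
        problems.append("primary subtag must be 1 to 3 letters")
    elif len(primary) == 1 and primary.lower() not in ("i", "x"):
        problems.append("1-letter primary subtag must be i or x")

    if rest:
        second = rest[0]
        if len(second) == 1:
            problems.append("second subtag must be longer than 1 character")
        elif len(second) == 2 and second.isalpha() and is_reserved_country(second):
            problems.append(f"{second!r} is a reserved code")
    return problems


def conventional_case(tag):
    """Apply the usual casing: lowercase language, uppercase 2-letter country."""
    subtags = tag.split("-")
    subtags[0] = subtags[0].lower()
    if len(subtags) > 1 and len(subtags[1]) == 2:
        subtags[1] = subtags[1].upper()
    return "-".join(subtags)
```

For example, check_tag("zh-CN") and check_tag("zh-min-nan") return an empty list, while check_tag("GB2312") reports a bad primary subtag. That matches the conclusion of this article: GB2312 names a character encoding, and it does not belong in xml:lang.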