Improving Java String Tokenization

xiaoxiao2021-03-06  41

1. Overview

Most Java programmers have used the java.util.StringTokenizer class. It is a convenient string tokenizer: it splits a string into tokens according to a delimiter and returns those tokens one at a time on request. This process is called tokenization; in effect, a character sequence is converted into multiple tokens that the application can understand.

Although StringTokenizer is convenient, its functionality is limited. The class simply looks for the delimiter in the input string and splits the string wherever it finds one. It does not check whether the delimiter lies inside a quoted substring, and when two consecutive delimiters appear in the input string it does not return a "" token (a string of length 0). To overcome these limitations, the Java 2 platform provides the BreakIterator class, a tokenizer that improves on StringTokenizer. Because JDK 1.1.x does not provide this class, developers often spend a lot of time writing a tokenizer from scratch. Such custom tokenizers are common in large projects that involve data formatting. The goal of this article is to help you write an advanced string tokenizer on top of the existing StringTokenizer class.

2. Limitations of StringTokenizer

You can create a StringTokenizer with any of three constructors:

StringTokenizer(String sInput): splits the string on the default delimiters (whitespace such as " ", "\t", and "\n").
StringTokenizer(String sInput, String sDelimiter): splits the string using sDelimiter as the delimiter.
StringTokenizer(String sInput, String sDelimiter, boolean bReturnTokens): splits the string using sDelimiter as the delimiter, but if bReturnTokens is true, the delimiters are also returned as tokens.

The first constructor does not check whether the input string contains quoted substrings. For example, if you split "Hello. Today \"I am \" going to my home town" using whitespace as the delimiter, the result is Hello., Today, "I, am, ", going, and so on, rather than Hello., Today, "I am ", going, and so on.

The second constructor does not handle consecutive delimiters. For example, if you split the string "Book, Author, Publication,,,Date Published" with "," as the delimiter, StringTokenizer returns the four tokens Book, Author, Publication, and Date Published, rather than the six tokens Book, Author, Publication, "", "", and Date Published (where "" denotes a string of length 0). To obtain the six tokens, you must set StringTokenizer's bReturnTokens parameter to true, so that the delimiters themselves are returned and consecutive delimiters can be detected. The ability to set bReturnTokens to true is an important feature because it makes it possible to account for consecutive delimiters.
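The first two limitations can be reproduced with a few lines of code. The following is a minimal sketch, assuming an input with three consecutive commas and no spaces around the delimiters; the class name StringTokenizerLimits and the helper drain are illustrative names, not part of the article's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Illustrative helper class (hypothetical name) that shows what StringTokenizer returns.
public class StringTokenizerLimits {

    // Collects every remaining token of the given tokenizer into a list.
    public static List<String> drain(StringTokenizer st) {
        List<String> result = new ArrayList<>();
        while (st.hasMoreTokens()) {
            result.add(st.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        // Default whitespace delimiters: the quoted text "I am " is split apart.
        String sentence = "Hello. Today \"I am \" going to my home town";
        System.out.println(drain(new StringTokenizer(sentence)));

        String record = "Book,Author,Publication,,,Date Published";
        // Consecutive commas are collapsed, so only the non-empty tokens come back.
        System.out.println(drain(new StringTokenizer(record, ",")));
        // With the third argument set to true, every comma is returned as its own
        // token, which lets the caller detect where delimiters were consecutive.
        System.out.println(drain(new StringTokenizer(record, ",", true)));
    }
}
```

With the flag set to true, the caller still has to turn runs of delimiter tokens into empty values; StringTokenizer itself never produces a zero-length token.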

For example, suppose you use the second constructor on data that is collected dynamically and used to update a database table, where each token in the input string corresponds to a column value. Because we cannot tell which column should be set to "", we cannot map the tokens in the input string to the database columns. Suppose we want to insert a record into a table with 6 columns, and the input data contains two consecutive delimiters. StringTokenizer then produces 5 tokens (the empty token between the two consecutive delimiters is ignored), while we have 6 fields to set. We also do not know where the consecutive delimiters appeared, so we do not know which column should be set to "".

The third constructor fails when a token is itself equal to the delimiter (in both length and value) and lies inside a quoted substring. For example, suppose we want to tokenize the string "Book, Author, Publication,\",\",Date Published" (this string contains a "," token that is identical to the delimiter). The result is the six tokens Book, Author, Publication, ", ", and Date Published, not the five tokens Book, Author, Publication, "," (the comma character), and Date Published. Note that setting StringTokenizer's bReturnTokens parameter to true does not help in this case either.

3. An advanced string tokenizer

Before writing the code, you need to work out the basic requirements of a good tokenizer. Because Java developers are accustomed to the StringTokenizer class, a good tokenizer should provide all the useful methods that StringTokenizer offers, such as hasMoreTokens(), nextToken(), and countTokens(). The code provided here is simple, and most of it is self-explanatory.
Here, I mainly build on the StringTokenizer class (creating the instance with bReturnTokens set to true) and provide the methods mentioned above. In most cases the tokens differ from the delimiter, but occasionally the delimiter itself must be output as a token (although this is rare); if the caller requests it, the tokenizer outputs the delimiters as tokens. When you create a PowerfulTokenizer object, you only need to supply the input string and the delimiter, and PowerfulTokenizer sets bReturnTokens to true on its internal StringTokenizer. (The reason is that a StringTokenizer created without bReturnTokens set to true is subject to the limitations described earlier.) To control the tokenizer properly, the code checks in several places (when computing the total number of tokens, and in nextToken()) whether bReturnTokens is set to true. You may have noticed that PowerfulTokenizer implements the Enumeration interface, so it also implements the hasMoreElements() and nextElement() methods, which delegate directly to hasMoreTokens() and nextToken(). (By implementing the Enumeration interface, PowerfulTokenizer remains backward compatible with StringTokenizer.)
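The design described above can be sketched roughly as follows. This is not the article's PowerfulTokenizer source. It is a minimal illustration under several assumptions: the class name QuoteAwareTokenizer is hypothetical, substrings are marked with double quotes, the delimiter set does not contain the double-quote character, tokens are not trimmed, and all tokens are computed eagerly in the constructor.

```java
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.StringTokenizer;

// Hypothetical sketch of a tokenizer built on top of StringTokenizer.
public class QuoteAwareTokenizer implements Enumeration<String> {
    private final List<String> tokens = new ArrayList<>();
    private int position = 0;

    public QuoteAwareTokenizer(String input, String delimiters, boolean returnTokens) {
        // Always request delimiters from StringTokenizer so none are silently dropped.
        StringTokenizer raw = new StringTokenizer(input, delimiters, true);
        StringBuilder quoted = null;      // non-null while inside a double-quoted substring
        boolean lastWasDelimiter = false; // true when the previous raw token was a delimiter
        while (raw.hasMoreTokens()) {
            String t = raw.nextToken();
            boolean isDelimiter = t.length() == 1 && delimiters.indexOf(t.charAt(0)) >= 0;
            if (quoted != null) {
                quoted.append(t);              // delimiters inside quotes stay in the token
                if (hasOddQuoteCount(t)) {     // closing quote reached
                    tokens.add(quoted.toString());
                    quoted = null;
                }
                lastWasDelimiter = false;
            } else if (isDelimiter) {
                if (returnTokens) {
                    tokens.add(t);             // hand the delimiter back as a token
                } else if (lastWasDelimiter) {
                    tokens.add("");            // consecutive delimiters: zero-length token
                }
                lastWasDelimiter = true;
            } else if (hasOddQuoteCount(t)) {
                quoted = new StringBuilder(t); // opening quote: start accumulating
                lastWasDelimiter = false;
            } else {
                tokens.add(t);
                lastWasDelimiter = false;
            }
        }
        if (quoted != null) {
            tokens.add(quoted.toString());     // unterminated quote: emit what was collected
        }
    }

    private static boolean hasOddQuoteCount(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '"') count++;
        }
        return count % 2 == 1;
    }

    public boolean hasMoreTokens() { return position < tokens.size(); }

    public String nextToken() {
        if (!hasMoreTokens()) throw new NoSuchElementException();
        return tokens.get(position++);
    }

    // Number of tokens that have not been consumed yet.
    public int countTokens() { return tokens.size() - position; }

    // Enumeration methods delegate to the token methods.
    @Override public boolean hasMoreElements() { return hasMoreTokens(); }
    @Override public String nextElement() { return nextToken(); }
}
```

For instance, new QuoteAwareTokenizer("a,,\"b,c\"", ",", false) yields three tokens: a, a zero-length string, and the quoted text "b,c" with its quotes kept. Precomputing the tokens keeps countTokens() trivial, at the cost of scanning the whole input up front.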

Let's take an example. Assume the input string is "Hello, Today,,, \"I, am \", going to,,, \"buy, a, book\"" and the delimiter is ",". Table 1 shows the result of splitting this string with the tokenizer.

Table 1: String tokenization results

The input string contains 11 comma (,) characters, 3 of which lie inside quoted substrings and 4 of which appear consecutively ("Today,,," contains two consecutive commas; the first comma is the delimiter that ends Today). PowerfulTokenizer computes the total number of tokens as follows. When bReturnTokens is true, it subtracts twice the number of delimiters inside substrings from the actual total. The reason is that for the substring "buy, a, book", StringTokenizer returns 5 tokens (buy, ",", a, ",", book), whereas PowerfulTokenizer returns 1 token ("buy, a, book"); the difference is 4, that is, 2 times the number of delimiters in the substring. This formula holds for every substring that contains the delimiter. Similarly, when bReturnTokens is false, we subtract the expression [total delimiters (11) - consecutive delimiters (4) + delimiters inside substrings (3)] from the actual total (19). Because delimiters are not returned in this case (whether they appear singly or inside substrings), the formula yields the total number of tokens (9).

Remember these two formulas; they are the core of PowerfulTokenizer, and each applies to almost all cases under its respective condition. If you have more complex requirements that these two formulas cannot handle, analyze the possible cases and design your own formula before writing code.

// Check whether the delimiter is located within a substring
for (int i = 1; i

Please credit the original address when reprinting: https://www.9cbs.com/read-50932.html
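The two counting formulas can be checked against the example string with a short program. This is a sketch under stated assumptions: the class TokenCountFormulas and its methods are hypothetical names, substrings are taken to be double-quoted, the input is assumed to have three consecutive commas after "Today" and after "going to", and a delimiter counts as consecutive when it immediately follows another delimiter outside quotes.

```java
import java.util.StringTokenizer;

// Hypothetical helper that applies the two token-count formulas from the text.
public class TokenCountFormulas {

    // Returns {total delimiters, consecutive delimiters, delimiters inside double quotes}.
    public static int[] delimiterStats(String input, char delimiter) {
        int total = 0, consecutive = 0, quoted = 0;
        boolean inQuotes = false;
        char prev = 0;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (c == delimiter) {
                total++;
                if (inQuotes) {
                    quoted++;
                } else if (prev == delimiter) {
                    consecutive++;
                }
            }
            prev = c;
        }
        return new int[] {total, consecutive, quoted};
    }

    // Actual total: what StringTokenizer reports when delimiters are returned as tokens.
    public static int actualTotal(String input, char delimiter) {
        return new StringTokenizer(input, String.valueOf(delimiter), true).countTokens();
    }

    // bReturnTokens = true: actual total - 2 * delimiters inside substrings.
    public static int countWithDelimiters(String input, char delimiter) {
        int[] s = delimiterStats(input, delimiter);
        return actualTotal(input, delimiter) - 2 * s[2];
    }

    // bReturnTokens = false: actual total - (total - consecutive + inside substrings).
    public static int countWithoutDelimiters(String input, char delimiter) {
        int[] s = delimiterStats(input, delimiter);
        return actualTotal(input, delimiter) - (s[0] - s[1] + s[2]);
    }

    public static void main(String[] args) {
        String input = "Hello, Today,,, \"I, am \", going to,,, \"buy, a, book\"";
        System.out.println(actualTotal(input, ','));            // 19
        System.out.println(countWithDelimiters(input, ','));    // 19 - 2 * 3 = 13
        System.out.println(countWithoutDelimiters(input, ',')); // 19 - (11 - 4 + 3) = 9
    }
}
```

The value 9 matches the total stated in the text for bReturnTokens set to false; the value 13 follows from applying the doubling rule to the three delimiters inside substrings.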

