[Repost] Java Network Programming: A Study of URI and URL (Part 1)

xiaoxiao2021-03-06 55

URI, URL, and URN are standard ways to identify, locate, and name resources on the Internet. This article examines URIs, URLs, and URNs, along with the Java API's URI and URL classes (and the classes related to URL), and demonstrates how to use them in programs.

In 1989, Tim Berners-Lee invented the World Wide Web (WWW). The Web can be thought of as a global collection of actual and abstract resources, providers of information entities on demand, that is accessible over the Internet. Actual resources range from files to people; abstract resources include database queries. Because resources are identified in a variety of ways (people's names may be identical, whereas a computer file can be reached only through a unique combination of path and name), a standard way of identifying Web resources was needed. To meet this need, Tim Berners-Lee introduced standard ways to identify, locate, and name resources: URI, URL, and URN.

What are URI, URL, and URN?

URIs, URLs, and URNs are related to one another in a hierarchy. The URI category sits at the top of the hierarchy, and the URL and URN categories sit beneath it. This arrangement shows that URL and URN are subcategories of URI, as shown in Figure 1:

Figure 1: Hierarchical relationship between URI, URL, and URN. URL and URN are sub-categories of URIs.

URI stands for Uniform Resource Identifier: a compact string that identifies a resource in a uniform (standardized) way. Typically, this string begins with a scheme, which names the URI's namespace (a set of related names). The syntax is as follows:

[scheme:]scheme-specific-part

An absolute URI starts with a scheme and a colon. The scheme begins with an uppercase or lowercase letter, followed by zero or more letters, digits, plus signs, minus signs, and periods. The colon separates the scheme from the scheme-specific part, whose syntax and semantics (meaning) are determined by the URI's namespace. One example is http://www.cnn.com, where http is the scheme, //www.cnn.com is the scheme-specific part, and the colon separates the two.
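The split between scheme and scheme-specific part can be observed with java.net.URI. The following is a minimal sketch, not part of the original article; the class name SchemeSplitSketch and the x-demo+v1.0 scheme are made up for illustration.

```java
import java.net.URI;

class SchemeSplitSketch {
    // Returns "scheme | scheme-specific-part" for the given URI string.
    static String split(String text) {
        URI uri = URI.create(text); // unchecked counterpart of new URI(text)
        return uri.getScheme() + " | " + uri.getSchemeSpecificPart();
    }

    public static void main(String[] args) {
        // The article's example: the colon separates http from //www.cnn.com.
        System.out.println(split("http://www.cnn.com"));
        // Hypothetical scheme that uses a letter first, then minus, plus, digits, and a period.
        System.out.println(split("x-demo+v1.0:payload"));
    }
}
```

The first call should print http | //www.cnn.com, matching the breakdown described above.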

URIs can be classified as absolute or relative. An absolute URI is a URI that starts with a scheme (followed by a colon). The previously mentioned http://www.cnn.com is an example of an absolute URI; other examples are mailto:jeff@javajeff.com, news:comp.lang.java.help, and xyz://wherever. You can think of an absolute URI as referring to a resource in a way that does not depend on the context in which the identifier appears. In file system terms, an absolute URI is like a path to a file that starts from the root directory. A relative URI, by contrast, is a URI that does not start with a scheme (followed by a colon). One example is articles/articles.html. You can think of a relative URI as referring to a resource in a way that depends on the context in which the identifier appears. In file system terms, a relative URI is like a path to a file that starts from the current directory.
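As a rough illustration of this classification, the sketch below asks java.net.URI whether each of the example URIs above is absolute. The class and method names are hypothetical.

```java
import java.net.URI;

class AbsoluteOrRelativeSketch {
    // A URI is absolute when it has a scheme, relative otherwise.
    static String classify(String text) {
        return URI.create(text).isAbsolute() ? "absolute" : "relative";
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://www.cnn.com",
            "news:comp.lang.java.help",
            "articles/articles.html"
        };
        for (String sample : samples) {
            System.out.println(sample + " is " + classify(sample));
        }
    }
}
```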

URIs can be further classified as opaque or hierarchical. An opaque URI is an absolute URI whose scheme-specific part does not begin with a forward slash (/). Examples include news:comp.lang.java and the earlier mailto:jeff@javajeff.com. An opaque URI is not parsed any further (beyond identifying its scheme), because the validity of its scheme-specific part does not need to be checked. A hierarchical URI, by contrast, is either an absolute URI whose scheme-specific part begins with a forward slash or a relative URI. Unlike an opaque URI, a hierarchical URI's scheme-specific part must be parsed into several components. What are these components? In general, they conform to the following syntax:

[//authority][path][?query][#fragment]
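Before looking at each component, here is a small sketch of how java.net.URI reports the opaque/hierarchical distinction. The class name is an assumption made for this example.

```java
import java.net.URI;

class OpaqueOrHierarchicalSketch {
    // isOpaque() is true only for absolute URIs whose
    // scheme-specific part does not begin with a slash.
    static String classify(String text) {
        return URI.create(text).isOpaque() ? "opaque" : "hierarchical";
    }

    public static void main(String[] args) {
        System.out.println(classify("news:comp.lang.java"));      // absolute, no leading slash
        System.out.println(classify("http://www.cnn.com"));       // absolute, begins with a slash
        System.out.println(classify("articles/articles.html"));   // relative
    }
}
```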

The optional authority component identifies the naming authority for the URI's namespace. If present, it begins with a pair of forward slashes, is either server-based or registry-based, and ends at the next forward slash, the next question mark, or the end of the URI. Registry-based authority components have scheme-specific syntax (not discussed in this article because they are rarely used), whereas server-based authority components have the following syntax:

[userinfo@]host[:port]

According to this syntax, a server-based authority component can start with user information (such as a username) followed by an @ symbol, continues with the name of a host, and can end with a colon and a port number. For example, jeff@x.com:90 is a server-based authority component, where jeff is the user information, x.com is the host, and 90 is the port.
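The three parts of a server-based authority can be read back through URI's accessor methods. The sketch below assumes a scheme-less URI that consists only of the authority from the example above; the class name is hypothetical.

```java
import java.net.URI;

class AuthoritySketch {
    // Reports the user information, host, and port of a server-based authority.
    static String describe(String text) {
        URI uri = URI.create(text);
        return "userInfo=" + uri.getUserInfo()
                + ", host=" + uri.getHost()
                + ", port=" + uri.getPort();
    }

    public static void main(String[] args) {
        System.out.println(describe("//jeff@x.com:90"));
        // Without user information and port, getUserInfo() returns null and getPort() returns -1.
        System.out.println(describe("//x.com"));
    }
}
```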

The optional path component identifies the location of the resource relative to the authority component (if present) or the scheme (if there is no authority component). A path divides into a sequence of path segments, each separated from the next by a forward slash. If the path begins with a forward slash, it is considered absolute; otherwise, it is considered relative. For example, /a/b/c consists of three path segments (a, b, and c), and the path is absolute because a forward slash precedes the first segment (a).
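java.net.URI returns the path as a single string, so splitting it into segments is left to the caller. The following sketch shows one possible way to do that; the helper names are invented for illustration, and the code assumes a hierarchical URI (getPath() returns null for opaque ones).

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

class PathSegmentsSketch {
    // Splits a hierarchical URI's path on forward slashes, skipping empty pieces.
    static List<String> segments(String text) {
        String path = URI.create(text).getPath();
        List<String> result = new ArrayList<>();
        for (String piece : path.split("/")) {
            if (!piece.isEmpty()) {
                result.add(piece);
            }
        }
        return result;
    }

    // A path is absolute when it begins with a forward slash.
    static boolean hasAbsolutePath(String text) {
        return URI.create(text).getPath().startsWith("/");
    }

    public static void main(String[] args) {
        System.out.println(segments("/a/b/c") + " absolute path: " + hasAbsolutePath("/a/b/c"));
        System.out.println(segments("a/b/c") + " absolute path: " + hasAbsolutePath("a/b/c"));
    }
}
```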

The optional query component identifies data to be passed to a resource. The resource uses this data to obtain or generate other data that it passes back. For example, in http://www.somesite.net/a?x=y, x=y is the query: x is the name of some entity and y is that entity's value.
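A short sketch of reading the query from the example above follows. Splitting the query into a name and a value is an assumption of this example (java.net.URI does not interpret the query itself), and the class name is hypothetical.

```java
import java.net.URI;

class QuerySketch {
    // Returns the query's name and value as a two-element array, assuming a single name=value pair.
    static String[] nameAndValue(String text) {
        String query = URI.create(text).getQuery(); // null when the URI has no query
        return query.split("=", 2);
    }

    public static void main(String[] args) {
        String[] pair = nameAndValue("http://www.somesite.net/a?x=y");
        System.out.println("name = " + pair[0] + ", value = " + pair[1]);
    }
}
```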

The last component is the fragment. Although it appears as part of the URI, it does not take part in identifying the resource. When the URI is used in a retrieval operation, the software performing the operation uses the fragment to focus on the part of the resource that is of interest, after it has successfully retrieved the resource's data.

To see these components together, consider the following URI:

ftp://george@x.com:90/public/notes?text=shakespeare#hamlet

This URI identifies ftp as the scheme, george@x.com:90 as a server-based authority (where george is the user information, x.com is the host, and 90 is the port), /public/notes as the path, text=shakespeare as the query, and hamlet as the fragment. In essence, it retrieves the Shakespeare text through the /public/notes path on port 90 of server x.com. After the Shakespeare text has been successfully returned to the program, the program locates the hamlet fragment and presents it to the user.

URIs are subject to three operations: normalization, resolution, and relativization. Normalization removes unnecessary . and .. segments from a path, and it can be understood in terms of directories. Assume that directory x lies directly under the root directory, x has subdirectories a and b, b contains the file memo.txt, and a is the current directory. To display the contents of memo.txt (under Microsoft Windows), you might enter type /x/./b/memo.txt. You might also enter type /x/a/../b/memo.txt, in which case a and .. are unnecessary. Neither form is the simplest. If you enter type /x/b/memo.txt, however, you specify the simplest path to memo.txt starting from the root directory. The simplest path, /x/b/memo.txt, is the normalized path.
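The directory analogy carries over directly to URI paths. The sketch below applies URI's normalize() method to the two longer paths from the example; the class name is hypothetical.

```java
import java.net.URI;

class NormalizePathSketch {
    // Removes unnecessary "." and ".." segments from the path.
    static String normalized(String text) {
        return URI.create(text).normalize().toString();
    }

    public static void main(String[] args) {
        System.out.println(normalized("/x/./b/memo.txt"));
        System.out.println(normalized("/x/a/../b/memo.txt"));
    }
}
```

Both calls should print /x/b/memo.txt, the normalized path.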

Resources are usually accessed through base and relative URIs. A base URI is an absolute URI that uniquely identifies a resource's namespace, whereas a relative URI identifies a resource relative to the base URI. (Unlike a base URI, a relative URI may never need to change during a resource's lifetime.) Because neither a base URI nor a relative URI fully identifies a resource on its own, the two must be combined through a process called resolution. Conversely, the relative URI can be extracted from the combined URI through a process called relativization.

Note

Opaque URIs differ from other URIs in that they are not subject to normalization, resolution, or relativization.

Suppose you use x://a/ as the base URI and b/c as the relative URI. Resolving the relative URI against the base URI yields x://a/b/c. Relativizing x://a/b/c against x://a/ yields b/c.
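The same round trip can be sketched with URI's resolve() and relativize() methods. The class and method names below are made up for this example.

```java
import java.net.URI;

class ResolveRelativizeSketch {
    // Combines a base URI with a relative URI.
    static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    // Extracts the relative URI from a resolved URI, given the base URI.
    static String relativize(String base, String resolved) {
        return URI.create(base).relativize(URI.create(resolved)).toString();
    }

    public static void main(String[] args) {
        System.out.println(resolve("x://a/", "b/c"));          // x://a/b/c
        System.out.println(relativize("x://a/", "x://a/b/c")); // b/c
    }
}
```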

A URI cannot locate a resource or read from or write to it. That is the task of a Uniform Resource Locator (URL). A URL is a URI whose scheme is a known network protocol (protocol, for short) and which associates the URI's components with a protocol handler (a mechanism that locates a resource and reads from or writes to it according to the rules the protocol lays down).

A URI generally does not provide a persistent name for a resource. That is the task of a Uniform Resource Name (URN). A URN is also a URI, but it is globally unique and persistent, even when the resource ceases to exist or is no longer in use.

Using URI

The Network API supports URIs at the source level by providing the URI class (located in the java.net package). URI's constructors create URI objects that encapsulate URIs. URI's methods do the following: create URI objects; parse the authority component if it is server-based; retrieve URI components; determine whether a URI object's URI is absolute or relative; determine whether it is opaque or hierarchical; compare the URIs in two URI objects; normalize a URI object's URI; resolve a relative URI against a URI object's base URI to obtain a resolved URI; relativize a resolved URI against a URI object's base URI to obtain a relative URI; and convert a URI object to a URL object.

A closer look at the URI class reveals five constructors. The simplest is URI(String uri). This constructor takes a URI as a String argument, parses the URI into its components, and stores those components in a new URI object. If the URI in the String violates the syntax rules of RFC 2396, URI(String uri), like the other four constructors, throws a java.net.URISyntaxException. The following code snippet uses URI(String uri) to create a URI object that encapsulates a simple URI:

URI uri = new URI("http://www.cnn.com");
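The other constructors build a URI from separate components rather than from one string. As a hedged sketch, the seven-argument constructor can reassemble the ftp example used earlier in this article; the class name is hypothetical, and wrapping the checked exception in IllegalStateException is just one possible choice.

```java
import java.net.URI;
import java.net.URISyntaxException;

class ComponentConstructorSketch {
    // Builds a URI from scheme, user information, host, port, path, query, and fragment.
    static URI build() {
        try {
            return new URI("ftp", "george", "x.com", 90,
                    "/public/notes", "text=shakespeare", "hamlet");
        } catch (URISyntaxException e) {
            // Not expected for these literal components.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```

This should print ftp://george@x.com:90/public/notes?text=shakespeare#hamlet.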

Typically, a URI constructor is used to create a URI object that encapsulates a user-specified URI. Because the user might enter an invalid URI, the URI constructors throw the checked URISyntaxException. This means that your code must either call the constructor inside a try block and catch the exception, or "pass the buck" by listing URISyntaxException in the method's throws clause.
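A minimal sketch of the try/catch approach follows. The input containing a space stands in for invalid user input; the class and method names are assumptions of this example.

```java
import java.net.URI;
import java.net.URISyntaxException;

class CheckedParseSketch {
    // Parses user-supplied text and reports why parsing failed, if it did.
    static String tryParse(String text) {
        try {
            URI uri = new URI(text);
            return "accepted: " + uri.getHost();
        } catch (URISyntaxException e) {
            return "rejected: " + e.getReason() + " (index " + e.getIndex() + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParse("http://www.cnn.com"));
        System.out.println(tryParse("http://www.cnn.com/a b")); // a space is not legal in a path
    }
}
```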

If you know that the URI is valid (for example, it is a literal in your source code), no URISyntaxException will be thrown. Because the exception handling that the URI constructors require is a nuisance in this case, URI provides the static create(String uri) method. This method parses the URI in its String argument. If the URI violates no syntax rules, create() creates a URI object and returns it. Otherwise, it catches the internal URISyntaxException, wraps it in an unchecked IllegalArgumentException, and throws that exception. Because IllegalArgumentException is unchecked, you need neither catch it explicitly nor list it in a throws clause.

The following code snippet demonstrates create(String uri):

URI uri = URI.create("http://www.cnn.com");
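The wrapping behavior can be checked by inspecting the cause of the IllegalArgumentException. The sketch below uses a deliberately invalid input; the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

class CreateWrapsSketch {
    // Returns true when create() rejects the text with an
    // IllegalArgumentException whose cause is a URISyntaxException.
    static boolean wrapsSyntaxException(String text) {
        try {
            URI.create(text);
            return false; // the text was a valid URI
        } catch (IllegalArgumentException e) {
            return e.getCause() instanceof URISyntaxException;
        }
    }

    public static void main(String[] args) {
        System.out.println(wrapsSyntaxException("http://www.cnn.com"));     // valid
        System.out.println(wrapsSyntaxException("http://www.cnn.com/a b")); // invalid
    }
}
```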

The URI constructors and the create(String uri) method attempt to parse the user-information, host, and port parts of a URI's authority component. They succeed for well-formed server-based authority components. For a malformed server-based authority component, that parsing fails, and the authority component is treated as registry-based. Sometimes you know that a URI's authority component must be server-based. In that case, you want a guarantee that either the authority component has been parsed into user information, host, and port, or an exception (with appropriate diagnostic information) is thrown. You obtain that guarantee by calling URI's parseServerAuthority() method. If the authority parses successfully, the method returns a reference to a new URI object that contains the extracted user-information, host, and port parts (or, if the authority component has already been parsed, a reference to the URI object on which parseServerAuthority() was called). Otherwise, the method throws URISyntaxException.
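To make the two outcomes concrete, here is a hedged sketch. The host name my_host is a made-up example: an underscore is not legal in a server host name, so that authority is expected to be treated as registry-based. The class and method names are also hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

class ServerAuthoritySketch {
    // Reports whether the URI's authority can be parsed as server-based.
    static String check(String text) {
        try {
            URI parsed = new URI(text).parseServerAuthority();
            return "server-based, host=" + parsed.getHost();
        } catch (URISyntaxException e) {
            return "not server-based: " + e.getReason();
        }
    }

    public static void main(String[] args) {
        System.out.println(check("//jeff@x.com:90/docs"));
        // Assumption: the underscore keeps this authority from parsing as server-based.
        System.out.println(check("//my_host/docs"));
    }
}
```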

The following code snippet demonstrates parseServerAuthority():

// What happens after the parseServerAuthority() call?
URI uri = new URI("//foo:bar").parseServerAuthority();

Once you have a URI object, you can call getAuthority(), getFragment(), getHost(), getPath(), getPort(), getQuery(), getScheme(), getSchemeSpecificPart(), and getUserInfo() to extract the various components. You can also call isAbsolute() to determine whether the URI is absolute or relative, and isOpaque() to determine whether it is opaque or hierarchical. A return value of true means that the URI is absolute or opaque; a return value of false means that it is relative or hierarchical.

The program in Listing 1 creates a URI object from its command-line argument, calls the component extraction methods to retrieve the URI's components, and calls isAbsolute() and isOpaque() to classify the URI as absolute/relative and opaque/hierarchical.

Listing 1: URIDemo1.java

// URIDemo1.java
import java.net.*;

class URIDemo1 {
    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java URIDemo1 uri");
            return;
        }

        // Parse the command-line argument into its components.
        URI uri = new URI(args[0]);

        // Print each component (null or -1 when a component is absent).
        System.out.println("Authority = " + uri.getAuthority());
        System.out.println("Fragment = " + uri.getFragment());
        System.out.println("Host = " + uri.getHost());
        System.out.println("Path = " + uri.getPath());
        System.out.println("Port = " + uri.getPort());
        System.out.println("Query = " + uri.getQuery());
        System.out.println("Scheme = " + uri.getScheme());
        System.out.println("Scheme-specific part = " + uri.getSchemeSpecificPart());
        System.out.println("User Info = " + uri.getUserInfo());

        // Classify the URI.
        System.out.println("URI is absolute: " + uri.isAbsolute());
        System.out.println("URI is opaque: " + uri.isOpaque());
    }
}

After compiling URIDemo1, enter java URIDemo1 query://jeff@books.com:9000/public/manuals/appliances?stove#ge at the command line. The output is as follows:

Authority = jeff@books.com:9000
Fragment = ge
Host = books.com
Path = /public/manuals/appliances
Port = 9000
Query = stove
Scheme = query
Scheme-specific part = //jeff@books.com:9000/public/manuals/appliances?stove
User Info = jeff
URI is absolute: true
URI is opaque: false

The output shows that this URI is absolute because it specifies a scheme, and that it is hierarchical because a forward slash follows the scheme and colon (query:).

Tip

Call URI's compareTo() and equals() methods to determine the ordering of URIs (for sorting purposes) and their equality. See the SDK documentation for more information about these methods.
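As a rough sketch of both methods: equals() compares scheme and host without regard to case but compares the path case-sensitively, and because URI implements Comparable, a list of URIs can be sorted directly. The a.example and b.example hosts and the class name are placeholders.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CompareSketch {
    // Tests two URI strings for equality as URIs.
    static boolean sameUri(String first, String second) {
        return URI.create(first).equals(URI.create(second));
    }

    // Sorts URIs using their natural ordering (compareTo()).
    static List<URI> sorted(String... texts) {
        List<URI> uris = new ArrayList<>();
        for (String text : texts) {
            uris.add(URI.create(text));
        }
        Collections.sort(uris);
        return uris;
    }

    public static void main(String[] args) {
        System.out.println(sameUri("HTTP://WWW.CNN.COM/a", "http://www.cnn.com/a"));
        System.out.println(sameUri("http://www.cnn.com/a", "http://www.cnn.com/A"));
        System.out.println(sorted("http://b.example/", "http://a.example/"));
    }
}
```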

The URI class supports the basic URI operations: normalization, resolution, and relativization. Normalization is supported by URI's normalize() method. When called, normalize() returns a reference to a new URI object that contains the normalized form of the calling URI object's URI.

Listing 2 demonstrates the normalize() method. URIDemo2 takes a URI as its only argument and prints the normalized URI.

Listing 2: URIDemo2.java

// URIDemo2.java
import java.net.*;

class URIDemo2 {
    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java URIDemo2 uri");
            return;
        }

        URI uri = new URI(args[0]);

        // normalize() returns a new URI without unnecessary "." and ".." segments.
        System.out.println("Normalized URI = " + uri.normalize().toString());
    }
}

After compiling URIDemo2, enter java URIDemo2 x/y/../z/./q at the command line. You will see the following output:

Normalized URI = x/z/q

The output shows that y, .., and . have disappeared. The .. segment means that you want to access the z section of the namespace directly below x, and the . segment means that you want to access the q section of the namespace relative to z.

URI supports resolution and relativization through its resolve(String uri), resolve(URI uri), and relativize(URI uri) methods. All three methods throw NullPointerException if the URI argument is null. In addition, resolve(String uri) indirectly throws IllegalArgumentException, through its internal call to create(String uri), if the specified URI violates the RFC 2396 syntax rules.
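The two failure modes of resolve(String uri) can be sketched as follows. The class name, the helper, and the inputs are assumptions made for this example.

```java
import java.net.URI;

class ResolveErrorsSketch {
    // Resolves text against the base URI and reports which exception, if any, was thrown.
    static String tryResolve(URI base, String relative) {
        try {
            return base.resolve(relative).toString();
        } catch (IllegalArgumentException e) {
            return "IllegalArgumentException";
        } catch (NullPointerException e) {
            return "NullPointerException";
        }
    }

    public static void main(String[] args) {
        URI base = URI.create("http://www.somedomain.com/");
        System.out.println(tryResolve(base, "x/y"));  // valid relative URI
        System.out.println(tryResolve(base, "x y"));  // a space violates the syntax rules
        System.out.println(tryResolve(base, null));   // null argument
    }
}
```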

The code in Listing 3 demonstrates resolve(URI uri) and relativize(URI uri).

Listing 3: URIDemo3.java

// URIDemo3.java
import java.net.*;

class URIDemo3 {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: " + "java URIDemo3 uriBase uriRelative");
            return;
        }

        URI uriBase = new URI(args[0]);
        System.out.println("Base URI = " + uriBase.toString());

        URI uriRelative = new URI(args[1]);
        System.out.println("Relative URI = " + uriRelative.toString());

        // Combine the base and relative URIs.
        URI uriResolved = uriBase.resolve(uriRelative);
        System.out.println("Resolved URI = " + uriResolved.toString());

        // Recover the relative URI from the resolved URI.
        URI uriRelativized = uriBase.relativize(uriResolved);
        System.out.println("Relativized URI = " + uriRelativized.toString());
    }
}

After compiling URIDemo3, enter java URIDemo3 http://www.somedomain.com/ x/../y at the command line. The output is as follows:

Base URI = http://www.somedomain.com/
Relative URI = x/../y
Resolved URI = http://www.somedomain.com/y
Relativized URI = y

The output shows that the relative URI x/../y is resolved against the base URI http://www.somedomain.com/ and normalized, which yields the resolved URI http://www.somedomain.com/y. Given the resolved URI and the base URI, relativizing the resolved URI against the base URI yields y, which is the original relative URI in normalized form.

Tip

Call URI's toURL() method to convert a URI to a URL.
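A hedged sketch of the conversion follows. toURL() throws IllegalArgumentException when the URI is not absolute, and MalformedURLException when no protocol handler exists for the scheme. The class name and the result strings are made up for this example, and no network access takes place.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

class ToUrlSketch {
    // Converts a URI string to a URL and describes the outcome.
    static String convert(String text) {
        try {
            URL url = URI.create(text).toURL();
            return url.getProtocol() + " " + url.getHost();
        } catch (IllegalArgumentException e) {
            return "not absolute";
        } catch (MalformedURLException e) {
            return "no protocol handler";
        }
    }

    public static void main(String[] args) {
        System.out.println(convert("http://www.cnn.com"));
        System.out.println(convert("articles/articles.html")); // relative URI
        System.out.println(convert("xyz://wherever"));         // xyz is not a known protocol
    }
}
```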

In the next part of this topic, I will introduce URLs and the concept of MIME (Multipurpose Internet Mail Extensions) and show how MIME relates to URLs, so stay tuned.

When reposting, please cite the original address: https://www.9cbs.com/read-60035.html
