Design and Implementation of Mail Recognition and Packaging Module in PDA Email System

xiaoxiao 2021-03-06

1 Introduction

A PDA (personal digital assistant) is a digital device for handling personal affairs, a high-tech product that integrates numbers, text, and images. Its main functions are recording and processing personal information: address book, business cards, calculator, schedule, date reminders, and so on. As the Internet reaches deeper into people's lives, users expect more and more of PDA devices. They want a PDA to receive email over an Internet connection and to obtain value-added services through email, for example financial (stock, finance), travel, and weather information delivered by email. For the PDA to become a practical information terminal, an email system is an indispensable component.

2 PDA Email Transceiver System

2.1 System Function

The core of the PDA email system is a set of email transceiver programs based on the TCP/IP protocols. The mail transceiver running on the PDA dials up an ISP through a modem and thereby gains access to the Internet. Over the Internet, the PDA can establish a connection with a mail receiving server and receive mail using the POP3 protocol; it can also establish a connection with a mail sending server and send mail using the SMTP protocol.

2.2 Hardware Platform

The PDA hardware platform we use is based on the ADZ80 chip running at 50MHz. In addition to the basic configuration of input and output devices, CPU, and memory, it includes an LCD screen controller, a touch-screen controller, a clock, an interrupt controller, and other peripherals, so it is a PDA device with good performance at a low price.

The PDA is a special device. Its RAM resources are very small compared with a PC. Because of limitations of the system architecture, segment switching must be performed when the code size exceeds 64K. The PDA email system is developed in ANSI C, and implementing segment switching in C is quite difficult, so the final code and data of the email system have to be placed in two different segments, neither of which can exceed 64K. In addition, the PDA has neither file management nor memory management functions, so storage for a mail cannot be allocated dynamically according to the size of the message; only a fixed 8K buffer can be assigned to a single message. When a message exceeds 8K, it is abandoned.

3 Design and Implementation of the Mail Recognition and Packaging Module

3.1 Functions of the Recognition and Packaging Module

The mail recognition function of the PDA email transceiver system means that the transceiver parses a message retrieved from the mail server, extracts the mail content it needs, and hands it over to the PDA master program. The mail packaging function means that, before the PDA calls the SMTP protocol to send a message, the information the user wants to send is packaged into the MIME structure defined by RFC822 and RFC1521.

3.2 Controlling the Size of Received Mail

The PDA prototype provides only an 8K mail receiving buffer. If a message larger than 8K were received, the buffer would overflow and cause a system fault. So before receiving the entire mail content, the module must determine whether the size of the message exceeds 8K and, if it does, abandon the message. Our solution is to first use the LIST command of the POP3 protocol, which returns the list of messages in the user's mailbox and the size of each message, and then use the optional TOP command to obtain summary information for each individual message. This mail list is returned to the PDA master program, which displays it on the PDA screen so that the user can select some or all of the messages to receive. When the user selects a message larger than 8K, the receiving module skips it and receives the next one.
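The size check can be illustrated with a minimal sketch in C. It assumes that each line of the LIST response body has the standard POP3 form "message-number size"; the names parse_list_line, mail_fits_buffer, and MAIL_BUFFER_SIZE are hypothetical and do not come from the original module.

```c
#include <stdio.h>

#define MAIL_BUFFER_SIZE 8192L  /* assumed size of the fixed 8K receive buffer */

/* Parse one line of a POP3 LIST response body, e.g. "3 10240".
 * Returns 1 and fills *number and *size on success, 0 if the line is
 * not a "number size" pair (status line, terminating ".", garbage). */
int parse_list_line(const char *line, long *number, long *size)
{
    return sscanf(line, "%ld %ld", number, size) == 2;
}

/* Returns 1 if a message of the given size fits in the receive buffer. */
int mail_fits_buffer(long size)
{
    return size >= 0 && size <= MAIL_BUFFER_SIZE;
}
```

A receiving loop built on this sketch would call mail_fits_buffer for every message the user selected and issue the retrieval command only when it returns 1.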

3.3 Data Interface Between the Recognition and Packaging Module and the PDA Master Program

The PDA prototype has no operating system, only a single master control program. For outgoing mail, the master program hands the message over to the email system after the user has entered it; for incoming mail, the master program takes the received message from the email system and displays it on the PDA screen for the user to browse. To carry out this interaction with the PDA master program, the email system defines the following mail data exchange interface:

serial number\0 sender\0 recipient\0 send date\0 subject\0 mail size\0 attachment flag\0 body size\0 body content\0 attachment size\0 attachment content\0 0xFF

Here \0 is the NUL character that ends each field and 0xFF is the byte that ends the record. The attachment size and attachment content fields are present only if the mail has an attachment.

This structure has the following advantages: (1) It saves space. Only the most necessary mail content is retained; most of the header information that is not needed or that was added by mail agents is omitted. Because the data block is best placed within a single segment, defining such a structure saves the system a certain amount of space. (2) It is easy to extend. The structure defines only one attachment. If there are several attachments, it is only necessary to append further attachment size\0 attachment content\0 pairs at the end of the structure and to finish with 0xFF. (3) It is easy to process. Because the data fields are separated by the character \0, the various string library functions can be used directly, which is very convenient when developing in ANSI C.
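Advantage (3) can be illustrated with a minimal sketch in C that walks the text fields of such a record using strlen. The function names next_field and get_field and the constant RECORD_END are hypothetical; the sketch also assumes that a record contains at least one field before the 0xFF terminator.

```c
#include <string.h>

#define RECORD_END 0xFF  /* terminator byte of the record */

/* Return the field that follows `field`, or NULL when the 0xFF
 * terminator has been reached. Only valid for text fields: binary
 * attachment data has to be skipped using its size field instead. */
const char *next_field(const char *field)
{
    const char *next = field + strlen(field) + 1;  /* step over the '\0' */
    return ((unsigned char)*next == RECORD_END) ? NULL : next;
}

/* Return the index-th field (0 = serial number), or NULL if the
 * record ends before that field. */
const char *get_field(const char *record, int index)
{
    const char *field = record;
    while (field != NULL && index-- > 0)
        field = next_field(field);
    return field;
}
```

Because every text field is an ordinary C string, the master program can pass the returned pointers directly to functions such as strcpy or strcmp.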

To store received email and to interact with the PDA master program, the system defines two buffers of 8K each. For convenience, they are called buffer A and buffer B. Mail buffer A stores the MIME-format message content received from the mail server, called the mail source; mail buffer B stores the recognized mail content, called the recognized mail. The mail transceiver uses buffer B to exchange data with the master program.

When mail is received, the mail source in buffer A is recognized and the result is written to mail buffer B in the predefined data structure; during the write, the module must check whether buffer B would exceed 8K, to avoid a system fault. Mail packaging has the same buffer overflow problem. When a user wants to send an email, the user writes the message on the PDA interface and the master program saves it in buffer B in the predefined data structure; the mail packaging module then packages the contents of B into mail source and writes it to the send buffer A. When writing to the send buffer, it is also necessary to check whether 8K is exceeded. For this reason, the module defines a global variable WriteBufferSize that records how much has been written to the receive or send buffer. When WriteBufferSize exceeds 8K, the module stops writing to the mail buffer and returns an error number indicating that the mail buffer has overflowed.
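A bounded write of this kind might look like the following sketch in C. Only the name WriteBufferSize comes from the text; MailBuffer, write_mail_buffer, and the error number ERR_BUFFER_OVERFLOW are hypothetical. The sketch checks before copying, so a write that would not fit leaves the buffer unchanged.

```c
#include <string.h>

#define MAIL_BUFFER_SIZE 8192
#define ERR_BUFFER_OVERFLOW (-1)   /* hypothetical error number */

char MailBuffer[MAIL_BUFFER_SIZE];  /* hypothetical 8K mail buffer */
unsigned int WriteBufferSize = 0;   /* bytes written to the buffer so far */

/* Append len bytes to the mail buffer. Returns 0 on success, or
 * ERR_BUFFER_OVERFLOW (writing nothing) if the data would not fit. */
int write_mail_buffer(const char *data, unsigned int len)
{
    if (len > MAIL_BUFFER_SIZE - WriteBufferSize)
        return ERR_BUFFER_OVERFLOW;
    memcpy(MailBuffer + WriteBufferSize, data, len);
    WriteBufferSize += len;
    return 0;
}
```

Comparing len against the remaining space, rather than adding len to WriteBufferSize first, avoids an arithmetic overflow on a platform where unsigned int is only 16 bits wide.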

3.4 Mail Structure

Before introducing mail recognition, we briefly describe the MIME mail structure defined by the RFCs. An email consists of a mail header and a mail body. Both are a series of text lines, each ending with a carriage return and line feed (CRLF). The mail header is followed by an empty line that separates it from the mail body. The header consists of several fields, each made up of one or more lines. For a field that spans multiple lines, every line after the first must begin with a space or a tab and is called a continuation line. Each header field consists of the following parts: a field name, optional spaces, a colon, optional spaces or comments, and an optional field body; usually, however, there is no space between the field name and the colon.

The mail body is the actual content the user needs. Today's email contains not only text but also multimedia information such as images, video, and sound. MIME uses the Content-Type field to specify the type of the mail body. The protocol defines seven main types: Text, Image, Audio, Video, Application, Message, and Multipart, and each type is divided into several subtypes. A type is therefore written as main type/subtype. Analysis of current email shows that two main types are by far the most common: Text and Multipart. The Text type indicates that the message contains only a body; its main subtypes are Plain, HTML, and XML, and it has an important parameter, charset, that indicates the character set used to encode the text content. Multipart combines several types of data into one message, which is what is usually referred to as mail with attachments. It has an important parameter, Boundary, which defines a boundary string. Two leading hyphens followed by this string form a boundary line, which separates the body parts that make up the message. After the last body part there is a closing boundary, formed by appending two more hyphens to a normal boundary line, which marks the end of the entire mail body. Each body part of a Multipart message has the same format as a mail: a header and a body separated by an empty line. A body part may itself be of type Multipart, which is called nested mail.
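The boundary rules can be made concrete with a small sketch in C that classifies one line of a Multipart body. The function name classify_boundary_line and the three result constants are hypothetical, and the sketch assumes the Boundary parameter value has already been extracted from the Content-Type field.

```c
#include <string.h>

enum { NOT_BOUNDARY = 0, PART_BOUNDARY = 1, END_BOUNDARY = 2 };

/* Classify one line of a Multipart body against the Boundary string.
 * "--" + boundary starts a body part; "--" + boundary + "--" closes
 * the whole Multipart body. */
int classify_boundary_line(const char *line, const char *boundary)
{
    size_t n = strlen(boundary);

    if (strncmp(line, "--", 2) != 0 || strncmp(line + 2, boundary, n) != 0)
        return NOT_BOUNDARY;
    line += 2 + n;                       /* text after the boundary string */
    if (strncmp(line, "--", 2) == 0)
        return END_BOUNDARY;
    if (*line == '\0' || *line == '\r' || *line == '\n' ||
        *line == ' ' || *line == '\t')
        return PART_BOUNDARY;
    return NOT_BOUNDARY;                 /* a longer, different string */
}
```

For a Boundary value of frontier, the line "--frontier" starts a body part and the line "--frontier--" ends the Multipart body.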

Although an individual email may be nested four levels deep (mainly when very large messages are sent), the PDA prototype platform can only handle text and BMP images within 8K, so we assume that an email has at most three levels of nesting; email nested more deeply than three levels is not recognized. We therefore consider five cases: (1) the mail content is either plain text or hypertext; (2) the mail content includes both plain text and hypertext; (3) the mail contains attachments, and its content is only one of plain text or hypertext; (4), (5) the mail contains attachments, and its content may include both plain text and hypertext.

The following forest data structure represents the nested structure of these five kinds of email intuitively:

Notes:

(1) The types above are those defined by the Content-Type field. They are not the only types; others exist, such as Message/RFC822.

(2) Only two attachments are shown above; an actual mail may contain more.

3.5 Mail Header Recognition

Whether a mail summary or an entire mail is being processed, the mail header must be handled, because it contains not only the basic information of sender, recipient, send date, and subject, but also much important information about the mail body. Header recognition is therefore an indispensable part of the module.

For the MIME mail structure, the header recognition algorithm is as follows. Scan the mail source. When a CRLF is reached, a header line has ended; check whether the next line begins with a space or a tab. If it does, the line is a continuation line, so keep scanning until the header field really ends. Then extract the field name and, if the field is needed, save the field value. Continue scanning the mail source in this way until two consecutive CRLFs are found, which mark the end of the mail header.
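The scanning step can be sketched in C as follows. This is a minimal illustration, not the module's actual code: the function names header_field_end and at_header_end are hypothetical, and the mail source is assumed to be a NUL-terminated string.

```c
#include <stddef.h>
#include <string.h>

/* Return a pointer just past the header field that starts at p,
 * following continuation lines (lines that begin with a space or tab).
 * Returns NULL if the field has no terminating CRLF. */
const char *header_field_end(const char *p)
{
    const char *crlf;

    while ((crlf = strstr(p, "\r\n")) != NULL) {
        p = crlf + 2;                  /* first character of the next line */
        if (*p != ' ' && *p != '\t')
            return p;                  /* next line is not a continuation */
    }
    return NULL;
}

/* Return 1 if p points at the empty line that ends the mail header. */
int at_header_end(const char *p)
{
    return p[0] == '\r' && p[1] == '\n';
}
```

A caller would start at the beginning of the mail source, call header_field_end repeatedly to step from field to field, and stop as soon as at_header_end reports the empty line; the mail body then begins two characters later.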

3.6 Mail Recognition

Because of platform restrictions, the prototype PDA cannot process all types of data; it can handle only plain text and BMP image attachments. The module defines a pointer array to store the start positions of the plain text and the BMP images. Since at most three levels of mail nesting are considered, the algorithm defines two variables to save the boundary strings of nested messages, the Boundary parameter and the SubBoundary parameter. If the root node of the tree structure described in 3.4 is taken as the first level, the Boundary parameter holds the boundary string of the outermost message and the SubBoundary parameter holds the boundary string of the second-level message. Given the five cases of 3.4 and the MIME format, finding plain text and BMP images in the mail body amounts to traversing the five trees in Figure 3.4 and picking out what we need from the corresponding leaf nodes. For convenience, the current node denotes the body part being processed. The recognition algorithm is as follows:

(1) The program first recognizes the mail header, then analyzes the type of the mail body and matches it against the root nodes of the five trees in the forest. If no tree has a root node that matches the type of the mail, the program returns an error indicating that the message cannot be recognized and ends; if a match is found, go to the next step.

(2) Set the current node to the root node of the tree.

(3) Determine whether the current node is a leaf node. If it is not, obtain the Boundary string and go to step (4). Otherwise, determine whether the value of the Content-Type field of the current node is Text/Plain; if so, save the start position and the transfer encoding and end the program; if not, return an error indicating that the mail cannot be recognized.

(4) Determine whether the mail body delimited by the Boundary string has been completely recognized. If it has, the program ends and returns; otherwise, take the next body part delimited by the Boundary string, assign it to the current node, and go to the next step.

(5) Determine whether the current node is a leaf node. If it is not, obtain the SubBoundary string and go to the next step; if it is, go to step (7).

(6) Determine whether the mail body delimited by the SubBoundary string has been completely recognized. If it has, go to step (4); otherwise, take the next body part delimited by the SubBoundary string, assign it to the current node, and go to the next step.

(7) Determine whether the value of the Content-Type field of the current node is text/plain or image/bmp; if it is, save the start position and transfer encoding of the content. Then determine whether the boundary of the current node is the Boundary string; if it is, go to step (4), otherwise go to step (6).
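The leaf test in step (7) can be sketched in C as follows. This covers only the Content-Type comparison, not the whole traversal; the names classify_content_type and starts_with_nocase and the result constants are hypothetical. The comparison ignores case because MIME type names are case-insensitive, and ANSI C has no standard case-insensitive compare, so a small helper is written out.

```c
#include <ctype.h>
#include <stddef.h>

enum { PART_OTHER = 0, PART_TEXT_PLAIN = 1, PART_IMAGE_BMP = 2 };

/* Return 1 if s begins with prefix, ignoring letter case. */
static int starts_with_nocase(const char *s, const char *prefix)
{
    while (*prefix != '\0') {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
        s++;
        prefix++;
    }
    return 1;
}

/* Classify the value of a Content-Type field, e.g.
 * " text/plain; charset=us-ascii". Leading blanks are skipped. */
int classify_content_type(const char *value)
{
    while (*value == ' ' || *value == '\t')
        value++;
    if (starts_with_nocase(value, "text/plain"))
        return PART_TEXT_PLAIN;
    if (starts_with_nocase(value, "image/bmp"))
        return PART_IMAGE_BMP;
    return PART_OTHER;
}
```

A body part classified as PART_OTHER would simply be skipped, while the start positions of the other two kinds would be stored in the pointer array described above.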

When the search returns, the module checks whether the pointer array holding the start positions of the text and the BMP images is empty. If it is empty, an error number is returned, indicating that the message contains nothing we can recognize. Otherwise, following the data structure defined in 3.3, the module first writes the message header fields to the mail receiving buffer, then decodes the message content according to its transfer encoding and writes it to the receiving buffer.

3.7 Mail Packaging

The function of mail packaging is as follows: when the user has finished writing an email, the module must encode the message contents into a specified encoding, package them as mail source according to the MIME structure, and then call the SMTP protocol to send the result to the corresponding mail server.

For mail packaging, two cases need to be considered: (1) the mail contains only plain text; (2) the mail contains plain text and a BMP image attachment.

For the message header, simply follow the field structure of the message, prefixing each field content with the corresponding field name, and set the Content-Type field according to the message content. When the message content is only plain text, Content-Type is Text/Plain. When the message content consists of plain text and a BMP image, Content-Type is a Multipart type with two body parts, one of type text/plain and the other of type image/bmp. Both the plain text and the attachment are Base64 encoded.
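Base64 encoding maps every three input bytes to four printable characters and pads the final group with '='. The following C sketch shows the standard algorithm; the names base64_encode and B64 are hypothetical, and the sketch does not insert the line breaks that MIME requires in long encoded bodies.

```c
#include <stddef.h>

static const char B64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Base64-encode len bytes from src into dst and NUL-terminate dst.
 * dst must hold at least 4 * ((len + 2) / 3) + 1 bytes.
 * Returns the number of characters written, excluding the NUL. */
size_t base64_encode(const unsigned char *src, size_t len, char *dst)
{
    size_t i, out = 0;

    for (i = 0; i + 2 < len; i += 3) {   /* full 3-byte groups */
        dst[out++] = B64[src[i] >> 2];
        dst[out++] = B64[((src[i] & 0x03) << 4) | (src[i + 1] >> 4)];
        dst[out++] = B64[((src[i + 1] & 0x0F) << 2) | (src[i + 2] >> 6)];
        dst[out++] = B64[src[i + 2] & 0x3F];
    }
    if (i < len) {                       /* one or two bytes left over */
        dst[out++] = B64[src[i] >> 2];
        if (i + 1 < len) {
            dst[out++] = B64[((src[i] & 0x03) << 4) | (src[i + 1] >> 4)];
            dst[out++] = B64[(src[i + 1] & 0x0F) << 2];
        } else {
            dst[out++] = B64[(src[i] & 0x03) << 4];
            dst[out++] = '=';
        }
        dst[out++] = '=';
    }
    dst[out] = '\0';
    return out;
}
```

Since the output is about one third larger than the input, a packaging module with an 8K send buffer has to account for this growth when it checks whether a message will fit.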

4 Conclusion

This paper introduces the design and implementation of the mail recognition and packaging module of a PDA email transceiver system, focusing on how to recognize and package email with limited storage resources and without system support in order to meet specific application needs. The module has run successfully on a PDA product. However, it can only receive email smaller than 8K, while in practice a great deal of mail has a long body or carries attachments. We are therefore working in two directions. On the one hand, we are expanding the mail buffer to 64K so that mail up to 64K can be received, which eases this conflict to some extent. On the other hand, we are developing a smart mail receiving agent to accompany this module. The PDA user receives mail through the agent, which collects mail from the real mail server, filters and converts it, removes content that the PDA cannot receive, and splits large mail into small messages that the PDA can receive, providing better conditions for PDA email reception.

When reprinting, please cite the original address: https://www.9cbs.com/read-103312.html
