Writing web scripts with curl and scsh



Contents:

Introduction
Some notes
Simple task one
Determining whether a user is online
Preventing spoofing
Sending a notification
Simple task two
Reading input posts
Regular expressions
Matching
Object orientation
Using posts as input and output
Building a sandbox with ELK Scheme
Logging in and out
Lily's cookies
Posting
Conclusion
Download
References
About the author


Zhao Wei (zhaoway@public1.ptt.js.cn), freelance programmer, January 2004

Let us see how a programmer hangs out on the net. This article shows how to combine curl, a simple and clever web tool, with scsh, a powerful Scheme-based Unix shell, to write all kinds of quirky web scripts that help us while away our time online.

Some notes

The examples in this article were written specifically for the current web interface of the Lily BBS of Nanjing University (http://lilybbs.net), because it is the web forum I visit most often. The code behind this forum is also said to be the common basis of the web interfaces of many BBS forums in China. Ha ha. Our programs, however, do not touch the server side. Since web interfaces change often and differ greatly from site to site, the purpose of this article is not to provide a forum program you can use right away, but to use this program to show how curl and scsh can be combined to write simple web scripts.

The article also assumes that the reader has some understanding of the Scheme programming language, at least enough to read code fragments written in Scheme. Readers who are not confident about this can read Song Guowei's articles on the Scheme language. Perhaps some readers are not keen on learning a new programming language. To them I would say that this example was first developed with the rc shell on the Plan 9 operating system, a shell that is already considerably more capable than the standard Bourne shell of Unix systems. As the program grew more complex and needed more flexibility, however, it had to be ported from rc to the Scheme-based scsh. Besides, Scheme is actually an easy language to learn, and once you have learned it you will certainly see things differently.
The body of the article presents two examples. The first is relatively simple: a small program that watches for your friends. The second is more complex and more interesting: a Scheme interpreter that uses posts on the web forum as its input and output, with a simple security sandbox added. Because of limited time and energy (there is only so much time for hanging out online), the version presented here is only a very limited, half-finished product. Even so, it already supports most of what the Scheme R5RS standard requires.

Simple task one

Like many other university BBSes, Nanjing University's Lily lets users keep a list of friends. When a friend comes online, the system notifies the user through an automatically refreshing web page, and the user can then chat with the friend. But we cannot keep the computer on and stare at the browser all day waiting for friends to show up. Our first simple task is to watch for friends automatically with a script: as soon as a friend comes online, the script notifies the user, who can then open a web browser, log in to the BBS, and chat.

Determining whether a user is online

The web forum tells users apart by their IDs; an ID is a short string. While an ID is logged in, Lily shows a sentence on that ID's information page saying that the user is currently on the site. When the ID logs out, that sentence changes to say that the user is not on the site. Our first job, then, is to fetch the information page of each friend's ID. Once we have the page, we can analyze it in detail and decide what to do next. This step relies mainly on curl.

curl is a tool that runs from the Unix shell. It accepts many different command-line arguments and carries out different tasks depending on them. The most common argument is a URL giving the full network address of the page we want to fetch; curl then fetches the page and writes it to its standard output. Using a web browser, we can find by hand the address of the page that shows an ID's online information, and then fetch that page with the following command:

bash-2.05b$ curl "http://lilybbs.net/bbsqry?userid=iloveqhq"

The next task is to drive this command from our script and capture its output inside the script for further analysis. In an ordinary Bourne shell environment on a Unix system, such as a bash script on GNU/Linux, driving a shell tool is direct and simple; that is the strength of a shell. scsh, being a Scheme-based Unix shell, can of course drive shell tools just as easily. This is in fact one of the main features that set scsh apart from many other Scheme implementations. The task can be done in scsh with the run/string syntax:

(run/string (curl "http://lilybbs.net/bbsqry?userid=iloveqhq"))

While it runs, curl prints some progress statistics on its standard error. A script has no use for them. In scsh, we can close curl's standard error with the following form:

(run/string (curl "http://lilybbs.net/bbsqry?userid=iloveqhq")
            (- 2))

Just as in standard Unix, the number 2 stands for standard error, and the minus sign means "close this port". From the script's point of view, the long URL in this command can be split into three parts. The first part, "http://lilybbs.net", is the URL of the Lily site and is constant throughout the program. The second part, "/bbsqry?userid=", is fixed for every call of the function that contains this command. The third part, "iloveqhq", is the user ID, which changes from call to call. Naturally we want to hold the three parts in separate variables, which we can do with the following form:

(define lilybbs "http://lilybbs.net")

(run/string (curl ,(string-append lilybbs
                                  "/bbsqry?userid="
                                  userid))
            (- 2))
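Because the same site prefix is used throughout the program, the three-part URL can be wrapped in a small helper so that every call site builds its address the same way. This is a sketch, not the author's code: lily-url and fetch-user-page are hypothetical names, and the fetch relies only on run/string with the (- 2) redirection described above.

```scheme
;; Base URL of the Lily web interface (constant for the whole program).
(define lilybbs "http://lilybbs.net")

;; Hypothetical helper: site URL + fixed script path + per-call argument.
(define (lily-url path arg)
  (string-append lilybbs path arg))

;; Hypothetical wrapper: fetch one user's information page with curl,
;; closing curl's standard error so its progress meter is discarded.
(define (fetch-user-page userid)
  (run/string (curl ,(lily-url "/bbsqry?userid=" userid))
              (- 2)))
```

Keeping the path and the user ID as separate arguments mirrors the three-way split in the text: only the last part varies per call.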

Note the comma in front of the string-append expression in the run/string form that builds the URL from lilybbs and userid. It is required because run/string is not an ordinary Scheme function but a special syntactic form. Inside run/string, to call a Scheme function or use a Scheme variable, you must put a comma in front of the corresponding expression. The form captures the output of the curl command in a string, so that other parts of the Scheme program can analyze and process it further. However, we are not interested in everything in that string. We only care whether it says that the user with this ID is "currently on the site" or "not on the site". We could analyze this in Scheme, but we can also hand the job to standard Unix shell commands, just as we would in a shell script. In scsh this looks as follows:

(run/string (| (curl ,url)
               (grep -m 1 -n "currently on the site")))
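To see what the grep stage actually hands back, the same pipeline can be fed a small page held in a Scheme string instead of curl's output. This is a sketch: the page text and the name sample-page are made up for illustration, and it assumes echo and GNU grep (for -m) are on the PATH.

```scheme
;; A newline as a one-character string (avoids relying on "\n" escapes).
(define nl (string #\newline))

;; Made-up page: the status sentence on line 2, a joking signature on line 4.
(define sample-page
  (string-append "<html>" nl
                 "iloveqhq is currently on the site" nl
                 "signature:" nl
                 "currently on the site, honest!" nl
                 "</html>"))

;; Same pipeline shape as above, with echo standing in for curl.
;; grep -m 1 stops after the first matching line; -n prefixes "LINE:".
(define first-hit
  (run/string (| (echo ,sample-page)
                 (grep -m 1 -n "currently on the site"))))
```

Only line 2 is reported, even though the signature on line 4 contains the same phrase; that leading line number is what the spoofing check in the next section relies on.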

The vertical bar in the curl-and-grep form, just as in a standard Unix shell, denotes a pipe connecting two shell commands. The grep option -m 1 tells grep to stop as soon as the string has matched once. The option -n asks grep to put a line number in front of its output: the number of the line in the piped input on which the string appears. Why we need the line number becomes clear in the next section.

Preventing spoofing

The page that shows an ID's online information lets users enter their own signature, and some users like to use their signatures to play all kinds of jokes. We decide whether a user ID is online by checking whether the particular string "currently on the site" appears on this page. So if a user types that string into the signature, our previous program would always consider the ID online, whether it is or not. To avoid this kind of spoofing, we write the online check as follows:

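(define (user-online? userid)
  (let* ((url (string-append lilybbs "/bbsqry?userid=" userid))
         (html (run/string (curl ,url) (- 2)))
         (online (run/string (| (echo ,html)
                                (grep -m 1 -n "currently on the site")))))
    (and (< 0 (string-length online))
         (let ((offline (run/string (| (echo ,html)
                                       (grep -m 1 -n "not on the site")))))
           (or (= 0 (string-length offline))
               ;; Both phrases found: trust the one on the earlier line,
               ;; assuming the real status sentence precedes the signature.
               ;; grep-line-number reads N from grep output of the form "N:...".
               (< (grep-line-number online)
                  (grep-line-number offline)))))))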
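The check above calls grep-line-number, whose definition is not shown here (the complete code is left to the download). One plausible definition, an assumption rather than the author's code, reads the number in front of the first colon of grep -n output. The decision itself can also be pulled out into a pure procedure, hypothetically named online-from-grep, so that the anti-spoofing rule can be tested without the network.

```scheme
;; Hypothetical helper: "12:some text" -> 12, or #f if there is no number.
;; bos anchors the pattern at the beginning of the string.
(define (grep-line-number line)
  (let ((m (regexp-search (rx (: bos (submatch (+ digit)) ":")) line)))
    (and m (string->number (match:substring m 1)))))

;; Hypothetical pure version of the rule in user-online?:
;; ONLINE and OFFLINE are the outputs of the two grep -m 1 -n calls,
;; each "" when its phrase was not found.
(define (online-from-grep online offline)
  (and (< 0 (string-length online))
       (or (= 0 (string-length offline))
           ;; Both phrases present: the earlier line wins, assuming the
           ;; real status sentence comes before the signature.
           (< (grep-line-number online)
              (grep-line-number offline)))))
```

A signature that repeats the "online" phrase lands on a later line than the genuine "not on the site" sentence, so the comparison still reports the user as offline.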

Sending a notification

When the script finds that a friend's ID is online, it should notify us. In the GNOME desktop environment, we can use the zenity shell command to pop up a GTK dialog box on the desktop reminding us that a friend's ID has logged in to Lily. If we log in to Lily at that point, we can get in touch with the friend. This can be done with the following form:

(run (zenity --info --title ,lilybbs --text ,info-text))

Here run is used instead of run/string, because we don't care about the output of the zenity command. The use of the comma was explained earlier. If a plain GTK dialog is not enough and we would like a nice piece of music to play through the speakers when a friend comes online, we can use this expression:

(run (mplayer ,(string-append userid ".mp3")))
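Putting the pieces together, a monitor only needs to poll each friend now and then and raise a notice when someone goes from offline to online, rather than on every poll. The sketch below is an assumption about how that loop might look: watch-friends, online-friends, newly-online, and the online? and notify parameters are hypothetical names, and the pause simply runs the ordinary sleep shell command through run.

```scheme
;; Hypothetical: the IDs in NOW that were not in BEFORE (lists of strings).
(define (newly-online before now)
  (let loop ((ids now) (acc '()))
    (cond ((null? ids) (reverse acc))
          ((member (car ids) before) (loop (cdr ids) acc))
          (else (loop (cdr ids) (cons (car ids) acc))))))

;; Hypothetical: the subset of FRIENDS for which ONLINE? returns true.
(define (online-friends friends online?)
  (let loop ((fs friends) (acc '()))
    (cond ((null? fs) (reverse acc))
          ((online? (car fs)) (loop (cdr fs) (cons (car fs) acc)))
          (else (loop (cdr fs) acc)))))

;; Hypothetical polling loop (never returns).
;;   online?  e.g. user-online? from the article
;;   notify   called with each ID that has just come online,
;;            e.g. a procedure that runs zenity or mplayer
;;   seconds  pause between rounds, a string passed to the sleep command
(define (watch-friends friends online? notify seconds)
  (let loop ((before '()))
    (let ((now (online-friends friends online?)))
      (for-each notify (newly-online before now))
      (run (sleep ,seconds))
      (loop now))))
```

Passing user-online? as online? and a procedure that runs the zenity or mplayer form as notify gives the behaviour the article describes, with one notice per login instead of one per poll.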

The mplayer expression plays a different MP3 clip for each friend. Of course, this requires the MPlayer media player to be installed on your GNU/Linux system. The complete code for this simple task is in the download file listed at the end of the article, so it is not repeated here.

Next comes our second simple task: a Scheme interpreter for the web forum.

Simple task two

The COMPLANG board of Nanjing University's Lily (http://lilybbs.net) is devoted to the theory and practice of programming languages. For studying and practicing programming languages, discussion on this board is of course very important, and being able to attach code snippets and their execution results to posts would be a very useful feature for it. Our second simple task makes a small start in that direction: a Scheme interpreter that uses posts on the board as its input and output. The interpreter reads posts with a specific title on Lily's COMPLANG board, extracts the Scheme code snippet from each post, and hands it to a real Scheme interpreter running locally in the background. It then publishes the result of the run as a new post on the COMPLANG board.

Reading input posts

The first step is to read the titles of the posts on the COMPLANG board. First open a web browser, visit the page that lists the post titles of the COMPLANG board, and look at the details of the page's HTML. We soon notice that the HTML fragment for each title can be extracted with the following scsh form:

(run/strings (| (curl ,(string-append lilybbs "/bbsdoc?board=complang"))
                (grep "bbscon?board=")))
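The difference between run/strings and run/string is easiest to see on a small listing. This sketch feeds an invented three-line listing through echo and the same grep filter; the listing text and the name listing are made up, and the file name in it does not come from the real site.

```scheme
(define nl (string #\newline))

;; Invented three-line listing; only the middle line is a post link.
(define listing
  (string-append "<table>" nl
                 "<a href=bbscon?board=complang&file=M.1&num=7>" nl
                 "</table>"))

;; run/strings: one Scheme string per output line, newline stripped.
(define as-lines
  (run/strings (| (echo ,listing) (grep "bbscon?board="))))

;; run/string: the whole output as a single string, newline included.
(define as-one
  (run/string (| (echo ,listing) (grep "bbscon?board="))))
```

In grep's basic regular-expression syntax the ? in "bbscon?board=" is an ordinary character, so the filter matches the literal text.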

Note that the form that reads the board listing uses run/strings, in the plural, rather than run/string. The difference is that run/strings returns each line of the command's output as a separate Scheme string (in a list), while run/string returns the whole output, without splitting it into lines, as a single Scheme string to the rest of the program. We use the plural form here because the HTML for each post title sits on its own line.

Regular expressions

We now have the HTML line for each post title. The next task is to parse each line with regular expressions and extract the relevant content. In scsh, a form that begins with rx denotes a regular expression. Let's look at examples of the regular expressions we will use.

(rx (/ "09azAZ"))

This expression matches exactly one character: an Arabic digit, or a lowercase or uppercase English letter. The leading slash denotes a set of character ranges: each pair of characters in the string gives the two ends of one range (0 to 9, a to z, A to Z). Note that these special meanings apply only inside an rx form; elsewhere in an scsh script the same characters have no special meaning.

(rx (** 2 12 (/ "09azAZ")))

This regular expression matches between 2 and 12 characters, each an Arabic digit from 0 to 9 or an English letter of either case. A string of at least two and at most twelve digits and letters is exactly what Lily requires of a user ID.

(rx (| #\_ (/ "azAZ09")))

In Scheme, #\_ denotes the underscore character. This regular expression matches exactly one character that is a digit, an English letter, or an underscore; here the vertical bar means "or". Again, the vertical bar means "or" only inside an rx form; inside a run/string form it denotes a shell pipe. The two meanings have nothing to do with each other.

(rx (** 2 18 (| #\_ (/ "azAZ09"))))

This regular expression approximates the English name of a board: a string of 2 to 18 characters, each an underscore, an Arabic digit, or an English letter.
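One point the examples leave implicit: regexp-search looks for a match anywhere in the string, so (** 2 12 ...) on its own will happily find two letters inside a 30-character string. To test that a whole string is a valid ID or board name, the pattern can be anchored with scsh's bos and eos (beginning and end of string). The names valid-userid? and valid-board? are hypothetical.

```scheme
;; Whole-string check for a Lily-style user ID: 2 to 12 digits/letters.
(define (valid-userid? s)
  (regexp-search? (rx (: bos (** 2 12 (/ "09azAZ")) eos)) s))

;; Whole-string check for a board name: 2 to 18 underscores/letters/digits.
(define (valid-board? s)
  (regexp-search? (rx (: bos (** 2 18 (| #\_ (/ "azAZ09"))) eos)) s))
```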

(rx (~ #\<))

The tilde means negation (the complement of a character set). This regular expression matches exactly one character that is not a less-than sign.

(rx (+ (~ #\<)))

Inside an rx form, a plus sign means that the regular expression that follows matches one or more times, and a single asterisk means it matches zero or more times. Two asterisks followed by two integers, as we saw earlier, mean that the regular expression that follows matches at least as many times as the first integer and at most as many times as the second. So the regular expression above matches a string of one or more characters, none of which is a less-than sign. It is very convenient when analyzing HTML, because it matches everything up to the next tag.
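A typical use of (+ (~ #\<)) is to grab the text between a > and the next tag. The sketch below wraps that in a hypothetical helper, first-cell-text; the sample HTML strings in the usage are made up.

```scheme
;; Hypothetical: text after the first ">" that is followed by at least one
;; non-"<" character, up to the next "<"; #f if there is no such text.
(define (first-cell-text html)
  (let ((m (regexp-search (rx (: ">" (submatch (+ (~ #\<))) "<")) html)))
    (and m (match:substring m 1))))
```

On "<tr><td>hi</td></tr>" the first > is immediately followed by <, so the search moves on and returns "hi", the text inside the cell.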

(rx (: "bbscon?board="
       ,board
       "&file="
       (+ (~ #\&))
       "&num="
       ,num))

This regular expression is a little longer. It consists of six parts. The leading colon means that the regular expression is these six parts joined in order, each of which must match exactly once. The first part, the string "bbscon?board=", matches itself. The comma at the start of the second part tells scsh to evaluate that part as a Scheme variable or small Scheme expression at run time; the result must be a regular expression, such as one built with rx. The other parts contain nothing new. This example shows that scsh's Scheme-style regular expressions can have a much clearer logical structure than traditional string-based POSIX regular expressions, and the advantage grows with the size of the pattern, as the complete set of definitions used by the program shows.
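Because ,board and ,num refer to ordinary Scheme values, the pieces can be defined once and combined. The following sketch assembles the six-part pattern from two smaller ones and reads the board name and article number back out of a link. The sample link is invented, and wrapping the two pieces in submatch is an addition for the demonstration.

```scheme
(define board (rx (** 2 18 (| #\_ (/ "azAZ09")))))
(define num   (rx (+ (/ "09"))))

;; Six parts in sequence, as in the text, with the board and number
;; parts wrapped in submatch so they can be read back.
(define url-re
  (rx (: "bbscon?board=" (submatch ,board)
         "&file=" (+ (~ #\&))
         "&num=" (submatch ,num))))

(define m
  (regexp-search url-re "<a href=bbscon?board=complang&file=M.1&num=42>"))
```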

(define regexp-userid (rx (** 2 12 (/ "09azAZ"))))

(define regexp-board (rx (** 2 18 (| #\_ (/ "azAZ09")))))

(define regexp-time (rx (+ (~ #\<))))

(define regexp-size (rx (+ (~ #\<))))

(define regexp-num (rx (+ (/ "09"))))

(define regexp-url (rx (: "bbscon?board=" ,regexp-board
                          "&file=" (+ (~ #\&))
                          "&num=" ,regexp-num)))

(define regexp-sub (rx (+ (~ #\<))))

;; Fields of one post line, in order; the string literals are the text
;; that separates them.
(define re (rx (: " " ,regexp-num
                  "" (+ whitespace)
                  " " ,regexp-userid
                  " " ,regexp-time
                  " " ,regexp-url
                  "" ,regexp-sub " (" ,regexp-size ")"
                  "" ">" ,regexp-num "")))

Written as a traditional POSIX regular-expression string, this last pattern would be more than anyone could stand.

Matching

Once we have a regular expression, we can use it to match a given string. This is done mainly with regexp-search:

(regexp-search regexp string)

If there is no match, it returns #f, meaning false. If there is a match, it returns a match object holding the details of the matched substrings, which can be extracted with functions such as match:substring:

(match:substring match-data index)
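A tiny example shows regexp-search and match:substring working together. The pattern and input below are invented; they only show that index 0 is the whole match and that indexes 1, 2, ... follow the submatch forms from left to right. match:start and match:end, also part of scsh, give the positions of a submatch.

```scheme
;; Submatch 1: a run of digits; submatch 2: a run of letters/digits.
(define line-re
  (rx (: (submatch (+ digit)) (+ whitespace) (submatch (+ alphanumeric)))))

(define md (regexp-search line-re "post 42   iloveqhq"))

(define whole  (match:substring md 0))  ; the whole match
(define number (match:substring md 1))  ; first submatch
(define author (match:substring md 2))  ; second submatch
```

The search starts at the first digit, so the whole match is "42   iloveqhq" and the first submatch spans positions 5 to 7.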

Index zero denotes the substring matched by the whole regular expression; the other indexes denote the submatches marked in the regular expression. Let's illustrate with the re pattern defined earlier, this time marking its submatches:

(define re (rx (: "" (submatch ,num)
                  "" (+ whitespace)
                  " " (submatch ,userid)
                  " " (submatch ,time)
                  " " (submatch ,url)
                  "" (submatch ,sub) " (" ,size ")"
                  "" ">" ,num "")))

In match:substring and related functions, index zero stands for what the whole regular expression matched, and indexes 1, 2, and so on stand, from left to right, for what each submatch form matched, in the order the forms appear.

(match:substring match-data index)

This call returns the substring matched by the submatch selected by index; match-data, as described above, is the match object returned by regexp-search. Below is the complete scsh function that takes the HTML of the board listing and, using regular expressions, finds each post's number, author, posting time, detail URL, and subject.

(define (html->posts htm)
  (let* ((userid (rx (** 2 12 (/ "09azAZ"))))
         (board (rx (** 2 18 (| #\_ (/ "azAZ09")))))
         (time (rx (+ (~ #\<))))
         (size (rx (+ (~ #\<))))
         (num (rx (+ (/ "09"))))
         (url (rx (: "bbscon?board=" ,board "&file=" (+ (~ #\&)) "&num=" ,num)))
         (sub (rx (+ (~ #\<))))
         (re (rx (: "" (submatch ,num)
                    "" (+ whitespace)
                    " " (submatch ,userid)
                    " " (submatch ,time)
                    " " (submatch ,url)
                    "" (submatch ,sub) " (" ,size ")"
                    "" ">" ,num ""))))
    ;; One entry per listing line: #f if the line does not match,
    ;; otherwise a procedure that answers field-name messages.
    (map (lambda (str)
           (let* ((mat (regexp-search re str))
                  (sub (lambda (idx) (match:substring mat idx))))
             (if (not mat)
                 #f
                 (lambda (sym)
                   (case sym
                     ((num) (sub 1))
                     ((userid) (sub 2))
                     ((time) (sub 3))
                     ((url) (sub 4))
                     ((subject) (sub 5)))))))
         (run/strings (| (echo ,htm)
                         (grep "bbscon?board="))))))

Object orientation

For each line it finds, this function returns the following lambda function (or #f if the line does not match):

(lambda (sym)
  (case sym
    ((num) (sub 1))
    ((userid) (sub 2))
    ((time) (sub 3))
    ((url) (sub 4))
    ((subject) (sub 5))))
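The lambda returned for each post behaves like a small object that answers messages. The same idea in isolation, with a hypothetical constructor make-post that takes the fields directly instead of reading them from a regexp match, looks like this:

```scheme
;; Hypothetical constructor: returns a procedure that answers a message.
(define (make-post num userid time url subject)
  (lambda (msg)
    (case msg
      ((num) num)
      ((userid) userid)
      ((time) time)
      ((url) url)
      ((subject) subject)
      (else (error "unknown message" msg)))))

;; Example "object" with made-up field values.
(define p
  (make-post "42" "iloveqhq" "Jan 5"
             "bbscon?board=complang&file=M.1&num=42"
             "iloveqhq: (+ 1 2)"))
```

The else clause is an addition: the version in the article simply returns an unspecified value for an unknown message, while this sketch reports it as an error.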

The lambda returned by html->posts takes one argument. Passing that argument is like sending the lambda a short message, and depending on the message it returns a different result. This resembles object-oriented programming: the trick is a simple way to simulate objects in a functional programming language. Of course, doing real object-oriented programming in a functional language takes more work.

Using posts as input and output

In this part we settle for a simple design. To reduce the load on the whole system, including both Lily's server and our local programs, we only look for the newest post on the forum whose title contains "○ iloveqhq:" and treat it as the input post. Our reply posts use "○ iloveqhq re:" in their titles. The related code is listed below. This design is of course not very good, but a better one can only come from having enough users try it out and provide enough feedback. So this is how it stands for now. ^_^

(define (get-ask-post)
  (let* ((url (string-append lilybbs "/bbsdoc?board=complang"))
         (htm (run/string (curl ,url)))
         (asksub (rx "○ iloveqhq:"))
         (anssub (rx "○ iloveqhq re:")))
    (let lp ((lis (html->posts htm))
             (asknum 0)
             (askpost #f)
             (ansnum 0)
             (anspost #f))
      (cond ((null? lis)
             ;; Answer only if the newest question is newer than the newest reply.
             (if (> asknum ansnum) askpost #f))
            ((not (car lis))
             ;; A listing line that did not parse: skip it.
             (lp (cdr lis) asknum askpost ansnum anspost))
            (else
             (let* ((post (car lis))
                    (sub (post 'subject))
                    (num (string->number (post 'num))))
               (cond ((and (> num asknum) (regexp-search? asksub sub))
                      (lp (cdr lis) num post ansnum anspost))
                     ((and (> num ansnum) (regexp-search? anssub sub))
                      (lp (cdr lis) asknum askpost num post))
                     (else
                      (lp (cdr lis) asknum askpost ansnum anspost)))))))))
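The loop keeps two running maxima, the newest question post and the newest reply, and a question is answered only if it is newer than every reply. The rule can be checked without the network using a hypothetical pick-question procedure over (number . subject) pairs; the ASCII markers "Q:" and "A:" below stand in for the real titles.

```scheme
;; Hypothetical, network-free version of the selection rule in get-ask-post.
;; POSTS is a list of (number . subject) pairs; ASK-RE and ANS-RE are rx
;; patterns for question and reply titles.  Returns the newest question
;; pair if it is newer than the newest reply, otherwise #f.
(define (pick-question posts ask-re ans-re)
  (let lp ((lis posts) (asknum 0) (askpost #f) (ansnum 0))
    (if (null? lis)
        (if (> asknum ansnum) askpost #f)
        (let* ((post (car lis))
               (num (car post))
               (sub (cdr post)))
          (cond ((and (> num asknum) (regexp-search? ask-re sub))
                 (lp (cdr lis) num post ansnum))
                ((and (> num ansnum) (regexp-search? ans-re sub))
                 (lp (cdr lis) asknum askpost num))
                (else (lp (cdr lis) asknum askpost ansnum)))))))
```

With posts 1 "Q: one", 2 "A: one", and 3 "Q: two", the newest question (3) is newer than the newest reply (2), so it is picked; without post 3 the procedure returns #f because question 1 has already been answered.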

A post contains ordinary text as well as Scheme code. Here, too, we settle for a simple design: only one piece of Scheme code may appear in a post. The line before the code must be exactly "iloveqhq: elk", with nothing else on it, and the line after the code must be exactly "iloveqhq: kle". This design is of course not very good either, and later versions should improve on it. The main function that extracts the Scheme code from a post follows.

(define (string->elk-string str)
  (let* ((elk (rx (: #\newline "iloveqhq: elk" #\newline)))
         (kle (rx (: #\newline "iloveqhq: kle" #\newline)))
         (re (rx (: ,elk (submatch (* any)) ,kle))))
    ;; Collect the code found between elk/kle marker lines, concatenated.
    (let lp ((str str)
             (res ""))
      (let ((mat (regexp-search re str)))
        (if (not mat)
            res
            (lp (substring str (match:end mat 1) (string-length str))
                (string-append res (match:substring mat 1))))))))
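The extraction can be exercised on a sample post. The sketch below is a hypothetical single-block generalization, extract-code, that takes the two marker patterns as arguments; the marker text comes from the article, while the post body is invented.

```scheme
(define nl (string #\newline))

;; Hypothetical: START-RE and END-RE are rx patterns for the marker lines,
;; each of which must be surrounded by newlines.  Returns the text between
;; the markers, or #f if they are not found.
(define (extract-code post start-re end-re)
  (let ((m (regexp-search (rx (: #\newline ,start-re #\newline
                                 (submatch (* any))
                                 #\newline ,end-re #\newline))
                          post)))
    (and m (match:substring m 1))))

(define sample-post
  (string-append "please run this:" nl
                 "iloveqhq: elk" nl
                 "(display (+ 1 2))" nl
                 "iloveqhq: kle" nl
                 "thanks"))
```

Because (* any) is greedy, a post with several marker pairs would yield everything from the first start marker to the last end marker, which is one reason the article allows only one code block per post.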

Building a sandbox with ELK Scheme

Once we have extracted the Scheme code from a post, we can feed it to a Scheme interpreter, let the interpreter run it, and get back the resulting output. We can then use that output to compose a reply and post it on Lily's board. This raises a security issue: in theory, any Lily user can embed any Scheme code in a post, and our script fetches that code from the net with curl without knowing what it does. We deal with the issue in a simple way, by building a sandbox environment for the Scheme language. We use ELK Scheme to set up this environment.

(define (elk-disable)
  (let ((nuke (lambda (sym)
                ;; Text that rebinds SYM to #f when ELK evaluates it.
                (string-append "(define " (symbol->string sym) " #f) ")))
        (syms '(require
                call-with-input-file call-with-output-file
                with-input-from-file with-output-to-file
                open-input-file open-output-file open-input-output-file
                tilde-expand file-exists?
                load load-path load-noisily? load-libraries
                autoload autoload-notify? dump)))
    (apply string-append (map nuke syms))))

(define (elk-run-string str)
  ;; Pipe the disabling prelude, followed by the user's code, into elk -l -.
  (run/string (| (echo ,(elk-disable) ,str)
                 (elk -l -))))
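The prelude sent ahead of the user's code is ordinary text, so it can be built and inspected without running ELK at all. The sketch below uses hypothetical names (disable-prelude, sandboxed-program) and assumes, like the elk-disable shown above, that each dangerous name is neutralized by redefining it as #f. As an alternative to passing the program through echo's argument list, scsh's << redirection sends a string to a child process on standard input; cat stands in for elk here so the example runs without ELK installed.

```scheme
;; Hypothetical: one "(define NAME #f) " form per symbol, concatenated.
(define (disable-prelude syms)
  (apply string-append
         (map (lambda (sym)
                (string-append "(define " (symbol->string sym) " #f) "))
              syms)))

;; Hypothetical: the prelude followed by the untrusted code.
(define (sandboxed-program syms code)
  (string-append (disable-prelude syms) code))

;; Feed the program to a child process on standard input with <<.
;; cat simply copies its input back; with ELK installed the process
;; form would be (elk -l -) instead.
(define echoed
  (run/string (cat) (<< ,(sandboxed-program '(load dump) "(display 42)"))))
```

Sending the program on standard input avoids command-line length limits when a post contains a long piece of code.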

What elk-disable does is shut off most of ELK Scheme's input and output facilities, so that unsafe code downloaded from the net cannot do too much damage to the local system. Besides input and output, we also disable ELK Scheme's facilities for loading modules, for equally obvious reasons. This safety barrier is very simple and only prevents some of the worst damage; finer points have not been carefully considered. Since this is only an illustrative example, there is no need to dwell on it here; it is a side issue.

Logging in and out

With the code run in the Scheme sandbox, we can think about posting the program's output on Lily's COMPLANG board. Posting differs from our earlier tasks, though: we need to log in to Lily. All the earlier tasks could be done as an anonymous user, but posting cannot, since most boards on Lily do not allow anonymous posting. So before posting we must log in to the Lily system. Lily handles login and logout with cookies, and we need curl to deal with the cookie-related issues. The first step is to send the user ID and password to Lily's web server through a web form, which can be done with the following command:

(run/string (curl -d ,(string-append "id=" id)
                  -d ,(string-append "pw=" pw)
                  ,(string-append lilybbs "/bbslogin?type=2"))
            (- 2))
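When there are more form fields, the repeated -d arguments can be generated from a list. This is a sketch, not the author's code: form-args and post-form are hypothetical helpers, and post-form relies on splicing a list into an scsh process form with ,@. Like the command above, it does no URL-encoding of the values.

```scheme
;; Hypothetical: turn (("id" . "me") ("pw" . "secret")) into
;; ("-d" "id=me" "-d" "pw=secret") for use as curl arguments.
(define (form-args fields)
  (let loop ((fs fields) (acc '()))
    (if (null? fs)
        (reverse acc)
        (loop (cdr fs)
              (cons (string-append (car (car fs)) "=" (cdr (car fs)))
                    (cons "-d" acc))))))

;; Hypothetical: POST the fields to SITE + PATH, discarding curl's stderr.
(define (post-form site path fields)
  (run/string (curl ,@(form-args fields) ,(string-append site path))
              (- 2)))
```

For the login above this would be (post-form lilybbs "/bbslogin?type=2" (list (cons "id" id) (cons "pw" pw))).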

The curl option -d, followed by a string of the form key=value, sends web form data to the given address. After the form reaches the web server, the server returns a page that includes the cookie information. Lily's cookie handling is rather odd: the cookies are not sent in the HTTP headers but set through JavaScript. This means we cannot rely on curl's standard cookie handling. Instead, we first analyze the JavaScript with scsh, extracting the cookie information with the regular-expression techniques described earlier. The related code is listed below.

(define (get-login-cookie id pw)
  (let* ((url (string-append lilybbs "/bbslogin?type=2"))
         (html (run/string (curl -d ,(string-append "id=" id)
                                 -d ,(string-append "pw=" pw)
                                 ,url)
                           (- 2)))
         (cookie-line (run/strings (| (echo ,html)
                                      (grep "