Yesterday I came across a rather good artificial-intelligence website with some excellent Prolog articles that interested me a lot. The articles contain many sample programs, but right-clicking is disabled on the pages: you can't select text, you can't save the page, and you can't view the source!
That's really frustrating. Information is meant to be shared!
So I decided to carry on the hacker spirit and break the restriction. Fortunately, Python makes this easy.
Restrictions like these are nothing more than scripts embedded in the HTML. Since the browser can display the page, we can certainly get at its text.
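Because the protection is only markup and script, the article text is already inside the HTML the server sends. As an alternative to editing the page, here is a minimal sketch using only the standard library's html.parser that pulls the text out of saved HTML while skipping <script> and <style> contents. The names TextExtractor and page_text are hypothetical, chosen for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect a page's text, ignoring <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lower case.
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text verbatim (including newlines) so code samples keep their layout.
        if not self._skip_depth:
            self.parts.append(data)


def page_text(html):
    """Return the text of an HTML string, without script/style bodies."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

Because whitespace is kept as-is, indentation inside <pre> blocks (where sample programs usually live) survives the extraction.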
Step one, in the Python shell:
    >>> import urllib.request
    >>> urllib.request.urlretrieve("http://www.chinaai.org/Article_show.asp?articleid=315", "c:/tmp.html")
urlretrieve saves a web page to a local file.
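urlretrieve returns a (filename, headers) pair. The sketch below wraps it in a hypothetical helper, save_page, that also returns the saved bytes. To keep the demo runnable offline, it uses a file:// URL as a stand-in for the article URL; urlretrieve accepts those too.

```python
import pathlib
import tempfile
import urllib.request


def save_page(url, path):
    """Download url to path and return the saved bytes (hypothetical helper)."""
    # urlretrieve returns (local_filename, headers); only the filename is needed here.
    local_name, _headers = urllib.request.urlretrieve(url, path)
    return pathlib.Path(local_name).read_bytes()


if __name__ == "__main__":
    # Offline demo: a local file:// URL replaces the real article URL.
    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp, "page.html")
        src.write_bytes(b"<html><body>hello</body></html>")
        print(save_page(src.as_uri(), str(pathlib.Path(tmp, "tmp.html"))))
```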
Step two: open tmp.html and analyze it. The <body> tag turns out to be the nasty part:

    <body leftmargin=0 topmargin=0 onmousemove='HideMenu()' oncontextmenu="return false" ondragstart="return false" onselectstart="return false" onselect="document.selection.empty()" oncopy="document.selection.empty()" onbeforecopy="return false" onmouseup="document.selection.empty()">

Each of these handlers either returns false, which cancels the right-click, drag, select, or copy action, or clears the current selection. Replace the tag with this much cleaner one:

    <body leftmargin=0 topmargin=0 onmousemove='HideMenu()'>
Open the edited file in the browser, and the restrictions are gone.
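Instead of retyping the whole tag, one could delete just the blocking handlers. Here is a minimal regex-based sketch, assuming the attributes are written like the ones above (name, optional spaces, "=", then a quoted or bare value). The names BLOCKING_HANDLERS and strip_blockers are hypothetical. This is a heuristic rather than an HTML parser. Applied to a whole page, it also removes the same handlers from any other tag.

```python
import re

# Handlers from the <body> tag above that block right-click, selection, and copying.
BLOCKING_HANDLERS = ("oncontextmenu", "ondragstart", "onselectstart",
                     "onselect", "oncopy", "onbeforecopy", "onmouseup")

# Leading whitespace + handler name + "=" + "value" | 'value' | bare value.
_BLOCKER_ATTR = re.compile(
    r"""\s+(?:%s)\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""" % "|".join(BLOCKING_HANDLERS),
    re.IGNORECASE,
)


def strip_blockers(html):
    """Remove the copy-blocking handler attributes wherever they appear."""
    return _BLOCKER_ATTR.sub("", html)
```

Harmless handlers such as onmousemove='HideMenu()' are left alone, so the page's own menu script keeps working.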
Step three: write a Python program that downloads the page and "purifies" it automatically:
    import urllib.request

    # Map each article URL to the local file name it should be saved as.
    urls = {'http://www.chinaai.org/Article_show.asp?articleid=315': 'prolog2.html'}

    # Clean replacement for the page's <body> tag, without the blocking handlers.
    new_tag = b"<body leftmargin=0 topmargin=0 onmousemove='HideMenu()'>"

    for url, filename in urls.items():
        urllib.request.urlretrieve(url, filename)
        # Work on raw bytes so the page's character encoding does not matter.
        with open(filename, 'rb') as f:
            content = f.read()
        # Locate the original <body ...> tag.
        l_pos = content.find(b'<body')
        r_pos = content.find(b'>', l_pos)
        if l_pos != -1 and r_pos != -1:
            # Splice the clean tag in place of the original one.
            content = content[:l_pos] + new_tag + content[r_pos + 1:]
        # Overwrite the saved copy with the cleaned page.
        with open(filename, 'wb') as f:
            f.write(content)

urls is a dictionary that maps each URL to its local file name; add entries to suit your own needs. Note that this program is written specifically for this website. Other sites may need a different approach, but if you follow the steps above, I believe everyone can manage it. Our slogan: "I also fuse!"