The object returned is file-like: you can call functions such as read(), readlines(), fileno(), and close() on it.
In [337]: file.info()
Out[337]:
In [338]: file.getcode()
Out[338]: 200
In [339]: file.geturl()
Out[339]: 'http://www.baidu.com/'
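Here info() returns the response headers as a message object (its repr is truncated in Out[337] above), getcode() returns the HTTP status code, and geturl() returns the final URL, which is useful for detecting redirects. A minimal sketch exercising the file-like methods listed above, reusing the same example URL:
import urllib

f = urllib.urlopen('http://www.baidu.com/')   # file-like object
print f.getcode()        # HTTP status code, e.g. 200
first = f.readline()     # read a single line of the body
rest = f.read()          # read the remaining body
print f.fileno()         # file descriptor of the underlying socket
f.close()                # release the connection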
2. urlretrieve(): save the HTML page at a URL to a local file
In [404]: filename=urllib.urlretrieve('http://www.baidu.com/',filename='/tmp/baidu.html')
In [405]: type (filename)
Out[405]:
In [406]: filename[0]
Out[406]: '/tmp/baidu.html'
In [407]: filename
Out[407]: ('/tmp/baidu.html', )
In [408]: filename[1]
Out[408]:
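urlretrieve() returns a (filename, headers) tuple; the second element, truncated in Out[407] above, is the message object holding the response headers. It also accepts an optional reporthook callback, invoked once per retrieved block, which is handy for progress reporting. A minimal sketch with the same example URL and path:
import urllib

def progress(block_num, block_size, total_size):
    # hypothetical hook: total_size is -1 when the server
    # sends no Content-Length header
    print 'retrieved about %d of %d bytes' % (block_num * block_size, total_size)

urllib.urlretrieve('http://www.baidu.com/', '/tmp/baidu.html', progress)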
3. urlcleanup(): clear the cache left behind by urlretrieve()
In [454]: filename=urllib.urlretrieve('http://www.baidu.com/',filename='/tmp/baidu.html')
In [455]: urllib.urlcleanup()
4. urllib.quote() and urllib.quote_plus(): percent-encode a URL
In [483]: urllib.quote('http://www.baidu.com')
Out[483]: 'http%3A//www.baidu.com'
In [484]: urllib.quote_plus('http://www.baidu.com')
Out[484]: 'http%3A%2F%2Fwww.baidu.com'
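The difference between the two lies in the safe parameter and in space handling: quote() leaves '/' unescaped by default (its safe parameter defaults to '/') and encodes a space as %20, while quote_plus() escapes '/' as well and encodes a space as '+'. A small sketch:
import urllib

# pass safe='' to make quote() escape '/' too, mimicking quote_plus()
print urllib.quote('http://www.baidu.com', safe='')   # http%3A%2F%2Fwww.baidu.com

# the other difference is how spaces are encoded
print urllib.quote('hello world')        # hello%20world
print urllib.quote_plus('hello world')   # hello+world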
5. urllib.unquote() and urllib.unquote_plus(): decode an encoded URL
In [514]: urllib.unquote('http%3A//www.baidu.com')
Out[514]: 'http://www.baidu.com'
In [515]: urllib.unquote_plus('http%3A%2F%2Fwww.baidu.com')
Out[515]: 'http://www.baidu.com'
6. urllib.urlencode(): encode a dict of key/value pairs into a query string joined by '&'; combined with urlopen(), this implements both GET and POST requests (see the sketch after the session below)
In [560]: import urllib
In [561]: params=urllib.urlencode({'spam':1,'eggs':2,'bacon':0})
In [562]: f=urllib.urlopen("http://python.org/query?%s" % params)
In [563]: f.readline()
Out[563]: '\n'
In [564]: f.readlines()
Out[564]:
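Whether the request is a GET or a POST depends on where the encoded string goes: appended to the URL after '?' it is a GET, while passed as urlopen()'s data argument it becomes a POST. A minimal sketch reusing the example URL above (which is not a real query endpoint):
import urllib

params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})

# GET: append the encoded pairs to the URL after '?'
f = urllib.urlopen('http://python.org/query?%s' % params)
print f.read()

# POST: pass the encoded pairs as urlopen()'s second (data) argument
f = urllib.urlopen('http://python.org/query', params)
print f.read()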
II. The urllib2 module
urllib2 offers more functionality than urllib, such as basic authentication, redirect handling, and cookie support.
https://docs.python.org/2/library/urllib2.html
https://docs.python.org/2/howto/urllib2.html
In [566]: import urllib2
In [567]: f=urllib2.urlopen('http://www.python.org/')
In [568]: print f.read(100)
--------> print(f.read(100))
This opens the Python homepage and prints the first 100 bytes of the page.
HTTP is based on requests and responses: the client sends a request and the server sends back a response. urllib2 mirrors this with a Request object that represents the request being sent; passing the Request to urlopen() returns a response object. The response object is file-like, so it can be read just like a file.
In [630]: import urllib2
In [631]: req=urllib2.Request('http://www.baidu.com')
In [632]: response=urllib2.urlopen(req)
In [633]: the_page=response.read()
In [634]: the_page
Out[634]:
By default, urllib2 identifies itself to servers as Python-urllib/2.6 (the version varies with your Python); you can present a different identity by adding a User-Agent HTTP header, as the following session does.
In [107]: import urllib
In [108]: import urllib2
In [109]: url='http://xxx/login.php'
In [110]: user_agent='Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
In [111]: values={'ver' : 'xxx', 'email' : 'xxx', 'password' : 'xxx', 'mac' : 'xxxx'}
In [112]: headers={'User-Agent' : user_agent}
In [114]: data=urllib.urlencode(values)
In [115]: req=urllib2.Request(url,data,headers)
In [116]: response=urllib2.urlopen(req)
In [117]: the_page=response.read()
In [118]: the_page
If the given URL cannot be reached at all, urlopen() raises a URLError; if the server is reached but cannot serve the requested content, urlopen() raises an HTTPError.
#!/usr/bin/python
from urllib2 import Request, urlopen, URLError, HTTPError

req = Request('http://10.10.41.42/index.html')
try:
    response = urlopen(req)
except HTTPError as e:
    print 'The server couldn\'t fulfill the request.'
    print 'Error code:', e.code
except URLError as e:
    print 'We failed to reach a server.'
    print 'Reason:', e.reason
else:
    print "Everything is fine"
Note that when writing the handlers this way, the except HTTPError clause must come before except URLError: HTTPError is a subclass of URLError, so a URLError clause listed first would catch HTTPError as well.
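A one-line check confirms the subclass relationship:
from urllib2 import URLError, HTTPError

# HTTPError derives from URLError, so handler ordering matters
print issubclass(HTTPError, URLError)   # True
An alternative, shown next, catches only URLError and inspects the exception's attributes to tell the two cases apart.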
#!/usr/bin/python
from urllib2 import Request, urlopen, URLError, HTTPError

req = Request('http://10.10.41.42')
try:
    response = urlopen(req)
except URLError as e:
    if hasattr(e, 'reason'):
        print 'We failed to reach a server.'
        print 'Reason:', e.reason
    elif hasattr(e, 'code'):
        print 'The server couldn\'t fulfill the request.'
        print 'Error code:', e.code
else:
    print "Everything is fine"
The hasattr() function checks whether an object has a given attribute; here it distinguishes the two failure modes, since only HTTPError carries a code attribute.
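A small illustration, constructing the two exceptions by hand purely for demonstration:
from urllib2 import URLError, HTTPError

# hypothetical instances, built manually just to inspect their attributes
http_err = HTTPError('http://example.com', 404, 'Not Found', None, None)
url_err = URLError('connection refused')

print hasattr(http_err, 'code')   # True: HTTPError carries the status code
print hasattr(url_err, 'code')    # False: a plain URLError does not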