Python Study Notes: the urllib and urllib2 Modules
2014-11-23 22:06:54
Tags: Python urllib urllib2

I. The urllib module


The urllib module lets you open arbitrary URLs.
1. urlopen() opens a URL and returns a file-like object, on which you can perform the usual file operations.


In [308]: import urllib

In [309]: file=urllib.urlopen('

In [310]: file.readline()
Out[310]: '\xe7\x99\xbe\xe5\xba\xa6\xe4\xb8

The returned object also supports read(), readlines(), fileno() and close(). In addition, info() returns the response headers, getcode() returns the HTTP status code, and geturl() returns the URL that was actually retrieved, which differs from the requested one if the server redirected:

In [337]: file.info()
Out[337]: <httplib.HTTPMessage instance at 0x2394a70>

In [338]: file.getcode()
Out[338]: 200

In [339]: file.geturl()
Out[339]: 'http://www.baidu.com/'

2. urlretrieve() saves the page at a URL to a local file and returns a tuple (filename, headers):

In [404]: filename=urllib.urlretrieve('http://www.baidu.com/',filename='/tmp/baidu.html')

In [405]: type (filename)
Out[405]: <type 'tuple'>

In [406]: filename[0]
Out[406]: '/tmp/baidu.html'

In [407]: filename
Out[407]: ('/tmp/baidu.html', <httplib.HTTPMessage instance at 0x23ba878>)

In [408]: filename[1]
Out[408]: <httplib.HTTPMessage instance at 0x23ba878>

3. urlcleanup() clears the cache that earlier calls to urlretrieve() may have built up:

In [454]: filename=urllib.urlretrieve('http://www.baidu.com/',filename='/tmp/baidu.html')
In [455]: urllib.urlcleanup()

4. urllib.quote() and urllib.quote_plus() percent-encode a string for use in a URL. quote() leaves '/' unescaped by default, while quote_plus() escapes '/' too and turns spaces into '+', as HTML form values require:

In [483]: urllib.quote('http://www.baidu.com')
Out[483]: 'http%3A//www.baidu.com'

In [484]: urllib.quote_plus('http://www.baidu.com')
Out[484]: 'http%3A%2F%2Fwww.baidu.com'

5. urllib.unquote() and urllib.unquote_plus() decode an encoded URL; unquote_plus() also turns '+' back into spaces:

In [514]: urllib.unquote('http%3A//www.baidu.com')
Out[514]: 'http://www.baidu.com'

In [515]: urllib.unquote_plus('http%3A%2F%2Fwww.baidu.com')
Out[515]: 'http://www.baidu.com'

6. urllib.urlencode() turns a dictionary of key/value pairs into one key=value string joined with '&'. Together with urlopen() it covers both request methods: append the string to the URL after '?' for GET, or pass it as urlopen()'s data argument for POST. The example below uses GET:

In [560]: import urllib
In [561]: params=urllib.urlencode({'spam':1,'eggs':2,'bacon':0})
In [562]: f=urllib.urlopen("http://python.org/query?%s" %params)
In [563]: f.readline()
Out[563]: '<!doctype html>\n'

In [564]: f.readlines()
Out[564]:
['<!--[if lt IE 7]> <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9"> <![endif]-->\n',
 '<!--[if IE 7]> <html class="no-js ie7 lt-ie8 lt-ie9"> <![endif]-->\n',
 '<!--[if IE 8]> <html class="no-js ie8 lt-ie9"> <![endif]-->\n',
 '<!--[if gt IE 8]><!--><html class="no-js" lang="en" dir="ltr"> <!--<![endif]-->\n',
 '\n',

II. The urllib2 module

urllib2 adds features that urllib lacks, such as basic authentication, redirect handling and cookie support, and it accepts a Request object, which lets you set your own request headers. It does not provide urlencode(), however, which is why the examples below import urllib as well.
https://docs.python.org/2/library/urllib2.html
https://docs.python.org/2/howto/urllib2.html

In [566]: import urllib2

In [567]: f=urllib2.urlopen('http://www.python.org/')

In [568]: print f.read(100)
--------> print(f.read(100))
<!doctype html>
<!--[if lt IE 7]> <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9"> <![endif]-->

This opens the Python home page and prints the first 100 bytes of its content.

HTTP is built on requests and responses: the client sends a request and the server answers it. urllib2 represents the outgoing request as a Request object; passing that object to urlopen() returns a response object. The response is file-like, so you can read it just as you would a file:

In [630]: import urllib2
In [631]: req=urllib2.Request('http://www.baidu.com')
In [632]: response=urllib2.urlopen(req)
In [633]: the_page=response.read()
In [634]: the_page
Out[634]: '<!DOCTYPE html><!--STATUS OK--><html><head><meta http-equiv="content-type" content="text/html;charset=utf-8"><meta http-equiv="X-UA-Compatible" content="IE=Edge"><link rel="dns-prefetch" href="//s1.bdstatic.com"/><link rel="dns-prefetch" href="//t1.baidu.com"/><link rel="dns-prefetch" href="//t2.baidu.com"/><link rel="dns-prefetch" href="//t3.baidu.

Often you need to send data to a URL, for example to a login form, with a POST request. Encode the fields with urllib.urlencode() and pass the result to urllib2.Request() as its data argument:

In [763]: import urllib
In [764]: import urllib2
In [765]: url='http://xxxxxx/login.php'
In [766]: values={'ver' : '1.7.1', 'email' : 'xxxxx', 'password' : 'xxxx', 'mac' : '111111111111'}
In [767]: data=urllib.urlencode(values)
In [768]: req=urllib2.Request(url,data)
In [769]: response=urllib2.urlopen(req)
In [770]: the_page=response.read()
In [771]: the_page

If you do not pass the data argument, urllib2 sends a GET request. One way in which GET and POST differ is that POST requests often have side effects: they change the state of the system in some way. Data can also be sent with a GET request by encoding it into the URL itself, after a '?':

In [55]: import urllib2
In [56]: import urllib
In [57]: url='http://xxx/login.php'
In [58]: values={'ver' : 'xxx', 'email' : 'xxx', 'password' : 'xxx', 'mac' : 'xxx'}
In [59]: data=urllib.urlencode(values)
In [60]: full_url=url + '?' + data
In [61]: the_page=urllib2.urlopen(full_url)
In [63]: the_page.read()
Out[63]: '{"result":0,"data":0}'

By default urllib2 identifies itself with the User-Agent string Python-urllib/x.y, where x.y is the Python version (Python-urllib/2.6 here). Some sites dislike being browsed by scripts, so you can present yourself as a browser by adding a User-Agent HTTP header:

In [107]: import urllib
In [108]: import urllib2
In [109]: url='http://xxx/login.php'
In [110]: user_agent='Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
In [111]: values={'ver' : 'xxx', 'email' : 'xxx', 'password' : 'xxx', 'mac' : 'xxxx'}
In [112]: headers={'User-Agent' : user_agent}
In [114]: data=urllib.urlencode(values)
In [115]: req=urllib2.Request(url,data,headers)
In [116]: response=urllib2.urlopen(req)
In [117]: the_page=response.read()
In [118]: the_page

When urlopen() cannot reach the server at all (for example, there is no network route or the host does not exist), it raises URLError. When the server is reached but answers with an error status such as 404, urlopen() raises HTTPError:

#!/usr/bin/python

from urllib2 import Request,urlopen,URLError,HTTPError
req=Request('http://10.10.41.42/index.html')
try:
    response=urlopen(req)
except HTTPError as e:
    print 'The server couldn\'t fulfill the request.'
    print 'Error code:',e.code
except URLError as e:
    print 'We failed to reach a server.'
    print 'Reason:',e.reason
else:
    print "Everything is fine"

Note that the HTTPError clause must come before the URLError clause: HTTPError is a subclass of URLError, so if URLError were listed first it would also catch every HTTPError.

A second approach catches only URLError and tells the two cases apart by their attributes. Check code first, because from Python 2.7 on HTTPError also has a reason attribute:

#!/usr/bin/python

from urllib2 import Request,urlopen,URLError
req=Request('http://10.10.41.42')
try:
    response=urlopen(req)
except URLError as e:
    if hasattr(e,'code'):
        print 'The server couldn\'t fulfill the request.'
        print 'Error code:',e.code
    elif hasattr(e,'reason'):
        print 'We failed to reach a server.'
        print 'Reason:',e.reason
else:
    print "Everything is fine"

hasattr() checks whether an object has a given attribute.
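Python 3 merged urllib and urllib2 into urllib.request, urllib.parse and urllib.error, so the Python 2 sessions above do not run there unchanged. The following is a minimal Python 3 sketch of items 1 and 2 of the urllib section, not the author's code. To stay offline it assumes a temporary local file opened through a file:// URL in place of www.baidu.com; the names page.html and copy.html are made up. It checks that the object returned by urlopen() reads like a file and that urlretrieve() hands back a (filename, headers) tuple.

```python
# Python 3 counterpart of urllib items 1 and 2 (urlopen / urlretrieve).
# Assumption: a temporary local file opened via a file:// URL stands in for
# www.baidu.com, so the example runs without network access.
import tempfile
from pathlib import Path
from urllib.request import urlopen, urlretrieve

with tempfile.TemporaryDirectory() as tmpdir:
    page = Path(tmpdir) / "page.html"            # hypothetical file name
    page.write_bytes(b"<!doctype html>\n<p>hello</p>\n")
    url = page.as_uri()                          # e.g. file:///tmp/.../page.html

    # urlopen() returns a file-like object, so readline()/read() work as on files.
    with urlopen(url) as resp:
        first_line = resp.readline()
        rest = resp.read()
        final_url = resp.url                     # same value as resp.geturl()
        content_type = resp.headers["Content-Type"]  # resp.info() returns this object

    # urlretrieve() copies the resource to `filename` and returns (filename, headers).
    copy_path = str(Path(tmpdir) / "copy.html")  # hypothetical target path
    filename, headers = urlretrieve(url, filename=copy_path)
    copied_ok = Path(filename).read_bytes() == first_line + rest

print(first_line)
print(rest)
print(final_url)
print(content_type)
print(filename == copy_path, copied_ok)
```

In Python 3, urlretrieve() and urlcleanup() are listed under the legacy interface of urllib.request, so new code often reads the response and writes the file itself instead.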
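quote(), quote_plus(), unquote(), unquote_plus() and urlencode() (items 4 to 6 of the urllib section) live in urllib.parse in Python 3. This sketch, assuming Python 3, reuses the article's inputs, adds a string containing a space and a '+' to show where quote_plus() differs, and builds both the GET URL and the POST body from one urlencode() result. The values in the comments assume Python 3.7 or later, where dictionaries keep insertion order.

```python
# Python 3 sketch of urllib items 4-6: the helpers now live in urllib.parse.
from urllib.parse import quote, quote_plus, unquote, unquote_plus, urlencode

print(quote("http://www.baidu.com"))       # http%3A//www.baidu.com ('/' is safe by default)
print(quote_plus("http://www.baidu.com"))  # http%3A%2F%2Fwww.baidu.com
print(quote_plus("a b+c"))                 # a+b%2Bc (space -> '+', literal '+' escaped)
print(unquote("http%3A//www.baidu.com"))   # http://www.baidu.com
print(unquote_plus("a+b%2Bc"))             # a b+c

# urlencode() joins key=value pairs with '&' in dict insertion order.
params = urlencode({"spam": 1, "eggs": 2, "bacon": 0})
print(params)                              # spam=1&eggs=2&bacon=0

get_url = "http://python.org/query?" + params  # GET: query string after '?'
post_body = params.encode("ascii")             # POST: Python 3 wants bytes for data
print(get_url)
```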
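The urllib2 examples with Request, POST data and a custom User-Agent map to urllib.request.Request in Python 3. Rather than contacting the placeholder login.php server, this sketch (Python 3 assumed; example.com and the form values are hypothetical) only builds Request objects and inspects them. Supplying data makes get_method() report POST, a query string alone gives GET, and the custom header is stored under the name "User-agent" because Request normalises header names with str.capitalize(). The last line reads the default User-Agent from a freshly built opener.

```python
# Python 3 sketch of the urllib2 Request examples; nothing is sent over the network.
# Assumption: example.com and the form values below are placeholders.
from urllib.parse import urlencode
from urllib.request import Request, build_opener

url = "http://example.com/login.php"
values = {"ver": "1.7.1", "email": "user@example.com"}
user_agent = "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"

# POST: passing data (bytes in Python 3) switches the method to POST.
data = urlencode(values).encode("ascii")
post_req = Request(url, data, {"User-Agent": user_agent})

# GET: no data argument; the encoded values go into the URL after '?'.
get_req = Request(url + "?" + urlencode(values))

print(post_req.get_method())              # POST
print(get_req.get_method())               # GET
print(get_req.full_url)                   # ...login.php?ver=1.7.1&email=user%40example.com
# Header names are stored capitalized, so the lookup key is "User-agent".
print(post_req.get_header("User-agent"))

# Without a custom header, the default opener sends Python-urllib/<version>.
print(dict(build_opener().addheaders)["User-agent"])
```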
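The rule that HTTPError must be caught before URLError follows from HTTPError being a subclass of URLError. This Python 3 sketch triggers both errors without Internet access, under these assumptions: a throwaway http.server instance on 127.0.0.1 (the hypothetical NotFoundHandler) answers every GET with 404, a file:// URL to a missing path produces a URLError, and ProxyHandler({}) disables any proxy settings from the environment so the local request is not routed through a proxy. The hypothetical helper fetch() uses the recommended clause order; fetch_wrong_order() shows the URLError clause swallowing the 404.

```python
# Python 3 sketch: why HTTPError must be caught before URLError.
# Assumptions: a local test server answers every GET with 404, a file:// URL
# to a missing path raises URLError, and proxies are disabled for the test.
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.error import HTTPError, URLError
from urllib.request import ProxyHandler, build_opener

class NotFoundHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_error(404)             # server reached, but the request fails

    def log_message(self, *args):        # silence the default request logging
        pass

opener = build_opener(ProxyHandler({}))  # ignore any *_proxy environment variables

def fetch(url):
    try:
        with opener.open(url, timeout=5) as resp:
            return "ok", resp.status
    except HTTPError as e:               # subclass first ...
        e.close()
        return "http-error", e.code
    except URLError as e:                # ... then the base class
        return "url-error", type(e.reason).__name__

def fetch_wrong_order(url):
    try:
        with opener.open(url, timeout=5) as resp:
            return "ok", resp.status
    except URLError as e:                # also matches HTTPError
        return "url-error", type(e).__name__
    except HTTPError as e:               # never reached for HTTP errors
        return "http-error", e.code

server = HTTPServer(("127.0.0.1", 0), NotFoundHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    http_url = "http://127.0.0.1:%d/index.html" % server.server_address[1]
    http_result = fetch(http_url)
    wrong_result = fetch_wrong_order(http_url)
finally:
    server.shutdown()
    server.server_close()

missing_result = fetch("file:///nonexistent-dir/index.html")

print(issubclass(HTTPError, URLError))
print(http_result)
print(wrong_result)
print(missing_result[0])
```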
Copyright@https://www.cppentry.com all rights reserved