Using the jsoup library on the Java platform to scrape Youku video playback URLs, image URLs, and related information (Part 2)

2014-11-23 22:59:02
... file process:

// Needs JDOM on the classpath (Document, Element, output.Format, output.XMLOutputter;
// package org.jdom in JDOM 1.x, org.jdom2 in JDOM 2), plus java.io.FileOutputStream,
// java.io.IOException, java.io.OutputStream and java.util.List.
// Assumption: every list holds String values.
public void createXml(String fileName, List<String> videoTitles,
        List<String> videoUrl, List<String> videoThumbUrls,
        List<List<String>> videoSourceIdList,
        List<List<String>> videoSourceStatusList,
        List<List<String>> videoSourceUrlList, int pageNum) {
    // Create the root node
    Element root = new Element("videoInfo");
    // Create the node for this page
    Element pageElement = new Element("page");
    // Set the page number
    pageElement.setAttribute("page", "" + pageNum);
    Document doc = new Document(root);
    for (int i = 0; i < videoTitles.size(); i++) {
        // Create the videoId node
        Element videoIdElement = new Element("videoId");
        // Give the videoId node an id attribute that keeps counting across pages
        videoIdElement.setAttribute("id",
                "" + (i + 1 + (pageNum - 1) * videoTitles.size()));
        // Fill in the video information
        videoIdElement.addContent(new Element("videoTitle").setText(videoTitles.get(i)));
        videoIdElement.addContent(new Element("videoUrl").setText(videoUrl.get(i)));
        videoIdElement.addContent(new Element("videoThumbUrls").setText(videoThumbUrls.get(i)));
        // One source element per playback source of this video
        for (int j = 0; j < videoSourceIdList.get(i).size(); j++) {
            Element sourceElement = new Element("source");
            sourceElement.setAttribute("id", "" + videoSourceIdList.get(i).get(j));
            sourceElement.setAttribute("status", "" + videoSourceStatusList.get(i).get(j));
            sourceElement.setAttribute("url", "" + videoSourceUrlList.get(i).get(j));
            videoIdElement.addContent(sourceElement);
        }
        // Add each video to the page
        pageElement.addContent(videoIdElement);
    }
    // Add the page of videos to the root node
    root.addContent(pageElement);
    Format format = Format.getCompactFormat();
    format.setEncoding("utf-8"); // setEncoding sets the output encoding
    format.setIndent("\t");      // a non-null indent also turns on line breaks
    XMLOutputter xmlOut = new XMLOutputter(format);
    // try-with-resources closes the stream even if writing fails
    try (OutputStream out = new FileOutputStream(fileName)) {
        xmlOut.output(doc, out);
    } catch (IOException e) {
        // Also covers FileNotFoundException
        e.printStackTrace();
    }
}
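The id attribute in createXml is computed as i + 1 + (pageNum - 1) * videoTitles.size(). The sketch below isolates that arithmetic so it can be checked by hand. The class name, the method name, and the page size of 20 are illustrative assumptions, not part of the original code.

```java
class VideoIdDemo {
    /**
     * Mirrors the id expression used in createXml:
     * indexOnPage + 1 + (pageNum - 1) * itemsOnPage.
     * Pages are numbered from 1, indexes from 0.
     */
    static int globalId(int indexOnPage, int pageNum, int itemsOnPage) {
        return indexOnPage + 1 + (pageNum - 1) * itemsOnPage;
    }

    public static void main(String[] args) {
        // Assumed page size of 20 items, chosen only for illustration.
        System.out.println(globalId(0, 1, 20));  // prints 1
        System.out.println(globalId(0, 2, 20));  // prints 21
        System.out.println(globalId(19, 2, 20)); // prints 40
    }
}
```

Because the multiplier is the size of the current page, the ids only stay unique when every page holds the same number of videos. A shorter final page (say 5 items on page 3) would start at 11 and collide with ids from page 1, so passing a fixed page size may be the safer choice.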

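Note: when building XML with JDOM, keep in mind that without extra configuration the output contains no line breaks, which leaves the XML file cluttered and hard to read. The following settings make the output break lines.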

Format format = Format.getCompactFormat();
format.setEncoding("utf-8");
format.setIndent("\t");
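For comparison, the same "indent so the output breaks lines" idea can be tried without JDOM, using the XML classes that ship with the JDK. This is a swapped-in technique, not the JDOM approach used in this article. The class and method names are hypothetical, and the exact indentation width depends on the JDK version.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class IndentedXmlDemo {
    /** Builds a one-video document shaped like createXml's output and serializes it with line breaks. */
    static String buildIndentedXml(String title, int pageNum) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("videoInfo");
            doc.appendChild(root);
            Element page = doc.createElement("page");
            page.setAttribute("page", String.valueOf(pageNum));
            root.appendChild(page);
            Element video = doc.createElement("videoId");
            video.setAttribute("id", "1"); // placeholder id for the sketch
            page.appendChild(video);
            Element titleElement = doc.createElement("videoTitle");
            titleElement.setTextContent(title);
            video.appendChild(titleElement);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            // Without INDENT=yes the whole document is written on a single line.
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not build XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildIndentedXml("demo", 1));
    }
}
```

On the JDOM side, Format.getPrettyFormat() is another starting point that already has indentation switched on.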

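Parsing results: each page takes roughly 2 to 3 s to process. The pages are fairly large, so the actual time depends on network conditions.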
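A per-page figure like this can be measured by wrapping the work for one page in a timer. The following is a minimal sketch, not the measurement code used for the numbers above. PageTimer and timeMillis are hypothetical names, and the Thread.sleep call merely stands in for fetching and parsing one page.

```java
class PageTimer {
    /** Runs the task once and returns the elapsed wall-clock time in milliseconds. */
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        long elapsed = timeMillis(() -> {
            try {
                Thread.sleep(50); // stand-in for "download and parse one page"
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("page took " + elapsed + " ms");
    }
}
```

Since most of the time goes to the network, averaging over several pages gives a steadier number than a single run.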

The generated XML file looks like this:

<videoTitle>来自星星的你</videoTitle>
<videoUrl>http://www.soku.com/detail/show/XMTEyNDE0NA==</videoUrl>
<videoThumbUrls>http://g3.ykimg.com/0516000052AD289A675839358A07B6AA</videoThumbUrls>

<videoTitle>食为奴</videoTitle>
<videoUrl>http://www.soku.com/detail/show/XMTA5MTQ1Mg==</videoUrl>
<videoThumbUrls>http://g4.ykimg.com/0516000052F4A2C56758390A8D0C4E55</videoThumbUrls>

<videoTitle>丝男士</videoTitle>
<videoUrl>http://www.soku.com/detail/show/XMTA4MzkwNA==</videoUrl>
<videoThumbUrls>http://g1.ykimg.com/05160000519310F4670C4A1AE002FEB1</videoThumbUrls>

<videoTitle>丝男士 第三季</videoTitle>
<videoUrl>http://www.soku.com/detail/show/XMTE0NzU2OA==</videoUrl>
<videoThumbUrls>http://g4.ykimg.com/051600005305D18E6758397D8206CC34</videoThumbUrls>
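To check a generated file, the titles can be read back with the JDK's built-in DOM parser. This is a sketch under assumptions: the element names come from createXml, but VideoXmlReader, readTitles, and the sample values (title, URLs, ids) are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class VideoXmlReader {
    /** Returns the text of every videoTitle element, in document order. */
    static List<String> readTitles(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("videoTitle");
            List<String> titles = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                titles.add(nodes.item(i).getTextContent());
            }
            return titles;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample in the shape createXml produces; all values are placeholders.
        String sample =
                "<videoInfo><page page=\"1\"><videoId id=\"1\">"
                + "<videoTitle>Sample Title</videoTitle>"
                + "<videoUrl>http://example.com/video</videoUrl>"
                + "<videoThumbUrls>http://example.com/thumb</videoThumbUrls>"
                + "<source id=\"1\" status=\"1\" url=\"http://example.com/source\" />"
                + "</videoId></page></videoInfo>";
        System.out.println(readTitles(sample)); // prints [Sample Title]
    }
}
```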