  • A simple Python crawler, called from Java, that saves its data to a JSON file

    # coding:utf-8
    # Python 2 script: it is run through Jython, which supports Python 2 only.

    import urllib2
    from bs4 import BeautifulSoup
    import json
    import sys

    # Python 2 only: make utf-8 the default codec for implicit str/unicode conversion.
    reload(sys)
    sys.setdefaultencoding('utf-8')


    class dataBean(object):
        """One news entry: title, link and publication date."""

        def __init__(self, title, url, date):
            self.date = date
            self.url = url
            self.title = title

        @staticmethod
        def obj_2_json(obj):
            # Passed to json.dumps as `default` to turn a dataBean into a dict.
            return {
                "title": obj.title,
                "url": obj.url,
                "date": obj.date
            }


    url = "http://localhost:8088/news.html"
    response3 = urllib2.urlopen(url)
    soup = BeautifulSoup(response3.read(), 'html.parser', from_encoding='utf-8')
    data = []
    # Each <li> of the news list holds a linked title and a date span.
    contents = soup.find('ul', class_="w_newslistpage_list").findAll("li")

    for content in contents:
        link = content.find("span").find("a")
        data.append(dataBean(link.get_text(), link['href'],
                             content.find('span', class_="date").get_text()))

    # Serialize the list, keep non-ASCII text readable, and write it to data.json.
    jsondata = json.dumps(data, default=dataBean.obj_2_json, ensure_ascii=False, encoding='utf-8')
    fileObject = open('data.json', 'w')
    fileObject.write(jsondata)
    fileObject.close()
    print jsondata
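
The serialization step can be tried on its own, without the network or bs4. The following is a minimal sketch, not the article's script: `NewsItem`, `to_dict` and the sample record are hypothetical names and data made up for illustration, and the code is written to run on Python 3 as well as Python 2.7 (so it omits the Python 2-only `encoding` argument).

```python
import json


class NewsItem(object):
    """Hypothetical stand-in for the dataBean class in the script."""

    def __init__(self, title, url, date):
        self.title = title
        self.url = url
        self.date = date


def to_dict(obj):
    # json.dumps calls this hook for every object it cannot serialize itself.
    if isinstance(obj, NewsItem):
        return {"title": obj.title, "url": obj.url, "date": obj.date}
    raise TypeError("cannot serialize %r" % (obj,))


# Made-up sample record, used only to exercise the hook.
items = [NewsItem("Sample headline", "/news/1.html", "2018-12-18")]

# ensure_ascii=False keeps non-ASCII titles readable instead of \uXXXX escapes.
text = json.dumps(items, default=to_dict, ensure_ascii=False)
```

Raising `TypeError` for unknown types keeps the default behaviour of `json.dumps` for anything that is not a news item, which makes mistakes easier to spot than silently returning `None`.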

    To call the script from Java, use jython.jar and copy the bs4 package folder into the current folder.

    import org.python.core.Py;
    import org.python.core.PyString;
    import org.python.util.PythonInterpreter;


    public class Main {
        // Requires Jython to be installed (jython.jar on the classpath)
        public static void main(String[] args) {
            // Setup code executed in the interpreter before the script itself.
            final String code = "# -*- coding: utf-8 -*-\n" +
                    "import sys\n" +
                    "reload(sys)\n" +
                    "import urllib2\n" +
                    "sys.setdefaultencoding('utf-8')\n" +
                    "import json\n";
            new Thread(new Runnable() {
                @Override
                public void run() {
                    PythonInterpreter interpreter = new PythonInterpreter();
                    interpreter.exec("from bs4 import BeautifulSoup");
                    PyString code2 = Py.newStringUTF8(code);
                    interpreter.exec(code2);
                    // Backslashes in a Java string literal must be escaped.
                    interpreter.execfile("D:\\java\\test\\src\\GetNewsDataToLocal.py");
                }
            }
            ).start();
        }
    }

    After the program runs, the JSON file appears in the current folder.
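
To check the result, the generated file can be read back and printed. This is a small sketch under assumptions: `load_news` and `summarize` are hypothetical helper names, the file is assumed to be the UTF-8 `data.json` written by the script, and each record is assumed to carry the `title`, `url` and `date` keys.

```python
import io
import json


def load_news(path="data.json"):
    # io.open decodes UTF-8 explicitly and works on both Python 2.7 and 3.
    with io.open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize(records):
    # One line per record: date, title, link.
    return ["%s  %s  %s" % (r["date"], r["title"], r["url"]) for r in records]
```

Calling `summarize(load_news())` from the folder that contains `data.json` returns one line per scraped entry, which is a quick way to confirm that the dates and links were picked up correctly.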

  • Original article: https://www.cnblogs.com/loaderman/p/10137082.html