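Our company recently needed to process a batch of articles, so I wrote a batch-processing tool. I'm quite satisfied with the pseudo-original (article rewriting) results, and it also helps with search engine indexing. Anyone who needs it is welcome to try it. The code is written in Python.
Article processing, article.py: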
    import requests
    from requests.structures import CaseInsensitiveDict

    url = "http://api.xiaofamao.com/api.php?json=0&v=1&key=testkey"

    headers = CaseInsensitiveDict()
    headers["Accept"] = "application/json"
    headers["Content-Type"] = "application/x-www-form-urlencoded"

    # Form-encoded request body; the article text is percent-encoded UTF-8
    data = "wenzhang=%E5%BA%8A%E5%89%8D%E6%98%8E%E6%9C%88%E5%85%89%EF%BC%8C%E7%96%91%E6%98%AF%E5%9C%B0%E4%B8%8A%E9%9C%9C%E3%80%82"

    resp = requests.post(url, headers=headers, data=data)
    print(resp.status_code)
Batch-processing code, worker.py:
    import time
    from datetime import datetime

    import chardet  # third-party package

    # run() is a method of the worker class. The helpers it calls
    # (make_sure_dir_exists, sendMessage, file_name, get_ai_articlev2,
    # content_filter, get_rand_title, remove_bad_file_symble) are not
    # shown in this post.
    def run(self):
        make_sure_dir_exists(self.target_dir)  # make sure the output directory exists
        source_dir = self.filename

        self.sendMessage("set_info", self.file_name(source_dir))
        self.sendMessage("set_info", datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
        self.sendMessage("set_info", "Starting pseudo-original rewriting")

        # Detect the file encoding; only UTF-8 input is supported for now
        with open(source_dir, 'rb') as f:
            data = f.read()
        f_charInfo = chardet.detect(data)
        encoding = f_charInfo['encoding'] or 'unknown'  # detect() can return None
        if 'utf-8' not in encoding.lower():
            self.sendMessage("set_info", 'Error: file is not UTF-8 encoded; not supported yet: ' + encoding)
            return

        # Each line of the source file is one title
        with open(source_dir, 'r', encoding='utf-8') as f_source:
            for line in f_source:
                one_title = line.strip()
                one_article = self.get_ai_articlev2(one_title)
                one_article = self.content_filter(one_article)  # filter keywords
                new_title = one_title + self.get_rand_title()
                if len(one_article) > 10:
                    # Save the article under a file name derived from the title
                    file_name_short = self.remove_bad_file_symble(new_title)
                    file_name = self.target_dir + '/' + file_name_short + '.txt'
                    with open(file_name, "w", encoding="utf-8") as f:
                        f.write(one_article)
                else:
                    # Result too short to be a real article; print it for inspection
                    print(one_article)
                self.sendMessage("set_info", 'Writing article, title: ' + new_title)
                time.sleep(2)  # pause between requests
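worker.py calls `make_sure_dir_exists` and `remove_bad_file_symble` without showing them. The sketch below is one possible way to write such helpers, assuming they only need to create the output directory and strip characters that are invalid in file names. The bodies, the `max_len` parameter, and the `"untitled"` fallback are assumptions, not the original implementation.

```python
import os
import re

# Characters not allowed in Windows file names, plus control whitespace.
_BAD_CHARS = re.compile(r'[\\/:*?"<>|\r\n\t]')


def make_sure_dir_exists(path):
    """Create the directory (and any parents) if it does not exist yet."""
    os.makedirs(path, exist_ok=True)


def remove_bad_file_symble(title, max_len=100):
    """Return a title that is safe to use as a file name (assumed behavior).

    max_len and the "untitled" fallback are illustrative choices.
    """
    cleaned = _BAD_CHARS.sub("", title).strip().rstrip(".")
    return cleaned[:max_len] or "untitled"
```

In the original code `remove_bad_file_symble` is called as a method on `self`, so in the real class it would take `self` as its first parameter.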
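Once you have chosen suitable source material, you move on to the pseudo-original step. Here you need to extract the article's central idea and its few core sections or viewpoints. It feels a lot like a reading-comprehension exercise: divide the article into sections and summarize the idea of each paragraph.
That's right: in effect you are condensing and distilling the article's core elements. During this process you can also selectively extract keywords, which are essential core material for expressing the article's ideas.
For most people, reading ability is not really an obstacle. Once you are practiced, this stage amounts to extracting the core material from the original content and forming an outline for the pseudo-original content.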
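The outlining step described above can be roughly approximated in code. The sketch below is a deliberately crude heuristic, not the method used by the tool: it assumes each paragraph sits on its own line and treats the first sentence of each paragraph as that paragraph's main idea. The function name `rough_outline` is hypothetical.

```python
import re

# Sentence-ending punctuation, both Chinese and Western.
_SENTENCE_END = re.compile(r"[。!?.!?]")


def rough_outline(article):
    """Return the first sentence of every non-empty paragraph."""
    points = []
    for para in article.splitlines():
        para = para.strip()
        if not para:
            continue
        match = _SENTENCE_END.search(para)
        # Fall back to the whole paragraph if it has no end punctuation.
        points.append(para[:match.end()] if match else para)
    return points


if __name__ == "__main__":
    for point in rough_outline("A is good. More text.\n\nB matters! Extra."):
        print(point)
```

A real summary still needs human judgment. This only provides a starting list of candidate points to refine by hand.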