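Our company recently needed to process a batch of articles, so I wrote a batch-processing tool. I am quite satisfied with the pseudo-original (article rewriting) results, and it has helped with search-engine indexing. Anyone who needs it is welcome to try it; the code is written in Python.
Article processing, article.py: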
import requests
from requests.structures import CaseInsensitiveDict

url = "http://api.xiaofamao.com/api.php?json=0&v=1&key=testkey"

headers = CaseInsensitiveDict()
headers["Accept"] = "application/json"
headers["Content-Type"] = "application/x-www-form-urlencoded"

# Form body: the article text, already URL-encoded as UTF-8
data = "wenzhang=%E5%BA%8A%E5%89%8D%E6%98%8E%E6%9C%88%E5%85%89%EF%BC%8C%E7%96%91%E6%98%AF%E5%9C%B0%E4%B8%8A%E9%9C%9C%E3%80%82"

resp = requests.post(url, headers=headers, data=data)
print(resp.status_code)
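The request above sends a body that was URL-encoded by hand. As a minimal sketch, the same body can be built from plain text with the standard library. The function name `build_form_body` is hypothetical and not part of the original script; only the field name `wenzhang` comes from the request above.

```python
from urllib.parse import urlencode


def build_form_body(article_text):
    """Return an application/x-www-form-urlencoded body for the article text.

    `wenzhang` is the form field used in the original request;
    the function name itself is an illustrative assumption.
    """
    return urlencode({"wenzhang": article_text}, encoding="utf-8")


if __name__ == "__main__":
    # Same sample sentence as the hand-encoded body in article.py
    print(build_form_body("床前明月光，疑是地上霜。"))
```

Alternatively, passing a dict such as `data={"wenzhang": text}` to `requests.post` lets requests do the form encoding itself.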
Batch-processing code, worker.py:
# Imports needed by this method (top of worker.py)
import time
from datetime import datetime

import chardet


def run(self):
    make_sure_dir_exists(self.target_dir)  # make sure the output directory exists
    source_dir = self.filename  # path of the input file, one title per line
    self.sendMessage("set_info", self.file_name(source_dir))
    self.sendMessage("set_info", datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
    self.sendMessage("set_info", "Starting pseudo-original rewriting")

    # Detect the file encoding from the raw bytes before reading it as text
    with open(source_dir, 'rb') as f:
        data = f.read()
    f_charInfo = chardet.detect(data)
    # chardet may report None for the encoding; plain ASCII is also valid UTF-8
    encoding = (f_charInfo['encoding'] or '').lower()
    if 'utf-8' not in encoding and encoding != 'ascii':
        self.sendMessage("set_info", 'Error: file is not UTF-8 encoded; unsupported encoding: ' + str(f_charInfo['encoding']))
        return

    with open(source_dir, 'r', encoding='utf-8') as f_source:
        for line in f_source:
            one_title = line.strip()
            if not one_title:
                continue  # skip blank lines
            one_article = self.get_ai_articlev2(one_title)
            one_article = self.content_filter(one_article)  # filter out keywords
            new_title = one_title + self.get_rand_title()
            if len(one_article) > 10:
                # Strip characters that are not allowed in file names
                file_name_short = self.remove_bad_file_symble(new_title)
                file_name = self.target_dir + '/' + file_name_short + '.txt'
                with open(file_name, "w", encoding="utf-8") as f:
                    f.write(one_article)
            else:
                # Result too short to be a real article; print it for inspection
                print(one_article)
            self.sendMessage("set_info", 'Writing, title: ' + new_title)
            time.sleep(2)  # pause between requests
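The method above calls `make_sure_dir_exists` and `remove_bad_file_symble`, which the post does not show. The following is a minimal sketch of what such helpers might look like, written as plain functions; the bodies, the `max_len` parameter, and the `"untitled"` fallback are assumptions, not the author's code. `is_utf8` is an extra, hypothetical alternative to the chardet check: it simply tries a strict UTF-8 decode instead of guessing the encoding.

```python
import os
import re

# Hypothetical stand-ins for helpers that worker.py uses but does not show.

# Characters Windows forbids in file names, plus ASCII control characters.
_BAD_FILE_CHARS = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def make_sure_dir_exists(path):
    """Create the directory (and any missing parents) if it does not exist."""
    os.makedirs(path, exist_ok=True)


def remove_bad_file_symble(title, max_len=100):
    """Drop characters that are unsafe in a file name and trim the result."""
    cleaned = _BAD_FILE_CHARS.sub("", title).strip(" .")
    # Fall back to a fixed name if nothing usable is left (assumption).
    return cleaned[:max_len] or "untitled"


def is_utf8(raw_bytes):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw_bytes.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

A strict decode accepts ASCII files automatically, since ASCII is a subset of UTF-8, whereas a detector can report such files under a different name.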
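Once you have chosen suitable source material, you move on to the pseudo-original rewriting step. Here you need to extract the article's central idea and its few core sections or viewpoints. Haha, it feels a bit like a reading-comprehension exercise: split the article into sections and summarize the idea of each paragraph.
Yes, exactly: it amounts to condensing and distilling the article's core elements. During this process you can also selectively pull out keywords, which are the essential core material for expressing the article's ideas.
In fact, most people's reading ability is up to the task. Once you are practiced at it, this stage comes down to extracting the core material from the original content and shaping it into an outline for the pseudo-original content.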
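As a rough illustration of the outlining step described above, the sketch below splits an article into paragraphs, keeps the first sentence of each as an outline entry, and counts how often caller-supplied keywords occur. All function names are hypothetical, and taking the first sentence as the paragraph's main idea is a crude assumption; it does not replace actually reading the text. The sentence regex also splits naively on every period, so decimals or abbreviations will cut a sentence short.

```python
import re
from collections import Counter


def split_paragraphs(article):
    """Split on line breaks and drop empty pieces."""
    return [p.strip() for p in re.split(r"\n+", article) if p.strip()]


def first_sentence(paragraph):
    """Return text up to the first sentence-ending mark (Chinese or English)."""
    match = re.search(r".+?[。！？.!?]", paragraph)
    return match.group(0) if match else paragraph


def build_outline(article):
    """One outline entry per paragraph: its first sentence (an assumption)."""
    return [first_sentence(p) for p in split_paragraphs(article)]


def keyword_counts(article, candidates):
    """Count occurrences of candidate keywords, most frequent first."""
    counts = Counter({word: article.count(word) for word in candidates})
    return [(word, n) for word, n in counts.most_common() if n > 0]


if __name__ == "__main__":
    text = "Cats sleep a lot. They nap all day.\nDogs bark. Dogs also run."
    print(build_outline(text))
    print(keyword_counts(text, ["Dogs", "Cats", "fish"]))
```

The candidate keywords are passed in by the caller because picking them automatically from Chinese text would need a word-segmentation step that is outside this sketch.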