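Python Text Processing

1. Extracting URLs from text

This is mainly used in web scraping:

Save the scraped HTML page as a string, then extract the URLs from that string.

For example, suppose the following text is saved in a file: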

Nowadays you can learn almost anything by just visiting http://www.google.com. But if you are completely new to computers or the internet, then first you need to learn those fundamentals. Next
you can visit a good e-learning site like - https://www.codingdict.com to learn further on a variety of subjects.

Then use the re.findall() function with a regular expression to find the URLs on each line:
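import re

# Read the file line by line and print the URLs found on each line.
with open("pathurl_example.txt") as file:
    for line in file:
        # http:// or https://, then host characters (-, word chars, .) or %XX escapes
        urls = re.findall(r'https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+', line)
        print(urls)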
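
Since a scraped page is usually already held in memory as a string, the same idea can be applied without going through a file. The following is a minimal sketch, not a definitive implementation: the function name extract_urls, the constant URL_PATTERN, and the sample text are assumptions made for illustration. It uses a pattern of the same shape as the example, written as a raw string with \w and \d. Because "." is one of the allowed host characters, a sentence-ending period right after a URL gets matched too, so the sketch strips trailing periods.

```python
import re

# URL pattern of the same shape as in the example, compiled once for reuse.
# It matches the scheme and host part only: "/" is not in the character
# class, so any path after the host is not included in the match.
URL_PATTERN = re.compile(r'https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+')


def extract_urls(text):
    """Return URL-like matches in text, in order of appearance.

    Hypothetical helper: trailing periods are stripped because the
    pattern also accepts "." and would otherwise keep a sentence-ending dot.
    """
    return [url.rstrip('.') for url in URL_PATTERN.findall(text)]


# Hypothetical stand-in for a scraped page held in memory as a string.
sample = (
    "You can learn almost anything by just visiting "
    "http://www.google.com. Next you can visit "
    "https://www.codingdict.com to learn further."
)

print(extract_urls(sample))
# ['http://www.google.com', 'https://www.codingdict.com']
```

Note that this pattern stops at the first "/" after the host name, so a link such as https://example.com/a/b is reduced to https://example.com. If full paths are needed, the character class would have to be extended, or an HTML parser used to read the href attributes instead.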

Original post: https://www.cnblogs.com/qiujichu/p/10519802.html