BeautifulSoup

A library for parsing, traversing, and maintaining HTML/XML tag trees.

Install Beautiful Soup: pip install beautifulsoup4

Using Beautiful Soup:

from bs4 import BeautifulSoup
# The second argument names the parser; 'html.parser' is the one built into Python
soup = BeautifulSoup('<p>data</p>', 'html.parser')

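Beautiful Soup supports several parsers (lxml and html5lib, for example, are installed separately). For these notes it is enough to know html.parser, which ships with Python.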
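To make the parser choice concrete, here is a minimal sketch that parses a small document with html.parser and prints it back out. The sample HTML and the variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, not taken from any real page
html = "<html><body><p class='title'>data</p></body></html>"

# 'html.parser' needs no extra installation beyond beautifulsoup4 itself
soup = BeautifulSoup(html, "html.parser")

# prettify() returns the parsed tree as an indented string
pretty = soup.prettify()
print(pretty)

# soup.p is shorthand for the first <p> tag in the tree
print(soup.p.string)
```

If lxml is installed, passing "lxml" instead of "html.parser" should work the same way for this example.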

Basic elements of Beautiful Soup:
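As a rough illustration, the sketch below inspects the element types that Beautiful Soup exposes for a single tag: the Tag itself, its name and attributes, and the NavigableString and Comment nodes inside it. The sample HTML is invented for this example.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString, Tag

# Hypothetical sample: one tag holding a text node and a comment
html = '<p class="title" id="intro">Hello<!-- a note --></p>'
soup = BeautifulSoup(html, "html.parser")

tag = soup.p                    # Tag: the <p>...</p> element
tag_name = tag.name             # Name: the tag's name as a string
tag_attrs = tag.attrs           # Attributes: a dict of the tag's attributes
first_child = tag.contents[0]   # NavigableString: the text "Hello"
second_child = tag.contents[1]  # Comment: the text inside <!-- -->

print(type(tag).__name__, tag_name, tag_attrs)
print(type(first_child).__name__, repr(str(first_child)))
print(type(second_child).__name__, repr(str(second_child)))
```

Note that class is treated as a multi-valued attribute, so its value comes back as a list, while id comes back as a plain string.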

Downward traversal of the tag tree:
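Downward traversal is usually done with the .contents, .children, and .descendants attributes. A minimal sketch, assuming a made-up HTML fragment:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical sample with no whitespace between tags, so every direct child is a Tag
html = "<div><p>one</p><p>two <b>bold</b></p></div>"
soup = BeautifulSoup(html, "html.parser")
div = soup.div

# .contents: a list of the direct children
contents_names = [child.name for child in div.contents]

# .children: an iterator over the same direct children
children_names = [child.name for child in div.children]

# .descendants: an iterator over all nested nodes, text nodes included
descendant_tags = [node.name for node in div.descendants if isinstance(node, Tag)]

print(contents_names, children_names, descendant_tags)
```

In real pages, line breaks between tags show up as NavigableString children, so filtering by isinstance(node, Tag) is often needed.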

Sibling traversal of the tag tree (among nodes under the same parent):
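Sibling traversal relies on .next_sibling, .previous_sibling, and their iterator forms .next_siblings and .previous_siblings. The sketch below uses an invented list with no whitespace between items, so each sibling is a tag.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: three list items under one parent
html = "<ul><li>a</li><li>b</li><li>c</li></ul>"
soup = BeautifulSoup(html, "html.parser")

items = soup.find_all("li")
first, second = items[0], items[1]

# Single-step moves to the neighbouring node under the same parent
prev_text = second.previous_sibling.string
next_text = second.next_sibling.string

# Iterators over all following / preceding siblings
following = [s.string for s in first.next_siblings]
preceding = [s.string for s in second.previous_siblings]

# There is nothing before the first child
no_prev = first.previous_sibling

print(prev_text, next_text, following, preceding, no_prev)
```

A sibling is not necessarily a tag: with whitespace or text between tags, the neighbouring node can be a NavigableString.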

Upward traversal of the tag tree:

.parent / .parents
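A short sketch of how .parent and .parents behave, using an invented document:

```python
from bs4 import BeautifulSoup

# Hypothetical sample document
html = "<html><body><div><p>text</p></div></body></html>"
soup = BeautifulSoup(html, "html.parser")
p = soup.p

# .parent: the node's immediate parent
parent_name = p.parent.name

# .parents: an iterator over every ancestor, ending at the BeautifulSoup object itself
ancestor_names = [ancestor.name for ancestor in p.parents]

# The BeautifulSoup object sits at the top of the tree and has no parent
top_parent = soup.parent

print(parent_name, ancestor_names, top_parent)
```

The last ancestor is the BeautifulSoup object, whose name is the special string '[document]'.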

Information extraction methods:

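Method	Description	Parameters
<>.find_all(name, attrs, recursive, string, **kwargs)	Returns a list holding the search results	name: search string for tag names; attrs: search string for tag attribute values, optionally tied to a named attribute; recursive: whether to search all descendants, True by default; string: search string for the text between <> and </>
<tag>(..)	Equivalent to <tag>.find_all(..)
soup(..)	Equivalent to soup.find_all(..)
	The following are extension methods; they take the same parameters as find_all
find()	Searches and returns only the first result (None if nothing matches)
find_parents()	Searches the ancestor nodes; returns a list
find_parent()	Searches the ancestor nodes; returns a single result
find_next_siblings()	Searches the following sibling nodes; returns a list
find_next_sibling()	Searches the following sibling nodes; returns a single result
find_previous_siblings()	Searches the preceding sibling nodes; returns a list
find_previous_sibling()	Searches the preceding sibling nodes; returns a single result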
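The methods in the table can be exercised with a small sketch like the one below. The HTML, the URLs, and the variable names are made up for illustration.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample, written on one line so no whitespace text nodes appear
html = ('<div id="main"><a class="link" href="/a">First</a>'
        '<a class="link ext" href="http://example.com">Second</a><p>Para</p></div>')
soup = BeautifulSoup(html, "html.parser")

all_links = soup.find_all("a")                  # search by tag name
shorthand = soup("a")                           # calling the object is the same as find_all
by_class = soup.find_all("a", class_="ext")     # keyword argument for an attribute
by_attr = soup.find_all(attrs={"href": re.compile(r"^http")})  # attrs dict with a regex
by_text = soup.find_all(string="Para")          # search the text between tags
top_only = soup.find_all("a", recursive=False)  # direct children of soup only

first = soup.find("a")                          # first match, or None
parent_div = first.find_parent("div")           # nearest matching ancestor
next_link = first.find_next_sibling("a")        # nearest following sibling that matches
later = first.find_next_siblings()              # all following siblings
earlier = soup.p.find_previous_siblings("a")    # all preceding siblings, nearest first

print([a.string for a in all_links], [t.name for t in later])
```

With recursive=False the search stops at the direct children of the object it is called on, which is why searching for a tags directly under soup finds nothing here: the only direct child is the div.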