设为首页 加入收藏

TOP

爬虫——BeautifulSoup4解析器(十)
2017-09-30 17:57:09 】 浏览:1260
Tags:爬虫 BeautifulSoup4 解析
e Dormouse's story</b></p>]

(4)组合查找

#!/usr/bin/python3
# -*- coding:utf-8 -*-
__author__ = 'mayi'

from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# 创建 Beautiful Soup 对象,指定lxml解析器
soup = BeautifulSoup(html, "lxml")

print(soup.select("p #link1"))

运行结果

[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>]

(5)属性查找

查找时还可以加入属性元素,属性需要用中括号括起来,注意属性和标签属于同一节点,所以中间不能加空格,否则会无法匹配到。

#!/usr/bin/python3
# -*- coding:utf-8 -*-
__author__ = 'mayi'

from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# 创建 Beautiful Soup 对象,指定lxml解析器
soup = BeautifulSoup(html, "lxml")

print(soup.select("a[class='sister']"))

运行结果

[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

同样,属性仍然可以与上述查找方式组合,不在同一节点的空格隔开,同一节点的不加空格

#!/usr/bin/python3
# -*- coding:utf-8 -*-
__author__ = 'mayi'

from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their nam
首页 上一页 7 8 9 10 下一页 尾页 10/10/10
】【打印繁体】【投稿】【收藏】 【推荐】【举报】【评论】 【关闭】 【返回顶部
上一篇【人生苦短 PYTHON当歌】——PYTH.. 下一篇day5模块学习 -- os模块学习

最新文章

热门文章

Hot 文章

Python

C 语言

C++基础

大数据基础

linux编程基础

C/C++面试题目