Web Scraping: The BeautifulSoup4 Parser (Part 4)
2017-09-30 17:57:09
Tags: web scraping, BeautifulSoup4, parsing
</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Create a Beautiful Soup object, specifying the lxml parser
soup = BeautifulSoup(html, "lxml")

# Print the content of the p tag
print(soup.p.string)

# Print the type of soup.p.string
print(type(soup.p.string))

Output

The Dormouse's story
<class 'bs4.element.NavigableString'>
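
The result above is a NavigableString rather than a plain str. As a minimal sketch of what that means in practice, the snippet below checks that the value still behaves like a string and converts it to a plain str. It uses a shortened HTML snippet and Python's built-in "html.parser" instead of lxml; both are assumptions made here so the example runs without extra dependencies, not something the article itself uses.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Shortened document (assumption: only the first <p> is needed for this check)
html = "<p class=\"title\"><b>The Dormouse's story</b></p>"

# Assumption: "html.parser" is used here instead of lxml to avoid an extra install
soup = BeautifulSoup(html, "html.parser")

text = soup.p.string

# NavigableString subclasses str, so ordinary string operations work on it
is_navigable = isinstance(text, NavigableString)
is_str = isinstance(text, str)

# str() yields a plain string that no longer holds a reference to the parse tree
plain = str(text)

# Unlike a plain str, a NavigableString knows where it sits in the tree
parent_name = text.parent.name

print(is_navigable, is_str, type(plain), parent_name)
```

Converting with str() is useful when the text needs to outlive the soup object, since a NavigableString keeps a reference to the whole parse tree.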

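3. BeautifulSoup

A BeautifulSoup object represents the entire content of a document. Most of the time it can be treated as a Tag object: it is a special kind of Tag, so we can retrieve its type, name, and attributes in the same way.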

#!/usr/bin/python3
# -*- coding:utf-8 -*-
__author__ = 'mayi'

from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Create a Beautiful Soup object, specifying the lxml parser
soup = BeautifulSoup(html, "lxml")

# Type of the name attribute
print(type(soup.name))

# Name
print(soup.name)

# Attributes
print(soup.attrs)

Output

<class 'str'>
[document]
{}
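
To make the "special Tag" relationship concrete, here is a small sketch that checks it directly. The shortened HTML and the use of "html.parser" instead of lxml are assumptions made here to keep the example self-contained.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Shortened document (assumption: only the <title> is needed here)
html = "<html><head><title>The Dormouse's story</title></head></html>"

# Assumption: "html.parser" is used here instead of lxml to avoid an extra install
soup = BeautifulSoup(html, "html.parser")

# BeautifulSoup is a subclass of Tag, which is why Tag-style access works on it
is_tag = isinstance(soup, Tag)

# The document object has the special name "[document]" and no attributes
doc_name = soup.name
doc_attrs = soup.attrs

# Tag-style navigation from the document root
title_text = soup.title.string

print(is_tag, doc_name, doc_attrs, title_text)
```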

4. Comment

A Comment object is a special type of NavigableString. When printed, its content does not include the comment markers (<!-- and -->).

#!/usr/bin/python3
# -*- coding:utf-8 -*-
__author__ = 'mayi'

from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Create a Beautiful Soup object, specifying the lxml parser
soup = BeautifulSoup(html, "lxml")

print(soup.a)

print(soup.a.string)

print(type(soup.a.string))

Output

<
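
Because a Comment prints without its markers, it can be mistaken for ordinary text. One way to tell the two apart is a type check, sketched below. The single-tag HTML and the use of "html.parser" instead of lxml are assumptions made here to keep the example self-contained.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString

# Only the first <a> tag from the example document (assumption: shortened for brevity)
html = '<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>'

# Assumption: "html.parser" is used here instead of lxml to avoid an extra install
soup = BeautifulSoup(html, "html.parser")

node = soup.a.string

# Comment is a subclass of NavigableString, so both checks succeed
is_comment = isinstance(node, Comment)
is_navigable = isinstance(node, NavigableString)

# The text keeps its surrounding spaces but not the <!-- --> markers
text = str(node)

if is_comment:
    print("comment:", repr(text))
else:
    print("text:", repr(text))
```

Checking isinstance(node, Comment) before using .string helps avoid treating commented-out markup as visible page text.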