Python: extracting the text content of a div tag with the BeautifulSoup library

Your HTML is not well-formed XML (the tags are not properly paired), so you have to use an HTML parser.

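from bs4 import BeautifulSoup

# The fragment from the question: only the closing </div> is present
s = """

714659079qqcom 2014/09/10 10:14</div>

"""

# "html.parser" selects Python's built-in HTML parser, which tolerates unpaired tags
soup = BeautifulSoup(s, "html.parser")

print(soup)
print(soup.get_text())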
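To see why an XML parser is not an option for this fragment, here is a minimal standard-library sketch. It is not the answerer's code: the names `TextCollector`, `xml_parses` and `html_text` are made up for illustration. It uses Python's `html.parser` module, which the "html.parser" option of BeautifulSoup is built on, and compares it with the strict `xml.etree.ElementTree` parser.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# The fragment from the question: text followed by an unmatched closing tag
fragment = "714659079qqcom 2014/09/10 10:14</div>"


class TextCollector(HTMLParser):
    """Hypothetical helper that collects every text node it sees."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def xml_parses(markup):
    """Return True if the strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def html_text(markup):
    """Return the text found by the lenient HTML parser."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks).strip()


print(xml_parses(fragment))   # the XML parser rejects the fragment
print(html_text(fragment))    # the HTML parser still yields the text
```

The XML parser refuses the input outright, while the HTML parser simply ignores the stray `</div>` and hands back the text.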

If you would rather use a regular expression, just match the tags and strip them out.

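import re

s = """

714659079qqcom 2014/09/10 10:14</div>

"""

# Match anything that looks like a tag: "<", one or more non-">" characters, then ">"
dr = re.compile(r'<[^>]+>', re.S)
# Replace every tag with the empty string, leaving only the text
dd = dr.sub('', s)

print(dd)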
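The same idea can be wrapped in a small function. This is only a sketch under a few assumptions: `strip_tags` and the sample markup are invented here, and `html.unescape` is added to decode entities such as `&amp;`, which the answer above does not do. The `re.S` flag is left out because the pattern contains no `.`, so the flag has no effect on it.

```python
import re
from html import unescape

# Same pattern as in the answer above
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(markup):
    """Remove tags with a regex, then decode HTML entities (hypothetical helper)."""
    return unescape(TAG_RE.sub("", markup)).strip()


# Hypothetical sample markup, not taken from the question
sample = '<div class="info">Tom &amp; Jerry <b>2014/09/10</b> 10:14</div>'
print(strip_tags(sample))  # Tom & Jerry 2014/09/10 10:14
```

Keep in mind that this pattern is fragile: a `>` inside an attribute value, or the contents of a script or style element, will not be handled correctly, which is why a real parser is usually the safer choice.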

If this solved your problem, please accept the answer!

If not, feel free to ask a follow-up question.

# -*- coding: utf-8 -*-

# Tag operations

from bs4 import BeautifulSoup
import urllib.request
import re

# If the source is a URL, the page can be fetched like this:
# html_doc = ""
# req = urllib.request.Request(html_doc)
# webpage = urllib.request.urlopen(req)
# html = webpage.read()

html = """

"""

soup = BeautifulSoup(html, 'html.parser')  # document object

# div elements whose class is xxx and whose text is hahaha
for k in soup.find_all('div', class_='atcTit_more'):  # , string='更多'
    print(k)
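Since the HTML string above is empty, here is a small sketch with made-up markup showing how the `class_` filter and the optional `string` filter narrow the result. Only the class name `atcTit_more` comes from the answer; the markup and the text values are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the class name atcTit_more comes from the answer above
html = """
<div class="atcTit_more">More</div>
<div class="atcTit_more">Archive</div>
<div class="other">More</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Filter by class only
by_class = soup.find_all("div", class_="atcTit_more")

# Filter by class and by the exact text of the tag
by_class_and_text = soup.find_all("div", class_="atcTit_more", string="More")

print([d.get_text() for d in by_class])
print([d.get_text() for d in by_class_and_text])
```

The first call returns both divs with that class; the second keeps only the one whose text is exactly "More".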

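First put the page content into a string, say text.

Then compute id = text.index("") + len(""), where the quoted string in both places is the opening tag that comes right before the value.

That gives the position of the 1 in the string, and text[id] is the result you want.

int(text[id]) then converts the character "1" into the integer 1.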
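A runnable sketch of this approach follows. The `<span>` tag and the sample page are assumptions, since the actual tag from the question is not shown here. It also looks up the closing tag, so values longer than one character work, and it avoids the name `id`, which shadows a Python built-in.

```python
# Hypothetical page content; the real tag from the question is unknown
text = "<html><body><span>1</span></body></html>"

open_tag = "<span>"    # assumed opening tag
close_tag = "</span>"  # assumed closing tag

# Position of the first character after the opening tag
start = text.index(open_tag) + len(open_tag)
# Position of the closing tag, searching from start onward
end = text.index(close_tag, start)

value = int(text[start:end])
print(value)  # 1
```

Note that `str.index` raises `ValueError` when the tag is not found, so this only suits pages whose structure you already know.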

# encoding: UTF-8

# Install the lxml library yourself first

from lxml.html import fromstring  # the great, invincible lxml library

class_name = "row"    # first find every DOM object with class="row"
dxpath = "./td[1]/a"  # then use a relative XPath to find the matching <a> tags

# Read your test document
with open("1.TXT") as f:
    a = f.read()

dom = fromstring(a)

b = dom.find_class(class_name)  # every object with class="row"
print(len(b))

for b1 in b:
    ddd = b1.xpath(dxpath)
    for ddd1 in ddd:
        print(ddd1.get("href"))
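Because the test file is not included, here is the same technique as a self-contained sketch on an inline table. The markup and the `href` values are invented for illustration; only the class name `row` and the idea of taking the link in the first cell come from the answer.

```python
from lxml.html import fromstring

# Hypothetical markup standing in for the test document
html = """
<table>
  <tr class="row"><td><a href="/first">First</a></td><td><a href="/skip">Skip</a></td></tr>
  <tr class="header"><td><a href="/ignored">Ignored</a></td></tr>
  <tr class="row"><td><a href="/second">Second</a></td></tr>
</table>
"""

dom = fromstring(html)

hrefs = []
for row in dom.find_class("row"):        # every element with class="row"
    for link in row.xpath("./td[1]/a"):  # <a> tags in the first cell of that row
        hrefs.append(link.get("href"))

print(hrefs)
```

The leading `./` makes the XPath relative to each row, so links in other cells and in rows without the class are skipped.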


Feel free to share. When reposting, please credit the source: 内存溢出

Original URL: https://54852.com/web/9818302.html
