
You can use nltk.sent_tokenize() to split the soup strings into sentences:
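    import difflib
    from nltk import sent_tokenize  # needs NLTK's Punkt data, e.g. nltk.download('punkt')

    d = difflib.Differ()  # the Differ from the question; compare() yields '- ', '+ ', '  ' and '? ' lines

    # Split every stripped string in each soup into individual sentences
    sentences = [sentence for string in soup.stripped_strings for sentence in sent_tokenize(string)]
    sentences2 = [sentence for string in soup2.stripped_strings for sentence in sent_tokenize(string)]

    diff = d.compare(sentences, sentences2)
    # Keep only removed ('-') and added ('+') sentences; skip unchanged and '?' hint lines
    changes = [change for change in diff if change.startswith('-') or change.startswith('+')]
    for change in changes:
        print(change)

This prints only the sentences in which a change was detected: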
    - It contains a Title II provision that changes the age at which workers compensation/public disability offset ends for disability beneficiaries from age 65 to full retirement age (FRA).
    + It contains a Title II provision that changes the age at which workers compensation/public disability offset ends for disability beneficiaries from age 68 to full retirement age (FRA).
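The sketch below lets you try the same sentence-level diff without BeautifulSoup or NLTK installed. It makes several assumptions. `naive_sent_split` is a hypothetical regex splitter that stands in for `sent_tokenize`. `changed_sentences` is a hypothetical helper. Plain lists of strings stand in for `soup.stripped_strings`. None of these come from the original answer.

```python
import difflib
import re
from typing import Iterable, List


def naive_sent_split(text: str) -> List[str]:
    # Hypothetical stand-in for nltk.sent_tokenize: split after '.', '!' or '?'
    # followed by whitespace. Much cruder than NLTK's Punkt model
    # (for example, it wrongly splits "Dr. Smith").
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]


def changed_sentences(old_strings: Iterable[str], new_strings: Iterable[str]) -> List[str]:
    # Flatten each sequence of text fragments into one list of sentences,
    # mirroring the stripped_strings + sent_tokenize comprehension in the answer.
    old = [s for string in old_strings for s in naive_sent_split(string)]
    new = [s for string in new_strings for s in naive_sent_split(string)]
    diff = difflib.Differ().compare(old, new)
    # Differ prefixes each line with '- ', '+ ', '  ' or '? '; keep removals and additions.
    return [line for line in diff if line.startswith(('- ', '+ '))]


if __name__ == "__main__":
    before = ["Intro text. The offset ends at age 65.", "Unchanged footer."]
    after = ["Intro text. The offset ends at age 68.", "Unchanged footer."]
    for line in changed_sentences(before, after):
        print(line)
```

The regex splitter is only for illustration. For real documents, NLTK's Punkt-based sent_tokenize handles abbreviations and similar edge cases far better.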