Getting a more granular diff from difflib (or a way to post-process a diff to achieve the same thing)

You can use nltk.sent_tokenize() to split the soup strings into sentences, so that each sentence becomes one unit of comparison:

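from difflib import Differ
from nltk import sent_tokenize

# Assumption: d is the difflib.Differ instance from the question;
# soup and soup2 are the two parsed BeautifulSoup documents being compared.
d = Differ()

# Split every text node into sentences so the diff works per sentence.
sentences = [sentence for string in soup.stripped_strings for sentence in sent_tokenize(string)]
sentences2 = [sentence for string in soup2.stripped_strings for sentence in sent_tokenize(string)]

diff = d.compare(sentences, sentences2)
# Keep only removed ('-') and added ('+') entries; unchanged and '?' hint lines are dropped.
changes = [change for change in diff if change.startswith('-') or change.startswith('+')]
for change in changes:
    print(change)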

This prints only the sentences in which a change was detected:

- It contains a Title II provision that changes the age at which workers compensation/public disability offset ends for disability beneficiaries from age 65 to full retirement age (FRA).
+ It contains a Title II provision that changes the age at which workers compensation/public disability offset ends for disability beneficiaries from age 68 to full retirement age (FRA).
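
If sentence-level output is still too coarse, one possible post-processing step is to compare the words of a removed/added sentence pair. The sketch below is not part of the original answer: it uses only the standard library's difflib.SequenceMatcher, the helper name word_changes is hypothetical, and splitting on whitespace is an assumed, simplistic notion of a "word".

```python
from difflib import SequenceMatcher


def word_changes(old_sentence, new_sentence):
    """Return (tag, old_words, new_words) for every run of words that differs.

    Hypothetical helper for illustration; words are split on whitespace only.
    """
    old_words = old_sentence.split()
    new_words = new_sentence.split()
    # First argument is the junk filter (None = no junk); autojunk=False
    # turns off the popularity heuristic used for long sequences.
    matcher = SequenceMatcher(None, old_words, new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # tags are 'replace', 'delete', 'insert' or 'equal'
            changes.append((tag, old_words[i1:i2], new_words[j1:j2]))
    return changes


# The sentence pair from the output above, built from one template.
template = (
    "It contains a Title II provision that changes the age at which "
    "workers compensation/public disability offset ends for disability "
    "beneficiaries from age {} to full retirement age (FRA)."
)
old, new = template.format(65), template.format(68)

for tag, removed, added in word_changes(old, new):
    print(tag, removed, added)
```

For this pair the only reported change is the replacement of 65 with 68. Pairing each '-' sentence with the '+' sentence that follows it is left to the caller, since Differ does not guarantee that removed and added lines always alternate.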


