Getting a more granular diff from difflib (or a way to post-process a diff to achieve the same thing)

You can use nltk.sent_tokenize() to split the soup strings into sentences:

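import difflib
from nltk import sent_tokenize

# soup and soup2 are assumed to be the two BeautifulSoup objects being compared.
# The original snippet uses d without defining it; a difflib.Differ is assumed here.
d = difflib.Differ()

sentences = [sentence for string in soup.stripped_strings for sentence in sent_tokenize(string)]
sentences2 = [sentence for string in soup2.stripped_strings for sentence in sent_tokenize(string)]

diff = d.compare(sentences, sentences2)
# Keep only removed ('-') and added ('+') sentences; unchanged lines and '?' hint lines are dropped.
changes = [change for change in diff if change.startswith(('-', '+'))]
for change in changes:
    print(change)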

This prints only the sentences in which a change was detected:

- It contains a Title II provision that changes the age at which workers compensation/public disability offset ends for disability beneficiaries from age 65 to full retirement age (FRA).
+ It contains a Title II provision that changes the age at which workers compensation/public disability offset ends for disability beneficiaries from age 68 to full retirement age (FRA).
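
If sentence-level output is still too coarse, a changed pair of sentences can be post-processed once more at word level. The following is a minimal sketch, not part of the original answer: the helper name word_changes is hypothetical, and splitting on whitespace is an assumption that may need a real tokenizer for punctuation-heavy text. It uses difflib.SequenceMatcher.get_opcodes() to report only the words that differ.

```python
import difflib


def word_changes(old_sentence, new_sentence):
    """Return (tag, old_words, new_words) for each differing word span.

    Hypothetical helper for illustration: words are obtained with a plain
    whitespace split, which is an assumption, not the original author's method.
    """
    old_words = old_sentence.split()
    new_words = new_sentence.split()
    # autojunk=False disables the popularity heuristic so common words are not ignored.
    matcher = difflib.SequenceMatcher(None, old_words, new_words, autojunk=False)
    return [
        (tag, old_words[i1:i2], new_words[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # skip spans that are identical in both sentences
    ]


old = "offset ends for disability beneficiaries from age 65 to full retirement age (FRA)."
new = "offset ends for disability beneficiaries from age 68 to full retirement age (FRA)."

for tag, removed, added in word_changes(old, new):
    print(tag, removed, added)
# prints: replace ['65'] ['68']
```

In practice the '-' and '+' sentences from the sentence-level diff would be paired up and passed to this helper, so that only the changed words (here 65 and 68) are reported rather than the whole sentence.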
