How to Extract All Chinese Characters from an HTML Page

Using a regular expression:

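<SCRIPT LANGUAGE="VBScript">
Dim str
str = "怎样从一个Html页面中提取所有汉字呢？不能有其它Html代码。"
alert RegExpTest("[\u4e00-\u9fa5]", str)

Function RegExpTest(patrn, strng)
    Dim regEx, Match, Matches, RetStr   ' Declare variables.
    Set regEx = New RegExp              ' Create the regular expression object.
    regEx.Pattern = patrn               ' Set the pattern.
    regEx.IgnoreCase = True             ' Ignore case when matching.
    regEx.Global = True                 ' Match every occurrence, not just the first.
    Set Matches = regEx.Execute(strng)  ' Run the search.
    For Each Match In Matches           ' Iterate over the Matches collection.
        RetStr = RetStr & Match.Value   ' Append each matched character.
    Next                                ' Close the For Each loop.
    RegExpTest = RetStr                 ' Return the concatenated matches.
End Function
</SCRIPT>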
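For comparison, the same idea can be sketched in JavaScript. This is only a minimal sketch and not part of the original answer: the function name extractChinese is hypothetical, and it assumes the same range [\u4e00-\u9fa5] used above, which covers the common CJK Unified Ideographs but not full-width punctuation or the extension blocks.

```javascript
// Hypothetical helper: collects every character in U+4E00..U+9FA5
// and returns them concatenated, in document order.
function extractChinese(html) {
  var matches = html.match(/[\u4e00-\u9fa5]/g); // null when nothing matches
  return matches ? matches.join('') : '';
}

var sample = '<p>怎样从一个<b>Html</b>页面中提取所有汉字呢？</p>';
console.log(extractChinese(sample)); // 怎样从一个页面中提取所有汉字呢
```

Because only characters inside the range are kept, tags, Latin letters and the full-width question mark all drop out without a separate tag-stripping step.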

function pageStrCount(str) {
  // Page markup, tags included
  var allStr = document.body.innerHTML
  // Strip opening and closing tags, including tags that carry attributes
  allStr = allStr.replace(/<[^>]+>/g, '')
  // An empty pattern would match at every position, so count it as zero
  if (!str) return 0
  // str is used as a regular-expression pattern; 'g' finds every occurrence
  var reg = new RegExp(str, 'g')
  var matches = allStr.match(reg)
  var count = matches ? matches.length : 0
  console.log('Match count:', count)
  return count
}

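Pass in the string (or regular-expression pattern) you want to count, and the function logs how many times it occurs in the page text.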
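As a usage sketch: pageStrCount reads document.body, so it only runs in a browser, where it would be called as pageStrCount('汉字'). The variant below is an assumption made for illustration, not the original code: countInHtml is a hypothetical name, and it takes the markup as a string argument so that it can be tried outside a page.

```javascript
// Hypothetical DOM-free variant: counts matches of `pattern` in the text of `html`.
function countInHtml(html, pattern) {
  var text = html.replace(/<[^>]+>/g, '');            // strip tags, attributes included
  var matches = text.match(new RegExp(pattern, 'g')); // null when nothing matches
  return matches ? matches.length : 0;
}

var html = '<div class="a">汉字和<span>汉字</span></div>';
console.log(countInHtml(html, '汉字'));              // 2
console.log(countInHtml(html, '[\\u4e00-\\u9fa5]')); // 5
```

Passing the character range as the pattern, as in the second call, turns the occurrence counter into a count of Chinese characters on the page.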

