
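Term-based search
Term-based search corresponds to the term query. A term query looks up the search value exactly as given, without analyzing it. A text field, however, is analyzed at index time: by default its value is split into tokens and lowercased. The search value therefore has to equal an indexed token exactly. To match the original, unanalyzed string instead, query the field's keyword subfield, which the default dynamic mapping creates for string fields.
Searching
The following example returns no hits: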
PUT term_test/_doc/1
{
  "name": "Szc",
  "hometown": "China-Henan-Anyang"
}
GET term_test/_search
{
  "query": {
    "term": {
      "name": {
        "value": "Szc"
      }
    }
  }
}
The output is as follows:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}
Querying name.keyword instead of name works:
GET term_test/_search
{
  "query": {
    "term": {
      "name.keyword": {
        "value": "Szc"
      }
    }
  }
}
The output is as follows:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "term_test",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "Szc",
          "hometown" : "China-Henan-Anyang"
        }
      }
    ]
  }
}
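To make the difference between the two queries concrete, here is a minimal Python sketch of the two indexing paths. It assumes a simplified stand-in for the default analyzer (split on non-alphanumeric characters, then lowercase); the names analyze_text, index_document and term_lookup are hypothetical and are not part of Elasticsearch or any client library.

```python
import re

def analyze_text(value):
    """Simplified stand-in for the default analyzer: split on non-alphanumerics, lowercase."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", value)]

def index_document(doc):
    """Collect the indexed terms for each field and for its .keyword subfield."""
    indexed = {}
    for field, value in doc.items():
        indexed[field] = set(analyze_text(value))   # text field: analyzed tokens
        indexed[field + ".keyword"] = {value}       # keyword subfield: original string, untouched
    return indexed

def term_lookup(indexed, field, value):
    """A term query compares the value exactly as given, without analyzing it."""
    return value in indexed.get(field, set())

doc = {"name": "Szc", "hometown": "China-Henan-Anyang"}
indexed = index_document(doc)

miss = term_lookup(indexed, "name", "Szc")          # the indexed token is "szc"
hit = term_lookup(indexed, "name.keyword", "Szc")   # the exact string was kept
lower_hit = term_lookup(indexed, "name", "szc")     # matches the lowercased token
print(miss, hit, lower_hit)
```

In this sketch the lookup on name only succeeds for the lowercased token, while the lookup on name.keyword succeeds for the original string, which mirrors the two search results above.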
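Tokenization
The same applies to values that the analyzer splits into several tokens. China-Henan-Anyang is indexed in the text field as separate lowercase tokens, so a term query for the whole string has to target hometown.keyword:
GET term_test/_search
{
  "query": {
    "term": {
      "hometown.keyword": {
        "value": "China-Henan-Anyang"
      }
    }
  }
}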
This returns the expected result:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "term_test",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "Szc",
          "hometown" : "China-Henan-Anyang"
        }
      }
    ]
  }
}
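Full-text queries
A full-text query analyzes the search text, runs a term lookup for each resulting token, and then merges the results. Examples are match and match_phrase. For details, see the sections on logical operators and match_phrase in the article "ElasticSearch7学习之搜索API" (Learning ElasticSearch 7: the Search API).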
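The analyze, look up, merge sequence can be sketched in Python as follows. This is an illustration under assumptions, not how Elasticsearch is implemented: the analyzer is a simplified regex tokenizer, the function names are hypothetical, and the second document is made up so that the two merge modes give different results.

```python
import re

def analyze(text):
    """Simplified analyzer: split on non-alphanumerics and lowercase."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

def build_inverted_index(docs, field):
    """Map each token to the set of document ids that contain it."""
    index = {}
    for doc_id, doc in docs.items():
        for token in analyze(doc[field]):
            index.setdefault(token, set()).add(doc_id)
    return index

def match(index, query_text, operator="or"):
    """Analyze the query, do one term lookup per token, then merge the postings."""
    postings = [index.get(tok, set()) for tok in analyze(query_text)]
    if not postings:
        return set()
    if operator == "and":
        return set.intersection(*postings)  # every token must be present
    return set.union(*postings)             # any token is enough

docs = {
    "1": {"hometown": "China-Henan-Anyang"},
    "2": {"hometown": "China-Hebei-Handan"},  # hypothetical second document
}
index = build_inverted_index(docs, "hometown")

any_match = match(index, "Henan China")                  # merged with OR
all_match = match(index, "Henan China", operator="and")  # merged with AND
print(sorted(any_match), sorted(all_match))
```

Unlike the term query, the search text here goes through the same analysis as the indexed field, so mixed-case input still finds the lowercased tokens.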
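Relevance score _score
The relevance algorithm currently in use is BM25. Compared with classic TF-IDF, the BM25 score approaches a fixed limit as the term frequency (TF) grows without bound.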
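A short Python sketch shows this saturation. It uses the tf formula that Elasticsearch prints in its explain output, freq / (freq + k1 * (1 - b + b * dl / avgdl)), with k1 = 1.2 and b = 0.75; setting dl equal to avgdl is an assumption made here to keep the numbers simple, and bm25_tf is a hypothetical helper name.

```python
def bm25_tf(freq, k1=1.2, b=0.75, dl=1.0, avgdl=1.0):
    """BM25 term-frequency component; it grows with freq but never reaches 1."""
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))

# Raw term frequency keeps growing, while the BM25 component levels off.
for freq in (1, 2, 10, 100, 1000):
    print(freq, round(bm25_tf(freq), 4))
```

Each additional occurrence of a term adds less than the previous one, so a document cannot dominate the ranking just by repeating a term many times.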
Adding the explain parameter shows how the score is calculated:
GET term_test/_search
{
  "explain": true,
  "query": {
    "term": {
      "hometown.keyword": {
        "value": "China-Henan-Anyang"
      }
    }
  }
}
The output is as follows. The _explanation field describes each parameter and its value:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_shard" : "[term_test][0]",
        "_node" : "JhcR-XkxT2uY3UubTpfZAQ",
        "_index" : "term_test",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "Szc",
          "hometown" : "China-Henan-Anyang"
        },
        "_explanation" : {
          "value" : 0.2876821,
          "description" : "weight(hometown.keyword:China-Henan-Anyang in 0) [PerFieldSimilarity], result of:",
          "details" : [
            {
              "value" : 0.2876821,
              "description" : "score(freq=1.0), computed as boost * idf * tf from:",
              "details" : [
                {
                  "value" : 2.2,
                  "description" : "boost",
                  "details" : [ ]
                },
                {
                  "value" : 0.2876821,
                  "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                  "details" : [
                    {
                      "value" : 1,
                      "description" : "n, number of documents containing term",
                      "details" : [ ]
                    },
                    {
                      "value" : 1,
                      "description" : "N, total number of documents with field",
                      "details" : [ ]
                    }
                  ]
                },
                {
                  "value" : 0.45454544,
                  "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                  "details" : [
                    {
                      "value" : 1.0,
                      "description" : "freq, occurrences of term within document",
                      "details" : [ ]
                    },
                    {
                      "value" : 1.2,
                      "description" : "k1, term saturation parameter",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.75,
                      "description" : "b, length normalization parameter",
                      "details" : [ ]
                    },
                    {
                      "value" : 1.0,
                      "description" : "dl, length of field",
                      "details" : [ ]
                    },
                    {
                      "value" : 1.0,
                      "description" : "avgdl, average length of field",
                      "details" : [ ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      }
    ]
  }
}
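The numbers in this explanation can be checked by hand. The Python sketch below plugs the values reported above (boost, N, n, freq, k1, b, dl, avgdl) into the two formulas quoted in the description fields. It is only a re-computation for illustration; Elasticsearch works in single-precision floats, so the last digits may differ slightly.

```python
import math

# Values copied from the _explanation output above.
boost = 2.2
N, n = 1, 1                      # documents with the field / documents containing the term
freq, k1, b = 1.0, 1.2, 0.75
dl, avgdl = 1.0, 1.0

# idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)); log is the natural logarithm.
idf = math.log(1 + (N - n + 0.5) / (n + 0.5))

# tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)).
tf = freq / (freq + k1 * (1 - b + b * dl / avgdl))

# score, computed as boost * idf * tf.
score = boost * idf * tf
print(idf, tf, score)
```

Because boost * tf is 2.2 * (1 / 2.2) = 1 in this single-document index, the final score comes out equal to the idf value, which is why 0.2876821 appears twice in the output.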
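We can use a boosting query to adjust how the score is calculated. It takes a positive query (documents must match it), a negative query (documents that also match it have their score reduced), and negative_boost, a factor between 0 and 1 that is applied to the score of those documents. A query object can hold only one top-level query clause, so the term query goes inside positive. FIELD and VALUE are placeholders to replace with a real field and value:
GET term_test/_search
{
  "explain": true,
  "query": {
    "boosting": {
      "positive": {
        "term": {
          "hometown.keyword": {
            "value": "China-Henan-Anyang"
          }
        }
      },
      "negative": {
        "term": {
          "FIELD": {
            "value": "VALUE"
          }
        }
      },
      "negative_boost": 0.2
    }
  }
}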
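The effect of a boosting query on a single document can be sketched in a few lines of Python. The function name boosting_score and the sample scores are hypothetical; the rule itself follows the description above: a document must match positive to be returned, and if it also matches negative its score is multiplied by negative_boost.

```python
def boosting_score(positive_score, matches_negative, negative_boost=0.2):
    """Return the final score of one document, or None if it is not a hit."""
    if positive_score is None:
        return None  # no match on the positive query: not part of the result
    if matches_negative:
        return positive_score * negative_boost  # demoted, but still returned
    return positive_score

kept = boosting_score(0.2876821, matches_negative=False)
demoted = boosting_score(1.0, matches_negative=True)
excluded = boosting_score(None, matches_negative=True)
print(kept, demoted, excluded)
```

The point of the design is that the negative query demotes documents instead of filtering them out, which is what separates boosting from a must_not clause.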
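FunctionScoreQuery
Purpose: after the query has run, the score of every matching document is recalculated, and the results are sorted by the new score.
Several scoring functions are built in:
- Weight: sets a simple, non-normalized weight for each document.
- FieldValueFactor: uses the value of a field as a scoring factor.
- RandomScore: produces a different random scoring result for each user.
- Decay functions: score by distance from a reference value of a field; the closer the value, the higher the score.
- ScriptScore: a custom script.
Test data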
PUT /blogs/_doc/1
{
  "title": "blog 1",
  "content": "content1",
  "votes": 10000000
}
PUT /blogs/_doc/2
{
  "title": "blog 2",
  "content": "content2",
  "votes": 0
}
PUT /blogs/_doc/3
{
  "title": "blog 3",
  "content": "content3",
  "votes": 1000
}
FieldValueFactor
FieldValueFactor is used as follows. By default it simply multiplies the original score by the value of the specified field:
POST /blogs/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "blog",
          "fields": ["title", "content"]
        }
      },
      "field_value_factor": {
        "field": "votes"
      }
    }
  }
}
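modifier and factor can be used to smooth the curve. This also shows that the original score is not combined with the raw field value directly: the field value is first transformed (here into log10(1 + 0.1 * votes)), and the transformed value is then combined with the original score.
POST /blogs/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "blog",
          "fields": ["title", "content"]
        }
      },
      "field_value_factor": {
        "field": "votes",
        "modifier": "log1p",
        "factor": 0.1
      }
    }
  }
}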
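To see how much log1p flattens the votes values from the test data, the following Python sketch computes the function value with and without the modifier. It is an approximation for illustration only: it handles just the none and log1p modifiers, assumes factor is applied before the modifier and that log1p is the base-10 logarithm of (1 + value), and the helper name field_value_factor is chosen here to mirror the request, not taken from a client library.

```python
import math

def field_value_factor(value, factor=1.0, modifier="none"):
    """Transform a field value: multiply by factor, then apply the modifier."""
    scaled = factor * value
    if modifier == "none":
        return scaled
    if modifier == "log1p":
        return math.log10(scaled + 1)  # add 1, then take the common logarithm
    raise ValueError(f"modifier not covered by this sketch: {modifier}")

votes = {"blog 1": 10000000, "blog 2": 0, "blog 3": 1000}
for title, v in votes.items():
    raw = field_value_factor(v)
    smoothed = field_value_factor(v, factor=0.1, modifier="log1p")
    print(title, raw, round(smoothed, 4))
```

Without a modifier, blog 1 outweighs blog 3 by a factor of 10,000; with log1p and factor 0.1 the two values are roughly 6 and 2. A document with 0 votes gets a function value of 0 in both cases, so under the default multiply mode its final score is 0.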
boost_mode changes how the original score is combined with the transformed field value (the default is multiply), and max_boost caps the transformed field value:
POST /blogs/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "blog",
          "fields": ["title", "content"]
        }
      },
      "field_value_factor": {
        "field": "votes",
        "modifier": "log1p",
        "factor": 0.1
      },
      "boost_mode": "max",
      "max_boost": 5
    }
  }
}
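The interplay of boost_mode and max_boost can be sketched as a small Python function. The query score of 0.13 is a made-up number for illustration, and combine is a hypothetical helper; the sketch assumes the function value is capped by max_boost first and then combined with the query score according to boost_mode.

```python
def combine(query_score, function_value, boost_mode="multiply", max_boost=float("inf")):
    """Cap the function value at max_boost, then combine it with the query score."""
    capped = min(function_value, max_boost)
    if boost_mode == "multiply":
        return query_score * capped
    if boost_mode == "replace":
        return capped
    if boost_mode == "sum":
        return query_score + capped
    if boost_mode == "avg":
        return (query_score + capped) / 2
    if boost_mode == "max":
        return max(query_score, capped)
    if boost_mode == "min":
        return min(query_score, capped)
    raise ValueError(f"unknown boost_mode: {boost_mode}")

query_score = 0.13     # hypothetical score of the multi_match query
function_value = 6.0   # roughly log10(1 + 0.1 * 10000000) for blog 1

capped_max = combine(query_score, function_value, "max", max_boost=5)
default_product = combine(query_score, function_value)
print(capped_max, default_product)
```

With the settings from the request above, blog 1 ends up with the capped value 5 rather than 6, because the function value is limited before the max of the two numbers is taken.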
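The consistent random function, random_score, is used as follows; supplying a seed is enough:
POST blogs/_search
{
  "query": {
    "function_score": {
      "random_score": {
        "seed": 314159265361
      }
    }
  }
}
Different seeds may produce different orderings.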
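The idea of a consistent random ordering can be illustrated with a small Python sketch. It is not the algorithm Elasticsearch uses internally: hashing the seed together with the document id via SHA-256 is an assumption made here purely to show the property that the same seed always yields the same ordering.

```python
import hashlib

def seeded_score(seed, doc_id):
    """Derive a pseudo-random score in [0, 1) from the seed and the document id."""
    digest = hashlib.sha256(f"{seed}:{doc_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def ranking(seed, doc_ids):
    """Order the documents by their seeded score, highest first."""
    return sorted(doc_ids, key=lambda d: seeded_score(seed, d), reverse=True)

doc_ids = ["1", "2", "3"]
first = ranking(314159265361, doc_ids)
second = ranking(314159265361, doc_ids)   # same seed: same order
other = ranking(42, doc_ids)              # another seed: possibly another order
print(first, second, other)
```

A typical use is to derive the seed from a user or session identifier, so that each user sees a stable ordering across page loads while different users see different ones.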
Feel free to share; when reposting, please credit the source: 内存溢出