
我很好奇。众所周知,好奇心以杀死猫而闻名。
那么,给cat-skinning最快的方法是什么?本次测试的cat-skinning环境:
- Debian Squeeze 上的PostgreSQL 9.0具有不错的 RAM 和设置。
- 6.000 名学生,24.000 名俱乐部会员(从具有现实生活数据的类似数据库中复制的数据。)
- 稍微偏离了问题中的命名模式:
student.id
isstudent.stud_id
和club.id
isclub.club_id
here。 - 我在这个线程中以其作者的名字命名了这些查询。
- 我运行了几次所有查询来填充缓存,然后我选择了 5 个中最好的
EXPLAIN ANALYZE
. - 相关指标(应该是最佳的——只要我们不知道哪些俱乐部会被查询):
ALTER TABLE student ADD ConSTRAINT student_pkey PRIMARY KEY(stud_id );ALTER TABLE student_club ADD ConSTRAINT sc_pkey PRIMARY KEY(stud_id, club_id);ALTER TABLE club ADD ConSTRAINT club_pkey PRIMARY KEY(club_id );CREATE INDEX sc_club_id_idx ON student_club (club_id);
club_pkey此处的大多数查询不需要。
主键在 PostgreSQL 中自动实现唯一索引。
最后一个索引是为了弥补PostgreSQL上多列索引的这个众所周知的缺点:
结果多列 B 树索引可与涉及索引列的任何子集的查询条件一起使用,但当对前导(最左侧)列有约束时,索引最有效。
总运行时间来自
EXPLAIN ANALYZE.1) Martin 2: 44.594 ms
2) Erwin 1: 33.217 msSELECt s.stud_id, s.nameFROM student sJOIN student_club sc USING (stud_id)WHERe sc.club_id IN (30, 50)GROUP BY 1,2HAVINg COUNT(*) > 1;
3) Martin 1: 31.735 msSELECt s.stud_id, s.nameFROM student sJOIN ( SELECt stud_id FROM student_club WHERe club_id IN (30, 50) GROUP BY 1 HAVINg COUNT(*) > 1 ) sc USING (stud_id);
4) Derek: 2.287 msSELECt s.stud_id, s.nameFROM student sWHERe student_id IN ( SELECt student_id FROM student_club WHERe club_id = 30 INTERSECT SELECt stud_id FROM student_club WHERe club_id = 50 );
5) Erwin 2: 2.181 msSELECt s.stud_id, s.nameFROM student sWHERe s.stud_id IN (SELECt stud_id FROM student_club WHERe club_id = 30)AND s.stud_id IN (SELECt stud_id FROM student_club WHERe club_id = 50);
6) Sean: 2.043 msSELECt s.stud_id, s.nameFROM student sWHERe EXISTS (SELECt * FROM student_club WHERe stud_id = s.stud_id AND club_id = 30)AND EXISTS (SELECt * FROM student_club WHERe stud_id = s.stud_id AND club_id = 50);
SELECt s.stud_id, s.nameFROM student sJOIN student_club x ON s.stud_id = x.stud_idJOIN student_club y ON s.stud_id = y.stud_idWHERe x.club_id = 30AND y.club_id = 50;
最后三个表现几乎相同。4) 和 5) 导致相同的查询计划。
后期添加花式SQL,但性能跟不上:
7) ypercube 1: 148.649 ms8) ypercube 2: 147.497 msSELECt s.stud_id, s.nameFROM student AS sWHERe NOT EXISTS ( SELECt * FROM club AS c WHERe c.club_id IN (30, 50) AND NOT EXISTS ( SELECt * FROM student_club AS sc WHERe sc.stud_id = s.stud_id AND sc.club_id = c.club_id ) );
SELECt s.stud_id, s.nameFROM student AS sWHERe NOT EXISTS ( SELECt * FROM ( SELECt 30 AS club_id UNIOn ALL SELECT 50 ) AS c WHERe NOT EXISTS ( SELECT * FROM student_club AS sc WHERe sc.stud_id = s.stud_id AND sc.club_id = c.club_id ) );
正如预期的那样,这两者的表现几乎相同。查询计划导致表扫描,计划器在这里找不到使用索引的方法。
9) Wildplasser 1: 49.849 msWITH RECURSIVE two AS ( SELECt 1::int AS level , stud_id FROM student_club sc1 WHERe sc1.club_id = 30 UNIOn SELECt two.level + 1 AS level , sc2.stud_id FROM student_club sc2 JOIN two USING (stud_id) WHERe sc2.club_id = 50 AND two.level = 1 )SELECt s.stud_id, s.studentFROM student sJOIN two USING (studid)WHERe two.level > 1;
花式 SQL,CTE 性能不错。非常奇特的查询计划。
10) Wildplasser 2: 36.986 msWITH sc AS ( SELECt stud_id FROM student_club WHERe club_id IN (30,50) GROUP BY stud_id HAVINg COUNT(*) > 1 )SELECt s.*FROM student sJOIN sc USING (stud_id);
查询 2) 的 CTE 变体。令人惊讶的是,对于完全相同的数据,它可能会导致略有不同的查询计划。我在 上找到了一个顺序扫描
student,其中子查询变体使用了索引。11) ypercube 3: 101.482 ms
另一个后期添加 ypercube。真是太神奇了,有多少种方法。
12) erwin 3: 2.377 msSELECt s.stud_id, s.studentFROM student sJOIN student_club sc USING (stud_id)WHERe sc.club_id = 10 -- member in 1st club ...AND NOT EXISTS ( SELECt * FROM (SELECt 14 AS club_id) AS c -- can't be excluded for missing the 2nd WHERe NOT EXISTS ( SELECT * FROM student_club AS d WHERe d.stud_id = sc.stud_id AND d.club_id = c.club_id ) );
ypercube 的 11) 实际上只是这个更简单变体的令人费解的反向方法,它也仍然缺失。执行速度几乎与顶级猫科动物一样快。
13) erwin 4: 2.375 msSELECt s.*FROM student sJOIN student_club x USING (stud_id)WHERe sc.club_id = 10 -- member in 1st club ...AND EXISTS ( -- ... and membership in 2nd exists SELECt * FROM student_club AS y WHERe y.stud_id = s.stud_id AND y.club_id = 14 );
难以置信,但这是另一个真正的新变体。我看到了两个以上会员资格的潜力,但它也仅凭两个就跻身顶级among 之列。
俱乐部会员动态数量SELECt s.*FROM student AS sWHERe EXISTS ( SELECt * FROM student_club AS x JOIN student_club AS y USING (stud_id) WHERe x.stud_id = s.stud_id AND x.club_id = 14 AND y.club_id = 10 );
换句话说:不同数量的过滤器。这个问题要求恰好有两个俱乐部会员资格。但是许多用例必须为不同的数量做准备。
欢迎分享,转载请注明来源:内存溢出
微信扫一扫
支付宝扫一扫
评论列表(0条)