postgresql - Postgres hierarchical (jsonb) CTE unnecessarily slow -
i have jsonb column in table holds hierarchical information.
mytable (id uuid, indexes jsonb, content bytea)
now if create cte say
with recursive hierarchy(pid, id, content) ( --load first parents select t.indexes ->> 'parentid' pid, t.id, t.content mytable c join mytable t on t.indexes ->> 'id' = c.indexes ->> 'parentid' c.id = any('{..some uuids}') union select t.indexes ->> 'parentid' pid, t.id, t.content hierarchy h, mytable t t.indexes ->> 'id' = h.pid ) select id, content hierarchy
now example run of building parent tree 2 nodes within table of 300k records takes approximately 10s.
now if create index
create index myindex on mytable using btree ((indexes ->> 'id')
this reduces time 4.5s. produces analysis of
-> recursive union (cost=23.81..4528423.71 rows=80794929 width=1219) (actual time=0.188..1802.636 rows=5 loops=1) -> nested loop (cost=23.81..3150.15 rows=899 width=1219) (actual time=0.132..0.133 rows=1 loops=1) output: (t.indexes ->> 'parentid'::text), t.id, t.content -> index scan using "mytable_pkey" on "test"."mytable" c (cost=0.42..8.44 rows=1 width=123) (actual time=0.053..0.053 rows=1 loops=1) output: c.id, c.content, c.indexes index cond: (c.id = ('{1c725f08-0324-41e9-b417-5ec885fb1cc9}'::uuid[])) -> bitmap heap scan on "test"."mytable" t (cost=23.39..3130.48 rows=899 width=1219) (actual time=0.066..0.066 rows=1 loops=1) output: t.id, t.content, t.indexes recheck cond: (((t.indexes ->> 'id'::text) = (c.indexes ->> 'parentid'::text))) heap blocks: exact=1 -> bitmap index scan on "myindex" (cost=0.00..23.17 rows=899 width=0) (actual time=0.055..0.055 rows=1 loops=1) index cond: ((t.indexes ->> 'id'::text) = (c.indexes ->> 'parentid'::text)) //union part -> merge join (cost=770.60..290937.50 rows=8079403 width=1219) (actual time=360.467..360.476 rows=1 loops=5) output: (t_1.indexes ->> 'parentid'::text), t_1.id, t_1.content merge cond: ((t_1.indexes ->> 'id'::text) = h.pid) -> index scan using "myindex" on "test"."mytable" t_1 (cost=0.42..127680.55 rows=179742 width=1219) (actual time=0.019..288.168 rows=60478 loops=5) output: t_1.id, t_1.sourceid, t_1.content, t_1.indexes -> sort (cost=770.18..792.65 rows=8990 width=32) (actual time=0.010..0.011 rows=1 loops=5) output: h.pid sort key: h.pid sort method: quicksort memory: 25kb -> worktable scan on hierarchy h (cost=0.00..179.80 rows=8990 width=32) (actual time=0.001..0.001 rows=1 loops=5) output: h.pid
now can massive speed impovements replacing indexes ->> 'parentid' function in cte, , creating index on function.
create function "test"."myfunction"(idarg uuid) returns text $body$ select t.indexes ->> 'id' result "test"."mytable" t t.id = idarg $body$ language sql immutable;
with index
create index myfunctionindex on mytable using btree (myfunction(id))
this takes 0.01s execute query analysis
-> recursive union (cost=23.81..5333205.06 rows=80794929 width=1219) (actual time=0.163..0.291 rows=5 loops=1) -> nested loop (cost=23.81..3372.65 rows=899 width=1219) (actual time=0.082..0.084 rows=1 loops=1) output: (t.indexes ->> 'parentid'::text), t.id, t.content, t.modified -> index scan using "mytable_pkey" on "test"."mytable" c (cost=0.42..8.44 rows=1 width=123) (actual time=0.019..0.019 rows=1 loops=1) output: c.id, c.sourceid, c.viewid, c.content, c.indexes, c.statekey, c.modified index cond: (c.id = ('{1c725f08-0324-41e9-b417-5ec885fb1cc9}'::uuid[])) -> bitmap heap scan on "test"."mytable" t (cost=23.39..3352.98 rows=899 width=1219) (actual time=0.037..0.037 rows=1 loops=1) output: t.id, t.content, t.indexes recheck cond: (("test"."myfunction"(t.id) = (c.indexes ->> 'parentid'::text))) heap blocks: exact=1 -> bitmap index scan on "myfunctionindex" (cost=0.00..23.17 rows=899 width=0) (actual time=0.025..0.025 rows=1 loops=1) index cond: ("test"."myfunction"(t.id) = (c.indexes ->> 'parentid'::text)) //union part -> nested loop (cost=0.42..371393.38 rows=8079403 width=1219) (actual time=0.012..0.013 rows=1 loops=5) output: (t_1.indexes ->> 'parentid'::text), t_1.id, t_1.content -> worktable scan on hierarchy h (cost=0.00..179.80 rows=8990 width=32) (actual time=0.000..0.000 rows=1 loops=5) output: h.pid, h.id, h.content -> index scan using "myfunctionindex" on "test"."mytable" t_1 (cost=0.42..30.06 rows=899 width=1219) (actual time=0.010..0.010 rows=1 loops=5) output: t_1.id, t_1.content, t_1.indexes index cond: ("test"."myfunction"(t_1.id) = h.pid)
so why cant index run fast functionindex?
there seems superfluous sort in there. , reason dont want use function index is immutable index wont auto-update after insert/update/delete.
ps not looking schema change suggestions.
looks gin indexes behave well. if create gin index on indexes column, , change join
on t.indexes @> jsonb_build_object('id', c.indexes -> 'parentid')
and to
where t.indexes @> jsonb_build_object('id', h.pid)
its not fast purely function index, least update dynamically, , execution plan doesnt have unnecessary sort
further performance improvements can made adding gin index flag jsonb_path_ops
Comments
Post a Comment