postgresql - Postgres hierarchical (jsonb) CTE unnecessarily slow -

i have jsonb column in table holds hierarchical information.

mytable (id uuid, indexes jsonb, content bytea)

now if create cte say

with recursive hierarchy(pid, id, content) (   --load first parents   select t.indexes ->> 'parentid' pid, t.id, t.content mytable c   join mytable t on t.indexes ->> 'id' = c.indexes ->> 'parentid'    c.id = any('{..some uuids}')   union   select t.indexes ->> 'parentid' pid, t.id, t.content   hierarchy h, mytable t   t.indexes ->> 'id' = h.pid ) select id, content hierarchy

now example run of building parent tree 2 nodes within table of 300k records takes approximately 10s.

now if create index

create index myindex on mytable using btree ((indexes ->> 'id')

this reduces time 4.5s. produces analysis of

    ->  recursive union  (cost=23.81..4528423.71 rows=80794929 width=1219) (actual time=0.188..1802.636 rows=5 loops=1)           ->  nested loop  (cost=23.81..3150.15 rows=899 width=1219) (actual time=0.132..0.133 rows=1 loops=1)                 output: (t.indexes ->> 'parentid'::text), t.id, t.content                 ->  index scan using "mytable_pkey" on "test"."mytable" c  (cost=0.42..8.44 rows=1 width=123) (actual time=0.053..0.053 rows=1 loops=1)                       output: c.id, c.content, c.indexes                       index cond: (c.id = ('{1c725f08-0324-41e9-b417-5ec885fb1cc9}'::uuid[]))                 ->  bitmap heap scan on "test"."mytable" t  (cost=23.39..3130.48 rows=899 width=1219) (actual time=0.066..0.066 rows=1 loops=1)                       output: t.id, t.content, t.indexes                       recheck cond: (((t.indexes ->> 'id'::text) = (c.indexes ->> 'parentid'::text)))                       heap blocks: exact=1                       ->  bitmap index scan on "myindex"  (cost=0.00..23.17 rows=899 width=0) (actual time=0.055..0.055 rows=1 loops=1)                             index cond: ((t.indexes ->> 'id'::text) = (c.indexes ->> 'parentid'::text))  //union part           ->  merge join  (cost=770.60..290937.50 rows=8079403 width=1219) (actual time=360.467..360.476 rows=1 loops=5)                 output: (t_1.indexes ->> 'parentid'::text), t_1.id, t_1.content                 merge cond: ((t_1.indexes ->> 'id'::text) = h.pid)                 ->  index scan using "myindex" on "test"."mytable" t_1  (cost=0.42..127680.55 rows=179742 width=1219) (actual time=0.019..288.168 rows=60478 loops=5)                       output: t_1.id, t_1.sourceid, t_1.content, t_1.indexes                 ->  sort  (cost=770.18..792.65 rows=8990 width=32) (actual time=0.010..0.011 rows=1 loops=5)                       output: h.pid                       sort key: h.pid                       sort method: quicksort  memory: 25kb                       ->  worktable scan on hierarchy h  (cost=0.00..179.80 rows=8990 width=32) (actual time=0.001..0.001 rows=1 loops=5)                             output: h.pid

now can massive speed impovements replacing indexes ->> 'parentid' function in cte, , creating index on function.

create function "test"."myfunction"(idarg uuid)   returns text $body$  select t.indexes ->> 'id' result "test"."mytable" t  t.id = idarg  $body$ language sql immutable;

with index

create index myfunctionindex on mytable using btree (myfunction(id))

this takes 0.01s execute query analysis

->  recursive union  (cost=23.81..5333205.06 rows=80794929 width=1219) (actual time=0.163..0.291 rows=5 loops=1)       ->  nested loop  (cost=23.81..3372.65 rows=899 width=1219) (actual time=0.082..0.084 rows=1 loops=1)             output: (t.indexes ->> 'parentid'::text), t.id, t.content, t.modified             ->  index scan using "mytable_pkey" on "test"."mytable" c  (cost=0.42..8.44 rows=1 width=123) (actual time=0.019..0.019 rows=1 loops=1)                   output: c.id, c.sourceid, c.viewid, c.content, c.indexes, c.statekey, c.modified                   index cond: (c.id = ('{1c725f08-0324-41e9-b417-5ec885fb1cc9}'::uuid[]))             ->  bitmap heap scan on "test"."mytable" t  (cost=23.39..3352.98 rows=899 width=1219) (actual time=0.037..0.037 rows=1 loops=1)                   output: t.id, t.content, t.indexes                   recheck cond: (("test"."myfunction"(t.id) = (c.indexes ->> 'parentid'::text)))                   heap blocks: exact=1                   ->  bitmap index scan on "myfunctionindex"  (cost=0.00..23.17 rows=899 width=0) (actual time=0.025..0.025 rows=1 loops=1)                         index cond: ("test"."myfunction"(t.id) = (c.indexes ->> 'parentid'::text))  //union part           ->  nested loop  (cost=0.42..371393.38 rows=8079403 width=1219) (actual time=0.012..0.013 rows=1 loops=5)                 output: (t_1.indexes ->> 'parentid'::text), t_1.id, t_1.content                 ->  worktable scan on hierarchy h  (cost=0.00..179.80 rows=8990 width=32) (actual time=0.000..0.000 rows=1 loops=5)                       output: h.pid, h.id, h.content                 ->  index scan using "myfunctionindex" on "test"."mytable" t_1  (cost=0.42..30.06 rows=899 width=1219) (actual time=0.010..0.010 rows=1 loops=5)                       output: t_1.id, t_1.content, t_1.indexes                       index cond: ("test"."myfunction"(t_1.id) = h.pid)

so why cant index run fast functionindex?
there seems superfluous sort in there. , reason dont want use function index is immutable index wont auto-update after insert/update/delete.

ps not looking schema change suggestions.

looks gin indexes behave well. if create gin index on indexes column, , change join

on t.indexes @> jsonb_build_object('id', c.indexes -> 'parentid')

and to

where t.indexes @> jsonb_build_object('id', h.pid)

its not fast purely function index, least update dynamically, , execution plan doesnt have unnecessary sort

further performance improvements can made adding gin index flag jsonb_path_ops

Search This Blog

Stadnd

postgresql - Postgres hierarchical (jsonb) CTE unnecessarily slow -

Comments

Post a Comment

Popular posts from this blog

Capture and play voice with Asterisk ARI -

python - Statsmodels.api Logit model error ValueError: endog must be in the unit interval -

python - How to use elasticsearch.helpers.streaming_bulk -