My application is running over Postgresql 9.4 and I found a problem and I don’t
know how can I fix it.
I’m runing VoIp application, and we have a few tables that are too long (>60M of
rows) with all call details, I’m trying to make some reports over it, but some
queries are too slow and we don’t know how can I do faster.
For example, one customer maybe have for a month 700K calls per month, and I
want to make some grouping around it, get the total cost, filter by json fields,
etc..
For that we use index, but in this case index are heavy, as I said a lot of
records that are similar (account_name) but with large index.
So the system that I use for billing has the following tables:
CREATE TABLE cdrs_primary (
id SERIAL PRIMARY KEY,
cgrid CHAR(40) NOT NULL,
tor VARCHAR(16) NOT NULL,
accid VARCHAR(64) NOT NULL,
cdrhost VARCHAR(64) NOT NULL,
cdrsource VARCHAR(64) NOT NULL,
reqtype VARCHAR(24) NOT NULL,
direction VARCHAR(8) NOT NULL,
tenant VARCHAR(64) NOT NULL,
category VARCHAR(32) NOT NULL,
account VARCHAR(128) NOT NULL,
subject VARCHAR(128) NOT NULL,
destination VARCHAR(128) NOT NULL,
setup_time TIMESTAMP NOT NULL,
pdd NUMERIC(12,9) NOT NULL,
answer_time TIMESTAMP NOT NULL,
usage NUMERIC(30,9) NOT NULL,
supplier VARCHAR(128) NOT NULL,
disconnect_cause VARCHAR(64) NOT NULL,
created_at TIMESTAMP,
deleted_at TIMESTAMP,
UNIQUE (cgrid)
);
CREATE INDEX answer_time_idx ON cdrs_primary (answer_time);
CREATE INDEX deleted_at_cp_idx ON cdrs_primary (deleted_at);
CREATE TABLE cdrs_extra (
id SERIAL PRIMARY KEY,
cgrid CHAR(40) NOT NULL,
extra_fields jsonb NOT NULL,
created_at TIMESTAMP,
deleted_at TIMESTAMP,
UNIQUE (cgrid)
);
CREATE INDEX deleted_at_ce_idx ON cdrs_extra (deleted_at);
CREATE TABLE cost_details (
id SERIAL PRIMARY KEY,
cgrid CHAR(40) NOT NULL,
runid VARCHAR(64) NOT NULL,
tor VARCHAR(16) NOT NULL,
direction VARCHAR(8) NOT NULL,
tenant VARCHAR(128) NOT NULL,
category VARCHAR(32) NOT NULL,
account VARCHAR(128) NOT NULL,
subject VARCHAR(128) NOT NULL,
destination VARCHAR(128) NOT NULL,
cost NUMERIC(20,4) NOT NULL,
timespans jsonb,
cost_source VARCHAR(64) NOT NULL,
created_at TIMESTAMP,
updated_at TIMESTAMP,
deleted_at TIMESTAMP,
UNIQUE (cgrid, runid)
);
CREATE INDEX deleted_at_cd_idx ON cost_details (deleted_at);
CREATE TABLE rated_cdrs (
id SERIAL PRIMARY KEY,
cgrid CHAR(40) NOT NULL,
runid VARCHAR(64) NOT NULL,
reqtype VARCHAR(24) NOT NULL,
direction VARCHAR(8) NOT NULL,
tenant VARCHAR(64) NOT NULL,
category VARCHAR(32) NOT NULL,
account VARCHAR(128) NOT NULL,
subject VARCHAR(128) NOT NULL,
destination VARCHAR(128) NOT NULL,
setup_time TIMESTAMP NOT NULL,
pdd NUMERIC(12,9) NOT NULL,
answer_time TIMESTAMP NOT NULL,
usage NUMERIC(30,9) NOT NULL,
supplier VARCHAR(128) NOT NULL,
disconnect_cause VARCHAR(64) NOT NULL,
cost NUMERIC(20,4) DEFAULT NULL,
extra_info text,
created_at TIMESTAMP,
updated_at TIMESTAMP,
deleted_at TIMESTAMP,
UNIQUE (cgrid, runid)
);
CREATE INDEX deleted_at_rc_idx ON rated_cdrs (deleted_at);
In this case account it’s customer name or id. My idea is to create a query where I can get the following information:
- cdrs_primary.account
- count(cdrs_primary.cdrhost)
- sum(rated_cdrs.cost)
- sum(rated_cdrs.usage)
- direction
And this need to be filter by the following:
- cdrs_extra.extra_fields['connection']
- cost_details.timespans[0]['MatchedDestId']
- answer_time
My idea it’s to use the following query:
select
p.account,
p.direction,
date_trunc('day', r.answer_time),
count(p.cdrhost) as host,
sum(r.cost) as cost,
sum(r.usage)/60 as minutes
from cdrs_primary p
join rated_cdrs r on p.cgrid = r.cgrid
join cost_details c on (r.cgrid=c.cgrid and r.runid=c.runid)
join cdrs_extra e on (e.cgrid=r.cgrid)
where
r.setup_time > '2015-08-15'
and e.extra_fields->>'connection' = '1'
and c.timespans->0->>'MatchedDestId' IN ('UKN_FM1', 'UKN_FM2')
and p.account='eloy'
group by
p.account, p.direction, date_trunc('day', r.answer_time)
Here you have the explain:
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
HashAggregate (cost=402230.95..402230.99 rows=2 width=40) (actual time=23854.251..23854.251 rows=0 loops=1)
Group Key: p.account, p.direction, date_trunc('day'::text, r.answer_time)
Buffers: shared hit=24645 read=151172
-> Nested Loop (cost=317806.74..402230.92 rows=2 width=40) (actual time=23854.247..23854.247 rows=0 loops=1)
Join Filter: (p.cgrid = c.cgrid)
Buffers: shared hit=24645 read=151172
-> Nested Loop (cost=317806.18..395160.27 rows=249 width=170) (actual time=23854.246..23854.246 rows=0 loops=1)
Buffers: shared hit=24645 read=151172
-> Merge Join (cost=317805.62..320484.41 rows=2643 width=106) (actual time=23854.244..23854.244 rows=0 loops=1)
Merge Cond: (p.cgrid = e.cgrid)
Buffers: shared hit=24645 read=151172
-> Sort (cost=196879.28..198155.65 rows=510548 width=65) (actual time=19980.899..20121.328 rows=518651 loops=1)
Sort Key: p.cgrid
Sort Method: quicksort Memory: 85224kB
Buffers: shared hit=1 read=116096
-> Bitmap Heap Scan on cdrs_primary p (cost=11509.18..148475.03 rows=510548 width=65) (actual time=319.521..15620.892 rows=518651 loops=1)
Recheck Cond: ((account)::text = 'eloy'::text)
Heap Blocks: exact=114107
Buffers: shared hit=1 read=116096
-> Bitmap Index Scan on cdrs_primary_account (cost=0.00..11381.54 rows=510548 width=0) (actual time=270.386..270.386 rows=518651 loops=1)
Index Cond: ((account)::text = 'eloy'::text)
Buffers: shared read=1990
-> Sort (cost=120926.33..120976.15 rows=19928 width=41) (actual time=3247.967..3284.225 rows=144073 loops=1)
Sort Key: e.cgrid
Sort Method: quicksort Memory: 17400kB
Buffers: shared hit=24644 read=35076
-> Seq Scan on cdrs_extra e (cost=0.00..119503.22 rows=19928 width=41) (actual time=2275.996..2358.434 rows=144073 loops=1)
Filter: ((extra_fields ->> 'connection'::text) = '1'::text)
Rows Removed by Filter: 3841475
Buffers: shared hit=24644 read=35076
-> Index Scan using rated_cdrs_runid_cgrid_idx on rated_cdrs r (cost=0.56..28.24 rows=1 width=64) (never executed)
Index Cond: (cgrid = p.cgrid)
Filter: (setup_time > '2015-08-15 00:00:00'::timestamp without time zone)
-> Index Scan using cost_details_cgrid_runid on cost_details c (cost=0.56..28.38 rows=1 width=48) (never executed)
Index Cond: ((cgrid = r.cgrid) AND ((runid)::text = (r.runid)::text))
Filter: (((timespans -> 0) ->> 'MatchedDestId'::text) = ANY ('{UKN_FM1,UKN_FM2}'::text[]))
Planning time: 4.432 ms
Execution time: 23863.701 ms
I added the following index, but I think that didn’t works at all, query it’s
still slow.
cdrs_extra
"cdrs_extra_connection_idx" gin ((extra_fields -> 'connection'::text))
rated_cdrs:
"rated_cdrs_pkey" PRIMARY KEY, btree (id)
"cost_idx" btree (setup_time)
"deleted_at_rc_idx" btree (deleted_at)
"rated_cdrs_runid_cgrid_idx" btree (cgrid, runid)
"setup_time_idx" btree (setup_time DESC)
cost_details:
"cost_details_cgrid_runid" btree (cgrid, runid)
"cost_details_destination_key_idx" gin (((timespans -> 0) -> 'MatchDestId'::text))
The size of the index are the following:
tablename │ indexname │ num_rows │ table_size │ index_size
──────────────┼──────────────────────────────────┼─────────────┼────────────┼────────────
cdrs_extra │ cdrs_extra_connection_idx │ 3.98555e+06 │ 467 MB │ 4232 kB
cdrs_extra │ cdrs_extra_pkey │ 3.98555e+06 │ 467 MB │ 88 MB
cdrs_extra │ deleted_at_ce_idx │ 3.98555e+06 │ 467 MB │ 159 MB
cdrs_primary │ answer_time_idx │ 3.98555e+06 │ 1020 MB │ 154 MB
cdrs_primary │ cdrs_primary_account │ 3.98555e+06 │ 1020 MB │ 115 MB
cdrs_primary │ cdrs_primary_pkey │ 3.98555e+06 │ 1020 MB │ 85 MB
cdrs_primary │ deleted_at_cp_idx │ 3.98555e+06 │ 1020 MB │ 85 MB
cost_details │ cost_details_cgrid_runid │ 6.80088e+06 │ 7320 MB │ 471 MB
cost_details │ cost_details_destination_key_idx │ 6.80088e+06 │ 7320 MB │ 8056 kB
cost_details │ cost_details_pkey │ 6.80088e+06 │ 7320 MB │ 146 MB
cost_details │ deleted_at_cd_idx │ 6.80088e+06 │ 7320 MB │ 146 MB
rated_cdrs │ cost_idx │ 7.74959e+06 │ 1595 MB │ 166 MB
rated_cdrs │ deleted_at_rc_idx │ 7.74959e+06 │ 1595 MB │ 166 MB
rated_cdrs │ rated_cdrs_pkey │ 7.74959e+06 │ 1595 MB │ 166 MB
rated_cdrs │ rated_cdrs_runid_cgrid_idx │ 7.74959e+06 │ 1595 MB │ 537 MB
rated_cdrs │ setup_time_idx │ 7.74959e+06 │ 1595 MB │ 166 MB
Any idea to speed up this query?
Thanks