PostgreSQL: join between two large tables takes very long
I have two rather large tables and I need to do a date-range join between them. Unfortunately the query takes over 12 hours. I am using PostgreSQL 10.5 running in Docker with a maximum of 5 GB of RAM and up to 12 CPU cores available.
Basically, the left table has an equipment ID and a list of date ranges (from = Timestamp, to = ValidUntil). I then want to join the right table, which holds measurements (sensor data) for all of the equipment, so that I only get the sensor data that lies within one of the date ranges from the left table. Query:
select
A.*,
B."Timestamp" as "PressureTimestamp",
B."PropertyValue" as "Pressure"
from A
inner join B
on B."EquipmentId" = A."EquipmentId"
and B."Timestamp" >= A."Timestamp"
and B."Timestamp" < A."ValidUntil"
Unfortunately this query only utilizes one core, which might be the reason it is running so slowly. Is there a way to rewrite the query so it can be parallelized?
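As an aside (not part of the original question): whether the planner is allowed to produce a parallel plan at all depends on a few settings. A quick session-level check could look like the sketch below; the parameter names are standard PostgreSQL 10 settings, the value 4 is only an example, and a plain EXPLAIN is enough to see whether a Gather node gets planned.
-- Check the parallel-query limits, then raise them for this session only
SHOW max_parallel_workers_per_gather;   -- 0 disables parallel plans entirely
SHOW max_parallel_workers;
SHOW max_worker_processes;
SET max_parallel_workers_per_gather = 4;
-- Look for a Gather node with "Workers Planned" in the resulting plan
EXPLAIN
select A.*, B."Timestamp" as "PressureTimestamp", B."PropertyValue" as "Pressure"
from A
inner join B
  on  B."EquipmentId" = A."EquipmentId"
  and B."Timestamp" >= A."Timestamp"
  and B."Timestamp" <  A."ValidUntil";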
Indexes:
create index if not exists A_eq_timestamp_validUntil on public.A using btree ("EquipmentId", "Timestamp", "ValidUntil");
create index if not exists B_eq_timestamp on public.B using btree ("EquipmentId", "Timestamp");
Tables:
-- contains 332,000 rows
CREATE TABLE A (
"EquipmentId" bigint,
"Timestamp" timestamp without time zone,
"ValidUntil" timestamp without time zone
)
WITH ( OIDS = FALSE )
-- contains 70,000,000 rows
CREATE TABLE B
(
"EquipmentId" bigint,
"Timestamp" timestamp without time zone,
"PropertyValue" double precision
)
WITH ( OIDS = FALSE )
Execution plan (explain ... output):
Nested Loop (cost=176853.59..59023908.95 rows=941684055 width=48)
-> Bitmap Heap Scan on v2_pressure p (cost=176853.16..805789.35 rows=9448335 width=24)
Recheck Cond: ("EquipmentId" = 2956235)
-> Bitmap Index Scan on v2_pressure_eq (cost=0.00..174491.08 rows=9448335 width=0)
Index Cond: ("EquipmentId" = 2956235)
-> Index Scan using v2_prs_eq_timestamp_validuntil on v2_prs prs (cost=0.42..5.16 rows=100 width=32)
Index Cond: (("EquipmentId" = 2956235) AND (p."Timestamp" >= "Timestamp") AND (p."Timestamp" < "ValidUntil"))
Update 1:
Fixed the indexes according to the comments, which improved performance a lot.
Correcting the indexes is the first resort for fixing slowness, but it will only help to some extent. Given that your tables are big, I would recommend trying Postgres partitioning; it has built-in support in Postgres.
However, you need a filter/partition criterion. I don't see any WHERE clause in your query, so it is hard to suggest one; maybe you can partition on EquipmentId, as sketched below. This can also help in achieving parallelism.
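To make that concrete, here is a minimal sketch of what partitioning the measurement table by EquipmentId could look like. This is not from the original answer: the table and partition names and the range boundaries are invented, PostgreSQL 10 only offers RANGE and LIST partitioning (HASH partitioning arrives in version 11), and in version 10 indexes have to be created on each partition separately.
CREATE TABLE b_partitioned (
    "EquipmentId"   bigint,
    "Timestamp"     timestamp without time zone,
    "PropertyValue" double precision
) PARTITION BY RANGE ("EquipmentId");
-- Example partitions; pick boundaries that spread the equipment evenly.
CREATE TABLE b_part_1 PARTITION OF b_partitioned FOR VALUES FROM (0)       TO (1000000);
CREATE TABLE b_part_2 PARTITION OF b_partitioned FOR VALUES FROM (1000000) TO (2000000);
-- In PostgreSQL 10 each partition needs its own index:
CREATE INDEX ON b_part_1 ("EquipmentId", "Timestamp");
CREATE INDEX ON b_part_2 ("EquipmentId", "Timestamp");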
-- \i tmp.sql
CREATE TABLE A
( equipmentid bigint NOT NULL
, ztimestamp timestamp without time zone NOT NULL
, validuntil timestamp without time zone NOT NULL
, PRIMARY KEY (equipmentid,ztimestamp)
, UNIQUE (equipmentid,validuntil) -- must be unique, since the intervals don't overlap
) ;
-- contains 70,000,000 rows
CREATE TABLE B
( equipmentid bigint NOT NULL
, ztimestamp timestamp without time zone NOT NULL
, propertyvalue double precision
, PRIMARY KEY (equipmentid,ztimestamp)
) ;
INSERT INTO B(equipmentid,ztimestamp,propertyvalue)
SELECT i,t, random()
FROM generate_series(1,1000) i
CROSS JOIN generate_series('2018-09-01'::timestamp, '2018-09-30'::timestamp, '1 day'::interval) t
;
INSERT INTO A(equipmentid,ztimestamp,validuntil)
SELECT equipmentid,ztimestamp, ztimestamp+ '7 days'::interval
FROM B
WHERE date_part('dow', ztimestamp) =0
;
ANALYZE A;
ANALYZE B;
EXPLAIN
SELECT
A.*,
B.ztimestamp AS pressuretimestamp,
B.propertyvalue AS pressure
FROM A
INNER JOIN B
ON B.equipmentid = A.equipmentid
AND B.ztimestamp >= A.ztimestamp
AND B.ztimestamp < A.validuntil
WHERE A.equipmentid=333 -- I added this; the plan in the question also has a restriction on EquipmentId
;
And the resulting plan:
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.34..21.26 rows=17 width=40)
-> Index Scan using a_equipmentid_validuntil_key on a (cost=0.17..4.34 rows=5 width=24)
Index Cond: (equipmentid = 333)
-> Index Scan using b_pkey on b (cost=0.17..3.37 rows=3 width=24)
Index Cond: ((equipmentid = 333) AND (ztimestamp >= a.ztimestamp) AND (ztimestamp < a.validuntil))
(5 rows)
That is with my current setting of random_page_cost = 1.1 (see the snippet after the second plan below for how to reproduce this experiment).
After setting it to 4.0, I get the same plan as the OP:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=35.13..54561.69 rows=1416136 width=40) (actual time=1.391..1862.275 rows=225540 loops=1)
-> Bitmap Heap Scan on aa2 (cost=34.71..223.52 rows=1345 width=24) (actual time=1.173..5.223 rows=1345 loops=1)
Recheck Cond: (equipmentid = 5)
Heap Blocks: exact=9
-> Bitmap Index Scan on aa2_equipmentid_validuntil_key (cost=0.00..34.38 rows=1345 width=0) (actual time=1.047..1.048 rows=1345 loops=1)
Index Cond: (equipmentid = 5)
-> Index Scan using bb2_pkey on bb2 (cost=0.42..29.87 rows=1053 width=24) (actual time=0.109..0.757 rows=168 loops=1345)
Index Cond: ((equipmentid = 5) AND (ztimestamp >= aa2.ztimestamp) AND (ztimestamp < aa2.validuntil))
Planning Time: 3.167 ms
Execution Time: 2168.967 ms
(10 rows)
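For reference, a sketch of this planner experiment using the test tables built above; random_page_cost = 1.1 is the setting mentioned in this answer, and the lower value tells the planner that random I/O (index access) is cheap, as on SSDs.
SET random_page_cost = 1.1;
EXPLAIN (ANALYZE)
SELECT A.*, B.ztimestamp AS pressuretimestamp, B.propertyvalue AS pressure
FROM A
INNER JOIN B
   ON  B.equipmentid = A.equipmentid
   AND B.ztimestamp >= A.ztimestamp
   AND B.ztimestamp <  A.validuntil
WHERE A.equipmentid = 333;
RESET random_page_cost;   -- back to the configured value (default 4.0)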
Related
Should I use a single or a composite index?
What is the right way to set up an index for the following query?
SELECT t1.purchaseNumber, t1.parsing_status, t1.docPublishDate
FROM xml_files t1
LEFT JOIN xml_files t2
    ON t1.purchaseNumber = t2.purchaseNumber
   AND t1.docPublishDate < t2.docPublishDate
WHERE t1.parsing_status IS NULL
  AND t2.parsing_status IS NULL
  AND t2.docPublishDate IS NULL
  AND t1.section_name='contracts'
  AND t1.parsing_status IS NULL
  AND t1.random IN (1,2,3,4)
Should I create a composite index, or is it better to create a single index for every column used in the query? Also, since I am comparing the timestamp docPublishDate, how should I create that index? Should I use the DESC keyword?
Column types: purchaseNumber - varchar(50), parsing_status - varchar(10), random - integer, section_name - varchar(10)
EXPLAIN (ANALYZE, BUFFERS) query;:
Gather (cost=1000.86..137158.61 rows=43091 width=35) (actual time=22366.063..72674.678 rows=46518 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=99244069 read=144071
  -> Nested Loop Anti Join (cost=0.86..131849.51 rows=17955 width=35) (actual time=22309.989..72440.514 rows=15506 loops=3)
     Buffers: shared hit=99244069 read=144071
     -> Parallel Index Scan using index_for_xml_files_parsing_status on xml_files t1 (cost=0.43..42606.31 rows=26932 width=35) (actual time=0.086..193.982 rows=40725 loops=3)
        Index Cond: ((parsing_status IS NULL) AND (parsing_status IS NULL))
        Filter: (((section_name)::text = 'contracts'::text) AND (random = ANY ('{1,2,3,4}'::integer[])))
        Rows Removed by Filter: 383974
        Buffers: shared hit=15724 read=42304
     -> Index Scan using "index_for_xml_files_purchaseNumber" on xml_files t2 (cost=0.43..4.72 rows=3 width=27) (actual time=1.773..1.773 rows=1 loops=122174)
        Index Cond: (("purchaseNumber")::text = (t1."purchaseNumber")::text)
        Filter: (t1."docPublishDate" < "docPublishDate")
        Rows Removed by Filter: 6499
        Buffers: shared hit=99228345 read=101767
Planning Time: 0.396 ms
Execution Time: 72681.868 ms
How can I improve the speed of this query?
You should explain what you want the query to do. I would write the query more clearly as:
SELECT t1.purchaseNumber, t1.parsing_status, t1.docPublishDate
FROM xml_files t1
WHERE t1.section_name = 'contracts'
  AND t1.parsing_status IS NULL
  AND t1.random IN (1, 2, 3, 4)
  AND NOT EXISTS (SELECT 1
                  FROM xml_files t2
                  WHERE t1.purchaseNumber = t2.purchaseNumber
                    AND t1.docPublishDate < t2.docPublishDate
                 );
For this query, I would suggest the following indexes:
create index idx_xml_files_3 on xml_files(section_name, random) where parsing_status is null;
create index idx_xml_files_2 on xml_files(purchaseNumber, docPublishDate);
There is probably an even better way to write the query, using window functions for instance (see the sketch below). However, it is not clear what your data looks like nor what the query is intended to do.
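As a sketch of that window-function idea (my own formulation, not the answerer's; it quotes the mixed-case columns as shown in the EXPLAIN output and assumes docPublishDate is never NULL): keep only the rows that carry the latest docPublishDate for their purchaseNumber.
SELECT "purchaseNumber", parsing_status, "docPublishDate"
FROM (
    SELECT t."purchaseNumber", t.parsing_status, t."docPublishDate",
           t.section_name, t.random,
           max(t."docPublishDate") OVER (PARTITION BY t."purchaseNumber") AS max_date
    FROM xml_files t
) latest
WHERE "docPublishDate" = max_date
  AND section_name = 'contracts'
  AND parsing_status IS NULL
  AND random IN (1, 2, 3, 4);
The window is deliberately computed over the whole table (no WHERE inside the subquery) so that it matches the NOT EXISTS semantics, which compares against all rows of the same purchaseNumber; whether this is actually faster depends on the data.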
The index scan on the inner side of the nested loop join is inefficient: on average, 6499 of the 6500 rows found are discarded. Create a better index:
CREATE INDEX ON xml_files ("purchaseNumber", "docPublishDate");
Slow LEFT JOIN on CTE with time intervals
I am trying to debug a query in PostgreSQL that I've built to bucket market data in time buckets in arbitrary time intervals. Here is my table definition: CREATE TABLE historical_ohlcv ( exchange_symbol TEXT NOT NULL, symbol_id TEXT NOT NULL, kafka_key TEXT NOT NULL, open NUMERIC, high NUMERIC, low NUMERIC, close NUMERIC, volume NUMERIC, time_open TIMESTAMP WITH TIME ZONE NOT NULL, time_close TIMESTAMP WITH TIME ZONE, CONSTRAINT historical_ohlcv_pkey PRIMARY KEY (exchange_symbol, symbol_id, time_open) ); CREATE INDEX symbol_id_idx ON historical_ohlcv (symbol_id); CREATE INDEX open_close_symbol_id ON historical_ohlcv (time_open, time_close, exchange_symbol, symbol_id); CREATE INDEX time_open_idx ON historical_ohlcv (time_open); CREATE INDEX time_close_idx ON historical_ohlcv (time_close); The table has ~25m rows currently. My query as an example for 1 hour, but could be 5 mins, 10 mins, 2 days, etc. EXPLAIN ANALYZE WITH vals AS ( SELECT NOW() - '5 months' :: INTERVAL AS frame_start, NOW() AS frame_end, INTERVAL '1 hour' AS t_interval ) , grid AS ( SELECT start_time, lead(start_time, 1) OVER ( ORDER BY start_time ) AS end_time FROM ( SELECT generate_series(frame_start, frame_end, t_interval) AS start_time, frame_end FROM vals ) AS x ) SELECT max(high) FROM grid g LEFT JOIN historical_ohlcv ohlcv ON ohlcv.time_open >= g.start_time WHERE exchange_symbol = 'BINANCE' AND symbol_id = 'ETHBTC' GROUP BY start_time; The WHERE clause could be any valid value in the table. This technique was inspired by: Best way to count records by arbitrary time intervals in Rails+Postgres. The idea is to make a common table and left join your data with that to indicate which bucket stuff is in. This query is really slow! It's currently taking 15s. Based on the query planner, we have a really expensive nested loop: QUERY PLAN HashAggregate (cost=2758432.05..2758434.05 rows=200 width=40) (actual time=16023.713..16023.817 rows=542 loops=1) Group Key: g.start_time CTE vals -> Result (cost=0.00..0.02 rows=1 width=32) (actual time=0.005..0.005 rows=1 loops=1) CTE grid -> WindowAgg (cost=64.86..82.36 rows=1000 width=16) (actual time=2.986..9.594 rows=3625 loops=1) -> Sort (cost=64.86..67.36 rows=1000 width=8) (actual time=2.981..4.014 rows=3625 loops=1) Sort Key: x.start_time Sort Method: quicksort Memory: 266kB -> Subquery Scan on x (cost=0.00..15.03 rows=1000 width=8) (actual time=0.014..1.991 rows=3625 loops=1) -> ProjectSet (cost=0.00..5.03 rows=1000 width=16) (actual time=0.013..1.048 rows=3625 loops=1) -> CTE Scan on vals (cost=0.00..0.02 rows=1 width=32) (actual time=0.008..0.009 rows=1 loops=1) -> Nested Loop (cost=0.56..2694021.34 rows=12865667 width=14) (actual time=7051.730..16015.873 rows=31978 loops=1) -> CTE Scan on grid g (cost=0.00..20.00 rows=1000 width=16) (actual time=2.988..11.635 rows=3625 loops=1) -> Index Scan using historical_ohlcv_pkey on historical_ohlcv ohlcv (cost=0.56..2565.34 rows=12866 width=22) (actual time=3.712..4.413 rows=9 loops=3625) Index Cond: ((exchange_symbol = 'BINANCE'::text) AND (symbol_id = 'ETHBTC'::text) AND (time_open >= g.start_time)) Filter: (time_close < g.end_time) Rows Removed by Filter: 15502 Planning time: 0.568 ms Execution time: 16023.979 ms My guess is this line is doing a lot: LEFT JOIN historical_ohlcv ohlcv ON ohlcv.time_open >= g.start_time AND ohlcv.time_close < g.end_time But I'm not sure how to accomplish this in another way. P.S. apologies if this belongs to dba.SE. I read the FAQ and this seemed too basic for that site, so I posted here. 
Edits as requested:
SELECT avg(pg_column_size(t)) FROM historical_ohlcv t TABLESAMPLE SYSTEM (0.1); returns 107.632
For exchange_symbol there are 3 unique values; for symbol_id there are ~400.
PostgreSQL version: PostgreSQL 10.3 (Ubuntu 10.3-1.pgdg16.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609, 64-bit.
The table will be growing by about ~1m records a day, so it is not exactly read-only. All this stuff is done locally, and I will try to move to RDS to help manage hardware issues.
Related: if I wanted to add other aggregates, specifically 'first in the bucket', 'last in the bucket', min and sum, would my indexing strategy change?
Correctness first: I suspect a bug in your query:
LEFT JOIN historical_ohlcv ohlcv
       ON ohlcv.time_open >= g.start_time
      AND ohlcv.time_close < g.end_time
Unlike my referenced answer, you join on a time interval: (time_open, time_close]. The way you do it excludes rows in the table where the interval crosses bucket borders. Only intervals fully contained in a single bucket count. I don't think that's intended?
A simple fix would be to decide bucket membership based on time_open (or time_close) alone. If you want to keep working with both, you have to define exactly how to deal with intervals overlapping multiple buckets.
Also, you are looking for max(high) per bucket, which is different in nature from count(*) in my referenced answer. And your buckets are simple intervals per hour? Then we can radically simplify. Working with just time_open:
SELECT date_trunc('hour', time_open) AS hour, max(high) AS max_high
FROM   historical_ohlcv
WHERE  exchange_symbol = 'BINANCE'
AND    symbol_id = 'ETHBTC'
AND    time_open >= now() - interval '5 months'  -- frame_start
AND    time_open <  now()                        -- frame_end
GROUP  BY 1
ORDER  BY 1;
Related: Resample on time series data
It's hard to talk about further performance optimization while the basics are unclear, and we'd need more information. Are the WHERE conditions variable? How many distinct values are there in exchange_symbol and symbol_id? What is the average row size, i.e. what do you get for:
SELECT avg(pg_column_size(t)) FROM historical_ohlcv t TABLESAMPLE SYSTEM (0.1);
Is the table read-only?
Assuming you always filter on exchange_symbol and symbol_id, that the values are variable, and that your table is read-only or autovacuum can keep up with the write load (so we can hope for index-only scans), you would best have a multicolumn index on (exchange_symbol, symbol_id, time_open, high DESC) to support this query. Index columns in this order. Related: Multicolumn index and performance
Depending on data distribution and other details, a LEFT JOIN LATERAL solution might be another option. Related: How to find an average of values for time intervals in postgres; Optimize GROUP BY query to retrieve latest record per user
Aside from all that, your EXPLAIN plan exhibits some very bad estimates: https://explain.depesz.com/s/E5yI
Are you using a current version of Postgres? You may have to work on your server configuration - or at least set higher statistics targets on relevant columns and more aggressive autovacuum settings for the big table (a sketch follows below). Related: Keep PostgreSQL from sometimes choosing a bad query plan; Aggressive Autovacuum on PostgreSQL
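A sketch of those last two suggestions; the numbers are illustrative, not tuned recommendations for this table.
ALTER TABLE historical_ohlcv ALTER COLUMN symbol_id       SET STATISTICS 1000;
ALTER TABLE historical_ohlcv ALTER COLUMN exchange_symbol SET STATISTICS 1000;
ANALYZE historical_ohlcv;
-- Per-table autovacuum settings so statistics and the visibility map stay fresh
ALTER TABLE historical_ohlcv SET (
    autovacuum_vacuum_scale_factor  = 0.02,
    autovacuum_analyze_scale_factor = 0.01
);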
Performance of max() vs ORDER BY DESC + LIMIT 1
I was troubleshooting a few slow SQL queries today and don't quite understand the performance difference below: When trying to extract the max(timestamp) from a data table based on some condition, using MAX() is slower than ORDER BY timestamp LIMIT 1 if a matching row exists, but considerably faster if no matching row is found. SELECT timestamp FROM data JOIN sensors ON ( sensors.id = data.sensor_id ) WHERE sensor.station_id = 4 ORDER BY timestamp DESC LIMIT 1; (0 rows) Time: 1314.544 ms SELECT timestamp FROM data JOIN sensors ON ( sensors.id = data.sensor_id ) WHERE sensor.station_id = 5 ORDER BY timestamp DESC LIMIT 1; (1 row) Time: 10.890 ms SELECT MAX(timestamp) FROM data JOIN sensors ON ( sensors.id = data.sensor_id ) WHERE sensor.station_id = 4; (0 rows) Time: 0.869 ms SELECT MAX(timestamp) FROM data JOIN sensors ON ( sensors.id = data.sensor_id ) WHERE sensor.station_id = 5; (1 row) Time: 84.087 ms There are indexes on (timestamp) and (sensor_id, timestamp), and I noticed that Postgres uses very different query plans and indexes for both cases: QUERY PLAN (ORDER BY) -------------------------------------------------------------------------------------------------------- Limit (cost=0.43..9.47 rows=1 width=8) -> Nested Loop (cost=0.43..396254.63 rows=43823 width=8) Join Filter: (data.sensor_id = sensors.id) -> Index Scan using timestamp_ind on data (cost=0.43..254918.66 rows=4710976 width=12) -> Materialize (cost=0.00..6.70 rows=2 width=4) -> Seq Scan on sensors (cost=0.00..6.69 rows=2 width=4) Filter: (station_id = 4) (7 rows) QUERY PLAN (MAX) ---------------------------------------------------------------------------------------------------------- Aggregate (cost=3680.59..3680.60 rows=1 width=8) -> Nested Loop (cost=0.43..3571.03 rows=43823 width=8) -> Seq Scan on sensors (cost=0.00..6.69 rows=2 width=4) Filter: (station_id = 4) -> Index Only Scan using sensor_ind_timestamp on data (cost=0.43..1389.59 rows=39258 width=12) Index Cond: (sensor_id = sensors.id) (6 rows) So my two questions are: Where does this performance difference come from? I've seen the accepted answer here MIN/MAX vs ORDER BY and LIMIT, but that doesn't quite seem to apply here. Any good resources would be appreciated. Are there better ways to increase performance in all cases (matching row vs no matching row) than adding an EXISTS check? EDIT to address the questions in the comments below. I kept the initial query plans above for future reference: Table definitions: Table "public.sensors" Column | Type | Modifiers ----------------------+------------------------+----------------------------------------------------------------- id | integer | not null default nextval('sensors_id_seq'::regclass) station_id | integer | not null .... Indexes: "sensor_primary" PRIMARY KEY, btree (id) "ind_station_id" btree (station_id, id) "ind_station" btree (station_id) Table "public.data" Column | Type | Modifiers -----------+--------------------------+------------------------------------------------------------------ id | integer | not null default nextval('data_id_seq'::regclass) timestamp | timestamp with time zone | not null sensor_id | integer | not null avg | integer | Indexes: "timestamp_ind" btree ("timestamp" DESC) "sensor_ind" btree (sensor_id) "sensor_ind_timestamp" btree (sensor_id, "timestamp") "sensor_ind_timestamp_desc" btree (sensor_id, "timestamp" DESC) Note that I added ind_station_id on sensors just now after #Erwin's suggestion below. 
Timings haven't really changed drastically, still >1200ms in the ORDER BY DESC + LIMIT 1 case and ~0.9ms in the MAX case. Query Plans: QUERY PLAN (ORDER BY) ---------------------------------------------------------------------------------------------------------- Limit (cost=0.58..9.62 rows=1 width=8) (actual time=2161.054..2161.054 rows=0 loops=1) Buffers: shared hit=3418066 read=47326 -> Nested Loop (cost=0.58..396382.45 rows=43823 width=8) (actual time=2161.053..2161.053 rows=0 loops=1) Join Filter: (data.sensor_id = sensors.id) Buffers: shared hit=3418066 read=47326 -> Index Scan using timestamp_ind on data (cost=0.43..255048.99 rows=4710976 width=12) (actual time=0.047..1410.715 rows=4710976 loops=1) Buffers: shared hit=3418065 read=47326 -> Materialize (cost=0.14..4.19 rows=2 width=4) (actual time=0.000..0.000 rows=0 loops=4710976) Buffers: shared hit=1 -> Index Only Scan using ind_station_id on sensors (cost=0.14..4.18 rows=2 width=4) (actual time=0.004..0.004 rows=0 loops=1) Index Cond: (station_id = 4) Heap Fetches: 0 Buffers: shared hit=1 Planning time: 0.478 ms Execution time: 2161.090 ms (15 rows) QUERY (MAX) ---------------------------------------------------------------------------------------------------------- Aggregate (cost=3678.08..3678.09 rows=1 width=8) (actual time=0.009..0.009 rows=1 loops=1) Buffers: shared hit=1 -> Nested Loop (cost=0.58..3568.52 rows=43823 width=8) (actual time=0.006..0.006 rows=0 loops=1) Buffers: shared hit=1 -> Index Only Scan using ind_station_id on sensors (cost=0.14..4.18 rows=2 width=4) (actual time=0.005..0.005 rows=0 loops=1) Index Cond: (station_id = 4) Heap Fetches: 0 Buffers: shared hit=1 -> Index Only Scan using sensor_ind_timestamp on data (cost=0.43..1389.59 rows=39258 width=12) (never executed) Index Cond: (sensor_id = sensors.id) Heap Fetches: 0 Planning time: 0.435 ms Execution time: 0.048 ms (13 rows) So just like in the earlier explains, ORDER BY does a Scan using timestamp_in on data, which is not done in the MAX case. Postgres version: Postgres from the Ubuntu repos: PostgreSQL 9.4.5 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 5.2.1-21ubuntu2) 5.2.1 20151003, 64-bit Note that there are NOT NULL constraints in place, so ORDER BY won't have to sort over empty rows. Note also that I'm largely interested in where the difference comes from. While not ideal, I can retrieve data relatively quickly using EXISTS (<1ms) and then SELECT (~11ms).
There does not seem to be an index on sensors.station_id, which is important here.
There is an actual difference between max() and ORDER BY DESC + LIMIT 1. Many people seem to miss that. NULL values sort first in descending sort order. So ORDER BY timestamp DESC LIMIT 1 returns a NULL value if one exists, while the aggregate function max() ignores NULL values and returns the latest not-null timestamp. ORDER BY timestamp DESC NULLS LAST LIMIT 1 would be equivalent.
For your case, since your column d.timestamp is defined NOT NULL (as your update revealed), there is no effective difference. An index with DESC NULLS LAST and the same clause in the ORDER BY for the LIMIT query should still serve you best.
I suggest these indexes (my query below builds on the 2nd one; see the CREATE INDEX sketch after this answer):
sensors(station_id, id)
data(sensor_id, timestamp DESC NULLS LAST)
You can drop the other indexes sensor_ind_timestamp and sensor_ind_timestamp_desc unless they are in use otherwise (unlikely, but possible).
Much more importantly, there is another difficulty: the filter on the first table sensors returns few, but still (possibly) multiple rows. Postgres expects to find 2 rows (rows=2) in your added EXPLAIN output. The perfect technique would be an index skip scan (a.k.a. loose index scan) for the second table data - which is not currently implemented (up to at least Postgres 15). There are various workarounds. See: Optimize GROUP BY query to retrieve latest row per user
The best should be:
SELECT d.timestamp
FROM   sensors s
CROSS  JOIN LATERAL (
   SELECT timestamp
   FROM   data
   WHERE  sensor_id = s.id
   ORDER  BY timestamp DESC NULLS LAST
   LIMIT  1
   ) d
WHERE  s.station_id = 4
ORDER  BY d.timestamp DESC NULLS LAST
LIMIT  1;
The choice between max() and ORDER BY / LIMIT hardly matters in comparison. You might as well:
SELECT max(d.timestamp) AS timestamp
FROM   sensors s
CROSS  JOIN LATERAL (
   SELECT timestamp
   FROM   data
   WHERE  sensor_id = s.id
   ORDER  BY timestamp DESC NULLS LAST
   LIMIT  1
   ) d
WHERE  s.station_id = 4;
Or:
SELECT max(d.timestamp) AS timestamp
FROM   sensors s
CROSS  JOIN LATERAL (
   SELECT max(timestamp) AS timestamp
   FROM   data
   WHERE  sensor_id = s.id
   ) d
WHERE  s.station_id = 4;
Or even with a correlated subquery, shortest of all:
SELECT max((SELECT max(timestamp) FROM data WHERE sensor_id = s.id)) AS timestamp
FROM   sensors s
WHERE  station_id = 4;
Note the double parentheses!
The additional advantage of LIMIT in a LATERAL join is that you can retrieve arbitrary columns of the selected row, not just the latest timestamp (one column).
Related: Why do NULL values come first when ordering DESC in a PostgreSQL query?; What is the difference between a LATERAL JOIN and a subquery in PostgreSQL?; Select first row in each GROUP BY group?; Optimize groupwise maximum query
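Spelled out as DDL, the two suggested indexes might look like this (the index names are placeholders):
CREATE INDEX sensors_station_id_id_idx ON sensors (station_id, id);
CREATE INDEX data_sensor_id_ts_idx     ON data (sensor_id, "timestamp" DESC NULLS LAST);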
The query plan shows index names timestamp_ind and timestamp_sensor_ind. But indexes like that do not help with a search for a particular sensor. To resolve an equals query (like sensor.id = data.sensor_id) the column has to be the first in the index. Try to add an index that allows searching on sensor_id and, within a sensor, is sorted by timestamp:
create index sensor_timestamp_ind on data(sensor_id, timestamp);
Does adding that index speed up the query?
Improve PostgreSQL query performance
When running this query on my server it's very slow, and I can't understand why. Can anyone help me figure it out?
Query:
SELECT
  "t_dat"."t_year" AS "c0",
  "t_dat"."t_month" AS "c1",
  "t_dat"."t_week" AS "c2",
  "t_dat"."t_day" AS "c3",
  "t_purs"."p_id" AS "c4",
  sum("t_purs"."days") AS "m0",
  sum("t_purs"."timecreated") AS "m1"
FROM "t_dat", "t_purs"
WHERE "t_purs"."created" = "t_dat"."t_key"
  AND "t_dat"."t_year" = 2013
  AND "t_dat"."t_month" = 3
  AND "t_dat"."t_week" = 9
  AND "t_dat"."t_day" IN (1,2)
  AND "t_purs"."p_id" IN ('4','15','18','19','20','29',
                          '31','35','46','56','72','78')
GROUP BY "t_dat"."t_year", "t_dat"."t_month", "t_dat"."t_week",
         "t_dat"."t_day", "t_purs"."p_id"
Explain Analyze:
HashAggregate (cost=12252.04..12252.04 rows=1 width=28) (actual time=10212.374..10212.384 rows=10 loops=1)
  -> Nested Loop (cost=0.00..12252.03 rows=1 width=28) (actual time=3016.006..10212.249 rows=14 loops=1)
     Join Filter: (t_dat.t_key = t_purs.created)
     -> Seq Scan on t_dat (cost=0.00..129.90 rows=1 width=20) (actual time=0.745..2.040 rows=48 loops=1)
        Filter: ((t_day = ANY ('{1,2}'::integer[])) AND (t_year = 2013) AND (t_month = 3) AND (t_week = 9))
     -> Seq Scan on t_purs (cost=0.00..12087.49 rows=9900 width=16) (actual time=0.018..201.630 rows=14014 loops=48)
        Filter: (p_id = ANY ('{4,15,18,19,20,29,31,35,46,56,72,78}'::integer[]))
Total runtime: 10212.470 ms
It is difficult to say exactly what you are missing, but if I were you, I would make sure that the following index exists:
CREATE INDEX t_dat_id_date_idx
    ON t_dat (t_key, t_year, t_month, t_week, t_day);
For t_purs, create this index:
CREATE INDEX t_purs_created_p_id_idx
    ON t_purs (created, p_id);
Consider using a single column in your table: t_date date instead of (t_year, t_month, t_week, t_day). The data type date occupies 4 bytes. That would shrink your table a bit, make the index smaller and faster, and make grouping a lot easier. Year, month, week and day can easily and quickly be extracted from a date with extract().
Your query could then look like this and would be faster:
SELECT extract(year  FROM t_date) AS c0
      ,extract(month FROM t_date) AS c1
      ,extract(week  FROM t_date) AS c2
      ,extract(day   FROM t_date) AS c3
      ,p.p_id                     AS c4
      ,sum(p.days)                AS m0
      ,sum(p.timecreated)         AS m1
FROM   t_dat d
JOIN   t_purs p ON p.created = d.t_key
WHERE  d.t_date IN ('2013-03-01'::date, '2013-03-02'::date)
AND    p.p_id IN (4,15,18,19,20,29,31,35,46,56,72,78)
GROUP  BY d.t_date, p.p_id;
More important for performance is the index, which would then simply be:
CREATE INDEX t_dat_date_idx ON t_dat (t_key, t_date);
Or, depending on data distribution:
CREATE INDEX t_dat_date_idx ON t_dat (t_date, t_key);
The sequence of columns matters. You may even create both.
Your query is doing sequential scans on t_purs and t_dat. Creating the appropriate indexes will help make this query faster and avoid the sequential scans:
create index index_name on t_purs(created) where created is not null;
create index index_name on t_dat using btree(t_key, t_year, t_month, t_week, t_day);
Run EXPLAIN ANALYZE after running the above two statements. You'll see that the planning time and execution time are reduced.
Query records linked through key value pairs to records that actually match criteria
We have a simple, generic tables structure, implemented in PostgreSQL (8.3; 9.1 is at our horizon). It seems a very straightforward and common implementation. It boils down to this: events_event_types ( # this table holds some 50 rows id bigserial # PK "name" character varying(255) ) events_events ( # this table holds some 15M rows id bigserial # PK datetime timestamp with time zone eventtype_id bigint # FK to events_event_types.id ) CREATE TABLE events_eventdetails ( # this table holds some 65M rows id bigserial # PK keyname character varying(255) "value" text event_id bigint # FK to events_events.id ) Some of the rows in events_events and events_eventdetails tables would be like this: events_events | events_eventdetails id datetime eventtype_id | id keyname value event_id ----------------------------|------------------------------------------- 100 ... 10 | 1000 transactionId 9774ae16-... 100 | 1001 someKey some value 100 200 ... 20 | 2000 transactionId 9774ae16-... 200 | 2001 reductionId 123 200 | 2002 reductionId 456 200 300 ... 30 | 3000 transactionId 9774ae16-... 300 | 2001 customerId 234 300 | 2001 companyId 345 300 We are in desperate need of a "solution" that returns events_events rows 100 and 200 and 300 together in a single result set and FAST! when asked for reductionId=123 or when asked for customerId=234 or when asked for companyId=345. (Possibly interested in an AND combination of these criteria, but that's not essentially the goal.) Not sure if it matters at this point, but the result set should be filterable on datetime range and eventtype_id (IN list) and be given a LIMIT. I ask for a "solution", since this could be either: A single query Two smaller queries (as long as their intermediate result is always small enough. I followed this approach and got stuck for companies (companyId) with large amounts (~20k) of associated transactions (transactionId)) A subtle redesign (e.g. denormalization) This is not a fresh question as we tried all three approaches over many months (won't bother you with those queries) but it all fails at performance. The solution should return in <<<1s. Previous attempts took approx. 10s at best. I'd really appreciate some help -- I'm at a loss now... The two smaller queries approach looks much like this: Query 1: SELECT Substring(details2_transvalue.VALUE, 0, 32) FROM events_eventdetails details2_transvalue JOIN events_eventdetails compdetails ON details2_transvalue.event_id = compdetails.event_id AND compdetails.keyname = 'companyId' AND Substring(compdetails.VALUE, 0, 32) = '4' AND details2_transvalue.keyname = 'transactionId' Query 2: SELECT events1.* FROM events_events events1 JOIN events_eventdetails compDetails ON events1.id = compDetails.event_id AND compDetails.keyname='companyId' AND substring(compDetails.value,0,32)='4' WHERE events1.eventtype_id IN (...) UNION SELECT events2.* FROM events_events events2 JOIN events_eventdetails details2_transKey ON events2.id = details2_transKey.event_id AND details2_transKey.keyname='transactionId' AND substring(details2_transKey.value,0,32) IN ( -- result of query 1 goes here -- ) WHERE events2.eventtype_id IN (...) ORDER BY dateTime DESC LIMIT 50 Performance of this gets poor due to the large set returned by query 1. As you can see, values in the events_eventdetails table are always expressed as length 32 substrings, which we have indexed as such. Further indices on keyname, event_id, event_id + keyname, keyname + length 32 substring. 
Here is a PostgreSQL 9.1 approach -- even though I don't officially have that platform at my disposal: WITH companyevents AS ( SELECT events1.* FROM events_events events1 JOIN events_eventdetails compDetails ON events1.id = compDetails.event_id AND compDetails.keyname='companyId' AND substring(compDetails.value,0,32)=' -- my desired companyId -- ' WHERE events1.eventtype_id in (...) ORDER BY dateTime DESC LIMIT 50 ) SELECT * from events_events WHERE transaction_id IN (SELECT transaction_id FROM companyevents) OR id IN (SELECT id FROM companyevents) AND eventtype_id IN (...) ORDER BY dateTime DESC LIMIT 250; The query plan is as follows for companyId with 28228 transactionIds: Limit (cost=7545.99..7664.33 rows=250 width=130) (actual time=210.100..3026.267 rows=50 loops=1) CTE companyevents -> Limit (cost=7543.62..7543.74 rows=50 width=130) (actual time=206.994..207.020 rows=50 loops=1) -> Sort (cost=7543.62..7544.69 rows=429 width=130) (actual time=206.993..207.005 rows=50 loops=1) Sort Key: events1.datetime Sort Method: top-N heapsort Memory: 23kB -> Nested Loop (cost=10.02..7529.37 rows=429 width=130) (actual time=0.093..178.719 rows=28228 loops=1) -> Append (cost=10.02..1140.62 rows=657 width=8) (actual time=0.082..27.594 rows=28228 loops=1) -> Bitmap Heap Scan on events_eventdetails compdetails (cost=10.02..394.47 rows=97 width=8) (actual time=0.021..0.021 rows=0 loops=1) Recheck Cond: (((keyname)::text = 'companyId'::text) AND ("substring"(value, 0, 32) = '4'::text)) -> Bitmap Index Scan on events_eventdetails_substring_ind (cost=0.00..10.00 rows=97 width=0) (actual time=0.019..0.019 rows=0 loops=1) Index Cond: (((keyname)::text = 'companyId'::text) AND ("substring"(value, 0, 32) = '4'::text)) -> Index Scan using events_eventdetails_companyid_substring_ind on events_eventdetails_companyid compdetails (cost=0.00..746.15 rows=560 width=8) (actual time=0.061..18.655 rows=28228 loops=1) Index Cond: (((keyname)::text = 'companyId'::text) AND ("substring"(value, 0, 32) = '4'::text)) -> Index Scan using events_events_pkey on events_events events1 (cost=0.00..9.71 rows=1 width=130) (actual time=0.004..0.004 rows=1 loops=28228) Index Cond: (id = compdetails.event_id) Filter: (eventtype_id = ANY ('{103,106,107,110,45,34,14,87,58,78,7,76,42,11,25,57,98,37,30,35,33,49,52,29,74,28,85,59,51,65,66,18,13,86,75,6,44,38,43,94,56,95,96,71,50,81,90,89,16,17,4,88,79,77,68,97,92,67,72,53,2,10,31,32,80,111,104,93,26,8,61,5,73,70,63,20,60,40,41,23,22,48,36,108,99,64,62,55,69,19,46,47,15,54,100,101,27,21,12,102,105,109,112,113,114,115,116,119,120,121,122,123,124,9,127,24,130,132,129,125,131,118,117,133,134}'::bigint[])) -> Index Scan Backward using events_events_datetime_ind on events_events (cost=2.25..1337132.75 rows=2824764 width=130) (actual time=210.100..3026.255 rows=50 loops=1) Filter: ((hashed SubPlan 2) OR ((hashed SubPlan 3) AND (eventtype_id = ANY ('{103,106,107,110,45,34,14,87,58,78,7,76,42,11,25,57,98,37,30,35,33,49,52,29,74,28,85,59,51,65,66,18,13,86,75,6,44,38,43,94,56,95,96,71,50,81,90,89,16,17,4,88,79,77,68,97,92,67,72,53,2,10,31,32,80,111,104,93,26,8,61,5,73,70,63,20,60,40,41,23,22,48,36,108,99,64,62,55,69,19,46,47,15,54,100,101,27,21,12,102,105,109,112,113,114,115,116,119,120,121,122,123,124,9,127,24,130,132,129,125,131,118,117,133,134}'::bigint[])))) SubPlan 2 -> CTE Scan on companyevents (cost=0.00..1.00 rows=50 width=90) (actual time=206.998..207.071 rows=50 loops=1) SubPlan 3 -> CTE Scan on companyevents (cost=0.00..1.00 rows=50 width=8) (actual time=0.001..0.026 rows=50 loops=1) Total 
runtime: 3026.410 ms The query plan is as follows for companyId with 288 transactionIds: Limit (cost=7545.99..7664.33 rows=250 width=130) (actual time=30.976..3790.362 rows=54 loops=1) CTE companyevents -> Limit (cost=7543.62..7543.74 rows=50 width=130) (actual time=9.263..9.290 rows=50 loops=1) -> Sort (cost=7543.62..7544.69 rows=429 width=130) (actual time=9.263..9.272 rows=50 loops=1) Sort Key: events1.datetime Sort Method: top-N heapsort Memory: 24kB -> Nested Loop (cost=10.02..7529.37 rows=429 width=130) (actual time=0.071..8.195 rows=1025 loops=1) -> Append (cost=10.02..1140.62 rows=657 width=8) (actual time=0.060..1.348 rows=1025 loops=1) -> Bitmap Heap Scan on events_eventdetails compdetails (cost=10.02..394.47 rows=97 width=8) (actual time=0.021..0.021 rows=0 loops=1) Recheck Cond: (((keyname)::text = 'companyId'::text) AND ("substring"(value, 0, 32) = '5'::text)) -> Bitmap Index Scan on events_eventdetails_substring_ind (cost=0.00..10.00 rows=97 width=0) (actual time=0.019..0.019 rows=0 loops=1) Index Cond: (((keyname)::text = 'companyId'::text) AND ("substring"(value, 0, 32) = '5'::text)) -> Index Scan using events_eventdetails_companyid_substring_ind on events_eventdetails_companyid compdetails (cost=0.00..746.15 rows=560 width=8) (actual time=0.039..1.006 rows=1025 loops=1) Index Cond: (((keyname)::text = 'companyId'::text) AND ("substring"(value, 0, 32) = '5'::text)) -> Index Scan using events_events_pkey on events_events events1 (cost=0.00..9.71 rows=1 width=130) (actual time=0.005..0.006 rows=1 loops=1025) Index Cond: (id = compdetails.event_id) Filter: (eventtype_id = ANY ('{103,106,107,110,45,34,14,87,58,78,7,76,42,11,25,57,98,37,30,35,33,49,52,29,74,28,85,59,51,65,66,18,13,86,75,6,44,38,43,94,56,95,96,71,50,81,90,89,16,17,4,88,79,77,68,97,92,67,72,53,2,10,31,32,80,111,104,93,26,8,61,5,73,70,63,20,60,40,41,23,22,48,36,108,99,64,62,55,69,19,46,47,15,54,100,101,27,21,12,102,105,109,112,113,114,115,116,119,120,121,122,123,124,9,127,24,130,132,129,125,131,118,117,133,134}'::bigint[])) -> Index Scan Backward using events_events_datetime_ind on events_events (cost=2.25..1337132.75 rows=2824764 width=130) (actual time=30.975..3790.332 rows=54 loops=1) Filter: ((hashed SubPlan 2) OR ((hashed SubPlan 3) AND (eventtype_id = ANY ('{103,106,107,110,45,34,14,87,58,78,7,76,42,11,25,57,98,37,30,35,33,49,52,29,74,28,85,59,51,65,66,18,13,86,75,6,44,38,43,94,56,95,96,71,50,81,90,89,16,17,4,88,79,77,68,97,92,67,72,53,2,10,31,32,80,111,104,93,26,8,61,5,73,70,63,20,60,40,41,23,22,48,36,108,99,64,62,55,69,19,46,47,15,54,100,101,27,21,12,102,105,109,112,113,114,115,116,119,120,121,122,123,124,9,127,24,130,132,129,125,131,118,117,133,134}'::bigint[])))) SubPlan 2 -> CTE Scan on companyevents (cost=0.00..1.00 rows=50 width=90) (actual time=9.266..9.327 rows=50 loops=1) SubPlan 3 -> CTE Scan on companyevents (cost=0.00..1.00 rows=50 width=8) (actual time=0.001..0.019 rows=50 loops=1) Total runtime: 3796.736 ms With 3s/4s this is not bad at all, but still a factor 100+ too slow. Also, this wasn't on relevant hardware. Nonetheless it should show where the pain is. Here is something that could possibly grow into a solution: Added a table: events_transaction_helper ( event_id bigint not null transactionid character varying(36) not null keyname character varying(255) not null value bigint not null # index on keyname, value ) I "manually" filled this table now, but a materialized view implementation would do the trick. 
It would much follow the below query: SELECT tr.event_id, tr.value AS transactionid, det.keyname, det.value AS value FROM events_eventdetails tr JOIN events_eventdetails det ON det.event_id = tr.event_id WHERE tr.keyname = 'transactionId' AND det.keyname IN ('companyId', 'reduction_id', 'customer_id'); Added a column to the events_events table: transaction_id character varying(36) null This new column is filled as follows: update events_events set transaction_id = (select value from events_eventdetails where keyname='transactionId' and event_id=events_events.id); Now, the following query returns in <15ms consistently: explain analyze select * from events_events where transactionId in (select distinct transactionid from events_transaction_helper WHERE keyname='companyId' and value=5) and eventtype_id in (...) order by datetime desc limit 250; Limit (cost=5075.23..5075.85 rows=250 width=130) (actual time=8.901..9.028 rows=250 loops=1) -> Sort (cost=5075.23..5077.19 rows=785 width=130) (actual time=8.900..8.953 rows=250 loops=1) Sort Key: events_events.datetime Sort Method: top-N heapsort Memory: 81kB -> Nested Loop (cost=57.95..5040.04 rows=785 width=130) (actual time=0.928..8.268 rows=524 loops=1) -> HashAggregate (cost=52.30..52.42 rows=12 width=37) (actual time=0.895..0.991 rows=276 loops=1) -> Subquery Scan on "ANY_subquery" (cost=52.03..52.27 rows=12 width=37) (actual time=0.558..0.757 rows=276 loops=1) -> HashAggregate (cost=52.03..52.15 rows=12 width=37) (actual time=0.556..0.638 rows=276 loops=1) -> Index Scan using testmaterializedviewkeynamevalue on events_transaction_helper (cost=0.00..51.98 rows=22 width=37) (actual time=0.068..0.404 rows=288 loops=1) Index Cond: (((keyname)::text = 'companyId'::text) AND (value = 5)) -> Bitmap Heap Scan on events_events (cost=5.65..414.38 rows=100 width=130) (actual time=0.023..0.024 rows=2 loops=276) Recheck Cond: ((transactionid)::text = ("ANY_subquery".transactionid)::text) Filter: (eventtype_id = ANY ('{103,106,107,110,45,34,14,87,58,78,7,76,42,11,25,57,98,37,30,35,33,49,52,29,74,28,85,59,51,65,66,18,13,86,75,6,44,38,43,94,56,95,96,71,50,81,90,89,16,17,4,88,79,77,68,97,92,67,72,53,2,10,31,32,80,111,104,93,26,8,61,5,73,70,63,20,60,40,41,23,22,48,36,108,99,64,62,55,69,19,46,47,15,54,100,101,27,21,12,102,105,109,112,113,114,115,116,119,120,121,122,123,124,9,127,24,130,132,129,125,131,118,117,133,134}'::bigint[])) -> Bitmap Index Scan on testtransactionid (cost=0.00..5.63 rows=100 width=0) (actual time=0.020..0.020 rows=2 loops=276) Index Cond: ((transactionid)::text = ("ANY_subquery".transactionid)::text) Total runtime: 9.122 ms I'll check back later to let you know if this turned out a feasible solution for real :)
The Idea is not to denormalise, but to normalise. The events_details() table can be replaced by two tables: one with the event_detail_types, and one with the actual values (referring to the {even_id,detail_types}. This will make the execution of the query easier, since only the numerical ids of the detail_types have to be extracted and selected. The gain is in the reduced number of pages that has to be fetched by the DBMS, since all the key name need only be stored+retrieved+compared once. NOTE: I changed the naming a bit. For reasons of sanity and safety, mostly. SET search_path='cav'; /**** ***/ DROP SCHEMA cav CASCADE; CREATE SCHEMA cav; SET search_path='cav'; CREATE TABLE event_types ( -- this table holds some 50 rows id bigserial PRIMARY KEY , zname varchar(255) ); INSERT INTO event_types(zname) SELECT 'event_'::text || gs::text FROM generate_series (1,100) gs ; CREATE TABLE events ( -- this table holds some 15M rows id bigserial PRIMARY KEY , zdatetime timestamp with time zone , eventtype_id bigint REFERENCES event_types(id) ); INSERT INTO events(zdatetime,eventtype_id) SELECT gs, et.id FROM generate_series ('2012-04-11 00:00:00'::timestamp , '2012-04-12 12:00:00'::timestamp ,' 1 hour'::interval ) gs , event_types et ; -- SELECT * FROM event_types; -- SELECT * FROM events; CREATE TABLE event_details ( -- this table holds some 65M rows id bigserial PRIMARY KEY , event_id bigint REFERENCES events(id) , keyname varchar(255) , zvalue text ); INSERT INTO event_details(event_id, keyname) SELECT ev.id,im.* FROM events ev , (VALUES ('transactionId'::text),('someKey'::text) ,('reductionId'::text),('customerId'::text),('companyId'::text) ) im ; UPDATE event_details SET zvalue = 'Some_value'::text || (random() * 1000)::int::text ; -- -- Domain table with all valid detail_types -- CREATE TABLE detail_types( id bigserial PRIMARY KEY , keyname varchar(255) ); INSERT INTO detail_types(keyname) SELECT DISTINCT keyname FROM event_details ; -- -- Context-attribute-value table, referencing {event_id, type_id} -- CREATE TABLE event_detail_values ( event_id BIGINT , detail_type_id BIGINT , zvalue text , PRIMARY KEY(event_id , detail_type_id) , FOREIGN KEY(event_id ) REFERENCES events(id) , FOREIGN KEY(detail_type_id)REFERENCES detail_types(id) ); -- -- For the sake of joining we create some natural keys -- CREATE INDEX events_details_keyname ON event_details (keyname) ; CREATE INDEX detail_types_keyname ON detail_types(keyname) ; INSERT INTO event_detail_values (event_id,detail_type_id, zvalue) SELECT ed.event_id, dt.id , ed.zvalue FROM event_details ed , detail_types dt WHERE ed.keyname = dt.keyname ; -- -- Now we can drop the original table, and use the view instead -- DROP TABLE event_details; CREATE VIEW event_details AS ( SELECT dv.event_id AS event_id , dt.keyname AS keyname , dv.zvalue AS zvalue FROM event_detail_values dv JOIN detail_types dt ON dt.id = dv.detail_type_id ); EXPLAIN ANALYZE SELECT ev.id AS event_id , ev.zdatetime AS zdatetime , ed.keyname AS keyname , ed.zvalue AS zevalue FROM events ev JOIN event_details ed ON ed.event_id = ev.id WHERE ed.keyname IN ('transactionId','customerId','companyId') ORDER BY event_id,keyname ; resulting Query plan: QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------- Sort (cost=1178.79..1197.29 rows=7400 width=40) (actual time=159.902..177.379 rows=11100 loops=1) Sort Key: ev.id, dt.keyname Sort Method: external sort Disk: 560kB -> Hash Join 
(cost=108.34..703.22 rows=7400 width=40) (actual time=12.225..122.231 rows=11100 loops=1) Hash Cond: (dv.event_id = ev.id) -> Hash Join (cost=1.09..466.47 rows=7400 width=32) (actual time=0.047..74.183 rows=11100 loops=1) Hash Cond: (dv.detail_type_id = dt.id) -> Seq Scan on event_detail_values dv (cost=0.00..322.00 rows=18500 width=29) (actual time=0.006..26.543 rows=18500 loops=1) -> Hash (cost=1.07..1.07 rows=2 width=19) (actual time=0.025..0.025 rows=3 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 1kB -> Seq Scan on detail_types dt (cost=0.00..1.07 rows=2 width=19) (actual time=0.009..0.014 rows=3 loops=1) Filter: ((keyname)::text = ANY ('{transactionId,customerId,companyId}'::text[])) -> Hash (cost=61.00..61.00 rows=3700 width=16) (actual time=12.161..12.161 rows=3700 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 131kB -> Seq Scan on events ev (cost=0.00..61.00 rows=3700 width=16) (actual time=0.004..5.926 rows=3700 loops=1) Total runtime: 192.724 ms (16 rows) As you can see, the "deepest" part of the query is the retrieval of the detail_type_ids, given the list of strings. This is put into a hash table, which is then combined with a corresponding hashset for the detail_values. (NB: this is pg-9.1) YMMV.
If you must use a design along these lines, you should eliminate the id column from events_eventdetails and declare the primary key to be (event_id, keyname). That would give you a very useful index without also maintaining a useless index for the synthetic key.
A step better would be to eliminate the events_eventdetails table entirely and use an hstore column for that data, with a GIN index (see the sketch below). That would probably get you to your performance goals without needing to pre-define what event details are stored.
Even better, if you can predict or specify what event details are possible, would be to not try to implement a database within a database. Make each "keyname" value into a column in events_eventdetails with a data type appropriate to the nature of that data. This will probably allow much faster access at the cost of needing to issue ALTER TABLE statements as the nature of the detail changes.
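A minimal sketch of that hstore variant (the column and index names are made up, and migrating the existing key/value rows is left out):
CREATE EXTENSION IF NOT EXISTS hstore;
ALTER TABLE events_events ADD COLUMN details hstore;
CREATE INDEX events_events_details_gin ON events_events USING gin (details);
-- "all events for companyId = 345" then becomes a containment query:
SELECT *
FROM events_events
WHERE details @> 'companyId=>345'::hstore;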
See, if your key (reductionId in this case) occurs in more than 7-10% of all the rows in the events_eventdetails table, PostgreSQL will prefer a SeqScan. There's nothing you can do about that; it is the fastest way.
I have had a similar case working with ISO 8583 packets. Each packet consists of 128 fields (by design), so the first database design followed your approach with 2 tables: field_id and description in one table (events_events in your case), field_id + field_value in another (events_eventdetails). Although such a layout follows 3NF, we hit the same issues straight away: bad performance and highly complicated queries.
In your case you should go for a redesign. One option (the easier one) is to make events_eventdetails.keyname a smallint, which will make comparison operations faster. Not a big win, though. Another option is to reduce the 2 tables to a single one, something like:
CREATE TABLE events_events (
    id            bigserial,
    datetime      timestamp with time zone,
    eventtype_id  bigint,
    transactionId text,  -- value for transactionId
    reductionId   text,  -- -"-       reductionId
    companyId     text,  -- etc.
    customerId    text,
    anyotherId    text,
    ...
);
This will break 3NF, but on the other hand: you have more freedom to index your data; your queries will be shorter and easier to maintain; and performance will be much better.
Possible drawbacks: you will waste a bit more space for the unused fields (unused fields / 8 bytes per row), and you might still need an extra table for the event details that are too rare to keep a separate column for.
EDIT: I don't quite understand what you mean by materialize here. In your question you mentioned you want a "solution" that returns events_events rows 100 and 200 and 300 together in a single result set and FAST! when asked for reductionId=123 or when asked for customerId=234 or when asked for companyId=345. The suggested redesign creates a crosstab or pivot table from your events_eventdetails. To get all events_events rows that satisfy your conditions you can use:
SELECT *
FROM events_events
WHERE id IN (100, 200, 300)
  AND reductionId = 123
  -- AND customerId = 234
  -- AND companyId  = 345;