How a CTE (Common Table Expression) gets evaluated in Hive

My question is about performance and the way a CTE gets evaluated at runtime.
I am planning to reuse code by defining a base projection and then defining multiple CTEs on top of this base projection with different filters.
Does that cause any performance issues? More specifically, will the base projection be evaluated every time?
For example:
WITH CTE_PERSON as (
SELECT * FROM PersonTable
),
CTE_PERSON_WITH_AGE as (
SELECT * FROM CTE_PERSON WHERE age > 24
),
CTE_PERSON_WITH_AGE_AND_GENDER as (
SELECT * FROM CTE_PERSON_WITH_AGE WHERE gender = 'm'
),
CTE_PERSON_WITH_NAME as (
SELECT * FROM CTE_PERSON WHERE name = 'abc'
)
Will all the entries from PersonTable be loaded into memory every time, with the filters applied afterwards, or will only the result set remaining after the filters be loaded into memory?

A single scan. The optimizer merges the stacked CTEs into one TableScan with the combined predicate, as the demo below shows.
Note:
- a single stage
- a single TableScan
- predicate: (((i = 1) and (j = 2)) and (k = 3)) (type: boolean)
create table t (i int,j int,k int);
explain
with t1 as (select i,j,k from t where i=1)
,t2 as (select i,j,k from t1 where j=2)
,t3 as (select i,j,k from t2 where k=3)
select * from t3
;
Explain
STAGE DEPENDENCIES:
Stage-0 is a root stage
STAGE PLANS:
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
TableScan
alias: t
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Filter Operator
predicate: (((i = 1) and (j = 2)) and (k = 3)) (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Select Operator
expressions: 1 (type: int), 2 (type: int), 3 (type: int)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
ListSink
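Note that the single scan above also relies on each CTE being referenced only once. If the base CTE were referenced from several branches, Hive would by default inline (re-expand) it at each reference; newer Hive releases can materialize a CTE instead once it is referenced often enough. A hedged sketch, assuming the hive.optimize.cte.materialize.threshold setting available in newer Hive versions (default -1, i.e. never materialize):
-- materialize any CTE referenced at least 2 times instead of inlining it
set hive.optimize.cte.materialize.threshold=2;
with base as (select * from PersonTable)
select * from base where age > 24
union all
select * from base where name = 'abc';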


Return five rows of random DNA instead of just one

This is the code I have to create a string of DNA:
prepare dna_length(int) as
with t1 as (
select chr(65) as s
union select chr(67)
union select chr(71)
union select chr(84) )
, t2 as ( select s, row_number() over() as rn from t1)
, t3 as ( select generate_series(1,$1) as i, round(random() * 4 + 0.5) as rn )
, t4 as ( select t2.s from t2 join t3 on (t2.rn=t3.rn))
select array_to_string(array(select s from t4),'') as dna;
execute dna_length(20);
I am trying to figure out how to re-write this to give a table of 5 rows of strings of DNA of length 20 each, instead of just one row. This is for PostgreSQL.
I tried:
CREATE TABLE dna_table(g int, dna text);
INSERT INTO dna_table (1, execute dna_length(20));
But this does not seem to work. I am an absolute beginner. How to do this properly?
PREPARE creates a prepared statement that can be used "as is". If your prepared statement returns one string, then you can only get one string; you can't use it inside other operations such as an INSERT.
In your case you may create a function:
create or replace function dna_length(int) returns text as
$$
with t1 as (
select chr(65) as s
union
select chr(67)
union
select chr(71)
union
select chr(84))
, t2 as (select s,
row_number() over () as rn
from t1)
, t3 as (select generate_series(1, $1) as i,
round(random() * 4 + 0.5) as rn)
, t4 as (select t2.s
from t2
join t3 on (t2.rn = t3.rn))
select array_to_string(array(select s from t4), '') as dna
$$ language sql;
And use it like this:
insert into dna_table(g, dna) select generate_series(1,5), dna_length(20);
From the official doc:
PREPARE creates a prepared statement. A prepared statement is a server-side object that can be used to optimize performance. When the PREPARE statement is executed, the specified statement is parsed, analyzed, and rewritten. When an EXECUTE command is subsequently issued, the prepared statement is planned and executed. This division of labor avoids repetitive parse analysis work, while allowing the execution plan to depend on the specific parameter values supplied.
About functions.
This can be much simpler and faster:
SELECT string_agg(CASE ceil(random() * 4)
WHEN 1 THEN 'A'
WHEN 2 THEN 'C'
WHEN 3 THEN 'T'
WHEN 4 THEN 'G'
END, '') AS dna
FROM generate_series(1,100) g -- 100 = 5 rows * 20 nucleotides
GROUP BY g%5;
random() produces a random value in the range 0.0 <= x < 1.0. Multiply by 4 and take the mathematical ceiling with ceil() (cheaper than round()) and you get a random distribution of the numbers 1-4. Convert those to A/C/T/G and aggregate, with GROUP BY g%5 splitting the series into 5 groups, % being the modulo operator.
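If you want to convince yourself of the 1-4 distribution, a throwaway check along these lines (not part of the solution) can be run in psql:
SELECT ceil(random() * 4) AS n, count(*)
FROM generate_series(1, 100000)
GROUP BY n
ORDER BY n;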
About string_agg():
Concatenate multiple result rows of one column into one, group by another column
As a prepared statement, taking
$1 ... the number of rows
$2 ... the number of nucleotides per row
PREPARE dna_length(int, int) AS
SELECT string_agg(CASE ceil(random() * 4)
WHEN 1 THEN 'A'
WHEN 2 THEN 'C'
WHEN 3 THEN 'T'
WHEN 4 THEN 'G'
END, '') AS dna
FROM generate_series(1, $1 * $2) g
GROUP BY g%$1;
Call:
EXECUTE dna_length(5,20);
Result:
| dna |
| :------------------- |
| ATCTTCGACACGTCGGTACC |
| GTGGCTGCAGATGAACAGAG |
| ACAGCTTAAAACACTAAGCA |
| TCCGGACCTCTCGACCTTGA |
| CGTGCGGAGTACCCTAATTA |
db<>fiddle here
If you need it a lot, consider a function instead. See:
What is the difference between a prepared statement and a SQL or PL/pgSQL function, in terms of their purposes?

Change GUC parameter before subquery PostgreSQL

I'm currently doing a query like such:
SELECT * FROM (
(SELECT * from A WHERE first_name % 'fakeFirstName')
UNION
(SELECT * from B WHERE last_name % 'fakeLastName')
) AS result;
Both A and B are views of the same underlying table C, with the exact same columns.
However, the % operator uses the GUC parameter pg_trgm.similarity_threshold to evaluate both first_name % 'fakeFirstName' and last_name % 'fakeLastName', and my goal is to change this parameter before each subquery, since the similarity thresholds differ for these two columns.
To change the pg_trgm.similarity_threshold parameter, for example to the value 0.2, I have two options:
SET pg_trgm.similarity_threshold = 0.2;
SELECT set_limit(0.2);
I am very concerned with speed of execution, which means I'd prefer to use the % operator with a GIN index over the <-> operator with a GiST index.
I tried to do something like the following, but it didn't work, since set_limit() wasn't guaranteed to be called before the % operator was evaluated:
SELECT * FROM (
(SELECT *, set_limit(0.2) from A WHERE first_name % 'fakeFirstName')
UNION
(SELECT *, set_limit(0.6) from B WHERE last_name % 'fakeLastName')
) AS result;
Any help is deeply appreciated.
I'm not sure if I understand you correctly, but you can try joining A and B against a CTE that calls set_limit(), e.g.:
t=# with a as (select set_limit(0.21))
, b as (select set_limit(0.22))
select show_limit(), 'a' from pg_database where datname = 't'
union
select show_limit(), 'b' from pg_database join a on true where datname = 't'
union
select show_limit(), 'c' from pg_database join b on true where datname = 't'
t-# order by 2;
show_limit | ?column?
------------+----------
0.2 | a
0.21 | b
0.22 | c
(3 rows)
Also mind that nothing changes the limit back afterwards, so the value remains whatever the last set_limit() call set. See the transaction-scoped alternative below.
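If you'd rather not leak the setting past the query at all, another option (assuming PostgreSQL 9.6 or later, where the threshold is exposed as a regular GUC; older versions only have set_limit()) is SET LOCAL inside a transaction; the previous value is restored automatically at COMMIT or ROLLBACK:
BEGIN;
SET LOCAL pg_trgm.similarity_threshold = 0.2;
SELECT * FROM A WHERE first_name % 'fakeFirstName';
COMMIT;  -- the threshold reverts to its previous value here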

Using TUPLES to put more than 1000 entries in SQL IN clause [duplicate]

This post is about putting more than 1000 entries in an IN clause in Oracle.
Generally, when you put more than 1000 entries in an IN clause, Oracle throws an error (ORA-01795: maximum number of expressions in a list is 1000), as it does not support that many entries in a single IN list.
How can this be solved?
If you want to put more than 1000 comma-separated hard-coded values, use tuples. The 1000-expression limit applies only to lists of single-valued expressions, not to lists of multi-column tuples.
A simple example of the tuple syntax is shown below:
SELECT * FROM TABLE_NAME WHERE (1, COLUMN_NAME) IN
((1, VALUE_1),
(1, VALUE_2),
...
...
...
...
(1, VALUE_1000),
(1, VALUE_1001));
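As a concrete illustration against the classic EMP demo table (hypothetical table and values; substitute your own):
SELECT * FROM emp
WHERE (1, empno) IN ((1, 7369), (1, 7499), (1, 7521));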
This approach will help you frame an SQL query with more than 1000 entries in the IN clause.
Hope this helps. Please add to this thread if there is any other approach for this kind of scenario; that would be helpful.
Why not just
SELECT ..
FROM my_table
WHERE (
my_column IN ( 'val1', 'val2', 'val3', ... , 'val1000' )
OR my_column IN ( 'val1001', 'val1002', 'val1003', ... , 'val2000' )
OR ... etc
)
AND other_conditions...
?
That works in 12c at least.
Response to comment re: performance
CREATE TABLE matt_tuple_test2 ( a number, b varchar2(2000) );
insert into matt_tuple_test2 (a, b) select rownum, lpad('.',2000,'.') from dual connect by rownum <= 300000;
create index matt_tuple_test_n1 on matt_tuple_test2 (a);
SELECT a, b
FROM matt_tuple_test2
WHERE ( a in ( 3,
6,
9,
...
2997,
3000 )
OR a IN (
3003,
3006,
3009,
...
5997,
6000)
OR a IN (
6003,
6006,
6009,
...
8994,
8997,
9000)
)
and b like '.%'
;
Explain plan:
Plan Query Block Name Object Alias
4 SELECT STATEMENT ALL_ROWS
Cost: 1,073 Bytes: 1,612,835 Cardinality: 1,589 CPU Cost: 0 IO Cost: 0 Time: 00:00:00
Partition #: 0
3 INLIST ITERATOR
Projection: "A"[NUMBER,22], "B"[VARCHAR2,2000]
Cost: 0 Bytes: 0 Cardinality: 0 CPU Cost: 0 IO Cost: 0
Partition #: 0 SEL$1
2 TABLE ACCESS BY INDEX ROWID BATCHED APPS.MATT_TUPLE_TEST2
Filter: "B" LIKE '.%'
Projection: "A"[NUMBER,22], "B"[VARCHAR2,2000]
Cost: 1,073 Bytes: 1,612,835 Cardinality: 1,589 CPU Cost: 0 IO Cost: 0 Time: 00:00:00
Partition #: 0 SEL$1
1 INDEX RANGE SCAN APPS.MATT_TUPLE_TEST_N1 [Analyzed]
Access: "A"=3 OR "A"=6 OR "A"=9 OR "A"=12 OR "A"=15 OR "A"=18 OR "A"=21 OR "A"=24 OR "A"=27 OR "A"=30 OR "A"=33 OR "A"=36 OR "A"=39 OR "A"=42 OR "A"=45 OR "A"=48 OR "A"=51 OR "A"=54 OR "A"=57 OR "A"=60 OR "A"=63 OR "A"=66 OR "A"=69 OR "A"=72 OR "A"=75 OR "A"=78 OR "A"=81 OR "A"=84 OR "A"=87 OR "A"=90 OR "A"=93 OR "A"=96 OR "A"=99 OR "A"=102 OR "A"=105 OR "A"=108 OR "A"=111 OR "A"=114 OR "A"=117 OR "A"=120 OR "A"=123 OR "A"=126 OR "A"=129 OR "A"=132 OR "A"=135 OR "A"=138 OR "A"=141 OR "A"=144 OR "A"=147 OR "A"=150 OR "A"=153 OR "A"=156 OR "A"=159 OR "A"=162 OR "A"=165 OR "A"=168 OR "A"=171 OR "A"=174 OR "A"=177 OR "A"=180 OR "A"=183 OR "A"=186 OR "A"=189 OR "A"=192 OR "A"=195 OR "A"=198 OR "A"=201 OR "A"=204 OR "A"=207 OR "A"=210 OR "A"=213 OR "A"=216 OR "A"=219 OR "A"=222 OR "A"=225 OR "A"=228 OR "A"=231 OR "A"=234 OR "A"=237 OR "A"=240 OR "A"=243 OR "A"=246 OR "A"=249 OR "A"=252 OR "A"=255 OR "A"=258 OR "A"=261 OR "A"=264 OR "A"=267 OR "A"=270 OR "A"=273 OR "A"=276 OR "A"=279 OR "A"=282 OR "A"=285 OR "A"=288 OR "A"=291 OR "A"=294 OR "A"=297 OR "A"=300 OR "A"=303 OR "A"=306 OR "A"=309 OR "A"=312 OR "A"=315 OR "A"=318 OR "A"=321 OR "A"=324 OR "A"=327 OR "A"=330 OR "A"=333 OR "A"=336 OR "A"=339 OR "A"=342 OR "A"=345 OR "A"=348 OR "A"=351 OR "A"=354 OR "A"=357 OR "A"=360 OR "A"=363 OR "A"=366 OR "A"=369 OR "A"=372 OR "A"=375 OR "A"=378 OR "A"=381 OR "A"=384 OR "A"=387 OR "A"=390 OR "A"=393 OR "A"=396 OR "A"=399 OR "A"=402 OR "A"=405 OR "A"=408 OR "A"=411 OR "A"=414 OR "A"=417 OR "A"=420 OR "A"=423 OR "A"=426 OR "A"=429 OR "A"=432 OR "A"=435 OR "A"=438 OR "A"=441 OR "A"=444 OR "A"=447 OR "A"=450 OR "A"=453 OR "A"=456 OR "A"=459 OR "A"=462 OR "A"=465 OR "A"=468 OR "A"=471 OR "A"=474 OR "A"=477 OR "A"=480 OR "A"=483 OR "A"=486 OR "A"=489 OR "A"=492 OR "A"=495 OR "A"=498 OR "A"=501 OR "A"=504 OR "A"=507 OR "A"=510 OR "A"=513 OR "A"=516 OR "A"=519 OR "A"=522 OR "A"=525 OR "A"=528 OR "A"=531 OR "A"=534 OR "A"=537 OR "A"=540 OR "A"=543 OR "A"=546 OR "A"=549 OR "A"=552 OR "A"=555 OR "A"=558 OR "A"=561 OR "A"=564 OR "A"=567 OR "A"=570 OR "A"=573 OR "A"=576 OR "A"=579 OR "A"=582 OR "A"=585 OR "A"=588 OR "A"=591 OR "A"=594 OR "A"=597 OR "A"=600 OR "A"=603 OR "A"=606 OR "A"=609 OR "A"=612 OR "A"=615 OR "A"=618 OR "A"=621 OR "A"=624 OR "A"=627 OR "A"=630 OR "A"=633 OR "A"=636 OR "A"=639 OR "A"=642 OR "A"=645 OR "A"=648 OR "A"=651 OR "A"=654 OR "A"=657 OR "A"=660 OR "A"=663 OR "A"=666 OR "A"=669 OR "A"=672 OR "A"=675 OR "A"=678 OR "A"=681 OR "A"=684 OR "A"=687 OR "A"=690 OR "A"=693 OR "A"=696 OR "A"=699 OR "A"=702 OR "A"=705 OR "A"=708 OR "A"=711 OR "A"=714 OR "A"=717 OR "A"=720 OR "A"=723 OR "A"=726 OR "A"=729 OR "A"=732 OR "A"=735 OR "A"=738 OR "A"=741 OR "A"=744 OR "A"=747 OR "A"=750 OR "A"=753 OR "A"=756 OR "A"=759 OR "A"=762 OR "A"=765 OR "A"=768 OR "A"=771 OR "A"=774 OR "A"=777 OR "A"=780 OR "A"=783 OR "A"=786 OR "A"=789 OR "A"=792 OR "A"=795 OR "A"=798 OR "A"=801 OR "A"=804 OR "A"=807 OR "A"=810 OR "A"=813 OR "A"=816 OR "A"=819 OR "A"=822 OR "A"=825 OR "A"=828 OR "A"=831 OR "A"=834 OR "A"=837 OR "A"=840 OR "A"=843 OR "A"=846 OR "A"=849 OR "A"=852 OR "A"=855 OR "A"=858 OR "A"=861 OR "A"=864 OR "A"=867 OR "A"=870 OR "A"=873 OR "A"=876 OR "A"=879 OR "A"=882 OR "A"=885 OR "A"=888 OR "A"=891 OR "A"=894 OR "A"=897 OR "A"=900 OR "A"=903 OR "A"=906 OR "A"=909 OR "A"=912 OR "A"=915 OR "A"=918 OR "A"=921 OR "A"=924 OR "A"=927 OR "A"=930 OR "A"=933 OR "A"=936 OR "A"=939 OR "A"=942 OR "A"=945 OR "A"=948 OR "A"=951 OR "A"=954 OR "A"=957 OR "A"=960 OR "A"=963 OR "A"=966 OR "A"=969 OR "A"=972 OR "A"=975 OR "A"=978 
OR "A"=981 OR "A"=984 OR "A"=987 OR "A"=990 OR "A"=993 OR "A"=996 OR "A"=999 OR "A"=1002 OR "A"=1005 OR "A"=1008 OR "A"=1011 OR "A"=1014 OR "A"=1017 OR "A"=1020 OR "A"=1023 OR "A"=1026 OR "A"=1029 OR "A"=1032 OR "A"=1035 OR "A"=1038 OR "A"=1041 OR "A"=1044 OR "A"=1047 OR "A"=1050 OR "A"=1053 OR "A"=1056 OR "A"=1059 OR "A"=1062 OR "A"=1065 OR "A"=1068 OR "A"=1071 OR "A"=1074 OR "A"=1077 OR "A"=1080 OR "A"=1083 OR "A"=1086 OR "A"=1089 OR "A"=1092 OR
Projection: "MATT_TUPLE_TEST2".ROWID[ROWID,10], "A"[NUMBER,22]
Cost: 673 Bytes: 0 Cardinality: 858 CPU Cost: 0 IO Cost: 0 Time: 00:00:00
Partition #: 0 SEL$1
The limit does not apply to the number of values returned by a SELECT subquery, so it would be much simpler to do it like this:
select * from table_name
where column_name in ( select value_1 from dual union all
select value_2 from dual union all
select value_3 from dual union all .....
)
Presumably that doesn't need to be done by hand; the values are either in a table already (in which case they can be selected directly from that table, with no need for "dual" and "union all"), or are the result of a more complex subquery, or are produced in application code, where a simple loop can build the subquery string.
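For example, with the values already loaded into a hypothetical staging table lookup_values(val), the whole list collapses to:
SELECT *
FROM table_name
WHERE column_name IN (SELECT val FROM lookup_values);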
You can also do a join like this (for Oracle only):
WITH T(A, B) AS
(
SELECT 1 AS A, VALUE_1 AS B
UNION ALL
SELECT 1 AS A, VALUE_2 AS B
UNION ALL
-- ...
SELECT 1 AS A, VALUE_1000 AS B
UNION ALL
SELECT 1 AS A, VALUE_1001 AS B
)
SELECT *
FROM TABLE_NAME
JOIN T ON 1 = T.A AND COLUMN_NAME = T.B
for everything else:
WITH T(A, B) AS
(
VALUES
(1, VALUE_1),
(1, VALUE_2),
--...
(1, VALUE_1000),
(1, VALUE_1001)
)
SELECT *
FROM TABLE_NAME
JOIN T ON 1 = T.A AND COLUMN_NAME = T.B
Note that in this case you don't have to have the values in the SQL query -- they can also be in a table you join to; for this example I show them as a CTE.
If VALUES does not work in a CTE in Oracle, you could also do it like this:
SELECT *
FROM TABLE_NAME
JOIN (
VALUES
(1, VALUE_1),
(1, VALUE_2),
--...
(1, VALUE_1000),
(1, VALUE_1001)
) AS T(A,B) ON 1 = T.A AND COLUMN_NAME = T.B
Use a subselect in the IN clause.

Postgres slow query (slow index scan)

I have a table with 3 million rows and 1.3GB in size. Running Postgres 9.3 on my laptop with 4GB RAM.
explain analyze
select act_owner_id from cnt_contacts where act_owner_id = 2
I have a btree index on cnt_contacts.act_owner_id, defined as:
CREATE INDEX cnt_contacts_idx_act_owner_id
ON public.cnt_contacts USING btree (act_owner_id, status_id);
The query runs in about 5 seconds:
Bitmap Heap Scan on cnt_contacts (cost=2598.79..86290.73 rows=6208 width=4) (actual time=5865.617..5875.302 rows=5444 loops=1)
Recheck Cond: (act_owner_id = 2)
-> Bitmap Index Scan on cnt_contacts_idx_act_owner_id (cost=0.00..2597.24 rows=6208 width=0) (actual time=5865.407..5865.407 rows=5444 loops=1)
Index Cond: (act_owner_id = 2)
Total runtime: 5875.684 ms
Why is it taking so long?
work_mem = 1024MB;
shared_buffers = 128MB;
effective_cache_size = 1024MB
seq_page_cost = 1.0 # measured on an arbitrary scale
random_page_cost = 15.0 # same scale as above
cpu_tuple_cost = 3.0
OK, you have a big table, an index, and a long execution time. Let's think about ways to improve your plan and reduce that time. You write and remove rows; PostgreSQL writes and removes tuples, so both the table and the index can become bloated. For fast searches PG loads the index into shared buffers, so you need to keep your indexes as clean as possible so they fit in memory. Try to tune the buffer memory, reduce index and table bloat, and keep the database clean.
Things to check and think about:
1) Check for duplicate indexes and make sure your indexes have good selectivity:
WITH table_scans as (
SELECT relid,
tables.idx_scan + tables.seq_scan as all_scans,
( tables.n_tup_ins + tables.n_tup_upd + tables.n_tup_del ) as writes,
pg_relation_size(relid) as table_size
FROM pg_stat_user_tables as tables
),
all_writes as (
SELECT sum(writes) as total_writes
FROM table_scans
),
indexes as (
SELECT idx_stat.relid, idx_stat.indexrelid,
idx_stat.schemaname, idx_stat.relname as tablename,
idx_stat.indexrelname as indexname,
idx_stat.idx_scan,
pg_relation_size(idx_stat.indexrelid) as index_bytes,
indexdef ~* 'USING btree' AS idx_is_btree
FROM pg_stat_user_indexes as idx_stat
JOIN pg_index
USING (indexrelid)
JOIN pg_indexes as indexes
ON idx_stat.schemaname = indexes.schemaname
AND idx_stat.relname = indexes.tablename
AND idx_stat.indexrelname = indexes.indexname
WHERE pg_index.indisunique = FALSE
),
index_ratios AS (
SELECT schemaname, tablename, indexname,
idx_scan, all_scans,
round(( CASE WHEN all_scans = 0 THEN 0.0::NUMERIC
ELSE idx_scan::NUMERIC/all_scans * 100 END),2) as index_scan_pct,
writes,
round((CASE WHEN writes = 0 THEN idx_scan::NUMERIC ELSE idx_scan::NUMERIC/writes END),2)
as scans_per_write,
pg_size_pretty(index_bytes) as index_size,
pg_size_pretty(table_size) as table_size,
idx_is_btree, index_bytes
FROM indexes
JOIN table_scans
USING (relid)
),
index_groups AS (
SELECT 'Never Used Indexes' as reason, *, 1 as grp
FROM index_ratios
WHERE
idx_scan = 0
and idx_is_btree
UNION ALL
SELECT 'Low Scans, High Writes' as reason, *, 2 as grp
FROM index_ratios
WHERE
scans_per_write <= 1
and index_scan_pct < 10
and idx_scan > 0
and writes > 100
and idx_is_btree
UNION ALL
SELECT 'Seldom Used Large Indexes' as reason, *, 3 as grp
FROM index_ratios
WHERE
index_scan_pct < 5
and scans_per_write > 1
and idx_scan > 0
and idx_is_btree
and index_bytes > 100000000
UNION ALL
SELECT 'High-Write Large Non-Btree' as reason, index_ratios.*, 4 as grp
FROM index_ratios, all_writes
WHERE
( writes::NUMERIC / ( total_writes + 1 ) ) > 0.02
AND NOT idx_is_btree
AND index_bytes > 100000000
ORDER BY grp, index_bytes DESC )
SELECT reason, schemaname, tablename, indexname,
index_scan_pct, scans_per_write, index_size, table_size
FROM index_groups;
2) Check whether you have table and index bloat:
SELECT
current_database(), schemaname, tablename, /*reltuples::bigint, relpages::bigint, otta,*/
ROUND((CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages::FLOAT/otta END)::NUMERIC,1) AS tbloat,
CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::BIGINT END AS wastedbytes,
iname, /*ituples::bigint, ipages::bigint, iotta,*/
ROUND((CASE WHEN iotta=0 OR ipages=0 THEN 0.0 ELSE ipages::FLOAT/iotta END)::NUMERIC,1) AS ibloat,
CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta) END AS wastedibytes
FROM (
SELECT
schemaname, tablename, cc.reltuples, cc.relpages, bs,
CEIL((cc.reltuples*((datahdr+ma-
(CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::FLOAT)) AS otta,
COALESCE(c2.relname,'?') AS iname, COALESCE(c2.reltuples,0) AS ituples, COALESCE(c2.relpages,0) AS ipages,
COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::FLOAT)),0) AS iotta -- very rough approximation, assumes all cols
FROM (
SELECT
ma,bs,schemaname,tablename,
(datawidth+(hdr+ma-(CASE WHEN hdr%ma=0 THEN ma ELSE hdr%ma END)))::NUMERIC AS datahdr,
(maxfracsum*(nullhdr+ma-(CASE WHEN nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2
FROM (
SELECT
schemaname, tablename, hdr, ma, bs,
SUM((1-null_frac)*avg_width) AS datawidth,
MAX(null_frac) AS maxfracsum,
hdr+(
SELECT 1+COUNT(*)/8
FROM pg_stats s2
WHERE null_frac<>0 AND s2.schemaname = s.schemaname AND s2.tablename = s.tablename
) AS nullhdr
FROM pg_stats s, (
SELECT
(SELECT current_setting('block_size')::NUMERIC) AS bs,
CASE WHEN SUBSTRING(v,12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE 23 END AS hdr,
CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma
FROM (SELECT version() AS v) AS foo
) AS constants
GROUP BY 1,2,3,4,5
) AS foo
) AS rs
JOIN pg_class cc ON cc.relname = rs.tablename
JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname = rs.schemaname AND nn.nspname <> 'information_schema'
LEFT JOIN pg_index i ON indrelid = cc.oid
LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid
) AS sml
ORDER BY wastedbytes DESC
3) Are dead tuples being cleaned up on disk? Is it time for a VACUUM?
SELECT
relname AS TableName
,n_live_tup AS LiveTuples
,n_dead_tup AS DeadTuples
FROM pg_stat_user_tables;
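If DeadTuples is large relative to LiveTuples for a table, a manual vacuum is worth a try (standard PostgreSQL syntax; the table name is taken from the question):
VACUUM ANALYZE cnt_contacts;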
4) Think about selectivity. If you have 10 records and 8 of them have id = 2, the index has poor selectivity for that value, and PG will still end up fetching all 8 rows; for a query on id != 2, however, the index works well. Aim for indexes with good selectivity; one possible remedy for a heavily skewed value is sketched below.
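A sketch of that remedy, offered as an assumption rather than part of the original answer: a partial index that leaves the over-represented value out, so that queries on the rarer values use a much smaller index.
-- hypothetical: helps only queries that exclude the common value
CREATE INDEX cnt_contacts_owner_part_idx
ON cnt_contacts (act_owner_id)
WHERE act_owner_id <> 2;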
5) Use the proper column types for your data. If a smaller type is enough for a column, convert it.
6) Check your database and its configuration in general; see this page to get started.
In short: look for unused data in your tables, keep your indexes clean, check their selectivity, consider other index types such as BRIN where they fit, and try recreating bloated indexes.
You are selecting 5444 records scattered over a 1.3 GB table on a laptop. How long do you expect that to take?
It looks like your index is not cached, either because it can't be kept in the cache, or because this is the first time you used that part of it. What happens if you run the exact same query repeatedly? Or the same query with a different constant?
Running the query under EXPLAIN (ANALYZE, BUFFERS) would be helpful to get additional information, particularly if you turn track_io_timing on first.
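Concretely, that would look like this (track_io_timing exists since PostgreSQL 9.2 but normally requires superuser rights to change):
SET track_io_timing = on;
EXPLAIN (ANALYZE, BUFFERS)
SELECT act_owner_id FROM cnt_contacts WHERE act_owner_id = 2;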

Find closest numeric value in database

I need to find a select statement that will return either a record that matches my input exactly, or the closest match if an exact match is not found.
Here is my select statement so far.
SELECT * FROM [myTable]
WHERE Name = 'Test' AND Size = 2 AND PType = 'p'
ORDER BY Area DESC
What I need to do is find the closest match to the 'Area' field, so if my input is 1.125 and the database contains 2, 1.5, 1 and .5 the query will return the record containing 1.
My SQL skills are very limited so any help would be appreciated.
Get the difference between Area and your input, take the absolute value so it's always positive, then order ascending and take the first row:
SELECT TOP 1 * FROM [myTable]
WHERE Name = 'Test' and Size = 2 and PType = 'p'
ORDER BY ABS( Area - @input )
Something horrible, along the lines of:
ORDER BY ABS( Area - 1.125 ) ASC LIMIT 1
Maybe?
If you have many rows that satisfy the equality predicates on Name, Size, and PType columns then you may want to include range predicates on the Area column in your query. If the Area column is indexed this could allow efficient index-based access.
The following query (written using Oracle syntax) uses one branch of a UNION ALL to find the record with minimal Area >= your target, while the other branch finds the record with maximal Area < your target. One of these two records will be the record that you are looking for. Then you can ORDER BY ABS(Area - ?target) to pick the winner out of those two candidates. Unfortunately the query is complex, due to the nested SELECTs that are needed to enforce the desired ROWNUM / ORDER BY precedence.
SELECT *
FROM
(SELECT * FROM
(SELECT * FROM
(SELECT * FROM [myTable]
WHERE Name = 'Test' AND Size = 2 AND PType = 'p' AND Area >= ?target
ORDER BY Area)
WHERE ROWNUM < 2
UNION ALL
SELECT * FROM
(SELECT * FROM [myTable]
WHERE Name = 'Test' AND Size = 2 AND PType = 'p' AND Area < ?target
ORDER BY Area DESC)
WHERE ROWNUM < 2)
ORDER BY ABS(Area - ?target))
WHERE rownum < 2
A good index for this query would be (Name, Size, PType, Area), in which case the expected query execution plan would be based on two index range scans that each returned a single row.
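That index might be created like this (a sketch; the index name is arbitrary):
CREATE INDEX mytable_name_size_ptype_area_idx
ON myTable (Name, Size, PType, Area);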
SELECT *
FROM [myTable]
WHERE Name = 'Test' AND Size = 2 AND PType = 'p'
ORDER BY ABS(Area - 1.125)
LIMIT 1
-- MarkusQ
How about ordering by the difference between your input and [Area], such as:
DECLARE @InputValue DECIMAL(7, 3)
SET @InputValue = 1.125
SELECT TOP 1 * FROM [myTable]
WHERE Name = 'Test' AND Size = 2 AND PType = 'p'
ORDER BY ABS(@InputValue - Area)
Note that although ABS() is supported by pretty much everything, it's not technically standard (in SQL99 at least). If you must write ANSI-standard SQL for some reason, you'd have to work around it with a searched CASE expression:
SELECT * FROM myTable
WHERE Name='Test' AND Size=2 AND PType='p'
ORDER BY CASE WHEN Area > 1.125 THEN Area - 1.125 ELSE 1.125 - Area END
If using MySQL:
SELECT * FROM myTable ... ORDER BY ABS(Area - SuppliedValue) LIMIT 1
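Spelled out with the question's filters, using a hypothetical @input user variable:
SET @input = 1.125;
SELECT * FROM myTable
WHERE Name = 'Test' AND Size = 2 AND PType = 'p'
ORDER BY ABS(Area - @input)
LIMIT 1;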