I have a table in Postgres with a timestamp and 6 value columns (A, B, C, D, E, F). Every 10 minutes a new record is appended to this table; however, for columns B, D, and F the actual value is fetched only every 30 minutes, meaning only every 3rd row is non-null for them.
I would like to write a query that outputs the most recent value per column. The only thing that comes to my mind is to write 2 queries:
SELECT A,C,E
FROM data_prices
ORDER BY date_of_record DESC LIMIT 1
SELECT B,D,F
FROM data_prices
WHERE B IS NOT NULL AND D IS NOT NULL AND F IS NOT NULL
ORDER BY date_of_record DESC LIMIT 1
And then join the results into one table with 6 columns and 1 row. However, I don't know how to do that, because in the documentation I only found operations like UNION, INTERSECT, and EXCEPT, which append rows rather than producing one wider table. Any ideas how to join these 2 SELECTs into 1 table with 6 columns? Or maybe a smarter way to get the latest non-NULL result per column?
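For what it's worth, two single-row queries can be placed side by side with a plain CROSS JOIN, since 1 row × 1 row = 1 row. A sketch against the data_prices table described above:

```sql
SELECT x.A, y.B, x.C, y.D, x.E, y.F
FROM (
    -- latest row overall: carries A, C, E
    SELECT A, C, E
    FROM data_prices
    ORDER BY date_of_record DESC LIMIT 1
) x
CROSS JOIN (
    -- latest row where B, D, F were actually fetched
    SELECT B, D, F
    FROM data_prices
    WHERE B IS NOT NULL AND D IS NOT NULL AND F IS NOT NULL
    ORDER BY date_of_record DESC LIMIT 1
) y;
```

This relies on the asker's observation that B, D, and F are populated together on every 3rd row; if they could be null independently, each would need its own subquery.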
Unfortunately, Postgres does not support IGNORE NULLS in lag().
One method uses first_value():
select date_of_record, a,
first_value(b) over (order by (b is not null) desc, date_of_record desc) as last_b,
c,
first_value(d) over (order by (d is not null) desc, date_of_record desc) as last_d,
e,
first_value(f) over (order by (f is not null) desc, date_of_record desc) as last_f
from t
order by date_of_record desc
limit 1;
This is the table:

id  category  value
--- --------- ------
1   A         40
1   B         20
1   C         10
2   A         4
2   B         7
2   C         7
3   A         32
3   B         21
3   C         2
I want the result like this:

id  category
--- ---------
1   A
2   B
2   C
3   A
For small tables or for only very few rows per user, a subquery with the window function rank() (as demonstrated by The Impaler) is just fine. The resulting sequential scan over the whole table, followed by a sort will be the most efficient query plan.
For more than a few rows per user, this gets increasingly inefficient though.
Typically, you also have a users table holding one distinct row per user. If you don't have it, create it! See:
Is there a way to SELECT n ON (like DISTINCT ON, but more than one of each)
Select first row in each GROUP BY group?
We can leverage that for an alternative query that scales much better - using WITH TIES in a LATERAL JOIN. Requires Postgres 13 or later.
SELECT u.id, t.*
FROM users u
CROSS JOIN LATERAL (
SELECT t.category
FROM tbl t
WHERE t.id = u.id
ORDER BY t.value DESC
FETCH FIRST 1 ROWS WITH TIES -- !
) t;
See:
Get top row(s) with highest value, with ties
Fetching a minimum of N rows, plus all peers of the last row
This can use a multicolumn index to great effect - which must exist, of course:
CREATE INDEX ON tbl (id, value);
Or:
CREATE INDEX ON tbl (id, value DESC);
Even faster index-only scans become possible with:
CREATE INDEX ON tbl (id, value DESC, category);
Or (the optimum for the query at hand):
CREATE INDEX ON tbl (id, value DESC) INCLUDE (category);
Assuming value is defined NOT NULL, or we have to use DESC NULLS LAST. See:
Sort by column ASC, but NULL values first?
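If value can in fact be NULL, the index and the query have to agree on the sort order for the index to be usable. A sketch:

```sql
-- index with NULLS LAST to match the query's sort order
CREATE INDEX ON tbl (id, value DESC NULLS LAST) INCLUDE (category);

-- the ORDER BY in the lateral subquery must then match:
--   ORDER BY t.value DESC NULLS LAST
```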
To keep users in the result that don't have any rows in table tbl, use LEFT JOIN LATERAL (...) ON true. See:
What is the difference between LATERAL JOIN and a subquery in PostgreSQL?
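That variant might look like this (a sketch, using the same users and tbl tables as above):

```sql
SELECT u.id, t.*
FROM users u
LEFT JOIN LATERAL (
    SELECT t.category
    FROM tbl t
    WHERE t.id = u.id
    ORDER BY t.value DESC
    FETCH FIRST 1 ROWS WITH TIES
) t ON true;
```

Users without any rows in tbl are kept, with NULL in category.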
You can use RANK() to identify the rows you want. Then, filtering is easy. For example:
select *
from (
select *,
rank() over(partition by id order by value desc) as rk
from t
) x
where rk = 1
Result:
id category value rk
--- --------- ------ --
1 A 40 1
2 B 7 1
2 C 7 1
3 A 32 1
I have a table with columns: FILING_ID, DATE, and BLAH
I'm trying to write a query that, for each FILING_ID, returns the rows with the last three dates. If the table is:
FILING_ID   DATE
aksjdfj     2/1/2006
b           2/1/2006
b           3/1/2006
b           4/1/2006
b           5/1/2006
I would like:
FILING_ID   DATE
aksjdfj     2/1/2006
b           3/1/2006
b           4/1/2006
b           5/1/2006
I was thinking of maybe running some query to figure out the 3rd highest date for each FILING_ID then doing a join and comparing the cutoff date with the DATE?
I use PostgreSQL. Is there some way to use limit?
SELECT filing_id, date -- more columns?
FROM (
SELECT *, row_number() OVER (PARTITION BY filing_id ORDER BY date DESC NULLS LAST) AS rn
FROM tbl
) sub
WHERE rn < 4
ORDER BY filing_id, date; -- optionally order rows
NULLS LAST is only relevant if date can actually be NULL.
If date is not unique, you may need to break ties to get stable results.
PostgreSQL sort by datetime asc, null first?
Select first row in each GROUP BY group?
Is there some way to use limit?
Maybe. If you have an additional table holding all distinct filing_id (and possibly a few more, which are removed by the join), you can use CROSS JOIN LATERAL (a comma in the FROM list is short syntax for it):
SELECT f.filing_id, t.*
FROM filing f -- table with distinct filing_id
, LATERAL (
SELECT date -- more columns?
FROM tbl
WHERE filing_id = f.filing_id
ORDER BY date DESC NULLS LAST
LIMIT 3 -- now you can use LIMIT
) t
ORDER BY f.filing_id, t.date;
What is the difference between LATERAL and a subquery in PostgreSQL?
If you don't have a filing table, you can create one. Or derive it on the fly:
Optimize GROUP BY query to retrieve latest record per user
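A sketch of deriving the distinct filing_id set on the fly (note this scans the whole table once, so the performance advantage of a separate filing table is lost):

```sql
SELECT f.filing_id, t.*
FROM (SELECT DISTINCT filing_id FROM tbl) f  -- derived on the fly
, LATERAL (
    SELECT date
    FROM tbl
    WHERE filing_id = f.filing_id
    ORDER BY date DESC NULLS LAST
    LIMIT 3
) t
ORDER BY f.filing_id, t.date;
```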
I have two tables, Alpha & Bravo. Bravo has a column id (integer, primary key) and some other columns that are not relevant for this question. Alpha has columns id (integer, primary key), bravo_id (foreign key to table Bravo), special (a single char, null for most rows but has a value for certain important rows), created (a DATETIME), and some others not relevant to this question.
I would like to get all the special rows from Alpha, plus for each special row I would like to get the "previous" non-special row from Alpha associated with the same row of Bravo (that is, the Alpha row with the same bravo_id and the most recent created that is older than the created of the special row), and I need to keep each special row & its previous row linked.
Currently I'm doing this with n+1 queries:
SELECT id, bravo_id, created FROM Alpha WHERE special IS NOT NULL
followed by a query like this for each result in the initial query:
SELECT id, created FROM Alpha
WHERE special IS NULL AND bravo_id = BrvN AND created < CrtN ORDER BY created DESC
Obviously that's wildly inefficient. Is there a way I can retrieve this information in a single query that will put each special row & its previous non-special row in a single row of the result?
Our product supports both SQL Server (2008 R2 if relevant) and Oracle (11g if relevant) so a query that works for both of those would be ideal, but a query for only one of the two would be fine.
EDIT: "Created" is perhaps a misnomer. The datetime in that column is when the referenced object was created, and not when it was entered into the database (which could be anywhere from seconds to years later). An ordering of the rows of Alpha based on the created column would have little or no correlation to an ordering based on the id column (which is a traditional incrementing identity/sequence).
SELECT a.id, a.bravo_id, a.created, d.id, d.created
FROM Alpha a
OUTER APPLY
(
    SELECT TOP 1 da.id, da.created
    FROM Alpha da
    WHERE da.special IS NULL
      AND da.bravo_id = a.bravo_id
      AND da.created < a.created
    ORDER BY da.created DESC
) d
WHERE a.special IS NOT NULL
You can combine both queries with APPLY (SQL Server syntax; OUTER APPLY keeps special rows even when no previous row exists).
This works in both SQL Server & Oracle:
select A.id, A.bravo_id, A.created, B.id, B.created
from Alpha A
left join Alpha B on A.bravo_id = B.bravo_id
and B.created < A.created
and B.special is null
where A.special is not null
and (B.created is null or
B.created = (select max(S.created)
from Alpha S
where S.special is null
and S.bravo_id = A.bravo_id
and S.created < A.created))
It left-joins all rows with the same foreign key and an older created value, then uses the WHERE clause to filter them down to the most recent one (being careful not to exclude A rows that have no older row).
Unfortunately, SQL Server 2008 doesn't support cumulative window sums. Here is an approach to solving the problem anyway.
For each row in Alpha, count the number of special rows at or after it; this assigns a group number. Within each group, use row_number() to enumerate the rows and keep the first two (the special row and its immediate predecessor).
select a.*
from (select a.*,
             row_number() over (partition by bravo_id, grp order by created desc) as seqnum
      from (select a.*,
                   (select count(*)
                    from alpha a2
                    where a2.bravo_id = a.bravo_id and a2.special is not null and
                          a2.created >= a.created
                   ) as grp
            from alpha a
           ) a
     ) a
where seqnum <= 2
  and grp > 0; -- rows newer than every special row belong to no group
In Oracle (or SQL Server 2012+), you would write this with a cumulative window sum instead of the correlated subquery:
select a.*
from (select a.*,
             row_number() over (partition by bravo_id, grp order by created desc) as seqnum
      from (select a.*,
                   sum(case when special is not null then 1 else 0 end)
                       over (partition by bravo_id order by created desc) as grp
            from alpha a
           ) a
     ) a
where seqnum <= 2
  and grp > 0;
Suppose I have a table filled with the data below. What SQL function or query should I use in DB2 to retrieve the first row where FLD_A is A, the first row where FLD_A is B, and so on?
ID FLD_A FLD_B
1 A 10
2 A 20
3 A 30
4 B 10
5 A 20
6 C 30
I am expecting a table like the one below. I am aware of the grouping done by GROUP BY, but how can I limit the query to return only the very first row of each group?
Essentially, I would like the row where each new value of FLD_A appears for the first time:
ID FLD_A FLD_B
1 A 10
4 B 10
6 C 30
Try this; it works in plain SQL:
SELECT * FROM Table1
WHERE ID IN (SELECT MIN(ID) FROM Table1 GROUP BY FLD_A)
A good way to approach this problem is with window functions and row_number() in particular:
select t.*
from (select t.*,
             row_number() over (partition by fld_a order by id) as seqnum
      from table1 t
     ) t
where seqnum = 1;
(This is assuming that "first" means "minimum id".)
If you use t.*, the output will include the extra seqnum column. List the desired columns explicitly to avoid this.
I am new to the SQL side, so if this question sounds very easy, please spare me. I have 4 columns in a SQL table, say A, B, C, D. For any B/C combination I may get any number of rows. For each B/C combination I need at most 3 rows (which in turn give me 3 unique values of A), and those rows should hold the top 3 values of D compared to the other entries for that combination.
Since there can be any number of B/C combinations, the above logic should apply to all of them.
Most databases support ranking functions. With these, you can do what you want as follows:
select A, B, C, D
from (select t.*,
row_number() over (partition by B, C order by D desc) as seqnum
from t
) t
where seqnum <= 3
order by B, C, D desc
The row_number() function generates a sequential number. This number starts at 1 in every B,C group and is ordered by the value of D descending.
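If ties in D at the cutoff should all be kept (possibly returning more than 3 rows per group), rank() can be swapped in for row_number(). A sketch against the same hypothetical table t:

```sql
select A, B, C, D
from (select t.*,
             rank() over (partition by B, C order by D desc) as seqnum
      from t
     ) t
where seqnum <= 3
order by B, C, D desc;
```

With rank(), all rows tied at rank 3 share that rank, so every one of them passes the seqnum <= 3 filter.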