Select Top N rows plus another Select based on previous result - sql

I am quite new to SQL.
I have an MS SQL database where I would like to fetch the top 3 rows with a datetime at or above a specific input, PLUS all the rows whose datetime value is equal to that of the last row of that fetch.
| rowId | Timestamp | data |
|-------|--------------------------|------|
| rsg | 2019-01-01T00:00:00.000Z | 120 |
| zqd | 2020-01-01T00:00:00.000Z | 36 |
| ylp | 2020-01-01T00:00:00.000Z | 48 |
| abt | 2022-01-01T00:00:00.000Z | 53 |
| zio | 2022-01-01T00:00:00.000Z | 12 |
Here is my current request to fetch the 3 rows.
SELECT
TOP 3 *
FROM
Table
WHERE
Timestamp >= '2020-01-01T00:00:00.000Z'
ORDER BY
Timestamp ASC
With the sample data above, I would like a single query to return the last 4 rows.
Thanks for your help

One possibility, using ROW_NUMBER:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY Timestamp) rn
FROM yourTable
WHERE Timestamp >= '2020-01-01T00:00:00.000Z'
)
SELECT *
FROM cte
WHERE
Timestamp <= (SELECT Timestamp FROM cte WHERE rn = 3);
After filtering on the input timestamp, matching records are either among the first 3 rows or have a timestamp equal to that of the third row. We can combine these conditions by restricting to timestamps at or before the timestamp of the third row.
Or maybe use TOP 3 WITH TIES:
SELECT TOP 3 WITH TIES *
FROM yourTable
WHERE Timestamp >= '2020-01-01T00:00:00.000Z'
ORDER BY Timestamp;
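To check the ROW_NUMBER idea against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module instead of SQL Server (SQLite has no TOP ... WITH TIES, and window functions need SQLite 3.25 or newer). The table name readings and the parameter names :start and :n are made up for the illustration.

```python
import sqlite3

# Sample rows copied from the question; the table name "readings" is hypothetical.
rows = [
    ("rsg", "2019-01-01T00:00:00.000Z", 120),
    ("zqd", "2020-01-01T00:00:00.000Z", 36),
    ("ylp", "2020-01-01T00:00:00.000Z", 48),
    ("abt", "2022-01-01T00:00:00.000Z", 53),
    ("zio", "2022-01-01T00:00:00.000Z", 12),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (rowId TEXT, Timestamp TEXT, data INTEGER)")
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)

# Number the rows at or after :start, then keep everything whose timestamp
# is not later than the timestamp of row number :n (so ties are included).
query = """
WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (ORDER BY Timestamp) AS rn
    FROM readings
    WHERE Timestamp >= :start
)
SELECT rowId, Timestamp, data
FROM cte
WHERE Timestamp <= (SELECT Timestamp FROM cte WHERE rn = :n)
ORDER BY Timestamp, rowId
"""
result = conn.execute(query, {"start": "2020-01-01T00:00:00.000Z", "n": 3}).fetchall()
for row in result:
    print(row)
```

With this data the third row already falls on the 2022 timestamp, so both 2022 rows come back and the query returns four rows. One thing to keep in mind: if fewer than :n rows pass the filter, the subquery yields NULL and nothing is returned, so that case may need separate handling.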

Related

SQL: How can I pick a cell value from one table as a condition to select another table

Hi I'm a new learner of SQL. How can I realize this process in SQL or perhaps with python if needed:
First, from table1, I randomly selected two results:
SELECT TOP 2 id, date
FROM table1
WHERE date >= 2 AND date <= 6
ORDER BY RAND(CHECKSUM(*) * RAND())
+-----------+
| table1 |
+-----------+
| id | date |
+----+------+
| x | 3 |
| y | 4 |
+----+------+
I need to use the value x and y as conditions to display another table. For instance, using x, I can:
SELECT id, date
FROM table1
WHERE date >= 2 AND date <= 6 AND id = 'x'
ORDER BY date ASC
+-----------+
| table2 |
+-----------+
| id | date |
+----+------+
| x | 3 |
| x | 4 |
| x | 5 |
| x | 6 |
| x | 6 |
+----+------+
What I need is to get the length of table2 without duplication on date. For instance, table2 has 5 rows, but last two duplicate in date. So the final answer is 4 rows.
For id = y, I have to do the same thing (say table3) and compare the length of table3 and table2 to see if consistent.
If yes, then return the length (say, 4 rows); If no, then go back to table1 and select another two id (say, z and y).
I was thinking to use python to select value or create variables, then use python variables in SQL. But it is too much for a new learner. I really appreciate it if someone could help me out this process.
You can use a subquery with an IN clause.
Here is also a version that matches on two dimensions (id and date); maybe this will help as well.
CREATE TABLE table1 ([id] varchar(2),[date] int)
GO
SELECT id, date FROM table1
where date >= 2 and date <= 6
and id IN (
SELECT TOP 2 id FROM table1
WHERE date >= 2 and date <= 6
ORDER BY RAND(CHECKSUM(*) * RAND())
)
ORDER BY date ASC
GO
SELECT id, date FROM table1
WHERE EXISTS (SELECT 1
FROM (
SELECT TOP 2 id,[date] FROM table1
WHERE date >= 2 and date <= 6
ORDER BY RAND(CHECKSUM(*) * RAND())) AS table2
WHERE table1.[id] = table2.[id]
AND table1.[date] = table2.[date])
GO
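The other half of the question, counting the rows per id without duplicate dates and comparing the counts, could be sketched with COUNT(DISTINCT ...). This is only an illustration run through Python's sqlite3 module: the rows for x follow the table2 example in the question, the rows for y are invented, and the two ids are hard-coded rather than picked at random.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id TEXT, date INTEGER)")
# x matches the table2 example from the question; the y rows are made up.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [("x", 3), ("x", 4), ("x", 5), ("x", 6), ("x", 6),
     ("y", 2), ("y", 4), ("y", 5), ("y", 6), ("y", 9)],
)

# Count each date only once per id, within the same date range as the question.
query = """
SELECT id, COUNT(DISTINCT date) AS distinct_dates
FROM table1
WHERE date >= 2 AND date <= 6
  AND id IN ('x', 'y')
GROUP BY id
ORDER BY id
"""
counts = dict(conn.execute(query).fetchall())
print(counts)

# The two ids are "consistent" when they have the same number of distinct dates.
consistent = len(set(counts.values())) == 1
print(consistent)
```

If the counts differ, the calling code would pick two new ids and run the query again; that retry loop is left out here.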

SQL select all rows per group after a condition is met

I would like to select all rows for each group after the last time a condition is met for that group. This related question has an answer using correlated subqueries.
In my case I will have millions of categories and hundreds of millions/billions of rows. Is there a way to achieve the same results using a more performant query?
Here is an example. The condition is all rows (per group) after the last 0 in the conditional column.
category | timestamp | condition
--------------------------------------
A | 1 | 0
A | 2 | 1
A | 3 | 0
A | 4 | 1
A | 5 | 1
B | 1 | 0
B | 2 | 1
B | 3 | 1
The result I would like to achieve is
category | timestamp | condition
--------------------------------------
A | 4 | 1
A | 5 | 1
B | 2 | 1
B | 3 | 1
If you want everything after the last 0, you can use window functions:
select t.*
from (select t.*,
max(case when condition = 0 then timestamp end) over (partition by category) as max_timestamp_0
from t
) t
where timestamp > max_timestamp_0 or
max_timestamp_0 is null;
With an index on (category, condition, timestamp), the correlated subquery version might also perform quite well:
select t.*
from t
where t.timestamp > all (select t2.timestamp
from t t2
where t2.category = t.category and
t2.condition = 0
);
You might want to try window functions:
select category, timestamp, condition
from (
select
t.*,
min(condition) over(partition by category order by timestamp desc) min_cond
from mytable t
) t
where min_cond = 1
The window min() with the order by clause computes the minimum value of condition over the current and following rows of the same category: we can use it as a filter to eliminate rows for which there is a more recent row with a 0.
Compared to the correlated subquery approach, the upside of using window functions is that it reduces the number of scans needed on the table. Of course this computing also has a cost, so you'll need to assess both solutions against your sample data.
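As a quick way to try the window-function approach on the sample data, here is a small sketch using Python's sqlite3 module (SQLite 3.25 or newer is assumed for window functions). The columns are renamed to ts and cond purely to stay clear of keyword clashes, and last_zero_ts is a made-up alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (category TEXT, ts INTEGER, cond INTEGER)")
# Sample rows from the question (timestamp -> ts, condition -> cond).
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("A", 1, 0), ("A", 2, 1), ("A", 3, 0), ("A", 4, 1), ("A", 5, 1),
     ("B", 1, 0), ("B", 2, 1), ("B", 3, 1)],
)

# For every row, compute the latest ts in its category where cond = 0,
# then keep only the rows that come after it (or all rows if there is no 0).
query = """
SELECT category, ts, cond
FROM (
    SELECT t.*,
           MAX(CASE WHEN cond = 0 THEN ts END)
               OVER (PARTITION BY category) AS last_zero_ts
    FROM t
) AS s
WHERE last_zero_ts IS NULL OR ts > last_zero_ts
ORDER BY category, ts
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

On this toy data the output matches the desired result in the question; it says nothing about performance at hundreds of millions of rows, which would still need to be measured on the real table.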

Order By Id and Limit Offset By Id from a table

I have an issue similar to the following query:
select name, number, id
from tableName
order by id
limit 10 offset 5
But this applies the limit and offset to the whole result set.
Is there a way to apply the limit and offset per id?
For example if I have a set:
| name | number | id |
|------|--------|--------------------------------------|
| Ana | 1 | 589d0011-ef54-4708-a64a-f85228149651 |
| Jana | 2 | 589d0011-ef54-4708-a64a-f85228149651 |
| Jan | 3 | 589d0011-ef54-4708-a64a-f85228149651 |
| Joe | 2 | 64ed0011-ef54-4708-a64a-f85228149651 |
and if I skip 1, I should get:
| name | number | id |
|------|--------|--------------------------------------|
| Jana | 2 | 589d0011-ef54-4708-a64a-f85228149651 |
| Jan | 3 | 589d0011-ef54-4708-a64a-f85228149651 |
I think that you want to filter by row_number():
select name, number, id
from (
select t.*, row_number() over(partition by id order by number) rn
from mytable t
) t
where
rn > :number_of_records_per_group_to_skip
and rn <= :number_of_records_per_group_to_skip + :number_of_records_per_group_to_keep
The query ranks records by number within groups of records having the same id, and then filters using two parameters:
:number_of_records_per_group_to_skip: how many records per group should be skipped
:number_of_records_per_group_to_keep: how many records per group should be kept (after skipping :number_of_records_per_group_to_skip records)
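Here is a small sketch of per-group skip and keep, run against the sample rows from the question with Python's sqlite3 module (SQLite 3.25 or newer assumed). It assumes that groups are defined by id and ordered by number, which is what the sample data suggests; the table name people and the short parameter names :skip and :keep are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, number INTEGER, id TEXT)")
# Sample rows from the question; "people" is a hypothetical table name.
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("Ana", 1, "589d0011-ef54-4708-a64a-f85228149651"),
     ("Jana", 2, "589d0011-ef54-4708-a64a-f85228149651"),
     ("Jan", 3, "589d0011-ef54-4708-a64a-f85228149651"),
     ("Joe", 2, "64ed0011-ef54-4708-a64a-f85228149651")],
)

# Rank rows inside each id group, then drop the first :skip rows
# and keep at most :keep rows after them.
query = """
SELECT name, number, id
FROM (
    SELECT p.*, ROW_NUMBER() OVER (PARTITION BY id ORDER BY number) AS rn
    FROM people AS p
) AS ranked
WHERE rn > :skip AND rn <= :skip + :keep
ORDER BY id, number
"""
result = conn.execute(query, {"skip": 1, "keep": 10}).fetchall()
for row in result:
    print(row)
```

With skip = 1, Ana is dropped from the first group and Joe, the only row in the second group, is dropped as well, which matches the expected output in the question.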
This might not be the answer you are looking for but it gives you the results your example shows:
select name, number, id
from (
select * from tableName
order by id
limit 3 offset 0
) d
where number > 1;
Best regards,
Bjarni

get the id based on condition in group by

I'm trying to create a SQL query that merges rows with equal dates. The merge should be based on the highest number of hours, so that in the end I get, for each date, the id of the row with the most hours. I've been trying to do this with a simple GROUP BY, but it does not seem to work: I can't just put an aggregate function on the id column, since the id should be chosen based on the hours condition.
+----+------------+-------+
| id | date       | hours |
+----+------------+-------+
| 1  | 2012-01-01 | 37    |
| 2  | 2012-01-01 | 10    |
| 3  | 2012-01-01 | 5     |
| 4  | 2012-01-02 | 37    |
+----+------------+-------+
desired result
+----+------------+-------+
| id | date       | hours |
+----+------------+-------+
| 1  | 2012-01-01 | 37    |
| 4  | 2012-01-02 | 37    |
+----+------------+-------+
If you want exactly one row -- even if there are ties -- then use row_number():
select t.*
from (select t.*, row_number() over (partition by date order by hours desc) as seqnum
from t
) t
where seqnum = 1;
Ironically, both Postgres and Oracle (the original tags) have what I would consider to be better ways of doing this, but they are quite different.
Postgres:
select distinct on (date) t.*
from t
order by date, hours desc;
Oracle:
select date, max(hours) as hours,
max(id) keep (dense_rank first order by hours desc) as id
from t
group by date;
Here's one approach using row_number:
select id, dt, hours
from (
select id, dt, hours, row_number() over (partition by dt order by hours desc) rn
from yourtable
) t
where rn = 1
You can also use a correlated subquery:
select t.*
from table t
where id = (select t1.id
from table t1
where t1.date = t.date
order by t1.hours desc
limit 1);
In Oracle you can use FETCH FIRST 1 ROW ONLY in the subquery instead of the LIMIT clause.
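To see the row_number() variant at work on the sample data, here is a minimal sketch using Python's sqlite3 module (SQLite 3.25 or newer assumed). The table name hours_log is made up, and the extra id in the ORDER BY is an added tie-breaker so that ties on hours resolve the same way on every run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hours_log (id INTEGER, date TEXT, hours INTEGER)")
# Sample rows from the question; "hours_log" is a hypothetical table name.
conn.executemany(
    "INSERT INTO hours_log VALUES (?, ?, ?)",
    [(1, "2012-01-01", 37), (2, "2012-01-01", 10),
     (3, "2012-01-01", 5), (4, "2012-01-02", 37)],
)

# Rank rows per date by hours (highest first) and keep only the top one.
query = """
SELECT id, date, hours
FROM (
    SELECT h.*,
           ROW_NUMBER() OVER (PARTITION BY date ORDER BY hours DESC, id) AS rn
    FROM hours_log AS h
) AS ranked
WHERE rn = 1
ORDER BY date
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Each date contributes exactly one row, carrying the id that belongs to its highest hours value.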

Filter table : Keep N row after each row with special value

I have a table with a huge amount of data with this structure (simplified):
+--------+-------------------------+-------+
| id | datetime | type |
+--------+-------------------------+-------+
| 1 | 2015-08-13 17:50:41 | 1 |
| 2 | 2015-08-13 17:50:45 | 0 |
| 3 | 2015-08-14 17:50:56 | 0 |
| 4 | 2015-08-14 17:50:59 | 0 |
+--------+-------------------------+-------+
Each row with type=1 is followed by a lot of rows with type=0.
I need to do an intelligent cleanup:
I want to keep the rows with type=0 that follow a row with type=1 only if they fall within one hour after the type=1 row's timestamp
And I want to keep at least one row with type=0 per hour
I don't know if it's possible to do that with a query, or if I will have to loop through all rows with a script.
I use PostgreSQL
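I don't have Postgres here to test, but something like this should return the ids of all the rows you want to keep (I'm assuming the table is called employees). For each row, a running MAX() picks up the datetime of the latest type=1 row so far; rows within one hour of it are kept, and the second part of the UNION keeps one type=0 row per clock hour:
SELECT id FROM (
SELECT
id,
datetime,
-- datetime of the most recent type=1 row at or before this row
MAX(CASE WHEN type = 1 THEN datetime END) OVER (ORDER BY datetime, id) AS last_type1
FROM employees
) s
WHERE datetime - last_type1 <= INTERVAL '1 hour'
UNION
-- at least one type=0 row per clock hour
SELECT MAX(id) FROM employees WHERE type = 0 GROUP BY DATE_TRUNC('hour', datetime);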
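The cleanup rules from the question can be tried out on the four sample rows with the sketch below. It is only an illustration: it runs on SQLite through Python's sqlite3 module rather than on PostgreSQL, so the date arithmetic uses strftime instead of intervals, and it uses a running MAX() to find the latest type=1 row rather than LAG(). The table name events and the column name ts are made up, and "one row per hour" is interpreted as one type=0 row per clock hour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, ts TEXT, type INTEGER)")
# Sample rows from the question; "events" and "ts" are hypothetical names.
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, "2015-08-13 17:50:41", 1), (2, "2015-08-13 17:50:45", 0),
     (3, "2015-08-14 17:50:56", 0), (4, "2015-08-14 17:50:59", 0)],
)

# Part 1: rows at most 3600 seconds after the latest type=1 row so far
#         (the type=1 rows themselves qualify with a difference of 0).
# Part 2: the highest id among type=0 rows in each clock hour.
query = """
SELECT id FROM (
    SELECT id, ts,
           MAX(CASE WHEN type = 1 THEN ts END)
               OVER (ORDER BY ts, id) AS last_type1
    FROM events
) AS s
WHERE CAST(strftime('%s', ts) AS INTEGER)
      - CAST(strftime('%s', last_type1) AS INTEGER) <= 3600
UNION
SELECT MAX(id) FROM events
WHERE type = 0
GROUP BY strftime('%Y-%m-%d %H', ts)
"""
keep_ids = sorted(r[0] for r in conn.execute(query))
print(keep_ids)
```

On this data only row 3 is left out, since row 4 already represents its hour. Deleting the rest would then be a matter of removing every id that is not in the returned set.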