How do I group a collection of elements by whether or not they have consecutive values? - sql

So given a table like the one below, I would like to grab rows where id has at least three consecutive years.
+----+------+
| id | year |
+----+------+
| 2  | 2003 |
| 2  | 2004 |
| 1  | 2005 |
| 2  | 2005 |
| 1  | 2007 |
| 1  | 2008 |
+----+------+
The result over here would be of course:
+---------+
| id |
+---------+
| 2 |
+---------+
Any input at all as to how I could go about structuring a query to do this would be great.

This one works and can be fast when you have at least an index on the id-field:
WITH t1 AS (
SELECT *
FROM (VALUES
(2,2003),
(2,2004),
(1,2005),
(2,2005),
(1,2007),
(1,2008)
) v(id, year)
)
SELECT DISTINCT t1.id
FROM t1 -- your tablename
JOIN t1 AS t2 ON t1.id = t2.id AND t1.year + 1 = t2.year
JOIN t1 AS t3 ON t1.id = t3.id AND t1.year + 2 = t3.year;

You can use a self-join (DISTINCT keeps an id from appearing more than once when it has more than three consecutive years):
SELECT DISTINCT t1.id
FROM tbl t1
JOIN tbl t2 ON t2.year = t1.year + 1
AND t1.id = t2.id
JOIN tbl t3 ON t3.year = t1.year + 2
AND t1.id = t3.id

Combination (id, year) is UNIQUE
Typically guaranteed with a PRIMARY KEY or UNIQUE constraint or a unique index.
This is a general solution for any minimum number of consecutive rows:
SELECT DISTINCT id
FROM (
SELECT id, year - row_number() OVER (PARTITION BY id ORDER BY year) AS grp
FROM tbl
) sub
GROUP BY id, grp
HAVING count(*) > 2; -- minimum: 3
This should be faster than self-joining repeatedly, because only a single scan on the base table is needed. Test performance with EXPLAIN ANALYZE.
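To see why subtracting row_number() from year works, here is a minimal sketch that runs the same query on the question's sample data. It uses Python's built-in sqlite3 module as a stand-in for the real database (an assumption, and it needs SQLite 3.25+ for window functions). The table name tbl is taken from the answer. Within one id, year minus its rank stays constant across a run of consecutive years, so that difference serves as a group key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, year INTEGER, UNIQUE (id, year))")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(2, 2003), (2, 2004), (1, 2005), (2, 2005), (1, 2007), (1, 2008)],
)

# Step 1: year minus its per-id rank is constant within a consecutive run.
islands = conn.execute(
    """
    SELECT id, year,
           year - row_number() OVER (PARTITION BY id ORDER BY year) AS grp
    FROM tbl
    ORDER BY id, year
    """
).fetchall()

# Step 2: count rows per (id, grp) and keep ids that have a run of 3 or more.
ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT DISTINCT id
        FROM (
            SELECT id,
                   year - row_number() OVER (PARTITION BY id ORDER BY year) AS grp
            FROM tbl
        ) sub
        GROUP BY id, grp
        HAVING count(*) > 2
        """
    )
]

for row in islands:
    print(row)
print(ids)
```

For id 2 all three rows share grp 2002, while id 1 splits into grp 2004 (one row) and grp 2005 (two rows), so only id 2 passes the HAVING filter.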
Related answer with detailed explanation:
Select longest continuous sequence
Combination (id, year) is not UNIQUE
You can make it unique in a first step.
SELECT DISTINCT id
FROM (
SELECT id, year - row_number() OVER (PARTITION BY id ORDER BY year) AS grp
FROM tbl
GROUP BY id, year
) sub
GROUP BY id, grp
HAVING count(*) > 2; -- minimum: 3
Or you could use the window function dense_rank() instead of row_number() and then count(DISTINCT year), but I don't see a benefit in this approach.
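For illustration, here is a rough sketch of that dense_rank() variant next to the naive row_number() query on data with duplicate (id, year) rows. The duplicates are made up for the demo, and Python's sqlite3 (SQLite 3.25+) is assumed as a stand-in engine. The helper run() is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, year INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [
        (2, 2003), (2, 2003), (2, 2004), (2, 2005),  # 2003 is duplicated
        (1, 2005), (1, 2005), (1, 2007), (1, 2008),  # 2005 is duplicated
    ],
)


def run(sql):
    """Hypothetical helper: return the ids a query produces, sorted."""
    return sorted(row[0] for row in conn.execute(sql))


# row_number() without de-duplication: duplicates shift the ranks,
# so id 1 is wrongly reported as having three consecutive years.
naive = run(
    """
    SELECT DISTINCT id
    FROM (
        SELECT id,
               year - row_number() OVER (PARTITION BY id ORDER BY year) AS grp
        FROM tbl
    ) sub
    GROUP BY id, grp
    HAVING count(*) > 2
    """
)

# dense_rank() gives duplicates the same rank; count(DISTINCT year)
# then counts each year once.
deduped = run(
    """
    SELECT DISTINCT id
    FROM (
        SELECT id, year,
               year - dense_rank() OVER (PARTITION BY id ORDER BY year) AS grp
        FROM tbl
    ) sub
    GROUP BY id, grp
    HAVING count(DISTINCT year) > 2
    """
)

print(naive, deduped)
```

The naive query returns both ids, while the dense_rank() version returns only id 2, which is the same result the GROUP BY id, year variant gives.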
Understanding the sequence of events in a SELECT query is the key:
Best way to get result count before LIMIT was applied

Related

Compare columns from 2 different tables with only last inserted values in table_2 in SQL Server

If I have two different tables in a SQL Server 2019 database as follows:
Table1
|id | name |
+-----+--------+
| 1 | rose |
| 2 | peter |
| 3 | ann |
| 4 | rose |
| 5 | ann |
Table2
| name2 |
+--------+
|rose |
|ann |
I would like to retrieve only the last two ids from table1 (in this case 4 and 5) that match name2 in table2. In other words, each name should match only once, on its most recently added row in table1. Furthermore, the ids (4, 5) should be inserted into table2.
How to do that using SQL?
Thank you
You can use row_number()
select name,id from
(
select *, row_number() over(partition by t.name order by id desc) as rn
from table1 t join table2 t1 on t.name=t1.name2
)A where rn=1
Your question is vague, so there could be many answers here. My first thought is that you simply want an inner join. This will fetch ONLY the data that both tables share.
SELECT Table1.*
FROM Table1
INNER JOIN Table2 on Table1.name = Table2.name2
You seem to be describing:
select . . . -- whatever columns you want
from (select top (2) t1.*
from table1 t1
order by t1.id desc
) t1 join
table2 t2
on t2.name2 = t1.name;
This doesn't seem particularly useful for the data you have provided, but it does what you describe.
EDIT:
If you want only the most recent rows that match, use row_number():
select . . . -- whatever columns you want
from (select t1.*,
row_number() over (partition by name order by id desc) as seqnum
from table1 t1
) t1 join
table2 t2
on t2.name2 = t1.name and t1.seqnum = 1;
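A quick way to check the row_number() version against the question's data is the sketch below. It assumes Python's sqlite3 (SQLite 3.25+) rather than SQL Server, so treat it as an illustration of the logic only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE table2 (name2 TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(1, "rose"), (2, "peter"), (3, "ann"), (4, "rose"), (5, "ann")],
)
conn.executemany("INSERT INTO table2 VALUES (?)", [("rose",), ("ann",)])

# seqnum = 1 marks the row with the highest id for each name;
# the join then keeps only names that also appear in table2.
latest = conn.execute(
    """
    SELECT t1.id, t1.name
    FROM (
        SELECT id, name,
               row_number() OVER (PARTITION BY name ORDER BY id DESC) AS seqnum
        FROM table1
    ) t1
    JOIN table2 t2 ON t2.name2 = t1.name
    WHERE t1.seqnum = 1
    ORDER BY t1.id
    """
).fetchall()

print(latest)
```

This yields ids 4 and 5, and peter is dropped because the name has no match in table2.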

MS Access: Compare 2 tables with duplicates

I have two tables which look like this:
T1:
ID | Date | Hour
T2:
ID | Date | Hour
I basically need to join these tables when their IDs, dates, and hours match. However, I only want to return the results from table 1 that do not match up with the results in table 2.
I know this seems simple, but where I'm stuck is the fact that there are multiple rows in table 1 that match up with table 2 (there are multiple intervals for any given hour). I need to return all of these intervals so long as they do not fall within the same hour period in table 2.
Example data:
T1:
1 | 1/1/2011 | 1
1 | 1/1/2011 | 1
1 | 1/1/2011 | 1
1 | 1/1/2011 | 2
T2:
1 | 1/1/2011 | 1
1 | 1/1/2011 | 1
My expected result set for this would be the last 2 rows from T1. Can anyone point me on the right track?
I think you just want not exists:
select t1.*
from t1
where not exists (select 1
from t2
where t2.id = t1.id and t2.date = t1.date and t2.hour = t1.hour
);
EDIT:
I misread the question. This is very hard to do in MS Access, but you can come close. The following returns the distinct rows in table 1 that have more copies than table 2 does, along with the difference in counts:
select t1.id, t1.date, t1.hour, (t1.cnt - iif(t2.cnt is null, 0, t2.cnt)) as extra
from (select id, date, hour, count(*) as cnt
from t1
group by id, date, hour
) t1 left join
(select id, date, hour, count(*) as cnt
from t2
group by id, date, hour
) t2
on t2.id = t1.id and t2.date = t1.date and t2.hour = t1.hour
where t2.cnt is null or t2.cnt < t1.cnt;
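The count-comparison idea can be tried on the question's sample rows with the sketch below. It assumes Python's sqlite3 in place of MS Access, so coalesce() is used for the missing count, and the alias surplus is made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for name in ("t1", "t2"):
    conn.execute(f"CREATE TABLE {name} (id INTEGER, date TEXT, hour INTEGER)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [(1, "1/1/2011", 1), (1, "1/1/2011", 1), (1, "1/1/2011", 1), (1, "1/1/2011", 2)],
)
conn.executemany(
    "INSERT INTO t2 VALUES (?, ?, ?)",
    [(1, "1/1/2011", 1), (1, "1/1/2011", 1)],
)

# Count copies of each (id, date, hour) in both tables, then keep the
# combinations where t1 has more copies than t2 (or t2 has none at all).
surplus = conn.execute(
    """
    SELECT a.id, a.date, a.hour, a.cnt - coalesce(b.cnt, 0) AS surplus
    FROM (SELECT id, date, hour, count(*) AS cnt
          FROM t1 GROUP BY id, date, hour) a
    LEFT JOIN (SELECT id, date, hour, count(*) AS cnt
               FROM t2 GROUP BY id, date, hour) b
      ON b.id = a.id AND b.date = a.date AND b.hour = a.hour
    WHERE b.cnt IS NULL OR b.cnt < a.cnt
    ORDER BY a.hour
    """
).fetchall()

print(surplus)
```

Hour 1 has one surplus row (3 in t1 against 2 in t2) and hour 2 has one row with no counterpart, which adds up to the two rows the question expects. The result is one row per distinct combination plus a count, not the individual duplicate rows.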

Distinct Value of a column in sql server 2008

Hello all, I have made a query using left outer joins which results in something like the table below:
| 00-00-00-00-00 | 1 | a.txt |
| 00-00-00-00-00 | 2 | b.txt |
| 00-00-00-00-00 | 1 | c.txt |
| 11-11-11-11-11 | 2 | d.txt |
What I want is the distinct values of the MAC column. Below is the SQL Fiddle to understand better.
SQLFIDDLE
Thanks
EDIT
The purpose is that rows 2 and 3 are useless or redundant data, whereas rows 1 and 4 are useful: rows 1 and 4 show the current file on each MAC.
Output:
| 00-00-00-00-00 | 1 | a.txt |
| 11-11-11-11-11 | 2 | d.txt |
It is not possible to answer exactly what you ask. However, people who ask this question usually mean something like 'I want all the columns for a sample of rows containing only distinct MacAddress values'. This question has many answers, as the result is non-deterministic. A trivial solution is to pick the first (for whatever definition of 'first') row for each MacAddress:
with cte as (
select row_number() over (partition by MacAddress order by CounterNo) as rn, *
from Heartbeats
)
select * from cte where rn = 1;
If you want to get only the distinct macaddresses, you can do:
SELECT DISTINCT macaddress FROM heartbeats
If you want all the columns alongside the distinct macaddress, you need to create a rule to get them. The query below gives you the ones with the highest id for each macaddress:
SELECT t1.*
FROM heartbeats t1
LEFT JOIN heartbeats t2
ON (t1.macaddress = t2.macaddress AND t1.id < t2.id)
WHERE t2.id IS NULL
EDIT:
Since the original query has no ID column, the above query was refined as:
with cte as (
select ROW_NUMBER() OVER(ORDER BY (Select 0)) AS ID,* from heartBeats
)
SELECT t1.*
FROM cte t1
LEFT JOIN cte t2
ON (t1.macaddress = t2.macaddress AND t1.id < t2.id)
WHERE t2.id IS NULL
SELECT hb1.* FROM [heartbeats] as hb1
LEFT OUTER JOIN [heartbeats] as hb2
ON (hb1.macaddress = hb2.macaddress AND hb1.id > hb2.id)
WHERE hb2.id IS NULL;
You have to neglect the file name. See http://sqlfiddle.com/#!3/a75e47/13

PostgreSQL LEFT OUTER JOIN query syntax

Let's say I have a table1:
id name
-------------
1 "one"
2 "two"
3 "three"
And a table2 with a foreign key to the first:
id tbl1_fk option value
-------------------------------
1 1 1 1
2 2 1 1
3 1 2 1
4 3 2 1
Now I want to have as a query result:
table1.id | table1.name | option | value
-------------------------------------
1 "one" 1 1
2 "two" 1 1
3 "three"
1 "one" 2 1
2 "two"
3 "three" 2 1
How do I achieve that?
I already tried:
SELECT
table1.id,
table1.name,
table2.option,
table2.value
FROM table1 AS table1
LEFT outer JOIN table2 AS table2 ON table1.id = table2.tbl1_fk
but the result seems to omit the null values:
1 "one" 1 1
2 "two" 1 1
1 "one" 2 1
3 "three" 2 1
SOLVED: thanks to Mahmoud Gamal: (plus the GROUP BY)
Solved with this query
SELECT
t1.id,
t1.name,
t2.option,
t2.value
FROM
(
SELECT t1.id, t1.name, t2.option
FROM table1 AS t1
CROSS JOIN table2 AS t2
) AS t1
LEFT JOIN table2 AS t2 ON t1.id = t2.tbl1_fk
AND t1.option = t2.option
group by t1.id, t1.name, t2.option, t2.value
ORDER BY t1.id, t1.name
You have to use CROSS JOIN to get every possible combination of name from the first table with option from the second table. Then LEFT JOIN these combinations with the second table. Something like:
SELECT
t1.id,
t1.name,
t2.option,
t2.value
FROM
(
SELECT t1.id, t1.name, t2.option
FROM table1 AS t1
CROSS JOIN table2 AS t2
) AS t1
LEFT JOIN table2 AS t2 ON t1.id = t2.tbl1_fk
AND t1.option = t2.option
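Here is a small sketch of the CROSS JOIN plus LEFT JOIN idea on the question's data, assuming Python's sqlite3 as a stand-in for Postgres. One deliberate difference from the query above: the cross join runs against the DISTINCT options instead of all of table2, so each (name, option) pair appears exactly once and no GROUP BY is needed afterwards. The column names option and value are quoted only as a precaution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    'CREATE TABLE table2 (id INTEGER PRIMARY KEY, tbl1_fk INTEGER, '
    '"option" INTEGER, "value" INTEGER)'
)
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)", [(1, "one"), (2, "two"), (3, "three")]
)
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 1), (2, 2, 1, 1), (3, 1, 2, 1), (4, 3, 2, 1)],
)

# Build every (table1 row, option) pair first, then attach the matching
# table2 row if one exists; missing pairs come back as NULL (None).
rows = conn.execute(
    """
    SELECT c.id, c.name, t2."option", t2."value"
    FROM (
        SELECT t1.id, t1.name, o."option"
        FROM table1 t1
        CROSS JOIN (SELECT DISTINCT "option" FROM table2) o
    ) c
    LEFT JOIN table2 t2
           ON t2.tbl1_fk = c.id AND t2."option" = c."option"
    ORDER BY c."option", c.id
    """
).fetchall()

for row in rows:
    print(row)
```

That gives the six rows the question asks for, with None standing in for the empty cells of 'three' under option 1 and 'two' under option 2.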
Simple version: option = group
It's not specified in the Q, but it seems like option is supposed to define a group somehow. In this case, the query can simply be:
SELECT t1.id, t1.name, t2.option, t2.value
FROM (SELECT generate_series(1, max(option)) AS option FROM table2) o
CROSS JOIN table1 t1
LEFT JOIN table2 t2 ON t2.option = o.option AND t2.tbl1_fk = t1.id
ORDER BY o.option, t1.id;
Or, if options are not numbered in sequence, starting with 1:
...
FROM (SELECT DISTINCT option FROM table2) o
...
Returns:
id | name | option | value
----+-------+--------+-------
1 | one | 1 | 1
2 | two | 1 | 1
3 | three | |
1 | one | 2 | 1
2 | two | |
3 | three | 2 | 1
Faster and cleaner, avoiding the big CROSS JOIN and the big GROUP BY.
You get distinct rows with a group number (grp) per set.
Requires Postgres 8.4+.
More complex: group indicated by sequence of rows
WITH t2 AS (
SELECT *, count(step OR NULL) OVER (ORDER BY id) AS grp
FROM (
SELECT *, lag(tbl1_fk, 1, 2147483647) OVER (ORDER BY id) >= tbl1_fk AS step
FROM table2
) x
)
SELECT g.grp, t1.id, t1.name, t2.option, t2.value
FROM (SELECT generate_series(1, max(grp)) AS grp FROM t2) g
CROSS JOIN table1 t1
LEFT JOIN t2 ON t2.grp = g.grp AND t2.tbl1_fk = t1.id
ORDER BY g.grp, t1.id;
Result:
grp | id | name | option | value
-----+----+-------+--------+-------
1 | 1 | one | 1 | 1
1 | 2 | two | 1 | 1
1 | 3 | three | |
2 | 1 | one | 2 | 1
2 | 2 | two | |
2 | 3 | three | 2 | 1
How?
Explaining the complex version ...
Every set starts with a tbl1_fk that is <= the previous one. I check for this with the window function lag(). To cover the corner case of the first row (no preceding row), I provide the biggest possible integer, 2147483647, as the default for lag().
With count() as aggregate window function I add the running count to each row, effectively forming the group number grp.
I could get a single instance for every group with:
(SELECT DISTINCT grp FROM t2) g
But it's faster to just get the maximum and employ the nifty generate_series() for the reduced CROSS JOIN.
This CROSS JOIN produces exactly the rows we need without any surplus. Avoids the need for a later GROUP BY.
LEFT JOIN t2 to that, using grp in addition to tbl1_fk to make it distinct.
Sort any way you like - which is possible now with a group number.
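The lag() and running count() steps described above can be sketched as follows. This assumes Python's sqlite3 (SQLite 3.25+) instead of Postgres, so count(step OR NULL) is spelled with a CASE expression, and only the group-numbering part is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE table2 (id INTEGER PRIMARY KEY, tbl1_fk INTEGER, '
    '"option" INTEGER, "value" INTEGER)'
)
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 1), (2, 2, 1, 1), (3, 1, 2, 1), (4, 3, 2, 1)],
)

# step is 1 when tbl1_fk did not increase compared to the previous row,
# i.e. a new set starts here. The lag() default makes the first row a step.
# The running count of steps is the group number grp.
grouped = conn.execute(
    """
    WITH x AS (
        SELECT id, tbl1_fk,
               lag(tbl1_fk, 1, 2147483647) OVER (ORDER BY id) >= tbl1_fk AS step
        FROM table2
    )
    SELECT id, tbl1_fk, step,
           count(CASE WHEN step THEN 1 END) OVER (ORDER BY id) AS grp
    FROM x
    ORDER BY id
    """
).fetchall()

for row in grouped:
    print(row)  # (id, tbl1_fk, step, grp)
```

Rows 1 and 3 are flagged as steps, so rows 1 and 2 land in group 1 and rows 3 and 4 in group 2, matching the grp column in the result table above.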
try this
SELECT
table1.id, table1.name, table2.option, table2.value FROM table1 AS table1
JOIN table2 AS table2 ON table1.id = table2.tbl1_fk
This is enough:
select * from table1 left join table2 on table1.id=table2.tbl1_fk ;

Only select first row of repeating value in a column in SQL

I have table that has a column that may have same values in a burst. Like this:
+----+---------+
| id | Col1 |
+----+---------+
| 1 | 6050000 |
+----+---------+
| 2 | 6050000 |
+----+---------+
| 3 | 6050000 |
+----+---------+
| 4 | 6060000 |
+----+---------+
| 5 | 6060000 |
+----+---------+
| 6 | 6060000 |
+----+---------+
| 7 | 6060000 |
+----+---------+
| 8 | 6060000 |
+----+---------+
| 9 | 6050000 |
+----+---------+
| 10 | 6000000 |
+----+---------+
| 11 | 6000000 |
+----+---------+
Now I want to prune rows where the value of Col1 is repeated and only select the first occurrence.
For the above table the result should be:
+----+---------+
| id | Col1 |
+----+---------+
| 1 | 6050000 |
+----+---------+
| 4 | 6060000 |
+----+---------+
| 9 | 6050000 |
+----+---------+
| 10 | 6000000 |
+----+---------+
How can I do this in SQL?
Note that only burst rows should be removed and values can be repeated in non-burst rows! id=1 & id=9 are repeated in sample result.
EDIT:
I achieved it using this:
select id,col1 from data as d1
where not exists (
Select id from data as d2
where d2.id=d1.id-1 and d1.col1=d2.col1 order by id limit 1)
But this only works when ids are sequential. With gaps between ids (deleted ones) the query breaks. How can I fix this?
You can use an EXISTS semi-join to identify candidates:
Select wanted rows:
SELECT * FROM tbl t
WHERE NOT EXISTS (
SELECT *
FROM tbl
WHERE col1 = t.col1
AND id = t.id - 1
)
ORDER BY id;
Get rid of unwanted rows:
DELETE FROM tbl AS t
-- SELECT * FROM tbl t -- check first?
WHERE EXISTS (
SELECT *
FROM tbl
WHERE col1 = t.col1
AND id = t.id - 1
);
This effectively deletes every row, where the preceding row has the same value in col1, thereby arriving at your set goal: only the first row of every burst survives.
I left the commented SELECT statement because you should always check what is going to be deleted before you do the deed.
Solution for non-sequential IDs:
If your RDBMS supports CTEs and window functions (like PostgreSQL, Oracle, SQL Server, ... but not SQLite prior to v3.25, MS Access or MySQL prior to v8.0.1), there is an elegant way:
WITH cte AS (
SELECT *, row_number() OVER (ORDER BY id) AS rn
FROM tbl
)
SELECT id, col1
FROM cte c
WHERE NOT EXISTS (
SELECT *
FROM cte
WHERE col1 = c.col1
AND rn = c.rn - 1
)
ORDER BY id;
Another way doing the job without those niceties (should work for you):
SELECT id, col1
FROM tbl t
WHERE (
SELECT col1 = t.col1
FROM tbl
WHERE id < t.id
ORDER BY id DESC
LIMIT 1) IS NOT TRUE
ORDER BY id;
select min(id), Col1 from tableName group by Col1
If your RDBMS supports Window Aggregate functions and/or LEAD() and LAG() functions you can leverage them to accomplish what you are trying to report. The following SQL will help get you started down the right path:
SELECT id
, Col1 AS CurCol
, MAX(Col1)
OVER(ORDER BY id ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS PrevCol
, MIN(Col1)
OVER(ORDER BY id ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS NextCol
FROM MyTable
From there you can put that SQL in a derived table with some CASE logic: if PrevCol is the same as CurCol, set CurCol = NULL. Then you can eliminate all the rows where CurCol IS NULL.
If you don't have the ability to use window aggregates or LEAD/LAG functions your task is a little more complex.
Hope this helps.
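As a rough sketch of where that leads, the following uses LAG() directly and filters on the previous value instead of going through CASE. It assumes Python's sqlite3 (SQLite 3.25+), the table name data from the question's edit, and that Col1 is never NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, col1 INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [
        (1, 6050000), (2, 6050000), (3, 6050000),
        (4, 6060000), (5, 6060000), (6, 6060000), (7, 6060000), (8, 6060000),
        (9, 6050000),
        (10, 6000000), (11, 6000000),
    ],
)

# A row starts a burst when there is no previous row (prev_col IS NULL)
# or the previous row, in id order, holds a different value.
first_of_burst = conn.execute(
    """
    SELECT id, col1
    FROM (
        SELECT id, col1, lag(col1) OVER (ORDER BY id) AS prev_col
        FROM data
    ) d
    WHERE prev_col IS NULL OR prev_col <> col1
    ORDER BY id
    """
).fetchall()

print(first_of_burst)
```

Because lag() follows the ORDER BY id ordering and not id arithmetic, gaps in the ids do not affect the result.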
Since id is always sequential, with no gaps or repetitions, as per your comment, you could use the following method:
SELECT t1.*
FROM atable t1
LEFT JOIN atable t2 ON t1.id = t2.id + 1 AND t1.Col1 = t2.Col1
WHERE t2.id IS NULL
The table is (outer-)joined to itself on the condition that the left side's id is one greater than the right side's and their Col1 values are identical. In other words, the condition is ‘the previous row contains the same Col1 value as the current row’. If there's no match on the right, then the current record should be selected.
UPDATE
To account for non-sequential ids (which, however, are assumed to be unique and defining the order of changes of Col1), you could also try the following query:
SELECT t1.*
FROM atable t1
LEFT JOIN atable t2 ON t1.id > t2.id
LEFT JOIN atable t3 ON t1.id > t3.id AND t3.id > t2.id
WHERE t3.id IS NULL
AND (t2.id IS NULL OR t2.Col1 <> t1.Col1)
The third self-join is there to ensure that the second one yields the row directly preceding that of t1. That is, if there's no match for t3, then either t2 contains the preceding row or it's got no match either, the latter meaning that t1's current row is the top one.
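To check the three-way self-join on ids with gaps, here is a minimal sketch using Python's sqlite3 as the engine (an assumption; no window functions are needed). The sample ids are made up to include gaps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE atable (id INTEGER PRIMARY KEY, Col1 INTEGER)")
conn.executemany(
    "INSERT INTO atable VALUES (?, ?)",
    [
        (1, 6050000), (2, 6050000),   # burst of 6050000
        (5, 6060000), (7, 6060000),   # burst of 6060000, ids have gaps
        (9, 6050000),                 # 6050000 again, a new burst
        (10, 6000000), (14, 6000000),
    ],
)

# t2 ranges over all earlier rows; t3 looks for a row strictly between
# t2 and t1. Where no such t3 exists, t2 is the directly preceding row
# (or NULL for the very first row).
kept = conn.execute(
    """
    SELECT t1.id, t1.Col1
    FROM atable t1
    LEFT JOIN atable t2 ON t1.id > t2.id
    LEFT JOIN atable t3 ON t1.id > t3.id AND t3.id > t2.id
    WHERE t3.id IS NULL
      AND (t2.id IS NULL OR t2.Col1 <> t1.Col1)
    ORDER BY t1.id
    """
).fetchall()

print(kept)
```

Only the first row of each burst survives (ids 1, 5, 9 and 10). Note that the t1/t2 join compares every row with all earlier rows, so this can get slow on large tables.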