SQL selecting adjacent rows for an adjacent set

I'm having trouble doing the following in SQL with Postgres. My program has an ordered set of numbers. In my database I have a table that stores all the numbers in rows, along with extra data. These rows are also stored in order.
For example, the set I need to find is:
1,5,6,1,3
The database has these rows:
row1 4
row2 5
row3 1
row4 5
row5 6
row6 1
row7 3
row8 2
row9 7
In the example above it's easy to see that my set is found from row3 to row7. Still, doing this in SQL is a mystery to me. I've been reading some articles about pivot tables, but I'm hoping there's an easier way.

Both data sets need to have fields that identify the order.
Provided that the ordering column is a sequential, gap-free set of numbers, this is possible, although I doubt it's very quick.
Table 1
id | value
1 | 4
2 | 5
3 | 1
4 | 5
5 | 6
6 | 1
7 | 3
8 | 2
9 | 7
Table 2
id | value
1 | 1
2 | 5
3 | 6
4 | 1
5 | 3
Then this query...
SELECT
*
FROM
table_1
INNER JOIN
(
SELECT
MIN(table_1.id) AS first_id,
MAX(table_1.id) AS last_id
FROM
table_1
INNER JOIN
table_2
ON table_1.value = table_2.value
GROUP BY
table_1.id - table_2.id
HAVING
COUNT(*) = (SELECT COUNT(*) FROM table_2)
)
AS matched_sets
ON matched_sets.first_id <= table_1.id
AND matched_sets.last_id >= table_1.id;
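To try the id-offset grouping without a PostgreSQL server, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module against an in-memory database loaded with the two sample tables. SQLite is only a stand-in here, not the engine the answer targets. The GROUP BY key table_1.id - table_2.id is the candidate starting offset, so only an offset where every row of table_2 found a partner reaches the HAVING count.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_1 (id INTEGER PRIMARY KEY, value INTEGER);
CREATE TABLE table_2 (id INTEGER PRIMARY KEY, value INTEGER);
INSERT INTO table_1 VALUES (1,4),(2,5),(3,1),(4,5),(5,6),(6,1),(7,3),(8,2),(9,7);
INSERT INTO table_2 VALUES (1,1),(2,5),(3,6),(4,1),(5,3);
""")

MATCH_SQL = """
SELECT table_1.*
FROM table_1
INNER JOIN (
    -- pairs with equal values and the same id offset belong to one candidate run
    SELECT MIN(table_1.id) AS first_id,
           MAX(table_1.id) AS last_id
    FROM table_1
    INNER JOIN table_2 ON table_1.value = table_2.value
    GROUP BY table_1.id - table_2.id
    -- a complete run matched every row of the search table
    HAVING COUNT(*) = (SELECT COUNT(*) FROM table_2)
) AS matched_sets
  ON matched_sets.first_id <= table_1.id
 AND matched_sets.last_id >= table_1.id
ORDER BY table_1.id
"""

rows = conn.execute(MATCH_SQL).fetchall()
print(rows)  # [(3, 1), (4, 5), (5, 6), (6, 1), (7, 3)]
```

On this data only offset 2 collects all five matches, so rows 3 to 7 come back.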

Recursive version
@Dems beat me to it: a recursive CTE is the way to go here. It works for any sequence of numbers. I'm posting my version because:
It does not require an additional table. Just pass your sequence of numbers as an array.
The recursive CTE itself is simpler.
The final query is smarter.
It actually works in PostgreSQL. @Dems' recursive version was not syntactically correct at the time of writing.
Test setup:
CREATE TEMP TABLE t (id int, val int);
INSERT INTO t VALUES
(1,4),(2,5),(3,1)
,(4,5),(5,6),(6,1)
,(7,3),(8,2),(9,7);
Call:
WITH RECURSIVE x AS (
SELECT '{1,5,6,1,3}'::int[] AS a
), y AS (
SELECT t.id AS start_id
,1::int AS step
FROM x
JOIN t ON t.val = x.a[1]
UNION ALL
SELECT y.start_id
,y.step + 1 -- AS step -- next step
FROM y
JOIN t ON t.id = y.start_id + step -- next id
JOIN x ON t.val = x.a[1 + step] -- next value
)
SELECT y.start_id
FROM x
JOIN y ON y.step = array_length(x.a, 1); -- only where the last step was matched
Result:
3
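A minimal sketch of the same step-by-step walk, run through Python's sqlite3 module. SQLite has no int[] type, so the search sequence is assumed to be supplied as (pos, val) rows in a VALUES list instead of the '{1,5,6,1,3}'::int[] literal; otherwise the anchor and recursive terms mirror the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER, val INTEGER);
INSERT INTO t VALUES (1,4),(2,5),(3,1),(4,5),(5,6),(6,1),(7,3),(8,2),(9,7);
""")

# Assumption: the search sequence as (pos, val) rows, because SQLite lacks arrays.
RECURSIVE_SQL = """
WITH RECURSIVE x(pos, val) AS (
    VALUES (1,1),(2,5),(3,6),(4,1),(5,3)
), y AS (
    -- anchor: every row whose value matches the first search item
    SELECT t.id AS start_id, 1 AS step
    FROM t
    JOIN x ON x.pos = 1 AND t.val = x.val
    UNION ALL
    -- recursive step: the next id must hold the next search item
    SELECT y.start_id, y.step + 1
    FROM y
    JOIN t ON t.id = y.start_id + y.step
    JOIN x ON x.pos = y.step + 1 AND t.val = x.val
)
SELECT y.start_id
FROM y
WHERE y.step = (SELECT COUNT(*) FROM x)  -- only chains that matched every item
"""

starts = [r[0] for r in conn.execute(RECURSIVE_SQL)]
print(starts)  # [3]
```

The recursion stops by itself: once step runs past the last position, the join on x finds no row.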
Static version
Works only for a predefined number of array items (5 in this case), but is faster for small arrays. Same test setup as above.
WITH x AS (
SELECT '{1,5,6,1,3}'::int[] AS a
)
SELECT t1.id
FROM x, t t1
JOIN t t2 ON t2.id = t1.id + 1
JOIN t t3 ON t3.id = t1.id + 2
JOIN t t4 ON t4.id = t1.id + 3
JOIN t t5 ON t5.id = t1.id + 4
WHERE t1.val = x.a[1]
AND t2.val = x.a[2]
AND t3.val = x.a[3]
AND t4.val = x.a[4]
AND t5.val = x.a[5];
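For a fixed pattern length, window functions are another option (a swapped-in technique, not part of the answer above): LEAD(val, k) reads the value k rows ahead, which replaces one self-join per position. A sketch via Python's sqlite3 module, assuming SQLite 3.25 or later for window-function support:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER, val INTEGER);
INSERT INTO t VALUES (1,4),(2,5),(3,1),(4,5),(5,6),(6,1),(7,3),(8,2),(9,7);
""")

# Each LEAD(val, k) column holds the value k rows after the current one in id order.
LEAD_SQL = """
SELECT id
FROM (
    SELECT id,
           val AS v1,
           LEAD(val, 1) OVER (ORDER BY id) AS v2,
           LEAD(val, 2) OVER (ORDER BY id) AS v3,
           LEAD(val, 3) OVER (ORDER BY id) AS v4,
           LEAD(val, 4) OVER (ORDER BY id) AS v5
    FROM t
) AS s
WHERE v1 = ? AND v2 = ? AND v3 = ? AND v4 = ? AND v5 = ?
ORDER BY id
"""

pattern = (1, 5, 6, 1, 3)
starts = [r[0] for r in conn.execute(LEAD_SQL, pattern)]
print(starts)  # [3]
```

Unlike the t1.id + 1 joins, LEAD follows row order rather than id arithmetic, so gaps in id are skipped over instead of breaking the match.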

How about...
SELECT INSTR(CONCAT(',', GROUP_CONCAT(mNumber ORDER BY id SEPARATOR ','), ','), CONCAT(',', @yourstring, ',')) -- assumes an ordering column named id
FROM `Table`
Whoops, that's MySQL; I'll have to look up similar functions for PostgreSQL...
PostgreSQL version of GROUP_CONCAT
All this does is concatenate the rows into one long string and then do a "find" that returns the first position of your string within it. If 0 is returned, your string isn't in the generated one. The returned number is a character position, not a row number, so it has to be converted (for example by counting the commas before it) to get the starting row. Be careful to use exactly the same separator (comma, no space) in both strings.
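A small Python sketch of the concatenate-and-search idea, with the concatenation done client-side from rows fetched in id order so the order is explicit. It shows how the character position maps back to a row. The table t and the helper find_start_row are hypothetical names used only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER, val INTEGER);
INSERT INTO t VALUES (1,4),(2,5),(3,1),(4,5),(5,6),(6,1),(7,3),(8,2),(9,7);
""")

rows = conn.execute("SELECT id, val FROM t ORDER BY id").fetchall()

def find_start_row(rows, pattern):
    """Return the id of the first row where pattern begins, or None (hypothetical helper)."""
    # Wrap both strings in commas so that "1" cannot match inside "11".
    haystack = "," + ",".join(str(v) for _, v in rows) + ","
    needle = "," + ",".join(str(v) for v in pattern) + ","
    pos = haystack.find(needle)  # 0-based character offset, -1 if absent
    if pos == -1:
        return None
    # The offset counts characters, not rows: each comma before it starts one earlier row.
    index = haystack.count(",", 0, pos)
    return rows[index][0]

print(find_start_row(rows, [1, 5, 6, 1, 3]))  # 3
```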

Recursive answer...
WITH RECURSIVE -- PostgreSQL requires RECURSIVE for a self-referencing CTE
CTE AS
(
SELECT
id AS first_id,
id AS current_id,
1 AS sequence_id
FROM
main_table
WHERE
value = (SELECT value FROM search_table WHERE id = 1)
UNION ALL
SELECT
CTE.first_id,
main_table.id,
CTE.sequence_id + 1
FROM
CTE
INNER JOIN
main_table
ON main_table.id = CTE.current_id + 1
INNER JOIN
search_table
ON search_table.value = main_table.value
AND search_table.id = CTE.sequence_id + 1
)
SELECT
*
FROM
main_table
INNER JOIN
CTE
ON main_table.id >= CTE.first_id
AND main_table.id <= CTE.current_id
WHERE
CTE.sequence_id = (SELECT COUNT(*) FROM search_table)


Insert missing values in column at all levels of another column in SQL?

I have been working with some big data in SQL/BigQuery and found that it has some holes in it that need to be filled with values in order to complete the dataset. What I'm struggling with is how to insert the missing values properly.
Say that I have multiple levels of a variable (1, 2, 3... with no upper bound), and each of these levels should have an A, a B and a C value. Some of these records will have data, others will not.
Current dataset:
level value data
1 A 1a_data
1 B 1b_data
1 C 1c_data
2 A 2a_data
2 C 2c_data
3 B 3b_data
What I want the dataset to look like:
level value data
1 A 1a_data
1 B 1b_data
1 C 1c_data
2 A 2a_data
2 B NULL
2 C 2c_data
3 A NULL
3 B 3b_data
3 C NULL
What would be the best way to do this?
You need a CROSS join of the distinct levels with the distinct values and a LEFT join to the table:
SELECT l.level, v.value, t.data
FROM (SELECT DISTINCT level FROM tablename) l
CROSS JOIN (SELECT DISTINCT value FROM tablename) v
LEFT JOIN tablename t ON t.level = l.level AND t.value = v.value
ORDER BY l.level, v.value;
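A quick way to check the cross-join query, using Python's sqlite3 module as a stand-in for BigQuery and the sample rows from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tablename (level INTEGER, value TEXT, data TEXT);
INSERT INTO tablename VALUES
  (1,'A','1a_data'),(1,'B','1b_data'),(1,'C','1c_data'),
  (2,'A','2a_data'),(2,'C','2c_data'),(3,'B','3b_data');
""")

# Every (level, value) combination, with data filled in where a row exists.
FILL_SQL = """
SELECT l.level, v.value, t.data
FROM (SELECT DISTINCT level FROM tablename) l
CROSS JOIN (SELECT DISTINCT value FROM tablename) v
LEFT JOIN tablename t ON t.level = l.level AND t.value = v.value
ORDER BY l.level, v.value
"""

result = conn.execute(FILL_SQL).fetchall()
for row in result:
    print(row)
```

Only levels and values that appear somewhere in the table are generated; a level with no rows at all would need a separate source, such as a numbers table.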
We can use an INSERT INTO ... SELECT, generating every level/value combination with a cross join (the same idea as a calendar table):
INSERT INTO yourTable (level, value, data)
SELECT t1.level, t2.value, NULL
FROM (SELECT DISTINCT level FROM yourTable) t1
CROSS JOIN (SELECT DISTINCT value FROM yourTable) t2
LEFT JOIN yourTable t3
ON t3.level = t1.level AND
t3.value = t2.value
WHERE t3.level IS NULL;
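The insert can be checked the same way. This sketch, again using Python's sqlite3 module as a stand-in, tests the join key (t3.level IS NULL) rather than t3.data, so an existing row whose data is already NULL is not inserted a second time and the statement can safely be rerun.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE yourTable (level INTEGER, value TEXT, data TEXT);
INSERT INTO yourTable VALUES
  (1,'A','1a_data'),(1,'B','1b_data'),(1,'C','1c_data'),
  (2,'A','2a_data'),(2,'C','2c_data'),(3,'B','3b_data');
""")

INSERT_MISSING = """
INSERT INTO yourTable (level, value, data)
SELECT t1.level, t2.value, NULL
FROM (SELECT DISTINCT level FROM yourTable) t1
CROSS JOIN (SELECT DISTINCT value FROM yourTable) t2
LEFT JOIN yourTable t3
  ON t3.level = t1.level AND t3.value = t2.value
WHERE t3.level IS NULL  -- no row exists yet for this (level, value) pair
"""

conn.execute(INSERT_MISSING)
conn.execute(INSERT_MISSING)  # a second run finds nothing missing and inserts nothing
total = conn.execute("SELECT COUNT(*) FROM yourTable").fetchone()[0]
print(total)  # 9
```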

Querying two tables to filter data using select case

I have two tables
Table 1 looks like this
ID Repeats
-----------
A 1
A 1
A 0
B 2
B 2
C 2
D 1
Table 2 looks like this
ID values
-----------
A 100
B 200
C 100
D 300
Using a view I need a result like this
ID values Repeats
-------------------
A 100 NA
B 200 2
C 100 2
D 300 1
That means I want each unique ID with its value and Repeats. Repeats should display NA when an ID has more than one distinct Repeats value, and the Repeats value itself when there is only one.
Initially I needed to display the max value of repeats so I tried the following view
ALTER VIEW [dbo].[BookingView1]
AS
SELECT bv.*, bd2.Repeats FROM Table1 bv
JOIN
(
SELECT distinct bd.id, bd.Repeats FROM table2 bd
JOIN
(
SELECT Id, MAX(Repeats) AS MaxRepeatCount
FROM table2
GROUP BY Id
) bd1
ON bd.Id = bd1.Id
AND bd.Repeats = bd1.MaxRepeatCount
) bd2
ON bv.Id = bd2.Id;
This returns the correct result, but when I try to implement the CASE it no longer returns unique IDs. Please help!
One method uses outer apply:
select t2.*, t1.repeats
from table2 t2 outer apply
(select (case when max(repeats) = min(repeats) then max(repeats)
else 'NA'
end) as repeats
from table1 t1
where t1.id = t2.id
) t1;
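SQLite, used here through Python's sqlite3 module as a stand-in for SQL Server, has no OUTER APPLY, so this sketch expresses the same MAX/MIN comparison with a LEFT JOIN and GROUP BY (a swapped-in formulation, not the answer's query). It casts the number to text, as the first note below requires when repeats is numeric.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id TEXT, repeats INTEGER);
CREATE TABLE table2 (id TEXT, "values" INTEGER);  -- VALUES is a keyword, so quote it
INSERT INTO table1 VALUES ('A',1),('A',1),('A',0),('B',2),('B',2),('C',2),('D',1);
INSERT INTO table2 VALUES ('A',100),('B',200),('C',100),('D',300);
""")

REPEATS_SQL = """
SELECT t2.id, t2."values",
       CASE WHEN MAX(t1.repeats) = MIN(t1.repeats)
            THEN CAST(MAX(t1.repeats) AS TEXT)  -- a single distinct value: show it
            ELSE 'NA'                           -- several distinct values (or no rows)
       END AS repeats
FROM table2 t2
LEFT JOIN table1 t1 ON t1.id = t2.id
GROUP BY t2.id, t2."values"
ORDER BY t2.id
"""

result = conn.execute(REPEATS_SQL).fetchall()
print(result)  # [('A', 100, 'NA'), ('B', 200, '2'), ('C', 100, '2'), ('D', 300, '1')]
```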
Two notes:
This assumes that repeats is a string. If it is a number, you need to cast it to a string.
It also assumes that repeats is never NULL.
For the sake of completeness, I'm including two other approaches; the first one also works when repeats is NULL. However, Gordon's answer has a much simpler query plan and should be preferred.
Option 1 (Works with NULLs):
SELECT
t1.ID, t2.[Values],
CASE
WHEN COUNT(*) > 1 THEN 'NA'
ELSE CAST(MAX(Repeats) AS VARCHAR(2))
END Repeats
FROM (
SELECT DISTINCT t1.ID, t1.Repeats
FROM #table1 t1
) t1
LEFT OUTER JOIN #table2 t2
ON t1.ID = t2.ID
GROUP BY t1.ID, t2.[Values]
Option 2 (does not contain explicit subqueries, but does not work with NULLs):
SELECT DISTINCT
t1.ID,
t2.[Values],
CASE
WHEN COUNT(t1.Repeats) OVER (PARTITION BY COUNT(DISTINCT t1.Repeats), t1.ID) > 1 THEN 'NA'
ELSE CAST(t1.Repeats AS VARCHAR(2))
END Repeats
FROM #table1 t1
LEFT OUTER JOIN #table2 t2
ON t1.ID = t2.ID
GROUP BY t1.ID, t2.[Values], t1.Repeats
NOTE:
This may not give desired results if table2 has different values for the same ID.

Chaining joins in SQL based on dynamic table

The title may not be accurate for the question but here goes! I have the following table:
id1 id2 status
1 2 a
2 3 b
3 4 c
6 7 d
7 8 e
8 9 f
9 10 g
I would like to get the first id1 and last status based on a dynamic chain joining, meaning that the result table will be:
id final_status
1 c
6 g
Logically, I want to construct the following arrays based on joining the table to itself:
id1 chained_ids chained_status
1 [2,3,4] [a,b,c]
6 [7,8,9,10] [d,e,f,g]
Then grab the last element of the chained_status list.
If we kept joining this table to itself on id1 = id2, we would eventually get single rows with these results. The problem is that the number of joins is not constant (a single id may be chained many or few times). There is always a 1-to-1 mapping of id1 to id2.
Thanks in advance! This can be done in either T-SQL or Hive (if someone has a clever map-reduce solution).
You can do this with a recursive CTE:
;WITH My_CTE AS
(
SELECT
id1,
id2,
status,
1 AS lvl
FROM
My_Table T1
WHERE
NOT EXISTS
(
SELECT *
FROM My_Table T2
WHERE T2.id2 = T1.id1
)
UNION ALL
SELECT
CTE.id1,
T3.id2,
T3.status,
CTE.lvl + 1
FROM
My_CTE CTE
INNER JOIN My_Table T3 ON T3.id1 = CTE.id2
)
SELECT
CTE.id1,
CTE.status
FROM
My_CTE CTE
INNER JOIN (SELECT id1, MAX(lvl) AS max_lvl FROM My_CTE GROUP BY id1) M ON
M.id1 = CTE.id1 AND
M.max_lvl = CTE.lvl
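The recursive CTE can be checked with Python's sqlite3 module as a stand-in engine. SQLite requires WITH RECURSIVE where T-SQL uses a plain WITH, but the anchor, the recursive term and the final MAX(lvl) join are otherwise the same as above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE My_Table (id1 INTEGER, id2 INTEGER, status TEXT);
INSERT INTO My_Table VALUES
  (1,2,'a'),(2,3,'b'),(3,4,'c'),(6,7,'d'),(7,8,'e'),(8,9,'f'),(9,10,'g');
""")

CHAIN_SQL = """
WITH RECURSIVE My_CTE(id1, id2, status, lvl) AS (
    -- chain heads: rows whose id1 never appears as another row's id2
    SELECT id1, id2, status, 1
    FROM My_Table T1
    WHERE NOT EXISTS (SELECT 1 FROM My_Table T2 WHERE T2.id2 = T1.id1)
    UNION ALL
    -- follow the link id2 -> id1, carrying the head's id1 along
    SELECT CTE.id1, T3.id2, T3.status, CTE.lvl + 1
    FROM My_CTE CTE
    JOIN My_Table T3 ON T3.id1 = CTE.id2
)
SELECT CTE.id1, CTE.status
FROM My_CTE CTE
JOIN (SELECT id1, MAX(lvl) AS max_lvl FROM My_CTE GROUP BY id1) M
  ON M.id1 = CTE.id1 AND M.max_lvl = CTE.lvl
ORDER BY CTE.id1
"""

result = conn.execute(CHAIN_SQL).fetchall()
print(result)  # [(1, 'c'), (6, 'g')]
```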

Comparing different rows in PostgreSQL for each Id

A few columns in my table look like this:
Id Code date latest
1 T 2014-10-04 0
2 B 2014-10-19 0
2 B 2014-10-26 0
1 S 2014-10-05 0
1 T 2014-10-06 0
1 T 2014-10-08 1
2 P 2014-10-27 1
I am tracking all changes made by each Id: if there is any change, I insert a new row and update the latest column.
What I want is, for each Id, to find the last code where latest is 0. Also, that code should not be equal to the existing code (latest = 1). So for id = 1, the answer cannot be
Id Code
1 T
because for id = 1, T is the existing code (latest = 1).
So ideally my output should look like:
Id Code
1 S
2 B
I think I can get the latest code for each id where latest = 0.
But how do I make sure that it is not equal to the existing code (latest = 1)?
Works in Postgres:
SELECT DISTINCT ON (t0.id)
t0.id, t0.code
FROM tbl t0
LEFT JOIN tbl t1 ON t1.code = t0.code
AND t1.id = t0.id
AND t1.latest = 1
WHERE t0.latest = 0
AND t1.code IS NULL
ORDER BY t0.id, t0.date DESC;
I use the combination of a LEFT JOIN / IS NULL to remove siblings of rows with latest = 1. There are various ways to do this:
Select rows which are not present in other table
Details for DISTINCT ON:
Select first row in each GROUP BY group?
Version with CTE and 2x LEFT JOIN
Since Redshift does not seem to support DISTINCT ON:
WITH cte AS (
SELECT t0.*
FROM tbl t0
LEFT JOIN tbl t1 ON t1.code = t0.code
AND t1.id = t0.id
AND t1.latest = 1
WHERE t0.latest = 0
AND t1.id IS NULL
)
SELECT c0.id, c0.code
FROM cte c0
LEFT JOIN cte c1 ON c1.id = c0.id
AND c1.date > c0.date
WHERE c1.id IS NULL
ORDER BY c0.id;
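Since the CTE version avoids DISTINCT ON, it also runs on SQLite. A minimal check through Python's sqlite3 module with the sample data from the question, assuming dates are stored as ISO-8601 text (which compares correctly as strings):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl (id INTEGER, code TEXT, date TEXT, latest INTEGER);
INSERT INTO tbl VALUES
  (1,'T','2014-10-04',0),(2,'B','2014-10-19',0),(2,'B','2014-10-26',0),
  (1,'S','2014-10-05',0),(1,'T','2014-10-06',0),(1,'T','2014-10-08',1),
  (2,'P','2014-10-27',1);
""")

PREVIOUS_CODE_SQL = """
WITH cte AS (
    -- old rows whose code differs from the current (latest = 1) code
    SELECT t0.*
    FROM tbl t0
    LEFT JOIN tbl t1 ON t1.code = t0.code
                    AND t1.id = t0.id
                    AND t1.latest = 1
    WHERE t0.latest = 0
      AND t1.id IS NULL
)
-- keep the row per id that has no later row in cte
SELECT c0.id, c0.code
FROM cte c0
LEFT JOIN cte c1 ON c1.id = c0.id
                AND c1.date > c0.date
WHERE c1.id IS NULL
ORDER BY c0.id
"""

result = conn.execute(PREVIOUS_CODE_SQL).fetchall()
print(result)  # [(1, 'S'), (2, 'B')]
```

If two remaining rows for one id share the same latest date, both are returned; DISTINCT ON would pick only one.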
I think the following does what you want:
select distinct on (t.id) t.id, t.code
from tbl t
where t.latest = 0
and not exists (select 1 from tbl t2 where t2.id = t.id and t2.code = t.code and t2.latest = 1)
order by t.id, t.date desc;
I believe this table should hold only the data for the current version, and you should create another table to store previous revisions, with a foreign key to the Id. Your Id does not meet the usual expectations for a column with that name. So, ideally, you would:
create a table Revisions(Id, myTableId, code, date, revision), where Id would be an auto_increment primary key and myTableId would point to the Id of the records (1 and 2 in the example)
migrate the old rows into it: insert into Revisions(myTableId, code, date, revision) select Id, code, date, latest from MyTable where latest = 0
number the migrated records: update Revisions r1 set revision = (select count(*) from Revisions r2 where r2.myTableId = r1.myTableId and r2.date < r1.date)
remove the old data from the original table: delete from MyTable where latest = 0
drop the latest column from MyTable
From here, you will always be able to select the previous version, the one before that, and so on, without problems. Note that my code suggestions might have the wrong syntax in PostgreSQL, as I have never used it, but the idea should work there as well.

Self join for two same rows with one different column

Hi, I have a table with the following data:
A B bid status
10 20 1 SUCCESS_1
10 20 1 SUCCESS_2
10 30 2 SUCCESS_1
10 30 2 SUCCESS_2
Now I want to print or count the above rows based on SUCCESS_1 and SUCCESS_2. I created the following query, but it does not work: it just returns one row combining the two rows.
select * from tbl t1 join tbl t2
on (t1.A=t2.A and t1.B=t2.B and
t1.Status = 'SUCCESS_1' and t2.Status = 'SUCCESS_2')
where t1.bid= 1
I want output as the following for the above query
A B bid status
10 20 1 SUCCESS_1
10 20 1 SUCCESS_2
I am new to SQL; please guide me. Thanks in advance.
If you need to do the join for some reason (e.g. your database does not let you select everything if you group by 1 column, because it wants everything projected to either be grouped or be an aggregate), you could do the following:
select t1.*
from tbl t1 join tbl t2
on (t1.A=t2.A and t1.B=t2.B and t1.Status = 'SUCCESS_1' and t2.Status = 'SUCCESS_2')
where t1.bid= 1
union all select t2.*
from tbl t1 join tbl t2
on (t1.A=t2.A and t1.B=t2.B and t1.Status = 'SUCCESS_1' and t2.Status = 'SUCCESS_2')
where t1.bid= 1
order by 1,2,3,4
Your original query is pulling back all the data in one row, but this one pulls back the two rows that make that resulting join row separately.
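A quick check of the UNION ALL query through Python's sqlite3 module as a stand-in engine, using the four sample rows from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl (A INTEGER, B INTEGER, bid INTEGER, status TEXT);
INSERT INTO tbl VALUES
  (10,20,1,'SUCCESS_1'),(10,20,1,'SUCCESS_2'),
  (10,30,2,'SUCCESS_1'),(10,30,2,'SUCCESS_2');
""")

# The first half returns the SUCCESS_1 side of each joined pair, the second half the SUCCESS_2 side.
PAIR_ROWS_SQL = """
SELECT t1.*
FROM tbl t1 JOIN tbl t2
  ON t1.A = t2.A AND t1.B = t2.B
 AND t1.status = 'SUCCESS_1' AND t2.status = 'SUCCESS_2'
WHERE t1.bid = 1
UNION ALL
SELECT t2.*
FROM tbl t1 JOIN tbl t2
  ON t1.A = t2.A AND t1.B = t2.B
 AND t1.status = 'SUCCESS_1' AND t2.status = 'SUCCESS_2'
WHERE t1.bid = 1
ORDER BY 1, 2, 3, 4
"""

result = conn.execute(PAIR_ROWS_SQL).fetchall()
print(result)  # [(10, 20, 1, 'SUCCESS_1'), (10, 20, 1, 'SUCCESS_2')]
```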
SELECT * FROM `tbl1` WHERE `bid`=1 GROUP BY `status`