How can I transform my N little queries into one query? - sql

I have a query that gives me the first available value for a given date and pair.
SELECT
TOP 1 value
FROM
my_table
WHERE
date >= 'myinputdate'
AND key = 'myinputkey'
ORDER BY date
I have N pairs of keys and dates, and I am trying to avoid querying each pair one by one. The table is rather big, and so is N, so the current approach is heavy and slow.
How can I query all the pairs in one query?

A solution is to use APPLY like a "function" created on the fly with one or many columns from another set:
DECLARE @inputs TABLE (
myinputdate DATE,
myinputkey INT)
INSERT INTO @inputs(
myinputdate,
myinputkey)
VALUES
('2019-06-05', 1),
('2019-06-01', 2)
SELECT
I.myinputdate,
I.myinputkey,
R.value
FROM
@inputs AS I
CROSS APPLY (
SELECT TOP 1
T.value
FROM
my_table AS T
WHERE
T.date >= I.myinputdate AND
T.key = I.myinputkey
ORDER BY
T.date ) AS R
You can use OUTER APPLY instead if you also want input pairs without a match to be shown, with NULL as the value. APPLY supports fetching multiple columns and using ORDER BY with TOP to control the number of rows returned per pair.
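To make the difference between the two APPLY variants concrete, here is a minimal sketch. The table variable @my_table and its three rows are made-up stand-ins for my_table, and the columns are bracketed because KEY is a reserved word in T-SQL.

```sql
-- Stand-in for my_table; names and data are invented for illustration.
DECLARE @my_table TABLE ([date] DATE, [key] INT, value INT);
INSERT INTO @my_table ([date], [key], value)
VALUES ('2019-06-03', 1, 10),
       ('2019-06-07', 1, 20),
       ('2019-06-09', 1, 30);

DECLARE @inputs TABLE (myinputdate DATE, myinputkey INT);
INSERT INTO @inputs (myinputdate, myinputkey)
VALUES ('2019-06-05', 1),   -- first row on or after this date is 2019-06-07
       ('2019-06-01', 2);   -- key 2 has no rows at all

SELECT I.myinputdate, I.myinputkey, R.value
FROM @inputs AS I
OUTER APPLY (
    SELECT TOP (1) T.value
    FROM @my_table AS T
    WHERE T.[date] >= I.myinputdate
      AND T.[key] = I.myinputkey
    ORDER BY T.[date]
) AS R;
-- OUTER APPLY returns (2019-06-05, 1, 20) and (2019-06-01, 2, NULL).
-- CROSS APPLY would return only the first of those two rows.
```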

This solution works without variables. You control your N by setting the right value in the row_num predicate.
There are plenty of ways to do what you want, and it all depends on your specific needs. As already answered, you can use a temp table or table variable to store the conditions and then join on the same conditions you would use as predicates. You can also create a user-defined table type and use it as a parameter to a function or procedure. You might also use CROSS APPLY with a VALUES clause to build that list and then join to it.
DROP TABLE IF EXISTS #temp;
CREATE TABLE #temp ( d DATE, k VARCHAR(100) );
GO
INSERT INTO #temp
VALUES ( '20180101', 'a' ),
( '20180102', 'b' ),
( '20180103', 'c' ),
( '20180104', 'd' ),
( '20190101', 'a' ),
( '20190102', 'b' ),
( '20180402', 'c' ),
( '20190103', 'c' ),
( '20190104', 'd' );
SELECT a.d ,
a.k
FROM ( SELECT d ,
k ,
ROW_NUMBER() OVER ( PARTITION BY k ORDER BY d DESC ) row_num
FROM #temp
WHERE (d >= '20180401'
AND k = 'a')
OR (d > '20180401'
AND k = 'b')
OR (d > '20180401'
AND k = 'c')
) a
WHERE a.row_num <= 1;
-- VALUES way
SELECT a.d ,
a.k
FROM ( SELECT t.d ,
t.k ,
ROW_NUMBER() OVER ( PARTITION BY t.k ORDER BY t.d DESC ) row_num
FROM #temp t
CROSS APPLY (VALUES('20180401','a'), ('20180401', 'b'), ('20180401', 'c')) f(d,k)
WHERE t.d >= f.d AND f.k = t.k
) a
WHERE a.row_num <= 1;
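The user-defined type option mentioned above is not shown in this answer, so here is a rough sketch of what it could look like with a table-valued parameter. The names dbo.KeyDatePairs and dbo.GetFirstValues are hypothetical, my_table and its columns come from the question, and GO is the batch separator used by SSMS and sqlcmd.

```sql
-- Hypothetical type holding the N (date, key) pairs.
CREATE TYPE dbo.KeyDatePairs AS TABLE (
    input_date DATE NOT NULL,
    input_key  INT  NOT NULL
);
GO
-- Hypothetical procedure; table-valued parameters must be declared READONLY.
CREATE PROCEDURE dbo.GetFirstValues
    @pairs dbo.KeyDatePairs READONLY
AS
BEGIN
    SELECT P.input_date, P.input_key, R.value
    FROM @pairs AS P
    CROSS APPLY (
        SELECT TOP (1) T.value
        FROM my_table AS T
        WHERE T.[date] >= P.input_date
          AND T.[key] = P.input_key     -- KEY is reserved in T-SQL, hence the brackets
        ORDER BY T.[date]
    ) AS R;
END;
GO
-- Usage: fill a variable of the table type and pass it in one call.
DECLARE @p dbo.KeyDatePairs;
INSERT INTO @p (input_date, input_key)
VALUES ('2019-06-05', 1), ('2019-06-01', 2);
EXEC dbo.GetFirstValues @pairs = @p;
```

This keeps the per-pair lookup on the server, so the client sends all pairs in a single round trip instead of N separate queries.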

If all the keys are using the same date, then use window functions:
SELECT key, value
FROM (SELECT t.*, ROW_NUMBER() OVER (PARTITION BY key ORDER BY date) as seqnum
FROM my_table t
WHERE date >= @input_date AND
key IN ( . . . )
) t
WHERE seqnum = 1;

SELECT key, date,value
FROM (SELECT ROW_NUMBER() OVER (PARTITION BY key ORDER BY date) as rownum,key,date,value
FROM my_table
WHERE
date >= 'myinputdate'
) as d
WHERE d.rownum = 1;

Related

Find all records within x units of each other

I have a table like this:
CREATE TABLE t(idx integer primary key, value integer);
INSERT INTO t(idx, value)
VALUES
(1, 1),
(2, 2),
(3, 3),
(4, 6),
(5, 7),
(6, 12)
I would like to return all the groups of records where the values are within 2 of each other, with an associated group label as a new column by which to identify them.
I thought perhaps a recursive query might be suitable...but my sql-fu is lacking.
You can use a recursive CTE:
with recursive tt as (
select t.*, row_number() over (order by idx) as seqnum
from t
),
cte as (
select idx, value, value as grp,
seqnum, 1 as lev
from tt
where seqnum = 1
union all
select tt.idx, tt.value,
(case when tt.value > grp + 2 then tt.value else cte.grp end),
tt.seqnum, 1 + lev
from cte join
tt
on tt.seqnum = cte.seqnum + 1
)
select *
from cte;
Here is a db<>fiddle. Note that this added a row with the value of "4" to show that the first four rows are split into two groups.
I assume you want to group rows so that any two values in a group differ by at most 2. Then you are right, a recursive query is the solution. At each level of recursion the bounds of the next group are computed. The groups are disjoint, so the final step joins the original table to the computed bounds and groups by the group number. Db fiddle here.
with recursive r (minv,maxv,level) as (
select min(t.value), min(t.value) + 2, 1
from t
union all
select minv, maxv, level from (
select t.value as minv, t.value + 2 as maxv, r.level + 1 as level, row_number() over (order by t.value) rn
from r
join t on t.value > r.maxv
) x where x.rn = 1
)
select r.level
, format('ids from %s to %s', min(t.idx), max(t.idx)) as id_label
, format('values from %s to %s', min(t.value), max(t.value)) as value_label
from t join r on t.value between r.minv and r.maxv
group by r.level
order by r.level
(The inner query in the recursive part only limits the newly added rows to one. The simpler select min(t.value), min(t.value) + 2 is not possible because aggregate functions are not allowed in the recursive part; the analytic function is a workaround.)

How can I efficiently extract a sub-table which only contains rows that have duplicated elements in SQL?

The main task is obtaining a sub-table (apologies if this is not quite the correct term) from an existing table, where only a few rows of interest are kept. Essentially, the rows of interest are those that contain an element whose value also appears in some element of any other row.
Any explanation or help for which is the best way to go around this would be very helpful.
I have considered performing queries to check for each element in each row, and then simply making a union out of all the query results.
This is the basic version of what I tried, although it is probably inefficient. Note that there are 3 columns and I am actually only checking for duplicated values within 2 columns (PARTICIPANT_1, PARTICIPANT_2).
SELECT * FROM
(
team_table
)
WHERE PARTICIPANT_2 in (SELECT PARTICIPANT_2
FROM
(
select startdate, PARTICIPANT_1, PARTICIPANT_2
from team_table
)
GROUP BY PARTICIPANT_2
HAVING COUNT(distinct PARTICIPANT_1) > 1
)
UNION
SELECT * FROM
(
team_table
)
WHERE PARTICIPANT_1 in (SELECT PARTICIPANT_1
FROM
(
select startdate, PARTICIPANT_1, PARTICIPANT_2
from team_table
)
GROUP BY PARTICIPANT_1
HAVING COUNT(distinct PARTICIPANT_2) > 1
)
For an example table:
startdate PARTICIPANT_1 PARTICIPANT_2
1-1-19 A B
1-1-19 A C
1-1-19 C D
1-1-19 Q R
1-1-19 S T
1-1-19 U V
should yield the following since A and C are the repeated elements
startdate PARTICIPANT_1 PARTICIPANT_2
1-1-19 A B
1-1-19 A C
1-1-19 C D
I think this is what you need:
SELECT * FROM team_table t1
WHERE exists (SELECT 1 from team_table t2
WHERE t1.startdate = t2.startdate -- don't know if you need this
-- Get all rows with duplicate values:
AND (t2.PARTICIPANT_1 IN (t1.PARTICIPANT_1, t1.PARTICIPANT_2)
OR t2.PARTICIPANT_2 IN (t1.PARTICIPANT_1, t1.PARTICIPANT_2))
-- Exclude the record itself:
AND (t1.PARTICIPANT_1 != t2.PARTICIPANT_1
OR t1.PARTICIPANT_2 != t2.PARTICIPANT_2))
If you have a unique id column, you can use:
select tt.*
from team_table tt
where exists (select 1
from team_table tt2
where (tt.participant_1 in (tt2.participant_1, tt2.participant_2) or
tt.participant_2 in (tt2.participant_1, tt2.participant_2)
) and
tt2.id <> tt.id
);
If you don't have one, you can actually generate one:
with tt as (
select t.*,
row_number() over (order by startdate, participant_1, participant_2) as seqnum
from team_table t
)
select tt.*
from tt
where exists (select 1
from tt tt2
where (tt.participant_1 in (tt2.participant_1, tt2.participant_2) or
tt.participant_2 in (tt2.participant_1, tt2.participant_2)
) and
tt2.seqnum <> tt.seqnum
);

2 rows differences

I would like to get 2 consecutive rows from an SQL table.
One of the columns stores a UNIX datestamp, and between the 2 rows only this value differs.
For example:
id_int dt_int
1. row 8211721 509794233
2. row 8211722 509794233
I need only those rows where dt_int is the same (edited)
Do you want both lines to be shown?
A solution could be this:
with foo as
(
select
*
from (values (8211721),(8211722),(8211728),(8211740),(8211741)) a(id_int)
)
select
id_int
from
(
select
id_int
,id_int-isnull(lag(id_int,1) over (order by id_int) ,id_int-6) prev
,isnull(lead(id_int,1) over (order by id_int) ,id_int+6)-id_int nxt
from foo
) a
where prev<=5 or nxt<=5
We use lead and lag to find the differences between rows, and keep the rows where the difference to the row before or after is less than or equal to 5.
If you use 2008 R2, then lag and lead are not available. You could use row_number instead:
with foo as
(
select
*
from (values (8211721),(8211722),(8211728),(8211740),(8211741)) a(id_int)
)
, rownums as
(
select
id_int
,row_number() over (order by id_int) rn
from foo
)
select
id_int
from
(
select
cur.id_int
,cur.id_int-prev.id_int prev
,nxt.id_int-cur.id_int nxt
from rownums cur
left join rownums prev
on cur.rn-1=prev.rn
left join rownums nxt
on cur.rn+1=nxt.rn
) a
where isnull(prev,6)<=5 or isnull(nxt,6)<=5
Assuming:
the lead() analytic function is available.
ID_INT is what we need to sort by to determine table order.
You may need to partition by some value, e.g. lead(ID_int) over (partition by SomeKeysuchasOrderNumber order by ID_int asc), so that orders and dates don't get mixed together.
WITH CTE AS (
SELECT A.*
, lead(ID_int) over ([missing partition info] ORDER BY id_Int asc) - id_int as ID_INT_DIFF
FROM Table A)
SELECT *
FROM CTE
WHERE ID_INT_DIFF < 5;
You can try this. This version works on SQL Server 2005 and above, since it relies on CTEs and ROW_NUMBER(). Today I do not have a more recent SQL Server to write on.
declare @t table (id_int int, dt_int int)
INSERT @t SELECT 8211721 , 509794233
INSERT @t SELECT 8211722 , 509794233
INSERT @t SELECT 8211723 , 509794235
INSERT @t SELECT 8211724 , 509794236
INSERT @t SELECT 8211729 , 509794237
INSERT @t SELECT 8211731 , 509794238
;with cte_t as
(SELECT
ROW_NUMBER() OVER (ORDER BY id_int) id
,id_int
,dt_int
FROM @t),
cte_diff as
( SELECT
id_int
,dt_int
-- dt_int of the next row; ORDER BY makes TOP 1 deterministic
,(SELECT TOP 1 dt_int FROM cte_t b WHERE a.id < b.id ORDER BY b.id) dt_int1
,dt_int - (SELECT TOP 1 dt_int FROM cte_t b WHERE a.id < b.id ORDER BY b.id) Difference
FROM cte_t a
)
SELECT DISTINCT id_int , dt_int FROM @t a
WHERE
EXISTS(SELECT 1 FROM cte_diff b where b.Difference =0 and a.dt_int = b.dt_int)
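Since the edited question asks for consecutive rows with the same dt_int, the lag/lead idea from the earlier answer can also be applied to dt_int directly. The following is only a sketch, assuming SQL Server 2012 or later and that id_int defines the row order; it reuses the sample rows from the answer above.

```sql
DECLARE @t TABLE (id_int INT, dt_int INT);
INSERT INTO @t (id_int, dt_int)
VALUES (8211721, 509794233),
       (8211722, 509794233),
       (8211723, 509794235),
       (8211724, 509794236),
       (8211729, 509794237),
       (8211731, 509794238);

SELECT x.id_int, x.dt_int
FROM (SELECT id_int, dt_int,
             LAG(dt_int)  OVER (ORDER BY id_int) AS prev_dt,   -- NULL on the first row
             LEAD(dt_int) OVER (ORDER BY id_int) AS next_dt    -- NULL on the last row
      FROM @t) AS x
WHERE x.dt_int = x.prev_dt
   OR x.dt_int = x.next_dt;
-- Returns the two rows 8211721 and 8211722, which share dt_int 509794233.
```

Comparisons with NULL are never true, so the first and last rows need no special handling here.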

Does sequence contain 5 numbers that are each one apart solved recursively

This is the data:
create table #t
(ID int)
insert into #t
values
(-2)
,(-1)
-- ,(0)
,(1)
,(2)
,(3)
,(4)
,(7)
,(5)
,(21)
,(22)
,(23)
,(24)
,(25)
,(8);
We want to know if there are 5 numbers within the above sequence that are each 1 apart e.g. 21-22-23-24-25 gives a positive result. So is there an island of 5 anywhere in the list?
Non-recursively I've got a few possibilities, but is there a simple recursive solution?
Or is there a simpler non-recursive solution?
--::::::::::::::
--:: 1. LONG-WINDED
with t as
(
select id,
U = (id+5),
L = (id-5)
from #t
)
, up as
(
select x.id,
cnt = count(*)
from t x
join t y on
(y.id > x.L and y.id <= x.id)
group by x.id
)
, down as --<<MAYBE DOWN IS NOT NEEDED
(
select x.id,
cnt = count(*)
from t x
join t y on
(y.id < x.U and y.id >= x.id)
group by x.id
)
select id from up where cnt >= 5
union all
select id from down where cnt >= 5
Following two are better:
--::::::::::::::
--::
--:: 2. PRETTY!!
SELECT *
FROM #t A
WHERE EXISTS
(
SELECT *
FROM #t B
WHERE (
(A.id + 5) > B.id
AND
A.id <= B.id
)
HAVING COUNT(*) >=5
)
--::::::::::::::
--::
--:: 3. PRETTY PRETTY!!
--::
SELECT ID
FROM #t A
CROSS APPLY
(
SELECT cnt = COUNT(*)
FROM #t B
WHERE (A.id + 5) > B.id AND A.id <= B.id
) C
WHERE C.cnt>=5
The following is based on this reference to an Itzak article:
--::::::::::::::
--::
--:: 4. One of the Windowed functions
--::
WITH x AS
(
SELECT ID,
y = LAG(ID,4) OVER(ORDER BY ID),
dif = ID - LAG(ID,4) OVER(ORDER BY ID)
FROM #t A
)
SELECT ID,y
FROM x
WHERE dif = 4
Yes, there is a much simpler solution. Take the difference between the numbers and an increasing sequence of numbers. If the numbers are consecutive, the difference is constant. So, you can do:
select grp, count(*) as num_in_sequence, min(id) as first_id, max(id) as last_id
from (select t.*,
(id - row_number() over (order by id)) as grp
from #t t
) t
group by grp
having count(*) >= 5;
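To see why the difference is constant within a run of consecutive numbers, it can help to look at the intermediate values before grouping. This sketch uses the sample values from the question in a table variable (an assumption made only to keep the example self-contained).

```sql
DECLARE @t TABLE (ID INT);
INSERT INTO @t (ID)
VALUES (-2),(-1),(1),(2),(3),(4),(7),(5),(21),(22),(23),(24),(25),(8);

SELECT t.ID,
       ROW_NUMBER() OVER (ORDER BY t.ID)        AS rn,
       t.ID - ROW_NUMBER() OVER (ORDER BY t.ID) AS grp
FROM @t AS t
ORDER BY t.ID;
-- ID:  -2 -1  1  2  3  4  5  7  8 21 22 23 24 25
-- rn:   1  2  3  4  5  6  7  8  9 10 11 12 13 14
-- grp: -3 -3 -2 -2 -2 -2 -2 -1 -1 11 11 11 11 11
```

Both grp = -2 (IDs 1 to 5) and grp = 11 (IDs 21 to 25) contain five rows, so the grouped query reports two islands of 5 for this data.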
EDIT:
I think this is the simplest of all. One window function and a comparison:
select t.*
from (select t.*, lead(id, 4) over (order by id) as id4
from #t t
) t
where id4 - id = 4;
This does make the assumption that there are no duplicates in the ids, which is true of the OP data.
As I look further, this is the last solution in the OP. Kudos!

SQL group by if values are close

Class| Value
-------------
A | 1
A | 2
A | 3
A | 10
B | 1
I am not sure whether it is practical to achieve this using SQL.
If the difference between values is less than 5 (or x), then group the rows (within the same Class, of course).
Expected result
Class| ValueMin | ValueMax
---------------------------
A | 1 | 3
A | 10 | 10
B | 1 | 1
For fixed intervals, we can easily use "GROUP BY". But now the grouping is based on nearby row's value. So if the values are consecutive or very close, they will be "chained together".
Thank you very much
Assuming MSSQL
You are trying to group things by gaps between values. The easiest way to do this is to use the lag() function to find the gaps:
select class, min(value) as minvalue, max(value) as maxvalue
from (select class, value,
sum(IsNewGroup) over (partition by class order by value) as GroupId
from (select class, value,
(case when lag(value) over (partition by class order by value) > value - 5
then 0 else 1
end) as IsNewGroup
from t
) t
) t
group by class, groupid;
Note that this assumes SQL Server 2012 for the use of lag() and cumulative sum.
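The two steps, flagging each gap and then taking a running sum of the flags, are easier to follow with the intermediate columns exposed. This is a sketch on the sample data from the question, loaded into a table variable for illustration.

```sql
DECLARE @t TABLE (class CHAR(1), value INT);
INSERT INTO @t (class, value)
VALUES ('A', 1), ('A', 2), ('A', 3), ('A', 10), ('B', 1);

SELECT s.class, s.value, s.IsNewGroup,
       -- running total of the flags numbers the groups within each class
       SUM(s.IsNewGroup) OVER (PARTITION BY s.class ORDER BY s.value) AS GroupId
FROM (SELECT class, value,
             -- 1 when the previous value is missing or at least 5 away
             CASE WHEN LAG(value) OVER (PARTITION BY class ORDER BY value) > value - 5
                  THEN 0 ELSE 1
             END AS IsNewGroup
      FROM @t) AS s
ORDER BY s.class, s.value;
-- class value IsNewGroup GroupId
-- A     1     1          1
-- A     2     0          1
-- A     3     0          1
-- A     10    1          2
-- B     1     1          1
```

Grouping by class and GroupId then yields the three expected rows: A 1 to 3, A 10 to 10, and B 1 to 1.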
Update:
*This answer is incorrect*
Assuming the table you gave is called sd_test, the following query will give you the output you are expecting
In short, we need a way to find what was the value on the previous row. This is determined using a join on row ids. Then create a group to see if the difference is less than 5. and then it is just regular 'Group By'.
If your version of SQL Server supports windowing functions with partitioning the code would be much more readable.
SELECT
A.CLASS
,MIN(A.VALUE) AS MIN_VALUE
,MAX(A.VALUE) AS MAX_VALUE
FROM
(SELECT
ROW_NUMBER()OVER(PARTITION BY CLASS ORDER BY VALUE) AS ROW_ID
,CLASS
,VALUE
FROM SD_TEST) AS A
LEFT JOIN
(SELECT
ROW_NUMBER()OVER(PARTITION BY CLASS ORDER BY VALUE) AS ROW_ID
,CLASS
,VALUE
FROM SD_TEST) AS B
ON A.CLASS = B.CLASS AND A.ROW_ID=B.ROW_ID+1
GROUP BY A.CLASS,CASE WHEN ABS(COALESCE(B.VALUE,0)-A.VALUE)<5 THEN 1 ELSE 0 END
ORDER BY A.CLASS,CASE WHEN ABS(COALESCE(B.VALUE,0)-A.VALUE)<5 THEN 1 ELSE 0 END DESC
ps: I think the above is ANSI compliant. So should run in most SQL variants. Someone can correct me if it is not.
These give the correct result, using the fact that you must have the same number of group starts as ends and that they will both be in ascending order.
if object_id('tempdb..#temp') is not null drop table #temp
create table #temp (class char(1),Value int);
insert into #temp values ('A',1);
insert into #temp values ('A',2);
insert into #temp values ('A',3);
insert into #temp values ('A',10);
insert into #temp values ('A',13);
insert into #temp values ('A',14);
insert into #temp values ('b',7);
insert into #temp values ('b',8);
insert into #temp values ('b',9);
insert into #temp values ('b',12);
insert into #temp values ('b',22);
insert into #temp values ('b',26);
insert into #temp values ('b',67);
Method 1 Using CTE and row offsets
with cte as
(select distinct class,value,ROW_NUMBER() over ( partition by class order by value ) as R from #temp),
cte2 as
(
select
c1.class
,c1.value
,c2.R as PreviousRec
,c3.r as NextRec
from
cte c1
left join cte c2 on (c1.class = c2.class and c1.R= c2.R+1 and c1.Value < c2.value + 5)
left join cte c3 on (c1.class = c3.class and c1.R= c3.R-1 and c1.Value > c3.value - 5)
)
select
Starts.Class
,Starts.Value as StartValue
,Ends.Value as EndValue
from
(
select
class
,value
,row_number() over ( partition by class order by value ) as GroupNumber
from cte2
where PreviousRec is null) as Starts join
(
select
class
,value
,row_number() over ( partition by class order by value ) as GroupNumber
from cte2
where NextRec is null) as Ends on starts.class=ends.class and starts.GroupNumber = ends.GroupNumber
Method 2 Inline views using not exists
select
Starts.Class
,Starts.Value as StartValue
,Ends.Value as EndValue
from
(
select class,Value ,row_number() over ( partition by class order by value ) as GroupNumber
from
(select distinct class,value from #temp) as T
where not exists (select 1 from #temp where class=t.class and Value < t.Value and Value > t.Value -5 )
) Starts join
(
select class,Value ,row_number() over ( partition by class order by value ) as GroupNumber
from
(select distinct class,value from #temp) as T
where not exists (select 1 from #temp where class=t.class and Value > t.Value and Value < t.Value +5 )
) ends on starts.class=ends.class and starts.GroupNumber = ends.GroupNumber
In both methods I start with a select distinct, because if you have a duplicate entry at a group start or end, things go awry without it.
Here is one way of getting the information you are after:
SELECT Under5.Class,
(
SELECT MIN(m2.Value)
FROM MyTable AS m2
WHERE m2.Value < 5
AND m2.Class = Under5.Class
) AS ValueMin,
(
SELECT MAX(m3.Value)
FROM MyTable AS m3
WHERE m3.Value < 5
AND m3.Class = Under5.Class
) AS ValueMax
FROM
(
SELECT DISTINCT m1.Class
FROM MyTable AS m1
WHERE m1.Value < 5
) AS Under5
UNION
SELECT Over4.Class,
(
SELECT MIN(m4.Value)
FROM MyTable AS m4
WHERE m4.Value >= 5
AND m4.Class = Over4.Class
) AS ValueMin,
(
SELECT Max(m5.Value)
FROM MyTable AS m5
WHERE m5.Value >= 5
AND m5.Class = Over4.Class
) AS ValueMax
FROM
(
SELECT DISTINCT m6.Class
FROM MyTable AS m6
WHERE m6.Value >= 5
) AS Over4