How can I duplicate the result set like below - sql

I have a table like the one below, and I want to generate one row for every date from the min date up to and including the max date:
686151209 E13232677 1333439 2017-10-23
686151209 E13232677 1333439 2017-10-26
I'd like to have the result set like below
686151209 E13232677 1333439 2017-10-23
686151209 E13232677 1333439 2017-10-24
686151209 E13232677 1333439 2017-10-25
686151209 E13232677 1333439 2017-10-26

You can use spt_values to get continuous numbers:
;WITH testdata(col1,col2,col3,col4)AS(
SELECT '686151209','E13232677','1333439','2017-10-23' UNION all
SELECT '686151209','E13232677','1333439','2017-10-26'
)
SELECT col1,col2,col3,DATEADD(d,sv.number-1,a.mindate) AS col4,sv.number FROM (
SELECT col1,col2,col3,CONVERT(DATE,MIN(col4)) AS mindate,CONVERT(DATE,MAX(col4)) AS maxdate
FROM testdata AS t
group by col1,col2,col3
) AS a
INNER JOIN master.dbo.spt_values AS sv ON sv.type='P' AND sv.number BETWEEN 1 AND DATEDIFF(d,mindate,maxdate)+1
+-----------+-----------+---------+------------+--------+
| col1 | col2 | col3 | col4 | number |
+-----------+-----------+---------+------------+--------+
| 686151209 | E13232677 | 1333439 | 2017-10-23 | 1 |
| 686151209 | E13232677 | 1333439 | 2017-10-24 | 2 |
| 686151209 | E13232677 | 1333439 | 2017-10-25 | 3 |
| 686151209 | E13232677 | 1333439 | 2017-10-26 | 4 |
+-----------+-----------+---------+------------+--------+
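To see what the numbers-table join is doing, here is a rough sketch of the same logic outside the database. It is plain Python rather than T-SQL, and the variable names (rows, bounds, expanded) are made up for illustration: find the min and max date per group, then emit one row per day offset from 0 up to the day difference.

```python
from datetime import date, timedelta

# Sample rows from the question: (col1, col2, col3, col4)
rows = [
    ("686151209", "E13232677", "1333439", date(2017, 10, 23)),
    ("686151209", "E13232677", "1333439", date(2017, 10, 26)),
]

# GROUP BY col1, col2, col3 with MIN(col4) and MAX(col4)
bounds = {}
for col1, col2, col3, d in rows:
    key = (col1, col2, col3)
    lo, hi = bounds.get(key, (d, d))
    bounds[key] = (min(lo, d), max(hi, d))

# Join against "numbers" 0 .. DATEDIFF(day, mindate, maxdate)
expanded = [
    key + (lo + timedelta(days=n),)
    for key, (lo, hi) in bounds.items()
    for n in range((hi - lo).days + 1)
]

for row in expanded:
    print(*row)
```

The range plays the role of spt_values here, so the number of days a group can span is limited only by the size of the numbers source.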

One method is a numbers table. If you don't have too many rows, I also like a recursive CTE:
with cte as (
select col1, col2, col3, mind, maxd
from (select col1, col2, col3, min(dte) as mind, max(dte) as maxd
from t
group by col1, col2, col3
) t
union all
select col1, col2, col3, dateadd(day, 1, mind), maxd
from cte
where dateadd(day, 1, mind) <= maxd
)
select col1, col2, col3, mind
from cte;
With the default settings, SQL Server stops after 100 recursion levels, so each col1/col2/col3 group can span about 100 days at most unless you set the MAXRECURSION query option.

Or like this (PostgreSQL, using generate_series):
CREATE TABLE temp
(
ID BIGINT,
CODE VARCHAR(50),
ID2 BIGINT,
dates DATE
);
INSERT INTO temp VALUES (686151209, 'E13232677', 1333439, '2017-10-23'),
(686151209, 'E13232677', 1333439, '2017-10-26');
SELECT T.id, T.code, T.id2, generate_series(T.D1::timestamp, T.D2::timestamp, interval '1 day')::date AS dates
FROM
(
SELECT A.id, A.code, A.id2, A.dates AS D1, B.dates AS D2
FROM temp A
LEFT JOIN temp b ON (A.id = B.id AND
A.code=B.code AND
A.id2 = B.id2 AND
B.dates > A.dates)
WHERE B.id IS NOT NULL
) T;

Related

Row count discrepancy between Intersect and Except queries

I'm getting some strange behaviour using intersect and except. Tb1 has the least rows out of the two tables, and the difference in row count between tb1 and the intersect query results is 143 (intersect = 9782, tb1 = 9925).
But when I run the same query with except, it returns 24 lines. My understanding is that it should have returned 143 rows, being the rows that didn't match in the intersect query. Could someone help me understand why this might be?
There is a possibility that both datasets have multiple duplicate rows (being subset data). Could this be the cause of the difference?
SELECT
amount
,date
FROM tb1
INTERSECT
SELECT
amount
,date
FROM tb2
As you're probably already aware, the difference between UNION and UNION ALL is that the former returns a unique result, while the latter doesn't.
The same can be said about INTERSECT versus INTERSECT ALL.
And also about EXCEPT versus EXCEPT ALL.
So when there are dups, then the totals can be different from what you expect.
Here's a simplified demo to illustrate.
create table TableA (
col1 int not null,
col2 varchar(8)
);
create table TableB (
col1 int not null,
col2 varchar(8)
);
insert into TableA (Col1, Col2) values
(1,'A') -- only A
, (3,'AB') -- 1 in both
, (4,'AAB'), (4,'AAB') -- 2 in A, 1 in B
, (5,'ABB') -- 1 in A, 2 in B
, (6,'AABB'), (6,'AABB') -- 2 in both
, (7, NULL); -- 1 NULL in both
8 rows affected
insert into TableB (Col1, Col2) values
(2,'B') -- only B
, (3,'AB') -- 1 in both
, (4,'AAB') -- 2 in A, 1 in B
, (5,'ABB'), (5,'ABB') -- 1 in A, 2 in B
, (6,'AABB'), (6,'AABB') -- 2 in both
, (7, null); -- 1 NULL in both
8 rows affected
select Col1, Col2 from TableA
intersect
select Col1, Col2 from TableB
order by Col1, Col2
col1 | col2
---: | :---
3 | AB
4 | AAB
5 | ABB
6 | AABB
7 | null
select Col1, Col2 from TableA
intersect all
select Col1, Col2 from TableB
order by Col1, Col2
col1 | col2
---: | :---
3 | AB
4 | AAB
5 | ABB
6 | AABB
6 | AABB
7 | null
select Col1, Col2 from TableA
except
select Col1, Col2 from TableB
order by Col1, Col2
col1 | col2
---: | :---
1 | A
select Col1, Col2 from TableA
except all
select Col1, Col2 from TableB
order by Col1, Col2
col1 | col2
---: | :---
1 | A
4 | AAB
Demo on db<>fiddle here
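If it helps to see the counting rule behind those four results, here is a small sketch in Python (not SQL) that models the demo tables as multisets. The names table_a, table_b and the result variables are just for illustration. The plain operators work on distinct rows, while the ALL variants work on per-row counts.

```python
from collections import Counter

# Same rows as the demo tables above; None stands in for NULL
table_a = [(1, 'A'), (3, 'AB'), (4, 'AAB'), (4, 'AAB'),
           (5, 'ABB'), (6, 'AABB'), (6, 'AABB'), (7, None)]
table_b = [(2, 'B'), (3, 'AB'), (4, 'AAB'), (5, 'ABB'), (5, 'ABB'),
           (6, 'AABB'), (6, 'AABB'), (7, None)]

a, b = Counter(table_a), Counter(table_b)

intersect = set(a) & set(b)   # INTERSECT: distinct rows present in both
intersect_all = a & b         # INTERSECT ALL: minimum count per row
except_ = set(a) - set(b)     # EXCEPT: distinct rows of A not in B
except_all = a - b            # EXCEPT ALL: count in A minus count in B

print(len(intersect), sum(intersect_all.values()))  # 5 6
print(len(except_), sum(except_all.values()))       # 1 2
```

Only the ALL variants add up to the full row count of TableA (6 + 2 = 8). The plain variants add up to the number of distinct rows in TableA (5 + 1 = 6), which is why comparing INTERSECT and EXCEPT results against a raw row count that includes duplicates gives numbers that don't match.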

T-SQL sequential updating with two columns

I have a table created by:
CREATE TABLE table1
(
id INT,
multiplier INT,
col1 DECIMAL(10,5)
)
INSERT INTO table1
VALUES (1, 2, 1.53), (2, 3, NULL), (3, 2, NULL),
(4, 2, NULL), (5, 3, NULL), (6, 1, NULL)
Which results in:
id multiplier col1
-----------------------
1 2 1.53000
2 3 NULL
3 2 NULL
4 2 NULL
5 3 NULL
6 1 NULL
I want to add a column col2 which is defined as multiplier * col1, however the next value of col1 then updates to take the previous calculated value of col2.
The resulting table should look like:
id multiplier col1 col2
---------------------------------------
1 2 1.53000 3.06000
2 3 3.06000 9.18000
3 2 9.18000 18.36000
4 2 18.36000 36.72000
5 3 36.72000 110.16000
6 1 110.16000 110.16000
Is this possible using T-SQL? I've tried a few different things such as joining id to id - 1 and have played around with a sequential update using UPDATE and setting variables but I can't get it to work.
A recursive CTE might be the best approach. Assuming your ids have no gaps:
with cte as (
select id, multiplier, convert(float, col1) as col1, convert(float, col1 * multiplier) as col2
from table1
where id = 1
union all
select t1.id, t1.multiplier, cte.col2 as col1, cte.col2 * t1.multiplier
from cte join
table1 t1
on t1.id = cte.id + 1
)
select *
from cte;
Here is a db<>fiddle.
Note that I converted the destination type to float, which is convenient for this sort of operation. You can convert back to decimal if you prefer that.
Basically, this would require an aggregate/window function that computes the product of column values. No such set function exists in SQL, though. We can work around this with arithmetic, since the product of a set of values equals exp() of the sum of their logarithms:
select
id,
multiplier,
coalesce(min(col1) over() * exp(sum(log(multiplier)) over(order by id rows between unbounded preceding and 1 preceding)), col1) col1,
min(col1) over() * exp(sum(log(multiplier)) over(order by id)) col2
from table1
Demo on DB Fiddle:
id | multiplier | col1 | col2
-: | ---------: | -----: | -----:
1 | 2 | 1.53 | 3.06
2 | 3 | 3.06 | 9.18
3 | 2 | 9.18 | 18.36
4 | 2 | 18.36 | 36.72
5 | 3 | 36.72 | 110.16
6 | 1 | 110.16 | 110.16
This will fail if any multiplier is zero or negative, because log() is undefined for those values.
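As a quick sanity check of the exp(sum(log(...))) idea, here is a sketch in Python (not T-SQL) using the multipliers from the question. The variable names are my own. It compares a direct running product with the log-based one and shows that the logarithm of a negative number is rejected.

```python
import math
import operator
from itertools import accumulate

multipliers = [2, 3, 2, 2, 3, 1]   # multiplier column, ordered by id
start = 1.53                       # col1 of the first row

# Direct running product: 2, 2*3, 2*3*2, ...
running_product = list(accumulate(multipliers, operator.mul))

# Same thing via exp(sum(log(multiplier))) as a running sum of logs
via_logs = [math.exp(s) for s in accumulate(math.log(m) for m in multipliers)]

# col2 = first col1 * running product of multipliers
col2 = [start * p for p in via_logs]
print([round(v, 2) for v in col2])

# The log of a negative number is undefined, so the trick breaks there
try:
    math.log(-2)
    log_of_negative_fails = False
except ValueError:
    log_of_negative_fails = True
```

Because the result goes through floating point, expect values that are very close to, but not always bit-for-bit equal to, the exact decimal product.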
If you wanted an update statement:
with cte as (
select col1, col2,
coalesce(min(col1) over() * exp(sum(log(multiplier)) over(order by id rows between unbounded preceding and 1 preceding)), col1) col1_new,
min(col1) over() * exp(sum(log(multiplier)) over(order by id)) col2_new
from table1
)
update cte set col1 = col1_new, col2 = col2_new

Select all entries that have the same Type as the entry with the largest Date in SQL?

How do I select all entries that have the same Type as the entry with the largest Date?
I'm using SQL Server.
My table:
+----+------+-------------------------+
| id | Type | Date |
+----+------+-------------------------+
| 1 | xxx | 2020-02-25 09:11:53.000 |
| 2 | yyy | 2020-02-25 08:30:35.000 |
| 3 | xxx | 2020-02-25 07:48:17.000 |
| 4 | xxx | 2020-02-25 09:04:25.000 |
| 5 | yyy | 2020-02-25 07:59:03.000 |
The result should be:
+----+------+-------------------------+
| id | Type | Date |
+----+------+-------------------------+
| 1 | xxx | 2020-02-25 09:11:53.000 |
| 3 | xxx | 2020-02-25 07:48:17.000 |
| 4 | xxx | 2020-02-25 09:04:25.000 |
+----+------+-------------------------+
Because id = 1 has the max Date, and its Type is xxx.
You can use EXISTS with a correlated subquery:
select t.*
from table t
where exists (select 1 from table t1 where t1.type = t.type and t1.id <> t.id) and
t.type = (select top (1) t1.type from table t1 order by t1.date desc);
A subquery is often the most efficient method with the right index:
select t.*
from t
where t.type = (select top (1) t2.type
from t t2
order by t2.date desc
);
The best indexes are (date desc, type) and (type).
You can also do this with window functions:
select t.*
from (select t.*,
first_value(type) over (order by date desc) as last_type
from t
) t
where type = last_type;
Rather than a Self Join, you could use LAST_VALUE in a CTE and then add that to the WHERE:
WITH CTE AS(
SELECT V.ID,
V.[Type],
V.[Date],
LAST_VALUE(V.Type) OVER (ORDER BY [Date] ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LastType
FROM (VALUES (1, 'xxx', CONVERT(datetime2(0), '2020-02-25 09:11:53.000')),
(2, 'yyy', CONVERT(datetime2(0), '2020-02-25 08:30:35.000')),
(3, 'xxx', CONVERT(datetime2(0), '2020-02-25 07:48:17.000')),
(4, 'xxx', CONVERT(datetime2(0), '2020-02-25 09:04:25.000')),
(5, 'yyy', CONVERT(datetime2(0), '2020-02-25 07:59:03.000'))) V (ID, [Type], [Date]))
SELECT CTE.ID,
CTE.[Type],
CTE.[Date]
FROM CTE
WHERE [Type] = LastType;
DB<>Fiddle
Try this:
Declare @t table (id int, types nvarchar(100), dates datetime)
insert into @t values (1,'xxx','2020-02-25 09:11:53.000')
insert into @t values (2,'yyy','2020-02-25 08:30:35.000')
insert into @t values (3,'xxx','2020-02-25 07:48:17.000')
insert into @t values (4,'xxx','2020-02-25 09:04:25.000')
insert into @t values (5,'yyy','2020-02-25 07:59:03.000')
-- TOP 1 needs an ORDER BY, otherwise the type it returns is arbitrary
Declare @max nvarchar(100) = (select t.types from (
select top 1 max(dates) as t, types from @t group by types order by max(dates) desc
) t)
select * from @t
where types = @max
Output:
id types dates
1 xxx 2020-02-25 09:11:53.000
3 xxx 2020-02-25 07:48:17.000
4 xxx 2020-02-25 09:04:25.000

DENSE_RANK() without duplication

Here's what my data looks like:
| col1 | col2 | denserank | whatiwant |
|------|------|-----------|-----------|
| 1 | 1 | 1 | 1 |
| 2 | 1 | 1 | 1 |
| 3 | 2 | 2 | 2 |
| 4 | 2 | 2 | 2 |
| 5 | 1 | 1 | 3 |
| 6 | 2 | 2 | 4 |
| 7 | 2 | 2 | 4 |
| 8 | 3 | 3 | 5 |
Here's the query I have so far:
SELECT col1, col2, DENSE_RANK() OVER (ORDER BY COL2) AS [denserank]
FROM [table1]
ORDER BY [col1] asc
What I'd like to achieve is for my denserank column to increment every time there is a change in the value of col2 (even if the value itself is reused). I can't actually order by the column I have denserank on, so that won't work). See the whatiwant column for an example.
Is there any way to achieve this with DENSE_RANK()? Or is there an alternative?
I would do it with a recursive cte like this:
declare @Dept table (col1 integer, col2 integer)
insert into @Dept values(1, 1),(2, 1),(3, 2),(4, 2),(5, 1),(6, 2),(7, 2),(8, 3)
;with a as (
select col1, col2,
ROW_NUMBER() over (order by col1) as rn
from @Dept),
s as
(select col1, col2, rn, 1 as dr from a where rn=1
union all
select a.col1, a.col2, a.rn, case when a.col2=s.col2 then s.dr else s.dr+1 end as dr
from a inner join s on a.rn=s.rn+1)
select col1, col2, dr from s
result:
col1 col2 dr
----------- ----------- -----------
1 1 1
2 1 1
3 2 2
4 2 2
5 1 3
6 2 4
7 2 4
8 3 5
The ROW_NUMBER is only required in case your col1 values are not sequential. If they are you can use the recursive cte straight away
Try this using window functions:
with t(col1 ,col2) as (
select 1 , 1 union all
select 2 , 1 union all
select 3 , 2 union all
select 4 , 2 union all
select 5 , 1 union all
select 6 , 2 union all
select 7 , 2 union all
select 8 , 3
)
select t.col1,
t.col2,
sum(x) over (
order by col1
) whatyouwant
from (
select t.*,
case
when col2 = lag(col2) over (
order by col1
)
then 0
else 1
end x
from t
) t
order by col1;
This reads the table once and works in two steps:
x: assign 0 if the previous row's col2 (in order of increasing col1) is the same as this row's col2, otherwise 1.
whatyouwant: the running sum of x in order of increasing col1. It goes up by one each time col2 changes, so every group of consecutive equal col2 values gets its own number, and that's your output.
Here is one way using the SUM() OVER (ORDER BY ...) window aggregate function:
SELECT col1,Col2,
Sum(CASE WHEN a.prev_val = a.col2 THEN 0 ELSE 1 END) OVER(ORDER BY col1) AS whatiwant
FROM (SELECT col1,
col2,
Lag(col2, 1)OVER(ORDER BY col1) AS prev_val
FROM Yourtable) a
ORDER BY col1;
How it works:
LAG window function is used to find the previous col2 for each row ordered by col1
SUM OVER(Order by) will increment the number only when previous col2 is not equal to current col2
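To make the two steps concrete, here is the same LAG plus running SUM logic traced in Python (not SQL) on the sample data from the question. The list names are only for illustration.

```python
from itertools import accumulate

# (col1, col2) rows from the question
rows = [(1, 1), (2, 1), (3, 2), (4, 2), (5, 1), (6, 2), (7, 2), (8, 3)]

# ORDER BY col1
col2_values = [c2 for _, c2 in sorted(rows)]

# LAG(col2, 1): the previous row's col2, None (NULL) for the first row
prev_values = [None] + col2_values[:-1]

# CASE WHEN prev_val = col2 THEN 0 ELSE 1 END
flags = [0 if prev == cur else 1 for prev, cur in zip(prev_values, col2_values)]

# SUM(...) OVER (ORDER BY col1): running total of the change flags
whatiwant = list(accumulate(flags))
print(whatiwant)
```

The first row always gets a flag of 1 because there is no previous value to match, which is what starts the numbering at 1.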
I think this is possible in pure SQL using some gaps-and-islands tricks, but the path of least resistance might be to use a session variable combined with LAG() to keep track of when your computed dense rank changes value. In the query below, I use @a to keep track of the change in the dense rank, and when it changes this variable is incremented by 1. Be aware that SQL Server rejects a SELECT that both assigns a variable and returns columns, so the query shows the idea rather than something that runs as-is.
DECLARE @a int
SET @a = 1
SELECT t.col1,
t.col2,
t.denserank,
@a = CASE WHEN LAG(t.denserank, 1, 1) OVER (ORDER BY t.col1) = t.denserank
THEN @a
ELSE @a+1 END AS [whatiwant]
FROM
(
SELECT col1, col2, DENSE_RANK() OVER (ORDER BY COL2) AS [denserank]
FROM [table1]
) t
ORDER BY t.col1

Query to get previous value

I have a scenario where I need the previous value of a column, but it must not be the same as the current value.
Table A:
+------+------+-------------+
| Col1 | Col2 | Lead_Col2 |
+------+------+-------------+
| 1 | A | NULL |
| 2 | B | A |
| 3 | B | A |
| 4 | C | B |
| 5 | C | B |
| 6 | C | B |
| 7 | D | C |
+------+------+-------------+
As shown above, for each row I need the previous Col2 value that is not the same as the current value.
Try:
select *
from (select col1,
col2,
lag(col2, 1) over(order by col1) as prev_col2
from table_a)
where col2 <> prev_col2
The name lead_col2 is misleading, because you really want a lag.
Here is a brute force method that uses a correlated subquery to get the index of the value and then joins the value in:
select aa.col1, aa.col2, a.col2 as prev_col2
from (select col1, col2,
(select max(a2.col1)
from a a2
where a2.col1 < a.col1 and a2.col2 <> a.col2
) as prev_col1
from a
) aa left join
a
on aa.prev_col1 = a.col1
EDIT:
You can also use logic with lead() and ignore nulls. If a value is the last in its sequence, keep that value; otherwise set it to NULL. Then use lag() with ignore nulls:
select col1, col2,
lag(col3) ignore nulls over (order by col1)
from (select col1, col2,
(case when col2 <> lead(col2) over (order by col1) then col2
end) as col3
from a
) a;
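Here is how that lead()/lag() logic plays out on the sample data, traced in Python (not SQL). The names col3, last_seen and prev_distinct are made up for this sketch.

```python
# (Col1, Col2) rows from the question, already ordered by Col1
rows = [(1, 'A'), (2, 'B'), (3, 'B'), (4, 'C'), (5, 'C'), (6, 'C'), (7, 'D')]
values = [v for _, v in rows]

# lead(col2): the next row's value, None (NULL) for the last row
next_values = values[1:] + [None]

# col3 keeps a value only on the last row of its run; a NULL lead() makes
# the <> comparison unknown, so the final row also ends up NULL
col3 = [v if nxt is not None and v != nxt else None
        for v, nxt in zip(values, next_values)]

# lag(col3) ignoring nulls: most recent non-NULL col3 strictly before this row
prev_distinct = []
last_seen = None
for c in col3:
    prev_distinct.append(last_seen)
    if c is not None:
        last_seen = c

print(prev_distinct)
```

The result lines up with the Lead_Col2 column in the question: NULL for the first row, then the value of the preceding run for every row after it.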
Try this:
select t.col1
,t.col2
,first_value(lag_col2) over (partition by col2 order by ord) lag_col2
from (select t.*
,case when lag_col2 = col2 then 1 else 0 end ord
from (select t.*
,lag (col2) over (order by col1) lag_col2
from table1 t
)t
)t
order by col1
SQL Fiddle