Filter rows and select in to another columns in SQL? - sql

I have a table like below.
If(OBJECT_ID('tempdb..#temp') Is Not Null)
Begin
Drop Table #Temp
End
create table #Temp
(
Type int,
Code Varchar(50),
)
Insert Into #Temp
SELECT 1,'1'
UNION
SELECT 1,'2'
UNION
SELECT 1,'3'
UNION
SELECT 2,'4'
UNION
SELECT 2,'5'
UNION
SELECT 2,'6'
select * from #Temp
And would like to get the below result.
Type_1
Code_1
Type_2
Code_2
1
1
2
4
1
2
2
5
1
3
2
6
I have tried with union and inner join, but not getting desired result. Please help.

You can use full outer join and cte as follows:
With cte as
(Select type, code,
Row_number() over (partition by type order by code) as rn
From your_table t)
Select t1.type, t1.code, t2.type, t2.code
From cte t1 full join cte t2
On t1.rn = t2.rn and t1.type =1 and t2.type = 2

Here is a query which will produce the output you expect:
WITH cte AS (
SELECT t.[Type], t.Code
, rn = ROW_NUMBER() OVER (PARTITION BY t.[Type] ORDER BY t.Code)
FROM #Temp t
)
SELECT Type_1 = t1.[Type], Code_1 = t1.Code
, Type_2 = t2.[Type], Code_2 = t2.Code
FROM cte t1
JOIN cte t2 ON t1.rn = t2.rn AND t2.[Type] = 2
AND t1.[Type] = 1
This query is will filter out any Type_1 records which do not have a Type_2 record. This means if there are an uneven number of Type_1 vs Type_2 records, the extra records will get eliminated.
Explanation:
Since there is no obvious way to join the two sets of data, because there is no shared key between them, we need to create one.
So we use this query:
SELECT t.[Type], t.Code
, rn = ROW_NUMBER() OVER (PARTITION BY t.[Type] ORDER BY t.Code)
FROM #Temp t
Which assigns a ROW_NUMBER to every row...It restarts the numbering for every Type value, and it orders the numbering by the Code.
So it will produce:
| Type | Code | rn |
|------|------|----|
| 1 | 1 | 1 |
| 1 | 2 | 2 |
| 1 | 3 | 3 |
| 2 | 4 | 1 |
| 2 | 5 | 2 |
| 2 | 6 | 3 |
Now you can see that we have assigned a key to each row of Type 1's and Type 2's which we can use for the joining process.
In order for us to re-use this output, we can stick it in a CTE and perform a self join (not an actual type of join, it just means we want to join a table to itself).
That's what this query is doing:
SELECT *
FROM cte t1
JOIN cte t2 ON t1.rn = t2.rn AND t2.[Type] = 2
AND t1.[Type] = 1
It's saying, "give me a list of all Type 1 records, and then join all Type 2 records to that using the new ROW_NUMBER we've generated".
Note: All of this works based on the assumption that you always want to join the Type 1's and Type 2's based on the order of their Code.

You can also do this using aggregation:
select max(case when type = 1 then type end) as type_1,
max(case when type = 1 then code end) as code_1,
max(case when type = 2 then type end) as type_2,
max(case when type = 2 then code end) as code_2
from (select type, code,
row_number() over (partition by type order by code) as seqnum
from your_table t
) t
group by seqnum;
It would be interesting to know which is faster -- a join approach or aggregation.
Here is a db<>fiddle.

Related

Rolling Average in SQL with Partition [duplicate]

declare #t table
(
id int,
SomeNumt int
)
insert into #t
select 1,10
union
select 2,12
union
select 3,3
union
select 4,15
union
select 5,23
select * from #t
the above select returns me the following.
id SomeNumt
1 10
2 12
3 3
4 15
5 23
How do I get the following:
id srome CumSrome
1 10 10
2 12 22
3 3 25
4 15 40
5 23 63
select t1.id, t1.SomeNumt, SUM(t2.SomeNumt) as sum
from #t t1
inner join #t t2 on t1.id >= t2.id
group by t1.id, t1.SomeNumt
order by t1.id
SQL Fiddle example
Output
| ID | SOMENUMT | SUM |
-----------------------
| 1 | 10 | 10 |
| 2 | 12 | 22 |
| 3 | 3 | 25 |
| 4 | 15 | 40 |
| 5 | 23 | 63 |
Edit: this is a generalized solution that will work across most db platforms. When there is a better solution available for your specific platform (e.g., gareth's), use it!
The latest version of SQL Server (2012) permits the following.
SELECT
RowID,
Col1,
SUM(Col1) OVER(ORDER BY RowId ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Col2
FROM tablehh
ORDER BY RowId
or
SELECT
GroupID,
RowID,
Col1,
SUM(Col1) OVER(PARTITION BY GroupID ORDER BY RowId ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Col2
FROM tablehh
ORDER BY RowId
This is even faster. Partitioned version completes in 34 seconds over 5 million rows for me.
Thanks to Peso, who commented on the SQL Team thread referred to in another answer.
For SQL Server 2012 onwards it could be easy:
SELECT id, SomeNumt, sum(SomeNumt) OVER (ORDER BY id) as CumSrome FROM #t
because ORDER BY clause for SUM by default means RANGE UNBOUNDED PRECEDING AND CURRENT ROW for window frame ("General Remarks" at https://msdn.microsoft.com/en-us/library/ms189461.aspx)
Let's first create a table with dummy data:
Create Table CUMULATIVESUM (id tinyint , SomeValue tinyint)
Now let's insert some data into the table;
Insert Into CUMULATIVESUM
Select 1, 10 union
Select 2, 2 union
Select 3, 6 union
Select 4, 10
Here I am joining same table (self joining)
Select c1.ID, c1.SomeValue, c2.SomeValue
From CumulativeSum c1, CumulativeSum c2
Where c1.id >= c2.ID
Order By c1.id Asc
Result:
ID SomeValue SomeValue
-------------------------
1 10 10
2 2 10
2 2 2
3 6 10
3 6 2
3 6 6
4 10 10
4 10 2
4 10 6
4 10 10
Here we go now just sum the Somevalue of t2 and we`ll get the answer:
Select c1.ID, c1.SomeValue, Sum(c2.SomeValue) CumulativeSumValue
From CumulativeSum c1, CumulativeSum c2
Where c1.id >= c2.ID
Group By c1.ID, c1.SomeValue
Order By c1.id Asc
For SQL Server 2012 and above (much better performance):
Select
c1.ID, c1.SomeValue,
Sum (SomeValue) Over (Order By c1.ID )
From CumulativeSum c1
Order By c1.id Asc
Desired result:
ID SomeValue CumlativeSumValue
---------------------------------
1 10 10
2 2 12
3 6 18
4 10 28
Drop Table CumulativeSum
A CTE version, just for fun:
;
WITH abcd
AS ( SELECT id
,SomeNumt
,SomeNumt AS MySum
FROM #t
WHERE id = 1
UNION ALL
SELECT t.id
,t.SomeNumt
,t.SomeNumt + a.MySum AS MySum
FROM #t AS t
JOIN abcd AS a ON a.id = t.id - 1
)
SELECT * FROM abcd
OPTION ( MAXRECURSION 1000 ) -- limit recursion here, or 0 for no limit.
Returns:
id SomeNumt MySum
----------- ----------- -----------
1 10 10
2 12 22
3 3 25
4 15 40
5 23 63
Late answer but showing one more possibility...
Cumulative Sum generation can be more optimized with the CROSS APPLY logic.
Works better than the INNER JOIN & OVER Clause when analyzed the actual query plan ...
/* Create table & populate data */
IF OBJECT_ID('tempdb..#TMP') IS NOT NULL
DROP TABLE #TMP
SELECT * INTO #TMP
FROM (
SELECT 1 AS id
UNION
SELECT 2 AS id
UNION
SELECT 3 AS id
UNION
SELECT 4 AS id
UNION
SELECT 5 AS id
) Tab
/* Using CROSS APPLY
Query cost relative to the batch 17%
*/
SELECT T1.id,
T2.CumSum
FROM #TMP T1
CROSS APPLY (
SELECT SUM(T2.id) AS CumSum
FROM #TMP T2
WHERE T1.id >= T2.id
) T2
/* Using INNER JOIN
Query cost relative to the batch 46%
*/
SELECT T1.id,
SUM(T2.id) CumSum
FROM #TMP T1
INNER JOIN #TMP T2
ON T1.id > = T2.id
GROUP BY T1.id
/* Using OVER clause
Query cost relative to the batch 37%
*/
SELECT T1.id,
SUM(T1.id) OVER( PARTITION BY id)
FROM #TMP T1
Output:-
id CumSum
------- -------
1 1
2 3
3 6
4 10
5 15
Select
*,
(Select Sum(SOMENUMT)
From #t S
Where S.id <= M.id)
From #t M
You can use this simple query for progressive calculation :
select
id
,SomeNumt
,sum(SomeNumt) over(order by id ROWS between UNBOUNDED PRECEDING and CURRENT ROW) as CumSrome
from #t
There is a much faster CTE implementation available in this excellent post:
http://weblogs.sqlteam.com/mladenp/archive/2009/07/28/SQL-Server-2005-Fast-Running-Totals.aspx
The problem in this thread can be expressed like this:
DECLARE #RT INT
SELECT #RT = 0
;
WITH abcd
AS ( SELECT TOP 100 percent
id
,SomeNumt
,MySum
order by id
)
update abcd
set #RT = MySum = #RT + SomeNumt
output inserted.*
For Ex: IF you have a table with two columns one is ID and second is number and wants to find out the cumulative sum.
SELECT ID,Number,SUM(Number)OVER(ORDER BY ID) FROM T
Once the table is created -
select
A.id, A.SomeNumt, SUM(B.SomeNumt) as sum
from #t A, #t B where A.id >= B.id
group by A.id, A.SomeNumt
order by A.id
The SQL solution wich combines "ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW" and "SUM" did exactly what i wanted to achieve.
Thank you so much!
If it can help anyone, here was my case. I wanted to cumulate +1 in a column whenever a maker is found as "Some Maker" (example). If not, no increment but show previous increment result.
So this piece of SQL:
SUM( CASE [rmaker] WHEN 'Some Maker' THEN 1 ELSE 0 END)
OVER
(PARTITION BY UserID ORDER BY UserID,[rrank] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Cumul_CNT
Allowed me to get something like this:
User 1 Rank1 MakerA 0
User 1 Rank2 MakerB 0
User 1 Rank3 Some Maker 1
User 1 Rank4 Some Maker 2
User 1 Rank5 MakerC 2
User 1 Rank6 Some Maker 3
User 2 Rank1 MakerA 0
User 2 Rank2 SomeMaker 1
Explanation of above: It starts the count of "some maker" with 0, Some Maker is found and we do +1. For User 1, MakerC is found so we dont do +1 but instead vertical count of Some Maker is stuck to 2 until next row.
Partitioning is by User so when we change user, cumulative count is back to zero.
I am at work, I dont want any merit on this answer, just say thank you and show my example in case someone is in the same situation. I was trying to combine SUM and PARTITION but the amazing syntax "ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW" completed the task.
Thanks!
Groaker
Above (Pre-SQL12) we see examples like this:-
SELECT
T1.id, SUM(T2.id) AS CumSum
FROM
#TMP T1
JOIN #TMP T2 ON T2.id < = T1.id
GROUP BY
T1.id
More efficient...
SELECT
T1.id, SUM(T2.id) + T1.id AS CumSum
FROM
#TMP T1
JOIN #TMP T2 ON T2.id < T1.id
GROUP BY
T1.id
Try this
select
t.id,
t.SomeNumt,
sum(t.SomeNumt) Over (Order by t.id asc Rows Between Unbounded Preceding and Current Row) as cum
from
#t t
group by
t.id,
t.SomeNumt
order by
t.id asc;
Try this:
CREATE TABLE #t(
[name] varchar NULL,
[val] [int] NULL,
[ID] [int] NULL
) ON [PRIMARY]
insert into #t (id,name,val) values
(1,'A',10), (2,'B',20), (3,'C',30)
select t1.id, t1.val, SUM(t2.val) as cumSum
from #t t1 inner join #t t2 on t1.id >= t2.id
group by t1.id, t1.val order by t1.id
Without using any type of JOIN cumulative salary for a person fetch by using follow query:
SELECT * , (
SELECT SUM( salary )
FROM `abc` AS table1
WHERE table1.ID <= `abc`.ID
AND table1.name = `abc`.Name
) AS cum
FROM `abc`
ORDER BY Name

What would be the best way to write a query to produce a table given the following data?

I have a table that contains the following data:
ADD_Col Data OrderId Output NEW_ADD Col1 Col2
----- ------ ------- -----> ------- -------- -------
AD*A*1 A 96 A 1 2
AD*A*1 B 95 B 1 1
AD*A*1 C 94 C 0.8 1
AD*A*1 D 93 D 5 2
AD*A*2 1 92
AD*A*2 1 91
AD*A*2 0.8 90
AD*A*2 5 89
AD*A*3 2 88
AD*A*3 1 87
AD*A*3 1 86
AD*A*3 2 85
This data is all in the same table and I need to link each letter to each factor. I was thinking of doing a ROW_NUMBER() and joining based on the respective row number and assign my letter the same number either that or DENSERANK. What would be the best way to achieve this? If you can please provide query examples that would be great thanks.
Seems like what you need to do is normalise your data here. Here I use PARSENAME to get the "column Number", and then ROW_NUMBER to number the relevant rows in the groups. Finally I use a Cross tab to Pivot to data:
WITH CTE AS(
SELECT V.[Key],
V.data,
V.[Order],
PARSENAME(REPLACE(V.[Key],'*','.'),1) AS ColNo,
ROW_NUMBER() OVER (PARTITION BY V.[Key] ORDER BY V.[Order] DESC) AS RN
FROM (VALUES('AD*A*1','A',96),
('AD*A*1','B',95),
('AD*A*1','C',94),
('AD*A*1','D',93),
('AD*A*2','1',92),
('AD*A*2','1',91),
('AD*A*2','0.8',90),
('AD*A*2','5',89),
('AD*A*3','2',88),
('AD*A*3','1',87),
('AD*A*3','1',86),
('AD*A*3','2',85))V([Key],[data],[Order]))
SELECT MAX(CASE C.ColNo WHEN '1' THEN C.[data] END) AS New_ADD,
MAX(CASE C.ColNo WHEN '2' THEN C.[data] END) AS Col1,
MAX(CASE C.ColNo WHEN '3' THEN C.[data] END) AS Col2
FROM CTE C
GROUP BY C.RN;
For your sample data this will work:
with cte as (
select *,
row_number() over (partition by [key] order by [OrderId desc]) rn,
dense_rank() over (order by [key]) rk
from tablename
)
select t1.data,
max(case when t2.rk = 2 then t2.data end) col1,
max(case when t2.rk = 3 then t2.data end) col2
from (select * from cte where rk = 1) t1
inner join (select * from cte where rk in (2, 3)) t2
on t2.rn = t1.rn
group by t1.data
See the demo.
Results:
> data | col1 | col2
> :--- | :--- | :---
> A | 1 | 2
> B | 1 | 1
> C | 0.8 | 1
> D | 5 | 2
select t1.Data "Key"
, t2.Data "Col1"
, t3.Data "Col2"
from ((SELECT Data,
row_number() over (order by Key_C) rn
from my_table
where Key_C = 'AD*A*1') t1
left join
(SELECT Data,
row_number() over (order by Key_C) rn
from my_table
where Key_C = 'AD*A*2') t2
on t1.rn = t2.rn
left join
(SELECT Data,
row_number() over (order by Key_C) rn
from my_table
where Key_C = 'AD*A*3') t3
on t2.rn = t3.rn);
Here is the DEMO
DROP TABLE IF EXISTS #RawData
SELECT
[ADD_Col]
,[Data]
,[OrderId]
,REPLACE([ADD_Col], 'AD*A*', '') AS [Level]
,DENSE_RANK() OVER (PARTITION BY [ADD_Col] ORDER BY [OrderId] DESC) AS [Grouping]
INTO
#RawData
FROM
[SourceTable]
SELECT
rd.[Data]
,rdc1.[Data] AS [Col1]
,rdc2.[Data] AS [Col2]
FROM
#RawData AS rd
LEFT OUTER JOIN #RawData AS rdc1
ON rdc1.[Level] = 2
AND rd.[Grouping] = rdc1.[Grouping]
LEFT OUTER JOIN #RawData AS rdc2
ON rdc2.[Level] = 3
AND rd.[Grouping] = rdc2.[Grouping]
WHERE
rd.[Level] = 1

Aligning offset data values sql join

Currently, I've got two rows of data pivoted using case statement into two columns. Data is not aligning. Within the case statement, is there a way to align? All values in Column A has a corresponding value in Column B.
+----------+----------+
| Column A | Column B |
+----------+----------+
| Null | 0 |
+----------+----------+
| 40 | Null |
+----------+----------+
| Null | 0 |
+----------+----------+
| 50 | Null |
+----------+----------+
Expected Output:
+----------+----------+
| Column A | Column B |
+----------+----------+
| 40 | 0 |
+----------+----------+
| 50 | 0 |
+----------+----------+
SELECT (CASE WHEN t.[column A] = 'Column A' THEN t.value END) AS [Column A],
(CASE WHEN t.[column B] = 'Column B' THEN t.value END) AS [Column B]
FROM t INNER JOIN t1 ON t.ID = t1.ID
WHERE t1.string = '123455'
cross apply should work for this requirement since there's no conditions.
select t.colA, max(t.ColB) from(
select t1.colA, t2.colB from testA t1
cross apply testA t2
where t1.colA is not null) t
group by t.colA
sql fiddle
SQL tables represent unordered sets. In a sense, there is no way to answer your question, although this would work:
select a, 0 as b
from t
where a is not null;
However, I suspect that you want to align values based on the ordering. For that, you need an ordering column. Something like this should work:
select max(a) as a, max(b) as b
from (select t.*, row_number() over (order by <ordering col>) as seqnum
from t
) t
group by floor( (seqnum - 1) / 2 ); -- floor() is not really needed
For this to work, though, you need a column that specifies the ordering.
Try this:
select t2.weight, t1.invalid
from mytable t1
join mytable t2 on t2.sequence = t1.sequence
where t1.weight is null
and t2.invalid is null
I have put sample data and tried the below. It is working fine. You can use Window functions & GROUP BY aggregate to arrive at the result.
CREATE TABLE #test1 (seq int, columna int, columnb int)
INSERT INTO #test1
values (0,99,null), (0,null,0)
select * from #test1
select seq, MAX(case when rnk_columna = 1 then columna else 0 end) as columna, MAX(case when rnk_columnb = 1 then columnb else 0 end) as columna
from
(select seq, columna, columnb, Row_number() over(partition by seq order by columna desc) as rnk_columna, Row_number() over(partition by seq order by columnb desc) as rnk_columnb
from #test1) as t
group by seq

Finding Missing Numbers series when Data Is Grouped in sql server

I need to write a query that will calculate the missing numbers with their count in a sequence when the data is "grouped". The data are in multiple groups & each group is in sequence.
For Ex. I have number series like 1001-1050, 1245-1270, 4571-4590 and all numbers like 1001,1002,1003,....1050 is stored in Table1 and from that Table1 some numbers are stored in another table Table2. E.g. 1001,1002,1003,1004,1005.
I want to get output like this:
Utilized Numbers | Balance Numbers |
----------- -------------------------
1001 - 1005 = 5 | 1006 - 1050 = 45 |
1245 - 1251 = 7 | 1252 - 1270 = 19 |
4571 - 4573 = 3 | 4574 - 4590 = 17 |
The number of each series is single field which is stored in both tables.
You haven't really explained your data, but guessing that "Utilized" are the numbers found in both Table1 and Table2, and "Balance" are the numbers only in Table1.
You can get the result at least this way, it's a little bit messy, mostly because of formatting the results:
Edit: This is a new version that does not use lag.
select
min (case when C2 = 1 then MINID end), max (case when C2 = 1 then MAXID end), max(case when C2=1 then ROWS end),
min (case when C2 = 0 then MINID end), max (case when C2 = 0 then MAXID end), max(case when C2=0 then ROWS end)
from (
select min(ID) as MINID, max(ID) as MAXID, count(*) as ROWS, C2, row_number() over (partition by C2 order by min(ID)) as GRP3 from (
select *, ID - RN as GRP1, ID - RN2 as GRP2 from (
select
T1.ID, row_number() over (order by T1.ID) as RN,
case when T2.ID is NULL then 0 else 1 end as C2,
row_number() over (partition by case when T2.ID is NULL then 0 else 1 end order by T1.ID) as RN2,
T2.ID as ID2
from #Table1 T1
left outer join #Table2 T2 on T1.ID = T2.ID
) X
) Y
group by GRP1, GRP2, C2
) Z
group by GRP3
order by 1
The idea here is to have a row number ordered by Table1.ID, and it's compared to the Table1.ID, and if the difference changes, then it's a new group. The same logic is used second time, but now partitioned differently for rows that exist in Table2 to handle changes between "Utilized" and "Balance".
From those groupings you can get the min and max value + number of rows. There's one additional grouping with min/max and case to format the result into 2 columns.
See the demo.

Any other alternative to write this SQL query

I need to select data base upon three conditions
Find the latest date (StorageDate Column) from the table for each record
See if there is more then one entry for date (StorageDate Column) found in first step for same ID (ID Column)
and then see if DuplicateID is = 2
So if table has following data:
ID |StorageDate | DuplicateTypeID
1 |2014-10-22 | 1
1 |2014-10-22 | 2
1 |2014-10-18 | 1
2 |2014-10-12 | 1
3 |2014-10-11 | 1
4 |2014-09-02 | 1
4 |2014-09-02 | 2
Then I should get following results
ID
1
4
I have written following query but it is really slow, I was wondering if anyone has better way to write it.
SELECT DISTINCT(TD.RecordID)
FROM dbo.MyTable TD
JOIN (
SELECT T1.RecordID, T2.MaxDate,COUNT(*) AS RecordCount
FROM MyTable T1 WITH (nolock)
JOIN (
SELECT RecordID, MAX(StorageDate) AS MaxDate
FROM MyTable WITH (nolock)
GROUP BY RecordID)T2
ON T1.RecordID = T2.RecordID AND T1.StorageDate = T2.MaxDate
GROUP BY T1.RecordID, T2.MaxDate
HAVING COUNT(*) > 1
)PT ON TD.RecordID = PT.RecordID AND TD.StorageDate = PT.MaxDate
WHERE TD.DuplicateTypeID = 2
Try this and see how the performance goes:
;WITH
tmp AS
(
SELECT *,
RANK() OVER (PARTITION BY ID ORDER BY StorageDate DESC) AS StorageDateRank,
COUNT(ID) OVER (PARTITION BY ID, StorageDate) AS StorageDateCount
FROM MyTable
)
SELECT DISTINCT ID
FROM tmp
WHERE StorageDateRank = 1 -- latest date for each ID
AND StorageDateCount > 1 -- more than 1 entry for date
AND DuplicateTypeID = 2 -- DuplicateTypeID = 2
You can use analytic function rank , can you try this query ?
Select recordId from
(
select *, rank() over ( partition by recordId order by [StorageDate] desc) as rn
from mytable
) T
where rn =1
group by recordId
having count(*) >1
and sum( case when duplicatetypeid =2 then 1 else 0 end) >=1