SQL Server - Remove duplicates with different ordering

I have a table containing pairs of items bought together and the # of times the pairing occurred.
item_1 item_2 count
123 234 5
345 567 22
567 345 22
890 345 6
Some of the pairings are duplicates that differ only in order (i.e., rows 2 and 3).
Is there an easy way to de-dupe this table?

If the "dups" can appear only once in either direction, then a convenient way is:
select t.*
from t
where t.item_1 <= t.item_2
union all
select t.*
from t
where t.item_1 > t.item_2 and
not exists (select 1
from t t2
where t2.item_1 = t.item_2 and t2.item_2 = t.item_1 and t2.count = t.count
);

You can use this script.
DECLARE @T TABLE (item_1 INT, item_2 INT , [count] INT)
INSERT INTO @T
VALUES
(123 ,234, 5),
(345 ,567, 22),
(567 ,345, 22),
(890 ,345, 6)
;WITH BASE AS
(
SELECT RN = ROW_NUMBER() OVER(ORDER BY item_1), * FROM @T
)
SELECT T1.item_1, T1.item_2, T1.[count] FROM BASE T1
OUTER APPLY (SELECT TOP 1 *
FROM BASE T2
WHERE T2.RN > T1.RN AND T1.item_1 = T2.item_2 AND T1.item_2 = T2.item_1) X
WHERE X.RN IS NULL
Result
item_1 item_2 count
----------- ----------- -----------
123 234 5
567 345 22
890 345 6

You can classify two rows as the same pair by partitioning on the smaller and the larger of the two items (emulating least and greatest with case expressions), and then select one row per pair.
select item_1,item_2,count
from (select t.*
,row_number() over(partition by case when item_1<item_2 then item_1 else item_2 end,
case when item_1>item_2 then item_1 else item_2 end
order by item_1) as rnum
from tbl t
) t
where rnum=1
Edit: Per Gordon's comment, if the duplicates have to be eliminated only when the count is the same, use
select item_1,item_2,count
from (select t.*
,row_number() over(partition by case when item_1<item_2 then item_1 else item_2 end,
case when item_1>item_2 then item_1 else item_2 end,
count
order by item_1) as rnum
from tbl t
) t
where rnum=1
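To see the least/greatest partitioning idea run end to end, here is a minimal sketch in Python, using the standard-library sqlite3 module as a stand-in for SQL Server. The table name pairs and the column name cnt are made up for illustration, SQLite's two-argument MIN/MAX replace the case expressions, and window functions are assumed to be available (SQLite 3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table mirroring the question's sample data.
conn.execute("CREATE TABLE pairs (item_1 INT, item_2 INT, cnt INT)")
conn.executemany(
    "INSERT INTO pairs VALUES (?, ?, ?)",
    [(123, 234, 5), (345, 567, 22), (567, 345, 22), (890, 345, 6)],
)

# Partition on (smaller item, larger item) so that (345, 567) and
# (567, 345) land in the same partition, then keep one row of each.
deduped = conn.execute("""
    SELECT item_1, item_2, cnt
    FROM (SELECT p.*,
                 ROW_NUMBER() OVER (
                     PARTITION BY MIN(item_1, item_2), MAX(item_1, item_2)
                     ORDER BY item_1) AS rnum
          FROM pairs p)
    WHERE rnum = 1
    ORDER BY item_1
""").fetchall()

print(deduped)
```

Ordering each partition by item_1 keeps the row whose smaller item comes first, so the surviving duplicate here is (345, 567, 22).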

Related

Rolling Average in SQL with Partition [duplicate]

declare @t table
(
id int,
SomeNumt int
)
insert into @t
select 1,10
union
select 2,12
union
select 3,3
union
select 4,15
union
select 5,23
select * from @t
The select above returns the following.
id SomeNumt
1 10
2 12
3 3
4 15
5 23
How do I get the following:
id SomeNumt CumSrome
1 10 10
2 12 22
3 3 25
4 15 40
5 23 63
select t1.id, t1.SomeNumt, SUM(t2.SomeNumt) as sum
from @t t1
inner join @t t2 on t1.id >= t2.id
group by t1.id, t1.SomeNumt
order by t1.id
SQL Fiddle example
Output
| ID | SOMENUMT | SUM |
-----------------------
| 1 | 10 | 10 |
| 2 | 12 | 22 |
| 3 | 3 | 25 |
| 4 | 15 | 40 |
| 5 | 23 | 63 |
Edit: this is a generalized solution that will work across most db platforms. When there is a better solution available for your specific platform (e.g., gareth's), use it!
The latest version of SQL Server (2012) permits the following.
SELECT
RowID,
Col1,
SUM(Col1) OVER(ORDER BY RowId ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Col2
FROM tablehh
ORDER BY RowId
or
SELECT
GroupID,
RowID,
Col1,
SUM(Col1) OVER(PARTITION BY GroupID ORDER BY RowId ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Col2
FROM tablehh
ORDER BY RowId
This is even faster. Partitioned version completes in 34 seconds over 5 million rows for me.
Thanks to Peso, who commented on the SQL Team thread referred to in another answer.
For SQL Server 2012 onwards it could be easy:
SELECT id, SomeNumt, sum(SomeNumt) OVER (ORDER BY id) as CumSrome FROM @t
because when ORDER BY is given without an explicit frame, the default window frame for SUM is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (see "General Remarks" at https://msdn.microsoft.com/en-us/library/ms189461.aspx)
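The default RANGE frame and an explicit ROWS frame give the same running total only when the ORDER BY values are unique. The sketch below illustrates the difference with a tie; it uses Python's sqlite3 as a stand-in (SQLite applies the same default frame), and the table t with a duplicated id is invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INT, val INT)")
# id 2 appears twice, so those two rows are peers under ORDER BY id.
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, 10), (2, 3), (2, 12), (3, 15)])

# Default frame (RANGE): each peer row gets the total through the last peer.
range_sums = conn.execute("""
    SELECT id, val, SUM(val) OVER (ORDER BY id) AS running
    FROM t
    ORDER BY id, val
""").fetchall()

# Explicit ROWS frame with a tie-breaker: the total grows row by row.
rows_sums = conn.execute("""
    SELECT id, val,
           SUM(val) OVER (ORDER BY id, val
                          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running
    FROM t
    ORDER BY id, val
""").fetchall()

print(range_sums)  # both id = 2 rows show 25
print(rows_sums)   # the id = 2 rows show 13, then 25
```

With unique ids, as in the question, the two forms return identical results.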
Let's first create a table with dummy data:
Create Table CUMULATIVESUM (id tinyint , SomeValue tinyint)
Now let's insert some data into the table;
Insert Into CUMULATIVESUM
Select 1, 10 union
Select 2, 2 union
Select 3, 6 union
Select 4, 10
Here I am joining same table (self joining)
Select c1.ID, c1.SomeValue, c2.SomeValue
From CumulativeSum c1, CumulativeSum c2
Where c1.id >= c2.ID
Order By c1.id Asc
Result:
ID SomeValue SomeValue
-------------------------
1 10 10
2 2 10
2 2 2
3 6 10
3 6 2
3 6 6
4 10 10
4 10 2
4 10 6
4 10 10
Now we just sum the SomeValue of c2 and we'll get the answer:
Select c1.ID, c1.SomeValue, Sum(c2.SomeValue) CumulativeSumValue
From CumulativeSum c1, CumulativeSum c2
Where c1.id >= c2.ID
Group By c1.ID, c1.SomeValue
Order By c1.id Asc
For SQL Server 2012 and above (much better performance):
Select
c1.ID, c1.SomeValue,
Sum (SomeValue) Over (Order By c1.ID )
From CumulativeSum c1
Order By c1.id Asc
Desired result:
ID SomeValue CumulativeSumValue
---------------------------------
1 10 10
2 2 12
3 6 18
4 10 28
Drop Table CumulativeSum
A CTE version, just for fun:
;
WITH abcd
AS ( SELECT id
,SomeNumt
,SomeNumt AS MySum
FROM @t
WHERE id = 1
UNION ALL
SELECT t.id
,t.SomeNumt
,t.SomeNumt + a.MySum AS MySum
FROM @t AS t
JOIN abcd AS a ON a.id = t.id - 1
)
SELECT * FROM abcd
OPTION ( MAXRECURSION 1000 ) -- limit recursion here, or 0 for no limit.
Returns:
id SomeNumt MySum
----------- ----------- -----------
1 10 10
2 12 22
3 3 25
4 15 40
5 23 63
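The same recursive idea can be tried outside SQL Server. This is a minimal sketch using Python's sqlite3 module, where the CTE must be declared WITH RECURSIVE and there is no MAXRECURSION option. As in the answer above, it relies on ids being consecutive and starting at 1; the plain table t stands in for the table variable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INT, SomeNumt INT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, 10), (2, 12), (3, 3), (4, 15), (5, 23)])

# Anchor member: the first row. Recursive member: the row whose id is
# exactly one more than the previous row, carrying the sum forward.
running = conn.execute("""
    WITH RECURSIVE abcd(id, SomeNumt, MySum) AS (
        SELECT id, SomeNumt, SomeNumt FROM t WHERE id = 1
        UNION ALL
        SELECT t.id, t.SomeNumt, t.SomeNumt + a.MySum
        FROM t JOIN abcd AS a ON a.id = t.id - 1
    )
    SELECT id, SomeNumt, MySum FROM abcd ORDER BY id
""").fetchall()

print(running)
```

A gap in the id sequence stops the recursion at the gap, so rows after it would be missing from the result.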
Late answer but showing one more possibility...
Cumulative sum generation can be optimized further with CROSS APPLY.
Judging by the actual query plans for the batch below, it costs less than the INNER JOIN and OVER clause versions.
/* Create table & populate data */
IF OBJECT_ID('tempdb..#TMP') IS NOT NULL
DROP TABLE #TMP
SELECT * INTO #TMP
FROM (
SELECT 1 AS id
UNION
SELECT 2 AS id
UNION
SELECT 3 AS id
UNION
SELECT 4 AS id
UNION
SELECT 5 AS id
) Tab
/* Using CROSS APPLY
Query cost relative to the batch 17%
*/
SELECT T1.id,
T2.CumSum
FROM #TMP T1
CROSS APPLY (
SELECT SUM(T2.id) AS CumSum
FROM #TMP T2
WHERE T1.id >= T2.id
) T2
/* Using INNER JOIN
Query cost relative to the batch 46%
*/
SELECT T1.id,
SUM(T2.id) CumSum
FROM #TMP T1
INNER JOIN #TMP T2
ON T1.id >= T2.id
GROUP BY T1.id
/* Using OVER clause
Query cost relative to the batch 37%
*/
SELECT T1.id,
SUM(T1.id) OVER (ORDER BY T1.id) AS CumSum -- ORDER BY, not PARTITION BY id, is what produces a running total
FROM #TMP T1
Output:-
id CumSum
------- -------
1 1
2 3
3 6
4 10
5 15
Select
*,
(Select Sum(SOMENUMT)
From @t S
Where S.id <= M.id) As CumSum
From @t M
You can use this simple query for progressive calculation :
select
id
,SomeNumt
,sum(SomeNumt) over(order by id ROWS between UNBOUNDED PRECEDING and CURRENT ROW) as CumSrome
from @t
There is a much faster CTE implementation available in this excellent post:
http://weblogs.sqlteam.com/mladenp/archive/2009/07/28/SQL-Server-2005-Fast-Running-Totals.aspx
The problem in this thread can be expressed like this:
DECLARE @RT INT
SELECT @RT = 0
;
WITH abcd
AS ( SELECT TOP 100 percent
id
,SomeNumt
,MySum
-- the FROM clause is missing here as posted; the source table needs a MySum column
order by id
)
update abcd
set @RT = MySum = @RT + SomeNumt
output inserted.*
For example, if you have a table with two columns, ID and Number, and want to find the cumulative sum:
SELECT ID,Number,SUM(Number)OVER(ORDER BY ID) FROM T
Once the table is created -
select
A.id, A.SomeNumt, SUM(B.SomeNumt) as sum
from @t A, @t B where A.id >= B.id
group by A.id, A.SomeNumt
order by A.id
The SQL solution which combines "ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW" and "SUM" did exactly what I wanted to achieve.
Thank you so much!
If it can help anyone, here was my case. I wanted to cumulate +1 in a column whenever a maker is found as "Some Maker" (example). If not, no increment but show previous increment result.
So this piece of SQL:
SUM( CASE [rmaker] WHEN 'Some Maker' THEN 1 ELSE 0 END)
OVER
(PARTITION BY UserID ORDER BY UserID,[rrank] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Cumul_CNT
Allowed me to get something like this:
User 1 Rank1 MakerA 0
User 1 Rank2 MakerB 0
User 1 Rank3 Some Maker 1
User 1 Rank4 Some Maker 2
User 1 Rank5 MakerC 2
User 1 Rank6 Some Maker 3
User 2 Rank1 MakerA 0
User 2 Rank2 Some Maker 1
Explanation of the above: the count of "Some Maker" starts at 0, and each time Some Maker is found we add 1. For User 1, MakerC is found at Rank5, so we don't add 1; the running count stays at 2 until the next Some Maker row.
Partitioning is by User so when we change user, cumulative count is back to zero.
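The conditional running count described above can be reproduced with a small sketch in Python's sqlite3 (standing in for SQL Server, window functions assumed available). The table name makers and the column user_id are made up; the rows mirror the sample output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE makers (user_id INT, rrank INT, rmaker TEXT)")
conn.executemany("INSERT INTO makers VALUES (?, ?, ?)", [
    (1, 1, "MakerA"), (1, 2, "MakerB"), (1, 3, "Some Maker"),
    (1, 4, "Some Maker"), (1, 5, "MakerC"), (1, 6, "Some Maker"),
    (2, 1, "MakerA"), (2, 2, "Some Maker"),
])

# The CASE turns each row into 1 or 0; the windowed SUM accumulates those
# flags per user in rank order, restarting at 0 for each new user.
counts = conn.execute("""
    SELECT user_id, rrank,
           SUM(CASE rmaker WHEN 'Some Maker' THEN 1 ELSE 0 END)
               OVER (PARTITION BY user_id ORDER BY rrank
                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumul_cnt
    FROM makers
    ORDER BY user_id, rrank
""").fetchall()

print(counts)
```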
I am at work and I don't want any credit for this answer; I just want to say thank you and show my example in case someone is in the same situation. I was trying to combine SUM and PARTITION, but the amazing syntax "ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW" completed the task.
Thanks!
Groaker
Above (Pre-SQL12) we see examples like this:-
SELECT
T1.id, SUM(T2.id) AS CumSum
FROM
#TMP T1
JOIN #TMP T2 ON T2.id <= T1.id
GROUP BY
T1.id
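More efficient (the LEFT JOIN with COALESCE keeps the first row, which has no smaller id to join to)...
SELECT
T1.id, COALESCE(SUM(T2.id), 0) + T1.id AS CumSum
FROM
#TMP T1
LEFT JOIN #TMP T2 ON T2.id < T1.id
GROUP BY
T1.id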
Try this
select
t.id,
t.SomeNumt,
sum(t.SomeNumt) Over (Order by t.id asc Rows Between Unbounded Preceding and Current Row) as cum
from
@t t
group by
t.id,
t.SomeNumt
order by
t.id asc;
Try this:
CREATE TABLE #t(
[name] varchar NULL,
[val] [int] NULL,
[ID] [int] NULL
) ON [PRIMARY]
insert into #t (id,name,val) values
(1,'A',10), (2,'B',20), (3,'C',30)
select t1.id, t1.val, SUM(t2.val) as cumSum
from #t t1 inner join #t t2 on t1.id >= t2.id
group by t1.id, t1.val order by t1.id
Without using any type of JOIN, the cumulative salary per person can be fetched with the following query:
SELECT * , (
SELECT SUM( salary )
FROM `abc` AS table1
WHERE table1.ID <= `abc`.ID
AND table1.name = `abc`.Name
) AS cum
FROM `abc`
ORDER BY Name

Filter rows and select in to another columns in SQL?

I have a table like below.
If(OBJECT_ID('tempdb..#temp') Is Not Null)
Begin
Drop Table #Temp
End
create table #Temp
(
Type int,
Code Varchar(50),
)
Insert Into #Temp
SELECT 1,'1'
UNION
SELECT 1,'2'
UNION
SELECT 1,'3'
UNION
SELECT 2,'4'
UNION
SELECT 2,'5'
UNION
SELECT 2,'6'
select * from #Temp
And would like to get the below result.
Type_1 Code_1 Type_2 Code_2
1 1 2 4
1 2 2 5
1 3 2 6
I have tried with union and inner join, but not getting desired result. Please help.
You can use full outer join and cte as follows:
With cte as
(Select type, code,
Row_number() over (partition by type order by code) as rn
From your_table t)
Select t1.type, t1.code, t2.type, t2.code
From (Select * From cte Where type = 1) t1
full join (Select * From cte Where type = 2) t2
On t1.rn = t2.rn
Here is a query which will produce the output you expect:
WITH cte AS (
SELECT t.[Type], t.Code
, rn = ROW_NUMBER() OVER (PARTITION BY t.[Type] ORDER BY t.Code)
FROM #Temp t
)
SELECT Type_1 = t1.[Type], Code_1 = t1.Code
, Type_2 = t2.[Type], Code_2 = t2.Code
FROM cte t1
JOIN cte t2 ON t1.rn = t2.rn AND t2.[Type] = 2
AND t1.[Type] = 1
This query will filter out any Type_1 records which do not have a matching Type_2 record. This means that if the numbers of Type_1 and Type_2 records are unequal, the extra records will be eliminated.
Explanation:
Since there is no obvious way to join the two sets of data, because there is no shared key between them, we need to create one.
So we use this query:
SELECT t.[Type], t.Code
, rn = ROW_NUMBER() OVER (PARTITION BY t.[Type] ORDER BY t.Code)
FROM #Temp t
Which assigns a ROW_NUMBER to every row...It restarts the numbering for every Type value, and it orders the numbering by the Code.
So it will produce:
| Type | Code | rn |
|------|------|----|
| 1 | 1 | 1 |
| 1 | 2 | 2 |
| 1 | 3 | 3 |
| 2 | 4 | 1 |
| 2 | 5 | 2 |
| 2 | 6 | 3 |
Now you can see that we have assigned a key to each row of Type 1's and Type 2's which we can use for the joining process.
In order for us to re-use this output, we can stick it in a CTE and perform a self join (not an actual type of join, it just means we want to join a table to itself).
That's what this query is doing:
SELECT *
FROM cte t1
JOIN cte t2 ON t1.rn = t2.rn AND t2.[Type] = 2
AND t1.[Type] = 1
It's saying, "give me a list of all Type 1 records, and then join all Type 2 records to that using the new ROW_NUMBER we've generated".
Note: All of this works based on the assumption that you always want to join the Type 1's and Type 2's based on the order of their Code.
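As a runnable illustration of the generated-key join explained above, here is a minimal sketch in Python's sqlite3 (a stand-in for SQL Server; window functions assumed available). The table name temp_rows is hypothetical and holds the question's sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temp_rows (type INT, code TEXT)")
conn.executemany("INSERT INTO temp_rows VALUES (?, ?)",
                 [(1, "1"), (1, "2"), (1, "3"), (2, "4"), (2, "5"), (2, "6")])

# rn restarts at 1 for every type, which gives the two groups a shared
# key: the n-th type 1 row pairs with the n-th type 2 row.
paired = conn.execute("""
    WITH cte AS (
        SELECT type, code,
               ROW_NUMBER() OVER (PARTITION BY type ORDER BY code) AS rn
        FROM temp_rows
    )
    SELECT t1.type, t1.code, t2.type, t2.code
    FROM cte t1
    JOIN cte t2 ON t1.rn = t2.rn AND t1.type = 1 AND t2.type = 2
    ORDER BY t1.rn
""").fetchall()

print(paired)
```

Because this is an inner join, a type 1 row without a counterpart at the same rn would simply drop out of the result.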
You can also do this using aggregation:
select max(case when type = 1 then type end) as type_1,
max(case when type = 1 then code end) as code_1,
max(case when type = 2 then type end) as type_2,
max(case when type = 2 then code end) as code_2
from (select type, code,
row_number() over (partition by type order by code) as seqnum
from your_table t
) t
group by seqnum;
It would be interesting to know which is faster -- a join approach or aggregation.
Here is a db<>fiddle.

Consolidate, Combine, Merge Rows

Every search I do leads me to results for people seeking array_agg to combine multiple columns in a row into column. That's not what I am trying to figure out here, and maybe I am not using the right search terms (e.g., consolidate, combine, merge).
I am trying to combine rows by populating values in fields ... I am not sure the best way to describe this other than with an example:
Current:
--------------------------------
id num_1 num_2 num_3 num_4
--------------------------------
1 111 222 0 0
2 111 333 0 0
3 111 0 0 444
4 0 222 555 0
5 777 999 0 0
6 0 999 888 0
After Processing:
--------------------------------
id num_1 num_2 num_3 num_4
--------------------------------
1 111 222 555 444
2 111 333 555 444
3 111 333 555 444
4 111 222 555 444
5 777 999 888 0
6 777 999 888 0
After Deleting Duplicate Rows:
--------------------------------
id num_1 num_2 num_3 num_4
--------------------------------
1 111 222 555 444
2 111 333 555 444
3 777 999 888 0
This will likely be a 2 step process ... first fill in the blanks, and then find/delete the duplicate. I can do the second step, but having trouble figuring how to first populate the 0 values with values from another row where you might have two different values (id 1/2 for num_2 column) but only one value for num_1 (e.g., 111)
I can do it in PHP, but would like to figure out how to do it using only Postgres.
EDIT: My example table is a relations table. I have multiple datasets with similar information (e.g., username) but different registration ID numbers. So, I do an inner join on table 1 and table 2 (for example) where the username is the same. Then I take the registration IDs (which are different) from each table and insert that as a row into my relations table. In my example tables above, Row 1 has two different registration IDs from the two tables I joined … the values 111 (num_1) and 222 (num_2) are inserted into the table and zeros inserted for num_3 and num_4. Then I compare table 1 and table 4 and the values 111 (num_1) and 444 (num_4) get inserted into the relations table and zeros for num_2 and num_3. Since registration ID 111 is related to registration ID 222 and registration ID 111 is related to registration ID 444, then registration IDs 111, 222, and 444 are all related (meaning the username is the same for each of those registration IDs). Does that help to clarify?
EDIT 2: I corrected Tables 2 and 3. Hopefully now it makes sense. The username column is not unique. So, I have 4 tables like this:
Table 1:
bob - 111
mary - 777
Table 2:
bob - 222
bob - 333
mary - 999
Table 3:
bob - 555
mary - 888
Table 4:
bob - 444 -- mary does not exist in this table
So, in my relations table I should end up with 3 rows as given in example Table 3 above.
It seems like you started in the middle of a presumed solution, forgetting to present the initial problem. Based on your added information I suggest a completely different, much simpler solution. You have:
CREATE TABLE table1 (username text, registration_id int);
CREATE TABLE table2 (LIKE table1);
CREATE TABLE table3 (LIKE table1);
CREATE TABLE table4 (LIKE table1);
INSERT INTO table1 VALUES ('bob', 111), ('mary', 777);
INSERT INTO table2 VALUES ('bob', 222), ('bob', 333), ('mary', 999);
INSERT INTO table3 VALUES ('bob', 555), ('mary', 888);
INSERT INTO table4 VALUES ('bob', 444); -- no mary
Solution
What you really seem to need is FULL [OUTER] JOIN. Details in the manual on FROM and JOIN.
-- CREATE TABLE relations AS
SELECT username
, t1.registration_id AS reg1
, t2.registration_id AS reg2
, t3.registration_id AS reg3
, t4.registration_id AS reg4
FROM table1 t1
FULL JOIN table2 t2 USING (username)
FULL JOIN table3 t3 USING (username)
FULL JOIN table4 t4 USING (username)
ORDER BY username;
That's all. Produces your desired result directly.
username reg1 reg2 reg3 reg4
---------------------------------
bob 111 222 555 444
bob 111 333 555 444
mary 777 999 888 (null)
Your given example would work with LEFT JOIN as well, since all missing entries are to the right. But that would fail in other constellations. I added some more revealing test cases in the fiddle:
SQL Fiddle.
I assume you are aware that multiple entries in multiple tables will produce a huge number of output rows:
Two SQL LEFT JOINS produce incorrect result
If your values are always increasing (as in the example), then just use cumulative maximum and then select distinct:
select row_number() over (order by min(id)) as id,
t.num1, t.num2, t.num3, t.num4
from (select id,
max(num1) over (order by id) as num1,
max(num2) over (order by id) as num2,
max(num3) over (order by id) as num3,
max(num4) over (order by id) as num4
from t
) t
group by t.num1, t.num2, t.num3, t.num4;
If max() doesn't work, then what you really want is lag( . . . ignore nulls). That is not yet available. Perhaps the simplest method is then correlated subqueries for each column:
select row_number() over (order by min(id)) as id,
t.num1, t.num2, t.num3, t.num4
from (select id,
(select t2.num1 from t t2 where t2.id <= t.id and t2.num1 <> 0 order by t2.id desc limit 1
) as num1,
(select t2.num2 from t t2 where t2.id <= t.id and t2.num2 <> 0 order by t2.id desc limit 1
) as num2,
(select t2.num3 from t t2 where t2.id <= t.id and t2.num3 <> 0 order by t2.id desc limit 1
) as num3,
(select t2.num4 from t t2 where t2.id <= t.id and t2.num4 <> 0 order by t2.id desc limit 1
) as num4
from t
) t
group by t.num1, t.num2, t.num3, t.num4;
This version would not be very efficient on even medium sized tables.
A more efficient version is more complicated:
select row_number() over (order by t.id) as id,
t1.num1, t2.num2, t3.num3, t4.num4
from (select min(id) as id, num1_id, num2_id, num3_id, num4_id
from (select id,
max(case when num1 > 0 then id end) over (order by id) as num1_id,
max(case when num2 > 0 then id end) over (order by id) as num2_id,
max(case when num3 > 0 then id end) over (order by id) as num3_id,
max(case when num4 > 0 then id end) over (order by id) as num4_id
from t
) t
group by num1_id, num2_id, num3_id, num4_id
) t left join
t t1
on t1.id = t.num1_id left join
t t2
on t2.id = t.num2_id left join
t t3
on t3.id = t.num3_id left join
t t4
on t4.id = t.num4_id;
EDIT:
That was a little silly. There is an easier way using first_value() (which Postgres unfortunately does not support as an aggregation function):
select row_number() over (order by min(id)) as id,
num1, num2, num3, num4
from (select id,
first_value(num1) over (order by (case when num1 is not null then id end) nulls last
) as num1,
first_value(num2) over (order by (case when num2 is not null then id end) nulls last
) as num2,
first_value(num3) over (order by (case when num3 is not null then id end) nulls last
) as num3,
first_value(num4) over (order by (case when num4 is not null then id end) nulls last
) as num4
from t
) t
group by num1, num2, num3, num4;

SQL Server 2008 Group Based on a Sequence

I'm struggling to work out whether it is possible in SQL Server 2008 to assign a sequence without having to use cursors. Let's say I have the following table, which defines a driver's route going from one location to another (null means he is going from home):
RouteID SourceLocationID DestinationLocationID DriverID Created Updated
------- ---------------- --------------------- -------- ------- -------
1 NULL 219 1 10:20 10:23
2 219 266 1 10:21 10:24
3 266 NULL 1 10:22 10:25
4 NULL 54 2 10:23 10:26
5 54 NULL 2 10:24 10:27
6 NULL 300 1 10:25 10:28
7 300 NULL 1 10:26 10:29
I want to group the records between the rows where SourceLocationID is NULL and where DestinationLocationID is NULL, so I get the following (generating a sequence number for each grouping set):
DriverID DestinationLocationID TripNumber
-------- --------------------- ----------
1 219 1 (his first trip)
1 266 1
1 300 2 (his second trip)
2 54 1
Is there a way I could use GROUP BY here rather than cursors?
a quick try:
with cte as
( select DestinationLocationID
, DriverID
, tripid = row_number()
over ( partition by driverid
order by DestinationLocationID)
from table1
where sourcelocationid is NULL
UNION ALL
select table1.DestinationLocationID
, table1.DriverID
, cte.tripid
from table1
join cte on table1.SourceLocationID=cte.DestinationLocationID
and table1.DriverID=cte.DriverID
where cte.DestinationLocationID is not null
)
select * from cte
Try this:
select driverid, destinationlocationid, count(destinationlocationid) from
(
select driverid, destinationlocationid from table1 where sourcelocationid is NULL
union all
select driverid, sourcelocationid from table1 where destinationlocationid is NULL
)A group by driverid, destinationlocationid
Try this,
Declare @t table(RouteID int, SourceLocationID int,DestinationLocationID int
,DriverID int,Created time, Updated time)
insert into @t
values(1, NULL, 219, 1, '10:20','10:23'),
(2 ,219,266, 1, '10:21','10:24'),
(3,266, NULL, 1, '10:22','10:25'),
(4, NULL, 54, 2, '10:23','10:26'),
(5,54, NULL, 2, '10:24','10:27'),
(6,NULL,300, 1, '10:25','10:28'),
(7,300,NULL, 1, '10:26','10:29')
;
WITH CTE
AS (
SELECT *
,ROW_NUMBER() OVER (
PARTITION BY DriverID ORDER BY Created
) RN
FROM @t
)
,CTE1
AS (
SELECT *
,1 TripNumber
FROM CTE
WHERE RN = 1
UNION ALL
SELECT A.*
,CASE
WHEN A.SourceLocationID IS NULL
THEN B.TripNumber + 1
ELSE B.TripNumber
END
FROM CTE1 B
INNER JOIN CTE A ON B.DriverID = A.DriverID
WHERE A.RN > B.RN
)
SELECT DISTINCT DestinationLocationID
,DriverID
,TripNumber
FROM CTE1
WHERE DestinationLocationID IS NOT NULL
ORDER BY DriverID
Use a correlated sub-query to count previous trips, plus 1 to get this trip number.
select DriverID,
DestinationLocationID,
(select count(*) + 1
from routes t2
where t1.DriverID = t2.DriverID
and t1.RouteID > t2.RouteID
and DestinationLocationID IS NULL) as TripNumber
from routes t1
where DestinationLocationID IS NOT NULL
order by DriverID, DestinationLocationID;
Executes like this:
SQL>select DriverID,
SQL& DestinationLocationID,
SQL& (select count(*) + 1
SQL& from routes t2
SQL& where t1.DriverID = t2.DriverID
SQL& and t1.RouteID > t2.RouteID
SQL& and DestinationLocationID IS NULL) as TripNumber
SQL&from routes t1
SQL&where DestinationLocationID IS NOT NULL
SQL&order by DriverID, DestinationLocationID;
DriverID DestinationLocationID TripNumber
=========== ===================== ============
1 219 1
1 266 1
1 300 2
2 54 1
4 rows found
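The correlated sub-query approach needs no window functions, so it is easy to check elsewhere. Here is a minimal sketch in Python's sqlite3, loading the question's seven routes (the Created and Updated columns are left out because the query does not use them).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE routes (
    RouteID INT, SourceLocationID INT,
    DestinationLocationID INT, DriverID INT)""")
conn.executemany("INSERT INTO routes VALUES (?, ?, ?, ?)", [
    (1, None, 219, 1), (2, 219, 266, 1), (3, 266, None, 1),
    (4, None, 54, 2), (5, 54, None, 2),
    (6, None, 300, 1), (7, 300, None, 1),
])

# A row with a NULL destination closes a trip, so the trip number of a
# row is 1 plus the number of earlier closing rows for the same driver.
trips = conn.execute("""
    SELECT t1.DriverID, t1.DestinationLocationID,
           (SELECT COUNT(*) + 1
            FROM routes t2
            WHERE t2.DriverID = t1.DriverID
              AND t2.RouteID < t1.RouteID
              AND t2.DestinationLocationID IS NULL) AS TripNumber
    FROM routes t1
    WHERE t1.DestinationLocationID IS NOT NULL
    ORDER BY t1.DriverID, t1.DestinationLocationID
""").fetchall()

print(trips)
```

This sketch assumes, as the answer does, that RouteID increases in travel order for each driver.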

SELECT records until new value SQL

I have a table
Val | Number
08 | 1
09 | 1
10 | 1
11 | 3
12 | 0
13 | 1
14 | 1
15 | 1
I need to return the last values where Number = 1 (however many that may be) until Number changes, but do not need the first instances where Number = 1. Essentially I need to select back until Number changes to 0 (15, 14, 13)
Is there a proper way to do this in MSSQL?
Based on following:
I need to return the last values where Number = 1
Essentially I need to select back until Number changes to 0 (15, 14, 13)
Try (Fiddle demo ):
select val, number
from T
where val > (select max(val)
from T
where number<>1)
EDIT: to address all possible combinations (Fiddle demo 2)
;with cte1 as
(
select 1 id, max(val) maxOne
from T
where number=1
),
cte2 as
(
select 1 id, isnull(max(val),0) maxOther
from T
where val < (select maxOne from cte1) and number<>1
)
select val, number
from T cross join
(select maxOne, maxOther
from cte1 join cte2 on cte1.id = cte2.id
) X
where val>maxOther and val<=maxOne
I think you can use window functions, something like this:
with cte as (
-- generate two row_number to enumerate distinct groups
select
Val, Number,
row_number() over(partition by Number order by Val) as rn1,
row_number() over(order by Val) as rn2
from Table1
), cte2 as (
-- get groups with Number = 1 and last group
select
Val, Number,
rn2 - rn1 as rn1, max(rn2 - rn1) over() as rn2
from cte
where Number = 1
)
select Val, Number
from cte2
where rn1 = rn2
sql fiddle demo
DEMO: http://sqlfiddle.com/#!3/e7d54/23
DDL
create table T(val int identity(8,1), number int)
insert into T values
(1),(1),(1),(3),(0),(1),(1),(1),(0),(2)
DML
; WITH last_1 AS (
SELECT Max(val) As val
FROM t
WHERE number = 1
)
, last_non_1 AS (
SELECT Coalesce(Max(val), -937) As val
FROM t
WHERE EXISTS (
SELECT val
FROM last_1
WHERE last_1.val > t.val
)
AND number <> 1
)
SELECT t.val
, t.number
FROM t
CROSS
JOIN last_1
CROSS
JOIN last_non_1
WHERE t.val <= last_1.val
AND t.val > last_non_1.val
I know it's a little verbose, but I've deliberately kept it that way to illustrate the methodology.
1. Find the highest val where number=1.
2. For all rows where val is less than the value found in step 1, find the largest val where number<>1.
3. Finally, find the rows that fall between the values we uncovered in steps 1 and 2.
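The three steps can be checked with a short sketch in Python's sqlite3 (standing in for SQL Server). It loads the same ten values as the DDL above, with val starting at 8. The fallback of MIN(val) - 1 when no non-1 row exists is a choice made for this sketch, in place of the fixed sentinel used in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (val INT, number INT)")
numbers = [1, 1, 1, 3, 0, 1, 1, 1, 0, 2]
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(8 + i, n) for i, n in enumerate(numbers)])

last_run = conn.execute("""
    WITH last_1 AS (              -- step 1: highest val with number = 1
        SELECT MAX(val) AS val FROM t WHERE number = 1
    ),
    last_non_1 AS (               -- step 2: highest non-1 val below step 1
        SELECT COALESCE(MAX(t.val), (SELECT MIN(val) - 1 FROM t)) AS val
        FROM t, last_1
        WHERE t.number <> 1 AND t.val < last_1.val
    )
    SELECT t.val, t.number        -- step 3: rows between the two bounds
    FROM t, last_1, last_non_1
    WHERE t.val <= last_1.val AND t.val > last_non_1.val
    ORDER BY t.val
""").fetchall()

print(last_run)
```

The trailing rows with number 0 and 2 are ignored because they lie above the last val where number = 1.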
select val, count (number) from
yourtable
group by val
having count(number) > 1
The having clause is the key here, giving you all the vals that have more than one value of 1.
This is a common approach for getting rows until some value changes. For your specific case use desc in proper spots.
Create sample table
select * into #tmp from
(select 1 as id, 'Alpha' as value union all
select 2 as id, 'Alpha' as value union all
select 3 as id, 'Alpha' as value union all
select 4 as id, 'Beta' as value union all
select 5 as id, 'Alpha' as value union all
select 6 as id, 'Gamma' as value union all
select 7 as id, 'Alpha' as value) t
Pull top rows until value changes:
with cte as (select * from #tmp t)
select * from
(select cte.*, ROW_NUMBER() over (order by id) rn from cte) OriginTable
inner join
(
select cte.*, ROW_NUMBER() over (order by id) rn from cte
where cte.value = (select top 1 cte.value from cte order by cte.id)
) OnlyFirstValueRecords
on OriginTable.rn = OnlyFirstValueRecords.rn and OriginTable.id = OnlyFirstValueRecords.id
On the left side we put an original table. On the right side we put only rows whose value is equal to the value in first line.
Records in both tables will be the same until the target value changes. After row 3, the row numbers are associated with different IDs because of the offset, so those rows will never be joined with the original table:
LEFT RIGHT
ID Value RN ID Value RN
1 Alpha 1 | 1 Alpha 1
2 Alpha 2 | 2 Alpha 2
3 Alpha 3 | 3 Alpha 3
----------------------- result set ends here
4 Beta 4 | 5 Alpha 4
5 Alpha 5 | 7 Alpha 5
6 Gamma 6 |
7 Alpha 7 |
The ID must be unique. Ordering by this ID must be same in both ROW_NUMBER() functions.