Remove matching pairs of rows from query - sql

I am trying to produce a report from an SQL database.
The data is transactions. Sometimes, because of operator error, incorrect records are entered; later, to correct for this, the same record is entered again but with a negative quantity.
For example:
ID, DESC , QTY
0 , ITEM1 , 2
1 , ITEM2 , 1
2 , ITEM3 , 2 // This record and
3 , ITEM2 , 1
4 , ITEM3 , -2 // this record cancel out
I would like a query that looks for pairs of rows that are identical except for the ID and have opposite signs on the QTY, and that excludes them from the result.
Similar to the below.
ID, DESC , QTY
0 , ITEM1 , 2
1 , ITEM2 , 1
3 , ITEM2 , 1
What is the easiest way to achieve this in a query? I was thinking along the lines of an aggregate SUM function, but I only want to remove rows with a QTY of opposite sign but equal magnitude.

This is rather painful. The immediate answer to your question is not exists. However, you need to be careful about duplicates, so I would recommend enumerating the values first:
with t as (
select t.*,
row_number() over (partition by "desc", qty order by id) as seqnum  -- "desc" must be quoted: DESC is a reserved word
from transactions t
)
select t.*
from t
where not exists (select 1
from t t2
where t2."desc" = t."desc" and
t2.seqnum = t.seqnum and
t2.qty = - t.qty
);
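To try the pairing query end to end, here is a minimal sketch in fairly portable SQL (it should run in SQLite 3.25+ or PostgreSQL, both of which support window functions). The table name `transactions` and the rows come from the question; wrapping the query in a view called `cleaned_transactions` and renaming the CTE to `numbered` are assumptions made only for illustration.

```sql
-- Sample data from the question (table name is an assumption).
CREATE TABLE transactions (id INTEGER PRIMARY KEY, "desc" VARCHAR(20), qty INTEGER);
INSERT INTO transactions (id, "desc", qty) VALUES
  (0, 'ITEM1', 2),
  (1, 'ITEM2', 1),
  (2, 'ITEM3', 2),
  (3, 'ITEM2', 1),
  (4, 'ITEM3', -2);

-- Hypothetical view wrapping the answer's query.
CREATE VIEW cleaned_transactions AS
WITH numbered AS (
  -- The n-th copy of (desc, qty) may only cancel the n-th copy of (desc, -qty).
  SELECT tr.id, tr."desc", tr.qty,
         ROW_NUMBER() OVER (PARTITION BY tr."desc", tr.qty ORDER BY tr.id) AS seqnum
  FROM transactions tr
)
SELECT n.id, n."desc", n.qty
FROM numbered n
WHERE NOT EXISTS (
  SELECT 1
  FROM numbered n2
  WHERE n2."desc" = n."desc"
    AND n2.seqnum = n.seqnum
    AND n2.qty = -n.qty
);

SELECT * FROM cleaned_transactions ORDER BY id;
-- Rows 2 and 4 (ITEM3, +2 and -2) cancel out; ids 0, 1 and 3 remain.
```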

You could use the left join antipattern to evict records for which another record exists with the same desc and an opposite qty.
select t.*
from mytable t
left join mytable t1 on t1."desc" = t."desc" and t1.qty = - t.qty
where t1.id is null
Or a not exists condition with a correlated subquery:
select t.*
from mytable t
where not exists (
select 1
from mytable t1
where t1."desc" = t."desc" and t1.qty = - t.qty
)
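The anti-join versions drop every row that has any opposite-signed partner, which is exactly the duplicate problem the first answer warns about. A small sketch with hypothetical data (two +2 entries for ITEM5 but only one -2 correction) makes the difference visible; the table name `dup_demo` and the view names are assumptions.

```sql
CREATE TABLE dup_demo (id INTEGER PRIMARY KEY, "desc" VARCHAR(20), qty INTEGER);
INSERT INTO dup_demo (id, "desc", qty) VALUES
  (0, 'ITEM5', 2),
  (1, 'ITEM5', 2),   -- a genuine second sale
  (2, 'ITEM5', -2);  -- cancels only one of the two sales

-- Left-join anti-join: every row with any opposite-signed match disappears.
CREATE VIEW antijoin_result AS
SELECT t.*
FROM dup_demo t
LEFT JOIN dup_demo t1 ON t1."desc" = t."desc" AND t1.qty = -t.qty
WHERE t1.id IS NULL;

-- Pairing with ROW_NUMBER(): each -2 cancels exactly one +2.
CREATE VIEW paired_result AS
WITH numbered AS (
  SELECT d.*, ROW_NUMBER() OVER (PARTITION BY d."desc", d.qty ORDER BY d.id) AS seqnum
  FROM dup_demo d
)
SELECT n.id, n."desc", n.qty
FROM numbered n
WHERE NOT EXISTS (
  SELECT 1
  FROM numbered n2
  WHERE n2."desc" = n."desc" AND n2.seqnum = n.seqnum AND n2.qty = -n.qty
);

SELECT COUNT(*) FROM antijoin_result;  -- 0: all three rows are removed
SELECT * FROM paired_result;           -- only id 1 survives
```

Which behaviour is right depends on whether a repeated entry can be a legitimate second transaction; if it can, the pairing version is the safer choice.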

Related

SQL Partition by with conditions

I want to partition the data on the basis of two columns, Type and Env, and fetch the top 5 records for each partition, ordered by Count descending. The problem I'm facing is that I need to partition Env on the basis of a LIKE condition.
Data:
Type  Environment  Count
T1    E1           1
T1    M1           2
T1    AB1          3
T2    E1           1
T2    M1           2
T2    CB1          3
T2    M1           5
The result I want (say I'm fetching only the top 1 record per group for now):
Type  Environment  Count
T1    M1           2
T1    AB1          3
T2    CB1          3
T2    M1           5
Here I'm dividing Env into two groups: env LIKE '%M%' and env NOT LIKE '%M%'.
One approach that I can think of is using partition and union but this is a very expensive call due to the large amount of data that I'm filtering from. Is there a better way to achieve this?
SELECT
*
FROM
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Type ORDER BY Count DESC) AS maxCount
FROM
table
WHERE
Env LIKE '%M%'
) AS t1
WHERE
t1.maxCount <= 5
UNION
SELECT
*
FROM
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Type ORDER BY Count DESC) AS maxCount
FROM
table
WHERE
Env NOT LIKE '%M%'
) AS t1
WHERE
t1.maxCount <= 5
You seem to want an additional key in the partition by of your row_number(), so that the "M" and non-"M" environments are ranked separately:
select t.*
from (select t.*,
row_number() over (partition by type, case when environment like '%M%' then 1 else 2 end
order by count desc
) as seqnum
from t
) t
where seqnum <= 5;
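Here is the corrected query run against the question's sample data, as a sketch in portable SQL (SQLite 3.25+ or PostgreSQL). The table name `env_counts` is an assumption, `"type"` and `"count"` are quoted defensively because they clash with keywords or function names in some databases, and the filter keeps the top 1 row per group to match the question's example output (use `<= 5` for the real report).

```sql
CREATE TABLE env_counts ("type" VARCHAR(10), environment VARCHAR(10), "count" INTEGER);
INSERT INTO env_counts ("type", environment, "count") VALUES
  ('T1', 'E1', 1), ('T1', 'M1', 2), ('T1', 'AB1', 3),
  ('T2', 'E1', 1), ('T2', 'M1', 2), ('T2', 'CB1', 3), ('T2', 'M1', 5);

CREATE VIEW top_per_group AS
SELECT "type", environment, "count"
FROM (
  SELECT e."type", e.environment, e."count",
         ROW_NUMBER() OVER (
           -- One partition per Type and per "contains M" / "does not contain M" bucket.
           PARTITION BY e."type",
                        CASE WHEN e.environment LIKE '%M%' THEN 1 ELSE 2 END
           ORDER BY e."count" DESC
         ) AS seqnum
  FROM env_counts e
) ranked
WHERE seqnum <= 1;

SELECT * FROM top_per_group ORDER BY "type", "count";
-- T1 M1 2, T1 AB1 3, T2 CB1 3, T2 M1 5
```

A single scan with one extra partition key avoids the two separate passes that the UNION version needs.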

SQL exclude rows based on value in another row

I am trying to exclude rows where a value exists in another row.
select * from TABLE1
ROW SEQ VALUE
1 1 HIGH
1 2 HIGH
1 3 LOW
1 4 HIGH
2 1 MED
2 2 HIGH
2 3 HIGH
2 4 LOW
2 5 HIGH
2 6 HIGH
All the data comes from the same table. What I am trying to do is exclude the rows where VALUE = 'LOW', plus all earlier rows in the same ROW group, i.e. those whose SEQ is <= the SEQ of the 'LOW' row. This is my desired result:
ROW SEQ VALUE
1 4 HIGH
2 5 HIGH
2 6 HIGH
Here's my work in progress, but it only excludes one row:
select * from TABLE1
where not exists(select VALUE from TABLE1
where ROW = ROW and VALUE = 'LOW' and SEQ <= SEQ)
I need to write it into the WHERE clause, as the SELECT is hard-coded. I am lost; any help would be greatly appreciated. Thanks in advance!
select *
from table1
left outer join (
select row, max(seq) as seq
from table1
where value = 'LOW'
group by row
) lows on lows.row = table1.row
where lows.row is null
or table1.seq > lows.seq
You should be aliasing the tables. I'm surprised you are getting any results from this query as you don't have aliases at all.
select *
from TABLE1 As t0
where not exists(
select VALUE
from TABLE1 As t1
where t0.ROW = t1.ROW
and t1.VALUE = 'LOW'
and t0.SEQ <= t1.SEQ
)
You can use a window function with a cumulative approach:
select t.*
from (select t.*, sum(case when value = 'LOW' then 1 else 0 end) over (partition by row order by seq) as cnt
from table1 t
) t
where cnt = 1 and value <> 'LOW';
For the results you mention, you seem to want the rows after the last "low". One method is:
select t1.*
from table1 t1
where t1.seq > (select max(tt1.seq) from table1 tt1 where tt1.row = t1.row and tt1.value = 'LOW');
(Note: This requires a "low" row. If there could be no "low" rows and you want all rows returned, that is easily added to the query.)
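One way to add that fallback, sketched under the assumption that seq values are always positive, is to COALESCE the missing maximum to 0. The extra group with ROW = 3 (which has no "LOW" at all) is hypothetical data added only to show the fallback, and the view name `after_last_low` is also an assumption.

```sql
CREATE TABLE table1 ("row" INTEGER, seq INTEGER, value VARCHAR(10));
INSERT INTO table1 ("row", seq, value) VALUES
  (1, 1, 'HIGH'), (1, 2, 'HIGH'), (1, 3, 'LOW'), (1, 4, 'HIGH'),
  (2, 1, 'MED'), (2, 2, 'HIGH'), (2, 3, 'HIGH'), (2, 4, 'LOW'),
  (2, 5, 'HIGH'), (2, 6, 'HIGH'),
  (3, 1, 'HIGH'), (3, 2, 'MED');  -- hypothetical group with no LOW

CREATE VIEW after_last_low AS
SELECT t1.*
FROM table1 t1
WHERE t1.seq > COALESCE(
  (SELECT MAX(tt1.seq)
   FROM table1 tt1
   WHERE tt1."row" = t1."row" AND tt1.value = 'LOW'),
  0);  -- no LOW in the group: compare against 0, so every row is kept

SELECT * FROM after_last_low ORDER BY "row", seq;
-- (1,4) (2,5) (2,6) (3,1) (3,2)
```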
Or, similarly, using not exists:
select t1.*
from table1 t1
where not exists (select 1
from table1 tt1
where tt1.row = t1.row and
tt1.seq >= t1.seq and  -- >= so the "LOW" row itself is excluded too
tt1.value = 'LOW'
);
This might be the most direct translation of your question.
However, I would more likely use window functions:
select t1.*
from (select t1.*,
max(case when t1.value = 'LOW' then t1.seq end) over (partition by t1.row) as max_low_seqnum
from table1 t1
) t1
where seq > max_low_seqnum;
You might want to add or max_low_seqnum is null to return all rows if there are no "low" rows.
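A runnable sketch of the window-function version with that `or max_low_seqnum is null` fallback, in portable SQL (SQLite 3.25+ or PostgreSQL). The table name `seq_rows`, the view name, and the ROW = 3 group without any "LOW" are hypothetical.

```sql
CREATE TABLE seq_rows ("row" INTEGER, seq INTEGER, value VARCHAR(10));
INSERT INTO seq_rows ("row", seq, value) VALUES
  (1, 1, 'HIGH'), (1, 2, 'HIGH'), (1, 3, 'LOW'), (1, 4, 'HIGH'),
  (2, 1, 'MED'), (2, 2, 'HIGH'), (2, 3, 'HIGH'), (2, 4, 'LOW'),
  (2, 5, 'HIGH'), (2, 6, 'HIGH'),
  (3, 1, 'HIGH'), (3, 2, 'MED');  -- hypothetical group with no LOW

CREATE VIEW after_last_low_w AS
SELECT "row", seq, value
FROM (
  SELECT s."row", s.seq, s.value,
         -- Highest LOW seq in the group; NULL when the group has no LOW.
         MAX(CASE WHEN s.value = 'LOW' THEN s.seq END)
           OVER (PARTITION BY s."row") AS max_low_seqnum
  FROM seq_rows s
) ranked
WHERE seq > max_low_seqnum
   OR max_low_seqnum IS NULL;

SELECT * FROM after_last_low_w ORDER BY "row", seq;
```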

SQL Get rows based on conditions

I'm currently having trouble writing the business logic to get rows from a table with IDs and a flag that I have appended to it.
For example,
id: id seq num: flag: Date:
A 1 N ..
A 2 N ..
A 3 N
A 4 Y
B 1 N
B 2 Y
B 3 N
C 1 N
C 2 N
The end result I'm trying to achieve is that:
For each unique ID I just want to retrieve one row with the condition for that row being that
If the flag was a "Y" then return that row.
Else return the last "N" row.
Another thing to note is that the 'Y' flag is not necessarily on the last row.
I've been trying to build a CASE condition using a partition like
OVER (PARTITION BY A."ID" ORDER BY A."Seq num"), but so far with no luck.
-- EDIT:
From the table, the sample result would be:
id: id seq num: flag: date:
A 4 Y ..
B 2 Y ..
C 2 N ..
Using a window clause is the right idea. You should partition the results by the ID (as you've done), and order them so the Y flag rows come first, then all the N flag rows in descending date order, and pick the first for each id:
SELECT id, id_seq_num, flag, date
FROM (SELECT id, id_seq_num, flag, date,
ROW_NUMBER() OVER (PARTITION BY id
ORDER BY CASE flag WHEN 'Y' THEN 0
ELSE 1
END ASC,
date DESC) AS rk
FROM mytable) t
WHERE rk = 1
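A runnable sketch of this window-function approach on the question's sample rows, in portable SQL (SQLite 3.25+ or PostgreSQL). Because the question elides the dates, this assumes `id_seq_num` increases with the date and orders by it instead; the table name `flag_rows` and the view name are hypothetical.

```sql
CREATE TABLE flag_rows (id VARCHAR(5), id_seq_num INTEGER, flag CHAR(1));
INSERT INTO flag_rows (id, id_seq_num, flag) VALUES
  ('A', 1, 'N'), ('A', 2, 'N'), ('A', 3, 'N'), ('A', 4, 'Y'),
  ('B', 1, 'N'), ('B', 2, 'Y'), ('B', 3, 'N'),
  ('C', 1, 'N'), ('C', 2, 'N');

CREATE VIEW one_per_id AS
SELECT id, id_seq_num, flag
FROM (
  SELECT f.id, f.id_seq_num, f.flag,
         ROW_NUMBER() OVER (
           PARTITION BY f.id
           ORDER BY CASE f.flag WHEN 'Y' THEN 0 ELSE 1 END,  -- 'Y' rows first
                    f.id_seq_num DESC                        -- then the latest 'N'
         ) AS rk
  FROM flag_rows f
) t
WHERE rk = 1;

SELECT * FROM one_per_id ORDER BY id;
-- A 4 Y, B 2 Y, C 2 N
```

If an ID ever had several 'Y' rows, this ordering would return the latest of them.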
My approach is to take a UNION of two queries. The first query simply selects all Yes records, assuming that Yes only appears once per ID group. The second query targets only those ID having no Yes anywhere. For those records, we use the row number to select the most recent No record.
WITH cte1 AS (
SELECT id
FROM yourTable
GROUP BY id
HAVING SUM(CASE WHEN flag = 'Y' THEN 1 ELSE 0 END) = 0
),
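cte2 AS (
SELECT t1.*,
ROW_NUMBER() OVER (PARTITION BY t1.id ORDER BY t1."id seq" DESC) rn
FROM yourTable t1
INNER JOIN cte1 t2
ON t1.id = t2.id
)
-- list the same columns in both branches so the UNION ALL lines up
SELECT id, "id seq", flag, date
FROM yourTable
WHERE flag = 'Y'
UNION ALL
SELECT t2.id, t2."id seq", t2.flag, t2.date
FROM cte2 t2
WHERE t2.rn = 1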
Here's one way (with quite generic SQL):
select t1.*
from Table1 as t1
where t1.id_seq_num = COALESCE(
(select max(id_seq_num) from Table1 as T2 where t1.id = t2.id and t2.flag = 'Y') ,
(select max(id_seq_num) from Table1 as T3 where t1.id = t3.id and t3.flag = 'N') )
Available in a fiddle here: http://sqlfiddle.com/#!9/5f7f9/6
SELECT DISTINCT id, flag
FROM yourTable

Duplicate Counts - TSQL

I want to get all records that have duplicate values in SOME of the fields (i.e. the key columns).
My code:
CREATE TABLE #TEMP (ID int, Descp varchar(5), Extra varchar(6))
INSERT INTO #Temp
SELECT 1,'One','Extra1'
UNION ALL
SELECT 2,'Two','Extra2'
UNION ALL
SELECT 3,'Three','Extra3'
UNION ALL
SELECT 1,'One','Extra4'
SELECT ID, Descp, Extra FROM #TEMP
;WITH Temp_CTE AS
(SELECT *
, ROW_NUMBER() OVER (PARTITION BY ID, Descp ORDER BY (SELECT 0))
AS DuplicateRowNumber
FROM #TEMP
)
SELECT * FROM Temp_cte
DROP TABLE #TEMP
The last column tells me how many times each row has appeared based on ID and Descp values.
I want that column, but I ALSO need another column* indicating that both rows with ID = 1 and Descp = 'One' have shown up more than once.
So an extra column* (i.e. MultipleOccurances (bool)) that is 1 for the two rows with ID = 1 and Descp = 'One' and 0 for the other rows, as they only show up once.
How can I achieve that? (I want to avoid using Count(1) > 1 or something similar if possible.)
Edit:
Desired output:
ID Descp Extra DuplicateRowNumber IsMultiple
1 One Extra1 1 1
1 One Extra4 2 1
2 Two Extra2 1 0
3 Three Extra3 1 0
You say "I want to avoid using Count", but it is probably the best way. It uses the same partitioning you already have on the row_number():
SELECT *,
ROW_NUMBER() OVER (PARTITION BY ID, Descp
ORDER BY (SELECT 0)) AS DuplicateRowNumber,
CASE
WHEN COUNT(*) OVER (PARTITION BY ID, Descp) > 1 THEN 1
ELSE 0
END AS IsMultiple
FROM #Temp
And the execution plan just shows a single sort.
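If you would rather not write COUNT at all, one hedged alternative is to reuse the row numbers themselves: a group is duplicated exactly when its highest DuplicateRowNumber is above 1. The sketch below is in portable SQL rather than T-SQL temp-table syntax, uses a hypothetical table `temp_rows` and view `flagged_rows`, and orders by Extra instead of (SELECT 0) so the numbering is deterministic. It still performs a windowed aggregate, so it is unlikely to be cheaper than COUNT(*) OVER.

```sql
CREATE TABLE temp_rows (ID INTEGER, Descp VARCHAR(5), Extra VARCHAR(6));
INSERT INTO temp_rows (ID, Descp, Extra) VALUES
  (1, 'One', 'Extra1'),
  (2, 'Two', 'Extra2'),
  (3, 'Three', 'Extra3'),
  (1, 'One', 'Extra4');

CREATE VIEW flagged_rows AS
WITH numbered AS (
  SELECT ID, Descp, Extra,
         ROW_NUMBER() OVER (PARTITION BY ID, Descp ORDER BY Extra) AS DuplicateRowNumber
  FROM temp_rows
)
SELECT ID, Descp, Extra, DuplicateRowNumber,
       -- The group is duplicated exactly when its largest row number is above 1.
       CASE WHEN MAX(DuplicateRowNumber) OVER (PARTITION BY ID, Descp) > 1
            THEN 1 ELSE 0 END AS IsMultiple
FROM numbered;

SELECT * FROM flagged_rows ORDER BY ID, DuplicateRowNumber;
```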
Well, I have this solution, but using a Count...
SELECT T1.*,
ROW_NUMBER() OVER (PARTITION BY T1.ID, T1.Descp ORDER BY (SELECT 0)) AS DuplicateRowNumber,
CASE WHEN T2.C = 1 THEN 0 ELSE 1 END MultipleOcurrences FROM #temp T1
INNER JOIN
(SELECT ID, Descp, COUNT(1) C FROM #TEMP GROUP BY ID, Descp) T2
ON T1.ID = T2.ID AND T1.Descp = T2.Descp

Is there something equivalent to putting an order by clause in a derived table?

This is Sybase 15.
Here's my problem.
I have 2 tables.
t1.jobid t1.date
------------------------------
1 1/1/2012
2 4/1/2012
3 2/1/2012
4 3/1/2012
t2.jobid t2.userid t2.status
-----------------------------------------------
1 100 1
1 110 1
1 120 2
1 130 1
2 100 1
2 130 2
3 100 1
3 110 1
3 120 1
3 130 1
4 110 2
4 120 2
I want to find all the people whose status for THEIR two most recent jobs is 2.
My plan was to take the top 2 of a derived table that joined t1 and t2 and was ordered by date backwards for a given user. So the top two would be the most recent for a given user.
So that would give me that individual's most recent job numbers. Not everybody is in every job.
Then I was going to write an outer query that joined against the derived table, searching for status 2s with a HAVING sum(status) = 4 or something like that. That would find the people with two status-2 jobs.
But sybase won't let me use an order by clause in the derived table.
Any suggestions on how to go about this?
I can always write a little program to loop through all the users, but I was gonna try to make one horrendous SQL statement out of it.
Juicy one, no?
You could rank the rows in the subquery by adding an extra column using a window function. Then select the rows that have the appropriate ranks within their groups.
I've never used Sybase, but the documentation seems to indicate that this is possible.
With Table1 As
(
Select 1 As jobid, '2012-01-01' As [date] -- ISO format, so the strings sort chronologically
Union All Select 2, '2012-04-01'
Union All Select 3, '2012-02-01'
Union All Select 4, '2012-03-01'
)
, Table2 As
(
Select 1 jobid, 100 As userid, 1 as status
Union All Select 1,110,1
Union All Select 1,120,2
Union All Select 1,130,1
Union All Select 2,100,1
Union All Select 2,130,2
Union All Select 3,100,1
Union All Select 3,110,1
Union All Select 3,120,1
Union All Select 3,130,1
Union All Select 4,110,2
Union All Select 4,120,2
)
, MostRecentJobs As
(
Select T1.jobid, T1.date, T2.userid, T2.status
, Row_Number() Over ( Partition By T2.userid Order By T1.date Desc ) As JobCnt
From Table1 As T1
Join Table2 As T2
On T2.jobid = T1.jobid
)
Select *
From MostRecentJobs As M2
Where Not Exists (
Select 1
From MostRecentJobs As M1
Where M1.userid = M2.userid
And M1.JobCnt <= 2
And M1.status <> 2
)
And M2.JobCnt <= 2
I'm using a number of features here which do exist in Sybase 15. First, I'm using common table expressions, both for my sample data and to clump my queries together. Second, I'm using the ranking function Row_Number to order the jobs by date.
It should be noted that in the example data you gave, no user satisfies the requirement of having their two most recent jobs both be of status "2".
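To check the query on data where someone does qualify, here is a sketch in portable SQL (SQLite 3.25+ or PostgreSQL, not Sybase). It uses the question's rows, ISO date strings so the ordering is chronological, and one hypothetical user 140 whose two most recent jobs (2 and 4) both have status 2. The table names `jobs` and `job_users` and the view name are assumptions.

```sql
CREATE TABLE jobs (jobid INTEGER PRIMARY KEY, job_date DATE);
INSERT INTO jobs (jobid, job_date) VALUES
  (1, '2012-01-01'), (2, '2012-04-01'), (3, '2012-02-01'), (4, '2012-03-01');

CREATE TABLE job_users (jobid INTEGER, userid INTEGER, status INTEGER);
INSERT INTO job_users (jobid, userid, status) VALUES
  (1, 100, 1), (1, 110, 1), (1, 120, 2), (1, 130, 1),
  (2, 100, 1), (2, 130, 2),
  (3, 100, 1), (3, 110, 1), (3, 120, 1), (3, 130, 1),
  (4, 110, 2), (4, 120, 2),
  (2, 140, 2), (4, 140, 2);  -- hypothetical user who qualifies

CREATE VIEW qualifying_users AS
WITH most_recent_jobs AS (
  SELECT j.jobid, j.job_date, u.userid, u.status,
         -- 1 = the user's newest job, 2 = the one before it, ...
         ROW_NUMBER() OVER (PARTITION BY u.userid ORDER BY j.job_date DESC) AS job_cnt
  FROM jobs j
  JOIN job_users u ON u.jobid = j.jobid
)
SELECT DISTINCT m2.userid
FROM most_recent_jobs m2
WHERE m2.job_cnt <= 2
  AND NOT EXISTS (
    SELECT 1
    FROM most_recent_jobs m1
    WHERE m1.userid = m2.userid
      AND m1.job_cnt <= 2
      AND m1.status <> 2
  );

SELECT * FROM qualifying_users;  -- only user 140
```

As with the answer's query, a user with only one job whose status is 2 would also be returned; if two jobs are required, group the job_cnt <= 2 rows by userid and keep only groups with COUNT(*) = 2.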
Edit
If you are using a version of Sybase that does not support ranking functions (e.g. Sybase 15 prior to 15.2), then you need to simulate the ranking function using counts.
Create Table #JobRnks
(
jobid int not null
, userid int not null
, status int not null
, [date] datetime not null
, JobCnt int not null
, Primary Key ( jobid, userid, [date] )
)
Insert #JobRnks( jobid, userid, status, [date], JobCnt )
Select T1.jobid, T1.userid, T1.status, T1.[date], Count(T2.jobid)+ 1 As JobCnt
From (
Select T1.jobid, T2.userid, T2.status, T1.[date]
From #Table2 As T2
Join #Table1 As T1
On T1.jobid = T2.jobid
) As T1
Left Join (
Select T1.jobid, T2.userid, T2.status, T1.[date]
From #Table2 As T2
Join #Table1 As T1
On T1.jobid = T2.jobid
) As T2
On T2.userid = T1.userid
And T2.[date] > T1.[date] -- count the user's newer jobs, so the most recent job gets JobCnt = 1
Group By T1.jobid, T1.userid, T1.status, T1.[date]
Select *
From #JobRnks As J1
Where Not Exists (
Select 1
From #JobRnks As J2
Where J2.userid = J1.userid
And J2.JobCnt <= 2
And J2.status <> 2
)
And J1.JobCnt <= 2
The reason for using the temp table here is for performance and ease of reading. Technically, you could plug in the query for the temp table into the two places used as a derived table and achieve the same result.
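The count-based ranking can also be tried without temp tables or ranking functions. This sketch is in portable SQL with hypothetical table names `jobs_nr` and `job_users_nr` and ISO date strings; it counts each user's strictly newer jobs, so the newest job gets job_cnt = 1, matching the Row_Number version ordered by date descending. Jobs with identical dates for the same user would share a rank.

```sql
CREATE TABLE jobs_nr (jobid INTEGER PRIMARY KEY, job_date DATE);
INSERT INTO jobs_nr (jobid, job_date) VALUES
  (1, '2012-01-01'), (2, '2012-04-01'), (3, '2012-02-01'), (4, '2012-03-01');

CREATE TABLE job_users_nr (jobid INTEGER, userid INTEGER, status INTEGER);
INSERT INTO job_users_nr (jobid, userid, status) VALUES
  (1, 100, 1), (1, 110, 1), (1, 120, 2), (1, 130, 1),
  (2, 100, 1), (2, 130, 2),
  (3, 100, 1), (3, 110, 1), (3, 120, 1), (3, 130, 1),
  (4, 110, 2), (4, 120, 2);

CREATE VIEW job_rnks AS
SELECT a.jobid, a.userid, a.status, a.job_date,
       COUNT(b.jobid) + 1 AS job_cnt  -- 1 + number of the user's newer jobs
FROM (SELECT j.jobid, u.userid, u.status, j.job_date
      FROM job_users_nr u JOIN jobs_nr j ON j.jobid = u.jobid) a
LEFT JOIN (SELECT j.jobid, u.userid, j.job_date
           FROM job_users_nr u JOIN jobs_nr j ON j.jobid = u.jobid) b
  ON b.userid = a.userid
 AND b.job_date > a.job_date
GROUP BY a.jobid, a.userid, a.status, a.job_date;

SELECT * FROM job_rnks ORDER BY userid, job_cnt;
```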