I have millions of rows of data that have similar values like this:
Id Reff Amount
1 a1 1000
2 a2 -1000
3 a3 -2500
4 a4 -1500
5 a5 1500
Every amount should have a matching positive and negative value. How do I show only the records that don't have a matching opposite value, like the row with Id 3? Thanks for the help.
You can use not exists:
select t.*
from mytable t
where not exists (select 1 from mytable t1 where t1.amount = -1 * t.amount)
An anti-join written as a left join would also get the job done:
select t.*
from mytable t
left join mytable t1 on t1.amount = -1 * t.amount
where t1.id is null
Demo on DB Fiddle:
Id | Reff | Amount
-: | :--- | -----:
3 | a3 | -2500
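For anyone who wants to try both queries locally, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in here (an assumption; the question does not name a DBMS), and the table name mytable is taken from the queries above.

```python
import sqlite3

# In-memory SQLite database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (Id INTEGER, Reff TEXT, Amount INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "a1", 1000), (2, "a2", -1000), (3, "a3", -2500),
     (4, "a4", -1500), (5, "a5", 1500)],
)

# NOT EXISTS: keep rows for which no row with the opposite amount exists.
not_exists_rows = conn.execute("""
    SELECT t.Id, t.Reff, t.Amount
    FROM mytable t
    WHERE NOT EXISTS (SELECT 1 FROM mytable t1 WHERE t1.Amount = -1 * t.Amount)
    ORDER BY t.Id
""").fetchall()

# Anti-join: left join on the opposite amount, keep rows that found no partner.
anti_join_rows = conn.execute("""
    SELECT t.Id, t.Reff, t.Amount
    FROM mytable t
    LEFT JOIN mytable t1 ON t1.Amount = -1 * t.Amount
    WHERE t1.Id IS NULL
    ORDER BY t.Id
""").fetchall()

print(not_exists_rows)  # [(3, 'a3', -2500)]
print(anti_join_rows)   # [(3, 'a3', -2500)]
```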
SQL Fiddle
MS SQL Server 2017 Schema Setup:
CREATE TABLE Test(
Id int
,Reff varchar(2)
,Amount int
);
INSERT INTO Test(Id,Reff,Amount) VALUES (1,'a1',1000);
INSERT INTO Test(Id,Reff,Amount) VALUES (2,'a2',-1000);
INSERT INTO Test(Id,Reff,Amount) VALUES (3,'a3',-2500);
INSERT INTO Test(Id,Reff,Amount) VALUES (4,'a4',-1500);
INSERT INTO Test(Id,Reff,Amount) VALUES (5,'a5',1500);
Query 1:
select t.*
from Test t
left join Test t1 on t1.amount = -t.amount -- match the opposite amount; with ABS() an unmatched positive row would match itself
where t1.id is null
Results:
| Id | Reff | Amount |
|----|------|--------|
| 3 | a3 | -2500 |
Using a NOT EXISTS or a LEFT JOIN will work fine to find the amounts that don't have an opposite amount in the data.
But what if you really need the amounts that don't balance out when the rows are taken in Id order?
That kind of SQL puzzle is best handled as a gaps-and-islands problem.
The solution may look a bit more complicated, but it's actually quite simple.
It first calculates a group number (Rnk) for each run of consecutive rows that share the same absolute amount.
Then, for each group whose SUM isn't balanced out (not 0), it keeps the last row:
SELECT Id, Reff, Amount
FROM
(
SELECT *,
SUM(Amount) OVER (PARTITION BY Rnk) AS SumAmountByRank,
ROW_NUMBER() OVER (PARTITION BY Rnk ORDER BY Id DESC) AS Rn
FROM
(
SELECT Id, Reff, Amount,
ROW_NUMBER() OVER (ORDER BY Id) - ROW_NUMBER() OVER (PARTITION BY ABS(Amount) ORDER BY Id) AS Rnk
FROM YourTable
) AS q1
) AS q2
WHERE SumAmountByRank != 0
AND Rn = 1
ORDER BY Id;
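To see what the Rnk expression produces, here is a small sketch that runs the query against the sample rows from the question with Python's sqlite3 module. SQLite (3.25 or later, for window functions) is an assumption used only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (Id INTEGER, Reff TEXT, Amount INTEGER)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(1, "a1", 1000), (2, "a2", -1000), (3, "a3", -2500),
     (4, "a4", -1500), (5, "a5", 1500)],
)

# Step 1: the difference of the two row numbers stays constant within a run
# of consecutive rows that share the same absolute amount.
ranks = conn.execute("""
    SELECT Id,
           ROW_NUMBER() OVER (ORDER BY Id)
           - ROW_NUMBER() OVER (PARTITION BY ABS(Amount) ORDER BY Id) AS Rnk
    FROM YourTable
    ORDER BY Id
""").fetchall()
print(ranks)  # [(1, 0), (2, 0), (3, 2), (4, 3), (5, 3)]

# Step 2: keep the last row of every group whose amounts do not sum to 0.
unbalanced = conn.execute("""
    SELECT Id, Reff, Amount
    FROM (
        SELECT q1.*,
               SUM(Amount) OVER (PARTITION BY Rnk) AS SumAmountByRank,
               ROW_NUMBER() OVER (PARTITION BY Rnk ORDER BY Id DESC) AS Rn
        FROM (
            SELECT Id, Reff, Amount,
                   ROW_NUMBER() OVER (ORDER BY Id)
                   - ROW_NUMBER() OVER (PARTITION BY ABS(Amount) ORDER BY Id) AS Rnk
            FROM YourTable
        ) AS q1
    ) AS q2
    WHERE SumAmountByRank != 0 AND Rn = 1
    ORDER BY Id
""").fetchall()
print(unbalanced)  # [(3, 'a3', -2500)]
```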
If the sequence doesn't matter and only the balance matters,
then the query can be simplified:
SELECT Id, Reff, Amount
FROM
(
SELECT Id, Reff, Amount,
SUM(Amount) OVER (PARTITION BY ABS(Amount)) AS SumByAbsAmount,
ROW_NUMBER() OVER (PARTITION BY ABS(Amount) ORDER BY Id DESC) AS Rn
FROM YourTable
) AS q
WHERE SumByAbsAmount != 0
AND Rn = 1
ORDER BY Id;
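A quick sketch of how the balance-based query differs from a plain NOT EXISTS check. The extra row with Id 6 is hypothetical (it is not in the question's data) and is added so that one absolute amount has an odd number of entries; SQLite via Python's sqlite3 is again just an assumed stand-in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (Id INTEGER, Reff TEXT, Amount INTEGER)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(1, "a1", 1000), (2, "a2", -1000), (3, "a3", -2500),
     (4, "a4", -1500), (5, "a5", 1500),
     (6, "a6", 1000)],  # hypothetical extra row: a second, unmatched 1000
)

# Balance per absolute amount: report the last row of each unbalanced amount.
by_balance = conn.execute("""
    SELECT Id, Reff, Amount
    FROM (
        SELECT Id, Reff, Amount,
               SUM(Amount) OVER (PARTITION BY ABS(Amount)) AS SumByAbsAmount,
               ROW_NUMBER() OVER (PARTITION BY ABS(Amount) ORDER BY Id DESC) AS Rn
        FROM YourTable
    ) AS q
    WHERE SumByAbsAmount != 0 AND Rn = 1
    ORDER BY Id
""").fetchall()

# NOT EXISTS only asks whether any opposite amount exists at all.
by_not_exists = conn.execute("""
    SELECT t.Id, t.Reff, t.Amount
    FROM YourTable t
    WHERE NOT EXISTS (SELECT 1 FROM YourTable t1 WHERE t1.Amount = -1 * t.Amount)
    ORDER BY t.Id
""").fetchall()

print(by_balance)     # [(3, 'a3', -2500), (6, 'a6', 1000)]
print(by_not_exists)  # [(3, 'a3', -2500)]
```

The row with Id 6 is reported only by the balance query, because a -1000 row exists but is already used up by Id 1.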
I have the following script:
SELECT DISTINCT GIFT_ID, GIFT_DESG, SUM(GIFT_AMT)
FROM GIFT_TABLE
GROUP BY GIFT_ID, GIFT_DESG
It will return something like this:
GIFT_ID GIFT_DESG SUM(GIFT_AMT)
1 A 25
1 B 500
1 C 75
2 A 100
2 B 200
2 C 300
...
My desired outcome is:
GIFT_ID GIFT_DESG SUM(GIFT_AMT)
1 B 500
2 C 300
How would I do that?
Possibly row_number() right? I think it's something with the summing of gift amounts by designation that is throwing me off.
Thank you.
If your DBMS supports the ROW_NUMBER window function, you can number the rows per GIFT_ID, ordered by SUM(GIFT_AMT) descending, and then keep the rows where rn = 1.
SELECT t1.GIFT_ID,t1.GIFT_DESG,t1.GIFT_AMT
FROM (
SELECT t1.*,ROW_NUMBER() OVER(PARTITION BY GIFT_ID ORDER BY GIFT_AMT DESC) rn
FROM (
SELECT GIFT_ID, GIFT_DESG, SUM(GIFT_AMT) GIFT_AMT
FROM GIFT_TABLE
GROUP BY GIFT_ID, GIFT_DESG
) t1
) t1
where rn =1
Note
Since you already use GROUP BY, the DISTINCT keyword is redundant; you can remove it from your query.
Here is a sample
CREATE TABLE T(
GIFT_ID int,
GIFT_DESG varchar(5),
GIFT_AMT int
);
insert into t values (1,'A' ,25);
insert into t values (1,'B' ,500);
insert into t values (1,'C' ,75);
insert into t values (2,'A' ,100);
insert into t values (2,'B' ,200);
insert into t values (2,'C' ,300);
Query 1:
SELECT t1.GIFT_ID,t1.GIFT_DESG,t1.GIFT_AMT
FROM (
SELECT t1.*,ROW_NUMBER() OVER(PARTITION BY GIFT_ID ORDER BY GIFT_AMT DESC) rn
FROM T t1
) t1
where rn =1
Results:
| GIFT_ID | GIFT_DESG | GIFT_AMT |
|---------|-----------|----------|
| 1 | B | 500 |
| 2 | C | 300 |
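The sample above starts from amounts that are already summed. Here is a sketch of the full pattern (aggregate first, then rank) on unaggregated rows, run through Python's sqlite3 module. The individual gift rows are hypothetical; they are chosen so that the sums match the table in the question, and SQLite is only an assumed stand-in for your DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE GIFT_TABLE (GIFT_ID INTEGER, GIFT_DESG TEXT, GIFT_AMT INTEGER)"
)
# Hypothetical detail rows; per (GIFT_ID, GIFT_DESG) they sum to
# 25, 500, 75 for id 1 and 100, 200, 300 for id 2.
conn.executemany(
    "INSERT INTO GIFT_TABLE VALUES (?, ?, ?)",
    [(1, "A", 10), (1, "A", 15), (1, "B", 500), (1, "C", 75),
     (2, "A", 100), (2, "B", 200), (2, "C", 100), (2, "C", 200)],
)

top_designations = conn.execute("""
    SELECT GIFT_ID, GIFT_DESG, GIFT_AMT
    FROM (
        SELECT s.*,
               ROW_NUMBER() OVER (PARTITION BY GIFT_ID ORDER BY GIFT_AMT DESC) AS rn
        FROM (
            -- inner query: one row per id and designation with the summed amount
            SELECT GIFT_ID, GIFT_DESG, SUM(GIFT_AMT) AS GIFT_AMT
            FROM GIFT_TABLE
            GROUP BY GIFT_ID, GIFT_DESG
        ) s
    ) ranked
    WHERE rn = 1
    ORDER BY GIFT_ID
""").fetchall()

print(top_designations)  # [(1, 'B', 500), (2, 'C', 300)]
```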
You can do this with no subquery:
SELECT TOP (1) WITH TIES GIFT_ID, GIFT_DESG, SUM(GIFT_AMT)
FROM GIFT_TABLE
GROUP BY GIFT_ID, GIFT_DESG
ORDER BY ROW_NUMBER() OVER (PARTITION BY GIFT_ID ORDER BY SUM(GIFT_AMT) DESC);
You can also do it like this (Oracle syntax):
WITH t as (
SELECT GIFT_ID, GIFT_DESG, SUM(GIFT_AMT) AS GIFT_AMT
FROM GIFT_TABLE
GROUP BY GIFT_ID, GIFT_DESG)
SELECT GIFT_ID,
max(GIFT_DESG) KEEP (DENSE_RANK LAST ORDER BY GIFT_AMT),
max(GIFT_AMT) GIFT_AMT
FROM T
GROUP BY GIFT_ID;
I have a bunch of data from which I'm showing the ID, the max date and its corresponding values (user id, type, ...). Then I need to take the MAX date for each ID, subtract 30 days, and show the first date within that period along with its corresponding values.
Example:
ID Date Name
1 01.05.2018 AAA
1 21.04.2018 CCC
1 05.04.2018 BBB
1 28.03.2018 AAA
expected:
ID max_date max_name previous_date previous_name
1 01.05.2018 AAA 05.04.2018 BBB
I have a working solution using subselects, but as I have quite a large WHERE clause, the refresh takes ages.
The subselect looks like this:
(SELECT MIN(N.name)
FROM t1 N
WHERE N.ID = T.ID
AND (N.date < MAX(T.date) AND N.date >= (MAX(T.date)-30))
AND (...)) AS PreviousName
How'd you write the select?
I'm using TSQL
Thanks
I can do this with 2 CTEs to build up the dates and names.
SQL Fiddle
MS SQL Server 2017 Schema Setup:
CREATE TABLE t1 (ID int, theDate date, theName varchar(10)) ;
INSERT INTO t1 (ID, theDate, theName)
VALUES
( 1,'2018-05-01','AAA' )
, ( 1,'2018-04-21','CCC' )
, ( 1,'2018-04-05','BBB' )
, ( 1,'2018-03-27','AAA' )
, ( 2,'2018-05-02','AAA' )
, ( 2,'2018-05-21','CCC' )
, ( 2,'2018-03-03','BBB' )
, ( 2,'2018-01-20','AAA' )
;
Main Query:
;WITH cte1 AS (
SELECT t1.ID, t1.theDate, t1.theName
, DATEADD(day,-30,t1.theDate) AS dMinus30
, ROW_NUMBER() OVER (PARTITION BY t1.ID ORDER BY t1.theDate DESC) AS rn
FROM t1
)
, cte2 AS (
SELECT c2.ID, c2.theDate, c2.theName
, ROW_NUMBER() OVER (PARTITION BY c2.ID ORDER BY c2.theDate) AS rn
, COUNT(*) OVER (PARTITION BY c2.ID) AS theCount
FROM cte1
INNER JOIN cte1 c2 ON cte1.ID = c2.ID
AND c2.theDate >= cte1.dMinus30
WHERE cte1.rn = 1
GROUP BY c2.ID, c2.theDate, c2.theName
)
SELECT cte1.ID, cte1.theDate AS max_date, cte1.theName AS max_name
, cte2.theDate AS previous_date, cte2.theName AS previous_name
, cte2.theCount
FROM cte1
INNER JOIN cte2 ON cte1.ID = cte2.ID
AND cte2.rn=1
WHERE cte1.rn = 1
Results:
| ID | max_date | max_name | previous_date | previous_name |
|----|------------|----------|---------------|---------------|
| 1 | 2018-05-01 | AAA | 2018-04-05 | BBB |
| 2 | 2018-05-21 | CCC | 2018-05-02 | AAA |
cte1 lists every row per ID together with a cutoff date 30 days earlier (dMinus30) and uses a ROW_NUMBER() window function, ordered by date descending, so that rn = 1 marks the most recent date (max_date and max_name). cte2 joins back to that row to get all dates within the 30 days before cte1's max date, then numbers them in ascending order so that rn = 1 marks the earliest date in that window. The outer query joins the two results to get the columns needed, selecting only the most recent row from cte1 and the earliest row from cte2.
I'm not sure how well it will scale with your data, but using the CTEs should optimize pretty well.
EDIT: For the additional requirement, I just added in another COUNT() window function to cte2.
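Here is a condensed sketch of the same idea (latest row per ID, then the earliest row inside the 30-day window before it), run against the sample rows above with Python's sqlite3 module. It is not the T-SQL from this answer: SQLite's date(x, '-30 days') stands in for DATEADD, and the CTE names latest and in_window are my own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (ID INTEGER, theDate TEXT, theName TEXT)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [(1, "2018-05-01", "AAA"), (1, "2018-04-21", "CCC"),
     (1, "2018-04-05", "BBB"), (1, "2018-03-27", "AAA"),
     (2, "2018-05-02", "AAA"), (2, "2018-05-21", "CCC"),
     (2, "2018-03-03", "BBB"), (2, "2018-01-20", "AAA")],
)

rows = conn.execute("""
    WITH latest AS (
        -- rn = 1 marks the most recent row per ID
        SELECT ID, theDate, theName,
               ROW_NUMBER() OVER (PARTITION BY ID ORDER BY theDate DESC) AS rn
        FROM t1
    ),
    in_window AS (
        -- rows no older than 30 days before the latest date; rn = 1 is the earliest
        SELECT t1.ID, t1.theDate, t1.theName,
               ROW_NUMBER() OVER (PARTITION BY t1.ID ORDER BY t1.theDate) AS rn
        FROM t1
        JOIN latest l ON l.ID = t1.ID AND l.rn = 1
        WHERE t1.theDate >= date(l.theDate, '-30 days')
    )
    SELECT l.ID, l.theDate, l.theName, w.theDate, w.theName
    FROM latest l
    JOIN in_window w ON w.ID = l.ID AND w.rn = 1
    WHERE l.rn = 1
    ORDER BY l.ID
""").fetchall()

for row in rows:
    print(row)
# (1, '2018-05-01', 'AAA', '2018-04-05', 'BBB')
# (2, '2018-05-21', 'CCC', '2018-05-02', 'AAA')
```

The dates are stored as ISO-8601 text, which is why plain string comparison orders them correctly in SQLite.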
I would do:
select id,
max(case when seqnum = 1 then date end) as max_date,
max(case when seqnum = 1 then name end) as max_name,
max(case when seqnum = 2 then date end) as prev_date,
max(case when seqnum = 2 then name end) as prev_name
from (select e.*, row_number() over (partition by id order by date desc) as seqnum
from example e
) e
group by id;
Query
Declare @table1 TABLE (accountno varchar(max), saved_amount decimal)
INSERT INTO @table1 VALUES
('001',25),
('002',5)
Declare @table2 TABLE (accountno varchar(max), payamount decimal,ilno int)
INSERT INTO @table2 VALUES
('001',10,1),
('001',10,2),
('001',10,3),
('001',10,4),
('002',10,1),
('002',10,2);
WITH aa
AS (
SELECT a.*
,b.ilno
,b.payamount
,SUM(payamount) OVER (
PARTITION BY a.accountno ORDER BY CAST(a.accountno AS INT)
,ilno
) AS total_amount
FROM @table1 a
LEFT JOIN @table2 b ON a.accountno = b.accountno
)
,bb
AS (
SELECT accountno
,MAX(ilno) AS ilno
FROM aa
WHERE saved_amount >= total_amount
GROUP BY accountno
)
SELECT a.* FROM aa a INNER JOIN bb b on a.accountno =b.accountno AND a.ilno = b.ilno
Result
accountno | saved_amount | ilno | payamount | total_amount
----------------------------------------------------------
001 | 25 | 2 | 10 | 20
Expected Result
accountno | saved_amount | ilno | payamount | total_amount
----------------------------------------------------------
001 | 25 | 2 | 10 | 20
002 | 5 | 1 | 10 | 10
What I want is:
If saved_amount is less than the first total_amount (the payamount of the first ilno), get the first ilno; otherwise
get the highest ilno where saved_amount >= total_amount.
You have a running total that you compare with the saved amount. You want the highest running total that doesn't exceed the saved amount. But in case even the initial pay amount exceeds the saved amount already, you want to default to this record. So the main task is to find a way of ranking the records. In my query I do it like this:
Prefer records where the running total does not exceed the saved amount.
Then look at the absolute value of their difference and take the smallest.
There are certainly other ways that achieve the same. Maybe even methods that you find more readable. Then just adjust the order by clause in the ranking query.
with summed as
(
  select
    t1.*,
    t2.ilno,
    t2.payamount,
    t2.total_amount
  from @table1 t1
  join
  (
    select
      accountno,
      ilno,
      payamount,
      -- running total of payments per account, in ilno order
      sum(payamount) over (partition by accountno order by ilno) as total_amount
    from @table2
  ) t2 on t2.accountno = t1.accountno
)
, ranked as
(
  select summed.*,
    row_number() over (partition by accountno
      -- rows that fit within the saved amount first, then the closest total
      order by case when saved_amount >= total_amount then 1 else 2 end,
               abs(saved_amount - total_amount)
    ) as rn
  from summed
)
select *
from ranked
where rn = 1;
This is not the "nearest sum", as you said in the title, but the one that obeys the specified rules. So with a saved amount of 100 and paid amounts of first 1 and then 100, you'd get the record with a total of 1 (which is 99 less than the saved amount) and not the one with a total of 101 (which is only 1 more than the saved amount).
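To check the ranking rule, including the 1-then-100 case just described, here is a sketch using Python's sqlite3 module. The table names saved and payments and account '003' are hypothetical additions for illustration; SQLite stands in for SQL Server, which is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE saved (accountno TEXT, saved_amount INTEGER)")
conn.execute("CREATE TABLE payments (accountno TEXT, payamount INTEGER, ilno INTEGER)")
conn.executemany(
    "INSERT INTO saved VALUES (?, ?)",
    [("001", 25), ("002", 5), ("003", 100)],  # '003' is the hypothetical 100 case
)
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?)",
    [("001", 10, 1), ("001", 10, 2), ("001", 10, 3), ("001", 10, 4),
     ("002", 10, 1), ("002", 10, 2),
     ("003", 1, 1), ("003", 100, 2)],
)

picked = conn.execute("""
    WITH summed AS (
        SELECT s.accountno, s.saved_amount, p.ilno, p.payamount,
               SUM(p.payamount) OVER (PARTITION BY p.accountno ORDER BY p.ilno)
                   AS total_amount
        FROM saved s
        JOIN payments p ON p.accountno = s.accountno
    ),
    ranked AS (
        SELECT summed.*,
               ROW_NUMBER() OVER (
                   PARTITION BY accountno
                   -- totals that fit within the saved amount win; then the closest
                   ORDER BY CASE WHEN saved_amount >= total_amount THEN 1 ELSE 2 END,
                            ABS(saved_amount - total_amount)
               ) AS rn
        FROM summed
    )
    SELECT accountno, ilno, total_amount
    FROM ranked
    WHERE rn = 1
    ORDER BY accountno
""").fetchall()

print(picked)  # [('001', 2, 20), ('002', 1, 10), ('003', 1, 1)]
```

Account '003' picks the total of 1 rather than 101, as described above.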
Another way to solve this uses flags:
first, calculate a flag that shows whether saved_amount >= the running total (total_amount) for the current row
then calculate three more flags:
group_flag, which shows whether there is any row where saved_amount >= total_amount for the given accountno
[min_ilno] and [max_ilno] for the given account
With these flags, the final result set is easy to compute. Here is the code:
WITH DataSource AS
(
SELECT a.*
,b.ilno
,b.payamount
,SUM(payamount) OVER (PARTITION BY a.accountno ORDER BY ilno) AS total_amount
,IIF(a.saved_amount >= SUM(payamount) OVER (PARTITION BY a.accountno ORDER BY ilno), 1, 0) AS [flag]
FROM @table1 a
LEFT JOIN @table2 b
ON a.accountno = b.accountno
),
DataSourceFinal AS
(
SELECT *
,MAX(flag) OVER (PARTITION BY accountno) as [group_flag]
,MIN(IIF(flag = 0 ,ilno, NULL)) OVER (PARTITION BY accountno) as [min_ilno]
,MAX(IIF(flag = 1 ,ilno, NULL)) OVER (PARTITION BY accountno) as [max_ilno]
FROM DataSource
)
SELECT accountno, saved_amount, ilno, payamount, total_amount
FROM DataSourceFinal
WHERE ([group_flag] = 1 AND [ilno] = [max_ilno])
OR ([group_flag] = 0 AND [ilno] = [min_ilno]);
I have Table1 with three columns:
Key | Date | Price
----------------------
1 | 26-May | 2
1 | 25-May | 2
1 | 24-May | 2
1 | 23-May | 3
1 | 22-May | 4
2 | 26-May | 2
2 | 25-May | 2
2 | 24-May | 2
2 | 23-May | 3
2 | 22-May | 4
I want to select the row where the price was last updated to the value 2 (24-May). The dates were ranked using the RANK function.
I am not able to get the desired results. Any help will be appreciated.
SELECT *
FROM (SELECT key, DATE, price,
RANK() over (partition BY key order by DATE DESC) AS r2
FROM Table1 ORDER BY DATE DESC) temp;
Another way of looking at the problem is that you want to find the most recent record with a price different from the last price. Then you want the next record.
with lastprice as (
select t.*
from (select t.*
from table1 t
order by date desc
) t
where rownum = 1
)
select t.*
from (select t.*
from table1 t
where date > (select max(date)
from table1 t2
where t2.price <> (select price from lastprice)
)
order by date asc
) t
where rownum = 1;
This query looks complicated. But, it is structured so it can take advantage of indexes on table1(date). The subqueries are necessary in Oracle pre-12. In the most recent version, you can use fetch first 1 row only.
EDIT:
Another solution is to use lag() and find the most recent time when the value changed:
select t1.*
from (select t1.*
from (select t1.*,
lag(price) over (order by date) as prev_price
from table1 t1
) t1
where prev_price is null or prev_price <> price
order by date desc
) t1
where rownum = 1;
Under many circumstances, I would expect the first version to have better performance, because the only heavy work is done in the innermost subquery to get the max(date). This version has to calculate the lag() as well as doing the order by. However, if performance is an issue, you should test on your data in your environment.
EDIT II:
My best guess is that you want this per key. Your original question says nothing about key, but:
select t1.*
from (select t1.*,
row_number() over (partition by key order by date desc) as seqnum
from (select t1.*,
lag(price) over (partition by key order by date) as prev_price
from table1 t1
) t1
where prev_price is null or prev_price <> price
order by date desc
) t1
where seqnum = 1;
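A sketch of the per-key lag() approach on the sample data, run with Python's sqlite3 module rather than Oracle (an assumption made for convenience). The column names item_key and price_date and the year 2020 are hypothetical; they avoid reserved words and give the dates a sortable ISO form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (item_key INTEGER, price_date TEXT, price INTEGER)")
# Same prices as in the question, for both keys (year 2020 is made up).
sample = [("2020-05-26", 2), ("2020-05-25", 2), ("2020-05-24", 2),
          ("2020-05-23", 3), ("2020-05-22", 4)]
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(k, d, p) for k in (1, 2) for d, p in sample],
)

last_changes = conn.execute("""
    SELECT item_key, price_date, price
    FROM (
        -- among the rows where the price changed, number them newest first
        SELECT c.*,
               ROW_NUMBER() OVER (PARTITION BY item_key ORDER BY price_date DESC)
                   AS seqnum
        FROM (
            -- prev_price is the price on the previous (earlier) date
            SELECT item_key, price_date, price,
                   LAG(price) OVER (PARTITION BY item_key ORDER BY price_date)
                       AS prev_price
            FROM table1
        ) c
        WHERE prev_price IS NULL OR prev_price <> price
    ) x
    WHERE seqnum = 1
    ORDER BY item_key
""").fetchall()

print(last_changes)  # [(1, '2020-05-24', 2), (2, '2020-05-24', 2)]
```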
You can try this:
SELECT Date FROM Table1
WHERE Price = 2
AND PrimaryKey = (SELECT MAX(PrimaryKey) FROM Table1
WHERE Price = 2)
This is very similar to the second option by Gordon Linoff but introduces a second windowed function row_number() to locate the most recent row that changed the price. This will work for all or a range of keys.
select
*
from (
select
*
, row_number() over(partition by Key order by [date] DESC) rn
from (
select
*
, NVL(lag(Price) over(partition by Key order by [date]),0) prevPrice -- ascending order, so lag() returns the earlier row's price
from table1
where Key IN (1,2,3,4,5) -- as an example
)
where Price <> prevPrice
)
where rn = 1
apologies but I haven't been able to test this at all.
I have spent quite some time dealing with the following:
Imagine that you have N number of groups with multiple records each and every record has unique starting and ending points.
In other words:
ID | GroupName | StartingPoint | EndingPoint | seq (row_number) | desired_seq
---|-----------|---------------|-------------|------------------|------------
1  | Grp1      | 2014-01-06    | 2014-01-07  | 1                | 1
2  | Grp1      | 2014-01-07    | 2014-01-08  | 2                | 2
3  | Grp2      | 2014-01-08    | 2014-01-09  | 1                | 1
4  | Grp1      | 2014-01-09    | 2014-01-10  | 3                | 1
5  | Grp2      | 2014-01-10    | 2014-01-11  | 2                | 1
As you can see, the starting point for every consecutive record is the same as the ending point of the previous.
Basically, I would like to obtain the minimums and maximums for each group based on the dates. Once a record with a new group name appears, it should be treated as a new group and the sequencing should reset.
A single row_number() function is not sufficient for this task since it doesn't reflect the change in the group names. (I have included a seq column in the sample data that shows the values generated by row_number.)
Desired result based on the sample data:
1 Grp1 |2014-01-06 | 2014-01-08
2 Grp2 |2014-01-08 | 2014-01-09
3 Grp1 |2014-01-09 | 2014-01-10
4 Grp2 |2014-01-10 | 2014-01-11
What I have tried:
;with cte as(
select *
, row_number() over (partition by GroupName order by startingpoint) as seq
from table1
)
select *
into #temp2
from cte t1
left join cte t2 on t1.id=t2.id and t1.seq= t2.seq-1
select *
,(select startingPoint from #temp2 t2 where t1.id=t2.id and t2.seq= (select MIN(seq) from #temp2)) as Oldest
,(select startingPoint from #temp2 t2 where t1.id=t2.id and t2.seq= (select MAX(seq) from #temp2)) as MostRecent
from #temp2 t1
This is a gaps-and-islands problem with subgrouping. The trick is grouping by the difference between two ROW_NUMBER() values, one partitioned and one unpartitioned.
WITH t AS (
SELECT
GroupName,
StartingPoint,
EndingPoint,
ROW_NUMBER() OVER(PARTITION BY GroupName ORDER BY StartingPoint)
- ROW_NUMBER() OVER(ORDER BY StartingPoint) AS SubGroupId
FROM #test
)
SELECT
ROW_NUMBER() OVER (ORDER BY MIN(StartingPoint)) AS SortOrderId,
GroupName AS GroupName,
MIN(StartingPoint) AS GroupStartingPoint,
MAX(EndingPoint) AS GroupEndingPoint
FROM t
GROUP BY GroupName, SubGroupId
ORDER BY SortOrderId
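To make the row-number difference concrete, here is a sketch that runs the same technique on the question's five rows with Python's sqlite3 module. SQLite is an assumed stand-in for SQL Server, and the grouping is wrapped in a CTE named islands (my own name) so the final ordering is done in a plain outer query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (Id INTEGER, GroupName TEXT, StartingPoint TEXT, EndingPoint TEXT)"
)
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?, ?)",
    [(1, "Grp1", "2014-01-06", "2014-01-07"),
     (2, "Grp1", "2014-01-07", "2014-01-08"),
     (3, "Grp2", "2014-01-08", "2014-01-09"),
     (4, "Grp1", "2014-01-09", "2014-01-10"),
     (5, "Grp2", "2014-01-10", "2014-01-11")],
)

islands = conn.execute("""
    WITH t AS (
        SELECT GroupName, StartingPoint, EndingPoint,
               -- constant within each uninterrupted run of one GroupName
               ROW_NUMBER() OVER (PARTITION BY GroupName ORDER BY StartingPoint)
               - ROW_NUMBER() OVER (ORDER BY StartingPoint) AS SubGroupId
        FROM test
    ),
    islands AS (
        SELECT GroupName,
               MIN(StartingPoint) AS GroupStartingPoint,
               MAX(EndingPoint)   AS GroupEndingPoint
        FROM t
        GROUP BY GroupName, SubGroupId
    )
    SELECT GroupName, GroupStartingPoint, GroupEndingPoint
    FROM islands
    ORDER BY GroupStartingPoint
""").fetchall()

for row in islands:
    print(row)
# ('Grp1', '2014-01-06', '2014-01-08')
# ('Grp2', '2014-01-08', '2014-01-09')
# ('Grp1', '2014-01-09', '2014-01-10')
# ('Grp2', '2014-01-10', '2014-01-11')
```

For these rows SubGroupId comes out as 0, 0, -2, -1, -3 for Ids 1 to 5, so Ids 1 and 2 fall into the same island and every other row forms its own.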
This is so much easier with the lag() functionality in SQL Server 2012. The way I approach these problems is to find where groups start, assigning a flag of 1 or 0 to each row. Then take a cumulative sum of the 1s to get a new group id.
In SQL Server 2008, you can do this with correlated subqueries (or joins):
with table1_flag as (
select t1.*,
isnull((select top 1 0
from table1 t2
where t2.groupname = t1.groupname and
t2.endingpoint = t1.startingpoint
), 1) as groupstartflag -- 1 when no row of the same group ends where this one starts
from table1 t1
),
table1_flag_cum as (
select tf.*,
(select sum(groupstartflag)
from table1_flag tf2
where tf2.groupname = tf.groupname and
tf2.startingpoint <= tf.startingpoint
) as groupnum
from table1_flag tf
)
select groupnum, groupname,
min(startingpoint) as startingpoint, max(endingpoint) as endingpoint
from table1_flag_cum
group by groupnum, groupname;
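The lag() version mentioned at the top of this answer is not shown, so here is one way it might look: a start flag from LAG() and a cumulative SUM() OVER as the group number. This is my own sketch under that reading, run with Python's sqlite3 module (SQLite as an assumed stand-in), not the answerer's code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (id INTEGER, groupname TEXT, startingpoint TEXT, endingpoint TEXT)"
)
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [(1, "Grp1", "2014-01-06", "2014-01-07"),
     (2, "Grp1", "2014-01-07", "2014-01-08"),
     (3, "Grp2", "2014-01-08", "2014-01-09"),
     (4, "Grp1", "2014-01-09", "2014-01-10"),
     (5, "Grp2", "2014-01-10", "2014-01-11")],
)

groups = conn.execute("""
    WITH flagged AS (
        SELECT t.*,
               -- 1 when the previous row of the same group does not end here
               CASE WHEN LAG(endingpoint) OVER (PARTITION BY groupname
                                                ORDER BY startingpoint) = startingpoint
                    THEN 0 ELSE 1 END AS groupstartflag
        FROM table1 t
    ),
    numbered AS (
        SELECT f.*,
               -- cumulative sum of the start flags gives a group number
               SUM(groupstartflag) OVER (PARTITION BY groupname
                                         ORDER BY startingpoint) AS groupnum
        FROM flagged f
    )
    SELECT groupname, MIN(startingpoint) AS startingpoint, MAX(endingpoint) AS endingpoint
    FROM numbered
    GROUP BY groupname, groupnum
    ORDER BY MIN(startingpoint)
""").fetchall()

for row in groups:
    print(row)
# ('Grp1', '2014-01-06', '2014-01-08')
# ('Grp2', '2014-01-08', '2014-01-09')
# ('Grp1', '2014-01-09', '2014-01-10')
# ('Grp2', '2014-01-10', '2014-01-11')
```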
Not sure, but maybe:
SELECT DISTINCT
GroupName,
MIN(StartingPoint) OVER (PARTITION BY GroupName ORDER BY Id),
MAX(EndingPoint) OVER (PARTITION BY GroupName ORDER BY Id)
FROM table1
Because PARTITION BY does not reduce the number of rows, the result initially contains duplicate entries, which DISTINCT removes.