LAG / OVER / PARTITION / ORDER BY using conditions - SQL Server 2017 - sql

I have a table that looks like this:
Date AccountID Amount
2018-01-01 123 12
2018-01-06 123 150
2018-02-14 123 11
2018-05-06 123 16
2018-05-16 123 200
2018-06-01 123 18
2018-06-15 123 17
2018-06-18 123 110
2018-06-30 123 23
2018-07-01 123 45
2018-07-12 123 116
2018-07-18 123 60
This table has multiple dates and IDs, along with multiple Amounts. For each individual row, I want grab the last Date where Amount was over a specific value for that specific AccountID. I have been trying to use the LAG( Date, 1 ) in combination with several variatons of CASE and OVER ( PARTITION BY AccountID ORDER BY Date ) statements but I've had no luck. Ultimately, this is what I would like my SELECT statement to return.
Date AccountID Amount LastOverHundred
2018-01-01 123 12 NULL
2018-01-06 123 150 2018-01-06
2018-02-14 123 11 2018-01-06
2018-05-06 123 16 2018-01-06
2018-05-16 123 200 2018-05-16
2018-06-01 123 18 2018-05-16
2018-06-15 123 17 2018-05-16
2018-06-18 123 110 2018-06-18
2018-06-30 123 23 2018-06-18
2018-07-01 123 45 2018-06-18
2018-07-12 123 116 2018-07-12
2018-07-18 123 60 2018-07-12
Any help with this would be greatly appreciated.

Use a cumulative conditional max():
select t.*,
max(case when amount > 100 then date end) over (partition by accountid order by date) as lastoverhundred
from t;

Related

MSSQL server SQL query how to utilize "group by"

Table Data
My_table_data.
testdate plant value
2020-05-14 02:00:00.000 1 68
2020-05-14 01:00:00.000 3 115
2020-05-14 02:00:00.000 2 107
2020-05-14 03:00:00.000 2 102
2020-05-14 05:00:00.000 2 104
2020-05-14 06:00:00.000 2 111
2020-05-14 08:00:00.000 1 74
2020-05-14 08:00:00.000 2 114
2020-05-14 09:00:00.000 1 70
2020-05-14 09:00:00.000 2 114
2020-05-14 03:00:00.000 3 106
2020-05-14 03:00:00.000 3 102
2020-05-15 02:00:00.000 2 108
2020-05-14 05:00:00.000 1 74
2020-05-14 04:00:00.000 3 96
2020-05-14 04:00:00.000 3 97
2020-05-14 06:00:00.000 1 80
2020-05-14 03:00:00.000 1 77
2020-05-14 06:00:00.000 3 102
I am novice at complex SQL query phrase. How to achieve the result like below.
Conditions
1. Latest data on timestamp(Hour)
2. single row for each plant.
WITH data AS(
SELECT testdate, plant, value, (DATEPART(HOUR,testdate))AS dH FROM My_table_data
WHERE testdate >= (SELECT CONVERT(DATETIME, CONVERT(DATE, GETDATE()), 120))
)
SELECT data.testdate, data.plant, data.value from data
WHERE data.dH= (SELECT MAX(data.dH) FROM data)
GROUP BY data.plant
ORDER BY data.plant DESC;
It gives error!!
testdate Plant value
2020-05-14 09:00:00.000 1 70
2020-05-14 09:00:00.000 2 114
2020-05-14 06:00:00.000 3 102
I think this should be as simple as
with maxData
AS
(
SELECT MAX(testDate) as testDate, plantId
FROM My_table_data
GROUP BY plantId
)
select
t.testDate,
t.plantId,
t.value
from My_table_data t
INNER JOIN maxData m
on t.testDate = m.testDate
and t.plantId = m.plantId
order by plantId
WHat this does is uses a CTE to find the max date for every plantId and then joins that info back to the original table to find the value corresponding to that plantId and date.

Wrong results with group by for distinct count

I have these two queries for calculating a distinct count from a table for a particular date range. In my first query I group by location, aRID ( which is a rule ) and date. In my second query I don't group by a date.
I am expecting the same distinct count in both the results but I get total count as 6147 in first result and 6359 in second result. What is wrong here? The difference is group by..
select
r.loc
,cast(r.date as DATE) as dateCol
,count(distinct r.dC) as dC_count
from table r
where r.date between '01-01-2018' and '06-02-2018'
and r.loc = 1
group by r.loc, r.aRId, cast(r.date as DATE)
select
r.loc
,count(distinct r.DC) as dC_count
from table r
and r.date between '01-01-2018' and '06-02-2018'
and r.loc = 1
group by r.loc, r.aRId
loc dateCol dC_count
1 2018-01-22 1
1 2018-03-09 2
1 2018-01-28 3
1 2018-01-05 1
1 2018-05-28 143
1 2018-02-17 1
1 2018-05-08 187
1 2018-05-31 146
1 2018-01-02 3
1 2018-02-14 1
1 2018-05-11 273
1 2018-01-14 1
1 2018-03-18 2
1 2018-02-03 1
1 2018-05-20 200
1 2018-05-14 230
1 2018-01-11 5
1 2018-01-31 1
1 2018-05-17 209
1 2018-01-20 2
1 2018-03-01 1
1 2018-01-03 3
1 2018-05-06 253
1 2018-05-26 187
1 2018-03-24 1
1 2018-02-09 1
1 2018-03-04 1
1 2018-05-03 269
1 2018-05-23 187
1 2018-05-29 133
1 2018-03-21 1
1 2018-03-27 1
1 2018-05-15 202
1 2018-03-07 1
1 2018-06-01 155
1 2018-02-21 1
1 2018-01-26 2
1 2018-02-15 2
1 2018-05-12 331
1 2018-03-10 1
1 2018-01-09 3
1 2018-02-18 1
1 2018-03-13 2
1 2018-05-09 184
1 2018-01-12 2
1 2018-03-16 1
1 2018-05-18 198
1 2018-02-07 1
1 2018-02-01 1
1 2018-01-15 3
1 2018-02-24 4
1 2018-03-19 1
1 2018-05-21 161
1 2018-02-10 1
1 2018-05-04 250
1 2018-05-30 148
1 2018-05-24 153
1 2018-01-24 1
1 2018-05-10 199
1 2018-03-08 1
1 2018-01-21 1
1 2018-05-27 151
1 2018-01-04 3
1 2018-05-07 236
1 2018-03-25 1
1 2018-03-11 2
1 2018-01-10 1
1 2018-01-30 1
1 2018-03-14 1
1 2018-02-19 1
1 2018-05-16 192
1 2018-01-13 5
1 2018-01-07 1
1 2018-03-17 3
1 2018-01-27 2
1 2018-02-22 1
1 2018-05-13 200
1 2018-02-08 2
1 2018-01-16 2
1 2018-03-03 1
1 2018-05-02 217
1 2018-05-22 163
1 2018-03-20 1
1 2018-02-05 2
1 2018-02-11 1
1 2018-01-19 2
1 2018-02-28 1
1 2018-05-05 332
1 2018-05-25 211
1 2018-03-23 1
1 2018-05-19 219
loc dC_count
1 6359
From "COUNT (Transact-SQL)"
COUNT(DISTINCT expression) evaluates expression for each row in a group, and returns the number of unique, nonnull values.
The distinct is relative to the group, not to the whole table (or selected subset). I think this might be your misconception here.
To better understand what this means, take the following simplified example:
CREATE TABLE group_test
(a varchar(1),
b varchar(1),
c varchar(1));
INSERT INTO group_test
(a,
b,
c)
VALUES ('a',
'r',
'x'),
('a',
's',
'x'),
('b',
'r',
'x'),
('b',
's',
'y');
If we GROUP BY a and select count(DISTINCT c)
SELECT a,
count(DISTINCT c) #
FROM group_test
GROUP BY a;
we get
a | #
----|----
a | 1
b | 2
As there is only c='x' for a=1, there is only a distinct count of 1 for this group but 2 for the other group as it has 'x'and 'y' in c. The sum of counts is 3 here.
Now if we GROUP BY a, b
SELECT a,
b,
count(DISTINCT c) #
FROM group_test
GROUP BY a,
b;
we get
a | b | #
----|----|----
a | r | 1
a | s | 1
b | r | 1
b | s | 1
We get 1 for every count here as each value of c is the only one in the group. And all of a sudden the sum of counts is 4.
And if we get the distinct count of c for the whole table
SELECT count(DISTINCT c) #
FROM group_test;
we get
#
----
2
which sums up to 2.
The sum of the counts is different in each case but right none the less.
The more groups there are, the higher the chance for a value to be unique within that group. So your results seem totally plausible.
db<>fiddle

grouping a table with different dates

I have a table like below,
SalesId ItemId DateSale USDVal
ABC 01A 2018-04-01 52
ABC 01B 2018-04-01 300
ABC 01C 2018-04-01 12
ABC 01D 2018-04-01 62
ABC 01A 2018-03-23 66
MNB 01A 2018-01-01 584
MNB 01A 2018-02-20 320
MNB 01F 2018-02-20 5
I want to write a query that selects the last date for each SalesId and shows those records so the result would look something like below.
Result
SalesId ItemId DateSale USDVal
ABC 01A 2018-04-01 52
ABC 01B 2018-04-01 300
ABC 01C 2018-04-01 12
ABC 01D 2018-04-01 62
MNB 01A 2018-02-20 320
MNB 01F 2018-02-20 5
In SQL Server, the fastest way is often a correlated subquery:
select t.*
from t
where t.datesale = (select max(t2.datesale) from t t2 where t2.salesid = t.salesid);

How to add status to the table

I have the following table where is clipping from my db. I have 2 types of contracts.
I: client pays for first 6mth 60$, next 6mth 120$ (111 client)
II: client pays for first 6mth 60$ but if want still pays 60$ the contract will be extended at 6mth, whole contract is 18mth. (321 client who still pays)
ID_Client | Amount | Amount_charge | Lenght | Date_from | Date_to | Reverse
--------------------------------------------------------------------------------
111 60 60 12 2015-01-01 2015-01-31 12
111 60 60 12 2015-02-01 2015-02-28 11
111 60 60 12 2015-03-01 2015-03-31 10
111 60 60 12 2015-04-01 2015-04-30 9
111 60 60 12 2015-05-01 2015-05-31 8
111 60 60 12 2015-06-01 2015-06-30 7
111 120 60 12 2015-07-01 2015-07-31 6
111 120 60 12 2015-08-01 2015-08-31 5
111 120 60 12 2015-09-01 2015-09-30 4
111 120 60 12 2015-10-01 2015-10-31 3
111 120 60 12 2015-11-01 2015-11-30 2
111 120 60 12 2015-12-01 2015-12-31 1
111 120 60 12 2016-01-01 2015-01-31 0
111 120 60 12 2016-02-01 2015-02-29 0
321 60 60 12 2015-01-01 2015-01-31 12
321 60 60 12 2015-02-01 2015-02-28 11
321 60 60 12 2015-03-01 2015-03-31 10
321 60 60 12 2015-04-01 2015-04-30 9
321 60 60 12 2015-05-01 2015-05-31 8
321 60 60 12 2015-06-01 2015-06-30 7
321 60 60 12 2015-07-01 2015-07-31 6
321 60 60 12 2015-08-01 2015-08-31 5
321 60 60 12 2015-09-01 2015-09-30 4
321 60 60 12 2015-10-01 2015-10-31 3
321 60 60 12 2015-11-01 2015-11-30 2
321 60 60 12 2015-12-01 2015-12-31 1
321 60 60 12 2016-01-01 2016-01-30 0
321 60 60 12 2016-02-01 2016-02-31 0
321 60 60 12 2016-03-01 2016-03-30 0
321 60 60 12 2016-04-01 2016-04-31 0
I need to add status column.
A - normal period of agreement
D - where the agreement is doubled after 6mth but after 12mth is E(nd of agreemnt)
E - where contract is finished
L - where contract after 6mth was extended, after 18mth the status will be type E
For 321 Client after 12mth the lenght of contract was updated from 12 to 18
I have a lot of clients so i think better will be using loop to go by all clients?
ID_Client | Amount | Amount_charge | Lenght | Date_from | Date_to | Reverse | Status
-----------------------------------------------------------------------------------------
111 60 60 12 2015-01-01 2015-01-31 12 A
111 60 60 12 2015-02-01 2015-02-28 11 A
111 60 60 12 2015-03-01 2015-03-31 10 A
111 60 60 12 2015-04-01 2015-04-30 9 A
111 60 60 12 2015-05-01 2015-05-31 8 A
111 60 60 12 2015-06-01 2015-06-30 7 A
111 120 60 12 2015-07-01 2015-07-31 6 D
111 120 60 12 2015-08-01 2015-08-31 5 D
111 120 60 12 2015-09-01 2015-09-30 4 D
111 120 60 12 2015-10-01 2015-10-31 3 D
111 120 60 12 2015-11-01 2015-11-30 2 D
111 120 60 12 2015-12-01 2015-12-31 1 D
111 120 60 12 2016-01-01 2015-01-31 0 E
111 120 60 12 2016-02-01 2015-02-29 0 E
321 60 60 12 2015-01-01 2015-01-31 12 A
321 60 60 12 2015-02-01 2015-02-28 11 A
321 60 60 12 2015-03-01 2015-03-31 10 A
321 60 60 12 2015-04-01 2015-04-30 9 A
321 60 60 12 2015-05-01 2015-05-31 8 A
321 60 60 12 2015-06-01 2015-06-30 7 A
321 60 60 12 2015-07-01 2015-07-31 6 L
321 60 60 12 2015-08-01 2015-08-31 5 L
321 60 60 12 2015-09-01 2015-09-30 4 L
321 60 60 12 2015-10-01 2015-10-31 3 L
321 60 60 12 2015-11-01 2015-11-30 2 L
321 60 60 12 2015-12-01 2015-12-31 1 L
321 60 60 18 2016-01-01 2016-01-30 0 L
321 60 60 18 2016-02-01 2016-02-31 0 L
321 60 60 18 2016-03-01 2016-03-30 0 L
321 60 60 18 2016-04-01 2016-04-31 0 L
If the Reverse column is what I think:
update table1 a
set "Status"=
CASE
WHEN A."Reverse" > 6 THEN
'A'
WHEN A."Reverse" > 0 THEN
DECODE (A."Amount", A."Amount_charge", 'L', 'D')
ELSE
CASE
WHEN A."Amount" <> A."Amount_charge" THEN
'E'
ELSE
CASE WHEN ADD_MONTHS ( (SELECT b."Date_from" FROM table1 b WHERE a."ID_Client" = b."ID_Client" AND b."Reverse" = 1),6) > a."Date_from" THEN 'L'
ELSE
'E'
END
END
END
Better is to calculate the sums. The amount per month come from first payment. Something like this:
DECLARE
CURSOR c2
IS
SELECT ID_CLIENT, --AMOUNT, AMOUNT_CHARGE, LENGTH, DATE_FROM, DATE_TO, REVERSE, STATUS,
FIRST_VALUE (amount_charge) OVER (PARTITION BY id_client ORDER BY date_from) first_amount_charge,
SUM (amount) OVER (PARTITION BY id_client ORDER BY date_from) sum_amount,
SUM (amount_charge) OVER (PARTITION BY id_client ORDER BY date_from) sum_amount_charge
FROM TABLE2
FOR UPDATE NOWAIT;
BEGIN
FOR c1 IN c2
LOOP
UPDATE table2
SET status = CASE WHEN c1.sum_amount <= 6 * c1.first_amount_charge THEN 'A'
WHEN c1.sum_amount > 18 * c1.first_amount_charge THEN 'E'
WHEN c1.sum_amount > c1.sum_amount_charge THEN 'D'
ELSE 'L'
END
WHERE CURRENT OF c2;
END LOOP;
END;

SQL - Compare rows by id, date and amount

I need to SELECT a row in which issue_date = maturity_date of another row with same id, and same amount_usd.
I tried with self join, but I do not get right result.
Here is a simplified version of my table:
ID ISSUE_DATE MATURITY_DATE AMOUNT_USD
1 2010-01-01 00:00:00.000 2015-12-01 00:00:00.000 5000
1 2010-01-01 00:00:00.000 2001-09-19 00:00:00.000 700
2 2014-04-09 00:00:00.000 2019-04-09 00:00:00.000 400
1 2015-12-01 00:00:00.000 2016-12-31 00:00:00.000 5000
5 2015-02-24 00:00:00.000 2015-02-24 00:00:00.000 8000
4 2012-11-29 00:00:00.000 2015-11-29 00:00:00.000 10000
3 2015-01-21 00:00:00.000 2018-01-21 00:00:00.000 17500
2 2015-02-02 00:00:00.000 2015-12-05 00:00:00.000 12000
1 2015-01-12 00:00:00.000 2018-01-12 00:00:00.000 18000
2 2015-12-05 00:00:00.000 2016-01-10 00:00:00.000 12000
Result should be:
ID ISSUE_DATE MATURITY_DATE AMOUNT_USD
1 2015-12-01 00:00:00.000 2016-12-31 00:00:00.000 5000
2 2015-12-05 00:00:00.000 2016-01-10 00:00:00.000 12000
Thanks in advance!
Do following: http://sqlfiddle.com/#!6/c0a02/1
select a.id, a.issue_date, a.maturity_date, a.amount_usd
from tbl a
inner join tbl b
on a.id = b.id
and a.maturity_date = b.issue_date
-- added to prevent same maturity date and issue date
where a.maturity_date <> a.issue_date
Output:
| id | issue_date | maturity_date | amount_usd |
|----|----------------------------|----------------------------|------------|
| 1 | January, 01 2010 00:00:00 | December, 01 2015 00:00:00 | 5000 |
| 2 | February, 02 2015 00:00:00 | December, 05 2015 00:00:00 | 12000 |