grouping a table with different dates

I have a table like below,
SalesId ItemId DateSale USDVal
ABC 01A 2018-04-01 52
ABC 01B 2018-04-01 300
ABC 01C 2018-04-01 12
ABC 01D 2018-04-01 62
ABC 01A 2018-03-23 66
MNB 01A 2018-01-01 584
MNB 01A 2018-02-20 320
MNB 01F 2018-02-20 5
I want to write a query that selects the last date for each SalesId and shows those records so the result would look something like below.
Result
SalesId ItemId DateSale USDVal
ABC 01A 2018-04-01 52
ABC 01B 2018-04-01 300
ABC 01C 2018-04-01 12
ABC 01D 2018-04-01 62
MNB 01A 2018-02-20 320
MNB 01F 2018-02-20 5

In SQL Server, the fastest way is often a correlated subquery:
select t.*
from t
where t.datesale = (select max(t2.datesale) from t t2 where t2.salesid = t.salesid);
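A minimal sketch that runs the correlated subquery against the sample rows from the question. It uses Python's sqlite3 module rather than SQL Server, purely so the example is self-contained; storing the dates as ISO text and the added order by are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the SQL Server table `t`.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (salesid text, itemid text, datesale text, usdval integer)"
)
conn.executemany(
    "insert into t values (?, ?, ?, ?)",
    [
        ("ABC", "01A", "2018-04-01", 52),
        ("ABC", "01B", "2018-04-01", 300),
        ("ABC", "01C", "2018-04-01", 12),
        ("ABC", "01D", "2018-04-01", 62),
        ("ABC", "01A", "2018-03-23", 66),
        ("MNB", "01A", "2018-01-01", 584),
        ("MNB", "01A", "2018-02-20", 320),
        ("MNB", "01F", "2018-02-20", 5),
    ],
)

# Keep only the rows whose date equals the latest date of their own salesid.
# ISO-formatted text dates compare correctly as strings.
latest_rows = conn.execute(
    """
    select t.*
    from t
    where t.datesale = (select max(t2.datesale) from t t2 where t2.salesid = t.salesid)
    order by t.salesid, t.itemid
    """
).fetchall()

for row in latest_rows:
    print(row)
```

On SQL Server itself, an index on (salesid, datesale) is the usual companion to this pattern, since the subquery is evaluated per salesid.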

Related

LAG / OVER / PARTITION / ORDER BY using conditions - SQL Server 2017

I have a table that looks like this:
Date AccountID Amount
2018-01-01 123 12
2018-01-06 123 150
2018-02-14 123 11
2018-05-06 123 16
2018-05-16 123 200
2018-06-01 123 18
2018-06-15 123 17
2018-06-18 123 110
2018-06-30 123 23
2018-07-01 123 45
2018-07-12 123 116
2018-07-18 123 60
This table has multiple dates and IDs, along with multiple Amounts. For each individual row, I want to grab the last Date where Amount was over a specific value for that specific AccountID. I have been trying to use LAG( Date, 1 ) in combination with several variations of CASE and OVER ( PARTITION BY AccountID ORDER BY Date ) statements, but I've had no luck. Ultimately, this is what I would like my SELECT statement to return.
Date AccountID Amount LastOverHundred
2018-01-01 123 12 NULL
2018-01-06 123 150 2018-01-06
2018-02-14 123 11 2018-01-06
2018-05-06 123 16 2018-01-06
2018-05-16 123 200 2018-05-16
2018-06-01 123 18 2018-05-16
2018-06-15 123 17 2018-05-16
2018-06-18 123 110 2018-06-18
2018-06-30 123 23 2018-06-18
2018-07-01 123 45 2018-06-18
2018-07-12 123 116 2018-07-12
2018-07-18 123 60 2018-07-12
Any help with this would be greatly appreciated.

Use a cumulative conditional max():
select t.*,
max(case when amount > 100 then date end) over (partition by accountid order by date) as lastoverhundred
from t;
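A small sketch of the cumulative conditional max, run on the question's sample rows. It uses Python's sqlite3 module instead of SQL Server 2017 so it can run standalone, and it assumes the bundled SQLite is version 3.25 or later (the first release with window functions). With an order by in the window and no explicit frame, the default frame runs from the start of the partition to the current row, which is what makes the max cumulative.

```python
import sqlite3

# Sample rows copied from the question; dates stored as ISO text (an assumption).
rows_in = [
    ("2018-01-01", 123, 12),
    ("2018-01-06", 123, 150),
    ("2018-02-14", 123, 11),
    ("2018-05-06", 123, 16),
    ("2018-05-16", 123, 200),
    ("2018-06-01", 123, 18),
    ("2018-06-15", 123, 17),
    ("2018-06-18", 123, 110),
    ("2018-06-30", 123, 23),
    ("2018-07-01", 123, 45),
    ("2018-07-12", 123, 116),
    ("2018-07-18", 123, 60),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (date text, accountid integer, amount integer)")
conn.executemany("insert into t values (?, ?, ?)", rows_in)

# The case expression yields the date only on rows over 100 (NULL otherwise);
# the windowed max carries the latest such date forward within each account.
result = conn.execute(
    """
    select t.*,
           max(case when amount > 100 then date end)
               over (partition by accountid order by date) as lastoverhundred
    from t
    order by accountid, date
    """
).fetchall()

for row in result:
    print(row)
```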

Wrong results with group by for distinct count

I have these two queries for calculating a distinct count from a table for a particular date range. In my first query I group by location, aRId (which is a rule) and date. In my second query I don't group by date.
I am expecting the same distinct count in both results, but I get a total count of 6147 in the first result and 6359 in the second. What is wrong here? The only difference is the group by.
select
r.loc
,cast(r.date as DATE) as dateCol
,count(distinct r.dC) as dC_count
from table r
where r.date between '01-01-2018' and '06-02-2018'
and r.loc = 1
group by r.loc, r.aRId, cast(r.date as DATE)
select
r.loc
,count(distinct r.DC) as dC_count
from table r
where r.date between '01-01-2018' and '06-02-2018'
and r.loc = 1
group by r.loc, r.aRId
loc dateCol dC_count
1 2018-01-22 1
1 2018-03-09 2
1 2018-01-28 3
1 2018-01-05 1
1 2018-05-28 143
1 2018-02-17 1
1 2018-05-08 187
1 2018-05-31 146
1 2018-01-02 3
1 2018-02-14 1
1 2018-05-11 273
1 2018-01-14 1
1 2018-03-18 2
1 2018-02-03 1
1 2018-05-20 200
1 2018-05-14 230
1 2018-01-11 5
1 2018-01-31 1
1 2018-05-17 209
1 2018-01-20 2
1 2018-03-01 1
1 2018-01-03 3
1 2018-05-06 253
1 2018-05-26 187
1 2018-03-24 1
1 2018-02-09 1
1 2018-03-04 1
1 2018-05-03 269
1 2018-05-23 187
1 2018-05-29 133
1 2018-03-21 1
1 2018-03-27 1
1 2018-05-15 202
1 2018-03-07 1
1 2018-06-01 155
1 2018-02-21 1
1 2018-01-26 2
1 2018-02-15 2
1 2018-05-12 331
1 2018-03-10 1
1 2018-01-09 3
1 2018-02-18 1
1 2018-03-13 2
1 2018-05-09 184
1 2018-01-12 2
1 2018-03-16 1
1 2018-05-18 198
1 2018-02-07 1
1 2018-02-01 1
1 2018-01-15 3
1 2018-02-24 4
1 2018-03-19 1
1 2018-05-21 161
1 2018-02-10 1
1 2018-05-04 250
1 2018-05-30 148
1 2018-05-24 153
1 2018-01-24 1
1 2018-05-10 199
1 2018-03-08 1
1 2018-01-21 1
1 2018-05-27 151
1 2018-01-04 3
1 2018-05-07 236
1 2018-03-25 1
1 2018-03-11 2
1 2018-01-10 1
1 2018-01-30 1
1 2018-03-14 1
1 2018-02-19 1
1 2018-05-16 192
1 2018-01-13 5
1 2018-01-07 1
1 2018-03-17 3
1 2018-01-27 2
1 2018-02-22 1
1 2018-05-13 200
1 2018-02-08 2
1 2018-01-16 2
1 2018-03-03 1
1 2018-05-02 217
1 2018-05-22 163
1 2018-03-20 1
1 2018-02-05 2
1 2018-02-11 1
1 2018-01-19 2
1 2018-02-28 1
1 2018-05-05 332
1 2018-05-25 211
1 2018-03-23 1
1 2018-05-19 219
loc dC_count
1 6359

From "COUNT (Transact-SQL)"
COUNT(DISTINCT expression) evaluates expression for each row in a group, and returns the number of unique, nonnull values.
The distinct is relative to the group, not to the whole table (or selected subset). I think this might be your misconception here.
To better understand what this means, take the following simplified example:
CREATE TABLE group_test
(a varchar(1),
b varchar(1),
c varchar(1));
INSERT INTO group_test
(a,
b,
c)
VALUES ('a',
'r',
'x'),
('a',
's',
'x'),
('b',
'r',
'x'),
('b',
's',
'y');
If we GROUP BY a and select count(DISTINCT c)
SELECT a,
count(DISTINCT c) #
FROM group_test
GROUP BY a;
we get
a | #
----|----
a | 1
b | 2
As there is only c='x' for a='a', there is a distinct count of only 1 for this group, but 2 for the other group, as it has 'x' and 'y' in c. The sum of counts is 3 here.
Now if we GROUP BY a, b
SELECT a,
b,
count(DISTINCT c) #
FROM group_test
GROUP BY a,
b;
we get
a | b | #
----|----|----
a | r | 1
a | s | 1
b | r | 1
b | s | 1
We get 1 for every count here as each value of c is the only one in the group. And all of a sudden the sum of counts is 4.
And if we get the distinct count of c for the whole table
SELECT count(DISTINCT c) #
FROM group_test;
we get
#
----
2
which is a single count of 2.
The sum of the counts is different in each case but right nonetheless.
The more groups there are, the higher the chance for a value to be unique within that group. So your results seem totally plausible.

How to restrict the upper limit of rows while doing join in SQL?

I have two tables: balance and calendar.
Balance :
Account Date Balance
1111 01/01/2014 100
1111 02/01/2014 156
1111 03/01/2014 300
1111 04/01/2014 300
1111 07/01/2014 468
1112 02/01/2014 300
1112 03/01/2014 300
1112 06/01/2014 300
1112 07/01/2014 350
1112 08/01/2014 400
1112 09/01/2014 450
1113 01/01/2014 30
1113 02/01/2014 40
1113 03/01/2014 45
1113 06/01/2014 45
1113 07/01/2014 60
1113 08/01/2014 50
1113 09/01/2014 20
1113 10/01/2014 10
Calendar
date business_day_ind
01/01/2014 N
02/01/2014 Y
03/01/2014 Y
04/01/2014 N
05/01/2014 N
06/01/2014 Y
07/01/2014 Y
08/01/2014 Y
09/01/2014 Y
10/01/2014 Y
I need to do the following:
I need to fill in the missing days for all the accounts up to the maximum day for which each account has a value. For example, account 1111 has values only up to 07/01/2014, so the dates need to be filled only up to that day. But when I join with the calendar table (plain left join), I am not able to restrict the maximum day to the last day available for an account.
1111 01/01/2014 100 N
1111 02/01/2014 156 Y
1111 03/01/2014 300 Y
1111 04/01/2014 300 Y
1111 05/01/2014 N
1111 06/01/2014 N
1111 07/01/2014 468 Y
1111 08/01/2014 Y
1111 09/01/2014 Y
1111 10/01/2014 Y
1112 01/01/2014 N
1112 02/01/2014 300 Y
1112 03/01/2014 300 Y
1112 04/01/2014 N
1112 05/01/2014 N
1112 06/01/2014 300 Y
1112 07/01/2014 350 Y
1112 08/01/2014 400 Y
1112 09/01/2014 450 Y
1112 10/01/2014 Y
I need an efficient way (preferably not involving multiple steps) to restrict the dates up to an account's maximum balance available date (07/01/2014 in case of 1111,09/01/2014 in case 1112)
Desired output:
1111 01/01/2014 100 N
1111 02/01/2014 156 Y
1111 03/01/2014 300 Y
1111 04/01/2014 300 Y
1111 05/01/2014 N
1111 06/01/2014 N
1111 07/01/2014 468 Y
1112 01/01/2014 N
1112 02/01/2014 300 Y
1112 03/01/2014 300 Y
1112 04/01/2014 N
1112 05/01/2014 N
1112 06/01/2014 300 Y
1112 07/01/2014 350 Y
1112 08/01/2014 400 Y
1112 09/01/2014 450 Y
After filling the missing days, I am planning to impute the balance of previous business day to the missing days. I am planning to get previous business day for every date and do an update to missing rows by joining the original balance table with acct and previous business day as key.
Thanks.
I am using Greenplum database.

A possible way would be to put a second select in a subquery. For instance:
select ... from calendar a left outer join balance b on a.date = b.date
where a.date <= (select max(date) from balance c where b.Account = c.Account )

I suppose that you have a third table, accounts:
select
accounts.account,
calendar.date,
balance.balance,
calendar.business_day_ind
from
accounts cross join lateral (
select *
from calendar
where calendar.date <= (
select max(date)
from balance
where balance.account = accounts.account)) as calendar left join
balance on (balance.account = accounts.account and balance.date = calendar.date)
order by
accounts.account, calendar.date;
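A hedged sketch of the same idea without a separate accounts table: the per-account maximum date is derived from balance itself, the calendar is joined up to that date, and balance is left-joined back in. It runs on Python's sqlite3 module rather than Greenplum so it is self-contained. The column names balance_date and calendar_date, the ISO date text, the reading of the sample dates as DD/MM/YYYY, and the alias last_date are all assumptions made for illustration; only accounts 1111 and 1112 are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table balance (account integer, balance_date text, balance integer)")
conn.execute("create table calendar (calendar_date text, business_day_ind text)")

# Sample data from the question, dates rewritten as ISO text (assumed DD/MM/YYYY).
conn.executemany(
    "insert into balance values (?, ?, ?)",
    [
        (1111, "2014-01-01", 100), (1111, "2014-01-02", 156),
        (1111, "2014-01-03", 300), (1111, "2014-01-04", 300),
        (1111, "2014-01-07", 468),
        (1112, "2014-01-02", 300), (1112, "2014-01-03", 300),
        (1112, "2014-01-06", 300), (1112, "2014-01-07", 350),
        (1112, "2014-01-08", 400), (1112, "2014-01-09", 450),
    ],
)
conn.executemany(
    "insert into calendar values (?, ?)",
    [
        ("2014-01-01", "N"), ("2014-01-02", "Y"), ("2014-01-03", "Y"),
        ("2014-01-04", "N"), ("2014-01-05", "N"), ("2014-01-06", "Y"),
        ("2014-01-07", "Y"), ("2014-01-08", "Y"), ("2014-01-09", "Y"),
        ("2014-01-10", "Y"),
    ],
)

# One row per (account, calendar day) up to that account's last balance date;
# days without a balance row come back with balance = NULL.
filled = conn.execute(
    """
    select a.account, c.calendar_date, b.balance, c.business_day_ind
    from (select account, max(balance_date) as last_date
          from balance
          group by account) a
    join calendar c
      on c.calendar_date <= a.last_date
    left join balance b
      on b.account = a.account and b.balance_date = c.calendar_date
    order by a.account, c.calendar_date
    """
).fetchall()

for row in filled:
    print(row)
```

Because the date limit is applied in the join to calendar rather than in a where clause on the left-joined balance columns, days with no balance row are kept instead of being filtered out.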

That was a fun challenge!
CREATE TABLE balance
(account int, balance_date timestamp, balance int)
DISTRIBUTED BY (account, balance_date);
INSERT INTO balance
values (1111,'01/01/2014', 100),
(1111, '02/01/2014', 156),
(1111, '03/01/2014', 300),
(1111, '04/01/2014', 300),
(1111, '07/01/2014', 468),
(1112, '02/01/2014', 300),
(1112, '03/01/2014', 300),
(1112, '06/01/2014', 300),
(1112, '07/01/2014', 350),
(1112, '08/01/2014', 400),
(1112, '09/01/2014', 450),
(1113, '01/01/2014', 30),
(1113, '02/01/2014', 40),
(1113, '03/01/2014', 45),
(1113, '06/01/2014', 45),
(1113, '07/01/2014', 60),
(1113, '08/01/2014', 50),
(1113, '09/01/2014', 20),
(1113, '10/01/2014', 10);
CREATE TABLE calendar
(calendar_date timestamp, business_day_ind boolean)
DISTRIBUTED BY (calendar_date);
INSERT INTO calendar
values ('01/01/2014', false),
('02/01/2014', true),
('03/01/2014', true),
('04/01/2014', false),
('05/01/2014', false),
('06/01/2014', true),
('07/01/2014', true),
('08/01/2014', true),
('09/01/2014', true),
('10/01/2014', true);
analyze balance;
analyze calendar;
And now the query.
select d.account, d.my_date, b.balance, c.business_day_ind
from (
select account, start_date + interval '1 month' * (generate_series(0, duration)) AS my_date
from (
select account, start_date, (date_part('year', duration) * 12 + date_part('month', duration))::int as duration
from (
select start_date, age(end_date, start_date) as duration, account
from (
select account, min(balance_date) as start_date, max(balance_date) as end_date
from balance
group by account
) as sub1
) as sub2
) sub3
) as d
left outer join balance b on d.account = b.account and d.my_date = b.balance_date
join calendar c on c.calendar_date = d.my_date
order by d.account, d.my_date;
Results:
account | my_date | balance | business_day_ind
---------+---------------------+---------+------------------
1111 | 2014-01-01 00:00:00 | 100 | f
1111 | 2014-02-01 00:00:00 | 156 | t
1111 | 2014-03-01 00:00:00 | 300 | t
1111 | 2014-04-01 00:00:00 | 300 | f
1111 | 2014-05-01 00:00:00 | | f
1111 | 2014-06-01 00:00:00 | | t
1111 | 2014-07-01 00:00:00 | 468 | t
1112 | 2014-02-01 00:00:00 | 300 | t
1112 | 2014-03-01 00:00:00 | 300 | t
1112 | 2014-04-01 00:00:00 | | f
1112 | 2014-05-01 00:00:00 | | f
1112 | 2014-06-01 00:00:00 | 300 | t
1112 | 2014-07-01 00:00:00 | 350 | t
1112 | 2014-08-01 00:00:00 | 400 | t
1112 | 2014-09-01 00:00:00 | 450 | t
1113 | 2014-01-01 00:00:00 | 30 | f
1113 | 2014-02-01 00:00:00 | 40 | t
1113 | 2014-03-01 00:00:00 | 45 | t
1113 | 2014-04-01 00:00:00 | | f
1113 | 2014-05-01 00:00:00 | | f
1113 | 2014-06-01 00:00:00 | 45 | t
1113 | 2014-07-01 00:00:00 | 60 | t
1113 | 2014-08-01 00:00:00 | 50 | t
1113 | 2014-09-01 00:00:00 | 20 | t
1113 | 2014-10-01 00:00:00 | 10 | t
(25 rows)
I had to get the min and max dates for each account and then use generate_series to generate the months between the two dates. It would have been a bit cleaner query if you wanted a record for each day but I had to use another subquery to get the results at a monthly level.

How to add status to the table

The following table is an excerpt from my database. I have 2 types of contracts.
I: the client pays $60 for the first 6 months, then $120 for the next 6 months (client 111).
II: the client pays $60 for the first 6 months; if the client keeps paying $60, the contract is extended by 6 months, and the whole contract is 18 months (client 321, who keeps paying).
ID_Client | Amount | Amount_charge | Lenght | Date_from | Date_to | Reverse
--------------------------------------------------------------------------------
111 60 60 12 2015-01-01 2015-01-31 12
111 60 60 12 2015-02-01 2015-02-28 11
111 60 60 12 2015-03-01 2015-03-31 10
111 60 60 12 2015-04-01 2015-04-30 9
111 60 60 12 2015-05-01 2015-05-31 8
111 60 60 12 2015-06-01 2015-06-30 7
111 120 60 12 2015-07-01 2015-07-31 6
111 120 60 12 2015-08-01 2015-08-31 5
111 120 60 12 2015-09-01 2015-09-30 4
111 120 60 12 2015-10-01 2015-10-31 3
111 120 60 12 2015-11-01 2015-11-30 2
111 120 60 12 2015-12-01 2015-12-31 1
111 120 60 12 2016-01-01 2015-01-31 0
111 120 60 12 2016-02-01 2015-02-29 0
321 60 60 12 2015-01-01 2015-01-31 12
321 60 60 12 2015-02-01 2015-02-28 11
321 60 60 12 2015-03-01 2015-03-31 10
321 60 60 12 2015-04-01 2015-04-30 9
321 60 60 12 2015-05-01 2015-05-31 8
321 60 60 12 2015-06-01 2015-06-30 7
321 60 60 12 2015-07-01 2015-07-31 6
321 60 60 12 2015-08-01 2015-08-31 5
321 60 60 12 2015-09-01 2015-09-30 4
321 60 60 12 2015-10-01 2015-10-31 3
321 60 60 12 2015-11-01 2015-11-30 2
321 60 60 12 2015-12-01 2015-12-31 1
321 60 60 12 2016-01-01 2016-01-30 0
321 60 60 12 2016-02-01 2016-02-31 0
321 60 60 12 2016-03-01 2016-03-30 0
321 60 60 12 2016-04-01 2016-04-31 0
I need to add status column.
A - normal period of agreement
D - where the agreement is doubled after 6 months; after 12 months the status is E (end of agreement)
E - where the contract is finished
L - where the contract was extended after 6 months; after 18 months the status will be E
For client 321, the length of the contract was updated from 12 to 18 after 12 months.
I have a lot of clients, so I think it would be better to use a loop to go through all clients?
ID_Client | Amount | Amount_charge | Lenght | Date_from | Date_to | Reverse | Status
-----------------------------------------------------------------------------------------
111 60 60 12 2015-01-01 2015-01-31 12 A
111 60 60 12 2015-02-01 2015-02-28 11 A
111 60 60 12 2015-03-01 2015-03-31 10 A
111 60 60 12 2015-04-01 2015-04-30 9 A
111 60 60 12 2015-05-01 2015-05-31 8 A
111 60 60 12 2015-06-01 2015-06-30 7 A
111 120 60 12 2015-07-01 2015-07-31 6 D
111 120 60 12 2015-08-01 2015-08-31 5 D
111 120 60 12 2015-09-01 2015-09-30 4 D
111 120 60 12 2015-10-01 2015-10-31 3 D
111 120 60 12 2015-11-01 2015-11-30 2 D
111 120 60 12 2015-12-01 2015-12-31 1 D
111 120 60 12 2016-01-01 2015-01-31 0 E
111 120 60 12 2016-02-01 2015-02-29 0 E
321 60 60 12 2015-01-01 2015-01-31 12 A
321 60 60 12 2015-02-01 2015-02-28 11 A
321 60 60 12 2015-03-01 2015-03-31 10 A
321 60 60 12 2015-04-01 2015-04-30 9 A
321 60 60 12 2015-05-01 2015-05-31 8 A
321 60 60 12 2015-06-01 2015-06-30 7 A
321 60 60 12 2015-07-01 2015-07-31 6 L
321 60 60 12 2015-08-01 2015-08-31 5 L
321 60 60 12 2015-09-01 2015-09-30 4 L
321 60 60 12 2015-10-01 2015-10-31 3 L
321 60 60 12 2015-11-01 2015-11-30 2 L
321 60 60 12 2015-12-01 2015-12-31 1 L
321 60 60 18 2016-01-01 2016-01-30 0 L
321 60 60 18 2016-02-01 2016-02-31 0 L
321 60 60 18 2016-03-01 2016-03-30 0 L
321 60 60 18 2016-04-01 2016-04-31 0 L

If the Reverse column is what I think:
update table1 a
set "Status"=
CASE
WHEN A."Reverse" > 6 THEN
'A'
WHEN A."Reverse" > 0 THEN
DECODE (A."Amount", A."Amount_charge", 'L', 'D')
ELSE
CASE
WHEN A."Amount" <> A."Amount_charge" THEN
'E'
ELSE
CASE WHEN ADD_MONTHS ( (SELECT b."Date_from" FROM table1 b WHERE a."ID_Client" = b."ID_Client" AND b."Reverse" = 1),6) > a."Date_from" THEN 'L'
ELSE
'E'
END
END
END
It is better to calculate the running sums. The amount per month comes from the first payment. Something like this:
DECLARE
CURSOR c2
IS
SELECT ID_CLIENT, --AMOUNT, AMOUNT_CHARGE, LENGTH, DATE_FROM, DATE_TO, REVERSE, STATUS,
FIRST_VALUE (amount_charge) OVER (PARTITION BY id_client ORDER BY date_from) first_amount_charge,
SUM (amount) OVER (PARTITION BY id_client ORDER BY date_from) sum_amount,
SUM (amount_charge) OVER (PARTITION BY id_client ORDER BY date_from) sum_amount_charge
FROM TABLE2
FOR UPDATE NOWAIT;
BEGIN
FOR c1 IN c2
LOOP
UPDATE table2
SET status = CASE WHEN c1.sum_amount <= 6 * c1.first_amount_charge THEN 'A'
WHEN c1.sum_amount > 18 * c1.first_amount_charge THEN 'E'
WHEN c1.sum_amount > c1.sum_amount_charge THEN 'D'
ELSE 'L'
END
WHERE CURRENT OF c2;
END LOOP;
END;

SQL - Compare rows by id, date and amount

I need to SELECT a row in which issue_date = maturity_date of another row with same id, and same amount_usd.
I tried with self join, but I do not get right result.
Here is a simplified version of my table:
ID ISSUE_DATE MATURITY_DATE AMOUNT_USD
1 2010-01-01 00:00:00.000 2015-12-01 00:00:00.000 5000
1 2010-01-01 00:00:00.000 2001-09-19 00:00:00.000 700
2 2014-04-09 00:00:00.000 2019-04-09 00:00:00.000 400
1 2015-12-01 00:00:00.000 2016-12-31 00:00:00.000 5000
5 2015-02-24 00:00:00.000 2015-02-24 00:00:00.000 8000
4 2012-11-29 00:00:00.000 2015-11-29 00:00:00.000 10000
3 2015-01-21 00:00:00.000 2018-01-21 00:00:00.000 17500
2 2015-02-02 00:00:00.000 2015-12-05 00:00:00.000 12000
1 2015-01-12 00:00:00.000 2018-01-12 00:00:00.000 18000
2 2015-12-05 00:00:00.000 2016-01-10 00:00:00.000 12000
Result should be:
ID ISSUE_DATE MATURITY_DATE AMOUNT_USD
1 2015-12-01 00:00:00.000 2016-12-31 00:00:00.000 5000
2 2015-12-05 00:00:00.000 2016-01-10 00:00:00.000 12000
Thanks in advance!

Do the following: http://sqlfiddle.com/#!6/c0a02/1
select a.id, a.issue_date, a.maturity_date, a.amount_usd
from tbl a
inner join tbl b
on a.id = b.id
and a.maturity_date = b.issue_date
-- added to prevent same maturity date and issue date
where a.maturity_date <> a.issue_date
Output:
| id | issue_date | maturity_date | amount_usd |
|----|----------------------------|----------------------------|------------|
| 1 | January, 01 2010 00:00:00 | December, 01 2015 00:00:00 | 5000 |
| 2 | February, 02 2015 00:00:00 | December, 05 2015 00:00:00 | 12000 |
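As a hedged variant of the self-join, the sketch below selects the later row of each pair (alias b, whose issue_date equals the other row's maturity_date) and also requires equal amount_usd, as the question asks. It runs on Python's sqlite3 module so it is self-contained; the table name tbl follows the answer, while dropping the time part of the dates and the order by are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tbl (id integer, issue_date text, maturity_date text, amount_usd integer)"
)
# Sample rows from the question, with the 00:00:00.000 time part omitted.
conn.executemany(
    "insert into tbl values (?, ?, ?, ?)",
    [
        (1, "2010-01-01", "2015-12-01", 5000),
        (1, "2010-01-01", "2001-09-19", 700),
        (2, "2014-04-09", "2019-04-09", 400),
        (1, "2015-12-01", "2016-12-31", 5000),
        (5, "2015-02-24", "2015-02-24", 8000),
        (4, "2012-11-29", "2015-11-29", 10000),
        (3, "2015-01-21", "2018-01-21", 17500),
        (2, "2015-02-02", "2015-12-05", 12000),
        (1, "2015-01-12", "2018-01-12", 18000),
        (2, "2015-12-05", "2016-01-10", 12000),
    ],
)

# a is the earlier row, b the row issued on a's maturity date.
# The where clause stops a row with issue_date = maturity_date matching itself.
matches = conn.execute(
    """
    select b.id, b.issue_date, b.maturity_date, b.amount_usd
    from tbl a
    inner join tbl b
       on a.id = b.id
      and a.maturity_date = b.issue_date
      and a.amount_usd = b.amount_usd
    where a.maturity_date <> a.issue_date
    order by b.id
    """
).fetchall()

for row in matches:
    print(row)
```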
