Removing duplicates from 'over partition by' - sql

I'm using an OVER (PARTITION BY ...) clause to calculate people's monthly figures.
A short example of my results:
Date Person Team Daily Figure Month To Date
24/09/17 James 2 50 200
24/09/17 James 2 50 200
25/09/17 James 2 50 200
25/09/17 James 2 50 200
I am calculating the monthly figure by partitioning the daily figure over the person and the month e.g.
CASE
WHEN
MONTH([DATE]) = MONTH(getdate())
THEN SUM(Figure)
OVER (PARTITION BY [Name],
MONTH([DATE]))
ELSE 0
END AS 'Month To Date'
The main issue I'm having is that I only want to display today's daily figure, but alongside the whole month's value. I group the figures for each person and limit the result to today, but to be able to group by person I need to SUM the month-to-date figure, which leaves me with:
Date Person Team Daily Figure Month To Date
25/09/17 James 2 100 800
The daily figure is correct, but the query is summing the duplicated rows, which gives me an incorrect month-to-date figure.
The ideal result for today would be:
Date Person Team Daily Figure Month To Date
25/09/17 James 2 100 200
with no duplicated rows.
Has anyone got any advice on this? Basically, I want to remove the duplicated partitioned rows so that I just get the correct month-to-date figure, grouped.
UPDATE:
Each row is just an individual figure for one person. Nothing is grouped, so each person could have at least 20 separate rows of figures for each day.

Something like this?
declare @t table (Date date, Person varchar(100), Team int, [Daily Figure] int);
insert into @t values
('20170924', 'James', 2, 50),
('20170924', 'James', 2, 50),
('20170925', 'James', 2, 50),
('20170925', 'James', 2, 50),
('20170801', 'James', 2, 80),
('20170802', 'James', 2, 80);
-- the inner sum() collapses each day to one row; the outer windowed sum() totals those daily rows per month
select Date, Person, Team, sum([Daily Figure]) as [Daily Figure],
sum(sum([Daily Figure])) over(partition by Person, Team, month(date)) as [month to date figure]
from @t
group by Date, Person, Team;
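For anyone who wants to try the idea outside SQL Server, here is a minimal sketch that runs through Python's sqlite3 module. It is not the T-SQL above: it splits the nested aggregate into two steps (a CTE that collapses each day to one row, then a windowed SUM over those rows), and the table name figures and column name day are made up for the illustration. It also partitions by year and month rather than month alone, which is my own assumption. Window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and column names, mirroring the sample data above
conn.execute("CREATE TABLE figures (day TEXT, person TEXT, team INTEGER, figure INTEGER)")
conn.executemany(
    "INSERT INTO figures VALUES (?, ?, ?, ?)",
    [
        ("2017-09-24", "James", 2, 50),
        ("2017-09-24", "James", 2, 50),
        ("2017-09-25", "James", 2, 50),
        ("2017-09-25", "James", 2, 50),
        ("2017-08-01", "James", 2, 80),
        ("2017-08-02", "James", 2, 80),
    ],
)

query = """
WITH daily AS (
    -- step 1: one row per person per day
    SELECT day, person, team, SUM(figure) AS daily_figure
    FROM figures
    GROUP BY day, person, team
)
-- step 2: total the daily rows per person per calendar month
SELECT day, person, team, daily_figure,
       SUM(daily_figure) OVER (
           PARTITION BY person, team, strftime('%Y-%m', day)
       ) AS month_figure
FROM daily
ORDER BY day
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because the window runs over the already-grouped daily rows, the duplicated detail rows are never counted twice in the monthly total.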

Related

How to get data of 12-month ago

In Oracle database, I have this data in a table:
person  category  month       profit
John    A         Jun-1-2022  100
Mary    A         May-1-2022  200
John    B         Jun-1-2021  230
John    A         Jun-1-2021  430
I need to add a new column into this table, called 'Same_month_last_year', which contains the data of same month last year. For example, John's data would be 430 for row 1.
I know of an Oracle function called ADD_MONTHS, but I'm new to programming (a finance student) and cannot figure out how to use ADD_MONTHS to create this new column. Could you please advise?
Use the SUM analytic function with a range window:
SELECT t.*,
SUM(profit) OVER (
PARTITION BY person
ORDER BY month
RANGE BETWEEN INTERVAL '12' MONTH PRECEDING
AND INTERVAL '12' MONTH PRECEDING
) AS last_year_profit
FROM table_name t
Which, for the sample data:
CREATE TABLE table_name (person, month, profit) AS
SELECT 'John', DATE '2022-06-01', 100 FROM DUAL UNION ALL
SELECT 'Mary', DATE '2022-05-01', 200 FROM DUAL UNION ALL
SELECT 'John', DATE '2021-06-01', 430 FROM DUAL;
Outputs:
PERSON  MONTH      PROFIT  LAST_YEAR_PROFIT
John    01-JUN-21  430     null
John    01-JUN-22  100     430
Mary    01-MAY-22  200     null
A table column cannot hold a value that is calculated dynamically from other rows. You need to create a view and assign IDs to each person. Then, in the view's script, the calculation of the new column will work like this (t is the alias of the table in the view's outer query, t2 is the same table looked up again):
(select t2.profit
from table_name t2
where
t2.month = add_months(t.month, -12)
and t2.id = t.id) as same_month_last_year
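The same-month-last-year lookup can also be written as a self-join. Below is a rough sketch using Python's sqlite3 so it can be run anywhere. Note the assumptions: SQLite's date(month, '-12 months') stands in for Oracle's ADD_MONTHS, the table name profits is made up, months are stored as ISO text, and I match on category as well as person, which the question's sample data suggests but does not state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; rows are the sample data from the question
conn.execute("CREATE TABLE profits (person TEXT, category TEXT, month TEXT, profit INTEGER)")
conn.executemany(
    "INSERT INTO profits VALUES (?, ?, ?, ?)",
    [
        ("John", "A", "2022-06-01", 100),
        ("Mary", "A", "2022-05-01", 200),
        ("John", "B", "2021-06-01", 230),
        ("John", "A", "2021-06-01", 430),
    ],
)

query = """
SELECT cur.person, cur.category, cur.month, cur.profit,
       prev.profit AS same_month_last_year
FROM profits AS cur
LEFT JOIN profits AS prev
       ON prev.person = cur.person
      AND prev.category = cur.category
      -- SQLite date modifier, used here in place of ADD_MONTHS(cur.month, -12)
      AND prev.month = date(cur.month, '-12 months')
ORDER BY cur.person, cur.category, cur.month
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The LEFT JOIN keeps rows that have no counterpart a year earlier; for those, same_month_last_year comes back as NULL (None in Python).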

Sum over N days excluding Weekends and Holidays

I have below table
AccountID  Date        Amount
123        07/02/2021  2000
123        07/09/2021  9000
123        07/15/2021  500
123        07/20/2021  500
123        07/28/2021  500
I am trying to create a test script to test data for just one month (July). I want to sum the amount over 5 days, where the 5 days do not count weekends and holidays. For July, the holiday falls on July 5th, 2021 (07/05/2021).
The output should look something like below
AccountID  Date        Amount
123        07/02/2021  11000
123        07/09/2021  9500
123        07/15/2021  1000
123        07/20/2021  500
123        07/28/2021  500
Below is the table create and data insert statements for reference :-
create table TRANSACTIONS (
AccountID int,
Date date,
Amount int
)
insert into TRANSACTIONS values (123, '07/02/2021', 2000)
insert into TRANSACTIONS values (123, '07/09/2021', 9000)
insert into TRANSACTIONS values (123, '07/15/2021', 500)
insert into TRANSACTIONS values (123, '07/20/2021', 500)
insert into TRANSACTIONS values (123, '07/28/2021', 500)
I was able to write a script that sums over 5 days while skipping weekends (Saturday and Sunday), but I cannot work out how to also skip the holiday on July 5th, 2021. I am fine with hardcoding it, since this is just for testing purposes. The condition 'DATEPART(WEEKDAY, h2.Date) not in (1, 7)' skips weekends. In 'DATEADD(d, 6, h1.Date)' I add 6 rather than 5, even though the sum should cover 5 days, because after reading some articles I figured that the last day is not inclusive when skipping weekends. This code sums correctly over 5 days, skipping weekends:
SELECT AccountId, Date,
(
SELECT SUM(Amount)
FROM TRANSACTIONS h2
WHERE
h1.AccountID = h2.AccountID and
DATEPART(WEEKDAY, h2.Date) not in (1, 7) and
h2.Date between h1.Date AND DATEADD(d, 6, h1.Date)
) as SumAmount
FROM TRANSACTIONS h1
The only sane way to tackle this is to have a calendar table to represent holidays. The easiest approach is to store every date in the range you're likely to need (e.g. 1970-2030) along with the type of the date, perhaps an enum of WORKDAY, WEEKEND, HOLIDAY or whatever works, e.g.
CREATE TABLE CALENDAR (
Date DATE,
Day_type varchar(16)
);
-- insert rows for dates you care about
Depending on where you live, you may need to include a region column too (typically the country and/or state).
With such a table, you join to it:
SELECT
AccountId,
DATEADD(DAY, (DATEDIFF(DAY, 0, t.Date)/7)*7 + 7, 0) as Date,
SUM(Amount)
FROM TRANSACTIONS t
JOIN CALENDAR c on t.Date = c.Date
AND c.day_type = 'WORKDAY'
WHERE t.Date BETWEEN <your date range>
GROUP BY AccountId, DATEADD(DAY, (DATEDIFF(DAY, 0, t.Date)/7)*7 + 7, 0)
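To sanity-check the numbers the question expects, here is a small pure-Python sketch of a rolling sum over 5 business days. It is not the calendar-table query above: the hardcoded HOLIDAYS set stands in for the calendar table, and the helper names (is_business_day, window_end, rolling_sums) are made up for the illustration. It assumes the window starts on the transaction's own date and that date counts as day 1.

```python
from datetime import date, timedelta

# Stand-in for a calendar table: the single holiday the question mentions
HOLIDAYS = {date(2021, 7, 5)}

transactions = [
    (123, date(2021, 7, 2), 2000),
    (123, date(2021, 7, 9), 9000),
    (123, date(2021, 7, 15), 500),
    (123, date(2021, 7, 20), 500),
    (123, date(2021, 7, 28), 500),
]

def is_business_day(d):
    # Monday=0 ... Friday=4; weekends and listed holidays do not count
    return d.weekday() < 5 and d not in HOLIDAYS

def window_end(start, n_days=5):
    """Return the n-th business day, counting start as day 1 if it is a business day."""
    d = start
    counted = 1 if is_business_day(d) else 0
    while counted < n_days:
        d += timedelta(days=1)
        if is_business_day(d):
            counted += 1
    return d

def rolling_sums(rows, n_days=5):
    result = []
    for account, start, _ in rows:
        end = window_end(start, n_days)
        total = sum(
            amount
            for acc, d, amount in rows
            if acc == account and start <= d <= end and is_business_day(d)
        )
        result.append((account, start, total))
    return result

sums = rolling_sums(transactions)
for account, start, total in sums:
    print(account, start.strftime("%m/%d/%Y"), total)
```

With the sample rows this reproduces the totals listed in the question (11000, 9500, 1000, 500, 500). In a real database the same window end could be looked up from the calendar table instead of being computed in a loop.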

Return absolute number, and percentage together

I am trying to calculate the total number of [visits] from my hospital database, so that my Python script can read the result and send a weekly summary to our team. I am still learning SQL, so I am wondering if anyone can help me with my query.
Goal Table format:
- Date (prefer dd/mm/yyyy)
- Patient_Name (e.g John)
- Patient_Id (e.g 12345)
- Visits
- Professionals (Categorical Variables: Nurse, Doctor, Assistant Nurse)
So, I want a query that lists the total visits by nurses in a specific date range, and the percentage of total visits from all professionals for a specific patient in a week. For example, if a Nurse visits the patient (John) 15 times, an Assistant Nurse visits 10 times, and a Doctor visits 5 times per week, my final table would be this:
____________________________________________
|____Date_____|__Prof__|__Visits_|___Percen__|
|06/01/2018 | Nurse | 15 | 0.5 |
|02/11/2017 | A-Nurse| 10 | 0.33 |
|19/04/2016 | Nurse | 5 | 0.16 |
|
Below is my SQL statement in SSMS. I used a CASE statement for the Professionals data because, depending on patient needs, sometimes a therapist visits instead of a nurse/doctor, so I would like that part to be dynamic:
SELECT CONVERT(VARCHAR(10), [myDate], 101), SUM([visits]) AS [Date] , [Professionals], ((SELECT [Visits] from MyHospitalTable)* 100 / (Select SUM([Visits]) From MyHospitalTable)) as Percen
FROM
(SELECT
Count(*) as [total],
[Date] as [myDate],
[Patient_id] as [myPatient_Id],
[Patient_Name] as [myPatient_Name],
[visits] as [visits],
CASE
WHEN [Professionals] LIKE '%Nurse%' THEN 'Nurse'
WHEN [Professionals] LIKE '%Therapist%' THEN 'Therapy'
else 'Unknown'
END AS [Professionals]
FROM [MyHospitalTable]
) a
GROUP BY [myDate]
I understand that my query is not correct and needs improvement; if anyone can help me get the data, that would be awesome.
Thanks in advance.
You can calculate the grand total using a window function and then find the percentage.
The query below gives you an idea of how to do that; it is not exactly compatible with your table, though.
;with ct as (
select MyDate, sum(Visits) Visits
-- windowed sum over the grouped rows = grand total across all dates
, sum(sum(Visits)) over () TotalVisits
from HospitalTable
group by MyDate
)
-- 100.0 avoids integer division
select MyDate, Visits, (Visits * 100.0 / TotalVisits) Percen
from ct
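Here is a runnable sketch of the share-of-total idea, grouped by professional instead of by date, using Python's sqlite3. The table name visits_log and the split of the Nurse visits into two rows are made up for the example; the totals (15, 10, 5) are the ones from the question. It needs SQLite 3.25 or later for the window function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; each row is a batch of visits by one kind of professional
conn.execute("CREATE TABLE visits_log (professional TEXT, visits INTEGER)")
conn.executemany(
    "INSERT INTO visits_log VALUES (?, ?)",
    [("Nurse", 8), ("Nurse", 7), ("A-Nurse", 10), ("Doctor", 5)],
)

query = """
WITH per_prof AS (
    SELECT professional, SUM(visits) AS visits
    FROM visits_log
    GROUP BY professional
)
SELECT professional, visits,
       -- SUM(...) OVER () is the grand total; 1.0 forces non-integer division
       ROUND(visits * 1.0 / SUM(visits) OVER (), 2) AS share
FROM per_prof
ORDER BY visits DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The empty OVER () makes the window span every row of per_prof, so each row sees the same grand total without a second subquery.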

How to validate if there was 12 sequential payments

As an example:
I have a scenario where we receive payments, a single payment per family, and register each payment with its amount in the DB.
A family can move their loan from bank1 to bank2 only if they have 12 or more sequential payments.
For example, suppose they have registered a payment for
oct, nov, dec, jan, feb, mar, apr, may, jun, jul, aug, and sept.
If feb did not receive a payment, the count starts over at march.
Coworkers are suggesting that the best approach is to count the total payments at every payment registration and store the running number of sequential payments in an int column called sequential,
like this:
Payment Family Bank Date Sequential
---------------------------------------------------------
1200 2 1 10-22-2009 1
1200 2 1 11-22-2009 2
.
.
.
1200 2 1 08-22-2010 11
1200 2 1 09-22-2010 12
I think there must be an approach where the sequential column is unnecessary, one where I can validate whether the last 12 rows (ordered by Date DESC) are sequential, with exactly one month between each.
any ideas?
Edited:
There will be millions of rows in this table.
I would also prefer to keep only the dates in the table and work with them at the application level.
Analytics!
Data:
create table payments
(amount number,
family number,
bank number,
payment_date date
);
insert into payments values (1200, 2, 1, date '2010-01-01');
insert into payments values (1200, 2, 1, date '2010-02-02');
insert into payments values (1200, 2, 1, date '2010-03-03');
insert into payments values (1200, 2, 1, date '2010-04-04');
insert into payments values (1200, 2, 1, date '2010-05-05');
insert into payments values (1200, 2, 1, date '2010-06-07');
insert into payments values (1200, 2, 1, date '2010-07-07');
--skip august
--insert into payments values (1200, 2, 1, date '2010-08-08');
insert into payments values (1200, 2, 1, date '2010-09-09');
insert into payments values (1200, 2, 1, date '2010-10-10');
insert into payments values (1200, 2, 1, date '2010-11-11');
--double pay november
insert into payments values (1200, 2, 1, date '2010-11-30');
insert into payments values (1200, 2, 1, date '2010-12-12');
Query:
select *
from (select family, bank,
trunc(payment_date, 'mon') as payment_month,
lead ( trunc(payment_date, 'mon'))
over ( partition by family
order by payment_date)
as next_payment_month
from payments
order by payment_date desc
)
-- eliminate multiple payments in month
where payment_month <> next_payment_month
-- find a gap
and add_months(payment_month, 1) <> (next_payment_month)
-- stop at the first gap
and rownum = 1
Results:
FAMILY BANK PAYMENT_M NEXT_PAYM
---------- ---------- --------- ---------
2 1 01-JUL-10 01-SEP-10
You can use the value in NEXT_PAYMENT_MONTH to perform whatever comparison you want at the application level.
SELECT trunc(MONTHS_BETWEEN(SYSDATE, DATE '2010-01-01')) FROM DUAL
gives you a number of months - that was what I meant by using the value at the application level.
So this:
select trunc(
months_between(sysdate,
(select next_payment_month
from (select family, bank,
trunc(payment_date, 'mon') as payment_month,
lead ( trunc(payment_date, 'mon'))
over ( partition by family
order by payment_date)
as next_payment_month
from payments
where family = :family
order by payment_date desc
)
where payment_month <> next_payment_month
and add_months(payment_month, 1) <> (next_payment_month)
and rownum = 1
)
)
)
from dual
Gives you a number of months with successive payments since the last missed month.
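Since the question prefers to work with the dates at the application level, here is a minimal Python sketch of the same check. It is not a translation of the Oracle query: it counts the run of consecutive calendar months ending at the most recent payment (not at today's date), and the helper names month_index and current_streak are made up. Several payments in one month count once.

```python
from datetime import date

# Sample payment dates from the data above (August skipped, November paid twice)
payment_dates = [
    date(2010, 1, 1), date(2010, 2, 2), date(2010, 3, 3), date(2010, 4, 4),
    date(2010, 5, 5), date(2010, 6, 7), date(2010, 7, 7),
    date(2010, 9, 9), date(2010, 10, 10), date(2010, 11, 11),
    date(2010, 11, 30), date(2010, 12, 12),
]

def month_index(d):
    # Maps each calendar month to a consecutive integer, so December -> January differs by 1
    return d.year * 12 + d.month

def current_streak(dates):
    """Length of the run of consecutive paid months ending at the latest payment."""
    months = sorted({month_index(d) for d in dates}, reverse=True)
    if not months:
        return 0
    streak = 1
    for newer, older in zip(months, months[1:]):
        if newer - older != 1:
            break  # first gap found, stop counting
        streak += 1
    return streak

streak = current_streak(payment_dates)
print(streak, streak >= 12)
```

For the sample data the run is Sep, Oct, Nov, Dec, so the family would not yet qualify. Only the dates for one family need to be fetched, which keeps the query itself trivial.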
To validate whether a single family has 12 sequential payments over the past twelve months, regardless of bank, use:
select sum(payment) total_paid,
count(*) total_payments,
count(distinct trunc(pay_date,'mon')) paid_months
from payment_table
where family = :family and pay_date between :start_date and :end_date;
total_payments indicates the number of payments made in the period, while paid_months indicates the number of separate months in which payments were made.
If you want to check whether they have already switched bank in the selected period, add a group by bank clause to the above query.
To list all families with 12 distinct months of payments within the period, use:
select family,
sum(payment) total_paid,
count(*) total_payments,
count(distinct trunc(pay_date,'mon')) paid_months
from payment_table
where pay_date between :start_date and :end_date
group by family
having count(distinct trunc(pay_date,'mon')) = 12;
If you want to restrict the results to families that have not already switched bank in the selected period, add an 'and count(distinct bank) = 1' condition to the having clause of the above query.
I suggest ensuring that the payment table has an index on family and pay_date.
I think a simple query will help, check this:
SELECT COUNT(*)
FROM payments p
WHERE p.Family = 2 AND p.Date between '01-01-2009' and '12-01-2009'
This way, you'll get the number of payments between any two dates with your current table structure.
How about this:
SELECT PT.Payment
, PT.Family
, PT.Bank
, PT.Date
, (SELECT COUNT(*) FROM PaymentsTable T
WHERE T.Family = PT.Family
AND T.Date < PT.Date
AND DATEDIFF (d, T.Date, PT.Date) <= 31) as IsSequential
FROM PaymentsTable PT
The above query will tell you for each payment if it's sequential (i.e. if there was a payment made the month before it)
Then you could run a query to determine if there are 12 sequential payments made for a specific month, or for a specific family.
Let's say you want to display all families that have at least 12 sequential payments:
SELECT ST.Family
, COUNT(*) as NumberOfSequentialPayments
FROM
(SELECT PT.Payment
, PT.Family
, PT.Bank
, PT.Date
, (SELECT COUNT(*) FROM PaymentsTable T
WHERE T.Family = PT.Family
AND T.Date < PT.Date
AND DATEDIFF (d, T.Date, PT.Date) <= 31) as IsSequential
FROM PaymentsTable PT
) AS ST
WHERE ST.IsSequential > 0
GROUP BY ST.Family
HAVING COUNT(*) >= 12
It is possible to do it as others have pointed out.
However, this is not a case where you have relational data but process it sequentially, which would be a bad thing.
This is a case where the business rule itself is sequential in nature; in such cases, having a sequential helper field might:
- simplify your queries
- improve performance (if you are talking about 100M records, this suddenly becomes almost the highest-rated factor, and various denormalization ideas spring to mind)
- make sense for other business rules (allow more functionality and flexibility)
Re last point: I think the most complete solution would require re-examining the business rules. You would probably discover that users talk about 'missed payments', which suggests other tables, such as 'payment plan/schedule'; tied in with other processes, this might really be the right place to have either a missed-payment column or a sequential value. This structure would also support flexibility in grace periods, prepaying, etc.

SQL how to make one query out of multiple ones

I have a table that holds monthly billing records. Say Customer 1234 was billed in Jan/Feb and Customer 2345 was billed in Jan/Feb/Mar. How can I group these to show a concurrent monthly billing cycle? I also need to show non-concurrent billed months, e.g. Customer 3456 was billed in Feb/Apr/Jun/Aug.
SELECT custName, month, billed, count(*) as Tally
FROM db_name
WHERE
GROUP BY
Results needed:
Customer 1234 was billed for 2 months Concurrent
Customer 2345 was billed for 3 months Concurrent
Customer 3456 was billed for 4 months Non-Concurrent
Any suggestions?
If the month is stored as a datetime field, you can use DATEDIFF to calculate the number of months between the first and the last bill. If the number of elapsed months equals the total number of bills, the bills are consecutive.
select
'Customer ' + custname + ' was billed for ' +
cast(count(*) as varchar) + ' months ' +
case
when datediff(month,min(billdate),max(billdate))+1 = count(*)
then 'Concurrent'
else 'Non-Concurrent'
end
from #billing
where billed = 1
group by custname
If you store the billing month as an integer, you can just subtract instead of using DATEDIFF. Replace the WHEN row with:
when max(billdate)-min(billdate)+1 = count(*)
But in that case I wonder how you distinguish between years.
If the months were all in a sequence, and we are limiting our search to a particular year then Min(month) + Count(times billed) - 1 should = Max(month).
declare @billing table(Custname varchar(10), month int, billed bit)
insert into @billing values (1234, 1, 1)
insert into @billing values (1234, 2, 1)
insert into @billing values (2345, 3, 1)
insert into @billing values (2345, 4, 1)
insert into @billing values (2345, 5, 1)
insert into @billing values (3456, 1, 1)
insert into @billing values (3456, 3, 1)
insert into @billing values (3456, 9, 1)
insert into @billing values (3456, 10, 1)
Select CustName, Count(1) as MonthsBilled,
Case
when Min(Month) + Count(1) - 1 = Max(Month)
then 1
else 0
end Concurrent
From @billing
where Billed = 1
Group by CustName
Cust Months Concurrent
1234 2 1
2345 3 1
3456 4 0
The suggestions here work based on an assumption that you will never bill a customer twice or more in the same month. If that isn't a safe assumption, you need a different approach. Let us know if that's the case.
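If double billing or year boundaries are a concern, one possible variation is to compare distinct months on a year-aware index. The Python sketch below illustrates that under my own assumptions: each bill carries a year and a month (the years shown are invented), and billing_summary is a made-up helper name. The test is the same min/max/count idea as above, applied to a set of year * 12 + month values.

```python
def billing_summary(bills):
    """bills: iterable of (customer, year, month) tuples, one per bill."""
    months_by_customer = {}
    for customer, year, month in bills:
        # A set drops repeat bills in the same month; year * 12 + month keeps Dec -> Jan adjacent
        months_by_customer.setdefault(customer, set()).add(year * 12 + month)

    summary = {}
    for customer, months in months_by_customer.items():
        consecutive = max(months) - min(months) + 1 == len(months)
        summary[customer] = (len(months), "Concurrent" if consecutive else "Non-Concurrent")
    return summary

# Hypothetical sample data; the years are invented for the illustration
bills = [
    ("1234", 2023, 1), ("1234", 2023, 2),
    ("2345", 2022, 12), ("2345", 2023, 1), ("2345", 2023, 2),
    ("2345", 2023, 2),  # billed twice in February
    ("3456", 2023, 2), ("3456", 2023, 4), ("3456", 2023, 6), ("3456", 2023, 8),
]

summary = billing_summary(bills)
for customer, (count, label) in sorted(summary.items()):
    print(f"Customer {customer} was billed for {count} months {label}")
```

In SQL the equivalent change would be to count distinct months rather than rows, but the exact syntax depends on how the month is stored.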
how about:
SELECT custName, month, count(*) as tally
from billing
where billed = 1
group by custName, month
You left out some important information (like how Month is stored) and what database you're using, but here's a logical approach that you can start with:
CREATE VIEW CustomerBilledInMonth (CustName, Month, AmountBilled, ContinuousFlag) AS
SELECT CustName, Month, SUM(AmountBilled), 'Noncontinuous'
FROM BillingTable BT1
WHERE NOT EXISTS
(SELECT * FROM BillingTable BT2 WHERE BT2.CustName = BT1.CustName AND BT2.Month = BT1.Month - 1)
GROUP BY CustName, Month
UNION
SELECT CustName, Month, SUM(AmountBilled), 'Continuous'
FROM BillingTable BT1
WHERE EXISTS
(SELECT * FROM BillingTable BT2 WHERE BT2.CustName = BT1.CustName AND BT2.Month = BT1.Month - 1)
GROUP BY CustName, Month
Assuming that Month here is a consecutive integer field incremented by one from the first possible month in the system, this gives you each customer's billing for each month summed up, plus an additional flag containing 'Continuous' for those months that followed a month in which the customer was also billed and 'Noncontinuous' for those months that followed a month in which the customer was not billed.
Then:
SELECT CustName, LISTOF(Month), SUM(AmountBilled), MAX(ContinuousFlag)
FROM CustomerBilledInMonth GROUP BY CustName
will give you more or less what you want (where LISTOF is some kind of COALESCE type function dependent on the exact database you're using).