Count of distinct values per day, excluding recurring values until the value changes - SQL

I'm really struggling with how to explain this, so I'll try to give you the format of the table below, and the desired outcome.
I have a table which contains a uniqueID, date, userID and result. I'm trying to count the number of results that are 'Correct' per day, but I only want to count unique occurrences based on the userID column. I then want to exclude any further occurrences of 'Correct' for that particular userID until the result for that userID changes to 'Success'.
UID  Date        UserID  Result
1    01/01/2014  5       Correct
2    01/01/2014  5       Correct
3    02/01/2014  4       Correct
4    03/01/2014  4       Correct
5    03/01/2014  5       Incorrect
6    03/01/2014  4       Incorrect
7    05/01/2014  5       Correct
8    07/01/2014  4       Correct
9    08/01/2014  5       Success
10   08/01/2014  4       Success
Based on the above data, I'd expect to see the below:
Date        Correct  Success
01/01/2014  1        0
02/01/2014  1        0
03/01/2014  0        0
05/01/2014  0        0
07/01/2014  0        0
08/01/2014  0        2
Can anyone help? I'm using SQL Server 2008

Use count(distinct) with case:
select date,
       count(distinct case when result = 'Correct' then UserId end) as Correct,
       count(distinct case when result = 'Success' then UserId end) as Success
from data d
group by date
order by date;
EDIT:
The above counts 'Correct' for all occurrences. If you only want the first occurrence per user to be counted:
select date,
       count(case when result = 'Correct' and seqnum = 1 then UserId end) as Correct,
       count(case when result = 'Success' and seqnum = 1 then UserId end) as Success
from (select d.*,
             row_number() over (partition by UserId, result order by Uid) as seqnum
      from data d
     ) d
group by date
order by date;
In this case, the distinct is unnecessary.
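If you want to try both queries against the sample rows from the question, here is a minimal sketch of the setup; the schema is assumed from the posted columns (SQL Server 2008 accepts the multi-row VALUES syntax):
-- Assumed schema based on the question's columns; the rows are the ones posted above
create table data (Uid int, [Date] date, UserId int, Result varchar(10));

insert into data (Uid, [Date], UserId, Result) values
(1,  '20140101', 5, 'Correct'),
(2,  '20140101', 5, 'Correct'),
(3,  '20140102', 4, 'Correct'),
(4,  '20140103', 4, 'Correct'),
(5,  '20140103', 5, 'Incorrect'),
(6,  '20140103', 4, 'Incorrect'),
(7,  '20140105', 5, 'Correct'),
(8,  '20140107', 4, 'Correct'),
(9,  '20140108', 5, 'Success'),
(10, '20140108', 4, 'Success');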

Related

Get earliest value from a column with other aggregated columns in postgresql

I have a very simple stock ledger dataset.
date_and_time        store_id  product_id  batch  opening_qty  closing_qty  inward_qty  outward_qty
01-10-2021 14:20:00  56        a           1      5            1            0           4
01-10-2021 04:20:00  56        a           1      8            5            0           3
02-10-2021 15:30:00  56        a           1      9            2            1           8
03-10-2021 08:40:00  56        a           2      2            6            4           0
04-10-2021 06:50:00  56        a           2      8            4            0           4
Output I want:
select date, store_id, product_id, batch, first(opening_qty), last(closing_qty), sum(inward_qty), sum(outward_qty)
e.g.
date        store_id  product_id  batch  opening_qty  closing_qty  inward_qty  outward_qty
01-10-2021  56        a           1      8            1            0           7
I am writing a query using the FIRST_VALUE window function and have tried several others, but I am not able to get the output I want.
select
date,store_id,product_id,batch,
FIRST_VALUE(opening_total_qty)
OVER(
partition by date,store_id,product_id,batch
ORDER BY created_at
) as opening__qty,
sum(inward_qty) as inward_qty,sum(outward_qty) as outward_qty
from table
group by 1,2,3,4,opening_total_qty
Help please.
As your expected result is one row per group of rows sharing the same date, you need aggregate functions rather than window functions, which return as many rows as the WHERE clause lets through. You can try this:
SELECT date_trunc('day', date_and_time) AS date, store_id, product_id, batch
, (array_agg(opening_qty ORDER BY date_and_time ASC))[1] as opening_qty
, (array_agg(closing_qty ORDER BY date_and_time DESC))[1] as closing_qty
, sum(inward_qty) as inward_qty
, sum(outward_qty) as outward_qty
FROM table
GROUP BY 1, 2, 3, 4
See the test result in dbfiddle.
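If you want to try it locally, here is a minimal sketch; stock_ledger is a stand-in name (the question only says "from table", and table is reserved in PostgreSQL), and the rows are the ones from the question:
-- Assumed schema and the sample rows from the question
create table stock_ledger (
    date_and_time timestamp,
    store_id      int,
    product_id    text,
    batch         int,
    opening_qty   int,
    closing_qty   int,
    inward_qty    int,
    outward_qty   int
);

insert into stock_ledger values
('2021-10-01 14:20:00', 56, 'a', 1, 5, 1, 0, 4),
('2021-10-01 04:20:00', 56, 'a', 1, 8, 5, 0, 3),
('2021-10-02 15:30:00', 56, 'a', 1, 9, 2, 1, 8),
('2021-10-03 08:40:00', 56, 'a', 2, 2, 6, 4, 0),
('2021-10-04 06:50:00', 56, 'a', 2, 8, 4, 0, 4);

-- Same aggregation as above, run against stock_ledger
SELECT date_trunc('day', date_and_time) AS date, store_id, product_id, batch
, (array_agg(opening_qty ORDER BY date_and_time ASC))[1] as opening_qty
, (array_agg(closing_qty ORDER BY date_and_time DESC))[1] as closing_qty
, sum(inward_qty) as inward_qty
, sum(outward_qty) as outward_qty
FROM stock_ledger
GROUP BY 1, 2, 3, 4;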

Does Oracle allow doing a sum over a partition, but only when certain conditions are met, and otherwise using a lag?

So my company has an application that has a certain "in-app currency". We record every transaction.
Recently, we found out there was a bug running for a couple of weeks that allowed users to spend currency in a certain place even when they had none. When this happened, users wouldn't get charged at all: e.g. if a user had 4 m.u. and bought something worth 10 m.u., their balance would remain at 4.
Now we need to find out who abused it and what their available balance really is.
I want to get the columns BUG_ABUSE and WISHFUL_CUMMULATIVE, which reflect the illegitimate transactions and the amount that our users really see in their in-app wallets, but I'm running out of ideas on how to get there.
I was wondering if I could do something like a sum(estrelas) if the result is over 0, else a lag over (partition by user order by date), or something along those lines, to get the wishful cumulative.
We're using Oracle. Any help is highly appreciated.
User_ID  EVENT_DATE                  AMOUNT  DIRECTION  RK  CUM  WISHFUL_CUMMULATIVE  BUG_ABUSE
1        02/01/2021 13:37:19,009000  -5      0          1   -5   0                    1
1        08/01/2021 01:55:40,000000  40      1          2   35   40                   0
1        10/01/2021 10:45:41,000000  2       1          3   37   42                   0
1        10/01/2021 10:45:58,000000  2       1          4   39   44                   0
1        10/01/2021 13:47:37,456000  -5      0          5   34   39                   0
2        13/01/2021 20:09:59,000000  2       1          1   2    2                    0
2        16/01/2021 15:14:54,000000  -50     0          2   -48  2                    1
2        19/01/2021 02:02:59,730000  -5      0          3   -53  2                    1
2        23/01/2021 21:14:40,000000  3       1          4   -50  5                    0
2        23/01/2021 21:14:50,000000  -5      0          5   -55  0                    0
Here's something you can try. This uses recursive subquery factoring (recursive WITH clause), so it will only work in Oracle 11.2 and higher.
I use columns USER_ID, EVENT_DATE and AMOUNT from your inputs. I assume all three columns are constrained NOT NULL, two events can't have exactly the same timestamp for the same user, and AMOUNT is negative for purchases and other debits (fees, etc.) and positive for deposits or other credits.
The input data looks like this:
select user_id, event_date, amount
from sample_data
order by user_id, event_date
;
USER_ID EVENT_DATE AMOUNT
------- ----------------------------- ------
1 02/01/2021 13:37:19,009000000 -5
1 08/01/2021 01:55:40,000000000 40
1 10/01/2021 10:45:41,000000000 2
1 10/01/2021 10:45:58,000000000 2
1 10/01/2021 13:47:37,456000000 -5
2 13/01/2021 20:09:59,000000000 2
2 16/01/2021 15:14:54,000000000 -50
2 19/01/2021 02:02:59,730000000 -5
2 23/01/2021 21:14:40,000000000 3
2 23/01/2021 21:14:50,000000000 -5
Perhaps your input data has additional columns (like cumulative amount, which I left out because it plays no role in the problem or its solution). You show an RK column - I assume you computed it as a step in your attempt to solve the problem; I re-create it in my solution below.
Here is what you can do with a recursive query (recursive WITH clause):
with
p (user_id, event_date, amount, rk) as (
select user_id, event_date, amount,
row_number() over (partition by user_id order by event_date)
from sample_data
)
, r (user_id, event_date, amount, rk, bug_flag, balance) as (
select user_id, event_date, amount, rk,
case when amount < 0 then 'bug' end, greatest(amount, 0)
from p
where rk = 1
union all
select p.user_id, p.event_date, p.amount, p.rk,
case when p.amount + r.balance < 0 then 'bug' end,
r.balance + case when r.balance + p.amount >= 0
then p.amount else 0 end
from p join r on p.user_id = r.user_id and p.rk = r.rk + 1
)
select *
from r
order by user_id, event_date
;
Output:
USER_ID EVENT_DATE AMOUNT RK BUG BALANCE
------- ----------------------------- ------ -- --- -------
1 02/01/2021 13:37:19,009000000 -5 1 bug 0
1 08/01/2021 01:55:40,000000000 40 2 40
1 10/01/2021 10:45:41,000000000 2 3 42
1 10/01/2021 10:45:58,000000000 2 4 44
1 10/01/2021 13:47:37,456000000 -5 5 39
2 13/01/2021 20:09:59,000000000 2 1 2
2 16/01/2021 15:14:54,000000000 -50 2 bug 2
2 19/01/2021 02:02:59,730000000 -5 3 bug 2
2 23/01/2021 21:14:40,000000000 3 4 5
2 23/01/2021 21:14:50,000000000 -5 5 0
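For anyone who wants to reproduce this, the sample_data table used above can be built like so (a sketch; the timestamps are the ones from the question, written here with a dot as the fractional-seconds separator):
create table sample_data (
    user_id    number,
    event_date timestamp,
    amount     number
);

insert into sample_data values (1, timestamp '2021-01-02 13:37:19.009', -5);
insert into sample_data values (1, timestamp '2021-01-08 01:55:40.000', 40);
insert into sample_data values (1, timestamp '2021-01-10 10:45:41.000',  2);
insert into sample_data values (1, timestamp '2021-01-10 10:45:58.000',  2);
insert into sample_data values (1, timestamp '2021-01-10 13:47:37.456', -5);
insert into sample_data values (2, timestamp '2021-01-13 20:09:59.000',  2);
insert into sample_data values (2, timestamp '2021-01-16 15:14:54.000', -50);
insert into sample_data values (2, timestamp '2021-01-19 02:02:59.730', -5);
insert into sample_data values (2, timestamp '2021-01-23 21:14:40.000',  3);
insert into sample_data values (2, timestamp '2021-01-23 21:14:50.000', -5);
commit;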
In order to produce the result you want, you'll probably want to process the rows sequentially: once the first row is processed for a user, you compute the second one, then the third one, and so on.
Assuming the column RK is already computed and sequential for each user you can do:
with
n (user_id, event_date, amount, direction, rk, cum, wishful, bug_abuse) as (
select t.*,
greatest(amount, 0),
case when amount < 0 then 1 else 0 end
from t where rk = 1
union all
select
t.user_id, t.event_date, t.amount, t.direction, t.rk, t.cum,
case when n.wishful + t.amount < 0 then n.wishful
     else n.wishful + t.amount
end,
case when n.wishful + t.amount < 0 then 1 else 0 end
from n
join t on t.user_id = n.user_id and t.rk = n.rk + 1
)
select *
from n
order by user_id, rk;
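If RK (and CUM) are not stored yet, they can be derived in a preliminary CTE first. A sketch, where transactions is a placeholder name for whatever table t refers to above:
with t as (
    select user_id, event_date, amount, direction,
           row_number() over (partition by user_id order by event_date) as rk,
           sum(amount)  over (partition by user_id order by event_date) as cum
    from transactions  -- placeholder name, not from the original post
)
select * from t
order by user_id, rk;
The recursive query above can then be chained after this CTE inside the same WITH clause.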

Summing up only the values of previous rows with the same ID

As I am preparing my data for predicting no-shows at a hospital, I ran into the following problem. In the query below I tried to get the number of shows/no-shows relative to the number of appointments (APPTS). INDICATION_NO_SHOW means whether a patient showed up at an appointment: 0 means show, and 1 means no-show.
with t1 as
(
select
PAT_ID
,APPT_TIME
,APPT_ID
,ROW_NUMBER () over(PARTITION BY PAT_ID order by pat_id,APPT_TIME) as [TOTAL_APPTS]
,INDICATION_NO_SHOW
from appointments
)
,
t2 as
(
select
t1.PAT_ID
,t1.APPT_TIME
,INDICATION_NO_SHOW
,sum(INDICATION_NO_SHOW) over(order by PAT_ID, APPT_TIME) as TOTAL_NO_SHOWS
,TOTAL_APPTS
from t1
)
SELECT *
,(TOTAL_APPTS - TOTAL_NO_SHOWS) AS TOTAL_SHOWS
FROM T2
order by PAT_ID, APPT_TIME
This resulted in the following dataset:
PAT_ID  APPT_TIME  INDICATION_NO_SHOW  TOTAL_SHOWS  TOTAL_NO_SHOWS  TOTAL_APPTS
1       1-1-2001   0                   1            0               1
1       1-2-2001   0                   2            0               2
1       1-3-2001   1                   2            1               3
1       1-4-2001   0                   3            1               4
2       1-1-2001   0                   0            1               1
2       2-1-2001   0                   1            1               2
2       2-2-2001   1                   1            2               3
2       2-3-2001   0                   2            2               4
As you can see, my query only worked for patient 1, and then it carried the no-shows counted for patient 1 over into patient 2. So individually it worked for one patient, but not over the whole dataset.
The TOTAL_APPTS column worked out, because it counted the number of appointments the patient had at the moment of that given appointment. My question is: how do I successfully get these shows and no-shows added up (as I did for patient 1)? I'm completely aware of why this query doesn't work, I'm just completely in the dark on how to fix it.
I think that you can just use window functions. You seem to be looking for window sums of shows and no shows per patient, so:
select
pat_id,
appt_time,
indication_no_show,
sum(1 - indication_no_show)
over(partition by pat_id order by appt_time) total_shows,
sum(indication_no_show)
over(partition by pat_id order by appt_time) total_no_shows
from appointments
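One caveat worth adding (my note, not part of the original answer): with the default window frame, two appointments that share the same APPT_TIME for a patient are peers and get identical running totals. If that matters, APPT_ID (which the question's table already has) can serve as a tiebreaker:
select
    pat_id,
    appt_time,
    indication_no_show,
    -- 1 - indication_no_show is 1 for a show, 0 for a no-show
    sum(1 - indication_no_show)
        over(partition by pat_id order by appt_time, appt_id) as total_shows,
    sum(indication_no_show)
        over(partition by pat_id order by appt_time, appt_id) as total_no_shows
from appointments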

Conditional SUM in SQL Server 2014

I am using SQL Server 2014. When I was testing my code I noticed a problem.
Assume that the maximum personal hours is 80 hours.
SELECT
lsm.EmployeeName,
pd.absenceDate,
pd.amountInDays * 8 AS [HoursReported],
pd.status,
(SUM(CASE WHEN pd.[status]='App' THEN (pd.amountInDays * 8)
ELSE 0 END) OVER (partition by lsm.[EmployeeName] order by pd.absenceDate)) AS [TotalUsedHours],
(@maxPSHours) - (sum(
CASE WHEN pd.[status]='App' THEN (pd.amountInDays * 8)
ELSE 0 END)
over (
partition by lsm.[EmployeeName] order by pd.absenceDate)) AS [TotalRemainingHours]
FROM
[LocationStaffMembers] lsm
INNER JOIN
[PersonalDays] pd ON lsm.staffMemberId = pd.staffMemberId
This query returns these results:
EmployeeName  AbsenceDate  HoursReported  Status    TotalUsedHours  TotalRemainingHours
X             11/11/2015   4              approved  4               76
X             11/15/2015   8              approved  12              68
X             11/20/2015   2              decline   14              66
X             11/20/2015   2              approved  14              66
So, the query works fine for different statuses; the first two rows are fine. But when an employee has more than one action on a day (decline, approved, etc.), my query only shows the day's total used and total remaining on every row for that day.
Here is the expected result.
EmployeeName  AbsenceDate  HoursReported  Status    TotalUsedHours  TotalRemainingHours
X             11/11/2015   4              approved  4               76
X             11/15/2015   8              approved  12              68
X             11/20/2015   2              decline   12              68
X             11/20/2015   2              approved  14              66
You are doing a cumulative sum whose result depends on the order of AbsenceDate (sum(...) over (partition by ... order by pd.absenceDate)). But your last two records have the exact same date (11/20/2015), at least according to what you are showing us. This creates an ambiguity.
So it is entirely conceivable, and legal, that SQL Server processes the 2 approved hours before the 2 declined hours when calculating the cumulative sum, which would explain your current results, even though the rows themselves are returned to you in a different order. (By the way, consider adding an ORDER BY clause to the query; otherwise the order of the returned rows is not guaranteed.)
If the two rows do in fact share the exact same date, you'll have to find a second column that removes the ambiguity and add it to the ORDER BY clause of the cumulative sum window function. Maybe you could add a timestamp field that you can order by.
Or maybe you always want the declined status to be considered ahead of the approved status when the AbsenceDate is the same. Here is an example of a query that would do exactly that (notice the changes in the order by clauses):
SELECT
lsm.EmployeeName,
pd.absenceDate,
pd.amountInDays * 8 AS [HoursReported],
pd.status,
(SUM(CASE WHEN pd.[status]='App' THEN (pd.amountInDays * 8)
ELSE 0 END) OVER (partition by lsm.[EmployeeName] order by pd.absenceDate,
case when pd.[status] = 'App' then 1 else 0 end)) AS [TotalUsedHours],
(@maxPSHours) - (sum(
CASE WHEN pd.[status]='App' THEN (pd.amountInDays * 8)
ELSE 0 END)
over (
partition by lsm.[EmployeeName] order by pd.absenceDate,
case when pd.[status] = 'App' then 1 else 0 end)) AS [TotalRemainingHours]
FROM
[LocationStaffMembers] lsm
INNER JOIN
[PersonalDays] pd ON lsm.staffMemberId = pd.staffMemberId
ORDER BY lsm.[EmployeeName],
pd.absenceDate,
case when pd.[status] = 'App' then 1 else 0 end
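Note that both queries assume a scalar variable holding the 80-hour cap mentioned in the question; a minimal sketch of what has to precede them:
-- The 80-hour figure comes from the question's stated assumption
DECLARE @maxPSHours int = 80;
Inline initialization in DECLARE is available from SQL Server 2008 onwards, so it works fine on 2014.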

Return results where first entry is 1 and all subsequent rows are 0

I'm working on a weird SQL query:
Patient_ID  Count  order_no
1           1      1
2           1      2
2           0      3
2           0      4
3           1      5
3           0      6
I need to count the patients as above: for every new patient, the Count column is 1.
If the patient is repeated, the entries below it should be 0.
I'm confused about how to make that work in SQL.
In order to make the first entry 1 and all subsequent entries 0, I believe you need a ranking partitioned by patient and ordered by the order number. Please check out the SQL Fiddle below to test the results.
http://www.sqlfiddle.com/#!3/4e2e2/17/0
SELECT
patient_id
,CASE WHEN r.rank = 1
THEN 1
ELSE 0
END AS [Count]
,order_number
FROM
(
SELECT
order_number
,patient_id
,ROW_NUMBER() OVER (PARTITION BY patient_id ORDER BY order_number) AS [rank]
FROM
PatientTable
) r
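As a variant (not from the original answer), the derived table can be folded away, since a window function can be wrapped in a CASE expression directly in the select list:
SELECT
    patient_id
    ,CASE WHEN ROW_NUMBER() OVER (PARTITION BY patient_id ORDER BY order_number) = 1
          THEN 1
          ELSE 0
     END AS [Count]
    ,order_number
FROM PatientTable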