if I have transactions table like that:
+----+--------+------------+-------------+--------+
| id | userID | debitAccID | creditAccID | amount |
+----+--------+------------+-------------+--------+
| 1 | 1 | 1 | 2 | 500 |
| 2 | 1 | 1 | 3 | 600 |
| 3 | 1 | 3 | 1 | 200 |
+----+--------+------------+-------------+--------+
how what query to use to get a table for account with id 1 like that:
+----+--------+------------+-------------+--------+
| debit | credit |balance |
+----+--------+------------+-------------+--------+
| | 500 | | 500 |
| | 600 | | 1100 |
| | | 200| 900 |
+----+--------+------------+-------------+--------+
900
Assuming the id column shows the correct order of transactions, you can use case and window with the default of rows between unlimited preceding and current row to get your output:
select id, user_id,
case when user_id = debit_acc_id then amount else 0 end as debit,
case when user_id = credit_acc_id then amount else 0 end as credit,
sum(case when user_id = debit_acc_id then amount else 0 end) over w
- sum(case when user_id = credit_acc_id then amount else 0 end) over w as balance
from transactions
where user_id = 1
window w as (partition by user_id order by id)
order by user_id, id;
db<>fiddle here
I have a table with the following schema:
CREATE TABLE example (
userID,
status, --'SUCCESS' or 'FAIL'
date -- self explanatory
);
INSERT INTO example
Values(123, 'SUCCESS', 20211010),
(123, 'SUCCESS', 20211011),
(123, 'SUCCESS', 20211028),
(123, 'FAIL', 20211029),
(123, 'SUCCESS', 20211105),
(123, 'SUCCESS', 20211110)
I am trying to utilize a lag or lead function to assess whether the current line is within a 2-week window of the previous 'SUCCESS'. Given the current data, I would expect a isWithin2WeeksofSuccessFlag to be as following:
123, 'SUCCESS', 20211010,0 --since it is the first instance
123, 'SUCCESS', 20211011,1
123, 'SUCCESS', 20211028,1
123, 'FAIL', 20211029, 1 --failed, but criteria is that it is within 2 weeks of last success, which it is
123, 'SUCCESS', 20211105, 1 --last success is 2 rows back, but it is within 2 weeks
123, 'SUCCESS', 20211128, 0 --outside of 2 weeks
I would initially think to do something like this:
Select userID, status, date,
case when lag(status,1) over (partition by userid order by date asc) = 'SUCCESS'
and date_add('day',-14, date) <= lag(date,1) over (partition by userid order by date asc)
then 1 end as isWithin2WeeksofSuccessFlag
from example
This would work if I didn't have the 'FAIL' line in there. To handle it, I could modify the lag to 2 (instead of 1), but what about if I have 2,3,4,n 'FAIL's in a row? I would need to lag by 3,4,5,n+1. The specific number of FAILs in between is variable. How do I handle this variability?
NOTE I am querying billions of rows. Efficiency isn't really a concern (since it is for analysis), but running into memory allocation errors is.Thus, endlessly adding more window functions would likely cause an automatic termination of the query due memory requirement being above node limit.
How should I handle this?
Here's an approach, also using window functions, with each "common table expression" handling one step at a time.
Note: The expected result in the question does not match the data in the question. '20211128' doesn't exist in the actual data. I used the example INSERT statement.
In the test case, I changed the column name to xdate to avoid any potential SQL reserved word issues.
The SQL:
WITH cte1 AS (
SELECT *
, SUM(CASE WHEN status = 'SUCCESS' THEN 1 ELSE 0 END) OVER (PARTITION BY userID ORDER BY xdate) AS grp
FROM example
)
, cte2 AS (
SELECT *
, MAX(CASE WHEN status = 'SUCCESS' THEN xdate END) OVER (PARTITION BY userID, grp) AS lastdate
FROM cte1
)
, cte3 AS (
SELECT *
, CASE WHEN LAG(lastdate) OVER (PARTITION BY userID ORDER BY xdate) > (xdate - INTERVAL '2' WEEK) THEN 1 ELSE 0 END AS isNear
FROM cte2
)
SELECT * FROM cte3
ORDER BY userID, xdate
;
The result:
+--------+---------+------------+------+------------+--------+
| userID | status | xdate | grp | lastdate | isNear |
+--------+---------+------------+------+------------+--------+
| 123 | SUCCESS | 2021-10-10 | 1 | 2021-10-10 | 0 |
| 123 | SUCCESS | 2021-10-11 | 2 | 2021-10-11 | 1 |
| 123 | SUCCESS | 2021-10-28 | 3 | 2021-10-28 | 0 |
| 123 | FAIL | 2021-10-29 | 3 | 2021-10-28 | 1 |
| 123 | SUCCESS | 2021-11-05 | 4 | 2021-11-05 | 1 |
| 123 | SUCCESS | 2021-11-10 | 5 | 2021-11-10 | 1 |
+--------+---------+------------+------+------------+--------+
and with the data adjusted to match your expected result, plus a new user introduced, the result is this:
+--------+---------+------------+------+------------+--------+
| userID | status | xdate | grp | lastdate | isNear |
+--------+---------+------------+------+------------+--------+
| 123 | SUCCESS | 2021-10-10 | 1 | 2021-10-10 | 0 |
| 123 | SUCCESS | 2021-10-11 | 2 | 2021-10-11 | 1 |
| 123 | SUCCESS | 2021-10-28 | 3 | 2021-10-28 | 0 |
| 123 | FAIL | 2021-10-29 | 3 | 2021-10-28 | 1 |
| 123 | SUCCESS | 2021-11-05 | 4 | 2021-11-05 | 1 |
| 123 | SUCCESS | 2021-11-28 | 5 | 2021-11-28 | 0 |
| 323 | SUCCESS | 2021-10-10 | 1 | 2021-10-10 | 0 |
| 323 | SUCCESS | 2021-10-11 | 2 | 2021-10-11 | 1 |
| 323 | SUCCESS | 2021-10-28 | 3 | 2021-10-28 | 0 |
| 323 | FAIL | 2021-10-29 | 3 | 2021-10-28 | 1 |
| 323 | SUCCESS | 2021-11-05 | 4 | 2021-11-05 | 1 |
| 323 | SUCCESS | 2021-11-28 | 5 | 2021-11-28 | 0 |
+--------+---------+------------+------+------------+--------+
Here's an extra test case, which might expose problems in some solutions:
INSERT INTO example VALUES
(123, 'SUCCESS', '2021-10-11')
, (123, 'FAIL' , '2021-10-12')
, (123, 'FAIL' , '2021-10-13')
;
The result:
+--------+---------+------------+------+------------+--------+
| userID | status | xdate | grp | lastdate | isNear |
+--------+---------+------------+------+------------+--------+
| 123 | SUCCESS | 2021-10-11 | 1 | 2021-10-11 | 0 |
| 123 | FAIL | 2021-10-12 | 1 | 2021-10-11 | 1 |
| 123 | FAIL | 2021-10-13 | 1 | 2021-10-11 | 1 |
+--------+---------+------------+------+------------+--------+
If your DBMS doesn't support window function filters you can order by status desc so 'SUCCESS' goes before 'FAIL'.
select userID, status, date,
case when lag(status,1) over (partition by userid order by status desc , date asc) = 'SUCCESS'
and dateadd(d, -14, date) <= lag(date,1) over (partition by userid order by status desc , date asc)
then 1 end as isWithin2WeeksofSuccessFlag
from example
order by date
Sql Server fiddle
Given tables CollegeMajors
| Id | Major |
|----|-------------|
| 1 | Accounting |
| 2 | Math |
| 3 | Engineering |
and EnrolledStudents
| Id | CollegeMajorId | Name | HasGraduated |
|----|----------------|-----------------|--------------|
| 1 | 1 | Grace Smith | 1 |
| 2 | 1 | Tony Fabio | 0 |
| 3 | 1 | Michael Ross | 1 |
| 4 | 3 | Fletcher Thomas | 1 |
| 5 | 2 | Dwayne Johnson | 0 |
I want to do a query like
Select
CollegeMajors.Major,
Count(select number of students who have graduated) AS TotalGraduated,
Count(select number of students who have not graduated) AS TotalNotGraduated
From
CollegeMajors
Inner Join
EnrolledStudents On EnrolledStudents.CollegeMajorId = CollegeMajors.Id
and I'm expecting these kind of results
| Major | TotalGraduated | TotalNotGraduated |
|-------------|----------------|-------------------|
| Accounting | 2 | 1 |
| Math | 0 | 1 |
| Engineering | 1 | 0 |
So the question is, what kind of query goes inside the COUNT to achieve the above?
Select CollegeMajors.Major
, COUNT(CASE WHEN EnrolledStudents.HasGraduated= 0 then 1 ELSE NULL END) as "TotalNotGraduated",
COUNT(CASE WHEN EnrolledStudents.HasGraduated = 1 then 1 ELSE NULL END) as "TotalGraduated"
From CollegeMajors
InnerJoin EnrolledStudents On EnrolledStudents.CollegeMajorId = CollegeMajors.Id
GROUP BY CollegeMajors.Major
You can use the CASE statement inside your COUNT to achieve the desired result.Please try the below updated query.
Select CollegeMajors.Major
, COUNT(CASE WHEN EnrolledStudents.HasGraduated= 0 then 1 ELSE NULL END) as "TotalNotGraduated",
COUNT(CASE WHEN EnrolledStudents.HasGraduated = 1 then 1 ELSE NULL END) as "TotalGraduated"
From CollegeMajors
InnerJoin EnrolledStudents On EnrolledStudents.CollegeMajorId = CollegeMajors.Id
GROUP BY CollegeMajors.Major
You can try this for graduated count:
Select Count(*) From EnrolledStudents group by CollegeMajorId having HasGraduated = 1
And change 1 to zero for not graduated ones:
Select Count(*) From EnrolledStudents group by CollegeMajorId having HasGraduated = 0
I have a SQL table with the following format:
+------------------------------------+
| function_id | event_type | counter |
+-------------+------------+---------+
| 1 | fail | 1000 |
| 1 | started | 5000 |
| 2 | fail | 800 |
| 2 | started | 4500 |
| ... | ... | ... |
+-------------+------------+---------+
I want to run a query over this that will group the results by function_id, by giving a ratio of the number of 'fail' events vs the number of 'started' events, as well as maintaining the number of failures. I.e. I want to run a query that will give something that looks like the following:
+-------------------------------------+
| function_id | fail_ratio | failures |
+-------------+------------+----------+
| 1 | 20% | 1000 |
| 2 | 17.78% | 800 |
| ... | ... | |
+-------------+------------+----------+
I've tried a few approaches but have been unsuccessful so far. I'm using Apache Drill SQL at the moment, as this data is being pulled from flat files.
Any help would be greatly appreciated! :)
This is all conditional aggregation:
select function_id,
sum(case when event_type = 'fail' then counter*1.0 end) / sum(case when event_type = 'started' then counter end) as fail_start_ratio,
sum(case when event_type = 'fail' then counter end) as failures
from t
group by function_id
I apologize if this is a duplicate question but I could not find my answer.
I am trying to take data that is horizontal, and get a count of how many times a specific number appears.
Example table
+-------+-------+-------+-------+
| Empid | KPI_A | KPI_B | KPI_C |
+-------+-------+-------+-------+
| 232 | 1 | 3 | 3 |
| 112 | 2 | 3 | 2 |
| 143 | 3 | 1 | 1 |
+-------+-------+-------+-------+
I need to see the following:
+-------+--------------+--------------+--------------+
| EmpID | (1's Scored) | (2's Scored) | (3's Scored) |
+-------+--------------+--------------+--------------+
| 232 | 1 | 0 | 2 |
| 112 | 0 | 2 | 1 |
| 143 | 2 | 0 | 1 |
+-------+--------------+--------------+--------------+
I hope that makes sense. Any help would be appreciated.
Since you are counting data across multiple columns, it might be easier to unpivot your KPI columns first, then count the scores.
You could use either the UNPIVOT function or CROSS APPLY to convert your KPI columns into multiple rows. The syntax would be similar to:
select EmpId, KPI, Val
from yourtable
cross apply
(
select 'A', KPI_A union all
select 'B', KPI_B union all
select 'C', KPI_C
) c (KPI, Val)
See SQL Fiddle with Demo. This gets your multiple columns into multiple rows, which is then easier to work with:
| EMPID | KPI | VAL |
|-------|-----|-----|
| 232 | A | 1 |
| 232 | B | 3 |
| 232 | C | 3 |
| 112 | A | 2 |
Now you can easily count the number of 1's, 2's, and 3's that you have using an aggregate function with a CASE expression:
select EmpId,
sum(case when val = 1 then 1 else 0 end) Score_1,
sum(case when val = 2 then 1 else 0 end) Score_2,
sum(case when val = 3 then 1 else 0 end) Score_3
from
(
select EmpId, KPI, Val
from yourtable
cross apply
(
select 'A', KPI_A union all
select 'B', KPI_B union all
select 'C', KPI_C
) c (KPI, Val)
) d
group by EmpId;
See SQL Fiddle with Demo. This gives a final result of:
| EMPID | SCORE_1 | SCORE_2 | SCORE_3 |
|-------|---------|---------|---------|
| 112 | 0 | 2 | 1 |
| 143 | 2 | 0 | 1 |
| 232 | 1 | 0 | 2 |