Find MIN date associated with FIRST non-0 value - sql

I am trying to generate a list of manager start dates. A start date is the minimum As_Of date (As_Of is the table partition) on which an employee has a non-zero number of direct reports.
I'm not sure how to accomplish this without heavy processing. I believe there are window functions that are better suited to this.
I do have the query below, which works but is terribly slow.
SELECT
Employee_ID,
MIN(As_Of) as manager_start_date
FROM table
WHERE Direct_Reports > 0
GROUP BY 1
Sample table below with desired output at bottom.
+-------------+----------------+----------+
| Employee_ID | Direct_Reports | As_Of |
+-------------+----------------+----------+
| 1 | 0 | 1/1/2019 |
+-------------+----------------+----------+
| 1 | 0 | 1/2/2019 |
+-------------+----------------+----------+
| 1 | 0 | 1/3/2019 |
+-------------+----------------+----------+
| 1 | 1 | 1/4/2019 | '<--- First non 0 value for Employee 1'
+-------------+----------------+----------+
| 2 | 0 | 1/1/2019 |
+-------------+----------------+----------+
| 2 | 0 | 1/2/2019 |
+-------------+----------------+----------+
| 2 | 5 | 1/3/2019 | '<--- First non 0 value for Employee 2'
+-------------+----------------+----------+
| 3 | 0 | 1/1/2019 |
+-------------+----------------+----------+
| 3 | 0 | 1/2/2019 |
+-------------+----------------+----------+
| 3 | 5 | 1/3/2019 | '<--- First non 0 value for Employee 3'
+-------------+----------------+----------+
| 3 | 10 | 1/4/2019 |
+-------------+----------------+----------+
| 3 | 7 | 1/5/2019 |
+-------------+----------------+----------+
+-------------+--------------------+
| Employee_ID | Manager_Start_Date |
+-------------+--------------------+
| 1 | 1/4/2019 |
+-------------+--------------------+
| 2 | 1/3/2019 |
+-------------+--------------------+
| 3 | 1/3/2019 |
+-------------+--------------------+

Try this:
select Employee_ID, min(case when Direct_Reports > 0 then As_Of end) as manager_start_date
from dbo.manager
group by Employee_ID
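A minimal sketch of the conditional-aggregation idea above, run against the sample data through Python's built-in sqlite3 module. The table name manager_history, the ISO-formatted date strings, and employee 4 (who never has reports) are assumptions added for illustration; they are not from the question.

```python
import sqlite3

# In-memory SQLite database standing in for the real table (assumed name).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE manager_history "
    "(Employee_ID INTEGER, Direct_Reports INTEGER, As_Of TEXT)"
)
conn.executemany(
    "INSERT INTO manager_history VALUES (?, ?, ?)",
    [
        (1, 0, "2019-01-01"), (1, 0, "2019-01-02"),
        (1, 0, "2019-01-03"), (1, 1, "2019-01-04"),
        (2, 0, "2019-01-01"), (2, 0, "2019-01-02"), (2, 5, "2019-01-03"),
        (3, 0, "2019-01-01"), (3, 0, "2019-01-02"), (3, 5, "2019-01-03"),
        (3, 10, "2019-01-04"), (3, 7, "2019-01-05"),
        (4, 0, "2019-01-01"),  # hypothetical employee who never had reports
    ],
)

# The CASE yields NULL for rows without reports and MIN ignores NULLs,
# so only dates with Direct_Reports > 0 compete for the minimum.
manager_start = conn.execute(
    """
    SELECT Employee_ID,
           MIN(CASE WHEN Direct_Reports > 0 THEN As_Of END) AS manager_start_date
    FROM manager_history
    GROUP BY Employee_ID
    ORDER BY Employee_ID
    """
).fetchall()

for row in manager_start:
    print(row)
```

Unlike the WHERE Direct_Reports > 0 version, this form keeps employees who never had reports and returns NULL for them.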

Related

How to count the number of occurrences of each user ID with conditions in a SQL database

I have a table in MS SQL that records the status of each consumer ID in a marketing campaign. For each month, one column flags whether the ID is in the campaign (is_in_programme) and another flags whether the ID newly joined the programme that month (is_new_apply). An ID can apply to the programme multiple times.
The table contains a datetime (reported on the last day of every month, with no months skipped), the ID, and the two status flags described above. For each period, I want to know how many times the ID has been in the programme so far (the EXPECTED column).
For the Output column, I tried ROW_NUMBER() partitioned by id, is_in_programme, is_new_apply where is_in_programme and is_new_apply are both 1. But I cannot get the occurrence number for rows where is_new_apply = 0.
+------------+-------+-----------------+--------------+--------+----------+
| datetime | ID | is_in_programme | is_new_apply | Output | EXPECTED |
+------------+-------+-----------------+--------------+--------+----------+
| 31/01/2020 | 12345 | 1 | 1 | 1 | 1 |
| 29/02/2020 | 12345 | 1 | 0 | 0 | 1 |
| 31/03/2020 | 12345 | 1 | 0 | 0 | 1 |
| 30/04/2020 | 12345 | 1 | 0 | 0 | 1 |
| 31/05/2020 | 12345 | 0 | 0 | 0 | 0 |
| 30/06/2020 | 12345 | 1 | 1 | 2 | 2 |
| 31/07/2020 | 12345 | 1 | 0 | 0 | 2 |
| 31/08/2020 | 12345 | 1 | 0 | 0 | 2 |
| 31/01/2020 | 67890 | 0 | 0 | 0 | 0 |
| 29/02/2020 | 67890 | 1 | 1 | 1 | 1 |
| 31/03/2020 | 67890 | 1 | 0 | 0 | 1 |
| 30/04/2020 | 67890 | 0 | 0 | 0 | 0 |
| 31/05/2020 | 67890 | 0 | 0 | 0 | 0 |
| 30/06/2020 | 67890 | 1 | 1 | 2 | 2 |
| 31/07/2020 | 67890 | 1 | 0 | 0 | 2 |
| 31/08/2020 | 67890 | 1 | 0 | 0 | 2 |
| 30/09/2020 | 67890 | 0 | 0 | 0 | 0 |
| 31/10/2020 | 67890 | 1 | 1 | 3 | 3 |
| 30/11/2020 | 67890 | 1 | 0 | 0 | 3 |
| 31/12/2020 | 67890 | 1 | 0 | 0 | 3 |
+------------+-------+-----------------+--------------+--------+----------+
Is there any way to get, for each period, how many times each ID has been in the marketing campaign, as in my EXPECTED column?
You seem to want a cumulative sum of is_new_apply, shown only when is_in_programme is not 0. That would be:
select t.*,
(case when is_in_programme <> 0
then sum(is_new_apply) over (partition by id order by datetime)
else 0
end) as expected
from t;
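As a quick check of the cumulative-sum approach, here is a small sketch using Python's sqlite3 module on part of the sample data. It assumes SQLite 3.25 or later (for window functions), a date column named dt, and ISO date strings so that text ordering matches chronological ordering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (dt TEXT, id INTEGER, is_in_programme INTEGER, is_new_apply INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("2020-01-31", 12345, 1, 1), ("2020-02-29", 12345, 1, 0),
        ("2020-03-31", 12345, 1, 0), ("2020-04-30", 12345, 1, 0),
        ("2020-05-31", 12345, 0, 0), ("2020-06-30", 12345, 1, 1),
        ("2020-07-31", 12345, 1, 0), ("2020-08-31", 12345, 1, 0),
        ("2020-01-31", 67890, 0, 0), ("2020-02-29", 67890, 1, 1),
        ("2020-03-31", 67890, 1, 0), ("2020-04-30", 67890, 0, 0),
        ("2020-05-31", 67890, 0, 0), ("2020-06-30", 67890, 1, 1),
    ],
)

# Running total of new applications per ID, blanked to 0 for months
# in which the ID is not in the programme.
result = conn.execute(
    """
    SELECT id, dt,
           CASE WHEN is_in_programme <> 0
                THEN SUM(is_new_apply) OVER (PARTITION BY id ORDER BY dt)
                ELSE 0
           END AS expected
    FROM t
    ORDER BY id, dt
    """
).fetchall()

expected_by_id = {}
for id_, _dt, expected in result:
    expected_by_id.setdefault(id_, []).append(expected)
print(expected_by_id)
```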

Create results grid from database tables: SQL

I have a table which describes patients' medical symptoms which has the following structure.
Note that patient 1 and patient 2 have two symptoms.
| patientID | symptomName | SymptomStartDate | SymptomDuration |
|-----------|----------------|------------------|-----------------|
| 1 | Fever | 01/01/2020 | 10 |
| 1 | Cough | 02/01/2020 | 5 |
| 2 | ChestPain | 03/01/2020 | 6 |
| 2 | DryEyes | 04/01/2020 | 8 |
| 3 | SoreThroat | 05/01/2020 | 2 |
| 4 | AnotherSymptom | 06/01/2020 | 1 |
Using this data, I want to create a grid showing which symptoms each patient had, in the following format (with 1 indicating that the patient had that symptom and 0 indicating that the patient did not have that symptom)
| patientID | Fever | Cough | ChestPain | DryEyes | SoreThroat | AnotherSymptom |Headache|
|-----------|-------|-------|-----------|---------|------------|----------------|--------|
| 1 | 1 | 1 | 0 | 0 | 0 | 0 |0 |
| 2 | 0 | 0 | 1 | 1 | 0 | 0 |0 |
| 3 | 0 | 0 | 0 | 0 | 1 | 0 |0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 1 |0 |
Note that none of the patients in the first table has a headache, but the second table still has a Headache column filled with 0s. The full list of symptoms I want as columns is stored in a separate table. Let's call that table symptom; it has only two columns: symptomName and symptomID.
Use a crosstab query:
TRANSFORM
Count(Symptoms.SymptomStartDate)
SELECT
Symptoms.PatientID
FROM
Symptoms
GROUP BY
Symptoms.PatientID
PIVOT
Symptoms.SymptomName
IN ('Fever','Cough','ChestPain','DryEyes','SoreThroat','AnotherSymptom','Headache');
Apply this format to the Format property of field SymptomStartDate:
0;;;0
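TRANSFORM ... PIVOT is specific to Access. On other engines, a similar grid can be built with conditional aggregation, which is a different technique from the crosstab above. The sketch below uses Python's sqlite3 module and builds one column per row of the symptom table; the symptomID values and the way the column list is generated are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE symptom (symptomID INTEGER, symptomName TEXT)")
conn.execute("CREATE TABLE Symptoms (patientID INTEGER, symptomName TEXT)")
conn.executemany(
    "INSERT INTO symptom VALUES (?, ?)",
    list(enumerate(
        ["Fever", "Cough", "ChestPain", "DryEyes",
         "SoreThroat", "AnotherSymptom", "Headache"],
        start=1,
    )),
)
conn.executemany(
    "INSERT INTO Symptoms VALUES (?, ?)",
    [(1, "Fever"), (1, "Cough"), (2, "ChestPain"), (2, "DryEyes"),
     (3, "SoreThroat"), (4, "AnotherSymptom")],
)

# One output column per known symptom, including ones no patient has.
names = [name for (name,) in conn.execute(
    "SELECT symptomName FROM symptom ORDER BY symptomID")]

# Symptom values are bound as parameters; aliases are quoted identifiers.
columns = ", ".join(
    'MAX(CASE WHEN symptomName = ? THEN 1 ELSE 0 END) AS "{}"'.format(
        name.replace('"', '""'))
    for name in names
)
grid = conn.execute(
    f"SELECT patientID, {columns} FROM Symptoms "
    "GROUP BY patientID ORDER BY patientID",
    names,
).fetchall()

for row in grid:
    print(row)
```

Because the column list comes from the symptom table rather than from the data, Headache appears as a column of 0s even though no patient has it.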

How do I conditionally increase the value of the preceding row number by 1

I need each row to take the value of the preceding row plus 1. When a row meets another condition, I need to reset the counter. This is probably easiest explained with an example:
+---------+------------+------------+-----------+----------------+
| Acct_ID | Ins_Date | Acct_RowID | indicator | Desired_Output |
+---------+------------+------------+-----------+----------------+
| 5841 | 07/11/2019 | 1 | 1 | 1 |
| 5841 | 08/11/2019 | 2 | 0 | 2 |
| 5841 | 09/11/2019 | 3 | 0 | 3 |
| 5841 | 10/11/2019 | 4 | 0 | 4 |
| 5841 | 11/11/2019 | 5 | 1 | 1 |
| 5841 | 12/11/2019 | 6 | 0 | 2 |
| 5841 | 13/11/2019 | 7 | 1 | 1 |
| 5841 | 14/11/2019 | 8 | 0 | 2 |
| 5841 | 15/11/2019 | 9 | 0 | 3 |
| 5841 | 16/11/2019 | 10 | 0 | 4 |
| 5841 | 17/11/2019 | 11 | 0 | 5 |
| 5841 | 18/11/2019 | 12 | 0 | 6 |
| 5132 | 11/03/2019 | 1 | 1 | 1 |
| 5132 | 12/03/2019 | 2 | 0 | 2 |
| 5132 | 13/03/2019 | 3 | 0 | 3 |
| 5132 | 14/03/2019 | 4 | 1 | 1 |
| 5132 | 15/03/2019 | 5 | 0 | 2 |
| 5132 | 16/03/2019 | 6 | 0 | 3 |
| 5132 | 17/03/2019 | 7 | 0 | 4 |
| 5132 | 18/03/2019 | 8 | 0 | 5 |
| 5132 | 19/03/2019 | 9 | 1 | 1 |
| 5132 | 20/03/2019 | 10 | 0 | 2 |
+---------+------------+------------+-----------+----------------+
The column I want to create is 'Desired_Output', and it is driven by the column 'indicator'. Each row should be the previous row's value plus 1, unless its indicator is 1, in which case the counter resets to 1.
I have tried to use a loop method of some sort but this did not produce the desired results.
Is this possible in some way?
The trick is to identify each group of consecutive rows that starts at a row with indicator = 1 and runs up to the next such row. This is achieved by using cross apply to find the latest Acct_RowID with indicator = 1 at or before the current row, and using that as Grp_RowID in the partition by of the row_number() window function.
select *,
Desired_Output = row_number() over (partition by t.Acct_ID, Grp_RowID
order by Acct_RowID)
from your_table t
cross apply
(
select Grp_RowID = max(Acct_RowID)
from your_table x
where x.Acct_ID = t.Acct_ID
and x.Acct_RowID <= t.Acct_RowID
and x.indicator = 1
) g
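CROSS APPLY is SQL Server syntax. A common alternative way to get the same grouping, sketched here with Python's sqlite3 module, is a running sum of indicator: it increases by 1 at every row where indicator = 1, so it can serve as the group key. This is a swapped-in technique rather than the query above, and it assumes indicator only takes the values 0 and 1 and that SQLite 3.25 or later is available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (Acct_ID INTEGER, Acct_RowID INTEGER, indicator INTEGER)"
)

# Indicator sequences copied from the sample table, in Acct_RowID order.
sample = {
    5841: [1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0],
    5132: [1, 0, 0, 1, 0, 0, 0, 0, 1, 0],
}
for acct_id, indicators in sample.items():
    conn.executemany(
        "INSERT INTO your_table VALUES (?, ?, ?)",
        [(acct_id, i, ind) for i, ind in enumerate(indicators, start=1)],
    )

# Inner query: running sum of indicator labels each run of rows.
# Outer query: number the rows within each (account, run).
rows = conn.execute(
    """
    SELECT Acct_ID, Acct_RowID,
           ROW_NUMBER() OVER (PARTITION BY Acct_ID, grp
                              ORDER BY Acct_RowID) AS Desired_Output
    FROM (
        SELECT Acct_ID, Acct_RowID,
               SUM(indicator) OVER (PARTITION BY Acct_ID
                                    ORDER BY Acct_RowID) AS grp
        FROM your_table
    )
    ORDER BY Acct_ID, Acct_RowID
    """
).fetchall()

output_by_acct = {}
for acct_id, _row_id, desired in rows:
    output_by_acct.setdefault(acct_id, []).append(desired)
print(output_by_acct)
```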

SQL window excluding current group?

I'm trying to provide rolled up summaries of the following data including only the group in question as well as excluding the group. I think this can be done with a window function, but I'm having problems with getting the syntax down (in my case Hive SQL).
I want the following data to be aggregated
+------------+---------+--------+
| date | product | rating |
+------------+---------+--------+
| 2018-01-01 | A | 1 |
| 2018-01-02 | A | 3 |
| 2018-01-20 | A | 4 |
| 2018-01-27 | A | 5 |
| 2018-01-29 | A | 4 |
| 2018-02-01 | A | 5 |
| 2017-01-09 | B | NULL |
| 2017-01-12 | B | 3 |
| 2017-01-15 | B | 4 |
| 2017-01-28 | B | 4 |
| 2017-07-21 | B | 2 |
| 2017-09-21 | B | 5 |
| 2017-09-13 | C | 3 |
| 2017-09-14 | C | 4 |
| 2017-09-15 | C | 5 |
| 2017-09-16 | C | 5 |
| 2018-04-01 | C | 2 |
| 2018-01-13 | D | 1 |
| 2018-01-14 | D | 2 |
| 2018-01-24 | D | 3 |
| 2018-01-31 | D | 4 |
+------------+---------+--------+
Aggregated results:
+------+-------+---------+----+------------+------------------+----------+
| year | month | product | ct | avg_rating | avg_rating_other | other_ct |
+------+-------+---------+----+------------+------------------+----------+
| 2018 | 1 | A | 5 | 3.4 | 2.5 | 4 |
| 2018 | 2 | A | 1 | 5 | NULL | 0 |
| 2017 | 1 | B | 4 | 3.6666667 | NULL | 0 |
| 2017 | 7 | B | 1 | 2 | NULL | 0 |
| 2017 | 9 | B | 1 | 5 | 4.25 | 4 |
| 2017 | 9 | C | 4 | 4.25 | 5 | 1 |
| 2018 | 4 | C | 1 | 2 | NULL | 0 |
| 2018 | 1 | D | 4 | 2.5 | 3.4 | 5 |
+------+-------+---------+----+------------+------------------+----------+
I've also considered producing two aggregates, one with the product in question and one without, but I'm having trouble creating the appropriate join key.
You can do:
select year(date), month(date), product,
count(*) as ct, avg(rating) as avg_rating,
((sum(sum(rating)) over (partition by year(date), month(date)) - sum(rating)) /
(sum(count(rating)) over (partition by year(date), month(date)) - count(rating))
) as avg_rating_other,
sum(count(*)) over (partition by year(date), month(date)) - count(*) as other_ct
from t
group by year(date), month(date), product;
The rating for the "other" products is a bit tricky. You need to add everything up for the month and subtract out the current product's total, then calculate the average as that sum divided by the corresponding count.
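The "total minus own group" idea can be checked on the sample data with a two-step variant: aggregate per month and product first, then apply window sums over those aggregates. The sketch below uses Python's sqlite3 module instead of Hive, so the date handling (substr on ISO strings) and the CTE name per_product are assumptions. The average divides by the number of non-NULL ratings, since AVG ignores NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (date TEXT, product TEXT, rating INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("2018-01-01", "A", 1), ("2018-01-02", "A", 3), ("2018-01-20", "A", 4),
        ("2018-01-27", "A", 5), ("2018-01-29", "A", 4), ("2018-02-01", "A", 5),
        ("2017-01-09", "B", None), ("2017-01-12", "B", 3), ("2017-01-15", "B", 4),
        ("2017-01-28", "B", 4), ("2017-07-21", "B", 2), ("2017-09-21", "B", 5),
        ("2017-09-13", "C", 3), ("2017-09-14", "C", 4), ("2017-09-15", "C", 5),
        ("2017-09-16", "C", 5), ("2018-04-01", "C", 2),
        ("2018-01-13", "D", 1), ("2018-01-14", "D", 2),
        ("2018-01-24", "D", 3), ("2018-01-31", "D", 4),
    ],
)

rows = conn.execute(
    """
    WITH per_product AS (
        SELECT CAST(substr(date, 1, 4) AS INTEGER) AS year,
               CAST(substr(date, 6, 2) AS INTEGER) AS month,
               product,
               COUNT(*)                 AS ct,         -- all rows
               COUNT(rating)            AS rated,      -- non-NULL ratings only
               COALESCE(SUM(rating), 0) AS rating_sum,
               AVG(rating)              AS avg_rating
        FROM t
        GROUP BY year, month, product
    )
    SELECT year, month, product, ct, avg_rating,
           -- month total minus this product's share; NULLIF avoids 0 division
           (SUM(rating_sum) OVER (PARTITION BY year, month) - rating_sum) * 1.0
             / NULLIF(SUM(rated) OVER (PARTITION BY year, month) - rated, 0)
             AS avg_rating_other,
           SUM(ct) OVER (PARTITION BY year, month) - ct AS other_ct
    FROM per_product
    ORDER BY product, year, month
    """
).fetchall()

# Index by (year, month, product) for easy lookup.
summary = {(y, m, p): (ct, avg, avg_other, other_ct)
           for y, m, p, ct, avg, avg_other, other_ct in rows}
for key, value in summary.items():
    print(key, value)
```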

Aggregating tsrange values into day buckets with a tie-breaker

So I've got a schema that lets people donate $ to a set of organizations, and that donation is tied to a certain arbitrary period of time. I'm working on a report that looks at each day, and for each organization shows the total number of donations and the total cumulative value of those donations for that organization's day.
For example, here's a mockup of 3 donors, Alpha (orange), Bravo (green), and Charlie (Blue) donating to 2 different organizations (Foo and Bar) over various time periods:
I've created a SQLFiddle that implements the above example in a schema that somewhat reflects what I'm working with in reality: http://sqlfiddle.com/#!17/88969/1
(The schema is broken out into more tables than what you'd come up with given the problem statement to better reflect the real-life version I'm working with)
So far, the query that I've managed to put together looks like this:
WITH report_dates AS (
SELECT '2018-01-01'::date + g AS date
FROM generate_series(0, 14) g
), organizations AS (
SELECT id AS organization_id FROM users
WHERE type = 'Organization'
)
SELECT * FROM report_dates rd
CROSS JOIN organizations o
LEFT JOIN LATERAL (
SELECT
COALESCE(sum(doa.amount_cents), 0) AS total_donations_cents,
COALESCE(count(doa.*), 0) AS total_donors
FROM users
LEFT JOIN donor_organization_amounts doa ON doa.organization_id = users.id
LEFT JOIN donor_amounts da ON da.id = doa.donor_amounts_id
LEFT JOIN donor_schedules ds ON ds.donor_amounts_id = da.id
WHERE (users.id = o.organization_id) AND (ds.period && tsrange(rd.date::timestamp, rd.date::timestamp + INTERVAL '1 day', '[)'))
) o2 ON true;
With the results looking like this:
| date | organization_id | total_donations_cents | total_donors |
|------------|-----------------|-----------------------|--------------|
| 2018-01-01 | 1 | 0 | 0 |
| 2018-01-02 | 1 | 250 | 1 |
| 2018-01-03 | 1 | 250 | 1 |
| 2018-01-04 | 1 | 1750 | 3 |
| 2018-01-05 | 1 | 1750 | 3 |
| 2018-01-06 | 1 | 1750 | 3 |
| 2018-01-07 | 1 | 750 | 2 |
| 2018-01-08 | 1 | 850 | 2 |
| 2018-01-09 | 1 | 850 | 2 |
| 2018-01-10 | 1 | 500 | 1 |
| 2018-01-11 | 1 | 500 | 1 |
| 2018-01-12 | 1 | 500 | 1 |
| 2018-01-13 | 1 | 1500 | 2 |
| 2018-01-14 | 1 | 1000 | 1 |
| 2018-01-15 | 1 | 0 | 0 |
| 2018-01-01 | 2 | 0 | 0 |
| 2018-01-02 | 2 | 250 | 1 |
| 2018-01-03 | 2 | 250 | 1 |
| 2018-01-04 | 2 | 1750 | 2 |
| 2018-01-05 | 2 | 1750 | 2 |
| 2018-01-06 | 2 | 1750 | 2 |
| 2018-01-07 | 2 | 1750 | 2 |
| 2018-01-08 | 2 | 2000 | 2 |
| 2018-01-09 | 2 | 2000 | 2 |
| 2018-01-10 | 2 | 1500 | 1 |
| 2018-01-11 | 2 | 1500 | 1 |
| 2018-01-12 | 2 | 0 | 0 |
| 2018-01-13 | 2 | 1000 | 2 |
| 2018-01-14 | 2 | 500 | 1 |
| 2018-01-15 | 2 | 0 | 0 |
That's pretty close, however the problem with this query is that on days where a donation ends and that same donor begins a new one, it should only count that donor's donation one time, using the higher amount donation as a tie-breaker for the cumulative $ count. An example of that is on 2018-01-13 for organization Foo: total_donors should be 1 and total_donations_cents 1000.
I tried to implement a tie-breaker using DISTINCT ON, but I got off into the weeds... any help would be appreciated!
Also, should I be worried about the performance implications of my implementation so far, given the CTEs and the CROSS JOIN?
Figured it out using DISTINCT ON: http://sqlfiddle.com/#!17/88969/4
WITH report_dates AS (
SELECT '2018-01-01'::date + g AS date
FROM generate_series(0, 14) g
), organizations AS (
SELECT id AS organization_id FROM users
WHERE type = 'Organization'
), donors_by_date AS (
SELECT * FROM report_dates rd
CROSS JOIN organizations o
LEFT JOIN LATERAL (
SELECT DISTINCT ON (date, da.donor_id)
da.donor_id,
doa.id,
doa.donor_amounts_id,
doa.amount_cents
FROM users
LEFT JOIN donor_organization_amounts doa ON doa.organization_id = users.id
LEFT JOIN donor_amounts da ON da.id = doa.donor_amounts_id
LEFT JOIN donor_schedules ds ON ds.donor_amounts_id = da.id
WHERE (users.id = o.organization_id) AND (ds.period && tsrange(rd.date::timestamp, rd.date::timestamp + INTERVAL '1 day', '[)'))
ORDER BY date, da.donor_id, doa.amount_cents DESC
) foo ON true
)
SELECT
date,
organization_id,
COALESCE(SUM(amount_cents), 0) AS total_donations_cents,
COUNT(*) FILTER (WHERE donor_id IS NOT NULL) AS total_donors
FROM donors_by_date
GROUP BY date, organization_id
ORDER BY organization_id, date;
Result:
| date | organization_id | total_donations_cents | total_donors |
|------------|-----------------|-----------------------|--------------|
| 2018-01-01 | 1 | 0 | 0 |
| 2018-01-02 | 1 | 250 | 1 |
| 2018-01-03 | 1 | 250 | 1 |
| 2018-01-04 | 1 | 1750 | 3 |
| 2018-01-05 | 1 | 1750 | 3 |
| 2018-01-06 | 1 | 1750 | 3 |
| 2018-01-07 | 1 | 750 | 2 |
| 2018-01-08 | 1 | 850 | 2 |
| 2018-01-09 | 1 | 850 | 2 |
| 2018-01-10 | 1 | 500 | 1 |
| 2018-01-11 | 1 | 500 | 1 |
| 2018-01-12 | 1 | 500 | 1 |
| 2018-01-13 | 1 | 1000 | 1 |
| 2018-01-14 | 1 | 1000 | 1 |
| 2018-01-15 | 1 | 0 | 0 |
| 2018-01-01 | 2 | 0 | 0 |
| 2018-01-02 | 2 | 250 | 1 |
| 2018-01-03 | 2 | 250 | 1 |
| 2018-01-04 | 2 | 1750 | 2 |
| 2018-01-05 | 2 | 1750 | 2 |
| 2018-01-06 | 2 | 1750 | 2 |
| 2018-01-07 | 2 | 1750 | 2 |
| 2018-01-08 | 2 | 2000 | 2 |
| 2018-01-09 | 2 | 2000 | 2 |
| 2018-01-10 | 2 | 1500 | 1 |
| 2018-01-11 | 2 | 1500 | 1 |
| 2018-01-12 | 2 | 0 | 0 |
| 2018-01-13 | 2 | 1000 | 2 |
| 2018-01-14 | 2 | 500 | 1 |
| 2018-01-15 | 2 | 0 | 0 |
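DISTINCT ON and tsrange are PostgreSQL features. For readers on other engines, the same "one row per donor per day, highest amount wins" tie-breaker can be sketched with ROW_NUMBER(). The example below uses Python's sqlite3 module with a flattened, hypothetical donations table (inclusive start_date and end_date columns) and made-up rows; it does not reproduce the SQLFiddle schema, and it omits days with no donations rather than reporting zeros.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE donations (donor_id INTEGER, organization_id INTEGER, "
    "amount_cents INTEGER, start_date TEXT, end_date TEXT)"
)
# Hypothetical data: one donor whose 500-cent donation ends on the same
# day (2018-01-13) that a 1000-cent donation to the same organization begins.
conn.executemany(
    "INSERT INTO donations VALUES (?, ?, ?, ?, ?)",
    [
        (3, 1, 500, "2018-01-10", "2018-01-13"),
        (3, 1, 1000, "2018-01-13", "2018-01-14"),
    ],
)

daily = conn.execute(
    """
    WITH RECURSIVE report_dates(day) AS (
        SELECT '2018-01-12'
        UNION ALL
        SELECT date(day, '+1 day') FROM report_dates WHERE day < '2018-01-14'
    ),
    ranked AS (
        SELECT rd.day, d.organization_id, d.donor_id, d.amount_cents,
               -- rank each donor's overlapping donations, largest first
               ROW_NUMBER() OVER (
                   PARTITION BY rd.day, d.organization_id, d.donor_id
                   ORDER BY d.amount_cents DESC
               ) AS rn
        FROM report_dates rd
        JOIN donations d ON rd.day BETWEEN d.start_date AND d.end_date
    )
    SELECT day, organization_id,
           SUM(amount_cents) AS total_donations_cents,
           COUNT(*)          AS total_donors
    FROM ranked
    WHERE rn = 1            -- keep only the winning donation per donor
    GROUP BY day, organization_id
    ORDER BY organization_id, day
    """
).fetchall()

for row in daily:
    print(row)
```

On the overlap day the donor is counted once, with the larger amount.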