I need to find the difference in averages between patient weights at different visits (time points), but I'm struggling with finding the "paired" averages:
I have 1 table (PHYS) containing patient weights at different visits:
PATIENT VISIT WEIGHT
1 Baseline 200
1 1 Month 190
1 2 Month 170
2 Baseline 300
2 1 Month 290
2 2 Month 280
3 Baseline 250
3 1 Month 230
My problem is that I only want to find the difference for paired data. For example, when calculating the amount of weight loss between the 2 month and Baseline visits, I would want to find the difference between the (average 2 Month weight) and the (average Baseline weight FOR ONLY THOSE PATIENTS WITH A 2 MONTH WEIGHT). In this example, the result should be AVG(170,280) - AVG(200,300) = -25 (since only patient 1 and 2 have 2 Month weights).
Here is what I have, but it calculates the difference based on all weights:
SELECT VISIT
AVG(WEIGHT)
-
(SELECT
AVG(WEIGHT)
FROM PHYS
WHERE VISIT = 'BASELINE')
FROM PHYS
GROUP BY VISIT
My desired output would be (I know I need to add an ORDER BY):
VISIT CHANGE FROM BASELINE
Baseline 0
1 Month -13.3
2 Month -25
Thank you and sorry for such a newb question.
You can do this with a join to the same table but only for the 'Baseline'. Then, the aggregation only aggregates the values that match, so you should get different baseline averages for the three groups (because the populations are different):
select p.visit, avg(p.weight) as avg_weight, avg(pbl.weight) as avg_blweight,
(avg(p.weight) - avg(pbl.weight)) as change
from phys p join
phys pbl
on p.patient = pbl.patient and
pbl.visit = 'Baseline'
group by p.visit;
Related
I have a table that looks like this:
user
type
quantity
order_id
purchase_date
john
travel
10
1
2022-01-10
john
travel
15
2
2022-01-15
john
books
4
3
2022-01-16
john
music
20
4
2022-02-01
john
travel
90
5
2022-02-15
john
clothing
200
6
2022-03-11
john
travel
70
7
2022-04-13
john
clothing
70
8
2022-05-01
john
travel
200
9
2022-06-15
john
tickets
10
10
2022-07-01
john
services
20
11
2022-07-15
john
services
90
12
2022-07-22
john
travel
10
13
2022-07-29
john
services
25
14
2022-08-01
john
clothing
3
15
2022-08-15
john
music
5
16
2022-08-17
john
music
40
18
2022-10-01
john
music
30
19
2022-11-05
john
services
2
20
2022-11-19
where i have many different users, multiple types making purchases daily.
I want to end up with a table of this format
user
label
month
john
travel
2022-01-01
john
travel
2022-02-01
john
clothing
2022-03-01
john
travel-clothing
2022-04-01
john
travel-clothing
2022-05-01
john
travel-clothing
2022-06-01
john
travel
2022-07-01
john
travel
2022-08-01
john
services
2022-10-01
john
music
2022-11-01
where the label would record the most popular type (based on % of quantity sold) for each user in a timeframe of the last 4 months (including the current month). So for instance, for March 2022 john ordered 200/339 clothing (Jan to and including Mar) so his label is clothing. But for months where two types are almost even I'd want to use a double label like for April (185 travel 200 clothing out of 409). In terms of rules this is not set in stone yet but it's something like, if two types are around even (e.g. >40%) then use both types in the label column; if three types are around even (e.g. around 30% each) use three types as label; if one label is 40% but the rest is made up of many small % keep the first label; and of course where one is clearly a majority use that. One other tricky bit is that there might be missing months for a user.
I think regarding the rules I need to just compare the % of each type, but I don't know how to retrieve the type as label afterwards. In general, I don't have the SQL/BigQuery logic very clearly in my head. I have done somethings but nothing that comes close to the target table.
Broken down in steps, I think I need 3 things:
group by user, type, month and get the partial and total count (I have done this)
then retrieve the counts for the past 4 months (have done something but it's not exactly accurate yet)
compare the ratios and make the label column
I'm not very clear on the sql/bigquery logic here, so please advise me on the correct steps to achieve the above. I'm working on bigquery but sql logic will also help
Consider below approach. It looks a little bit messy and has a room to optimize but hope you get some idea or a direction to address your problem.
WITH aggregation AS (
SELECT user, type, DATE_TRUNC(purchase_date, MONTH) AS month, month_no,
SUM(quantity) AS net_qty,
SUM(SUM(quantity)) OVER w1 AS rolling_qty
FROM sample_table, UNNEST([EXTRACT(YEAR FROM purchase_date) * 12 + EXTRACT(MONTH FROM purchase_date)]) month_no
GROUP BY 1, 2, 3, 4
WINDOW w1 AS (
PARTITION BY user ORDER BY month_no RANGE BETWEEN 3 PRECEDING AND CURRENT ROW
)
),
rolling AS (
SELECT user, month, ARRAY_AGG(STRUCT(type, net_qty)) OVER w2 AS agg, rolling_qty
FROM aggregation
QUALIFY ROW_NUMBER() OVER (PARTITION BY user, month) = 1
WINDOW w2 AS (PARTITION BY user ORDER BY month_no RANGE BETWEEN 3 PRECEDING AND CURRENT ROW)
)
SELECT user, month, ARRAY_TO_STRING(ARRAY(
SELECT type FROM (
SELECT type, SUM(net_qty) / SUM(SUM(net_qty)) OVER () AS pct,
FROM r.agg GROUP BY 1
) QUALIFY IFNULL(FIRST_VALUE(pct) OVER (ORDER BY pct DESC) - pct, 0) < 0.10 -- set threshold to 0.1
), '-') AS label
FROM rolling r
ORDER BY month;
Query results
I have a table called match_score which have following data
id
participant
round
score
1
gabe
1
100
2
john
1
90
3
duff
1
80
4
vlad
1
85
5
gabe
2
75
6
john
2
70
Let's just say that round 1 is the preliminary round and 2 is the final round
I want to rank the result based on the score and grouped by the participant , if I'm using some normal sql group by participant and order by score desc.
vlad are the 1st, duff 2nd and gabe 3rd, which one is wrong.
what i want is
1st gabe with 75 point in the final round
2nd john with 70 point in the final round
3rd vlad with 85 point in the preliminary round
4th duff with 80 point in the preliminary round
maybe something like this:
select
h.participant,
h.round,
h.score
from
match_score h
where
not exists (
select 1
from
match_score t
where
t.participant = h.participantand t.round > h.round
)
order by
h.round desc,
h.score desc;
I have a user request for a report and I’m too new to SQL programming to know how to approach it.
My user wants to know for each Staff ID what is the min, avg and max number of days between visits. What I don’t know how to figure out is the number of days between Visit 1 and Visit 2; Visit 2 and Visit 3, etc., for each Person ID. Some Person ID’s only have one visit, others (most) have multiple visits (up to 26). Here is a snapshot of some data (the full dataset is over 14k records):
PersonID VisitNo StaffID VisitDate
161 1 42344 06/19/2018
163 1 32987 05/14/2018
163 2 32987 09/17/2018
193 1 42344 04/09/2018
193 2 42344 07/18/2018
193 1 33865 07/18/2018
207 1 32987 10/10/2018
207 2 32987 11/05/2018
329 1 42344 04/15/2018
329 2 42344 05/23/2018
329 3 42344 06/10/2018
329 4 42344 07/18/2018
329 1 33865 06/30/2018
329 2 33865 09/14/2018
My research found a lot of references to comparing rows in the same table and I figured out how to compare one visit to the next for a single PersonID using a self join and datadiff, but how do I get from one PersonID to the next, or skip those PersonID’s with only 1 visit? Or a PersonID who has visits with multiple StaffId's?
Any ideas/suggestions are greatly appreciated as I have two requests that will benefit.
You can use analytic function LEAD (myvar,1) OVER ()
example from https://www.techonthenet.com/sql_server/functions/lead.php
SELECT dept_id, last_name, salary,
LEAD (salary,1) OVER (ORDER BY salary) AS next_highest_salary
FROM employees;
For the average number of days, you can just use aggregation:
select personid,
(datediff(day, min(visitdate), max(visitdate)) * 1.0 / nullif(count(*) - 1, 0)
from t
group by personid;
I used SQL Server syntax, but the same idea holds in any database. The average is the maximum minus the minimum divided by one less than the number of visits.
I've a Measure like below,
Year ProductCategory CompanyId TotalCustCnt SalesAmt Rank
2012 Prd1 1 20 100,000 1
2012 Prd2 1 10 75,000 2
2013 Prd1 2 18 80,000 2
2013 Prd2 2 15 50,000 1
Now I want to calculate three averages out of this data.
Average = SalesAmt / TotalCustCnt
Company average (Average for each company)
Leader average (Average for the leader for the Product category,year i.e company with Rank 1)
Industry average (Average for the whole industry for the product category,year)
The first one is straight forward, I've added a calculated field in the CUBE for adding an expression (SalesAmt / TotalCustCnt)
But how do I calculate the other two averages?
I've checked the AVG function and also tried SUM([Measures].[SalesAmt].ALLMEMBER) / SUM([Measures].[TotalCustCnt].ALLMEMBER) but no success.
Im trying to figure out the total for the quarter when the only data shown is a running total for the year:
Id Amount Periods Year Type Date
-------------------------------------------------------------
1 65 2 2014 G 4-1-12
2 75 3 2014 G 7-1-12
3 25 1 2014 G 1-1-12
4 60 1 2014 H 1-1-12
5 75 1 2014 Y 1-1-12
6 120 3 2014 I 7-1-12
7 30 1 2014 I 1-1-12
8 90 2 2014 I 4-1-12
In the data shown above. The items in type G and I are running totals for the period (in qtrs). If my query returns period 3, is there a sql way to get the data for the qtr? The math would involve retrieving the data for the 3rd period - 2nd period.
Right now my sql is something like:
SELECT * FROM data WHERE Date='4-1-12';
In this query, it will return row #1, which is a total for 2 periods. I would like it to return just the total for the 2nd period. Im looking to make this happen with SQLite.
Any help would be appreciated.
Thank alot
You want to subtract the running total of the previous quarter:
SELECT Id,
Year,
Type,
Date,
Amount - IFNULL((SELECT Amount
FROM data AS previousQuarter
WHERE previousQuarter.Year = data.year
AND previousQuarter.Type = data.Type
AND previousQuarter.Periods = data.Periods - 1
), 0) AS Amount
FROM data
The IFNULL is needed to handle a quarter that has no previous quarter.