I have a table of medical visit data. I'd like to determine how frequently patients move from one location to another. My source database is using SQL Server 2008, so LAG window functions aren't possible.
I'd like to start by differentiating each Location_Name change for each patient over time. The Desired_Result column below shows the result that I'm hoping for: each time the Location_Name changes for the same Patient_ID, the column increments by 1. Note that the final change for Patient_ID 1 is a move back to a previous location, which I'd still like to treat as a change in location.
Patient_ID | Location_Name | Contact_Date | Desired_Result
1 | Site A | 1/1/2019 | 1
1 | Site A | 1/2/2019 | 1
1 | Site B | 1/3/2019 | 2
1 | Site B | 1/4/2019 | 2
1 | Site C | 1/5/2019 | 3
1 | Site C | 1/6/2019 | 3
1 | Site C | 1/7/2019 | 3
1 | Site A | 1/8/2019 | 4
2 | Site B | 1/1/2019 | 1
2 | Site B | 1/4/2019 | 1
2 | Site B | 1/9/2019 | 1
Is this possible in SQL Server 2008? Thank you!
This is a variation of the gaps-and-islands problem. You can use the difference of row numbers to identify each island of consecutive visits to the same location:
select t.*,
       dense_rank() over (partition by patient_id order by first_contact) as desired_result
from (select t.*,
             min(contact_date) over (partition by patient_id, location_name, seqnum - seqnum_2) as first_contact
      from (select t.*,
                   row_number() over (partition by patient_id order by contact_date) as seqnum,
                   row_number() over (partition by patient_id, location_name order by contact_date) as seqnum_2
            from t
           ) t
     ) t;
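To see why this works, here are the intermediate values for Patient_ID 1, worked by hand from the sample data above. seqnum numbers all of a patient's rows by date, seqnum_2 numbers them within each location, and the difference is constant within every consecutive run of the same location but different for the later return to Site A:
Location_Name | Contact_Date | seqnum | seqnum_2 | seqnum - seqnum_2
Site A | 1/1/2019 | 1 | 1 | 0
Site A | 1/2/2019 | 2 | 2 | 0
Site B | 1/3/2019 | 3 | 1 | 2
Site B | 1/4/2019 | 4 | 2 | 2
Site C | 1/5/2019 | 5 | 1 | 4
Site C | 1/6/2019 | 6 | 2 | 4
Site C | 1/7/2019 | 7 | 3 | 4
Site A | 1/8/2019 | 8 | 3 | 5
Taking dense_rank() over the earliest contact date of each of these islands then yields 1, 2, 3, 4, which is the Desired_Result.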
The following is a snippet of my table...
My table has a lot more users and higher order_rank values.
I'm trying to get the number of visits leading up to each order_rank, in Postgres.
So the result I'm trying to generate looks like...
I would address this as a gaps-and-islands problem, where each island ends with a visit that carries an order_rank. You want the end of each island, along with the count of preceding records in the same island.
You can define the group with a window count of non-null order_rank values, running from the end of the table backwards. Then, just use that information to count how many records belong to each group:
select *
from (
select t.*,
count(*) over(partition by customer_id, grp) - 1 as number_of_visits
from (
select t.*,
count(order_rank) over(partition by customer_id order by visit_time desc) grp
from mytable t
) t
) t
where order_rank is not null
Demo on DB Fiddle:
customer_id | visit_time | txn_flag | order_rank | grp | number_of_visits
----------: | :--------- | -------: | ---------: | --: | ---------------:
123 | 2020-01-04 | 1 | 1 | 3 | 3
123 | 2020-01-06 | 1 | 2 | 2 | 1
123 | 2020-01-11 | 1 | 3 | 1 | 4
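To read the output: because the count runs in descending visit_time order and count(order_rank) skips nulls, each order row and all the visit rows that precede it (back to the previous order) share the same grp value, with the earliest order getting the highest grp. count(*) per grp minus 1 then excludes the order row itself, leaving just the number of preceding visits.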
Posting here in case someone with more knowledge than me may be able to help me with some direction.
I have a table like this:
| Row | date     | user id | score |
|-----|----------|---------|-------|
| 1   | 20201120 | 1       | 26    |
| 2   | 20201121 | 1       | 14    |
| 3   | 20201125 | 1       | 0     |
| 4   | 20201114 | 2       | 32    |
| 5   | 20201116 | 2       | 0     |
| 6   | 20201120 | 2       | 23    |
However, from this I need a record for each user for each day: if a day is missing for a user, the last recorded score should be carried forward, so I would end up with something like this:
| Row | date     | user id | score |
|-----|----------|---------|-------|
| 1   | 20201120 | 1       | 26    |
| 2   | 20201121 | 1       | 14    |
| 3   | 20201122 | 1       | 14    |
| 4   | 20201123 | 1       | 14    |
| 5   | 20201124 | 1       | 14    |
| 6   | 20201125 | 1       | 0     |
| 7   | 20201114 | 2       | 32    |
| 8   | 20201115 | 2       | 32    |
| 9   | 20201116 | 2       | 0     |
| 10  | 20201117 | 2       | 0     |
| 11  | 20201118 | 2       | 0     |
| 12  | 20201119 | 2       | 0     |
| 13  | 20201120 | 2       | 23    |
I'm trying to do this in BigQuery using Standard SQL. I have an idea of how to carry the same score across the following empty dates, but I really don't know how to add new rows for the missing dates for each user. Also, keep in mind that this example only has 2 users; in my data I have more than 1,500.
My end goal is to show something like the average score per day. For background, because of our logic, if the score wasn't recorded on a specific day, that means the user is still at the last recorded score, which is why I need a score for every user on every day.
I'd really appreciate any help I could get! I've been trying different options without success.
Below is for BigQuery Standard SQL
#standardSQL
select date, user_id,
last_value(score ignore nulls) over(partition by user_id order by date) as score
from (
select user_id, format_date('%Y%m%d', day) date
from (
select user_id, min(parse_date('%Y%m%d', date)) min_date, max(parse_date('%Y%m%d', date)) max_date
from `project.dataset.table`
group by user_id
) a, unnest(generate_date_array(min_date, max_date)) day
)
left join `project.dataset.table` b
using(date, user_id)
-- order by user_id, date
If applied to the sample data from your question, the output matches the desired result shown above.
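Note that the generated day is formatted back to a '%Y%m%d' string so that using(date, user_id) matches the string dates stored in the source table; last_value(score ignore nulls) then carries each user's most recent recorded score forward across the generated gap rows.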
One option uses generate_date_array() to create the series of dates for each user, then brings in the table with a left join:
select d.date, d.user_id,
       last_value(t.score ignore nulls) over(partition by d.user_id order by d.date) as score
from (
    select u.user_id, day as date
    from (
        select user_id, generate_date_array(min(date), max(date), interval 1 day) as days
        from mytable
        group by user_id
    ) u
    cross join unnest(u.days) as day
) d
left join mytable t on t.user_id = d.user_id and t.date = d.date
This assumes date is stored as a DATE column; if it is a string as in your sample, parse it first, as in the other answer.
I think the most efficient method is to use generate_date_array() but in a very particular way:
with t as (
      select m.*,
             date_add(lead(date) over (partition by user_id order by date), interval -1 day) as next_date
      from mytable m
     )
select row_number() over (order by t.user_id, dte) as id,
       t.user_id, dte, t.score
from t cross join
     unnest(generate_date_array(date,
                                coalesce(next_date, date),
                                interval 1 day
                               )
           ) dte;
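The particular trick: instead of generating a full calendar and joining back to the table, each row expands into its own date range, from its own date up to the day before the user's next recorded date, so the score travels with the row and no join or ignore nulls logic is needed.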
I am new to SQL and was not able to solve the following problem: I've got a column of names ([name]), a column of integer values that I want to sum up ([Values]) and another column of integer values ([Day]). I want a running sum of the values, grouped by name, over the days. So, for example, if there is a name "Chris" with value 4 on day 1 and another entry "Chris" with value 2 on day 3, I want to show the sum for Chris as 4 on day 1 and as 4+2=6 on day 3.
As in the example above ("Chris"), I want to sum them up, showing the running total for each name on each day (the sum from day 1 until day x).
I was only able to sum up the values for each name per day (see the code below), but this is not what I am searching for, since I need to keep the structure of the table: the running sum should appear as an additional column on every row.
select name, day,
sum(value) over (partition by name order by day) total
from tablename
There is a table below showing what I want to achieve.
With the sum() window function:
select *,
sum(value) over (partition by name order by day) SumValue,
sum(value2) over (partition by name2 order by day) SumValue2
from tablename
order by day
or:
select name, day, value, name2, value2,
sum(value) over (partition by name order by day) SumValue,
sum(value2) over (partition by name2 order by day) SumValue2
from (select *, row_number() over (order by day) rn from tablename) as t
order by rn
if you want to preserve the original sequence of the rows.
See the demo.
Results:
> name | day | value | name2 | value2 | SumValue | SumValue2
> :---- | --: | ----: | :---- | -----: | -------: | --------:
> Chris | 1 | 2 | Paul | 5 | 2 | 5
> Alice | 1 | 5 | Ken | 4 | 5 | 4
> Paul | 2 | 8 | Alice | 1 | 8 | 1
> Ken | 2 | 4 | Chris | 2 | 4 | 2
> Alice | 3 | 3 | Ken | 3 | 8 | 7
> Chris | 3 | 6 | Paul | 0 | 8 | 5
Apologies for the confusing title; I was unsure how to phrase it.
Below is my dataset:
+----+-----------------------------+--------+
| Id | Date | Amount |
+----+-----------------------------+--------+
| 1 | 2019-02-01 12:14:08.8056282 | 10 |
| 1 | 2019-02-04 15:23:21.3258719 | 10 |
| 1 | 2019-02-06 17:29:16.9267440 | 15 |
| 1 | 2019-02-08 14:18:14.9710497 | 10 |
+----+-----------------------------+--------+
It is an example of a bank trying to collect money from a debtor: first, collection of 10% of the owed sum is attempted; if the card is successfully charged, 15% is attempted next; if that throws an error (for example, insufficient funds), 10% is attempted again.
The desired output would be:
+----+--------+---------+
| Id | Amount | Attempt |
+----+--------+---------+
| 1 | 10 | 1 |
| 1 | 15 | 2 |
| 1 | 10 | 3 |
+----+--------+---------+
I have tried:
SELECT Id, Amount
FROM table1
GROUP BY Id, Amount
This returns only one row per distinct amount, collapsing the first and third attempts into one. I am struggling to create a new column that increments when the value in the Amount column changes, as I assume that could be used as another grouping variable to fix this.
If you just want the rows where the value changes, use lag():
select t.id, t.amount,
row_number() over (partition by id order by date) as attempt
from (select t.*, lag(amount) over (partition by id order by date) as prev_amount
from table1 t
) t
where prev_amount is null or prev_amount <> amount
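Tracing this by hand against the sample data: lag() yields prev_amount values NULL, 10, 10, 15 for the four rows in date order, so the WHERE clause keeps the first row (prev_amount is null) plus the two rows where the amount actually changed (10 to 15, then 15 to 10). Since window functions are evaluated after WHERE, row_number() numbers the three surviving rows 1, 2, 3, matching the desired output.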
I'm trying to rank the rows in the following table that looks like this:
| ID | Key | Date | Row|
*****************************
| P175 | 5 | 2017-01| 2 |
| P175 | 5 | 2017-02| 2 |
| P175 | 5 | 2017-03| 2 |
| P175 | 12 | 2017-03| 1 |
| P175 | 12 | 2017-04| 1 |
| P175 | 12 | 2017-05| 1 |
This person has two Keys at once during 2017-03, but I want the formula to put '1' for the rows where Key=12 since it reflects the most recent records.
I want the same formula to also work for the people who don't have overlapping Keys, putting '1' for the most recent records:
| ID | Key | Date | Row|
*****************************
| P170 | 8 | 2017-01| 2 |
| P170 | 8 | 2017-02| 2 |
| P170 | 8 | 2017-03| 2 |
| P170 | 6 | 2017-04| 1 |
| P170 | 6 | 2017-05| 1 |
I've tried variations of ROW_NUMBER() OVER (PARTITION BY ...) and DENSE_RANK() but cannot figure out the correct formula. Thanks for your help.
First calculate the max date for each key. Then use dense_rank():
select t.*,
dense_rank() over (partition by id order by max_date desc, key) as row
from (select t.*, max(date) over (partition by id, key) as max_date
from t
) t;
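Worked by hand against the first sample: for P175, max_date is 2017-03 for Key 5 and 2017-05 for Key 12, so ordering by max_date desc puts every Key 12 row first, and dense_rank() assigns those rows 1 and the Key 5 rows 2, handling the overlapping 2017-03 month correctly.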
If the ranges for each key did not overlap, you could do this with a cumulative count distinct:
select t.*, count(distinct key) over (partition by id order by date desc) as rank
from t;
However, this would not work in the first case (and many databases do not support count(distinct) as a window function anyway). I just find it interesting that it does almost the same thing as the first query.
I guess you are looking for something like this:
select personid, mykey, month,
dense_rank() over (partition by personid order by mykey desc) rown
from personkeys
order by month
See the example: http://sqlfiddle.com/#!15/cf751/8