Hi guys, I have a data set from my query that looks like this:
Date | Count | Activity
10 Nov | 10 | A
11 Nov | 11 | A
10 Nov | 12 | B
11 Nov | 13 | B
I am trying to achieve the result below. Basically, the logic is: the 2nd row minus the 1st row, and the 4th row minus the 3rd row.
Date | Count | Activity | Diff
10 Nov | 10 | A | 0
11 Nov | 11 | A | 1
10 Nov | 12 | B | 0
11 Nov | 13 | B | 1
My current query looks like this:
select DATE, count(distinct(ID)) as Count,
count(distinct(ID)) - LAG(count(distinct(ID)),1) over (order by count(distinct(ID))) as Eng_change
from (Select DATA.*,PRODUCT.MAPPING from DATA left join PRODUCT on DATA.Part_Number=PRODUCT.PRODUCT_NUMBER ) OVERALLFUNNEL
WHERE ACTIVITY_RANK>5
group by OVERALLFUNNEL.ACTIVITY,OVERALLFUNNEL.DATE
ORDER BY ACTIVITY_RANK ASC
Using LAG always subtracts the previous row, but that's not what I want.
Any help, or a function for this?
regards
If I understand your problem correctly, you want the difference within each activity. If so:
select DATE, activity,
count(distinct ID) as Count,
(count(distinct ID) -
LAG(count(distinct ID), 1) over (partition by activity
order by count(distinct ID)
)
) as Eng_change
This will give NULL for the first row of each activity. If you want 0 instead, wrap the expression in coalesce() or something similar.
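To make the partitioning concrete, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. SQLite (3.25 or later, for window functions) is only a stand-in for whatever database the question uses. The table name daily_counts and its columns are made up for illustration, and ordering the rows by the date label is an assumption about the intended row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table holding the already-aggregated counts from the question.
    CREATE TABLE daily_counts (day TEXT, cnt INTEGER, activity TEXT);
    INSERT INTO daily_counts VALUES
        ('10 Nov', 10, 'A'), ('11 Nov', 11, 'A'),
        ('10 Nov', 12, 'B'), ('11 Nov', 13, 'B');
""")

rows = conn.execute("""
    SELECT day, cnt, activity,
           -- LAG restarts for every activity because of PARTITION BY;
           -- COALESCE turns the NULL of each partition's first row into 0.
           -- Sorting the text labels only works for this tiny sample;
           -- a real query would order by a proper date column.
           COALESCE(cnt - LAG(cnt, 1) OVER (PARTITION BY activity ORDER BY day), 0) AS diff
    FROM daily_counts
    ORDER BY activity, day
""").fetchall()

for row in rows:
    print(row)
```

With the four sample rows, the diff column comes out as 0, 1, 0, 1, which matches the result the question asks for.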
How might I calculate cumulative percentages in SQL (Postgres/Vertica)?
For instance, the question is "As of each date, of all patients who had been diagnosed by that date, what percent had been treated by that date?"
For instance, this table shows dates of diagnosis and treatment, with binary values that might be summed
ID | diagnosed | date_diag | treated | date_treat
---|-----------|-----------|---------|-----------
1 | 1 | Jan 1 | 0 | null
2 | 1 | Jan 15 | 1 | Feb 20
3 | 1 | Jan 29 | 1 | Feb 1
4 | 1 | Feb 08 | 1 | Mar 4
5 | 1 | Feb 12 | 0 | null
6 | 1 | Feb 18 | 1 | Feb 24
7 | 1 | Mar 15 | 1 | May 5
8 | 1 | Apr 14 | 1 | Apr 20
I'd like to get a table of cumulative treated-vs-diagnosed ratio that might look like this.
date | ytd_diag | ytd_treat | ytd_percent
-----|----------|-----------|------------
Jan 01 | 1 | 0 | 0.00
Jan 15 | 2 | 0 | 0.00
Jan 29 | 3 | 0 | 0.00
Feb 08 | 4 | 1 | 0.25
Feb 12 | 5 | 1 | 0.20
Feb 18 | 6 | 1 | 0.17
Mar 15 | 7 | 4 | 0.57
Apr 14 | 8 | 4 | 0.50
I can calculate cumulative counts of diagnosed or treated (e.g. below), using window functions but I can't figure out a SQL query to get the number of people who'd already been treated as of each diagnosis date.
SELECT
date_diag ,
SUM(COUNT(*)) OVER ( ORDER BY date_diag ) as freq
FROM patients
WHERE diagnosed = 1
GROUP BY date_diag
ORDER BY date_diag;
You can use conditional aggregation with SUM() window function:
WITH cte AS (
SELECT kind,
date,
SUM((kind = 1)::int) OVER (ORDER BY date) ytd_diag,
SUM((kind = 2)::int) OVER (ORDER BY date) ytd_treat
FROM (
SELECT 1 kind, date_diag date, diagnosed status FROM patients
UNION ALL
SELECT 2, date_treat, treated FROM patients WHERE date_treat IS NOT NULL
) t
)
SELECT date, ytd_diag, ytd_treat,
ROUND(1.0 * ytd_treat / ytd_diag, 2) ytd_percent
FROM cte
WHERE kind = 1;
You can solve this with window functions. The first thing you want to do is to derive a table from your patients table that has a running tally of both the diagnosed and treated columns. The rows should be tallied in ascending order of the diagnosis date.
Here's how you do that. First I'll create a sample patients table and data (I'll only include the columns necessary):
create temporary table patients (
date_diag date,
diagnosed int default 0,
treated int default 0
);
insert into patients (date_diag, diagnosed, treated) values
('2021-01-01', 1, 0),
('2021-01-11', 1, 1),
('2021-01-16', 1, 0),
('2021-01-30', 1, 1),
('2021-02-04', 1, 1),
('2021-01-14', 1, 1);
Then here's how to create the derived table of all the tallied results.
select
date_diag,
diagnosed,
treated,
sum(treated) over(order by date_diag ASC ) as treated_cmtv,
count(diagnosed) over(order by date_diag ASC) as diagnosed_cmtv
from patients
/*
date_diag | diagnosed | treated | treated_cmtv | diagnosed_cmtv
------------+-----------+---------+--------------+----------------
2021-01-01 | 1 | 0 | 0 | 1
2021-01-11 | 1 | 1 | 1 | 2
2021-01-14 | 1 | 1 | 2 | 3
2021-01-16 | 1 | 0 | 2 | 4
2021-01-30 | 1 | 1 | 3 | 5
2021-02-04 | 1 | 1 | 4 | 6
*/
Now that you have this table, you can easily calculate the percentage by defining this derived table in a subquery and then selecting the columns needed for the calculation, like so:
select
p.date_diag,
p.diagnosed,
p.diagnosed_cmtv,
p.treated_cmtv,
p.treated,
TRUNC(p.treated_cmtv::numeric / p.diagnosed_cmtv * 1.0, 2) as percent
from (
-- same table as above
select
date_diag,
diagnosed,
treated,
sum(treated) over(order by date_diag ASC ) as treated_cmtv,
count(diagnosed) over(order by date_diag ASC) as diagnosed_cmtv
from patients
) as p;
/*
date_diag | diagnosed | diagnosed_cmtv | treated_cmtv | treated | percent
------------+-----------+----------------+--------------+---------+---------
2021-01-01 | 1 | 1 | 0 | 0 | 0.00
2021-01-11 | 1 | 2 | 1 | 1 | 0.50
2021-01-14 | 1 | 3 | 2 | 1 | 0.66
2021-01-16 | 1 | 4 | 2 | 0 | 0.50
2021-01-30 | 1 | 5 | 3 | 1 | 0.60
2021-02-04 | 1 | 6 | 4 | 1 | 0.66
*/
I think that gives you what you are asking for.
An alternative approach to the other answers is to use a correlated subquery in the select list:
SELECT
p.date_diag,
(SELECT COUNT(*)
FROM patients p2
WHERE p2.date_treat <= p.date_diag) ytd_treated
FROM
patients p
WHERE diagnosed = 1
GROUP BY p.date_diag
ORDER BY p.date_diag
This will give you the column 0,0,0,1,1,1,4,4. You can divide it by the diagnosed column to get your percentage:
SELECT
(select ...) / SUM(COUNT(*)) OVER(...)
Note that you might need more conditions in the inner WHERE clause, such as requiring the treated date to be on or after January 1st of the diagnosis date's year, if you run it against a dataset that spans more than one year.
Also bear in mind that the treated count will (or should) nearly always be less than the diagnosed count, so an integer division will give you zero. Cast one of the operands to float or, if you want a percentage out of a hundred, multiply by 100.0.
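As a rough check of the correlated-subquery idea, the sketch below loads the question's eight patients into an in-memory SQLite database from Python and computes the ratio with a floating-point division. SQLite stands in for Postgres/Vertica here. The year 2021 is an assumption (the question gives no year), and the sketch assumes one diagnosis per date, as in the sample, so a plain windowed COUNT is enough for the running total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patients (id INTEGER, date_diag TEXT, date_treat TEXT);
    -- Dates from the question, with an assumed year of 2021.
    INSERT INTO patients VALUES
        (1, '2021-01-01', NULL),
        (2, '2021-01-15', '2021-02-20'),
        (3, '2021-01-29', '2021-02-01'),
        (4, '2021-02-08', '2021-03-04'),
        (5, '2021-02-12', NULL),
        (6, '2021-02-18', '2021-02-24'),
        (7, '2021-03-15', '2021-05-05'),
        (8, '2021-04-14', '2021-04-20');
""")

rows = conn.execute("""
    SELECT date_diag, ytd_diag, ytd_treat,
           -- Multiplying by 1.0 forces a floating-point division.
           ROUND(1.0 * ytd_treat / ytd_diag, 2) AS ytd_percent
    FROM (
        SELECT p.date_diag,
               COUNT(*) OVER (ORDER BY p.date_diag) AS ytd_diag,
               -- Correlated subquery: patients treated on or before this diagnosis date.
               -- Rows with a NULL date_treat never satisfy the comparison.
               (SELECT COUNT(*) FROM patients p2
                 WHERE p2.date_treat <= p.date_diag) AS ytd_treat
        FROM patients p
    )
    ORDER BY date_diag
""").fetchall()

for row in rows:
    print(row)
```

On this sample the ytd_treat column is 0,0,0,1,1,1,4,4 and the last row's ratio is 0.5, in line with the table the question asks for.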
My table looks like this:
ID | Start | End
1 | 2010-01-02 | 2010-01-04
1 | 2010-01-22 | 2010-01-24
1 | 2011-01-31 | 2011-02-02
2 | 2012-05-02 | 2012-05-08
3 | 2013-01-02 | 2013-01-03
4 | 2010-09-15 | 2010-09-20
4 | 2010-09-30 | 2010-10-05
I'm looking for a way to count the number of occurrences for each ID in a Year per Month.
Importantly, if a record's End date falls in the month after its Start date (within the same year, of course), the occurrence should be counted for both months. For example, the 3rd row for ID 1 is such a case, so that ID should get +1 for January and +1 for February.
So I'd like to have it in this way:
Year | Month | Id | Occurrence
2010 | 01 | 1 | 2
2010 | 09 | 4 | 2
2010 | 10 | 4 | 1
2011 | 01 | 1 | 1
2011 | 02 | 1 | 1
2012 | 05 | 2 | 1
2013 | 01 | 3 | 1
I created only this for now...
CREATE TABLE IF NOT EXISTS counts AS
(SELECT
id,
YEAR (CAST(Start AS DATE)) AS Year_St,
MONTH (CAST(Start AS DATE)) AS Month_St,
YEAR (CAST(End AS DATE)) AS Year_End,
MONTH (CAST(End AS DATE)) AS Month_End
FROM source)
And I don't know how to move with that further. I'd appreciate your help.
I'm using Spark SQL.
Try the following strategy to achieve this:
Note:
I have created a few intermediate tables. If you prefer, you can use subqueries or CTEs instead, depending on your permissions.
I have handled the two scenarios you described (whether to count a record as 1 occurrence or 2 occurrences).
Query:
Firstly, creating a table with flags to decide whether start and end date are falling on same year and month (1 means YES, 2 means NO):
/* Creating a table with flags whether to count the occurrences once or twice */
CREATE TABLE flagged as
(
SELECT *,
CASE
WHEN Year_st = Year_end and Month_st = Month_end then 1
WHEN Year_st = Year_end and Month_st <> Month_end then 2
Else 0
end as flag
FROM
(
SELECT
id,
YEAR (CAST(Start AS DATE)) AS Year_St,
MONTH (CAST(Start AS DATE)) AS Month_St,
YEAR (CAST(End AS DATE)) AS Year_End,
MONTH (CAST(End AS DATE)) AS Month_End
FROM source
) as calc
)
The flag in the above table is 1 if the year and month are the same for start and end, and 2 if the month differs. You can add more flag categories if you have more scenarios.
Secondly, counting the occurrences for flag 1. As we know year and month are same for flag 1, we can take either of it. I have taken start:
/* Counting occurrences only for flag 1 */
CREATE TABLE flg1 as (
SELECT distinct id, year_st, month_st, count(*) as occurrence
FROM flagged
where flag=1
GROUP BY id, year_st, month_st
)
Similarly, counting the occurrences for flag 2. Since the month differs between the two dates, we can combine them with UNION ALL before counting to get both dates in the same column (a plain UNION would collapse two records of the same ID that span the same pair of months):
/* Counting occurrences only for flag 2 */
CREATE TABLE flg2 as
(
SELECT distinct id, year_dt, month_dt, count(*) as occurrence
FROM
(
select ID, year_st as year_dt, month_st as month_dt FROM flagged where flag=2
UNION ALL
SELECT ID, year_end as year_dt, month_end as month_dt FROM flagged where flag=2
) as unioned
GROUP BY id, year_dt, month_dt
)
Finally, we just have to SUM the occurrences from both the flags. Note that we use UNION ALL here to combine both the tables. This is very important because we need to count duplicates as well:
/* UNIONING both the final tables and summing the occurrences */
SELECT distinct year, month, id, SUM(occurrence) as occurrence
FROM
(
SELECT distinct id, year_st as year, month_st as month, occurrence
FROM flg1
UNION ALL
SELECT distinct id, year_dt as year, month_dt as month, occurrence
FROM flg2
) as fin_unioned
GROUP BY id, year, month
ORDER BY year, month, id, occurrence desc
The output of the above query matches your expected output. I know this is not optimized, but it works correctly. I will update the answer if I come across a better strategy. Comment if you have questions.
Not sure if this works in Spark SQL.
But if the ranges aren't bigger than 1 month, then just add the extra to the count via a UNION ALL.
And the extra are those with the end in a higher month than the start.
SELECT YearOcc, MonthOcc, Id
, COUNT(*) as Occurrence
FROM
(
SELECT Id
, YEAR(CAST(Start AS DATE)) as YearOcc
, MONTH(CAST(Start AS DATE)) as MonthOcc
FROM source
UNION ALL
SELECT Id
, YEAR(CAST(End AS DATE)) as YearOcc
, MONTH(CAST(End AS DATE)) as MonthOcc
FROM source
WHERE MONTH(CAST(Start AS DATE)) < MONTH(CAST(End AS DATE))
) q
GROUP BY YearOcc, MonthOcc, Id
ORDER BY YearOcc, MonthOcc, Id
YearOcc | MonthOcc | Id | Occurrence
------: | -------: | -: | ---------:
2010 | 1 | 1 | 2
2010 | 9 | 4 | 2
2010 | 10 | 4 | 1
2011 | 1 | 1 | 1
2011 | 2 | 1 | 1
2012 | 5 | 2 | 1
2013 | 1 | 3 | 1
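For anyone who wants to try the UNION ALL idea without a Spark cluster, here is a minimal sketch using an in-memory SQLite database from Python. SQLite is only a stand-in: it has no YEAR() or MONTH(), so the sketch uses strftime('%Y-%m', ...) instead, and the column names start_date and end_date are my own renaming of Start and End. Comparing whole year-month strings rather than month numbers is a small variation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source (id INTEGER, start_date TEXT, end_date TEXT);
    INSERT INTO source VALUES
        (1, '2010-01-02', '2010-01-04'),
        (1, '2010-01-22', '2010-01-24'),
        (1, '2011-01-31', '2011-02-02'),
        (2, '2012-05-02', '2012-05-08'),
        (3, '2013-01-02', '2013-01-03'),
        (4, '2010-09-15', '2010-09-20'),
        (4, '2010-09-30', '2010-10-05');
""")

rows = conn.execute("""
    SELECT year_month, id, COUNT(*) AS occurrence
    FROM (
        -- Every record counts once for the month it starts in.
        SELECT id, strftime('%Y-%m', start_date) AS year_month FROM source
        UNION ALL
        -- Records that end in a later month count once more for that month.
        SELECT id, strftime('%Y-%m', end_date) FROM source
        WHERE strftime('%Y-%m', end_date) > strftime('%Y-%m', start_date)
    )
    GROUP BY year_month, id
    ORDER BY year_month, id
""").fetchall()

for row in rows:
    print(row)
```

Like the query above, this only adds one extra month per record, so it assumes no range spans more than two calendar months.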
I have an issue similar to the following query:
select name, number, id
from tableName
order by id
limit 10 offset 5
But in this case I only take the 10 elements from the group with offset 5
Is there a way to set limit and offset by id?
For example if I have a set:
|------------------------------------|---|---------------------------------------|
| Ana | 1 | 589d0011-ef54-4708-a64a-f85228149651 |
| Jana | 2 | 589d0011-ef54-4708-a64a-f85228149651 |
| Jan | 3 | 589d0011-ef54-4708-a64a-f85228149651 |
| Joe | 2 | 64ed0011-ef54-4708-a64a-f85228149651 |
and if I skip 1, I should get
|------------------------------------|---|---------------------------------------|
| Jana | 2 | 589d0011-ef54-4708-a64a-f85228149651 |
| Jan | 3 | 589d0011-ef54-4708-a64a-f85228149651 |
I think that you want to filter by row_number():
select name, number, id
from (
select t.*, row_number() over(partition by name order by id) rn
from mytable t
) t
where
rn > :number_of_records_per_group_to_skip
and rn <= :number_of_records_per_group_to_skip + :number_of_records_per_group_to_keep
The query ranks records by id within groups of records having the same name, and then filters using two parameters:
:number_of_records_per_group_to_skip: how many records per group should be skipped
:number_of_records_per_group_to_keep: how many records per group should be kept (after skipping :number_of_records_per_group_to_skip records)
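Here is a small runnable sketch of the per-group skip/keep filter, using an in-memory SQLite database from Python as a stand-in for the real database. It assumes the groups are the id values shown in the question's sample and that rows are ordered by number within each group. That is my reading of the example, not something the question states outright. The helper name page_per_group is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableName (name TEXT, number INTEGER, id TEXT);
    INSERT INTO tableName VALUES
        ('Ana',  1, '589d0011-ef54-4708-a64a-f85228149651'),
        ('Jana', 2, '589d0011-ef54-4708-a64a-f85228149651'),
        ('Jan',  3, '589d0011-ef54-4708-a64a-f85228149651'),
        ('Joe',  2, '64ed0011-ef54-4708-a64a-f85228149651');
""")

def page_per_group(skip, keep):
    """Return up to `keep` rows per id after skipping the first `skip` rows of each id."""
    return conn.execute("""
        SELECT name, number, id
        FROM (
            SELECT t.*,
                   -- Numbering restarts at 1 for every id.
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY number) AS rn
            FROM tableName t
        )
        -- rn is 1-based, so rows 1..skip are dropped and the next `keep` rows survive.
        WHERE rn > :skip AND rn <= :skip + :keep
        ORDER BY id, number
    """, {"skip": skip, "keep": keep}).fetchall()

rows = page_per_group(skip=1, keep=10)
for row in rows:
    print(row)
```

With skip=1 this returns only the Jana and Jan rows: Ana is the first row of her group and Joe is the only row of his, so both are skipped.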
This might not be the answer you are looking for but it gives you the results your example shows:
select name, number, id
from (
select * from tableName
order by id
limit 3 offset 0
) d
where id > 1;
Best regards,
Bjarni
This is example data.
KEY | MONTH | NAME
-------------------
13 | 201311 | A
24 | 201310 | B
77 | 201309 | C
19 | 201307 | D
15 | 201304 | E
I want to select the previous adjacent months until one does not exist.
I expect results like this.
KEY | MONTH | NAME
-------------------
13 | 201311 | A
24 | 201310 | B
77 | 201309 | C
Assume current MONTH is 201312.
For the data that you have, you could do:
select t.m_key, t.name
from (select t.*,
m_key + row_number() over (order by m_key desc) as grp
from table t
) t
where grp = (select max(m_key) + 1 from table t);
I say "for the data that you have" because it is unclear what happens when you pass a year boundary. Handling that case is a bit more complicated because you have to consider two keys adjacent when they don't differ by 1.
Try this query. It will also handle year breaks (..., 201212, 201301, ...). In this query, the expression (M_KEY/100)*12+MOD(M_KEY,100) converts the YYYYMM value to a count of months (this relies on M_KEY/100 being an integer division).
SELECT * FROM T
WHERE M_KEY BETWEEN
(SELECT MAX(M_KEY)
FROM T as T1
WHERE M_KEY <= 201312
AND NOT EXISTS(SELECT * FROM T
WHERE
(M_KEY/100)*12+MOD(M_KEY,100) + 1
= (T1.M_KEY /100)*12+MOD(T1.M_KEY,100)
)
)
AND 201312
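The month-count conversion is easier to see outside SQL. The sketch below is a plain Python illustration of the same idea, not a translation of the query: it turns each YYYYMM key into a month index so that adjacent months always differ by 1, then walks back from the newest key until it hits a gap. The function names month_index and latest_adjacent_run are made up for this example.

```python
def month_index(yyyymm):
    """Convert a YYYYMM integer to a running count of months.

    Adjacent months differ by exactly 1, even across a year boundary.
    """
    return (yyyymm // 100) * 12 + yyyymm % 100

def latest_adjacent_run(keys, current):
    """Return the keys <= current that form the newest unbroken run of months, newest first."""
    by_index = {month_index(k): k for k in keys if k <= current}
    if not by_index:
        return []
    idx = max(by_index)          # start at the most recent month present
    run = []
    while idx in by_index:       # stop at the first missing month
        run.append(by_index[idx])
        idx -= 1
    return run

sample = [201311, 201310, 201309, 201307, 201304]
print(latest_adjacent_run(sample, 201312))  # [201311, 201310, 201309]

# A run that crosses a year boundary is still treated as adjacent.
print(latest_adjacent_run([201210, 201212, 201301, 201302], 201303))  # [201302, 201301, 201212]
```

Subtracting the raw keys would give 201301 - 201212 = 89, which is why the conversion is needed before testing for adjacency.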
I want to create an additional column that holds a running total: each row's value from the count column added to the predecessor row's value in the sum column. Below is the query. I tried using ROLLUP, but it does not serve the purpose.
select to_char(register_date,'YYYY-MM') as "registered_in_month"
,count(*) as Total_count
from CMSS.USERS_PROFILE a
where a.pcms_db != '*'
group by (to_char(register_date,'YYYY-MM'))
order by to_char(register_date,'YYYY-MM')
This is what I get:
registered_in_month TOTAL_COUNT
-------------------------------------
2005-01 1
2005-02 3
2005-04 8
2005-06 4
But what I would like to display is below, including the months which have count as 0
registered_in_month TOTAL_COUNT SUM
------------------------------------------
2005-01 1 1
2005-02 3 4
2005-03 0 4
2005-04 8 12
2005-05 0 12
2005-06 4 16
To include missing months in your result, you first need a complete list of months. To get that, find the earliest and latest month and then use a hierarchical query to generate the complete list.
with x(min_date, max_date) as (
select min(trunc(register_date,'month')),
max(trunc(register_date,'month'))
from users_profile
)
select add_months(min_date,level-1)
from x
connect by add_months(min_date,level-1) <= max_date;
Once you have all the months, you can outer join the list to your table. To get the cumulative sum, simply add up the counts using SUM as an analytic function.
with x(min_date, max_date) as (
select min(trunc(register_date,'month')),
max(trunc(register_date,'month'))
from users_profile
),
y(all_months) as (
select add_months(min_date,level-1)
from x
connect by add_months(min_date,level-1) <= max_date
)
select to_char(a.all_months,'yyyy-mm') registered_in_month,
count(b.register_date) total_count,
sum(count(b.register_date)) over (order by a.all_months) "sum"
from y a left outer join users_profile b
on a.all_months = trunc(b.register_date,'month')
group by a.all_months
order by a.all_months;
Output:
| REGISTERED_IN_MONTH | TOTAL_COUNT | SUM |
|---------------------|-------------|-----|
| 2005-01 | 1 | 1 |
| 2005-02 | 3 | 4 |
| 2005-03 | 0 | 4 |
| 2005-04 | 8 | 12 |
| 2005-05 | 0 | 12 |
| 2005-06 | 4 | 16 |
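If you want to try the same approach without an Oracle instance, here is a minimal sketch using an in-memory SQLite database from Python. SQLite has no CONNECT BY, so a recursive CTE is swapped in to generate the month list; the outer join and the windowed SUM follow the same idea as above. The registration dates are invented (the 15th of each month) purely to reproduce the monthly counts from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users_profile (register_date TEXT)")

# Monthly counts from the question; the day of month is a made-up placeholder.
counts = {"2005-01": 1, "2005-02": 3, "2005-04": 8, "2005-06": 4}
conn.executemany(
    "INSERT INTO users_profile VALUES (?)",
    [(f"{month}-15",) for month, n in counts.items() for _ in range(n)],
)

rows = conn.execute("""
    WITH RECURSIVE months(month_start) AS (
        -- Start at the earliest month, then add one month at a time up to the latest.
        SELECT MIN(date(register_date, 'start of month')) FROM users_profile
        UNION ALL
        SELECT date(month_start, '+1 month') FROM months
        WHERE month_start < (SELECT MAX(date(register_date, 'start of month'))
                             FROM users_profile)
    ),
    monthly AS (
        -- The LEFT JOIN keeps months without registrations; COUNT of a NULL column gives 0.
        SELECT m.month_start, COUNT(u.register_date) AS total_count
        FROM months m
        LEFT JOIN users_profile u
               ON date(u.register_date, 'start of month') = m.month_start
        GROUP BY m.month_start
    )
    SELECT strftime('%Y-%m', month_start) AS registered_in_month,
           total_count,
           SUM(total_count) OVER (ORDER BY month_start) AS running_sum
    FROM monthly
    ORDER BY month_start
""").fetchall()

for row in rows:
    print(row)
```

The empty months 2005-03 and 2005-05 show up with a count of 0 and simply carry the previous running sum forward.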