SQL: Calculate weekly active users by device resolution

I want to calculate my app's weekly active users by device resolution.
I have a dailyData table that looks like this:
raw_advertiser_id: unique user ID
screen_dimensions: screen resolution information (string)
day_id: an integer counting days from the start of April 2017 (April 25 is day 25)
The table is unique on (raw_advertiser_id, day_id).
Here is the result of weekly active users by resolution for users who installed my app on April 25, 2017, using PostgreSQL:
WITH users AS (
SELECT raw_advertiser_id
FROM beep.dailydata
WHERE day_id < 25 + 7
GROUP BY 1
HAVING min(day_id) = 25
)
SELECT
screen_dimensions,
count(DISTINCT raw_advertiser_id) totalUsers,
count(DISTINCT CASE WHEN day_id > 25
THEN raw_advertiser_id END) weeklyActiveUsersCount,
round(count(DISTINCT CASE WHEN day_id > 25
THEN raw_advertiser_id END) :: NUMERIC / count(DISTINCT raw_advertiser_id) * 100,2) weeklyActiveUsersPercent
FROM beep.dailydata
JOIN users USING (raw_advertiser_id)
WHERE day_id < 25 + 7
GROUP BY 1
ORDER BY totalUsers DESC
Result:
screen_dimensions totalUsers weeklyActiveUsersCount weeklyActiveUsersPercent
720x1280x2.00 10 2 20
320x1152x1.00 8 0 0
480x800x1.00 5 0 0
720x1280x0.00 3 1 33.33
480x854x1.00 3 0 0
720x1184x2.00 2 0 0
I would like to run this query not only for April 25 but for every following day as well, and merge the results. To get the result for April 26 with the query above, you simply replace the number 25 with 26. The question is: how do I do this for all days in a single query?
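One possible way to cover every install day at once is to replace the hard-coded 25 with each user's own first day, MIN(day_id), and add that value to the GROUP BY. The sketch below assumes that a user's install day is their earliest day_id in dailydata (which is what HAVING min(day_id) = 25 checks for day 25). It runs on SQLite through Python's sqlite3 module so it can be checked against a few made-up rows; the SQL should carry over to PostgreSQL, and the function name cohort_weekly_active is hypothetical.

```python
import sqlite3

# Assumption: a user's install day is their earliest day_id. Each user is then
# evaluated in the window install_day <= day_id < install_day + 7, and counts
# as weekly active if seen again on a later day inside that window.
COHORT_WAU_SQL = """
WITH users AS (
    SELECT raw_advertiser_id, MIN(day_id) AS install_day
    FROM dailydata
    GROUP BY raw_advertiser_id
)
SELECT
    u.install_day,
    d.screen_dimensions,
    COUNT(DISTINCT d.raw_advertiser_id) AS total_users,
    COUNT(DISTINCT CASE WHEN d.day_id > u.install_day
                        THEN d.raw_advertiser_id END) AS weekly_active_users,
    ROUND(100.0 * COUNT(DISTINCT CASE WHEN d.day_id > u.install_day
                                      THEN d.raw_advertiser_id END)
          / COUNT(DISTINCT d.raw_advertiser_id), 2) AS weekly_active_pct
FROM dailydata AS d
JOIN users AS u USING (raw_advertiser_id)
WHERE d.day_id < u.install_day + 7
GROUP BY u.install_day, d.screen_dimensions
ORDER BY u.install_day, total_users DESC
"""

# Made-up rows: (raw_advertiser_id, screen_dimensions, day_id)
SAMPLE = [
    ("a", "720x1280x2.00", 25), ("a", "720x1280x2.00", 27),
    ("b", "720x1280x2.00", 25),
    ("c", "480x800x1.00", 26), ("c", "480x800x1.00", 40),  # day 40 is outside c's week
    ("d", "480x800x1.00", 26), ("d", "480x800x1.00", 30),
    ("e", "480x800x1.00", 26),
]


def cohort_weekly_active(rows):
    """Return (install_day, screen_dimensions, total_users,
    weekly_active_users, weekly_active_pct) tuples for the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dailydata "
        "(raw_advertiser_id TEXT, screen_dimensions TEXT, day_id INTEGER)"
    )
    conn.executemany("INSERT INTO dailydata VALUES (?, ?, ?)", rows)
    result = conn.execute(COHORT_WAU_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in cohort_weekly_active(SAMPLE):
        print(row)
```

For these rows it returns one row for install day 25 (2 users, 1 weekly active, 50.0) and one for install day 26 (3 users, 1 weekly active, 33.33). Because install_day is part of the GROUP BY, the per-day results are already merged into a single result set.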

Related

PostgreSQL: how do I aggregate the data in a table to produce a summarized table with unique values in a range?

I am new to PostgreSQL. My data is as follows:
PRODUCT WKnum SALES
HAMMER 1 17
HAMMER 2 20
HAMMER 3 17
HAMMER 4 10
HAMMER 5 12
HAMMER 6 13
HAMMER 7 2
HAMMER 8 25
SINK 1 25
SINK 2 20
SINK 3 9
SINK 4 7
SINK 5 24
SINK 6 16
SINK 7 10
SINK 8 16
BUCKET 1 22
BUCKET 2 2
BUCKET 3 10
BUCKET 4 24
BUCKET 5 9
BUCKET 6 20
BUCKET 7 9
BUCKET 8 21
I would like an outcome like this:
PRODUCT BEST_CONSEC_4WEEKS BEST_CONSEC_4WEEKS_SLS
HAMMER 1-4 64
SINK 5-8 66
BUCKET 3-6 63
Here "BEST_CONSEC_4WEEKS" is a character string giving the week numbers of the consecutive 4-week period with the highest total sales, and "BEST_CONSEC_4WEEKS_SLS" is the total units sold for that product during those weeks. Database-specific functions must be avoided.
The following piece of code was my attempt, but it is not working as I expected.
SELECT PRODUCT,
CASE WHEN Wknum IN ‘1 - 4’:: INTETER AS BEST_CONSEC_4WEEKS,
SUM (CASE WHEN Wknum IS BETWEEN ‘1 AND 4’::INTEGER THEN SALES ELSE 0 END) OR
SUM (CASE WHEN Wknum IS BETWEEN ‘5 AND 8’::INTEGER THEN SALES ELSE 0 END)
SUM (CASE WHEN Wknum IS BETWEEN ‘3 AND 6’::INTEGER THEN SALES ELSE 0 END) AS BEST_CONSEC_4WEEKS_SLS
FROM TABLE1
Can anyone please tell me what I am doing wrong? Thanks
You can use window functions with frames as follows:
with u as
(select PRODUCT,
WKnum,
sum(SALES) over(partition by PRODUCT order by WKnum rows between current row and 3 following) as BEST_CONSEC_4WEEKS_SLS
from TABLE1),
v as
(select *, rank() over(partition by PRODUCT order by BEST_CONSEC_4WEEKS_SLS desc) as r
from u)
select PRODUCT,
concat(wknum, '-', wknum + 3) as BEST_CONSEC_4WEEKS,
BEST_CONSEC_4WEEKS_SLS
from v where r = 1
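As a cross-check, here is the same window-frame approach run on SQLite through Python's sqlite3 module with the sample data from the question. It makes two small changes that are assumptions of this sketch rather than part of the answer: it counts the rows in each frame and keeps only frames that span 4 weeks (near the end of the range, a ROWS ... 3 FOLLOWING frame covers fewer than 4 weeks), and it builds the label with the standard || operator instead of concat(). Like the answer, it assumes each product has exactly one row per week with no gaps, because ROWS counts rows, not week numbers. The function name best_four_weeks is hypothetical.

```python
import sqlite3

# Same window-frame idea as the answer; COUNT(*) over the identical frame is an
# added assumption used to discard frames with fewer than 4 weeks.
BEST_4_WEEKS_SQL = """
WITH u AS (
    SELECT product, wknum,
           SUM(sales) OVER (PARTITION BY product ORDER BY wknum
                            ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING) AS sls,
           COUNT(*) OVER (PARTITION BY product ORDER BY wknum
                          ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING) AS n_weeks
    FROM table1
),
v AS (
    SELECT product, wknum, sls,
           RANK() OVER (PARTITION BY product ORDER BY sls DESC) AS r
    FROM u
    WHERE n_weeks = 4  -- only complete 4-week frames
)
SELECT product,
       wknum || '-' || (wknum + 3) AS best_consec_4weeks,
       sls AS best_consec_4weeks_sls
FROM v
WHERE r = 1
ORDER BY product
"""

# Sample data from the question: sales for weeks 1..8 of each product.
SALES = {
    "HAMMER": [17, 20, 17, 10, 12, 13, 2, 25],
    "SINK": [25, 20, 9, 7, 24, 16, 10, 16],
    "BUCKET": [22, 2, 10, 24, 9, 20, 9, 21],
}


def best_four_weeks(sales):
    """Return (product, week range, total sales) for each product's best window."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (product TEXT, wknum INTEGER, sales INTEGER)")
    conn.executemany(
        "INSERT INTO table1 VALUES (?, ?, ?)",
        [(p, wk, s) for p, weekly in sales.items()
         for wk, s in enumerate(weekly, start=1)],
    )
    result = conn.execute(BEST_4_WEEKS_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in best_four_weeks(SALES):
        print(row)
```

With the sample data this returns ('BUCKET', '3-6', 63), ('HAMMER', '1-4', 64) and ('SINK', '5-8', 66), matching the desired output. RANK() still returns more than one row per product if two windows tie.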

Fetching data from a DB and populating a partitioned list

I am confused about this both from the front-end point of view and in terms of querying the data from the SQLite database. If you have any idea how to solve either of these, please answer.
SQLite Database
I have a table like this:
transactionId | productId | quantity
1 2 1
2 4 0
3 1 null
4 3 1
5 9 1
6 6 0
7 1 1
8 7 1
9 8 1
10 2 1
11 0 null
12 3 1
13 5 1
14 7 1
15 1 0
16 2 1
17 9 1
18 0 null
19 2 1
Now I want to display this data in a list in my Flutter app, in groups of 5 units (i.e. each group is complete once its quantity adds up to 5).
So the 1st group will have 8 items,
the 2nd will have 6 items,
and the 3rd group will have 5 items
(and is still incomplete, since more items can be added until the quantity for that group reaches 5).
Something like this:
My app can have multiple groups like this. Also, I don't think GridView.builder can work here, since for each group I'll have to display some data for the group as well as accumulated data (which isn't shown in the picture).
Questions:
1) How do I query the data from the sqflite database?
2) How do I display the queried data in my Flutter app's front end?
Unfortunately, this type of problem requires a recursive CTE (or other iterative processing).
Assuming that transactionId is consecutive with no gaps:
with recursive cte as (
select transactionId, productId,
coalesce(quantity, 0) as quantity,
1 as bin
from t
where transactionId = 1
union all
select t.transactionId, t.productId,
(case when cte.quantity >= 5
then 0 else cte.quantity
end) + coalesce(t.quantity, 0) as quantity,
(case when cte.quantity >= 5 then 1 else 0 end) + cte.bin as bin
from cte join
t
on t.transactionId = cte.transactionId + 1
)
select *
from cte;
If transactionId has gaps or other issues, use row_number() (in another CTE) to create a gap-free sequence column and use that in the WHERE and join conditions instead.
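The recursive CTE can be checked against the sample table with SQLite through Python's sqlite3 module (sqflite sits on top of SQLite, which supports WITH RECURSIVE from version 3.8.3). This sketch uses >= 5 as the reset condition, so a group closes as soon as its running quantity reaches 5; the table name t comes from the answer, and the function name group_sizes is hypothetical.

```python
import sqlite3

# Recursive running total that resets once the previous row reached 5 units;
# bin is incremented on every reset, so each bin is one display group.
BIN_SQL = """
WITH RECURSIVE cte AS (
    SELECT transactionId, productId,
           COALESCE(quantity, 0) AS quantity,
           1 AS bin
    FROM t
    WHERE transactionId = 1
    UNION ALL
    SELECT t.transactionId, t.productId,
           (CASE WHEN cte.quantity >= 5 THEN 0 ELSE cte.quantity END)
               + COALESCE(t.quantity, 0) AS quantity,
           (CASE WHEN cte.quantity >= 5 THEN 1 ELSE 0 END) + cte.bin AS bin
    FROM cte
    JOIN t ON t.transactionId = cte.transactionId + 1
)
SELECT bin, COUNT(*) AS items, MAX(quantity) AS units
FROM cte
GROUP BY bin
ORDER BY bin
"""

# Sample table from the question: (transactionId, productId, quantity)
ROWS = [
    (1, 2, 1), (2, 4, 0), (3, 1, None), (4, 3, 1), (5, 9, 1), (6, 6, 0),
    (7, 1, 1), (8, 7, 1), (9, 8, 1), (10, 2, 1), (11, 0, None), (12, 3, 1),
    (13, 5, 1), (14, 7, 1), (15, 1, 0), (16, 2, 1), (17, 9, 1), (18, 0, None),
    (19, 2, 1),
]


def group_sizes(rows):
    """Return (bin, number of items, units so far) for each group."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (transactionId INTEGER, productId INTEGER, quantity INTEGER)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = conn.execute(BIN_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in group_sizes(ROWS):
        print(row)
```

For the sample data this gives 8 items in bin 1, 6 in bin 2 and 5 in bin 3 (whose running total is only 3, so it is still open), matching the grouping described in the question.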

Finding first sighting in SQL

We have a time series in a Spark SQL table that records every time a user performs an event.
However, users tend to perform many events in a burst. I want to find the minimum time for every one of these bursts.
Unfortunately, this is historical data, so I can't change how the table was created. Essentially, I want select min(time_), user from my_table group by user, but for each burst. Any help would be much appreciated!
EDIT:
Some example data would be:
user time_
0 10
0 11
2 12
0 12
2 13
2 15
0 83
0 84
0 85
So, for example, in the above data I would like to find (0, 10), (2, 12) and (0, 83). We can say that events belong to the same burst if they occur within 1 hour of each other (that would be 60 in the example data above).
If this is the only information you need:
select user
,time_
from (select user
,time_
,case when time_ - lag (time_,1,time_-60) over (partition by user order by time_) >= 60 then 'Y' else null end as burst
from my_table
) t
where burst = 'Y'
;
user time_
0 10
0 83
2 12
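To see the first query in action, here it is run on SQLite through Python's sqlite3 module (not Spark SQL) with the example events from the question; SQLite's LAG accepts the same offset and default arguments. The ORDER BY is added only to make the output deterministic, and the function name burst_starts is hypothetical.

```python
import sqlite3

# An event starts a burst when it is at least 60 after the user's previous
# event; the LAG default (time_ - 60) makes each user's first event qualify.
FIRST_IN_BURST_SQL = """
SELECT user, time_
FROM (SELECT user, time_,
             CASE WHEN time_ - LAG(time_, 1, time_ - 60)
                               OVER (PARTITION BY user ORDER BY time_) >= 60
                  THEN 'Y' ELSE NULL END AS burst
      FROM my_table) t
WHERE burst = 'Y'
ORDER BY user, time_
"""

# Example data from the question: (user, time_)
EVENTS = [(0, 10), (0, 11), (2, 12), (0, 12), (2, 13), (2, 15),
          (0, 83), (0, 84), (0, 85)]


def burst_starts(events):
    """Return (user, time_) for the first event of every burst."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (user INTEGER, time_ INTEGER)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?)", events)
    result = conn.execute(FIRST_IN_BURST_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(burst_starts(EVENTS))
```

This returns (0, 10), (0, 83) and (2, 12), the same rows as the output above.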
If you'll need to gather some additional information on each burst:
select user
,burst_seq
,min (time_) as min_time_
,max (time_) as max_time_
,count (*) as events_num
from (select user
,time_
,count(burst) over
(
partition by user
order by time_
rows unbounded preceding
) + 1 as burst_seq
from (select user
,time_
,case when time_ - lag (time_) over (partition by user order by time_) >= 60 then 'Y' else null end as burst
from my_table
) t
) t
group by user
,burst_seq
;
user burst_seq min_time_ max_time_ events_num
0 1 10 12 3
0 2 83 85 3
2 1 12 15 3
P.S.
There seems to be a bug with the CASE expression:
case when ... then 'Y' end yields FAILED: IndexOutOfBoundsException Index: 2, Size: 2, although it is legal syntax.
Adding else null solved it.
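The second query can be checked the same way. This sketch runs it unchanged apart from an added ORDER BY, again on SQLite via Python's sqlite3 module rather than Spark SQL; the else null from the P.S. is kept even though SQLite does not need it. The function name bursts is hypothetical.

```python
import sqlite3

# burst is 'Y' on rows that open a new burst (gap >= 60); a running COUNT of
# those flags, plus 1, numbers the bursts per user.
BURSTS_SQL = """
SELECT user, burst_seq,
       MIN(time_) AS min_time_, MAX(time_) AS max_time_, COUNT(*) AS events_num
FROM (SELECT user, time_,
             COUNT(burst) OVER (PARTITION BY user ORDER BY time_
                                ROWS UNBOUNDED PRECEDING) + 1 AS burst_seq
      FROM (SELECT user, time_,
                   CASE WHEN time_ - LAG(time_)
                                     OVER (PARTITION BY user ORDER BY time_) >= 60
                        THEN 'Y' ELSE NULL END AS burst
            FROM my_table) t
     ) t
GROUP BY user, burst_seq
ORDER BY user, burst_seq
"""

# Example data from the question: (user, time_)
EVENTS = [(0, 10), (0, 11), (2, 12), (0, 12), (2, 13), (2, 15),
          (0, 83), (0, 84), (0, 85)]


def bursts(events):
    """Return (user, burst_seq, min_time_, max_time_, events_num) per burst."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (user INTEGER, time_ INTEGER)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?)", events)
    result = conn.execute(BURSTS_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in bursts(EVENTS):
        print(row)
```

The rows match the burst table above: two bursts for user 0 (10 to 12 and 83 to 85) and one for user 2 (12 to 15), each with 3 events.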

Conditional SUM in SQL Server 2014

I am using SQL Server 2014. When I was testing my code I noticed a problem.
Assume that the maximum number of personal hours is 80.
SELECT
lsm.EmployeeName,
pd.absenceDate,
pd.amountInDays * 8 AS [HoursReported],
pd.status,
(SUM(CASE WHEN pd.[status]='App' THEN (pd.amountInDays * 8)
ELSE 0 END) OVER (partition by lsm.[EmployeeName] order by pd.absenceDate)) AS [TotalUsedHours],
( #maxPSHours ) - (sum(
CASE WHEN pd.[status]='App' THEN (pd.amountInDays * 8)
ELSE 0 END)
over (
partition by lsm.[EmployeeName] order by pd.absenceDate)) AS [TotalRemainingHours]
FROM
[LocationStaffMembers] lsm
INNER JOIN
[PersonalDays] pd ON lsm.staffMemberId = pd.staffMemberId
This query returns these results:
EmployeeName AbsenceDate HoursReported Status TotalUsedHours TotalRemainingHours
X 11/11/2015 4 approved 4 76
X 11/15/2015 8 approved 12 68
X 11/20/2015 2 decline 14 66
X 11/20/2015 2 approved 14 66
So the query works fine for the different statuses, and the first 2 rows are correct. But when an employee has more than one action on the same day (declined, approved, etc.), my query shows the whole day's total used and total remaining hours on every row for that day.
Here is the expected result.
EmployeeName AbsenceDate HoursReported Status TotalUsedHours TotalRemainingHours
X 11/11/2015 4 approved 4 76
X 11/15/2015 8 approved 12 68
X 11/20/2015 2 decline 12 68
X 11/20/2015 2 approved 14 66
You are doing a cumulative sum whose results depend on the order of AbsenceDate (sum(...) over (partition by ... order by pd.absenceDate)). But your last 2 records have exactly the same date (11/20/2015), at least according to what you are showing us. This creates an ambiguity.
So it is entirely conceivable, and legal, that SQL Server processes the 2 approved hours row before the 2 declined hours row when calculating the cumulative sum (which would explain your current results), even though the rows themselves are returned to you in a different order. (By the way, consider adding an ORDER BY clause to the query; otherwise, the order of the returned rows is not guaranteed.)
If the 2 rows do in fact share the exact same date, you'll have to find a 2nd column to remove the ambiguity and add that to the order by clause in the cumulative sum window function. Maybe you could add a timestamp field that you can order by.
Or maybe you always want the declined status to be considered ahead of the approved status when the AbsenceDate is the same. Here is an example of a query that would do exactly that (notice the changes in the order by clauses):
SELECT
lsm.EmployeeName,
pd.absenceDate,
pd.amountInDays * 8 AS [HoursReported],
pd.status,
(SUM(CASE WHEN pd.[status]='App' THEN (pd.amountInDays * 8)
ELSE 0 END) OVER (partition by lsm.[EmployeeName] order by pd.absenceDate,
case when pd.[status] = 'App' then 1 else 0 end)) AS [TotalUsedHours],
( #maxPSHours ) - (sum(
CASE WHEN pd.[status]='App' THEN (pd.amountInDays * 8)
ELSE 0 END)
over (
partition by lsm.[EmployeeName] order by pd.absenceDate,
case when pd.[status] = 'App' then 1 else 0 end)) AS [TotalRemainingHours]
FROM
[LocationStaffMembers] lsm
INNER JOIN
[PersonalDays] pd ON lsm.staffMemberId = pd.staffMemberId
ORDER BY lsm.[EmployeeName],
pd.absenceDate,
case when pd.[status] = 'App' then 1 else 0 end
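A small reproduction of this behaviour, using SQLite through Python's sqlite3 module and a hypothetical single table personal_days in place of the two joined tables ('Dec' is an assumed code for a declined request, and 80 stands in for #maxPSHours). When a window has ORDER BY but no explicit frame, both SQLite and SQL Server default to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so rows with the same absence date are peers and each one sees the total for the whole day. Adding the status expression to the window's ORDER BY breaks the tie. The function name running_totals is hypothetical.

```python
import sqlite3

# Made-up rows mirroring the question: (employee_name, absence_date,
# amount_in_days, status); 'Dec' is an assumed code for "declined".
ROWS = [
    ("X", "2015-11-11", 0.5, "App"),
    ("X", "2015-11-15", 1.0, "App"),
    ("X", "2015-11-20", 0.25, "Dec"),
    ("X", "2015-11-20", 0.25, "App"),
]
MAX_PS_HOURS = 80  # stands in for the #maxPSHours placeholder


def running_totals(tie_break):
    """Return (date, status, used, remaining), optionally breaking date ties."""
    order = "absence_date"
    if tie_break:
        # Declined rows (0) sort before approved rows (1) on the same date.
        order += ", CASE WHEN status = 'App' THEN 1 ELSE 0 END"
    sql = f"""
        SELECT absence_date, status,
               SUM(CASE WHEN status = 'App' THEN amount_in_days * 8 ELSE 0 END)
                   OVER (PARTITION BY employee_name ORDER BY {order}) AS used
        FROM personal_days
        ORDER BY employee_name, absence_date,
                 CASE WHEN status = 'App' THEN 1 ELSE 0 END
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE personal_days (employee_name TEXT, absence_date TEXT, "
        "amount_in_days REAL, status TEXT)"
    )
    conn.executemany("INSERT INTO personal_days VALUES (?, ?, ?, ?)", ROWS)
    result = [(d, s, used, MAX_PS_HOURS - used) for d, s, used in conn.execute(sql)]
    conn.close()
    return result


if __name__ == "__main__":
    print(running_totals(tie_break=False))  # both 2015-11-20 rows show 14 used
    print(running_totals(tie_break=True))   # declined row shows 12, approved 14
```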

SQL - Get Sum of Values with same Date

I'm sure I've done this type of operation a thousand times before, but for some reason it is not working for me. I'm writing a report to determine whether a patient received medication on a given day. Regardless of whether they get 1 dose or 5 doses in a day, the value should be 1. Staff also make corrections in the system, which come in as negative values. So I need to sum all of the dose values for each day: if the sum is positive, the value is 1; otherwise it is 0.
All I want to accomplish at this point is one row per date, with a value of either 1 or 0.
Here is my SQL Query to sum the values:
SELECT
DIM_DRUG_NAME_SHORT.Drug_Name_Short AS 'Med_Name_Short'
, SUM(Baseline.Doses) as 'DOT'
, Day(Baseline.Dispense_Date) as 'd_Date'
FROM
FACT_AMS_Baseline_Report Baseline
INNER JOIN DIM_DRUG_NAME_SHORT ON Baseline.Med_Name_ID = DIM_DRUG_NAME_SHORT.Drug_Name_Long
INNER JOIN DIM_Date tDate ON Baseline.Dispense_Date = tDate.Date
WHERE
Baseline.Encounter = '00000001/01'
GROUP BY
DIM_DRUG_NAME_SHORT.Drug_Name_Short
, Baseline.Dispense_Date
, Doses
Order By
Drug_Name_Short
For the time being, I'm just pulling one encounter out of the data set to test with.
This is the output I'm getting. I also included the day in the SELECT just to show that the same day is coming through twice and the values are not being summed.
Here is a sample of the output I get:
Med_Name_Short DOT day of month
CEFTRIAXONE 1 15
CEFTRIAXONE 1 16
CEFTRIAXONE 4 16
CEFTRIAXONE 1 17
CEFTRIAXONE 1 18
CEFTRIAXONE 1 20
CEFTRIAXONE -3 21
CEFTRIAXONE 1 21
CEFTRIAXONE -1 23
PROPRANOLOL -1 24
PROPRANOLOL 3 24
PROPRANOLOL 1 25
PROPRANOLOL 2 26
PROPRANOLOL 2 27
What I was hoping to see was that day 16 would be 5, day 21 would be -2 and day 24 would be 2.
Any assistance would be greatly appreciated.
Thanks
Remove Doses from your GROUP BY list. You are applying an aggregate function (SUM) to it, which is correct, so it should not also be in the GROUP BY.
I don't think you should be grouping by doses. Without seeing your data, I can only guess that, for example, there are two doses of quantity 2 on the 16th.
So try:
SELECT
DIM_DRUG_NAME_SHORT.Drug_Name_Short AS 'Med_Name_Short'
, SUM(Baseline.Doses) as 'DOT'
, Day(Baseline.Dispense_Date) as 'd_Date'
FROM
FACT_AMS_Baseline_Report Baseline
INNER JOIN DIM_DRUG_NAME_SHORT ON Baseline.Med_Name_ID = DIM_DRUG_NAME_SHORT.Drug_Name_Long
INNER JOIN DIM_Date tDate ON Baseline.Dispense_Date = tDate.Date
WHERE
Baseline.Encounter = '00000001/01'
GROUP BY
DIM_DRUG_NAME_SHORT.Drug_Name_Short
, Baseline.Dispense_Date
Order By
Drug_Name_Short
Since you're aggregating doses, you should remove it from the GROUP BY; to get either 1 or 0 per day, use a CASE expression:
SELECT
DIM_DRUG_NAME_SHORT.Drug_Name_Short AS 'Med_Name_Short'
, CASE WHEN SUM(Baseline.Doses) >= 1 THEN 1 ELSE 0 END AS 'DOT'
, Day(Baseline.Dispense_Date) as 'd_Date'
FROM
FACT_AMS_Baseline_Report Baseline
INNER JOIN DIM_DRUG_NAME_SHORT ON Baseline.Med_Name_ID = DIM_DRUG_NAME_SHORT.Drug_Name_Long
INNER JOIN DIM_Date tDate ON Baseline.Dispense_Date = tDate.Date
WHERE
Baseline.Encounter = '00000001/01'
GROUP BY
DIM_DRUG_NAME_SHORT.Drug_Name_Short
, Baseline.Dispense_Date
Order By
Drug_Name_Short
If Dispense_Date is a datetime value, you should probably group by Day(Baseline.Dispense_Date) or remove the time part.
If you group by day and your data spans more than one month, you should either limit the date range or also include the year and month, so that data from different months or years doesn't get summed together.
With your sample data you should get a result like:
Med_Name_Short DOT day of month
CEFTRIAXONE 1 15
CEFTRIAXONE 1 16
CEFTRIAXONE 1 17
CEFTRIAXONE 1 18
CEFTRIAXONE 1 20
CEFTRIAXONE 0 21
CEFTRIAXONE 0 23
PROPRANOLOL 1 24
PROPRANOLOL 1 25
PROPRANOLOL 1 26
PROPRANOLOL 1 27
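The difference between grouping by Doses and not can be reproduced with SQLite through Python's sqlite3 module. The table baseline is a hypothetical stand-in for the joined FACT_AMS_Baseline_Report and DIM_DRUG_NAME_SHORT tables, and the individual dose rows and dates are invented so that the daily sums match the question (for example 1 + 2 + 2 = 5 on day 16). SQLite's date() strips the time part, as suggested above for datetime values; run_query is a hypothetical helper.

```python
import sqlite3

# Invented dose rows: (med_name_short, dispense_date, doses).
DOSES = [
    ("CEFTRIAXONE", "2020-01-15 08:00", 1),
    ("CEFTRIAXONE", "2020-01-16 08:00", 1),
    ("CEFTRIAXONE", "2020-01-16 12:00", 2),
    ("CEFTRIAXONE", "2020-01-16 20:00", 2),
    ("CEFTRIAXONE", "2020-01-21 08:00", -3),
    ("CEFTRIAXONE", "2020-01-21 09:00", 1),
    ("CEFTRIAXONE", "2020-01-23 08:00", -1),
    ("PROPRANOLOL", "2020-01-24 08:00", -1),
    ("PROPRANOLOL", "2020-01-24 10:00", 3),
]

# The original mistake: also grouping by doses splits a day into one row per
# distinct dose value (day 16 becomes a row with 1 and a row with 2 + 2 = 4).
BUGGY_SQL = """
SELECT med_name_short, date(dispense_date) AS dispense_day, SUM(doses) AS dot
FROM baseline
GROUP BY med_name_short, date(dispense_date), doses
ORDER BY 1, 2, 3
"""

# Corrected: group by medication and calendar day only, then turn the net
# daily dose count into a 0/1 flag.
FIXED_SQL = """
SELECT med_name_short,
       date(dispense_date) AS dispense_day,
       SUM(doses) AS net_doses,
       CASE WHEN SUM(doses) >= 1 THEN 1 ELSE 0 END AS dot
FROM baseline
GROUP BY med_name_short, date(dispense_date)
ORDER BY med_name_short, dispense_day
"""


def run_query(sql):
    """Load DOSES into an in-memory table and return the query's rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE baseline (med_name_short TEXT, dispense_date TEXT, doses INTEGER)"
    )
    conn.executemany("INSERT INTO baseline VALUES (?, ?, ?)", DOSES)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(run_query(BUGGY_SQL))
    print(run_query(FIXED_SQL))
```

With the corrected query, day 16 sums to 5 (flag 1), day 21 to -2 (flag 0) and day 24 to 2 (flag 1), in line with the expected result above.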