Calculating averages by quarters - sql

I have a table in presto with 2 columns: date and value.
I want to calculate the average of 2nd Quarter's values so the expected result should be:
15.
How can I do this in presto?
date value
2021-01-01 10
2021-01-30 20
2021-02-10 10
2021-04-01 20
2021-04-02 10
2021-07-10 20

You can subtract 1 from the month, integer-divide by 3, and group by the result (0 for Q1, 1 for Q2, and so on). Dividing the month directly by 3 would put March, April and May in the same bucket:
-- sample data
WITH dataset (date, value) AS (
VALUES (date '2021-01-01' , 10),
(date '2021-01-30' , 20),
(date '2021-02-10' , 10),
(date '2021-04-01' , 20),
(date '2021-04-02' , 10),
(date '2021-07-10', 20)
)
--query
SELECT avg(value)
FROM dataset
WHERE (month(date) - 1) / 3 = 1
GROUP BY (month(date) - 1) / 3
Output:
_col0
15.0
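As a sanity check on the integer arithmetic, here is a minimal Python sketch (not Presto; the helper name `quarter_of` is made up for illustration). It maps a month to its calendar quarter with `(month - 1) // 3 + 1` and averages the second-quarter values from the sample data. Plain `month // 3` would give 1 for March, April and May, which is not a calendar quarter.

```python
from datetime import date

# Sample rows from the question: (date, value)
rows = [
    (date(2021, 1, 1), 10),
    (date(2021, 1, 30), 20),
    (date(2021, 2, 10), 10),
    (date(2021, 4, 1), 20),
    (date(2021, 4, 2), 10),
    (date(2021, 7, 10), 20),
]


def quarter_of(d):
    """Hypothetical helper: calendar quarter (1-4) derived from the month."""
    return (d.month - 1) // 3 + 1


# Keep only second-quarter rows and average their values
q2_values = [value for d, value in rows if quarter_of(d) == 2]
q2_average = sum(q2_values) / len(q2_values)
print(q2_average)  # 15.0
```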

Use quarter function:
with mytable as (
SELECT * FROM (
VALUES
(date '2021-01-01', 10),
(date '2021-01-30', 20),
(date '2021-02-10', 10),
(date '2021-04-01', 20),
(date '2021-04-02', 10),
(date '2021-07-10', 20)
) AS t (date, value)
)
select quarter(date) as qt, avg(value) as avg
from mytable
where quarter(date)=2
group by quarter(date)
Result:
qt avg
2 15.0

Related

Extract the number of daily users from table

Given a start date and end date for every user I would like to count the daily number of users on the platform:
ID START END
1 2022-12-01 2022-12-03
2 2022-12-01 2022-12-01
I want to get an output like this:
DATE NUMBER
2022-12-01 2
2022-12-02 1
2022-12-03 1
Make a list of all the dates (generate_series) and count for each of them.
with the_table(id, dstart, dend) as
(
values
(1, '2022-12-01'::date, '2022-12-03'::date),
(2, '2022-12-01', '2022-12-01')
)
select d::date as "DATE",
(select count(*) from the_table where d between dstart and dend) as "NUMBER"
from generate_series('2022-12-01'::date,'2022-12-03'::date,interval '1 day') as d;
Alternative
with the_table(id,dstart,dend) as
(
values
(1, '2022-12-01'::date, '2022-12-03'::date),
(2, '2022-12-01', '2022-12-01')
),
d (id, dlogged) as
(
select id, generate_series(dstart,dend,interval '1 day')::date
from the_table
)
select dlogged as "DATE", count(*) as "NUMBER"
from d group by dlogged;
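The expand-then-count idea behind the alternative query can be sketched outside the database as well. The Python below is only an illustration under the assumption that both range ends are inclusive; the names `stays` and `daily_counts` are invented for the example.

```python
from collections import Counter
from datetime import date, timedelta

# Rows from the question: (id, start, end), both ends inclusive
stays = [
    (1, date(2022, 12, 1), date(2022, 12, 3)),
    (2, date(2022, 12, 1), date(2022, 12, 1)),
]


def daily_counts(rows):
    """Count, for each calendar day, how many ranges cover it."""
    counts = Counter()
    for _id, start, end in rows:
        day = start
        while day <= end:  # expand the range into single days
            counts[day] += 1
            day += timedelta(days=1)
    return dict(sorted(counts.items()))


result = daily_counts(stays)
for day, n in result.items():
    print(day, n)
```

Like the second query, this only lists days on which at least one user was present. The first query, which drives the count from a fixed calendar, also returns days with a count of zero.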

SQL Server Query for average value over a date period

DECLARE @SampleOrderTable TABLE
(
pkPersonID INT,
OrderDate DATETIME,
Amount NUMERIC(18, 6)
)
INSERT INTO @SampleOrderTable (pkPersonID, OrderDate, Amount)
VALUES (1, '12/10/2019', '762.84'),
(2, '11/10/2019', '886.32'),
(3, '11/9/2019', '10245.00')
How do I select the last 4 days up to and including OrderDate, along with the average Amount over that period?
So result data would be:
pkPersonID Date Amount
------------------------------------
1 '12/7/2019' 190.71
1 '12/8/2019' 190.71
1 '12/9/2019' 190.71
1 '12/10/2019' 190.71
2 '12/7/2019' 221.58
2 '12/8/2019' 221.58
2 '12/9/2019' 221.58
2 '12/10/2019' 221.58
3 '11/6/2019' 2561.25
3 '11/7/2019' 2561.25
3 '11/8/2019' 2561.25
3 '11/9/2019' 2561.25
You may try with the following approach, using DATEADD(), windowed COUNT() and VALUES() table value constructor:
Table:
DECLARE @SampleOrderTable TABLE (
pkPersonID INT,
OrderDate DATETIME,
Amount NUMERIC(18, 6)
)
INSERT INTO @SampleOrderTable (pkPersonID, OrderDate, Amount)
VALUES (1, '20191210', '762.84'),
(2, '20191210', '886.32'),
(3, '20191109', '10245.00')
Statement:
SELECT
t.pkPersonID,
DATEADD(day, -v.Day, t.OrderDate) AS [Date],
CONVERT(numeric(18, 6), Amount / COUNT(Amount) OVER (PARTITION BY t.pkPersonID)) AS Amount
FROM @SampleOrderTable t
CROSS APPLY (VALUES (0), (1), (2), (3)) v(Day)
ORDER BY t.pkPersonID, [Date]
Result:
pkPersonID Date Amount
1 07/12/2019 00:00:00 190.710000
1 08/12/2019 00:00:00 190.710000
1 09/12/2019 00:00:00 190.710000
1 10/12/2019 00:00:00 190.710000
2 07/12/2019 00:00:00 221.580000
2 08/12/2019 00:00:00 221.580000
2 09/12/2019 00:00:00 221.580000
2 10/12/2019 00:00:00 221.580000
3 06/11/2019 00:00:00 2561.250000
3 07/11/2019 00:00:00 2561.250000
3 08/11/2019 00:00:00 2561.250000
3 09/11/2019 00:00:00 2561.250000
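The arithmetic in this result is simply each order's amount split evenly over the four days ending on the order date. A small Python sketch of that calculation follows; it is not T-SQL, and the names `orders`, `DAYS` and `spread` are assumptions made for the example.

```python
from datetime import date, timedelta
from decimal import Decimal

# Sample orders as used in the answer: (person id, order date, amount)
orders = [
    (1, date(2019, 12, 10), Decimal("762.84")),
    (2, date(2019, 12, 10), Decimal("886.32")),
    (3, date(2019, 11, 9), Decimal("10245.00")),
]

DAYS = 4  # window length, counting the order date itself


def spread(rows, days=DAYS):
    """Emit one row per day in the window, each carrying amount / days."""
    out = []
    for person_id, order_date, amount in rows:
        share = amount / days
        for back in range(days - 1, -1, -1):  # oldest day first
            out.append((person_id, order_date - timedelta(days=back), share))
    return out


rows = spread(orders)
for row in rows:
    print(*row)
```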
You can use sql functions like AVG, DATEADD and GETDATE.
SELECT AVG(Amount) as AverageAmount
FROM @SampleOrderTable
WHERE OrderDate >= DATEADD(DAY, -4, GETDATE())
DECLARE @SampleOrderTable TABLE (
pkPersonID INT,
OrderDate DATETIME,
Amount NUMERIC(18, 6)
);
INSERT INTO @SampleOrderTable
(pkPersonID, OrderDate, Amount)
VALUES
(1, '12/20/2019', 762.84),
(2, '12/20/2019', 886.32),
(3, '12/20/2019', 10245.00),
(4, '12/19/2019', 50.00),
(5, '12/19/2019', 100.00),
(6, '09/01/2019', 200.00),
(7, '09/01/2019', 300.00),
(8, '12/15/2019', 400.00),
(9, '12/15/2019', 500.00),
(10, '09/02/2019', 150.00),
(11, '09/02/2019', 1100.00),
(12, '09/02/2019', 1200.00),
(13, '09/02/2019', 1300.00),
(14, '09/02/2019', 1400.00),
(15, '09/02/2019', 1500.00);
SELECT OrderDate,AVG(Amount) AS Average_Value
FROM @SampleOrderTable
WHERE DATEDIFF(DAY, CAST(OrderDate AS DATETIME), CAST(GETDATE() AS Datetime)) <= 4
GROUP BY OrderDate;

Calculate total time without vacations in postgres

I have a database table that represents activities and for each activity, how long it took.
It looks something like this :
activity_id | name | status | start_date | end_date
=================================================================
1 | name1 | WIP | 2019-07-24 ... | 2019-07-24 ...
start_date and end_date are timestamps. I use a view with a column total_time that is described like that:
date_part('day'::text,
COALESCE(sprint_activity.end_date::timestamp with time zone, CURRENT_TIMESTAMP)
- sprint_activity.start_date::timestamp with time zone
) + date_part('hour'::text,
COALESCE(sprint_activity.end_date::timestamp with time zone, CURRENT_TIMESTAMP)
- sprint_activity.start_date::timestamp with time zone
) / 24::double precision AS total_time
I would like to create a table for vacation or half day vacations that looks like:
date | work_percentage
=================================================
2019-07-24 | 0.4
2019-07-23 | 0.7
And then, I would like to calculate total_time in a way that uses this vacations table such that:
If a date is not in the column it's considered to have work_percentage==1
For every date that is in the table, reduce the relative percentage from the total_time query.
So let's take an example:
Activity - "Write report" started at 11-July-2019 14:00 and ended at 15-July-2019 19:00 - so the time diff is 4 days and 5 hours.
The 13th and 14th were a weekend, so I'd like the vacations table to hold a row for 2019-07-13 with work_percentage == 0, and the same for the 14th.
Deducting those vacations, the time diff would be 2 days and 5 hours as the 13th and 14th are not workdays.
Hope this example explains it better.
I think you can take this example and adapt it to your database.
DDL statements to test the script:
create table activities (
user_id int,
activity_id int,
name text,
status text,
start_date timestamp,
end_date timestamp
);
create table vacations (
user_id int,
date date,
work_percentage numeric
);
insert into activities
values
(1, 1, 'name1', 'WIP', timestamp'2019-07-20 10:00:00', timestamp'2019-07-25 8:00:00'),
(2, 2, 'name2', 'DONE', timestamp'2019-07-28 19:00:00', timestamp'2019-08-01 7:00:00'),
(1, 3, 'name3', 'DONE', timestamp'2019-07-21 12:00:00', timestamp'2019-07-21 15:00:00'),
(-1, 4, 'Write report', 'DONE', timestamp'2019-07-11 14:00:00', timestamp'2019-07-15 19:00:00');
insert into vacations
values
(1, date'2019-07-21', 0.5),
(1, date'2019-07-22', 0),
(1, date'2019-07-23', 0.25),
(2, date'2019-07-29', 0),
(2, date'2019-07-30', 0),
(-1, date'2019-07-13', 0),
(-1, date'2019-07-14', 0);
sql script
with
daily_activity as (
select
*,
date(
generate_series(
date(start_date),
date(end_date),
interval'1 day')
) as date_key
from
activities
),
raw_data as (
select
da.*,
v.work_percentage,
case
when date(start_date) = date(end_date)
then (end_date - start_date) * coalesce(work_percentage, 1)
when date(start_date) = date_key
then (date(start_date) + 1 - start_date) * coalesce(work_percentage, 1)
when date(end_date) = date_key
then (end_date - date(end_date)) * coalesce(work_percentage, 1)
else interval'24 hours' * coalesce(work_percentage, 1)
end as activity_coverage
from
daily_activity as da
left join vacations as v on da.user_id = v.user_id
and da.date_key = v.date
)
select
user_id,
activity_id,
name,
status,
start_date,
end_date,
justify_interval(sum(activity_coverage)) as total_activity_time
from
raw_data
group by
1, 2, 3, 4, 5, 6
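To make the per-day weighting concrete, here is a minimal Python sketch of the same idea applied to the "Write report" example, with 13 and 14 July weighted 0 as in the sample data above. It is an illustration only; the function name `worked_time` and the dictionary layout are assumptions, not part of the PostgreSQL script.

```python
from datetime import date, datetime, time, timedelta


def worked_time(start, end, work_percentage):
    """Sum the part of each calendar day covered by [start, end),
    scaled by that day's work percentage (1 when the day is not listed)."""
    total = timedelta()
    day = start.date()
    while day <= end.date():
        day_start = datetime.combine(day, time.min)
        day_end = day_start + timedelta(days=1)
        # Overlap between the activity and this calendar day
        covered = min(end, day_end) - max(start, day_start)
        total += covered * float(work_percentage.get(day, 1))
        day += timedelta(days=1)
    return total


# Weekend days count as non-working (percentage 0)
vacations = {date(2019, 7, 13): 0, date(2019, 7, 14): 0}

total = worked_time(
    datetime(2019, 7, 11, 14, 0),
    datetime(2019, 7, 15, 19, 0),
    vacations,
)
print(total)  # 2 days, 5:00:00
```

The first day contributes 10 hours, 12 July a full 24 hours, the weekend nothing, and the last day 19 hours, which adds up to the 2 days and 5 hours expected in the question.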

Compare data of current week against same week of previous years

I have this table that contains sales by stores & date.
-------------------------------------------
P_DATE - P_STORE - P_SALES
-------------------------------------------
2019-02-05 - S1 - 5000
2019-02-05 - S2 - 9850
2018-06-17 - S1 - 6980
2018-05-17 - S2 - 6590
..
..
..
-------------------------------------------
I want to compare the sum of sales per store for the last 10 weeks of this year with the same weeks of previous years.
I want a result like this :
---------------------------------------------------
Week - Store - Sales-2019 - Sales2018
---------------------------------------------------
20 - S1 - 2580 - 2430
20 - S2 - 2580 - 2430
.
.
10 - S1 - 5905 - 5214
10 - S2 - 4789 - 6530
---------------------------------------------------
I've tried this:
Select
[Week] = DATEPART(WEEK, E_Date),
[Store] = E_store
[Sales 2019] = Case when Year(P_date) = '2019' Then Sum (P_Sales)
[Sales 2018] = Case when Year(P_date) = '2018' Then Sum (P_Sales)
From
PIECE
Group by
DATEPART(WEEK, E_Date),
E_store
I need your help please.
This script considers 10 weeks, including the current week:
WITH wk_list (COMMON,DayMinus)
AS
(
SELECT 1,0 UNION ALL
SELECT 1,1 UNION ALL
SELECT 1,2 UNION ALL
SELECT 1,3 UNION ALL
SELECT 1,4 UNION ALL
SELECT 1,5 UNION ALL
SELECT 1,6 UNION ALL
SELECT 1,7 UNION ALL
SELECT 1,8 UNION ALL
SELECT 1,9
)
SELECT
DATEPART(ISO_WEEK, P_DATE) WK,
P_STORE,
SUM(CASE WHEN YEAR(P_DATE) = 2019 THEN P_SALES ELSE 0 END) SALES_2019,
SUM(CASE WHEN YEAR(P_DATE) = 2018 THEN P_SALES ELSE 0 END) SALES_2018
FROM your_table
WHERE YEAR(P_DATE) IN (2019,2018)
AND DATEPART(ISO_WEEK, P_DATE) IN
(
SELECT A.WKNUM-wk_list.DayMinus AS [WEEK NUMBER]
FROM wk_list
INNER JOIN (
SELECT 1 AS COMMON,DATENAME(ISO_WEEK,GETDATE()) WKNUM
) A ON wk_list.COMMON = A.COMMON
)
GROUP BY DATEPART(ISO_WEEK, P_DATE),P_STORE
But if you want to exclude the current week, replace the wk_list definition in the script above with the following:
WITH wk_list (COMMON,DayMinus)
AS
(
SELECT 1,1 UNION ALL
SELECT 1,2 UNION ALL
SELECT 1,3 UNION ALL
SELECT 1,4 UNION ALL
SELECT 1,5 UNION ALL
SELECT 1,6 UNION ALL
SELECT 1,7 UNION ALL
SELECT 1,8 UNION ALL
SELECT 1,9 UNION ALL
SELECT 1,10
)
Is this what you're looking for?
DECLARE @t TABLE (TransactionID INT, Week INT, Year INT, Amount MONEY)
INSERT INTO @t
(TransactionID, Week, Year, Amount)
VALUES
(1, 20, 2018, 50),
(2, 20, 2019, 20),
(3, 19, 2018, 35),
(4, 19, 2019, 40),
(5, 20, 2018, 70),
(6, 20, 2019, 80)
SELECT TOP 10 Week, [2018], [2019] FROM (SELECT Week, Year, SUM(Amount) As Amount FROM @t GROUP BY Week, Year) t
PIVOT
(
SUM(Amount)
FOR Year IN ([2018], [2019])
) sq
ORDER BY Week DESC
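For readers who want to see what the pivot computes on this sample data, here is a rough Python equivalent. It is a sketch, not T-SQL, and the name `pivot_by_week` is invented for the example. One difference to keep in mind: a week with no rows for a given year shows 0 here, whereas PIVOT would return NULL.

```python
from collections import defaultdict

# Sample rows from the answer: (transaction id, week, year, amount)
transactions = [
    (1, 20, 2018, 50),
    (2, 20, 2019, 20),
    (3, 19, 2018, 35),
    (4, 19, 2019, 40),
    (5, 20, 2018, 70),
    (6, 20, 2019, 80),
]


def pivot_by_week(rows, years=(2018, 2019), top=10):
    """Sum amounts per week, one column per year, latest weeks first."""
    totals = defaultdict(lambda: dict.fromkeys(years, 0))
    for _tid, week, year, amount in rows:
        if year in years:
            totals[week][year] += amount
    weeks = sorted(totals, reverse=True)[:top]
    return [(week, *[totals[week][y] for y in years]) for week in weeks]


result = pivot_by_week(transactions)
for row in result:
    print(*row)
```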

SQL query - Find daily MIN value from hourly sums

Let's cut to the chase. I have a table which looks like this one (using SQL Server 2014):
DEMO:
http://sqlfiddle.com/#!6/75f4a/1/0
CREATE TABLE TAB (
DT datetime,
VALUE float
);
INSERT INTO TAB VALUES
('2015-05-01 06:00:00', 12),
('2015-05-01 06:20:00', 10),
('2015-05-01 06:40:00', 11),
('2015-05-01 07:00:00', 14),
('2015-05-01 07:20:00', 15),
('2015-05-01 07:40:00', 13),
('2015-05-01 08:00:00', 10),
('2015-05-01 08:20:00', 9),
('2015-05-01 08:40:00', 5),
('2015-05-02 06:00:00', 19),
('2015-05-02 06:20:00', 7),
('2015-05-02 06:40:00', 11),
('2015-05-02 07:00:00', 9),
('2015-05-02 07:20:00', 7),
('2015-05-02 07:40:00', 6),
('2015-05-02 08:00:00', 10),
('2015-05-02 08:20:00', 19),
('2015-05-02 08:40:00', 15),
('2015-05-03 06:00:00', 8),
('2015-05-03 06:20:00', 8),
('2015-05-03 06:40:00', 8),
('2015-05-03 07:00:00', 21),
('2015-05-03 07:20:00', 12),
('2015-05-03 07:40:00', 7),
('2015-05-03 08:00:00', 10),
('2015-05-03 08:20:00', 4),
('2015-05-03 08:40:00', 10)
I need to:
sum values hourly
select the smallest 'hourly sum' for each day
select hour for which that sum occurred
In other words, I want to have a table which looks like this:
DATE | SUM VAL | ON HOUR
--------------------------
2015-05-01 | 24 | 8:00
2015-05-02 | 22 | 7:00
2015-05-03 | 24 | 6:00
The first two points are very easy (check out the sqlfiddle). I have a problem with the third one. I can't simply select Datepart(HOUR, DT), because it has to be aggregated. I tried using JOINs and a WHERE clause, but with no success (some values may occur in the table more than once, which threw an error).
I'm kinda new with SQL and I got stuck. Need your help SO! :)
One way is to use the set with minimum hourly values as a derived table and join against that. I would do something like this:
;WITH CTE AS (
SELECT Cast(Format(DT, 'yyyy-MM-dd HH:00') AS datetime) AS DT, SUM(VALUE) AS VAL
FROM TAB
GROUP BY Format(DT, 'yyyy-MM-dd HH:00')
)
SELECT b.dt "Date", val "sum val", cast(min(a.dt) as time) "on hour"
FROM cte a JOIN (
SELECT Format(DT,'yyyy-MM-dd') AS DT, MIN(VAL) AS DAILY_MIN
FROM cte HOURLY
GROUP BY Format(DT,'yyyy-MM-dd')
) b ON CAST(a.DT AS DATE) = b.DT and a.VAL = b.DAILY_MIN
GROUP BY b.DT, a.VAL
This would get:
Date sum val on hour
2015-05-01 24 08:00:00.0000000
2015-05-02 22 07:00:00.0000000
2015-05-03 24 06:00:00.0000000
I used min() for the time part because your sample data has the same lowest value for two separate hours on the 3rd. If you want both, remove the min function from the outer select, along with the group by. Then you would get:
Date sum val on hour
2015-05-01 24 08:00:00.0000000
2015-05-02 22 07:00:00.0000000
2015-05-03 24 06:00:00.0000000
2015-05-03 24 08:00:00.0000000
I'm sure it can be improved, but you should get the idea.
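The three steps (sum per hour, take the daily minimum, report the hour) can also be checked with a short Python sketch over the question's sample data. This is an illustration, not SQL Server code; the names `values_by_day`, `readings` and `daily_min_hour` are assumptions, and ties are resolved by taking the earliest hour.

```python
from collections import defaultdict
from datetime import date, datetime

# Sample readings from the question: nine per day, every 20 minutes from 06:00
values_by_day = {
    date(2015, 5, 1): [12, 10, 11, 14, 15, 13, 10, 9, 5],
    date(2015, 5, 2): [19, 7, 11, 9, 7, 6, 10, 19, 15],
    date(2015, 5, 3): [8, 8, 8, 21, 12, 7, 10, 4, 10],
}
readings = [
    (datetime(d.year, d.month, d.day, 6 + i // 3, 20 * (i % 3)), v)
    for d, vals in values_by_day.items()
    for i, v in enumerate(vals)
]


def daily_min_hour(rows):
    # Step 1: sum the values per (day, hour)
    hourly = defaultdict(int)
    for ts, value in rows:
        hourly[(ts.date(), ts.hour)] += value
    # Step 2: per day keep the smallest (sum, hour) pair; comparing tuples
    # breaks ties on the sum by taking the earliest hour
    best = {}
    for (day, hour), total in hourly.items():
        if day not in best or (total, hour) < best[day]:
            best[day] = (total, hour)
    return [(day, total, hour) for day, (total, hour) in sorted(best.items())]


result = daily_min_hour(readings)
for day, total, hour in result:
    print(day, total, f"{hour}:00")
```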
DECLARE @TAB TABLE
(
DT DATETIME ,
VALUE FLOAT
);
INSERT INTO @TAB
VALUES ( '2015-05-01 06:00:00', 12 ),
( '2015-05-01 06:20:00', 10 ),
( '2015-05-01 06:40:00', 11 ),
( '2015-05-01 07:00:00', 14 ),
( '2015-05-01 07:20:00', 15 ),
( '2015-05-01 07:40:00', 13 ),
( '2015-05-01 08:00:00', 10 ),
( '2015-05-01 08:20:00', 9 ),
( '2015-05-01 08:40:00', 5 ),
( '2015-05-02 06:00:00', 19 ),
( '2015-05-02 06:20:00', 7 ),
( '2015-05-02 06:40:00', 11 ),
( '2015-05-02 07:00:00', 9 ),
( '2015-05-02 07:20:00', 7 ),
( '2015-05-02 07:40:00', 6 ),
( '2015-05-02 08:00:00', 10 ),
( '2015-05-02 08:20:00', 19 ),
( '2015-05-02 08:40:00', 15 ),
( '2015-05-03 06:00:00', 8 ),
( '2015-05-03 06:20:00', 8 ),
( '2015-05-03 06:40:00', 8 ),
( '2015-05-03 07:00:00', 21 ),
( '2015-05-03 07:20:00', 12 ),
( '2015-05-03 07:40:00', 7 ),
( '2015-05-03 08:00:00', 10 ),
( '2015-05-03 08:20:00', 4 ),
( '2015-05-03 08:40:00', 10 );
WITH cteh
AS ( SELECT DT ,
CAST(dt AS DATE) AS D ,
SUM(VALUE) OVER ( PARTITION BY CAST(dt AS DATE),
DATEPART(hh, DT) ) AS S
FROM @TAB
),
ctef
AS ( SELECT * ,
ROW_NUMBER() OVER ( PARTITION BY D ORDER BY S, DT ) AS rn -- DT breaks ties, so the earliest row of the hour is picked
FROM cteh
)
SELECT D ,
S ,
CAST(DT AS TIME) AS H
FROM ctef
WHERE rn = 1
Output:
D S H
2015-05-01 24 08:00:00.0000000
2015-05-02 22 07:00:00.0000000
2015-05-03 24 06:00:00.0000000
Here's a method that uses a Temp Table (as opposed to the CTE's in the other solutions) to store calculated values and then filters the results to give you your desired output:
-- INSERT CALCULATED GROUPED VALUES INTO TEMP TABLE
SELECT CONVERT(DATE, DT) AS DateVal ,
SUM(VALUE) AS SumVal ,
DATEPART(HOUR, CONVERT(TIME, DT)) AS HourVal
INTO #TEMP_CALC
FROM TAB
GROUP BY CONVERT(DATE, DT) , DATEPART(HOUR, CONVERT(TIME, DT))
-- TAKE THE RELEVANT ROWS
SELECT t.DateVal ,
MIN(t.SumVal) AS SumVal ,
( SELECT TOP 1
HourVal
FROM #TEMP_CALC t2
WHERE t2.DateVal = t.DateVal
AND t2.SumVal = MIN(t.SumVal)
ORDER BY t2.HourVal -- earliest hour when several hours tie
) AS MinHour
FROM #TEMP_CALC t
GROUP BY t.DateVal
ORDER BY DateVal
You can use DATEDIFF to get the time spans from any starting point in time (1990-1-1 in this sample) in hours and days. Then use those spans to group and order, and finally use DATEADD with the same starting point to rebuild the date and hour:
WITH dates AS (
SELECT CAST(DT AS DATETIME) AS Date, -- cast the value to date
value FROM dbo.TAB AS T
),
ddh AS (SELECT
date,
DATEDIFF(DAY, '1990-1-1', date) AS daySpan, -- days span
DATEDIFF(HOUR, '1990-1-1', date) AS hourSpan, -- hours span
value
FROM dates
),
ddhv AS ( SELECT
daySpan,
hourSpan,
SUM(value) AS sumValues -- sum...
FROM ddh
group BY daySpan, hourSpan -- ...grouped by day & hour
),
ddhvr AS ( SELECT
daySpan,
hourSpan,
sumValues,
-- number rows by hourly sum of the value
ROW_NUMBER() OVER (PARTITION BY daySpan ORDER BY sumValues) AS row
FROM ddhv
)
SELECT
DATEADD(HOUR, hourSpan, '1990-1-1') AS DayHour, -- rebuild the date/hour
sumValues
FROM ddhvr
WHERE row = 1 -- take only the first occurrence for each day
This query has the advantage that you can easily change the periods and the starting point. For example, you can make your days start at 6:30 AM instead of 00:00, so that the compared periods are 6:30 to 7:30, 7:30 to 8:30, and so on. You can also change the grouping unit: instead of 1 hour it could be half an hour, 5 minutes, or 2 hours. If you need to do so, please see this SO answer. There you'll see how to group by different periods and get back the period starting point. It's just some simple maths.
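The "simple maths" for arbitrary periods can be sketched as follows. This is a Python illustration of the general idea (count whole periods since a chosen origin, then add them back), not the T-SQL from the referenced answer; the function name `bucket_start` and the sample timestamps are assumptions chosen for the example.

```python
from datetime import datetime, timedelta


def bucket_start(ts, origin, period):
    """Start of the period-long bucket that contains ts,
    with bucket boundaries aligned to origin."""
    steps = (ts - origin) // period  # whole periods elapsed since origin
    return origin + steps * period


# Hourly buckets that start at half past the hour (6:30 to 7:30, and so on)
origin = datetime(1990, 1, 1, 6, 30)
example = bucket_start(datetime(2015, 5, 1, 7, 20), origin, timedelta(hours=1))
print(example)  # 2015-05-01 06:30:00

# Half-hour buckets aligned to midnight
half_hour = bucket_start(
    datetime(2015, 5, 1, 7, 40), datetime(1990, 1, 1), timedelta(minutes=30)
)
print(half_hour)  # 2015-05-01 07:30:00
```

Grouping rows by the returned bucket start plays the same role as grouping by hourSpan in the query above, with the origin and the period as the two knobs.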
I tested mine against your fiddle:
with agg as (
select cast(dt as date) as dt, datepart(hh, dt) as hr, sum(VALUE) as sum_val
from TAB
group by cast(dt as date), datepart(hh, dt)
)
select
dt, min(sum_val) as "SUM VAL",
(
select cast(hr as varchar(2)) + ':00' from agg as agg2
where agg2.dt = agg.dt and not exists (
/* keep only the hour with the lowest sum; select earliest in case of ties */
select 1 from agg as agg3
where agg3.dt = agg2.dt and (agg3.sum_val < agg2.sum_val or (agg3.sum_val = agg2.sum_val and agg3.hr < agg2.hr))
)
) as "ON HOUR"
from agg
group by dt;