Getting aggregated results by different time intervals

I have a Postgres database with a table containing data keyed by an id and a timestamp.
The table has several data columns. I want to create a PL/pgSQL function that returns an aggregation of the data over a given time interval.
The table looks something like this:
user_id | created_at | value_a | value_b | value_c | value_d | unique_key
------------+---------------------+---------+---------+---------+---------+------------
1 | 2019-12-16 17:37:07 | 1 | 5 | 0 | 5 | 1
2 | 2019-12-19 15:37:07 | 4 | 7 | 0 | 42 | 2
3 | 2019-12-16 15:37:07 | 20 | 1 | 20 | 143 | 3
2 | 2019-12-18 12:01:32 | 0 | 0 | 5 | 987 | 4
1 | 2019-12-11 14:12:50 | 6 | 0 | 9 | 0 | 5
2 | 2019-12-10 15:37:07 | 1 | 72 | 100 | 90 | 6
1 | 2019-12-20 15:37:07 | 5 | 3 | 56 | 1546 | 7
3 | 2019-12-20 15:37:07 | 30 | 4 | 789 | 3 | 8
4 | 2019-12-01 15:37:07 | 35 | 90 | 0 | 5 | 9
(9 rows)
I want the function to take a time range (after and before) and an interval, and then group the data by that interval (daily, for example) and by user_id.
I have managed to create a function with generate_series that returns aggregated results, but it ignores some of the data.
The aggregation uses a different formula for each output column.
Most of the answers I have found select a grouped sum of only one expression rather than several, i.e. they return something along the lines of:
user_id | date | value_a + value_b + value_c + value_d
But in my case I would like to manipulate the data in different ways, for example:
user_id | date | a + b | (a*b)/c | count(a)
and so on (of course I will handle division by zero and similar cases).
So the function that I tried to create was something along the lines of:
CREATE OR REPLACE FUNCTION branch_performance_measurements_daily(
IN after DATE,
IN before DATE,
)
RETURNS TABLE (
date_of_sum DATE,
func_a INT,
func_b INT,
func_c INT
)
AS $$
BEGIN
RETURN QUERY
WITH days_series AS (
SELECT d::date day FROM generate_series(after, before, '1 day') day)
SELECT days_series.day AS date_of_sum,
sum(a + b),
sum((a*b)/c),
count(a)
FROM table b
WHERE DATE(b.created_at) = DATE(days_series.day)
GROUP BY days_series.day, b.user_id;
END;
$$ LANGUAGE plpgsql;
Sadly, this type of query does not return all the data available in the table for all the available dates.
Could you point me to the proper usage of generate_series for my case?
P.S.
I am aware that the sum expressions won't work as written; they are just for the example :)
Many thanks in advance!

Welcome to Stack Overflow.
Your function had a few syntax errors. This is what you might be looking for:
CREATE OR REPLACE FUNCTION branch_performance_measurements_daily(
  after DATE, before DATE)
RETURNS TABLE (
  date_of_sum DATE, func_a BIGINT, func_b BIGINT, func_c BIGINT) AS $$
BEGIN
  RETURN QUERY
  WITH days_series AS (
    SELECT generate_series(after, before, '1 day') AS d)
  SELECT
    DATE(ds.d) AS date_of_sum,
    sum(value_a + value_b),
    COALESCE(sum((value_a*value_b)/NULLIF(value_c,0)),0),
    count(value_a)
  FROM t
  JOIN days_series ds ON ds.d = DATE(t.created_at)
  GROUP BY ds.d, t.user_id
  ORDER BY ds.d;
END;
$$ LANGUAGE plpgsql;
Sample data
CREATE TEMPORARY TABLE t
(user_id INT, created_at date,
value_a int,value_b int,value_c int,value_d int, unique_key int);
INSERT INTO t VALUES
(1,' 2019-12-16 17:37:07',1,5,0,5,1),
(2,' 2019-12-19 15:37:07',4,7,0, 42,2),
(3,' 2019-12-16 15:37:07',20,1,20,143,3),
(2,' 2019-12-18 12:01:32',0,0,5,987,4),
(1,' 2019-12-11 14:12:50',6,0,9,0,5),
(2,' 2019-12-10 15:37:07',1,72,100, 90,6),
(1,' 2019-12-20 15:37:07',5,3,56,1546,7),
(3,' 2019-12-20 15:37:07',30,4,789,3,8),
(4,' 2019-12-01 15:37:07',35, 90,0,5,9);
Testing function
SELECT * FROM branch_performance_measurements_daily('2019-12-01', '2019-12-20');
date_of_sum | func_a | func_b | func_c
-------------+--------+--------+--------
2019-12-01 | 125 | 0 | 1
2019-12-10 | 73 | 0 | 1
2019-12-11 | 6 | 0 | 1
2019-12-16 | 6 | 0 | 1
2019-12-16 | 21 | 1 | 1
2019-12-18 | 0 | 0 | 1
2019-12-19 | 11 | 0 | 1
2019-12-20 | 8 | 0 | 1
2019-12-20 | 34 | 0 | 1
(9 rows)
In case you want to group just by the generated date (not together with the user_id, as your query suggests) just remove the user_id from the GROUP BY clause and you'll get something like this:
date_of_sum | func_a | func_b | func_c
-------------+--------+--------+--------
2019-12-01 | 125 | 0 | 1
2019-12-10 | 73 | 0 | 1
2019-12-11 | 6 | 0 | 1
2019-12-16 | 27 | 1 | 2
2019-12-18 | 0 | 0 | 1
2019-12-19 | 11 | 0 | 1
2019-12-20 | 42 | 0 | 2
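The function above uses an inner JOIN, so days in the range that have no rows in the table do not appear in the output. If every day of the range should be listed, one option is to start from the day series and LEFT JOIN the data onto it. The following is only a sketch of that idea, written in Python with the standard sqlite3 module so that it can be run without a server. The assumptions are that SQLite stands in for PostgreSQL, that a recursive CTE replaces generate_series, and that func_a is also wrapped in COALESCE so empty days show 0. The helper name daily_totals is hypothetical.

```python
import sqlite3

# Sample rows copied from the question's table.
ROWS = [
    (1, "2019-12-16 17:37:07", 1, 5, 0, 5, 1),
    (2, "2019-12-19 15:37:07", 4, 7, 0, 42, 2),
    (3, "2019-12-16 15:37:07", 20, 1, 20, 143, 3),
    (2, "2019-12-18 12:01:32", 0, 0, 5, 987, 4),
    (1, "2019-12-11 14:12:50", 6, 0, 9, 0, 5),
    (2, "2019-12-10 15:37:07", 1, 72, 100, 90, 6),
    (1, "2019-12-20 15:37:07", 5, 3, 56, 1546, 7),
    (3, "2019-12-20 15:37:07", 30, 4, 789, 3, 8),
    (4, "2019-12-01 15:37:07", 35, 90, 0, 5, 9),
]

# The recursive CTE plays the role of generate_series(after, before, '1 day').
# LEFT JOIN keeps days that have no matching rows in t.
QUERY = """
WITH RECURSIVE days(d) AS (
    SELECT date(:after)
    UNION ALL
    SELECT date(d, '+1 day') FROM days WHERE d < date(:before)
)
SELECT days.d,
       COALESCE(SUM(t.value_a + t.value_b), 0),
       COALESCE(SUM((t.value_a * t.value_b) / NULLIF(t.value_c, 0)), 0),
       COUNT(t.value_a)
FROM days
LEFT JOIN t ON date(t.created_at) = days.d
GROUP BY days.d
ORDER BY days.d
"""


def daily_totals(after, before):
    """Return one (day, func_a, func_b, func_c) row per day in the range."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (user_id INT, created_at TEXT, value_a INT, "
        "value_b INT, value_c INT, value_d INT, unique_key INT)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)
    rows = conn.execute(QUERY, {"after": after, "before": before}).fetchall()
    conn.close()
    return rows
```

Calling daily_totals("2019-12-01", "2019-12-20") returns 20 rows, one per day, including days such as 2019-12-02 where all three aggregates are 0. In PostgreSQL the same shape would be written with generate_series on the left side of the LEFT JOIN.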

Related

SQL Server - Counting total number of days user had active contracts

I want to count the number of days during which a user had an active contract, based on a table with start and end dates for each service contract. I want to count the time of any activity, regardless of whether the customer had 1 or 5 contracts active at the same time.
+---------+-------------+------------+------------+
| USER_ID | CONTRACT_ID | START_DATE | END_DATE |
+---------+-------------+------------+------------+
| 1 | 14 | 18.02.2021 | 18.04.2022 |
| 1 | 13 | 02.01.2019 | 02.01.2020 |
| 1 | 12 | 01.01.2018 | 01.01.2019 |
| 1 | 11 | 13.02.2017 | 13.02.2019 |
| 2 | 23 | 19.06.2021 | 18.04.2022 |
| 2 | 22 | 01.07.2019 | 01.07.2020 |
| 2 | 21 | 19.01.2019 | 19.01.2020 |
+---------+-------------+------------+------------+
As a result I want a table:
+---------+--------------------+
| USER_ID | DAYS_BEEING_ACTIVE |
+---------+--------------------+
| 1 | 1477 |
| 2 | 832 |
+---------+--------------------+
Where
1477 stands for 1053 (days from 13.02.2017 to 02.01.2020 - user had active contracts during this time) + 424 (days from 18.02.2021 to 18.04.2022)
832 stands for 529 (days from 19.01.2019 to 01.07.2020) + 303 (days from 19.06.2021 to 18.04.2022).
I tried some queries with joins, DATEDIFF, and CASE WHEN conditions, but nothing worked. I'll be grateful for any help.

If you don't have a Tally/Numbers table (highly recommended), you can use an ad-hoc tally/numbers table.
Example:
Select User_ID
,Days = count(DISTINCT dateadd(DAY,N,Start_Date))
from YourTable A
Join ( Select Top 10000 N=Row_Number() Over (Order By (Select NULL))
From master..spt_values n1, master..spt_values n2
) B
On N<=DateDiff(DAY,Start_Date,End_Date)
Group By User_ID
Results
User_ID Days
1 1477
2 832
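The tally approach above expands each contract into one row per day and counts the distinct days per user, so overlapping contracts are counted only once. A rough way to try the same idea without SQL Server is sketched below in Python with the standard sqlite3 module. The assumptions are that SQLite stands in for SQL Server, that a recursive CTE replaces the master..spt_values cross join, and that the dates are stored as ISO strings. The helper name active_days is hypothetical.

```python
import sqlite3

# Contracts from the question, with dates rewritten in ISO format.
CONTRACTS = [
    (1, 14, "2021-02-18", "2022-04-18"),
    (1, 13, "2019-01-02", "2020-01-02"),
    (1, 12, "2018-01-01", "2019-01-01"),
    (1, 11, "2017-02-13", "2019-02-13"),
    (2, 23, "2021-06-19", "2022-04-18"),
    (2, 22, "2019-07-01", "2020-07-01"),
    (2, 21, "2019-01-19", "2020-01-19"),
]

# tally yields 1..10000, like the ad-hoc numbers table in the answer.
# Each contract is joined to the offsets 1..(end - start) and the
# resulting dates are de-duplicated per user with COUNT(DISTINCT ...).
QUERY = """
WITH RECURSIVE tally(n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM tally WHERE n < 10000
)
SELECT c.user_id,
       COUNT(DISTINCT date(c.start_date, '+' || tally.n || ' day')) AS days
FROM contracts c
JOIN tally
  ON tally.n <= julianday(c.end_date) - julianday(c.start_date)
GROUP BY c.user_id
ORDER BY c.user_id
"""


def active_days():
    """Return (user_id, distinct active days) for the sample contracts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contracts (user_id INT, contract_id INT, "
        "start_date TEXT, end_date TEXT)"
    )
    conn.executemany("INSERT INTO contracts VALUES (?, ?, ?, ?)", CONTRACTS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows
```

With the sample data this returns [(1, 1477), (2, 832)], which matches the results shown above. As in the original query, the tally size caps the longest contract that can be handled, here 10000 days.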

SQL: Complex query with subtraction from different cells

I have two tables and I want to combine their data.
The first table
+------------+-----+------+-------+
| BusinessID | Lat | Long | Stars |
+------------+-----+------+-------+
| abc123 | 32 | 74 | 4.5 |
| abd123 | 32 | 75 | 4 |
| abe123 | 33 | 76 | 3 |
+------------+-----+------+-------+
The second table is:
+------------+-----+------+-------+
| BusinessID | day | time | count |
+------------+-----+------+-------+
| abc123 | 1 | 14 | 5 |
| abc123 | 1 | 15 | 6 |
| abc123 | 2 | 13 | 1 |
| abd123 | 4 | 12 | 4 |
| abd123 | 4 | 13 | 8 |
| abd123 | 5 | 11 | 2 |
+------------+-----+------+-------+
What I want to do is find all the businesses that are within a specific radius and have more check-ins in the next hour than in the current one.
So the results are
+------------+
| BusinessID |
+------------+
| abd123 |
| abc123 |
+------------+
Because they have more check-ins in the next hour than in the previous one (6 > 5, 8 > 4).
It would also be helpful if the results were ordered by the difference in the number of check-ins, e.g. (8 - 4 > 6 - 5).
SELECT *
FROM table2 t2
WHERE t2.BusinessID IN (
SELECT t1.BusinessID
FROM table1 t1
WHERE earth_box(ll_to_earth(32, 74), 4000/1.609) @> ll_to_earth(Lat, Long)
ORDER by earth_distance(ll_to_earth(32, 74), ll_to_earth(Lat, Long)), stars DESC
) AND checkin_day = 1 AND checkin_time = 14;
With the above query I can find the businesses within a radius and then find their check-ins at the specified time, e.g. 14. What I need now is to find the number of check-ins in hour 15 (for the same businesses) and check whether it is greater than it was in the previous hour.

I think you want something like this:
SELECT
t1.BusinessID
FROM
table1 t1
JOIN
(SELECT
*,
"count" - LAG("count") OVER (PARTITION BY BusinessID, "day" ORDER BY "time") "grow"
FROM
table2
WHERE
/* Some condition on table2 */) t2
ON t1.BusinessID = t2.BusinessID AND t2.grow > 0
WHERE
/* Some condition on table1 */
ORDER BY
t2.grow DESC;
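To see what the LAG part of this query produces on the sample check-ins, here is a small sketch in Python with the standard sqlite3 module. The assumptions are that SQLite (3.25 or newer, for window functions) stands in for PostgreSQL, and that the radius filter on table1 is left out because it depends on the earthdistance extension. The helper name growing_businesses is hypothetical.

```python
import sqlite3

# Check-in rows from the question's second table.
CHECKINS = [
    ("abc123", 1, 14, 5),
    ("abc123", 1, 15, 6),
    ("abc123", 2, 13, 1),
    ("abd123", 4, 12, 4),
    ("abd123", 4, 13, 8),
    ("abd123", 5, 11, 2),
]

# LAG looks at the previous row of the same business and day, ordered
# by time, so grow is the change in check-ins since the previous hour
# on record. The first row of each partition has no predecessor and
# gets NULL, which the grow > 0 filter drops.
QUERY = """
SELECT BusinessID, grow
FROM (
    SELECT BusinessID,
           "count" - LAG("count") OVER (
               PARTITION BY BusinessID, "day" ORDER BY "time"
           ) AS grow
    FROM table2
) AS t2
WHERE grow > 0
ORDER BY grow DESC
"""


def growing_businesses():
    """Return (BusinessID, grow) for rows whose check-ins increased."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE table2 (BusinessID TEXT, "day" INT, "time" INT, "count" INT)'
    )
    conn.executemany("INSERT INTO table2 VALUES (?, ?, ?, ?)", CHECKINS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows
```

On the sample data this gives abd123 with a growth of 4 followed by abc123 with a growth of 1, which is the order asked for in the question. Note that LAG compares with the previous recorded row, so if hours can be missing from the data, an extra check that the two times are consecutive may be needed.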

SQL moving aggregate SUM without partial results

Assume I have this schema (tested on PostgreSQL) where the 'Scorelines' relation contains the results of sports matches (kickoff is a TIMESTAMP, but replaced by INT for readability).
SQLFiddle here: http://sqlfiddle.com/#!12/52475/3
CREATE TABLE Scorelines (
team TEXT,
kickoff INT,
scored INT,
conceded INT
);
Now I want to produce another column 'three_matches_scored' that contains the sum of the points scored over the 3 preceding games (determined by kickoff) of the same team. I have this:
SELECT team, kickoff, scored, conceded, SUM(scored) OVER three_matches AS three_matches_scored
FROM Scorelines
WINDOW three_matches AS
(PARTITION BY team ORDER BY kickoff
ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING)
ORDER BY kickoff;
This works beautifully so far, except that I get values starting from the second game. Example:
| TEAM | KICKOFF | SCORED | CONCEDED | THREE_MATCHES_SCORED |
|------|---------|--------|----------|----------------------|
| A | 1 | 1 | 0 | (null) |
| B | 2 | 1 | 1 | (null) |
| A | 3 | 1 | 1 | 1 |
| A | 4 | 3 | 0 | 2 |
| B | 4 | 1 | 4 | 1 |
| A | 6 | 0 | 2 | 5 |
| B | 6 | 4 | 2 | 2 |
| B | 8 | 1 | 2 | 6 |
| B | 10 | 1 | 1 | 6 |
| A | 11 | 2 | 1 | 4 |
I want the column 'three_matches_scored' to be (null) for the first 3 games, because there are not yet 3 results to sum up. How can I achieve this?
I'd prefer a simple, understandable solution; performance is not critical in this particular case.
My only idea right now is to define a stored function SUM3 that returns (null) when there are fewer than 3 values to add up. But I have never defined a function in SQL and can't seem to figure it out.

You can use a CASE expression to return NULL for the rows where there are fewer than 3 preceding games:
SELECT team, kickoff, scored, conceded,
CASE WHEN COUNT(scored) OVER three_matches = 3
THEN SUM(scored) OVER three_matches
ELSE NULL
END AS three_matches_scored
FROM Scorelines
WINDOW three_matches AS
(PARTITION BY team ORDER BY kickoff
ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING)
ORDER BY kickoff;
Output:
team | kickoff | scored | conceded | three_matches_scored
------+---------+--------+----------+----------------------
A | 1 | 1 | 0 |
B | 2 | 1 | 1 |
A | 3 | 1 | 1 |
A | 4 | 3 | 0 |
B | 4 | 1 | 4 |
A | 6 | 0 | 2 | 5
B | 6 | 4 | 2 |
B | 8 | 1 | 2 | 6
B | 10 | 1 | 1 | 6
A | 11 | 2 | 1 | 4
(10 rows)
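The output above can be reproduced with a short script. This is a sketch in Python with the standard sqlite3 module. The assumptions are that SQLite (3.25 or newer) stands in for PostgreSQL, that the window definition is repeated inline instead of using a named WINDOW clause, and that the sample rows are reconstructed from the output table. The helper name three_match_sums is hypothetical.

```python
import sqlite3

# (team, kickoff, scored, conceded), reconstructed from the output above.
SCORELINES = [
    ("A", 1, 1, 0), ("B", 2, 1, 1), ("A", 3, 1, 1), ("A", 4, 3, 0),
    ("B", 4, 1, 4), ("A", 6, 0, 2), ("B", 6, 4, 2), ("B", 8, 1, 2),
    ("B", 10, 1, 1), ("A", 11, 2, 1),
]

# COUNT over the same frame tells how many preceding games exist.
# The SUM is returned only when the frame is full (3 games); a CASE
# without ELSE yields NULL otherwise.
QUERY = """
SELECT team, kickoff,
       CASE WHEN COUNT(scored) OVER (
                     PARTITION BY team ORDER BY kickoff
                     ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING) = 3
            THEN SUM(scored) OVER (
                     PARTITION BY team ORDER BY kickoff
                     ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING)
       END AS three_matches_scored
FROM Scorelines
ORDER BY kickoff, team
"""


def three_match_sums():
    """Return (team, kickoff, three_matches_scored) for the sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Scorelines (team TEXT, kickoff INT, scored INT, conceded INT)"
    )
    conn.executemany("INSERT INTO Scorelines VALUES (?, ?, ?, ?)", SCORELINES)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows
```

Rows with fewer than three earlier games for the team come back with None in the last position, and the remaining rows carry the sums 5, 6, 6 and 4 shown in the output table.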

See harmic's answer above.
(My first solution, just for reference.)
Solution with a user-defined aggregate:
CREATE TYPE intermediate_sum AS (
sum INT,
count INT
);
CREATE FUNCTION sum_sfunc(intermediate_sum, INTEGER) RETURNS intermediate_sum AS
$$ SELECT $2 + $1.sum AS sum, $1.count - 1 AS count $$ LANGUAGE SQL;
CREATE FUNCTION sum_ffunc(intermediate_sum) RETURNS INTEGER AS
$$ SELECT (CASE WHEN $1.count > 1 THEN null
WHEN $1.count = 0 THEN $1.sum
END)
$$ LANGUAGE SQL;
CREATE AGGREGATE sum3(INTEGER) (
sfunc = sum_sfunc,
finalfunc = sum_ffunc,
stype = intermediate_sum,
initcond = '(0,3)'
);
The aggregate SUM3 needs exactly 3 values (the window in the question never supplies more); otherwise it returns (null). One can define other aggregates like SUM4 by changing the initcond, for example to '(0,4)'.
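The state-plus-final-function pattern of this aggregate can also be tried outside PostgreSQL. The sketch below uses Python's sqlite3 create_aggregate as a stand-in for CREATE AGGREGATE. The assumptions are that the aggregate is used with a plain GROUP BY rather than as a window function, and that the inputs are never NULL. The class name Sum3 and the helper sum3_by_team are hypothetical.

```python
import sqlite3


class Sum3:
    """Sum that is only reported when exactly three values were seen."""

    def __init__(self):
        # Mirrors initcond '(0,3)': running sum and values still missing.
        self.total = 0
        self.remaining = 3

    def step(self, value):
        # Mirrors sum_sfunc: add the value and count down.
        # Assumes value is never NULL (None).
        self.total += value
        self.remaining -= 1

    def finalize(self):
        # Mirrors sum_ffunc: return the sum only when the countdown hit 0.
        return self.total if self.remaining == 0 else None


def sum3_by_team(rows):
    """Apply sum3 per team to (team, scored) rows and return the result."""
    conn = sqlite3.connect(":memory:")
    conn.create_aggregate("sum3", 1, Sum3)
    conn.execute("CREATE TABLE scores (team TEXT, scored INT)")
    conn.executemany("INSERT INTO scores VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT team, sum3(scored) FROM scores GROUP BY team ORDER BY team"
    ).fetchall()
    conn.close()
    return result
```

For example, sum3_by_team([("A", 1), ("A", 1), ("A", 3), ("B", 1), ("B", 4)]) gives [("A", 5), ("B", None)], because team B supplies only two values. Changing the starting value of remaining plays the same role as changing the initcond.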

SQL Combine two tables with two parameters

I searched the forum for an hour and didn't find anything similar.
Here is my problem: I want to compare two columns, ID and DATE. If they are the same in both tables, I want to put the number from table 2 next to the row. If they are not the same, I want to fill in the yearly quota that applies to that date. I am working in Access.
table1
id|date|state_on_date
1|30.12.2013|23
1|31.12.2013|25
1|1.1.2014|35
1|2.1.2014|12
2|30.12.2013|34
2|31.12.2013|65
2|1.1.2014|43
table2
id|date|year_quantity
1|31.12.2013|100
1|31.12.2014|150
2|31.12.2013|200
2|31.12.2014|300
I want to get:
table 3
id|date|state_on_date|year_quantity
1|30.12.2013|23|100
1|31.12.2013|25|100
1|1.1.2014|35|150
1|2.1.2014|12|150
2|30.12.2013|34|200
2|31.12.2013|65|200
2|1.1.2014|43|300
I tried joins and reading forums but didn't find a solution.

Are you looking for this?
SELECT id, date, state_on_date,
(
SELECT TOP 1 year_quantity
FROM table2
WHERE id = t.id
AND date >= t.date
ORDER BY date
) AS year_quantity
FROM table1 t
Output:
| ID | DATE | STATE_ON_DATE | YEAR_QUANTITY |
|----|------------|---------------|---------------|
| 1 | 2013-12-30 | 23 | 100 |
| 1 | 2013-12-31 | 25 | 100 |
| 1 | 2014-01-01 | 35 | 150 |
| 1 | 2014-01-02 | 12 | 150 |
| 2 | 2013-12-30 | 34 | 200 |
| 2 | 2013-12-31 | 65 | 200 |
| 2 | 2014-01-01 | 43 | 300 |
Here is a SQLFiddle demo. It's for SQL Server but should work just fine in MS Access.
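The correlated subquery picks, for each row of table1, the earliest table2 row of the same id whose date is on or after the row's date. A minimal sketch of the same lookup in Python with the standard sqlite3 module follows. The assumptions are that SQLite stands in for Access and SQL Server, that LIMIT 1 replaces TOP 1, and that the dates are stored as ISO strings so they compare correctly as text. The helper name with_year_quantity is hypothetical.

```python
import sqlite3

# Sample rows from the question, with dates rewritten in ISO format.
TABLE1 = [
    (1, "2013-12-30", 23), (1, "2013-12-31", 25),
    (1, "2014-01-01", 35), (1, "2014-01-02", 12),
    (2, "2013-12-30", 34), (2, "2013-12-31", 65),
    (2, "2014-01-01", 43),
]
TABLE2 = [
    (1, "2013-12-31", 100), (1, "2014-12-31", 150),
    (2, "2013-12-31", 200), (2, "2014-12-31", 300),
]

# For each table1 row, take the first table2 row (by date) with the
# same id and a date that is not earlier than the table1 date.
QUERY = """
SELECT t.id, t.date, t.state_on_date,
       (SELECT q.year_quantity
        FROM table2 q
        WHERE q.id = t.id AND q.date >= t.date
        ORDER BY q.date
        LIMIT 1) AS year_quantity
FROM table1 t
ORDER BY t.id, t.date
"""


def with_year_quantity():
    """Return table1 rows extended with the matching year_quantity."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id INT, date TEXT, state_on_date INT)")
    conn.execute("CREATE TABLE table2 (id INT, date TEXT, year_quantity INT)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", TABLE1)
    conn.executemany("INSERT INTO table2 VALUES (?, ?, ?)", TABLE2)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows
```

The last column of the result is 100, 100, 150, 150 for id 1 and 200, 200, 300 for id 2, in line with the output table above.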

add a new entry if doesn't exist in mysql table

I have a table with the structure below.
mysql> desc depot;
+-------+----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+----------+------+-----+---------+-------+
| recd | date | YES | | NULL | |
| id | int(11) | YES | | NULL | |
+-------+----------+------+-----+---------+-------+
Currently I have records like the following.
mysql> select * from depot;
+---------------------+------+
| recd | id |
+---------------------+------+
| 2012-07-09 | 33 |
| 2012-07-11 | 32 |
| 2012-07-15 | 32 |
+---------------------+------+
3 rows in set (0.00 sec)
I need a query that prints the records as shown below, keeping the missing dates of a month (say July 01 to July 31) and using 0 as the id value for those missing dates.
select < a magical query >;
+------------+------+
| recd | id |
+------------+------+
2012-07-01 0
2012-07-02 0
2012-07-03 0
2012-07-04 0
2012-07-05 0
2012-07-06 0
2012-07-07 0
2012-07-08 0
2012-07-09 33
2012-07-10 0
2012-07-11 32
2012-07-12 0
2012-07-13 0
2012-07-14 0
2012-07-15 32
2012-07-16 0
2012-07-17 0
2012-07-18 0
2012-07-19 0
2012-07-20 0
2012-07-21 0
2012-07-22 0
2012-07-23 0
2012-07-24 0
2012-07-25 0
2012-07-26 0
2012-07-27 0
2012-07-28 0
2012-07-29 0
2012-07-30 0
2012-07-31 0

You obviously need a second table with a list of possible dates and then you should select from that table with a left join to the one you already have.

A calendar table makes your query and your life easier. In standard SQL this query will give you what you're looking for.
select c.cal_date, coalesce(d.id, 0) id
from calendar c
left join depot d on d.recd = c.cal_date
where c.cal_date between '2012-07-01' and '2012-07-31'
order by c.cal_date
A minimal calendar table just needs a date column.
create table calendar (
cal_date date primary key
);
insert into calendar values
('2012-07-01'),
('2012-07-02'),
...
('2012-07-31');
Instead of writing INSERT statements, you can generate data with a spreadsheet or a scripting program, and load the rows through your database's bulk loader.
I've also written about a more useful calendar table on StackOverflow.
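As one illustration of filling the calendar with a scripting program, the sketch below builds the July 2012 calendar in a loop and then runs the LEFT JOIN query from this answer. It is written in Python with the standard sqlite3 module. The assumptions are that SQLite stands in for MySQL and that the depot rows are the three from the question. The helper names fill_calendar and july_report are hypothetical.

```python
import sqlite3
from datetime import date, timedelta

# The three depot rows from the question.
DEPOT = [("2012-07-09", 33), ("2012-07-11", 32), ("2012-07-15", 32)]


def fill_calendar(conn, first, last):
    """Insert one calendar row per day from first to last, inclusive."""
    day = first
    while day <= last:
        conn.execute("INSERT INTO calendar VALUES (?)", (day.isoformat(),))
        day += timedelta(days=1)


def july_report():
    """Return (cal_date, id) for every day of July 2012, 0 when missing."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE calendar (cal_date TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE depot (recd TEXT, id INT)")
    conn.executemany("INSERT INTO depot VALUES (?, ?)", DEPOT)
    fill_calendar(conn, date(2012, 7, 1), date(2012, 7, 31))
    # Same shape as the query in the answer: the calendar drives the
    # result and COALESCE turns missing depot rows into 0.
    rows = conn.execute(
        """
        SELECT c.cal_date, COALESCE(d.id, 0) AS id
        FROM calendar c
        LEFT JOIN depot d ON d.recd = c.cal_date
        WHERE c.cal_date BETWEEN '2012-07-01' AND '2012-07-31'
        ORDER BY c.cal_date
        """
    ).fetchall()
    conn.close()
    return rows
```

july_report() returns 31 rows, with 33 on 2012-07-09, 32 on 2012-07-11 and 2012-07-15, and 0 on every other day.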

Thanks, mates! I was hoping for a pure SQL solution, if one exists, but in the end it took a procedure.
I found a workaround, since this had been pending for a long time.
BASE TABLE
CREATE TABLE `deopt` (
`recd` datetime DEFAULT NULL,
`id` int(11) DEFAULT NULL
) ENGINE=InnoDB;
Seed records to the base table
insert into deopt values ('2012-07-09 23:08:54',22);
insert into deopt values ('2012-07-11 23:08:54',22);
insert into deopt values ('2012-07-11 23:08:54',2222);
insert into deopt values ('2012-07-12 23:08:54',22);
insert into deopt values ('2012-07-14 23:08:54',245);
Create a table for dates of a month
CREATE TABLE seq_dates
(
sdate DATETIME NOT NULL
);
Create a Stored Procedure to create records for a called month
delimiter //
DROP PROCEDURE IF EXISTS sp_init_dates //
CREATE PROCEDURE sp_init_dates (IN p_fdate DATETIME, IN p_tdate DATETIME)
BEGIN
  DECLARE v_thedate DATETIME;
  TRUNCATE TABLE seq_dates;
  SET v_thedate = p_fdate;
  WHILE (v_thedate <= p_tdate) DO
    INSERT INTO seq_dates (sdate)
    VALUES (v_thedate);
    SET v_thedate = DATE_ADD(v_thedate, INTERVAL 1 DAY);
  END WHILE;
END //
delimiter ;
Call the procedure for July month with starting and ending values to be seeded to seq_dates table.
call sp_init_dates ('2012-07-01','2012-07-31');
RESULT QUERY - fetches records for all dates in a month and their corresponding ids, with 0 in place of NULL for ids.
select date(seq_dates.sdate),coalesce (deopt.id,0) from seq_dates LEFT JOIN deopt ON date(deopt.recd)=date(seq_dates.sdate);
+-----------------------+-----------------------+
| date(seq_dates.sdate) | coalesce (deopt.id,0) |
+-----------------------+-----------------------+
| 2012-07-01 | 0 |
| 2012-07-02 | 0 |
| 2012-07-03 | 0 |
| 2012-07-04 | 0 |
| 2012-07-05 | 0 |
| 2012-07-06 | 0 |
| 2012-07-07 | 0 |
| 2012-07-08 | 0 |
| 2012-07-09 | 22 |
| 2012-07-09 | 22 |
| 2012-07-10 | 0 |
| 2012-07-11 | 22 |
| 2012-07-11 | 2222 |
| 2012-07-11 | 22 |
| 2012-07-11 | 2222 |
| 2012-07-12 | 22 |
| 2012-07-13 | 0 |
| 2012-07-14 | 245 |
| 2012-07-15 | 0 |
| 2012-07-16 | 0 |
| 2012-07-17 | 0 |
| 2012-07-18 | 0 |
| 2012-07-19 | 0 |
| 2012-07-20 | 0 |
| 2012-07-21 | 0 |
| 2012-07-22 | 0 |
| 2012-07-23 | 0 |
| 2012-07-24 | 0 |
| 2012-07-25 | 0 |
| 2012-07-26 | 0 |
| 2012-07-27 | 0 |
| 2012-07-28 | 0 |
| 2012-07-29 | 0 |
| 2012-07-30 | 0 |
| 2012-07-31 | 0 |
+-----------------------+-----------------------+
35 rows in set (0.00 sec)
