I have a Postgres DB with a table containing data keyed by an id and a timestamp.
The table has several columns of data. I want to create a plpgsql function that lets me get an aggregation of that data over a given time interval.
The table looks something like this:
 user_id |     created_at      | value_a | value_b | value_c | value_d | unique_key
---------+---------------------+---------+---------+---------+---------+------------
       1 | 2019-12-16 17:37:07 |       1 |       5 |       0 |       5 |          1
       2 | 2019-12-19 15:37:07 |       4 |       7 |       0 |      42 |          2
       3 | 2019-12-16 15:37:07 |      20 |       1 |      20 |     143 |          3
       2 | 2019-12-18 12:01:32 |       0 |       0 |       5 |     987 |          4
       1 | 2019-12-11 14:12:50 |       6 |       0 |       9 |       0 |          5
       2 | 2019-12-10 15:37:07 |       1 |      72 |     100 |      90 |          6
       1 | 2019-12-20 15:37:07 |       5 |       3 |      56 |    1546 |          7
       3 | 2019-12-20 15:37:07 |      30 |       4 |     789 |       3 |          8
       4 | 2019-12-01 15:37:07 |      35 |      90 |       0 |       5 |          9
(9 rows)
I want the function to take a time range (before and after) and an interval, so that it groups the data according to the interval (daily, for example) and by user_id.
I have managed to create a function with a generate_series that returns the aggregated results, but it ignores some of the data.
The aggregation uses different formulas to get the data.
Most of the answers I have found manage to select a grouped sum of only one value, not several; i.e. they return something along the lines of:
user_id | date | value_a + value_b + value_c + value_d
But in my case I would like to manipulate the data in different ways, for example:
user_id | date | a + b | (a*b)/c | count(a)
etc. (of course I will handle divide-by-zero and the like).
So the function that I tried to create was something along the lines of:
CREATE OR REPLACE FUNCTION branch_performance_measurements_daily(
IN after DATE,
IN before DATE,
)
RETURNS TABLE (
date_of_sum DATE,
func_a INT,
func_b INT,
func_c INT
)
AS $$
BEGIN
RETURN QUERY
WITH days_series AS (
SELECT d::date day FROM generate_series(after, before, '1 day') day)
SELECT days_series.day AS date_of_sum,
sum(a + b),
sum((a*b)/c),
count(a)
FROM table b
WHERE DATE(b.created_at) = DATE(days_series.day)
GROUP BY days_series.day, b.user_id;
END;
$$ LANGUAGE plpgsql;
Sadly this type of query does not return all the available data in the table across the dates in the range.
Could someone point me to the proper usage of generate_series for the case that I need?
P.S.
I am aware that the sum expressions above won't work as written; they're just for the example :)
Many thanks in advance!
Welcome to Stack Overflow.
Your function had a few syntax errors: there is a trailing comma in the parameter list, and days_series is referenced in the WHERE clause without appearing in the FROM clause. This is what you might be looking for:
CREATE OR REPLACE FUNCTION branch_performance_measurements_daily(
    after DATE, before DATE)
RETURNS TABLE (
    date_of_sum DATE, func_a BIGINT, func_b BIGINT, func_c BIGINT) AS $$
BEGIN
RETURN QUERY
    WITH days_series AS (
        SELECT generate_series(after, before, '1 day') AS d)
    SELECT
        DATE(ds.d) AS date_of_sum,
        sum(value_a + value_b),
        COALESCE(sum((value_a * value_b) / NULLIF(value_c, 0)), 0),
        count(value_a)
    FROM t
    JOIN days_series ds ON ds.d = DATE(t.created_at)
    GROUP BY ds.d, t.user_id
    ORDER BY ds.d;
END;
$$ LANGUAGE plpgsql;
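Note that the inner JOIN only returns days that actually have rows in t. If you also want empty days to appear with zeroed aggregates, you could flip the join around inside the function body - a sketch of just the RETURN QUERY part, grouping by day only for brevity:
RETURN QUERY
    SELECT
        DATE(ds.d) AS date_of_sum,
        COALESCE(sum(value_a + value_b), 0),
        COALESCE(sum((value_a * value_b) / NULLIF(value_c, 0)), 0),
        count(value_a)
    FROM generate_series(after, before, '1 day') AS ds(d)
    LEFT JOIN t ON DATE(t.created_at) = DATE(ds.d)
    GROUP BY ds.d
    ORDER BY ds.d;
count(value_a) only counts non-NULL values, so an empty day yields 0.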
Sample data
CREATE TEMPORARY TABLE t
(user_id INT, created_at TIMESTAMP,
 value_a INT, value_b INT, value_c INT, value_d INT, unique_key INT);
INSERT INTO t VALUES
(1,'2019-12-16 17:37:07',1,5,0,5,1),
(2,'2019-12-19 15:37:07',4,7,0,42,2),
(3,'2019-12-16 15:37:07',20,1,20,143,3),
(2,'2019-12-18 12:01:32',0,0,5,987,4),
(1,'2019-12-11 14:12:50',6,0,9,0,5),
(2,'2019-12-10 15:37:07',1,72,100,90,6),
(1,'2019-12-20 15:37:07',5,3,56,1546,7),
(3,'2019-12-20 15:37:07',30,4,789,3,8),
(4,'2019-12-01 15:37:07',35,90,0,5,9);
Testing the function
SELECT * FROM branch_performance_measurements_daily('2019-12-01', '2019-12-20');
date_of_sum | func_a | func_b | func_c
-------------+--------+--------+--------
2019-12-01 | 125 | 0 | 1
2019-12-10 | 73 | 0 | 1
2019-12-11 | 6 | 0 | 1
2019-12-16 | 6 | 0 | 1
2019-12-16 | 21 | 1 | 1
2019-12-18 | 0 | 0 | 1
2019-12-19 | 11 | 0 | 1
2019-12-20 | 8 | 0 | 1
2019-12-20 | 34 | 0 | 1
(9 rows)
In case you want to group just by the generated date (not together with user_id, as your query suggests), just remove user_id from the GROUP BY clause and you'll get something like this:
date_of_sum | func_a | func_b | func_c
-------------+--------+--------+--------
2019-12-01 | 125 | 0 | 1
2019-12-10 | 73 | 0 | 1
2019-12-11 | 6 | 0 | 1
2019-12-16 | 27 | 1 | 2
2019-12-18 | 0 | 0 | 1
2019-12-19 | 11 | 0 | 1
2019-12-20 | 42 | 0 | 2
I have a database table 'table1' as follows:
f_key | begin      | counts |
  1   | 2018-10-04 |   15   |
  1   | 2018-10-06 |   20   |
  1   | 2018-10-08 |   34   |
  1   | 2018-10-09 |   56   |
I have another database table 'table2' as follows:
f_key | p_time     | percent |
  1   | 2018-10-05 |   80    |
  1   | 2018-10-07 |   90    |
  1   | 2018-10-08 |   70    |
  1   | 2018-10-10 |   60    |
The tables can be joined by the f_key field.
I want to get a combined table as shown below:
If the begin time is earlier than every p_time, then the p_time value in the combined table should be the same as the begin time and the percent value should be 50 (as shown in row 1 in the following table).
If the begin time is later than at least one p_time, then the p_time value in the combined table should be the latest p_time that precedes it, and the percent value should be the corresponding value of that p_time (as shown in rows 2, 3 and 4 in the following table).
row | f_key | begin      | counts | p_time     | percent |
 1  |   1   | 2018-10-04 |   15   | 2018-10-04 |   50    |
 2  |   1   | 2018-10-06 |   20   | 2018-10-05 |   80    |
 3  |   1   | 2018-10-08 |   34   | 2018-10-07 |   90    |
 4  |   1   | 2018-10-09 |   56   | 2018-10-08 |   70    |
You can use the ROW_NUMBER window function to number, for each row of table1, the rows of table2 whose p_time precedes begin, so that the closest preceding row comes first.
Then use the COALESCE function so that when begin is earlier than every p_time (i.e. the LEFT JOIN finds no match), the p_time in the combined table falls back to begin itself and the percent falls back to 50.
PostgreSQL 9.6 Schema Setup:
CREATE TABLE table1(
f_key INT,
begin DATE,
counts INT
);
INSERT INTO table1 VALUES (1,'2018-10-04',15);
INSERT INTO table1 VALUES (1,'2018-10-06',20);
INSERT INTO table1 VALUES (1,'2018-10-08',34);
INSERT INTO table1 VALUES (1,'2018-10-09',56);
CREATE TABLE table2(
f_key INT,
p_time DATE,
percent INT
);
INSERT INTO table2 VALUES (1, '2018-10-05',80);
INSERT INTO table2 VALUES (1, '2018-10-07',90);
INSERT INTO table2 VALUES (1, '2018-10-08',70);
INSERT INTO table2 VALUES (1, '2018-10-10',60);
Query 1:
SELECT ROW_NUMBER() OVER(ORDER BY begin) "row",
t1.f_key,
t1.counts,
coalesce(t1.p_time,t1.begin) p_time,
coalesce(t1.percent,50) percent
FROM (
SELECT ROW_NUMBER() OVER(PARTITION BY t1.begin,t1.f_key order by t2.p_time desc) rn,
t2.p_time,
t2.percent,
t1.counts,
t1.f_key,
t1.begin
FROM table1 t1
LEFT JOIN table2 t2 ON t1.f_key = t2.f_key and t1.begin > t2.p_time
)t1
WHERE rn = 1
Results:
| row | f_key | counts | p_time | percent |
|-----|-------|--------|------------|---------|
| 1 | 1 | 15 | 2018-10-04 | 50 |
| 2 | 1 | 20 | 2018-10-05 | 80 |
| 3 | 1 | 34 | 2018-10-07 | 90 |
| 4 | 1 | 56 | 2018-10-08 | 70 |
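On PostgreSQL 9.3 and later, the same "latest preceding row" lookup can also be written with LEFT JOIN LATERAL, which avoids the outer query and the rn = 1 filter - a sketch against the same tables:
SELECT ROW_NUMBER() OVER(ORDER BY t1.begin) "row",
       t1.f_key,
       t1.begin,
       t1.counts,
       coalesce(t2.p_time, t1.begin) p_time,
       coalesce(t2.percent, 50) percent
FROM table1 t1
LEFT JOIN LATERAL (
    SELECT p_time, percent
    FROM table2
    WHERE f_key = t1.f_key
      AND p_time < t1.begin
    ORDER BY p_time DESC
    LIMIT 1
) t2 ON TRUE;
The LATERAL sub-query picks at most one row from table2 per row of table1, and the LEFT JOIN keeps unmatched rows so the 50-percent fallback still applies.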
I have a table in a db that stores Incoming, Outgoing and Net values for various Account Codes over time. Although there is a date field, the sequence of events per Account Code is based on the "Version" number, where 0 is the original record for each Account Code and it increments by 1 after each change to that Account Code.
The Outgoing and Incoming values are stored in the db as cumulative values rather than individual transaction values, but I am looking for a way to SELECT * FROM this table and return the individual amounts as opposed to the cumulative ones.
Below are test scripts for the table and data, and also 2 examples.
If I select WHERE Code = '123' in the test table, I currently get this (values are cumulative):
+------+------------+---------+---------+---------+-----+
| Code | Date | Version | Incoming| Outgoing| Net |
+------+------------+---------+---------+---------+-----+
| 123 | 01/01/2018 | 0 | 100 | 0 | 100 |
| 123 | 07/01/2018 | 1 | 150 | 0 | 150 |
| 123 | 09/01/2018 | 2 | 150 | 100 | 50 |
| 123 | 14/01/2018 | 3 | 200 | 100 | 100 |
| 123 | 18/01/2018 | 4 | 200 | 175 | 25 |
| 123 | 23/01/2018 | 5 | 225 | 175 | 50 |
| 123 | 30/01/2018 | 6 | 225 | 225 | 0 |
+------+------------+---------+---------+---------+-----+
This is what I would like to see (each individual transaction):
+------+------------+---------+----------+----------+------+
| Code | Date | Version | Incoming | Outgoing | Net |
+------+------------+---------+----------+----------+------+
| 123 | 01/01/2018 | 0 | 100 | 0 | 100 |
| 123 | 07/01/2018 | 1 | 50 | 0 | 50 |
| 123 | 09/01/2018 | 2 | 0 | 100 | -100 |
| 123 | 14/01/2018 | 3 | 50 | 0 | 50 |
| 123 | 18/01/2018 | 4 | 0 | 75 | -75 |
| 123 | 23/01/2018 | 5 | 25 | 0 | 25 |
| 123 | 30/01/2018 | 6 | 0 | 50 | -50 |
+------+------------+---------+----------+----------+------+
If I had the individual transaction values and wanted to report on the cumulative, I would use SUM() OVER (PARTITION BY ...), but is there an opposite to that?
I am not looking to redesign the table or the process by which it is populated; I am just looking for a way to report on this from our MI environment.
Note: I've added other random Account Codes to emphasize that the data is not ordered by Code or Version, but by Date.
Thanks in advance for any help.
USE [tempdb];
IF EXISTS ( SELECT *
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME = 'Table1'
AND TABLE_SCHEMA = 'dbo')
DROP TABLE [dbo].[Table1];
GO
CREATE TABLE [dbo].[Table1]
(
[Code] CHAR(3)
,[Date] DATE
,[Version] CHAR(3)
,[Incoming] DECIMAL(20,2)
,[Outgoing] DECIMAL(20,2)
,[Net] DECIMAL(20,2)
);
GO
INSERT INTO [dbo].[Table1] VALUES
('123','2018-01-01','0','100','0','100'),
('456','2018-01-02','0','50','0','50'),
('789','2018-01-03','0','0','0','0'),
('456','2018-01-04','1','100','0','100'),
('456','2018-01-05','2','150','0','150'),
('789','2018-01-06','1','50','50','0'),
('123','2018-01-07','1','150','0','150'),
('456','2018-01-08','3','200','0','200'),
('123','2018-01-09','2','150','100','50'),
('789','2018-01-10','2','0','0','0'),
('456','2018-01-11','4','225','0','225'),
('789','2018-01-12','3','75','25','50'),
('987','2018-01-13','0','0','50','-50'),
('123','2018-01-14','3','200','100','100'),
('654','2018-01-15','0','100','0','100'),
('456','2018-01-16','5','250','0','250'),
('987','2018-01-17','1','50','50','0'),
('123','2018-01-18','4','200','175','25'),
('789','2018-01-19','4','100','25','75'),
('987','2018-01-20','2','150','125','25'),
('321','2018-01-21','0','100','0','100'),
('654','2018-01-22','1','0','0','0'),
('123','2018-01-23','5','225','175','50'),
('321','2018-01-24','1','100','50','50'),
('789','2018-01-25','5','100','50','50'),
('987','2018-01-26','3','150','150','0'),
('456','2018-01-27','6','250','250','0'),
('456','2018-01-28','7','270','250','20'),
('321','2018-01-29','2','100','100','0'),
('123','2018-01-30','6','225','225','0'),
('987','2018-01-31','4','175','150','25')
;
GO
SELECT *
FROM [dbo].[Table1]
WHERE [Code] = '123'
GO
USE [tempdb];
IF EXISTS ( SELECT *
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME = 'Table1'
AND TABLE_SCHEMA = 'dbo')
DROP TABLE [dbo].[Table1];
GO
Just use lag():
select Code, [Date], Version,
       (Incoming - lag(Incoming, 1, 0) over (partition by Code order by [Date])) as Incoming,
       (Outgoing - lag(Outgoing, 1, 0) over (partition by Code order by [Date])) as Outgoing,
       (Net - lag(Net, 1, 0) over (partition by Code order by [Date])) as Net
from [dbo].[Table1];
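LAG(col, 1, 0) reads the previous row's value within each Code partition, falling back to 0 on the first version, so subtracting it turns the running totals back into per-transaction amounts. To reproduce the expected output above, filter for a single account:
select Code, [Date], Version,
       (Incoming - lag(Incoming, 1, 0) over (partition by Code order by [Date])) as Incoming,
       (Outgoing - lag(Outgoing, 1, 0) over (partition by Code order by [Date])) as Outgoing,
       (Net - lag(Net, 1, 0) over (partition by Code order by [Date])) as Net
from [dbo].[Table1]
where Code = '123'
order by [Date];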
What I thought was going to be a fairly easy task is becoming a lot more difficult than I expected. We have several tasks that get performed, sometimes several times per day, so we have a table that gets a row added whenever a user performs a task. What I need is a snapshot of the month with the initials and time of the person that did the task on each day.
The 'activity log' table is pretty simple: it just has the date/time the task was performed, along with the user that did it and the scheduled time (the "Pass Time" column); this is the table I need to flatten out into days of the week.
Each 'order' can have one or more 'pass times' and each pass time can have zero or more initials for that day. For example, for pass time 8:00, it can be done several times during that day or not at all.
I have tried standard joins to get the orders and the scheduled pass times with no issues, but getting the days of the week is escaping me. I have tried creating a function to get all the initials for the day and writing 'select FuncCall() as [1], FuncCall() as [2]', etc. for each day of the week, but that is a real performance drain.
Does anyone know of a better technique?
Update: I think the comment about PIVOT looks promising, but I'm not quite sure, because everything I can find uses an aggregate function in the PIVOT part. So if I have the following table:
create table #MyTable (OrderName nvarchar(10),DateDone date, TimeDone time, Initials nvarchar(4), PassTime nvarchar(8))
insert into #MyTable values('Order 1','2018/6/1','2:00','ABC','1st Pass')
insert into #MyTable values('Order 1','2018/6/1','2:20','DEF','1st Pass')
insert into #MyTable values('Order 1','2018/6/1','4:40','XYZ','2nd Pass')
insert into #MyTable values('Order 1','2018/6/3','5:00','ABC','1st Pass')
insert into #MyTable values('Order 1','2018/6/4','4:00','QXY','2nd Pass')
insert into #MyTable values('Order 1','2018/6/10','2:00','ABC','1st Pass')
select * from #MyTable
pivot () -- Can't figure out what goes here since all examples I see have an aggregate function call such as AVG...
drop table #MyTable
I don't see how to get the desired output, since I am not aggregating anything other than the initials column.
Something like this?
DECLARE #taskTable TABLE(ID INT IDENTITY,Task VARCHAR(100),TaskPerson VARCHAR(100),TaskDate DATETIME);
INSERT INTO #taskTable VALUES
('Task before June 2018','AB','2018-05-15T12:00:00')
,('Task 1','AB','2018-06-03T13:00:00')
,('Task 1','CD','2018-06-04T14:00:00')
,('Task 2','AB','2018-06-05T15:00:00')
,('Task 1','CD','2018-06-06T16:00:00')
,('Task 1','EF','2018-06-06T17:00:00')
,('Task 1','EF','2018-06-06T18:00:00')
,('Task 2','GH','2018-06-07T19:00:00')
,('Task 1','CD','2018-06-07T20:00:00')
,('After June 2018','CD','2018-07-15T21:00:00');
SELECT p.*
FROM
(
SELECT t.Task
,ROW_NUMBER() OVER(PARTITION BY t.Task,CAST(t.TaskDate AS DATE) ORDER BY t.TaskDate) AS Taskindex
,CONCAT(t.TaskPerson,' ',CONVERT(VARCHAR(5),t.TaskDate,114)) AS Content
,DAY(TaskDate) AS ColumnName
FROM #taskTable t
WHERE YEAR(t.TaskDate)=2018 AND MONTH(t.TaskDate)=6
) tbl
PIVOT
(
MAX(Content) FOR ColumnName IN([1],[2],[3],[4],[5],[6],[7],[8],[9],[10]
,[11],[12],[13],[14],[15],[16],[17],[18],[19],[20]
,[21],[22],[23],[24],[25],[26],[27],[28],[29],[30],[31])
) P
ORDER BY P.Task,Taskindex;
The result
+--------+-----------+------+------+----------+----------+----------+----------+----------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+
| Task | Taskindex | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 |
+--------+-----------+------+------+----------+----------+----------+----------+----------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+
| Task 1 | 1 | NULL | NULL | AB 13:00 | CD 14:00 | NULL | CD 16:00 | CD 20:00 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
+--------+-----------+------+------+----------+----------+----------+----------+----------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+
| Task 1 | 2 | NULL | NULL | NULL | NULL | NULL | EF 17:00 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
+--------+-----------+------+------+----------+----------+----------+----------+----------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+
| Task 1 | 3 | NULL | NULL | NULL | NULL | NULL | EF 18:00 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
+--------+-----------+------+------+----------+----------+----------+----------+----------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+
| Task 2 | 1 | NULL | NULL | NULL | NULL | AB 15:00 | NULL | GH 19:00 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
+--------+-----------+------+------+----------+----------+----------+----------+----------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+
The first trick is to use the day's index (DAY()) as the column name. The second trick is the ROW_NUMBER(): it adds a running index per task and day, thus replicating the rows per index. Otherwise you'd get just one entry per day.
Your input tables will be more complex, but I think this shows the principle...
UPDATE: So we have to get it even slicker :-D
WITH prepareData AS
(
SELECT t.Task
,t.TaskPerson
,t.TaskDate
,CONVERT(VARCHAR(10),t.TaskDate,126) AS TaskDay
,DAY(t.TaskDate) AS TaskDayIndex
,CONVERT(VARCHAR(5),t.TaskDate,114) AS TimeContent
FROM #taskTable t
WHERE YEAR(t.TaskDate)=2018 AND MONTH(t.TaskDate)=6
)
SELECT p.*
FROM
(
SELECT t.Task
,STUFF((
SELECT ', ' + CONCAT(x.TaskPerson,' ',TimeContent)
FROM prepareData AS x
WHERE x.Task=t.Task
AND x.TaskDay= t.TaskDay
ORDER BY x.TaskDate
FOR XML PATH(''),TYPE
).value(N'.',N'nvarchar(max)'),1,2,'') AS Content
,t.TaskDayIndex
FROM prepareData t
GROUP BY t.Task, t.TaskDay,t.TaskDayIndex
) p--tbl
PIVOT
(
MAX(Content) FOR TaskDayIndex IN([1],[2],[3],[4],[5],[6],[7],[8],[9],[10]
,[11],[12],[13],[14],[15],[16],[17],[18],[19],[20]
,[21],[22],[23],[24],[25],[26],[27],[28],[29],[30],[31])
) P
ORDER BY P.Task;
The result
+--------+------+------+----------+----------+----------+------------------------------+----------+------+
| Task | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
+--------+------+------+----------+----------+----------+------------------------------+----------+------+
| Task 1 | NULL | NULL | AB 13:00 | CD 14:00 | NULL | CD 16:00, EF 17:00, EF 18:00 | CD 20:00 | NULL |
+--------+------+------+----------+----------+----------+------------------------------+----------+------+
| Task 2 | NULL | NULL | NULL | NULL | AB 15:00 | NULL | GH 19:00 | NULL |
+--------+------+------+----------+----------+----------+------------------------------+----------+------+
This uses a well-discussed XML trick (FOR XML PATH) within a correlated sub-query to concatenate all of a day's entries into one value. With this combined content you can go the normal PIVOT path. The aggregate will not actually compute anything, as there is - for sure - just one value per cell.
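As a side note: on SQL Server 2017 and later, STRING_AGG can replace the FOR XML PATH construct. A sketch of just the inner aggregation, assuming the same prepareData CTE as above:
SELECT t.Task,
       STRING_AGG(CONCAT(t.TaskPerson, ' ', t.TimeContent), ', ')
           WITHIN GROUP (ORDER BY t.TaskDate) AS Content,
       t.TaskDayIndex
FROM prepareData t
GROUP BY t.Task, t.TaskDay, t.TaskDayIndex
The PIVOT part stays exactly the same.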
I'm working on the following Presto SQL query, using inline FILTER clauses to get a side-by-side comparison of the current date range vs. the data from a week ago.
In my case the current date range is 2017-09-13 to 2017-09-14.
So far I'm able to get the results below, but unfortunately this is not what I want.
Any kind of help would be greatly appreciated.
SELECT
DATE_TRUNC('day',DATE_PARSE(CAST(sample.datep AS VARCHAR),'%Y%m%d')) AS date,
CAST(SUM(sample.page_views) FILTER (WHERE sample.datep BETWEEN 20170913 AND 20170914) AS DOUBLE) AS page_views,
CAST(SUM(sample.page_views) FILTER (WHERE sample.datep BETWEEN 20170906 AND 20170907) AS DOUBLE) AS page_views_weeks_ago
FROM
sample
WHERE
(
datep BETWEEN 20170906 AND 20170914
)
GROUP BY
1
ORDER BY
1 ASC
LIMIT 50
Actual result:
+------------+------------+----------------------+
| date | page_views | page_views_weeks_ago |
+------------+------------+----------------------+
| 2017-09-06 | 0 | 990,929 |
| 2017-09-07 | 0 | 913,802 |
| 2017-09-08 | 0 | 0 |
| 2017-09-09 | 0 | 0 |
| 2017-09-10 | 0 | 0 |
| 2017-09-11 | 0 | 0 |
| 2017-09-12 | 0 | 0 |
| 2017-09-13 | 1,507,715 | 0 |
| 2017-09-14 | 48,625 | 0 |
+------------+------------+----------------------+
Expected result:
+------------+------------+----------------------+
| date | page_views | page_views_weeks_ago |
+------------+------------+----------------------+
| 2017-09-13 | 1,507,715 | 990,929 |
| 2017-09-14 | 48,625 | 913,802 |
+------------+------------+----------------------+
You can achieve this by joining the table with itself, shifted by a week. For brevity, I assume that we have a proper date field so that date subtraction can be done easily.
SELECT curr.date,
       SUM(curr.page_views) AS page_views,
       SUM(prev.page_views) AS page_views_weeks_ago
FROM sample curr
JOIN sample prev ON curr.date - INTERVAL '7' DAY = prev.date
GROUP BY 1
ORDER BY 1 ASC
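Applied to the original table, where datep is an integer like 20170913, the same idea might look like this (a sketch, assuming the sample table from the question):
SELECT
    DATE_PARSE(CAST(curr.datep AS VARCHAR), '%Y%m%d') AS date,
    SUM(curr.page_views) AS page_views,
    SUM(prev.page_views) AS page_views_weeks_ago
FROM sample curr
JOIN sample prev
  ON DATE_PARSE(CAST(curr.datep AS VARCHAR), '%Y%m%d') - INTERVAL '7' DAY
   = DATE_PARSE(CAST(prev.datep AS VARCHAR), '%Y%m%d')
WHERE curr.datep BETWEEN 20170913 AND 20170914
GROUP BY 1
ORDER BY 1 ASC
This restricts the driving rows to the current range (2017-09-13 to 2017-09-14) and pulls the week-ago page views in through the join, which matches the expected result.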