Group by dynamic time period - SQL

I'll try to illustrate my problem with a simplified example.
Table "order":
ID | date | client | product | product_limit_period (in months)
1 | 2015-01-01 | Bob | table | 1
2 | 2015-01-31 | Bob | table | 1
3 | 2015-02-01 | Bob | table | 1
4 | 2015-01-01 | Mary | lamb | 12
5 | 2015-06-01 | Mary | lamb | 12
6 | 2016-01-01 | Mary | lamb | 12
7 | 2016-12-31 | Mary | lamb | 12
This is the result I'd like to get:
client | product | group | count
Bob | table | 1 | 2 #ID 1, 2
Bob | table | 2 | 1 #ID 3
Mary | lamb | 3 | 2 #ID 4, 5
Mary | lamb | 4 | 2 #ID 6, 7
Every product has a limit and a limit period (in months). I need to be able to see whether any clients have ordered a product more often than its limit allows within a certain period. The period might be 1 month or several years expressed in months: 1 month, 12 months, 24 months, and so on up to 108 months (9 years).
I feel like I need some combination of window functions and GROUP BY, but I haven't figured out how.
I'm using postgres 9.1. Please let me know if there is any more information I should provide.
Any help is appreciated, even just pointing me to the right direction!
Edit:
To clarify how the grouping works: the limit period starts with the first order. Bob's first order is on 2015-01-01, so this period ends on 2015-01-31, and 2015-02-01 starts a second period. A period always starts on the first day of a month and ends on the last day of a month.

There is no need to complicate this with both window functions and GROUP BY; just add a CASE expression to either the window or the grouping, like here:
t=# select
      client
    , product
    , count(1)
    , string_agg(id::text, ',')
    from so44
    group by
      client
    , product
    , date_trunc(case when product_limit_period = 1 then 'month' else 'year' end, date);
client | product | count | string_agg
----------+-----------+-------+------------
Bob | table | 2 | 1,2
Bob | table | 1 | 3
Mary | lamb | 2 | 4,5
Mary | lamb | 2 | 6,7
(4 rows)
sample:
t=# create table so44 (id int,"date" date,client text,product text,product_limit_period int);
CREATE TABLE
t=# copy so44 from stdin delimiter '|';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 1 | 2015-01-01 | Bob | table | 1
>> 2 | 2015-01-31 | Bob | table | 1
>> 3 | 2015-02-01 | Bob | table | 1
>> 4 | 2015-01-01 | Mary | lamb | 12
>> 5 | 2015-06-01 | Mary | lamb | 12
>> 6 | 2016-01-01 | Mary | lamb | 12
>> 7 | 2016-12-31 | Mary | lamb | 12
>> \.
COPY 7
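The CASE trick only distinguishes a 1-month period from a 12-month one, though. For arbitrary period lengths anchored at each client/product's first order (as the edit describes), here is a minimal sketch, assuming the so44 sample table above: turn each date into a month index, subtract the partition minimum via a window function, and integer-divide by product_limit_period:
select
      client
    , product
    , period_no
    , count(*) as cnt
    , string_agg(id::text, ',') as ids
from (
    select id, client, product, product_limit_period
         -- months elapsed since this client/product's first order,
         -- integer-divided by the period length in months
         , (  (extract(year from "date")::int * 12 + extract(month from "date")::int)
            - min(extract(year from "date")::int * 12 + extract(month from "date")::int)
                  over (partition by client, product)
           ) / product_limit_period as period_no
    from so44
) t
group by client, product, period_no
order by client, product, period_no;
Each row then falls into period 0, 1, 2, ... per client and product; if you need a globally unique group number like in the desired output, a dense_rank() over (client, product, period_no) can be layered on top.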

Related

Find Aggregated Data Between Two Dates in Two Tables Where One is Updated Weekly and Other is Updated Hourly

I have data in two different tables: one is updated every week (or once in the middle of the week if needed), and the other is updated every hour or so because it has more data. The first table can be seen as
agent_id | rank | ranking_date
---------------------------
1 | 1 | 2022-03-21
2 | 2 | 2022-03-21
1 | 4 | 2022-03-14
2 | 3 | 2022-03-14
1 | 2 | 2022-03-10
And the second table contains detailed information on sales.
agent_id | call_id | talk_time | product_sold | amount | call_date
------------------------------------------------------------------
1 | 1 | 13 | 1 | 53 |2022-03-10
1 | 2 | 24 | 2 | 2 |2022-03-10
2 | 3 | 43 | 4 | 11 |2022-03-10
1 | 4 | 31 | - | 0 |2022-03-10
2 | 5 | 12 | - | 0 |2022-03-10
1 | 6 | 11 | - | 0 |2022-03-11
1 | 7 | 35 | 2 | 79 |2022-03-11
2 | 8 | 76 | - | 0 |2022-03-11
1 | 9 | 42 | 1 | 23 |2022-03-11
2 | 10 | 69 | - | 0 |2022-03-11
How can I merge the two tables to get their aggregated information? Remember, the ranks change at the beginning of every week, and sales happen every day; the rankings can also be changed in the middle of the week if needed. So what I am trying to do is create an aggregated table for understanding the sales by each agent. Something like this:
agent_id | rank | ranking_date | total_calls_handled | total_talktime | total_amount
------------------------------------------------------------------------------------
1 | 1 | 2022-03-21 | 100 | 875 | 3000 (this is 3/21 - today)
2 | 2 | 2022-03-21 | 120 | 576 | 3689 (this is 3/21 - today)
1 | 4 | 2022-03-14 | 210 | 246 | 1846 (this is 3/14 - 3/21)
2 | 3 | 2022-03-14 | 169 | 693 | 8562 (this is 3/14 - 3/21)
1 | 2 | 2022-03-10 | 201 | 559 | 1749 (this is 3/7 - 3/10)
So the data is aggregated for each agent for 3/7-3/10, 3/10-3/14, then 3/14-3/21. Also, if, say, the latest ranking date is 2022-03-21 and today is 2022-03-23, the query returns the aggregation up to today.
[Edit]: added table and data details
Table and data details:
Rankings table:
agent_id: unique_id of the agent
rank: rank of an agent, updated every Monday or when needed
ranking_date: date when the agent's ranking was last updated (automatically every Monday, or when needed)
Sales Table:
agent_id: unique_id of the agent
call_id: unique_id for a call
talk_time: duration of the call
product_sold: unique_id of the product sold (- if the agent did not make a sale)
amount: commission earned by the agent (so the same product_id can have different amounts; 0 if the agent did not make a sale)
call_date: date when the call was made
[Edit 2]: Here is SQLFiddle.
Here we join rows for the same agent where ranking_date and call_date fall in the same week. If calls are made on a Sunday, check whether they fall in the week you expect.
The syntax in the query is for SQL Server, matching the given SQL Fiddle. For Amazon Redshift, you will need to change the week comparison in the join to
on date_part(w, r.ranking_date) = date_part(w, s.call_date)
select
    r.agent_id,
    r.rank,
    r.ranking_date,
    count(s.call_id) TotalCalls,
    sum(s.talk_time) TotalTime,
    sum(s.amount) TotalAmount
from rankings r
left join sales s
    on r.agent_id = s.agent_id
   and datename(ww, r.ranking_date) = datename(ww, s.call_date)
group by
    r.agent_id,
    r.rank,
    r.ranking_date
GO
agent_id | rank | ranking_date | TotalCalls | TotalTime | TotalAmount
-------: | ---: | :----------- | ---------: | --------: | ----------:
1 | 1 | 2022-03-21 | 0 | null | null
1 | 2 | 2022-03-10 | 6 | 156 | 157
1 | 4 | 2022-03-14 | 0 | null | null
2 | 2 | 2022-03-21 | 0 | null | null
2 | 3 | 2022-03-14 | 0 | null | null
db<>fiddle here
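If what you want is the ranges between consecutive ranking dates (3/7-3/10, 3/10-3/14, 3/14-3/21) rather than calendar weeks, a sketch using lead() to build those ranges might look like this (lead() exists in both SQL Server 2012+ and Redshift; table and column names as in the fiddle):
with r as (
    select agent_id, rank, ranking_date,
           -- next ranking date for the same agent; null for the latest row,
           -- which leaves the last range open-ended ("until today")
           lead(ranking_date) over (partition by agent_id
                                    order by ranking_date) as next_ranking_date
    from rankings
)
select r.agent_id, r.rank, r.ranking_date,
       count(s.call_id) as TotalCalls,
       sum(s.talk_time) as TotalTime,
       sum(s.amount)    as TotalAmount
from r
left join sales s
       on s.agent_id = r.agent_id
      and s.call_date >= r.ranking_date
      and (s.call_date < r.next_ranking_date or r.next_ranking_date is null)
group by r.agent_id, r.rank, r.ranking_date;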

SQL postgresql: maximum value and sum of values

I need to query the following table to return, for each code, the maximum date, and also to calculate deb - cred (for that maximum-date row only).
How would I do this?
code | date | deb | cred
-----------------------------------
4 | 2018-01-01 | 100,00 | 200,00
4 | 2017-12-28 | 100,00 | 500,00
6 | 2018-01-23 | 350,00 | 400,00
6 | 2018-04-28 | 140,00 | 678,00
8 | 2018-01-12 | 156,00 | 256,00
8 | 2016-02-28 | 134,00 | 598,00
The result must be
4 | 2018-01-01 | -200,00
6 | 2018-04-28 | -50,00
8 | 2018-01-12 | -464,00
PostgreSQL's DISTINCT ON in combination with ORDER BY will return the first row per group:
SELECT DISTINCT ON (code)
       code, date, deb - cred
FROM your_table
ORDER BY code, date DESC;
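DISTINCT ON is PostgreSQL-specific; the same idea can be sketched more portably with row_number() (your_table as above):
SELECT code, date, deb - cred AS diff
FROM (
    SELECT code, date, deb, cred,
           -- rank rows per code, newest date first
           row_number() OVER (PARTITION BY code ORDER BY date DESC) AS rn
    FROM your_table
) t
WHERE rn = 1;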

SQL Statement to show columns multiple times

I have a table containing an integer column that represents a workplace, an integer column that represents the number of workpieces finished at that workplace, and a date column.
I want to create a query that creates rows of the following type
location int | date of Max(workpiece) | max workpieces | Min(Date) | workpieces (Min(Date)) | max(Date) | workpieces (Max(Date))
So I want a row for each location containing the date of the day on which the most pieces were finished plus that amount, the oldest date and the pieces finished on that day, and the newest date plus the number of pieces finished on that day.
Do I have to join the table with itself three times, once per criterion, and then join on location? Is the GROUP BY operator involved? I don't quite get the hang of it.
EDIT: Here's some sample data
+-------+-----------+-----------+-------------------+
| id | location | amount | date |
+-------+-----------+-----------+-------------------+
| 1 | 1 | 10 | 01.01.2016 |
| 2 | 2 | 5 | 01.01.2016 |
| 3 | 1 | 6 | 02.01.2016 |
| 4 | 2 | 35 | 02.01.2016 |
| 5 | 1 | 50 | 03.01.2016 |
| 6 | 2 | 20 | 03.01.2016 |
+-------+-----------+-----------+-------------------+
I want my output to look like this:
loc | dateMaxAmount| MaxAmount | MinDate | AmountMinDate | MaxDate | MaxDateAmount
1 | 03.01.2016 | 50 | 01.01.2016| 10 | 03.01.2016| 50
2 | 02.01.2016 | 35 | 01.01.2016| 5 | 03.01.2016| 20
I am using MS Access.
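The self-join idea from the question can work: aggregate once per location, then join the detail rows back three times to pick up the matching dates and amounts. A minimal sketch in Access SQL, assuming the table is named pieces ([date] is bracketed because date is a reserved word in Access; note that ties on amount can produce extra rows):
SELECT g.location,
       mx.[date] AS dateMaxAmount,
       g.MaxAmount,
       g.MinDate,
       mn.amount AS AmountMinDate,
       g.MaxDate,
       md.amount AS MaxDateAmount
FROM (((SELECT location,
               MAX(amount) AS MaxAmount,
               MIN([date]) AS MinDate,
               MAX([date]) AS MaxDate
        FROM pieces
        GROUP BY location) AS g
INNER JOIN pieces AS mx
        ON (mx.location = g.location AND mx.amount = g.MaxAmount))
INNER JOIN pieces AS mn
        ON (mn.location = g.location AND mn.[date] = g.MinDate))
INNER JOIN pieces AS md
        ON (md.location = g.location AND md.[date] = g.MaxDate);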

Percentage to total in BigQuery Legacy SQL (Subqueries?)

I can't understand how to calculate percentage of total in BigQuery Legacy SQL.
So, I have a table:
ID | Name | Group | Mark
1 | John | A | 10
2 | Lucy | A | 5
3 | Jane | A | 7
4 | Lily | B | 9
5 | Steve | B | 14
6 | Rita | B | 11
I want to calculate percentage like this:
ID | Name | Group | Mark | Percent
1 | John | A | 10 | 10/(10+5+7)=45%
2 | Lucy | A | 5 | 5/(10+5+7)=22%
3 | Jane | A | 7 | 7/(10+5+7)=33%
4 | Lily | B | 9 | 9/(9+14+11)=26%
5 | Steve | B | 14 | 14/(9+14+11)=42%
6 | Rita | B | 11 | 11/(9+14+11)=32%
My table is quite long (3 million rows).
I thought I could do it with subqueries, but subqueries are not allowed in SELECT.
Does anyone know a way to do it?
SELECT
    ID, Name, [Group], Mark,
    RATIO_TO_REPORT(Mark) OVER(PARTITION BY [Group]) AS percent
FROM YourTable
Read more about RATIO_TO_REPORT in the BigQuery documentation.
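Note that RATIO_TO_REPORT returns a fraction between 0 and 1, while the desired output shows rounded percentages. A sketch that formats it that way (still Legacy SQL):
SELECT
    ID, Name, [Group], Mark,
    -- scale the ratio to 0-100 and round to a whole percent
    INTEGER(ROUND(100 * RATIO_TO_REPORT(Mark) OVER(PARTITION BY [Group]))) AS percent
FROM YourTable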

Count the accumulated hours within a column

I have a table which holds information on the type of work a worker does and the number of hours spent on that work, e.g.:
work_id | user_id | work_type | hours_spent
-------------------------------------------------
1 | 1 | Maintain | 7
2 | 1 | sick | 4
3 | 1 | maintain | 3
4 | 1 | maintain | 6
5 | 2 | Web | 5
6 | 2 | Develop | 8
7 | 2 | develop | 5
8 | 3 | maintain | 5
9 | 3 | sick | 7
10 | 3 | sick | 7
I would like to count the amount of accumulated hours each user has spent on a type of work to display something like this:
user id | work_type | hours_spent
-----------------------------------
1 | maintain | 16
1 | sick | 4
2 | Web | 5
2 | develop | 13
3 | maintain | 5
3 | sick | 14
The sum() function I'm using now returns the total of all hours in the hours_spent column. Is this the right function for what I want to achieve?
I'm using SQL Server 2008 R2.
SELECT
    user_id,
    work_type = LOWER(work_type),
    hours_spent = SUM(hours_spent)
FROM dbo.tablename
GROUP BY user_id, LOWER(work_type)
ORDER BY user_id, LOWER(work_type);
You don't need LOWER() there unless you have a case-sensitive collation. And if you do, enter those strings consistently - or better yet, use a lookup table for those strings and store a tinyint key in the main table instead.
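A minimal sketch of that lookup-table design, with assumed names (the aggregate query would then group by the tinyint key and join to the lookup for the label):
CREATE TABLE dbo.work_types
(
    work_type_id tinyint PRIMARY KEY,
    work_type    varchar(32) NOT NULL UNIQUE -- 'maintain', 'sick', 'web', 'develop'
);

-- the main table stores the key instead of free-text work_type
ALTER TABLE dbo.tablename
    ADD work_type_id tinyint REFERENCES dbo.work_types(work_type_id);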