SQL mean amount per group ordered by time

I have the following table
timestamp ID eur
-----------------------
2022-01-01 A 10
2022-01-02 A 20
2022-01-01 B 30
2022-01-02 B 40
2022-01-03 B 50
2022-01-04 B 60
Now I am interested in all previous information for a specific ID. Then I want to do something with this information, let's say calculating the mean. Here is what I am aiming for:
timestamp ID eur sum_all mean_all
------------------------------------------------
2022-01-01 A 10 10 10
2022-01-02 A 20 30 15
2022-01-01 B 30 30 30
2022-01-02 B 40 70 35
2022-01-03 B 50 120 40
2022-01-04 B 60 180 45
This seems so easy but I just can't get my head around how to do this in SQL.
I appreciate any help. Thanks!

You can use the sum() and avg() window functions:
select *,
       sum(eur) over (partition by ID order by timestamp) as sum_all,
       avg(eur) over (partition by ID order by timestamp) as mean_all
from table_name
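Both columns are running aggregates because a window with an order by and no explicit frame defaults to range between unbounded preceding and current row. A sketch with the frame spelled out (assuming Postgres and the table/column names above; rows and range only differ if two rows share the same timestamp within an ID):
-- Same query with the window frame written out explicitly (Postgres syntax assumed).
select *,
       sum(eur) over (partition by ID order by timestamp
                      rows between unbounded preceding and current row) as sum_all,
       avg(eur) over (partition by ID order by timestamp
                      rows between unbounded preceding and current row) as mean_all
from table_name;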

Related

Avoid counting duplicate clicks SQL

I have data like the below:
date id process name
2022-01-01 12:23:33 12 security John
2022-01-01 12:25:33 12 security John
2022-01-01 12:27:33 12 security John
2022-01-01 12:29:33 12 security John
2022-01-01 14:04:45 12 security John
2022-01-05 03:53:11 12 Infra Sasha
2022-01-05 03:57:30 12 Infra Sasha
2022-01-06 12:23:33 12 Infra Sasha
Rows that are within a 10-minute date difference of each other and have the same values in the other fields are basically multi-clicks, and only the first one needs to be considered.
Expected result:
2022-01-01 12:23:33 12 security John
2022-01-01 14:04:45 12 security John
2022-01-05 03:53:11 12 Infra Sasha
2022-01-06 12:23:33 12 Infra Sasha
I know we can use datediff(), but I don't know how to group the rows by the data that's the same in the rest of the fields. I don't know where to start with this logic.
Can someone help me get this please?
Thanks!
You can use LAG() to peek at a value from the previous row, according to a partition into subgroups and an ordering within each subgroup. For example:
select *
from (
  select *, lag(date) over (partition by id, process order by date) as pd
  from t
) x
where pd is null or date > pd + interval '10 minute'
Result:
date id process name pd
-------------------- --- --------- ------ -------------------
2022-01-05 03:53:11 12 Infra Sasha null
2022-01-06 12:23:33 12 Infra Sasha 2022-01-05 03:57:30
2022-01-01 12:23:33 12 security John null
2022-01-01 14:04:45 12 security John 2022-01-01 12:29:33
See running example at db<>fiddle.
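To try this locally, a minimal setup sketch (assuming Postgres and a hypothetical table named t to match the query above):
-- Hypothetical table and the sample rows from the question (Postgres syntax).
create table t (
  date    timestamp,
  id      integer,
  process text,
  name    text
);
insert into t values
  ('2022-01-01 12:23:33', 12, 'security', 'John'),
  ('2022-01-01 12:25:33', 12, 'security', 'John'),
  ('2022-01-01 12:27:33', 12, 'security', 'John'),
  ('2022-01-01 12:29:33', 12, 'security', 'John'),
  ('2022-01-01 14:04:45', 12, 'security', 'John'),
  ('2022-01-05 03:53:11', 12, 'Infra', 'Sasha'),
  ('2022-01-05 03:57:30', 12, 'Infra', 'Sasha'),
  ('2022-01-06 12:23:33', 12, 'Infra', 'Sasha');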

SQL: how to average across groups, while taking a time constraint into account

I have a table named orders in a Postgres database that looks like this:
customer_id order_id order_date price product
1 2 2021-03-05 15 books
1 13 2022-03-07 3 music
1 14 2022-06-15 900 travel
1 11 2021-11-17 25 books
1 16 2022-08-03 32 books
2 4 2021-04-12 4 music
2 7 2021-06-29 9 music
2 20 2022-11-03 8 music
2 22 2022-11-07 575 travel
2 24 2022-11-20 95 food
3 3 2021-03-17 25 books
3 5 2021-06-01 650 travel
3 17 2022-08-17 1200 travel
3 19 2022-10-02 6 music
3 23 2022-11-08 70 food
4 9 2021-08-20 3200 travel
4 10 2021-10-29 2750 travel
4 15 2022-07-15 1820 travel
4 21 2022-11-05 8000 travel
4 25 2022-11-29 27 books
5 1 2021-01-04 3 music
5 6 2021-06-09 820 travel
5 8 2021-07-30 19 books
5 12 2021-12-10 22 music
5 18 2022-09-19 20 books
Here's a SQL Fiddle: http://sqlfiddle.com/#!17/262fc/1
I'd like to return the average money spent by customers per product, but only consider orders within the first 12 months of a given customer's first purchase within the given product group. (yes, this is challenging!)
For example, for customer 1, order ID 2 and order ID 11 would be factored into the average for books (because order ID 11 took place less than 12 months after customer 1's first order for books, which was order ID 2), but order ID 16 would not be factored into the average (because 8/3/22 is more than 12 months from customer 1's first purchase for books, which took place on 3/5/21).
(The question also included a matrix image marking with "yes" which orders would be included for each product.)
The desired output would look as follows:
product   average_spent
books     22.20
music     7.83
travel    1530.71
food      82.50
How would I do this?
Thanks in advance for any assistance you can give!
You can use a correlated subquery to check whether or not to include an order's price in the aggregation:
select o.product, sum(o.price) / count(*) as average_spent
from orders o
where o.order_date < (select min(o1.order_date)
                      from orders o1
                      where o1.product = o.product
                        and o1.customer_id = o.customer_id) + interval '12 months'
group by o.product
See fiddle
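An equivalent sketch that avoids the correlated subquery (assuming Postgres and the orders table described in the question) computes each customer's first order date per product with a window function and filters on it:
-- Same 12-month rule, using min() as a window function instead of a correlated subquery.
select product, avg(price) as average_spent
from (
  select o.*,
         min(o.order_date) over (partition by o.customer_id, o.product) as first_order_date
  from orders o
) t
where order_date < first_order_date + interval '12 months'
group by product
order by product;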

count number of records by month over the last five years where record date > select month

I need to show the number of valid inspectors we have by month over the last five years. Inspectors are considered valid when the expiration date on their certification has not yet passed, recorded as the month-end date. The SQL below is the query used to count valid inspectors for January 2017:
SELECT Count(*) AS RecordCount
FROM dbo_Insp_Type
WHERE (dbo_Insp_Type.CERT_EXP_DTE >= #2/1/2017#);
Rather than designing 60 queries, one for each month, and compiling the results in a final table (or, err, query), are there other methods I can use that call for less manual input?
From this sample:
Id  CERT_EXP_DTE
1   2022-01-15
2   2022-01-23
3   2022-02-01
4   2022-02-03
5   2022-05-01
6   2022-06-06
7   2022-06-07
8   2022-07-21
9   2022-02-20
10  2021-11-05
11  2021-12-01
12  2021-12-24
this single query:
SELECT
Format([CERT_EXP_DTE],"yyyy/mm") AS YearMonth,
Count(*) AS AllInspectors,
Sum(Abs([CERT_EXP_DTE] >= DateSerial(Year([CERT_EXP_DTE]), Month([CERT_EXP_DTE]), 2))) AS ValidInspectors
FROM
dbo_Insp_Type
GROUP BY
Format([CERT_EXP_DTE],"yyyy/mm");
will return:
YearMonth  AllInspectors  ValidInspectors
2021-11    1              1
2021-12    2              1
2022-01    2              2
2022-02    3              2
2022-05    1              0
2022-06    2              2
2022-07    1              1
Given this sample data:
ID  Cert_Iss_Dte  Cert_Exp_Dte
1   1/15/2020     1/15/2022
2   1/23/2020     1/23/2022
3   2/1/2020      2/1/2022
4   2/3/2020      2/3/2022
5   5/1/2020      5/1/2022
6   6/6/2020      6/6/2022
7   6/7/2020      6/7/2022
8   7/21/2020     7/21/2022
9   2/20/2020     2/20/2022
10  11/5/2021     11/5/2023
11  12/1/2021     12/1/2023
12  12/24/2021    12/24/2023
A UNION query could calculate a record for each of up to 50 months (Access caps a UNION at 50 SELECT statements), but since you want 60, UNION is out.
Or a query with 60 calculated fields using IIf() and Count(), referencing a textbox on a form for the start date:
SELECT Count(IIf(CERT_EXP_DTE>=Forms!formname!tbxDate,1,Null)) AS Dt1,
       Count(IIf(CERT_EXP_DTE>=DateAdd("m",1,Forms!formname!tbxDate),1,Null)) AS Dt2,
       ...
FROM dbo_Insp_Type
Using the above data, the following is the output for Feb and Mar 2022. I did a test with Cert_Iss_Dte included in the criteria and it did not make a difference for this sample data.
Dt1  Dt2
10   8
Or a report with 60 textboxes, each calling a DCount() expression with the same criteria as used in the query.
Or a VBA procedure that writes data to a 'temp' table.
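For comparison, in an engine that can generate a month series (Postgres is assumed here, unlike the Access database in the question), a hedged sketch of the same count, using a hypothetical insp_type table with a cert_exp_dte date column, could replace the 60 hand-written columns with a generated month list:
-- Sketch only: Postgres syntax, hypothetical insp_type table, fixed five-year window.
-- An inspector counts as valid for a month when the cert expires on or after the first
-- day of the following month, mirroring the >= #2/1/2017# comparison above.
select m.month_start,
       count(*) filter (where i.cert_exp_dte >= m.month_start + interval '1 month') as valid_inspectors
from generate_series(date '2017-01-01', date '2021-12-01', interval '1 month') as m(month_start)
cross join insp_type i
group by m.month_start
order by m.month_start;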

Getting price difference between two dates

There is a table to which rows are added once a day/hour, containing the product ID, price, name, and the time at which the row was added.
CREATE TABLE products
(
    id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    product_id integer NOT NULL,
    title text NOT NULL,
    price double precision NOT NULL,
    checked_at timestamp with time zone DEFAULT now()
);
The data in the products table looks like this:
id  product_id  title       price  checked_at
1   1000        Watermelon  50     2022-07-19 10:00:00
2   2000        Apple       30     2022-07-19 10:00:00
3   3000        Pear        20     2022-07-19 10:00:00
4   1000        Watermelon  100    2022-07-20 10:00:00
5   2000        Apple       50     2022-07-20 10:00:00
6   3000        Pear        35     2022-07-20 10:00:00
7   1000        Watermelon  150    2022-07-21 10:00:00
8   2000        Apple       50     2022-07-21 10:00:00
9   3000        Pear        60     2022-07-21 10:00:00
I need to pass a date range (for example, from 2022-07-19 to 2022-07-21) and get the difference in prices of all unique products, that is, the answer should be like this:
product_id  title       price_difference
1000        Watermelon  100
2000        Apple       20
3000        Pear        40
I only figured out the very beginning, where I need to get the ID of all unique products in the table using DISTINCT. Next, I need to find the rows that are closest to the date range. And finally find the difference in the price of each product.
You could use an aggregation approach here:
SELECT product_id, title,
       MAX(price) FILTER (WHERE checked_at::date = '2022-07-21') -
       MAX(price) FILTER (WHERE checked_at::date = '2022-07-19') AS price_difference
FROM products
GROUP BY product_id, title
ORDER BY product_id;
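Since the question mentions finding the rows closest to the passed date range, a hedged alternative sketch (again assuming Postgres and the products table above) takes each product's first and last price inside the range, so nothing has to land exactly on the boundary dates:
-- Sketch: last price minus first price per product within the passed range.
-- The two bounds below are placeholders for the dates you pass in.
select product_id, title,
       (array_agg(price order by checked_at desc))[1]
     - (array_agg(price order by checked_at))[1] as price_difference
from products
where checked_at >= '2022-07-19' and checked_at < '2022-07-22'
group by product_id, title
order by product_id;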

Showing Two Fields With Different Timeline in the Same Date Structure

In the project I am currently working on at my company, I would like to show sales-related KPIs together with a Customer Score metric in SQL / Tableau / BigQuery.
The primary key is order ID in both tables. However, the order date and the date we measure the Customer Score may be different. For example, the sales information for an order that is released in Feb 2020 will be aggregated in Feb 2020, but if the customer survey is made in March 2020, the Customer Score metric must be aggregated in March 2020. What I would like to achieve in the relational database is as follows:
Sales:
Order ID  Order Date (m/d/yyyy)  Sales ($)
1000      1/1/2021               1000
1001      2/1/2021               2000
1002      3/1/2021               1500
1003      4/1/2021               1700
1004      5/1/2021               1800
1005      6/1/2021               900
1006      7/1/2021               1600
1007      8/1/2021               1900
Customer Score Table:
Order ID  Customer Survey Date (m/d/yyyy)  Customer Score
1000      3/1/2021                         8
1001      3/1/2021                         7
1002      4/1/2021                         3
1003      6/1/2021                         6
1004      6/1/2021                         5
1005      7/1/2021                         3
1006      9/1/2021                         1
1007      8/1/2021                         7
Expected Output:
KPI                 Jan-21  Feb-21  Mar-21  Apr-21  May-21  June-21  July-21  Aug-21  Sep-21
Sales ($)           1000    2000    1500    1700    1800    900      1600     1900
AVG Customer Score                  7.5     3               5.5      3        7       1
I couldn't find a way to do this, because order date and survey date may/may not be the same.
For sample data and expected output, click here.
I think what you want to do is aggregate your results to the month (KPI) first before joining, as opposed to joining on the ORDER_ID.
For example:
with order_month as (
  select date_trunc(order_date, MONTH) as KPI, sum(sales) as sales
  from `testing.sales`
  group by 1
),
customer_score_month as (
  select date_trunc(customer_survey_date, MONTH) as KPI, avg(customer_score) as avg_customer_score
  from `testing.customer_score`
  group by 1
)
select coalesce(order_month.KPI, customer_score_month.KPI) as KPI, sales, avg_customer_score
from order_month
full outer join customer_score_month
  on order_month.KPI = customer_score_month.KPI
order by 1 asc
Here, we aggregate the total sales for each month based on the order date, then we aggregate the average customer score for each month based on the date the score was submitted. Now we can join these two on the month value.
This results in a table like this:
KPI         sales  avg_customer_score
2021-01-01  1000   null
2021-02-01  2000   null
2021-03-01  1500   7.5
2021-04-01  1700   3.0
2021-05-01  1800   null
2021-06-01  900    5.5
2021-07-01  1600   3.0
2021-08-01  1900   7.0
2021-09-01  null   1.0
You can pivot the results of this table in Tableau, or leverage a case statement to pull out each month into its own column - I can elaborate more on that if it would be helpful.
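For example, a hedged sketch of the case-statement pivot (assuming BigQuery, the testing.sales / testing.customer_score tables above with DATE-typed date columns, and showing only a few of the month columns):
-- Wrap the query above as monthly_kpi, then pivot with conditional aggregation.
with order_month as (
  select date_trunc(order_date, MONTH) as KPI, sum(sales) as sales
  from `testing.sales`
  group by 1
),
customer_score_month as (
  select date_trunc(customer_survey_date, MONTH) as KPI, avg(customer_score) as avg_customer_score
  from `testing.customer_score`
  group by 1
),
monthly_kpi as (
  select coalesce(order_month.KPI, customer_score_month.KPI) as KPI, sales, avg_customer_score
  from order_month
  full outer join customer_score_month
    on order_month.KPI = customer_score_month.KPI
)
select
  sum(case when KPI = date '2021-01-01' then sales end)              as sales_jan_21,
  sum(case when KPI = date '2021-02-01' then sales end)              as sales_feb_21,
  sum(case when KPI = date '2021-03-01' then sales end)              as sales_mar_21,
  avg(case when KPI = date '2021-03-01' then avg_customer_score end) as avg_score_mar_21
from monthly_kpi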