group by of one column and having count of another - sql

I have a table 'customer' which contains 4 columns:
name day product price
A 2021-04-01 p1 100
B 2021-04-01 p1 100
C 2021-04-01 p2 120
A 2021-04-01 p2 120
A 2021-04-02 p1 100
B 2021-04-02 p3 80
C 2021-04-03 p2 120
D 2021-04-03 p2 120
C 2021-04-04 p1 100
With the query
SELECT COUNT(name)
FROM (SELECT name
      FROM customer
      WHERE day > '2021-03-28'
        AND day < '2021-04-09'
      GROUP BY name
      HAVING COUNT(name) > 2) AS t
I can count the number of customers who bought something more than twice in a period of time.
For each day (GROUP BY day), I would like to know how many customers bought something, counting only customers who bought something more than twice in the period.
In the example above, A and C are the customers that satisfy the condition.
The desired output will be:
day how_many
2021-04-01 2
2021-04-02 1
2021-04-03 1
2021-04-04 1

I interpret your question as wanting to know how many customers made more than one purchase on each day. If so, one method uses two levels of aggregation:
select day,
       sum(case when day_count >= 2 then 1 else 0 end) as num_customers
from (select c.name, c.day, count(*) as day_count
      from customer c
      group by c.name, c.day
     ) nc
group by day
order by day;
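If the goal is instead the output listed in the question (per day, the number of distinct customers who have more than two purchases in the whole period), one possible sketch filters on the qualifying names first and then counts per day. It is wrapped in Python's sqlite3 only so it can be run as-is; the in-memory database and the variable names are assumptions for illustration, not part of the original answer.

```python
import sqlite3

# Sample rows copied from the question's 'customer' table.
CUSTOMER_ROWS = [
    ("A", "2021-04-01", "p1", 100),
    ("B", "2021-04-01", "p1", 100),
    ("C", "2021-04-01", "p2", 120),
    ("A", "2021-04-01", "p2", 120),
    ("A", "2021-04-02", "p1", 100),
    ("B", "2021-04-02", "p3", 80),
    ("C", "2021-04-03", "p2", 120),
    ("D", "2021-04-03", "p2", 120),
    ("C", "2021-04-04", "p1", 100),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (name TEXT, day TEXT, product TEXT, price INTEGER)"
)
conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)", CUSTOMER_ROWS)

# Inner query: customers with more than two purchases in the period.
# Outer query: per day, count the distinct qualifying customers.
PER_DAY_QUERY = """
SELECT day, COUNT(DISTINCT name) AS how_many
FROM customer
WHERE day > '2021-03-28' AND day < '2021-04-09'
  AND name IN (SELECT name
               FROM customer
               WHERE day > '2021-03-28' AND day < '2021-04-09'
               GROUP BY name
               HAVING COUNT(*) > 2)
GROUP BY day
ORDER BY day
"""

per_day = conn.execute(PER_DAY_QUERY).fetchall()
for day, how_many in per_day:
    print(day, how_many)
```

On the sample data this reproduces the desired output from the question (2, 1, 1, 1 for the four days).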

Related

Grouping and Summarize SQL

My table looks like the following:
income date productid invoiceid customerid
300 2015-01-01 A 1234551 1
300 2016-01-02 A 1234552 1
300 2016-01-03 B 1234553 2
300 2016-01-03 A 1234553 2
300 2016-01-04 C 1234554 3
300 2016-01-04 C 1234554 3
300 2016-01-08 A 1234556 3
300 2016-01-08 B 1234556 3
300 2016-01-11 C 1234557 3
I need to know, for each number of invoices per customer, how many customers have that many invoices (for example, one invoice = several customers, two invoices = two customers, three invoices = three customers, and so on).
What is the syntax for this query?
In my sample data above, customer 1 has two invoices, customer 2 one invoice and customer 3 three invoices. So there is one customer each with a count of 1, 2, and 3 invoices in my example.
Expected result:
invoice_count customers_with_this_invoice_count
1 1
2 1
3 1
I tried this syntax and I'm still stuck:
select * from
(
select CustomerID,count(distinct InvoiceID) as 'Total Invoices'
from exam
GROUP BY CustomerID
) a
Select Count(customerID),CustomerID From a
Group By customerID
Having Count(customerID) > 1
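One way to approach this is two levels of aggregation: first count distinct invoices per customer, then count customers per invoice count. The following is only a sketch, run through Python's sqlite3 so it is self-contained; the table name exam comes from the attempt above, while the derived-table alias per_customer and the Python variable names are assumptions.

```python
import sqlite3

# Sample rows copied from the question (income, date, productid, invoiceid, customerid).
EXAM_ROWS = [
    (300, "2015-01-01", "A", 1234551, 1),
    (300, "2016-01-02", "A", 1234552, 1),
    (300, "2016-01-03", "B", 1234553, 2),
    (300, "2016-01-03", "A", 1234553, 2),
    (300, "2016-01-04", "C", 1234554, 3),
    (300, "2016-01-04", "C", 1234554, 3),
    (300, "2016-01-08", "A", 1234556, 3),
    (300, "2016-01-08", "B", 1234556, 3),
    (300, "2016-01-11", "C", 1234557, 3),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE exam (income INTEGER, date TEXT, productid TEXT, "
    "invoiceid INTEGER, customerid INTEGER)"
)
conn.executemany("INSERT INTO exam VALUES (?, ?, ?, ?, ?)", EXAM_ROWS)

# Level 1 (derived table): distinct invoices per customer.
# Level 2 (outer query): number of customers for each invoice count.
HISTOGRAM_QUERY = """
SELECT invoice_count,
       COUNT(*) AS customers_with_this_invoice_count
FROM (SELECT customerid,
             COUNT(DISTINCT invoiceid) AS invoice_count
      FROM exam
      GROUP BY customerid) AS per_customer
GROUP BY invoice_count
ORDER BY invoice_count
"""

invoice_histogram = conn.execute(HISTOGRAM_QUERY).fetchall()
for invoice_count, customers in invoice_histogram:
    print(invoice_count, customers)
```

COUNT(DISTINCT invoiceid) matters here because one invoice can span several rows (one per product).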

Aggregate data based on fixed moving date window in Presto

I wanted to:
aggregate numbers in a "3-month" rolling window (e.g. Jan-Mar, Feb-Apr, Mar-May, ...)
then compare the same country & city with last year's same rolling window
Table I already have: (unique at: country + city + month level)
country city month sum
US A 2019-03-01 3
US B 2019-03-01 4
DE C 2019-03-01 5
US A 2019-03-01 3
CN B 2019-03-01 4
US B 2019-04-01 4
UK C 2019-04-01 7
US C 2019-04-01 2
....
US A 2019-12-01 10
US B 2020-12-01 6
US C 2021-01-01 7
Step 1 ideal output:
country city period sum
US A 2019-03-01~2019-05-01 XXX
US A 2019-04-01~2019-06-01 YYY
UK A 2019-03-01~2019-05-01 ZZZ
...
UK A 2020-12-01~2021-02-01 BBB
Step 2 ideal output:
country city period sum last_year_sum year_over_year_%
US A 2019-03-01~2019-05-01 XXX 111 40%
US A 2019-04-01~2019-06-01 YYY 1111 30%
UK A 2019-03-01~2019-05-01 ZZZ 11111 20%
...
UK A 2020-12-01~2021-02-01 BBB 1111 15%
Ideally, I wanted to achieve this in Presto - any idea how to do that? Thanks!!
Unfortunately, Presto doesn't support the range window frame specification using dates. One method uses joins and aggregation and then lag() to get the last year amount:
select t.country, t.city, t.month,
       sum(t2.sum) as this_year_sum,
       lag(sum(t2.sum), 12) over (partition by t.country, t.city order by t.month) as prev_year_sum,
       (-1 +
        sum(t2.sum) * 1.0 /
        lag(sum(t2.sum), 12) over (partition by t.country, t.city order by t.month)
       ) as yoy_increase
from t left join
     t t2
     on t2.country = t.country and
        t2.city = t.city and
        t2.month >= t.month and
        t2.month <= t.month + interval '2' month
group by t.country, t.city, t.month;
Note: This assumes that you have data for all months for each country/city combination.
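If some months are missing, a possible alternative to lag(..., 12) is to join the rolling result back to itself on the month exactly one year earlier. The sketch below illustrates the idea in SQLite through Python's sqlite3, not in Presto: SQLite's date() modifiers stand in for Presto's interval arithmetic, the column is renamed to amount, and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample data with gaps (no rows between 2019-05 and 2020-03).
T_ROWS = [
    ("US", "A", "2019-03-01", 3),
    ("US", "A", "2019-04-01", 4),
    ("US", "A", "2019-05-01", 5),
    ("US", "A", "2020-03-01", 6),
    ("US", "A", "2020-04-01", 8),
    ("US", "A", "2020-05-01", 10),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (country TEXT, city TEXT, month TEXT, amount INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", T_ROWS)

# Step 1 (CTE): 3-month forward-looking rolling sum via a self-join.
# Step 2: match each window to the window starting 12 months earlier.
YOY_QUERY = """
WITH rolling AS (
    SELECT t.country, t.city, t.month,
           SUM(t2.amount) AS rolling_sum
    FROM t
    JOIN t AS t2
      ON t2.country = t.country
     AND t2.city = t.city
     AND t2.month >= t.month
     AND t2.month <= date(t.month, '+2 months')
    GROUP BY t.country, t.city, t.month
)
SELECT r.country, r.city, r.month,
       r.rolling_sum,
       p.rolling_sum AS last_year_sum,
       r.rolling_sum * 1.0 / p.rolling_sum - 1 AS yoy_increase
FROM rolling AS r
LEFT JOIN rolling AS p
  ON p.country = r.country
 AND p.city = r.city
 AND p.month = date(r.month, '-12 months')
ORDER BY r.country, r.city, r.month
"""

yoy_rows = conn.execute(YOY_QUERY).fetchall()
for row in yoy_rows:
    print(row)
```

Windows with no counterpart a year earlier simply get NULL for last_year_sum and yoy_increase.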

Total brokers with at least one purchase weekly in postgres

cid  Company name
1    cname 1
2    cname2
bid  cid  broker name
1    2    broker 1
2    1    broker 2
pid  bid  purchase date
1    1    2021-05-01 00:20:30
2    2    2021-05-02 13:20:30
I have the tables above. For each week, I would like to fetch the number of brokers with at least one purchase in that week.
week start date      No of brokers
2021-04-03 00:00:00  5
2021-04-10 00:00:00  20
I would also like to fetch, for each week, the number of companies with at least one purchase in that week.
week start date      No of companies
2021-04-03 00:00:00  5
2021-04-10 00:00:00  20
I am looking for PostgreSQL queries.
You can use date_trunc() and then count the brokers and companies using count(distinct):
select date_trunc('week', p.purchase_date) as week,
       count(distinct p.bid) as num_brokers,
       count(distinct b.cid) as num_companies
from purchases p join
     brokers b
     on p.bid = b.bid
group by week;
You would use count(*) to get the number of purchases.
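To try the counting logic without a Postgres instance, here is a rough sketch in SQLite via Python's sqlite3. SQLite has no date_trunc(), so the Monday of each week is computed with date(..., 'weekday 0', '-6 days') as a stand-in; the table names purchases and brokers and the column name purchase_date are assumptions, as in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE brokers (bid INTEGER, cid INTEGER, broker_name TEXT)")
conn.execute("CREATE TABLE purchases (pid INTEGER, bid INTEGER, purchase_date TEXT)")

# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO brokers VALUES (?, ?, ?)",
    [(1, 2, "broker 1"), (2, 1, "broker 2")],
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [(1, 1, "2021-05-01 00:20:30"), (2, 2, "2021-05-02 13:20:30")],
)

# 'weekday 0' moves forward to Sunday (or stays if already Sunday);
# subtracting 6 days then gives the Monday that starts that week.
WEEKLY_QUERY = """
SELECT date(p.purchase_date, 'weekday 0', '-6 days') AS week_start,
       COUNT(DISTINCT p.bid) AS num_brokers,
       COUNT(DISTINCT b.cid) AS num_companies
FROM purchases AS p
JOIN brokers AS b ON b.bid = p.bid
GROUP BY week_start
ORDER BY week_start
"""

weekly_counts = conn.execute(WEEKLY_QUERY).fetchall()
for week_start, num_brokers, num_companies in weekly_counts:
    print(week_start, num_brokers, num_companies)
```

Both sample purchases fall in the week starting Monday 2021-04-26, so the sketch returns a single row with two brokers and two companies.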

Rolling Sum (4 Months)

I have been struggling to build a query in Access that calculates a "rolling 4 months" of sales data. I have been experimenting with DSUM, but I only seem to be able to get the subtotal or running total for a specific group (not a moving total). I have tried to illustrate what I am trying to do below.
Date Product Value Rolling_4_Month_Sum
January A 100 100
February A 200 300
March A 300 600
April A 300 900
May A 200 1000
June A 400 1200
July A 500 1400
August A 700 1800
Is it possible to make a running total for 4 rows/months only?
SELECT
    a.Date,
    a.Product,
    a.Value,
    SUM(b.Value) AS Rolling_4_Month_Sum
FROM
    Table a
    INNER JOIN Table b ON a.Product = b.Product
        AND b.Date <= a.Date
        AND b.Date >= DateAdd("m", -3, a.Date)
GROUP BY
    a.Date, a.Product, a.Value
This should work in my opinion, assuming Date is stored as a real date value rather than a month name.
Table a supplies your "single month" row.
Table b is a self-join that retrieves that month and the three preceding months. This is done by adding b.Date >= DateAdd("m", -3, a.Date) as a self-join criterion.
Here is a nice example of how these kinds of things work.
Data:
OrderDetailID OrderID ProductID Price
1 1234 1 $5.00
2 1234 2 ($2.00)
3 1234 3 $4.00
4 1235 1 $5.00
5 1235 3 $4.00
6 1235 5 $12.00
7 1235 2 ($2.00)
SQL:
SELECT OD.OrderDetailID, OD.OrderID, OD.ProductID, OD.Price,
       (SELECT Sum(Price)
        FROM tblOrderDetails
        WHERE OrderDetailID <= OD.OrderDetailID) AS RunningSum
FROM tblOrderDetails AS OD;
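To check the rolling-window idea against the numbers in the question, here is a small sketch in SQLite via Python's sqlite3 rather than Access. The table name sales, the column name month_start, and the year 2021 are assumptions (the question only gives month names); SQLite's date(..., '-3 months') stands in for an Access DateAdd call.

```python
import sqlite3

# Values from the question; the first-of-month dates in 2021 are assumed.
SALES_ROWS = [
    ("2021-01-01", "A", 100),
    ("2021-02-01", "A", 200),
    ("2021-03-01", "A", 300),
    ("2021-04-01", "A", 300),
    ("2021-05-01", "A", 200),
    ("2021-06-01", "A", 400),
    ("2021-07-01", "A", 500),
    ("2021-08-01", "A", 700),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (month_start TEXT, product TEXT, value INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", SALES_ROWS)

# Self-join: for each row a, sum the rows b of the same product whose
# month lies between three months before a's month and a's month itself.
ROLLING_QUERY = """
SELECT a.month_start, a.product, a.value,
       SUM(b.value) AS rolling_4_month_sum
FROM sales AS a
JOIN sales AS b
  ON b.product = a.product
 AND b.month_start <= a.month_start
 AND b.month_start >= date(a.month_start, '-3 months')
GROUP BY a.month_start, a.product, a.value
ORDER BY a.month_start
"""

rolling_rows = conn.execute(ROLLING_QUERY).fetchall()
rolling_sums = [row[3] for row in rolling_rows]
print(rolling_sums)
```

The resulting sums match the Rolling_4_Month_Sum column in the question's table.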

Joining to another table only on the first occurrence of a field

Note: I have tried to simplify the below to make it easier both for me and for anyone else to understand. The tables I reference below are in fact sub-queries joining a lot of different data together from different sources.
I have a table of purchased items:
Items
ItemSaleID CustomerID ItemCode
1 100 A
2 100 B
3 100 C
4 200 A
5 200 C
I also have transaction header and detail tables coming from a till system:
TranDetail
TranDetailID TranHeaderID ItemSaleID Cost
11 51 1 $10
12 51 2 $10
13 51 3 $10
14 52 4 $20
15 52 5 $10
TranHeader
TranHeaderID CustomerID Payment Time
51 100 $100 11:00
52 200 $50 12:00
53 100 $20 13:00
I want to get to a point where I have a table like:
ItemSaleID  CustomerID  ItemCode  Cost  Payment  Time
1           100         A         $10   $120     11:00
2           100         B         $10            11:00
3           100         C         $10            11:00
4           200         D         $20   $50      12:00
5           200         E         $10            12:00
I have a query which produces the results, but when I add the ROW_NUMBER() CASE expression the run time goes from 2 minutes to 30+ minutes.
The query is further complicated because I need to supply the earliest date relating to the list of transactions and the total price paid (there could be many transactions throughout the day for upgrades etc.).
Query below:
SELECT ItemSaleID
     , CustomerID
     , ItemCode
     , Cost
     , CASE WHEN ROW_NUMBER() OVER (PARTITION BY TranHeaderID ORDER BY ItemSaleID) = 1
            THEN TRN.Payment ELSE NULL END AS Payment
FROM Items I
OUTER APPLY (
    SELECT TOP 1 SUB.Payment, H.Time
    FROM TranHeader H
    INNER JOIN TranDetail D ON H.TranHeaderID = D.TranHeaderID
    OUTER APPLY (SELECT SUM(Payment) AS Payment
                 FROM TranHeader H2
                 WHERE H2.CustomerID = I.CustomerID
                ) SUB
    WHERE H.CustomerID = I.CustomerID
) TRN
WHERE ...
Is there a way that I can show the payment only on the first occurrence of each customer ID whilst maintaining performance?
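One direction that might be worth trying is to aggregate TranHeader once per customer (total payment, earliest time) and find each customer's first ItemSaleID in a second derived table, instead of evaluating an OUTER APPLY per row. The sketch below shows the shape of such a query in SQLite via Python's sqlite3, not SQL Server; the column name TranTime (in place of Time), the derived-table aliases, and the integer amounts are assumptions, and whether this helps performance on the real sub-queries is untested.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items (ItemSaleID INTEGER, CustomerID INTEGER, ItemCode TEXT)")
conn.execute(
    "CREATE TABLE TranDetail (TranDetailID INTEGER, TranHeaderID INTEGER, "
    "ItemSaleID INTEGER, Cost INTEGER)"
)
conn.execute(
    "CREATE TABLE TranHeader (TranHeaderID INTEGER, CustomerID INTEGER, "
    "Payment INTEGER, TranTime TEXT)"
)

# Sample rows copied from the question's tables.
conn.executemany(
    "INSERT INTO Items VALUES (?, ?, ?)",
    [(1, 100, "A"), (2, 100, "B"), (3, 100, "C"), (4, 200, "A"), (5, 200, "C")],
)
conn.executemany(
    "INSERT INTO TranDetail VALUES (?, ?, ?, ?)",
    [(11, 51, 1, 10), (12, 51, 2, 10), (13, 51, 3, 10), (14, 52, 4, 20), (15, 52, 5, 10)],
)
conn.executemany(
    "INSERT INTO TranHeader VALUES (?, ?, ?, ?)",
    [(51, 100, 100, "11:00"), (52, 200, 50, "12:00"), (53, 100, 20, "13:00")],
)

# pay:   one row per customer with total payment and earliest time.
# first: one row per customer with that customer's lowest ItemSaleID.
# The payment is shown only on the row matching the first ItemSaleID.
FIRST_ROW_QUERY = """
SELECT i.ItemSaleID, i.CustomerID, i.ItemCode, d.Cost,
       CASE WHEN i.ItemSaleID = first.FirstItemSaleID
            THEN pay.TotalPayment END AS Payment,
       pay.FirstTime
FROM Items AS i
JOIN TranDetail AS d ON d.ItemSaleID = i.ItemSaleID
JOIN (SELECT CustomerID,
             SUM(Payment) AS TotalPayment,
             MIN(TranTime) AS FirstTime
      FROM TranHeader
      GROUP BY CustomerID) AS pay ON pay.CustomerID = i.CustomerID
JOIN (SELECT CustomerID, MIN(ItemSaleID) AS FirstItemSaleID
      FROM Items
      GROUP BY CustomerID) AS first ON first.CustomerID = i.CustomerID
ORDER BY i.ItemSaleID
"""

item_rows = conn.execute(FIRST_ROW_QUERY).fetchall()
for row in item_rows:
    print(row)
```

On the sample data, customer 100 gets a payment of 120 on ItemSaleID 1 only, and customer 200 gets 50 on ItemSaleID 4 only; the other rows carry NULL in the Payment column.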