SQL - GROUP BY 3 values of the same column - sql

I have this table in GBQ :
ClientID Type Month
XXX A 4
YYY C 4
FFX B 5
FFF B 6
XXX C 6
XXX A 6
YRE C 7
AAR A 7
FFF A 8
EGT B 8
FFF B 9
ETT C 9
I am counting the number of distinct Type values per ClientID and Month with this basic query:
SELECT ClientID,
COUNT(DISTINCT Type) NbTypes,
Month
FROM Table
GROUP BY ClientID, Month
The result looks like this :
ClientID NbTypes Month
XXX 1 4
XXX 2 6
FFF 1 6
FFF 1 8
FFF 1 9
... ... ...
What I need is the number of distinct Type values per ClientID for each Month, counted over the last 3 months (that month and the two before it).
For example:
For ClientID = XXX and Month = 8, I want the count of Type over the rows where Month is 6, 7 or 8.
Is there a way to do this with GROUP BY ?
Thank you

You could use HAVING in your statement:
SELECT ClientID,
COUNT(DISTINCT Type) NbTypes,
Month
FROM Table
GROUP BY ClientID, Month
HAVING Month = EXTRACT(MONTH FROM CURRENT_DATE())
OR Month = EXTRACT(MONTH FROM DATE_SUB(DATE_TRUNC(CURRENT_DATE(), MONTH), INTERVAL 1 MONTH))
OR Month = EXTRACT(MONTH FROM DATE_SUB(DATE_TRUNC(CURRENT_DATE(), MONTH), INTERVAL 2 MONTH))
Note that your table seems to have no column to determine the year, so this statement groups all rows whose month value lies between the current month minus two months and the current month. For example, all data from October, November and December of 2021, 2020, 2019 and so on will be selected by this query.
Also note that I could not test this statement, since I don't use BigQuery.
Here is the source for the Date-Functions:
https://cloud.google.com/bigquery/docs/reference/standard-sql/date_functions

You can use a SELECT inside a SELECT (a correlated subquery), if that is allowed in Google BigQuery:
SELECT ClientID,
COUNT(DISTINCT Type) NbTypes,
Month,
MAX((select count(distinct Type)
from Table t2
where t1.ClientID=t2.ClientID
and t1.month-t2.month between 0 and 2
)
) as NbType_3_months
FROM Table t1
GROUP BY ClientID, Month
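
As a rough check of the idea behind the sub-select above, here is a minimal sketch that runs a self-join variant of it on the sample rows from the question. It uses Python's sqlite3 module as a stand-in for BigQuery (an assumption, the query was not run on BigQuery), and the table name client_types is made up because Table is a reserved word. Like the question's data, it has no year column, so it only works within a single year.

```python
import sqlite3

# Sample rows from the question: (ClientID, Type, Month).
ROWS = [
    ("XXX", "A", 4), ("YYY", "C", 4), ("FFX", "B", 5), ("FFF", "B", 6),
    ("XXX", "C", 6), ("XXX", "A", 6), ("YRE", "C", 7), ("AAR", "A", 7),
    ("FFF", "A", 8), ("EGT", "B", 8), ("FFF", "B", 9), ("ETT", "C", 9),
]

# Self-join variant: every (ClientID, Month) pair is matched with the
# same client's rows from that month and the two months before it.
QUERY = """
SELECT t1.ClientID, t1.Month, COUNT(DISTINCT t2.Type) AS NbType_3_months
FROM (SELECT DISTINCT ClientID, Month FROM client_types) AS t1
JOIN client_types AS t2
  ON t2.ClientID = t1.ClientID
 AND t1.Month - t2.Month BETWEEN 0 AND 2
GROUP BY t1.ClientID, t1.Month
ORDER BY t1.ClientID, t1.Month
"""


def rolling_type_counts(rows):
    """Load rows into an in-memory table (hypothetical name) and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE client_types (ClientID TEXT, Type TEXT, Month INTEGER)"
    )
    conn.executemany("INSERT INTO client_types VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in rolling_type_counts(ROWS):
        print(row)
```

Note that this only returns a row for a month in which the client has at least one record. To get XXX for Month = 8, where XXX has no rows, the left side of the join would have to come from a full list of months instead.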

You can group rows by ClientID and Month, count the number of types, sort the rows by ClientID in ascending order and by Month in descending order, and then select from each group the rows of the past three months. Handling this scenario in SQL is roundabout and complicated because SQL is only partly set-oriented. For your case, you have to get the largest month for each ClientID, find the eligible records through a join filter, and then group and count.
The usual way is to fetch the original data out of the database and process it in Python or SPL. SPL, an open-source Java package, is easy to integrate into a Java program and produces much simpler code. It gets the task done with only two lines of code:
  | A
1 | =GBQ.query("SELECT CLIENTID, COUNT(DISTINCT TYPE) AS NBTYPES, MONTH FROM t2 GROUP BY CLIENTID, MONTH ORDER BY CLIENTID, MONTH DESC")
2 | =A1.group#o(#1).run(m=~.#3-3,~=~.select(MONTH>m)).conj()

Related

Find Individuals who have purchased 10 times within a rolling 1 year period

So let's say I have 2 tables. One table is for consumers, and another is for sales.
Consumers
ID | Name | ...
1 | John Johns | ...
2 | Cathy Dans | ...
Sales
ID | consumer_id | purchase_date | ...
1 | 1 | 01/03/05 | ...
2 | 1 | 02/04/10 | ...
3 | 1 | 03/04/11 | ...
4 | 2 | 02/14/07 | ...
5 | 2 | 09/24/08 | ...
6 | 2 | 12/15/09 | ...
I want to find all instances of consumers who made more than 10 purchases within any 6 month rolling period.
SELECT
consumers.id
, COUNT(sales.id)
FROM
consumers
JOIN sales ON consumers.id = sales.consumer_id
GROUP BY
consumers.id
HAVING
COUNT(sales.id) >= 10
ORDER BY
COUNT(sales.id) DESC
So I have this code, which just gives me a list of consumers who have made more than 10 purchases ALL TIME. But how do I incorporate the rolling 6 month period logic?!
Any help or guidance on which functions can help me accomplish this would be appreciated!
You can use window functions to count the number of sales in a six-month period. Then just filter down to those consumers:
select distinct consumer_id
from (select s.*,
count(*) over (partition by consumer_id
order by purchase_date
range between current row and interval '6 month' following
) as six_month_count
from sales s
) s
where six_month_count > 10;
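
For engines that do not accept an interval in a RANGE frame, the same rolling count can be approximated with a correlated subquery. The sketch below is not the window-function query above but a swapped-in equivalent, run through Python's sqlite3 module. It assumes purchase_date is stored as ISO text (YYYY-MM-DD), and all rows are invented for illustration.

```python
import sqlite3
from datetime import date, timedelta

# For each sale, count the same consumer's sales from that date up to
# six months later (inclusive), then keep consumers with a count above 10.
QUERY = """
SELECT DISTINCT s.consumer_id
FROM sales AS s
WHERE (SELECT COUNT(*)
       FROM sales AS s2
       WHERE s2.consumer_id = s.consumer_id
         AND s2.purchase_date >= s.purchase_date
         AND s2.purchase_date <= date(s.purchase_date, '+6 months')) > 10
ORDER BY s.consumer_id
"""


def frequent_buyers(purchases):
    """purchases: iterable of (consumer_id, 'YYYY-MM-DD') tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (id INTEGER PRIMARY KEY, "
        "consumer_id INTEGER, purchase_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO sales (consumer_id, purchase_date) VALUES (?, ?)",
        purchases,
    )
    ids = [row[0] for row in conn.execute(QUERY)]
    conn.close()
    return ids


# Invented data: consumer 1 buys weekly (11 purchases in under 3 months),
# consumer 2 buys once a month (11 purchases spread over 11 months).
WEEKLY = [(1, (date(2020, 1, 1) + timedelta(weeks=i)).isoformat())
          for i in range(11)]
MONTHLY = [(2, date(2020, m, 1).isoformat()) for m in range(1, 12)]

if __name__ == "__main__":
    print(frequent_buyers(WEEKLY + MONTHLY))
```

Both consumers have 11 purchases in total, but only consumer 1 has more than 10 inside a single six-month window, which is the distinction the plain GROUP BY / HAVING query cannot make.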

Mondrian MDX Last Element Aggregation

In the telco industry it is very important to know what a customer's status was at some point in time (end of week, month, etc.).
So, I have an SCD type II dimension with: customer_tk, customerID, status, date.
We use it in custom reports to find the state on a given day (example):
Date = '2015-10-01'
Group Active Terminated Suspended Order
------------------------------------------------------
Group1 25 2 2 8
Group2 45 8 0 12
Group3 15 18 5 2
Group4 65 2 1 29
This is pivoted from query:
SELECT * FROM dim_customer
INNER JOIN (SELECT max(customer_tk) as maxId, customerId FROM dim_customer WHERE date<='2015-10-01' GROUP BY customerId) as maxCust
ON dim_customer.customer_tk = maxCust.maxId
And it works perfectly (date is parameter from report).
I want to put this in a cube, but how do I create this type of join? I need a cumulative count of customers.
I tried MDX Tail(Filter(... )) expressions but didn't manage to get correct numbers.
So, basically, with no filters, it should return status = 8 for customer 29841 and status = 2 for customer 28425.
But if I choose year = 2014, it should return status = 2 for both customers.
Thanks

select columns between month and year

I have table with columns:
id month year
1 10 2011
2 1 2012
3 4 2011
4 3 2012
I want to select the ids where (month=10 and year=2011) and (month=1 and year=2012). Is that possible?
This is a basic SQL SELECT:
SELECT id FROM myTable WHERE (month = 10 AND year = 2011) OR (month = 1 AND year = 2012);
To search for rows between any two dates, the simplest solution may be to combine the month and year into a single number and then use numeric comparison:
SELECT id
FROM myTable
WHERE year*100 + month BETWEEN 201110 AND 201201
A misfeature of this solution is that it can't take advantage of indexes, so it will be slow on very large tables.
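
The year*100 + month trick can be tried on the four rows from the question. This is a minimal sketch using Python's sqlite3 module as a convenient test bed (an assumption, the question does not say which database is in use).

```python
import sqlite3

# Rows from the question: (id, month, year).
ROWS = [(1, 10, 2011), (2, 1, 2012), (3, 4, 2011), (4, 3, 2012)]


def ids_between(rows, start, end):
    """Return ids whose year*100 + month lies in [start, end].

    start and end are YYYYMM integers, e.g. 201110 for October 2011.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE myTable (id INTEGER, month INTEGER, year INTEGER)")
    conn.executemany("INSERT INTO myTable VALUES (?, ?, ?)", rows)
    ids = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM myTable "
            "WHERE year*100 + month BETWEEN ? AND ? ORDER BY id",
            (start, end),
        )
    ]
    conn.close()
    return ids


if __name__ == "__main__":
    print(ids_between(ROWS, 201110, 201201))
```

Multiplying the year by 100 works because month never exceeds 12, so the combined number sorts in the same order as the (year, month) pair.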

Splitting SQL Data Into Months

I have a Datatable with several hundred rows for this year in it. (MS SqlServer 2k8)
I would like to split this data set out into customer enquiries / Month.
What I have so far is;
Select count(id) As Customers, DatePart(month, enquiryDate) as MonthTotal, productCode From customerEnquiries
where enquiryDate > '2012-01-01 00:00:00'
group by productCode, enquiryDate
But this then produces a row for each data item. (Whereas I want a row per month for each data item.)
So how do I change the above query, so that instead of getting
1 1 10
1 1 10
1 1 11
1 2 10
1 2 10
...
I get
2 1 10 <-- 2 enquiries for product code 10 in month 1
1 1 11 <-- 1 enquiries for product code 11 in month 1
2 2 10 <-- 2 enquiries for product code 10 in month 2
etc
And as a bonus question, is there an easy way of naming each month so the output is Jan, Feb, March instead of 1,2,3 in the month column?
Try this
Select count(id) As Customers, DatePart(month, enquiryDate) as MonthTotal, productCode From customerEnquiries
where enquiryDate > '2012-01-01 00:00:00'
group by productCode, DatePart(month, enquiryDate)
This may help you.
For the Bonus, DATENAME(MONTH, enquiryDate) will give you the name of the Month.
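
The effect of grouping by the month part instead of the full date can be reproduced outside SQL Server. The sketch below uses Python's sqlite3 module, where strftime('%m', ...) plays the role of DatePart(month, ...); that substitution and the five sample rows are assumptions made for illustration.

```python
import sqlite3

# Invented enquiries: (id, enquiryDate, productCode).
ROWS = [
    (1, "2012-01-05 09:00:00", 10),
    (2, "2012-01-20 14:30:00", 10),
    (3, "2012-01-22 11:00:00", 11),
    (4, "2012-02-03 08:15:00", 10),
    (5, "2012-02-17 16:45:00", 10),
]

# Group by the month number, not by the full timestamp, so that all
# enquiries of one product in one month collapse into a single row.
QUERY = """
SELECT COUNT(id) AS Customers,
       CAST(strftime('%m', enquiryDate) AS INTEGER) AS MonthTotal,
       productCode
FROM customerEnquiries
WHERE enquiryDate > '2012-01-01 00:00:00'
GROUP BY productCode, CAST(strftime('%m', enquiryDate) AS INTEGER)
ORDER BY MonthTotal, productCode
"""


def enquiries_per_month(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customerEnquiries "
        "(id INTEGER, enquiryDate TEXT, productCode INTEGER)"
    )
    conn.executemany("INSERT INTO customerEnquiries VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in enquiries_per_month(ROWS):
        print(row)
```

With these rows the output has one line per product and month, in the shape asked for in the question: two enquiries for product 10 in month 1, one for product 11 in month 1, and two for product 10 in month 2.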

Using outer query result in a subquery in postgresql

I have two tables points and contacts and I'm trying to get the average points.score per contact grouped on a monthly basis. Note that points and contacts aren't related, I just want the sum of points created in a month divided by the number of contacts that existed in that month.
So, I need to sum points grouped by the created_at month, and I need to take the count of contacts FOR THAT MONTH ONLY. It's that last part that's tripping me up. I'm not sure how I can use a column from an outer query in the subquery. I tried something like this:
SELECT SUM(score) AS points_sum,
EXTRACT(month FROM created_at) AS month,
date_trunc('MONTH', created_at) + INTERVAL '1 month' AS next_month,
(SELECT COUNT(id) FROM contacts WHERE contacts.created_at <= next_month) as contact_count
FROM points
GROUP BY month, next_month
ORDER BY month
So, I'm extracting the actual month that my points are being summed, and at the same time, getting the beginning of the next_month so that I can say "Get me the count of contacts where their created at is < next_month".
But it complains that column next_month doesn't exist. This is understandable, as the subquery knows nothing about the outer query. Qualifying with points.next_month doesn't work either.
So can someone point me in the right direction of how to achieve this?
Tables:
Points
score | created_at
10 | "2011-11-15 21:44:00.363423"
11 | "2011-10-15 21:44:00.69667"
12 | "2011-09-15 21:44:00.773289"
13 | "2011-08-15 21:44:00.848838"
14 | "2011-07-15 21:44:00.924152"
Contacts
id | created_at
6 | "2011-07-15 21:43:17.534777"
5 | "2011-08-15 21:43:17.520828"
4 | "2011-09-15 21:43:17.506452"
3 | "2011-10-15 21:43:17.491848"
1 | "2011-11-15 21:42:54.759225"
sum, month and next_month (without the subselect)
sum | month | next_month
14 | 7 | "2011-08-01 00:00:00"
13 | 8 | "2011-09-01 00:00:00"
12 | 9 | "2011-10-01 00:00:00"
11 | 10 | "2011-11-01 00:00:00"
10 | 11 | "2011-12-01 00:00:00"
Edit
Now with running sum of contacts. My first draft used new contacts per month, which is obviously not what OP wants.
WITH c AS (
SELECT created_at
,count(id) OVER (order BY created_at) AS ct
FROM contacts
), p AS (
SELECT date_trunc('month', created_at) AS month
,sum(score) AS points_sum
FROM points
GROUP BY 1
)
SELECT p.month
,EXTRACT(month FROM p.month) AS month_nr
,p.points_sum
,( SELECT c.ct
FROM c
WHERE c.created_at < (p.month + interval '1 month')
ORDER BY c.created_at DESC
LIMIT 1) AS contacts
FROM p
ORDER BY 1
This works for any number of months across the years.
Assumes that no month is missing in the table points. If you want all months, including missing ones in points, generate a list of months with generate_series() and LEFT JOIN to it.
Build a running sum in a CTE with a window function.
Both CTEs are not strictly necessary; they are there for performance and simplification only.
Get contacts_count in a subselect.
Your original form of the query could work like this:
SELECT month
,EXTRACT(month FROM month) AS month_nr
,points_sum
,(SELECT count(*)
FROM contacts c
WHERE c.created_at < (p.month + interval '1 month')) AS contact_count
FROM (
SELECT date_trunc('MONTH', created_at) AS month
,sum(score) AS points_sum
FROM points p
GROUP BY 1
) p
ORDER BY 1
The fix for the immediate cause of your error is to put the aggregate into a subquery. You were mixing levels in a way that is impossible.
I expect my variant to be slightly faster with big tables. Not sure about smaller tables. Would be great if you'd report back with test results.
Plus a minor fix: < instead of <=.
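
To see the "aggregate first, then correlate" shape of the second query in action, here is a minimal sketch on the sample tables from the question. It runs in Python's sqlite3 module rather than PostgreSQL, so date_trunc is replaced by a substr-based month start and the interval arithmetic by date(..., '+1 month'); both substitutions are assumptions for the sake of a self-contained example.

```python
import sqlite3

# Sample rows from the question.
POINTS = [
    (10, "2011-11-15 21:44:00.363423"),
    (11, "2011-10-15 21:44:00.69667"),
    (12, "2011-09-15 21:44:00.773289"),
    (13, "2011-08-15 21:44:00.848838"),
    (14, "2011-07-15 21:44:00.924152"),
]
CONTACTS = [
    (6, "2011-07-15 21:43:17.534777"),
    (5, "2011-08-15 21:43:17.520828"),
    (4, "2011-09-15 21:43:17.506452"),
    (3, "2011-10-15 21:43:17.491848"),
    (1, "2011-11-15 21:42:54.759225"),
]

# The inner query aggregates points per month; only then does the outer
# query count the contacts created before the start of the next month.
QUERY = """
SELECT p.month,
       p.points_sum,
       (SELECT COUNT(*)
        FROM contacts AS c
        WHERE c.created_at < date(p.month, '+1 month')) AS contact_count
FROM (SELECT substr(created_at, 1, 7) || '-01' AS month,
             SUM(score) AS points_sum
      FROM points
      GROUP BY substr(created_at, 1, 7)) AS p
ORDER BY p.month
"""


def monthly_points_and_contacts(points, contacts):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE points (score INTEGER, created_at TEXT)")
    conn.execute("CREATE TABLE contacts (id INTEGER, created_at TEXT)")
    conn.executemany("INSERT INTO points VALUES (?, ?)", points)
    conn.executemany("INSERT INTO contacts VALUES (?, ?)", contacts)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in monthly_points_and_contacts(POINTS, CONTACTS):
        print(row)
```

Because the month column already exists in the derived table p, the subquery can refer to p.month directly, which is exactly what the original attempt with next_month could not do.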