SQL Query to group ID overlap (via inner join) by month - sql

I'm trying to find a query that will give me the number of customers that have transacted with 2 different entities in the same month. In other words, customer_ids that transacted with company_a and company_b within the same month. Here is what I have so far:
SELECT Extract(year FROM company_a_customers.transaction_date)
|| Extract(month FROM company_a_customers.transaction_date) AS
payment_month,
Count(UNIQUE(company_a_customers.customer_id))
FROM (SELECT *
FROM my_table
WHERE ( merchant_name LIKE '%company_a%' )) AS company_a_customers
INNER JOIN (SELECT *
FROM my_table
WHERE ( merchant_name = 'company_b' )) AS
company_b_customers
ON company_a_customers.customer_id =
company_b_customers.customer_id
GROUP BY Extract(year FROM company_a_customers.transaction_date)
|| Extract(month FROM company_a_customers.transaction_date)
The problem is that this is giving me a running total of all customers that transacted with company A on a month-by-month basis who also ever transacted with company B.
If I whittle it down to a specific month, it will obviously give me the correct overlap, because the query is only getting IDs for that month:
SELECT Extract(year FROM company_a_customers.transaction_date)
|| Extract(month FROM company_a_customers.transaction_date) AS
payment_month,
Count(UNIQUE(company_a_customers.customer_id))
FROM (SELECT *
FROM my_table
WHERE ( merchant_name LIKE '%company_a%' )
AND transaction_date >= '2017-06-01'
AND transaction_date <= '2017-06-30') AS company_a_customers
INNER JOIN (SELECT *
FROM my_table
WHERE ( merchant_name = 'company_b' )
AND transaction_date >= '2017-06-01'
AND transaction_date <= '2017-06-30') AS
company_b_customers
ON company_a_customers.customer_id =
company_b_customers.customer_id
GROUP BY Extract(year FROM company_a_customers.transaction_date)
|| Extract(month FROM company_a_customers.transaction_date)
How can I do this in one query to get monthly totals for customers who transacted with both companies within the given month?
Desired result: Output of second query, but for every month that is in the database. In other words:
January 2017: xx,xxx overlapping customers
February 2017: xx,xxx overlapping customers
March 2017: xx,xxx overlapping customers
Thanks very much.

You could simply calculate the year/month for both sides and add it as a join condition, but this is not very efficient, as it might create a huge intermediate result.
It is better to check, for each month/customer combination, whether there were transactions with both merchants using conditional aggregation, and then count by month:
SELECT payment_month, count(*)
FROM
( SELECT Extract(year FROM transaction_date)
|| Extract(month FROM transaction_date) AS payment_month,
customer_id
FROM my_table
WHERE ( merchant_name LIKE '%company_a%' )
OR ( merchant_name = 'company_b' )
GROUP BY payment_month,
customer_id
-- both merchants within the same months
HAVING SUM(CASE WHEN merchant_name LIKE '%company_a%' THEN 1 ELSE 0 END) > 0
AND SUM(CASE WHEN merchant_name = 'company_b' THEN 1 ELSE 0 END) > 0
) AS dt
GROUP BY 1
Your payment_month calculation is too complicated (and the returned string is not nicely formatted).
To get year/month as string:
TO_CHAR(transaction_date, 'YYYYMM')
as number:
EXTRACT(YEAR FROM transaction_date) * 100
+ EXTRACT(MONTH FROM transaction_date)
or calculate the first of month:
TRUNC(transaction_date, 'mon')
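To make the conditional-aggregation idea easy to try out, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. The sample rows and the helper name monthly_overlap are made up for illustration, and SQLite's strftime('%Y%m', ...) stands in for TO_CHAR(transaction_date, 'YYYYMM'), so treat it as a sketch under those assumptions rather than the exact query for your database.

```python
import sqlite3

# Hypothetical sample data: (customer_id, merchant_name, transaction_date)
SAMPLE_ROWS = [
    (1, "company_a", "2017-06-03"),
    (1, "company_b", "2017-06-20"),     # customer 1: both merchants in June
    (2, "company_a", "2017-06-05"),
    (2, "company_b", "2017-07-01"),     # customer 2: different months, no overlap
    (3, "company_a_eu", "2017-07-02"),
    (3, "company_b", "2017-07-30"),     # customer 3: both merchants in July
]

def monthly_overlap(rows):
    """Count customers per month who transacted with both merchants in that month."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE my_table ("
        "customer_id INTEGER, merchant_name TEXT, transaction_date TEXT)"
    )
    con.executemany("INSERT INTO my_table VALUES (?, ?, ?)", rows)
    sql = """
        SELECT payment_month, COUNT(*) AS overlapping_customers
        FROM (
            SELECT strftime('%Y%m', transaction_date) AS payment_month,
                   customer_id
            FROM my_table
            WHERE merchant_name LIKE '%company_a%'
               OR merchant_name = 'company_b'
            GROUP BY strftime('%Y%m', transaction_date), customer_id
            -- keep only month/customer groups that saw both merchants
            HAVING SUM(CASE WHEN merchant_name LIKE '%company_a%' THEN 1 ELSE 0 END) > 0
               AND SUM(CASE WHEN merchant_name = 'company_b' THEN 1 ELSE 0 END) > 0
        ) AS dt
        GROUP BY payment_month
        ORDER BY payment_month
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result
```

Calling monthly_overlap(SAMPLE_ROWS) returns one row per month with the number of overlapping customers; customer 2 is not counted anywhere because the two transactions fall in different months.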

You should be able to get your desired results in one query just by counting the number of merchant_names per month per customer id. Using HAVING > 1 will show you only customers with transactions with both (or more if there are more matches for like '%company_a%').
SELECT
EXTRACT(Year from transaction_date)||EXTRACT(Month from transaction_date) as payment_month
,customer_id
,COUNT(DISTINCT merchant_name) as CompanyCount
FROM my_table
WHERE transaction_date >= '2017-06-01' AND transaction_date <= '2017-06-30'
AND (merchant_name = 'company_b' or merchant_name LIKE '%company_a%')
GROUP BY
EXTRACT(Year from transaction_date)||EXTRACT(Month from transaction_date)
,customer_id
HAVING COUNT(DISTINCT merchant_name) > 1
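The caveat about LIKE '%company_a%' matching more than one name can be checked with a small experiment. The sketch below uses SQLite from Python with made-up rows; the names run, NAIVE and NORMALIZED are hypothetical. It compares counting raw merchant names with counting names after collapsing every company_a variant into a single label, which is one possible way to avoid false positives.

```python
import sqlite3

# Hypothetical rows: customer 1 only ever used two company_a variants,
# customer 2 used company_a and company_b in the same month.
ROWS = [
    (1, "company_a", "2017-06-03"),
    (1, "company_a_eu", "2017-06-10"),
    (2, "company_a", "2017-06-05"),
    (2, "company_b", "2017-06-25"),
]

QUERY = """
    SELECT strftime('%Y%m', transaction_date) AS payment_month, customer_id
    FROM my_table
    WHERE merchant_name = 'company_b' OR merchant_name LIKE '%company_a%'
    GROUP BY strftime('%Y%m', transaction_date), customer_id
    HAVING COUNT(DISTINCT {counted}) > 1
    ORDER BY payment_month, customer_id
"""

# Counting raw names treats company_a and company_a_eu as two merchants.
NAIVE = "merchant_name"
# Collapsing every company_a variant into one label avoids that.
NORMALIZED = (
    "CASE WHEN merchant_name LIKE '%company_a%' "
    "THEN 'company_a' ELSE merchant_name END"
)

def run(counted, rows=ROWS):
    """Return (payment_month, customer_id) pairs that pass the HAVING filter."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE my_table ("
        "customer_id INTEGER, merchant_name TEXT, transaction_date TEXT)"
    )
    con.executemany("INSERT INTO my_table VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY.format(counted=counted)).fetchall()
    con.close()
    return result
```

With these rows, run(NAIVE) also reports customer 1, who never transacted with company_b, while run(NORMALIZED) reports only customer 2.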

Related

Using CTE to create pivot table

I've a table:
Task:
Create a pivot table using CTE.
Count the orders placed for each month for several years: from 2011 to 2013. The final table should include four fields: invoice_month, year_2011, year_2012, year_2013. The month field must store the month as a number between 1 and 12.
If no orders were placed in any month, the number of that month should still be included in the table.
I was able to solve this task with this query:
WITH year11
AS (
SELECT EXTRACT(MONTH FROM invoice.invoice_date::TIMESTAMP) AS invoice_month
,COUNT(*) AS orders
FROM invoice
WHERE EXTRACT(YEAR FROM invoice.invoice_date::TIMESTAMP) = 2011
GROUP BY invoice_month
)
,year12
AS (
SELECT EXTRACT(MONTH FROM invoice.invoice_date::TIMESTAMP) AS invoice_month
,COUNT(*) AS orders
FROM invoice
WHERE EXTRACT(YEAR FROM invoice.invoice_date::TIMESTAMP) = 2012
GROUP BY invoice_month
)
,year13
AS (
SELECT EXTRACT(MONTH FROM invoice.invoice_date::TIMESTAMP) AS invoice_month
,COUNT(*) AS orders
FROM invoice
WHERE EXTRACT(YEAR FROM invoice.invoice_date::TIMESTAMP) = 2013
GROUP BY invoice_month
)
SELECT year11.invoice_month
,year11.orders AS year_2011
,year12.orders AS year_2012
,year13.orders AS year_2013
FROM year11
INNER JOIN year12 ON year11.invoice_month = year12.invoice_month
INNER JOIN year13 ON year11.invoice_month = year13.invoice_month
But this query looks too long (or does it?).
What can I improve, if anything, about the way I use CTEs in my query?
Are there other tools that solve this task more quickly and elegantly?
I find using filtered aggregation a lot easier to generate pivot tables:
SELECT extract(month from inv.invoice_date) AS invoice_month,
COUNT(*) filter (where extract(year from inv.invoice_date) = 2011) AS orders_2011,
COUNT(*) filter (where extract(year from inv.invoice_date) = 2012) AS orders_2012,
COUNT(*) filter (where extract(year from inv.invoice_date) = 2013) AS orders_2013
FROM invoice inv
WHERE inv.invoice_date >= date '2011-01-01'
AND inv.invoice_date < date '2014-01-01'
GROUP BY invoice_month
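The task also asks that months without any orders still appear, which a plain GROUP BY over the invoice table cannot guarantee. A possible way to handle that is to generate the numbers 1 to 12 and LEFT JOIN the invoices onto them. The sketch below does this in SQLite from Python with invented dates; the helper name pivot_orders is hypothetical, COUNT(CASE ...) is used in place of FILTER for portability, and in PostgreSQL generate_series(1, 12) could play the role of the recursive CTE.

```python
import sqlite3

# Hypothetical invoice dates; the 2014 row must be ignored by the pivot.
INVOICE_DATES = [
    "2011-01-15", "2011-01-20", "2012-01-05", "2013-03-09", "2014-01-01",
]

def pivot_orders(dates):
    """Return 12 rows: (invoice_month, year_2011, year_2012, year_2013)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE invoice (invoice_date TEXT)")
    con.executemany("INSERT INTO invoice VALUES (?)", [(d,) for d in dates])
    sql = """
        WITH RECURSIVE months(invoice_month) AS (
            SELECT 1
            UNION ALL
            SELECT invoice_month + 1 FROM months WHERE invoice_month < 12
        )
        SELECT m.invoice_month,
               COUNT(CASE WHEN strftime('%Y', i.invoice_date) = '2011' THEN 1 END) AS year_2011,
               COUNT(CASE WHEN strftime('%Y', i.invoice_date) = '2012' THEN 1 END) AS year_2012,
               COUNT(CASE WHEN strftime('%Y', i.invoice_date) = '2013' THEN 1 END) AS year_2013
        FROM months m
        -- LEFT JOIN keeps months that have no invoices at all
        LEFT JOIN invoice i
               ON CAST(strftime('%m', i.invoice_date) AS INTEGER) = m.invoice_month
              AND i.invoice_date >= '2011-01-01'
              AND i.invoice_date <  '2014-01-01'
        GROUP BY m.invoice_month
        ORDER BY m.invoice_month
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result
```

The date range sits in the ON clause rather than in WHERE on purpose: a WHERE filter on the invoice columns would discard the empty months again.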

PostgreSQL Query To Obtain Value that Occurs more than once in 12 months

I have the following query to return the number of users that booked a flight at least twice, but I need to identify those who booked a flight more than once within a 12-month period.
SELECT COUNT(*)
FROM sales
WHERE customer in
(
SELECT customer
FROM sales
GROUP BY customer
HAVING COUNT(*) > 1
)
You would use window functions. The simplest method is lag():
select count(distinct customer)
from (select s.*,
lag(date) over (partition by customer order by date) as prev_date
from sales s
) s
where prev_date > s.date - interval '12 month';
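The reason lag() is enough here is that if any two bookings of a customer fall within 12 months of each other, then two consecutive bookings (in date order) do as well. The following plain-Python sketch mirrors that logic on made-up data; the function names and the sample customers are hypothetical, and the leap-day handling is an assumption meant to imitate subtracting a 12-month interval.

```python
from collections import defaultdict
from datetime import date

def one_year_before(d):
    """Same calendar day one year earlier (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year - 1)
    except ValueError:
        return d.replace(year=d.year - 1, day=28)

def repeat_customers(sales):
    """Customers with two bookings less than 12 months apart.

    sales: iterable of (customer, booking_date) pairs.
    """
    by_customer = defaultdict(list)
    for customer, booked_on in sales:
        by_customer[customer].append(booked_on)

    repeaters = set()
    for customer, dates in by_customer.items():
        dates.sort()
        # Compare each booking with the previous one, like lag() in SQL.
        for prev_date, current in zip(dates, dates[1:]):
            if prev_date > one_year_before(current):
                repeaters.add(customer)
                break
    return repeaters

# Hypothetical sample data
SALES = [
    ("ann", date(2019, 1, 10)), ("ann", date(2019, 11, 2)),
    ("bob", date(2018, 1, 1)), ("bob", date(2019, 6, 1)),
    ("cy", date(2017, 3, 1)), ("cy", date(2019, 3, 1)), ("cy", date(2019, 12, 15)),
    ("dee", date(2019, 5, 5)),
]
```

Here len(repeat_customers(SALES)) corresponds to the count(distinct customer) in the SQL version.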
At the cost of a self-join, @AdrianKlaver's answer can adapt to any 12-month period.
SELECT COUNT(DISTINCT customer) FROM
(SELECT s1.customer
FROM sales s1
JOIN sales s2
ON s1.customer = s2.customer
AND s1.ticket_id <> s2.ticket_id
AND s2.date_field BETWEEN s1.date_field AND (s1.date_field + interval '1 year')
-- any row that survives the join already proves two bookings within a year
GROUP BY s1.customer) AS subquery;
A stab at it with a made up date field:
SELECT COUNT(*)
FROM sales
WHERE customer in
(
SELECT customer
FROM sales
WHERE date_field BETWEEN '01/01/2019' AND '12/31/2019'
GROUP BY customer
HAVING COUNT(*) > 1
)

Count records for first day of every month in a year

I have a table with 4 columns and a huge number of records. It has the following structure:
DATE_ENTERED EMP_NAME DATA ORIGINATED
01-JAN-20 A 545454 APPLE
I want to count the number of records for the first day of every month in a year.
Is there any way to fetch the data for the first day of every month?
In Oracle you can use the TRUNC function on the date as follows:
SELECT TRUNC(DATE_ENTERED, 'MON'), COUNT(1) AS CNT
FROM YOUR_TABLE
WHERE TRUNC(DATE_ENTERED) = TRUNC(DATE_ENTERED, 'MON')
GROUP BY TRUNC(DATE_ENTERED, 'MON')
Please note that the TRUNC(DATE_ENTERED, 'MON') returns the first day of the month for DATE_ENTERED.
Cheers!!
SELECT Year, Month, COUNT(*)
FROM
(
SELECT
YEAR(DATE_ENTERED) Year,
MONTH(DATE_ENTERED) Month,
DAY(DATE_ENTERED) Day
FROM your_table
WHERE DAY(DATE_ENTERED) = 1
) A
GROUP BY Year, Month
In general, WHERE DAY(DATE_ENTERED) = 1 returns only the records dated on the first day of a month. You can then group by the results of the YEAR and MONTH functions to get a count for each year and month.
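For anyone who wants to try the filter-then-group idea without an Oracle or SQL Server instance, here is a minimal sketch using SQLite from Python. The dates are invented, the helper name first_day_counts is hypothetical, and strftime('%d', ...) = '01' stands in for DAY(DATE_ENTERED) = 1.

```python
import sqlite3

# Hypothetical sample dates; only those on the 1st of a month should be counted.
DATES = [
    "2020-01-01", "2020-01-01", "2020-01-15",
    "2020-02-01", "2020-03-02", "2021-01-01",
]

def first_day_counts(dates):
    """Return (year, month, count) for records dated on the first of a month."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE your_table (date_entered TEXT)")
    con.executemany("INSERT INTO your_table VALUES (?)", [(d,) for d in dates])
    sql = """
        SELECT strftime('%Y', date_entered) AS year,
               strftime('%m', date_entered) AS month,
               COUNT(*) AS cnt
        FROM your_table
        WHERE strftime('%d', date_entered) = '01'   -- first day of the month only
        GROUP BY strftime('%Y', date_entered), strftime('%m', date_entered)
        ORDER BY year, month
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result
```

Grouping by year as well as month keeps January 2020 and January 2021 apart, which matters as soon as the table spans more than one year.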
You mean something like
SELECT COUNT(*)
FROM Table
WHERE DAY(DATE_ENTERED) = 1 AND
YEAR(DATE_ENTERED) = Some_Year
GROUP BY DATE_ENTERED
You can also use DATE_ENTERED BETWEEN 'YYYY0101' and 'YYYY1231' (replace the YYYY with the year you want to retrieve data for) instead of YEAR(DATE_ENTERED) = Some_Year, if performance is an issue.
You can use something like this:
select * from your_table
where DAY(DATE_ENTERED) = 1
and DATE_ENTERED between '2020-01-01' and '2020-12-31'
To get the number of records, use this:
select count(*) from your_table
where DAY(DATE_ENTERED)= 1
and DATE_ENTERED between '2020-01-01' and '2020-12-31'
UPDATE
select * from your_table where Extract(day FROM DATE_ENTERED) = 1 and DATE_ENTERED between '01-JAN-20 ' and '01-DEC-20 ';
This is what the data looks like:
For the list of records
select count(*) from your_table where Extract(day FROM DATE_ENTERED) = 1 and DATE_ENTERED between '01-JAN-20 ' and '01-DEC-20 ';
UPDATE-2
select EXTRACT(month from DATE_ENTERED) as month_no,
to_char(DATE_ENTERED, 'Month') as month_name,
count(*) as cnt
from your_table
where Extract(day FROM DATE_ENTERED) = 1 and DATE_ENTERED between '01-JAN-20' and '01-DEC-20'
group by EXTRACT(month from DATE_ENTERED),
to_char(DATE_ENTERED, 'Month');
Here is the output:

GROUP BY month when selecting a date Teradata SQL assistant

SELECT EVENT_DT - ((EVENT_DT -DATE'1900-01-07') MOD 7) AS dates,
CLSFD_USER_ID AS user_id,
COUNT(DISTINCT CLSFD_USER_ID) AS number_of_user_ids,
COUNT(DISTINCT CLSFD_CAS_AD_ID) AS number_of_ads,
SUM(IMPRSN_CNT) AS number_of_impressions
FROM clsfd_access_views.CLSFD_CAS_AD_HST
WHERE CLSFD_SITE_ID = 3001
AND datum >= '2017-01-01'
GROUP BY 1,2
I want to have the total number of unique users during each month of the year 2017. I tried:
GROUP BY EXTRACT(MONTH FROM datum), 2
But this returns an error. What would be the most efficient code to retrieve the total number of user ids, ads, and impressions per month?
It doesn't make sense to me to be aggregating by users, since they are what you are trying to count. Try grouping by the month and year alone:
SELECT
EXTRACT(YEAR FROM EVENT_DT) || '-' || EXTRACT(MONTH FROM EVENT_DT) AS month,
COUNT(DISTINCT CLSFD_USER_ID) AS number_of_user_ids,
COUNT(DISTINCT CLSFD_CAS_AD_ID) AS number_of_ads,
SUM(IMPRSN_CNT) AS number_of_impressions
FROM clsfd_access_views.CLSFD_CAS_AD_HST
WHERE
CLSFD_SITE_ID = 3001 AND
datum >= '2017-01-01' AND datum < '2018-01-01'
GROUP BY
EXTRACT(YEAR FROM EVENT_DT) || '-' || EXTRACT(MONTH FROM EVENT_DT);
Note that I changed your restriction on datum to also exclude any year greater than 2017.
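As a quick way to check the month-level grouping, here is a minimal sketch that runs the same aggregation shape in SQLite from Python. The table name ad_hst, the helper name monthly_totals and the rows are made up, and a single EVENT_DT column is used for both filtering and grouping, which is an assumption since the original mixes EVENT_DT and datum. The zero-padded 'YYYY-MM' key is a small design choice: unlike year || '-' || month, it sorts correctly as text.

```python
import sqlite3

# Hypothetical rows: (EVENT_DT, CLSFD_USER_ID, CLSFD_CAS_AD_ID, IMPRSN_CNT)
ROWS = [
    ("2017-01-03", 10, 100, 5),
    ("2017-01-20", 10, 101, 7),
    ("2017-01-25", 11, 100, 1),
    ("2017-02-02", 10, 100, 2),
    ("2018-01-01", 12, 102, 9),   # outside 2017, must be excluded
]

def monthly_totals(rows):
    """Distinct users, distinct ads and total impressions per month of 2017."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE ad_hst ("
        "EVENT_DT TEXT, CLSFD_USER_ID INTEGER, "
        "CLSFD_CAS_AD_ID INTEGER, IMPRSN_CNT INTEGER)"
    )
    con.executemany("INSERT INTO ad_hst VALUES (?, ?, ?, ?)", rows)
    sql = """
        SELECT strftime('%Y-%m', EVENT_DT)     AS month,
               COUNT(DISTINCT CLSFD_USER_ID)   AS number_of_user_ids,
               COUNT(DISTINCT CLSFD_CAS_AD_ID) AS number_of_ads,
               SUM(IMPRSN_CNT)                 AS number_of_impressions
        FROM ad_hst
        WHERE EVENT_DT >= '2017-01-01' AND EVENT_DT < '2018-01-01'
        GROUP BY strftime('%Y-%m', EVENT_DT)
        ORDER BY month
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result
```

Note that user 10 is counted once in January and once in February, so the monthly distinct counts cannot simply be summed to get the number of distinct users for the whole year.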
If you want these values to be included in the current query, then you should use analytic functions. For example, "total number of unique users during each month" would be something like:
select count(distinct user_id) over(partition by EXTRACT(MONTH FROM datum))
Be aware that those values will be repeated for each user.

sql to find row for min date in each month

I have a table, let's say "Records", with this structure:
id date
-- ----
1 2012-08-30
2 2012-08-29
3 2012-07-25
I need to write an SQL query in PostgreSQL to get record_id for MIN date in each month.
month record_id
----- ---------
8 2
7 3
As we can see, 2012-08-29 < 2012-08-30 and both dates fall in month 8, so we should show record_id = 2.
I tried something like this,
SELECT
EXTRACT(MONTH FROM date) as month,
record_id,
MIN(date)
FROM Records
GROUP BY 1,2
but it shows 3 records.
Can anybody help?
SELECT DISTINCT ON (EXTRACT(MONTH FROM date))
id,
date
FROM Records1
ORDER BY EXTRACT(MONTH FROM date),date
SQLFiddle http://sqlfiddle.com/#!12/76ca2/3
UPD: This query:
1) Orders the records by month and date
2) For every month picks the first record (the first record has MIN(date) because of ordering)
Details here http://www.postgresql.org/docs/current/static/sql-select.html#SQL-DISTINCT
This will return multiples if you have duplicate minimum dates:
Select
minbymonth.Month,
r.record_id
From (
Select
Extract(Month From date) As Month,
Min(date) As Date
From
records
Group By
Extract(Month From date)
) minbymonth
Inner Join
records r
On minbymonth.date = r.date
Order By
1;
Or if you have CTEs
With MinByMonth As (
Select
Extract(Month From date) As Month,
Min(date) As Date
From
records
Group By
Extract(Month From date)
)
Select
m.Month,
r.record_id
From
MinByMonth m
Inner Join
Records r
On m.date = r.date
Order By
1;
http://sqlfiddle.com/#!1/2a054/3
select extract(month from date)
, record_id
, date
from
(
select
record_id
, date
, rank() over (partition by extract(month from date) order by date asc) r
from records
) x
where r=1
order by date
select distinct on (date_trunc('month', date))
date_trunc('month', date) as month,
id,
date
from records
order by 1, 3 -- ascending, so the earliest date in each month is kept
I think you need to use a subquery, something like this:
SELECT
EXTRACT(MONTH FROM r.date) as month,
r.record_id
FROM Records as r
INNER JOIN (
SELECT
EXTRACT(MONTH FROM date) as month,
MIN(date) as mindate
FROM Records
GROUP BY EXTRACT(MONTH FROM date)
) as sub on EXTRACT(MONTH FROM r.date) = sub.month and r.date = sub.mindate
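One thing to watch with the month-only variants above is that EXTRACT(MONTH ...) merges the same month of different years into one group. The sketch below runs the subquery-join approach in SQLite from Python, grouping by a 'YYYY-MM' key instead. The extra 2013 row, the helper name min_record_per_month and the use of strftime are assumptions added for illustration; the first three rows are the ones from the question.

```python
import sqlite3

# Rows from the question plus one hypothetical row from a later year.
RECORDS = [
    (1, "2012-08-30"),
    (2, "2012-08-29"),
    (3, "2012-07-25"),
    (4, "2013-08-01"),
]

def min_record_per_month(rows):
    """Return (year_month, record_id) for the earliest date in each month."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE records (record_id INTEGER, date TEXT)")
    con.executemany("INSERT INTO records VALUES (?, ?)", rows)
    sql = """
        SELECT sub.month, r.record_id
        FROM records AS r
        INNER JOIN (
            -- earliest date per calendar month, keeping years apart
            SELECT strftime('%Y-%m', date) AS month,
                   MIN(date)               AS mindate
            FROM records
            GROUP BY strftime('%Y-%m', date)
        ) AS sub
          ON r.date = sub.mindate
        ORDER BY sub.month, r.record_id
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result
```

With a month-only key, record 4 would disappear from this sample, because 2012-08-29 would be the minimum for every August. As with the other join-based answers, ties on the minimum date return more than one row per month.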