Find Customers Who Shop at Multiple Stores - sql

I need a query that will give me a count of customers who have shopped at multiple store locations within the last 3 years.
I have formulated the following query, but it's not what I need to know:
SELECT STORE_ID, CUSTOMER_ID, COUNT(DISTINCT CUSTOMER_ID) as SERVICE_COUNT
From SALES INNER JOIN
STORE_DETAILS
ON trim(STORE_ID) = trim(STORE_ID)
WHERE (CURRENT_DATE - cast(SALE_DATE AS DATE format 'mm/dd/yyyy')) < 1095
ORDER BY 1,2
Group by 1,2
HAVING COUNT(DISTINCT SALE_DATE) > 1

If you want customers at multiple stores, then something like:
SELECT s.CUSTOMER_ID
FROM SALES s INNER JOIN
STORE_DETAILS sd
ON trim(s.STORE_ID) = trim(sd.STORE_ID)
WHERE (CURRENT_DATE - cast(s.SALE_DATE AS DATE format 'mm/dd/yyyy')) < 1095
GROUP BY 1
HAVING COUNT(DISTINCT s.STORE_ID) > 1;
I don't understand your date expression, but presumably you know what it is supposed to be doing.

Optimized version of Gordon's query, based on your comments:
You wrote: "Often, the store_id has trailing spaces not allowing a true match"
String comparison ignores trailing spaces. As long as there are no leading spaces (the worst case, which should be fixed during load), you don't need TRIM, which is quite bad for performance.
You wrote: "The datatype for SALE_DATE is DATE"
If it's a DATE, there's no need for a CAST. Additionally, the within-three-years logic can be simplified to avoid a date calculation on every row.
SELECT s.CUSTOMER_ID, COUNT(DISTINCT s.STORE_ID) as SERVICE_COUNT
FROM SALES s
JOIN STORE_DETAILS sd
ON s.STORE_ID = sd.STORE_ID
WHERE s.SALE_DATE >= ADD_MONTHS(CURRENT_DATE, -12*3)
GROUP BY 1
HAVING SERVICE_COUNT > 1
;
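The question asks for a count of customers, while the queries above return one row per customer. A minimal sketch of wrapping the grouped query in an outer COUNT follows. It runs against SQLite through Python's sqlite3 module purely as a stand-in engine, so the date arithmetic uses SQLite's date() rather than Teradata's ADD_MONTHS; the sample rows and the fixed as_of date are made up for illustration, and the join to STORE_DETAILS is left out because only SALES columns are needed here.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SALES (CUSTOMER_ID TEXT, STORE_ID TEXT, SALE_DATE TEXT);
    INSERT INTO SALES VALUES
        ('C1', 'S1', '2024-01-10'),
        ('C1', 'S2', '2024-03-05'),
        ('C2', 'S1', '2024-02-01'),
        ('C2', 'S1', '2024-06-15'),
        ('C3', 'S2', '2024-04-20'),
        ('C3', 'S3', '2010-01-01');
""")
# C1 used two stores in the window, C2 used one store twice,
# C3 used a second store only outside the three-year window.

# Inner query: customers with more than one distinct store in the window.
# Outer query: how many such customers there are.
query = """
    SELECT COUNT(*)
    FROM (
        SELECT CUSTOMER_ID
        FROM SALES
        WHERE SALE_DATE >= date(:as_of, '-3 years')
        GROUP BY CUSTOMER_ID
        HAVING COUNT(DISTINCT STORE_ID) > 1
    ) AS multi_store
"""
multi_store_customers = conn.execute(query, {"as_of": "2025-01-01"}).fetchone()[0]
print(multi_store_customers)
```

On Teradata the same shape should apply with the ADD_MONTHS filter from the answer above in place of date().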

Related

Is there a way to count how many strings in a specific column are seen for the 1st time?

The value in column 2 is sometimes repeated, because some clients make several transactions at different times (a client can make a transaction in the first month and then another in the following year).
Is there a way for me to count how many IDs are completely new per month through a group by (never seen before)?
Please let me know if you need more context.
Thanks!
A simple way is two levels of aggregation. The inner level gets the first date for each customer. The outer summarizes by year and month:
select year(min_date), month(min_date), count(*) as num_firsts
from (select customerid, min(date) as min_date
from t
group by customerid
) c
group by year(min_date), month(min_date)
order by year(min_date), month(min_date);
Note that date/time functions depend on the database you are using, so the syntax for getting the year/month from the date may differ in your database.
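As a rough illustration of the two-level aggregation, here is a sketch that runs the same query shape against SQLite via Python's sqlite3 module. SQLite has no year() or month() functions, so strftime() stands in for them; the table t, its columns, and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (customerid TEXT, date TEXT);
    INSERT INTO t VALUES
        ('A', '2020-01-05'),
        ('A', '2021-01-09'),
        ('B', '2020-01-20'),
        ('C', '2020-02-03'),
        ('B', '2020-02-11');
""")

# Inner level: first transaction date per customer.
# Outer level: number of customers whose first date falls in each year/month.
query = """
    SELECT strftime('%Y', min_date) AS yr,
           strftime('%m', min_date) AS mon,
           COUNT(*) AS num_firsts
    FROM (SELECT customerid, MIN(date) AS min_date
          FROM t
          GROUP BY customerid) AS c
    GROUP BY yr, mon
    ORDER BY yr, mon
"""
first_seen = conn.execute(query).fetchall()
for row in first_seen:
    print(row)
```

Customer A's 2021 transaction and B's February transaction do not count as new, because both customers were first seen in January 2020.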
You can assign a rank to each transaction within its customer_id, ordered by date (rank 1 therefore means it is the first order for that customer_id).
The ranking is done in an inline view, which is then queried to give you the month and the count of customer ids for that month ONLY where rank = 1.
I have tested this on Oracle and it works as expected.
SELECT DISTINCT
EXTRACT(MONTH FROM date_of_transaction) AS month,
COUNT(customer_id)
FROM
(
SELECT
date_of_transaction,
customer_id,
RANK() OVER(PARTITION BY customer_id
ORDER BY
date_of_transaction ASC
) AS rank
FROM
table_1
)
WHERE
rank = 1
GROUP BY
EXTRACT(MONTH FROM date_of_transaction)
ORDER BY
EXTRACT(MONTH FROM date_of_transaction) ASC;
First, associate every ID with the year and month in which it is completely new, then count while grouping by year and month:
SELECT count(*) as new_customers, extract(year from t1.date) as year,
extract(month from t1.date) as month FROM table t1
WHERE not exists (SELECT 1 FROM table t2 WHERE t1.id = t2.id AND t2.date < t1.date)
GROUP BY year, month;
Your results will contain the new customer count, year, and month.

Count the number of repeating pairs of information

I have a table with customer_ID, date, and payment_method as 3 columns. payment_method can be 'cash', 'credit', or 'others'. I want to find out the number of customers who have used credit as a payment method more than 5 times, in the last 6 months.
I found this solution for displaying the rows where the customer used credit:
SELECT customer_ID, payment_method, COUNT(*) AS unique_pair_repeats
FROM tab1
WHERE customer_ID IS NOT NULL
GROUP BY customer_ID, payment_method
HAVING count(*) > 1;
The problem is, I don't want a list of the names/ids, I want to know how many people used their credit card for a purchase 5 times or more in the last 6 months.
This is one way you could do it:
SELECT COUNT(*)
FROM
(
SELECT customer_id
FROM tab1
WHERE
customer_ID IS NOT NULL and
payment_method = 'credit' and
tran_date > add_months(sysdate, -6)
GROUP BY customer_ID
HAVING count(*) > 5
) x
The inner query generates a list of all customer ids that have used credit more than 5 times in the last 6 months. The outer query counts them.
You might feel it more logical to write it like this:
SELECT COUNT(*)
FROM
(
SELECT customer_id, count(*) as ctr
FROM tab1
WHERE
customer_ID IS NOT NULL and
payment_method = 'credit' and
tran_date > add_months(sysdate, -6)
GROUP BY customer_ID
) x
WHERE x.ctr > 5
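To make the inner/outer split concrete, here is a small sketch of the first form run against SQLite through Python's sqlite3 module. It is only an approximation of the Oracle version: date(:as_of, '-6 months') replaces add_months(sysdate, -6), the as_of date is fixed so the result is reproducible, and the sample rows are invented. The threshold is a parameter so that "more than 5" and "5 or more" can be compared, since the question uses both wordings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab1 (customer_ID TEXT, tran_date TEXT, payment_method TEXT)")

# Hypothetical sample data
rows = []
rows += [("C1", f"2024-0{m}-15", "credit") for m in range(1, 7)]  # 6 recent credit payments
rows += [("C2", f"2024-0{m}-15", "credit") for m in range(1, 6)]  # exactly 5 recent credit payments
rows += [("C3", f"2024-0{m}-15", "cash") for m in range(1, 7)]    # 6 recent cash payments
rows += [("C4", f"2023-0{m}-15", "credit") for m in range(1, 7)]  # 6 credit payments, too old
conn.executemany("INSERT INTO tab1 VALUES (?, ?, ?)", rows)

# Inner query lists qualifying customers; outer query counts them.
query = """
    SELECT COUNT(*)
    FROM (
        SELECT customer_ID
        FROM tab1
        WHERE customer_ID IS NOT NULL
          AND payment_method = 'credit'
          AND tran_date > date(:as_of, '-6 months')
        GROUP BY customer_ID
        HAVING COUNT(*) > :more_than
    ) AS x
"""

def count_customers(more_than):
    params = {"as_of": "2024-07-01", "more_than": more_than}
    return conn.execute(query, params).fetchone()[0]

more_than_5 = count_customers(5)  # strictly more than 5 uses
at_least_5 = count_customers(4)   # 5 uses or more
print(more_than_5, at_least_5)
```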
So, remove customer_ID and payment_method from the SELECT list.
That still doesn't answer "at least 5 times in the last 6 months", though, so you need another condition on the date (presuming you use Oracle; you didn't tag the question, but you do use Oracle SQL Developer):
and date_column >= add_months(trunc(sysdate), -6)
Finally, something like this might help:
SELECT COUNT(*) AS unique_pair_repeats --> changes here
FROM tab1
WHERE customer_ID IS NOT NULL
and date_column >= add_months(trunc(sysdate), -6) --> here
GROUP BY customer_ID, payment_method
HAVING count(*) >= 5; --> here

Is it possible to look at two consecutive rows and determine the difference in time between the two using SQL?

I am relatively new to SQL, so please bear with me! I am trying to see how many customers make a purchase after being dormant for two years. Relevant fields include cust_id and purchase_date (there can be several observations for the same cust_id but with different dates). I am using Redshift for my SQL scripts.
I realize I cannot put the same thing in for the DATEDIFF parameters (it just doesn't make any sense), but I am unsure what else to do.
SELECT *
FROM tickets t
LEFT JOIN d_customer c
ON c.cust_id = t.cust_id
WHERE DATEDIFF(year, t.purchase_date, t.purchase_date) between 0 and 2
ORDER BY t.cust_id, t.purchase_date
;
I think you want lag(). To get the relevant tickets:
SELECT t.*
FROM (SELECT t.*,
LAG(purchase_date) OVER (PARTITION BY cust_id ORDER BY purchase_date) as prev_pd
FROM tickets t
) t
WHERE prev_pd < purchase_date - interval '2 year';
If you want the number of customers, use count(distinct):
SELECT COUNT(DISTINCT cust_id)
FROM (SELECT t.*,
LAG(purchase_date) OVER (PARTITION BY cust_id ORDER BY purchase_date) as prev_pd
FROM tickets t
) t
WHERE prev_pd < purchase_date - interval '2 year';
Note that these queries do not use DATEDIFF(). DATEDIFF() counts the number of boundaries crossed between two date values, so 2018-12-31 and 2019-01-01 have a difference of 1 year even though they are only one day apart.
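The difference between boundary counting and a real two-year gap can be sketched outside the database. The Python below is only an illustration of the logic, not Redshift code: year_boundaries mimics the boundary counting described above, and returned_after_dormancy pairs each purchase with the previous one for the same customer, much as LAG() does within a partition. The helper names, the sample purchases, and the handling of 29 February are assumptions of this sketch.

```python
from datetime import date

def year_boundaries(start, end):
    """Count calendar-year boundaries crossed between two dates."""
    return end.year - start.year

def two_years_after(d):
    """Same month and day two years later; 29 Feb falls back to 28 Feb (sketch assumption)."""
    try:
        return d.replace(year=d.year + 2)
    except ValueError:
        return d.replace(year=d.year + 2, day=28)

def returned_after_dormancy(purchases):
    """Customers with a gap of more than two years between consecutive purchases."""
    by_customer = {}
    for cust_id, purchase_date in purchases:
        by_customer.setdefault(cust_id, []).append(purchase_date)
    dormant = set()
    for cust_id, dates in by_customer.items():
        dates.sort()
        # zip pairs each purchase with the one before it, like LAG() per customer
        for prev_pd, pd in zip(dates, dates[1:]):
            if two_years_after(prev_pd) < pd:
                dormant.add(cust_id)
    return dormant

# Hypothetical purchases: (cust_id, purchase_date)
purchases = [
    (1, date(2016, 1, 1)), (1, date(2018, 12, 31)),   # gap of almost three years
    (2, date(2018, 12, 31)), (2, date(2019, 1, 1)),   # one day apart
    (3, date(2017, 6, 1)), (3, date(2019, 6, 1)),     # exactly two years apart
]
dormant_customers = returned_after_dormancy(purchases)
print(year_boundaries(date(2018, 12, 31), date(2019, 1, 1)), dormant_customers)
```

Customer 2 crosses one year boundary but is not dormant, which is why the comparison against an interval is the safer test.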

Calculating business days in Teradata

I need help in business days calculation.
I've two tables
1) One table ACTUAL_TABLE containing order date and contact date with timestamp datatypes.
2) The second table BUSINESS_DATES has each of the calendar dates listed and has a flag to indicate weekend days.
Using these two tables, I need to calculate business days rather than calendar days (the current logic) between these two fields.
My thought process was to first get a range of dates by comparing ORDER_DATE with TABLE_DATE field and then do a similar comparison of CONTACT_DATE to TABLE_DATE field. This would get me a range from the BUSINESS_DATES table which I can then use to calculate count of days, sum(Holiday_WKND_Flag) fields making the result look like:
Order# | Count(*) As DAYS | SUM(WEEKEND DATES)
100 | 25 | 8
However, this only works when I use a specific order number; I can't bring all order numbers into a subquery.
My Query:
SELECT SUM(Holiday_WKND_Flag), COUNT(*) FROM
(
SELECT
* FROM
BUSINESS_DATES
WHERE BUSINESS.Business BETWEEN (SELECT ORDER_DATE FROM ACTUAL_TABLE
WHERE ORDER# = '100'
)
AND
(SELECT CONTACT_DATE FROM ACTUAL_TABLE
WHERE ORDER# = '100'
)
TEMP
Uploading the table structure for your reference.
SELECT ORDER#, SUM(Holiday_WKND_Flag), COUNT(*)
FROM business_dates bd
INNER JOIN actual_table at ON bd.table_date BETWEEN at.order_date AND at.contact_date
GROUP BY ORDER#
Instead of joining on a BETWEEN (which always results in a bad product join) followed by a COUNT, it is better to assign a business day number to each date (ideally calculated only once and stored as a column in your calendar table). Then it's two equi-joins and no aggregation is needed:
WITH cte AS
(
SELECT
Cast(table_date AS DATE) AS table_date,
-- assign a consecutive number to each business day, i.e. not increased during weekends, etc.
Sum(CASE WHEN Holiday_WKND_Flag = 1 THEN 0 ELSE 1 end)
Over (ORDER BY table_date
ROWS Unbounded Preceding) AS business_day_nbr
FROM business_dates
)
SELECT ORDER#,
Cast(t.contact_date AS DATE) - Cast(t.order_date AS DATE) AS #_of_days,
b2.business_day_nbr - b1.business_day_nbr AS #_of_business_days
FROM actual_table AS t
JOIN cte AS b1
ON Cast(t.order_date AS DATE) = b1.table_date
JOIN cte AS b2
ON Cast(t.contact_date AS DATE) = b2.table_date
Btw, why are table_date and order_date timestamp instead of a date?
Porting from Oracle?
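The running business day number is easier to see on a tiny calendar. The following Python sketch mirrors the cumulative sum from the CTE and the subtraction of the two looked-up numbers; the two-week calendar, the function name, and the sample order are hypothetical, and only weekends are flagged here (no holidays).

```python
from datetime import date, timedelta

def business_day_numbers(calendar):
    """Map each date to the running count of business days up to and including it."""
    numbers = {}
    running = 0
    for day, is_off in sorted(calendar):
        if not is_off:  # flagged days (weekends, holidays) do not increase the number
            running += 1
        numbers[day] = running
    return numbers

# Hypothetical calendar: two weeks starting Monday 2024-01-01, weekends flagged
start = date(2024, 1, 1)
days = [start + timedelta(days=i) for i in range(14)]
calendar = [(d, d.weekday() >= 5) for d in days]
nbr = business_day_numbers(calendar)

# One order: placed on a Thursday, contacted the following Tuesday
order_date, contact_date = date(2024, 1, 4), date(2024, 1, 9)
calendar_days = (contact_date - order_date).days
business_days = nbr[contact_date] - nbr[order_date]
print(calendar_days, business_days)
```

Because the number is precomputed per date, each order needs only two lookups, which corresponds to the two equi-joins in the query above.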
You can use this query. Hope it helps
select order#,
order_date,
contact_date,
(select count(1)
from business_dates
where table_date between a.order_date and a.contact_date
and holiday_wknd_flag = 0
) business_days
from actual_table a

SQL Count Query Using Non-Index Column

I have a query similar to this, where I need to find the number of transactions a specific customer had within a time frame:
select customer_id, count(transactions)
from transactions
where customer_id = 'FKJ90838485'
and purchase_date between '01-JAN-13' and '31-AUG-13'
group by customer_id
The table transactions is not indexed on customer_id but rather another field called transaction_id. Customer_ID is character type while transaction_id is numeric.
The 'accounting_month' field is also indexed. This field just stores the month in which the transaction occurred, i.e. purchase_date = '03-MAR-13' would have accounting_month = '01-MAR-13'.
The transactions table has about 20 million records in the time frame from '01-JAN-13' to '31-AUG-13'.
When I run the above query, it takes more than 40 minutes to come back. Any ideas or tips?
As others have already commented, the best option is to add an index that covers the query:
Contact the database administrator and request an index on (customer_id, purchase_date), because otherwise the query does a table scan.
Side notes:
Use date literals, not string literals (you may know that and do it already; it is still noted here for future readers).
You don't have to put customer_id in the SELECT list, and if you remove it from there, it can be removed from the GROUP BY as well, so the query becomes:
select count(*) as number_of_transactions
from transactions
where customer_id = 'FKJ90838485'
and purchase_date between DATE '2013-01-01' and DATE '2013-08-31' ;
If you don't have a WHERE condition on customer_id, you can have it in the GROUP BY and the SELECT list to write a query that will count number of transactions for every customer. And the above suggested index will help this, too:
select customer_id, count(*) as number_of_transactions
from transactions
where purchase_date between DATE '2013-01-01' and DATE '2013-08-31'
group by customer_id ;
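The effect of the suggested index can be observed on a toy scale. The sketch below uses SQLite through Python's sqlite3 module as a stand-in, since the original question is about a different database whose optimizer and plan output will differ; the index name idx_cust_date and the reduced table definition are assumptions. It compares the query plan before and after creating the index on (customer_id, purchase_date).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE transactions (
        transaction_id INTEGER PRIMARY KEY,
        customer_id    TEXT,
        purchase_date  TEXT
    )
""")

query = """
    SELECT COUNT(*)
    FROM transactions
    WHERE customer_id = 'FKJ90838485'
      AND purchase_date BETWEEN '2013-01-01' AND '2013-08-31'
"""

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable plan step
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

plan_before = plan(query)  # no usable index yet: full scan of the table
conn.execute("CREATE INDEX idx_cust_date ON transactions (customer_id, purchase_date)")
plan_after = plan(query)   # the new index can serve both predicates

print(plan_before)
print(plan_after)
```

The exact plan text varies between SQLite versions, so the useful signal is simply whether the index name appears in the plan after it is created.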
This is just an idea that occurred to me. It might work; try running it and see if it is an improvement over what you currently have.
I'm trying to use the transaction_id, which you've said is indexed, as much as possible.
WITH min_transaction (tran_id)
AS (
SELECT MIN(transaction_ID)
FROM TRANSACTIONS
WHERE
CUSTOMER_ID = 'FKJ90838485'
AND purchase_date >= '01-JAN-13'
), max_transaction (tran_id)
AS (
SELECT MAX(transaction_ID)
FROM TRANSACTIONS
WHERE
CUSTOMER_ID = 'FKJ90838485'
AND purchase_date <= '31-AUG-13'
)
SELECT customer_id, count(transaction_id)
FROM transactions
WHERE
CUSTOMER_ID = 'FKJ90838485'
AND transaction_id BETWEEN (SELECT tran_id FROM min_transaction)
AND (SELECT tran_id FROM max_transaction)
GROUP BY customer_ID
Maybe this will run faster, since it looks at transaction_id for the range instead of purchase_date. It also takes into account that accounting_month is indexed. Note that it only returns the right count if transaction_id values increase in chronological order:
select customer_id, count(*)
from transactions
where customer_id = 'FKJ90838485'
and transaction_id between (select min(transaction_id)
from transactions
where accounting_month = '01-JAN-13'
) and
(select max(transaction_id)
from transactions
where accounting_month = '01-AUG-13'
)
group by customer_id
Maybe you can also try:
select customer_id, count(*)
from transactions
where customer_id = 'FKJ90838485'
and accounting_month between '01-JAN-13' and '01-AUG-13'
group by customer_id