SQL SELECT data from table group by different data - sql

Display pymtmode, and total number of payments for those payments which were paid before the year 2015 and total number of payments should be more than 1 from the data given:
ORDERID QUOTATIONID QTYORDERED ORDERDATE STATUS PYMTDATE DELIVEREDDATE AMOUNTPAID PYMTMODE
O1001 Q1002 100 30-OCT-14 Delivered 05-NOV-14 05-NOV-14 140000 Cash
O1003 Q1003 50 15-DEC-14 Delivered 18-DEC-14 20-DEC-14 310000 Cash
O1004 Q1006 100 15-DEC-14 Delivered 25-DEC-14 30-DEC-14 80000 Cheque
O1005 Q1002 50 30-JAN-15 Delivered 01-FEB-15 03-FEB-15 70000 Cheque
O1006 Q1008 75 20-FEB-15 Delivered 22-FEB-15 23-FEB-15 161250 Cash
I've tried the below code for fetching Year and select only values before the year 2015 and grouping by year.
SELECT pymtmode, COUNT(*) as pymtcount
FROM orders
GROUP BY to_char(pymtdate, 'Year')
HAVING to_char(pymtdate,'Year')<2015 AND count(*)>1
I've learnt that GROUP BY columns/functions should be mentioned in the SELECT statement as well. But this question and its expected result don't seem to fit that rule. Clarity with basic explanations would help.
Expected Result
PYMTMODE PYMTCOUNT
Cash 2
Thanks!

Your expected result needs pymtmode to be selected, so you must GROUP BY pymtmode and not GROUP BY to_char(pymtdate, 'Year'), because you don't need results for each year, right?
Also, the year condition belongs in a WHERE clause, so that rows are restricted before aggregation:
SELECT pymtmode, COUNT(*) as pymtcount
FROM orders
WHERE EXTRACT(YEAR FROM pymtdate) < 2015
GROUP BY pymtmode
HAVING count(*) > 1

I strongly recommend using direct date comparisons for this purpose:
SELECT o.pymtmode, COUNT(*) as pymtcount
FROM orders o
WHERE o.pymtdate < DATE '2015-01-01'
GROUP BY o.pymtmode
HAVING COUNT(*) > 1;
Notes:
You want to do the filtering before the aggregation, not afterwards. I think that is the source of your confusion.
A direct comparison to dates makes it easier for the optimizer to generate the best execution plan.
Learning how to use table aliases (the o) is a good habit.
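To check the filter-then-aggregate order against the sample data, here is a minimal sketch that runs the same query shape in SQLite through Python's sqlite3 module. SQLite stands in for the asker's database here (an assumption), the dates are stored as ISO strings, and the function name modes_paid_before is hypothetical.

```python
import sqlite3

# Sample rows from the question (orderid, pymtdate, pymtmode).
# Dates are ISO strings so that plain string comparison sorts them correctly.
ROWS = [
    ("O1001", "2014-11-05", "Cash"),
    ("O1003", "2014-12-18", "Cash"),
    ("O1004", "2014-12-25", "Cheque"),
    ("O1005", "2015-02-01", "Cheque"),
    ("O1006", "2015-02-22", "Cash"),
]


def modes_paid_before(cutoff):
    """Return (pymtmode, count) for modes with more than one payment before cutoff."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (orderid TEXT, pymtdate TEXT, pymtmode TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(
        """
        SELECT pymtmode, COUNT(*) AS pymtcount
        FROM orders
        WHERE pymtdate < ?          -- row filter runs before grouping
        GROUP BY pymtmode
        HAVING COUNT(*) > 1         -- group filter runs after aggregation
        ORDER BY pymtmode
        """,
        (cutoff,),
    ).fetchall()
    conn.close()
    return rows


result = modes_paid_before("2015-01-01")
print(result)
```

With the cutoff at 2015-01-01, only the Cash group keeps more than one row, which matches the expected result in the question.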

select pymtmode, count(pymtmode) pymtcount
from orders
where extract(year from pymtdate) < 2015
group by pymtmode
having count(pymtmode) > 1;

Related

Retrieve Customers with a Monthly Order Frequency greater than 4

I am trying to optimize the query below to fetch all customers who have had a monthly order frequency of 4 or more in each of the past three months.
Customer_ID Feb Mar Apr
0001 4 5 6
0002 3 2 4
0003 4 2 3
In the above table, the customer with Customer ID 0001 should only be picked, as he consistently has 4 or more orders in a month.
Below is a query I have written. It pulls all customers with an average purchase frequency of 4 over the last 90 days, but it does not check that the customer consistently placed 4 or more orders in each of the last three months.
Query:
SELECT distinct lines.customer_id Customer_ID, (COUNT(lines.order_id)/90) PurchaseFrequency
from fct_customer_order_lines lines
LEFT JOIN product_table product
ON lines.entity_id= product.entity_id
AND lines.vendor_id= product.vendor_id
WHERE LOWER(product.country_code)= "IN"
AND lines.date >= DATE_SUB(CURRENT_DATE() , INTERVAL 90 DAY )
AND lines.date < CURRENT_DATE()
GROUP BY Customer_ID
HAVING PurchaseFrequency >=4;
I tried to use window functions, however not sure if it needs to be used in this case.
I would sum the orders per month instead of computing the avg and then retrieve those who have that sum greater than 4 in the last three months.
Also I think you should select your interval using "month(CURRENT_DATE()) - 3" instead of using a window of 90 days. Of course if needed you should handle the case of when current_date is jan-feb-mar and in that case go back to oct-nov-dec of the previous year.
I'm not familiar with Google BigQuery so I can't write your query but I hope this helps.
So I've found the solution to this using a WITH clause, as below:
WITH filtered_orders AS (
  select
    customer_id ID,
    extract(MONTH from date) Order_Month,
    count(order_id) CountofOrders
  from customer_order_lines lines
  where EXTRACT(YEAR FROM date) = 2022 AND EXTRACT(MONTH FROM date) IN (2,3,4)
  group by ID, Order_Month
  having CountofOrders >= 4)
select ID
from filtered_orders
group by ID
having count(Order_Month) = 3;
Hope this helps!
An option could be first count the orders by month and then filter users which have purchases on all months above your threshold:
WITH ORDERS_BY_MONTH AS (
SELECT
DATE_TRUNC(lines.date, MONTH) PurchaseMonth,
lines.customer_id Customer_ID,
COUNT(lines.order_id) PurchaseFrequency
FROM fct_customer_order_lines lines
LEFT JOIN product_table product
ON lines.entity_id= product.entity_id
AND lines.vendor_id= product.vendor_id
WHERE UPPER(product.country_code) = "IN"
AND lines.date >= DATE_SUB(CURRENT_DATE() , INTERVAL 90 DAY )
AND lines.date < CURRENT_DATE()
GROUP BY PurchaseMonth, Customer_ID
)
SELECT
Customer_ID,
AVG(PurchaseFrequency) AvgPurchaseFrequency
FROM ORDERS_BY_MONTH
GROUP BY Customer_ID
HAVING COUNT(1) = COUNTIF(PurchaseFrequency >= 4)
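The count-per-month-then-filter idea can be tried out on the sample table from the question. The sketch below is an illustration only: it uses SQLite through Python's sqlite3 module instead of BigQuery, a hypothetical orders table with one row per order, and MIN(n) >= threshold in place of COUNTIF, which SQLite does not have. It also requires a row for every one of the three months, so a customer with no orders in one month is excluded.

```python
import sqlite3

# Orders per customer for Feb, Mar, Apr 2022, taken from the question's table.
MONTHLY_ORDERS = {"0001": [4, 5, 6], "0002": [3, 2, 4], "0003": [4, 2, 3]}
MONTHS = ["2022-02", "2022-03", "2022-04"]


def consistent_customers(threshold=4):
    """Customers with at least `threshold` orders in every one of MONTHS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id TEXT, order_date TEXT)")
    # Expand the monthly counts into one row per order.
    for customer, counts in MONTHLY_ORDERS.items():
        for month, n in zip(MONTHS, counts):
            conn.executemany(
                "INSERT INTO orders VALUES (?, ?)",
                [(customer, f"{month}-{day:02d}") for day in range(1, n + 1)],
            )
    rows = conn.execute(
        """
        WITH by_month AS (
            SELECT customer_id,
                   substr(order_date, 1, 7) AS order_month,
                   COUNT(*) AS n
            FROM orders
            GROUP BY customer_id, substr(order_date, 1, 7)
        )
        SELECT customer_id
        FROM by_month
        GROUP BY customer_id
        HAVING COUNT(*) = ?      -- active in every month of the window
           AND MIN(n) >= ?       -- weakest month still meets the threshold
        ORDER BY customer_id
        """,
        (len(MONTHS), threshold),
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]


print(consistent_customers(4))
```

Only customer 0001 meets the threshold of 4 in all three months, as the question expects.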

Filter Context and Group by in SQL

Consider a table with 2 columns that gets a new row every time a user signs in. (It has approximately 10M rows and can contain duplicates, as users can sign in multiple times per month.)
Sign_in_Month MemberID
2020-10 1000000
2020-12 1000001
Now to find out the unique user each month I can use the following query.
Select Sign_in_Month, count(distinct(memberid)) as unique_users from table group by Sign_in_Month
But I want to break the distinct count from the first query further into 2 cohorts: users who signed in during the previous 3 months and users who did not. (For October 2020 that would be July-September 2020, and for September 2020 it would be June-August 2020.)
I used to do this in powerBI through DAX which was easy with filter context but I am not sure how to implement this in SQL.
Desired result
Sign in Month | Total Unique Users | Unique Users who signed in in the last 3 months | Unique Users who did not sign in in the last 3 months
2020-10 | 4000 | 3000 | 1000
2020-11 | 5000 | 2500 | 2500
2020-12 | 3500 | 1500 | 2000
The last 2 column should add up to the second column.
How do I create some sort of indicator as to where the member has shopped in the last 3 month in reference to that particular month?
Thanks
I would approach this as follows:
Get a unique value per month.
Use lag() to see the previous month when someone logged in.
Compare the time difference.
For convenience, I would convert the sign_in_month to a date:
select sign_in_date,
       count(*) as num_members,
       -- previous 3 months: for 2020-10 this means 2020-07 onwards
       sum(case when prev_sign_in_date >= dateadd(month, -3, sign_in_date)
                then 1 else 0
           end) as num_members_who_signed_in
from (select t.memberId, v.sign_in_date,
             lag(v.sign_in_date) over (partition by t.memberId order by v.sign_in_date) as prev_sign_in_date
      from t cross apply
           (values (convert(date, t.sign_in_month + '-01'))
           ) v(sign_in_date)
      group by t.memberId, v.sign_in_date
     ) t
group by sign_in_date;
Note that count(distinct) is no longer needed because the subquery takes care of that.
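The three steps (one row per member and month, lag() to find the previous sign-in month, then a comparison) can be sketched end to end as follows. This is an illustration under assumptions: SQLite (3.25 or later, for window functions) via Python's sqlite3 instead of SQL Server, a hypothetical sign_ins table, made-up sample rows, and a month index (year * 12 + month) in place of date arithmetic.

```python
import sqlite3

# Made-up sign-in rows (sign_in_month, member_id); duplicates are allowed.
SIGN_INS = [
    ("2020-07", 1), ("2020-10", 1), ("2020-10", 1), ("2020-11", 1),
    ("2020-10", 2), ("2020-11", 2),
    ("2020-05", 3), ("2020-10", 3),
]

QUERY = """
WITH monthly AS (          -- step 1: one row per member and month
    SELECT DISTINCT member_id, sign_in_month,
           CAST(substr(sign_in_month, 1, 4) AS INTEGER) * 12
           + CAST(substr(sign_in_month, 6, 2) AS INTEGER) AS idx
    FROM sign_ins
),
with_prev AS (             -- step 2: previous month in which the member signed in
    SELECT member_id, sign_in_month, idx,
           LAG(idx) OVER (PARTITION BY member_id ORDER BY idx) AS prev_idx
    FROM monthly
)
SELECT sign_in_month,      -- step 3: compare the gap in months
       COUNT(*) AS total_unique,
       SUM(CASE WHEN idx - prev_idx <= 3 THEN 1 ELSE 0 END) AS seen_recently,
       SUM(CASE WHEN prev_idx IS NULL OR idx - prev_idx > 3 THEN 1 ELSE 0 END) AS not_seen_recently
FROM with_prev
GROUP BY sign_in_month
ORDER BY sign_in_month
"""


def cohort_counts():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sign_ins (sign_in_month TEXT, member_id INTEGER)")
    conn.executemany("INSERT INTO sign_ins VALUES (?, ?)", SIGN_INS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


cohorts = cohort_counts()
for row in cohorts:
    print(row)
```

Because the previous sign-in month is always the most recent one, a gap of at most 3 months is enough to decide the cohort, and the last two columns always add up to the total.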
You can try this below script-
Note: syntax is for MSSQL, but the logic is global
select
sign_in_month,
count(distinct MemberId) as total_unique_user,
count(
CASE
WHEN CAST(sign_in_month+'-01' as Date) >= DATEADD(mm,-4,getdate()) then MemberId
else null
end
) as last_3_month,
count(
CASE
WHEN CAST(sign_in_month+'-01' as Date) < DATEADD(mm,-4,getdate()) then MemberId
else null
end
) as before_3_month
from your_table_name
GROUP BY sign_in_month

How select all data using HAVING clause in WHERE condition?

I have created a table which keeps records of which product is sold by whom and how much each month:
month total product cons_name
2020-01 10 xyz 123
2020-02 5 abc 456
2020-02 4 def 789
I was writing a query against this table to find out who has sold more than 500 units of certain products since the beginning of the year, but I got confused while writing it. I don't just need the yearly total per seller; I need the monthly rows for the sellers who qualify. I can easily find sellers with more than 500 sales in total during a year with this query:
SELECT cons_name, product, SUM(total)
FROM TMP_REPORT
WHERE product IN ('abc','xyz')
GROUP BY cons_name, product
HAVING sum(total) > 500
But when it came to querying in detail I got this far:
SELECT
month,
product,
cons_name,
total
FROM TMP_REPORT
WHERE product IN ('abc','xyz')
AND cons_name IN
(SELECT
cons_name
FROM TMP_REPORT
WHERE
product IN ('abc','xyz')
GROUP BY
cons_name
HAVING sum(total) > 500)
The result of this query included seller/product combinations whose total is not above 500. For example, we would expect cons_name '123' not to appear for product 'abc', of which only 200 were sold in a year, but it does appear because the subquery in the WHERE clause only filters by cons_name. I understand my mistake but I don't know how to fix it.
How can I do it?
Thanks for your help.
One approach uses SUM as an analytic function:
WITH cte AS (
SELECT t.*, SUM(total) OVER (PARTITION BY cons_name, product) sum_total
FROM TMP_REPORT t
WHERE product IN ('abc', 'xyz')
)
SELECT *
FROM cte
WHERE sum_total > 500;
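To see how the analytic SUM keeps the monthly rows while filtering on the yearly total, here is a minimal sketch. It assumes SQLite (3.25 or later) via Python's sqlite3 rather than the asker's database, and the sample figures are made up so that seller '123' passes the threshold for 'xyz' but not for 'abc'.

```python
import sqlite3

# Made-up rows (month, total, product, cons_name).
REPORT = [
    ("2020-01", 300, "xyz", "123"),
    ("2020-02", 300, "xyz", "123"),
    ("2020-01", 200, "abc", "123"),   # 200 in total: should be filtered out
    ("2020-01", 250, "abc", "456"),
    ("2020-02", 300, "abc", "456"),
    ("2020-01", 900, "def", "789"),   # product not in the IN list
]


def monthly_rows_over(limit):
    """Monthly rows for seller/product pairs whose yearly total exceeds limit."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tmp_report (month TEXT, total INTEGER, product TEXT, cons_name TEXT)"
    )
    conn.executemany("INSERT INTO tmp_report VALUES (?, ?, ?, ?)", REPORT)
    rows = conn.execute(
        """
        WITH cte AS (
            SELECT t.*,
                   SUM(total) OVER (PARTITION BY cons_name, product) AS sum_total
            FROM tmp_report t
            WHERE product IN ('abc', 'xyz')
        )
        SELECT month, product, cons_name, total
        FROM cte
        WHERE sum_total > ?
        ORDER BY cons_name, product, month
        """,
        (limit,),
    ).fetchall()
    conn.close()
    return rows


detail = monthly_rows_over(500)
for row in detail:
    print(row)
```

The partition is on both cons_name and product, which is what the asker's IN subquery was missing.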
I couldn't clearly understand your requirement. I assume you need a query to fetch, on a monthly basis, the products that sold more than 500 units in total. I hope the query below fetches the records you need.
-- YearlyReportCTE will Qualify the people who sold more than 500 units in total (i.e. yearly from your statement)
WITH YearlyReportCTE AS (
SELECT
product,
cons_name,
SUM(total) AS Total
FROM #TMP_REPORT
WHERE product in ('abc','xyz')
GROUP BY product,cons_name
HAVING SUM(total) > 500
)
-- This query fetches the month-wise report for the qualified records
SELECT month,
TR.product,
TR.cons_name,
SUM(TR.total) AS Total
FROM #TMP_REPORT TR
JOIN YearlyReportCTE YR ON YR.cons_name = TR.cons_name
AND YR.product = TR.product
GROUP BY month,TR.product,TR.cons_name

Finding when requests are met or exceeded by customer by month

I have a table that has customers and I want to find what month the customer met or exceeded a certain number of requests.
The table has a customer_id and a timestamp for each request.
What I am looking for is the month (or day) that the customer met or exceeded 10000 requests. I've tried to get a running total in place but this just isn't working for me. I've left it in the code in case someone knows how I can do this.
What I have is the following:
SELECT
customer_id
, DATE_TRUNC(CAST(TIMESTAMP_MILLIS(created_timestamp) AS DATE), MONTH) as cMonth
, COUNT(created_timestamp) as searchCount
-- , SUM(COUNT (DISTINCT(created_timestamp))) OVER (ROWS UNBOUNDED PRECEDING) as RunningTotal2
FROM customer_requests.history.all
GROUP BY distributor_id, cMonth
ORDER BY 2 ASC, 1 DESC;
The representation I am after is something like this.
customer requests cMonth totalRequests
cust1 6000 2017-10-01 6000
cust1 4001 2017-11-01 10001
cust2 4000 2017-10-01 4000
cust2 4000 2017-11-01 8000
cust2 4000 2017-12-01 12000
cust2 3000 2017-12-01 3000
cust2 3000 2017-12-01 6000
cust2 3000 2017-12-01 9000
cust2 3000 2017-12-01 12000
Assuming SQL Server, try this (adjusting the cutoff at the top to get the number of transactions you need; right now it looks for the thousandth transaction per customer).
Note that this will not return customers who have not exceeded your cutoff, and assumes that each transaction has a unique date (or is issued a sequential ID number to break ties if there can be ties on date).
DECLARE @cutoff INT = 1000;
WITH CTE
AS (SELECT customer_id,
transaction_ID,
transaction_date,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY transaction_date, transaction_ID) AS RN,
COUNT(transaction_ID) OVER (PARTITION BY customer_id) AS TotalTransactions
FROM #test)
SELECT DISTINCT
customer_id,
transaction_date as CutoffTransactionDate,
TotalTransactions
FROM CTE
WHERE RN = @cutoff;
How it works:
row_number assigns a unique sequential identifier to each of a customer's transactions, in the order in which they were made. count tells you the total number of transactions a person made (assuming again one record per transaction - otherwise you would need to calculate this separately, since distinct won't work with the partition).
Then the second select returns the 1,000th (or however many you specify) row for each customer and its date, along with the total for that customer.
this is my solution.
SELECT
customerid
,SUM(requests) sumDay
,created_timestamp
FROM yourTable
GROUP BY
customerid,
created_timestamp
HAVING SUM(requests) >= 10000;
It's pretty simple. You just group according to your needs, sum up the requests and select the rows that meet your HAVING clause.
If you want a cumulative sum, you can use window functions. In Standard SQL, this looks like:
SELECT customer_id,
       DATE_TRUNC(CAST(TIMESTAMP_MILLIS(created_timestamp) AS DATE), MONTH) as cMonth,
       COUNT(*) as searchCount,
       SUM(COUNT(*)) OVER (PARTITION BY customer_id ORDER BY MIN(created_timestamp)) as runningtotal
FROM customer_requests.history.all
GROUP BY customer_id, cMonth
ORDER BY 2 ASC, 1 DESC;
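Once a per-customer running total exists, the month in which a customer first met or exceeded the threshold is the earliest month whose running total reaches it. A minimal sketch, assuming SQLite (3.25 or later) via Python's sqlite3 instead of BigQuery, and a hypothetical pre-aggregated monthly_requests table filled with the first five rows of the question's example:

```python
import sqlite3

# (customer_id, month, requests) from the question's desired representation.
MONTHLY = [
    ("cust1", "2017-10-01", 6000),
    ("cust1", "2017-11-01", 4001),
    ("cust2", "2017-10-01", 4000),
    ("cust2", "2017-11-01", 4000),
    ("cust2", "2017-12-01", 4000),
]


def first_month_reaching(threshold):
    """Earliest month per customer where the running total is >= threshold."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE monthly_requests (customer_id TEXT, month TEXT, requests INTEGER)"
    )
    conn.executemany("INSERT INTO monthly_requests VALUES (?, ?, ?)", MONTHLY)
    rows = conn.execute(
        """
        WITH running AS (
            SELECT customer_id, month,
                   SUM(requests) OVER (
                       PARTITION BY customer_id ORDER BY month
                   ) AS running_total
            FROM monthly_requests
        )
        SELECT customer_id, MIN(month) AS first_month
        FROM running
        WHERE running_total >= ?
        GROUP BY customer_id
        ORDER BY customer_id
        """,
        (threshold,),
    ).fetchall()
    conn.close()
    return rows


crossed = first_month_reaching(10000)
print(crossed)
```

Customers who never reach the threshold simply do not appear in the result.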

SQL query to identify records that are effective during a certain interval

I am building a SQL query that can be used to fetch records whose effective dates falls between a given start/end date time with a catch that the effective date is valid forever until there is another record with a different effective date. For example, I am showing a price table where the product prices are effective based on certain dates
ProductID EffectiveDate Price
Milk 01/01/2012 3
Milk 02/01/2012 2.85
Milk 03/01/2012 3.1
Milk 03/15/2012 3.4
Milk 04/01/2012 3.2
If my start/end date time is 03/01 and 03/31, then I want both the records that are dated 03/01 and 03/15, but if my start/end is 03/20 and 03/31, then I need to get just the record 03/15
The query I have works; however, I am trying to see if there is a more efficient way of getting the desired results.
SELECT productid,
effectivedate,
price
FROM product p
WHERE ( p.effectivedate >= '03/20/2012'
AND p.effectivedate <= '03/31/2012' )
OR p.effectivedate = (SELECT TOP 1 pp.[effectivedate]
FROM product pp
WHERE pp.effectivedate <= '03/20/2012'
AND pp.productid = p.productid
ORDER BY pp.effectivedate DESC)
The reason I am looking to improve it is that the table can get much bigger. The product table here is just an example; the final table has many more columns.
Thanks for any suggestions.
For a structure like this, I think it is best to add in the end dates and then work from there:
with BetterDataStructure as (
      select p.*,
             (select min(p2.EffectiveDate) from product p2 where p2.productId = p.productId and p2.EffectiveDate > p.EffectiveDate
             ) as EndDate
      from product p
     )
select *
from BetterDataStructure bds
where bds.EffectiveDate <= '2012-03-31' and
      (bds.EndDate > '2012-03-01' or bds.EndDate is NULL);
If you are using SQL Server 2012, then you can use the lead() function instead of a correlated subquery.
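As a rough illustration of the lead() variant, the sketch below derives each row's end date from the next effective date and then applies an interval-overlap test. It assumes SQLite (3.25 or later) via Python's sqlite3 rather than SQL Server, ISO-formatted date strings, and a hypothetical helper name effective_between.

```python
import sqlite3

# Price rows from the question, with dates rewritten as ISO strings.
PRICES = [
    ("Milk", "2012-01-01", 3.0),
    ("Milk", "2012-02-01", 2.85),
    ("Milk", "2012-03-01", 3.1),
    ("Milk", "2012-03-15", 3.4),
    ("Milk", "2012-04-01", 3.2),
]


def effective_between(start, end):
    """Rows whose validity period overlaps the inclusive range [start, end]."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product (productid TEXT, effectivedate TEXT, price REAL)")
    conn.executemany("INSERT INTO product VALUES (?, ?, ?)", PRICES)
    rows = conn.execute(
        """
        WITH ranged AS (
            SELECT productid, effectivedate, price,
                   -- a row stays valid until the next row for the same product
                   LEAD(effectivedate) OVER (
                       PARTITION BY productid ORDER BY effectivedate
                   ) AS enddate
            FROM product
        )
        SELECT effectivedate, price
        FROM ranged
        WHERE effectivedate <= ?
          AND (enddate > ? OR enddate IS NULL)
        ORDER BY effectivedate
        """,
        (end, start),
    ).fetchall()
    conn.close()
    return rows


print(effective_between("2012-03-01", "2012-03-31"))
print(effective_between("2012-03-20", "2012-03-31"))
```

The first call returns the 03/01 and 03/15 rows, and the second returns only the 03/15 row, matching the two cases described in the question.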
Something like this should work:
select somefields, max(effectivedate) maxdate
from product
where effectivedate >= @startdate
and effectivedate < @TheDayAfterEndDate
and not exists
(select *
from product
where effectivedate > @EndDate)
group by somefields