Count transactions within a month only once - sql

I have a situation like the one below:
I have two database tables. The first table, which I will call TB1, contains all the salaries credited to the client, along with the date each transaction was made.
The second table, which I will call TB2, contains all the products the client has in the bank.
My goal is to find the number of salaries the client received before the date he/she got a product (OVERDRAFT in my case) from our bank.
So far everything works fine, and I have written the query to extract the necessary data.
The only problem is that I need to improve the query: if a client has received more than one salary (for example every 15 days) within the same month of the same year, the salary should be counted only once.
How can I do that, please?
The query is like below:
SELECT TB1.customer_id, COUNT(TB1.customer_id)
FROM table_1 TB1
JOIN
( SELECT TB2.CUSTOMER_ID, TB2.OD_START_DATE
FROM table_2 TB2
JOIN table_2 TB2_MAX
ON TB2.CUSTOMER_ID = TB2_MAX.CUSTOMER_ID
HAVING TB2.od_start_date = MAX(TB2.od_start_date)
GROUP BY TB2.customer_id, TB2.od_start_date
) TB2
ON TB1.CUSTOMER_ID = TB2.CUSTOMER_ID
WHERE TB1.DATE_FROM < TB2.OD_START_DATE
GROUP BY TB1.CUSTOMER_ID
PS: DATE_FROM field contains the date when the transaction is made, while OD_START_DATE field contains the date when the LATEST product is opened.

The JOIN in your inner query is redundant. You simply need the MAX date for each customer.
In your outer query you should count DATE_FROM, not customer_id. Since transactions within the same month should be counted only once, convert DATE_FROM to a year-month combination and use DISTINCT so that each month is counted once.
SELECT TB1.customer_id, COUNT(DISTINCT TO_CHAR(TB1.DATE_FROM,'YYYYMM'))
FROM table_1 TB1
JOIN
( SELECT CUSTOMER_ID, MAX(OD_START_DATE) AS OD_START_DATE
FROM table_2
GROUP BY customer_id
) TB2
ON TB1.CUSTOMER_ID = TB2.CUSTOMER_ID
WHERE TB1.DATE_FROM < TB2.OD_START_DATE
GROUP BY TB1.CUSTOMER_ID
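To see the COUNT(DISTINCT year-month) idea in action, here is a minimal runnable sketch. It uses Python's built-in sqlite3 module as a stand-in for Oracle, so strftime('%Y%m', ...) replaces TO_CHAR(..., 'YYYYMM'); the sample rows are made up for illustration, and dates are stored as ISO text (an assumption, not the asker's schema).

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle tables (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_1 (customer_id INTEGER, date_from TEXT);
CREATE TABLE table_2 (customer_id INTEGER, od_start_date TEXT);
INSERT INTO table_1 VALUES
  (1, '2021-01-05'), (1, '2021-01-20'),  -- two salaries in the same month
  (1, '2021-02-05'),
  (1, '2021-04-05'),                     -- after the latest overdraft, ignored
  (2, '2021-01-10');
INSERT INTO table_2 VALUES
  (1, '2021-02-01'), (1, '2021-03-15'),  -- latest overdraft: 2021-03-15
  (2, '2021-02-01');
""")

query = """
SELECT tb1.customer_id,
       COUNT(DISTINCT strftime('%Y%m', tb1.date_from)) AS salary_months
FROM table_1 tb1
JOIN (SELECT customer_id, MAX(od_start_date) AS od_start_date
      FROM table_2
      GROUP BY customer_id) tb2
  ON tb1.customer_id = tb2.customer_id
WHERE tb1.date_from < tb2.od_start_date
GROUP BY tb1.customer_id
ORDER BY tb1.customer_id
"""
salary_months = conn.execute(query).fetchall()
print(salary_months)  # customer 1 has 3 salaries before the overdraft but only 2 distinct months
```

Including the year in the format string matters: grouping on the month alone would merge January 2020 with January 2021.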

Related

Best approach to display all the users who have more than 1 purchases in a month in SQL

I have two tables in an Oracle Database, one of which holds all the purchases made by all the customers over many years (purchase_logs). It has a unique purchase_id that is paired with a customer_id. The other table contains the user info of all the customers. Both have a common key of customer_id.
I want to display the user info of customers who have purchased more than 1 unique item (NOT the item quantity) in any month (i.e. a customer who bought 4 unique items in February 2020 would be valid, as would someone who bought 2 items in June). I was wondering what the correct approach would be, and also how to correctly execute that approach.
The two approaches that I can see are
Approach 1
Count the overall number of purchases made by each customer, filter the ones that are greater than 1, and then check if any of them were made within the same month.
Use this as a subquery in the where clause of the main query for retrieving the customer info for all the customer_id which match this condition.
This is what I've done so far; it retrieves the customer IDs of all the customers who have more than 1 purchase in total. But I do not understand how to filter out the purchases that did not occur in a single arbitrary month.
SELECT * FROM customer_details
WHERE customer_id IN (
SELECT cust_id from purchase_logs
group by cust_id
having count(*) >= 2);
Approach 2
Create a temporary table to Count the number of monthly purchases of a specific user_id then find the MAX() of the whole table and check if that MAX value is bigger than 1 or not. Then if it is provide it as true for the main query's where clause for the customer_info.
Approach 2 feels like the more logical option but I cannot seem to understand how to write the proper subquery for it as the command MAX(COUNT(customer_id)) from purchase_logs does not seem to be a valid query.
[Images not reproduced here: the DDL diagram, the sample data for Purchase_logs, Customer_info and Item_info, and the expected output for this sample data.]
It is certainly possible that there is a simpler approach that I am not seeing right now.
Would appreciate any suggestions and tips on this.
You need this query:
SELECT DISTINCT cust_id
FROM purchase_logs
GROUP BY cust_id, TO_CHAR(purchase_date, 'YYYY-MON')
HAVING COUNT(DISTINCT item_id) > 1;
to get the cust_ids of all the customers who purchased more than 1 unique item in any month, and you can then use it with the IN operator:
SELECT *
FROM customer_details
WHERE customer_id IN (
SELECT DISTINCT cust_id -- here DISTINCT may be removed as it does not make any difference when the result is used with IN
FROM purchase_logs
GROUP BY cust_id, TO_CHAR(purchase_date, 'YYYY-MON')
HAVING COUNT(DISTINCT item_id) > 1
);
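A small runnable sketch of this GROUP BY ... HAVING COUNT(DISTINCT item_id) pattern follows. It runs on SQLite through Python's sqlite3 module rather than Oracle, so strftime('%Y-%m', ...) stands in for TO_CHAR(..., 'YYYY-MON'); the table contents and the name column are invented for illustration.

```python
import sqlite3

# SQLite stand-in for the Oracle schema; sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_details (customer_id INTEGER, name TEXT);
CREATE TABLE purchase_logs (purchase_id INTEGER, cust_id INTEGER,
                            item_id INTEGER, purchase_date TEXT);
INSERT INTO customer_details VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO purchase_logs VALUES
  (10, 1, 100, '2020-02-03'), (11, 1, 101, '2020-02-20'),  -- 2 items, same month
  (12, 2, 100, '2020-02-03'), (13, 2, 100, '2020-02-25'),  -- same item twice
  (14, 3, 100, '2020-02-03'), (15, 3, 101, '2020-03-03');  -- different months
""")

query = """
SELECT * FROM customer_details
WHERE customer_id IN (
  SELECT cust_id FROM purchase_logs
  GROUP BY cust_id, strftime('%Y-%m', purchase_date)
  HAVING COUNT(DISTINCT item_id) > 1)
ORDER BY customer_id
"""
multi_item_customers = conn.execute(query).fetchall()
print(multi_item_customers)
```

Only Ann qualifies: Bob bought the same item twice, and Cy's two items fall in different months.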
One approach might be to try
with multiplepurchase as (
select customer_id, extract(year from purchasedate) as purchase_year, extract(month from purchasedate) as purchase_month, count(*) as order_count
from purchase_logs
group by customer_id, extract(year from purchasedate), extract(month from purchasedate)
having count(*)>=2)
select distinct a.customer_id, username, usercategory
from multiplepurchase a
left join userinfo b
on a.customer_id=b.customer_id
Expanding on @MT0's answer:
SELECT *
FROM customer_details CD
WHERE exists (
SELECT cust_id
FROM purchase_logs PL
where CD.customer_id = PL.cust_id
GROUP BY cust_id, item_id, to_char(purchase_date,'YYYYMM')
HAVING count(*) >= 2
);
I want to display the user info of customers who have more than 1 purchases in a single arbitrary month.
Just add a WHERE filter to your sub-query.
So assuming that you wanted the month of July 2021 and you had a purchase_date column (with a DATE or TIMESTAMP data type) in your purchase_logs table then you can use:
SELECT *
FROM customer_details
WHERE customer_id IN (
SELECT cust_id
FROM purchase_logs
WHERE DATE '2021-07-01' <= purchase_date
AND purchase_date < DATE '2021-08-01'
GROUP BY cust_id
HAVING count(*) >= 2
);
If you want the users where they have bought two-or-more items in any single calendar month then:
SELECT *
FROM customer_details c
WHERE EXISTS (
SELECT 1
FROM purchase_logs p
WHERE c.customer_id = p.cust_id
GROUP BY cust_id, TRUNC(purchase_date, 'MM')
HAVING count(*) >= 2
);
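The two variants above (one fixed month via a half-open date range, and any calendar month via a correlated EXISTS) can be compared side by side in the sketch below. It is an illustration on SQLite via Python's sqlite3, not Oracle: ISO text dates replace DATE literals and strftime('%Y-%m', ...) replaces TRUNC(purchase_date, 'MM'). The sample data is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_details (customer_id INTEGER, name TEXT);
CREATE TABLE purchase_logs (purchase_id INTEGER, cust_id INTEGER,
                            item_id INTEGER, purchase_date TEXT);
INSERT INTO customer_details VALUES (1,'Ann'), (2,'Bob'), (3,'Cy'), (4,'Di');
INSERT INTO purchase_logs VALUES
  (10, 1, 100, '2020-02-03'), (11, 1, 101, '2020-02-20'),
  (12, 2, 100, '2020-02-03'), (13, 2, 100, '2020-02-25'),
  (14, 3, 100, '2020-02-03'), (15, 3, 101, '2020-03-03'),
  (16, 4, 100, '2020-03-01'), (17, 4, 101, '2020-03-31');
""")

# Fixed month: half-open range [2020-02-01, 2020-03-01) excludes March 1st.
in_february = """
SELECT customer_id FROM customer_details
WHERE customer_id IN (
  SELECT cust_id FROM purchase_logs
  WHERE '2020-02-01' <= purchase_date AND purchase_date < '2020-03-01'
  GROUP BY cust_id
  HAVING COUNT(*) >= 2)
ORDER BY customer_id
"""

# Any month: group each customer's purchases by calendar month.
in_any_month = """
SELECT c.customer_id FROM customer_details c
WHERE EXISTS (
  SELECT 1 FROM purchase_logs p
  WHERE p.cust_id = c.customer_id
  GROUP BY p.cust_id, strftime('%Y-%m', p.purchase_date)
  HAVING COUNT(*) >= 2)
ORDER BY c.customer_id
"""
february_ids = [row[0] for row in conn.execute(in_february)]
any_month_ids = [row[0] for row in conn.execute(in_any_month)]
print(february_ids, any_month_ids)
```

Note that COUNT(*) counts purchases, so customer 2 (the same item bought twice in February) is included here; COUNT(DISTINCT item_id) would be needed to count unique items instead.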

SQL How to pull in all records that don't contain

This is a bit of a tricky question to explain, but I'll try my best.
The essence of the question is that I have an employee salary table with the following columns: Employee ID, Month of Salary, Salary (Currency).
I want to run a select that will show me all of the employees that don't have a record for X month.
I have attached an image to help visualise this, and here is an example of what I would want from this data:
Let's say from this small example that I want to see all of the employees that weren't paid on the 1st October 2021. From looking I know that employee 3 was the only one paid and 1 and 2 were not paid. How would I be able to query this on a much larger range of data without knowing which month it could be that they weren't paid?
You need to join your EmployeeSalary table against a list of expected EmployeeID/MonthOfSalary values and determine the gaps, i.e. the instances where there is no matching record in the EmployeeSalary table. A LEFT OUTER JOIN can be used here: whenever there's no matching record in your EmployeeSalary table, the LEFT OUTER JOIN will give you NULL.
The following query shows how to perform the LEFT OUTER JOIN. Note, however, that I've joined your table to itself to get the list of EmployeeID and MonthOfSalary values. You would be better off taking these from other tables: I assume you have an Employee table with all the IDs in it, which would be more efficient (and more accurate) to use than building the ID list from the EmployeeSalary table as I've done.
SELECT EmployeeList.EmployeeID, MonthList.MonthOfSalary
FROM (SELECT DISTINCT MonthOfSalary FROM EmployeeSalary) MonthList
CROSS JOIN (SELECT DISTINCT EmployeeID FROM EmployeeSalary) EmployeeList
LEFT OUTER JOIN EmployeeSalary
ON MonthList.MonthOfSalary = EmployeeSalary.MonthOfSalary
AND EmployeeList.EmployeeID = EmployeeSalary.EmployeeID
WHERE EmployeeSalary.EmployeeID IS NULL
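As a rough check of this gap-finding join, the sketch below builds the expected employee/month grid with a CROSS JOIN and keeps the combinations that have no salary row. It runs on SQLite via Python's sqlite3; the sample rows mirror the question's description (only employee 3 paid on 1 October 2021) but are otherwise made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmployeeSalary (EmployeeID INTEGER, MonthOfSalary TEXT, Salary REAL);
INSERT INTO EmployeeSalary VALUES
  (1, '2021-09-01', 100), (2, '2021-09-01', 100), (3, '2021-09-01', 100),
  (3, '2021-10-01', 100);  -- only employee 3 is paid in October
""")

query = """
SELECT e.EmployeeID, m.MonthOfSalary
FROM (SELECT DISTINCT MonthOfSalary FROM EmployeeSalary) m
CROSS JOIN (SELECT DISTINCT EmployeeID FROM EmployeeSalary) e
LEFT JOIN EmployeeSalary s
  ON s.MonthOfSalary = m.MonthOfSalary
 AND s.EmployeeID = e.EmployeeID
WHERE s.EmployeeID IS NULL          -- no salary row for this employee/month
ORDER BY e.EmployeeID, m.MonthOfSalary
"""
unpaid = conn.execute(query).fetchall()
print(unpaid)
```

A month in which nobody was paid never appears in the month list built this way, which is one reason a separate calendar or Employee table is the more reliable source.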
You first need to get the latest paid month for each employee, then calculate the difference from today and filter on it. The filter can be applied with a WHERE clause once the latest month has been computed in the CTE.
I suggest the following starting point, which you might need to adapt, at least to cast some formats according to your column types.
with latest_pay as (
-- For each employee, get the latest paid month
select Employee_ID, max(Month) as latest_pay_month
from your_table
group by Employee_ID
)
-- Look for employees not paid for more than 'your_threshold' months
select Employee_ID, latest_pay_month, datediff(month, latest_pay_month, getdate()) as latest_paid_month_delay
from latest_pay
where datediff(month, latest_pay_month, getdate()) > your_threshold
Btw, I know it's an example, but avoid using column names such as Month, which can be confused with SQL keywords and lead to errors.
This is ideally where you would use a calendar table; having one available is handy for tasks such as this, where you need to find missing dates.
You can build one on the fly, as I have done in this example, but you would normally have a permanent table to use.
In order to determine which rows are missing, you need to generate a list of expected rows; an outer join to your actual data will then reveal the missing rows.
So here we have a CTE that generates a list of dates (based on a date range you can set), followed by another that gives a list of all the EmployeeId values.
You expect each EmployeeId to have a row for each month, so we do a cross join to generate the list of expected results. We then outer join with the actual data and filter to the null rows; these are the employees who have not been paid for that month.
See example DB<>Fiddle
declare @from date='20210101', @to date='20211001';
with dates as (
select DateAdd(month,n,@from) dt from (
select top(100) Row_Number() over(order by (select null))-1 n from master.dbo.spt_values
)v
), e as (select distinct employeeId from t)
select dt, e.EmployeeId
from dates d cross join e
left join t on DatePart(month,d.dt)=DatePart(month,t.PaidDate) and t.EmployeeId=e.EmployeeId
where d.dt<=@to
and t.EmployeeId is null
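The same calendar idea can be sketched on SQLite, which has no spt_values table, by generating the months with a recursive CTE instead. This is an adaptation under assumptions, not the answer's T-SQL: it runs through Python's sqlite3, uses hypothetical sample rows, takes the range as the named parameters :start and :stop, and matches on year and month together so that a multi-year range would not confuse the same month of different years.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (EmployeeId INTEGER, PaidDate TEXT);
INSERT INTO t VALUES
  (1, '2021-08-15'), (1, '2021-09-15'),   -- employee 1 missing October
  (2, '2021-08-15'), (2, '2021-10-15');   -- employee 2 missing September
""")

query = """
WITH RECURSIVE dates(dt) AS (
  SELECT :start                            -- first month of the range
  UNION ALL
  SELECT date(dt, '+1 month') FROM dates WHERE dt < :stop
),
e AS (SELECT DISTINCT EmployeeId FROM t)
SELECT d.dt, e.EmployeeId
FROM dates d
CROSS JOIN e                               -- every employee for every month
LEFT JOIN t
  ON strftime('%Y-%m', t.PaidDate) = strftime('%Y-%m', d.dt)
 AND t.EmployeeId = e.EmployeeId
WHERE t.EmployeeId IS NULL                 -- expected row with no payment
ORDER BY d.dt, e.EmployeeId
"""
missing = conn.execute(
    query, {"start": "2021-08-01", "stop": "2021-10-01"}
).fetchall()
print(missing)
```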

Select difference between two tables

I want to list four columns, date, hourly count, daily count and difference between two counts.
I have used UNION ALL for the two tables, but I am getting 2 rows, as shown in the image:
Select a.date, a.hour,b.daily,sum(a.hour-b.daily)
from (select date,count(*) hour,''daily
From table a union all select '' hour,count(*) daily from table b)
Group by date, daily, hourly..
Please suggest to me a solution.
I see that the code supplied uses a UNION to achieve the output. This would be better served by using a JOIN of some kind.
The result is the total number of rows in table_b for each date subtracted from the total number of rows in table_a for that date.
This code is untested but should give a good indication of how to achieve this:
SELECT a.date,
a.hour,
ISNULL(b.daily, 0) AS daily,
a.hour - ISNULL(b.daily, 0) AS difference
FROM (
SELECT date,
COUNT(*) AS hour
FROM table_a
GROUP BY date
) a
LEFT JOIN (
SELECT date,
COUNT(*) AS daily
FROM table_b
GROUP BY date
) b ON b.date = a.date
ORDER BY a.date;
This works by:
Calculating the count per date in table_a.
Calculating the count per date in table_b.
Joining all results from table_a with those matching in table_b.
Outputting the date, the hour from table_a, the daily (or 0 if NULL) from table_b, and the difference between the two.
Notes:
I have renamed table a and table b to table_a and table_b. I presume these are not the actual table names
An INNER JOIN may be preferable if you only want results that have matching date columns in both tables. Using the LEFT JOIN will return all results from table_a regardless of whether table_b has an entry.
I'm not convinced that date is an allowed column name but I have reproduced it in the code as per the example given by OP.
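The steps listed above can be tried out with the following sketch. It is an illustration on SQLite via Python's sqlite3, so COALESCE replaces the SQL Server specific ISNULL, and the date column is renamed to day to sidestep the naming doubt; the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (day TEXT);
CREATE TABLE table_b (day TEXT);
INSERT INTO table_a VALUES ('2021-01-01'), ('2021-01-01'), ('2021-01-01'),
                           ('2021-01-02'), ('2021-01-02');
INSERT INTO table_b VALUES ('2021-01-01');   -- nothing for 2021-01-02
""")

query = """
SELECT a.day,
       a.hourly,
       COALESCE(b.daily, 0)            AS daily,
       a.hourly - COALESCE(b.daily, 0) AS difference
FROM (SELECT day, COUNT(*) AS hourly FROM table_a GROUP BY day) a
LEFT JOIN (SELECT day, COUNT(*) AS daily FROM table_b GROUP BY day) b
  ON b.day = a.day
ORDER BY a.day
"""
differences = conn.execute(query).fetchall()
print(differences)
```

Without the COALESCE in the subtraction, the difference for 2021-01-02 would come out as NULL rather than 2.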
Your method is fine. Your group by columns are not correct:
Select date, sum(hourly) as hourly, sum(daily) as daily,
sum(hourly) - sum(daily) as diff
from ((select date, count(*) as hourly, 0 as daily
from table a
group by date
) union all
(select date, 0 as hourly, count(*) as daily
from table b
group by date
)
) ab
group by date;
The key idea is that the outer query aggregates only by date -- and you still need aggregation functions there as well.
You have other errors in your subquery, such as missing group bys and date columns. I assume those are transcription errors.
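A runnable sketch of this UNION ALL variant is shown below, again on SQLite via Python's sqlite3 with made-up rows and the date column renamed to day. SQLite does not accept parentheses around the individual members of a compound SELECT, so they are dropped here; that is a dialect adjustment, not a change to the technique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (day TEXT);
CREATE TABLE table_b (day TEXT);
INSERT INTO table_a VALUES ('2021-01-01'), ('2021-01-01'), ('2021-01-01'),
                           ('2021-01-02'), ('2021-01-02');
INSERT INTO table_b VALUES ('2021-01-01'),
                           ('2021-01-03'), ('2021-01-03'),
                           ('2021-01-03'), ('2021-01-03');
""")

query = """
SELECT day,
       SUM(hourly)              AS hourly,
       SUM(daily)               AS daily,
       SUM(hourly) - SUM(daily) AS diff
FROM (SELECT day, COUNT(*) AS hourly, 0 AS daily FROM table_a GROUP BY day
      UNION ALL
      SELECT day, 0 AS hourly, COUNT(*) AS daily FROM table_b GROUP BY day) ab
GROUP BY day
ORDER BY day
"""
diffs = conn.execute(query).fetchall()
print(diffs)
```

One practical difference from a LEFT JOIN is visible in the data: 2021-01-03 exists only in table_b and still appears in the result.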

Showing zeroes in sql count

I'm using Redshift and trying to count different things by day, but days where the count in table 2 is zero are not shown. How can I make it show a count of zero?
SELECT TO_CHAR(date1,'dd') AS day,
COUNT(*) as Volume,sum(CASE WHEN status = 'ANSWERED' THEN 1 ELSE 0 END )as ANSWERED , t2.Volume AS TRANSFERS
FROM table1 t1
RIGHT JOIN (SELECT TO_CHAR(date2,'dd') AS day,
COUNT(*) as Volume
FROM table2
WHERE TO_CHAR(date2,'yyyy_MM') IN (SELECT DISTINCT TO_CHAR(date2,'yyyy_MM')
FROM table2
WHERE date2 BETWEEN DATE ('2016-11-01') AND DATE ('2016-12-30'))
AND type = 'Active'
GROUP BY day) t2 ON TO_CHAR(date1,'dd') = day
WHERE TO_CHAR(date1,'yyyy_MM') IN (SELECT DISTINCT TO_CHAR(date1,'yyyy_MM')
FROM table1
WHERE date1 BETWEEN DATE ('2016-11-01') AND DATE ('2016-12-30'))
GROUP BY 1,4
ORDER BY 1
Notice that you used a right join between the tables. This means that any row from the first table that doesn't have a matching day in the second table will not display.
If you're new with SQL joins you can refer to this image that explains it.
If your first (or left table) contains all of the unique days that should show up in the result, just switch the "right" to a "left" join.
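A minimal sketch of the LEFT JOIN idea, under some assumptions: it runs on SQLite through Python's sqlite3 rather than Redshift, it aggregates each table per day before joining (a simplification of the asker's query), and it adds COALESCE so that a missing match shows as 0 instead of NULL. Table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (date1 TEXT, status TEXT);
CREATE TABLE table2 (date2 TEXT, type TEXT);
INSERT INTO table1 VALUES ('2016-11-01', 'ANSWERED'), ('2016-11-01', 'MISSED'),
                          ('2016-11-02', 'ANSWERED');
INSERT INTO table2 VALUES ('2016-11-01', 'Active');  -- no transfers on the 2nd
""")

query = """
SELECT c.day, c.volume, c.answered,
       COALESCE(t.transfers, 0) AS transfers   -- 0 when table2 has no rows
FROM (SELECT strftime('%d', date1) AS day,
             COUNT(*) AS volume,
             SUM(CASE WHEN status = 'ANSWERED' THEN 1 ELSE 0 END) AS answered
      FROM table1
      GROUP BY strftime('%d', date1)) c
LEFT JOIN (SELECT strftime('%d', date2) AS day, COUNT(*) AS transfers
           FROM table2
           WHERE type = 'Active'
           GROUP BY strftime('%d', date2)) t
  ON t.day = c.day
ORDER BY c.day
"""
daily_counts = conn.execute(query).fetchall()
print(daily_counts)
```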

select multiple records based on order by

I have a table with a bunch of customer IDs. A customer table also contains these IDs, but each ID can be on multiple records for the same customer. I want to select the most recently used record, which I can get by doing order by <my_field> desc.
Say I have 100 customer IDs in this table and in the customers table there are 120 records with these IDs (some are duplicates). How can I apply my order by condition to only get the most recent matching records?
The DBMS is SQL Server 2000.
The table is basically like this:
loc_nbr and cust_nbr are primary keys.
A customer shops at location 1. They get assigned loc_nbr = 1 and cust_nbr = 1,
then a customer_id of 1.
They shop again, but this time at location 2, so they get assigned loc_nbr = 2 and cust_nbr = 1, then the same customer_id of 1 based on their other attributes like name and address.
Because they shopped at location 2 AFTER location 1, that record will have a more recent rec_alt_ts value, and it is the record I would want to retrieve.
You want to use the ROW_NUMBER() function with a Common Table Expression (CTE).
Here's a basic example. You should be able to use a similar query with your data.
;WITH TheLatest AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY group-by-fields ORDER BY sorting-fields DESC) AS ItemCount
FROM TheTable
)
SELECT *
FROM TheLatest
WHERE ItemCount = 1
UPDATE: I just noticed that this was tagged with sql-server-2000. This will only work on SQL Server 2005 and later.
Since you didn't give real table and field names, this is just pseudocode for a solution.
select *
from customer_table t2
inner join location_table t1
on t1.some_key = t2.some_key
where t1.LocationKey = (select top 1 (LocationKey) as LatestLocationKey from location_table where cust_id = t1.cust_id order by some_field desc)
Use an aggregate function in the query to group by customer IDs:
SELECT cust_Nbr, MAX(rec_alt_ts) AS most_recent_transaction, other_fields
FROM tableName
GROUP BY cust_Nbr, other_fields
ORDER BY cust_Nbr DESC;
This assumes that rec_alt_ts increases every time, thus the max entry for that cust_Nbr would be the most recent entry.
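If the whole most-recent row is needed rather than just the timestamp, one option that avoids window functions is to join the table back to its per-customer MAX. The sketch below illustrates that on SQLite via Python's sqlite3; the customers table name and its rows are hypothetical, and it groups by customer_id on the assumption that this is the column identifying a customer across locations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (loc_nbr INTEGER, cust_nbr INTEGER,
                        customer_id INTEGER, rec_alt_ts TEXT);
INSERT INTO customers VALUES
  (1, 1, 1, '2020-01-01 10:00:00'),
  (2, 1, 1, '2020-02-01 10:00:00'),   -- same customer, later visit
  (1, 2, 2, '2020-01-15 09:00:00');
""")

query = """
SELECT c.customer_id, c.loc_nbr, c.cust_nbr, c.rec_alt_ts
FROM customers c
JOIN (SELECT customer_id, MAX(rec_alt_ts) AS latest_ts
      FROM customers
      GROUP BY customer_id) m
  ON m.customer_id = c.customer_id
 AND m.latest_ts = c.rec_alt_ts       -- keep only the newest row per customer
ORDER BY c.customer_id
"""
latest_records = conn.execute(query).fetchall()
print(latest_records)
```

If two rows for the same customer share the exact same rec_alt_ts, both are returned, so a tie-breaker would be needed in that case.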
By using the date and time we can pick out the most recent detail for the customer.
Order by the column that holds the date and time for the customer.
eg:
SQL> select ename , to_date(hiredate,'dd-mm-yyyy hh24:mi:ss') from emp order by to_date(hiredate,'dd-mm-yyyy hh24:mi:ss');