Multiple Aggregation in SQL

I have a table with the following columns: loan_no, loan_amt, contact date, customer_id, salesman_id.
I need to get the average of loan_no and the average of loan_amt for the people with more than one loan_no. I think I need to combine the AVG and COUNT functions somehow.
I am seriously struggling with that. I was also thinking of using a pivot function.
I would really appreciate it if someone could suggest some SQL code.
My efforts so far:
select count(loan_no), customer_id
from table
group by customer_id
having count(loan_no) > 1
Now I just do not know how to also include the avg function.

Not sure why you need the average of loan_no, but you can still get it with:
select customer_id, avg(loan_no), avg(loan_amt)
from (select *, count(*) over(partition by customer_id) cnt
from table) t
where cnt > 1
group by customer_id
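A minimal sketch of the window-count approach, run against an in-memory SQLite database through Python's standard sqlite3 module (SQLite 3.25 or later is needed for window functions). The table name loans and the sample rows are hypothetical stand-ins for the question's table; only loan_no, loan_amt and customer_id are modelled.

```python
import sqlite3

# Hypothetical sample data: (loan_no, loan_amt, customer_id).
LOANS = [
    (1, 1000.0, 10),
    (2, 3000.0, 10),
    (3, 500.0, 20),  # customer 20 has a single loan, so it is filtered out
    (4, 200.0, 30),
    (5, 400.0, 30),
    (6, 600.0, 30),
]

# COUNT(*) OVER (PARTITION BY ...) tags every row with its customer's loan
# count, so the outer query can filter on it before grouping.
QUERY = """
SELECT customer_id, AVG(loan_no) AS avg_loan_no, AVG(loan_amt) AS avg_loan_amt
FROM (SELECT *, COUNT(*) OVER (PARTITION BY customer_id) AS cnt
      FROM loans) AS t
WHERE cnt > 1
GROUP BY customer_id
ORDER BY customer_id
"""


def per_customer_averages():
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE loans (loan_no INTEGER, loan_amt REAL, customer_id INTEGER)"
        )
        conn.executemany("INSERT INTO loans VALUES (?, ?, ?)", LOANS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in per_customer_averages():
        print(row)
```

With this data, customer 10 comes back as (10, 1.5, 2000.0) and customer 30 as (30, 5.0, 400.0); customer 20 never reaches the GROUP BY.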

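You can use two levels of aggregation: first count the loans and total the amounts per customer, then average across the customers with more than one loan: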
select avg(num_loans), sum(total) / sum(num_loans)
from (select customer_id, count(*) as num_loans, sum(loan_amt) as total
from table
group by customer_id
) t
where num_loans > 1;
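A sketch of the two-level aggregation in SQLite via Python's sqlite3, assuming a hypothetical loans table with only customer_id and loan_amt. loan_amt is declared REAL here; with an integer column, some databases (SQLite and SQL Server among them) would perform integer division in SUM(total) / SUM(num_loans).

```python
import sqlite3

# Hypothetical sample data: (customer_id, loan_amt); one row per loan.
LOANS = [
    (10, 1000.0),
    (10, 3000.0),
    (20, 500.0),  # single loan: excluded by num_loans > 1
    (30, 200.0),
    (30, 400.0),
    (30, 600.0),
]

# Inner query: one row per customer. Outer query: averages over those rows.
QUERY = """
SELECT AVG(num_loans) AS avg_loans_per_customer,
       SUM(total) / SUM(num_loans) AS avg_loan_amt
FROM (SELECT customer_id, COUNT(*) AS num_loans, SUM(loan_amt) AS total
      FROM loans
      GROUP BY customer_id) AS t
WHERE num_loans > 1
"""


def two_level_averages():
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE loans (customer_id INTEGER, loan_amt REAL)")
        conn.executemany("INSERT INTO loans VALUES (?, ?)", LOANS)
        return conn.execute(QUERY).fetchone()
    finally:
        conn.close()


if __name__ == "__main__":
    print(two_level_averages())
```

Here the result is (2.5, 1040.0). SUM(total) / SUM(num_loans) weights every loan equally (5200 / 5), whereas AVG(total / num_loans) would weight every customer equally.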

Related

Best approach to display all the users who have more than 1 purchases in a month in SQL

I have two tables in an Oracle Database. One of them holds all the purchases made by all the customers over many years (purchase_logs); it has a unique purchase_id that is paired with a customer_id. The other table contains the user info of all the customers. Both have customer_id as a common key.
I want to display the user info of customers who have purchased more than 1 unique item (NOT the item quantity) in any single month (e.g. a customer who bought 4 unique items in February 2020 would qualify, as would someone who bought 2 items in June). I was wondering what the correct approach should be, and how to execute it correctly.
The two approaches I can see are:
Approach 1
Count the overall number of purchases made by each customer, keep the ones greater than 1, and then check whether any of them were made within the same month.
Use this as a subquery in the WHERE clause of the main query to retrieve the customer info for every customer_id that matches this condition.
This is what I've done so far; it retrieves the customer ids of all the customers who have more than 1 purchase in total. But I do not understand how to filter out the purchases that did not occur in a single arbitrary month.
SELECT * FROM customer_details
WHERE customer_id IN (
SELECT cust_id from purchase_logs
group by cust_id
having count(*) >= 2);
Approach 2
Create a temporary table that counts the number of monthly purchases for each user_id, then find the MAX() of that table and check whether that MAX value is greater than 1. If it is, return true to the main query's WHERE clause for customer_info.
Approach 2 feels like the more logical option, but I cannot work out how to write the proper subquery for it, as MAX(COUNT(customer_id)) from purchase_logs does not seem to be a valid query.
The original post included images of the DDL diagram, the sample data for Purchase_logs, Customer_info and Item_info, and the expected output for this sample data.
It is certainly possible that there is a simpler approach that I am not seeing right now.
Would appreciate any suggestions and tips on this.
You need this query:
SELECT DISTINCT cust_id
FROM purchase_logs
GROUP BY cust_id, TO_CHAR(purchase_date, 'YYYY-MON')
HAVING COUNT(DISTINCT item_id) > 1;
to get the cust_ids of all customers who have purchased more than 1 unique item in any month, and you can use it with the IN operator:
SELECT *
FROM customer_details
WHERE customer_id IN (
SELECT DISTINCT cust_id -- here DISTINCT may be removed as it does not make any difference when the result is used with IN
FROM purchase_logs
GROUP BY cust_id, TO_CHAR(purchase_date, 'YYYY-MON')
HAVING COUNT(DISTINCT item_id) > 1
);
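A sketch of the IN-based query in SQLite via Python's sqlite3. SQLite has no TO_CHAR, so strftime('%Y-%m', purchase_date) is swapped in to build the month key; dates are stored as ISO-8601 text. The customer names and purchase rows are hypothetical, chosen so that only some customers qualify.

```python
import sqlite3

# Hypothetical sample data; dates are ISO-8601 text, as SQLite has no DATE type.
CUSTOMERS = [(1, "Ann"), (2, "Bob"), (3, "Cid"), (4, "Dee")]
PURCHASES = [  # (purchase_id, cust_id, item_id, purchase_date)
    (1, 1, 100, "2020-02-03"),
    (2, 1, 101, "2020-02-17"),  # Ann: two different items in Feb 2020
    (3, 2, 100, "2020-06-01"),
    (4, 2, 100, "2020-06-09"),  # Bob: the same item twice, only 1 distinct
    (5, 3, 100, "2020-06-30"),
    (6, 3, 102, "2020-07-01"),  # Cid: two items, but in different months
    (7, 4, 103, "2021-06-05"),
    (8, 4, 104, "2021-06-20"),  # Dee: two different items in Jun 2021
]

# strftime('%Y-%m', ...) stands in for Oracle's TO_CHAR(purchase_date, 'YYYY-MON').
QUERY = """
SELECT *
FROM customer_details
WHERE customer_id IN (
    SELECT cust_id
    FROM purchase_logs
    GROUP BY cust_id, strftime('%Y-%m', purchase_date)
    HAVING COUNT(DISTINCT item_id) > 1
)
ORDER BY customer_id
"""


def customers_with_multi_item_month():
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE customer_details (customer_id INTEGER, name TEXT)")
        conn.execute(
            "CREATE TABLE purchase_logs "
            "(purchase_id INTEGER, cust_id INTEGER, item_id INTEGER, purchase_date TEXT)"
        )
        conn.executemany("INSERT INTO customer_details VALUES (?, ?)", CUSTOMERS)
        conn.executemany("INSERT INTO purchase_logs VALUES (?, ?, ?, ?)", PURCHASES)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(customers_with_multi_item_month())
```

Only Ann and Dee are returned: Bob bought one distinct item twice, and Cid's two items fall in different months.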
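One approach might be to try the following (MONTH() and YEAR() are SQL Server/MySQL functions; in Oracle use EXTRACT(MONTH FROM ...) and EXTRACT(YEAR FROM ...)):
with multiplepurchase as (
select customer_id, year(purchasedate) as purchase_year, month(purchasedate) as purchase_month, count(*) as order_count
from purchase_logs
group by customer_id, year(purchasedate), month(purchasedate)
having count(*) >= 2)
select distinct a.customer_id, b.username, b.usercategory
from multiplepurchase a
left join userinfo b
on a.customer_id = b.customer_id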
Expanding on MT0's answer:
SELECT *
FROM customer_details CD
WHERE EXISTS (
SELECT 1
FROM purchase_logs PL
WHERE CD.customer_id = PL.cust_id
GROUP BY PL.cust_id, TO_CHAR(PL.purchase_date, 'YYYYMM')
HAVING COUNT(DISTINCT PL.item_id) >= 2
);
I want to display the user info of customers who have more than 1 purchases in a single arbitrary month.
Just add a WHERE filter to your sub-query.
So assuming that you wanted the month of July 2021 and you had a purchase_date column (with a DATE or TIMESTAMP data type) in your purchase_logs table then you can use:
SELECT *
FROM customer_details
WHERE customer_id IN (
SELECT cust_id
FROM purchase_logs
WHERE DATE '2021-07-01' <= purchase_date
AND purchase_date < DATE '2021-08-01'
GROUP BY cust_id
HAVING count(*) >= 2
);
If you want the users who have bought two or more items in any single calendar month, then:
SELECT *
FROM customer_details c
WHERE EXISTS (
SELECT 1
FROM purchase_logs p
WHERE c.customer_id = p.cust_id
GROUP BY cust_id, TRUNC(purchase_date, 'MM')
HAVING count(*) >= 2
);
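A sketch of the correlated EXISTS version in SQLite via Python's sqlite3, using strftime('%Y-%m', ...) in place of Oracle's TRUNC(purchase_date, 'MM') as the month key. The data is hypothetical. Because this query uses COUNT(*), it counts purchases rather than distinct items, so a customer who bought the same item twice in one month (Bob) also qualifies.

```python
import sqlite3

# Hypothetical sample data; dates are ISO-8601 text.
CUSTOMERS = [(1, "Ann"), (2, "Bob"), (3, "Cid"), (4, "Dee")]
PURCHASES = [  # (purchase_id, cust_id, item_id, purchase_date)
    (1, 1, 100, "2020-02-03"),
    (2, 1, 101, "2020-02-17"),
    (3, 2, 100, "2020-06-01"),
    (4, 2, 100, "2020-06-09"),  # same item twice in one month
    (5, 3, 100, "2020-06-30"),
    (6, 3, 102, "2020-07-01"),  # different months
    (7, 4, 103, "2021-06-05"),
    (8, 4, 104, "2021-06-20"),
]

# The subquery is re-evaluated per customer row (correlated on customer_id);
# EXISTS is true as soon as any month group has two or more purchases.
QUERY = """
SELECT *
FROM customer_details c
WHERE EXISTS (
    SELECT 1
    FROM purchase_logs p
    WHERE c.customer_id = p.cust_id
    GROUP BY p.cust_id, strftime('%Y-%m', p.purchase_date)
    HAVING COUNT(*) >= 2
)
ORDER BY c.customer_id
"""


def customers_with_busy_month():
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE customer_details (customer_id INTEGER, name TEXT)")
        conn.execute(
            "CREATE TABLE purchase_logs "
            "(purchase_id INTEGER, cust_id INTEGER, item_id INTEGER, purchase_date TEXT)"
        )
        conn.executemany("INSERT INTO customer_details VALUES (?, ?)", CUSTOMERS)
        conn.executemany("INSERT INTO purchase_logs VALUES (?, ?, ?, ?)", PURCHASES)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(customers_with_busy_month())
```

Swapping COUNT(*) for COUNT(DISTINCT p.item_id) would drop Bob and match the "unique items" wording of the question.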

SQL Total Distinct Count on Group By Query

Trying to get an overall distinct count of the employees for a range of records which has a group by on it.
I've tried using the OVER() clause but couldn't get it to work. It's best explained with an example, so please see my script and the wanted result below.
EDIT:
I should mention I'm hoping for a solution that does not use a sub-query based on my "sales_detail" table below because in my real example, the "sales_detail" table is a very complex sub-query.
Here's the result I want. Column "wanted_result" should be 9:
Sample script:
CREATE TABLE [sales_detail] (
[employee] varchar(100),[customer] varchar(100),[startdate] varchar(100),[enddate] varchar(100),[saleday] int,[timeframe] varchar(100),[saleqty] numeric(18,4)
);
INSERT INTO [sales_detail]
([employee],[customer],[startdate],[enddate],[saleday],[timeframe],[saleqty])
VALUES
('Wendy','Chris','8/1/2019','8/12/2019','5','Afternoon','1'),
('Wendy','Chris','8/1/2019','8/12/2019','5','Morning','5'),
('Wendy','Chris','8/1/2019','8/12/2019','6','Morning','6'),
('Dexter','Chris','8/1/2019','8/12/2019','2','Mid','2.5'),
('Jennifer','Chris','8/1/2019','8/12/2019','4','Morning','2.75'),
('Lila','Chris','8/1/2019','8/12/2019','2','Morning','3.75'),
('Rita','Chris','8/1/2019','8/12/2019','2','Mid','1'),
('Tony','Chris','8/1/2019','8/12/2019','4','Mid','2'),
('Tony','Chris','8/1/2019','8/12/2019','1','Morning','6'),
('Mike','Chris','8/1/2019','8/12/2019','4','Mid','1.5'),
('Logan','Chris','8/1/2019','8/12/2019','3','Morning','6.25'),
('Blake','Chris','8/1/2019','8/12/2019','4','Afternoon','0.5')
;
SELECT
[timeframe],
SUM([saleqty]) AS [total_qty],
COUNT(DISTINCT [s].[employee]) AS [employee_count1],
SUM(COUNT(DISTINCT [s].[employee])) OVER() AS [employee_count2],
9 AS [wanted_result]
FROM (
SELECT
[employee],[customer],[startdate],[enddate],[saleday],[timeframe],[saleqty]
FROM
[sales_detail]
) AS [s]
GROUP BY
[timeframe]
;
If I understand correctly, you are simply looking for a COUNT(DISTINCT) for all employees in the table? I believe this query will return the results you are looking for:
SELECT
[timeframe],
SUM([saleqty]) AS [total_qty],
COUNT(DISTINCT [s].[employee]) AS [employee_count1],
(SELECT COUNT(DISTINCT [employee]) FROM [sales_detail]) AS [employee_count2],
9 AS [wanted_result]
FROM [sales_detail] [s]
GROUP BY
[timeframe]
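A sketch of the scalar-subquery answer, ported from T-SQL to SQLite via Python's sqlite3 and reduced to the columns the query reads (employee, timeframe, saleqty), using the question's sample rows. The uncorrelated subquery is evaluated against the whole table, so every group sees the same overall distinct count.

```python
import sqlite3

# The question's sample rows: (employee, timeframe, saleqty).
SALES = [
    ("Wendy", "Afternoon", 1), ("Wendy", "Morning", 5), ("Wendy", "Morning", 6),
    ("Dexter", "Mid", 2.5), ("Jennifer", "Morning", 2.75), ("Lila", "Morning", 3.75),
    ("Rita", "Mid", 1), ("Tony", "Mid", 2), ("Tony", "Morning", 6),
    ("Mike", "Mid", 1.5), ("Logan", "Morning", 6.25), ("Blake", "Afternoon", 0.5),
]

QUERY = """
SELECT timeframe,
       SUM(saleqty) AS total_qty,
       COUNT(DISTINCT employee) AS employee_count1,
       (SELECT COUNT(DISTINCT employee) FROM sales_detail) AS employee_count2
FROM sales_detail AS s
GROUP BY timeframe
ORDER BY timeframe
"""


def timeframe_summary():
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE sales_detail (employee TEXT, timeframe TEXT, saleqty REAL)"
        )
        conn.executemany("INSERT INTO sales_detail VALUES (?, ?, ?)", SALES)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in timeframe_summary():
        print(row)
```

The per-timeframe distinct counts (2, 4 and 5) add up to 11 because Wendy and Tony appear in two timeframes each, while employee_count2 is 9 on every row.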
You can try the option below:
SELECT
[timeframe],
SUM([saleqty]) AS [total_qty],
COUNT(DISTINCT [s].[employee]) AS [employee_count1],
SUM(COUNT(DISTINCT [s].[employee])) OVER() AS [employee_count2],
[wanted_result]
-- select the count from the subquery
FROM (
SELECT
[employee],[customer],[startdate],[enddate],[saleday],[timeframe],[saleqty],
(select COUNT(DISTINCT [employee]) from [sales_detail]) AS [wanted_result]
-- calculate the count in the inner subquery
FROM [sales_detail]
) AS [s]
GROUP BY
[timeframe],[wanted_result]
Use a trick where you only count each person on the first day they are seen:
select timeframe, sum(saleqty) as total_qty,
count(distinct employee) as employee_count1,
sum(sum((seqnum = 1)::int)) over () as employee_count2,
9 as wanted_result
from (select sd.*,
row_number() over (partition by employee order by startdate) as seqnum
from sales_detail sd
) sd
group by timeframe;
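A sketch of the "count each person only on their first row" trick in SQLite via Python's sqlite3, using the question's sample rows. SQLite evaluates seqnum = 1 to 0 or 1, so SUM(seqnum = 1) replaces PostgreSQL's (seqnum = 1)::int. To stay on well-trodden SQLite syntax, the per-group flag sums are totalled with SUM(...) OVER () in an outer query rather than nesting the aggregate inside the window function.

```python
import sqlite3

# The question's sample rows: (employee, timeframe, saleqty); startdate is the
# same for every row in the sample, so it is filled in as a constant below.
SALES = [
    ("Wendy", "Afternoon", 1), ("Wendy", "Morning", 5), ("Wendy", "Morning", 6),
    ("Dexter", "Mid", 2.5), ("Jennifer", "Morning", 2.75), ("Lila", "Morning", 3.75),
    ("Rita", "Mid", 1), ("Tony", "Mid", 2), ("Tony", "Morning", 6),
    ("Mike", "Mid", 1.5), ("Logan", "Morning", 6.25), ("Blake", "Afternoon", 0.5),
]

# ROW_NUMBER() gives exactly one row per employee seqnum = 1, so those flags
# sum to the overall distinct count once they are added across all groups.
QUERY = """
SELECT timeframe, total_qty, employee_count1,
       SUM(first_seen) OVER () AS employee_count2
FROM (
    SELECT timeframe,
           SUM(saleqty) AS total_qty,
           COUNT(DISTINCT employee) AS employee_count1,
           SUM(seqnum = 1) AS first_seen
    FROM (SELECT sd.*,
                 ROW_NUMBER() OVER (PARTITION BY employee ORDER BY startdate) AS seqnum
          FROM sales_detail AS sd) AS sd
    GROUP BY timeframe
) AS g
ORDER BY timeframe
"""


def first_seen_summary():
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE sales_detail "
            "(employee TEXT, timeframe TEXT, saleqty REAL, startdate TEXT)"
        )
        conn.executemany(
            "INSERT INTO sales_detail VALUES (?, ?, ?, '8/1/2019')", SALES
        )
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in first_seen_summary():
        print(row)
```

Which group receives an employee's seqnum = 1 row depends on tie-breaking among equal startdate values, but the total across groups is 9 either way.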
Note: From the perspective of performance, your complex subquery is only evaluated once.

Counting the number of times a given ID has a specific value, and then using an arithmetic operator

I'm running a query in Apache Hive, and I want to count the number of orders each customer ID has, and then only include IDs which have at least 3 orders. I used something like this to aggregate the values:
select customer_id, count (distinct order_id)
from customer_table
group by customer_id
What's a good way of pulling only the customer_ids that have at least 3 orders? I tried adding a WHERE clause with an arithmetic operator but can't get it to work (e.g. where count (distinct claim_id) is >= 3)
You need to use HAVING clause:
select customer_id, count(distinct order_id)
from customer_table
group by customer_id
having count(distinct order_id) >= 3
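A sketch of the HAVING answer, run in SQLite via Python's sqlite3 rather than Hive. The rows are hypothetical and include repeated order_id values (as line items would produce) to show why COUNT(DISTINCT order_id) matters: customer 2 has three rows but only one order.

```python
import sqlite3

# Hypothetical (customer_id, order_id) rows; an order may appear more than once.
ROWS = [
    (1, 10), (1, 10), (1, 11), (1, 12),  # 3 distinct orders
    (2, 20), (2, 20), (2, 20),           # 3 rows, but only 1 distinct order
    (3, 30), (3, 31), (3, 32), (3, 33),  # 4 distinct orders
]

# WHERE filters rows before grouping; HAVING filters the groups afterwards,
# which is why the aggregate condition belongs in HAVING.
QUERY = """
SELECT customer_id, COUNT(DISTINCT order_id) AS num_orders
FROM customer_table
GROUP BY customer_id
HAVING COUNT(DISTINCT order_id) >= 3
ORDER BY customer_id
"""


def customers_with_three_orders():
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE customer_table (customer_id INTEGER, order_id INTEGER)")
        conn.executemany("INSERT INTO customer_table VALUES (?, ?)", ROWS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(customers_with_three_orders())
```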
You can't have a group by and distinct in the same query.
Please see open hive Jira ticket
I have tested the script below in Hive and it works for me.
select customer_id, order_id, count(1) as counting from customer_table
group by customer_id, order_id
having counting >= 3

Is it possible to calculate the sum of each group in a table without using group by clause

I am trying to find out if there is any way to aggregate the sales for each product. I realise I can achieve it either with a GROUP BY clause or by writing a procedure.
example:
Table name: Details
Sales Product
10 a
20 a
4 b
12 b
3 b
5 c
Is there a way to perform the following query without using GROUP BY?
select
product,
sum(sales)
from
Details
group by
product
having
sum(sales) > 20
I realize it is possible using Procedure, could it be done in any other way?
You could do
SELECT DISTINCT product,
(SELECT SUM(sales) FROM details x where x.product = a.product) sales
from Details a;
(and wrap it into another select to simulate the HAVING).
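A sketch of the correlated-subquery answer wrapped in an outer SELECT to simulate HAVING, using the question's Details rows in SQLite via Python's sqlite3. DISTINCT collapses the one-row-per-sale output to one row per product; the threshold is passed as a query parameter.

```python
import sqlite3

# The question's sample data: (sales, product).
DETAILS = [(10, "a"), (20, "a"), (4, "b"), (12, "b"), (3, "b"), (5, "c")]

# The inner SELECT computes each product's total without GROUP BY;
# the outer WHERE plays the role of HAVING SUM(sales) > threshold.
QUERY = """
SELECT product, sales
FROM (SELECT DISTINCT product,
             (SELECT SUM(sales) FROM details x WHERE x.product = a.product) AS sales
      FROM details a) AS totals
WHERE sales > ?
ORDER BY product
"""


def product_totals_over(threshold):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE details (sales INTEGER, product TEXT)")
        conn.executemany("INSERT INTO details VALUES (?, ?)", DETAILS)
        return conn.execute(QUERY, (threshold,)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(product_totals_over(20))
```

With a threshold of 20 only ('a', 30) survives; product b totals 19 and c totals 5.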
It's possible to use analytic functions to do the sum calculation, and then wrap that with another query to do your filtering.
See and play with the example here.
select
running_sum,
OwnerUserId
from (
select
id,
score,
OwnerUserId,
sum(score) over (partition by OwnerUserId order by Id) running_sum,
max(id) over (partition by OwnerUserId) last_id
from
Posts
where
OwnerUserId in (2934433, 10583)
) inner_q
where inner_q.id = inner_q.last_id
--and running_sum > 20;
We keep a running sum (ordered by Id) over each owner's partition, where the owner plays the role of the product, and we pick out the last id in the same partition; on that row the running sum equals the owner's total. Wrap it all in another query that keeps only the "last id" rows, and then do any filtering you want on the result.
This is an extremely roundabout way to avoid using GROUP BY, though.
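A sketch of the analytic-function approach applied to the question's Details table, in SQLite via Python's sqlite3. The question's table has no ordering column, so a hypothetical id column is added; MAX(id) OVER (PARTITION BY product) is used to find each partition's last row, where the running sum equals the product total.

```python
import sqlite3

# The question's rows with a hypothetical id column: (id, sales, product).
DETAILS = [
    (1, 10, "a"), (2, 20, "a"),
    (3, 4, "b"), (4, 12, "b"), (5, 3, "b"),
    (6, 5, "c"),
]

# running_sum grows row by row within each product (ordered by id); on the
# row whose id equals the partition's MAX(id), it is the product total.
QUERY = """
SELECT product, running_sum
FROM (SELECT id, product,
             SUM(sales) OVER (PARTITION BY product ORDER BY id) AS running_sum,
             MAX(id) OVER (PARTITION BY product) AS last_id
      FROM details) AS inner_q
WHERE inner_q.id = inner_q.last_id
  AND running_sum > ?
ORDER BY product
"""


def analytic_totals_over(threshold):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE details (id INTEGER, sales INTEGER, product TEXT)")
        conn.executemany("INSERT INTO details VALUES (?, ?, ?)", DETAILS)
        return conn.execute(QUERY, (threshold,)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(analytic_totals_over(20))
```

For product b the running sum goes 4, 16, 19, and only the last of those rows (id 5) is kept.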
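If you don't want nested SELECT statements (which can run slower), you can use CASE; note that this sums the individual rows whose qty exceeds 20 across the whole table rather than producing per-product totals: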
select
sum(case
when c.qty > 20
then c.qty
else 0
end) as mySum
from Sales.CustOrders c

See whether an item appears more than once in a database column

I want to check if a piece of data appears more than once in a particular column in my table using SQL. Here is my SQL code of what I have so far:
select * from AXDelNotesNoTracking where count(salesid) > 1
salesid is the column I want to check. Any help would be appreciated, thanks.
It should be:
SELECT SalesID, COUNT(*)
FROM AXDelNotesNoTracking
GROUP BY SalesID
HAVING COUNT(*) > 1
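A sketch of the GROUP BY / HAVING duplicate check in SQLite via Python's sqlite3. The table name comes from the question; the id column and the SalesID values are hypothetical sample data.

```python
import sqlite3

# Hypothetical rows: (id, SalesID). S2 appears twice and S3 three times.
ROWS = [(1, "S1"), (2, "S2"), (3, "S2"), (4, "S3"), (5, "S3"), (6, "S3")]

# The aggregate condition goes in HAVING because it applies to whole groups.
QUERY = """
SELECT SalesID, COUNT(*)
FROM AXDelNotesNoTracking
GROUP BY SalesID
HAVING COUNT(*) > 1
ORDER BY SalesID
"""


def duplicated_sales_ids():
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE AXDelNotesNoTracking (id INTEGER, SalesID TEXT)")
        conn.executemany("INSERT INTO AXDelNotesNoTracking VALUES (?, ?)", ROWS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(duplicated_sales_ids())
```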
Regarding your initial query:
You cannot do a SELECT * since this operation requires a GROUP BY, and columns need to either be in the GROUP BY or in an aggregate function (e.g. COUNT, SUM, MIN, MAX, AVG).
As this is a GROUP BY operation, the filter goes in a HAVING clause instead of a WHERE clause.
Edit:
I also just thought of this: if you want to see WHICH rows are in there more than once (this depends on which database you are using, since it needs CTE and window-function support):
;WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY SalesID ORDER BY SalesID) AS [Num]
FROM AXDelNotesNoTracking
)
SELECT *
FROM cte
WHERE cte.Num > 1
Of course, this just shows the rows that have appeared with the same SalesID but does not show the initial SalesID value that has appeared more than once. Meaning, if a SalesID shows up 3 times, this query will show instances 2 and 3 but not the first instance. Still, it might help depending on why you are looking for multiple SalesID values.
Edit2:
The following query was posted by APC below and is better than the CTE I mention above in that it shows all rows in which a SalesID has appeared more than once. I am including it here for completeness. I merely added an ORDER BY to keep the SalesID values grouped together. The ORDER BY might also help in the CTE above.
SELECT *
FROM AXDelNotesNoTracking
WHERE SalesID IN
( SELECT SalesID
FROM AXDelNotesNoTracking
GROUP BY SalesID
HAVING COUNT(*) > 1
)
ORDER BY SalesID
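A sketch of APC's IN-based query in SQLite via Python's sqlite3, returning every row whose SalesID occurs more than once, including the first occurrence. The id column and data are hypothetical; id is added to the ORDER BY only to make the row order deterministic.

```python
import sqlite3

# Hypothetical rows: (id, SalesID).
ROWS = [(1, "S1"), (2, "S2"), (3, "S2"), (4, "S3"), (5, "S3"), (6, "S3")]

# The subquery finds the duplicated SalesID values; the outer query then
# returns all rows carrying one of those values.
QUERY = """
SELECT *
FROM AXDelNotesNoTracking
WHERE SalesID IN (SELECT SalesID
                  FROM AXDelNotesNoTracking
                  GROUP BY SalesID
                  HAVING COUNT(*) > 1)
ORDER BY SalesID, id
"""


def rows_with_duplicate_sales_id():
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE AXDelNotesNoTracking (id INTEGER, SalesID TEXT)")
        conn.executemany("INSERT INTO AXDelNotesNoTracking VALUES (?, ?)", ROWS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in rows_with_duplicate_sales_id():
        print(row)
```

Unlike the ROW_NUMBER() CTE, which skips each SalesID's first row, this returns all five rows for S2 and S3.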
How about:
select salesid from AXDelNotesNoTracking group by salesid having count(*) > 1;
To expand on Solomon Rutzky's answer, if you are looking for a piece of data that shows up a number of times within a range (e.g. more than once but fewer than 5 times), you can use
having count(*) > 1 and count(*) < 5
You can combine whatever conditions you like there; they don't have to match, since they are all just part of the HAVING clause.
https://webcheatsheet.com/sql/interactive_sql_tutorial/sql_having.php
try this:
select salesid,count (salesid) from AXDelNotesNoTracking group by salesid having count (salesid) >1