Complex Aggregations in SQL

All I want is a data set that shows how many orders were placed and how many calls were made for each order. I also need the date of the calls and the date of the order in the same table. The table has 10M+ rows, so aggregating the result set is essential for analysis. The only analysis I want to do is the sum of calls divided by the total orders, and to see how many support_tickets were generated from orders within an order range, up to a call_date. It sounds very simple, but it is surprisingly complex to code up. Here is my attempt. I have also tried changing the query below into a union, but I still get wrong aggregate results.
-- The Query:-
SELECT
category_name
count(order_code)
order_date
sum(support_ticket_call)
call_date
FROM
(Select distinct name, order_code, order_date from table1) b
left join
(select count(call_ids), call_date FROM table2) b
on b.order_ID_code = a.order_id_code
group by category_name, order_date, call_date
Whenever there are no support_ticket_calls, the call_date is NULL, as you would expect. The count of orders in those rows is around 60,000, though, which is very different from the usual 12 or so in the rest of the result set. I know something is wrong with this query, but it's driving me insane trying to solve it; I have been at it all day so far.

It's a little difficult to answer this question without sample data and expected results, but the comment was getting too long.
You have several problems with your current query. First, you need to join on the date fields using ON criteria. You also need to add a GROUP BY to the subqueries that use aggregation, and the two subqueries need different aliases (both are currently b). Finally, where does support_ticket_call come from? Can I presume it's the alias for the count(call_ids)?
Something like this should get you close:
SELECT
a.name as category_name,
count(a.order_code),
sum(b.support_ticket_call),
a.order_date as call_date
FROM
(Select distinct name, order_code, order_date
from table1) a
left join
(select count(call_ids) as support_ticket_call, call_date
from table2 group by call_date) b on a.order_date = b.call_date
group by a.name, a.order_date
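A suspiciously large order count next to a NULL call_date is typical of a join whose keys do not line up one-to-one. One way to avoid that is to aggregate the calls down to one row per order before joining. The following is only a minimal sketch using Python's built-in sqlite3 module; the `orders` and `calls` tables, their columns, and the sample rows are assumptions for illustration, not the asker's real schema, and it assumes each call row carries the order code it belongs to.

```python
import sqlite3

def calls_per_category():
    """Count orders and support calls per category/order date without fan-out."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema and sample data, for illustration only.
    conn.executescript("""
        CREATE TABLE orders (order_code TEXT, category_name TEXT, order_date TEXT);
        CREATE TABLE calls  (call_id INTEGER, order_code TEXT, call_date TEXT);
        INSERT INTO orders VALUES
            ('A1', 'books', '2021-01-01'),
            ('A2', 'books', '2021-01-01'),
            ('B1', 'toys',  '2021-01-02');
        INSERT INTO calls VALUES
            (1, 'A1', '2021-01-03'),
            (2, 'A1', '2021-01-04'),
            (3, 'A1', '2021-01-05');
    """)
    rows = conn.execute("""
        SELECT o.category_name,
               o.order_date,
               COUNT(o.order_code)            AS orders_placed,
               COALESCE(SUM(c.call_count), 0) AS support_calls
        FROM orders o
        LEFT JOIN (
            -- One row per order, so the join cannot multiply order rows.
            SELECT order_code, COUNT(*) AS call_count
            FROM calls
            GROUP BY order_code
        ) c ON c.order_code = o.order_code
        GROUP BY o.category_name, o.order_date
        ORDER BY o.category_name
    """).fetchall()
    conn.close()
    return rows
```

With this sample data, order A1 has three calls but is still counted once, so the books row reports 2 orders and 3 calls, and the toys row reports 1 order and 0 calls.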

SQL How to pull in all records that don't contain

This is a bit of a tricky question to explain, but I'll try my best.
The essence of the question is that I have an employee salary table with the following columns: Employee ID, Month of Salary, Salary (Currency).
I want to run a select that will show me all of the employees that don't have a record for month X.
I have attached an image to help visualise this, and here is an example of what I would want from this data:
Let's say from this small example that I want to see all of the employees that weren't paid on the 1st October 2021. From looking I know that employee 3 was the only one paid and 1 and 2 were not paid. How would I be able to query this on a much larger range of data without knowing which month it could be that they weren't paid?
You need to join your EmployeeSalary table against a list of expected EmployeeID/MonthOfSalary values and determine the gaps: the instances where there is no matching record in the EmployeeSalary table. A LEFT OUTER JOIN can be used here. Whenever there is no matching record in your EmployeeSalary table, the LEFT OUTER JOIN will give you NULL.
The following query shows how to perform the LEFT OUTER JOIN. Note, however, that I've joined your table to itself to get the list of EmployeeID and MonthOfSalary values. It would be better to take these from other tables; for example, I assume you have an Employee table with all the IDs in it, which would be more efficient (and more accurate) to use than building the ID list from the EmployeeSalary table (as I've done).
SELECT EmployeeList.EmployeeID, MonthList.MonthOfSalary
FROM (SELECT DISTINCT MonthOfSalary FROM EmployeeSalary) MonthList
CROSS JOIN (SELECT DISTINCT EmployeeID FROM EmployeeSalary) EmployeeList
LEFT OUTER JOIN EmployeeSalary
ON MonthList.MonthOfSalary = EmployeeSalary.MonthOfSalary
AND EmployeeList.EmployeeID = EmployeeSalary.EmployeeID
WHERE EmployeeSalary.EmployeeID IS NULL
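The suggestion to take the ID list from a separate Employee table can be sketched as follows, using Python's built-in sqlite3 module so that it runs anywhere. The `Employee` table and the sample rows are assumptions made for illustration; only `EmployeeSalary` and its columns come from the question.

```python
import sqlite3

def unpaid_pairs():
    """Return (EmployeeID, MonthOfSalary) pairs with no salary record."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical Employee table and sample data, for illustration only.
    conn.executescript("""
        CREATE TABLE Employee (EmployeeID INTEGER PRIMARY KEY);
        CREATE TABLE EmployeeSalary (
            EmployeeID INTEGER, MonthOfSalary TEXT, Salary INTEGER);
        INSERT INTO Employee VALUES (1), (2), (3);
        INSERT INTO EmployeeSalary VALUES
            (1, '2021-09-01', 1000), (2, '2021-09-01', 1200),
            (3, '2021-09-01', 900),  (3, '2021-10-01', 900);
    """)
    rows = conn.execute("""
        SELECT e.EmployeeID, m.MonthOfSalary
        FROM Employee e
        -- Every employee paired with every month seen in the salary table.
        CROSS JOIN (SELECT DISTINCT MonthOfSalary FROM EmployeeSalary) m
        LEFT JOIN EmployeeSalary s
               ON s.EmployeeID = e.EmployeeID
              AND s.MonthOfSalary = m.MonthOfSalary
        -- No match on the right side means the payment is missing.
        WHERE s.EmployeeID IS NULL
        ORDER BY m.MonthOfSalary, e.EmployeeID
    """).fetchall()
    conn.close()
    return rows
```

With this sample data, only employee 3 has an October record, so the function returns employees 1 and 2 for 2021-10-01, which mirrors the example in the question.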
You first need to get the latest paid month for each employee, then calculate how long ago that was and filter on it. Because the delay is computed per row of the CTE rather than per group, the filter goes in a WHERE clause.
I propose the following starting point, which you may need to adapt, at least to cast some values according to your column types.
with latest_pay as (
-- For each employee, get the latest paid month
select Employee_ID, max(Month) as latest_pay_month
from your_table
group by Employee_ID
)
-- Look for employees not paid for more than 'your_threshold' months
select Employee_ID, latest_pay_month, datediff(month, latest_pay_month, getdate()) as latest_paid_month_delay
from latest_pay
where datediff(month, latest_pay_month, getdate()) > your_threshold
By the way, I know it's an example, but avoid using column names such as Month, which can be confused with SQL keywords and lead to errors.
This is ideally where you would use a calendar table. Having one available is handy for tasks such as this, where you need to find missing dates.
You can build one on the fly, as I have done in this example; however, you would normally have a permanent table to use.
In order to determine which rows are missing, you need to generate a list of expected rows. An outer join to your actual data will then reveal the missing rows.
So here we have a CTE that generates a list of dates (based on a date range you can set), followed by another that gives a list of all the EmployeeId values.
You expect each EmployeeId to have a row for each month, so we do a cross join to generate the list of expected results. We then outer join with the actual data and filter to the NULL rows; these are the employees who have not been paid for that month.
See example DB<>Fiddle
declare @from date='20210101', @to date='20211001';
with dates as (
select DateAdd(month,n,@from) dt from (
select top(100) Row_Number() over(order by (select null))-1 n from master.dbo.spt_values
)v
), e as (select distinct employeeId from t)
select dt, e.EmployeeId
from dates d cross join e
left join t on DatePart(month,d.dt)=DatePart(month,t.PaidDate) and t.EmployeeId=e.EmployeeId
where d.dt<=@to
and t.EmployeeId is null
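The same calendar idea can be tried without SQL Server. The sketch below uses Python's built-in sqlite3 module and a recursive CTE in place of `master.dbo.spt_values`; the sample rows and the function name are assumptions for illustration. It also compares year and month together, which matters once the date range spans more than one year.

```python
import sqlite3

def missing_payments(start, end):
    """List (month, EmployeeId) pairs in [start, end] with no payment row."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical sample data; table t and its columns follow the answer.
    conn.executescript("""
        CREATE TABLE t (EmployeeId INTEGER, PaidDate TEXT);
        INSERT INTO t VALUES
            (1, '2021-08-15'), (1, '2021-09-15'),
            (2, '2021-08-15'), (2, '2021-10-15');
    """)
    rows = conn.execute("""
        WITH RECURSIVE months(dt) AS (
            -- On-the-fly calendar: first day of each month from start to end.
            SELECT :start
            UNION ALL
            SELECT date(dt, '+1 month') FROM months WHERE dt < :end
        ),
        e AS (SELECT DISTINCT EmployeeId FROM t)
        SELECT m.dt, e.EmployeeId
        FROM months m
        CROSS JOIN e
        LEFT JOIN t
               ON t.EmployeeId = e.EmployeeId
              AND strftime('%Y-%m', t.PaidDate) = strftime('%Y-%m', m.dt)
        WHERE t.EmployeeId IS NULL
        ORDER BY m.dt, e.EmployeeId
    """, {"start": start, "end": end}).fetchall()
    conn.close()
    return rows
```

Calling `missing_payments('2021-08-01', '2021-10-01')` on this sample data reports employee 2 as unpaid for September and employee 1 as unpaid for October.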

I can't get max and min from my table

How come this query returns an error?
select CUSTOMER, TOTAL_VALUE
from CUSTOMER, SALES
where TOTAL_VALUE in (select max(TOTAL_VALUE), min(TOTAL_VALUE)
from SALES)
When I just do max(TOTAL_VALUE) or min(TOTAL_VALUE) alone it works perfectly. But I need to get both the min number and the max number in TOTAL_VALUE. Can anyone help me figure out why this query won't work for me? I would like to keep the structure that I have (using the in operator and nested subquery).
It returns an error because the subquery is returning two columns, not one. Here is one fix:
select CUSTOMER, TOTAL_VALUE
from CUSTOMER cross join
SALES s cross join
(select max(TOTAL_VALUE) as maxt, min(TOTAL_VALUE) as mint
from SALES
) sm
where s.TOTAL_VALUE in (sm.maxt, sm.mint);
That said, the query makes no sense. You are going to get a list of every customer along with the value of the overall minimum and maximum sales.
This does answer your question. If you have another question, then provide sample data, desired results, in another question.
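If keeping the IN operator with a nested subquery matters, one option is to make the subquery return a single column by stacking the two aggregates with UNION. This is a minimal sketch run through Python's built-in sqlite3 module; the `CUSTOMER_ID` column and the sample rows are assumptions for illustration.

```python
import sqlite3

def extreme_sales():
    """Return the sales rows whose TOTAL_VALUE is the overall min or max."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical columns and sample data, for illustration only.
    conn.executescript("""
        CREATE TABLE SALES (CUSTOMER_ID INTEGER, TOTAL_VALUE INTEGER);
        INSERT INTO SALES VALUES (1, 50), (2, 300), (3, 120), (4, 50);
    """)
    rows = conn.execute("""
        SELECT CUSTOMER_ID, TOTAL_VALUE
        FROM SALES
        WHERE TOTAL_VALUE IN (
            -- Two single-column queries stacked into one column of two rows.
            SELECT MAX(TOTAL_VALUE) FROM SALES
            UNION
            SELECT MIN(TOTAL_VALUE) FROM SALES
        )
        ORDER BY TOTAL_VALUE, CUSTOMER_ID
    """).fetchall()
    conn.close()
    return rows
```

On this sample data the minimum (50) is shared by customers 1 and 4, so three rows come back: both minimum rows and the single maximum row.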
Try this (Joining SALES and CUSTOMER tables):
select C.CUSTOMER, MIN(TOTAL_VALUE), MAX(TOTAL_VALUE)
from SALES S
join CUSTOMER C on S.Customer_ID=C.Customer_ID
group by C.CUSTOMER
order by C.CUSTOMER

PostgreSQL how to compare a column's numeric value against the SUM of another numeric value?

I am new to PostgreSQL and restored this Database in order to practice my Queries. It contains the following Tables:
What is the best query to find how many orders have an order_total that is less than the sum of their line_total(s)?
This is the query I have but I doubt that my number is accurate. I feel like I am doing something wrong:
select COUNT(order_total) from orders
join order_lines
on orders.id = order_lines.order_id
having count(order_total) < sum(line_total)
Am I querying correctly or not?
Thanks
Pill
Perhaps something like this:
select o.id, o.order_total, sum(l.line_total) as sum_line_total
from orders o join order_lines l on o.id = l.order_id
group by o.id, o.order_total
having o.order_total < sum(l.line_total)
That answer gave more than one result. I experimented and came across the correct answer by using the following sub-query:
select count(*) from orders,
(Select order_id, sum(line_total) from order_lines group by order_id) a
where order_total < a.sum and order_id = orders.id;
Thanks though for the response
Pill
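The sub-query approach can be checked on a tiny data set. The sketch below uses Python's built-in sqlite3 module rather than PostgreSQL, and the sample rows are made up for illustration; the table and column names follow the thread. Summing the lines per order first and then comparing gives a single count.

```python
import sqlite3

def count_underbilled_orders():
    """Count orders whose order_total is less than the sum of their lines."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical sample data, for illustration only.
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, order_total INTEGER);
        CREATE TABLE order_lines (order_id INTEGER, line_total INTEGER);
        INSERT INTO orders VALUES (1, 100), (2, 50), (3, 80);
        INSERT INTO order_lines VALUES
            (1, 60), (1, 40),   -- sums to 100, equal to order_total
            (2, 30), (2, 30),   -- sums to 60, more than 50
            (3, 100);           -- 100, more than 80
    """)
    (count,) = conn.execute("""
        SELECT COUNT(*)
        FROM orders o
        JOIN (
            -- One row per order with the total of its lines.
            SELECT order_id, SUM(line_total) AS lines_total
            FROM order_lines
            GROUP BY order_id
        ) l ON l.order_id = o.id
        WHERE o.order_total < l.lines_total
    """).fetchone()
    conn.close()
    return count
```

Here orders 2 and 3 qualify, so the function returns 2.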

Several MAX values based on another column value

I'm trying to write a SQL query in MS Access and I've narrowed it down to the table below, but I can't seem to get the last part right without writing several extremely large queries.
Here's the structure of the table I'm trying to query:
The results I want: MemberId and year, where that MemberId had the most visits in that year. (That is, which MemberId had the most visits in 2014, which had the most visits in 2015, and so on. I also want the relevant year to be shown in the result.)
Thanks!
Sounds like you need to determine MAX(Visits) by year in a subquery, then JOIN to that:
SELECT a.*,b.Max_Visits
FROM YourTable a
JOIN (SELECT Year,MAX(Visits) AS Max_Visits
FROM YourTable
GROUP BY Year
) b
ON a.Year = b.Year
AND a.Visits = b.Max_Visits
If you want to see all members and not just those that had the most visits per year, you can change the JOIN to a LEFT JOIN.
If there's a tie, this returns both members.
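The tie behaviour is easy to confirm on a small sample. This sketch uses Python's built-in sqlite3 module instead of MS Access; the `member_visits` table name and the rows are assumptions for illustration.

```python
import sqlite3

def top_visitors_per_year():
    """Return (Year, MemberId, Visits) for the member(s) with most visits per year."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table name and sample data, for illustration only.
    conn.executescript("""
        CREATE TABLE member_visits (MemberId INTEGER, Year INTEGER, Visits INTEGER);
        INSERT INTO member_visits VALUES
            (1, 2014, 5), (2, 2014, 9),
            (1, 2015, 7), (2, 2015, 7), (3, 2015, 2);
    """)
    rows = conn.execute("""
        SELECT a.Year, a.MemberId, a.Visits
        FROM member_visits a
        JOIN (
            -- Highest visit count in each year.
            SELECT Year, MAX(Visits) AS Max_Visits
            FROM member_visits
            GROUP BY Year
        ) b ON a.Year = b.Year AND a.Visits = b.Max_Visits
        ORDER BY a.Year, a.MemberId
    """).fetchall()
    conn.close()
    return rows
```

Member 2 wins 2014 alone, while members 1 and 2 tie in 2015 with 7 visits each, so both appear for that year.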

update with the latest date

I have a table as shown below:
I need to update every row to the latest last_order_date for its customer, so that the table looks like this:
I have more than 20,000 records and need a query that updates them all at once.
Thank you for taking the time to look at this.
Use a join on a subquery that calculates the maximum date:
UPDATE t
SET t.last_order_date =a.maxDate
FROM tableName t
INNER JOIN
( SELECT cust_id ,MAX(last_order_date) As maxDate
FROM tableName GROUP BY cust_id ) a
ON a.cust_id =t.cust_id
This should work for you:
UPDATE [table_name]
SET last_order_date = (SELECT Max([b].last_order_date)
FROM [table_name] [b]
WHERE [b].cust_id = [table_name].cust_id);
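The correlated-subquery form is the most portable of the variants in this thread, so it can be tried in Python's built-in sqlite3 module. The `customer_orders` table name and the sample rows below are assumptions for illustration.

```python
import sqlite3

def update_to_latest():
    """Set every row's last_order_date to the max date for its cust_id."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table name and sample data, for illustration only.
    conn.executescript("""
        CREATE TABLE customer_orders (cust_id INTEGER, last_order_date TEXT);
        INSERT INTO customer_orders VALUES
            (1, '2021-01-05'), (1, '2021-03-10'), (2, '2021-02-01');
    """)
    conn.execute("""
        UPDATE customer_orders
        SET last_order_date = (
            -- Latest date among all rows for the same customer.
            SELECT MAX(b.last_order_date)
            FROM customer_orders b
            WHERE b.cust_id = customer_orders.cust_id
        )
    """)
    rows = conn.execute("""
        SELECT cust_id, last_order_date
        FROM customer_orders
        ORDER BY cust_id, rowid
    """).fetchall()
    conn.close()
    return rows
```

After the update, both rows for customer 1 carry 2021-03-10 and the single row for customer 2 is unchanged. The dates are stored as ISO-8601 text here, so MAX compares them correctly as strings; a real date column would not need that assumption.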
You could calculate the maximum dates in a CTE using a window MAX(), then reference the CTE in the main (UPDATE) statement:
WITH maxdates AS (
SELECT
last_order_date,
actual_last_order_date = MAX(last_order_date) OVER (PARTITION BY cust_id)
FROM atable
)
UPDATE maxdates
SET last_order_date = actual_last_order_date
;
However, duplicating this piece of information like this doesn't seem to make much sense. You should probably consider storing last_order_date in a table where cust_id is the primary key (probably some customers table). Or even abandon storing it in a table and calculate it dynamically every time: 20,000 rows isn't really that much. (Unless you have serious expectations for that number to grow rapidly and soon.)