SQL Server select on aggregate function

I have a challenge regarding the use of an aggregate function in a WHERE clause.
SELECT [tb_users].[id_user]
,[name]
,[Title]
,[FirstName]
,[LastName]
,[type]
,sum ([tb_LockAuditTrail].[price]) as SumPrice
FROM [SALTO_SPACE].[dbo].[tb_Users]
join [tb_LockAuditTrail]
on [tb_Users].[Cardcode] = [tb_LockAuditTrail].[Cardcode]
where [type] = 1
group by [tb_users].[id_user]
,[name]
,[Title]
,[FirstName]
,[LastName]
,[type]
order by [SumPrice] desc
This is running fine. From this dataset I want to select only the records above a certain SumPrice level. How do I do this?
I cannot use the alias SumPrice in the WHERE clause, because column aliases are not allowed there.
I also cannot use the aggregate SUM function in a WHERE clause. So for now I don't see a solution other than filtering the result afterwards in an Excel sheet.

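Add a HAVING clause after the GROUP BY block and before the ORDER BY block:
HAVING sum ([tb_LockAuditTrail].[price]) > 100
Change the 100 to whatever threshold you need. WHERE filters individual rows before they are grouped, whereas HAVING filters the groups after the aggregates are calculated, which is why SUM() is allowed in HAVING but not in WHERE.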
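A minimal runnable sketch of the HAVING fix, using Python's built-in sqlite3 module as a stand-in for SQL Server. The tables users and audit_trail, their columns, and the sample rows are hypothetical simplifications of tb_Users and tb_LockAuditTrail; SQLite's dialect differs from T-SQL in places, but WHERE, GROUP BY and HAVING behave the same way here.

```python
import sqlite3

# SQLite stand-in for the question's tables; names and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id_user INTEGER PRIMARY KEY, name TEXT, cardcode TEXT, type INTEGER);
    CREATE TABLE audit_trail (cardcode TEXT, price REAL);
    INSERT INTO users VALUES (1, 'Ann', 'A1', 1), (2, 'Bob', 'B2', 1), (3, 'Cy', 'C3', 2);
    INSERT INTO audit_trail VALUES ('A1', 80), ('A1', 40), ('B2', 30), ('B2', 20), ('C3', 500);
""")

def users_above(threshold):
    """Return (id_user, name, SumPrice) for type-1 users whose total exceeds threshold."""
    sql = """
        SELECT u.id_user, u.name, SUM(a.price) AS SumPrice
        FROM users u
        JOIN audit_trail a ON u.cardcode = a.cardcode
        WHERE u.type = 1                 -- row filter, applied before grouping
        GROUP BY u.id_user, u.name
        HAVING SUM(a.price) > ?          -- group filter, applied after SUM()
        ORDER BY SumPrice DESC
    """
    return conn.execute(sql, (threshold,)).fetchall()

print(users_above(100))  # [(1, 'Ann', 120.0)]
```

The HAVING clause repeats SUM(a.price) rather than using the alias, because SQL Server (unlike SQLite) does not accept the SumPrice alias there either.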
Even if HAVING didn't exist, you wouldn't need Excel. Every SELECT produces a block of data that can be used like a table, so you can wrap the whole query in parentheses, give it an alias, and select from it as if it were a table:
SELECT *
FROM
(
SELECT productcategory, sum(price) as sumprice
FROM products
GROUP BY productcategory
) summed
WHERE
summed.sumprice > 100
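To check that the derived-table version filters exactly like HAVING, here is a small sketch using Python's sqlite3 module as a stand-in database; the products rows are made up for illustration.

```python
import sqlite3

# Hypothetical products table with the column names used in the example above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (productcategory TEXT, price REAL);
    INSERT INTO products VALUES ('toys', 60), ('toys', 70), ('books', 20), ('books', 30);
""")

# The grouped query is wrapped in parentheses and aliased "summed",
# so the outer WHERE can refer to the computed column sumprice.
derived = conn.execute("""
    SELECT *
    FROM (
        SELECT productcategory, SUM(price) AS sumprice
        FROM products
        GROUP BY productcategory
    ) summed
    WHERE summed.sumprice > 100
""").fetchall()

# The same filter expressed with HAVING, for comparison.
having = conn.execute("""
    SELECT productcategory, SUM(price) AS sumprice
    FROM products
    GROUP BY productcategory
    HAVING SUM(price) > 100
""").fetchall()

print(derived)  # [('toys', 130.0)]
```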
Get comfortable with the notion that tables are only one kind of "block of data" that a query can use. You'll considerably improve your SQL querying if you keep in mind that FROM can also draw on blocks of data from other sources, such as subqueries and table-valued functions. Here's another example:
SELECT *
FROM
products p
INNER JOIN
(
SELECT productcategory, sum(price) as sumprice
FROM products
GROUP BY productcategory
) sumcat
ON p.productcategory = sumcat.productcategory
INNER JOIN
(
SELECT warehouse, count(*) as productcount
FROM products
GROUP BY warehouse
) countwh
ON p.warehouse = countwh.warehouse
WHERE
...
You can also use a WITH clause (a common table expression, or CTE) to give a query a name, so you can then use it like a table:
WITH
sumcat AS (
SELECT productcategory, sum(price) as sumprice
FROM products
GROUP BY productcategory
),
countwh as (
SELECT warehouse, count(*) as productcount
FROM products
GROUP BY warehouse
)
--now use them like tables
SELECT *
FROM
products p
INNER JOIN sumcat s ON p.productcategory = s.productcategory
INNER JOIN countwh c ON p.warehouse = c.warehouse
This is one way to get every product's details along with the sum of all the prices in its category and the count of all the products in its warehouse. The data may be pointless, but it demonstrates how subqueries can generate blocks of data that are joined to tables, so that each detail line also carries a summary. Ordinarily you have to lose detail to generate summaries, so the key ideas here are:
subqueries are blocks of data just like tables and can be joined just like tables
give everything an alias and use it
to get detail and summary together generate the summary and join it to the detail
a table can appear multiple times in a query, and can even be joined to itself, as long as each occurrence has a different alias
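The last two ideas can be sketched together: a CTE-based detail-plus-summary join, followed by a self-join that uses two aliases for the same table. This runs on Python's sqlite3 module as a stand-in database; the name column and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (name TEXT, productcategory TEXT, warehouse TEXT, price REAL);
    INSERT INTO products VALUES
        ('ball', 'toys', 'north', 10),
        ('kite', 'toys', 'south', 30),
        ('atlas', 'books', 'north', 25);
""")

# Detail and summary together: each product row is joined to its
# category total and to the product count of its warehouse.
detail_with_summary = conn.execute("""
    WITH sumcat AS (
        SELECT productcategory, SUM(price) AS sumprice
        FROM products GROUP BY productcategory
    ),
    countwh AS (
        SELECT warehouse, COUNT(*) AS productcount
        FROM products GROUP BY warehouse
    )
    SELECT p.name, p.price, s.sumprice, c.productcount
    FROM products p
    JOIN sumcat s ON p.productcategory = s.productcategory
    JOIN countwh c ON p.warehouse = c.warehouse
    ORDER BY p.name
""").fetchall()

# Self-join: the same table appears twice under aliases a and b,
# pairing products stored in the same warehouse (each pair listed once).
pairs = conn.execute("""
    SELECT a.name, b.name
    FROM products a
    JOIN products b ON a.warehouse = b.warehouse AND a.name < b.name
""").fetchall()

print(detail_with_summary)
# [('atlas', 25.0, 25.0, 2), ('ball', 10.0, 40.0, 2), ('kite', 30.0, 40.0, 1)]
print(pairs)  # [('atlas', 'ball')]
```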

Related

SQL Server query for related products

I am trying to get related products, but the issue I'm facing is that the ProductPhotos table has a one-to-many relationship with the Product table, so when I get products by matching category ID it also returns multiple photos for each product, which I do not want. I want only one photo from the ProductPhotos table for each product. Is there any way to use DISTINCT in joins, or any other way? Here is what I have done so far:
SELECT [Product].[ID]
,[Thumbnail]
,[ProductName]
,[Model]
,[SKU]
,[Price]
,[IsExclusive]
,[DiscountPercentage]
,[DiscountFixed]
,[NetPrice]
,[Url]
FROM [dbo].[Product]
INNER JOIN [ProductPhotos] ON [ProductPhotos].[ProductID]=[Product].[ID]
INNER JOIN [ProductCategories] ON [ProductCategories].[ProductID]=[Product].[ID]
WHERE [ProductCategories].[CategoryID]=4
And the result I am getting is...
The ProductPhotos table has:
Is there any way to use DISTINCT or GROUP BY on the ProductID column of the ProductPhotos table to return only one row from the photos table?
Instead of using an INNER JOIN to ProductPhotos, use CROSS APPLY:
SELECT . . .
FROM dbo.Product p CROSS APPLY
(SELECT TOP (1) pp.*
FROM ProductPhotos pp
WHERE pp.ProductID = p.id
ORDER BY NEWID()
) pp INNER JOIN
ProductCategories pc
ON pc.ProductID = p.id
WHERE pc.CategoryID = 4;
Notes:
The ORDER BY NEWID() chooses a random photo. You can order by specific columns to get the earliest, latest, biggest, or whatever.
Note that I added table aliases. These make the query easier to write and to read.
You should qualify all column names in your query, so it is clear which tables they come from.
I removed the superfluous square brackets. They just make the query harder to write and to read.
You can use ROW_NUMBER() to return one row per ProductID, like this:
JOIN (SELECT *,
ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY PhotoID) rn
FROM [ProductPhotos]) [ProductPhotos]
ON [ProductPhotos].[ProductID]=[Product].[ID] AND [ProductPhotos].rn = 1
Instead of this:
JOIN [ProductPhotos] ON [ProductPhotos].[ProductID]=[Product].[ID]
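A runnable sketch of the ROW_NUMBER() approach, using Python's sqlite3 module as a stand-in for SQL Server. As in the answer, it assumes ProductPhotos has a PhotoID key column; the Thumbnail column placement and all sample rows are hypothetical.

```python
import sqlite3

# Hypothetical schema: ProductPhotos has its own key PhotoID, as the answer assumes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (ID INTEGER PRIMARY KEY, ProductName TEXT);
    CREATE TABLE ProductPhotos (PhotoID INTEGER PRIMARY KEY, ProductID INTEGER, Thumbnail TEXT);
    INSERT INTO Product VALUES (1, 'Lamp'), (2, 'Chair');
    INSERT INTO ProductPhotos VALUES
        (10, 1, 'lamp-a.jpg'), (11, 1, 'lamp-b.jpg'),
        (12, 2, 'chair-a.jpg'), (13, 2, 'chair-b.jpg'), (14, 2, 'chair-c.jpg');
""")

# Number each product's photos by PhotoID, then join only the first one (rn = 1).
rows = conn.execute("""
    SELECT Product.ID, Product.ProductName, ProductPhotos.Thumbnail
    FROM Product
    JOIN (SELECT *,
                 ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY PhotoID) AS rn
          FROM ProductPhotos) ProductPhotos
      ON ProductPhotos.ProductID = Product.ID AND ProductPhotos.rn = 1
    ORDER BY Product.ID
""").fetchall()

print(rows)  # [(1, 'Lamp', 'lamp-a.jpg'), (2, 'Chair', 'chair-a.jpg')]
```

Each product now contributes exactly one row, no matter how many photos it has.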
You can use a subquery with DISTINCT in the join instead of joining the table directly.
You can also create an alias and use that column with DISTINCT in the SELECT statement, but this may cause performance problems when the tables hold a lot of data.
If you have 3 different photos for the same product ID (say, product 2), you can use a subquery with TOP 1 and ORDER BY the primary key DESC to get the latest picture.
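A sketch of the "TOP 1 ... ORDER BY primary key DESC" idea, run on Python's sqlite3 module. SQLite has no TOP, so ORDER BY ... LIMIT 1 stands in for it; PhotoID as the primary key, and the idea that a higher key means a newer photo, are assumptions, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (ID INTEGER PRIMARY KEY, ProductName TEXT);
    CREATE TABLE ProductPhotos (PhotoID INTEGER PRIMARY KEY, ProductID INTEGER, Thumbnail TEXT);
    INSERT INTO Product VALUES (1, 'Lamp'), (2, 'Chair');
    INSERT INTO ProductPhotos VALUES
        (10, 1, 'lamp-a.jpg'), (11, 1, 'lamp-b.jpg'),
        (12, 2, 'chair-a.jpg'), (13, 2, 'chair-b.jpg'), (14, 2, 'chair-c.jpg');
""")

# The correlated subquery picks the highest PhotoID for the current product
# (LIMIT 1 plays the role of SQL Server's TOP 1); the join keeps only that photo.
latest = conn.execute("""
    SELECT p.ID, p.ProductName, pp.Thumbnail
    FROM Product p
    JOIN ProductPhotos pp
      ON pp.PhotoID = (SELECT pp2.PhotoID
                       FROM ProductPhotos pp2
                       WHERE pp2.ProductID = p.ID
                       ORDER BY pp2.PhotoID DESC
                       LIMIT 1)
    ORDER BY p.ID
""").fetchall()

print(latest)  # [(1, 'Lamp', 'lamp-b.jpg'), (2, 'Chair', 'chair-c.jpg')]
```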

SQL Server 2016 Sub Query Guidance

I am currently working on an assignment for my SQL class and I am stuck. I'm not looking for full code to answer the question, just a little nudge in the right direction. If you do provide full code, would you mind adding a short explanation of why you did it that way (so I can actually learn something)?
Here is the question:
Write a SELECT statement that returns three columns: EmailAddress, ShipmentId, and the order total for each Client. To do this, you can group the result set by the EmailAddress and ShipmentId columns. In addition, you must calculate the order total from the columns in the ShipItems table.
Write a second SELECT statement that uses the first SELECT statement in its FROM clause. The main query should return two columns: the Client’s email address and the largest order for that Client. To do this, you can group the result set by the EmailAddress column.
I am confused about how to pull in the EmailAddress column from the Clients table, because to join it I have to bring in other tables that aren't otherwise being used. I am assuming there is an easier way to do this using subqueries, since that is what we are working on at the moment.
Think of SQL as working with sets of data rather than just tables. A table is merely one set of data. Viewed this way, the query below returns a set consisting of the entirety of another set, namely the table:
SELECT * FROM MyTable1
Now, if you were to only get the first two columns from MyTable1 you would return a different set that consisted only of columns 1 and 2:
SELECT col1, col2 FROM MyTable1
Now you can treat this second set, a subset of the data, as a "table" as well and query it like this:
SELECT
*
FROM (
SELECT
col1,
col2
FROM
MyTable1
) AS t
This returns both columns (col1 and col2) that the inner query provides. SQL Server requires the inner query to have an alias, which is why it is named t.
So your inner query (which I won't write for you, since you appear to be a student and it wouldn't be right for me to give you the entire answer) would be a query with a GROUP BY clause and a SUM of the order value field. The key thing to understand is this set thinking: you can wrap the ENTIRE query in parentheses and treat it as a table, as I have done above. Hopefully this helps.
You need a subquery, like this:
select emailaddress, max(OrderTotal) as MaxOrder
from
( -- Open the subquery
select Cl.emailaddress,
Sh.ShipmentID,
sum(SI.Value) as OrderTotal -- Use the line item value column in here
from Clients Cl -- First table
inner join Shipments Sh -- Join the shipments
on Sh.ClientID = Cl.ClientID
inner join ShipItems SI -- Now the items
on SI.ShipmentID = Sh.ShipmentID
group by Cl.emailaddress, Sh.ShipmentID -- here's your grouping for the sum() aggregation
) as Totals -- Close subquery; SQL Server requires an alias on a derived table
group by emailaddress -- group for the max()
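A runnable version of the two-level aggregation (SUM per shipment inside, MAX per client outside), using Python's sqlite3 module as a stand-in database. The Amount column stands in for the unspecified line-item value column, and all table contents are hypothetical.

```python
import sqlite3

# Hypothetical schema; Amount stands in for the line-item value column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Clients (ClientID INTEGER PRIMARY KEY, EmailAddress TEXT);
    CREATE TABLE Shipments (ShipmentID INTEGER PRIMARY KEY, ClientID INTEGER);
    CREATE TABLE ShipItems (ShipmentID INTEGER, Amount REAL);
    INSERT INTO Clients VALUES (1, 'a@example.com'), (2, 'b@example.com');
    INSERT INTO Shipments VALUES (100, 1), (101, 1), (102, 2);
    INSERT INTO ShipItems VALUES (100, 5), (100, 15), (101, 50), (102, 7), (102, 3);
""")

rows = conn.execute("""
    SELECT EmailAddress, MAX(OrderTotal) AS MaxOrder
    FROM (
        -- inner query: one row per client email and shipment, with that shipment's total
        SELECT Cl.EmailAddress, Sh.ShipmentID, SUM(SI.Amount) AS OrderTotal
        FROM Clients Cl
        JOIN Shipments Sh ON Sh.ClientID = Cl.ClientID
        JOIN ShipItems SI ON SI.ShipmentID = Sh.ShipmentID
        GROUP BY Cl.EmailAddress, Sh.ShipmentID
    ) AS Totals
    GROUP BY EmailAddress        -- outer query: largest shipment total per client
    ORDER BY EmailAddress
""").fetchall()

print(rows)  # [('a@example.com', 50.0), ('b@example.com', 10.0)]
```

Client a@example.com has shipments totalling 20 and 50, so the outer MAX keeps 50.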
For the first query you can join the Clients to Shipments (on ClientId).
And Shipments to the ShipItems table (on ShipmentId).
Then group the results, and count or sum the total you need.
Using aliases for the tables is useful, especially when you select fields from joined tables that have the same column name.
select
c.EmailAddress,
i.ShipmentId,
SUM((i.ShipItemPrice - i.ShipItemDiscountAmount) * i.Quantity) as TotalPriceDiscounted
from ShipItems i
join Shipments s on (s.ShipmentId = i.ShipmentId)
left join Clients c on (c.ClientId = s.ClientId)
group by i.ShipmentId, c.EmailAddress
order by i.ShipmentId, c.EmailAddress;
Using that grouped query as a subquery, you can get the maximum total per EmailAddress.
select EmailAddress,
-- max(TotalShipItems) as MaxTotalShipItems,
max(TotalPriceDiscounted) as MaxTotalPriceDiscounted
from (
select
c.EmailAddress,
-- i.ShipmentId,
-- count(*) as TotalShipItems,
SUM((i.ShipItemPrice - i.ShipItemDiscountAmount) * i.Quantity) as TotalPriceDiscounted
from ShipItems i
join Shipments s on (s.ShipmentId = i.ShipmentId)
left join Clients c on (c.ClientId = s.ClientId)
group by i.ShipmentId, c.EmailAddress
) q
group by EmailAddress
order by EmailAddress
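Note that SQL Server does not allow ORDER BY inside a subquery (derived table) unless TOP, OFFSET or FOR XML is also specified, and even then it does not determine the order of the final result; that is why the ORDER BY is on the outer query here.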

NTILE Function and Using Inner Join in Oracle

I am supposed to use the given database (it's pretty huge, so I put it on codeshare) to list the last names and customer numbers of the top 5% of customers for each branch. To find the top 5% of customers, I decided to use the NTILE function (100/5 = 20, hence NTILE(20)). The columns are pulled from two separate tables, so I used inner joins. For the life of me, I honestly cannot figure out where I am going wrong. I keep getting "missing expression" errors but do not know what exactly I am missing. Here is the database:
Database: https://codeshare.io/5XKKBj
ERD: https://drive.google.com/file/d/0Bzum6VJXi9lUX1d2ZkhudTE3QXc/view?usp=sharing
Here is my SQL Query so far.
SELECT
Ntile(20) over
(partition by Employee.Branch_no
order by sum(ORDERS.SUBTOTAL) desc
) As Top_5,
CUSTOMER.CUSTOMER_NO,
CUSTOMER.LNAME
FROM
CUSTOMER
INNER JOIN ORDERS
ON
CUSTOMER.CUSTOMER_NO = ORDERS.CUSTOMER_NO
GROUP BY
ORDERS.SUBTOTAL,
CUSTOMER.CUSTOMER_NO,
CUSTOMER.LNAME;
You need to join EMPLOYEE, and the GROUP BY must include all non-aggregated expressions (and should not include ORDERS.SUBTOTAL, since that is the column being summed). You can use a subquery to generate the subtotals and compute the NTILE in the outer query, e.g.:
SELECT
Ntile(20) over
(partition by BRANCH_NO
order by sum_subtotal desc
) As Top_5,
CUSTOMER_NO,
LNAME
FROM (
SELECT
EMPLOYEE.BRANCH_NO,
CUSTOMER.CUSTOMER_NO,
CUSTOMER.LNAME,
sum(ORDERS.SUBTOTAL) as sum_subtotal
FROM CUSTOMER
JOIN ORDERS
ON CUSTOMER.CUSTOMER_NO = ORDERS.CUSTOMER_NO
JOIN EMPLOYEE
ON ORDERS.EMPLOYEE_NO = EMPLOYEE.EMPLOYEE_NO
GROUP BY
EMPLOYEE.BRANCH_NO,
CUSTOMER.CUSTOMER_NO,
CUSTOMER.LNAME
);
Note: you might want to include BRANCH_NO in the select list as well, otherwise the output will look confusing with duplicate customers (if a customer has ordered from employees in multiple branches).
Now, if you want to filter the above query to just get the top 5%, you can put the whole thing in another subquery and add a predicate on the Top_5 column, e.g.:
SELECT CUSTOMER_NO, LNAME
FROM (... the query above...)
WHERE Top_5 = 1;
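A runnable sketch of the nested query with the Top_5 = 1 filter, using Python's sqlite3 module (SQLite 3.25 or later supports NTILE) as a stand-in for Oracle. The data is hypothetical: 20 customers per branch, so NTILE(20) puts exactly one customer in each bucket.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMER (CUSTOMER_NO INTEGER PRIMARY KEY, LNAME TEXT);
    CREATE TABLE EMPLOYEE (EMPLOYEE_NO INTEGER PRIMARY KEY, BRANCH_NO INTEGER);
    CREATE TABLE ORDERS (CUSTOMER_NO INTEGER, EMPLOYEE_NO INTEGER, SUBTOTAL REAL);
    INSERT INTO EMPLOYEE VALUES (1, 10), (2, 20);
""")
# Hypothetical data: customers 1-20 order through branch 10, customers 21-40 through branch 20.
for n in range(1, 41):
    emp = 1 if n <= 20 else 2
    conn.execute("INSERT INTO CUSTOMER VALUES (?, ?)", (n, f"Name{n}"))
    conn.execute("INSERT INTO ORDERS VALUES (?, ?, ?)", (n, emp, n * 10.0))
    conn.execute("INSERT INTO ORDERS VALUES (?, ?, ?)", (n, emp, 1.0))

top = conn.execute("""
    SELECT CUSTOMER_NO, LNAME
    FROM (
        -- middle query: bucket customers per branch by their summed subtotal
        SELECT NTILE(20) OVER (PARTITION BY BRANCH_NO ORDER BY sum_subtotal DESC) AS Top_5,
               CUSTOMER_NO, LNAME
        FROM (
            -- inner query: one row per branch and customer with the order total
            SELECT EMPLOYEE.BRANCH_NO, CUSTOMER.CUSTOMER_NO, CUSTOMER.LNAME,
                   SUM(ORDERS.SUBTOTAL) AS sum_subtotal
            FROM CUSTOMER
            JOIN ORDERS ON CUSTOMER.CUSTOMER_NO = ORDERS.CUSTOMER_NO
            JOIN EMPLOYEE ON ORDERS.EMPLOYEE_NO = EMPLOYEE.EMPLOYEE_NO
            GROUP BY EMPLOYEE.BRANCH_NO, CUSTOMER.CUSTOMER_NO, CUSTOMER.LNAME
        )
    )
    WHERE Top_5 = 1
    ORDER BY CUSTOMER_NO
""").fetchall()

print(top)  # [(20, 'Name20'), (40, 'Name40')]
```

Bucket 1 is exactly 5% only when the row count is a multiple of 20; otherwise NTILE gives the leftover rows to the lowest-numbered buckets, so bucket 1 can hold slightly more than 5%.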

Easiest Approach to Selecting Top Result from Windowed Function?

So, let's say I have a list of customers and I want to select details for all customers, as well as their most purchased product from a specific class of products. Even if they have not purchased one of these products I want to select the customer detail, while simply displaying null for their most purchased product from that class.
I would start with the following either as a CTE or temp table:
SELECT
CUST_NUMBER
,PRODUCT
,ROW_NUMBER() OVER (PARTITION BY CUST_NUMBER ORDER BY COUNT(ORDER_NUM) DESC) [ProdRank]
FROM ORDERS
WHERE PROD_CLASS = 'x'
GROUP BY
CUST_NUMBER
,PRODUCT
The thing is, there can be many different products within this product class, and I am only interested in rows where ProdRank = 1. As you might know, though, I cannot filter on ProdRank = 1 in either the WHERE or the HAVING clause.
I get the error message "Windowed functions can only appear in the SELECT or ORDER BY clauses."
The situation is further complicated by the fact that many customers may have not ordered any products within the product class. Because of this I cannot simply left join the customer list to the above and specify WHERE ProdRank = 1, or else it mimics an inner join and I drop any customers where ProdRank is Null.
The method I've come up with to deal with this is to first create a temp table, #Products, from the code above; it includes each customer and every product with its ranking. I then create a second temp table called #TopProducts, where I simply run:
SELECT * FROM
#Products WHERE
ProdRank = 1
After that I just left join against #TopProducts from my Customers table.
It seems like there should be a simpler way of dealing with this though. Is there any way I can select the top partitioned result of ROW_NUMBER() or RANK() in one step, rather than creating two temp tables?
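Use a common table expression, and put the ProdRank = 1 filter in the ON clause of the LEFT JOIN rather than in WHERE, so that customers without a matching product are still returned (with NULLs):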
WITH topProducts AS (
SELECT
CUST_NUMBER
,PRODUCT
,ROW_NUMBER() OVER (PARTITION BY CUST_NUMBER ORDER BY COUNT(ORDER_NUM) DESC) [ProdRank]
FROM ORDERS
WHERE PROD_CLASS = 'x'
GROUP BY
CUST_NUMBER
,PRODUCT
)
SELECT *
FROM CustomerDetails c
LEFT JOIN topProducts p ON (p.ProdRank = 1 AND c.CUST_NUMBER = p.CUST_NUMBER)
Use a subquery:
SELECT *
FROM CustomerDetails c
LEFT JOIN (
SELECT
CUST_NUMBER
,PRODUCT
,ROW_NUMBER() OVER (PARTITION BY CUST_NUMBER ORDER BY COUNT(ORDER_NUM) DESC) [ProdRank]
FROM ORDERS
WHERE PROD_CLASS = 'x'
GROUP BY
CUST_NUMBER
,PRODUCT
) p ON (ProdRank = 1 AND c.CUST_NUMBER = p.CUST_NUMBER)
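A runnable comparison of putting ProdRank = 1 in the ON clause versus in WHERE, using Python's sqlite3 module as a stand-in for SQL Server. For readability, the counting and the ranking are split into two CTEs, and PRODUCT is added as a tie-breaker so ROW_NUMBER() is deterministic; the CustomerDetails NAME column and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CustomerDetails (CUST_NUMBER INTEGER PRIMARY KEY, NAME TEXT);
    CREATE TABLE ORDERS (ORDER_NUM INTEGER PRIMARY KEY, CUST_NUMBER INTEGER,
                         PRODUCT TEXT, PROD_CLASS TEXT);
    INSERT INTO CustomerDetails VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO ORDERS VALUES
        (1, 1, 'pen', 'x'), (2, 1, 'pen', 'x'), (3, 1, 'ink', 'x'),
        (4, 2, 'pad', 'x'),
        (5, 3, 'mug', 'y');
""")

# Count orders per customer/product, then rank each customer's products.
CTES = """
    WITH counts AS (
        SELECT CUST_NUMBER, PRODUCT, COUNT(ORDER_NUM) AS n_orders
        FROM ORDERS
        WHERE PROD_CLASS = 'x'
        GROUP BY CUST_NUMBER, PRODUCT
    ),
    topProducts AS (
        SELECT CUST_NUMBER, PRODUCT,
               ROW_NUMBER() OVER (PARTITION BY CUST_NUMBER
                                  ORDER BY n_orders DESC, PRODUCT) AS ProdRank
        FROM counts
    )
"""

# Filter inside ON: Cy (no class-'x' orders) survives with PRODUCT = NULL.
in_on = conn.execute(CTES + """
    SELECT c.CUST_NUMBER, c.NAME, p.PRODUCT
    FROM CustomerDetails c
    LEFT JOIN topProducts p ON p.ProdRank = 1 AND c.CUST_NUMBER = p.CUST_NUMBER
    ORDER BY c.CUST_NUMBER
""").fetchall()

# Filter in WHERE: Cy's NULL ProdRank fails the test, so the LEFT JOIN acts like an INNER JOIN.
in_where = conn.execute(CTES + """
    SELECT c.CUST_NUMBER, c.NAME, p.PRODUCT
    FROM CustomerDetails c
    LEFT JOIN topProducts p ON c.CUST_NUMBER = p.CUST_NUMBER
    WHERE p.ProdRank = 1
    ORDER BY c.CUST_NUMBER
""").fetchall()

print(in_on)     # [(1, 'Ann', 'pen'), (2, 'Bob', 'pad'), (3, 'Cy', None)]
print(in_where)  # [(1, 'Ann', 'pen'), (2, 'Bob', 'pad')]
```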
I would use OUTER APPLY with TOP in your scenario. Does that make sense?
There are a few examples here: "Real life example, when to use OUTER / CROSS APPLY in SQL".
I would write a piece of code, but I'm on mobile and that's really not comfortable...

Passing Data Through A Three Tiered Scalar Subquery

I have a query, that has three tiers. That is to say, I have a main query, which has a scalar subquery, which also contains a scalar subquery.
The bottom-level scalar subquery returns two different values, of which the mid-level subquery returns the average. However, instead of the bottom-level query receiving the current row's value, it is averaging ALL of the values in the table.
Does anyone know how to properly pass the value of the current top-level row to the bottom subquery?
Code Example:
Select Product,
Description,
(Select Avg(Mfg_Cost, Purchasing_Cost)
FROM (Select Mfg_Cost,
Purchasing_Cost
From Cost
Where Cost.Product = Products.Product))
From Products;
Can't you just use a JOIN and GROUP BY, like so:
Select p.Product,
p.Description,
Avg(c.Mfg_Cost),
Avg(c.Purchasing_Cost)
From Products p
INNER JOIN
Cost c
ON c.Product = p.Product
GROUP BY p.Product, p.Description;
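A runnable check of the JOIN plus GROUP BY approach, using Python's sqlite3 module as a stand-in database. The sample rows are hypothetical, and the AVG columns are given aliases here so the output columns have names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (Product TEXT PRIMARY KEY, Description TEXT);
    CREATE TABLE Cost (Product TEXT, Mfg_Cost REAL, Purchasing_Cost REAL);
    INSERT INTO Products VALUES ('A', 'Widget'), ('B', 'Gadget');
    INSERT INTO Cost VALUES ('A', 10, 20), ('A', 30, 40), ('B', 5, 7);
""")

# Each product's cost rows are grouped with it, so AVG() only sees that product's rows.
rows = conn.execute("""
    SELECT p.Product, p.Description,
           AVG(c.Mfg_Cost) AS AvgMfg_Cost,
           AVG(c.Purchasing_Cost) AS AvgPurchasing_Cost
    FROM Products p
    INNER JOIN Cost c ON c.Product = p.Product
    GROUP BY p.Product, p.Description
    ORDER BY p.Product
""").fetchall()

print(rows)  # [('A', 'Widget', 20.0, 30.0), ('B', 'Gadget', 5.0, 7.0)]
```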
In general, if you need to return more than one value from a subquery, join to a derived table:
Select p.Product,
p.Description,
q2.AvgMfg_Cost,
q2.AvgPurchasing_Cost
From Products p INNER JOIN
(
Select c.Product,
Avg(c.Mfg_Cost) AS AvgMfg_Cost,
Avg(c.Purchasing_Cost) AS AvgPurchasing_Cost
From Cost c
Group by c.Product
) AS q2
on q2.Product = p.Product;
In Microsoft SQL Server, you can also write the derived table as a common table expression (CTE).
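A sketch of the CTE form of the q2 derived table, run on Python's sqlite3 module as a stand-in (SQLite also supports WITH); the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (Product TEXT PRIMARY KEY, Description TEXT);
    CREATE TABLE Cost (Product TEXT, Mfg_Cost REAL, Purchasing_Cost REAL);
    INSERT INTO Products VALUES ('A', 'Widget'), ('B', 'Gadget');
    INSERT INTO Cost VALUES ('A', 10, 20), ('A', 30, 40), ('B', 5, 7);
""")

# q2 is the per-product aggregate, named up front with WITH and then joined like a table.
rows = conn.execute("""
    WITH q2 AS (
        SELECT c.Product,
               AVG(c.Mfg_Cost) AS AvgMfg_Cost,
               AVG(c.Purchasing_Cost) AS AvgPurchasing_Cost
        FROM Cost c
        GROUP BY c.Product
    )
    SELECT p.Product, p.Description, q2.AvgMfg_Cost, q2.AvgPurchasing_Cost
    FROM Products p
    INNER JOIN q2 ON q2.Product = p.Product
    ORDER BY p.Product
""").fetchall()

print(rows)  # [('A', 'Widget', 20.0, 30.0), ('B', 'Gadget', 5.0, 7.0)]
```

The result matches the derived-table version; the CTE simply moves the subquery to the top of the statement, which helps when it is referenced more than once.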