SQL query Parent-child distinct - sql

I have a pair of SQL server tables.
P contains id and name.
PR contains id, interestrate, tiernumber, fromdate, todate and P.id. PR may contain many rows listed per p.id / tier. (tiers are a list of rates a product may have in any given date period.)
eg: Product 1 tier 1 starts 1/1/2008 to 1/1/2009 and has 6 rates shown 1 row per rate. Product 1 tier 2 starts 1/2/2009 etc etc etc
I need a view on this that shows the P.name and the PR.tiernumber and dates... BUT I want only one row to represent the tier.
This is easy:
SELECT DISTINCT P.ID, P.PRODUCTCODE, P.PRODUCTNAME, PR.TIERNO,
PR.FROMDATE, PR.TODATE, PR.PRODUCTID
FROM dbo.PRODUCTRATE AS PR INNER JOIN dbo.PRODUCT AS P
ON P.ID = PR.PRODUCTID
ORDER BY P.ID DESC
This gives me exactly the right data. However, it stops me from including PR.ID, because adding that column would defeat the DISTINCT.
I need to limit the result set because the user should see just a list of tiers, but I also need the PR.ID in order to display all of the data.
Any ideas?

SELECT P.ID, P.ACUPRODUCTCODE, P.PRODUCTNAME, PR.TIERNO,
PR.FROMDATE, PR.TODATE, PR.PRODUCTID, MIN(PR.ID) AS RATEID
FROM dbo.PRODUCTRATE AS PR INNER JOIN dbo.PRODUCT AS P
ON P.ID = PR.PRODUCTID
GROUP BY P.ID, P.ACUPRODUCTCODE, P.PRODUCTNAME, PR.TIERNO,
PR.FROMDATE, PR.TODATE, PR.PRODUCTID
ORDER BY P.ID DESC
That ought to do the job: GROUP BY in place of DISTINCT, with an aggregate function (MIN) to return one particular PR.ID per group.
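A minimal sketch of the GROUP BY approach, run from Python against an in-memory SQLite database rather than SQL Server. The column names follow the question; the table definitions, the alias FIRSTRATEID and the sample rows are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PRODUCT (ID INTEGER PRIMARY KEY, PRODUCTNAME TEXT);
    CREATE TABLE PRODUCTRATE (
        ID INTEGER PRIMARY KEY, PRODUCTID INTEGER, TIERNO INTEGER,
        FROMDATE TEXT, TODATE TEXT, INTERESTRATE REAL
    );
    INSERT INTO PRODUCT VALUES (1, 'Saver');
    -- Hypothetical data: tier 1 has three rate rows, tier 2 has two.
    INSERT INTO PRODUCTRATE VALUES
        (10, 1, 1, '2008-01-01', '2009-01-01', 1.0),
        (11, 1, 1, '2008-01-01', '2009-01-01', 1.5),
        (12, 1, 1, '2008-01-01', '2009-01-01', 2.0),
        (13, 1, 2, '2009-01-02', '2010-01-01', 2.5),
        (14, 1, 2, '2009-01-02', '2010-01-01', 3.0);
""")

# One row per product/tier/date range; MIN(PR.ID) picks the lowest rate-row ID.
tiers = conn.execute("""
    SELECT P.ID, P.PRODUCTNAME, PR.TIERNO, PR.FROMDATE, PR.TODATE,
           MIN(PR.ID) AS FIRSTRATEID
    FROM PRODUCTRATE AS PR
    INNER JOIN PRODUCT AS P ON P.ID = PR.PRODUCTID
    GROUP BY P.ID, P.PRODUCTNAME, PR.TIERNO, PR.FROMDATE, PR.TODATE
    ORDER BY P.ID DESC, PR.TIERNO
""").fetchall()

for row in tiers:
    print(row)
```

Five rate rows collapse to two tier rows, each carrying one rate ID that can be used to look up the rest of the tier.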

It sounds like you want to accomplish two different things with the same query, which doesn't make sense. Either you want a list of product/tier/date information or you want a list of interest rates.
If you want to pick a particular PR.ID to go with your data then you need to decide on what the rule is for that - what determines which ID you want to get back?

Related

Get most sold product for each country from NORTHWIND database

Good day guys, I've been struggling with this for the past day and I just can't seem to figure it out.
My task is to derive the most sold product for each country from the popular open source database called NORTHWIND: https://northwinddatabase.codeplex.com
I was able to get to this stage, here is my code in SQL Server:
--Get most sold product for each country
WITH TotalProductsSold AS
(
SELECT od.ProductID, SUM(od.Quantity) AS TotalSold
FROM [Order Details] AS od
GROUP BY od.ProductID
)
SELECT MAX(TotalProductsSold.TotalSold) AS MostSoldQuantity, s.Country --,p.ProductName
FROM Products AS p
INNER JOIN TotalProductsSold
ON TotalProductsSold.ProductID = p.ProductID
INNER JOIN Suppliers AS s
ON s.SupplierID = p.SupplierID
GROUP BY s.Country
ORDER BY MostSoldQuantity DESC
This gives me the following result:
That's all good but I wish to find out the product name for the MostSoldQuantity.
Thank you very much !
P.S. I put a comment, --p.ProductName, where I thought it would work, but it didn't. If someone could explain why GROUP BY does not automatically let me get the product name for that row, that would be great.
First, start with the count of products sold, per country, not just per product. Then rank them and pick only anything at RANK = 1.
Something like...
WITH
ProductQuantityByCountry AS
(
SELECT
s.Country,
p.ProductID,
p.ProductName,
SUM(od.Quantity) AS Quantity
FROM
[Order Details] AS od
INNER JOIN
Products AS p
ON p.ProductID = od.ProductID
INNER JOIN
Suppliers AS s
ON s.SupplierID = p.SupplierID
GROUP BY
s.Country,
p.ProductID,
p.ProductName
),
RankedProductQuantityByCountry
AS
(
SELECT
RANK() OVER (PARTITION BY Country ORDER BY Quantity DESC) AS countryRank,
*
FROM
ProductQuantityByCountry
)
SELECT
*
FROM
RankedProductQuantityByCountry
WHERE
countryRank = 1
Note that one country may supply identical quantities of different products, so two products could both have rank = 1. Look into ROW_NUMBER() and/or DENSE_RANK() for other but similar behaviours to RANK().
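To see how the three ranking functions treat ties, here is a small sketch run from Python against in-memory SQLite (window functions need SQLite 3.25 or later). The sales table and its quantities are invented for illustration; they stand in for the already aggregated per-country totals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, pre-aggregated totals; Chai and Tofu tie in the UK.
    CREATE TABLE sales (Country TEXT, ProductName TEXT, Quantity INTEGER);
    INSERT INTO sales VALUES
        ('UK', 'Chai', 50), ('UK', 'Tofu', 50), ('UK', 'Ikura', 20),
        ('USA', 'Chang', 70), ('USA', 'Konbu', 30);
""")

ranked = conn.execute("""
    SELECT Country, ProductName, Quantity,
           RANK()       OVER (PARTITION BY Country ORDER BY Quantity DESC) AS rnk,
           DENSE_RANK() OVER (PARTITION BY Country ORDER BY Quantity DESC) AS dense_rnk,
           ROW_NUMBER() OVER (PARTITION BY Country
                              ORDER BY Quantity DESC, ProductName) AS row_num
    FROM sales
    ORDER BY Country, row_num
""").fetchall()

for row in ranked:
    print(row)
```

RANK gives both tied products 1 and then skips to 3, DENSE_RANK gives 1, 1, 2, and ROW_NUMBER always numbers 1, 2, 3, so filtering on ROW_NUMBER = 1 returns exactly one product per country (here the tie is broken by product name, which is an arbitrary choice).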
EDIT:
A simple thought exercise to show why SQL doesn't let you put p.ProductName in your final query is to ask a question.
What should SQL do in this case?
SELECT
MAX(TotalProductsSold.TotalSold) AS MostSoldQuantity,
MIN(TotalProductsSold.TotalSold) AS LeastSoldQuantity,
s.Country,
p.ProductName
FROM
blahblahblah
GROUP BY
s.Country
ORDER BY
MostSoldQuantity DESC
The presence of a MIN and a MAX makes things ambiguous.
You may be clear that you want to perform an operation by country and that operation to be to pick the product with the highest sales volume from that country. But it's not actually explicit, and small changes to the query could have very confusing consequences to any inferred behaviour. Instead SQL's declarative syntax provides a very clear / explicit / deterministic description of the problem to be solved.
If an expression isn't mentioned in the GROUP BY clause, you can't SELECT it, without aggregating it. This is so that there is no ambiguity as to what is meant or what the SQL engine is supposed to do.
By requiring you to state "get the total sales per country per product" at one level of the query, SQL lets you cleanly state "then pick the highest ranked per country" at another level.
This can feel like you end up with queries that are longer than "should" be necessary. But it also results in queries that are completely unambiguous, both for compiling the query down to an execution plan and for other coders who will read your code in the future.
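The two levels can be sketched end to end on a tiny Northwind-shaped schema, run from Python against in-memory SQLite (3.25 or later). The rows are invented, and the table is called OrderDetails here only to avoid the space in Northwind's [Order Details]; treat it as an illustration, not as real Northwind output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Suppliers (SupplierID INTEGER PRIMARY KEY, Country TEXT);
    CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductName TEXT,
                           SupplierID INTEGER);
    CREATE TABLE OrderDetails (ProductID INTEGER, Quantity INTEGER);
    -- Hypothetical sample rows.
    INSERT INTO Suppliers VALUES (1, 'UK'), (2, 'Japan');
    INSERT INTO Products VALUES (1, 'Chai', 1), (2, 'Chang', 1), (3, 'Tofu', 2);
    INSERT INTO OrderDetails VALUES (1, 10), (1, 5), (2, 40), (3, 7), (3, 1);
""")

best_sellers = conn.execute("""
    WITH PerCountryProduct AS (
        -- Level 1: total quantity per country per product.
        SELECT s.Country, p.ProductName, SUM(od.Quantity) AS Quantity
        FROM OrderDetails AS od
        INNER JOIN Products AS p ON p.ProductID = od.ProductID
        INNER JOIN Suppliers AS s ON s.SupplierID = p.SupplierID
        GROUP BY s.Country, p.ProductID, p.ProductName
    ),
    Ranked AS (
        -- Level 2: rank the products inside each country.
        SELECT *,
               RANK() OVER (PARTITION BY Country ORDER BY Quantity DESC) AS countryRank
        FROM PerCountryProduct
    )
    SELECT Country, ProductName, Quantity
    FROM Ranked
    WHERE countryRank = 1
    ORDER BY Country
""").fetchall()

print(best_sellers)
```

Because the product name is part of the grouping at level 1, it is available at level 2 without any ambiguity about which product it belongs to.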

Excluding multiple results in specific column (SQL JOIN)

I'm taking my first steps in terms of practical SQL use in real life.
I have a few tables with contractual and financial information and the query works exactly as I need - to a certain point. It looks more or less like that:
SELECT /some columns/ from CONTRACTS
Linked 3 extra tables with INNER JOIN to add things like department names, product information etc. This all works but they all have simplish one-to-one relationship (one contract related to single department in Department table, one product information entry in the corresponding table etc).
Now this is my challenge:
I also need to add contract invoicing information doing something like:
inner join INVOICES on CONTRACTS.contnoC = INVOICES.contnoI
(and selecting also the Invoice number linked to the Contract number, although that's partly optional)
The problem I'm facing is that unlike with other tables where there's always one-to-one relationship when joining tables, INVOICES table can have multiple (or none at all) entries that correspond to a single contract no. The result is that I will get multiple query results for a single contract no (with different invoice numbers presented), needlessly crowding the query results.
Essentially I'm looking to add INVOICES table to a query to just identify if the contract no is present in the INVOICES table (contract has been invoiced or not). Invoice number itself could be presented (it is with INNER JOIN), however it's not critical as long it's somehow marked. Invoice number fields remains blank in the result with the INNER JOIN function, which is also necessary (i.e. to have the row presented even if the match is not found in INVOICES table).
SELECT DISTINCT would look to do what I need, but I seemed to face the problem that I need to levy DISTINCT criteria only for column representing contract numbers, NOT any other column (there can be same values presented, but all those should be presented).
Unfortunately I'm not totally aware of what database system I am using.
Seems like the question is still getting some attention and in an effort to provide some explanation here are a few techniques.
If you just want any contract that has been invoiced, with details from the one-to-one tables, you can do it similarly to what you have described, the key being NOT to include any column from the Invoices table in the column list.
SELECT
DISTINCT Contract, Department, ProductId .....(nothing from Invoices Table!!!)
FROM
Contracts c
INNER JOIN Departments D
ON c.departmentId = d.Department
INNER JOIN Product p
ON c.ProductId = p.ProductId
INNER JOIN Invoices i
ON c.contnoC = i.contnoI
Perhaps a little cleaner would be to use EXISTS or IN, like so:
SELECT
Contract, Department, ProductId .....(nothing from Invoices Table!!!)
FROM
Contracts c
INNER JOIN Departments D
ON c.departmentId = d.Department
INNER JOIN Product p
ON c.ProductId = p.ProductId
WHERE
EXISTS (SELECT 1 FROM Invoices i WHERE i.contnoI = c.contnoC )
Or, using IN:
SELECT
Contract, Department, ProductId .....(nothing from Invoices Table!!!)
FROM
Contracts c
INNER JOIN Departments D
ON c.departmentId = d.Department
INNER JOIN Product p
ON c.ProductId = p.ProductId
WHERE
contnoC IN (SELECT contnoI FROM Invoices)
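Be careful with the negated form: if the subquery can return a NULL, NOT IN matches no rows at all, so use NOT EXISTS there instead. Plain IN, as used above, is not affected.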
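The NULL behaviour of IN, NOT IN and NOT EXISTS can be checked with a small sketch, run from Python against in-memory SQLite. The two-column-free tables and their rows are assumptions for illustration; the important detail is the invoice row whose contnoI is NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Contracts (contnoC INTEGER);
    CREATE TABLE Invoices (contnoI INTEGER);
    INSERT INTO Contracts VALUES (1), (2), (3);
    -- Contract 1 is invoiced; one invoice row has no contract number.
    INSERT INTO Invoices VALUES (1), (NULL);
""")

def contract_numbers(where_clause):
    """Return the contract numbers that satisfy the given WHERE clause."""
    sql = ("SELECT contnoC FROM Contracts c WHERE " + where_clause +
           " ORDER BY contnoC")
    return [row[0] for row in conn.execute(sql)]

in_rows = contract_numbers("c.contnoC IN (SELECT contnoI FROM Invoices)")
not_in_rows = contract_numbers("c.contnoC NOT IN (SELECT contnoI FROM Invoices)")
not_exists_rows = contract_numbers(
    "NOT EXISTS (SELECT 1 FROM Invoices i WHERE i.contnoI = c.contnoC)")

print(in_rows, not_in_rows, not_exists_rows)
```

IN still finds contract 1, but NOT IN returns nothing because comparing against the NULL yields unknown for every contract, while NOT EXISTS returns the two uninvoiced contracts as expected.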
If you actually want all of the contracts and just need to know whether each contract has been invoiced, you can use aggregation and a CASE expression:
SELECT
Contract, Department, ProductId, CASE WHEN COUNT(i.contnoI) = 0 THEN 0 ELSE 1 END as Invoiced
FROM
Contracts c
INNER JOIN Departments D
ON c.departmentId = d.Department
INNER JOIN Product p
ON c.ProductId = p.ProductId
LEFT JOIN Invoices i
ON c.contnoC = i.contnoI
GROUP BY
Contract, Department, ProductId
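A minimal sketch of the LEFT JOIN plus COUNT flag, run from Python against in-memory SQLite. The simplified tables (no Departments or Product joins) and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Contracts (contnoC INTEGER PRIMARY KEY, Department TEXT);
    CREATE TABLE Invoices (InvoiceNo INTEGER PRIMARY KEY, contnoI INTEGER);
    -- Hypothetical data: contract 1 has two invoices, contract 2 has none.
    INSERT INTO Contracts VALUES (1, 'Sales'), (2, 'Legal');
    INSERT INTO Invoices VALUES (100, 1), (101, 1);
""")

# COUNT(i.contnoI) ignores the NULLs produced by the LEFT JOIN,
# so it is 0 exactly when the contract has no invoice.
flags = conn.execute("""
    SELECT c.contnoC, c.Department,
           CASE WHEN COUNT(i.contnoI) = 0 THEN 0 ELSE 1 END AS Invoiced
    FROM Contracts c
    LEFT JOIN Invoices i ON c.contnoC = i.contnoI
    GROUP BY c.contnoC, c.Department
    ORDER BY c.contnoC
""").fetchall()

print(flags)
```

Each contract appears once, however many invoices it has.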
Then, if you actually want to return details about a particular invoice, you can use a technique similar to that of cybercentic87 if your RDBMS supports it, or you could use a correlated subquery in the select list with TOP or LIMIT, depending on your system.
SELECT
Contract, Department, ProductId, (SELECT TOP 1 InvoiceNo FROM invoices i WHERE c.contnoC = i.contnoI ORDER BY CreateDate DESC) as LatestInvoiceNo
FROM
Contracts c
INNER JOIN Departments D
ON c.departmentId = d.Department
INNER JOIN Product p
ON c.ProductId = p.ProductId
I would do it this way:
with mainquery as(
<<here goes your main query>>
),
invoices_rn as(
select *,
ROW_NUMBER() OVER (PARTITION BY contnoI order by
<<some column to decide which invoice you want to take eg. date>>) as rn
from invoices
),
latest_invoice as (
select * from invoices_rn where rn = 1
)
select * from mainquery
left join latest_invoice i on contnoC = i.contnoI
This gives you an ability to get all of the invoice details to your query, also it gives you full control of which invoice you want see in your main query. Please read more about CTEs; they are pretty handy and much easier to understand / read than nested selects.
I still don't know what database you are using. If ROW_NUMBER is not available, I will figure out something else :)
Also, with a left join you should use the COALESCE function to replace the NULLs that appear for contracts without an invoice, for example:
COALESCE(i.invoicenumber,'0')
Of course this gives you some more possibilities; for example, in your main select you could do:
CASE WHEN i.invoicenumber is null then 'NOT INVOICED'
else 'INVOICED'
END as isInvoiced
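Put together, the ROW_NUMBER CTE and the CASE expression could look like the sketch below, run from Python against in-memory SQLite (3.25 or later). The InvoiceNo and CreateDate columns, the choice of "latest by CreateDate" and the sample rows are assumptions; substitute whatever column decides which invoice you want.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Contracts (contnoC INTEGER PRIMARY KEY);
    CREATE TABLE Invoices (InvoiceNo INTEGER PRIMARY KEY, contnoI INTEGER,
                           CreateDate TEXT);
    -- Hypothetical data: contract 1 has two invoices, contract 2 has none.
    INSERT INTO Contracts VALUES (1), (2);
    INSERT INTO Invoices VALUES
        (100, 1, '2020-01-10'), (101, 1, '2020-03-05');
""")

latest = conn.execute("""
    WITH invoices_rn AS (
        -- Number each contract's invoices, newest first.
        SELECT i.*,
               ROW_NUMBER() OVER (PARTITION BY contnoI
                                  ORDER BY CreateDate DESC) AS rn
        FROM Invoices i
    ),
    latest_invoice AS (
        SELECT * FROM invoices_rn WHERE rn = 1
    )
    SELECT c.contnoC,
           li.InvoiceNo,
           CASE WHEN li.InvoiceNo IS NULL THEN 'NOT INVOICED'
                ELSE 'INVOICED' END AS isInvoiced
    FROM Contracts c
    LEFT JOIN latest_invoice li ON c.contnoC = li.contnoI
    ORDER BY c.contnoC
""").fetchall()

print(latest)
```

The left join keeps contract 2 in the result with a NULL invoice number, and the CASE expression turns that NULL into a readable flag.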
You can use
SELECT ..., invoiced = 'YES' ... where exists ...
union
SELECT ..., invoiced = 'NO' ... where not exists ...
or you can use a column like "invoiced" with a subquery into invoices to set its value depending on whether you get a hit or not

SQL Beginner: Getting items from 2 tables (+grouping+ordering)

I have an e-commerce website (using VirtueMart) and I sell products that consist of child products. When a product is a parent, it doesn't have a ParentID, while its children refer to it. I know, not the best logic, but I didn't create it.
My SQL is very basic and I believe I ask for something quite easy to achieve
Select products that have children.
Sort results by prices (ASC/DESC).
SELECT * FROM Products INNER JOIN Prices ON Products.ProductID = Prices.ProductID ORDER BY Prices.Price [ASC/DESC]
Explanation:
SELECT - Select (Get/Retrieve)
* - ALL columns
FROM Products - Get them from a DB Table named "Products".
INNER JOIN Prices - Join DB Table "Products" with DB Table "Prices", keeping only the rows that have a match in both tables.
ON - Like WHERE, this defines the condition used to match rows.
Products.ProductID = Prices.ProductID - Your match criteria. Get the rows where the same "ProductID" exists in both DB Tables "Products" and "Prices".
ORDER BY Prices.Price [ASC/DESC] - Sorting. Use ASC for ascending, DESC for descending.
This table design is subpar for a number of reasons. First, it appears that the value 0 is being used to indicate lack of a parent (as there's no 0 ID for products). Typically this will be a NULL value instead.
If it were a NULL value, the SQL statement to get everything without a parent would be as simple as this:
SELECT * FROM Products WHERE ParentID IS NULL
However, we can't do that. If we make the assumption that 0 = no parent, we can do this:
SELECT * FROM Products WHERE ParentID = 0
However, that's a dangerous assumption to make. Thus, the correct way to do this (given your schema above) would be to check the table against itself and keep only the products whose ProductID appears as some other row's ParentID:
SELECT a.*
FROM Products AS a
WHERE EXISTS (SELECT * FROM Products AS b WHERE a.ProductID = b.ParentID)
Next, to get the pricing, we have to join those two tables together on a common ID. As the Prices table seems to reference a ProductID, we can use that like so:
SELECT p.ProductID, p.ProductName, pr.Price
FROM Products AS p INNER JOIN Prices AS pr ON p.ProductID = pr.ProductID
WHERE EXISTS (SELECT * FROM Products AS b WHERE p.ProductID = b.ParentID)
ORDER BY pr.Price
That might be sufficient per the data you've shown, but usually that type of table structure indicates that it's possible to have more than one price associated with a product (we're unable to tell whether this is true based on the quick snapshot).
That should get you close... if you need something more, we'll need more detail.
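A minimal sketch of the EXISTS filter combined with the price join, run from Python against in-memory SQLite. The question's real schema is not shown here, so the column names, the use of 0 for "no parent" and all sample rows are assumptions; in particular this sample gives the parent products their own prices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductName TEXT,
                           ParentID INTEGER);
    CREATE TABLE Prices (ProductID INTEGER, Price REAL);
    -- Hypothetical data; ParentID = 0 marks a product without a parent.
    INSERT INTO Products VALUES
        (1, 'Bundle A', 0), (2, 'Bundle B', 0), (3, 'Single item', 0),
        (4, 'A child', 1), (5, 'B child', 2);
    INSERT INTO Prices VALUES (1, 30.0), (2, 20.0), (3, 5.0), (4, 12.0), (5, 8.0);
""")

# Keep only products that some other row names as its parent.
parents = conn.execute("""
    SELECT p.ProductID, p.ProductName, pr.Price
    FROM Products AS p
    INNER JOIN Prices AS pr ON p.ProductID = pr.ProductID
    WHERE EXISTS (SELECT 1 FROM Products AS b WHERE b.ParentID = p.ProductID)
    ORDER BY pr.Price
""").fetchall()

print(parents)
```

'Single item' has ParentID 0 too, but it is excluded because no row refers to it, which is why the EXISTS check is safer than filtering on ParentID = 0.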
Use the script below if you are using SSMS.
SELECT pd.ProductId,ProductName,Price
FROM product pd
LEFT JOIN price pr ON pd.ProductId=pr.ProductID
WHERE EXISTS (SELECT 1 FROM product pd1 WHERE pd.productID=pd1.ParentID)
ORDER BY pr.Price ASC
Note: neither of your parent products has a price in the price table. If you want the sum of the prices of their child products, use the script below.
SELECT pd.ProductId,pd.ProductName,SUM(ISNULL(pr.Price,0)) SUM_ChildPrice
FROM product pd
LEFT JOIN product pd1 ON pd.productID=pd1.ParentID
LEFT JOIN price pr ON pd1.ProductId=pr.ProductID
GROUP BY pd.ProductId,pd.ProductName
ORDER BY SUM_ChildPrice ASC
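A sketch of summing child prices per parent, run from Python against in-memory SQLite. It differs from the script above in two labeled ways: it uses an INNER JOIN to the child rows so that products without children drop out, and it sorts by the aggregated column, since a raw pr.Price is not available after grouping. Table contents are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductName TEXT,
                           ParentID INTEGER);
    CREATE TABLE Prices (ProductID INTEGER, Price REAL);
    -- Hypothetical data: parents carry no price of their own.
    INSERT INTO Products VALUES
        (1, 'Bundle A', 0), (2, 'Bundle B', 0),
        (3, 'A child 1', 1), (4, 'A child 2', 1), (5, 'B child', 2);
    -- 'B child' (5) deliberately has no price row.
    INSERT INTO Prices VALUES (3, 10.0), (4, 15.0);
""")

parent_totals = conn.execute("""
    SELECT pd.ProductID, pd.ProductName,
           SUM(COALESCE(pr.Price, 0)) AS SUM_ChildPrice
    FROM Products AS pd
    INNER JOIN Products AS child ON child.ParentID = pd.ProductID
    LEFT JOIN Prices AS pr ON pr.ProductID = child.ProductID
    GROUP BY pd.ProductID, pd.ProductName
    ORDER BY SUM_ChildPrice ASC
""").fetchall()

print(parent_totals)
```

COALESCE is the portable spelling of SQL Server's ISNULL; it makes an unpriced child count as 0 instead of turning the sum into NULL.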
You will have to use self-join:
For example:
SELECT * FROM products parent
JOIN products children ON parent.id = children.parent_id
JOIN prices ON prices.product_id = children.id
ORDER BY prices.price
Because we are using JOIN it will filter out all entries that don't have any children.
I haven't tested it, I hope it would work.

How to Make a Query to Return Non-Dup values From Two Tables

Suppose the following:
Table Parts
--------------------------------------
ID   Category   Name          Price
--------------------------------------
1    A          Processor     100
2    A          MotherBoard   80
3    B          Memory Card   40
4    B          HD            70
5    C          Cooler        10
Table Product_Views
--------------------------------------
Customer   Date              Part_ID
--------------------------------------
Bill       mar-24-15 17:45   1
Wallace    mar-25-15 08:17   4
Heather    mar-25-15 08:43   1
Chuck      mar-25-15 09:01   5
Cindy      mar-25-15 11:23   1
How can I build a SQL query to retrieve the most viewed parts, showing category, price and number of views, grouped by category, WITHOUT a sum on the Price column? Must I use a subquery, or is there a trick to do that with a simple [INNER/LEFT/RIGHT] JOIN?
select p.ID, p.Name, p.price, Count(*) "Number of Views"
from parts p
join product_views v
on p.id = v.part_id
group by p.ID, p.Name, p.price
order by Count(*) desc
Something along those lines should work.
EDIT for category:
Sorry I've been out for a while. What do you want to do with the category? If you just need it included for analysis you can just add it to the select and group by statements, like the following:
select p.ID, p.Category, p.Name, p.price, Count(*) "Number of Views"
from parts p
join product_views v
on p.id = v.part_id
group by p.ID, p.Category, p.Name, p.price
order by Count(*) desc
If however you want to see how many views there are per category, you will either need to average your price or leave the price off. If you think about it, in a category you have multiple items with different prices each. So you need some way to unify those prices in order to have a single price point per category. Generally (not always), the average price is the most indicative price of how a category is doing. A query to look at the information at the category level would look something like the following:
select p.Category, AVG(p.price) "Average Price", Count(*) "Number of Views"
from parts p
join product_views v
on p.id = v.part_id
group by p.Category
order by Count(*) desc
In order to leave off the price, just remove the AVG(p.price) "Average Price" from the query.
You can compare this last query to the previous one and see that the differences are all in the select and group by statements. The select is going to have all of the different things that you want to see and the group by statement is going to choose at what level you want to see those things. So if you want to see how your categories are doing on a whole, then the most detail that you want to group by will be just the category column. If you want to see how well each item is doing, then you will want to group by the ID or name of each item.
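The difference between the two grouping levels can be seen on the sample data from the question, loaded here into in-memory SQLite from Python. The view dates are left out because they do not affect the counts, and the alias views is an illustrative name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parts (id INTEGER PRIMARY KEY, category TEXT, name TEXT,
                        price INTEGER);
    CREATE TABLE product_views (customer TEXT, part_id INTEGER);
    INSERT INTO parts VALUES
        (1, 'A', 'Processor', 100), (2, 'A', 'MotherBoard', 80),
        (3, 'B', 'Memory Card', 40), (4, 'B', 'HD', 70), (5, 'C', 'Cooler', 10);
    INSERT INTO product_views VALUES
        ('Bill', 1), ('Wallace', 4), ('Heather', 1), ('Chuck', 5), ('Cindy', 1);
""")

# Item level: one row per part, so the price can be shown as-is.
per_item = conn.execute("""
    SELECT p.category, p.name, p.price, COUNT(*) AS views
    FROM parts p
    JOIN product_views v ON p.id = v.part_id
    GROUP BY p.id, p.category, p.name, p.price
    ORDER BY views DESC, p.id
""").fetchall()

# Category level: one row per category, so the price is left off.
per_category = conn.execute("""
    SELECT p.category, COUNT(*) AS views
    FROM parts p
    JOIN product_views v ON p.id = v.part_id
    GROUP BY p.category
    ORDER BY views DESC, p.category
""").fetchall()

print(per_item)
print(per_category)
```

Parts that were never viewed (MotherBoard, Memory Card) do not appear because of the inner join; a LEFT JOIN with COUNT(v.part_id) would list them with 0 views.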

SQLite COUNT and LEFT JOIN - how to combine?

There are two tables: one (P) that contains list of products, the other one (H) contains history of products consumed. Every product can be consumed 0 or more times. I need to build a query that will return all products from P along with number of times every product has been consumed, sorted by the times it's been consumed. Here is what I did:
SELECT P.ID, P.Name, H.Date, COUNT(H.P_ID) as Count
FROM P
LEFT JOIN H
ON P.ID=H.P_ID
ORDER BY Count DESC
This seems to work only if history table contains data, but if it does not - the result is incorrect. What am I doing wrong?
You need a group by to get the counts that you need. You also need to apply an aggregate function to H.Date, otherwise it is not clear which date to pick:
SELECT P.ID, P.Name, COUNT(H.P_ID) as Count, MAX(H.Date) as LastDate
FROM P
LEFT JOIN H ON P.ID=H.P_ID
GROUP BY P.ID, P.Name
ORDER BY Count DESC
I picked MAX(H.Date) to produce the date of last consumption; if you need a different date from H, change the aggregating function.
SQLite does let you sort by a column alias, so ORDER BY Count DESC works as written. On an engine that does not, replace
ORDER BY Count DESC
with
ORDER BY COUNT(H.P_ID) DESC
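A quick way to check the grouped LEFT JOIN, including a product that was never consumed, is the sketch below, run from Python against in-memory SQLite. The sample rows and the alias TimesConsumed are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE P (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE H (P_ID INTEGER, Date TEXT);
    -- Hypothetical data: Eggs has no history rows at all.
    INSERT INTO P VALUES (1, 'Milk'), (2, 'Bread'), (3, 'Eggs');
    INSERT INTO H VALUES (1, '2015-03-01'), (1, '2015-03-04'), (2, '2015-03-02');
""")

# COUNT(H.P_ID) counts only matched history rows, so unmatched products get 0.
usage = conn.execute("""
    SELECT P.ID, P.Name, COUNT(H.P_ID) AS TimesConsumed, MAX(H.Date) AS LastDate
    FROM P
    LEFT JOIN H ON P.ID = H.P_ID
    GROUP BY P.ID, P.Name
    ORDER BY TimesConsumed DESC, P.ID
""").fetchall()

print(usage)
```

Counting H.P_ID rather than * matters here: COUNT(*) would report 1 for Eggs, because the LEFT JOIN still produces one row for it.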