Modify SQL query to include cases without a value

I have this database assignment where I need to write a query "to display the id and name of each category, with the number of products that belong to the category". I was able to solve it and used this query.
SELECT Category.Id, Category.Name, COUNT(Category.Name)
FROM Category, Product
WHERE (CategoryId = Category.Id)
GROUP BY Category.Id;
But I want to modify it to make all categories appear, even those with no products. Stuck on this part. Any help is appreciated.

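You can use a LEFT JOIN:
-- count(p.categoryid) skips the NULLs produced for categories without products, so they get 0
select c.id, c.name, count(p.categoryid) cnt_products
from category c
left join product p on p.categoryid = c.id
group by c.id, c.name;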
A correlated subquery is also a fine solution, and it avoids the outer aggregation:
select c.*,
(select count(*) from product p where p.categoryid = c.id) cnt_products
from category c;
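A minimal sketch to check both answers, using Python's built-in sqlite3 module as a stand-in for the asker's DBMS. The sample rows are made up; category 3 ("Empty") has no products. It also shows why the LEFT JOIN version counts p.categoryid rather than using COUNT(*).

```python
import sqlite3

# Hypothetical sample data: category 3 ("Empty") has no products.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product (id INTEGER PRIMARY KEY, categoryid INTEGER);
    INSERT INTO category VALUES (1, 'Books'), (2, 'Games'), (3, 'Empty');
    INSERT INTO product VALUES (10, 1), (11, 1), (12, 2);
""")

# LEFT JOIN keeps every category; COUNT(p.categoryid) ignores the NULLs
# of the unmatched category, so "Empty" reports 0.
left_join_counts = conn.execute("""
    SELECT c.id, c.name, COUNT(p.categoryid) AS cnt_products
    FROM category c
    LEFT JOIN product p ON p.categoryid = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

# Correlated subquery: one COUNT(*) per category row, no outer GROUP BY.
subquery_counts = conn.execute("""
    SELECT c.id, c.name,
           (SELECT COUNT(*) FROM product p WHERE p.categoryid = c.id) AS cnt_products
    FROM category c
    ORDER BY c.id
""").fetchall()

# COUNT(*) after the LEFT JOIN counts the NULL-extended row, giving 1 instead of 0.
count_star = conn.execute("""
    SELECT c.id, COUNT(*)
    FROM category c
    LEFT JOIN product p ON p.categoryid = c.id
    GROUP BY c.id
    ORDER BY c.id
""").fetchall()

print(left_join_counts)  # [(1, 'Books', 2), (2, 'Games', 1), (3, 'Empty', 0)]
print(count_star)        # [(1, 2), (2, 1), (3, 1)]
```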

Related

Supply a default value for an incomplete SQL join

Sorry, I don't know whether there's a proper name for an 'incomplete join', but consider this sort of query, designed to return details of every sale:
SELECT a.id, a.productId, b.productDescription FROM Sales a, AdditionalProductData b WHERE a.productId = b.productId;
In a situation where AdditionalProductData isn't guaranteed to have a row for every productId, but I want to return a result for every row in Sales, how can I modify my query to return either NULL or some default value, e.g. "unknown", in such cases? I want to ensure that sales of unregistered products are not omitted.
(It is a slightly contrived example, and it indicates a problem in the DB, but that is outside the scope of the question.)
Use an OUTER JOIN:
SELECT a.productId, b.productName
FROM Products b LEFT JOIN
Sales a
ON a.productId = b.id;
but I want to return a result for every row in Sales
Then swap the tables:
SELECT a.productId, COALESCE(b.productName, 'unknown') AS productName
FROM Sales a LEFT JOIN
Products b
ON a.productId = b.id;
Never use commas in the FROM clause. Always use proper, explicit, standard, readable JOIN syntax.
You want a left join:
SELECT p.id, p.productName
FROM Products p LEFT JOIN
Sales s
ON s.productId = p.id;
I am guessing you really want at least one row per product. In most databases it doesn't make sense to have sales for non-existent products.
The query above doesn't make much sense, though, because it only selects columns from one table. You probably want something like this:
SELECT p.id, p.productName, SUM(s.amount)
FROM Products p LEFT JOIN
Sales s
ON s.productId = p.id
GROUP BY p.id, p.productName;
EDIT:
If you really do want one row per sale, then you still want a LEFT JOIN, just with the tables in the other order:
SELECT s.*, p.productName
FROM Sales s LEFT JOIN
Products p
ON s.productId = p.id;
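A minimal sketch of the "one row per sale" version combined with the default value the asker wanted, run through Python's sqlite3. The table contents are invented; sale 102 points at a product id that has no row in products.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, productName TEXT);
    CREATE TABLE sales (id INTEGER PRIMARY KEY, productId INTEGER);
    INSERT INTO products VALUES (1, 'Widget'), (2, 'Gadget');
    -- Sale 102 refers to product 99, which is not registered in products.
    INSERT INTO sales VALUES (100, 1), (101, 2), (102, 99);
""")

# Sales is the preserved (left) table, so every sale appears once;
# COALESCE replaces the NULL name of the unmatched product with 'unknown'.
rows = conn.execute("""
    SELECT s.id, s.productId, COALESCE(p.productName, 'unknown') AS productName
    FROM sales s
    LEFT JOIN products p ON s.productId = p.id
    ORDER BY s.id
""").fetchall()

for row in rows:
    print(row)
# (100, 1, 'Widget')
# (101, 2, 'Gadget')
# (102, 99, 'unknown')
```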

Self join and inner join to remove duplicates

I am stuck on this and I am relatively new to SQL.
Here is the question we were given:
List the productname and vendorid for all products that we have purchased from more than one vendor (Hint: you’ll need a Self-Join and an additional INNER JOIN to solve, don't forget to remove any duplicates!!)
Here is a screenshot of tables we are working with:
Here is what I have. I know it is wrong: it works to a degree, just not exactly how the prof wants it.
SELECT DISTINCT productname, product_vendors.vendorid
FROM products INNER JOIN Product_Vendors
ON products.PRODUCTNUMBER = PRODUCT_VENDORS.PRODUCTNUMBER
INNER JOIN vendors ON Product_Vendors.VENDORID = vendors.VENDORID
ORDER BY products.PRODUCTNAME;
Expected output provided by the prof:
I agree with @jarlh that additional information would be helpful, e.g. whether there are triplicates in the data or just duplicates.
That said, this should get you started:
SELECT
c.productname AS 'Product'
,a.vendorid AS 'Vendor1'
,b.vendorid AS 'Vendor2'
FROM
product_vendors AS a
JOIN
product_vendors AS b
ON
a.productnumber = b.productnumber
AND a.vendorid <> b.vendorid
JOIN
dbo.products AS c
ON
a.productnumber = c.productnumber
This limits the rows from product_vendors to products that have more than one distinct vendor.
From there, you join to products to pull back the product name.
Also, work on your code formatting: clean code makes the dream work :)
The usual solution to this problem is to count the vendors per product with COUNT(*) OVER and keep only the products with more than one. Simply:
select productname, vendorid
from
(
select
p.productname,
pv.vendorid,
-- number of vendor rows for this product; productnumber stays unqualified because it is the USING column
count(*) over (partition by productnumber) as cnt
from products p
join product_vendors pv using (productnumber)
) counted
where cnt > 1;
If this has to be done without window functions, then one option is to aggregate product_vendors and join to the result:
select p.productname, pv.vendorid
from
(
select productnumber
from product_vendors
group by productnumber
having count(*) > 1
) px
join products p using (productnumber)
join product_vendors pv using (productnumber);
Or check whether another vendor exists for the same product:
select
p.productname,
pv.vendorid
from products p
join product_vendors pv on pv.productnumber = p.productnumber
where exists
(
select *
from product_vendors other
where other.productnumber = pv.productnumber
and other.vendorid <> pv.vendorid
);
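A minimal sketch checking that the three approaches return the same rows, using Python's sqlite3 (assuming the bundled SQLite is 3.25 or newer, which is needed for window functions). The data is invented, and the queries use ON clauses instead of USING so they stay portable to SQLite's name resolution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (productnumber INTEGER PRIMARY KEY, productname TEXT);
    CREATE TABLE product_vendors (productnumber INTEGER, vendorid INTEGER,
                                  PRIMARY KEY (productnumber, vendorid));
    INSERT INTO products VALUES (1, 'Bolt'), (2, 'Nut'), (3, 'Screw');
    -- Bolt has two vendors, Nut has three, Screw has only one.
    INSERT INTO product_vendors VALUES
        (1, 10), (1, 11), (2, 10), (2, 12), (2, 13), (3, 11);
""")

queries = {
    # Count vendors per product with a window function, keep counts > 1.
    "window": """
        SELECT productname, vendorid
        FROM (SELECT p.productname, pv.vendorid,
                     COUNT(*) OVER (PARTITION BY pv.productnumber) AS cnt
              FROM products p
              JOIN product_vendors pv ON pv.productnumber = p.productnumber) counted
        WHERE cnt > 1""",
    # Pre-aggregate product_vendors down to products with several vendors.
    "aggregate": """
        SELECT p.productname, pv.vendorid
        FROM (SELECT productnumber FROM product_vendors
              GROUP BY productnumber HAVING COUNT(*) > 1) px
        JOIN products p ON p.productnumber = px.productnumber
        JOIN product_vendors pv ON pv.productnumber = px.productnumber""",
    # Keep a vendor row only if another vendor exists for the same product.
    "exists": """
        SELECT p.productname, pv.vendorid
        FROM products p
        JOIN product_vendors pv ON pv.productnumber = p.productnumber
        WHERE EXISTS (SELECT * FROM product_vendors other
                      WHERE other.productnumber = pv.productnumber
                        AND other.vendorid <> pv.vendorid)""",
}

results = {name: sorted(conn.execute(sql).fetchall()) for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)
```

None of the three needs DISTINCT: each (product, vendor) pair appears once because product_vendors has one row per pair.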
In none of these approaches do I see the need to eliminate duplicates, as there should be one row per product in products and one row per product and vendor in product_vendors. So I guess what your prof was thinking of is:
select distinct
p.productname,
pv.vendorid
from products p
join product_vendors pv on pv.productnumber = p.productnumber
join product_vendors other on other.productnumber = pv.productnumber
and other.vendorid <> pv.vendorid
This, however, is an approach I don't recommend. You'd combine all vendors of a product with each other (e.g. with 10 vendors for one product, the self-join already produces 10 × 9 = 90 rows for that product alone, because every ordered pair of different vendors matches the <> condition). So you'd create a large intermediate result only to dismiss most of it with DISTINCT later. Don't do that. Remember: SELECT DISTINCT is often an indicator of a poorly written query (i.e. unnecessary joins leading to too many combinations you are not actually interested in).
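A minimal sketch of that row-count arithmetic, using Python's sqlite3 with one hypothetical product supplied by 10 vendors: the self-join produces 90 rows, and DISTINCT then throws away all but 10 of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_vendors (productnumber INTEGER, vendorid INTEGER)")
# One hypothetical product (number 1) supplied by vendors 0..9.
conn.executemany("INSERT INTO product_vendors VALUES (?, ?)",
                 [(1, vendor) for vendor in range(10)])

join_sql = """
    FROM product_vendors pv
    JOIN product_vendors other
      ON other.productnumber = pv.productnumber
     AND other.vendorid <> pv.vendorid
"""
# Rows produced by the self-join before DISTINCT: every ordered vendor pair.
intermediate_rows = conn.execute("SELECT COUNT(*) " + join_sql).fetchone()[0]
# Rows left after DISTINCT: one per (product, vendor).
distinct_rows = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT pv.productnumber, pv.vendorid " + join_sql + ")"
).fetchone()[0]

print(intermediate_rows, distinct_rows)  # 90 10
```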
SELECT DISTINCT p.name AS product, v.id
FROM products p
INNER JOIN product_vendors pv ON p.id = pv.productid
INNER JOIN product_vendors pv2 ON pv.productid = pv2.productid AND pv.vendorid != pv2.vendorid
INNER JOIN vendors v ON v.id = pv.vendorid
ORDER BY p.name

Sql query to display records that appear more than once in a table

I have two tables, Customer with columns CustomerID, FirstName, Address and Purchases with columns PurchaseID, Qty, CustomersID.
I want to create a query that will display the FirstName(s) of customers who have bought more than two products; the product quantity is represented by Qty.
I can't seem to figure this out; I've just started with T-SQL.
You could sum the purchases and use a having clause to filter those you're interested in. You can then use the in operator to query only the customer names that fit these IDs:
SELECT FirstName
FROM Customer
WHERE CustomerID IN (SELECT CustomerID
FROM Purchases
GROUP BY CustomerID
HAVING SUM(Qty) > 2)
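A minimal sketch of the IN + HAVING approach, run through Python's sqlite3 with invented data. It assumes the purchase table's customer column is named CustomerID, as in this answer (the question spells it CustomersID). It also shows why SUM(Qty) and COUNT(*) are not interchangeable here: one measures items bought, the other counts purchase rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, FirstName TEXT, Address TEXT);
    CREATE TABLE Purchases (PurchaseID INTEGER PRIMARY KEY, Qty INTEGER, CustomerID INTEGER);
    INSERT INTO Customer VALUES (1, 'Ann', 'x'), (2, 'Bob', 'y'), (3, 'Cid', 'z');
    -- Ann: one purchase of 5 items; Bob: three purchases of 1 item; Cid: 2 items total.
    INSERT INTO Purchases VALUES (1, 5, 1), (2, 1, 2), (3, 1, 2), (4, 1, 2), (5, 2, 3);
""")

def names_where(aggregate):
    """Return first names whose per-customer aggregate exceeds 2."""
    sql = f"""
        SELECT FirstName FROM Customer
        WHERE CustomerID IN (SELECT CustomerID FROM Purchases
                             GROUP BY CustomerID
                             HAVING {aggregate} > 2)
        ORDER BY FirstName
    """
    return [name for (name,) in conn.execute(sql)]

by_quantity = names_where("SUM(Qty)")  # total items bought
by_rows = names_where("COUNT(*)")      # number of purchase rows
print(by_quantity, by_rows)  # ['Ann', 'Bob'] ['Bob']
```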
Please try this; based on your question, it should work for you:
Select MIN(C.FirstName) FirstName
from Customer C
INNER JOIN Purchases P ON C.CustomerID = P.CustomersID
Group by P.CustomersID
Having SUM(P.Qty) > 2
Please try this (it returns one row per purchase, together with its Qty):
select c.FirstName, p.Qty
from Customer as c
join Purchases as p
on c.CustomerID = p.CustomerID
where c.CustomerID in (select CustomerID from Purchases group by CustomerID having sum(Qty) > 2);
SELECT
c.FirstName
FROM
Customer c
INNER JOIN Purchases p
ON c.CustomerId = p.CustomerId
GROUP BY
c.CustomerId, c.FirstName
HAVING
SUM(p.Qty) > 2
While the IN suggestions would work, they are somewhat overkill and may well be less performant than a straightforward join with aggregation. The trick is the HAVING clause: by using it, you can limit your result to the names you want. Here is a link to learn more about IN vs. EXISTS vs. JOIN (NOT IN vs. NOT EXISTS).
There are dozens of ways of doing this. To introduce you to window functions and common table expressions, which are overkill for this simplified example but invaluable in your toolset as your queries get more complex:
;WITH cte AS (
SELECT DISTINCT
c.FirstName
,SUM(p.Qty) OVER (PARTITION BY c.CustomerId) as SumOfQty
FROM
Customer c
INNER JOIN Purchases p
ON c.CustomerId = p.CustomerId
)
SELECT *
FROM
cte
WHERE
SumOfQty > 2
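A minimal sketch comparing the window-function CTE with the plain GROUP BY ... HAVING version, using Python's sqlite3 (assuming SQLite 3.25+ for window functions) and invented data. Without DISTINCT, SUM(...) OVER returns one row per purchase, each carrying the customer's total; DISTINCT collapses those repeats.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY, FirstName TEXT);
    CREATE TABLE Purchases (PurchaseID INTEGER PRIMARY KEY, Qty INTEGER, CustomerId INTEGER);
    INSERT INTO Customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Purchases VALUES (1, 2, 1), (2, 2, 1), (3, 1, 2);
""")

# Window sum without DISTINCT: the total is repeated on every purchase row.
per_purchase = conn.execute("""
    SELECT c.FirstName, SUM(p.Qty) OVER (PARTITION BY c.CustomerId) AS SumOfQty
    FROM Customer c
    JOIN Purchases p ON c.CustomerId = p.CustomerId
    ORDER BY p.PurchaseID
""").fetchall()

# The CTE from the answer (SQLite does not need T-SQL's leading semicolon).
with_cte = conn.execute("""
    WITH cte AS (
        SELECT DISTINCT c.FirstName,
               SUM(p.Qty) OVER (PARTITION BY c.CustomerId) AS SumOfQty
        FROM Customer c
        JOIN Purchases p ON c.CustomerId = p.CustomerId
    )
    SELECT * FROM cte WHERE SumOfQty > 2
""").fetchall()

# Equivalent GROUP BY ... HAVING query.
grouped = conn.execute("""
    SELECT c.FirstName, SUM(p.Qty)
    FROM Customer c
    JOIN Purchases p ON c.CustomerId = p.CustomerId
    GROUP BY c.CustomerId, c.FirstName
    HAVING SUM(p.Qty) > 2
""").fetchall()

print(per_purchase)       # [('Ann', 4), ('Ann', 4), ('Bob', 1)]
print(with_cte, grouped)  # [('Ann', 4)] [('Ann', 4)]
```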

Need hints on seemingly simple SQL query

I'm trying to do something like:
SELECT c.id, c.name, COUNT(orders.id)
FROM customers c
JOIN orders o ON o.customerId = c.id
However, SQL will not allow the COUNT function. The error given at execution is that c.Id is not valid in the select list because it isn't in the group by clause or isn't aggregated.
I think I know the problem: COUNT just counts all the rows in the orders table. How can I make a count for each customer?
EDIT
Full query, but it's in Dutch. This is what I tried:
select k.ID,
Naam,
Voornaam,
Adres,
Postcode,
Gemeente,
Land,
Emailadres,
Telefoonnummer,
count(*) over (partition by k.id) as 'Aantal bestellingen',
Kredietbedrag,
Gebruikersnaam,
k.LeverAdres,
k.LeverPostnummer,
k.LeverGemeente,
k.LeverLand
from klanten k
join bestellingen on bestellingen.klantId = k.id
No errors, but no results either.
When using an aggregate function like that, you need to group by any columns that aren't aggregates:
SELECT c.id, c.name, COUNT(o.id)
FROM customers c
JOIN orders o ON o.customerId = c.id
GROUP BY c.id, c.name
If you really want to be able to select all of the columns in Customers without specifying the names (please read this blog post in full for reasons to avoid this, and easy workarounds), then you can do this lazy shorthand instead:
;WITH o AS
(
-- one row per customer with that customer's number of orders
SELECT CustomerID, OrderCount = COUNT(*)
FROM dbo.Orders GROUP BY CustomerID
)
SELECT c.*, o.OrderCount
FROM dbo.Customers AS c
INNER JOIN o
ON c.id = o.CustomerID;
EDIT, for your real query:
SELECT
k.ID,
k.Naam,
k.Voornaam,
k.Adres,
k.Postcode,
k.Gemeente,
k.Land,
k.Emailadres,
k.Telefoonnummer,
[Aantal bestellingen] = o.klantCount,
k.Kredietbedrag,
k.Gebruikersnaam,
k.LeverAdres,
k.LeverPostnummer,
k.LeverGemeente,
k.LeverLand
FROM klanten AS k
INNER JOIN
(
SELECT klantId, klantCount = COUNT(*)
FROM dbo.bestellingen
GROUP BY klantId
) AS o
ON k.id = o.klantId;
I think this solution is much cleaner than grouping by all of the columns. Grouping on the orders table first and then joining once to each customer row is likely to be much more efficient than joining first and then grouping.
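A minimal sketch comparing "join, then group" with "group the orders first, then join", using Python's sqlite3 and hypothetical English-named customers/orders tables. Both return the same counts; note that, because both use an INNER JOIN, a customer with no orders (Cid here) is left out of either result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customerId INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    -- Cid (id 3) has no orders.
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")

# Join first, then group by every selected customer column.
join_then_group = conn.execute("""
    SELECT c.id, c.name, COUNT(o.id)
    FROM customers c
    JOIN orders o ON o.customerId = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

# Aggregate the orders first, then join the per-customer counts once.
group_then_join = conn.execute("""
    SELECT c.*, o.OrderCount
    FROM customers c
    JOIN (SELECT customerId, COUNT(*) AS OrderCount
          FROM orders GROUP BY customerId) o
      ON c.id = o.customerId
    ORDER BY c.id
""").fetchall()

print(join_then_group)  # [(1, 'Ann', 2), (2, 'Bob', 1)]
print(group_then_join)  # [(1, 'Ann', 2), (2, 'Bob', 1)]
```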
The following counts the orders per customer without needing to group the overall query by customer.id. But this also means that for customers with more than one order, the count is repeated for each order.
SELECT c.id, c.name, COUNT(o.id) over (partition by c.id)
FROM customers c
JOIN orders o ON o.customerId = c.id
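A minimal sketch of that repetition, using Python's sqlite3 (SQLite 3.25+ assumed for window functions) with invented data: Ann has two orders, so her count of 2 appears on both of her rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customerId INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")

# The window COUNT is computed per customer but attached to every order row.
rows = conn.execute("""
    SELECT c.id, c.name, COUNT(o.id) OVER (PARTITION BY c.id) AS order_count
    FROM customers c
    JOIN orders o ON o.customerId = c.id
    ORDER BY o.id
""").fetchall()

for row in rows:
    print(row)
# (1, 'Ann', 2)
# (1, 'Ann', 2)
# (2, 'Bob', 1)
```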

Combine Two Tables in Select (SQL Server 2008)

If I have two tables, like this for example:
Table 1 (products): id, name, price, agentid
Table 2 (agent): userid, name, email
How do I get a result set from products that includes the agent's name and email, matching products.agentid = agent.userid?
And how do I add a filter to the join, for example WHERE price < 100?
Edited to support the price filter.
You can use the INNER JOIN clause to join those tables. It is done this way:
select p.id, p.name as ProductName, a.userid, a.name as AgentName, a.email
from products p
inner join agent a on a.userid = p.agentid
where p.price < 100
Another way to do this is with a WHERE clause:
select p.id, p.name as ProductName, a.userid, a.name as AgentName, a.email
from products p, agent a
where a.userid = p.agentid and p.price < 100
Note that in the second case you are logically forming the Cartesian product of all rows from both tables and then filtering the result, while in the first case you filter while joining, in the same step. The DBMS will understand your intentions regardless of which way you choose and will normally produce the same execution plan for both.
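A minimal sketch showing that the two forms return the same rows, using Python's sqlite3 as a stand-in for SQL Server 2008. The agents and products are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE agent (userid INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL, agentid INTEGER);
    INSERT INTO agent VALUES (1, 'Alice', 'alice@example.com'), (2, 'Bert', 'bert@example.com');
    INSERT INTO products VALUES (1, 'Lamp', 40, 1), (2, 'Desk', 250, 1), (3, 'Chair', 80, 2);
""")

# Explicit INNER JOIN with the price filter in WHERE.
explicit_join = conn.execute("""
    SELECT p.id, p.name AS ProductName, a.name AS AgentName, a.email
    FROM products p
    INNER JOIN agent a ON a.userid = p.agentid
    WHERE p.price < 100
    ORDER BY p.id
""").fetchall()

# Comma join: the join condition moves into the WHERE clause.
comma_join = conn.execute("""
    SELECT p.id, p.name AS ProductName, a.name AS AgentName, a.email
    FROM products p, agent a
    WHERE a.userid = p.agentid AND p.price < 100
    ORDER BY p.id
""").fetchall()

print(explicit_join)
# [(1, 'Lamp', 'Alice', 'alice@example.com'), (3, 'Chair', 'Bert', 'bert@example.com')]
```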
This is a very rudimentary INNER JOIN:
SELECT
products.name AS productname,
price,
agent.name AS agentname,
email
FROM
products
INNER JOIN agent ON products.agentid = agent.userid
I recommend reviewing basic JOIN syntax and concepts. Here's a link to Microsoft's documentation, though the syntax above is standard SQL and works pretty much universally.
Note that the INNER JOIN here assumes every product has a non-NULL agentid that matches a row in agent. If some products have a NULL agentid, use LEFT OUTER JOIN instead to also return the products with no agent.
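A minimal sketch of that difference, using Python's sqlite3 with invented data: the Vase has a NULL agentid, so the INNER JOIN drops it and the LEFT OUTER JOIN keeps it with a NULL (None) agent name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE agent (userid INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL, agentid INTEGER);
    INSERT INTO agent VALUES (1, 'Alice', 'alice@example.com');
    -- The Vase has no agent (agentid is NULL).
    INSERT INTO products VALUES (1, 'Lamp', 40, 1), (2, 'Vase', 15, NULL);
""")

def run(join_type):
    """Run the same query with the given join keyword."""
    return conn.execute(f"""
        SELECT products.name AS productname, agent.name AS agentname
        FROM products
        {join_type} agent ON products.agentid = agent.userid
        ORDER BY products.id
    """).fetchall()

inner_rows = run("INNER JOIN")
outer_rows = run("LEFT OUTER JOIN")
print(inner_rows)  # [('Lamp', 'Alice')]
print(outer_rows)  # [('Lamp', 'Alice'), ('Vase', None)]
```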
select p.name productname, p.price, a.name as agent_name, a.email
from products p
inner join agent a on (a.userid = p.agentid)
This is a join I use on slightly larger tables in production. Hope it helps.
SELECT TOP 1000 p.[id]
,p.[attributeId]
,p.[name] as PropertyName
,p.[description]
,p.[active],
a.[appId],
a.[activityId],
a.[Name] as AttributeName
FROM [XYZ.Gamification.V2B13.Full].[dbo].[ADM_attributeProperty] p
Inner join [XYZ.Gamification.V2B13.Full].[dbo].[ADM_activityAttribute] a
on a.id=p.attributeId
where a.appId=23098;
select ProductName=p.[name]
, ProductPrice=p.price
, AgentName=a.[name]
, AgentEmail=a.email
from products p
inner join agent a on a.userid=p.agentid
If you don't want to (or can't) use an INNER JOIN and simply want to combine every row of one table with every row of the other, you can use a CROSS JOIN:
SELECT *
FROM table1
CROSS JOIN table2
or simply
SELECT *
FROM table1, table2
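A minimal sketch, using Python's sqlite3 and two tiny made-up tables, showing that both forms return every combination of rows: 2 rows × 3 rows = 6 rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (a INTEGER);
    CREATE TABLE table2 (b TEXT);
    INSERT INTO table1 VALUES (1), (2);
    INSERT INTO table2 VALUES ('x'), ('y'), ('z');
""")

# Explicit CROSS JOIN and the comma form produce the same Cartesian product.
cross_rows = conn.execute(
    "SELECT * FROM table1 CROSS JOIN table2 ORDER BY a, b").fetchall()
comma_rows = conn.execute(
    "SELECT * FROM table1, table2 ORDER BY a, b").fetchall()

print(len(cross_rows))  # 6
print(cross_rows[:3])   # [(1, 'x'), (1, 'y'), (1, 'z')]
```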