distinct group by join problem - sql

Here's what I want to achieve:
I have a number of categories, each one with products in it.
I want to produce a report that shows various information about those products for each category. So I have a query that looks something like:
select
category,
count(products),
sum(product_price)
from product
group by category
So far so good.
But now I also want to get some category-specific information from a table that has information by category. So effectively I want to say:
join category_info on category
except that that will create a join for each row of each group, rather than just one join for each group.
What I really want to be able to say to sql is 'for each group, take the distinct category value, of which there's guaranteed to only be one since I'm grouping on it, and then use that to join to the category info table'
How can I accomplish this in SQL? By the way, I'm using Oracle 10g.
Many thanks!

select a.category, a.Count, a.SumPrice,
ci.OtherColumn
from (
select p.category,
count(p.products) as Count,
sum(p.product_price) as SumPrice
from product p
group by p.category
) a
inner join category_info ci on a.category = ci.category
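Here is a minimal runnable sketch of the same aggregate-first, join-second idea. It uses Python's built-in sqlite3 module rather than Oracle, and the table contents, the other_column name and the product_count / sum_price aliases are made up for illustration.

```python
import sqlite3

# In-memory database with hypothetical sample data (not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (category TEXT, products TEXT, product_price INTEGER);
    CREATE TABLE category_info (category TEXT PRIMARY KEY, other_column TEXT);
    INSERT INTO product VALUES
        ('books', 'novel', 10), ('books', 'atlas', 30), ('toys', 'kite', 5);
    INSERT INTO category_info VALUES ('books', 'aisle 1'), ('toys', 'aisle 7');
""")

# The derived table "a" collapses each category to one row first,
# so the join to category_info happens once per group, not once per product.
rows = conn.execute("""
    SELECT a.category, a.product_count, a.sum_price, ci.other_column
    FROM (
        SELECT p.category,
               COUNT(p.products)    AS product_count,
               SUM(p.product_price) AS sum_price
        FROM product p
        GROUP BY p.category
    ) a
    INNER JOIN category_info ci ON a.category = ci.category
    ORDER BY a.category
""").fetchall()

for row in rows:
    print(row)
```

Each category appears once in the output, with its aggregates and its category_info column side by side.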

Related

Faceted search count in SQL

I'm trying to implement faceted search count in SQL. For simplicity, I'll take the data that already exists on https://www.w3schools.com/sql/trysql.asp?filename=trysql_select_all. A product has one category and a category has many products, so it's a one-to-many relationship. I'm interested in filtering products by category, so if multiple categories are selected, the query will get products whose category Id is in the list of Ids that the user filtered by (so it's an OR operation between categories). But this is not the challenge that I'm currently facing.
The query below tries to answer the question: For every category that exists, how many products would I get if that category was among the selected categories?
SELECT
cat.CategoryId,
p.Count
FROM Categories AS cat
LEFT JOIN (SELECT
COUNT(DISTINCT ProductId) AS Count
FROM Products AS p
WHERE p.CategoryId IN #CategoryIds
OR p.CategoryId = cat.CategoryId) AS p
The #CategoryIds is a parameter that is going to be handled by an ORM. For a more concrete scenario, you can just replace it with the list (1, 2) (so you can consider the case in which the user wants to filter all products that have the category 1 or 2).
The issue is that the word "cat" (on the last line) is not recognised so the query just throws an error.
Is there a way to make the second table recognise the first table's alias "cat" that I want to LEFT JOIN with? Or is there a better solution to this problem that I didn't take into consideration?
LEFT JOIN requires a join predicate (an ON clause). Some DBMSs, like MS SQL Server, support CROSS APPLY, which lets the derived table refer to the outer alias. Your query should be equivalent to the following one, which should run on every SQL database known to me:
SELECT
cat.CategoryId,
COUNT(p.ProductId) AS Count
FROM Categories AS cat
LEFT JOIN Products AS p ON p.CategoryId = cat.CategoryId OR p.CategoryId IN #CategoryIds
GROUP BY cat.CategoryId
Or, if you are using SQL Server:
SELECT
cat.CategoryId,
p.Count
FROM Categories AS cat
CROSS APPLY (SELECT COUNT(DISTINCT ProductId) AS Count
FROM Products AS p
WHERE p.CategoryId IN #CategoryIds
OR p.CategoryId = cat.CategoryId) AS p
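To try the portable LEFT JOIN ... GROUP BY form without a SQL Server instance, here is a small sketch using Python's built-in sqlite3 module. The sample rows and the selected list (1, 2) are assumptions chosen for illustration; the ? placeholders stand in for whatever the ORM would bind in place of #CategoryIds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: category 4 has no products at all.
conn.executescript("""
    CREATE TABLE Categories (CategoryId INTEGER PRIMARY KEY);
    CREATE TABLE Products (ProductId INTEGER PRIMARY KEY, CategoryId INTEGER);
    INSERT INTO Categories VALUES (1), (2), (3), (4);
    INSERT INTO Products VALUES (1, 1), (2, 1), (3, 2), (4, 3), (5, 3);
""")

selected = [1, 2]  # categories the user already filtered by (assumed)
placeholders = ", ".join("?" for _ in selected)

# For every category: how many products match if that category
# is added to the selected ones?
query = f"""
    SELECT cat.CategoryId, COUNT(p.ProductId) AS ProductCount
    FROM Categories AS cat
    LEFT JOIN Products AS p
           ON p.CategoryId = cat.CategoryId
           OR p.CategoryId IN ({placeholders})
    GROUP BY cat.CategoryId
    ORDER BY cat.CategoryId
"""
rows = conn.execute(query, selected).fetchall()

for row in rows:
    print(row)
```

Categories 1, 2 and 4 add nothing new, so they report the three products already selected; category 3 adds its own two products on top.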

Sum matching entries in SQL

In this database I need to find the total amount that each customer paid for books in a category, and then sort the results by customer ID. The code appears to run correctly, but I end up with approximately 20 more rows than I should, although the sums appear to be correct in the rows that belong there.
The customer ID is part of customer, but is not supposed to appear in the select clause. When I try to ORDER BY it, I get strange errors. The DB engine is DB2.
SELECT distinct customer.name, book.cat, sum(offer.price) AS COST
FROM offer
INNER JOIN purchase ON purchase.title=offer.title
INNER JOIN customer ON customer.cid=purchase.cid
INNER JOIN member ON member.cid=customer.cid
INNER JOIN book ON book.title=offer.title
WHERE
member.club=purchase.club
AND member.cid=purchase.cid AND purchase.club=offer.club
GROUP BY customer.name, book.cat;
You should fix your join conditions to include the ones in the where clause (relationships between tables usually fit better in an ON clause).
SELECT DISTINCT is almost never appropriate with a GROUP BY.
But that is not what you asked. To order by a column that is not in the GROUP BY, wrap it in an aggregation function:
GROUP BY customer.name, book.cat
ORDER BY MIN(customer.cid)
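A quick way to see the ORDER BY on an aggregate in action is the sketch below, run with Python's built-in sqlite3 module instead of DB2. The schema is deliberately simplified to two hypothetical tables (customer and purchase, with the price stored on purchase), and the names are chosen so that alphabetical order differs from cid order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical schema: not the asker's real tables.
conn.executescript("""
    CREATE TABLE customer (cid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchase (cid INTEGER, cat TEXT, price INTEGER);
    INSERT INTO customer VALUES (1, 'Zoe'), (2, 'Adam');
    INSERT INTO purchase VALUES
        (1, 'fiction', 10), (1, 'fiction', 5), (2, 'science', 7);
""")

# cid is neither selected nor grouped, so it is wrapped in MIN()
# to make it usable in ORDER BY.
rows = conn.execute("""
    SELECT c.name, p.cat, SUM(p.price) AS cost
    FROM purchase p
    INNER JOIN customer c ON c.cid = p.cid
    GROUP BY c.name, p.cat
    ORDER BY MIN(c.cid)
""").fetchall()

for row in rows:
    print(row)
```

Zoe (cid 1) is listed before Adam (cid 2) even though cid never appears in the output.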

How to glue two dependent tables?

I have Customer and Application tables. I want to create a select query that returns info about each customer and also counts the number of applications that customer has in the system.
select distinct c.id, c.region, c.city, count(a.customer_id_id)
from customers c
join applications a on c.id=a.customer_id_id
group by c.id;
But I get an error saying that I need to group by region and city. I want to display info about each application, not group by region and city, because that way I would get the number of applications for each group of users rather than for each user.
I read that it's possible to do this with nested queries and a full outer join, but I tried and it didn't work. Can you explain to me how to do that?
You are close.
Use a LEFT OUTER JOIN so that Customers with 0 records in Applications will also be included (assuming that is your intent).
Don't use DISTINCT and GROUP BY together. DISTINCT means "If all the fields have the same value across multiple records in the result set produced by this SELECT statement, then give back only distinct records, dropping the duplicates". GROUP BY instead means "Group by this list of fields. Any remaining fields not in this list will be aggregated using a formula in your SELECT clause, like count(a.customer_id_id)." They are similar, but you can't aggregate a field with DISTINCT alone.
When using GROUP BY, if you are not going to aggregate a field with an aggregation formula (count, sum, avg, etc.), then you must include it in your GROUP BY. Some RDBMSs don't require this (older versions of MySQL, for example), but omitting it is poor practice: a field that is neither aggregated nor listed in the GROUP BY is like telling the RDBMS "Just pick whichever value you wish from the matching records", which can have unexpected consequences.
SELECT c.id, c.region, c.city, count(a.customer_id_id)
FROM customers c
LEFT OUTER JOIN applications a on c.id=a.customer_id_id
GROUP BY c.id, c.region, c.city;
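To see what the LEFT OUTER JOIN changes, here is a small sketch using Python's built-in sqlite3 module. The two customers and their applications are invented sample data; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: customer 2 has no applications.
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT, city TEXT);
    CREATE TABLE applications (customer_id_id INTEGER);
    INSERT INTO customers VALUES (1, 'north', 'Oslo'), (2, 'south', 'Rome');
    INSERT INTO applications VALUES (1), (1);
""")

template = """
    SELECT c.id, c.region, c.city, COUNT(a.customer_id_id)
    FROM customers c
    {join} applications a ON c.id = a.customer_id_id
    GROUP BY c.id, c.region, c.city
    ORDER BY c.id
"""

# LEFT OUTER JOIN keeps customers without applications (count 0);
# a plain JOIN drops them entirely.
left_rows = conn.execute(template.format(join="LEFT OUTER JOIN")).fetchall()
inner_rows = conn.execute(template.format(join="JOIN")).fetchall()

print(left_rows)
print(inner_rows)
```

COUNT(a.customer_id_id) counts only non-NULL values, which is why the unmatched customer gets 0 rather than 1.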
I'm not sure what your problem is. I assume that region and city are functionally dependent on id (that is, id is a candidate key). Newer versions of PostgreSQL will therefore accept your query. However, if you're on an older version, you can expand your GROUP BY clause to:
select c.id, c.region, c.city, count(a.customer_id_id)
from customers c
join applications a
on c.id=a.customer_id_id
group by c.id, c.region, c.city;
You say that you would like to display information about each application, but why are you then counting the number of applications per customer? Do you mean something like:
select c.id, c.region, c.city, a.customer_id_id, a.<other attributes>
from customers c
join applications a
on c.id=a.customer_id_id;

Using two tables to get one report in SQL Server

I have two tables, product and download as follows.
product (product_id (pk), product_name)
download (download_date(pk), download_version(pk), product_id(pk,fk))
I need a report to show how many downloads of each version of each product took place in each month.
SELECT
[product_name],
[version],
MONTH(download_date) AS [Month],
COUNT(MONTH(download_date)) AS [Count]
FROM
product
INNER JOIN
download ON product.product_id = download.product_id
GROUP BY
MONTH(download_date)
and I get this error
Column 'product.product_name' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Use aliases for the tables for better readability.
Qualify each column as alias.column in the SELECT so it is clear which table it comes from.
Your GROUP BY is missing the non-aggregated columns from the SELECT list.
So the query below will return the result.
SELECT P.[product_name],
D.[download_version],
MONTH(D.download_date) AS [Month],
COUNT(MONTH(D.download_date)) AS [Count]
FROM product P
INNER JOIN download D ON D.product_id = P.product_id
GROUP BY P.[product_name], D.[download_version], MONTH(D.download_date)
Your tables and primary keys have some issues.
Create the tables like this:
product(product_id (PK), name, version)
download(date, product_id)
and run this query:
SELECT product.name, product.version, COUNT(download.product_id)
FROM product INNER JOIN download ON product.product_id = download.product_id
GROUP BY product.name, product.version;
I think this is what you want. If not, post a reply and I will answer when I get back.

Uses of unequal joins

Of all the thousands of queries I've written, I can probably count on one hand the number of times I've used a non-equijoin. e.g.:
SELECT * FROM tbl1 INNER JOIN tbl2 ON tbl1.date > tbl2.date
And most of those instances were probably better solved using another method. Are there any good/clever real-world uses for non-equijoins that you've come across?
Bitmasks come to mind. In one of my jobs, we had permissions for a particular user or group on an "object" (usually corresponding to a form or class in the code) stored in the database. Rather than including a row or column for each particular permission (read, write, read others, write others, etc.), we would typically assign a bit value to each one. From there, we could then join using bitwise operators to get objects with a particular permission.
How about for checking for overlaps?
select ...
from employee_assignments ea1
, employee_assignments ea2
where ea1.emp_id = ea2.emp_id
and ea1.end_date >= ea2.start_date
and ea1.start_date <= ea2.end_date
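Here is a minimal sketch of that overlap check, using Python's built-in sqlite3 module. Two assignments overlap when each one starts on or before the other one ends. The assignment_id column, the sample dates and the ea1.assignment_id < ea2.assignment_id filter (which stops a row from matching itself and reports each pair once) are my own assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: employee 1 has overlapping assignments, employee 2 does not.
conn.executescript("""
    CREATE TABLE employee_assignments (
        assignment_id INTEGER PRIMARY KEY,
        emp_id INTEGER,
        start_date TEXT,  -- ISO dates compare correctly as text
        end_date TEXT
    );
    INSERT INTO employee_assignments VALUES
        (1, 1, '2020-01-01', '2020-06-30'),
        (2, 1, '2020-06-01', '2020-12-31'),
        (3, 2, '2020-01-01', '2020-03-31'),
        (4, 2, '2020-04-01', '2020-12-31');
""")

# Non-equijoin: the range conditions pair up rows whose periods intersect.
rows = conn.execute("""
    SELECT ea1.emp_id, ea1.assignment_id, ea2.assignment_id
    FROM employee_assignments ea1
    JOIN employee_assignments ea2
      ON ea1.emp_id = ea2.emp_id
     AND ea1.assignment_id < ea2.assignment_id
     AND ea1.start_date <= ea2.end_date
     AND ea1.end_date >= ea2.start_date
""").fetchall()

print(rows)
```

Only the pair (1, 2) for employee 1 is reported; employee 2's assignments touch but do not overlap.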
Whole-day intervals in date_time fields:
date_time_field >= begin_date and date_time_field < end_date_plus_1
Just found another interesting use of an unequal join in the MCTS 70-433 (SQL Server 2008 Database Development) Training Kit book. Verbatim below.
By combining derived tables with unequal joins, you can calculate a variety of cumulative aggregates. The following query returns a running aggregate of orders for each salesperson (my note - with reference to the ubiquitous AdventureWorks sample db):
select
SH3.SalesPersonID,
SH3.OrderDate,
SH3.DailyTotal,
SUM(SH4.DailyTotal) RunningTotal
from
(select SH1.SalesPersonID, SH1.OrderDate, SUM(SH1.TotalDue) DailyTotal
from Sales.SalesOrderHeader SH1
where SH1.SalesPersonID IS NOT NULL
group by SH1.SalesPersonID, SH1.OrderDate) SH3
join
(select SH1.SalesPersonID, SH1.OrderDate, SUM(SH1.TotalDue) DailyTotal
from Sales.SalesOrderHeader SH1
where SH1.SalesPersonID IS NOT NULL
group by SH1.SalesPersonID, SH1.OrderDate) SH4
on SH3.SalesPersonID = SH4.SalesPersonID AND SH3.OrderDate >= SH4.OrderDate
group by SH3.SalesPersonID, SH3.OrderDate, SH3.DailyTotal
order by SH3.SalesPersonID, SH3.OrderDate
The derived tables are used to combine all orders for salespeople who have more than one order on a single day. The join on SalesPersonID ensures that you are accumulating rows for only a single salesperson. The unequal join allows the aggregate to consider only the rows for a salesperson where the order date is earlier than the order date currently being considered within the result set.
In this particular example, the unequal join is creating a "sliding window" kind of sum on the daily total column in SH4.
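If you don't have AdventureWorks handy, the same running-total pattern can be tried on a toy table. This is only a sketch using Python's built-in sqlite3 module; the daily_sales table stands in for the two derived tables (it is assumed to be already aggregated to one row per salesperson per day), and all names and figures are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical pre-aggregated data: one row per salesperson per day.
conn.executescript("""
    CREATE TABLE daily_sales (
        sales_person_id INTEGER,
        order_date TEXT,
        daily_total INTEGER
    );
    INSERT INTO daily_sales VALUES
        (1, '2024-01-01', 100),
        (1, '2024-01-02', 50),
        (1, '2024-01-03', 25),
        (2, '2024-01-01', 10);
""")

# The >= condition joins each day to itself and to every earlier day
# of the same salesperson; SUM over those rows is the running total.
rows = conn.execute("""
    SELECT d1.sales_person_id, d1.order_date, d1.daily_total,
           SUM(d2.daily_total) AS running_total
    FROM daily_sales d1
    JOIN daily_sales d2
      ON d1.sales_person_id = d2.sales_person_id
     AND d1.order_date >= d2.order_date
    GROUP BY d1.sales_person_id, d1.order_date, d1.daily_total
    ORDER BY d1.sales_person_id, d1.order_date
""").fetchall()

for row in rows:
    print(row)
```

Because the comparison is >=, the current day's total is included in its own running total.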
Duplicates:
SELECT
*
FROM
table a, (
SELECT
id,
min(rowid) AS min_rowid
FROM
table
GROUP BY
id
) b
WHERE
a.id = b.id
and a.rowid > b.min_rowid;
If you wanted to get all of the products to offer to a customer and don't want to offer them products that they already have:
SELECT
C.customer_id,
P.product_id
FROM
Customers C
INNER JOIN Products P ON
P.product_id NOT IN
(
SELECT
O.product_id
FROM
Orders O
WHERE
O.customer_id = C.customer_id
)
Most often though, when I use a non-equijoin it's because I'm doing some kind of manual fix to data. For example, the business tells me that a person in a user table should be given all access roles that they don't already have, etc.
If you want to do a dirty join of two not really related tables, you can join with a <>.
For example, you could have a Product table and a Customer table. Hypothetically, if you want to show a list of every product with every customer, you could do something like this:
SELECT *
FROM Product p
JOIN Customer c on p.SKU <> c.SSN
It can be useful. Be careful, though, because it can create ginormous result sets.
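One caveat worth checking before relying on this: a <> join silently drops any pair where the two values happen to be equal (or where either is NULL), whereas CROSS JOIN returns every combination. The sketch below, using Python's built-in sqlite3 module, shows the difference on invented data where one SKU deliberately collides with one SSN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: SKU 'A' collides with SSN 'A' on purpose.
conn.executescript("""
    CREATE TABLE Product (SKU TEXT);
    CREATE TABLE Customer (SSN TEXT);
    INSERT INTO Product VALUES ('A'), ('B');
    INSERT INTO Customer VALUES ('A'), ('X');
""")

# The "dirty" join from the answer: skips the ('A', 'A') pair.
neq_rows = conn.execute("""
    SELECT p.SKU, c.SSN
    FROM Product p
    JOIN Customer c ON p.SKU <> c.SSN
""").fetchall()

# CROSS JOIN: every product paired with every customer.
cross_rows = conn.execute("""
    SELECT p.SKU, c.SSN
    FROM Product p
    CROSS JOIN Customer c
""").fetchall()

print(len(neq_rows), len(cross_rows))
```

With two products and two customers, CROSS JOIN gives all four pairs while the <> join gives only three.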