How to glue two dependent tables? - sql

I have Customer and Application tables. I want to create select query which provides info about a customer and also to count a number of applications user has in the system.
select distinct c.id, c.region, c.city, count(a.customer_id_id)
from customers c
join applications a on c.id=a.customer_id_id
group by c.id;
But I get an error that I need to group by region and city but I want to display info about each application not to group by region and city. Because in such a way I will get not a number of applications for each user but for each group of users.
I read that it's possible to do with nested queries and full outer join but I tried and it didn't work. Can you explain to me how to do that?

You are close.
Use a LEFT OUTER JOIN so that Customers with 0 records in Applications will also be included (assuming your intent here)
Don't use DISTINCT and GROUP BY together. Distinct means "If all the fields are the same value across multiple records in the record set produced by this SELECT statement, then only give back distinct records, dropping the duplicates". Instead with GROUP BY, "Group by this list of fields. Any remaining fields not in this list will be aggregated using a formula in your SELECT clause like count(a.customer_id_id)." They are similar, but you can't aggregate a field with merely a DISTINCT.
When using GROUP BY, if you are not going to aggregate a field with an aggregation formula (count, sum, avg, etc..) then you must include it in your group by. This isn't necessary with some RDBMS (older versions of MySQL, for example) but it's poor practice since a field that isn't explicitly aggregated with a formula that is also missing from the GROUP BY is like telling the RDBMS "Just pick which ever value you wish from matching records" which might have some unexpected consequences.
SELECT c.id, c.region, c.city, count(a.customer_id_id)
FROM customers c
LEFT OUTER JOIN applications a on c.id=a.customer_id_id
GROUP BY c.id, c.region, c.city;

Not sure what your problem is. I assume that region and city are functionally dependent of id (that is id is a candidate key). Newer versions of postgresql will therefor accept your query. However, if you're on an older version you can expand your group by clause to:
select c.id, c.region, c.city, count(a.customer_id_id)
from customers c
join applications a
on c.id=a.customer_id_id
group by c.id, c.region, c.city;
You say that you would like to display information about each application, but why are you then counting the number of applications per customer? Do you mean something like:
select c.id, c.region, c.city, a.customer_id_id, a.<other attributes>
from customers c
join applications a
on c.id=a.customer_id_id;

Related

Sum matching entries in SQL

In this database I need to find the total amount that each customer paid for books in a category, and then sort them by their customer ID. The code appears to run correctly but I end up with approximately 20 extra rows than I should, although the sum appears to be correct in the right rows.
The customer ID is part of customer, but is not supposed to appear in the select clause, when I try and ORDER BY it, I get strange errors. The DB engine is DB2.
SELECT distinct customer.name, book.cat, sum(offer.price) AS COST
FROM offer
INNER JOIN purchase ON purchase.title=offer.title
INNER JOIN customer ON customer.cid=purchase.cid
INNER JOIN member ON member.cid=customer.cid
INNER JOIN book ON book.title=offer.title
WHERE
member.club=purchase.club
AND member.cid=purchase.cid AND purchase.club=offer.club
GROUP BY customer.name, book.cat;
You should fix your join conditions to include the ones in the where clause (between table relationships usually fit better into an on clause).
SELECT DISTINCT is almost never appropriate with a GROUP BY.
But those are not your question. You can use an aggregation function:
GROUP BY customer.name, book.cat
ORDER BY MIN(customer.id)

"Group By" failing - ORACLE

I found a dozen or so different threads that were similar to my question, but I didn't see any that addressed what I am experiencing. I have three databases that keep track of customer/sale transactions. I can join them and get the individual transactions I am looking for without a problem, but when I try to group the results by vendor_name, I get "ORA-00979: not a GROUP BY expression", although vendor_name is one of the columns I am selecting (which I thought was the pre-requisite). Am I overlooking something real simple here?
select tran_date,product_name,quantity,
product_price,vendor_name,quantity*product_price as total
from transactions
join products using(product_num)
join customers using(vendor_id) group by vendor_name;
"Grouping by" vendor name means that you are trying to get one record per vendor name. So, you need to specify how the other columns should be grouped/aggregated.
For example "quantity*product_price", being a number, part of what you need would be
select vendor_name, sum(quantity*product_price)
from transactions
join products using(product_num)
join customers using(vendor_id)
group by vendor_name;
The full answer to your question depends on How you want the other columns to be grouped for a given vendor name.

distinct group by join problem

Here's what I want to achieve:
I have a number of categories, each one with products in it.
I want to produce a report that shows various information about those products for each category. So I have a query that looks something like:
select
category,
count(products),
sum(product_price),
from product
group by category
So far so good.
But now I also want to get some category-specific information from a table that has information by category. So effectively I want to say:
join category_info on category
except that that will create a join for each row of each group, rather than just one join for each group.
What I really want to be able to say to sql is 'for each group, take the distinct category value, of which there's guaranteed to only be one since I'm grouping on it, and then use that to join to the category info table'
How can I accomplish this in SQL? By the way, I'm using Oracle 10g..
Many thanks!
select a.category, a.Count, a.SumPrice
ci.OtherColumn
from (
select p.category,
count(p.products) as Count,
sum(p.product_price) as SumPrice,
from product p
group by category
) a
inner join category_info ci on a.category = ci.category

Uses of unequal joins

Of all the thousands of queries I've written, I can probably count on one hand the number of times I've used a non-equijoin. e.g.:
SELECT * FROM tbl1 INNER JOIN tbl2 ON tbl1.date > tbl2.date
And most of those instances were probably better solved using another method. Are there any good/clever real-world uses for non-equijoins that you've come across?
Bitmasks come to mind. In one of my jobs, we had permissions for a particular user or group on an "object" (usually corresponding to a form or class in the code) stored in the database. Rather than including a row or column for each particular permission (read, write, read others, write others, etc.), we would typically assign a bit value to each one. From there, we could then join using bitwise operators to get objects with a particular permission.
How about for checking for overlaps?
select ...
from employee_assignments ea1
, employee_assignments ea2
where ea1.emp_id = ea2.emp_id
and ea1.end_date >= ea2.start_date
and ea1.start_date <= ea1.start_date
Whole-day inetervals in date_time fields:
date_time_field >= begin_date and date_time_field < end_date_plus_1
Just found another interesting use of an unequal join on the MCTS 70-433 (SQL Server 2008 Database Development) Training Kit book. Verbatim below.
By combining derived tables with unequal joins, you can calculate a variety of cumulative aggregates. The following query returns a running aggregate of orders for each salesperson (my note - with reference to the ubiquitous AdventureWorks sample db):
select
SH3.SalesPersonID,
SH3.OrderDate,
SH3.DailyTotal,
SUM(SH4.DailyTotal) RunningTotal
from
(select SH1.SalesPersonID, SH1.OrderDate, SUM(SH1.TotalDue) DailyTotal
from Sales.SalesOrderHeader SH1
where SH1.SalesPersonID IS NOT NULL
group by SH1.SalesPersonID, SH1.OrderDate) SH3
join
(select SH1.SalesPersonID, SH1.OrderDate, SUM(SH1.TotalDue) DailyTotal
from Sales.SalesOrderHeader SH1
where SH1.SalesPersonID IS NOT NULL
group by SH1.SalesPersonID, SH1.OrderDate) SH4
on SH3.SalesPersonID = SH4.SalesPersonID AND SH3.OrderDate >= SH4.OrderDate
group by SH3.SalesPersonID, SH3.OrderDate, SH3.DailyTotal
order by SH3.SalesPersonID, SH3.OrderDate
The derived tables are used to combine all orders for salespeople who have more than one order on a single day. The join on SalesPersonID ensures that you are accumulating rows for only a single salesperson. The unequal join allows the aggregate to consider only the rows for a salesperson where the order date is earlier than the order date currently being considered within the result set.
In this particular example, the unequal join is creating a "sliding window" kind of sum on the daily total column in SH4.
Dublicates;
SELECT
*
FROM
table a, (
SELECT
id,
min(rowid)
FROM
table
GROUP BY
id
) b
WHERE
a.id = b.id
and a.rowid > b.rowid;
If you wanted to get all of the products to offer to a customer and don't want to offer them products that they already have:
SELECT
C.customer_id,
P.product_id
FROM
Customers C
INNER JOIN Products P ON
P.product_id NOT IN
(
SELECT
O.product_id
FROM
Orders O
WHERE
O.customer_id = C.customer_id
)
Most often though, when I use a non-equijoin it's because I'm doing some kind of manual fix to data. For example, the business tells me that a person in a user table should be given all access roles that they don't already have, etc.
If you want to do a dirty join of two not really related tables, you can join with a <>.
For example, you could have a Product table and a Customer table. Hypothetically, if you want to show a list of every product with every customer, you could do somthing like this:
SELECT *
FROM Product p
JOIN Customer c on p.SKU <> c.SSN
It can be useful. Be careful, though, because it can create ginormous result sets.

Select based on the number of appearances of an id in another table

I have a table B with cids and cities. I also have a table C that has these cids with extra information. I want to list all the cids in table C that are associated with ALL appearances of a given city in Table B.
My current solution relies on counting the number of times the given city appears in Table B and selecting only the cids that appear that many times. I don't know all the SQL syntax yet, but is there a way to select for this kind of pattern?
My current solution:
SELECT Agents.aid
FROM Agents, Customers, Orders
WHERE (Customers.city='Duluth')
AND (Agents.aid = Orders.aid)
AND (Customers.cid = Orders.cid)
GROUP BY Agents.aid
HAVING count(Agents.aid) > 1
It only works because I know right now with the HAVING statement.
Thanks for the help. I wasn't sure how to google this problem, since it's pretty specific.
EDIT: I'm pinpointing my problem a bit. I need to know how to determine if EVERY row in a table has a certain value for a field. Declaring a variable and counting the rows in a sub-selection and filtering out my results by IDs that appear that many times works, but It's really ugly.
There HAS to be a way to do this without explicitly count()ing rows. I hope.
Not an answer to your question, but a general improvement.
I'd recommend using JOIN syntax to join your tables together.
This would change your query to be:
SELECT Agents.aid
FROM Agents
INNER JOIN Orders
ON Agents.aid = Orders.aid
INNER JOIN Customers
ON Customers.cid = Orders.cid
WHERE Customers.city='Duluth'
GROUP BY Agents.aid
HAVING count(Agents.aid) > 1
What variant of SQL are you using?
To start with, you can (and should) use JOIN instead of doing it in the WHERE clause, e.g.,
select Agents.aid
from Agents
join Orders on Agents.aid = Orders.aid
join Customers on Customers.cid = Orders.cid
where Customers.city = 'Duluth'
group by Agents.aid
having count(Agents.aid) > 1
After that, I'm afraid I might be a little lost. Using the table names in your example query, what (in English, not pseudocode) are you trying to retrieve? For example, I think your sample query is retrieving the PK for all Agents that have been involved in at least 2 Orders involving Customers in Duluth.
Also, some table definitions for Agents, Orders, and Customers might help (then again, they might be irrelevant).
I'm not sure if I understood you problem, but I think the following query is what you want:
SELECT *
FROM customers b
INNER JOIN orders c USING (cid)
WHERE b.city = 'Duluth'
AND NOT EXISTS (SELECT 1
FROM customers b2
WHERE b2.city = b.city
AND b2.cid <> cid);
Probably you will need some indexes on these columns.