Oracle WITH clause and grouping by results - sql

I have two tables: a product table and a territory table. The product tables holds IDs of products and the territory code denoting which countries they can be sold in:
PRODUCT:
PRODUCT_ID | TERRITORY_CODE
----------------------------
PROD1 | 2
PROD2 | 0
PROD3 | 1
PROD4 | 0
PROD5 | 2
PROD6 | 0
PROD7 | 2
The second table table holds a territory code and the corresponding ISO code of countries it's allowed to be sold in. For example:
TERRITORY:
TERRITORY_CODE | COUNTRY_CODE
---------------------------
0 | US
1 | CA
2 | US
2 | CA
I would like to write a query that counts the number of PRODUCT_IDs using COUNTRY_CODE as a key.
For example, I want to know how many distinct products there are for sale in the US. I don't want to have to know that 0 and 2 are territory codes that contain the US, I just want to look up by COUNTRY_CODE. How can I do this?
In some preliminary research, I've found that a WITH clause may be useful, and came up with the following query:
WITH country AS (
SELECT (DISTINCT COUNTRY_CODE)
FROM TERRITORY
)
SELECT COUNT(DISTINCT PRODUCT_ID)
FROM country c,
PRODUCT p
WHERE p.TERRITORY_CODE=c.TERRITORY_ID;
However, this doesn't produce the expected result. I also can't get it to group by COUNTRY_CODE. What am I doing wrong?

Looks like you need to use GROUP BY. Try something like this:
SELECT T.Country_Code, COUNT(DISTINCT PRODUCT_ID)
FROM Product P
JOIN Territory T ON P.Territory_Code = T.Territory_Code
GROUP BY T.Country_Code
And the SQL Fiddle.
Good luck.

Related

Oracle SQL query partially including the desired results

My requirement is to display country name, total number of invoices and their average amount. Moreover, I need to return only those countries where the average invoice amount is greater than the average invoice amount of all invoices.
Query for Oracle Database
SELECT cntry.NAME,
COUNT(inv.NUMBER),
AVG(inv.TOTAL_PRICE)
FROM COUNTRY cntry JOIN
CITY ct ON ct.COUNTRY_ID = cntry.ID JOIN
CUSTOMER cst ON cst.CITY_ID = ct.ID JOIN
INVOICE inv ON inv.CUSTOMER_ID = cst.ID
GROUP BY cntry.NAME,
inv.NUMBER,
inv.TOTAL_PRICE
HAVING AVG(inv.TOTAL_PRICE) > (SELECT AVG(TOTAL_PRICE)
FROM INVOICE);
Result: Austria 1 9500
Expected: Austria 2 4825
Schema
Country
ID(INT)(PK) | NAME(VARCHAR)
City
ID(INT)(PK) | NAME(VARCHAR) | POSTAL_CODE(VARCHAR) | COUNTRY_ID(INT)(FK)
Customer
ID(INT)(PK) | NAME(VARCHAR) | CITY_ID(INT)(FK) | ADDRS(VARCHAR) | POC(VARCHAR) | EMAIL(VARCHAR) | IS_ACTV(INT)(0/1)
Invoice
ID(INT)(PK) | NUMBER(VARCHAR) | CUSTOMER_ID(INT)(FK) | USER_ACC_ID(INT) | TOTAL_PRICE(INT)
With no sample data, we can't really tell whether this:
Expected: Austria 2 4825
is true or not.
Anyway: would changing the GROUP BY clause to
GROUP BY cntry.NAME
(i.e. removing additional two columns from it) do any good?
`SELECT C.COUNTRY_NAME,COUNT(I.INVOICE_NUMBER),AVG(I.TOTAL_PRICE) AS AVERAGE
FROM COUNTRY AS C JOIN CITY AS CS ON C.ID=CS.COUNTRY_ID
JOIN CUSTOMER AS CUS ON CUS.CITY_ID=CS.ID
JOIN INVOICE AS I ON I.CUSTOMER_ID=CUS.ID
GROUP BY C.COUNTRY_NAME,C.ID
HAVING AVERAGE>(SELECT AVG(TOTAL_PRICE) FROM INVOICE`
would changing the GROUP BY clause to
GROUP BY cntry.NAME , cntry.ID
Fix your group by columns.
Keep only cntry.name.
It will work.
This is a hackerrank question.

SQL query to match similar customers

I'm trying to find the query in order to match similar customers.
To simplify the situation consider this scenario:
I have a table which contains a customer name and product purchased.
customer name can have multiple purchases of same and different products.
So firstly I can take distinct customer name and product name, so I see all customers and all products they purchased at least once.
Now I want a query to show me a sort of matching customers, according to the product they both purchased, so I want to count the similar products they purchased.
So I want to see for each pair of customers (pairing all the table) the amount of similar product they purchased.
Lets say the raw data is:
CustomerName | ProductName
A | 1
A | 2
A | 1
A | 3
B | 1
B | 2
B | 4
C | 2
Then I want to see the result of:
CustomerName1 | CustomerName2 | CountSimilarity
A | B | 2
A | C | 1
B | C | 1
And so on for all pairs of customers that have at least 1 similar product purchasing
Any suggestions how to approach this query?
The environment is SQL Server.
Thanks
Here is a self join approach:
SELECT t1.CustomerName, t2.CustomerName, COUNT(*) AS CountSimilarity
FROM yourTable t1
INNER JOIN yourTable t2
ON t1.ProductName = t2.ProductName
WHERE
t1.CustomerName < t2.CustomerName
GROUP BY
t1.CustomerName, t2.CustomerName;
Two records are joined together above if their products match. Note that the inequality in the WHERE clause ensures that customer pairs do not appear in duplicate.

Why is INNER JOIN producing more records than original file?

I have two tables. Table A & Table B. Table A has 40516 rows, and records sales by seller_id. The first column in Table A is the seller_id that repeats every time a sale is made.
Example: Table A (40516 rows)
seller_id | item | cost
------------------------
1 | dog | 5000
1 | cat | 50
4 |lizard| 80
5 |bird | 20
5 |fish | 90
The seller_id is also present in Table B, and also contains the corresponding name of the seller.
Example: Table B (5851 rows)
seller_id | seller_name
-------------------------
1 | Dog and Cat World INC
4 | Reptile Love.com
5 | Ocean Dogs Inc
I want to join these two tables, but only display the seller name from Table B and all other columns from Table A. When I do this with an INNER JOIN I get 40864 rows (348 extra rows). Shouldn't the query produce only the original 40516 rows?
Also not sure if this matters, but the seller_id can contain several zeros before the number (e.g., 0000845, 0000549).
I've looked around on here and haven't really found an answer. I've tried LEFT and RIGHT joins and get the same results for one and way more results for the other.
SQL Code Example:
SELECT public.table_B.seller_name, *
FROM public.table_A
INNER JOIN public.table_B ON public.table_A.seller_id =
public.table_B.seller_id;
Expected Results:
seller_name | seller_id | item | cost
------------------------------------------------
Dog and Cat World INC | 1 | dog | 5000
Dog and Cat World INC | 1 | cat | 50
Reptile Love.com | 4 |lizard| 80
Ocean Dogs Inc | 5 |bird | 20
Ocean Dogs Inc | 5 |fish | 90
I expected the results to contain the same number of rows in Table A. Instead I gut names matching up and an additional 348 rows...
Update:
I changed "unique_id" to "seller_id" in the question.
I guess I should have chosen a better name for unique_id in the original example. I didn't mean it to be unique in the sense of a key. It is just the seller's id that repeats every time there is a sale (in Table A). The seller's ID does repeat in Table A because it is supposed to. I simply want to pair up the seller IDs with the seller names.
Thanks again everyone for their help!
unique_id is already not correctly named in the first table, so there is no reason to assume it is unique in the second table either.
Run this query to find the duplicates:
select unique_id
from table_b
group by unique_id
having count(*) > 1;
You can fix the query using distinct on:
SELECT b.seller_name, a.*
FROM public.table_A a JOIN
(SELECT DISTINCT ON (b.unique_id) b.*
FROM public.table_B b
ORDER BY b.unique_id
) b
ON a.unique_id = b.unique_id;
In this case, you may get fewer records, if there are no matches. To fix that, use a LEFT JOIN.
Because unique id column is not unique.
Gordon Linoff was correct. The seller_id (formerly listed as unique_id) was indeed duplicated throughout the data set. I foolishly assumed otherwise. Also the seller_name had many duplicates too! In the end I had to use the CONCAT() function to join the seller_id with second identifier to create a type of foreign key. After I did this the join worked as expected. Thanks everyone!

How do you use two SUM() aggregate functions in the same query for PostgreSQL?

I have a PostgreSQL query that yields the following results:
SELECT o.order || '-' || osh.ordinal_number AS order,
o.company,
o.order_total,
SUM(osh.items) AS order_shipment_total,
o.order_type
FROM orders o
JOIN order_shipments osh ON o.order_id = osh.order_id
WHERE o.order = [some order number]
GROUP BY o.order,
o.company,
o.order_total,
o.order_type;
order | company | order_total | order_shipment_total | order_type
-------------------------------------------------------------------
123-1 | A corp. | null | 125.00 | new
123-2 | B corp. | null | 100.00 | new
I need to replace the o.order_total (it doesn't work properly) and sum up the sum of the order_shipment_total column so that, for the example above, each row winds up saying 225.00. I need the results above to look like this below:
order | company | order_total | order_shipment_total | order_type
-------------------------------------------------------------------
123-1 | A corp. | 225.00 | 125.00 | new
123-2 | B corp. | 225.00 | 100.00 | new
What I've Tried
1.) To replace o.order_total, I've tried SUM(SUM(osh.items)) but get the error message that you cannot nest aggregate functions.
2.) I've tried to put the entire query as a subquery and sum the order_shipment_total column, but when I do, it just repeats the column itself. See below:
SELECT order,
company,
SUM(order_shipment_total) AS order_shipment_total,
order_shipment_total,
order_type
FROM (
SELECT o.order || '-' || osh.ordinal_number AS order,
o.company,
o.order_total,
SUM(osh.items) AS order_shipment_total,
o.order_type
FROM orders o
JOIN order_shipments osh ON o.order_id = osh.order_id
WHERE o.order = [some order number]
GROUP BY o.order,
o.company,
o.order_total,
o.order_type
) subquery
GROUP BY order,
company,
order_shipment_total,
order_type;
order | company | order_total | order_shipment_total | order_type
-------------------------------------------------------------------
123-1 | A corp. | 125.00 | 125.00 | new
123-2 | B corp. | 100.00 | 100.00 | new
3.) I've tried to only include the rows I actually want to group by in my subquery/query example above, because I feel like I was able to do this in Oracle SQL. But when I do that, I get an error saying "column [name] must appear in the GROUP BY clause or be used in an aggregate function."
...
GROUP BY order,
company,
order_type;
ERROR: column "[a column name]" must appear in the GROUP BY clause or be used in an aggregate function.
How do I accomplish this? I was certain that a subquery would be the answer but I'm confused as to why this approach will not work.
The thing you're not quite grasping with your query / approach is that you're actually wanting two different levels of grouping in the same query row results. The subquery approach is half right, but when you do a subquery that groups, inside another query that groups you can only use the data you've already got (from the subquery) and you can only choose to keep it at the level of aggregate detail it already is, or you can choose to lose precision in favor of grouping more. You can't keep the detail AND lose the detail in order to sum up further. A query-of-subquery is hence (in practical terms) relatively senseless because you might as well group to the level you want in one hit:
SELECT groupkey1, sum(sumx) FROM
(SELECT groupkey1, groupkey2, sum(x) as sumx FROM table GROUP BY groupkey1, groupkey2)
GROUP BY groupkey1
Is the same as:
SELECT groupkey1, sum(x) FROM
table
GROUP BY groupkey1
Gordon's answer will probably work out (except for the same bug yours exhibits in that the grouping set is wrong/doesn't cover all the columns) but it probably doesn't help much in terms of your understanding because it's a code-only answer. Here's a breakdown of how you need to approach this problem but with simpler data and foregoing the window functions in favor of what you already know.
Suppose there are apples and melons, of different types, in stock. You want a query that gives a total of each specific kind of fruit, regardless of the date of purchase. You also want a column for the total for each fruit overall type:
Detail:
fruit | type | purchasedate | count
apple | golden delicious | 2017-01-01 | 3
apple | golden delicious | 2017-01-02 | 4
apple | granny smith | 2017-01-04 ! 2
melon | honeydew | 2017-01-01 | 1
melon | cantaloupe | 2017-01-05 | 4
melon | cantaloupe | 2017-01-06 | 2
So that's 7 golden delicious, 2 granny smith, 1 honeydew, 6 cantaloupe, and its also 9 apples and 7 melons
You can't do it as one query*, because you want two different levels of grouping. You have to do it as two queries and then (critical understanding point) you have to join the less-precise (apples/melons) results back to the more precise (granny smiths/golden delicious/honydew/cantaloupe):
SELECT * FROM
(
SELECT fruit, type, sum(count) as fruittypecount
FROM fruit
GROUP BY fruit, type
) fruittypesum
INNER JOIN
(
SELECT fruit, sum(count) as fruitcount
FROM fruit
GROUP BY fruit
) fruitsum
ON
fruittypesum.fruit = fruitsum.fruit
You'll get this:
fruit | type | fruittypecount | fruit | fruitcount
apple | golden delicious | 7 | apple | 9
apple | granny smith | 2 | apple | 9
melon | honeydew | 1 | melon | 7
melon | cantaloupe | 6 | melon | 7
Hence for your query, different groups, detail and summary:
SELECT
detail.order || '-' || detail.ordinal_number as order,
detail.company,
summary.order_total,
detail.order_shipment_total,
detail.order_type
FROM (
SELECT o.order,
osh.ordinal_number,
o.company,
SUM(osh.items) AS order_shipment_total,
o.order_type
FROM orders o
JOIN order_shipments osh ON o.order_id = osh.order_id
WHERE o.order = [some order number]
GROUP BY o.order,
o.company,
o.order_type
) detail
INNER JOIN
(
SELECT o.order,
SUM(osh.items) AS order_total
FROM orders o
JOIN order_shipments osh ON o.order_id = osh.order_id
--don't need the where clause; we'll join on order number
GROUP BY o.order,
o.company,
o.order_type
) summary
ON
summary.order = detail.order
Gordon's query uses a window function achieve the same effect; the window function runs after the grouping is done, and it establishes another level of grouping (PARTITION BY ordernumber) which is the effective equivalent of my GROUP BY ordernumber in the summary. The window function summary data is inherently connected to the detail data via ordernumber; it is implicit that a query saying:
SELECT
ordernumber,
lineitemnumber,
SUM(amount) linetotal
sum(SUM(amount)) over(PARTITION BY ordernumber) ordertotal
GROUP BY
ordernumber,
lineitemnumber
..will have an ordertotal that is the total of all the linetotal in the order: The GROUP BY prepares the data to the line level detail, and the window function prepares data to just the order level, and repeats the total as many times are necessary to fill in for every line item. I wrote the SUM that belongs to the GROUP BY operation in capitals.. the sum in lowercase belongs to the partition operation. it has to sum(SUM()) and cannot simply say sum(amount) because amount as a column is not allowed on its own - it's not in the group by. Because amount is not allowed on its own and has to be SUMmed for the group by to work, we have to sum(SUM()) for the partition to run (it runs after the group by is done)
It behaves exactly the same as grouping to two different levels and joining together, and indeed I chose that way to explain it because it makes it more clear how it's working in relation to what you already know about groups and joins
Remember: JOINS make datasets grow sideways, UNIONS make them grow downwards. When you have some detail data and you want to grow it sideways with some more data(a summary), JOIN it on. (If you'd wanted totals to go at the bottom of each column, it would be unioned on)
*you can do it as one query (without window functions), but it can get awfully confusing because it requires all sorts of trickery that ultimately isn't worth it because it's too hard to maintain
You should be able to use window functions:
SELECT o.order || '-' || osh.ordinal_number AS order, o.company,
SUM(SUM(osh.items)) OVER (PARTITION BY o.order) as order_total,
SUM(osh.items) AS order_shipment_total,
o.order_type
FROM orders o JOIN
order_shipments osh
ON o.order_id = osh.order_id
WHERE o.order = [some order number]
GROUP BY o.order, o.company, o.order_type;

Adding another column based on different criteria (SQL-server)

I do quite a bit of data analysis and use SQL on a daily basis but my queries are rather simple, usually pulling a lot of data which I thereafter manipulate in excel, where I'm a lot more experienced.
This time though I'm trying to generate some Live Charts which have as input a single SQL query. I will now have to create complex tables without the aid of the excel tools I'm so familiar with.
The problem is the following:
We have telesales agents that book appointments by answering to inbound calls and making outbound cals. These will generate leads that might potentially result in a sale. The relevant tables and fields for this problem are these:
Contact Table
Agent
Sales Table
Price
OutboundCallDate
I want to know for each telesales agent their respective Total Sales amount in one column, and their outbound sales value in another.
The end result should look something like this:
+-------+------------+---------------+
| Agent | TotalSales | OutboundSales |
+-------+------------+---------------+
| Tom | 30145 | 0 |
| Sally | 16449 | 1000 |
| John | 10500 | 300 |
| Joe | 50710 | 0 |
+-------+------------+---------------+
With the below SQL I get the following result:
SELECT contact.agent, SUM(sales.price)
FROM contact, sales
WHERE contact.id = sales.id
GROUP BY contact.agent
+-------+------------+
| Agent | TotalSales |
+-------+------------+
| Tom | 30145 |
| Sally | 16449 |
| John | 10500 |
| Joe | 50710 |
+-------+------------+
I want to add the third column to this query result, in which the price is summed only for records where the OutboundCallDate field contains data. Something a bit like (where sales.OutboundCallDate is Not Null)
I hope this is clear enough. Let me know if that's not the case.
Use CASE
SELECT c.Agent,
SUM(s.price) AS TotalSales,
SUM(CASE
WHEN s.OutboundCallDate IS NOT NULL THEN s.price
ELSE 0
END) AS OutboundSales
FROM contact c, sales s
WHERE c.id = s.id
GROUP BY c.agent
I think the code would look
SELECT contact.agent, SUM(sales.price)
FROM contact, sales
WHERE contact.id = sales.id AND SUM(WHERE sales.OutboundCallDate)
GROUP BY contact.agent
notI'm assuming your Sales table contains something like Units and Price. If it's just a sales amount, then replace the calculation with the sales amount field name.
The key thing here is that the value summed should only be the sales amount if the OutboundCallDate exists. If the OutboundCallDate is not NULL, then we're using a value of 0 for that row.
select Agent.Agent, TotalSales = sum (sales.Price*Units)
, OutboundSales = sum (
case when Outboundcalldate is not null then price*Units
else 0
end)
From Sales inner join Agent on Sales.Agent = Agent.Agent
Group by Agent.Agent