My requirement is to display country name, total number of invoices and their average amount. Moreover, I need to return only those countries where the average invoice amount is greater than the average invoice amount of all invoices.
Query for Oracle Database
SELECT cntry.NAME,
COUNT(inv.NUMBER),
AVG(inv.TOTAL_PRICE)
FROM COUNTRY cntry JOIN
CITY ct ON ct.COUNTRY_ID = cntry.ID JOIN
CUSTOMER cst ON cst.CITY_ID = ct.ID JOIN
INVOICE inv ON inv.CUSTOMER_ID = cst.ID
GROUP BY cntry.NAME,
inv.NUMBER,
inv.TOTAL_PRICE
HAVING AVG(inv.TOTAL_PRICE) > (SELECT AVG(TOTAL_PRICE)
FROM INVOICE);
Result: Austria 1 9500
Expected: Austria 2 4825
Schema
Country
ID(INT)(PK) | NAME(VARCHAR)
City
ID(INT)(PK) | NAME(VARCHAR) | POSTAL_CODE(VARCHAR) | COUNTRY_ID(INT)(FK)
Customer
ID(INT)(PK) | NAME(VARCHAR) | CITY_ID(INT)(FK) | ADDRS(VARCHAR) | POC(VARCHAR) | EMAIL(VARCHAR) | IS_ACTV(INT)(0/1)
Invoice
ID(INT)(PK) | NUMBER(VARCHAR) | CUSTOMER_ID(INT)(FK) | USER_ACC_ID(INT) | TOTAL_PRICE(INT)
With no sample data, we can't really tell whether this:
Expected: Austria 2 4825
is true or not.
Anyway: would changing the GROUP BY clause to
GROUP BY cntry.NAME
(i.e. removing the other two columns from it) help?
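The sketch below shows why the extra GROUP BY columns matter. It uses Python's built-in sqlite3 module rather than Oracle. The sample data is hypothetical because the question shows none: two Austrian invoices of 9500 and 150, and two German ones of 100 and 200. These values were chosen so the two outputs match the question's "Result" and "Expected" rows. Table columns are trimmed to the ones the query uses.

```python
import sqlite3

# Hypothetical data (not from the question); overall invoice average = 2487.5.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE COUNTRY  (ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE CITY     (ID INTEGER PRIMARY KEY, NAME TEXT, COUNTRY_ID INTEGER);
CREATE TABLE CUSTOMER (ID INTEGER PRIMARY KEY, NAME TEXT, CITY_ID INTEGER);
CREATE TABLE INVOICE  (ID INTEGER PRIMARY KEY, NUMBER TEXT,
                       CUSTOMER_ID INTEGER, TOTAL_PRICE INTEGER);
INSERT INTO COUNTRY  VALUES (1, 'Austria'), (2, 'Germany');
INSERT INTO CITY     VALUES (1, 'Vienna', 1), (2, 'Berlin', 2);
INSERT INTO CUSTOMER VALUES (1, 'A', 1), (2, 'B', 2);
INSERT INTO INVOICE  VALUES (1, 'IN_001', 1, 9500), (2, 'IN_002', 1, 150),
                            (3, 'IN_003', 2, 100), (4, 'IN_004', 2, 200);
""")

BASE = """
SELECT cntry.NAME, COUNT(inv.NUMBER), AVG(inv.TOTAL_PRICE)
FROM COUNTRY cntry
JOIN CITY ct      ON ct.COUNTRY_ID = cntry.ID
JOIN CUSTOMER cst ON cst.CITY_ID = ct.ID
JOIN INVOICE inv  ON inv.CUSTOMER_ID = cst.ID
GROUP BY {group_by}
HAVING AVG(inv.TOTAL_PRICE) > (SELECT AVG(TOTAL_PRICE) FROM INVOICE)
"""

# Original grouping: each invoice forms its own group, so COUNT is always 1
# and only single invoices above the overall average pass the HAVING clause.
per_invoice = con.execute(
    BASE.format(group_by="cntry.NAME, inv.NUMBER, inv.TOTAL_PRICE")).fetchall()

# Grouping by the country only: one group per country.
per_country = con.execute(BASE.format(group_by="cntry.NAME")).fetchall()

print(per_invoice)  # [('Austria', 1, 9500.0)]
print(per_country)  # [('Austria', 2, 4825.0)]
```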
-- Note: Oracle does not accept AS before a table alias; drop it there.
SELECT C.COUNTRY_NAME, COUNT(I.INVOICE_NUMBER), AVG(I.TOTAL_PRICE) AS AVERAGE
FROM COUNTRY AS C JOIN CITY AS CS ON C.ID = CS.COUNTRY_ID
JOIN CUSTOMER AS CUS ON CUS.CITY_ID = CS.ID
JOIN INVOICE AS I ON I.CUSTOMER_ID = CUS.ID
GROUP BY C.COUNTRY_NAME, C.ID
HAVING AVERAGE > (SELECT AVG(TOTAL_PRICE) FROM INVOICE);
Alternatively, change the GROUP BY clause to:
GROUP BY cntry.NAME, cntry.ID
Fix your GROUP BY columns: keep only cntry.NAME, and it will work.
This is a hackerrank question.
I have a (heavily simplified) orders table, total being the dollar amount, containing:
| id | client_id | type | total |
|----|-----------|--------|-------|
| 1 | 1 | sale | 100 |
| 2 | 1 | refund | 100 |
| 3 | 1 | refund | 100 |
And clients table containing:
| id | name |
|----|------|
| 1 | test |
I am attempting to create a per-client breakdown of metrics: the total number of sales and refunds, the sum of sales, the sum of refunds, etc.
To do this, I am querying the clients table and joining the orders table. The orders table contains both sales and refunds, specified by the type column.
My idea was to join the orders table twice using subqueries and create aliases for those filtered tables. The aliases would then be used in aggregate functions to find the sum, average, etc. I have tried many variations of joining the orders table twice to achieve this, but they all produce the same incorrect results. This query demonstrates the idea:
SELECT
clients.*,
SUM(sales.total) as total_sales,
SUM(refunds.total) as total_refunds,
AVG(sales.total) as avg_ticket,
COUNT(sales.*) as num_of_sales
FROM clients
LEFT JOIN (SELECT * FROM orders WHERE type = 'sale') as sales
ON sales.client_id = clients.id
LEFT JOIN (SELECT * FROM orders WHERE type = 'refund') as refunds
ON refunds.client_id = clients.id
GROUP BY clients.id
Result:
| id | name | total_sales | total_refunds | avg_ticket | num_of_sales |
|----|------|-------------|---------------|------------|--------------|
| 1 | test | 200 | 200 | 100 | 2 |
Expected result:
| id | name | total_sales | total_refunds | avg_ticket | num_of_sales |
|----|------|-------------|---------------|------------|--------------|
| 1 | test | 100 | 200 | 100 | 1 |
When the second join is included in the query, each row returned by the first join is repeated once for every matching row from the second join, so the sales rows are multiplied by the number of refund rows. It's clear my understanding of joins and/or subqueries is incomplete.
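The sketch below reproduces that multiplication with the question's sample data. It uses Python's sqlite3 module instead of PostgreSQL. COUNT(sales.id) stands in for COUNT(sales.*), which SQLite does not accept. The first query lists the rows the GROUP BY receives, and the second aggregates them.

```python
import sqlite3

# The question's sample data: client 1 has one sale and two refunds.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders  (id INTEGER PRIMARY KEY, client_id INTEGER, type TEXT, total INTEGER);
INSERT INTO clients VALUES (1, 'test');
INSERT INTO orders  VALUES (1, 1, 'sale', 100), (2, 1, 'refund', 100), (3, 1, 'refund', 100);
""")

FROM_CLAUSE = """
FROM clients
LEFT JOIN (SELECT * FROM orders WHERE type = 'sale')   AS sales
       ON sales.client_id = clients.id
LEFT JOIN (SELECT * FROM orders WHERE type = 'refund') AS refunds
       ON refunds.client_id = clients.id
"""

# Rows before grouping: the single sale (id 1) is paired with each refund.
pairs = con.execute(
    "SELECT sales.id, refunds.id" + FROM_CLAUSE + "ORDER BY refunds.id").fetchall()

# Aggregating those rows counts the one sale once per refund.
totals = con.execute(
    "SELECT SUM(sales.total), SUM(refunds.total), COUNT(sales.id)"
    + FROM_CLAUSE + "GROUP BY clients.id").fetchone()

print(pairs)   # [(1, 2), (1, 3)]
print(totals)  # (200, 200, 2)
```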
I understand that I can filter the orders table with each aggregate function. This produces correct results but seems inefficient:
SELECT
clients.*,
SUM(orders.total) FILTER (WHERE type = 'sale') as total_sales,
SUM(orders.total) FILTER (WHERE type = 'refund') as total_refunds,
AVG(orders.total) FILTER (WHERE type = 'sale') as avg_ticket,
COUNT(orders.*) FILTER (WHERE type = 'sale') as num_of_sales
FROM clients
LEFT JOIN orders
on orders.client_id = clients.id
GROUP BY clients.id
What is the appropriate way to create filtered and aliased versions of this joined table?
Also, what exactly is happening in my initial query, where the two subqueries are joined? I would expect them to be treated as separate subsets even though they are operating on the same (orders) table.
You should do the (filtered) aggregation once for all aggregates you want, and then join to the result of that. As your aggregation doesn't need any columns from the clients table, this can be done in a derived table. This is also typically faster than grouping the result of the join.
SELECT clients.*,
o.total_sales,
o.total_refunds,
o.avg_ticket,
o.num_of_sales
FROM clients
LEFT JOIN (
select client_id,
SUM(total) FILTER (WHERE type = 'sale') as total_sales,
SUM(total) FILTER (WHERE type = 'refund') as total_refunds,
AVG(total) FILTER (WHERE type = 'sale') as avg_ticket,
COUNT(*) FILTER (WHERE type = 'sale') as num_of_sales
from orders
group by client_id
) o on o.client_id = clients.id
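The sketch below runs the same aggregate-then-join approach with Python's sqlite3 module. It uses `CASE WHEN ... END` inside each aggregate instead of `FILTER (WHERE ...)`, because older SQLite builds lack FILTER. A CASE without ELSE yields NULL for non-matching rows, and aggregates skip NULLs, so the result is equivalent. The derived table holds one row per client, so the join cannot multiply anything.

```python
import sqlite3

# The question's sample data: client 1 has one sale and two refunds.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders  (id INTEGER PRIMARY KEY, client_id INTEGER, type TEXT, total INTEGER);
INSERT INTO clients VALUES (1, 'test');
INSERT INTO orders  VALUES (1, 1, 'sale', 100), (2, 1, 'refund', 100), (3, 1, 'refund', 100);
""")

# Aggregate orders once per client, then join that one-row-per-client result.
row = con.execute("""
SELECT clients.id, clients.name,
       o.total_sales, o.total_refunds, o.avg_ticket, o.num_of_sales
FROM clients
LEFT JOIN (
    SELECT client_id,
           SUM(CASE WHEN type = 'sale'   THEN total END) AS total_sales,
           SUM(CASE WHEN type = 'refund' THEN total END) AS total_refunds,
           AVG(CASE WHEN type = 'sale'   THEN total END) AS avg_ticket,
           COUNT(CASE WHEN type = 'sale' THEN 1 END)     AS num_of_sales
    FROM orders
    GROUP BY client_id
) o ON o.client_id = clients.id
""").fetchone()

print(row)  # (1, 'test', 100, 200, 100.0, 1)
```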
So here it is.
From the customers table I have:
customerid (unique for each customer)
the customer's city
From the invoices table I have:
billing city
customer id
Now I have to find the IDs of customers whose billing city is different from the city they live in (customers_city). My code is this:
Select Customers.customerid, Customers.city, Invoices.Billingcity
From Customers Inner join
Invoices
ON customers.city <> invoices.billingcity
What I want is each unique customer_id (1, 2, 3, 4) with the number of mismatching cases in another column. But what I am getting is something like this (each repeat of customer_id 1 is a new row):
| CustomerId | City | BillingCity |
|------------|---------------------|-------------|
| 1 | São José dos Campos | Stuttgart |
| 1 | São José dos Campos | Oslo |
| 1 | São José dos Campos | Brussels |
(Output limit exceeded, 10 of 23812 total rows shown)
You need to join on the customer id and then compare the cities:
Select c.customerid, c.city, i.Billingcity
From Customers c join
Invoices i
on c.customerid = i.customerid
where c.city <> i.billingcity;
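The sketch below compares the two join conditions, using Python's sqlite3 module and small hypothetical data (two customers, five invoices). Joining only on the city inequality pairs every customer with every invoice billed elsewhere, no matter who owns it. Joining on the customer id first keeps each invoice with its own customer. The last query adds a GROUP BY to get the per-customer mismatch count the question asks for.

```python
import sqlite3

# Hypothetical data, not the question's actual tables.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customers (CustomerId INTEGER PRIMARY KEY, City TEXT);
CREATE TABLE Invoices  (InvoiceId INTEGER PRIMARY KEY, CustomerId INTEGER, BillingCity TEXT);
INSERT INTO Customers VALUES (1, 'São José dos Campos'), (2, 'Stuttgart');
INSERT INTO Invoices VALUES
  (1, 1, 'São José dos Campos'), (2, 1, 'Oslo'),
  (3, 2, 'Stuttgart'), (4, 2, 'Brussels'), (5, 2, 'Oslo');
""")

# Join on the inequality alone: a near cross join.
wrong = con.execute("""
SELECT COUNT(*) FROM Customers c
JOIN Invoices i ON c.City <> i.BillingCity
""").fetchone()[0]

# Join on the customer id, then compare the cities.
right = con.execute("""
SELECT COUNT(*) FROM Customers c
JOIN Invoices i ON c.CustomerId = i.CustomerId
WHERE c.City <> i.BillingCity
""").fetchone()[0]

# Number of mismatching invoices per customer.
per_customer = con.execute("""
SELECT c.CustomerId, COUNT(*) AS mismatches
FROM Customers c
JOIN Invoices i ON c.CustomerId = i.CustomerId
WHERE c.City <> i.BillingCity
GROUP BY c.CustomerId
ORDER BY c.CustomerId
""").fetchall()

print(wrong, right)  # 8 3
print(per_customer)  # [(1, 1), (2, 2)]
```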
I'm working with a Redshift database and I can't understand why my join or SUM is returning values that are too large. My query is below:
SELECT
date(u.created_at) AS date,
count(distinct c.user_id) AS active_users,
sum(distinct insights.spend) AS fbcosts,
count(c.transaction_amount) AS share_shake_costs,
round(((sum(distinct insights.spend) + count(c.transaction_amount)) /
count(distinct c.user_id)),2) AS cac
FROM
dbname.users AS u
LEFT JOIN
dbname.card_transaction AS c ON c.user_id = u.id
LEFT JOIN
facebookads.insights ON date(insights.date_start) = date(u.created_at)
LEFT JOIN
dbname.card_transaction AS c2 ON date(c2.timestamp) = date(u.created_at)
WHERE
c2.vendor_transaction_description ilike '%share%'
OR c2.vendor_transaction_description ilike '%shake to win%'
GROUP BY
date
ORDER BY
1 DESC;
This query returns the following data:
If we look at 2017-02-08, we can see a total of 1298 for "share_shake_costs". However, if I run a similar query on just the card_transaction table, I get the following results, which are correct.
The query for this second table looks like this:
SELECT
date(timestamp),
sum(transaction_amount)
FROM
dbname.card_transaction AS c2
WHERE
c2.vendor_transaction_description ilike '%share%'
OR c2.vendor_transaction_description ilike '%shake to win%'
GROUP BY
1
ORDER BY
1 DESC;
I have a feeling that I have a similar issue for my "fbcosts" column. I think it has to do with my join since the SUM should be working fine.
I'm new to Redshift and SQL so perhaps there's a better way of doing this entire query. Is there anything obvious that I'm missing?
It seems one of your tables has a 1:n mapping, so when you join on the common column, each value is counted n times.
Say one of your tables, orders, contains user_id and the total bill amount (amount), and the other table, order_details, contains the sub-items ordered by that user_id.
If you do a left join, by definition each orders row joins n times to order_details on user_id, where
n = the number of rows in order_details with that user_id,
so the aggregation (SUM, COUNT, etc.) sees each orders row n times.
+------------------+ +----------------------+
| orders | | order_details |
+------------------+ +----------------------+
|amount user_id | | user_id items |
+------------------+ +----------------------+
| 1000 123 ---------> | 123 apple |
+ +----------------------+
+-------------> | 123 guava |
| +----------------------+
v-------------> | 123 mango |
+----------------------+
select sum(amount) from orders o left join order_details od
on o.user_id = od.user_id; -- result: 3000
select count(amount) from orders o left join order_details od
on o.user_id = od.user_id; -- result: 3
I hope the reason for large count is clear to you now.
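The sketch below reproduces the diagram's numbers with Python's sqlite3 module. The third query shows one common way around the problem, the same derived-table idea used in other answers here: collapse order_details to one row per user_id before joining, so each orders row appears only once.

```python
import sqlite3

# The answer's example: one order of 1000 for user 123, three detail rows.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders        (user_id INTEGER, amount INTEGER);
CREATE TABLE order_details (user_id INTEGER, items TEXT);
INSERT INTO orders        VALUES (123, 1000);
INSERT INTO order_details VALUES (123, 'apple'), (123, 'guava'), (123, 'mango');
""")

FROM_JOIN = " FROM orders o LEFT JOIN order_details od ON o.user_id = od.user_id"
fanned_sum = con.execute("SELECT SUM(amount)" + FROM_JOIN).fetchone()[0]
fanned_count = con.execute("SELECT COUNT(amount)" + FROM_JOIN).fetchone()[0]

# Pre-aggregate the detail table so the join is 1:1 per user_id.
fixed = con.execute("""
SELECT SUM(o.amount), SUM(od.item_count)
FROM orders o
LEFT JOIN (SELECT user_id, COUNT(*) AS item_count
           FROM order_details GROUP BY user_id) od
       ON o.user_id = od.user_id
""").fetchone()

print(fanned_sum, fanned_count)  # 3000 3
print(fixed)                     # (1000, 3)
```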
PS: Also, always prefer to enclose OR conditions in parentheses, so they still combine as intended when an AND condition is added:
WHERE
(c2.vendor_transaction_description ilike '%share%'
OR c2.vendor_transaction_description ilike '%shake to win%')
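With only ORs in the WHERE clause, as in the query above, the parentheses change nothing yet. They matter once an AND is added, because AND binds tighter than OR. The sketch below shows this with Python's sqlite3 module and three hypothetical rows. LIKE stands in for Redshift's ILIKE, since SQLite's LIKE is already case-insensitive for ASCII letters, and the `day` column is an assumed simplification of `date(timestamp)`.

```python
import sqlite3

# Hypothetical rows, not real transaction data.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE card_transaction (id INTEGER, day TEXT, vendor_transaction_description TEXT);
INSERT INTO card_transaction VALUES
  (1, '2017-02-08', 'share bonus'),
  (2, '2017-02-07', 'share bonus'),
  (3, '2017-02-07', 'shake to win');
""")

# Without parentheses this means (day AND share) OR shake.
unparenthesized = [r[0] for r in con.execute("""
SELECT id FROM card_transaction
WHERE day = '2017-02-08'
  AND vendor_transaction_description LIKE '%share%'
   OR vendor_transaction_description LIKE '%shake to win%'
ORDER BY id""")]

# With parentheses the date filter applies to both descriptions.
parenthesized = [r[0] for r in con.execute("""
SELECT id FROM card_transaction
WHERE day = '2017-02-08'
  AND (vendor_transaction_description LIKE '%share%'
       OR vendor_transaction_description LIKE '%shake to win%')
ORDER BY id""")]

print(unparenthesized)  # [1, 3]
print(parenthesized)    # [1]
```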
Looking for some guidance on this. I am attempting to run a report in my complaint management system: complaints by year, location and subcategory, showing totals for TotalCredits (child table) and TotalCwts (child table) as well as the total ExternalRootCause (on the master table).
This is my SQL, but the TotalCwts and TotalCredits are not being calculated correctly. It calculates 1 time for each child record rather than the total for each master record.
SELECT
dbo.Complaints.Location,
YEAR(dbo.Complaints.ComDate) AS Year,
dbo.Complaints.ComplaintSubcategory,
COUNT(Distinct(dbo.Complaints.ComId)) AS CustomerComplaints,
SUM(DISTINCT CASE WHEN (dbo.Complaints.RootCauseSource = 'External' ) THEN 1 ELSE 0 END) as ExternalRootCause,
SUM(dbo.ComplaintProducts.Cwts) AS TotalCwts,
Coalesce(SUM(dbo.CreditDeductions.CreditAmount),0) AS TotalCredits
FROM dbo.Complaints
JOIN dbo.CustomerComplaints
ON dbo.Complaints.ComId = dbo.CustomerComplaints.ComId
LEFT OUTER JOIN dbo.CreditDeductions
ON dbo.Complaints.ComId = dbo.CreditDeductions.ComId
LEFT OUTER JOIN dbo.ComplaintProducts
ON dbo.Complaints.ComId = dbo.ComplaintProducts.ComId
WHERE
dbo.Complaints.Location = Coalesce(#Location,Location)
GROUP BY
YEAR(dbo.Complaints.ComDate),
dbo.Complaints.Location,
dbo.Complaints.ComplaintSubcategory
ORDER BY
[YEAR] desc,
dbo.Complaints.Location,
dbo.Complaints.ComplaintSubcategory
Data Results
Location | Year | Subcategory | Complaints | External RC | Total Cwts | Total Credits
---------------------------------------------------------------------------------------
Boston | 2016 | Documentation | 1 | 0 | 8 | 8.00
Data Should Read
Location | Year | Subcategory | Complaints | External RC | Total Cwts | Total Credits
---------------------------------------------------------------------------------------
Boston | 2016 | Documentation | 1 | 0 | 4 | 2.00
The data above reflects 1 complaint having 4 product records with 1 cwt each and 2 credit records of 1.00 each.
What do I need to change in my query or should I approach this query a different way?
The problem is that the 1 complaint has 2 deductions and 4 products. When you join this way, the query returns every deduction/product combination for the complaint, which gives 8 rows (2 × 4), as you're seeing.
One solution, which should work here, is not to join the Deduction and Product tables directly, but to join a subquery that returns one row per complaint for each table. In other words, replace:
LEFT OUTER JOIN dbo.CreditDeductions ON dbo.Complaints.ComId = dbo.CreditDeductions.ComId
LEFT OUTER JOIN dbo.ComplaintProducts ON dbo.Complaints.ComId = dbo.ComplaintProducts.ComId
...with this (only the Deductions table is shown; the Products table works the same way):
LEFT OUTER JOIN (
select ComId, count(*) CountDeductions, sum(CreditAmount) CreditAmount
from dbo.CreditDeductions
group by ComId
) d on d.ComId = Complaints.ComId
You'll have to change the references to dbo.CreditDeductions to just d (or whatever you want to call it).
Once you've done both, each sub-table contributes one row per complaint, so the query returns 1 row per complaint containing the counts and totals from the two sub-tables.
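The sketch below rebuilds the question's scenario with Python's sqlite3 module: 1 complaint, 2 credit records of 1.00 and 4 product records of 1 cwt. It runs the direct join and then a version with both child tables pre-aggregated. SQLite has no `dbo` schema, and the tables are trimmed to the columns involved. The second subquery for the products table is an assumed counterpart of the answer's deductions subquery.

```python
import sqlite3

# The question's numbers: 1 complaint, 2 credits of 1.00, 4 products of 1 cwt.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Complaints        (ComId INTEGER PRIMARY KEY, Location TEXT);
CREATE TABLE CreditDeductions  (ComId INTEGER, CreditAmount REAL);
CREATE TABLE ComplaintProducts (ComId INTEGER, Cwts INTEGER);
INSERT INTO Complaints        VALUES (1, 'Boston');
INSERT INTO CreditDeductions  VALUES (1, 1.00), (1, 1.00);
INSERT INTO ComplaintProducts VALUES (1, 1), (1, 1), (1, 1), (1, 1);
""")

# Direct joins: every deduction/product combination, 2 x 4 = 8 rows.
broken = con.execute("""
SELECT COUNT(*), SUM(p.Cwts), SUM(d.CreditAmount)
FROM Complaints c
LEFT JOIN CreditDeductions  d ON c.ComId = d.ComId
LEFT JOIN ComplaintProducts p ON c.ComId = p.ComId
""").fetchone()

# Pre-aggregated child tables: one row per complaint from each.
fixed = con.execute("""
SELECT c.Location,
       COALESCE(SUM(p.TotalCwts), 0),
       COALESCE(SUM(d.CreditAmount), 0)
FROM Complaints c
LEFT JOIN (SELECT ComId, SUM(CreditAmount) AS CreditAmount
           FROM CreditDeductions GROUP BY ComId) d ON d.ComId = c.ComId
LEFT JOIN (SELECT ComId, SUM(Cwts) AS TotalCwts
           FROM ComplaintProducts GROUP BY ComId) p ON p.ComId = c.ComId
GROUP BY c.Location
""").fetchone()

print(broken)  # (8, 8, 8.0)
print(fixed)   # ('Boston', 4, 2.0)
```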
I have two tables: a product table and a territory table. The product table holds product IDs and the territory code denoting which countries each product can be sold in:
PRODUCT:
PRODUCT_ID | TERRITORY_CODE
----------------------------
PROD1 | 2
PROD2 | 0
PROD3 | 1
PROD4 | 0
PROD5 | 2
PROD6 | 0
PROD7 | 2
The second table holds a territory code and the corresponding ISO code of each country it's allowed to be sold in. For example:
TERRITORY:
TERRITORY_CODE | COUNTRY_CODE
---------------------------
0 | US
1 | CA
2 | US
2 | CA
I would like to write a query that counts the number of PRODUCT_IDs using COUNTRY_CODE as a key.
For example, I want to know how many distinct products there are for sale in the US. I don't want to have to know that 0 and 2 are territory codes that contain the US, I just want to look up by COUNTRY_CODE. How can I do this?
In some preliminary research, I've found that a WITH clause may be useful, and came up with the following query:
WITH country AS (
SELECT (DISTINCT COUNTRY_CODE)
FROM TERRITORY
)
SELECT COUNT(DISTINCT PRODUCT_ID)
FROM country c,
PRODUCT p
WHERE p.TERRITORY_CODE=c.TERRITORY_ID;
However, this doesn't produce the expected result. I also can't get it to group by COUNTRY_CODE. What am I doing wrong?
Looks like you need to use GROUP BY. Try something like this:
SELECT T.Country_Code, COUNT(DISTINCT PRODUCT_ID)
FROM Product P
JOIN Territory T ON P.Territory_Code = T.Territory_Code
GROUP BY T.Country_Code
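Here is the answer's query run against the question's sample data with Python's sqlite3 module. An ORDER BY is added so the row order is deterministic. A second query with a bound parameter looks up a single country by its ISO code, which is what the question asks for. Territory codes 0 and 2 both cover the US, so the US count includes PROD1, PROD2, PROD4, PROD5, PROD6 and PROD7.

```python
import sqlite3

# The question's sample data.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Product   (Product_Id TEXT, Territory_Code INTEGER);
CREATE TABLE Territory (Territory_Code INTEGER, Country_Code TEXT);
INSERT INTO Product VALUES ('PROD1', 2), ('PROD2', 0), ('PROD3', 1), ('PROD4', 0),
                           ('PROD5', 2), ('PROD6', 0), ('PROD7', 2);
INSERT INTO Territory VALUES (0, 'US'), (1, 'CA'), (2, 'US'), (2, 'CA');
""")

# Distinct products per country code.
per_country = con.execute("""
SELECT T.Country_Code, COUNT(DISTINCT P.Product_Id)
FROM Product P
JOIN Territory T ON P.Territory_Code = T.Territory_Code
GROUP BY T.Country_Code
ORDER BY T.Country_Code
""").fetchall()

# One country by its ISO code, without knowing which territory codes contain it.
us_count = con.execute("""
SELECT COUNT(DISTINCT P.Product_Id)
FROM Product P
JOIN Territory T ON P.Territory_Code = T.Territory_Code
WHERE T.Country_Code = ?
""", ("US",)).fetchone()[0]

print(per_country)  # [('CA', 4), ('US', 6)]
print(us_count)     # 6
```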
Good luck.