Using COALESCE in Postgres and grouping by the resulting value - sql

I have two tables in a Postgres database:
table a
transaction_id | city | store_name | amount
-------------------------------
123 | London | McDonalds | 6.20
999 | NULL | KFC | 8.40
etc...
table b
transaction_id | location | store_name | amount
-----------------------------------
123 | NULL | McDonalds | 6.20
999 | Sydney | KFC | 7.60
etc...
As you can see, the location might be missing in one table but present in the other. For example, with transaction 123 the location is present in table a but missing in table b. Apart from that, the rest of the data (amount, store_name, etc.) is the same row by row, assuming that we join on transaction_id.
For a given merchant, I need to retrieve a list of locations and the total amount for that location.
An example of the desired result:
KFC sales Report:
suburb | suburb_total
---------------
London | 2500
Sydney | 3500
What I tried:
select
coalesce(a.city, b.location) as suburb,
sum(a.amount) as suburbTotal
from tablea a
join tableb b on a.transaction_id = b.transaction_id
where a.store_name ilike 'KFC'
group by(suburb);
But I get the error column "a.city" must appear in the GROUP BY clause or be used in an aggregate function
So I tried:
select
coalesce(a.city, b.location) as suburb,
sum(a.amount) as suburbTotal,
max(a.city) as city_max,
max(b.location) as location_max
from tablea a
join tableb b on a.transaction_id = b.transaction_id
where a.store_name ilike 'McDonalds'
group by(suburb);
But, surprisingly, I'm getting the same error, even though I'm now using that column in an aggregate function.
How could I achieve the desired result?
NB there are reasons why we have de-normalised data across two tables, that are currently outside of my control. I have to deal with it.
EDIT: added FROM and JOIN, sorry I forgot to type those...

I can only imagine getting that error with your query if suburb were a column in one of the tables. One way around this is to define the value in the from clause:
select v.suburb,
sum(a.amount) as suburbTotal,
max(a.city) as city_max,
max(b.location) as location_max
from tablea a join
tableb b
on a.transaction_id = b.transaction_id cross join lateral
(values (coalesce(a.city, b.location))) as v(suburb)
where a.store_name ilike 'McDonalds'
group by v.suburb;
This is one of the downsides of allowing column aliases in the group by. Sometimes, you might have conflicts with table columns.
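Another way to avoid the clash, as a rough sketch, is to group by the COALESCE expression itself rather than by its alias. The example below uses Python's built-in sqlite3 module as a stand-in for Postgres, and the rows (amounts in whole cents) are made up for illustration; they are not the asker's data.

```python
import sqlite3

def suburb_totals(store_name):
    """Total amount per suburb for one store, grouping by the COALESCE expression."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical sample rows; amounts are stored as integer cents.
    conn.executescript("""
        CREATE TABLE tablea (transaction_id INTEGER, city TEXT, store_name TEXT, amount INTEGER);
        CREATE TABLE tableb (transaction_id INTEGER, location TEXT, store_name TEXT, amount INTEGER);
        INSERT INTO tablea VALUES (123, 'London', 'KFC', 620), (999, NULL, 'KFC', 840), (555, 'London', 'KFC', 100);
        INSERT INTO tableb VALUES (123, NULL, 'KFC', 620), (999, 'Sydney', 'KFC', 840), (555, 'London', 'KFC', 100);
    """)
    rows = conn.execute("""
        SELECT COALESCE(a.city, b.location) AS suburb,
               SUM(a.amount) AS suburb_total
        FROM tablea a
        JOIN tableb b ON a.transaction_id = b.transaction_id
        WHERE a.store_name = ?
        GROUP BY COALESCE(a.city, b.location)  -- the expression, not the alias
        ORDER BY suburb
    """, (store_name,)).fetchall()
    conn.close()
    return rows

print(suburb_totals("KFC"))  # [('London', 720), ('Sydney', 840)]
```

Because the GROUP BY never mentions the name suburb, it does not matter whether a table happens to have a column with that name.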

Your queries are missing a from clause, which makes it unclear which logic you are trying to implement.
Based on your sample data and expected results, I think that's a full join on the transaction_id, followed by aggregation. Referring to the columns by position in the group by clause avoids repeating the expression:
select
coalesce(a.store_name, b.store_name) as store_name,
coalesce(a.city, b.location) as suburb,
sum(coalesce(a.amount, b.amount)) as suburb_total
from tablea a
full join tableb b using(transaction_id)
group by 1, 2

Related

Oracle SQL query partially including the desired results

My requirement is to display country name, total number of invoices and their average amount. Moreover, I need to return only those countries where the average invoice amount is greater than the average invoice amount of all invoices.
Query for Oracle Database
SELECT cntry.NAME,
COUNT(inv.NUMBER),
AVG(inv.TOTAL_PRICE)
FROM COUNTRY cntry JOIN
CITY ct ON ct.COUNTRY_ID = cntry.ID JOIN
CUSTOMER cst ON cst.CITY_ID = ct.ID JOIN
INVOICE inv ON inv.CUSTOMER_ID = cst.ID
GROUP BY cntry.NAME,
inv.NUMBER,
inv.TOTAL_PRICE
HAVING AVG(inv.TOTAL_PRICE) > (SELECT AVG(TOTAL_PRICE)
FROM INVOICE);
Result: Austria 1 9500
Expected: Austria 2 4825
Schema
Country
ID(INT)(PK) | NAME(VARCHAR)
City
ID(INT)(PK) | NAME(VARCHAR) | POSTAL_CODE(VARCHAR) | COUNTRY_ID(INT)(FK)
Customer
ID(INT)(PK) | NAME(VARCHAR) | CITY_ID(INT)(FK) | ADDRS(VARCHAR) | POC(VARCHAR) | EMAIL(VARCHAR) | IS_ACTV(INT)(0/1)
Invoice
ID(INT)(PK) | NUMBER(VARCHAR) | CUSTOMER_ID(INT)(FK) | USER_ACC_ID(INT) | TOTAL_PRICE(INT)
With no sample data, we can't really tell whether this:
Expected: Austria 2 4825
is true or not.
Anyway: would changing the GROUP BY clause to
GROUP BY cntry.NAME
(i.e. removing additional two columns from it) do any good?
SELECT C.NAME, COUNT(I.NUMBER), AVG(I.TOTAL_PRICE) AS AVERAGE
FROM COUNTRY C JOIN CITY CS ON C.ID = CS.COUNTRY_ID
JOIN CUSTOMER CUS ON CUS.CITY_ID = CS.ID
JOIN INVOICE I ON I.CUSTOMER_ID = CUS.ID
GROUP BY C.NAME, C.ID
HAVING AVG(I.TOTAL_PRICE) > (SELECT AVG(TOTAL_PRICE) FROM INVOICE)
Would changing the GROUP BY clause to the following help?
GROUP BY cntry.NAME, cntry.ID
Fix your group by columns.
Keep only cntry.name.
It will work.
This is a hackerrank question.
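To see why the extra GROUP BY columns matter, here is a minimal sketch using Python's built-in sqlite3 module in place of Oracle. The rows are invented (the question has no sample data) and were picked so that they happen to reproduce both the reported and the expected figures; the `group_by` parameter is just a helper for this illustration.

```python
import sqlite3

def countries_above_average(group_by="cntry.name"):
    """Countries whose average invoice exceeds the overall average invoice."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical data: Austria has invoices of 9500 and 150, Germany 100 and 200.
    conn.executescript("""
        CREATE TABLE country  (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE city     (id INTEGER PRIMARY KEY, name TEXT, country_id INTEGER);
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city_id INTEGER);
        CREATE TABLE invoice  (id INTEGER PRIMARY KEY, number TEXT, customer_id INTEGER, total_price INTEGER);
        INSERT INTO country  VALUES (1, 'Austria'), (2, 'Germany');
        INSERT INTO city     VALUES (1, 'Vienna', 1), (2, 'Berlin', 2);
        INSERT INTO customer VALUES (1, 'A', 1), (2, 'B', 2);
        INSERT INTO invoice  VALUES (1, 'in_1', 1, 9500), (2, 'in_2', 1, 150),
                                    (3, 'in_3', 2, 100),  (4, 'in_4', 2, 200);
    """)
    rows = conn.execute(f"""
        SELECT cntry.name, COUNT(inv.number), AVG(inv.total_price)
        FROM country cntry
        JOIN city ct      ON ct.country_id = cntry.id
        JOIN customer cst ON cst.city_id = ct.id
        JOIN invoice inv  ON inv.customer_id = cst.id
        GROUP BY {group_by}
        HAVING AVG(inv.total_price) > (SELECT AVG(total_price) FROM invoice)
        ORDER BY 1
    """).fetchall()
    conn.close()
    return rows

# One group per country: both Austrian invoices are averaged together.
print(countries_above_average())
# One group per invoice, as in the question: every "average" is a single invoice.
print(countries_above_average("cntry.name, inv.number, inv.total_price"))
```

With the invoice columns in the GROUP BY, each invoice forms its own group, so the HAVING clause filters individual invoices rather than countries.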

Novice seeking help, Max Aggregate not returning expected results

I'm still very new to MS-SQL. I have a simple table and query that is getting the best of me. I know it will be something fundamental I'm overlooking.
I've changed the field names but the idea is the same.
So the idea is that every time someone signs up they get a RegID, Name, and Team. The names are unique, so for below yes John changed teams. And that's my trouble.
Football Table
+------------+----------+---------+
| Max_RegID | Name | Team |
+------------+----------+---------+
| 100 | John | Red |
| 101 | Bill | Blue |
| 102 | Tom | Green |
| 103 | John | Green |
+------------+----------+---------+
With the query at the bottom using the Max_RegID, I was expecting to get back only one record.
+------------+----------+---------+
| Max_RegID | Name | Team |
+------------+----------+---------+
| 103 | John | Green |
+------------+----------+---------+
Instead I get back the rows below, which seem to include the Max_RegID but for each team as well. What am I doing wrong?
+------------+----------+---------+
| Max_RegID | Name | Team |
+------------+----------+---------+
| 100 | John | Red |
| 103 | John | Green |
+------------+----------+---------+
My Query
SELECT
Max(Football.RegID) AS Max_RegID,
Football.Name,
Football.Team
FROM
Football
GROUP BY
Football.RegID,
Football.Name,
Football.Team
EDIT* Removed the WHERE statement
The reason you're getting the results that you are is because of the way you have your GROUP BY clause structured.
When you're using any aggregate function, MAX(X), SUM(X), COUNT(X), or what have you, you're telling the SQL engine that you want the aggregate value of column X for each unique combination of the columns listed in the GROUP BY clause.
In your query as written, you're grouping by all three of the columns in the table, telling the SQL engine that each tuple is unique. Therefore the query is returning ALL of the values, and you aren't actually getting the MAX of anything at all.
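The effect is easy to reproduce. The sketch below loads the sample rows from the question into an in-memory SQLite database (via Python's built-in sqlite3 module, used here only as a stand-in for SQL Server) and compares grouping by every column with grouping by Name alone. The column is called RegID, as in the question's query.

```python
import sqlite3

def football_groupings():
    """Return (rows grouped by all columns, rows grouped by Name only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Football (RegID INTEGER, Name TEXT, Team TEXT);
        INSERT INTO Football VALUES (100, 'John', 'Red'), (101, 'Bill', 'Blue'),
                                    (102, 'Tom', 'Green'), (103, 'John', 'Green');
    """)
    # Every row is its own group, so MAX(RegID) is just that row's RegID.
    all_columns = conn.execute("""
        SELECT MAX(RegID), Name, Team FROM Football
        GROUP BY RegID, Name, Team
        ORDER BY 1
    """).fetchall()
    # One group per Name, so MAX(RegID) really is the highest RegID for that name.
    by_name = conn.execute("""
        SELECT MAX(RegID), Name FROM Football
        GROUP BY Name
        ORDER BY Name
    """).fetchall()
    conn.close()
    return all_columns, by_name

all_columns, by_name = football_groupings()
print(len(all_columns))  # 4: nothing was collapsed
print(by_name)           # [(101, 'Bill'), (103, 'John'), (102, 'Tom')]
```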
What you actually want in your results is the maximum RegID for each distinct value in the Name column and also the Team that goes along with that (RegID,Name) combination.
To accomplish that you need to find the MAX(RegID) for each Name in an initial data set, and then use that list of RegIDs to add the values for Name and Team in a secondary data set.
Caveat (per comments from @HABO): This is premised on the assumption that RegID is a unique number (an IDENTITY column, value from a SEQUENCE, or something of that sort). If there are duplicate values, this will fail.
The most straightforward way to accomplish that is with a sub-query. The sub-query below gets your unique RegIDs, then joins to the original table to add the other values.
SELECT
f.RegID
,f.Name
,f.Team
FROM
Football AS f
JOIN
(--The sub-query, sq, gets the list of IDs
SELECT
MAX(f2.RegID) AS Max_RegID
FROM
Football AS f2
GROUP BY
f2.Name
) AS sq
ON
sq.Max_RegID = f.RegID;
EDIT: Sorry. I just re-read the question. To get just the single record for the MAX(RegID), just take the GROUP BY out of the sub-query, and you'll just get the current maximum value, which you can use to find the values in the rest of the columns.
SELECT
f.RegID
,f.Name
,f.Team
FROM
Football AS f
JOIN
(--The sub-query, sq, now gets the MAX ID
SELECT
MAX(f2.RegID) AS Max_RegID
FROM
Football AS f2
) AS sq
ON
sq.Max_RegID = f.RegID;
Use row_number()
select * from
(SELECT
Football.RegID AS Max_RegID,
Football.Name,
Football.Team, row_number() over(partition by name order by Football.RegID desc) as rn
FROM
Football
WHERE
Football.Name = 'John')a
where rn=1
You can simply edit your query as follows:
SELECT *
FROM
Football f
WHERE
f.Name = 'John' and
f.Max_RegID = (SELECT Max(f2.Max_RegID) FROM Football f2 WHERE f2.Name = 'John'
)
Or, if you are on SQL Server, simply use this:
select top 1 * from Football f
where f.Name = 'John'
order by Max_RegID desc
Or, if you are on MySQL:
select * from Football f
where f.Name = 'John'
order by Max_RegID desc
Limit 1
You need a self join:
select f1.*
from Football f inner join
Football f1
on f1.name = f.name
where f.Max_RegID = 103;
After revisiting the question, the sample data suggests a subquery to me:
select f.*
from Football f
where name = (select top (1) f1.name
from Football f1
order by f1.Max_RegID desc
);

SQL Query : Facing issues to get desired records from different tables

I have two tables
Calendar (Calname, CCode, PCode)
Lookup (LCode, Name)
Calendar table contains records like,
Calname | CCode | PCode
abc | O_R | P_R
xyz | C_R | P_C
Lookup table contains records like,
LCode | Name
O_R | Reporting
C_R | Cross
P_R | Process
P_C | ProcessCross
I have to fetch the records so that each row also shows the names of both of its codes, looked up from the Lookup table.
Desired Output,
Calname | CCode | PCode | CCodeName | PCodeName
abc | O_R | P_R | Reporting | Process
xyz | C_R | P_C | Cross | ProcessCross
I cannot simply apply a single inner join on the code, as it will not give me the desired output.
I also tried using a subquery, but somehow it did not work out.
Can anyone help me out with this issue?
Thanks
You can try joining the Calendar table to the Lookup table twice, using each of the two codes.
SELECT
c.Calname,
c.CCode,
c.PCode,
COALESCE(t1.Name, 'NA') AS CCodeName,
COALESCE(t2.Name, 'NA') AS PCodeName
FROM Calendar c
LEFT JOIN Lookup t1
ON c.CCode = t1.LCode
LEFT JOIN Lookup t2
ON c.PCode = t2.LCode
An alternative to Tim's answer would be to use scalar subqueries, which may or may not give you some performance benefit due to scalar subquery caching:
SELECT
c.Calname,
c.CCode,
c.PCode,
COALESCE((SELECT l1.name FROM lookup l1 WHERE c.ccode = l1.lcode), 'NA') AS CCodeName,
COALESCE((SELECT l2.name FROM lookup l2 WHERE c.pcode = l2.lcode), 'NA') AS PCodeName
FROM Calendar c;
I would test both answers to see which one works best for your data.
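As a quick sanity check that the two approaches agree, here is a small sketch that runs both against the sample rows from the question. It uses Python's built-in sqlite3 module rather than the asker's actual database, so it only confirms the results match and says nothing about relative performance.

```python
import sqlite3

def lookup_both_ways():
    """Run the double-join and the scalar-subquery versions on the sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Calendar (Calname TEXT, CCode TEXT, PCode TEXT);
        CREATE TABLE Lookup (LCode TEXT, Name TEXT);
        INSERT INTO Calendar VALUES ('abc', 'O_R', 'P_R'), ('xyz', 'C_R', 'P_C');
        INSERT INTO Lookup VALUES ('O_R', 'Reporting'), ('C_R', 'Cross'),
                                  ('P_R', 'Process'), ('P_C', 'ProcessCross');
    """)
    # Join the lookup table twice, once per code column.
    joined = conn.execute("""
        SELECT c.Calname, c.CCode, c.PCode,
               COALESCE(t1.Name, 'NA') AS CCodeName,
               COALESCE(t2.Name, 'NA') AS PCodeName
        FROM Calendar c
        LEFT JOIN Lookup t1 ON c.CCode = t1.LCode
        LEFT JOIN Lookup t2 ON c.PCode = t2.LCode
        ORDER BY c.Calname
    """).fetchall()
    # Same result via one scalar subquery per code column.
    scalar = conn.execute("""
        SELECT c.Calname, c.CCode, c.PCode,
               COALESCE((SELECT l1.Name FROM Lookup l1 WHERE c.CCode = l1.LCode), 'NA') AS CCodeName,
               COALESCE((SELECT l2.Name FROM Lookup l2 WHERE c.PCode = l2.LCode), 'NA') AS PCodeName
        FROM Calendar c
        ORDER BY c.Calname
    """).fetchall()
    conn.close()
    return joined, scalar

joined, scalar = lookup_both_ways()
print(joined == scalar)  # True
```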

SQL Join or SUM is returning too many values when working with Redshift database

I'm working with a Redshift database and I can't understand why my join or SUM is bringing too many values. My query is below:
SELECT
date(u.created_at) AS date,
count(distinct c.user_id) AS active_users,
sum(distinct insights.spend) AS fbcosts,
count(c.transaction_amount) AS share_shake_costs,
round(((sum(distinct insights.spend) + count(c.transaction_amount)) /
count(distinct c.user_id)),2) AS cac
FROM
dbname.users AS u
LEFT JOIN
dbname.card_transaction AS c ON c.user_id = u.id
LEFT JOIN
facebookads.insights ON date(insights.date_start) = date(u.created_at)
LEFT JOIN
dbname.card_transaction AS c2 ON date(c2.timestamp) = date(u.created_at)
WHERE
c2.vendor_transaction_description ilike '%share%'
OR c2.vendor_transaction_description ilike '%shake to win%'
GROUP BY
date
ORDER BY
1 DESC;
This query returns the following data:
If we look at 2017-02-08, we can see a total of 1298 for "share_shake_costs". However, if I run the same query just on the card_transaction table I get the following results which are correct.
The query for this second table looks like this:
SELECT
date(timestamp),
sum(transaction_amount)
FROM
dbname.card_transaction AS c2
WHERE
c2.vendor_transaction_description ilike '%share%'
OR c2.vendor_transaction_description ilike '%shake to win%'
GROUP BY
1
ORDER BY
1 DESC;
I have a feeling that I have a similar issue for my "fbcosts" column. I think it has to do with my join since the SUM should be working fine.
I'm new to Redshift and SQL so perhaps there's a better way of doing this entire query. Is there anything obvious that I'm missing?
It seems one of your tables is in a 1:n relationship with another, so when you join on the common column each value on the "1" side is counted n times.
Let us say one of your tables, orders, contains user_id and the total bill_amount, and the other table, order_details, contains the details of the sub-items ordered by that user_id.
If you do a left join, each orders row will be repeated once for every matching row in order_details, i.e. n times, where
n = number of rows in order_details with that user_id
and the aggregation (sum, count, etc.) will then include the order's amount n times.
+------------------+ +----------------------+
| orders | | order_details |
+------------------+ +----------------------+
|amount user_id | | user_id items |
+------------------+ +----------------------+
| 1000 123 ---------> | 123 apple |
+ +----------------------+
+-------------> | 123 guava |
| +----------------------+
v-------------> | 123 mango |
+----------------------+
select sum(amount) from orders o left join order_details od
on o.user_id = od.user_id; -- result: 3000
select count(amount) from orders o left join order_details od
on o.user_id = od.user_id; -- result: 3
I hope the reason for large count is clear to you now.
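The diagram above can be reproduced in a few lines. This sketch uses Python's built-in sqlite3 module instead of Redshift, with the same toy orders and order_details tables. The last query shows one common way around the fan-out (collapsing the detail table to one row per user before joining); that is an illustration of the idea, not a drop-in fix for the query in the question.

```python
import sqlite3

def fan_out_demo():
    """Return (joined sum, joined count, sum after pre-aggregating the details)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (amount INTEGER, user_id INTEGER);
        CREATE TABLE order_details (user_id INTEGER, items TEXT);
        INSERT INTO orders VALUES (1000, 123);
        INSERT INTO order_details VALUES (123, 'apple'), (123, 'guava'), (123, 'mango');
    """)
    # The single order row is repeated once per matching detail row.
    joined_sum = conn.execute("""
        SELECT SUM(amount) FROM orders o
        LEFT JOIN order_details od ON o.user_id = od.user_id
    """).fetchone()[0]
    joined_count = conn.execute("""
        SELECT COUNT(amount) FROM orders o
        LEFT JOIN order_details od ON o.user_id = od.user_id
    """).fetchone()[0]
    # Collapse the details to one row per user first, so the join stays 1:1.
    safe_sum = conn.execute("""
        SELECT SUM(o.amount) FROM orders o
        LEFT JOIN (SELECT user_id, COUNT(*) AS item_count
                   FROM order_details GROUP BY user_id) od
               ON o.user_id = od.user_id
    """).fetchone()[0]
    conn.close()
    return joined_sum, joined_count, safe_sum

print(fan_out_demo())  # (3000, 3, 1000)
```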
PS: Also, always prefer to enclose OR conditions in ().
WHERE
(c2.vendor_transaction_description ilike '%share%'
OR c2.vendor_transaction_description ilike '%shake to win%')

UNION or JOIN for SELECT from multiple tables

My Issue
I am trying to select one row from multiple tables based on parameters, but my limited knowledge of SQL joining is holding me back. Could somebody possibly point me in the right direction?
Consider these table structures:
+-----------------------+ +---------------------+
| Customers | | Sellers |
+-------------+---------+ +-----------+---------+
| Customer_ID | Warning | | Seller_ID | Warning |
+-------------+---------+ +-----------+---------+
| 00001 | Test 1 | | 00008 | Testing |
| 00002 | Test 2 | | 00010 | Testing |
+-------------+---------+ +-----------+---------+
What I would like to do is one SELECT to retrieve only one row, and in this row will be the 'Warning' field for each of the tables based on the X_ID field.
Desired Results
So, if I submitted the following information, I would receive the following results:
Example 1:
Customer_ID = 00001
Seller_ID = 00008
Results:
+-----------------------------------+
| Customer_Warning | Seller_Warning |
+------------------+----------------+
| Test 1 | Testing |
+------------------+----------------+
Example 2:
Customer_ID = 00001
Seller_ID = 00200
Results:
+-----------------------------------+
| Customer_Warning | Seller_Warning |
+------------------+----------------+
| Test 1 | NULL |
+------------------+----------------+
What I Have Tried
This is my current code (I am receiving loads of rows):
SELECT c.Warning 'Customer_Warning', s.Warning AS 'Seller_Warning'
FROM Customers c,Sellers s
WHERE c.Customer_ID = @Customer_ID
OR s.Seller_ID = @Seller_ID
But I have also played around with UNION, UNION ALL and JOIN. Which method should I go for?
Since you're not really joining tables together, just selecting a single row from each, you could do this:
SELECT
(SELECT Warning
FROM Customers
WHERE Customer_ID = @Customer_ID) AS Customer_Warning,
(SELECT Warning
FROM Sellers
WHERE Seller_ID = @Seller_ID) AS Seller_Warning
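Here is a small sketch of that scalar-subquery approach run against the sample rows from the question, using Python's built-in sqlite3 module as a stand-in for the asker's database. The named placeholders :customer_id and :seller_id play the role of the query parameters, and the IDs are stored as text to keep the leading zeros.

```python
import sqlite3

def warnings_for(customer_id, seller_id):
    """Return a single (Customer_Warning, Seller_Warning) row; missing IDs give None."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Customers (Customer_ID TEXT, Warning TEXT);
        CREATE TABLE Sellers (Seller_ID TEXT, Warning TEXT);
        INSERT INTO Customers VALUES ('00001', 'Test 1'), ('00002', 'Test 2');
        INSERT INTO Sellers VALUES ('00008', 'Testing'), ('00010', 'Testing');
    """)
    row = conn.execute("""
        SELECT
            (SELECT Warning FROM Customers
             WHERE Customer_ID = :customer_id) AS Customer_Warning,
            (SELECT Warning FROM Sellers
             WHERE Seller_ID = :seller_id) AS Seller_Warning
    """, {"customer_id": customer_id, "seller_id": seller_id}).fetchone()
    conn.close()
    return row

print(warnings_for("00001", "00008"))  # ('Test 1', 'Testing')
print(warnings_for("00001", "00200"))  # ('Test 1', None)
```

Because the outer SELECT has no FROM clause, it always yields exactly one row, and a subquery that finds nothing simply contributes NULL.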
The problem is you're getting a cartesian product of rows in each table where either column has the value you're looking for.
I think you just want AND instead of OR:
SELECT c.Warning 'Customer_Warning', s.Warning AS 'Seller_Warning'
FROM Customers c
JOIN Sellers s
ON c.Customer_ID = @Customer_ID
AND s.Seller_ID = @Seller_ID
If performance isn't good enough you could join two filtered subqueries:
SELECT c.Warning 'Customer_Warning', s.Warning AS 'Seller_Warning'
FROM (SELECT Warning FROM Customers c WHERE c.Customer_ID = @Customer_ID) c,
(SELECT Warning FROM Sellers s WHERE s.Seller_ID = @Seller_ID) s
But I suspect SQL will be able to optimize the filtered join just fine.
It won't return a row if one of the IDs doesn't exist.
Then you want a FULL OUTER JOIN:
SELECT c.Warning 'Customer_Warning', s.Warning AS 'Seller_Warning'
FROM Customers c
FULL OUTER JOIN Sellers s
ON c.Customer_ID = @Customer_ID
AND s.Seller_ID = @Seller_ID
The problem that you are facing is that when one of the tables has no rows, you are going to get no rows out.
I would suggest solving this with a full outer join:
SELECT c.Warning as Customer_Warning, s.Warning AS Seller_Warning
FROM Customers c FULL OUTER JOIN
Sellers s
ON c.Customer_ID = @Customer_ID AND s.Seller_ID = @Seller_ID;
Also, I strongly discourage you from using single quotes for column aliases. Use single quotes only for string and date constants. Using them for column names can lead to confusion. In this case, you don't need delimiters on the names at all.
What I have seen so far here are working examples for your scenario. However, there is no real sense in putting unrelated data together in one row. I would propose using a UNION and separating the values in your code:
SELECT 'C' AS Type, c.Warning
FROM Customers c
WHERE c.Customer_ID = @Customer_ID
UNION
SELECT 'S' AS Type, s.Warning
FROM Sellers s
WHERE s.Seller_ID = @Seller_ID
You can use the flag to distinguish the warnings in your code. This will be more efficient than joining or sub queries and will be easy to understand later on (when refactoring). I know this is not 100% what you ask for in your question but that's why I challenge the question :)
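For completeness, a rough sketch of how the UNION result could be consumed on the application side. It uses Python's built-in sqlite3 module with the sample rows from the question; the function name and the dictionary shape are just assumptions for this illustration.

```python
import sqlite3

def warnings_by_type(customer_id, seller_id):
    """Return {'C': customer warning, 'S': seller warning}, omitting IDs not found."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Customers (Customer_ID TEXT, Warning TEXT);
        CREATE TABLE Sellers (Seller_ID TEXT, Warning TEXT);
        INSERT INTO Customers VALUES ('00001', 'Test 1'), ('00002', 'Test 2');
        INSERT INTO Sellers VALUES ('00008', 'Testing'), ('00010', 'Testing');
    """)
    rows = conn.execute("""
        SELECT 'C' AS Type, c.Warning
        FROM Customers c
        WHERE c.Customer_ID = :customer_id
        UNION
        SELECT 'S' AS Type, s.Warning
        FROM Sellers s
        WHERE s.Seller_ID = :seller_id
    """, {"customer_id": customer_id, "seller_id": seller_id}).fetchall()
    conn.close()
    # The Type flag tells the caller which table each warning came from.
    return dict(rows)

print(warnings_by_type("00001", "00008"))
print(warnings_by_type("00001", "00200"))
```

Note that an ID with no match produces no row at all here, so the caller has to treat a missing key as "no warning" rather than expecting a NULL column.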