Ambiguous column name when doing a JOIN - sql

I've got some troubles when doing a join in my statement
This one is OK
select [db_id], COUNT(db_id) AS total
FROM [dbauditor_repo].[dbo].[dbauditor_repo_events]
GROUP BY [db_id]
ORDER BY total DESC
But I'm getting an error when doing the join (Ambiguous column name 'db_id'.)
SELECT [db_name], [db_id], COUNT(db_id) AS total
FROM [dbauditor_repo].[dbo].[dbauditor_repo_events] JOIN [dbauditor_repo].[dbo].[dbauditor_repo_warehouse]
ON ([dbauditor_repo].[dbo].[dbauditor_repo_events].[db_id] = [dbauditor_repo].[dbo].[dbauditor_repo_warehouse].[db_id])
WHERE [db_type] = 'mysql'
GROUP BY [db_id]
ORDER BY total DESC
Any idea ?

Just get in the habit of using table aliases for all column references. That will prevent such errors and other unexpected problems. The best practice is to use table abbreviations. Here is an example:
SELECT rw.[db_name], re.[db_id], COUNT(re.db_id) AS total
FROM [dbauditor_repo].[dbo].[dbauditor_repo_events] re JOIN
[dbauditor_repo].[dbo].[dbauditor_repo_warehouse] rw
ON re.[db_id] = rw.[db_id])
WHERE rw.[db_type] = 'mysql'
GROUP BY rw.[db_name], re.[db_id]
ORDER BY total DESC;
Of course, I have to guess which tables the columns come from. You should fix the query so they come from the right tables.
As a bonus, when using table aliases, queries are easier to write and to read.

You need to use fully qualified name when joining tables and the column exist in more than one table.
Try:
SELECT [dbauditor_repo_events].[db_id]
You also need to include the column db_name in the GROUP BY clause as this column is not being aggregated.
GROUP BY [dbauditor_repo_events].[db_id], [db_name]

It's in your select and group by:
SELECT [db_name], [dbauditor_repo_events].[db_id], COUNT([dbauditor_repo_events].db_id) AS total
FROM [dbauditor_repo].[dbo].[dbauditor_repo_events] JOIN [dbauditor_repo].[dbo].[dbauditor_repo_warehouse]
ON ([dbauditor_repo].[dbo].[dbauditor_repo_events].[db_id] = [dbauditor_repo].[dbo].[dbauditor_repo_warehouse].[db_id])
WHERE [db_type] = 'mysql'
GROUP BY [dbauditor_repo_events].[db_id]
ORDER BY total DESC
You should maybe consider aliasing your tables in the query for readability,e.g.:
SELECT [db_name], e.[db_id], COUNT(e.db_id) AS total
FROM [dbauditor_repo].[dbo].[dbauditor_repo_events] as e
JOIN [dbauditor_repo].[dbo].[dbauditor_repo_warehouse] as w
ON (e.[db_id] = w.[db_id])
WHERE [db_type] = 'mysql'
GROUP BY e.[db_id]
ORDER BY total DESC

Problem:
It is because:
[dbauditor_repo].[dbo].[dbauditor_repo_events]
and
[dbauditor_repo].[dbo].[dbauditor_repo_warehouse]
have the same column name db_id. The Query does not know which table's db_id to be used. Hence this error.
Solution:
Use TableName.FieldName or TableAliasName.FieldName to avoid this error. Change your Query as:
SELECT [db_name],
TAB1.[db_id] ,
COUNT(TAB1.db_id) AS total
FROM
[dbauditor_repo].[dbo].[dbauditor_repo_events] AS TAB1
JOIN
[dbauditor_repo].[dbo].[dbauditor_repo_warehouse] AS TAB2
ON
(TAB1.[db_id] = TAB2.[db_id])
WHERE [db_type] = 'mysql'
GROUP BY [db_name],[TAB1.db_id]
ORDER BY total DESC

Related

Get Max from a joined table

I write this script in SQL server And I want get the food name with the Max of order count From this Joined Table . I can get Max value correct but when I add FoodName is select It give me an error.
SELECT S.FoodName, MAX(S.OrderCount) FROM
(SELECT FoodName,
SUM(Number) AS OrderCount
FROM tblFactor
INNER JOIN tblDetail
ON tblFactor.Factor_ID = tblDetail.Factor_ID
WHERE FactorDate = '2020-10-30'
GROUP BY FoodName)S
Here is The Error Message
Column 'S.FoodName' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
also I know I can use order by and top to achieve the food Name and Max order number but I want use the way I use in this script . Thank you for your answers
If I follow you correctly, you can use ORDER BY and TOP (1) directly on the result of the join query:
SELECT TOP (1) f.FoodName, SUM(d.Number) AS OrderCount
FROM tblFactor f
INNER JOIN tblDetail d ON f.Factor_ID = d.Factor_ID
WHERE f.FactorDate = '2020-10-30'
GROUP BY f.FoodName
ORDER BY OrderCount DESC
Notes:
I added table aliases to the query, and prefixed each column with the table it (presumably !) comes from; you might need to review that, as I had to make assumptions
If you want to allow top ties, use TOP (1) WITH TIES instead
You have an aggregation function in the outer query MAX() and an unaggregated column. Hence, the database expects a GROUP BY.
Instead, use ORDER BY and LIMIT:
SELECT FoodName, SUM(Number) AS OrderCount
FROM tblFactor f INNER JOIN
tblDetail d
ON fd.Factor_ID = d.Factor_ID
WHERE FactorDate = '2020-10-30'
GROUP BY FoodName
ORDER BY OrderCount DESC
LIMIT 1;
Note: In a query that references multiple tables, you should qualify all column references. It is not clear where the columns come from, so I cannot do that for this query.

SQL Server 2016 Sub Query Guidance

I am currently working on an assignment for my SQL class and I am stuck. I'm not looking for full code to answer the question, just a little nudge in the right direction. If you do provide full code would you mind a small explanation as to why you did it that way (so I can actually learn something.)
Here is the question:
Write a SELECT statement that returns three columns: EmailAddress, ShipmentId, and the order total for each Client. To do this, you can group the result set by the EmailAddress and ShipmentId columns. In addition, you must calculate the order total from the columns in the ShipItems table.
Write a second SELECT statement that uses the first SELECT statement in its FROM clause. The main query should return two columns: the Client’s email address and the largest order for that Client. To do this, you can group the result set by the EmailAddress column.
I am confused on how to pull in the EmailAddress column from the Clients table, as in order to join it I have to bring in other tables that aren't being used. I am assuming there is an easier way to do this using sub Queries as that is what we are working on at the time.
Think of SQL as working with sets of data as opposed to just tables. Tables are merely a set of data. So when you view data this way you immediately see that the query below returns a set of data consisting of the entirety of another set, being a table:
SELECT * FROM MyTable1
Now, if you were to only get the first two columns from MyTable1 you would return a different set that consisted only of columns 1 and 2:
SELECT col1, col2 FROM MyTable1
Now you can treat this second set, a subset of data as a "table" as well and query it like this:
SELECT
*
FROM (
SELECT
col1,
col2
FROM
MyTable1
)
This will return all the columns from the two columns provided in the inner set.
So, your inner query, which I won't write for you since you appear to be a student, and that wouldn't be right for me to give you the entire answer, would be a query consisting of a GROUP BY clause and a SUM of the order value field. But the key thing you need to understand is this set thinking: you can just wrap the ENTIRE query inside brackets and treat it as a table the way I have done above. Hopefully this helps.
You need a subquery, like this:
select emailaddress, max(OrderTotal) as MaxOrder
from
( -- Open the subquery
select Cl.emailaddress,
Sh.ShipmentID,
sum(SI.Value) as OrderTotal -- Use the line item value column in here
from Client Cl -- First table
inner join Shipments Sh -- Join the shipments
on Sh.ClientID = Cl.ClientID
inner join ShipItem SI -- Now the items
on SI.ShipmentID = Sh.ShipmentID
group by C1.emailaddress, Sh.ShipmentID -- here's your grouping for the sum() aggregation
) -- Close subquery
group by emailaddress -- group for the max()
For the first query you can join the Clients to Shipments (on ClientId).
And Shipments to the ShipItems table (on ShipmentId).
Then group the results, and count or sum the total you need.
Using aliases for the tables is usefull, certainly when you select fields from the joined tables that have the same column name.
select
c.EmailAddress,
i.ShipmentId,
SUM((i.ShipItemPrice - i.ShipItemDiscountAmount) * i.Quantity) as TotalPriceDiscounted
from ShipItems i
join Shipments s on (s.ShipmentId = i.ShipmentId)
left join Clients c on (c.ClientId = s.ClientId)
group by i.ShipmentId, c.EmailAddress
order by i.ShipmentId, c.EmailAddress;
Using that grouped query in a subquery, you can get the Maximum total per EmailAddress.
select EmailAddress,
-- max(TotalShipItems) as MaxTotalShipItems,
max(TotalPriceDiscounted) as MaxTotalPriceDiscounted
from (
select
c.EmailAddress,
-- i.ShipmentId,
-- count(*) as TotalShipItems,
SUM((i.ShipItemPrice - i.ShipItemDiscountAmount) * i.Quantity) as TotalPriceDiscounted
from ShipItems i
join Shipments s on (s.ShipmentId = i.ShipmentId)
left join Clients c on (c.ClientId = s.ClientId)
group by i.ShipmentId, c.EmailAddress
) q
group by EmailAddress
order by EmailAddress
Note that an ORDER BY is mostly meaningless inside a subquery if you don't use TOP.

NTILE Function and Using Inner Join in Oracle

I am supposed to use the given Database(Its pretty huge so I used codeshare) to list last names and customer numbers of top 5% of customers for each branch. To find the top 5% of customers, I decided to use the NTILE Function, (100/5 = 20, hence NTILE 20). The columns are pulled from two separate tables so I used Inner joins. For the life of me, I honesly cannot figure out where I am going wrong. I keep getting "missing expression" errors but Do not know what exactly I am missing. Here is the Database
Database: https://codeshare.io/5XKKBj
ERD: https://drive.google.com/file/d/0Bzum6VJXi9lUX1d2ZkhudTE3QXc/view?usp=sharing
Here is my SQL Query so far.
SELECT
Ntile(20) over
(partition by Employee.Branch_no
order by sum(ORDERS.SUBTOTAL) desc
) As Top_5,
CUSTOMER.CUSTOMER_NO,
CUSTOMER.LNAME
FROM
CUSTOMER
INNER JOIN ORDERS
ON
CUSTOMER.CUSTOMER_NO = ORDERS.CUSTOMER_NO
GROUP BY
ORDERS.SUBTOTAL,
CUSTOMER.CUSTOMER_NO,
CUSTOMER.LNAME;
You need to join Employee and the GROUP BY must include all non-aggregated expressions. You can use a subquery to generate the subtotals and get the NTILE in the outer query, e.g.:
SELECT
Ntile(20) over
(partition by BRANCH_NO
order by sum_subtotal desc
) As Top_5,
CUSTOMER_NO,
LNAME
FROM (
SELECT
EMPLOYEE.BRANCH_NO,
CUSTOMER.CUSTOMER_NO,
CUSTOMER.LNAME,
sum(ORDERS.SUBTOTAL) as sum_subtotal
FROM CUSTOMER
JOIN ORDERS
ON CUSTOMER.CUSTOMER_NO = ORDERS.CUSTOMER_NO
JOIN EMPLOYEE
ON ORDERS.EMPLOYEE_NO = EMPLOYEE.EMPLOYEE_NO
GROUP BY
EMPLOYEE.BRANCH_NO,
CUSTOMER.CUSTOMER_NO,
CUSTOMER.LNAME
);
Note: you might want to include BRANCH_NO in the select list as well, otherwise the output will look confusing with duplicate customers (if a customer has ordered from employees in multiple branches).
Now, if you want to filter the above query to just get the top 5%, you can put the whole thing in another subquery and add a predicate on the Top_5 column, e.g.:
SELECT CUSTOMER_NO, LNAME
FROM (... the query above...)
WHERE Top_5 = 1;

How to get the value of max() group when in subquery?

So i woud like to find the department name or department id(dpmid) for the group that has the max average of age among the other group and this is my query:
select
MAX(avg_age) as 'Max average age' FROM (
SELECT
AVG(userage) AS avg_age FROM user_data GROUP BY
(select dpmid from department_branch where
(select dpmbid from user_department_branch where
user_data.userid = user_department_branch.userid)=department_branch.dpmbid)
) AS query1
this code show only the max value of average age and when i try to show the name of the group it will show the wrong group name.
So, How to show the name of max group that has subquery from another table???
You may try this..
select MAX(avg_age) as max_avg, SUBSTRING_INDEX(MAX(avg_age_dep),'##',-1) as max_age_dep from
(
SELECT
AVG(userage) as avg_age, CONCAT( AVG(userage), CONCAT('##' ,department_name)) as avg_age_dep
FROM user_data
inner join user_department_branch
on user_data.userid = user_department_branch.userid
inner join department_branch
on department_branch.dpmbid = user_department_branch.dpmbid
inner join department
on department.dpmid = department_branch.dpmid
group by department_branch.dpmid
) tab_avg_age_by_dep
;
I've done some change on ipothesys that the department name is placed in a "department" anagraphical table.. so, as it needed put in join a table in plus, then I changed your query, eventually if the department name is placed (but I don't thing so) in the branch_department table you can add the field and its treatment to your query
update
In adjunct to as said, if you wanto to avoid identical average cases you can furtherly make univocal the averages by appending a rownum id in this way:
select MAX(avg_age) as max_avg, SUBSTRING_INDEX(MAX(avg_age_dep),'##',-1) as max_age_dep from
(
SELECT
AVG(userage) as avg_age, CONCAT( AVG(userage), CONCAT('##', CONCAT( #rownum:=#rownum+1, CONCAT('##' ,department_name)))) as avg_age_dep
FROM user_data
inner join user_department_branch
on user_data.userid = user_department_branch.userid
inner join department_branch
on department_branch.dpmbid = user_department_branch.dpmbid
inner join department
on department.dpmid = department_branch.dpmid
,(SELECT #rownum:=0) r
group by department_branch.dpmid
) tab_avg_age_by_dep
;
I took a shot at what I think you are looking for. The following will give you the department branch with the highest average age. I assumed the department_branch table had a department_name field. You may need an additional join to get the department.
SELECT db.department_name, udb.dpmid, AVG(userage) as `Average age`
FROM user_data as ud
JOIN user_department_branch as udb
ON udb.userid = ud.userid
JOIN department_branch as db
ON db.dpmbid = udb.dpmbid
GROUP BY udb.dpmid
ORDER BY `Average age` DESC
LIMIT 1

How to find the most frequent value in a select statement as a subquery?

I am trying to get the most frequent Zip_Code for the Location ID from table B. Table A(transaction) has one A.zip_code per Transaction but table B(Location) has multiple Zip_code for one area or City. I am trying to get the most frequent B.Zip_Code for the Account using Location_D that is present in both table.I have simplified my code and changed the names of the columns for easy understanding but this is the logic for my query I have so far.Any help would be appreciated. Thanks in advance.
Select
A.Account_Number,
A.Utility_Type,
A.Sum(usage),
A.Sum(Cost),
A.Zip_Code,
( select B.zip_Code from B where A.Location_ID= B.Location_ID having count(*)= max(count(B.Zip_Code)) as Location_Zip_Code,
A.Transaction_Date
From
Transaction_Table as A Left Join
Location Table as B On A.Location_ID= B.Location_ID
Group By
A.Account_Number,
A.Utility_Type,
A.Zip_Code,
A.Transaction_Date
This is what I come up with:
Select tt.Account_Number, tt.Utility_Type, Sum(tt.usage), Sum(tt.Cost),
tt.Zip_Code,
(select TOP 1 l.zip_Code
Location_Table l
where tt.Location_ID = l.Location_ID
group by l.zip_code
order by count(*) desc
) as Location_Zip_Code,
tt.Transaction_Date
From Transaction_Table tt
Group By tt.Account_Number, tt.Utility_Type, tt.Zip_Code, tt.Transaction_Date;
Notes:
Table aliases are a good thing. However, they should be abbreviations for the tables referenced, rather than arbitrary letters.
The table alias qualifies the column name, not the function. Hence sum(tt.usage) rather than tt.sum(usage).
There is no need for a join in the outer query. You are doing all the work in the subquery.
An order by with top seems the way to go to get the most common zip code (which, incidentally, is called the mode in statistics).