Ambiguous column name - is it though? - sql

If I want to write a query with a simple join, I can do this:
select * from customer c
join order o
on c.customerid = o.customerid
where c.customerid = 100
and it all works fine. In this query, is there a reason why I have to specify a table alias, i.e. c.customerid? Why can't I just write this:
select * from customer c
join order o
on c.customerid = o.customerid
where customerid = 100
I get the error Ambiguous column name 'customerid'. In this case, where there's only one column in the WHERE clause and it's the column on which I'm JOINing, is this actually "ambiguous"? Or is it just to comply with the ANSI standard (I'm guessing here - I don't know if it does comply) and to encourage good coding conventions?

For your specific example I can't think of any circumstances in which it would make a difference. However, for an INNER JOIN on a string column it could, as shown below.
DECLARE @customer TABLE
(customerid CHAR(3) COLLATE Latin1_General_CI_AS)
INSERT INTO @customer VALUES('FOO');
DECLARE @order TABLE
(customerid CHAR(3) COLLATE Latin1_General_CS_AS)
INSERT INTO @order VALUES('FOO');
SELECT *
FROM @customer c
JOIN @order o
ON c.customerid = o.customerid COLLATE Latin1_General_CS_AS
WHERE c.customerid = 'Foo' /*Returns 1 row*/
SELECT *
FROM @customer c
JOIN @order o
ON c.customerid = o.customerid COLLATE Latin1_General_CS_AS
WHERE o.customerid = 'Foo' /*Returns 0 rows*/

Omitting the table alias really does make for an ambiguous column reference. Just make your join a left join, and you'll immediately see why:
select * from customer c
left join order o
on c.customerid = o.customerid
where customerid = 100 -- here, the semantics are quite different
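To make the left-join difference concrete, here is a minimal sketch that runs the two qualified variants against an in-memory SQLite database through Python's sqlite3 module (SQLite, not SQL Server, so only the join semantics carry over). The table name orders (renamed because order is a reserved word), the orderid column and the sample rows are assumptions made for illustration: customer 100 exists but has placed no orders.

```python
import sqlite3

# In-memory SQLite database; table names, columns and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customerid INTEGER);
    CREATE TABLE orders (orderid INTEGER, customerid INTEGER);
    INSERT INTO customer VALUES (100), (200);
    INSERT INTO orders VALUES (1, 200);   -- customer 100 has no orders
""")

left_join = """
    SELECT c.customerid, o.customerid
    FROM customer c
    LEFT JOIN orders o ON c.customerid = o.customerid
    WHERE {alias}.customerid = 100
"""

# Filtering on the customer side keeps the unmatched customer (order side is NULL).
rows_c = conn.execute(left_join.format(alias="c")).fetchall()
# Filtering on the order side discards it, because o.customerid is NULL for that row.
rows_o = conn.execute(left_join.format(alias="o")).fetchall()

print(rows_c)  # [(100, None)]
print(rows_o)  # []
```

The same WHERE condition gives one row or none depending on which table's customerid it refers to, so an unqualified reference has no single obvious meaning.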
Another reason: one column could be of type INTEGER, the other of type SMALLINT. Which one should be used for the filter? (This might have implications for the execution plan.) An even better example is given by Martin Smith.
So in general, you wouldn't gain much by making SQL more "forgiving", while at the same time introducing new sources of error. What you could do with some databases (not SQL Server), however is this:
select * from customer c
join order o
using (customerid)
where customerid = 100
Or this (if customerid is the only common column name)
select * from customer c
natural join order o
where customerid = 100

You are getting the error because the customerid column exists in both the order and customer tables and SQL doesn't know which column the condition should be applied to.

After JOINing two tables, the resulting table contains 2 columns having the same name of customerid. So you need to tell the WHERE clause which column to use by adding the table name as prefix.

Well... you know that the result set will only contain entries with exactly the same customerid, but the database server does not, because it doesn't "understand" what you are specifying. And if you had a join whose result set does not have both customerids exactly the same, you would be glad that the server distinguishes them. ;)

The ambiguous column error only occurs when an operation refers to a column that exists in more than one table; in that case SQL cannot tell which table's column it should operate on.

Related

SQL Server: Is there any way to prevent accidental multiple updates of the same row in a single UPDATE Statement?

When you write an UPDATE statement from a join of two or more tables, there is always a possibility that you accidentally omit a condition, so the same row is updated multiple times with unexpected results, especially when there are complex keys/relationships.
Is there any way to ENSURE that if such a situation happens, SQL Server raises an error or gives some kind of warning?
I'm usually careful about these things, but it has happened to me a few times recently while retrieving data from an unfamiliar database with complex relationships.
While my question is about how to prevent this in SQL Server, I'd also be glad to hear how you make sure it doesn't happen.
Here is a small made up example of what I mean:
DECLARE @Customers TABLE (Id INT, Name VARCHAR(100), LatestInvoice VARCHAR(100))
DECLARE @Orders TABLE (Id INT, CustomerId INT, Invoice VARCHAR(100), Date DATETIME)
INSERT INTO @Customers (Id, Name)
VALUES (1, 'Customer1')
INSERT INTO @Orders (Id, CustomerId, Invoice, Date)
VALUES (1, 1, 'Invoice 1', '1/1/2019'),
(2, 1, 'Invoice 2', '2/1/2019'),
(3, 1, 'Invoice 3', '3/1/2019')
-- Correct UPDATE
-- one record updates once
UPDATE C
SET LatestInvoice = O.Invoice
FROM @Customers C
JOIN @Orders O ON O.CustomerId = C.Id
WHERE O.Date = '3/1/2019'
-- Incorrect UPDATE
-- one record gets updated 3 times and result of Invoice could be anything
UPDATE C
SET LatestInvoice = O.Invoice
FROM @Customers C
JOIN @Orders O ON O.CustomerId = C.Id
And BTW, what is this kind of UPDATE mistake called?
Thanks a lot!
Not a 100% defence, but start designing an UPDATE with a SELECT:
SELECT target.PrimaryKey, Count(*)
-- update table expression here
GROUP BY target.PrimaryKey
HAVING Count(*) > 1
For example
SELECT C.id, Count(*)
-- update table expression here
FROM @Customers C
JOIN @Orders O ON O.CustomerId = C.Id
--
GROUP BY C.id
HAVING Count(*) > 1
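The pre-check above can be sketched as a runnable example using Python's sqlite3 module with an in-memory SQLite database (not SQL Server, so only the GROUP BY / HAVING idea carries over). The helper name multiply_matched and the ISO date strings are assumptions for illustration; the tables mirror the question's sample data.

```python
import sqlite3

# In-memory SQLite copy of the question's sample data (dates written as ISO strings).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (Id INTEGER, Name TEXT, LatestInvoice TEXT);
    CREATE TABLE Orders (Id INTEGER, CustomerId INTEGER, Invoice TEXT, Date TEXT);
    INSERT INTO Customers (Id, Name) VALUES (1, 'Customer1');
    INSERT INTO Orders VALUES
        (1, 1, 'Invoice 1', '2019-01-01'),
        (2, 1, 'Invoice 2', '2019-02-01'),
        (3, 1, 'Invoice 3', '2019-03-01');
""")

def multiply_matched(extra_condition=""):
    """Return (target key, match count) for each target row the join hits more than once."""
    sql = f"""
        SELECT C.Id, COUNT(*)
        FROM Customers C
        JOIN Orders O ON O.CustomerId = C.Id {extra_condition}
        GROUP BY C.Id
        HAVING COUNT(*) > 1
    """
    return conn.execute(sql).fetchall()

# Join without the date condition: customer 1 would be updated three times.
unsafe = multiply_matched()
# Join with the date condition: no target row is matched more than once.
safe = multiply_matched("AND O.Date = '2019-03-01'")

print(unsafe)  # [(1, 3)]
print(safe)    # []
```

An empty result means each target row is matched at most once, so the corresponding UPDATE is deterministic.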
You can use CROSS APPLY instead of JOIN:
UPDATE C
SET LatestInvoice = O.Invoice
FROM @Customers C CROSS APPLY
(SELECT TOP (1) O.*
FROM @Orders O
WHERE O.CustomerId = C.Id
) O;
This will update once with an arbitrary matching row. You can add an ORDER BY to the subquery to provide more specification on the row that should be used.
EDIT:
I don't think there is a clean way to do this. I don't think there is a built-in function that will return an error from a query (such as throw() or raise_error() in T-SQL code). You can use the handy divide-by-zero error instead:
UPDATE C
SET LatestInvoice = O.Invoice
FROM @Customers C JOIN
(SELECT O.*, COUNT(*) OVER (PARTITION BY O.CustomerId) as cnt
FROM @Orders O
) O
ON O.CustomerId = C.Id
WHERE (CASE WHEN cnt > 1 THEN 1 / 0 ELSE cnt END) = 1;
It looks like, if you have an UPDATE that might match a target table row more than once, it is best to use MERGE instead of UPDATE:
Unlike non-deterministic UPDATE the standard MERGE statement will
generates an error if multiple source rows match one target row,
requiring to revise the code to make it deterministic.
Here is the conversion of the above UPDATE statements to MERGE; the second one actually errors out!
-- works just fine
MERGE @Customers T
USING @Orders S
ON S.CustomerId = T.Id AND S.Date = '3/1/2019'
WHEN MATCHED
THEN
UPDATE SET LatestInvoice = S.Invoice;
-- omitting Date condition will follow with this error:
-- The MERGE statement attempted to UPDATE or DELETE the same row more than once. This happens when a target row matches more than one source row. A MERGE statement cannot UPDATE/DELETE the same row of the target table multiple times. Refine the ON clause to ensure a target row matches at most one source row, or use the GROUP BY clause to group the source rows.
MERGE @Customers T
USING @Orders S
ON S.CustomerId = T.Id --AND S.Date = '3/1/2019'
WHEN MATCHED
THEN
UPDATE SET LatestInvoice = S.Invoice;
Still, I think MS SQL should have at least an option for the UPDATE statement to fail on non-deterministic updates, as such an update is almost certainly a mistake and leads to problems.

Where am I going wrong with this SQL query?

I am attempting to do the following:
Check to see if the table does not exist and if so, create the TABLE 'tmpTriangleTransfer'.
Check to see if the table exists and if so, DROP the TABLE 'tmpTriangleTransfer'.
Insert the data being pulled from the other tables into the 2nd - 5th columns of the TABLE 'tmpTriangleTransfer'.
Loop and for each row that exists in the TABLE 'tmpTriangleTransfer' update the 1st column with the declared information.
Return all of the information from that table (to be formatted into a report).
Can someone please help me figure out what I am doing wrong? I'm getting no results even though I know for a fact there are records (when I run just the SELECT statement on the last line, it shows records and when I run the SELECT DISTINCT statement in the middle, it shows the same records).
IF OBJECT_ID('tmpTriangleTransfer') IS NOT NULL
DROP TABLE tmpTriangleTransfer;
IF OBJECT_ID('tmpTriangleTransfer') IS NULL
CREATE TABLE tmpTriangleTransfer
(
CompanyName varchar(max),
OrderID decimal(19,2) NULL,
DriverID int NULL,
VehicleID int NULL,
Phone varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
BOL varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
);
INSERT INTO tmpTriangleTransfer (OrderID, BOL, DriverID, VehicleID, Phone)
SELECT DISTINCT tblOrder.OrderID AS OrderID, tblOrder.BOL AS BOL, tblOrderDrivers.DriverID AS DriverID, tblDrivers.VehicleID AS VehicleID, tblWorker.Phone AS Phone
FROM tblOrder WITH (NOLOCK)
INNER JOIN tblActiveOrders
ON tblOrder.OrderID = tblActiveOrders.OrderID
INNER JOIN tblOrderDrivers
ON tblOrder.OrderID = tblOrderDrivers.OrderID
INNER JOIN tblDrivers
ON tblOrderDrivers.DriverID = tblDrivers.DriverID
INNER JOIN tblWorker
ON tblDrivers.WorkerID = tblWorker.WorkerID
WHERE tblOrder.CustID = 7317
ORDER BY tblOrder.OrderID
DECLARE @MaxRownum INT
SET @MaxRownum = (SELECT MAX(OrderID) FROM tmpTriangleTransfer)
DECLARE @Iter INT
SET @Iter = (SELECT MIN(OrderID) FROM tmpTriangleTransfer)
WHILE @Iter <= @MaxRownum
BEGIN
UPDATE tmpTriangleTransfer
SET tmpTriangleTransfer.CompanyName = 'Triangle'
WHERE tmpTriangleTransfer.CompanyName IS NULL;
SET @Iter = @Iter + 1
END
SELECT * from tmpTriangleTransfer WITH (NOLOCK)
Your existing query is far too complicated. In fact, you don't need a temporary table, the WHILE loop, or anything - just a single SELECT is all you need:
SELECT
'Triangle' AS CompanyName,
tblOrder.OrderId,
tblOrder.BOL,
tblOrderDrivers.DriverID,
tblDrivers.VehicleID,
tblWorker.Phone
FROM
tblOrder
LEFT OUTER JOIN tblActiveOrders ON tblOrder.OrderID = tblActiveOrders.OrderID
LEFT OUTER JOIN tblOrderDrivers ON tblOrder.OrderID = tblOrderDrivers.OrderID
LEFT OUTER JOIN tblDrivers ON tblOrderDrivers.DriverID = tblDrivers.DriverID
LEFT OUTER JOIN tblWorker ON tblDrivers.WorkerID = tblWorker.WorkerID
WHERE
tblOrder.CustID = 7317
ORDER BY
tblOrder.OrderID
I've changed your query to use LEFT OUTER JOIN instead of INNER JOIN because I suspect this is the main reason for no data being returned. INNER JOIN requires rows to exist in both tables (relations) and I suspect that you have Orders without Drivers or that not every Order is in ActiveOrders. Change the joins to INNER JOIN if you know that related rows will always be present.
You can return literals in queries directly, like I'm doing in the SELECT 'Triangle' AS CompanyName part, whereas you were seemingly manually adding it to the output temporary-table.
Your code didn't seem to be doing anything that would require the WITH (NOLOCK) modifier - the fact it was repeated everywhere makes it look like a case of Cargo-Cult Programming.
Tip: In SQL, a SELECT statement, as written, is not representative of its logical execution order. It should instead be read in this order: FROM > WHERE > [GROUP BY >] SELECT > ORDER BY.
This is why in .NET Linq the .Select() call is often at the end, not the beginning, because previous Linq expressions define the data sources.
This query can be parameterised by converting it to a table-valued function that accepts CustID as a parameter. I also assume you have the company name "Triangle" stored in a table somewhere; embedding it as a literal value for a single query is a code smell. What's so special about 7317 / "Triangle"?
Related note: Generally speaking, queries that only SELECT data (and don't perform any INSERT/UPDATE/DELETE/ALTER/CREATE statements) should be Table-valued UDFs or Views and not Stored Procedures - so that they can benefit from function-composition, query-composition and runtime execution plan optimizations that you cannot get with Stored Procedures.
If you're able to, see if you can remove the tbl prefix from the table names. (Using "tbl" as a prefix has its defenders, but my own personal opinion is that it's an obsolete developer aid, as today's database tooling shows type information, and it makes database refactoring harder, e.g. converting a table to a view.)
Taken from a combination of the suggestion from Dai and the requirements of my employer:
SELECT 'Triangle' AS CompanyName, tblOrder.OrderId AS OrderID, tblOrder.BOL AS BOL, tblOrderDrivers.DriverID AS DriverID, tblDrivers.VehicleID AS VehicleID, tblWorker.Phone AS Phone
FROM tblOrder WITH (NOLOCK)
INNER JOIN tblActiveOrders WITH (NOLOCK)
ON tblOrder.OrderID = tblActiveOrders.OrderID
INNER JOIN tblOrderDrivers WITH (NOLOCK)
ON tblOrder.OrderID = tblOrderDrivers.OrderID
INNER JOIN tblDrivers WITH (NOLOCK)
ON tblOrderDrivers.DriverID = tblDrivers.DriverID
INNER JOIN tblWorker WITH (NOLOCK)
ON tblDrivers.WorkerID = tblWorker.WorkerID
WHERE
tblOrder.CustID = 7317
ORDER BY
tblOrder.OrderID desc

With SqlServer, where * not exists

I got two tables, Orders with two columns as orderid and customerid, and Customers with two columns as customerid and location.
What I'd like to do is find all the customerid in the table Customers, which are not in Orders. For example, Customers.customerid = {A, B, C, D}, Orders.customerid = {A, B, C}, guess what I need to do is just get the ones from Customers but not exists in Orders. For achieving that, I put,
select customerid from Customers where customerid not exists (select customerid from Orders)
But it returns nothing. My logic is quite simple: first get all the customerid values in table Orders, then get the ones from Customers which don't exist among them. I can't see why this is wrong.
I tried this later, and it works. Can anyone help me, please?
select customerid from Customers as c where customerid not exists(select customerid from orders as o where c.customerid = o.customerid)
Why do I have to add c.customerid = o.customerid?
Why do I have to add c.customerid = o.customerid?
Because using the same name for two columns in your database doesn't mean that any specific relationship is enforced or assumed between them.
You need to add the c.customerid = o.customerid to specify that you're interested in the specific condition that these two columns are equal.
But any other correlation condition is also allowed by the language. E.g. you could write a query:
select customerid from Customers as c where not exists(
select customerid from Customers as c2 where c2.customerid < c.customerid)
Which would find you the "first" customer, if considering the customers sorted by their customerid values (not that this is the best way of writing this query, it's just a demonstration of the flexibility)
Your first query was, in effect "give me all rows from the Customer table, provided that no rows exist in the Order table" - which is also a perfectly valid thing to ask for, but wasn't what you intended - you intended to perform some form of correlation, which is what you did in your second query.
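The difference between the uncorrelated and the correlated form can be sketched with Python's sqlite3 module and an in-memory SQLite database (an assumption for runnability; the question is about SQL Server, but NOT EXISTS behaves the same way here). The sample rows follow the question's {A, B, C, D} and {A, B, C} sets; the orderid and location values are made up.

```python
import sqlite3

# In-memory SQLite database with the question's example sets; other values are placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (customerid TEXT, location TEXT);
    CREATE TABLE Orders (orderid INTEGER, customerid TEXT);
    INSERT INTO Customers VALUES ('A', 'x'), ('B', 'x'), ('C', 'x'), ('D', 'x');
    INSERT INTO Orders VALUES (1, 'A'), (2, 'B'), (3, 'C');
""")

# Uncorrelated: the subquery only asks "does Orders contain any row at all?"
uncorrelated = conn.execute("""
    SELECT customerid FROM Customers
    WHERE NOT EXISTS (SELECT customerid FROM Orders)
""").fetchall()

# Correlated: the subquery is evaluated per customer via o.customerid = c.customerid.
correlated = conn.execute("""
    SELECT c.customerid FROM Customers AS c
    WHERE NOT EXISTS (SELECT 1 FROM Orders AS o WHERE o.customerid = c.customerid)
""").fetchall()

print(uncorrelated)  # []
print(correlated)    # [('D',)]
```

Because Orders is not empty, the uncorrelated condition is false for every customer, which is why the first attempt returned nothing.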
Maybe you need:
select customerid from Customers where customerid not in (select customerid from Orders)
Try below query :
SELECT customerid from Customers C WHERE NOT EXISTS
(
SELECT 1 FROM orders O WHERE C.customerid = O.customerid
)
What you need is:
select c.customerid from Customers c left join Orders o on c.customerid = o.customerid where o.customerid is null
A left join keeps every customer, and the customers without a matching order have NULL in o.customerid.
The syntax is a bit off. You mean to write it like this:
SELECT C.CustomerID
FROM Customers C
WHERE NOT EXISTS
(
SELECT O.CustomerID
FROM Orders O
WHERE O.CustomerID = C.CustomerID
)
;
You could also do this with NOT IN, as such:
SELECT customerid
FROM Customers
WHERE customerid NOT IN
(
SELECT customerid
FROM Orders
)
;
The two are semantically equivalent, for the most part.
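One well-known case where the two forms differ is a NULL in the subquery's column: under SQL's three-valued logic, NOT IN then matches nothing, while NOT EXISTS is unaffected. A minimal sketch using Python's sqlite3 module with an in-memory SQLite database (assumed here for runnability; the sample rows, including the order with a NULL customerid, are made up):

```python
import sqlite3

# In-memory SQLite database; one order row deliberately has a NULL customerid.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (customerid TEXT);
    CREATE TABLE Orders (orderid INTEGER, customerid TEXT);
    INSERT INTO Customers VALUES ('A'), ('B'), ('C'), ('D');
    INSERT INTO Orders VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, NULL);
""")

not_exists = conn.execute("""
    SELECT C.customerid FROM Customers C
    WHERE NOT EXISTS (SELECT 1 FROM Orders O WHERE O.customerid = C.customerid)
""").fetchall()

# 'D' NOT IN ('A', 'B', 'C', NULL) evaluates to NULL, not true, so the row is filtered out.
not_in = conn.execute("""
    SELECT customerid FROM Customers
    WHERE customerid NOT IN (SELECT customerid FROM Orders)
""").fetchall()

print(not_exists)  # [('D',)]
print(not_in)      # []
```

If Orders.customerid can be NULL, NOT EXISTS (or a NOT IN subquery with an added WHERE customerid IS NOT NULL) is the safer choice.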
Some people will probably tell you that you could do the same thing with a LEFT JOIN / IS NULL construct, but you can look at this article to see why that is a poorer choice in many circumstances.
@Damien_The_Unbeliever gave the correct explanation. Try it like this, with some sample data I created for the two tables:
CREATE TABLE #Orders
(orderid varchar(10), customerid varchar(10))
insert into #Orders values
('venkat','a'),
('raj','b'),
('mahes','c')
CREATE TABLE #Customers
(customerid varchar(10), [location] varchar(10))
insert into #Customers values
('a','and'),
('b','bar'),
('c','board'),
('D','board1')
SELECT cu.customerid from #Customers CU WHERE NOT EXISTS
(
SELECT 1 FROM #orders b WHERE Cu.customerid = b.customerid
)
output
customerid
D

Query with columns from 4 tables in SQL

Can anyone who knows SQL, specifically the flavor used in Microsoft Access 2013, tell me what I'm doing wrong here?
SELECT custid, custname, ordno, itemno, itemname
FROM cust
INNER JOIN order
ON cust.custid = order.custid
INNER JOIN orderitems
ON order.ordno = orderitems.ordno
INNER JOIN inv
ON orderitems.itemno = inv.itemno;
I've already read other, similar questions, and tried the methods they used in their solutions, but I'm getting a "Syntax error in FROM clause.", almost no matter what I try.
* * *
SOLUTION: Thanks for the replies! In addition to adding square brackets around "order" and using TableName.ColumnName syntax in SELECT, I had to use parentheses for my multiple INNER JOINs. Here is the fixed code:
SELECT cust.custid, cust.custname, [order].ordno, orderitems.itemno, inv.itemname
FROM ((cust
INNER JOIN [order]
ON cust.custid = [order].custid)
INNER JOIN orderitems
ON [order].ordno = orderitems.ordno)
INNER JOIN inv
ON orderitems.itemno = inv.itemno;
SELECT cust.custid --<-- Use two part name here
,cust.custname
,[order].ordno
,orderitems.itemno --<-- Only guessing here use the correct table name
,inv.itemname --<-- Only guessing here use the correct table name
FROM cust
INNER JOIN [order]
ON cust.custid = [order].custid --<-- used square brackets [] around ORDER as it is
INNER JOIN orderitems -- a key word.
ON [order].ordno = orderitems.ordno
INNER JOIN inv
ON orderitems.itemno = inv.itemno;
In your SELECT statement you need to use two-part names, i.e. TableName.ColumnName. Since these columns can exist in more than one table in your FROM clause, you need to tell the database engine which table each column in your SELECT comes from.

What is wrong with my join in this query?

I'm practicing basic SQL with this site http://www.sqlishard.com/Exercise
Here is the question:
S5.0 - INNER JOIN
Now that we can pull data out of a single table and qualify column
names, let's take it a step further. JOIN statements allow us to
'join' the rows of several tables together using a condition to define
how they match one another. SELECT [columns] FROM FirstTable INNER
JOIN SecondTable ON FirstTable.Id = SecondTable.FirstTableId
Try using the INNER JOIN syntax to SELECT all columns from the
Customers and Orders tables where the CustomerId column in Orders
matches the Id column in Customers. Since both tables have an Id
column, you will need to qualify the Customers id in the WHERE clause
with either the table name or a table alias.
Here is my answer:
SELECT *
FROM Customers AS c
INNER JOIN Orders AS o ON c.ID = o.ID
WHERE o.CustomerID = c.ID
The site says I'm wrong. Could anyone explain where I'm going wrong?
EDIT: I see now I don't need the WHERE clause, but the question states:
you will need to qualify the Customers id in the WHERE clause with
either the table name or a table alias.
Hence my confusion. Thanks none the less.
Try this way:
SELECT c.ID,o.ID
FROM Customers AS c
INNER JOIN Orders AS o ON o.CustomerID = c.ID
or using where clause
SELECT *
FROM Customers AS c, Orders AS o
where o.CustomerID = c.ID
If you use JOIN ... ON, you do not need a WHERE clause for the join condition.