How do I perform an UPDATE query with a subquery in Access?

I want to port this SQL query, which works just fine on SQL Server, MySQL, and Oracle, to an Access database. How do I do that? Right now it prompts me for a Company_ID for some reason.
Edit: I was getting the prompt because I forgot to first create the Company_ID column in VendorRegKeys. Now I am getting error "Operation must use an updateable query".
UPDATE VendorRegKeys
SET Company_ID = (SELECT Users.Company_ID
FROM Users
WHERE Users.User_ID = VendorRegKeys.CreatedBy_ID)
Update: I found this to work based on JuniorFlip's answer:
UPDATE VendorRegKeys, Users
SET VendorRegKeys.Company_ID = Users.Company_ID
WHERE VendorRegKeys.CreatedBy_ID = Users.User_ID

Straight answer: you can't. The Access Database Engine simply does not support the vanilla SQL-92 scalar subquery syntax in an UPDATE, even when in its own so-called ANSI-92 Query Mode.
You are forced to use its own proprietary syntax, which does not enforce the scalar requirement, i.e. it is unsafe and will pick a value arbitrarily and silently. Further, beyond simple constructs it does not work at all, most notably where your subquery (if you were allowed to use one in the first place) uses a set function (MAX, SUM, etc.); see this article for some really unsatisfactory workarounds.
Sorry to be negative but this is really basic syntax and I can't understand why the Access team haven't gotten around to fixing it yet. It is the undisputed number one reason why I can't take the Access database engine seriously anymore.
To demonstrate the unsafe behavior of the Access proprietary UPDATE..JOIN..SET syntax:
CREATE TABLE Users
(
User_ID CHAR( 3 ) NOT NULL,
Company_ID CHAR( 4 ) NOT NULL,
UNIQUE ( Company_ID, User_ID ) );
CREATE TABLE VendorRegKeys
(
CreatedBy_ID CHAR( 3 ) NOT NULL UNIQUE,
Company_ID CHAR( 4 ) );
INSERT INTO Users VALUES ( 'Kip', 'MSFT' );
INSERT INTO Users VALUES ( 'Kip', 'AAPL' );
INSERT INTO VendorRegKeys VALUES ( 'Kip', NULL );
UPDATE VendorRegKeys
INNER JOIN Users ON Users.User_ID = VendorRegKeys.CreatedBy_ID
SET VendorRegKeys.Company_ID = Users.Company_ID;
When executing the update statement within Access, the UI warns us:
You are about to update 2 row(s).
despite the fact that there is only one row in the VendorRegKeys table!
What happens in practice is that just one of the values will be used to update the column in that single row, with no reliable way of predicting which it will be.
With Standard SQL's scalar subquery syntax, you would get an error and the statement would fail to execute, which is arguably the desired functionality (Standard SQL's MERGE syntax behaves this way too).
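Because Access will not raise an error in this situation, one defensive habit is to look for ambiguous matches before running the UPDATE ... INNER JOIN. A minimal sketch, assuming the demo Users rows above (recreated here so the block stands alone; in Access, run each statement as a separate query):

```sql
-- Recreate the demo Users rows from above (sample data only).
CREATE TABLE Users
(
User_ID CHAR( 3 ) NOT NULL,
Company_ID CHAR( 4 ) NOT NULL,
UNIQUE ( Company_ID, User_ID ) );
INSERT INTO Users VALUES ( 'Kip', 'MSFT' );
INSERT INTO Users VALUES ( 'Kip', 'AAPL' );

-- Every User_ID returned here matches more than one Users row, so the
-- join-based UPDATE would copy one of its Company_ID values arbitrarily.
SELECT User_ID, COUNT(*) AS matches
FROM Users
GROUP BY User_ID
HAVING COUNT(*) > 1;
```

An empty result means each CreatedBy_ID can match at most one Users row, so the join-based UPDATE is deterministic for that data.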

That could be because Company_ID is not an existing field in VendorRegKeys OR Users.
EDIT:
UPDATE VendorRegKeys
INNER JOIN Users ON Users.User_ID = VendorRegKeys.CreatedBy_ID
SET VendorRegKeys.Company_ID = Users.Company_ID

You could try this one (note that UPDATE ... FROM is SQL Server syntax, which Access does not accept):
update a
set a.company_id = b.company_id
from vendorRegkeys a, users b
where a.createdby_id = b.user_id

Related

update average/count from another table

I've been provided the below schema for this problem and I'm trying to do two things:
Update the ACCOUNT table's average_eval column with the average of the evaluation column from the POST_EVAL table per account_id.
Update the ACCOUNT table with a count of the number of posts per account_id, with default value 0 if the account_id has no post_id associated with it.
Here's the kicker: I MUST use the UPDATE statement and I'm not allowed to use triggers for these specific problems.
I've tried WITH clauses and GROUP BY but haven't gotten anywhere. I'm using PostgreSQL's pgAdmin, for reference.
Any help setting up these queries?
The first question can be done using something like this:
update account a
set average_eval = t.avg_eval
from (
select account_id, avg(evaluation) as avg_eval
from post_eval
group by account_id
) t
where t.account_id = a.account_id
The second question needs a correlated subquery, as there is no way to express an outer join in an UPDATE statement like the above:
update account a
set num_posts = (select count(*)
from post p
where p.account_id = a.account_id);
The count() will return zero (0) if there are no posts for that account. If a join was used (as in the first statement), the rows would not be updated at all, as the "join" condition wouldn't match.
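Both columns can also be filled in a single statement with correlated subqueries. count(*) still yields 0 for an account without posts, while avg() yields NULL for an account without evaluations. A sketch, assuming hypothetical minimal table definitions and sample rows since the real schema is not shown; the syntax works in PostgreSQL and SQLite:

```sql
-- Hypothetical minimal tables and sample rows.
CREATE TABLE account (account_id INTEGER PRIMARY KEY,
                      average_eval NUMERIC, num_posts INTEGER);
CREATE TABLE post (post_id INTEGER PRIMARY KEY, account_id INTEGER NOT NULL);
CREATE TABLE post_eval (post_id INTEGER NOT NULL, account_id INTEGER NOT NULL,
                        evaluation INTEGER NOT NULL);
INSERT INTO account (account_id) VALUES (1), (2);
INSERT INTO post VALUES (10, 1), (11, 1);
INSERT INTO post_eval VALUES (10, 1, 4), (11, 1, 2);

-- Each subquery is re-evaluated for every account row; account 2 has no
-- posts, so it gets num_posts = 0 and average_eval = NULL.
UPDATE account
SET average_eval = (SELECT avg(e.evaluation)
                    FROM post_eval e
                    WHERE e.account_id = account.account_id),
    num_posts    = (SELECT count(*)
                    FROM post p
                    WHERE p.account_id = account.account_id);
```

Wrapping the avg() subquery in coalesce(..., 0) would store 0 instead of NULL, if the assignment wants that for average_eval as well.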
I have not tested either of those statements, so they can contain typos (or even logical errors).
Unrelated, but: I understand that this is some kind of assignment, so you have no choice. But as RiggsFolly has mentioned: in general you should avoid storing information in a relational database that can be derived from existing data. Both values can easily be calculated in a view and then will always be up-to-date.
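The view-based alternative could look roughly like this. It is a sketch with hypothetical minimal tables and sample rows, and the view name account_stats is made up for illustration:

```sql
-- Hypothetical minimal tables and sample rows.
CREATE TABLE account (account_id INTEGER PRIMARY KEY);
CREATE TABLE post (post_id INTEGER PRIMARY KEY, account_id INTEGER NOT NULL);
CREATE TABLE post_eval (post_id INTEGER NOT NULL, account_id INTEGER NOT NULL,
                        evaluation INTEGER NOT NULL);
INSERT INTO account VALUES (1), (2);
INSERT INTO post VALUES (10, 1), (11, 1);
INSERT INTO post_eval VALUES (10, 1, 4), (11, 1, 2);

-- The derived values are computed on every read, so nothing is stored
-- that could drift out of date when posts or evaluations change.
CREATE VIEW account_stats AS
SELECT a.account_id,
       (SELECT avg(e.evaluation) FROM post_eval e
        WHERE e.account_id = a.account_id) AS average_eval,
       (SELECT count(*) FROM post p
        WHERE p.account_id = a.account_id) AS num_posts
FROM account a;

SELECT * FROM account_stats ORDER BY account_id;
```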

Should I always prefer EXISTS over COUNT() > 0 in SQL?

I often encounter the advice that, when checking for the existence of any rows from a (sub)query, one should use EXISTS instead of COUNT(*) > 0, for reasons of performance. Specifically, the former can short-circuit and return TRUE (or FALSE in the case of NOT EXISTS) after finding a single row, while COUNT needs to actually evaluate each row in order to return a number, only to be compared to zero.
This all makes perfect sense to me in simple cases. However, I recently ran into a problem where I needed to filter groups in the HAVING clause of a GROUP BY, based on whether all values in a certain column of the group were NULL.
For the sake of clarity, let's see an example. Let's say I have the following schema:
CREATE TABLE profile(
id INTEGER PRIMARY KEY,
user_id INTEGER NOT NULL,
google_account_id INTEGER NULL,
facebook_account_id INTEGER NULL,
FOREIGN KEY (user_id) REFERENCES user(id),
CHECK(
(google_account_id IS NOT NULL) + (facebook_account_id IS NOT NULL) = 1
)
)
I.e. each user (table not shown for brevity) has 0 or more profiles. Each profile is either a Google or a Facebook account. (This is the translation of subclasses or a sum type with some associated data — in my real schema, the account IDs are also foreign keys to different tables holding that associated data, but this is not relevant to my question.)
Now, say I wanted to count the Facebook profiles for all users who do NOT have any Google profiles.
At first, I wrote the following query using COUNT() = 0:
SELECT user_id, COUNT(facebook_account_id)
FROM profile
GROUP BY user_id
HAVING COUNT(google_account_id) = 0;
But then it occurred to me that the condition in the HAVING clause is actually just an existence check. So I then re-wrote the query using a subquery and NOT EXISTS:
SELECT user_id, COUNT(facebook_account_id)
FROM profile AS p
GROUP BY user_id
HAVING NOT EXISTS (
SELECT 1
FROM profile AS q
WHERE p.user_id = q.user_id
AND q.google_account_id IS NOT NULL
)
My question is two-fold:
Should I keep the second, re-formulated query, and use NOT EXISTS with a subquery instead of COUNT() = 0? Is this really more efficient? I reckon that the index lookup due to the WHERE p.user_id = q.user_id condition has some additional cost. Whether this additional cost is absorbed by the short-circuiting behavior of EXISTS could as well depend on the average cardinality of the groups, could it not?
Or could the DBMS perhaps be smart enough to recognize the fact that the grouping key is being compared against, and optimize this subquery away completely, by replacing it with the current group (instead of actually performing an index lookup for each group)? I seriously doubt that a DBMS could optimize away this subquery, while failing to optimize COUNT() = 0 into NOT EXISTS.
Efficiency aside, the second query seems significantly more convoluted and less obviously correct to me, so I'd be reluctant to use it even if it happened to be faster. What do you think, is there a better way? Could I have my cake and eat it too, by using NOT EXISTS in a simpler manner, for instance by directly referencing the current group from within the HAVING clause?
You should prefer EXISTS/NOT EXISTS over COUNT() in a subquery. So instead of:
select t.*
from t
where (select count(*) from z where z.x = t.x) > 0
You should instead use:
select t.*
from t
where exists (select 1 from z where z.x = t.x)
The reasoning for this is that the subquery can stop processing at the first match.
This reasoning doesn't apply in a HAVING clause after an aggregation -- all the rows have to be generated anyway so there is little value in stopping at the first match.
However, aggregation might not be necessary if you have a users table and don't really need the facebook count. You could use:
select u.*
from users u
where not exists (select 1
from profile p
where p.user_id = u.id and p.google_account_id is not null
);
Also, the aggregation might be faster if you filter before the aggregation:
SELECT user_id, COUNT(facebook_account_id)
FROM profile AS p
WHERE NOT EXISTS (
SELECT 1
FROM profile p2
WHERE p2.user_id = p.user_id AND p2.google_account_id IS NOT NULL
)
GROUP BY user_id;
Whether it actually is faster depends on a number of factors, including the number of rows that are actually filtered out.
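Before comparing timings, it is worth confirming that the rewrite returns the same rows. A small fixture for that, using the question's profile table with hypothetical sample data (the foreign key and CHECK constraint are omitted so the block runs on its own); the EXCEPT query should return no rows, and swapping its two halves checks the other direction:

```sql
CREATE TABLE profile(
id INTEGER PRIMARY KEY,
user_id INTEGER NOT NULL,
google_account_id INTEGER NULL,
facebook_account_id INTEGER NULL
);

-- Hypothetical data: user 100 has only Facebook profiles,
-- users 200 and 300 each have a Google profile.
INSERT INTO profile VALUES (1, 100, NULL, 501);
INSERT INTO profile VALUES (2, 100, NULL, 502);
INSERT INTO profile VALUES (3, 200, 601, NULL);
INSERT INTO profile VALUES (4, 200, NULL, 503);
INSERT INTO profile VALUES (5, 300, 602, NULL);

-- Rows produced by the HAVING COUNT() = 0 form but not by the
-- filter-before-aggregation form (expected: none).
SELECT user_id, COUNT(facebook_account_id)
FROM profile
GROUP BY user_id
HAVING COUNT(google_account_id) = 0
EXCEPT
SELECT user_id, COUNT(facebook_account_id)
FROM profile AS p
WHERE NOT EXISTS (
SELECT 1
FROM profile p2
WHERE p2.user_id = p.user_id AND p2.google_account_id IS NOT NULL
)
GROUP BY user_id;
```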
The first query seems like the right way to do what you want.
That's an aggregate query already, since you want to count the facebook accounts. The overhead to process the having clause, that counts the google accounts, should be tiny.
On the other hand, the second approach requires reopening the table and scanning it, which is most probably more expensive.

Difference between DELETE and DELETE FROM in SQL?

Is there one? I am researching some stored procedures, and in one place I found the following line:
DELETE BI_Appointments
WHERE VisitType != (
SELECT TOP 1 CheckupType
FROM BI_Settings
WHERE DoctorName = @DoctorName)
Would that do the same thing as:
DELETE FROM BI_Appointments
WHERE VisitType != (
SELECT TOP 1 CheckupType
FROM BI_Settings
WHERE DoctorName = @DoctorName)
Or is it a syntax error, or something entirely different?
Assuming this is T-SQL or MS SQL Server, there is no difference and the statements are identical. The first FROM keyword is syntactically optional in a DELETE statement.
http://technet.microsoft.com/en-us/library/ms189835.aspx
As for why SQL Server accepts both forms, there are probably two reasons.
First, the standard requires the FROM keyword in the clause, so it has to be accepted for standards compliance.
Second, although the keyword is redundant, that's probably not why it's optional. I believe it's because SQL Server allows you to specify a JOIN in the DELETE statement, and making the first FROM mandatory would make that awkward.
For example, here's a normal delete:
DELETE FROM Employee WHERE ID = @value
And that can be shortened to:
DELETE Employee WHERE ID = @value
And SQL Server allows you to delete based on another table with a JOIN:
DELETE Employee
FROM Employee
JOIN Site
ON Employee.SiteID = Site.ID
WHERE Site.Status = 'Closed'
If the first FROM keyword were not optional, the query above would need to look like this:
DELETE FROM Employee
FROM Employee
JOIN Site
ON Employee.SiteID = Site.ID
WHERE Site.Status = 'Closed'
The above query is perfectly valid and does execute, but it's a very awkward query to read. It's hard to tell that it's a single query; it looks like two got mashed together because of the "duplicate" FROM clauses.
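If you would rather avoid the T-SQL-specific JOIN form altogether, the same delete can be written in standard SQL with a correlated EXISTS. A sketch with hypothetical minimal Employee and Site tables and sample rows:

```sql
-- Hypothetical minimal tables and sample rows.
CREATE TABLE Site (ID INTEGER PRIMARY KEY, Status VARCHAR(20) NOT NULL);
CREATE TABLE Employee (ID INTEGER PRIMARY KEY, SiteID INTEGER NOT NULL);
INSERT INTO Site VALUES (1, 'Open');
INSERT INTO Site VALUES (2, 'Closed');
INSERT INTO Employee VALUES (10, 1);
INSERT INTO Employee VALUES (11, 2);
INSERT INTO Employee VALUES (12, 2);

-- Standard form: a single FROM, and the join condition moves into the
-- correlated subquery. Employees 11 and 12 (closed site) are removed.
DELETE FROM Employee
WHERE EXISTS (
    SELECT 1
    FROM Site
    WHERE Site.ID = Employee.SiteID
      AND Site.Status = 'Closed'
);
```

This form also runs on databases that do not accept a JOIN in DELETE, such as SQLite.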
Side note: Your example subqueries are potentially non-deterministic since there is no ORDER BY clause.
There is no difference between DELETE and DELETE FROM in Oracle Database either; the FROM keyword is optional there too. The standard way to write the statement is:
DELETE FROM table [ WHERE condition ]
This is the SQL-92 standard form, so always develop your code in the standard way.

How do I exclude or negate two queries?

I am new to SQL, so this is probably very simple; however, I wasn't able to find the solution.
Basically my query is as follows:
SELECT UserID
FROM Users
NOT UNION
SELECT UserID
FROM User_Groups
WHERE GroupID = '$_[0]'
However, I am not sure what the syntax is to exclude one query from another.
What I am trying to say is: give me all the user IDs except for those that are in group X.
SELECT UserID FROM Users
WHERE UserID NOT IN (SELECT UserID FROM User_Groups WHERE GroupID = ?)
P.S. Don't interpolate variables into your queries as this can lead to SQL injection vulnerabilities in your code. Use placeholders instead.
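One caveat with NOT IN: if the subquery can return a NULL UserID, the predicate is never TRUE and the query returns nothing. NOT EXISTS does not have that problem. A sketch with hypothetical sample rows, using a literal group ID in place of the placeholder:

```sql
-- Hypothetical sample rows; note the NULL UserID in User_Groups.
CREATE TABLE Users (UserID INTEGER PRIMARY KEY);
CREATE TABLE User_Groups (UserID INTEGER, GroupID INTEGER NOT NULL);
INSERT INTO Users VALUES (1);
INSERT INTO Users VALUES (2);
INSERT INTO Users VALUES (3);
INSERT INTO User_Groups VALUES (1, 7);
INSERT INTO User_Groups VALUES (NULL, 7);

-- Returns no rows: "2 NOT IN (1, NULL)" evaluates to UNKNOWN, not TRUE.
SELECT UserID FROM Users
WHERE UserID NOT IN (SELECT UserID FROM User_Groups WHERE GroupID = 7);

-- Returns 2 and 3: the NULL row simply never matches the correlation.
SELECT u.UserID FROM Users u
WHERE NOT EXISTS (SELECT 1 FROM User_Groups g
                  WHERE g.UserID = u.UserID AND g.GroupID = 7);
```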
SELECT Users.UserID
FROM Users
LEFT JOIN User_Groups ON Users.UserID = User_Groups.UserID
AND User_Groups.GroupID = ?
WHERE User_Groups.UserID IS NULL
You can left join to the other table, keeping the group filter in the ON clause, and then put an IS NULL check on the other table in your WHERE clause as I've shown.
You could use EXCEPT as well:
SELECT UserID
FROM Users
EXCEPT
SELECT UserID
FROM User_Groups
WHERE GroupID = '$_[0]'
EXCEPT is SQL's version of set subtraction. Which of the various approaches (EXCEPT, NOT IN, ...) you should use depends, as usual, on your specific circumstances, what your database supports, and which one works best for you.
And eugene y has already mentioned the SQL injection issue with your code so I'll just consider that covered.
I linked to the PostgreSQL documentation even though this isn't a PostgreSQL question because the PostgreSQL documentation is quite good. SQLite does support EXCEPT:
The EXCEPT operator returns the subset of rows returned by the left SELECT that are not also returned by the right-hand SELECT. Duplicate rows are removed from the results of INTERSECT and EXCEPT operators before the result set is returned.
NOT IN() - Negating IN()
SELECT UserID FROM User_Groups WHERE GroupID NOT IN('1','2')
The IN() parameter can also be a sub-query.
Are you looking for a solution to be used with a postgres or a mySQL database?
Or are you looking for a plain SQL solution?
With PostgreSQL, a correlated subquery with WHERE NOT EXISTS might work like this:
SELECT u.UserID
FROM Users AS u
WHERE NOT EXISTS (
SELECT 1
FROM User_Groups AS g
WHERE g.UserID = u.UserID
AND g.GroupID = ?)

Why is a UDF so much slower than a subquery?

I have a case where I need to translate (look up) several values from the same table. The first way I wrote it was using subqueries:
SELECT
(SELECT id FROM [user] WHERE user_pk = created_by) AS creator,
(SELECT id FROM [user] WHERE user_pk = updated_by) AS updater,
(SELECT id FROM [user] WHERE user_pk = owned_by) AS owner,
[name]
FROM asset
As I'm using this subquery a lot (that is, I have about 50 tables with these fields), and I might need to add some more code to the subquery (for example, "AND active = 1"), I thought I'd put these into a user-defined function (UDF) and use that. But the performance using that UDF was abysmal.
CREATE FUNCTION dbo.get_user ( @user_pk INT )
RETURNS INT
AS BEGIN
RETURN ( SELECT id
FROM ice.dbo.[user]
WHERE user_pk = @user_pk )
END
SELECT dbo.get_user(created_by) as creator, [name]
FROM asset
The performance of #1 is less than 1 second. Performance of #2 is about 30 seconds...
Why? Or, more importantly, is there any way I can code this in SQL Server 2008 so that I don't have to use so many subqueries?
Edit:
Just a little more explanation of when this is useful. This simple query (that is, get the user id) gets a lot more complex when I want to have a text for a user, since I have to join with profile to get the language, with a company to see if the language should be fetched from there instead, and with the translation table to get the translated text. And for most of these queries, performance is a secondary issue to readability and maintainability.
The UDF is a black box to the query optimiser so it's executed for every row.
You are doing a row-by-row cursor: for each row in asset, look up an id three times in another table. This happens when you use scalar or multi-statement UDFs (inline UDFs are simply macros that expand into the outer query).
One of many articles on the problem is "Scalar functions, inlining, and performance: An entertaining title for a boring post".
The sub-queries can be optimised to correlate and avoid the row-by-row operations.
What you really want is this:
SELECT
uc.id AS creator,
uu.id AS updater,
uo.id AS owner,
a.[name]
FROM
asset a
JOIN
[user] uc ON uc.user_pk = a.created_by
JOIN
[user] uu ON uu.user_pk = a.updated_by
JOIN
[user] uo ON uo.user_pk = a.owned_by
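One behavioural difference is worth checking before swapping the subqueries for inner JOINs: a scalar subquery yields NULL when no user matches, while an inner JOIN drops the whole asset row. A sketch with hypothetical minimal tables and sample rows (the bracketed [user] is needed in T-SQL, and SQLite accepts the same brackets):

```sql
-- Hypothetical minimal tables and sample rows.
CREATE TABLE [user] (user_pk INTEGER PRIMARY KEY, id VARCHAR(20) NOT NULL);
CREATE TABLE asset (asset_pk INTEGER PRIMARY KEY, [name] VARCHAR(50),
                    created_by INTEGER, updated_by INTEGER, owned_by INTEGER);
INSERT INTO [user] VALUES (1, 'alice');
INSERT INTO [user] VALUES (2, 'bob');
-- owned_by = 99 has no matching user row.
INSERT INTO asset VALUES (100, 'laptop', 1, 2, 99);

-- Subquery form: one row, with owner = NULL.
SELECT
(SELECT u.id FROM [user] u WHERE u.user_pk = a.created_by) AS creator,
(SELECT u.id FROM [user] u WHERE u.user_pk = a.updated_by) AS updater,
(SELECT u.id FROM [user] u WHERE u.user_pk = a.owned_by) AS owner,
a.[name]
FROM asset a;

-- Inner JOIN form: no rows, because the owner join finds no match.
SELECT uc.id AS creator, uu.id AS updater, uo.id AS owner, a.[name]
FROM asset a
JOIN [user] uc ON uc.user_pk = a.created_by
JOIN [user] uu ON uu.user_pk = a.updated_by
JOIN [user] uo ON uo.user_pk = a.owned_by;
```

Using LEFT JOIN for each lookup keeps the asset row and returns NULL for the missing user, matching the subquery behaviour.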
Update Feb 2019
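SQL Server 2019 starts to fix this problem with scalar UDF inlining, which lets the optimizer expand eligible scalar UDFs into the calling query.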
As other posters have suggested, using joins will definitely give you the best overall performance.
However, since you've stated that you don't want the headache of maintaining 50-ish similar joins or subqueries, try using an inline table-valued function as follows:
CREATE FUNCTION dbo.get_user_inline (@user_pk INT)
RETURNS TABLE AS
RETURN
(
SELECT TOP 1 id
FROM ice.dbo.[user]
WHERE user_pk = @user_pk
-- AND active = 1
)
Your original query would then become something like:
SELECT
(SELECT TOP 1 id FROM dbo.get_user_inline(created_by)) AS creator,
(SELECT TOP 1 id FROM dbo.get_user_inline(updated_by)) AS updater,
(SELECT TOP 1 id FROM dbo.get_user_inline(owned_by)) AS owner,
[name]
FROM asset
An inline table-valued function should have better performance than either a scalar function or a multistatement table-valued function.
The performance should be roughly equivalent to your original query, but any future changes can be made in the UDF, making it much more maintainable.
To get the same result (NULL if the user is deleted or not active), use LEFT JOINs:
select
u1.id as creator,
u2.id as updater,
u3.id as owner,
a.[name]
FROM asset a
LEFT JOIN [user] u1 ON (u1.user_pk = a.created_by AND u1.active=1)
LEFT JOIN [user] u2 ON (u2.user_pk = a.updated_by AND u2.active=1)
LEFT JOIN [user] u3 ON (u3.user_pk = a.owned_by AND u3.active=1)
Am I missing something? Why can't this work? You are only selecting the id which you already have in the table:
select created_by as creator, updated_by as updater,
owned_by as owner, [name]
from asset
By the way, in designing you really should avoid keywords, like name, as field names.