Can someone please explain to me how EXISTS works in SQL?
I am not entirely sure my result is working the way I need it to.
I just need an example using 3 different queries.
Please be very specific and explain it to me like you are talking to a 5 year old :)
Edit: Here is what I do not understand and need someone to make clear:
The subquery will generally only be executed long enough to determine
whether at least one row is returned, not all the way to completion.
It is unwise to write a subquery that has side effects (such as
calling sequence functions); whether the side effects occur might be
unpredictable.
Since the result depends only on whether any rows are returned, and
not on the contents of those rows, the output list of the subquery is
normally unimportant. A common coding convention is to write all
EXISTS tests in the form EXISTS(SELECT 1 WHERE ...). There are
exceptions to this rule however, such as subqueries that use
INTERSECT.
I am trying to create a query that selects values that do not exist in the results of another query. Based on what I just quoted above, is it a bad idea to use EXISTS for this?
Edit Part 2:
If I am trying to make sure a value doesn't exist in column 1 OR column 2 OR column 3 OR column 4, should I use ALL or EXISTS?
My result looks weird with EXISTS, so I just wanted to make sure I understood correctly how it works.
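For concreteness, here is a minimal sketch of the three flavours of EXISTS being asked about, using SQLite from Python. The tables and names (`customers`, `orders`) are invented for illustration; they are not from the question.

```python
import sqlite3

# Two hypothetical tables to demonstrate EXISTS / NOT EXISTS.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (name TEXT);
    CREATE TABLE orders (id INTEGER, customer TEXT);
    INSERT INTO customers VALUES ('alice'), ('bob'), ('carol');
    INSERT INTO orders VALUES (1, 'alice'), (2, 'alice'), (3, 'bob');
""")

# 1) EXISTS: customers that have at least one order.
with_orders = conn.execute("""
    SELECT name FROM customers c
    WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer = c.name)
    ORDER BY name
""").fetchall()

# 2) NOT EXISTS: customers with no orders at all.
no_orders = conn.execute("""
    SELECT name FROM customers c
    WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer = c.name)
""").fetchall()

# 3) The select list inside EXISTS is irrelevant: SELECT 1, SELECT *,
#    even SELECT NULL all behave the same, because only the question
#    "was at least one row found?" matters.
same_result = conn.execute("""
    SELECT name FROM customers c
    WHERE EXISTS (SELECT NULL FROM orders o WHERE o.customer = c.name)
    ORDER BY name
""").fetchall()

print(with_orders)                 # [('alice',), ('bob',)]
print(no_orders)                   # [('carol',)]
print(same_result == with_orders)  # True
```

The third query is exactly the point of the quoted documentation: the subquery stops as soon as one row is found, and its output columns are never looked at.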
Related
I am going to be making a complex (for me) SQL query that involves finding the totals of both invoiced value and goods received value both linked to a purchase order line (so that's 3 joined tables, and possibly more) along with various date filters and so on.
In many cases, PO lines will reach a state where I know I won't ever have to worry about them again. I could therefore add a logic field to my PO line table to show this, tick the relevant lines as I go along, and add a where condition in the SQL to make it ignore them.
What I want to know is, will that Where condition be executed before or after the Select? Because if it's doing all the calcs and then just filtering the output, I don't want it as it's just more time/processing. If however the Where clause filters the input (ie before it does any calcs) then it could be a very significant time-saver.
I know you could just say 'try it and see' but whether or not I add that logic field has implications for other processes and reports, so I want to get it planned without spending too much time building a test.
So, the TL;DR: Is Where executed before Select in SQL? Or to put it another way, does Where filter the input or the output of a query?
Hope that makes sense, I'm still a bit of a beginner here!
Check out Order of execution which shows that the WHERE clause is evaluated in step 2, while SELECT is step 5.
So the WHERE filters (and thus reduces) the number of rows that will be processed by the SELECT.
I'm not entirely sure this applies identically to every SQL-based database engine - the site doesn't limit itself to a single RDBMS - but this is the logical evaluation order defined by the SQL standard, so any relational database must behave *as if* the WHERE ran before the SELECT, even when its optimizer physically reorders the work.
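One way to see this directly is to register a Python function as a SQL user-defined function and count how often the SELECT list actually calls it. This is a sketch using SQLite; the table and function names are made up.

```python
import sqlite3

# Count how many times the SELECT expression is actually evaluated.
calls = {"n": 0}

def expensive(x):
    calls["n"] += 1
    return x * 2

conn = sqlite3.connect(":memory:")
conn.create_function("expensive", 1, expensive)
conn.execute("CREATE TABLE t (a INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(100)])

# 100 rows in the table, but WHERE lets only 10 through.
rows = conn.execute("SELECT expensive(a) FROM t WHERE a < 10").fetchall()

print(len(rows))   # 10 rows survive the WHERE
print(calls["n"])  # the SELECT expression ran only 10 times, not 100
```

The "calculation" in the select list runs once per *surviving* row, which is exactly the behaviour the questioner is hoping for.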
Thought this would be a good place to ask for some "brainstorming." Apologies if it's a little broad/off subject.
I was wondering if anyone here had any ideas on how to approach the following problem:
First assume that I have a select statement stored somewhere as an object (this can be the tree form of the query). For example (for simplicity):
SELECT A, B FROM table_A WHERE A > 10;
It's easy to determine the below would change the result of the above query:
INSERT INTO table_A (A,B) VALUES (12,15);
But, given any possible Insert/Update/Whatever statement, as well as any possible starting Select (but we know the Selects and can analyze them all day) I'd like to determine if it would affect the result of the Select Statement.
It's fine to assume that there won't be any "outside" queries, and that we know about all the queries being sent to the DB. It is also assumed we know the DB schema.
No, this isn't for homework. Just a brain teaser I've been thinking about and started to get stuck on (obviously, SQL can get very complicated.)
Based on the reply to the comment, I'd say that without additional criteria, this ranges between very hard and impossible.
Very hard (leastways, it would be for me) because you'd have to write something to parse and interpret your SQL statements into a workable frame of reference for your goals. Doable, but can it be worth the effort?
Impossible because some queries transcend phrases like "Byzantinely complex". (Think nested queries, correlated subqueries, views, common table expressions, triggers, outer joins, and who knows what all.) Without setting criteria such as "no subqueries, no views or triggers, no more than X joins" and so forth, the problem becomes open-ended enough to be effectively intractable.
My first thought would be to put a trigger on table_A, so that if any of the columns you're testing (column A in this case) changes such that a row starts to meet (or no longer meets) the condition (> 10 here), the trigger records that an "affecting" change has taken place.
E.g. have another little table to record a "last update timestamp", which the trigger could pop a getdate() into when it detects such a change.
Then, you could check that table to see if the timestamp has changed since the last time you ran the select query - if it has, then you know you need to re-run it, if it hasn't, then you know the results would be the same.
The table could hold many such timestamps (one per row, perhaps with the table/trigger name as a key value in another column) to service many such triggers.
Advantage? Being done in a trigger on the table means no risk of a change that could affect the select statement being missed.
Disadvantage? I guess depending on how your select statements come into existence, you might have an undesirable/unmanageable overhead in creating the trigger(s).
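A minimal sketch of the trigger idea, in SQLite via Python. The names (`table_A`, `change_log`, `table_A_watch`) follow the example in the question or are invented; a real version would also need UPDATE and DELETE triggers, since an update can move a row out of the condition just as easily as into it.

```python
import sqlite3

# A trigger on table_A records a "last affecting change" marker whenever
# an inserted row meets the SELECT's condition (A > 10).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_A (A INTEGER, B INTEGER);
    CREATE TABLE change_log (source TEXT PRIMARY KEY, changed_at TEXT);

    CREATE TRIGGER table_A_watch AFTER INSERT ON table_A
    WHEN NEW.A > 10
    BEGIN
        INSERT OR REPLACE INTO change_log
        VALUES ('table_A', datetime('now'));
    END;
""")

conn.execute("INSERT INTO table_A VALUES (5, 7)")    # does not match A > 10
unaffected = conn.execute("SELECT COUNT(*) FROM change_log").fetchone()[0]

conn.execute("INSERT INTO table_A VALUES (12, 15)")  # matches, trigger fires
affected = conn.execute("SELECT COUNT(*) FROM change_log").fetchone()[0]

print(unaffected, affected)  # 0 1
```

Checking whether the cached SELECT result is stale is then a single lookup against `change_log`, as described above.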
Given the below table:
|idAsPrimaryKey|Id|columnA|
(the second column has a different name in the real table, but "Id" keeps the example simple)
How can I select, in a single SQL query that is not database-server specific, something similar to:
ListOfResults = empty
for each distinct id:
    if there is a row for this id with the value V1 in column A:
        add that row to ListOfResults
    else if there is a row for this id with the value V2 in column A:
        add that row to ListOfResults
    else:
        add the first row found for this id to ListOfResults
Quite easy. Since you don't seem to know much about SQL yet, here's a "teach a man how to fish..." answer.
You have an amount of data and "only" a language for describing how to get at it - there is nothing to really "program". (Of course there are functions and procedures and so on, but those are used in other circumstances, or the programmer is making things more complicated than necessary.)
Because of this, you have to find a way to combine the data, sometimes even with itself, to get what you want. This blog post explains the basics of joins (that's how you combine tables, or data from subqueries): A Visual Explanation of SQL Joins (for criticism of this post, please read on...)
With this basic knowledge you should now try to create a query, where you join your table to itself two times. To choose the right value for your ListOfResults you then have to use the COALESCE() function. It returns the first of its parameters which isn't NULL.
Here comes the criticism of the link I posted above: the Venn diagrams used there don't represent how much data you get back from a join. To learn about that, read this answer here on SO: sql joins as venn diagram
Okay, now you have learned that you might get more data back than you expect. And here comes another problem with the wording of your question: there is no "first" row in a relational database. You have to describe exactly which row you want, otherwise the data you get back is effectively random and worth nothing. A solution to both problems is to use GROUP BY and (important!) an appropriate aggregate function.
This should be enough info for you to solve the problem. Feel free to ask more questions if anything is unclear.
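To make the "fish" concrete, here is one portable sketch of the "prefer V1, else V2, else any row" selection, using a priority expression plus GROUP BY (SQLite via Python; column names are simplified). Since "first row" is not well-defined, the lowest primary key is used as a deterministic tie-breaker - that is an assumption, not something the question specified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (pk INTEGER PRIMARY KEY, id INTEGER, A TEXT);
    INSERT INTO t VALUES
        (1, 100, 'V2'), (2, 100, 'V1'),   -- id 100: has a V1 row
        (3, 200, 'xx'), (4, 200, 'V2'),   -- id 200: best available is V2
        (5, 300, 'aa'), (6, 300, 'bb');   -- id 300: neither, take lowest pk
""")

rows = conn.execute("""
    SELECT t.pk, t.id, t.A
    FROM t
    JOIN (
        -- for each id: lowest pk among the rows with the best priority
        SELECT m.id, MIN(m.pk) AS pk
        FROM t AS m
        JOIN (SELECT id,
                     MIN(CASE A WHEN 'V1' THEN 1
                                WHEN 'V2' THEN 2
                                ELSE 3 END) AS prio
              FROM t GROUP BY id) AS best
          ON best.id = m.id
         AND CASE m.A WHEN 'V1' THEN 1
                      WHEN 'V2' THEN 2
                      ELSE 3 END = best.prio
        GROUP BY m.id
    ) AS pick ON pick.pk = t.pk
    ORDER BY t.id
""").fetchall()

print(rows)  # [(2, 100, 'V1'), (4, 200, 'V2'), (5, 300, 'aa')]
```

The CASE expression turns the if/else-if/else from the pseudocode into a sortable priority, and the two GROUP BY steps pick one row per id, exactly as the answer above describes.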
I have the SQL query below in a procedure - can it be optimized for better performance?
SELECT DISTINCT
[PUBLICATION_ID] as n
,[URL] as u
FROM [LINK_INFO]
WHERE Component_Template_Priority > 0
AND PUBLICATION_ID NOT IN (232,481)
ORDER BY URL
Please suggest whether using NOT EXISTS would be a better approach here.
Thanks
It is possible to use NOT EXISTS. Just going from the code above you probably shouldn't, but it's technically possible. As a general rule, a very small, quickly resolved set (two literals certainly qualifies) will perform better as a NOT IN than as a NOT EXISTS. NOT EXISTS wins when NOT IN has to do enough comparisons against each row that the correlated subquery for NOT EXISTS (which stops at the first match) resolves more quickly.
This assumes that the comparison set cannot include NULL; otherwise NOT IN and NOT EXISTS do not return the same results. NOT IN (NULL, ...) always evaluates to NULL and therefore returns no rows, whereas NOT EXISTS simply excludes rows for which it finds a match - and since NULL never generates a match, it never excludes a row.
A third way to compare two sets for mismatches is with an OUTER JOIN. I don't see a reason to go into that from what we've got so far, so I'll let that one go for now.
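The NULL trap described above is easy to demonstrate. A minimal sketch with SQLite from Python (tables `a` and `b` are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (1), (NULL);
""")

# NOT IN against a set containing NULL: never TRUE, so no rows come back.
not_in = conn.execute(
    "SELECT x FROM a WHERE x NOT IN (SELECT x FROM b)").fetchall()

# NOT EXISTS only excludes rows with an actual match (x = 1 here).
not_exists = conn.execute(
    "SELECT x FROM a WHERE NOT EXISTS "
    "(SELECT 1 FROM b WHERE b.x = a.x)").fetchall()

print(not_in)      # [] -- e.g. 2 NOT IN (1, NULL) evaluates to NULL
print(not_exists)  # [(2,), (3,)]
```

This is why NOT EXISTS is usually the safer default when the subquery's column is nullable.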
A definitive answer would depend on a lot of variables (hence the comments on your question)...
What is the cardinality (number of different values) of the publication_id column?
Is there an index on the column?
How many rows are in the table?
Where did you get the values in your NOT IN clause?
Will they always be literals or are they going to come from parameters or a subquery?
... just to name a few. Of course, the best way to find out is by writing the query different ways and looking at execution times and query plans.
EDIT Another is with set operators like EXCEPT. Again, probably overkill to go into that.
This has always bothered me - why does the GROUP BY clause in a SQL statement require that I include all non-aggregate columns? These columns should be included by default - a kind of "GROUP BY *" - since I can't even run the query unless they're all included. Every column has to either be an aggregate or be specified in the "GROUP BY", but it seems like anything not aggregated should be automatically grouped.
Maybe it's part of the ANSI-SQL standard, but even so, I don't understand why. Can somebody help me understand the need for this convention?
It's hard to know exactly what the designers of the SQL language were thinking when they wrote the standard, but here's my opinion.
SQL, as a general rule, requires you to explicitly state your expectations and your intent. The language does not try to "guess what you meant", and automatically fill in the blanks. This is a good thing.
When you write a query the most important consideration is that it yields correct results. If you made a mistake, it's probably better that the SQL parser informs you, rather than making a guess about your intent and returning results that may not be correct. The declarative nature of SQL (where you state what you want to retrieve rather than the steps how to retrieve it) already makes it easy to inadvertently make mistakes. Introducing fuzziness into the language syntax would not make this better.
In fact, every case I can think of where the language allows for shortcuts has caused problems. Take, for instance, natural joins - where you can omit the names of the columns you want to join on and allow the database to infer them based on column names. Once the column names change (as they naturally do over time) - the semantics of existing queries changes with them. This is bad ... very bad - you really don't want this kind of magic happening behind the scenes in your database code.
One consequence of this design choice, however, is that SQL is a verbose language in which you must explicitly express your intent. This can result in having to write more code than you would like, and in griping about why certain constructs are so verbose ... but at the end of the day, it is what it is.
The only logical reason I can think of to keep the GROUP BY clause as it is, is that you can include fields in your grouping that are NOT included in your selection columns.
For example.
Select column1, SUM(column2) AS sum
FROM table1
GROUP BY column1, column3
Even though column3 is not represented elsewhere in the query, you can still group the results by its value. (Of course, once you have done that, you cannot tell from the result why the records were grouped as they were.)
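A quick sketch of exactly that situation (SQLite via Python; the data is invented). Grouping by a column that is absent from the SELECT list is legal, but the result no longer shows why rows were split apart:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (column1 TEXT, column2 INTEGER, column3 TEXT);
    INSERT INTO table1 VALUES
        ('a', 1, 'x'), ('a', 2, 'x'), ('a', 4, 'y');
""")

rows = conn.execute("""
    SELECT column1, SUM(column2) AS total
    FROM table1
    GROUP BY column1, column3
    ORDER BY total
""").fetchall()

# 'a' appears twice because column3 split the group,
# but nothing in the output says so.
print(rows)  # [('a', 3), ('a', 4)]
```

This is the case a hypothetical `GROUP BY *` (grouping by all non-aggregate columns in the SELECT list) could not express.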
It does seem like a simple shortcut for the overwhelmingly most common scenario (grouping by each of the non-aggregate columns) would be a simple yet effective tool for speeding up coding.
Perhaps "GROUP BY *"
Since it is already pretty common for SQL tools to allow references to columns by result column number (i.e. GROUP BY 1, 2, 3, etc.), it would seem simpler still to let the user include all the non-aggregate fields in one keystroke.
It's simple: GROUP BY defines the grouping key, and the engine internally groups the result set by every column in that key before presenting it to you. Every non-aggregated column in the SELECT list has to be part of the key, because otherwise the engine would not know which of a group's many values to return for that column - you cannot group "partially". It's a mathematical restriction, not an arbitrary convention.