What does the output_expression for "DELETE FROM table" do?

I recently ran across an oddity. The following is valid SQL:
DELETE FROM customer *;
The documentation for PostgreSQL DELETE says the star is a possible value for the output_expression:
An expression to be computed and returned by the DELETE command after
each row is deleted. The expression can use any column names of the
table or table(s) listed in USING. Write * to return all columns.
I tried it with and without the star and can't see a difference. In fact, I can put just about any single word after the table name and it is accepted. It doesn't even have to be an actual column name. Nothing extra is returned.
db=> DELETE FROM customer wheeeeeee;
DELETE 19
So what does it do and what could I use it for?
Question also posted on the PostgreSQL mailing list.

The asterisk is not the output_expression; for that you would have to use the RETURNING keyword. It is instead an old, obsolete syntax for including child tables in queries. (The last version for which it is documented seems to be PostgreSQL 8.1. Since the syntax is still valid, this is a documentation bug, as Tom Lane points out in the post linked below.)
Since PostgreSQL 7.1, including child tables has been the default (unless sql_inheritance is set to off), and the ONLY keyword is used for the opposite, so the * is not very useful. Any other single word after the table name, such as wheeeeeee, is parsed as a table alias, which is why it is accepted and nothing extra is returned.
See this explanatory post from Tom Lane on the PostgreSQL mailing list.

Related

How to update a table in PostgreSQL recursively based on condition in column

I have a PostgreSQL database with many different tables and relations. There is one table named User. In our project, when people join the product, they can optionally provide FirstName and LastName. In the User table, both are kept in a single column, "name", so a value looks like this: "Alex Smith" (exactly one space between them). There was a small bug: if a user did not provide a FirstName or LastName, JavaScript would insert undefined instead.
So if there is no FirstName, the value is "undefined LastName", or the other way round, or "undefined undefined". The bug is now fixed, but the database needs to be cleaned up a little. There are thousands of users with an undefined FirstName, LastName, or both. What I was trying to do is write a PostgreSQL query that goes over the User table, checks whether the column contains "undefined", and replaces it with an empty string "".
My problem is that I need to replace only the undefined part: for "undefined Smith", only undefined should be replaced with an empty string. I've checked the official documentation and Stack Overflow and couldn't find a similar case. If anybody has a clue, I would appreciate it a lot.
Thanks in advance.
Look up the Postgres string functions trim and replace.
Using them together is what you are looking for. Something like:
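-- user is a reserved word, so the table name must be double-quoted
-- (match the exact case used when the table was created)
update "User"
set name = trim(replace(name, 'undefined', ''));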
Warning: USER is a very bad table name; it is a reserved word in both Postgres and the SQL Standard. While you can get away with using it in other than its predefined meaning, Postgres developers would be well within their rights to make it an invalid name at any time. Never use reserved words or data types as DB object names.
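A minimal sketch of the trim/replace cleanup above, run against an in-memory SQLite database as a stand-in for PostgreSQL (both provide trim() and replace() with the same behaviour for this case). The sample rows and the added WHERE filter are assumptions for illustration, not part of the original answer. Note that any genuine name containing the substring "undefined" would be altered as well.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; the table layout and
# sample rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "User" (id INTEGER PRIMARY KEY, name TEXT)')
conn.executemany(
    'INSERT INTO "User" (name) VALUES (?)',
    [
        ("Alex Smith",),
        ("undefined Smith",),
        ("Alex undefined",),
        ("undefined undefined",),
    ],
)

# replace() strips the placeholder, trim() removes the leftover space.
# The WHERE clause (an addition) limits the update to affected rows.
conn.execute(
    """
    UPDATE "User"
    SET name = trim(replace(name, 'undefined', ''))
    WHERE name LIKE '%undefined%'
    """
)

names = [row[0] for row in conn.execute('SELECT name FROM "User" ORDER BY id')]
print(names)
```

Rows where both parts were missing end up as an empty string; whether those should become NULL instead is a separate design decision.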

Hive list of reserved words

Can anybody point me to a page listing all the reserved words in Hive?
There are many questions here about using reserved words as column or table names after such columns or tables were created. My question is about how to avoid creating such columns or tables in the first place.
If you google "some_DBMS reserved words", the first hit is the official page.
I.e., here it's for Oracle, here for Postgres, here for MySQL, etc. But not for Hive.
Here is the only page I was able to find, but it's inaccurate - it does not include the DIV keyword, which is empirically found to be reserved.
I was looking into the code and found it to be in:
/src/java/org/apache/hadoop/hive/ql/parse/HiveLexerParent.g
https://github.com/apache/hive/blob/rel/release-3.1.0/ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g
You can find Hive's reserved keywords here:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL

GetRowsWithConditions in the condition I can use AND but not OR

I'm trying to get some rows from a table using the GetRowsWithConditions method in App Inventor 2. I've used AND and it works correctly but when I use OR I get 400 Bad Request Invalid query: Parse error near 'OR'.
The condition is
WHERE ROWID=1 OR ROWID=1001 OR ROWID=2001
As Taifun mentioned, "OR" is not supported in Fusion Tables, but an alternative suggested by Google is to use "IN".
Wikipedia Entry:
IN will find any values existing in a set of candidates.
SELECT ename WHERE ename IN ('value1', 'value2', ...)
All rows match the predicate if their value is one of the candidate set of values. This is the same behavior as
SELECT ename WHERE ename='value1' OR ename='value2'
except that the latter could allow comparison of several columns, which each IN clause does not. For a larger number of candidates, IN is less verbose.
So in theory*, your query would be reformatted to:
... WHERE ROWID IN ('1','1001','2001')
Hope that helps!
*I say in theory, because I've never used ROWID as the filter as I've always created a custom ID column.
OR does not exist in the Fusion Tables SQL language; see also the SQL Reference Documentation of the Fusion Tables API.
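As a rough check of the OR/IN equivalence described above, the sketch below runs both forms against an in-memory SQLite database. SQLite is only a stand-in (Fusion Tables itself rejects OR), and the items table with its id column is a hypothetical substitute for ROWID.

```python
import sqlite3

# SQLite stand-in; the table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, label TEXT)")
conn.executemany(
    "INSERT INTO items (id, label) VALUES (?, ?)",
    [(1, "a"), (2, "b"), (1001, "c"), (2001, "d")],
)

# The chained-OR form from the question.
with_or = conn.execute(
    "SELECT id FROM items WHERE id = 1 OR id = 1001 OR id = 2001 ORDER BY id"
).fetchall()

# The IN form suggested as the replacement.
with_in = conn.execute(
    "SELECT id FROM items WHERE id IN (1, 1001, 2001) ORDER BY id"
).fetchall()

print(with_or == with_in)
```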

Why are dot-separated prefixes ignored in the column list for INSERT statements?

I've just come across some SQL syntax that I thought was invalid, but actually works fine (in SQL Server at least).
Given this table:
create table SomeTable (FirstColumn int, SecondColumn int)
The following insert statement executes with no error:
insert SomeTable(Any.Text.Here.FirstColumn, It.Does.Not.Matter.What.SecondColumn)
values (1,2);
The insert statement completes without error, and checking select * from SomeTable shows that it did indeed execute properly. See fiddle: http://sqlfiddle.com/#!6/18de0/2
SQL Server seems to just ignore anything except the last part of the column name given in the insert list.
Actual question:
Can this be relied upon as documented behaviour?
Any explanation about why this is so would also be appreciated.
It's unlikely to be part of the SQL standard, given its dubious utility (though I haven't checked specifically (a)).
What's most likely happening is that it's throwing away the non-final part of the column specification because it's superfluous. You have explicitly stated what table you're inserting into, with the insert into SomeTable part of the command, and that's the table that will be used.
What you appear to have done here is to find a way to execute SQL commands that are less readable but have no real advantage. In that vein, it appears similar to the C code:
int nine = 9;
int eight = 8;
xyzzy = xyzzy + nine - eight;
which could perhaps be better written as xyzzy++; :-)
I wouldn't rely on it at all, possibly because it's not standard but mostly because it makes maintenance harder rather than easier, and because I know DBAs all over the world would track me down and beat me to death with IBM DB2 manuals, their choice of weapon due to the voluminous size and skull-crushing abilities :-)
(a) I have checked non-specifically, at least for ISO 9075-2:2003 which dictates the SQL03 language.
Section 14.8 of that standard covers the insert statement and it appears that the following clause may be relevant:
Each column-name in the insert-column-list shall identify an updatable column of T.
Without spending a huge amount of time (that document is 1,332 pages long and would take several days to digest properly), I suspect you could argue that the column could be identified just using the final part of the column name (by removing all the owner/user/schema specifications from it).
Especially since it appears only one target table is possible (updatable views crossing table boundaries notwithstanding):
<insertion target> ::= <table name>
Fair warning: I haven't checked later iterations of the standard so things may have changed. But I'd consider that unlikely since there seems to be no real use case for having this feature.
This was reported as a bug on Connect and despite initially encouraging comments the current status of the item is closed as "won't fix".
The Order by clause used to behave in a similar fashion but that one was fixed in SQL Server 2005.
I get errors when I try to run the script on SQL Server 2012 as well as SQL Server 2014 and SQL Server 2008 R2. So you can certainly not rely on the behavior you see with sqlfiddle.
Even if this were to work, I would never rely on undocumented behavior in production code. Microsoft will give notice of breaking changes to documented features but not to undocumented ones. So if this were an actual T-SQL parsing bug that was later fixed, it would break the malformed code.

Can scalar functions be applied before filtering when executing a SQL Statement?

I suppose I have always naively assumed that scalar functions in the select part of a SQL query will only get applied to the rows that meet all the criteria of the where clause.
Today I was debugging some code from a vendor and had that assumption challenged. The only reason I can think of for this code failing is that the Substring() function is getting called on data that should have been filtered out by the WHERE clause. It appears that the substring call is being applied before the filtering happens, so the query fails.
Here is an example of what I mean. Let's say we have two tables, each with 2 columns, holding 2 rows and 1 row respectively. The first column in each is just an id. NAME is just a string, and NAME_LENGTH tells us how many characters are in the name with the same ID. Note that only names with more than one character have a corresponding row in the LONG_NAMES table.
NAMES: ID, NAME
1, "Peter"
2, "X"
LONG_NAMES: ID, NAME_LENGTH
1, 5
If I want a query to print each name with the last 3 letters cut off, I might first try something like this (assuming SQL Server syntax for now):
SELECT substring(NAME,1,len(NAME)-3)
FROM NAMES;
I would soon find out that this would give me an error, because when it reaches "X" it will try to use a negative number for the length in the substring call, and it will fail.
The way my vendor decided to solve this was by filtering out rows where the strings were too short for the len - 3 query to work. He did it by joining to another table:
SELECT substring(NAMES.NAME,1,len(NAMES.NAME)-3)
FROM NAMES
INNER JOIN LONG_NAMES
ON NAMES.ID = LONG_NAMES.ID;
At first glance, this query looks like it might work. The join condition will eliminate any rows that have NAME fields short enough for the substring call to fail.
However, from what I can observe, SQL Server will sometimes try to calculate the substring expression for everything in the table, and then apply the join to filter out rows. Is this supposed to happen this way? Is there a documented order of operations where I can find out when certain things will happen? Is it specific to a particular database engine or part of the SQL standard? If I decided to include some predicate on my NAMES table to filter out short names (like len(NAME) > 3), could SQL Server also choose to apply that after trying to apply the substring? If so, then it seems the only safe way to do a substring would be to wrap it in a "case when" construct in the select?
Martin gave this link that pretty much explains what is going on: the query optimizer has free rein to reorder things however it likes. I am including this as an answer so I can accept something. Martin, if you create an answer with your link in it, I will gladly accept that instead of this one.
I do want to leave my question here because I think it is a tricky one to search for, and my particular phrasing of the issue may be easier for someone else to find in the future.
TSQL divide by zero encountered despite no columns containing 0
EDIT: As more responses have come in, I am again confused. It does not seem clear yet when exactly the optimizer is allowed to evaluate things in the select clause. I guess I'll have to go find the SQL standard myself and see if I can make sense of it.
Joe Celko, who helped write early SQL standards, has posted something similar to this several times in various USENET newsgroups. (I'm skipping over the clauses that don't apply to your SELECT statement.) He usually said something like "This is how statements are supposed to act like they work". In other words, SQL implementations should behave exactly as if they did these steps, without actually being required to do each of these steps.
1. Build a working table from all of the table constructors in the FROM clause.
2. Remove from the working table those rows that do not satisfy the WHERE clause.
3. Construct the expressions in the SELECT clause against the working table.
So, following this, no SQL dbms should act like it evaluates functions in the SELECT clause before it acts like it applies the WHERE clause.
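The three logical steps can be mimicked in plain Python to make the "as if" model concrete. This is only an illustration of the logical order, not of how any engine actually executes a query; drop_last_three is a hypothetical helper that imitates a substring call failing on a negative length.

```python
# Sample data from the question.
names = [{"id": 1, "name": "Peter"}, {"id": 2, "name": "X"}]
long_names = [{"id": 1, "name_length": 5}]


def drop_last_three(name):
    """Hypothetical stand-in for substring(NAME, 1, len(NAME) - 3)."""
    length = len(name) - 3
    if length < 0:
        raise ValueError("negative length passed to substring")
    return name[:length]


# Step 1: build the working table from the FROM clause (the inner join).
working = [n for n in names for ln in long_names if n["id"] == ln["id"]]

# Step 2: apply the WHERE clause (this query has none, so nothing is removed).
working = [row for row in working if True]

# Step 3: evaluate the SELECT expressions against the working table.
result = [drop_last_three(row["name"]) for row in working]

# A physical plan that computed the expression before filtering would hit "X".
try:
    [drop_last_three(n["name"]) for n in names]
    reordered_ok = True
except ValueError:
    reordered_ok = False

print(result, reordered_ok)
```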
In a recent posting, Joe expands the steps to include CTEs.
CJ Date and Hugh Darwen say essentially the same thing in chapter 11 ("Table Expressions") of their book A Guide to the SQL Standard. They also note that this chapter corresponds to the "Query Specification" section (sections?) in the SQL standards.
You are thinking about what is called the query execution plan. It is based on query optimization rules, indexes, temporary buffers, and execution-time statistics. If you are using SQL Server Management Studio, the toolbar above the query editor lets you view the estimated execution plan, which shows how your query will be transformed to gain speed. So if you have just used your NAMES table and it is still in the buffer, the engine might first process that data and only then join it with the other table.
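The "case when" guard raised in the question could look roughly like the sketch below. It runs on an in-memory SQLite database, which is an assumption made so the example is self-contained: SQLite uses length() and substr() where T-SQL uses LEN() and SUBSTRING(), and SQLite's substr() does not raise an error on a negative length, so this only shows the shape of the guard rather than reproducing the SQL Server failure.

```python
import sqlite3

# SQLite stand-in for SQL Server; the data is the sample from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO names (id, name) VALUES (?, ?)",
    [(1, "Peter"), (2, "X")],
)

# The length check sits in the same expression as the substring call, so
# the result does not depend on when the engine filters rows.
rows = conn.execute(
    """
    SELECT id,
           CASE WHEN length(name) > 3
                THEN substr(name, 1, length(name) - 3)
                ELSE NULL
           END AS trimmed
    FROM names
    ORDER BY id
    """
).fetchall()

print(rows)
```

Names that are too short come back as NULL instead of being dropped; add an outer filter on the result if they should be excluded.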