Same query produces different results in SQL - sql-server-2005

In one machine, we have a set of data and say we have a column isValid which contains true or false and we also have another column which defines a group.For every group there can be only one value as true for Isvalid column and the rest as false.
Now, when we run our query , based on Group, the row which contains the Isvalid column as True comes as the first row in query result and the rest of the rows contains the Isvalid column which are false.
Here we dont use any 'order by' or 'group by', we just use 'inner join' and 'where' conditions.
The issue is on our development server and test server we get the expected query results, but when it goes to live server(For all three servers, i.e development, testing and live server the data is entirely different and all these servers run on same version SQL 2005), the results are interchanged(Row with isvalid column false comes as first row in query result) don't know why.Any suggestions please?
Please help,
Many thanks,
Byfour

The obvious answer is: different data on test, dev and live servers.
But even with the same data, without an ORDER BY clause, results are usually returned in clustered index order, BUT this is not guaranteed.
If you require results in a specific ordering, then you must use an ORDER BY clause.

Related

Oracle SQL Query - where clause starting and end in certain values

I just want to check whether my query is correct. I have written a simple query on one table and against a field in that table where we are looking for all rows where the value in that field is starting with a certain characters and ending in certain characters, but must not contain certain characters in that field's value. Thus -
WHERE field_1 LIKE 'ABC%' AND field_1 LIKE '%XWY' AND field_1 NOT LIKE '%GHI12XY%'
The result is no rows, which is what we wanted, but I am not 100% sure whether the query is spot on and would like other whether this is correct
It looks correct to me. A good idea in all database software development is to create and load one or more test tables with rows that can be used to verify your logic. Then run your query against the test table(s) and see if you get the results you expect. Ideally, the rows in your test table will cover all the possibilities, for example, where you're supposed to get output, and where you're not supposed to get any.

Invalid Position in SQL WHERE Clause

I have a query I am writing that examines an ID field and derives an ID number from that column based on several criteria. Now that I have its logic written, I want to run the query on each criteria to see if the logic is working. So, the last part of my query for doing so is as follows:
FROM TABLE1
WHERE SOURCE_SYSTEM_NM = 'XYZ' AND ((STRLEFT(SOURCE_ARRANGEMENT_ID,4)) NOT IN ('23CC','21CC'))
LIMIT 10000
Essentially what I am trying to do here is tell it to return to me only items with SOURCE_SYSTEM_NM equal to 'XYZ', while eliminating any with a SOURCE_ARRANGEMENT_ID not having the first 4 characters equal to '21CC' or '23CC'. I have a third criteria I want to filter on as well, which is that the first three characters must be '0CC'.
My problem when I run this is I get back an "Invalid Position" error. I removed one of the strings from the criteria, and it works. So, I decided to add the second in its own 'NOT IN...' clause with an AND between them, but that resulted in the same error.
If I had to guess, the NOT IN ('21CC','23CC') puts an AND between them and I think that must be the root of my issue. The criteria in my CASE statement derives the ID number with the following:
WHEN (M_CRF_CU_PRODUCT_ARRANGEMENT.SOURCE_SYSTEM_NM) IN ('XYZ') AND STRLEFT(SOURCE_ARRANGEMENT_ID, 4) IN ('23CC','21CC') THEN STRRIGHT(SOURCE_ARRANGEMENT_ID, LENGTH(SOURCE_ARRANGEMENT_ID)-4)
WHEN (M_CRF_CU_PRODUCT_ARRANGEMENT.SOURCE_SYSTEM_NM) IN ('XYZ') AND STRLEFT(SOURCE_ARRANGEMENT_ID, 3) IN ('0CC') THEN STRRIGHT(SOURCE_ARRANGEMENT_ID, LENGTH(SOURCE_ARRANGEMENT_ID)-3)
WHEN (M_CRF_CU_PRODUCT_ARRANGEMENT.SOURCE_SYSTEM_NM) IN ('XYZ') AND (STRLEFT(SOURCE_ARRANGEMENT_ID, 4) NOT IN ('23CC','21CC') OR STRLEFT(SOURCE_ARRANGEMENT_ID, 3) NOT IN ('0CC')) THEN (SOURCE_ARRANGEMENT_ID)
So with that, I am just trying to check each criteria to make sure the ID derived/created is correct. I need to filter down to get results for that last WHEN statement above, but I keep getting that "Invalid Position" in my WHERE statement at the end. I am using Aginity to run this query and it's running against an IBM Netezza database. Thanks in advance!
I figured out what the issue was on this - when performing
STRRIGHT(SOURCE_ARRANGEMENT_ID, LENGTH(SOURCE_ARRANGEMENT_ID)-4)
There are some of those Arrangement IDs that do not have 4 characters, thus I was getting an "Invalid Position". I fixed this by updating this query to use substring() instead:
SUBSTRING(SOURCE_ARRANGEMENT_ID,5,LENGTH(SOURCE_ARRANGEMENT_ID))
This fixed my issue. Just wanted to post an answer in case others have this issue. It s not Netezza specific, this will react this way with any SQL variant.

Efficiency check in sql query

I'm struggle between 2 ways of execute my mission.
I have set of conditions, and when they are all true - I need to set "true" in an "x" column attribute of that record.
What is more efficient & recommended, and why?
set this column into "false" for all records in the table, and then
run another query to set "true" under some conditions.
set "true" if all conditions are "true", then run another query to set "false" on all records where one or more of the conditions fails.
I cannot assume that there is some default value of the "x" column need to be changed, because the query should run also one in a while, when its needed, after initial values were inserted to that column, and some conditions may be changed from the last time I ran this query.
Perhaps there is also another idea, more efficient that the 2 above?
Also I'd like to understand how to calculate efficiently of a query, something similar to the way of efficient calculation in programming.
Probably the most efficient way is to use a case Statement, since you have to evaluate the condition only once per row and also have to modify every row only once, e.g.
UPDATE tablename SET fieldname = (CASE WHEN conditions THEN 'true' ELSE 'false' END)
Case Statements are at least available in Oracle and MS-SQL. Dont know about other DB vendors.
if all conditions are similar in datastructure, I don't think it really matters.
If the conditions have different data query structures (for example, "my key isn't a foreign key in tableB" and "State = 'California'"), setting to true and then checking them one by one (as a sequence of SQL statements) is better, cause you can check the simple ones first and not bother about the complex ones for records that got their flag removed already.

SQL Injection clarification

There is a query like:
select * from tablename where username='value1' and password='value2';
If I set to the fields the following:
username ='admin' and password ='admin';
Then I sign in into the website as administrator.
Now, If I wanted to SQL inject my query, I would enter to the username field the value 'or 1=1, after which the query would be executed like:
select * from tablename where username ='' or 1=1
Assuming everything after this the query is executed successfully.
My question is based on above example, what user we will be logged in as?
As:
1. Admin
2. Or first row in table?
3. Or any other user and how?
That is just a SQL query, it does not login or do any other application functionality. What is done with the data retrieved is entirely dependent on the specific application.
The code may happily consume the first row in the resulting recordset and assume that is the user to be logged in. It may also throw an exception, e.g., if the query is being done with LINQ and .SingleOrDefault() is used. Without seeing the application code, there is no way to know.
All rows in the tablename table will be returned to whatever is running this query. The order in which these rows are returned isn't well defined (and tables don't have an order, so your guess of "first row" is wrong for a number of reasons)
We'd then need to see the consuming code to know what happens - it might take the first row it's given, it might take the last row it's given (different runs of this query could have different first and last rows, because as I said, the order isn't well defined). It may attempt to (in some manner) merge all of the resulting rows together.
You shouldn't try to reason about what happens when your code is subject to SQL injection, you should just apply adequate defenses (e.g. parameterized queries) so you don't have to think about it again.
For example, lets say, for the sake of argument, that this query always returned the rows in some particular order (so long as the moon is full), such that the lowest UserID (or similar) is the first row, and that the consuming code uses the first returned row and ignores other rows. So you decide to "cunningly" create a dummy user with UserID 0 which can't do anything and warns you of an attack.
Well, guess what - all the attacker has to do is inject an ORDER BY CASE WHEN UserName='Admin' THEN 0 ELSE 1 END into the query - and bingo, the first row returned is now guaranteed to be the Admin user.

Can scalar functions be applied before filtering when executing a SQL Statement?

I suppose I have always naively assumed that scalar functions in the select part of a SQL query will only get applied to the rows that meet all the criteria of the where clause.
Today I was debugging some code from a vendor and had that assumption challenged. The only reason I can think of for this code failing is that the Substring() function is getting called on data that should have been filtered out by the WHERE clause. But it appears that the substring call is being applied before the filtering happens, the query is failing.
Here is an example of what I mean. Let's say we have two tables, each with 2 columns and having 2 rows and 1 row respectively. The first column in each is just an id. NAME is just a string, and NAME_LENGTH tells us how many characters in the name with the same ID. Note that only names with more than one character have a corresponding row in the LONG_NAMES table.
NAMES: ID, NAME
1, "Peter"
2, "X"
LONG_NAMES: ID, NAME_LENGTH
1, 5
If I want a query to print each name with the last 3 letters cut off, I might first try something like this (assuming SQL Server syntax for now):
SELECT substring(NAME,1,len(NAME)-3)
FROM NAMES;
I would soon find out that this would give me an error, because when it reaches "X" it will try using a negative number for in the substring call, and it will fail.
The way my vendor decided to solve this was by filtering out rows where the strings were too short for the len - 3 query to work. He did it by joining to another table:
SELECT substring(NAMES.NAME,1,len(NAMES.NAME)-3)
FROM NAMES
INNER JOIN LONG_NAMES
ON NAMES.ID = LONG_NAMES.ID;
At first glance, this query looks like it might work. The join condition will eliminate any rows that have NAME fields short enough for the substring call to fail.
However, from what I can observe, SQL Server will sometimes try to calculate the the substring expression for everything in the table, and then apply the join to filter out rows. Is this supposed to happen this way? Is there a documented order of operations where I can find out when certain things will happen? Is it specific to a particular Database engine or part of the SQL standard? If I decided to include some predicate on my NAMES table to filter out short names, (like len(NAME) > 3), could SQL Server also choose to apply that after trying to apply the substring? If so then it seems the only safe way to do a substring would be to wrap it in a "case when" construct in the select?
Martin gave this link that pretty much explains what is going on - the query optimizer has free rein to reorder things however it likes. I am including this as an answer so I can accept something. Martin, if you create an answer with your link in it i will gladly accept that instead of this one.
I do want to leave my question here because I think it is a tricky one to search for, and my particular phrasing of the issue may be easier for someone else to find in the future.
TSQL divide by zero encountered despite no columns containing 0
EDIT: As more responses have come in, I am again confused. It does not seem clear yet when exactly the optimizer is allowed to evaluate things in the select clause. I guess I'll have to go find the SQL standard myself and see if i can make sense of it.
Joe Celko, who helped write early SQL standards, has posted something similar to this several times in various USENET newsfroups. (I'm skipping over the clauses that don't apply to your SELECT statement.) He usually said something like "This is how statements are supposed to act like they work". In other words, SQL implementations should behave exactly as if they did these steps, without actually being required to do each of these steps.
Build a working table from all of
the table constructors in the FROM
clause.
Remove from the working table those
rows that do not satisfy the WHERE
clause.
Construct the expressions in the
SELECT clause against the working table.
So, following this, no SQL dbms should act like it evaluates functions in the SELECT clause before it acts like it applies the WHERE clause.
In a recent posting, Joe expands the steps to include CTEs.
CJ Date and Hugh Darwen say essentially the same thing in chapter 11 ("Table Expressions") of their book A Guide to the SQL Standard. They also note that this chapter corresponds to the "Query Specification" section (sections?) in the SQL standards.
You are thinking about something called query execution plan. It's based on query optimization rules, indexes, temporaty buffers and execution time statistics. If you are using SQL Managment Studio you have toolbox over your query editor where you can look at estimated execution plan, it shows how your query will change to gain some speed. So if just used your Name table and it is in buffer, engine might first try to subquery your data, and then join it with other table.