Behavior of Selecting Non-Existent Columns [closed] - sql

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
Given a table with this structure:
create table person (
    id int identity primary key,
    name varchar(256),
    birthday datetime
)
For this query:
select id,name,birthday,haircolor from person
What is the rationale behind SQL throwing an error in this situation? I was considering how flexible it would be if queries simply returned null for non-existent columns. What would be concrete reasons for or against such a SQL language design?
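For context, the error being asked about is easy to reproduce. Here's a minimal sketch using Python's stdlib sqlite3 as a stand-in engine (the person table mirrors the one above; birthday is stored as text since SQLite has no native datetime type):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table person (id integer primary key, name text, birthday text)")

# Referencing a column that isn't in the schema is rejected at
# query-compile time, before any rows are examined.
try:
    conn.execute("select id, name, birthday, haircolor from person")
    error = None
except sqlite3.OperationalError as e:
    error = e
print(error)  # no such column: haircolor
```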

No, because then you'd assume there is a haircolor column, which could have the following implications, just as an example:
You'd think you could insert into the field, since you would assume the column exists.
The returned result would not be an accurate representation of the database schema.
Errors are given so that you can have a clear understanding and indication of what you can and
cannot do. They're also there so you can catch exceptions, bugs, spiders, and of course creepy crawlies as soon as possible. :)
We don't want to accommodate lazy developers.
Do the right thing, be a man - Russell Peters.

This would be inconsistent for many reasons. One simple example involves using * to select columns: when you run
select id, name, birthday, hair_color from person
and get back a table with all four columns present (hair_color set to null on all rows), you would reasonably expect that the query below
select * from person
returned at least four columns, and that hair_color is among the columns returned. However, this wouldn't be the case if SQL allowed non-existent columns to return nulls.
This would also create hard-to-find errors when a column gets renamed in the schema, but some of the queries or stored procedures do not get re-worked to match the new names.
Generally, though, SQL engine developers make tradeoffs between usability and "hard" error checking. Sometimes, they would relax a constraint or two, to accommodate some of the most common "simple" mistakes (such as forgetting to use an aggregate function in a grouped query; it is an error, but MySql lets you get away with it). When it comes to schema checks, though, SQL engines do not allow any complacency, treating all missing schema elements as errors.

MySQL has one horrible bug in it...
select field1, field2, field3
from table
group by field1
Any database engine would return 'error, wtf do I do with field2 and field3 in the select line when they are not aggregates nor in the group by statement?'.
MySQL, on the other hand, will return two arbitrary values for field2 and field3 and not return an error (which I refer to as 'doing the wrong thing and not returning an error'). This is a horrid bug; the number of scripts I've troubleshot before discovering that MySQL is not handling GROUP BYs correctly is absurd. Give an error before giving unintentional results and this huge bug won't be such an issue.
Doesn't it seem to you that you are requesting more of this stupid behaviour to be propagated, just in the select clause instead of the group by clause?
edit:
Typo propagation as well:
select age,gender, haricolour from...
I'd prefer to get an error back saying 'great typo silly' instead of a misnamed field full of nulls.
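The permissive GROUP BY behavior described above (which MySQL has since tightened via the ONLY_FULL_GROUP_BY SQL mode) can also be observed in SQLite, which likewise picks a bare column's value from an arbitrary row in each group instead of raising an error. A sketch with illustrative names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (field1 text, field2 int)")
conn.executemany("insert into t values (?, ?)", [("a", 1), ("a", 2)])

# field2 is neither aggregated nor in the GROUP BY; SQLite silently
# returns its value from one arbitrary row of the group rather than
# rejecting the query.
rows = conn.execute("select field1, field2 from t group by field1").fetchall()
print(rows)  # one row for group 'a'; field2 is 1 or 2, chosen arbitrarily
```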

This would result in a silent errors problem. The whole reason you have compile time errors and runtime exceptions is to catch bugs as soon as possible.
If you had a typo in a column name and it returned null instead of an error, your application will continue to work, but various things would misbehave as a result. You might have a column that saves a user setting indicating they don't want to receive emails. However your query has a typo and so it always returns null. Gradually a few people begin to report that they are setting the "Do not send emails" setting but are still getting emails. You have to hunt through all your code to figure out the cause. First you look at the form that edits this setting, then at the code that calls the database to save the setting, then you verify the data exists and the setting is getting persisted, then you look at the system that sends emails, and work your way up to the DB layer there that retrieves settings and painstakingly look through the SQL for that typo.
How much easier would that process be if it had just thrown an error in the first place? No users frustrated. No wasting time with support requests. No wasting time troubleshooting.

Returning null would be too generic. I like it this way much better. In your flow you can probably catch errors and return null if you want. But getting more info about why the query failed is better than receiving a null (and probably scratching your head over your JOIN and WHERE clauses).

Related

'-999' used for all condition

I have a sample of a stored procedure like this (from my previous working experience):
Select * from table where (id=@id or id='-999')
Based on my understanding of this query, the '-999' is used to avoid an exception when no value is transferred from the user. So far in my research, I have not found this usage on the internet or in other companies' implementations.
@id is transferred from the user.
Any help in providing some links related to it will be appreciated.
I'd like to add my two guesses on this, although please note that to my disadvantage, I'm one of the very youngest in the field, so this is not coming from that much of history or experience.
Also, please note that for any reason anybody provides you, you might not be able to confirm it 100%. Your oven might just not have any leftover evidence in and of itself.
Now, per another question I read before, extreme integers were used in some systems to denote missing values, since text and NULL weren't options in those systems. Say I'm looking for ID #84 and cannot find it in the table:
Not Found Is Unlikely:
Perhaps in some systems it's far more likely that a record exists with a missing/incorrect ID than that it doesn't exist at all? Hence, when no match is found, designers preferred all records without valid IDs to be returned?
This, however, has a few problems. First, depending on the design, the user might not recognize that the results are a set of records with missing IDs, especially if only one was returned. Second, the current query poses a problem, as it will always return the missing-ID records in addition to the normal matches. Perhaps they relied on ORDERing to ease readability?
Exception Above SQL:
AFAIK, SQL is fine with a zero-row result, but maybe whatever thing that calls/used to call it wasn't as robust, and something goes wrong (hard exception, soft UI bug, etc.) when zero rows are returned? Perhaps then, this ID represented a dummy row (e.g. blanks and zeroes) to keep things running.
Then again, this also suffers from the same arguments above regarding "record is always outputted" and ORDER, with the added possibility that the SQL-caller might have dedicated logic to when the -999 record is the only record returned, which I doubt was the most practical approach even in whatever era this was done at.
... the more I type, the more I think this is the oven, and only the great grandmother can explain this to us.
If you want to avoid an exception when no value is transferred from the user, declare the parameter as null in your stored procedure, like @id int = null.
for instance :
CREATE PROCEDURE [dbo].[TableCheck]
    @id int = null
AS
BEGIN
    Select * from table where (id = @id)
END
Now you can execute it either way:
exec [dbo].[TableCheck] 2 or exec [dbo].[TableCheck]
Remember, it's a separate thing if you want to return the whole table when your input parameter is null.
To answer your id = -999 condition: I tried it your way, and it doesn't prevent any exception.
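A sketch of the NULL-default pattern this answer recommends, using Python's sqlite3 with a named parameter (table and column names are illustrative). The `:id is null or id = :id` variant also covers the "return the whole table when no value is given" case mentioned above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id int, name text)")
conn.executemany("insert into t values (?, ?)", [(1, "a"), (2, "b")])

def lookup(conn, id=None):
    # NULL-default pattern: when no id is supplied, the filter is
    # skipped entirely rather than matching a magic sentinel row.
    return conn.execute(
        "select * from t where (:id is null or id = :id)", {"id": id}
    ).fetchall()

print(lookup(conn, 2))    # [(2, 'b')]
print(len(lookup(conn)))  # 2 -- all rows when no id is given
```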

What happens if I SELECT something AS a name then something else AS the same name?

I've already answered my question but didn't see it on here, so here we go. Please feel free to link to the question if it has been asked exactly.
I simplified my question to the following code:
SELECT 'a' AS col1, 'b' AS col1
Will this give a same column name error?
Will the last value always be returned or is there a chance col1 could be 'a'?
I'm not sure why you would ever want this, but I tried it in Oracle (10g) and it worked fine, returning both columns. I realize you've asked about SQL Server specifically, but I found it interesting that this worked at all.
Edit: It also works on MySQL.
It works in the final query. However, when you do it in a subselect and refer to the ambiguous column aliases in an outer query, you get an error.
In SQL Server 2008 R2 it is valid as a standalone query. Under certain circumstances it will produce errors (incomplete list):
Inline views
Common Table Expressions
Stored procedures when the output is used by Reporting Services and, presumably, similarly integrated tools
It's hard to imagine a case where you would want duplicate column names, and it's easy to think of ways in which writing queries with repeats now could turn sour in the future.
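For what it's worth, the standalone case is also easy to check in SQLite via Python's sqlite3: the duplicate alias is accepted, and both columns are reported under the same name, with the values distinguished only by position:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.execute("select 'a' as col1, 'b' as col1")

# cursor.description carries one entry per result column; both share
# the alias, so callers can only tell them apart positionally.
print(cur.description[0][0], cur.description[1][0])  # col1 col1
print(cur.fetchone())  # ('a', 'b')
```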

using MS SQL I need to select into a table while casting a whole load of strings to ints! can it be done?

Basically, I am the new IT guy; the old guy left a right mess for me! We have an MS Access DB which stores the answers to an online questionnaire. This particular DB has about 45,000 records, and each questionnaire has 220 questions. The old guy, in his wisdom, decided to store the answers to the questionnaire questions as text even though the answers are 0-5 integers!
Anyway, we now need to add a load of new questions to the questionnaire, taking it up to 240 questions. The 255-field limit for Access and the 30-ish columns of biographical data also stored in this database mean that I need to split the DB.
So, I have managed to get all the bio info quite happily into a new table with:
SELECT id,[all bio column names] INTO resultsBioData FROM results;
This didn't cause too much of a problem, as I am not casting anything, but for the question data I want to convert it all to integers. At the moment I have:
SELECT id,CInt(q1) AS nq1.......CInt(q220) AS nq220 INTO resultsItemData FROM results;
This seems to work fine for about 400 records but then just stops. I thought it might be hitting something it can't convert to an integer, so I wrote a little Java program that deleted any record where any of the 220 answers wasn't 0, 1, 2, 3, 4 or 5, and it still gives up around 400 (never the same record, though!).
Anyone got any ideas? I am doing this on my test system at the moment and would really like something robust before I do it to our live system!
Sorry for the long-winded question, but it's doing my head in!
I'm unsure whether you're talking about doing the data transformation in Access or SQL Server. Either way, since you're redesigning the schema, now is the time to consider whether you really want resultsItemData table to include 200+ fields, from nq1 through nq220 (or ultimately nq240). And any future question additions would require changing the table structure again.
The rule of thumb is "columns are expensive; rows are cheap". That applies whether the table is in Access or SQL Server.
Consider one row per id/question combination.
id  q_number  answer
1   nq1       3
1   nq2       1
I don't understand why your current approach crashes at 400 rows. I wouldn't even worry about that, though, until you are sure you have the optimal table design.
Edit: Since you're stuck with the approach you described, I wonder if it might work with an "append" query instead of a "make table" query. Create resultsItemData table structure and append to it with a query which transforms the qx values to numeric.
INSERT INTO resultsItemData (id, nq1, nq2, ... nq220)
SELECT id, CInt(q1), CInt(q2), ... CInt(q220) FROM results;
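Before running the bulk cast, it may also help to locate every value that won't survive an integer conversion, instead of letting the query die partway through. A hypothetical Python sketch (the columns q1/q2 stand in for the real q1..q220, and the sample data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table results (id int, q1 text, q2 text)")
conn.executemany("insert into results values (?, ?, ?)",
                 [(1, "3", "5"), (2, "2 ", "x")])

# Collect every (record id, column, value) triple that int() rejects,
# so the offending rows can be fixed or excluded up front.
bad = []
for id_, q1, q2 in conn.execute("select id, q1, q2 from results"):
    for col, val in (("q1", q1), ("q2", q2)):
        try:
            int(val.strip())
        except (ValueError, AttributeError):  # AttributeError covers NULLs
            bad.append((id_, col, val))
print(bad)  # [(2, 'q2', 'x')]
```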
Try this solution:
select * into #tmp from bad_table
truncate table bad_table
alter table bad_table alter column silly_column int
insert bad_table
select cast(silly_column as int), other_columns
from #tmp
drop table #tmp
Reference: Change type of a column with numbers from varchar to int
In the end I just wrote a small Java program that created the new table and went through each record individually, casting the fields to integers. It takes about an hour and a half to do the whole thing, though, so I am still after a better solution for when I come to do this with the live system.

MSAccess SQL Injection

Situation:
I'm doing some penetration testing for a friend of mine and have total clearance to go postal on a demo environment. The reason for this is that I saw an XSS hole in his online ASP application (an error page with the error as a param, allowing HTML).
He has an Access DB, and because of his lack of input validation I came upon another hole: he allows SQL injection in a WHERE clause.
I tried some stuff from:
http://www.krazl.com/blog/?p=3
But this gave limited results:
MSysRelationships is open, but his Objects table is shielded.
' UNION SELECT 1,1,1,1,1,1,1,1,1,1 FROM MSysRelationships WHERE '1' = '1 <-- worked, so I know the parent table has at least 9 columns. I don't know how I can exploit the relationships table to get table names (I can't find any explanation of its structure, so I don't know what to select on).
Tried brute-forcing some table names, but to no avail.
I do not want to trash his DB, but I do want to point out the serious flaw with some backing.
Anyone has Ideas?
Usually there are two ways to proceed from here. You could try to guess table names by the type of data stored in them, which often works ("users" usually stores the user data...). The other method would be to provoke revealing error messages in the application and see if you can fetch table or column names from them.
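The underlying fix for the hole itself is parameter binding rather than string concatenation. A sketch in Python's sqlite3 showing the classic UNION payload being treated as a plain string value (table and data are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table users (name text)")
conn.execute("insert into users values ('alice')")

payload = "' UNION SELECT 1 --"
# Parameter binding passes the payload as a literal value, so the
# injected UNION never reaches the SQL parser.
rows = conn.execute("select * from users where name = ?", (payload,)).fetchall()
print(rows)  # [] -- no user is literally named the payload string
```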

Oracle9i: Filter Expression Fails to Exclude Data at Runtime

I have a relatively simple select statement in a VB6 program that I have to maintain. (Suppress your natural tendency to shudder; I inherited the thing, I didn't write it.)
The statement is straightforward (reformatted for clarity):
select distinct
b.ip_address
from
code_table a,
location b
where
a.code_item = b.which_id and
a.location_type_code = '15' and
a.code_status = 'R'
The query returns a list of IP addresses from the database. The key column in question is code_status. Some time ago, we realized that one of the IP addresses was no longer valid, so we changed its status to I (invalid) to exclude it from the query's results.
When you execute the query above in SQL Plus, or in SQL Developer, everything is fine. But when you execute it from VB6, the check against code_status is ignored, and the invalid IP address appears in the result set.
My first guess was that the results were cached somewhere. But, not being an Oracle expert, I have no idea where to look.
This is ancient VB6 code. The SQL is embedded in the application. At the moment, I don't have time to rewrite it as a stored procedure. (I will some day, given the chance.) But, I need to know what would cause this disparity in behavior and how to eliminate it. If it's happening here, it's likely happening somewhere else.
If anyone can suggest a good place to look, I'd be very appreciative.
Some random ideas:
Are you sure you committed the changes that invalidate the ip-address? Can someone else (using another db connection / user) see the changed code_status?
Are you sure that the results are not modified after they are returned from the database?
Are you sure that you are using the "same" database connection in SQLPlus as in the code (database, user etc.)?
Are you sure that that is indeed the SQL sent to the database? (You may check by tracing on the Oracle server or by debugging the VB code). Reformatting may have changed "something".
Off the top of my head I can't think of any "caching" that might "re-insert" the unwanted ip. Hope something from the above gives you some ideas on where to look at.
In addition to the suggestions that IronGoofy has made, have you tried swapping round the last two clauses?
where
a.code_item = b.which_id and
a.code_status = 'R' and
a.location_type_code = '15'
If you get a different set of results, then this might point to some sort of wrangling going on that results in dodgy SQL actually being sent to the database.
There are Oracle bugs that result in incorrect answers. This surely isn't one of those times. Usually they involve some bizarre combination of views and functions and dblinks and lunar phases...
It's not cached anywhere. Oracle doesn't cache results until 11g, and even then it knows to invalidate the cache when the answer may change.
I would guess this is a data issue. You have a DISTINCT on the IP address in the query, why? If there's no unique constraint, there may be more than one copy of your IP address and you only fixed one of them.
And your Code_status is in a completely different table from your IP addresses. You set the status to "I" in the code table and you get the list of IPs from the Location table.
Stop thinking zebras and start thinking horses. This is almost certainly just data you do not fully understand.
Run this
select
a.location_type_code,
a.code_status
from
code_table a,
location b
where
a.code_item = b.which_id and
b.ip_address = <the one you think you fixed>
I bet you get one row with an 'I' and another row with an 'R'
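The data scenario this answer suspects is easy to stage. A sketch in Python's sqlite3, with two location rows sharing the IP address but only one matching code row "fixed" to status I:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table code_table (code_item int, location_type_code text, code_status text);
create table location (which_id int, ip_address text);
insert into location values (1, '10.0.0.1'), (2, '10.0.0.1');
insert into code_table values (1, '15', 'I'), (2, '15', 'R');
""")

# DISTINCT hides the fact that a second joined row with status 'R'
# still matches the "fixed" IP address.
rows = conn.execute("""
    select distinct b.ip_address
    from code_table a, location b
    where a.code_item = b.which_id
      and a.location_type_code = '15'
      and a.code_status = 'R'
""").fetchall()
print(rows)  # [('10.0.0.1',)] -- the address still appears
```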
I'd suggest you have a look at the V$SQL system view to confirm that the query you believe the VB6 code is running is actually the query it is running.
Something along the lines of
select sql_text, fetches
from v$sql
where sql_text like '%ip_address%'
Verify that the SQL_TEXT is the one you expect and that the FETCHES count goes up as you execute the code.