Access - WHERE with OR not working - SQL

I have the following query written in MS Access
SELECT DISTINCT Table1.ColumnA, Table1.ColumnB,Table1.ColumnC,Table1.ColumnD,Table1.ColumnE
FROM Table2
RIGHT JOIN Table1 ON (Table2.ColumnB = Table1.ColumnF)
WHERE (Table1.ColumnF <>28) OR (Table1.ColumnF<>29)
I tried it with and without parentheses.
When I have just one condition in the WHERE clause, the 262 records go down to 160, as expected.
When the two conditions are connected by OR, the count goes back up to 262, which is clearly not what's expected. Even if only the first condition held, it should not have gone back up to 262 records.
My question is: what's wrong with my query, especially as it pertains to WHERE XXX OR XXX?
Secondly, does the RIGHT JOIN have any bearing on the outcome of a subsequent WHERE clause?
Thirdly, if I cannot combine a RIGHT JOIN and a WHERE, what is the optimal way to apply conditions to a query that relies on a RIGHT JOIN?
Appreciate any help!

Replace your OR with AND:
WHERE (Table1.ColumnF <> 28) AND (Table1.ColumnF <> 29)
Every value of ColumnF satisfies at least one of the two <> tests (28, for instance, still satisfies <> 29), so the OR version is always true and filters nothing.

You can use this instead:
WHERE Table1.ColumnF Not In (28,29)
That approach expresses your intention clearly and concisely. Now that the OR vs. AND issue is resolved, this suggestion may not seem very useful, but keep it in mind for when you have several more such conditions: Not In (28,29,32,40,119) is easier to write and understand than four ANDed <> tests.
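A minimal sketch of the three variants, using SQLite (via Python) as a stand-in for Access; the table and column names follow the question, the data is made up:

```python
import sqlite3

# Why the OR version filters nothing: every value differs from at least
# one of 28 and 29, so the predicate is always true.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (ColumnF INTEGER)")
conn.executemany("INSERT INTO Table1 VALUES (?)", [(28,), (29,), (30,)])

or_count = conn.execute(
    "SELECT COUNT(*) FROM Table1 WHERE ColumnF <> 28 OR ColumnF <> 29"
).fetchone()[0]   # all 3 rows: even 28 passes, because 28 <> 29 is true

and_count = conn.execute(
    "SELECT COUNT(*) FROM Table1 WHERE ColumnF <> 28 AND ColumnF <> 29"
).fetchone()[0]   # 1 row: only 30 survives

not_in_count = conn.execute(
    "SELECT COUNT(*) FROM Table1 WHERE ColumnF NOT IN (28, 29)"
).fetchone()[0]   # 1 row, same result as the AND version

print(or_count, and_count, not_in_count)  # 3 1 1
```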


Having Trouble Running An SQL Update Statement

Forgive me but I'm relatively new to SQL.
I am trying to update a column of a table I created, using a function I also created, but when I run the UPDATE statement nothing happens; I just see the underscore flashing (I'm assuming it's trying to run). The UPDATE touches around 60,000 rows, so I expected it to take a little while, but it's been 10 minutes with no result.
I would just like to know if anyone knows some general reasons why the underscore may be flashing. I know this is super general, but I've just never seen this before.
Here's an image of what I'm talking about:
http://i.imgur.com/Xk3kM2U.png?1
EDIT: There are exactly 67,662 records in the table.
I've also just screenshotted the query and linked it.
Your old-style joins have no join condition between the ap1/r1 pair and the ap2/r2 pair, so you're calling your calc_distance() function for 67,662 * 67,662 combinations of coordinates. The use of distinct is potentially a warning that you know you're getting duplicates. And then there is no correlation between the subquery and the update itself, so you're repeating all of that for each row in temproute. That will take a while.
It looks like you maybe don't want to be looking at the source airport from two copies of the route table; but the source and destination airports from a single copy.
Something like (untested):
UPDATE temproute tr
SET distance = (
    SELECT calc_distance(ap2.latitude, ap2.longitude, ap1.latitude, ap1.longitude)
    FROM routes r
    JOIN airports ap1 ON ap1.icaoairport = r.sourceid
    JOIN airports ap2 ON ap2.icaoairport = r.destid
    WHERE r.routeid = tr.routeid
);
If temproute is a copy of routes too, which the row count implies, then perhaps you don't need to refer to routes in the subquery at all.
But I'm speculating about what you're doing.
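A runnable sketch of the correlated-subquery shape suggested above, using SQLite (via Python) with a stub calc_distance() registered as a user-defined function; the table and column names follow the answer, the stub formula and data are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stub distance function -- not real geodesy, just enough to demonstrate
# that the correlated subquery runs once per temproute row.
conn.create_function(
    "calc_distance", 4,
    lambda lat1, lon1, lat2, lon2: abs(lat2 - lat1) + abs(lon2 - lon1),
)

conn.executescript("""
CREATE TABLE airports  (icaoairport TEXT PRIMARY KEY, latitude REAL, longitude REAL);
CREATE TABLE routes    (routeid INTEGER PRIMARY KEY, sourceid TEXT, destid TEXT);
CREATE TABLE temproute (routeid INTEGER PRIMARY KEY, distance REAL);

INSERT INTO airports VALUES ('EGLL', 51.47, -0.46), ('KJFK', 40.64, -73.78);
INSERT INTO routes   VALUES (1, 'EGLL', 'KJFK');
INSERT INTO temproute VALUES (1, NULL);
""")

# Each temproute row correlates to exactly one routes row, so there is
# no Cartesian explosion of calc_distance() calls.
conn.execute("""
UPDATE temproute
SET distance = (
    SELECT calc_distance(ap2.latitude, ap2.longitude, ap1.latitude, ap1.longitude)
    FROM routes r
    JOIN airports ap1 ON ap1.icaoairport = r.sourceid
    JOIN airports ap2 ON ap2.icaoairport = r.destid
    WHERE r.routeid = temproute.routeid
)
""")

distance = conn.execute("SELECT distance FROM temproute WHERE routeid = 1").fetchone()[0]
print(round(distance, 2))  # 84.15 with the stub formula above
```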

Rails doesn't respect my select fields using includes

I want to write a query with Active Record, and it seems it never respects what I want to do. Here is my example:
Phonogram.preload(:bpms).includes(:bpms).select("phonograms.id", "bpms.bpm")
This query returns all the fields from phonograms and bpms. The problem is that I need to add 15 more relationships to this query.
I also tried using joins, but that didn't work properly either: I have 10 phonograms and it returns just 3.
Has anyone experienced this? How did you solve it properly?
Cheers.
select combined with includes does not produce consistent behavior. It appears that if the included association returns no results, select works properly; if it returns results, the select statement has no effect. In fact, it is completely ignored, to the point that your select statement could reference invalid table names and no error would be produced. select with joins does produce consistent behavior.
That's why you're better off going with joins, like:
Phonogram.joins(:bpms).select("phonograms.id", "bpms.bpm")

What can I do to speed this slow query up?

We have a massive, multi-table Sybase query, which we call get_safari_exploration_data, that fetches all sorts of info related to explorers going on safari and all the animals they encounter.
This query is slow, and I've been asked to speed it up. The first thing that jumps out at me is that there doesn't seem to be a pressing need for the nested SELECT statement inside the outer FROM clause. In that nested SELECT, there also seems to be several fields that aren't necessary (vegetable, broomhilda, devoured, etc.). I'm also skeptical about the use of the joins ("*=" instead of "INNER JOIN...ON").
SELECT
dog_id,
cat_metadata,
rhino_id,
buffalo_id,
animal_metadata,
has_4_Legs,
is_mammal,
is_carnivore,
num_teeth,
does_hibernate,
last_spotted_by,
last_spotted_date,
purchased_by,
purchased_date,
allegator_id,
cow_id,
cow_name,
cow_alias,
can_be_ridden
FROM
(
SELECT
mp.dog_id as dog_id,
ts.cat_metadata + '-yoyo' as cat_metadata,
mp.rhino_id as rhino_id,
mp.buffalo_id as buffalo_id,
mp.animal_metadata as animal_metadata,
isnull(mp.has_4_Legs, 0) as has_4_Legs,
isnull(mp.is_mammal, 0) as is_mammal,
isnull(mp.is_carnivore, 0) as is_carnivore,
isnull(mp.num_teeth, 0) as num_teeth,
isnull(mp.does_hibernate, 0) as does_hibernate,
jungle_info.explorer as last_spotted_by,
exploring_journal.spotted_datetime as last_spotted_date,
jungle_info.explorer as purchased_by,
early_exploreration_journal.spotted_datetime as purchased_date,
alleg_id as allegator_id,
ho.cow_id,
ho.cow_name,
ho.cow_alias,
isnull(mp.is_ridable,0) as can_be_ridden,
ts.cat_metadata as broomhilda,
ts.squirrel as vegetable,
convert (varchar(15), mp.rhino_id) as tms_id,
0 as devoured
FROM
mammal_pickles mp,
very_tricky_animals vt,
possibly_venomous pv,
possibly_carniv_and_tall pct,
tall_and_skinny ts,
tall_and_skinny_type ptt,
exploration_history last_exploration_history,
master_exploration_journal exploring_journal,
adventurer jungle_info,
exploration_history first_exploration_history,
master_exploration_journal early_exploreration_journal,
adventurer jungle_info,
hunting_orders ho
WHERE
mp.exploring_strategy_id = 47
and mp.cow_id = ho.cow_id
and ho.cow_id IN (20, 30, 50)
and mp.rhino_id = vt.rhino_id
and vt.version_id = pv.version_id
and pv.possibly_carniv_and_tall_id = pct.possibly_carniv_and_tall_id
and vt.tall_and_skinny_id = ts.tall_and_skinny_id
and ts.tall_and_skinny_type_id = ptt.tall_and_skinny_type_id
and mp.alleg_id *= last_exploration_history.exploration_history_id
and last_exploration_history.master_exploration_journal_id *= exploring_journal.master_exploration_journal_id
and exploring_journal.person_id *= jungle_info.person_id
and mp.first_exploration_history_id *= first_exploration_history.exploration_history_id
and first_exploration_history.master_exploration_journal_id *= early_exploreration_journal.master_exploration_journal_id
and early_exploreration_journal.person_id *= jungle_info.person_id
) TEMP_TBL
So I ask:
Am I correct about the nested SELECT?
Am I correct about the unnecessary fields inside the nested SELECT?
Am I correct about the structure/syntax/usage of the joins?
Is there anything else about the structure/nature of this query that jumps out at you as being terribly inefficient/slow?
Unfortunately, unless there is irrefutable, matter-of-fact proof that decomposing this large query into smaller queries is beneficial in the long run, management will simply not approve refactoring it out into multiple, smaller queries, as this will take considerable time to refactor and test. Thanks in advance for any help/insight here!
Am I correct about the nested SELECT?
You would be in some cases, but a competent planner would collapse it and ignore it here.
Am I correct about the unnecessary fields inside the nested SELECT?
Yes, especially considering that some of them don't show up at all in the final list of fields.
Am I correct about the structure/syntax/usage of the joins?
Insofar as I'm aware, *= and =* are merely syntactic sugar for a left and right join, but I might be wrong in stating that. If not, then they merely force the way joins occur, but they may be necessary for your query to work at all.
Is there anything else about the structure/nature of this query that jumps out at you as being terribly inefficient/slow?
Yes.
Firstly, you've some calculations that aren't needed, e.g. convert (varchar(15), mp.rhino_id) as tms_id. Perhaps a join or two as well, but I admittedly haven't looked at the gory details of the query.
Next, you might have a problem with the db design itself, e.g. a cow_id field. (Seriously? :-) )
Last, there occasionally is something to be said about doing multiple queries instead of a single one, to avoid doing tons of joins.
In a blog, for instance, it's usually a good idea to grab the top 10 posts, and then to use a separate query to fetch their tags (where id in (id1, id2, etc.)). In your case, the selective part seems to be around here:
mp.exploring_strategy_id = 47
and mp.cow_id = ho.cow_id
and ho.cow_id IN (20, 30, 50)
so maybe isolate that part in one query, and then build an in () clause using the resulting IDs, and fetch the cosmetic bits and pieces in one or more separate queries.
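The two-step idea can be sketched like this, using SQLite (via Python) with a cut-down, made-up version of the question's fictional schema:

```python
import sqlite3

# Step 1 runs only the selective filter; step 2 fetches related rows with
# a parameterised IN (...) list built from the step-1 IDs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mammal_pickles (rhino_id INTEGER, cow_id INTEGER, exploring_strategy_id INTEGER);
CREATE TABLE very_tricky_animals (rhino_id INTEGER, version_id INTEGER);
INSERT INTO mammal_pickles VALUES (1, 20, 47), (2, 30, 47), (3, 99, 47), (4, 20, 12);
INSERT INTO very_tricky_animals VALUES (1, 10), (2, 11), (3, 12);
""")

# Step 1: the selective part of the WHERE clause, on its own.
rhino_ids = [row[0] for row in conn.execute(
    """SELECT rhino_id FROM mammal_pickles
       WHERE exploring_strategy_id = 47 AND cow_id IN (20, 30, 50)"""
)]

# Step 2: fetch the cosmetic bits with an IN (...) clause.
# (Assumes rhino_ids is non-empty; guard that case in real code.)
placeholders = ",".join("?" * len(rhino_ids))
rows = conn.execute(
    f"SELECT rhino_id, version_id FROM very_tricky_animals "
    f"WHERE rhino_id IN ({placeholders})",
    rhino_ids,
).fetchall()
print(rhino_ids, rows)  # [1, 2] [(1, 10), (2, 11)]
```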
Oh, and as pointed out by Gordon, check your indexes as well. But note that the indexes may end up being of little use without splitting the query into more manageable parts.
I would suggest the following approach.
First, rewrite the query using ANSI standard joins with the on clause. This will make the conditions and filtering much easier to understand. Also, this is "safe" -- you should get exactly the same results as the current version. Be careful, because the *= is an outer join, so not everything is an inner join.
I doubt this step will improve performance.
Then, check each of the reference tables and be sure that the join keys have indexes on them in the reference table. If keys are missing, then add them in.
Then, check whether the left outer joins are necessary. Filters on tables on the nullable side of a left outer join convert those outer joins to inner joins. Probably not a performance hit, but you never know.
Then, consider indexing the fields used for filtering (in the where clause).
And, learn how to use the explain capabilities. Nested loop joins without an index are likely culprits for performance problems.
As for the nested select, I think Sybase is smart enough to "do the right thing". Even if it wrote out and re-read the result set, that probably would have a marginal effect on the query compared to getting the joins right.
If this is your real data structure, by the way, it sounds like a very interesting domain. It is not often that I see a field called allegator_id in a table.
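The difference the ANSI rewrite has to preserve can be sketched in SQLite (via Python, made-up data): an old-style comma join with a plain WHERE equality behaves as an inner join and silently drops unmatched rows, whereas Sybase's *= is an outer join, which ANSI SQL spells LEFT JOIN:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mammal_pickles (rhino_id INTEGER, alleg_id INTEGER);
CREATE TABLE exploration_history (exploration_history_id INTEGER, explorer TEXT);
INSERT INTO mammal_pickles VALUES (1, 100), (2, NULL);
INSERT INTO exploration_history VALUES (100, 'Livingstone');
""")

# Comma join + WHERE equality: an inner join, so rhino 2 disappears.
inner_rows = conn.execute("""
    SELECT mp.rhino_id, eh.explorer
    FROM mammal_pickles mp, exploration_history eh
    WHERE mp.alleg_id = eh.exploration_history_id
""").fetchall()

# ANSI LEFT JOIN: the *= equivalent, so rhino 2 is kept with NULLs.
outer_rows = conn.execute("""
    SELECT mp.rhino_id, eh.explorer
    FROM mammal_pickles mp
    LEFT JOIN exploration_history eh
      ON mp.alleg_id = eh.exploration_history_id
""").fetchall()

print(inner_rows)  # [(1, 'Livingstone')]
print(outer_rows)  # [(1, 'Livingstone'), (2, None)]
```

So when translating each *= predicate, it has to become an ON clause of a LEFT JOIN, not part of the WHERE, or rows will be lost.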
I will answer some of your questions.
You think that the extra fields (vegetable, broomhilda, devoured) in the nested SELECT could be causing a performance issue. Not necessarily. The two unused fields (vegetable, broomhilda) come from the ts table, but the cat_metadata field, which is used, also comes from ts. So unless cat_metadata is covered by the index used on ts, there won't be any performance impact: to extract cat_metadata the table's data page has to be fetched anyway, and extracting the other two fields costs only a little CPU. The devoured field is a constant, so it will not affect performance either.
Dennis pointed out the use of the convert function, convert(varchar(15), mp.rhino_id). I disagree that it will affect performance, as it only consumes CPU.
Lastly, I would suggest trying set table count 13, as there are 13 tables in the query. By default, Sybase considers four tables at a time during join optimisation.

Can scalar functions be applied before filtering when executing a SQL Statement?

I suppose I have always naively assumed that scalar functions in the select part of a SQL query will only get applied to the rows that meet all the criteria of the where clause.
Today I was debugging some code from a vendor and had that assumption challenged. The only reason I can think of for this code failing is that the Substring() function is being called on data that should have been filtered out by the WHERE clause. It appears the substring call is applied before the filtering happens, and the query fails.
Here is an example of what I mean. Let's say we have two tables, each with 2 columns and having 2 rows and 1 row respectively. The first column in each is just an id. NAME is just a string, and NAME_LENGTH tells us how many characters in the name with the same ID. Note that only names with more than one character have a corresponding row in the LONG_NAMES table.
NAMES: ID, NAME
1, "Peter"
2, "X"
LONG_NAMES: ID, NAME_LENGTH
1, 5
If I want a query to print each name with the last 3 letters cut off, I might first try something like this (assuming SQL Server syntax for now):
SELECT substring(NAME,1,len(NAME)-3)
FROM NAMES;
I would soon find out that this gives me an error, because when it reaches "X" it tries to use a negative number as the length in the substring call, and fails.
The way my vendor decided to solve this was by filtering out rows where the strings were too short for the len - 3 query to work. He did it by joining to another table:
SELECT substring(NAMES.NAME,1,len(NAMES.NAME)-3)
FROM NAMES
INNER JOIN LONG_NAMES
ON NAMES.ID = LONG_NAMES.ID;
At first glance, this query looks like it might work. The join condition will eliminate any rows that have NAME fields short enough for the substring call to fail.
However, from what I can observe, SQL Server will sometimes try to calculate the substring expression for everything in the table, and then apply the join to filter out rows. Is this supposed to happen this way? Is there a documented order of operations where I can find out when certain things will happen? Is it specific to a particular database engine, or part of the SQL standard? If I decided to include some predicate on my NAMES table to filter out short names (like len(NAME) > 3), could SQL Server also choose to apply that after trying to apply the substring? If so, then it seems the only safe way to do a substring would be to wrap it in a CASE WHEN construct in the select?
Martin gave this link, which pretty much explains what is going on: the query optimizer has free rein to reorder things however it likes. I am including this as an answer so I can accept something. Martin, if you create an answer with your link in it, I will gladly accept that instead of this one.
I do want to leave my question here because I think it is a tricky one to search for, and my particular phrasing of the issue may be easier for someone else to find in the future.
EDIT: As more responses have come in, I am again confused. It does not seem clear yet when exactly the optimizer is allowed to evaluate things in the SELECT clause. I guess I'll have to go find the SQL standard myself and see if I can make sense of it.
Joe Celko, who helped write early SQL standards, has posted something similar to this several times in various USENET newsgroups. (I'm skipping over the clauses that don't apply to your SELECT statement.) He usually said something like "This is how statements are supposed to act like they work". In other words, SQL implementations should behave exactly as if they did these steps, without actually being required to do each of these steps.
1. Build a working table from all of the table constructors in the FROM clause.
2. Remove from the working table those rows that do not satisfy the WHERE clause.
3. Construct the expressions in the SELECT clause against the working table.
So, following this, no SQL dbms should act like it evaluates functions in the SELECT clause before it acts like it applies the WHERE clause.
In a recent posting, Joe expands the steps to include CTEs.
CJ Date and Hugh Darwen say essentially the same thing in chapter 11 ("Table Expressions") of their book A Guide to the SQL Standard. They also note that this chapter corresponds to the "Query Specification" section (sections?) in the SQL standards.
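The defensive CASE pattern the question mentions can be sketched like this, using SQLite (via Python) with the question's NAMES data. Note this is only an illustration of the shape of the guard: SQLite's substr() happens to tolerate bad lengths, whereas on SQL Server the point of the CASE is to keep substring() from ever seeing a negative length.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE NAMES (ID INTEGER, NAME TEXT)")
conn.executemany("INSERT INTO NAMES VALUES (?, ?)", [(1, "Peter"), (2, "X")])

# Guard the substring so the short-name row yields NULL instead of
# feeding a negative length into substr().
rows = conn.execute("""
    SELECT NAME,
           CASE WHEN length(NAME) > 3
                THEN substr(NAME, 1, length(NAME) - 3)
                ELSE NULL
           END AS trimmed
    FROM NAMES
""").fetchall()
print(rows)  # [('Peter', 'Pe'), ('X', None)]
```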
You are thinking of something called the query execution plan. It's based on query optimization rules, indexes, temporary buffers and execution-time statistics. If you are using SQL Server Management Studio, there is a toolbar above the query editor where you can look at the estimated execution plan, which shows how your query will be transformed to gain speed. So if, say, your NAMES table is already in a buffer, the engine might evaluate the subquery on it first, and then join it with the other table.

SQL: Alternative to "First" function?

I'm trying to write a query that won't produce Cartesian products. I was going to use the First function, because some Type_Codes have multiple descriptions and I don't want to multiply my dollars.
Select
Sum(A.Dollar) as Dollars,
A.Type_Code,
First(B.Type_Description) as FstTypeDescr
From
Totals A,
TypDesc B
Where
A.Type_Code = B.Type_Code
Group by A.Type_Code
I just want to grab ANY of the descriptions for a given code (I don't really care which one). I get the following error when trying to use FIRST:
[IBM][CLI Driver][DB2/AIX64] SQL0440N No authorized routine named "FIRST" of type "FUNCTION"
Is there another way to do this?
Instead of First(), use MIN().
first() is not standard SQL. I forget which database product it works in, but it's not in most SQL engines. As Recursive points out, min() accomplishes the same thing for your purposes here; the difference is that, depending on indexes and other parts of the query, it may require searching many records to find the minimum value, when in your case -- and many of my own -- all you really want is ANY match. I don't know any standard SQL way to ask that question. SQL appears to have been designed by mathematicians seeking a rigorous application of set theory, rather than practical computer geeks seeking to solve real-world problems as quickly and efficiently as possible.
I forget the actual name of this feature (a derived table), but you can join to a subquery; in that subquery, you can do as oxbow_lakes suggests and use top 1.
something like:
select * from table1
inner join (select top 1 id from table2) t2 on t2.id = table1.id
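A sketch of the derived-table idea, using SQLite (via Python) with the question's table names and made-up data. It also shows why deduplicating before the join matters: joining directly to a description table that has two rows for a code multiplies the dollars, even though the grouped MIN() hides it in the description column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Totals (Dollar REAL, Type_Code TEXT);
CREATE TABLE TypDesc (Type_Code TEXT, Type_Description TEXT);
INSERT INTO Totals VALUES (10, 'A'), (5, 'A');
INSERT INTO TypDesc VALUES ('A', 'first desc'), ('A', 'second desc');
""")

# Naive join: each Totals row matches both description rows, doubling the sum.
naive = conn.execute("""
    SELECT SUM(A.Dollar) FROM Totals A
    JOIN TypDesc B ON A.Type_Code = B.Type_Code
    GROUP BY A.Type_Code
""").fetchone()[0]                      # 30.0 instead of 15.0

# Derived table: pick one description per code first, then join.
fixed = conn.execute("""
    SELECT SUM(A.Dollar), B.Type_Description
    FROM Totals A
    JOIN (SELECT Type_Code, MIN(Type_Description) AS Type_Description
          FROM TypDesc GROUP BY Type_Code) B
      ON A.Type_Code = B.Type_Code
    GROUP BY A.Type_Code
""").fetchone()
print(naive, fixed)  # 30.0 (15.0, 'first desc')
```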