I have a table with 3 columns: cost, from_date and to_date. I have to select all the rows whose dates do not run from the beginning of the month to the end of the month; that is, select rows which do not have from_date as '1-NOV-2011' and to_date as '30-NOV-2011'. I've written 2 queries.
SELECT * FROM TABLE1 WHERE FROM_DATE <> '1-NOV-2011' OR TO_DATE <> '30-NOV-2011';
and
SELECT * FROM TABLE1 MINUS SELECT * FROM TABLE1 WHERE FROM_DATE = '1-NOV-2011' AND TO_DATE = '30-NOV-2011';
Which one will give better performance?
Clarification
First off, the two queries are not equivalent. The following sets would produce the same results:
Set 1
Query 1
SELECT * FROM TABLE1
WHERE NOT (FROM_DATE = '1-NOV-2011' AND TO_DATE = '30-NOV-2011');
Query 2
SELECT * FROM TABLE1
MINUS SELECT * FROM TABLE1
WHERE FROM_DATE = '1-NOV-2011' AND TO_DATE = '30-NOV-2011';
Set 2
Query 1
SELECT * FROM TABLE1
WHERE FROM_DATE <> '1-NOV-2011' OR TO_DATE <> '30-NOV-2011';
Query 2
SELECT * FROM TABLE1
MINUS SELECT * FROM TABLE1
WHERE FROM_DATE = '1-NOV-2011' OR TO_DATE = '30-NOV-2011';
Answer
Now to the actual answer. The prima facie answer is that the first query (for either set) will be faster, because it involves only one table access, rather than two. However, that may not be true.
It's possible that the second query will be faster. In the first, the database will need to do a full-table scan, then check each row for the disqualifying values. In the second case, it can do a full table scan without a filter to fulfill the first half of the query; for the second half, if there is an index on FROM_DATE and TO_DATE, it can use an index scan to get the disqualifying rows, then perform a set operation to remove those results from the first set.
Whether this is actually faster or not will likely depend a lot on your data. As always, the best way to determine which will be faster for your application is to perform your own benchmarks.
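For a quick comparison in SQL*Plus, something like this sketch works (AUTOTRACE needs the PLUSTRACE role set up; the ANSI DATE literals avoid depending on the session's NLS date format):

SET AUTOTRACE TRACEONLY  -- execute but show only the plan and statistics
SET TIMING ON

SELECT * FROM TABLE1
WHERE NOT (FROM_DATE = DATE '2011-11-01' AND TO_DATE = DATE '2011-11-30');

SELECT * FROM TABLE1
MINUS
SELECT * FROM TABLE1
WHERE FROM_DATE = DATE '2011-11-01' AND TO_DATE = DATE '2011-11-30';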
The 1st one is better, since it involves only a single scan and does not contain any INs or NOT INs. Go for the 1st one...
I guess the 1st version will have better performance than the 2nd version: the SELECT happens twice in the 2nd query.
The second one will definitely be slower. You're basically pulling two sets in the second one and doing a set difference. Only the smaller set can be pulled with an index (assuming you have indexes, and assuming Oracle doesn't do some magical optimization). The first query builds just one set, and it is based on indexes.
Disclaimer: That's a simplified explanation, and I know nothing of the inner workings of Oracle, just how I would expect it to work.
Related
I am not sure whether this is possible or not. I have one DB table with a field refNumber; some of this field's values contain two leading zeros. The following is an example:
id       refNumber
10001    123
10002    00456
Now I am trying to write a query which can select from this table with or without the two leading zeros (exactly two, not fewer or more). For example, refNumber=123 OR refNumber=00123 should return 10001, and refNumber=00456 OR refNumber=456 should return 10002. I cannot use the LIKE operator because other records might also be returned. Is this possible in a query? If not, what would be the right way to select such records? I want to avoid looping over all the rows in my application.
You need to apply the TRIM function to both the column and the value you want to filter by:
SELECT * FROM MyTable
WHERE TRIM(LEADING '0' FROM refNumber) = TRIM(LEADING '0' FROM '00123') -- here you put your desired ref number
Use trim():

Select * from table where trim(leading '0' from refnumber) IN ('123','456')

Or replace(), whichever is supported; note that replace() strips all zeros, not just leading ones, so it can produce false matches (e.g. '1023' also becomes '123'):

Select * from table where
replace(refnumber, '0', '') IN
('123','456')
While the currently accepted answer would work, be aware that at best it would cause Db2 to do a full index scan and at worst could result in a full table scan.
Not a particularly efficient way to return 1 or 2 records out of perhaps millions. This happens anytime you use an expression over a table column in the WHERE clause.
If you know there are only ever going to be 5 digits or fewer, a better solution would be something like the following:
SELECT * FROM MyTable
WHERE refNumber in ('00123','123')
That assumes you can build the two possibilities outside the query.
If you really want to have the query deal with the two possibilities..
SELECT * FROM MyTable
WHERE refNumber in (LPAD(:value,5,'0'),LTRIM(:value, '0'))
If '00123' or '123' is passed in as the value, the above query will find records with '00123' or '123' in refNumber.
And assuming you have an index on refNumber, it will do so quickly and efficiently.
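If an index on refNumber doesn't exist yet, it's an ordinary single-column index (the index name here is arbitrary):

create index myref_idx
on MyTable (refNumber)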
If there could be an unknown number of lead zeros, then you are stuck with
SELECT * FROM MyTable
WHERE LTRIM(refNumber,'0') = LTRIM(:value, '0')
However, if your platform/version of Db2 supports indexes over an expression, you'd want to create one for efficiency's sake:
create index myidx
on MyTable (LTRIM(refNumber, '0'))
I have the following SQL statement:
SELECT *
FROM (
SELECT eu_dupcheck AS dupcheck
, eu_date AS threshold
FROM WF_EU_EVENT_UNPROCESSED
WHERE eu_dupcheck IS NOT NULL
UNION
SELECT he_dupcheck AS dupcheck
, he_date AS threshold
FROM WF_HE_HISTORY_EVENT
WHERE he_dupcheck IS NOT NULL
)
WHERE threshold > sysdate - 30
The second table is partitioned by date but the first isn't. I need to know if the partition of the second table will be hit in this query, or will it do a full table scan?
I would be surprised if Oracle were smart enough to avoid a full table scan. Remember that UNION processes the data by removing duplicates. So, Oracle would have to recognize that:
The where clause is appropriate for the partitioning (this is actually easy).
That partitioning does not affect the duplicate removal (this is a bit harder, but true because the date is in the select).
Oracle has a smart optimizer, so perhaps it can recognize this situation (and it would probably avoid the full table scan for a UNION ALL). However, you are safer moving the condition into the subqueries:
SELECT *
FROM ((SELECT eu_dupcheck AS dupcheck, eu_date AS threshold
FROM WF_EU_EVENT_UNPROCESSED
WHERE eu_dupcheck IS NOT NULL AND eu_date > sysdate - 30
) UNION
(SELECT he_dupcheck AS dupcheck, he_date AS threshold
FROM WF_HE_HISTORY_EVENT
WHERE he_dupcheck IS NOT NULL AND he_date > sysdate - 30
)
) eh;
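Either way, you can verify whether partition pruning actually happens by inspecting the plan; a sketch for Oracle (you would look for a PARTITION RANGE step over WF_HE_HISTORY_EVENT rather than a TABLE ACCESS FULL):

EXPLAIN PLAN FOR
SELECT he_dupcheck AS dupcheck, he_date AS threshold
FROM WF_HE_HISTORY_EVENT
WHERE he_dupcheck IS NOT NULL AND he_date > sysdate - 30;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);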
I am using Oracle (Enterprise Edition 10g) and I have a query like this:
SELECT * FROM (
SELECT * FROM MyTable
ORDER BY MyColumn
) WHERE rownum <= 10;
MyColumn is indexed, however, Oracle is for some reason doing a full table scan before it cuts the first 10 rows. So for a table with 4 million records the above takes around 15 seconds.
Now consider this equivalent query:
SELECT MyTable.*
FROM
(SELECT rid
FROM
(SELECT rowid as rid
FROM MyTable
ORDER BY MyColumn
)
WHERE rownum <= 10
)
INNER JOIN MyTable
ON MyTable.rowid = rid
ORDER BY MyColumn;
Here Oracle scans the index and finds the top 10 rowids, and then uses nested loops to find the 10 records by rowid. This takes less than a second on a table with 4 million rows.
My first question is: why does the optimizer make such an apparently bad decision for the first query above?
And my second and most important question: is it possible to make the first query perform better? I have a specific need to use the first query as unmodified as possible, and I am looking for something simpler than my second query above. Thank you!
Please note that for particular reasons I am unable to use the /*+ FIRST_ROWS(n) */ hint, or the ROW_NUMBER() OVER (ORDER BY column) construct.
If this is acceptable in your case, adding a WHERE ... IS NOT NULL clause will help the optimizer to use the index instead of doing a full table scan when using an ORDER BY clause:
SELECT * FROM (
SELECT * FROM MyTable
WHERE MyColumn IS NOT NULL
-- ^^^^^^^^^^^^^^^^^^^^
ORDER BY MyColumn
) WHERE rownum <= 10;
The rationale is that Oracle does not store NULL values in the index. As your query was originally written, the optimizer decided to do a full table scan: if there were fewer than 10 non-NULL values, it would need to retrieve some "NULL rows" to "fill in" the remaining rows. Apparently it is not smart enough to check first whether the index contains enough rows...
With the added WHERE MyColumn IS NOT NULL, you inform the optimizer that you don't want, under any circumstances, any row having NULL in MyColumn. So it can blindly use the index without worrying about hypothetical rows having NULL in MyColumn.
For the same reason, declaring the ORDER BY column as NOT NULL should prevent the optimizer from doing a full table scan. So, if you can change the schema, a cleaner option would be:
ALTER TABLE MyTable MODIFY (MyColumn NOT NULL);
See http://sqlfiddle.com/#!4/e3616/1 for various comparisons (click on view execution plan)
I'm using SQL Server 2005 and ASP.NET with C#.
I have a Users table with
userId(int),
userGender(tinyint),
userAge(tinyint),
userCity(tinyint)
(simplified version of course)
For a given userId that I pass to the query, I always need to select two users of the opposite gender, in an age range of -5 to +10 years, and from the same city.
The important fact is that it always must be two, so I created a condition: if @@ROWCOUNT < 2, re-select without the age and city filters.
Now the problem is that I sometimes get two returned result sets, because I first use @@ROWCOUNT on a table when I run the query.
Will it be a problem to use the DataReader object to always read from the second result set? Is there any other way to check how many rows were selected, without performing a select that returns the results?
Can you simplify it by using SELECT TOP 2?
Update: I would perform both selects all the time, UNION the results, and then select from them based on an order (using SELECT TOP 2), as the union may have added more than two. It's important that this next select picks the rows in order of importance, i.e. it prefers rows from your first select.
Alternatively, have the reader logic read the next result-set if there is one and leave the SQL alone.
To avoid getting two separate result sets, you can do your first SELECT into a table variable and then do your @@ROWCOUNT check. If >= 2, just select from the table variable on its own; otherwise, select the results of the table variable UNION ALLed with the results of the second query.
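A rough T-SQL sketch of that approach (the @UserId variable and the exact matching predicates are assumptions based on the question):

DECLARE @UserId INT;
SET @UserId = 42;  -- the user we are matching for

DECLARE @matches TABLE
    (userId INT, userGender TINYINT, userAge TINYINT, userCity TINYINT);
DECLARE @rc INT;

-- First attempt: opposite gender, age within -5/+10 years, same city
INSERT INTO @matches
SELECT TOP 2 u2.userId, u2.userGender, u2.userAge, u2.userCity
FROM Users u1
JOIN Users u2
  ON u2.userGender <> u1.userGender
 AND u2.userAge BETWEEN u1.userAge - 5 AND u1.userAge + 10
 AND u2.userCity = u1.userCity
WHERE u1.userId = @UserId;

SET @rc = @@ROWCOUNT;

IF @rc >= 2
BEGIN
    SELECT userId, userGender, userAge, userCity FROM @matches;
END
ELSE
BEGIN
    -- Top up with gender-only matches (dedupe if repeats matter to you)
    SELECT TOP 2 userId, userGender, userAge, userCity
    FROM (SELECT 1 AS prio, userId, userGender, userAge, userCity
          FROM @matches
          UNION ALL
          SELECT 2 AS prio, u2.userId, u2.userGender, u2.userAge, u2.userCity
          FROM Users u1
          JOIN Users u2 ON u2.userGender <> u1.userGender
          WHERE u1.userId = @UserId) AS combined
    ORDER BY prio;
END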
Edit: There is a slight overhead to using table variables, so you'd need to check whether this is cheaper than Adam's suggestion of just performing the UNION as a matter of routine, by looking at the execution stats for both approaches:
SET STATISTICS IO ON
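and, if elapsed and CPU time matter as well, the companion setting:

SET STATISTICS TIME ON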
Would something along the following lines be of use...
SELECT TOP 2 *
FROM (SELECT TOP 2 1 AS prio, M2.*
      FROM my_table M1 JOIN my_table M2
        ON M1.userGender <> M2.userGender AND
           M2.userAge BETWEEN M1.userAge - 5 AND M1.userAge + 10 AND
           M1.userCity = M2.userCity
      WHERE M1.userID = @supplied_user_id
      UNION
      SELECT TOP 2 2 AS prio, M2.*
      FROM my_table M1 JOIN my_table M2
        ON M1.userGender <> M2.userGender
      WHERE M1.userID = @supplied_user_id) AS prioritised
ORDER BY prio;
I haven't tried it as I have no SQL Server and there may be dialect issues.
Simply put, I have a table with, among other things, a column for timestamps. I want to get the row with the most recent (i.e. greatest value) timestamp. Currently I'm doing this:
SELECT * FROM table ORDER BY timestamp DESC LIMIT 1
But I'd much rather do something like this:
SELECT * FROM table WHERE timestamp=max(timestamp)
However, SQLite rejects this query:
SQL error: misuse of aggregate function max()
The documentation confirms this behavior (bottom of page):
Aggregate functions may only be used in a SELECT statement.
My question is: is it possible to write a query to get the row with the greatest timestamp without ordering the select and limiting the number of returned rows to 1? This seems like it should be possible, but I guess my SQL-fu isn't up to snuff.
SELECT * from foo where timestamp = (select max(timestamp) from foo)
or, if SQLite insists on treating subselects as sets,
SELECT * from foo where timestamp in (select max(timestamp) from foo)
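If this query runs often, an index on the timestamp column lets SQLite answer the max(timestamp) subquery from the index instead of scanning the whole table; a sketch (the index name is arbitrary):

CREATE INDEX foo_timestamp_idx ON foo (timestamp)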
There are many ways to skin a cat.
If you have an identity column with auto-increment functionality, a faster query would result from returning the last record by ID, due to the indexing of that column, unless of course you wish to put an index on the timestamp column instead.
SELECT * FROM TABLE ORDER BY ID DESC LIMIT 1
I think I've answered this question 5 times in the past week now, but I'm too tired to find a link to one of those right now, so here it is again...
SELECT
*
FROM
table T1
LEFT OUTER JOIN table T2 ON
T2.timestamp > T1.timestamp
WHERE
T2.timestamp IS NULL
You're basically looking for the row where no other row matches that is later than it.
NOTE: As pointed out in the comments, this method will not perform as well in this kind of situation. It will usually work better (for SQL Server at least) in situations where you want the last row for each customer (as an example).
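For reference, a sketch of that per-group variant, using a hypothetical orders table with customer_id and ts columns:

SELECT
    O1.*
FROM
    orders O1
    LEFT OUTER JOIN orders O2 ON
        O2.customer_id = O1.customer_id
        AND O2.ts > O1.ts
WHERE
    O2.ts IS NULL

Each customer's row survives the join only if no later row exists for that customer.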
You can simply do
SELECT *, max(timestamp) FROM table
Edit:
As an aggregate function can't be used like this, it gives an error. I guess what SquareCog suggested was the best thing to do:
SELECT * FROM table WHERE timestamp = (select max(timestamp) from table)