Query to compare 3 different data columns with a reference value - sql

I need to check whether a table (in an Oracle DB) contains entries that were updated after a certain date. "Updated" in this case means any of 3 columns (DateCreated, DateModified, DateDeleted) have a value greater than the reference.
The query I have come up with so far is this:
select * from myTable
where DateCreated > :reference_date
or DateModified > :reference_date
or DateDeleted > :reference_date
;
This works and gives desired results, but is not what I want, because I would like to enter the value for :reference_date only once.
Any ideas on how I could write a more elegant query ?

While what you have looks fine and only uses one bind variable, if for some reason you have positional rather than named binds, then you could avoid the need to supply the bind value multiple times by using an inline view or a CTE:
with cte as (select :reference_date as reference_date from dual)
select myTable.*
from cte
join myTable
on myTable.DateCreated > cte.reference_date
or myTable.DateModified > cte.reference_date
or myTable.DateDeleted > cte.reference_date
;
But again I wouldn't consider that better than your original unless you have a really compelling reason and a problem supplying the bind value. Having to set it three times from a calling program probably wouldn't count as compelling, for example, for me anyway. And I'd check it didn't affect performance before deploying - I'd expect Oracle to optimise something like this but the execution plan might be interesting.

I suppose you could rewrite that as:
select * from myTable
where greatest(DateCreated, DateModified, DateDeleted) > :reference_date;
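if you absolutely had to, but I wouldn't. Your original query is, IMHO, much easier to understand than this one; plus, by using a function, you lose any chance of using an index, should one exist (unless you have a function-based index on the new expression). Also note that in Oracle greatest() returns NULL if any of its arguments is NULL, so rows with a NULL DateModified or DateDeleted would be filtered out unless you wrap those columns in NVL or COALESCE.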
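To compare the two forms on data with NULLs, here is a minimal sketch using Python's sqlite3 module as a stand-in for Oracle (an assumption made purely so the example is self-contained). SQLite has no greatest(), so its multi-argument max() is used instead; like Oracle's greatest(), it yields NULL when any argument is NULL. The sample rows and the id column are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (id INTEGER PRIMARY KEY, "
    "DateCreated TEXT, DateModified TEXT, DateDeleted TEXT)"
)
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?, ?)",
    [
        (1, "2020-01-01", "2020-06-01", None),          # modified after the reference, never deleted
        (2, "2019-01-01", "2019-02-01", None),          # untouched since before the reference
        (3, "2018-01-01", "2018-02-01", "2020-07-01"),  # deleted after the reference
    ],
)

# The named bind appears three times in the SQL but is supplied only once.
params = {"reference_date": "2020-03-01"}

or_ids = [row[0] for row in conn.execute(
    "SELECT id FROM myTable "
    "WHERE DateCreated > :reference_date "
    "   OR DateModified > :reference_date "
    "   OR DateDeleted > :reference_date "
    "ORDER BY id", params)]

# Multi-argument max() plays the role of greatest() in this sketch.
max_ids = [row[0] for row in conn.execute(
    "SELECT id FROM myTable "
    "WHERE max(DateCreated, DateModified, DateDeleted) > :reference_date "
    "ORDER BY id", params)]

print(or_ids)   # [1, 3]
print(max_ids)  # [3]
```

Row 1 is lost by the function-based form because its NULL DateDeleted turns the whole comparison into NULL, which is one more reason to prefer the plain OR version.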

Related

Postgres all subqueries in coalesce executed

COALESCE in Postgres is a function that returns its first non-null argument.
So I used coalesce in subqueries like:
SELECT COALESCE (
( SELECT * FROM users WHERE... ORDER BY ...),
( SELECT * FROM users WHERE... ORDER BY ...),
( SELECT * FROM users WHERE... ORDER BY ...),
( SELECT * FROM users WHERE... ORDER BY ...)
);
The WHERE clause differs in each query; they contain lots of parameters and CASE expressions, as well as different ORDER BY clauses.
This is because I always want to return something, but in order of priority.
What I noticed while running EXPLAIN ANALYZE is that every query seems to be executed even though the first one returns a non-null value.
I would expect the engine to run only the first query and skip the following ones if it returns a non-null value.
As it stands, performance could be bad.
So am I following a bad practice, and is it better to run the queries separately for performance reasons?
EDIT:
Sorry, you were right: I don’t select * but only one column. I didn’t post my code because I am not interested in my particular query; this is a generic question to understand how the engine works. So I reproduced a very simple fiddle here: http://sqlfiddle.com/#!17/a8aa7/4
I may be wrong, but I think it behaves as I described: it runs all the subqueries even though the first one already returns a non-null value.
EDIT 2: OK, I only now noticed that the plan says "never executed". So the other two queries aren’t getting executed. What confused me was the fact that they were included in the query plan.
Anyway, my question still stands. Is it better to run all the queries separately for performance reasons? It seems that even if the first one returns a non-null value, the other two subqueries can slow things down.
For separate SELECT queries, I suggest using UNION ALL / LIMIT 1 instead. Based on your fiddle:
(select user_id from users order by age limit 1) -- misleading example, see below
UNION ALL
(select user_id from users where user_id=1)
UNION ALL
(select user_id from users order by user_id DESC limit 1)
LIMIT 1;
For three reasons:
1. Works for any SELECT list: single expressions (your fiddle), multiple or whole row (your example in the question).
2. You can distinguish actual NULL values from "no row". Since user_id is the PK in the example (and hence, NOT NULL), the problem cannot surface in the example. But with an expression that can be NULL, COALESCE cannot distinguish between both: "no row" is coerced to NULL for the purpose of the query. See: Return a value if no record is found
3. Faster.
Aside, your first SELECT in the example makes this a wild-goose chase. It returns a row if there is at least one. The rest is noise in this case.
Related:
PostgreSQL combine multiple select statements
SQL - does order of OR conditions matter?
Way to try multiple SELECTs till a result is available?
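The difference between a NULL value and "no row" mentioned in the answer can be sketched as follows. This uses Python's sqlite3 module rather than Postgres (an assumption, to keep the example self-contained), a made-up nullable nickname column, and an explicit prio column with ORDER BY instead of relying on the order of the UNION ALL branches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, nickname TEXT)")
# User 1 exists but has a NULL nickname; user 2 has a real value.
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, None), (2, "bob")])

coalesce_sql = """
    SELECT COALESCE(
        (SELECT nickname FROM users WHERE user_id = :first),
        (SELECT nickname FROM users WHERE user_id = :second)
    )
"""
union_sql = """
    SELECT nickname FROM (
        SELECT 1 AS prio, nickname FROM users WHERE user_id = :first
        UNION ALL
        SELECT 2 AS prio, nickname FROM users WHERE user_id = :second
    )
    ORDER BY prio
    LIMIT 1
"""

found = {"first": 1, "second": 2}
missing = {"first": 98, "second": 99}

coalesce_found = conn.execute(coalesce_sql, found).fetchone()
union_found = conn.execute(union_sql, found).fetchone()
coalesce_missing = conn.execute(coalesce_sql, missing).fetchone()
union_missing = conn.execute(union_sql, missing).fetchone()

print(coalesce_found)    # ('bob',)  the NULL from user 1 is skipped
print(union_found)       # (None,)   user 1's row wins, NULL and all
print(coalesce_missing)  # (None,)   looks just like a NULL value
print(union_missing)     # None      no row at all
```

With COALESCE, a found row holding NULL and a missing row are indistinguishable; the UNION ALL form keeps them apart.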

SQL - delete rows where only one column changes

I have a large table in SQL, in which an effective_from date column should update every time one of the other columns changes. However, for some reason, there are numerous rows in which the effective_from date changes, but no other values have changed. For example:
CODE NAME EFFECTIVE_FROM
CCWA Oak 1999
CCWA Willow 2001
CCWA Willow 2004
How can I delete the rows where the change in effective_from date doesn't provide any information, e.g. the third row in the above table?
The tables are very large, so I would prefer to use SELECT statements rather than DELETE or ALTER which seem to be slow.
Any help much appreciated!
I believe you are looking for:
SELECT Code, Name, MAX(EFFECTIVE_FROM)
FROM myTable
GROUP BY Code, Name
Since it is the later date that adds no information, you want to select the minimum date value.
SELECT Code, Name, MIN(EFFECTIVE_FROM)
FROM CodeTable
GROUP BY Code, Name
try this:
SELECT code, name, max(EFFECTIVE_FROM)
FROM tablename
GROUP BY code, name
You want to use lag(). The result set without duplicates:
select t.*
from (select t.*,
             lag(code) over (order by effective_from) as prev_code,
             lag(name) over (order by effective_from) as prev_name
      from t
     ) t
where prev_code is null      -- first row: nothing to compare against
   or prev_code <> code
   or prev_name <> name;
This assumes that code and name are never NULL. That is easy to incorporate in the logic (but it makes the where clause a bit complicated).
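As a rough check of the lag() idea against the sample data, here is a sketch using Python's sqlite3 module (window functions need SQLite 3.25 or later). It is a variation rather than the exact query above: it partitions by code and compares only name, and it adds a hypothetical fourth row where the name reverts to Oak, to show that a genuine change back to an earlier value is kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (code TEXT NOT NULL, name TEXT NOT NULL, "
    "effective_from INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("CCWA", "Oak", 1999),
        ("CCWA", "Willow", 2001),
        ("CCWA", "Willow", 2004),  # only the date changed: carries no information
        ("CCWA", "Oak", 2006),     # hypothetical extra row: a real change back to Oak
    ],
)

rows = conn.execute("""
    SELECT code, name, effective_from
    FROM (
        SELECT code, name, effective_from,
               LAG(name) OVER (PARTITION BY code ORDER BY effective_from) AS prev_name
        FROM t
    )
    WHERE prev_name IS NULL      -- first row for this code
       OR prev_name <> name      -- name differs from the previous row
    ORDER BY code, effective_from
""").fetchall()

for row in rows:
    print(row)
# ('CCWA', 'Oak', 1999)
# ('CCWA', 'Willow', 2001)
# ('CCWA', 'Oak', 2006)
```

A plain GROUP BY code, name would collapse the two Oak periods into one, which is why the window-function approach is the safer choice when values can revert.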
Your question does not clarify the result you actually want to achieve: if you want to permanently delete rows from the table, you need to use a DELETE; if your goal is simply to filter out the duplicates you described, you can use a SELECT (and the rows will remain in the table).
The fact that you are considering a DELETE makes me suppose that these "duplicates" (identical except for the date) are not desirable.
In this case you can also consider adding a trigger that prevents the insertion when the informative fields (all fields except EFFECTIVE_FROM) haven't changed; this way only meaningful data changes will generate a new row.
Then you can run a one-off operation which deletes all the duplicated rows that do not reflect any data change (do this at night, or whenever the system has a low load or no one is using it, if the table is really as large as you say).
This kind of solution changes the nature of the table: you lose the historical information about updates that carried no real data changes. Consider this solution only if that information isn't necessary for your purposes.

Using Order By in IN condition

Let's say we have this silly query:
Select * From Emp Where Id In (Select Id From Emp)
Let's do a small modification inside IN condition by adding an Order By clause.
Select * From Emp Where Id In (Select Id From Emp Order By Id)
Then we are getting the following error:
ORA-00907: missing right parenthesis
Oracle assumes that the IN condition ends after the From table is declared. As a result, it waits for a right parenthesis, but we give it an Order By instead.
Why can't we add an Order By inside the IN condition?
FYI: I'm not asking for an equivalent query. This is just an example, after all. I just can't understand why this error occurs.
Consider the condition x in (select something from somewhere). It returns true if x is equal to any of the somethings returned from the query - regardless of whether it's the first, second, last, or anything in the middle. The order in which the somethings are returned is inconsequential. Adding an order by clause to a query often comes with a hefty performance hit, so I'm guessing this limitation was introduced to prevent adding a clause that has no impact on the query's correctness on the one hand and may be quite expensive on the other.
It would not make sense to sort the values inside the IN clause. Think of it this way:
IN (a, b, c)
is the same as
IN (c, b, a)
is the same as
IN (b, c, a)
Internally Oracle applies OR condition, so it is equivalent to:
WHERE id = a OR id = b OR id = c
Would it make any sense to order the conditions?
Ordering comes at its own expense. So, when there is no need to sort, just don't do it. And in this case, Oracle applies the same rule.
When it comes to query performance, the optimizer needs to choose the best possible execution plan, i.e. the one with the least cost to achieve the desired output. ORDER BY is useless in this case, and Oracle does a good job of preventing its use at all.
From the documentation:
Type of Condition   Operation
-----------------   ---------
IN                  Equal-to-any-member-of test. Equivalent to =ANY.
So, when you need to test a value for membership in a list of values, there is no need for an ordered list; a match in any position does the job.
If you look at the Oracle SQL reference (syntax diagrams) you will find the reason. ORDER BY is part of the "select" statement, while the IN clause uses the lower-level "subquery" construct. Your problem relates to the nature of Oracle's SQL grammar.
PS: there might be more gotchas, such as multiple UNION or MINUS operations on "subqueries": there, too, you can use ONLY one ORDER BY clause, as it applies to the result of the whole UNION operation.
This will fail too:
select * from dual order by 1
union all
select * from dual order by 1;
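The single-ORDER-BY rule for compound queries is not specific to Oracle. As a hedged illustration, the sketch below uses Python's sqlite3 module (SQLite standing in for Oracle is an assumption; the exact error message differs between engines): an ORDER BY before UNION ALL is rejected, while a single trailing ORDER BY sorts the combined result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# ORDER BY on a branch that is followed by UNION ALL: rejected.
bad_sql = "SELECT 1 AS n ORDER BY 1 UNION ALL SELECT 2 ORDER BY 1"
# One ORDER BY at the very end applies to the whole compound result.
good_sql = "SELECT 2 AS n UNION ALL SELECT 1 ORDER BY 1"

try:
    conn.execute(bad_sql)
    bad_rejected = False
except sqlite3.Error:
    bad_rejected = True

good_rows = conn.execute(good_sql).fetchall()

print(bad_rejected)  # True
print(good_rows)     # [(1,), (2,)]
```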

SQL Query: Which one should I use? count("columnname") or count(1)

In my SQL query I just need to check whether data exists for a particular userid.
I always only want one row that will be returned when data exist.
I have two options
1. select count(columnname) from table where userid=:userid
2. select count(1) from tablename where userid=:userid
I am thinking the second one is the one I should use because it may have a better response time than the first.
There can be differences between count(*) and count(column). count(*) is often fastest for reasons discussed here. Basically, with count(column) the database has to check whether column is null in each row. With count(*) it just returns the total number of rows in the table, which it probably has on hand. The exact details may depend on the database and the version of the database.
Short answer: use count(*) or count(1). Hell, forget the count and select userid.
You should also make sure the where clause is performing well and that it's using an index. Look into EXPLAIN.
I'd like to point out that this:
select count(*) from tablename where userid=:userid
has the same effect as your second solution, with the advantage that count(*) unambiguously means "count all rows".
The * in COUNT(*) will not expand into all columns - that is to say, the * in SELECT COUNT(*) is not the same as in SELECT *. So you need not worry about performance when writing COUNT(*)
The disadvantage of writing COUNT(1) is that it is less clear: what did you mean? A literal one (1) may look like a lower case L (this: l) in some fonts.
The two will give different results if columnname can be NULL; otherwise performance is identical.
The optimiser (SQL Server at least) realises COUNT(1) is trivial. You can also use COUNT(1/0)
It depends what you want to do.
The first one counts rows with non-null values of columnname. The second one counts ALL rows.
Which behaviour do you want? From the way your question is worded, I guess that you want the second one.
To count the number of records you should use the second option, or rather:
select count(*) from tablename where userid=:userid
You could also use the exists() function:
select case when exists(select * from tablename where userid=:userid) then 1 else 0 end
It might be possible for the database to do the latter more efficiently in some cases, as it can stop looking as soon as a match is found instead of comparing all records.
Hey how about Select count(userid) from tablename where userid=:userid ? That way the query looks more friendly.
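The differences discussed in these answers can be seen on a tiny data set. This is a minimal sketch using Python's sqlite3 module (an assumption; the question does not name a database), with made-up rows in which one columnname value is NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (userid INTEGER, columnname TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [(7, "a"), (7, None), (8, "b")],  # user 7 has one NULL columnname
)

def scalar(sql, userid):
    """Run a single-value query with the :userid bind and return that value."""
    return conn.execute(sql, {"userid": userid}).fetchone()[0]

exists_sql = (
    "SELECT CASE WHEN EXISTS "
    "(SELECT 1 FROM tablename WHERE userid = :userid) THEN 1 ELSE 0 END"
)

count_column = scalar("SELECT count(columnname) FROM tablename WHERE userid = :userid", 7)
count_star = scalar("SELECT count(*) FROM tablename WHERE userid = :userid", 7)
count_one = scalar("SELECT count(1) FROM tablename WHERE userid = :userid", 7)
exists_hit = scalar(exists_sql, 7)
exists_miss = scalar(exists_sql, 99)

print(count_column)  # 1  (NULLs are not counted)
print(count_star)    # 2
print(count_one)     # 2
print(exists_hit)    # 1
print(exists_miss)   # 0
```

For a pure existence check, the EXISTS form always returns exactly one row containing 1 or 0, which matches the "one row" requirement in the question.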

SQL Server UNION - What is the default ORDER BY Behaviour

If I have a few UNION Statements as a contrived example:
SELECT * FROM xxx WHERE z = 1
UNION
SELECT * FROM xxx WHERE z = 2
UNION
SELECT * FROM xxx WHERE z = 3
What is the default order by behaviour?
The test data I'm seeing essentially does not return the data in the order that is specified above. I.e. the data is ordered, but I wanted to know what are the rules of precedence on this.
Another thing is that in this case xxx is a View. The view joins 3 different tables together to return the results I want.
There is no default order.
Without an Order By clause the order returned is undefined. That means SQL Server can bring them back in any order it likes.
EDIT:
Based on what I have seen, without an Order By, the order that the results come back in depends on the query plan. So if there is an index that it is using, the result may come back in that order but again there is no guarantee.
In regards to adding an ORDER BY clause:
This is probably elementary to most here, but I thought I'd add this.
Sometimes you don't want the results mixed, so you want the first query's results then the second and so on. To do that I just add a dummy first column and order by that. Because of possible issues with forgetting to alias a column in unions, I usually use ordinals in the order by clause, not column names.
For example:
SELECT 1, * FROM xxx WHERE z = 'abc'
UNION ALL
SELECT 2, * FROM xxx WHERE z = 'def'
UNION ALL
SELECT 3, * FROM xxx WHERE z = 'ghi'
ORDER BY 1
The dummy ordinal column is also useful for times when I'm going to run two queries and I know only one is going to return any results. Then I can just check the ordinal of the returned results. This saves me from having to do multiple database calls and most empty resultset checking.
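A small sketch of the dummy-ordinal technique, using Python's sqlite3 module in place of SQL Server (an assumption) and a made-up table xxx with columns z and v. The rows are inserted in a scrambled order, and a second ordinal is added to the ORDER BY so the order within each branch is also deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xxx (z TEXT, v INTEGER)")
conn.executemany(
    "INSERT INTO xxx VALUES (?, ?)",
    [("def", 20), ("abc", 10), ("ghi", 30), ("abc", 11)],
)

# Result columns are (ordinal, z, v); ORDER BY 1, 3 sorts by the dummy
# ordinal first and by v within each branch.
rows = conn.execute("""
    SELECT 1, * FROM xxx WHERE z = 'ghi'
    UNION ALL
    SELECT 2, * FROM xxx WHERE z = 'abc'
    UNION ALL
    SELECT 3, * FROM xxx WHERE z = 'def'
    ORDER BY 1, 3
""").fetchall()

for row in rows:
    print(row)
# (1, 'ghi', 30)
# (2, 'abc', 10)
# (2, 'abc', 11)
# (3, 'def', 20)
```

The first column also tells the caller which branch each row came from.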
Just found the actual answer.
Because UNION removes duplicates it does a DISTINCT SORT. This is done before all the UNION statements are concatenated (check out the execution plan).
To stop a sort, do a UNION ALL and this will also not remove duplicates.
If you care what order the records are returned, you MUST use an order by.
If you leave it out, it may appear organized (based on the indexes chosen by the query plan), but the results you see today may NOT be the results you expect, and it could even change when the same query is run tomorrow.
Edit: Some good, specific examples: (all examples are MS SQL server)
Pinal Dave's blog describes how two very similar queries can show a different apparent order, because different indexes are used:
SELECT ContactID FROM Person.Contact
SELECT * FROM Person.Contact
Conor Cunningham shows how the apparent order can change when the table gets larger (if the query optimizer decides to use a parallel execution plan).
Hugo Kornelis proves that the apparent order is not always based on primary key. Here is his follow-up post with explanation.
A UNION can be deceptive with respect to result set ordering because a database will sometimes use a sort method to provide the DISTINCT that is implicit in UNION , which makes it look like the rows are deliberately ordered -- this doesn't apply to UNION ALL for which there is no implicit distinct, of course.
However there are algorithms for the implicit distinct, such as Oracle's hash method in 10g+, for which no ordering will be applied.
As DJ says, always use an ORDER BY.
It's very common to come across poorly written code that assumes table data is returned in insert order. 95% of the time the coder gets away with it and is never aware that this is a problem, because on many common databases (MSSQL, Oracle, MySQL) the rows often do happen to come back in that order. It is of course a complete fallacy and should always be corrected when you come across it; always, without exception, use an Order By clause yourself.