Select nearest but not greater-than row - sql

Given a table action(start:DATE, length:NUMBER, type:NUMBER), with all records unique, I need to select (e.g.) the length of the last action of type Y before time X:
select action.length
where action.type = Y
and action.start is the biggest, but not greater than X
Proposed solution (improved):
with actionView as (select * from action where type = Y and start <= X)
select length
from actionView
where start = (select max(start) from actionView)
But this still involves two selects.
What I would like to ask: is it possible to apply some analytic, hierarchical, or other Oracle magic to this query to improve it?
(Probably something like this algorithm is what I need, but I don't know how to express it in SQL:
savedAction.time = MinimalTime
foreach action in actions
    if action.type = Y and savedAction.time < action.time <= X
        savedAction = action
return savedAction
)

Oracle has no LIMIT (PostgreSQL, MySQL) or TOP (SQL Server) clause like other RDBMSs do. But you can use ROWNUM for that:
SELECT *
FROM (
    SELECT length
    FROM action
    WHERE type = Y
    AND start <= X
    ORDER BY start DESC
)
WHERE rownum = 1;
This way, the table is queried only once.
The details are in the manual.
In reply to Dems' comment, I quote from the link above:
If you embed the ORDER BY clause in a subquery and place the ROWNUM
condition in the top-level query, then you can force the ROWNUM
condition to be applied after the ordering of the rows.
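As an aside, on Oracle 12c and later the same top-1 query can also be written without nesting, using the row-limiting clause; a minimal sketch with the question's placeholders Y and X:
SELECT length
FROM action
WHERE type = Y
AND start <= X
ORDER BY start DESC
FETCH FIRST 1 ROW ONLY;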

You can use ROW_NUMBER() to evaluate this in a single scan...
WITH sequenced_data AS
(
    SELECT
        ROW_NUMBER() OVER (PARTITION BY x ORDER BY start DESC) AS sequence_id,
        action.*
    FROM action
    WHERE type = Y
    AND start < Z
)
SELECT *
FROM sequenced_data
WHERE sequence_id = 1
You don't need the PARTITION BY, but it is used where you're getting the 'max' row per group (such as per person, or item, in your database).
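For instance, a sketch of that per-group variant, partitioning by the question's type column purely for illustration (for each type, it returns the latest row with start < Z):
WITH sequenced_data AS
(
    SELECT
        ROW_NUMBER() OVER (PARTITION BY type ORDER BY start DESC) AS sequence_id,
        action.*
    FROM action
    WHERE start < Z
)
SELECT *
FROM sequenced_data
WHERE sequence_id = 1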

I don't know about any magic, but:
with typed_action as (select length, start from action where type = Y and start <= X)
select length from typed_action
where start = (select max(start) from typed_action)
This will give you fewer WHERE clauses to evaluate and a (much?) smaller temporary typed_action table.

Related

How to speed up sql query execution?

The task is to execute this SQL query:
select * from x where user in (select user from x where id = '1')
The subquery returns about 1000 ids, so it takes a long time.
Maybe this question has been asked before, but how can I speed it up? (If possible, please answer for both PL/SQL and T-SQL, or at least one of them.)
I would start by rewriting the in condition as exists:
select *
from x
where exists (select 1 from x x1 where x1.user = x.user and x1.id = 1)
Then, consider an index on x(user, id) - or x(id, user) (you can try both and see if one offers a better improvement than the other).
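For reference, those two candidate indexes would look like this (the index names are illustrative, and user may need quoting since it is a reserved word in some databases):
create index ix_x_user_id on x (user, id);
create index ix_x_id_user on x (id, user);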
Another possibility is to use window functions:
select *
from (
    select x.*, max(case when id = 1 then 1 else 0 end) over (partition by user) as flag
    from x
) x
where flag = 1
This might, or might not, perform better than the exists solution, depending on various factors.
ids are usually unique. Is it sufficient to do this?
select x.*
from x
where id in ( . . . );
You would want an index on id, if it is not already the primary key of the table.
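A minimal sketch of that index (the name is illustrative):
create index ix_x_id on x (id);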

Is there a better way to retrieve a random row from an Oracle table?

Not so long ago I needed to fetch a random row from a table in an Oracle database. The most widespread solution that I've found was this:
SELECT * FROM
( SELECT * FROM tabela WHERE warunek
ORDER BY dbms_random.value )
WHERE rownum = 1;
However, this is very performance heavy for large tables, as it sorts the table in random order first, then grabs the first row.
Today, one of my colleagues suggested a different way:
SELECT * FROM (
SELECT * FROM MAIN_PRODUCT
WHERE ROWNUM <= CAST((SELECT COUNT(*) FROM MAIN_PRODUCT)*dbms_random.value AS INTEGER)
ORDER BY ROWNUM DESC
) WHERE ROWNUM = 1;
It works way faster and seems to return random values, but does it really? Could someone give an insight into whether it is really random and behaves as expected? I'm really curious why I haven't found this approach anywhere else while looking, and, if it is indeed random and way better performance-wise, why it isn't more widespread.
This is (possibly) the simplest query that can get the result.
But the SELECT COUNT(*) FROM MAIN_PRODUCT will do a table scan; I doubt you can write a query which avoids that.
P.S. This query assumes no deleted records.
Query
SELECT *
FROM
    MAIN_PRODUCT
WHERE
    ROWNUM = FLOOR(
        (dbms_random.value * (SELECT COUNT(*) FROM MAIN_PRODUCT)) + 1
    )
The expression
FLOOR(
    (dbms_random.value * (SELECT COUNT(*) FROM MAIN_PRODUCT)) + 1
)
will generate a number between 1 and the row count of the table; see the demo for how that works when you refresh it.
Oracle 12c+ Query
SELECT *
FROM
    MAIN_PRODUCT
WHERE
    ROWNUM <= FLOOR(
        (dbms_random.value * (SELECT COUNT(*) FROM MAIN_PRODUCT)) + 1
    )
ORDER BY
    ROWNUM DESC
FETCH FIRST ROW ONLY
The second query you have
SELECT * FROM (
SELECT * FROM MAIN_PRODUCT
WHERE ROWNUM <= CAST((SELECT COUNT(*) FROM MAIN_PRODUCT)*dbms_random.value AS INTEGER)
ORDER BY ROWNUM DESC
) WHERE ROWNUM = 1;
is excellent, except that it will get subsequent elements. dbms_random.value returns a real number between 0 and 1. Multiplying it by the number of rows will give you a really random number, and the bottleneck here is counting the number of rows rather than generating a random value for each row.
Proof
Consider a number x with
0 <= x < 1
If we multiply it by n, we get
0 <= n * x < n
which is exactly what you need if you want to load a single element. The reason this is not widespread is that in many cases the performance issues are not felt, since there are only a few thousand records.
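To make that concrete: with n = 10 rows and x = 0.437, n * x = 4.37 and FLOOR(4.37 + 1) = 5, so row 5 is selected, and every row from 1 to 10 is reachable this way.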
EDIT
If you needed k records, not just the first one, it would be slightly more difficult, but still solvable. The algorithm would be something like this (I do not have Oracle installed to test it, so I only describe the algorithm):
randomize(n, k)
    randomized <- empty_set
    while (k > 0) do
        newValue <- random(n)
        n <- n - 1
        k <- k - 1
        // find out how many elements of randomized are lower than newValue
        // increase newValue by that amount
        // find out whether newValue became larger than some values that were
        //   larger than newValue before; increase newValue by that amount
        // repeat until there is no need to increase newValue
        add newValue to randomized
    while end
randomize end
If you randomize k elements from n, then you will be able to use those values in your filter.
The key to improving performance is to lessen the load of the ORDER BY.
If you know about how many rows match the conditions, then you can filter before the sort. For instance, the following takes about 1% of the rows:
SELECT *
FROM (SELECT *
      FROM tabela
      WHERE warunek AND dbms_random.value < 0.01
      ORDER BY dbms_random.value
     )
WHERE rownum = 1;
A variation is to calculate the number of matching values. Then randomly select a smaller sample. The following gets about 100 matching rows and then sorts them for the random selection:
SELECT a.*
FROM (SELECT *
      FROM (SELECT a.*, COUNT(*) OVER () as cnt
            FROM tabela a
            WHERE warunek
           ) a
      WHERE dbms_random.value < 100 / cnt
      ORDER BY dbms_random.value
     ) a
WHERE rownum = 1;

MariaDB - GROUP BY with an order

So I have a dataset that I would like to order based on strings, ORDER BY FIELD(field_name, ...); after that ordering, I want to group the dataset based on another column.
I have tried with a subquery, but it seems like it ignores my ORDER BY when it gets subqueried.
This is the query I would like to group with GROUP BY setting_id
SELECT *
FROM `setting_values`
WHERE ((`owned_by_type` = 'App\\Models\\Utecca\\User' AND `owned_by_id` = 1 OR ((`owned_by_type` = 'App\\Models\\Utecca\\Agreement' AND `owned_by_id` = 1006))) OR (`owned_by_type` = 'App\\Models\\Utecca\\Employee' AND `owned_by_id` = 1)) AND `setting_values`.`deleted_at` IS NULL
ORDER BY FIELD(owned_by_type, 'App\\Models\\Utecca\\Employee', 'App\\Models\\Utecca\\Agreement', 'App\\Models\\Utecca\\User')
The ORDER BY works just fine, but I cannot get it to group based on my order; it always selects the row with the lowest primary key (id).
Here is my attempt, which did not work.
SELECT * FROM (
SELECT *
FROM `setting_values`
WHERE ((`owned_by_type` = 'App\\Models\\Utecca\\User' AND `owned_by_id` = 1 OR ((`owned_by_type` = 'App\\Models\\Utecca\\Agreement' AND `owned_by_id` = 1006))) OR (`owned_by_type` = 'App\\Models\\Utecca\\Employee' AND `owned_by_id` = 1)) AND `setting_values`.`deleted_at` IS NULL
ORDER BY FIELD(owned_by_type, 'App\\Models\\Utecca\\Employee', 'App\\Models\\Utecca\\Agreement', 'App\\Models\\Utecca\\User')
) AS t
GROUP BY setting_id;
Here is some sample data
What I am trying to accomplish with this sample data is one row, with id 3 as that row.
The desired result set from the query should obey these rules:
1 row for each setting_id
owned_by_type together with owned_by_id is filtered the following way: Agreement = 1006, User = 1, Employee = 1.
When limiting to 1 row for each setting_id, it should be done with the following priority on the owned_by_type column: Employee, Agreement, User.
Here is a SQLFiddle with it.
Running MariaDB version 10.2.6-MariaDB
First of all, the Optimizer is free to ignore the inner ORDER BY. So, please describe further what your intent is.
Getting past that, you can use a subquery:
SELECT ...
FROM ( SELECT
...
GROUP BY ...
ORDER BY ... -- This is lost unless followed by :
LIMIT 9999999999 -- something valid; or very high (for all)
) AS x
GROUP BY ...
Perhaps you are doing a groupwise max?
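If so, since MariaDB 10.2 ships window functions, one possible sketch (untested; sv, t, and rn are illustrative aliases, and the filter is copied from the question):
SELECT *
FROM (
    SELECT sv.*,
           ROW_NUMBER() OVER (
               PARTITION BY setting_id
               ORDER BY FIELD(owned_by_type,
                              'App\\Models\\Utecca\\Employee',
                              'App\\Models\\Utecca\\Agreement',
                              'App\\Models\\Utecca\\User')
           ) AS rn
    FROM setting_values sv
    WHERE ((owned_by_type = 'App\\Models\\Utecca\\User' AND owned_by_id = 1)
        OR (owned_by_type = 'App\\Models\\Utecca\\Agreement' AND owned_by_id = 1006)
        OR (owned_by_type = 'App\\Models\\Utecca\\Employee' AND owned_by_id = 1))
      AND deleted_at IS NULL
) t
WHERE rn = 1;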

Returning the lowest integer not in a list in SQL

Suppose you have a table T(A) with only positive integers allowed, like:
1,1,2,3,4,5,6,7,8,9,11,12,13,14,15,16,17,18
In the above example, the result is 10. We can always use ORDER BY and DISTINCT to sort and remove duplicates. However, to find the lowest integer not in the list, I came up with the following SQL query:
select list.x + 1
from (select x from (select distinct a as x from T order by a)) as list, T
where list.x + 1 not in T limit 1;
My idea was to start a counter at 1 and check whether that counter is in the list: if it is not, return it; otherwise increment it and look again. However, I have to start that counter at 1 and then increment. That query works in most cases, but there are corner cases, such as when 1 itself is missing. How can I accomplish this in SQL, or should I go in a completely different direction to solve this problem?
Because SQL works on sets, the intermediate SELECT DISTINCT a AS x FROM t ORDER BY a is redundant.
The basic technique of looking for a gap in a column of integers is to find where the current entry plus 1 does not exist. This requires a self-join of some sort.
Your query is not far off, but I think it can be simplified to:
SELECT MIN(a) + 1
FROM t
WHERE a + 1 NOT IN (SELECT a FROM t)
The NOT IN acts as a sort of self-join. This won't produce anything from an empty table, but should be OK otherwise.
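With the question's sample data, the rows where a + 1 is not in T are a = 9 (since 10 is absent) and a = 18, so MIN(a) + 1 = 10, as desired.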
SQL Fiddle
select min(y.a) as a
from t x
right join (
    select a + 1 as a from t
    union
    select 1
) y on y.a = x.a
where x.a is null
It will work even on an empty table.
SELECT min(t.a) - 1
FROM t
LEFT JOIN t t1 ON t1.a = t.a - 1
WHERE t1.a IS NULL
AND t.a > 1; -- exclude 0
This finds the smallest number greater than 1, where the next-smaller number is not in the same table. That missing number is returned.
This works even for a missing 1. There are multiple answers checking in the opposite direction. All of them would fail with a missing 1.
SQL Fiddle.
You can do the following, although you may also want to define a range - in which case you might need a couple of UNIONs
SELECT x.id + 1
FROM my_table x
LEFT JOIN my_table y
  ON x.id + 1 = y.id
WHERE y.id IS NULL
ORDER BY x.id
LIMIT 1;
You can always create a table with all of the numbers from 1 to X and then join that table with the table you are comparing. Then just find the TOP value in your SELECT statement that isn't present in the table you are comparing:
SELECT TOP 1 table_with_all_numbers.number, table_with_missing_numbers.number
FROM table_with_all_numbers
LEFT JOIN table_with_missing_numbers
ON table_with_missing_numbers.number = table_with_all_numbers.number
WHERE table_with_missing_numbers.number IS NULL
ORDER BY table_with_all_numbers.number ASC;
In SQLite 3.8.3 or later, you can use a recursive common table expression to create a counter.
Here, we stop counting when we find a value not in the table:
WITH RECURSIVE counter(c) AS (
    SELECT 1
    UNION ALL
    SELECT c + 1 FROM counter WHERE c IN t
)
SELECT max(c) FROM counter;
(This works for an empty table or a missing 1.)
This query ranks (starting from rank 1) each distinct number in ascending order and selects the lowest rank that's less than its number. If no rank is lower than its number (i.e. there are no gaps in the table) the query returns the max number + 1.
select coalesce(min(number), 1) from (
    select min(cnt) number
    from (
        select
            number,
            (select count(*) from (select distinct number from numbers) b where b.number <= a.number) as cnt
        from (select distinct number from numbers) a
    ) t1 where number > cnt
    union
    select max(number) + 1 number from numbers
) t1
http://sqlfiddle.com/#!7/720cc/3
Just another method, using EXCEPT this time:
SELECT a + 1 AS missing FROM T
EXCEPT
SELECT a FROM T
ORDER BY missing
LIMIT 1;
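Like the other answers that check in the ascending direction, this fails when 1 itself is missing; one possible tweak (not part of the original answer) is to seed 1 into the candidate list:
SELECT a + 1 AS missing FROM T
UNION
SELECT 1
EXCEPT
SELECT a FROM T
ORDER BY missing
LIMIT 1;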

T-SQL to find length of time a particular value is in-range

I have a table in SQL Server where, for each row r at time t, I would like to find the first t + i at which abs(f(r, t + i) - f(r, t)) > epsilon, for some function f of the row.
I can imagine doing this with two cursors, but this seems highly inefficient.
Any of the T-SQL gurus out there have any advice?
select a.t, x.t -- and whatever other columns you need
from tbl a -- implicitly each row of the table
cross apply (
    select top(1) * -- the first, going upwards along b.t
    from tbl b
    where a.t < b.t -- look for records with later t than the source row
      and <here's your function between row a and row b>
    order by b.t asc
) x
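To make that concrete, a sketch assuming a hypothetical column val that stores f(r, t), with 0.5 standing in for epsilon:
select a.t as t_start, x.t as t_end
from tbl a
cross apply (
    select top(1) b.t
    from tbl b
    where b.t > a.t
      and abs(b.val - a.val) > 0.5
    order by b.t asc
) x;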
I'm not a big fan of correlated subqueries. But, it seems useful in this case. The following code returns the minimum "sequence number" of the first row after the given row subject to your condition:
with t as (
    select t.*, f(r, t) as fval,
           row_number() over (order by <ordering>) as seqnum
    from table t
)
select t.*,
       (select min(t2.seqnum)
        from t t2
        where t2.seqnum > t.seqnum and
              abs(t2.fval - t.fval) > <epsilon>
       ) as next_seqnum
from t
To make this work, you need to specify <ordering> and <epsilon>. <ordering> is how you know the order of the rows (t would be a good guess, if I had to guess).
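Filled in with the same hypothetical val column, ordering by t, and a threshold of 0.5, the sketch becomes:
with t as (
    select d.*, d.val as fval,
           row_number() over (order by d.t) as seqnum
    from tbl d
)
select t.*,
       (select min(t2.seqnum)
        from t t2
        where t2.seqnum > t.seqnum and
              abs(t2.fval - t.fval) > 0.5
       ) as next_seqnum
from t;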