Optimizing a simple SQLite query, if possible

I would like to optimize this query using SQLite 3.
SELECT id FROM Table WHERE value = (SELECT max(value) FROM Table WHERE value < myvalue )
UNION
SELECT id FROM Table WHERE value = (SELECT min(value) FROM Table WHERE value > myvalue );
I want the two ids closest to a given value. Example: id 20 has value 50. The closest ids could be 3, with value 48 (the largest smaller value), and 4, with value 55 (the smallest larger value).
SQLite 3 doesn't have all the features of a full database server; if you have something better I can use, thanks!

SELECT
(SELECT id FROM test WHERE value < myvalue ORDER BY value DESC LIMIT 1) as below,
(SELECT id FROM test WHERE value > myvalue ORDER BY value ASC LIMIT 1) as above;
Theoretically speaking, this should be faster because it uses two table scans instead of four.
Anyway, I would create a table with a few million records and test different queries with the timer on (.timer ON in the sqlite console).
Also make sure to test with and without an index on value. Sometimes, especially when the index is bigger than your memory, indexes are useless.
If speed is the real issue, consider an alternative lightweight storage engine, such as Kyoto Cabinet.
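To try the timing advice without building a million-row table first, here is a minimal Python sketch using the standard sqlite3 module. It assumes a small made-up table test(id, value), loosely based on the question's example, with an index on value (the index name is illustrative), and checks that the question's UNION query and the scalar-subquery query find the same neighbours of 50. For real timings you would load far more rows and compare them with .timer ON in the sqlite3 console.

```python
import sqlite3

# Made-up sample data, loosely following the question (id 3 -> 48, id 20 -> 50, id 4 -> 55).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany(
    "INSERT INTO test (id, value) VALUES (?, ?)",
    [(1, 10), (3, 48), (20, 50), (4, 55), (7, 90)],
)
conn.execute("CREATE INDEX idx_test_value ON test(value)")

# The question's form: MAX/MIN subqueries combined with UNION.
UNION_QUERY = """
SELECT id FROM test WHERE value = (SELECT max(value) FROM test WHERE value < :v)
UNION
SELECT id FROM test WHERE value = (SELECT min(value) FROM test WHERE value > :v)
"""

# The answer's form: two scalar subqueries, each an ORDER BY ... LIMIT 1 lookup.
SCALAR_QUERY = """
SELECT
  (SELECT id FROM test WHERE value < :v ORDER BY value DESC LIMIT 1) AS below,
  (SELECT id FROM test WHERE value > :v ORDER BY value ASC LIMIT 1) AS above
"""


def neighbours_union(v):
    # UNION yields an unordered set of ids.
    return {row[0] for row in conn.execute(UNION_QUERY, {"v": v})}


def neighbours_scalar(v):
    # Exactly one row: (below_id, above_id); a side with no match is None.
    return conn.execute(SCALAR_QUERY, {"v": v}).fetchone()


below, above = neighbours_scalar(50)
print(below, above)            # 3 4
print(neighbours_union(50))    # {3, 4}
```

One design difference: the UNION form returns a set of ids, so it can return more than two when several rows share the nearest value, and it does not say which id is below and which is above. The scalar form always returns exactly one row with separate below and above columns.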

Here's another way to do it. I don't know if it's faster in sqlite though. You can always try.
select id from (
select id
from table
where value - myvalue > 0
order by abs(value - myvalue) asc
limit 1
)
union all
select id from (
select id
from table
where value - myvalue < 0
order by abs(value - myvalue) asc -- nearest smaller value = smallest distance
limit 1
)
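A minimal sketch of the abs() idea against a made-up table test(id, value), using Python's sqlite3 module. It assumes two adjustments for SQLite: both halves sort by distance ascending (the closest smaller value also has the smallest distance), and each half is wrapped in a subquery, because SQLite only allows ORDER BY and LIMIT on the final member of a compound SELECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany(
    "INSERT INTO test (id, value) VALUES (?, ?)",
    [(1, 10), (3, 48), (20, 50), (4, 55), (7, 90)],
)

# Each half is wrapped in a subquery so it can carry its own ORDER BY / LIMIT.
# Both halves sort by distance ASCENDING, so each picks the nearest value on its side.
ABS_QUERY = """
SELECT id FROM (
  SELECT id FROM test
  WHERE value - :v > 0
  ORDER BY abs(value - :v) ASC
  LIMIT 1
)
UNION ALL
SELECT id FROM (
  SELECT id FROM test
  WHERE value - :v < 0
  ORDER BY abs(value - :v) ASC
  LIMIT 1
)
"""


def closest_ids(v):
    # Row order of a UNION ALL is not guaranteed, so return a sorted list.
    return sorted(row[0] for row in conn.execute(ABS_QUERY, {"v": v}))


print(closest_ids(50))  # [3, 4]
```

Because the filter and the sort are on expressions rather than on value itself, an index on value cannot serve them, so SQLite has to scan the table for this variant; measure before preferring it.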

SELECT id FROM Table WHERE value > myvalue ORDER BY value LIMIT 1
SELECT id FROM Table WHERE value < myvalue ORDER BY value DESC LIMIT 1
This solution has no sub-selects, no table scans, and no extraneous grouping or math functions, but it needs two queries.
You should index Table.value.
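To check that the index is actually used, EXPLAIN QUERY PLAN can be run on each of the two queries. A minimal Python sketch, assuming a made-up table test(id, value) and an index named idx_test_value (both names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany(
    "INSERT INTO test (value) VALUES (?)",
    [(v,) for v in range(0, 10_000, 7)],  # made-up values 0, 7, 14, ...
)
# The index the answer recommends.
conn.execute("CREATE INDEX idx_test_value ON test(value)")

ABOVE = "SELECT id FROM test WHERE value > ? ORDER BY value LIMIT 1"
BELOW = "SELECT id FROM test WHERE value < ? ORDER BY value DESC LIMIT 1"


def plan(sql, param):
    # EXPLAIN QUERY PLAN rows have four columns; the last one is the readable detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, (param,))]


def value_of(row_id):
    return conn.execute("SELECT value FROM test WHERE id = ?", (row_id,)).fetchone()[0]


above_id = conn.execute(ABOVE, (50,)).fetchone()[0]
below_id = conn.execute(BELOW, (50,)).fetchone()[0]
print(value_of(below_id), value_of(above_id))  # 49 56
print(plan(ABOVE, 50))
```

With the index in place, the plan detail should report a SEARCH using idx_test_value; without it, each query falls back to scanning test.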

Related

SQL: Give up/return different result if too many rows

Short version: I have a SQL statement where I only want the results if the number of rows returned is less than some value (say 1000); otherwise I want a different result set. What's the best way to do this without incurring the overhead of returning the 1000 rows (as would happen if I used LIMIT) when I'm just going to throw them away?
For instance, I want to return the results of
SELECT *
FROM T
WHERE updated_at > timestamp
AND name <= 'Michael'
ORDER BY name ASC
provided there are at most 1000 entries but if there are more than that I want to return
SELECT *
FROM T
ORDER BY name ASC
LIMIT 25
Two queries isn't bad, but I definitely don't want to get 1000 records back from the first query only to toss them.
(Happy to use Postgres extensions too but prefer SQL)
--
To explain: I'm refreshing data requested by the client in batches, and sometimes the client needs to know if there have been any changes in the part they've already received. If there are too many changes, however, I just give up and start sending the records from the beginning again.
WITH max1000 AS (
SELECT the_row, count(*) OVER () AS total
FROM (
SELECT the_row -- named row type
FROM T AS the_row
WHERE updated_at > timestamp
AND name <= 'Michael'
ORDER BY name
LIMIT 1001
) sub
)
SELECT (the_row).* -- parentheses required
FROM max1000 m
WHERE total < 1001
UNION ALL
( -- parentheses required
SELECT *
FROM T
WHERE (SELECT total > 1000 FROM max1000 LIMIT 1)
ORDER BY name
LIMIT 25
)
The subquery sub in CTE max1000 gets the complete, sorted result for the first query - wrapped as row type, and with LIMIT 1001 to avoid excess work.
The outer SELECT adds the total row count. See:
Run a query with a LIMIT/OFFSET and also get the total number of rows
The first SELECT of the outer UNION query returns decomposed rows as the result, if there are fewer than 1001 of them.
The second SELECT of the outer UNION query returns the alternate result - if there were more than 1000. Parentheses are required - see:
Combining 3 SELECT statements to output 1 table
Or:
WITH max1000 AS (
SELECT *
FROM T
WHERE updated_at > timestamp
AND name <= 'Michael'
ORDER BY name
LIMIT 1001
)
, ct(ok) AS (SELECT count(*) < 1001 FROM max1000)
SELECT *
FROM max1000 m
WHERE (SELECT ok FROM ct)
UNION ALL
( -- parentheses required
SELECT *
FROM T
WHERE (SELECT NOT ok FROM ct)
ORDER BY name
LIMIT 25
);
I think I like the 2nd better. Not sure which is faster.
Either variant is optimized for the case where most calls return fewer than 1001 rows. If that's the exception, I would first run a somewhat cheaper count. It also depends a lot on the available indexes.
You get no row if the first query finds no row. (Seems like an odd result.)
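For readers on SQLite, here is a hedged adaptation of the second variant, run through Python's sqlite3 module. It assumes a made-up table t(id, name, updated_at), replaces the literal 1000/1001/25 with the hypothetical parameters :cap, :cap + 1 and :fallback so the behaviour is visible with ten rows, and replaces the parenthesized UNION member with a SELECT * FROM (...) wrapper, since SQLite does not accept parentheses around a compound member.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)")
# Ten made-up rows: names 'a'..'j', updated_at 1..10.
conn.executemany(
    "INSERT INTO t (id, name, updated_at) VALUES (?, ?, ?)",
    [(i, chr(ord("a") + i - 1), i) for i in range(1, 11)],
)

QUERY = """
WITH capped AS (
  SELECT * FROM t
  WHERE updated_at > :since AND name <= :max_name
  ORDER BY name
  LIMIT :cap + 1               -- one extra row is enough to detect "too many"
),
ct(ok) AS (SELECT count(*) <= :cap FROM capped)
SELECT * FROM capped WHERE (SELECT ok FROM ct)
UNION ALL
SELECT * FROM (
  SELECT * FROM t
  WHERE (SELECT NOT ok FROM ct)
  ORDER BY name
  LIMIT :fallback
)
ORDER BY name
"""


def fetch_changes(since, max_name, cap, fallback):
    params = {"since": since, "max_name": max_name, "cap": cap, "fallback": fallback}
    return [row[1] for row in conn.execute(QUERY, params)]  # just the names


print(fetch_changes(7, "z", 5, 3))  # ['h', 'i', 'j']  (3 changes: within the cap)
print(fetch_changes(2, "z", 5, 3))  # ['a', 'b', 'c']  (8 changes: fall back)
```

As with the PostgreSQL version, an empty first result produces no rows at all, because the count of 0 selects the (empty) first branch.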

Fetch the number of rows that can be returned by a SELECT query

I'm trying to fetch data and show it in a table with pagination, so I use LIMIT and OFFSET for that, but I also need to show the number of rows that the query can return. Is there any way to get that?
I tried
resultset.last() and getRow()
select count(*) from(query) myNewTable;
In both cases I get the correct answer, but is this the correct way to do it? Performance is a concern.
We can get the limited records using the code below.
First, set how many records you want, like this:
var limit = 10;
Then pass this limit (and the offset) to the statement below:
WITH
Temp AS(
SELECT
ROW_NUMBER() OVER( ORDER BY primaryKey DESC ) AS RowNumber,
*
FROM
myNewTable
),
Temp2 AS(
SELECT COUNT(*) AS TotalCount FROM Temp
)
SELECT TOP (@limit) * FROM Temp, Temp2 WHERE RowNumber > @offset ORDER BY RowNumber
This is SQL Server syntax; MySQL has no TOP clause and would need LIMIT instead.
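A simplified sketch of the same idea, returning one page plus the total in a single statement: instead of ROW_NUMBER() and a cross-joined count CTE, it puts an uncorrelated COUNT(*) subquery in the select list and pages with LIMIT/OFFSET. It uses Python's sqlite3 module and a made-up table items(id, label); fetch_page is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, label TEXT)")
conn.executemany(
    "INSERT INTO items (label) VALUES (?)",
    [(f"item {i}",) for i in range(1, 24)],  # 23 made-up rows, ids 1..23
)

# One statement: the requested page plus the total row count on every row.
PAGE_QUERY = """
SELECT id, label, (SELECT COUNT(*) FROM items) AS total_count
FROM items
ORDER BY id DESC
LIMIT :limit OFFSET :offset
"""


def fetch_page(limit, offset):
    rows = conn.execute(PAGE_QUERY, {"limit": limit, "offset": offset}).fetchall()
    total = rows[0][2] if rows else None  # unknown when the page is empty
    return [row[0] for row in rows], total


print(fetch_page(10, 20))  # ([3, 2, 1], 23)
```

One trade-off: the total only arrives with the rows, so an offset past the end returns neither rows nor a total. Any WHERE filter also has to be repeated inside the COUNT(*) subquery.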
There is no easy way of doing this.
1. As you found out, it usually boils down to executing 2 queries:
Executing SELECT with limit and offset in order to fetch the data that you need.
Executing a COUNT(*) in order to count the total number of pages.
This approach might work for tables that don't have a lot of rows, or when you filter the data (in the COUNT and SELECT queries) on a column that is indexed.
2. If your table is large, but the data that you need to show represents a smaller percentage of the table's data, and that data shares a common trait (for example, all the data in your pages was created on a single day), you can use partitioning. Executing COUNT and SELECT on a single partition will be much faster than executing them on the whole table.
3. You can create another table which will store the value of the COUNT query.
For example, let's say that your big_table table looks like this:
id | user_id | timestamp_column | text_column | another_text_column
Now, your SELECT query looks like this:
SELECT * FROM big_table WHERE user_id = 4 ORDER BY timestamp_column LIMIT 20 OFFSET 20;
And your count query:
SELECT COUNT(*) FROM big_table WHERE user_id = 4;
You could create a count_table that will have the following format:
user_id | count
Once you fill this table with the current data in the system, you create triggers that update this table on every insert, delete, or user_id update in big_table.
This way, the count query will be really fast, because it will be executed on the count_table, for example:
SELECT count FROM count_table WHERE user_id = 4
The drawback of this approach is that the insert in the big_table will be slower, since the trigger will fire and update the count_table on every insert.
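A minimal sketch of the count_table approach in SQLite via Python's sqlite3 module. The column layout follows the example above; the trigger names are made up, and it assumes that deletes and updates that change user_id must also adjust the counts, otherwise count_table drifts out of sync.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE big_table (
  id INTEGER PRIMARY KEY,
  user_id INTEGER NOT NULL,
  timestamp_column TEXT,
  text_column TEXT
);
CREATE TABLE count_table (user_id INTEGER PRIMARY KEY, count INTEGER NOT NULL);

-- Insert: create the user's counter if needed, then increment it.
CREATE TRIGGER big_table_ai AFTER INSERT ON big_table
BEGIN
  INSERT OR IGNORE INTO count_table (user_id, count) VALUES (NEW.user_id, 0);
  UPDATE count_table SET count = count + 1 WHERE user_id = NEW.user_id;
END;

-- Delete: decrement the old owner's counter.
CREATE TRIGGER big_table_ad AFTER DELETE ON big_table
BEGIN
  UPDATE count_table SET count = count - 1 WHERE user_id = OLD.user_id;
END;

-- Update that moves a row to another user: move one unit between counters.
CREATE TRIGGER big_table_au AFTER UPDATE OF user_id ON big_table
WHEN NEW.user_id <> OLD.user_id
BEGIN
  UPDATE count_table SET count = count - 1 WHERE user_id = OLD.user_id;
  INSERT OR IGNORE INTO count_table (user_id, count) VALUES (NEW.user_id, 0);
  UPDATE count_table SET count = count + 1 WHERE user_id = NEW.user_id;
END;
""")


def user_count(user_id):
    # The fast path: one primary-key lookup instead of COUNT(*) over big_table.
    row = conn.execute("SELECT count FROM count_table WHERE user_id = ?", (user_id,)).fetchone()
    return row[0] if row else 0


conn.executemany(
    "INSERT INTO big_table (user_id, text_column) VALUES (?, ?)",
    [(4, "a"), (4, "b"), (4, "c"), (5, "d")],
)
conn.execute("DELETE FROM big_table WHERE text_column = 'a'")
conn.execute("UPDATE big_table SET user_id = 5 WHERE text_column = 'b'")
print(user_count(4), user_count(5))  # 1 2
```

Every write to big_table now also writes count_table, which is exactly the insert-time cost described above.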
These are the approaches that you can try, but in the end it all depends on the size and type of your data.

How to skip the first n rows in sql query

I want to run the query "SELECT * FROM TABLE" but return only the rows from row N+1 onward. Any idea how to do this?
For SQL Server 2012 and above, use this:
SELECT *
FROM Sales.SalesOrderHeader
ORDER BY OrderDate
OFFSET (@Skip) ROWS FETCH NEXT (@Take) ROWS ONLY
https://stackoverflow.com/a/19669165/1883345
SQL Server:
select * from table
except
select top N * from table
Oracle up to 11.2:
select * from table
minus
select * from table where rownum <= N
with TableWithNum as (
select t.*, rownum as Num
from Table t
)
select * from TableWithNum where Num > N
Oracle 12.1 and later (following standard ANSI SQL)
select *
from table
order by some_column
offset x rows
fetch first y rows only
They may meet your needs more or less.
There is no direct way to do what you want in SQL.
However, in my opinion, that is not a design flaw.
SQL is not supposed to be used like this.
In relational databases, a table represents a relation, which is by definition a set, and the elements of a set are unordered.
Also, don't rely on the physical order of the records; the RDBMS does not guarantee row order.
If the ordering of the records is important, you'd better add a column such as Num to the table and use the following query. This is more natural.
select *
from Table
where Num > N
order by Num
Query in SQL Server:
DECLARE @N INT = 5 --Any random number
SELECT * FROM (
SELECT ROW_NUMBER() OVER(ORDER BY ID) AS RoNum
, ID --Add any fields needed here (or replace ID by *)
FROM TABLE_NAME
) AS tbl
WHERE @N < RoNum
ORDER BY tbl.ID
This will give the rows of the table starting from row number @N + 1.
In order to do this in SQL Server, you must order the query by a column, so you can specify the rows you want.
Example:
select * from table order by [some_column]
offset 10 rows
FETCH NEXT 10 rows only
Do you want something like in LINQ skip 5 and take 10?
SELECT TOP(10) * FROM MY_TABLE
WHERE ID not in (SELECT TOP(5) ID From My_TABLE ORDER BY ID)
ORDER BY ID;
This approach does not need OFFSET/FETCH, so it also works in older SQL Server versions (TOP is SQL Server syntax). You need to establish some order (by Id, for example) so that all rows are returned in a predictable order.
I know it's quite late to answer this, but I have a slightly different solution which I believe performs better, because no comparisons are performed in the query, only sorting. The improvement is most noticeable when the value of Skip is large.
Best performance, but only for SQL Server 2012 and above. Originally from @Majid Basirati's answer, which is worth mentioning again.
DECLARE @Skip INT = 2, @Take INT = 2
SELECT * FROM TABLE_NAME
ORDER BY ID ASC
OFFSET (@Skip) ROWS FETCH NEXT (@Take) ROWS ONLY
Not as good as the first one, but compatible with SQL Server 2005 and above.
DECLARE @Skip INT = 2, @Take INT = 2
SELECT * FROM
(
SELECT TOP (@Take) * FROM
(
SELECT TOP (@Take + @Skip) * FROM TABLE_NAME
ORDER BY ID ASC
) T1
ORDER BY ID DESC
) T2
ORDER BY ID ASC
What about this:
SELECT * FROM table LIMIT 50 OFFSET 1
This works with any DBMS, because it is standard ANSI SQL:
SELECT *
FROM owner.tablename A
WHERE condition
AND n <= (
SELECT COUNT(DISTINCT b.column_order)
FROM owner.tablename B
WHERE condition
AND b.column_order>a.column_order
)
ORDER BY a.column_order DESC
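A small sketch to check the correlated-count technique on SQLite, via Python's sqlite3 module, with a made-up table tablename(column_order) and the optional condition left out. Here n is the number of rows to skip: a row is kept when at least n rows have a larger column_order, so the first n rows in descending order are dropped. It assumes column_order values are unique; with ties, the result no longer matches skipping exactly n rows. The correlated subquery runs once per candidate row, so this can be slow on large tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (column_order INTEGER)")
conn.executemany("INSERT INTO tablename VALUES (?)", [(v,) for v in range(1, 8)])

# Keep row a only if at least :n rows sort before it in DESC order,
# i.e. at least :n distinct column_order values are larger than its own.
SKIP_QUERY = """
SELECT a.column_order
FROM tablename a
WHERE :n <= (
  SELECT COUNT(DISTINCT b.column_order)
  FROM tablename b
  WHERE b.column_order > a.column_order
)
ORDER BY a.column_order DESC
"""


def skip_desc(n):
    return [row[0] for row in conn.execute(SKIP_QUERY, {"n": n})]


print(skip_desc(2))  # [5, 4, 3, 2, 1]
```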
PostgreSQL: OFFSET without LIMIT
This syntax is supported, and it is in my opinion the cleanest API compared to other SQL implementations as it does not introduce any new keywords:
SELECT * FROM mytable ORDER BY mycol ASC OFFSET 1
that should definitely be standardized.
That this is allowed can be seen from https://www.postgresql.org/docs/13/sql-select.html: LIMIT and OFFSET can be given independently, because OFFSET is not a sub-clause of LIMIT in the syntax specification:
[ LIMIT { count | ALL } ]
[ OFFSET start [ ROW | ROWS ] ]
SQLite: negative limit
OFFSET requires LIMIT in that DBMS, but dummy negative values mean no limit. Not as nice as PostgreSQL, but it works:
SELECT * FROM mytable ORDER BY mycol ASC LIMIT -1 OFFSET 1
Asked at: SQLite with skip (offset) only (not limit)
Documented at: https://sqlite.org/lang_select.html
If the LIMIT expression evaluates to a negative value, then there is no upper bound on the number of rows returned.
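A quick check of the negative-LIMIT behaviour, using Python's built-in sqlite3 module and a made-up table mytable(mycol); the OFFSET is passed as a bound parameter and skip is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (mycol INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?)", [(v,) for v in (30, 10, 50, 20, 40)])


def skip(n):
    # LIMIT -1 means "no upper bound" in SQLite, so only the OFFSET applies.
    sql = "SELECT mycol FROM mytable ORDER BY mycol ASC LIMIT -1 OFFSET ?"
    return [row[0] for row in conn.execute(sql, (n,))]


print(skip(1))  # [20, 30, 40, 50]
```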
MySQL: use a huge limit number
Terrible API design, but the documentation actually recommends it:
SELECT * FROM tbl LIMIT 1,18446744073709551615;
Asked at: MySQL skip first 10 results
Node.js Sequelize ORM implements it
That ORM allows e.g. passing offset: to findAll() without limit:, and implements workarounds such as the ones mentioned above for each DBMS.
In Faircom SQL (which is a pseudo MySQL), I can do this in a super simple SQL statement, as follows:
SELECT SKIP 10 * FROM TABLE ORDER BY Id
Obviously you can just replace 10 with any declared variable of your desire.
I don't have access to MS SQL or other platforms, but I'd be really surprised if MS SQL doesn't support something like this.
DECLARE @Skip int= 2, @Take int= 2
SELECT * FROM TABLE_NAME
ORDER BY Column_Name
OFFSET (@Skip) ROWS FETCH NEXT (@Take) ROWS ONLY
Try the query below; it works:
SELECT * FROM `my_table` WHERE id != (SELECT id From my_table LIMIT 1)
Hope this helps.
You can also use OFFSET to remove the first record from your query result, like this.
Example: find the second-highest salary in the employee table:
select distinct salary from employee order by salary desc limit 1 OFFSET 1
For SQL Server 2012 and later versions, the best method is @MajidBasirati's answer.
I also loved @CarlosToledo's answer; it's not limited to any SQL Server version, but it's missing ORDER BY clauses. Without them, it may return wrong results.
For SQL Server 2008 and later I would use Common Table Expressions for better performance.
-- This example omits the first 10 records and selects the next 5
;WITH MyCTE(Id) as
(
SELECT TOP (10) Id
FROM MY_TABLE
ORDER BY Id
)
SELECT TOP (5) MY_TABLE.*
FROM MY_TABLE
LEFT JOIN MyCTE ON (MyCTE.Id = MY_TABLE.Id)
WHERE MyCTE.Id IS NULL -- keep only rows that are not among the first 10
ORDER BY MY_TABLE.Id
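The same idea (take the first N ids, then exclude them) sketched in SQLite through Python's sqlite3 module, with LIMIT in place of TOP. It assumes a made-up table my_table(id, payload) and expresses the exclusion as a LEFT JOIN ... IS NULL anti-join; skip_take is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO my_table (id, payload) VALUES (?, ?)",
    [(i, f"row {i}") for i in range(1, 21)],  # 20 made-up rows
)

QUERY = """
WITH first_ids(id) AS (
  SELECT id FROM my_table ORDER BY id LIMIT :skip   -- the rows to skip
)
SELECT m.id
FROM my_table m
LEFT JOIN first_ids f ON f.id = m.id
WHERE f.id IS NULL                                  -- anti-join: not among the skipped ids
ORDER BY m.id
LIMIT :take
"""


def skip_take(skip, take):
    return [row[0] for row in conn.execute(QUERY, {"skip": skip, "take": take})]


print(skip_take(10, 5))  # [11, 12, 13, 14, 15]
```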

Find out if query exceeds arbitrary limit using ROWNUM?

I have a stored proc in Oracle, and we're limiting the number of records with ROWNUM based on a parameter. However, we also have a requirement to know whether the search result count exceeded the arbitrary limit (even though we're only passing data up to the limit; searches can return a lot of data, and if the limit is exceeded a user may want to refine their query.)
The limit's working well, but I'm attempting to pass an OUT value as a flag to signal when the maximum results were exceeded. My idea for this was to get the count of the inner table and compare it to the count of the outer select query (with ROWNUM) but I'm not sure how I can get that into a variable. Has anyone done this before? Is there any way that I can do this without selecting everything twice?
Thank you.
EDIT: For the moment, I am actually doing two identical selects - one for the count only, selected into my variable, and one for the actual records. I then pass back the comparison of the base result count to my max limit parameter. This means two selects, which isn't ideal. Still looking for an answer here.
You can add a column to the query:
select * from (
select . . . , count(*) over () as numrows
from . . .
where . . .
) where rownum <= 1000;
And then report numrows as the size of the final result set.
You could use a nested subquery:
select id, case when max_count > 3 then 'Exceeded' else 'OK' end as flag
from (
select id, rn, max(rn) over () as max_count
from (
select id, rownum as rn
from t
)
where rownum <= 4
)
where rownum <= 3;
The inner level is your actual query (which in reality probably has filters and an ORDER BY clause). The middle layer restricts to your actual limit + 1, which still allows Oracle to optimise using a stop key, and uses an analytic count over that inner result set to see if you got a fourth record (without requiring a scan of all matching records). And the outer layer restricts to your original limit.
With a sample table with 10 rows, this gets:
ID FLAG
---------- --------
1 Exceeded
2 Exceeded
3 Exceeded
If the inner query had a filter that returned fewer rows, say:
select id, rownum as rn
from t
where id < 4
it would get:
ID FLAG
---------- --------
1 OK
2 OK
3 OK
Of course for this demo I haven't done any ordering so you would get indeterminate results. And from your description you would use your variable instead of 3, and (your variable + 1) instead of 4.
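The limit-plus-one trick also works in application code. A minimal sketch, using Python's sqlite3 module with SQLite's LIMIT standing in for ROWNUM; the table t, the id <= ? filter and the search helper are all illustrative assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO t (id) VALUES (?)", [(i,) for i in range(1, 11)])  # 10 rows


def search(max_id, limit):
    # Fetch one row more than the limit: if it arrives, the limit was exceeded.
    rows = conn.execute(
        "SELECT id FROM t WHERE id <= ? ORDER BY id LIMIT ?",
        (max_id, limit + 1),
    ).fetchall()
    exceeded = len(rows) > limit
    return [row[0] for row in rows[:limit]], exceeded


print(search(10, 3))  # ([1, 2, 3], True)
print(search(3, 3))   # ([1, 2, 3], False)
```

Unlike treating "exactly limit rows returned" as exceeded, fetching one extra row distinguishes a result that is exactly at the limit from one that is over it, at the cost of reading a single additional row.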
In my application I use a very simple approach. I do the normal SELECT, and when the number of returned rows equals the limit, the client application shows a "LIMIT reached" message, because it is very likely that the query would have returned more rows if the result were not limited.
Of course, when the number of rows is exactly the limit, this is a false indication. However, in my application the limit is set by the end user, mainly for performance reasons; a typical limit is "1000 rows" or "10000 rows", for example.
In my case this solution is fully sufficient - and it is simple.
Update:
Are you aware of the row_limiting_clause? It was introduced in Oracle 12.1
For example this query
SELECT employee_id, last_name
FROM employees
ORDER BY employee_id
OFFSET 5 ROWS FETCH NEXT 10 ROWS ONLY;
will return rows 6 to 15 of the entire result set. It may help you find a solution.
Another idea is this one:
SELECT employee_id, last_name
FROM employees
UNION ALL
SELECT NULL, NULL FROM dual
ORDER BY employee_id NULLS LAST
When you get the row where employee_id IS NULL then you know you reached the end of your result-set and no further records will arrive.
Select the whole thing, then select the count and the data, restricting the number of rows.
with
base as
(
select c1, c2, c3
from table
where condition
)
select (select count(*) from base), c1, c2, c3
from base
where rownum < 100

Can I get the position of a record in a SQL result table?

If I do something like
SELECT * FROM mytable ORDER BY mycolumn ASC;
I get a result table in a specific order.
Is there a way in SQL to efficiently find out, given a PK, what position in that result table would contain the record with my PK?
You can count the number of records where the value that you are sorting on has a lower value than the record that you know the key value of:
select count(*)
from mytable
where mycolumn < (select mycolumn from mytable where key = 42)
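A small sketch of the counting approach, using Python's sqlite3 module with a made-up table whose primary key column is called id (the answer calls it key). Adding 1 to the count turns "number of rows sorted before it" into a 1-based position:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, mycolumn TEXT)")
conn.executemany(
    "INSERT INTO mytable (id, mycolumn) VALUES (?, ?)",
    [(42, "pear"), (7, "apple"), (13, "fig"), (99, "banana")],
)

POSITION_QUERY = """
SELECT count(*) + 1
FROM mytable
WHERE mycolumn < (SELECT mycolumn FROM mytable WHERE id = ?)
"""


def position(pk):
    # 1-based position of row `pk` under ORDER BY mycolumn ASC.
    return conn.execute(POSITION_QUERY, (pk,)).fetchone()[0]


print(position(42))  # 4
```

Rows that tie on mycolumn all get the same position, since only strictly smaller values are counted.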
On databases that support it, you could use ROW_NUMBER() for this purpose:
SELECT RowNr
FROM (
SELECT
ROW_NUMBER() OVER (ORDER BY mycolumn) AS RowNr,
id -- the primary key column (name assumed)
FROM mytable
) sub
WHERE sub.id = 42
The example assumes you're looking for primary key 42 :)
The subquery is necessary because something like:
SELECT
ROW_NUMBER() OVER (ORDER BY mycolumn) AS RowNr
FROM mytable
WHERE id = 42
would always return 1; ROW_NUMBER() is applied after the WHERE clause, so to speak.
SQL doesn't work that way. It's set-based, which means that "position in that result table" is meaningless to the database.
You can keep track of position when you map the ResultSet into a collection of objects or when you iterate over it.
Unfortunately you cannot get "the position of a row in a table".
The best you can get, using ORDER BY and a variant of the ROW_NUMBER construct (depends on the database engine in use), is the position of a row in the resultset of the query executed.
This position does not map back to any position in the table, though, unless the ORDER BY is on a set of clustered index columns, but even then that position might be invalidated the next second.
What I would like to know is what you intended to use this "position" for.
This answer applies to MySQL
For versions lower than 8.0:
SET @row_number = 0;
SELECT
(@row_number:=@row_number + 1) AS num,
myColumn.first,
myColumn.second
FROM
myTable
ORDER BY myColumn.first, myColumn.second
source: http://www.mysqltutorial.org/mysql-row_number/
For 8.0 and later:
Please see the MySQL manual for the ROW_NUMBER() function, as I did not test it. But it seems this function is preferred.
There's no way you can tell that without selecting an entire subset of records. If your PK is of integer type and the result is ordered by that PK, you can count the rows up to it:
select count(*) from mytable
where id <= 10 -- Record with ID 10