SQL Select, different than the last 10 records - sql

I have a table called "dutyroster". I want to make a random selection from this table's "names" column, but I want the selection to be different from the last 10 records so that the same guy is not given a second duty within 10 days. Is that possible?

Create a temporary table with only one column called oldnames which will have no records initially. For each select, execute a query like
select names from dutyroster where dutyroster.names not in (select oldnames from temporarytable) limit 10
and when execution is done, add the result set to the temporary table.
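A minimal sketch of that idea, using Python's built-in sqlite3 module as a stand-in database. The sample names, the pick_duty helper, and picking one name per call (rather than ten) are assumptions made for illustration; random() is SQLite's spelling and differs in other databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dutyroster (names TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO dutyroster (names) VALUES (?)",
    [(f"person{i}",) for i in range(1, 16)],  # hypothetical sample data
)
# The exclusion table from the answer; it starts out empty.
conn.execute("CREATE TEMP TABLE temporarytable (oldnames TEXT)")


def pick_duty(conn):
    """Pick one random name that is not yet excluded, then record it."""
    row = conn.execute(
        "SELECT names FROM dutyroster "
        "WHERE names NOT IN (SELECT oldnames FROM temporarytable) "
        "ORDER BY random() LIMIT 1"
    ).fetchone()
    if row is None:
        return None  # everyone is currently excluded
    conn.execute("INSERT INTO temporarytable (oldnames) VALUES (?)", (row[0],))
    return row[0]


picks = [pick_duty(conn) for _ in range(10)]
print(picks)
```

Note that this sketch never removes old entries from temporarytable, so you would still need to expire names once they fall outside the last 10 picks.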

The other answer already here addresses the portion of the question on how to avoid duplicating selections.
To accomplish the random part of the selection, use newid() (SQL Server) directly within your select statement. I've made this sqlfiddle as an example.
SELECT TOP 10
newid() AS [RandomSortColumn],
*
FROM
dutyroster
ORDER BY
[RandomSortColumn] ASC
Keep executing the query, and you'll keep getting different results. Use the technique in the other answer for avoiding doubling a guy up.

The basic idea is to use a subquery to exclude everyone who had duty in the last ten days, then sort the rest randomly:
select dr.*
from dutyroster dr
where dr.name not in (select dr2.name
from dutyroster dr2
where dr2.datetimecol >= date_sub(curdate(), interval 10 day)
)
order by rand()
limit 1;
Different databases may have different syntax for limit, rand(), and for the date/time functions. The above gives the structure of the query, but the functions may differ.
If you have a large amount of data and performance is a concern, there are other (more complicated) ways to take a random sample.
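Here is a rough, runnable version of the same structure using Python's sqlite3 module. The duty_date column name and the sample rows are assumptions (the original query calls the column datetimecol), and random() / LIMIT are the SQLite spellings. The cutoff date is computed in Python and passed as a parameter to sidestep the date-function differences mentioned above.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dutyroster (name TEXT, duty_date TEXT)")

today = date.today()
sample = [  # hypothetical duty history
    ("alice", today - timedelta(days=2)),
    ("bob", today - timedelta(days=9)),
    ("carol", today - timedelta(days=15)),
    ("dave", today - timedelta(days=30)),
    ("alice", today - timedelta(days=40)),
]
conn.executemany(
    "INSERT INTO dutyroster VALUES (?, ?)",
    [(name, day.isoformat()) for name, day in sample],
)

# ISO-8601 date strings sort chronologically, so >= works on TEXT here.
cutoff = (today - timedelta(days=10)).isoformat()
picked = conn.execute(
    """
    SELECT dr.name
    FROM dutyroster dr
    WHERE dr.name NOT IN (SELECT dr2.name
                          FROM dutyroster dr2
                          WHERE dr2.duty_date >= ?)
    ORDER BY random()
    LIMIT 1
    """,
    (cutoff,),
).fetchone()[0]
print(picked)  # one of the people with no duty in the last 10 days
```

With this sample data only carol and dave are eligible, because alice and bob both had duty within the last ten days.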

You could use the TOP clause in SQL Server,
and in MySQL you could use the LIMIT clause.

Maybe this would help...
SELECT TOP number|percent column_name(s)
FROM table_name;
Source: http://www.w3schools.com/sql/sql_top.asp

Related

Does a query go through all data when you only select the last N?

I have a query that selects the last 5 ($new) items from my database.
SELECT OvenRunData.dataId AS id, OvenRunData.data AS data
FROM ovenRuns INNER JOIN OvenRunData ON OvenRuns.id = OvenRunData.ovenRunId
WHERE OvenRunData.ovenRunId = (SELECT MAX(id) FROM OvenRuns)
ORDER BY id DESC LIMIT '$new'
I want to execute this query every 5 seconds with an AJAX request so I can update my table.
I know this query selects the last 5 records, but I want to know: does the query run through all records and then select the last 5, or does it select only the last 5 without checking all the data?
I'm really worried that I'll have lag.
You need two indexes to make it fast enough:
create index ix_OvenRuns_id on OvenRuns(id)
create index ix_OvenRunData_ovenRunId on OvenRunData(ovenRunId)
You can even add OvenRunData.dataId and OvenRunData.data to the second index, or create a clustered index. Either way, these indexes avoid a full data scan.
That depends on the indexes.
In your case, you should have one on OvenRuns(id).
More here: http://use-the-index-luke.com/sql/partial-results/top-n-queries
The LIMIT is applied after the ORDER BY, and the ORDER BY is applied to the entire result set. So the answer to your question is yes: unless an index can already return the rows in the requested order, it must go through all of the records in the result set determined by your WHERE clause before applying the LIMIT.
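One way to see how much the index matters is to compare query plans before and after creating it. The sketch below uses SQLite (via Python's sqlite3) as a stand-in, so the plan output will not match your database's EXPLAIN format. The simplified single-table query, the composite index ix_run_data, and the sample data are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE OvenRunData (dataId INTEGER, ovenRunId INTEGER, data TEXT)"
)
conn.executemany(
    "INSERT INTO OvenRunData VALUES (?, ?, ?)",
    [(i, i % 10, f"reading {i}") for i in range(1000)],  # hypothetical data
)

QUERY = (
    "SELECT dataId, data FROM OvenRunData "
    "WHERE ovenRunId = ? ORDER BY dataId DESC LIMIT 5"
)


def plan(conn):
    """Return the human-readable steps of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (9,))
    return [row[-1] for row in rows]  # the last column holds the description


plan_without_index = plan(conn)

# Hypothetical composite index: filter column first, then the sort column.
conn.execute("CREATE INDEX ix_run_data ON OvenRunData (ovenRunId, dataId)")
plan_with_index = plan(conn)

print(plan_without_index)
print(plan_with_index)
```

Without the index, SQLite reports a full scan plus a temporary B-tree for the ORDER BY; with it, the plan searches the index and needs no separate sort, so it can stop after five rows.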

SQLite3 (or general SQL) retrieve nth row of a query result

Quick question on SQLite3 (may as well be general SQL).
How can one retrieve the n-th row of a query result?
row_id (or whichever index) won't work in my case, given that the tables contain a column with a number. Based on some data, the query needs the data unsorted or sorted by asc/desc criteria.
But I may need to quickly retrieve, say, rows 2 & 5 of the results.
So other than implementing a sqlite3_step()==SQLITE_ROW with a counter, right now I have no idea on how to proceed with this.
And I don't like this solution very much because of performance issues.
So, if anyone can drop a hint that'd be highly appreciated.
Regards
david
Add LIMIT 1 and OFFSET <n> to the query.
Example: SELECT * FROM users LIMIT 1 OFFSET 5132; (OFFSET skips that many rows, so this returns row 5,133.)
The general approach is that, if you want only the nth row of m rows, use an appropriate where condition to only get that row.
If you need to get to a row and can't because no where criteria can get you there, your database has a serious design issue. It fails the first normal form, which states that "There's no top-to-bottom ordering to the rows."
But I may need to quickly retrieve, say, rows 2 & 5 of the results.
In a scenario where you need non-contiguous rows, you could use ROW_NUMBER():
WITH cte AS (
SELECT *, ROW_NUMBER() OVER() AS rn --OVER(ORDER BY ...) --if specific order is required
FROM t
)
SELECT c
FROM cte
WHERE rn IN (2,5); -- row nums
db<>fiddle demo
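Both suggestions can be tried out with Python's sqlite3 module. This is only a sketch: the score column, the sample rows, and the nth_row / rows_at helpers are made up for illustration, and window functions such as ROW_NUMBER() need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (c TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("a", 50), ("b", 10), ("c", 40), ("d", 30), ("e", 20), ("f", 60)],
)


def nth_row(conn, n):
    """Return the n-th row (1-based) when ordered by score."""
    return conn.execute(
        "SELECT c FROM t ORDER BY score LIMIT 1 OFFSET ?", (n - 1,)
    ).fetchone()


def rows_at(conn, positions):
    """Return the rows at several 1-based positions in a single query."""
    marks = ",".join("?" * len(positions))
    sql = f"""
        WITH cte AS (
            SELECT c, ROW_NUMBER() OVER (ORDER BY score) AS rn
            FROM t
        )
        SELECT rn, c FROM cte WHERE rn IN ({marks}) ORDER BY rn
    """
    return conn.execute(sql, list(positions)).fetchall()


print(nth_row(conn, 2))       # ('e',)
print(rows_at(conn, [2, 5]))  # [(2, 'e'), (5, 'a')]
```

An explicit ORDER BY is used in both helpers because "the n-th row" is only well defined once the order is fixed.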

Getting additional info on the result of a SQL max query

Say I want to do this with SQL (Sybase): Find all fields of the record with the latest timestamp.
One way to write that is like this:
select * from data where timestamp = (select max(timestamp) from data)
This is a bit silly because it causes two queries: first to find the max timestamp, and then to find all the data for that timestamp (assume it's unique, and yes, I do have an index on timestamp). Moreover, it just seems unnecessary, because max() has already found the row that I am interested in, so looking for it again is wasteful.
Is there a way to directly access fields of the row that max() returns?
Edit: All answers I see are basically clever hacks - I was looking for a syntactic way of doing something like max(field1).field2 to access field2 of the row with max field1
SELECT TOP 1 * from data ORDER BY timestamp DESC
No, using an aggregate means that you are automatically grouping, so there isn't a single row to get data from even if the group happens to contain a single row.
You can order by the field and get the first row:
set rowcount 1
select * from data order by timestamp desc
(Note that you shouldn't use select *, but rather specify the fields that you want from the query. That makes the query less sensitive to changes in the database layout.)
Can you try this
SELECT TOP 1 *
FROM data
ORDER BY timestamp DESC
You're making assumptions about how Sybase optimizes queries. For all you know, it may do precisely what you want it to do - it may notice both queries are from "data" and that the condition is "where =", and may optimize as you suggest.
I know in the case of SQL Server, it's possible to configure indexes to include fields from the indexed row. Doing a select through such an index leaves those fields available.
This is SQL server, but you'll get the idea.
SELECT TOP(1) * FROM data
ORDER BY timestamp DESC;
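For comparison, here is a small sketch in Python's sqlite3 showing that the subquery form and the "order and take the first row" form return the same record. SQLite spells TOP 1 / set rowcount 1 as LIMIT 1; the field2 column and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (timestamp INTEGER, field2 TEXT)")
conn.execute("CREATE INDEX ix_data_timestamp ON data (timestamp)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [(100, "old"), (300, "newest"), (200, "middle")],
)

# Form from the question: look up the max, then fetch the matching row.
via_subquery = conn.execute(
    "SELECT timestamp, field2 FROM data "
    "WHERE timestamp = (SELECT max(timestamp) FROM data)"
).fetchone()

# Form from the answers: sort descending and keep only the first row.
via_order_by = conn.execute(
    "SELECT timestamp, field2 FROM data ORDER BY timestamp DESC LIMIT 1"
).fetchone()

print(via_subquery, via_order_by)  # (300, 'newest') (300, 'newest')
```

If timestamps are not unique, the two forms differ: the subquery returns every row sharing the max, while the ordered form returns just one of them.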

How can I speed up row_number in Oracle?

I have a SQL query that looks something like this:
SELECT * FROM(
SELECT
...,
row_number() OVER(ORDER BY ID) rn
FROM
...
) WHERE rn between :start and :end
Essentially, it's the ORDER BY part that's slowing things down. If I were to remove it, the EXPLAIN cost goes down by an order of magnitude (over 1000x). I've tried this:
SELECT
...
FROM
...
WHERE
rownum between :start and :end
But this doesn't give correct results. Is there any easy way to speed this up? Or will I have to spend some more time with the EXPLAIN tool?
ROW_NUMBER is quite inefficient in Oracle.
See the article in my blog for performance details:
Oracle: ROW_NUMBER vs ROWNUM
For your specific query, I'd recommend you to replace it with ROWNUM and make sure that the index is used:
SELECT *
FROM (
SELECT /*+ INDEX_ASC(t index_on_column) NOPARALLEL_INDEX(t index_on_column) */
t.*, ROWNUM AS rn
FROM table t
ORDER BY
column
)
WHERE rn >= :start
AND rownum <= :end - :start + 1
This query will use COUNT STOPKEY
Also, either make sure your column is not nullable, or add a WHERE column IS NOT NULL condition.
Otherwise the index cannot be used to retrieve all values.
Note that you cannot use ROWNUM BETWEEN :start and :end without a subquery.
ROWNUM is always assigned last and checked last; that's why ROWNUMs always come in order without gaps.
If you use ROWNUM BETWEEN 10 and 20, the first row that satisfies all other conditions will become a candidate for returning, temporarily assigned ROWNUM = 1, and will fail the test of ROWNUM BETWEEN 10 AND 20.
Then the next row will be a candidate, assigned with ROWNUM = 1 and fail, etc., so, finally, no rows will be returned at all.
This should be worked around by putting ROWNUM's into the subquery.
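The behaviour described above can be mimicked in a few lines of plain Python. This is a simplified model of how ROWNUM is assigned, not Oracle code, and select_with_rownum is a made-up helper: the counter only advances when a candidate row passes the predicate.

```python
def select_with_rownum(rows, predicate):
    """Simulate ROWNUM: each candidate row is offered the next number,
    and the counter only advances if the row passes the predicate."""
    result = []
    next_rownum = 1
    for row in rows:
        if predicate(next_rownum):
            result.append((next_rownum, row))
            next_rownum += 1
    return result


rows = [f"row{i}" for i in range(1, 31)]

# WHERE ROWNUM BETWEEN 10 AND 20: every candidate is offered number 1,
# fails the test, and is discarded, so nothing is ever returned.
direct = select_with_rownum(rows, lambda rn: 10 <= rn <= 20)

# Subquery form: number every row first, then filter on the stored number.
numbered = select_with_rownum(rows, lambda rn: True)
windowed = [(rn, row) for rn, row in numbered if 10 <= rn <= 20]

print(direct)       # []
print(windowed[0])  # (10, 'row10')
```

The same model also shows why ROWNUM <= N works without a subquery: the first N candidates each pass and advance the counter.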
Looks like a pagination query to me.
From this ASKTOM article (about 90% down the page):
You need to order by something unique for these pagination queries, so that ROW_NUMBER is assigned deterministically to the rows each and every time.
Also, your queries are nowhere near the same, so I'm not sure what the benefit of comparing the costs of one to the other is.
Is your ORDER BY column indexed? If not that's a good place to start.
Part of the problem is how big is the 'start' to 'end' span and where they 'live'.
Say you have a million rows in the table, and you want rows 567,890 to 567,900 then you are going to have to live with the fact that it is going to need to go through the entire table, sort pretty much all of that by id, and work out what rows fall into that range.
In short, that's a lot of work, which is why the optimizer gives it a high cost.
It is also not something an index can help with much. An index would give the order, but at best, that gives you somewhere to start and then you keep reading on until you get to the 567,900th entry.
If you are showing your end user 10 items at a time, it may be worth actually grabbing the top 100 from the DB, then having the app break that 100 into ten chunks.
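The "fetch 100, page in the app" idea might look roughly like this on the application side. The chunk helper is hypothetical, and top_100 is just a stand-in for the rows returned by a single database query.

```python
def chunk(rows, size):
    """Split a list of rows into consecutive pages of at most `size` rows."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


top_100 = list(range(1, 101))  # stand-in for the first 100 rows from the DB
pages = chunk(top_100, 10)

print(len(pages))  # 10
print(pages[0])    # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

The trade-off is one larger query up front in exchange for serving the next nine pages without touching the database.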
Spend more time with the EXPLAIN PLAN tool. If you see a TABLE SCAN you need to change your query.
Your query makes little sense to me. Querying over a ROWNUM seems like asking for trouble. There's no relational info in that query. Is it the real query that you're having trouble with, or an example that you made up to illustrate your problem?

Is there efficient SQL to query a portion of a large table

The typical way of selecting data is:
select * from my_table
But what if the table contains 10 million records and you only want records 300,010 to 300,020
Is there a way to create a SQL statement on Microsoft SQL that only gets 10 records at once?
E.g.
select * from my_table from records 300,010 to 300,020
This would be way more efficient than retrieving 10 million records across the network, storing them in the IIS server and then counting to the records you want.
SELECT * FROM my_table is just the tip of the iceberg. Assuming you're talking a table with an identity field for the primary key, you can just say:
SELECT * FROM my_table WHERE ID >= 300010 AND ID <= 300020
You should also know that selecting * is considered poor practice in many circles. They want you to specify the exact column list.
Try looking at info about pagination. Here's a short summary of it for SQL Server.
Absolutely. On MySQL and PostgreSQL (the two databases I've used), the syntax would be
SELECT [columns] FROM table LIMIT 10 OFFSET 300010;
On MS SQL, it's something like SELECT TOP 10 ...; I don't know the syntax for offsetting the record list.
Note that you never want to use SELECT *; it's a maintenance nightmare if anything ever changes. This query, though, is going to be incredibly slow since your database will have to scan through and throw away the first 300,010 records to get to the 10 you want. It'll also be unpredictable, since you haven't told the database which order you want the records in.
This is the core of SQL: tell it which 10 records you want, identified by a key in a specific range, and the database will do its best to grab and return those records with minimal work. Look up any tutorial on SQL for more information on how it works.
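To make the difference concrete, here is a small sketch in Python's sqlite3 that fetches the same page two ways: by skipping rows with OFFSET, and by asking for keys after a known value. The id/payload columns, the two helper functions, and the gap-free sample ids are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(i, f"row {i}") for i in range(1, 1001)],
)


def page_by_offset(conn, offset, size=10):
    """Skip `offset` rows, then return `size` rows; the skipped rows are still read."""
    return conn.execute(
        "SELECT id, payload FROM my_table ORDER BY id LIMIT ? OFFSET ?",
        (size, offset),
    ).fetchall()


def page_after_key(conn, last_id, size=10):
    """Return `size` rows whose key is greater than `last_id` (a primary-key seek)."""
    return conn.execute(
        "SELECT id, payload FROM my_table WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, size),
    ).fetchall()


by_offset = page_by_offset(conn, 300)
by_key = page_after_key(conn, 300)
print([row[0] for row in by_key])  # ids 301 through 310
```

The two only agree here because the sample ids have no gaps; with deleted rows, "the 300th row" and "id 300" stop meaning the same thing, which is why the key-based form needs you to remember the last key you showed.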
When working with large tables, it is often a good idea to make use of Partitioning techniques available in SQL Server.
The rules of your partition function typically dictate that only a range of data can reside within a given partition. You could split your partitions by date range or ID, for example.
In order to select from a particular partition you would use a query similar to the following.
SELECT <Column Name1>…/*
FROM <Table Name>
WHERE $PARTITION.<Partition Function Name>(<Column Name>) = <Partition Number>
Take a look at the following white paper for more detailed information on partitioning in SQL Server 2005.
http://msdn.microsoft.com/en-us/library/ms345146.aspx
I hope this helps however please feel free to pose further questions.
Cheers, John
I use wrapper queries to select the core query and then just isolate the row numbers that I wish to take from the query. This allows SQL Server to do all the heavy lifting inside the core query and pass out only the small portion of the table that I have requested. All you need to do is pass the [start_row_variable] and the [end_row_variable] into the SQL query.
NOTE: The order clause is specified OUTSIDE the core query [sql_order_clause]
w1 and w2 are aliases for the derived tables that act as the wrappers.
SELECT
w1.*
FROM(
SELECT w2.*,
ROW_NUMBER() OVER ([sql_order_clause]) AS ROW
FROM (
-- CORE QUERY START
SELECT [columns]
FROM [table_name]
WHERE [sql_string]
-- CORE QUERY END
) AS w2
) AS w1
WHERE ROW BETWEEN [start_row_variable] AND [end_row_variable]
This method has hugely optimized my database systems. It works very well.
IMPORTANT: Be sure to always explicitly specify only the exact columns you wish to retrieve in the core query as fetching unnecessary data in these CORE queries can cost you serious overhead
Use TOP to select only a limited number of rows, like:
SELECT TOP 10 * FROM my_table WHERE ID >= 300010
Add an ORDER BY if you want the results in a particular order.
To be efficient there has to be an index on the ID column.