I need to locate the index position of a record in a large database table in order to preset a pager to that item's page. I need an efficient SQL query that can give me this number. Sadly SQL doesn't offer something like:
SELECT INDEX(*) FROM users WHERE userid='123'
Any bright ideas?
EDIT: Let's assume there is an ORDER BY clause appended to this. The point is that I do not want to load all records just to locate the position of a specific one. I am trying to open a pager on the page holding an item that was chosen previously, because I want to show information about that item in a context that lets the user choose a different one.
You might use something like (pseudo-code):
counting query: $n = select count(uid) from {users} where ... (your paging condition, restricted to the rows that sort before userid 123)
$page = floor($n / $pager_size);
display query: select what,you,want from {users} where (your paging condition without that restriction), passed to db_query_range(thequery, $page * $pager_size, $pager_size)
You should really look at pager_query, though, because that's what it's all about, and it basically works like this: a counting query and a display query, except it tries to build the counting query automatically.
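The counting-query idea can be tried end to end with a minimal sketch. It assumes SQLite through Python's sqlite3 module as a stand-in for the real database, a hypothetical users table sorted by a unique name column, and a hypothetical page size of 10; none of these come from the question.

```python
import sqlite3

PAGE_SIZE = 10  # hypothetical pager size

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (userid INTEGER PRIMARY KEY, name TEXT NOT NULL)")
# 95 hypothetical users whose names sort in the reverse order of their ids.
conn.executemany(
    "INSERT INTO users (userid, name) VALUES (?, ?)",
    [(i, f"user{999 - i:03d}") for i in range(1, 96)],
)


def page_of(conn, userid, page_size):
    """Zero-based page holding `userid` when the list is sorted by name."""
    # Counting query: how many rows sort strictly before the target row?
    (rows_before,) = conn.execute(
        "SELECT COUNT(*) FROM users "
        "WHERE name < (SELECT name FROM users WHERE userid = ?)",
        (userid,),
    ).fetchone()
    return rows_before // page_size


def fetch_page(conn, page, page_size):
    """Display query: same ordering, limited to one page."""
    return conn.execute(
        "SELECT userid, name FROM users ORDER BY name LIMIT ? OFFSET ?",
        (page_size, page * page_size),
    ).fetchall()


page = page_of(conn, 23, PAGE_SIZE)
rows = fetch_page(conn, page, PAGE_SIZE)
print(page, [userid for userid, _ in rows])
```

Only the count crosses the wire for the first query, so the client never loads the rows that precede the target. The sketch relies on the sort column being unique; with duplicates, a tie-breaker column would be needed in both queries.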
Assuming you are really asking how to page records in SQL Server 2005 onwards, have a look at this code from David Hayden:
(you will need to change Date, Description to be your columns)
CREATE PROCEDURE dbo.ShowUsers
@PageIndex INT,
@PageSize INT
AS
BEGIN
WITH UserEntries AS (
SELECT ROW_NUMBER() OVER (ORDER BY Date DESC) AS Row, Date, Description
FROM users)
SELECT Date, Description
FROM UserEntries
WHERE Row BETWEEN (@PageIndex - 1) * @PageSize + 1 AND @PageIndex * @PageSize
END
SQL doesn't guarantee the order of rows in a table unless you use an ORDER BY clause. In other words, the index of any particular row may change in subsequent queries. Can you describe what you are trying to accomplish with the pager?
You might be interested in something that simulates Oracle's ROWNUM in MySQL, if you are using MySQL of course; the question doesn't specify.
Notes:
You'll have to look through all the records of your pages for that to work of course. You don't need to fetch them back to the PHP page from the database server but you'll have to include them in the query. There's no magic trick to determine the position of your row inside a result set other than querying the result set as it might change because of the where conditions, the orders and the groups. It needs to be in context.
Of course, if all your rows are sequential, with incremental ids, none are deleted, and you know the first and last ids; then you could use a count and with simple math get the position without querying everything.... but I doubt that's your case, it never is.
Related
I came across an old script that in essence does the following:
CREATE TABLE #T (ColA VARCHAR (20), ID INT)
INSERT INTO #T VALUES ('BBBBBBBB', 1), ('AAAAAAA', 4), ('RRRRRR', 3)
CREATE TABLE #S (ColA VARCHAR (100), ID INT)
INSERT INTO #S
SELECT * FROM #T
ORDER BY ID -- odd to do an order by in an insert statement, but that's the code as it is...
SELECT * FROM #S
DROP TABLE #T, #S
First, I want to mention that I am aware of the fact that tables such as the ones I created here do not have an actual order, we just order the resultset if we want.
However, if you run the script above on SQL Server 2008, you get the results in the order that was specified in the insert statement. On a 2016 machine, this is not the case: it returns the rows in the order they were originally created. Does anyone know what changes cause this different behaviour?
Thanks a lot!
As to your example - nothing has changed. A relation in relational theory is represented in SQL by a table, and a relation is not ordered. So you are not allowed to define how rows are ordered when they are materialized - and you should not care about this.
If you want to SELECT the data in the same order each time, you must specify ORDER BY criteria that are unique.
Also, in your example - you can SELECT the data one billion times and get it back "as you inserted it" each time, but on the very next run you can get different results. When no order is specified, the engine returns the data in whatever way it considers "best", and this can change at any time.
As you know - unless order by is specified, the database engine returns the rows in an arbitrary order - How this order is generated has to do with the internal parts of the database engine - the algorithm may change between versions, even between service packs, without any need for documentation since it's known to be arbitrary.
Please note that arbitrary is not the same as random - meaning you should not expect to get different row order each time you run the query - in fact, you will probably get the same row order every time until something changes - that might be a restart to the server, a rebuild of an index, another row added to the table, an index created or removed - I can't say because it's not documented anywhere.
Moreover, unless you have an Identity column in your table, the optimizer will simply ignore the order by clause in the insert...select statement, precisely because of what you already wrote in your question - database tables have no intrinsic order.
Order the result set of a query by the specified column list and,
optionally, limit the rows returned to a specified range. The order
in which rows are returned in a result set are not guaranteed unless
an ORDER BY clause is specified.
MSSQL Docs
We get an Access DB (.accdb) from an external source and have no control over the structure or data. We need to ingest the data into our DB using code. This means I have control over the SQL.
Our issue is that one table contains almost 13k records (currently 12,997) and takes a long time to process. I'd like to query the data from the source DB but only a predefined number of records at a time - let's say 1000 at a time.
I tried generating my query inside a loop where I update the number of records to return with each loop. So far, the only thing I've found that comes close to working is something like this:
SELECT *
FROM (
SELECT Top + pageSize + sub.*
FROM (
SELECT TOP + startPos + [Product Description Codes].*
FROM [Product Description Codes]
ORDER BY [Product Description Codes].PRODDESCRIPCODE
) sub
ORDER BY sub.PRODDESCRIPCODE DESC
) subOrdered
ORDER BY subOrdered.PRODDESCRIPCODE
Where I increment pageSize and startPos with each loop. The problem is that it always returns 1000 rows, even on what I think should be the last loop when it should return only 997 and then return zero after that.
Can anyone help me with this? I don't have another column to filter on. Is there a way to select a certain number of records in a loop and then increment that number until I've gotten all the records, and then stop?
If PRODDESCRIPCODE is the primary key, you can simplify your select, e.g.:
SELECT TOP 1000 *
FROM [Product Description Codes]
WHERE PRODDESCRIPCODE > @pcode
ORDER BY PRODDESCRIPCODE;
Start by passing 0 as the @pcode parameter (if the key is an integer, or '' if it is text, and so on). On each subsequent loop, set the parameter to the highest PRODDESCRIPCODE you received in the previous batch. The ORDER BY matters: without it, TOP 1000 may return any 1000 matching rows.
(I am not sure whether you meant MS SQL Server when you said SQL, or how you are running this.)
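The "remember the last key" loop can be sketched as follows. This is an illustration only: it assumes SQLite through Python's sqlite3 module rather than Access, uses LIMIT where Access would use TOP, and the product_codes table with 25 rows and a batch size of 10 is hypothetical, standing in for the 12,997 rows and batches of 1000 from the question.

```python
import sqlite3

PAGE_SIZE = 10  # stands in for the 1000-row batches in the question

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_codes (code INTEGER PRIMARY KEY, descr TEXT)")
conn.executemany(
    "INSERT INTO product_codes (code, descr) VALUES (?, ?)",
    [(n, f"item {n}") for n in range(1, 26)],  # 25 hypothetical rows
)


def read_in_batches(conn, page_size):
    """Yield batches ordered by key, resuming after the last key seen."""
    last_code = 0  # smaller than any real key in this sketch
    while True:
        batch = conn.execute(
            "SELECT code, descr FROM product_codes "
            "WHERE code > ? ORDER BY code LIMIT ?",
            (last_code, page_size),
        ).fetchall()
        if not batch:
            break  # nothing left: the loop stops on its own
        yield batch
        last_code = batch[-1][0]  # highest key in this batch


batch_sizes = [len(batch) for batch in read_in_batches(conn, PAGE_SIZE)]
print(batch_sizes)  # prints [10, 10, 5]
```

Because each query starts after the last key rather than at a row count, the final batch is naturally short and the following query returns nothing, which is the stopping condition the question was missing.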
Do you absolutely have to update records, or can you afford to insert the entire access table into your local table, slap on a timestamp field, and structure your local queries to grab the most recent entry? Based on some of your comments above, it doesn't sound like you have any cases where you are keeping a local record over an imported one.
SELECT PRODDESCRIPCODE, MAX(timestamp) FROM table GROUP BY PRODDESCRIPCODE
I ended up using a variation of the method from here:
http://www.jertix.org/en/blog/programming/implementation-of-sql-pagination-with-ms-access.html
Thank you all very much for your suggestions.
A few days ago I came across a strange problem with ORDER BY. While creating a new table I used
Select - Into - From and Order By (column name)
and when I open that table, I see the rows are not arranged accordingly.
I re-verified it multiple times to make sure I am doing the right thing.
One more thing I would like to add: as long as I don't use INTO, I see the desired result, but as soon as I create the new table, there is no order on that column. Please help me!
Thanks in advance. Before posting the question I researched for 3 days but found no solution.
SELECT
[WorkOrderID], [ProductID], [OrderQty], [StockedQty]
INTO
[AdventureWorks2012].[Production].[WorkOrder_test]
FROM
[AdventureWorks2012].[Production].[WorkOrder]
ORDER BY
[StockedQty]
SQL 101 for beginners: SELECT statements have no defined order unless you define one.
When i open that table
That likely issues a SELECT (TOP 1000, IIRC) without an order.
While creating a new table i used Select - Into - From and Order By (column name)
Which is pretty much irrelevant - you basically waste performance ordering the input data.
If you want an order in a select, MAKE ONE by adding an order by clause to the select. The table's internal order is by clustered index, but a query can return results in any order it wants. Fundamental SQL issue, as I said in the first sentence. Any good book on SQL covers that in one of the first chapters. SQL uses a set approach, and sets have no intrinsic order.
Firstly, T-SQL is a set-based language, and sets don't have an order. Moreover, a statement is not executed serially in the sequence in which it is written; the logical processing order for a SELECT statement is:
1. FROM
2. ON
3. JOIN
4. WHERE
5. GROUP BY
6. WITH CUBE or WITH ROLLUP
7. HAVING
8. SELECT
9. DISTINCT
10. ORDER BY
When you execute your query without INTO, the selected rows are ordered according to the 'Order By' clause. When INTO is used, however, the format of new_table is determined by evaluating the expressions in the select list. (Remember, the order by clause has not been evaluated yet.)
The columns in new_table are created in the order specified by the select list, but the rows cannot be ordered. This is a limitation of the INTO clause, as the documentation states:
Specifying an ORDER BY clause does not guarantee the rows are inserted
in the specified order.
In my database I have a table with a rather large data set that users can perform searches on. So for the following table structure for the Person table that contains about 250,000 records:
firstName | lastName | age
----------|----------|----
John      | Doe      | 25
John      | Sams     | 15
the users would be able to perform a query that can return about 500 or so results. What I would like to do is allow the user to see his search results 50 at a time using pagination. I've figured out the client side pagination stuff, but I need somewhere to store the query results so that the pagination uses the results from his unique query and not from a SELECT * statement.
Can anyone provide some guidance on the best way to achieve this? Thanks.
Side note: I've been trying to use temp tables to do this by using the SELECT INTO statements, but I think that might cause some problems if, say, User A performs a search and his results are stored in the temp table then User B performs a search shortly after and User A's search results are overwritten.
In SQL Server the ROW_NUMBER() function is great for pagination, and may be helpful depending on what parameters change between searches, for example if searches were just for different firstName values you could use:
;WITH search AS (SELECT *, ROW_NUMBER() OVER (PARTITION BY firstName ORDER BY lastName) AS RN_firstName
FROM YourTable)
SELECT *
FROM search
WHERE RN_firstName BETWEEN 51 AND 100
AND firstName = 'John'
You could add additional ROW_NUMBER() lines, altering the PARTITION BY clause based on which fields are being searched.
Historically, for us, the best way to manage this is to create a complete new table, with a unique name. Then, when you're done, you can schedule the table for deletion.
The table, if practical, simply contains an index id (a simple sequence: 1,2,3,4,5) and the primary key to the table(s) that are part of the query. Not the entire result set.
Your pagination logic then does something like:
SELECT p.* FROM temp_1234 t, primary_table p
WHERE t.pkey = p.primary_key
AND t.serial_id between 51 and 100
The serial id is your paging index.
So, you end up with something like (note, I'm not a SQL Server guy, so pardon):
CREATE TABLE temp_1234 (
serial_id serial,
pkey number
);
INSERT INTO temp_1234
SELECT 0, primary_key FROM primary_table WHERE <criteria> ORDER BY <sort>;
CREATE INDEX i_temp_1234 ON temp_1234(serial_id); -- I think sql already does this for you
If you can delay the index, it's faster than creating it first, but it's a marginal improvement most likely.
Also, create a tracking table where you insert the table name and the date. You can use this with a reaper process later (late at night) to DROP the day's tables (those more than, say, X hours old).
Full table operations are much cheaper than inserting and deleting rows in a single shared table:
INSERT INTO page_table SELECT 'temp_1234', <sequence>, primary_key...
DELETE FROM page_table WHERE page_id = 'temp_1234';
That's just awful.
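The per-search snapshot table described above can be sketched like this. It is an illustration under stated assumptions, not the answerer's actual code: it uses SQLite through Python's sqlite3 module, the person table and its 300 generated rows are hypothetical, and the serial ids are assigned client-side so the sketch does not depend on the insert order of an INSERT ... SELECT ... ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, firstName TEXT, lastName TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO person (firstName, lastName, age) VALUES (?, ?, ?)",
    [("John" if i % 2 else "Jane", f"Name{i:04d}", 20 + i % 50) for i in range(1, 301)],
)


def snapshot_search(conn, table_name, first_name):
    """Store (serial_id, pkey) pairs for one search in its own table."""
    # table_name is interpolated into SQL, so it must be generated by
    # trusted code (e.g. a counter), never taken from user input.
    conn.execute(
        f"CREATE TABLE {table_name} (serial_id INTEGER PRIMARY KEY, pkey INTEGER NOT NULL)"
    )
    ids = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM person WHERE firstName = ? ORDER BY lastName, id",
            (first_name,),
        )
    ]
    # Number the keys explicitly: 1, 2, 3, ... in sorted order.
    conn.executemany(
        f"INSERT INTO {table_name} (serial_id, pkey) VALUES (?, ?)",
        enumerate(ids, start=1),
    )


def fetch_page(conn, table_name, first, last):
    """Join one slice of the snapshot back to the primary table."""
    return conn.execute(
        f"SELECT p.id, p.firstName, p.lastName FROM {table_name} t "
        "JOIN person p ON p.id = t.pkey "
        "WHERE t.serial_id BETWEEN ? AND ? ORDER BY t.serial_id",
        (first, last),
    ).fetchall()


snapshot_search(conn, "temp_1234", "John")
page_two = fetch_page(conn, "temp_1234", 51, 100)
print(len(page_two), page_two[0], page_two[-1])
```

Each search gets its own table name, so one user's results cannot overwrite another's; the tracking table and reaper process from the answer are left out of the sketch.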
First of all, make sure you really need to do this. You're adding significant complexity, so go and measure whether the queries and pagination really hurt or whether you just "feel like you should". The pagination can be handled with ROW_NUMBER() quite easily.
Assuming you go ahead, once you've got your query, clearly you need to build a cache so first you need to identify what the key is. It will be the SQL statement or operation identifier (name of stored procedure perhaps) and the criteria used. If you don't want to share between users then the user name or some kind of session ID too.
Now when you do a query, you first look up in this table with all the key data then either
a) Can't find it so you run the query and add to the cache, storing the criteria/keys and the data or PK of the data depending on if you want a snapshot or real time. Bear in mind that "real time" isn't really because other users could be changing data under you.
b) Find it, so retrieve the cached results (or join the PK to the underlying tables) and return them.
Of course now you need a background process to go and clean up the cache when it's been hanging around too long.
Like I said - you should really make sure you need to do this before you embark on it. In the example you give I don't think it's worth it.
MySQL
Suppose you want to retrieve just a single record by some id, but you want to know what its position would have been if you'd encountered it in a large ordered set.
Case in point is a photo gallery. You land on a single photo, but the system must know what its offset is in the entire gallery.
I suppose I could use custom indexing fields to keep track of positions, but there must be a more graceful way in SQL alone.
So, first you create a virtual table with the position # ordered by whatever your ORDER BY is, then you select the highest one from that set. That's the position in the greater result set. You can run into problems if you don't order by a unique value/set of values...
If you create an index on (photo_gallery_id, date_created_on) it may do an index scan (depending on the distribution of photos), which ought to be faster than a table scan (provided your gallery_id isn't 90% of the photos or whatnot).
SELECT @row := 0;
SELECT MAX( position )
FROM ( SELECT @row := @row + 1 AS position
FROM photos
WHERE photo_gallery_id = 43
AND date_created_on <= 'the-date-time-your-photo-was'
ORDER BY date_created_on ) positions;
Not really. I think Oracle gives you a "ROWID" or something like that, but most databases don't. A custom ordering, such as a column in your database that tells you what position the entry has in the gallery, is a good idea because you can never be sure that SQL will put things in the table in the order you think they should be in.
As you are not specific about what database you're using, in SQL Server 2005 you could use
SELECT
ROW_NUMBER() OVER (ORDER BY PhotoID)
, PhotoID
FROM dbo.Photos
You don't say what DBMS you are using, and the "solution" will vary accordingly. In Oracle you could do this (but I would urge you not to!):
select photo, offset
from
( select id
, photo
, row_number() over (partition by gallery_id order by photo_seq) as offset
from photos
)
where id = 123
That query will select all photos (full table scan) and then pick out the one you asked for - not a performant query!
I would suggest if you really need this information it should be stored.
Assuming the position is determined solely by the id, would it not be as simple as counting all records with a smaller id value?:
select
po.[id]
...
((select count(pi.[id]) from photos pi where pi.[id] < po.[id]) + 1) as [index]
...
from photos po
...
I'm not sure what the performance implications of such a query would be, but I would think returning a lot of records could be a problem.
You must understand the difference between a "application key" and a "technical key".
The technical key exists for the sole purpose of making an item unique. It's usually an INTEGER or BIGINT, generated (identity, whatever). This key is used to locate objects in the database, to quickly figure out if an object has already been persisted (IDs must be > 0, so an object with the default ID == 0 is not in the DB yet), etc.
The application key is something which you need to make sense of an object within the context of your application. In this case, it's the ordering of the photos in the gallery. This has no meaning whatsoever for the database.
Think ordered list: This is the default in most languages. You have a set of items, accessed by an index. For a database, this index is an application key since sets in the database are unordered (or rather the database doesn't guarantee any ordering unless you specify ORDER BY). For the very same reason, paging through results from a query is such a pain: Databases really don't like the idea of "position".
So what you must do is add an index column (i.e. an INTEGER which says at which position in the gallery your image is; not a database index for quicker access, even though you should create an index on this column ...) and maintain that. For every insertion, you must UPDATE index = index + 1 where index >= insertion_point, etc.
Yes, it sucks. The only solution I know of: Use an ORM framework which solves this for you.
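The bookkeeping for such a position column can be sketched as follows. This is a minimal illustration assuming SQLite through Python's sqlite3 module; the photos table, its position column (named position instead of index to avoid a reserved word), and the insert_photo_at helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE photos ("
    " id INTEGER PRIMARY KEY,"
    " gallery_id INTEGER NOT NULL,"
    " position INTEGER NOT NULL,"
    " title TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO photos (gallery_id, position, title) VALUES (43, ?, ?)",
    [(1, "a"), (2, "b"), (3, "c")],
)


def insert_photo_at(conn, gallery_id, position, title):
    """Insert a photo at `position`, shifting later photos down by one."""
    with conn:  # both statements commit together or roll back together
        conn.execute(
            "UPDATE photos SET position = position + 1 "
            "WHERE gallery_id = ? AND position >= ?",
            (gallery_id, position),
        )
        conn.execute(
            "INSERT INTO photos (gallery_id, position, title) VALUES (?, ?, ?)",
            (gallery_id, position, title),
        )


insert_photo_at(conn, 43, 2, "new")
titles = [
    title
    for (title,) in conn.execute(
        "SELECT title FROM photos WHERE gallery_id = 43 ORDER BY position"
    )
]
print(titles)  # prints ['a', 'new', 'b', 'c']
```

With the column maintained this way, the position of a single photo is a plain lookup of one row. Deletions and moves need the mirror-image update, which the sketch omits.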
There's no need for an extra table, why not just count the records instead?
You know the order in which the records are displayed; it can vary, but at any given moment you know it.
You also know the ID of the current record. Let's say the list is ordered by date:
The offset of the record is the number of records with a date < that date.
SELECT COUNT(1) FROM ... WHERE date < "the-date"
This gives you the number you can use as the offset for the other queries...
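A minimal sketch of the counting approach, assuming SQLite through Python's sqlite3 module and a hypothetical photos table. It adds one thing the plain date comparison lacks: the id is used as a tie-breaker, so two photos with the same timestamp still get distinct offsets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE photos ("
    " id INTEGER PRIMARY KEY,"
    " photo_gallery_id INTEGER NOT NULL,"
    " date_created_on TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO photos VALUES (?, ?, ?)",
    [
        (1, 43, "2009-01-01 10:00:00"),
        (2, 43, "2009-01-02 10:00:00"),
        (3, 43, "2009-01-02 10:00:00"),  # same timestamp as photo 2
        (4, 43, "2009-01-03 10:00:00"),
        (5, 7, "2009-01-01 09:00:00"),   # belongs to another gallery
    ],
)


def offset_of(conn, photo_id):
    """Zero-based offset of a photo in its gallery, ordered by (date, id)."""
    gallery_id, created = conn.execute(
        "SELECT photo_gallery_id, date_created_on FROM photos WHERE id = ?",
        (photo_id,),
    ).fetchone()
    (offset,) = conn.execute(
        "SELECT COUNT(*) FROM photos "
        "WHERE photo_gallery_id = ? "
        "AND (date_created_on < ? OR (date_created_on = ? AND id < ?))",
        (gallery_id, created, created, photo_id),
    ).fetchone()
    return offset


offsets = {photo_id: offset_of(conn, photo_id) for photo_id in (1, 2, 3, 4)}

# Cross-check: the row found at that OFFSET in the ordered list is the photo itself.
(found_id,) = conn.execute(
    "SELECT id FROM photos WHERE photo_gallery_id = 43 "
    "ORDER BY date_created_on, id LIMIT 1 OFFSET ?",
    (offsets[3],),
).fetchone()
print(offsets, found_id)
```

An index on (photo_gallery_id, date_created_on), as suggested in an earlier answer, would let the count be served from the index rather than from a table scan.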