I've been trying to solve this problem for a few days now without much luck. I have found loads of resources that talk about paging on SQL Server 2000 both here and on codeproject.
The problem I am facing is trying to implement some sort of paging mechanism on a table whose primary key is made up of three columns: Operator, CustomerIdentifier, DateDisconnected.
Any help/pointers would be greatly appreciated.
SQL Server 2000 doesn't have the handy row_number function, so you'll have to auto-generate a row number column with a subquery, like so:
select *
from
    (select
         a.*,
         -- count how many rows sort at or before this one on the composite key;
         -- this count acts as the row number
         (select count(*)
          from tblA
          where operator < a.operator
             or (operator = a.operator
                 and customeridentifier < a.customeridentifier)
             or (operator = a.operator
                 and customeridentifier = a.customeridentifier
                 and datedisconnected <= a.datedisconnected)) as rownum
     from tblA a) s
where s.rownum between 5 and 10
order by s.rownum
However, you can sort those rows by any column in the table -- it doesn't have to use the composite key. It would probably run faster, too!
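For instance, here is a sketch of the same pattern ordered by a single column instead of the composite key (DateDisconnected is purely an illustrative choice, and ties in the ordering column will share a rownum value):

-- same counting trick, but ranked on one column only
select *
from
    (select
         a.*,
         (select count(*)
          from tblA
          where datedisconnected <= a.datedisconnected) as rownum
     from tblA a) s
where s.rownum between 5 and 10
order by s.rownum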
Additionally, composite primary keys are often a red flag. Is there any particular reason you aren't just using a surrogate key with a unique constraint on these three columns?
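If you went that route, a minimal sketch could look like this (the table name and column types are assumptions, not taken from your schema):

-- hypothetical surrogate key plus a unique constraint on the natural key
CREATE TABLE tblA_new (
    Id INT IDENTITY(1, 1) PRIMARY KEY,
    Operator VARCHAR(50) NOT NULL,            -- assumed type
    CustomerIdentifier VARCHAR(50) NOT NULL,  -- assumed type
    DateDisconnected DATETIME NOT NULL,
    CONSTRAINT UQ_tblA_natural_key
        UNIQUE (Operator, CustomerIdentifier, DateDisconnected)
);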
I have a PostgreSQL database, and I'm trying to delete (or even just get the ids of) the older rows among the duplicates in my table, but only those that are duplicates because of case sensitivity, for example helLo and hello.
The table is quite large and my nested query takes a really long time. I wonder if there is a better, more efficient way to do this in one go, without splitting it into multiple queries, because there are a lot of ids involved.
SELECT * FROM some_table AS t_out
WHERE (SELECT count(*) FROM some_table AS t_in
       WHERE t_out.text != t_in.text
         AND LOWER(t_in.text) = LOWER(t_out.text)
         AND t_in.created_at > t_out.created_at) > 1
Thanks!
Can you try
SELECT LOWER(text), ROW_NUMBER() OVER( PARTITION by LOWER(text) ORDER by created_at ) as rn
FROM some_table
You can then use the rn column as a filter
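For example, here is a minimal sketch of the delete, assuming some_table has an id primary key and that you want to keep the newest row in each case-insensitive group (note the ORDER BY is flipped to created_at DESC so that rn = 1 is the newest):

DELETE FROM some_table
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY LOWER(text)
                                  ORDER BY created_at DESC) AS rn
        FROM some_table
    ) ranked
    -- rn > 1 marks every older duplicate within its group
    WHERE rn > 1
);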
To help this query, create an expression index on LOWER(text). Include created_at in the index to help the date comparisons.
CREATE INDEX text_lower ON some_table(LOWER(text), created_at);
It's hard to test this without your data, though.
I want to apply pagination on a table with a huge amount of data. All I want to know is whether there is a better option than using OFFSET in SQL Server.
Here is my simple query:
SELECT *
FROM TableName
ORDER BY Id DESC
OFFSET 30000000 ROWS
FETCH NEXT 20 ROWS ONLY
You can use Keyset Pagination for this. It's far more efficient than using Rowset Pagination (paging by row number).
In Rowset Pagination, all previous rows must be read before the next page can be read. In Keyset Pagination, by contrast, the server can jump immediately to the correct place in the index, so no extra rows are read that do not need to be.
For this to perform well, you need to have a unique index on that key, which includes any other columns you need to query.
In this type of pagination, you cannot jump to a specific page number. You jump to a specific key and read from there, so you need to save the unique ID of the last row on the page you are on and skip on from it. Alternatively, you could calculate or estimate a starting point for each page up-front.
One big benefit, apart from the obvious efficiency gain, is avoiding the "missing row" problem when paginating, caused by rows being removed from previously read pages. This does not happen when paginating by key, because the key does not change.
Here is an example:
Let us assume you have a table called TableName with an index on Id, and you want to start at the latest Id value and work backwards.
You begin with:
SELECT TOP (@numRows)
    *
FROM TableName
ORDER BY Id DESC;
Note the use of ORDER BY to ensure the order is correct
In some RDBMSs you need LIMIT instead of TOP
The client will hold the last received Id value (the lowest in this case). On the next request, you jump to that key and carry on:
SELECT TOP (@numRows)
    *
FROM TableName
WHERE Id < @lastId
ORDER BY Id DESC;
Note the use of < not <=
In case you were wondering: in a typical B+tree index, the row with the indicated Id is not read, it's the row after it that's read.
The key chosen must be unique, so if you are paging by a non-unique column then you must add a second column to both ORDER BY and WHERE. You would need an index on OtherColumn, Id for example, to support this type of query. Don't forget INCLUDE columns on the index.
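For example, a hypothetical index for the OtherColumn, Id case might look like this (the index name is made up, and SomeOtherColumn is a placeholder for whatever columns your query actually selects):

-- key order matches the ORDER BY; INCLUDE covers the selected columns
CREATE UNIQUE INDEX IX_TableName_OtherColumn_Id
    ON TableName (OtherColumn DESC, Id DESC)
    INCLUDE (SomeOtherColumn);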
SQL Server does not support row/tuple comparators, so you cannot do (OtherColumn, Id) < (@lastOther, @lastId) (this is however supported in PostgreSQL, MySQL, MariaDB and SQLite).
Instead you need the following:
SELECT TOP (@numRows)
    *
FROM TableName
WHERE (
       (OtherColumn = @lastOther AND Id < @lastId)
       OR OtherColumn < @lastOther
      )
ORDER BY
    OtherColumn DESC,
    Id DESC;
This is more efficient than it looks, as SQL Server can convert this into a proper < over both values.
The presence of NULLs complicates things further. You may want to query those rows separately.
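One possible way to handle a nullable OtherColumn (an assumption on my part, not something prescribed above) is to exclude NULLs from the keyset query and page the NULL rows separately by Id alone:

-- pages over the non-NULL rows, keyset as before
SELECT TOP (@numRows)
    *
FROM TableName
WHERE OtherColumn IS NOT NULL
  AND (
       (OtherColumn = @lastOther AND Id < @lastId)
       OR OtherColumn < @lastOther
      )
ORDER BY OtherColumn DESC, Id DESC;

-- once those are exhausted, page the NULL rows by Id only
SELECT TOP (@numRows)
    *
FROM TableName
WHERE OtherColumn IS NULL
  AND Id < @lastId
ORDER BY Id DESC;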
On a very big merchant website we use a technique based on ids stored in a pseudo-temporary table, which is then joined to the rows of the product table.
Let me explain with a clear example.
We have a table designed this way:
CREATE TABLE S_TEMP.T_PAGINATION_PGN
(PGN_ID BIGINT IDENTITY(-9223372036854775808, 1) PRIMARY KEY,
 PGN_SESSION_GUID UNIQUEIDENTIFIER NOT NULL,
 PGN_SESSION_DATE DATETIME2(0) NOT NULL,
 PGN_PRODUCT_ID INT NOT NULL,
 PGN_SESSION_ORDER INT NOT NULL);

CREATE INDEX X_PGN_SESSION_GUID_ORDER
ON S_TEMP.T_PAGINATION_PGN (PGN_SESSION_GUID, PGN_SESSION_ORDER)
INCLUDE (PGN_PRODUCT_ID);

CREATE INDEX X_PGN_SESSION_DATE
ON S_TEMP.T_PAGINATION_PGN (PGN_SESSION_DATE);
We have a very big product table called T_PRODUIT_PRD, and a customer filters it with many predicates. We INSERT rows from the filtered SELECT into the pagination table this way:
DECLARE @SESSION_ID UNIQUEIDENTIFIER = NEWID();

INSERT INTO S_TEMP.T_PAGINATION_PGN
       (PGN_SESSION_GUID, PGN_SESSION_DATE, PGN_PRODUCT_ID, PGN_SESSION_ORDER)
SELECT @SESSION_ID, SYSUTCDATETIME(), PRD_ID,
       ROW_NUMBER() OVER(ORDER BY ...) -- custom order by
FROM dbo.T_PRODUIT_PRD
WHERE ... -- custom filter
Then every time we need a desired page of @N products, we add a join to this table, like so:
...
JOIN S_TEMP.T_PAGINATION_PGN
     ON PGN_SESSION_GUID = @SESSION_ID
    AND 1 + ((PGN_SESSION_ORDER - 1) / @N) = @DESIRED_PAGE_NUMBER
    AND PGN_PRODUCT_ID = dbo.T_PRODUIT_PRD.PRD_ID
The indexes will do all the work!
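To make that concrete, here is a hedged sketch of what a full page query could look like (the SELECT list, the parameter declarations and the final ORDER BY are illustrative assumptions, not part of the original setup):

DECLARE @N INT = 20, @DESIRED_PAGE_NUMBER INT = 3;  -- assumed page size and page number

SELECT PRD.*
FROM dbo.T_PRODUIT_PRD AS PRD
JOIN S_TEMP.T_PAGINATION_PGN
     ON PGN_SESSION_GUID = @SESSION_ID
    AND 1 + ((PGN_SESSION_ORDER - 1) / @N) = @DESIRED_PAGE_NUMBER
    AND PGN_PRODUCT_ID = PRD.PRD_ID
ORDER BY PGN_SESSION_ORDER;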
Of course, we have to purge this table regularly, which is why we have a scheduled job that deletes the rows whose sessions were generated more than 4 hours ago:
DELETE FROM S_TEMP.T_PAGINATION_PGN
WHERE PGN_SESSION_DATE < DATEADD(hour, -4, SYSUTCDATETIME());
In the same spirit as SQLPro's solution, I propose:
WITH CTE AS
(
    SELECT 30000000 AS N
    UNION ALL
    SELECT N - 1 FROM CTE
    WHERE N > 30000000 + 1 - 20
)
SELECT T.*
FROM CTE
JOIN TableName T ON CTE.N = T.ID
ORDER BY CTE.N DESC
Tried with 2 billion rows and it's instant!
Easy to make it a stored procedure, as sketched below.
Of course, this is only valid if the ids are consecutive.
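A hedged sketch of such a stored procedure (the procedure and parameter names are my own assumptions):

CREATE PROCEDURE dbo.GetPageByConsecutiveId
    @StartId  BIGINT,   -- highest id of the requested page
    @PageSize INT
AS
BEGIN
    SET NOCOUNT ON;

    WITH CTE AS
    (
        SELECT @StartId AS N
        UNION ALL
        SELECT N - 1 FROM CTE
        WHERE N > @StartId + 1 - @PageSize
    )
    SELECT T.*
    FROM CTE
    JOIN TableName T ON CTE.N = T.ID
    ORDER BY CTE.N DESC
    OPTION (MAXRECURSION 32767);  -- recursion depth equals @PageSize - 1
END;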
I have 2 tables: one contains a list of customers (t_client) with their unique IDs, the other contains a list of promotional codes (t_promo_code).
I have created an index for each table (idx_client and idx_code), and I want to join these 2 tables so that each client gets a promotional code.
I suppose there should be something like this in SQL Server?
SELECT *
FROM [EMAIL].[dbo].[T_client]
JOIN [EMAIL].[dbo].[T_promo_code] ON
(INDEX([EMAIL].[dbo].[T_client].idx_client)) = (INDEX ([EMAIL].[dbo].[T_promo_code].idx_code))
However, I cannot find anything... And I am really not familiar with Index. If I can turn index into a column, that would be much easier, yet I don't know how to do that either.
I only found a select statement like this:
Select @row_index := @row_index + 1 as index
But it seems that it only works in MySQL, while I am using SQL Server 2008.
Any ideas?
Sorry if I didn't make it clear. I had difficulty joining these 2 tables because the table t_promo_code doesn't have any column that matches t_client.
Hence, I was considering generating a shared key for them by using the index. However, I just found another solution, which is to use ROW_NUMBER instead of the index.
Eventually, I used the following SQL, and it works!
SELECT Email, 'test_campaign' AS Campaign, GETDATE() AS Date, Code
FROM (
    SELECT Code, ROW_NUMBER() OVER (ORDER BY Code) AS row_num
    FROM [t_promo_code]
) A
JOIN (
    SELECT Email, ROW_NUMBER() OVER (ORDER BY Email) AS row_num
    FROM [t_client]
) B
    ON A.row_num = B.row_num
ORDER BY A.Code, B.Email
I have a table with n records.
How can I retrieve the nth record and the (n-1)th record from my table in SQL without using a derived table?
I have tried using ROWID like this:
select * from table where rowid in (select max(rowid) from table);
It gives the nth record, but I want the (n-1)th record as well.
And is there any other method besides using max, derived tables, and pseudo columns?
Thanks
You cannot depend on rowid to get you to the last row in the table. You need an auto-incrementing id or creation time to have the proper ordering.
You can use, for instance:
select *
from (select t.*, row_number() over (order by <id> desc) as seqnum
from t
) t
where seqnum <= 2
Although allowed in the syntax, the order by clause in a subquery is ignored (for instance http://docs.oracle.com/javadb/10.8.2.2/ref/rrefsqlj13658.html).
Just to be clear, rowids have nothing to do with the ordering of rows in a table. The Oracle documentation is quite clear that they specify a physical access path for the data (http://docs.oracle.com/cd/B28359_01/server.111/b28318/datatype.htm#i6732). It is true that in an empty database, inserting records into a new table will probably create a monotonically increasing sequence of row ids. But you cannot depend on this. The only guarantees with rowids are that they are unique within a table and are the fastest way to access a particular row.
I have to admit that I cannot find good documentation on Oracle handling or not handling order by's in subqueries in its most recent versions. ANSI SQL does not require compliant databases to support order by in subqueries. Oracle syntax allows it, and it seems to work in some cases, at least. My best guess is that it would probably work on a single processor, single threaded instance of Oracle, or if the data access is through an index. Once parallelism is introduced, the results would probably not be ordered. Since I started using Oracle (in the mid-1990s), I have been under the impression that order bys in subqueries are generally ignored. My advice would be to not depend on the functionality, until Oracle clearly states that it is supported.
select * from (select * from my_table order by rowid) where rownum <= 2
and for rows between N and M:
select * from (
  -- rownum must be captured as rn here; an outer WHERE rownum >= N would
  -- never return rows for N > 1
  select t.*, rownum as rn from (
    select * from my_table order by rowid
  ) t where rownum <= M
) where rn >= N
Try this
select top 2 * from table order by rowid desc
Assuming rowid is a column in your table:
SELECT * FROM table ORDER BY rowid DESC LIMIT 2
I'm just getting into optimizing queries by logging slow queries and EXPLAINing them. I guess the thing is, I'm not sure exactly what kind of things I should be looking for. I have this query:
SELECT DISTINCT
screenshot.id,
screenshot.view_count
FROM screenshot_udb_affect_assoc
INNER JOIN screenshot ON id = screenshot_id
WHERE unit_id = 56
ORDER BY RAND()
LIMIT 0, 6;
Looking at these two elements, where should I focus my optimization?
id | select_type | table                       | type | possible_keys | key           | key_len | ref                             | rows | Extra
1  | SIMPLE      | screenshot                  | ALL  | PRIMARY       | NULL          | NULL    | NULL                            | 504  | Using temporary; Using filesort
1  | SIMPLE      | screenshot_udb_affect_assoc | ref  | screenshot_id | screenshot_id | 8       | source_core.screenshot.id,const | 3    | Using index; Distinct
To begin with, please refrain from using ORDER BY RAND(). This particularly degrades performance when the table is large.
For example, even with LIMIT 1 it generates a random number for every row and then picks the smallest one. This is inefficient if the table is large or bound to grow. A detailed discussion can be found at: http://www.titov.net/2005/09/21/do-not-use-order-by-rand-or-how-to-get-random-rows-from-table/
Lastly, also ensure that your join columns are indexed.
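For example, a hedged sketch based only on the column names visible in the query (adjust to your actual schema):

-- covers the unit_id filter and lets the join read screenshot_id from the index
CREATE INDEX idx_assoc_unit_screenshot
    ON screenshot_udb_affect_assoc (unit_id, screenshot_id);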
Try:
SELECT s.id,
s.view_count
FROM SCREENSHOT s
WHERE EXISTS(SELECT NULL
FROM SCREENSHOT_UDB_AFFECT_ASSOC x
WHERE x.screenshot_id = s.id)
ORDER BY RAND()
LIMIT 6
Under 100K records, it's fine to use ORDER BY RAND() -- over that, and you want to start looking at alternatives that scale better. For more info, see this article.
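One commonly cited alternative (a sketch of the general idea, not taken from the article; it assumes ids are roughly evenly distributed, ignores the unit_id filter for brevity, and returns a random contiguous block rather than six independently random rows) is to seek to a random id and read forward:

SELECT s.id, s.view_count
FROM screenshot s
JOIN (SELECT FLOOR(RAND() * (SELECT MAX(id) FROM screenshot)) AS rand_id) r
  ON s.id >= r.rand_id
ORDER BY s.id
LIMIT 6;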
I agree with kuriouscoder: refrain from using ORDER BY RAND(), and make sure each of the following fields is indexed:
screenshot_udb_affect_assoc.id
screenshot.id
screenshot.unit_id
do this using code like:
create index Index1 on screenshot(id);