CosmosDB: How does SELECT TOP work? - sql

I have a 450GB database... with millions of records.
Here is an example query:
SELECT TOP 1 * FROM c WHERE c.type='water';
To speed up our queries, I thought about just taking the first one but we have noticed that the query still takes quite a while, despite the very first record in the database matching our constraints.
So, my question is, how does the SELECT TOP 1 really work? Does it:
A) Select ALL records and then return just the first (top) one where
type='water'
B) Return the first record which is encountered where type='water'

Try this line, noting the offset limit:
SELECT * FROM c WHERE c.type='water' OFFSET 0 LIMIT 1
For more information about the offset limit:
https://learn.microsoft.com/en-us/azure/cosmos-db/sql-query-offset-limit

Assuming you aren't sorting your results (which you query isn't) then TOP 1 will return the first result as soon as it finds one. This should then end the query.

Cosmos db Explorer doesn't work with the TOP Command, It's an existing issue. It works fine in SDK Call.
Check some Top command usage below
https://learn.microsoft.com/en-us/azure/cosmos-db/sql-query-subquery

Related

How can I get a specific chunk of results?

Is it possible to retrieve a specific range of results? I know how to do TOP x but the result I will retrieve is WAY too big and will time out. I was hoping to be able to pick say the first 10,000 results then the next 10,000 and so on. Is this possible?
WITH Q AS (
SELECT ROW_NUMBER() OVER (ORDER BY ...some column) AS N, ...other columns
FROM ...some table
) SELECT * FROM Q WHERE N BETWEEN 1 AND 10000;
Read more about ROW_NUMBER() here: http://msdn.microsoft.com/en-us/library/ms186734.aspx
Practically all SQL DB implementations have a way of specifying the starting row to return, as well as the number of rows.
For example, in both mysql and postgres it looks like:
SELECT ...
ORDER BY something -- not required, but highly recommended
LIMIT 100 -- only get 100 rows
OFFSET 500; -- start at row 500
Note that normally you would include an ORDER BY to make sure your chunks are consistent
MS SQL Server (being a "pretend" DB) don't support OFFSET directly, but it can be coded using ROW_NUMBER() - see this SO post for more detail.

Top N in View or Crystal Reports?

I am wondering if it's possible to use a view to get the top 5 lines from a table.
I am finding that Crystal reports doesn't seem to have anything built in to do this, or I'd do it there.
When I query the view Select * from qryTranHistory, it returns the first 5 items, but if I try to select a specific type Select * from qryTranHistory Where tID = 45 it returns nothing, since there are no tID=45 in the top 5 normally.
Is it possible to do this?
Can it be accomplished in a sub report in Crystal Reports?
It is easy to limit a report to the top 5 records. In the menu, just choose
Report --> Selection Formulas... --> Group
In the formula, enter "RecordNumber <= 5" and you are done.
You don't need to have a group field nor summary field to do the group filter. You don't need a sort order, but using top N records without a sort order doesn't usually make much sense. It might not be efficient as OMG Ponies suggested, but for small number of records it is OK.
You can reference a sproc from Crystal Reports. In the sproc, use a conditional on the parameter.
ALTER PROCEDURE dbo.Get_TOP5
(
#tID INT = NULL
)
AS
IF #tID IS NULL
BEGIN
SELECT TOP 5
FIELD1,
FIELD2
FROM qryTranHistory
END
ELSE
BEGIN
SELECT
FIELD1,
FIELD2
FROM qryTranHistory
WHERE tID =#tID
END
A simple setting can limit the records to top 5!! Here it is, if you're using .Net 1.1 (similar arrangement of options in higher frameworks too!).
Right click on the report layout > Reports > Top N/Sort Group Expert > Choose Top N in the Dropdown that asks for the type of filtering/ sorting you wish to do > Set the Value of top N (5 in your case) > Uncheck the option that includes other records.
Your report will be filtered for only the top 5 records from the Dataset.
There's another way how it could be done and that is through the Record selection formula where you limit the No. of records, as suggested by John Price in this thread.
Cheers!
Can you put the TOP in your SELECT statement instead of in the view?
SELECT TOP 5
col1,
col2,
...
FROM
qryTranHistory
WHERE
tid = 45
If your table has more then 5 rows I hope this query:
SELECT * FROM qryTranHistory
Returns more then 5 rows because you never mentioned TOP 5.
Your question doesn't make a lot of sense as I am not sure waht you are after.
You mentioned if you ran your query with WHERE tID=45, it returns nothing, what exactly do you want it to return ?
Read up on TOP in BOL:
SELECT TOP 10 Recs FROM Records WHERE...
By the way you do not want to do this in the report / a form interface, you want to do this in your db layer.
You can do Top N processing in Crystal Reports, but it's a little obscure - you have to use the group sort expert (and in order to use that, you need to have groups and summary fields inserted into the groups.)
Doing the Top N processing in the query should be more efficient, where possible.
If you have a small recordset, you can create a running total that counts the change of rows (field1), then in Section Expert in Details, tell it to supress RTotal0 (your running total variable) to > 5

Sql query to get a non-contiguous subset of results

I'm writing a web application that should show very large results on a search query.
Say some queries will return 10.000 items.
I'd like to show those to users paginated; no problem so far: each page will be the result of a query with an appropriate LIMIT statement.
But I'd like to show clues about results in each page of the paginated query: some data from the first item and some from the last.
This mean that, for example, with a result of 10.000 items and a page size of 50 items, if the user asked for the first page I will need:
the first 50 items (the page requested by the user)
item 51 and 100 (the first and last of the second page)
item 101 and 151
etc
For efficiency reasons I want to avoid one query per row.
[edit] I also would prefer not downloading 10.000 results if I only need 50 + 10000/50*2 = 400
The question is: is there a single query I can issue to the RDBMS (mysql, by the way, but I'd prefer a cross-db solution) that will return only the data I need?
I can't use server side cursor, because not all dbs support it and I want my app to be database-agnostic.
Just for fun, here is the MSSQL version of it.
declare #pageSize as int; set #pageSize = 10;
declare #pageIndex as int; set #pageIndex = 0; /* first page */
WITH x AS
(
select
ROW_NUMBER() OVER (ORDER BY (created) ASC) AS RowNumber,
*
from table
)
SELECT * FROM x
WHERE
((RowNumber <= (#pageIndex+1)*#pageSize) AND (RowNumber >= #pageIndex*#PageSize+1))
OR
RowNumber % #pageSize = 1
OR
RowNumber % #pageSize = #pageSize-1
Note, that an ORDER BY is provided in the over clause.
Also note, that if you have gazillion rows, your result set will have millions. You need to maximize the result rows for practical reasons.
I have no idea how this could be a solved in generic SQL. (My bet: no way. Even simple pageing cannot be solved without DB-specific operators.)
UPDATE: I completely misread the initial question. You can do this using UNION and the LIMIT clause in MySQL, although it might be what you meant by "one query per row". The syntax would be like:
select FOO from BAZ limit 50
union
select FOO from BAZ limit 50, 1
union
select FOO from BAZ limit 99, 1
union
select FOO from BAZ limit 100, 1
union
select FOO from BAZ limit 149, 1
and so on and so forth. Since you're using UNION, you'll only need one roundtrip to the database. I'm not sure how MySQL will treat the various SELECT statements, though. It should be able to recognize that they are essentially the same query and use a cached query plan, but I don't work with MySQL enough to know if that's a reasonable expectation for its optimizer.
Obviously, to build this query in a general fashion, you'll first need to run a count query so you can calculate what your offsets will be.
This is definitely not a tractable problem for standard SQL, since the paging logic requires nonstandard features.

LIMIT in FoxPro

I am attempting to pull ALOT of data from a fox pro database, work with it and insert it into a mysql db. It is too much to do all at once so want to do it in batches of say 10 000 records. What is the equivalent to LIMIT 5, 10 in Fox Pro SQL, would like a select statement like
select name, address from people limit 5, 10;
ie only get 10 results back, starting at the 5th. Have looked around online and they only make mention of top which is obviously not of much use.
Take a look at the RecNo() function.
FoxPro does not have direct support for a LIMIT clause. It does have "TOP nn" but that only provides the "top-most records" within a given percentage, and even that has a limitation of 32k records returned (maximum).
You might be better off dumping the data as a CSV, or if that isn't practical (due to size issues), writing a small FoxPro script that auto-generates a series of BEGIN-INSERT(x10000)-COMMIT statements that dump to a series of text files. Of course, you would need a FoxPro development environment for this, so this may not apply to your situation...
Visual FoxPro does not support LIMIT directly.
I used the following query to get over the limitation:
SELECT TOP 100 * from PEOPLE WHERE RECNO() > 1000 ORDER BY ID;
where 100 is the limit and 1000 is the offset.
It is very easy to get around LIMIT clause using TOP clause ; if you want to extract from record _start to record _finish from a file named _test, you can do :
[VFP]
** assuming _start <= _finish, if not you get a top clause error
*
_finish = MIN(RECCOUNT('_test'),_finish)
*
SELECT * FROM (SELECT TOP (_finish - _start + 1) * FROM (SELECT TOP _finish *, RECNO() AS _tempo FROM _test ORDER BY _tempo) xx ORDER BY _tempo DESC) yy ORDER BY _tempo
**
[/VFP]
I had to convert a Foxpro database to Mysql a few years ago. What I did to solve this was add an auto-incrementing id column to the Foxpro table and use that as the row reference.
So then you could do something like.
select name, address from people where id >= 5 and id <= 10;
The Foxpro sql documentation does not show anything similar to limit.
Here, adapt this to your tables. Took me like 2 mins, i do this waaaay too often.
N1 - group by whatever, and make sure you got a max(id), you can use recno() to make one, sorted correctly
N2 - Joins N1 where the ID = Max Id of N1, display the field you want from N2
Then if you want to join to other tables, put that all in brackets and give it an alias and include it in a join.
Select N1.reference, N1.OrderNoteCount, N2.notes_desc LastNote
FROM
(select reference, count(reference) OrderNoteCount, Max(notes_key) MaxNoteId
from custnote
where reference != ''
Group by reference
) N1
JOIN
(
select reference, count(reference) OrderNoteCount, notes_key, notes_desc
from custnote
where reference != ''
Group by reference, notes_key, notes_desc
) N2 ON N1.MaxNoteId = N2.notes_key
To expand on Eyvind's answer I would create a program to uses the RecNo() function to pull records within a given range, say 10,000 records.
You could then programmatically cycle through the large table in chucks of 10,000 records at a time and preform your data load into you MySQL database.
By using the RecNO() function you can be certain not to insert rows more than once, and be able to restart at a know point in the data load process. That by it's self can be very handy in the event you need to stop and restart the load process.
Depending on the number of the returned rows and if you are using .NET Framework you can offset/limit the gotten DataTable on the following way:
dataTable = dataTable.AsEnumerable().Skip(offset).Take(limit).CopyToDataTable();
Remember to add the Assembly System.Data.DataSetExtensions.

How to limit result set size for arbitrary query in Ingres?

In Oracle, the number of rows returned in an arbitrary query can be limited by filtering on the "virtual" rownum column. Consider the following example, which will return, at most, 10 rows.
SELECT * FROM all_tables WHERE rownum <= 10
Is there a simple, generic way to do something similar in Ingres?
Blatantly changing my answer. "Limit 10" works for MySql and others, Ingres uses
Select First 10 * from myTable
Ref
select * from myTable limit 10 does not work.
Have discovered one possible solution:
TIDs are "tuple identifiers" or row addresses. The TID contains the
page number and the index of the offset to the row relative to the
page boundary. TIDs are presently implemented as 4-byte integers.
The TID uniquely identifies each row in a table. Every row has a
TID. The high-order 23 bits of the TID are the page number of the page
in which the row occurs. The TID can be addressed in SQL by the name
`tid.'
So you can limit the number of rows coming back using something like:
select * from SomeTable where tid < 2048
The method is somewhat inexact in the number of rows it returns. It's fine for my requirement though because I just want to limit rows coming back from a very large result set to speed up testing.
Hey Craig. I'm sorry, I made a Ninja Edit.
No, Limit 10 does not work, I was mistaken in thinking it was standard SQL supported by everyone. Ingres uses (according to doc) "First" to solve the issue.
Hey Ninja editor from Stockholm! No worries, have confirmed that "first X" works well and a much nicer solution than I came up with. Thankyou!