How to select a specific number of bytes from SQLite table? - sql

I have a simple SQLite table with 1 column where I'm selecting a random number of records:
SELECT * FROM vocabulary ORDER BY RANDOM() LIMIT 100;
Is there a way to select a specific number of bytes, instead of rows? Something along the lines of:
SELECT * FROM vocabulary ORDER BY RANDOM() LIMIT BYTES 1024;

You can't limit your select via the SQLite-engine to a specific number of bytes across rows. Note though that LIMIT simply stops reading when the limit is reached. You can do the same thing by keeping a count in your calling code and then stop reading the data once you've reached the number of bytes you want.
Precisely how will depend on what environment you're programming in.

Related

int64 overflow in sampling n number of rows (not %)

The below script is to randomly sample an approximate number of rows (50k).
SELECT *
FROM table
qualify rand() <= 50000 / count(*) over()
This has worked a handful of times before, hence, I was shocked to find this error this morning:
int64 overflow: 8475548256593033885 + 6301395400903259047
I have read this post. But as I am not summing, I don't think it is applicable.
The table in question has 267,606,559 rows.
Looking forward to any ideas. Thank you.
I believe counting is actually a sum the way BQ (and other databases) compute counts. You can see this by viewing the Execution Details/Graph (in the BQ UI). This is true even on a simple select count(*) from table query.
For your problem, consider something simpler like:
select *, rand() as my_rand
from table
order by my_rand
limit 50000
Also, if you know the rough size of your data or don't need exactly 50K, consider using the tablesample method:
select * from table
tablesample system (10 percent)

SQL COUNT - greater than some number without having to get the exact count?

There's a thread at https://github.com/amatsuda/kaminari/issues/545 talking about a problem with a Ruby pagination gem when it encounters large tables.
When the number of records is large, the pagination will display something like:
[1][2][3][4][5][6][7][8][9][10][...][end]
This can incur performance penalties when the number of records is huge, because getting an exact count of, say, 50M+ records will take time. However, all that's needed to know in this case is that the count is greater than the number of pages to show * number of records per page.
Is there a faster SQL operation than getting the exact COUNT, which would merely assert that the COUNT is greater than some value x?
You could try with
SQL Server:
SELECT COUNT(*) FROM (SELECT TOP 1000 * FROM MyTable) X
MySQL:
SELECT COUNT(*) FROM (SELECT * FROM MyTable LIMIT 1000) X
With a little luck, the SQL Server/MySQL will optimize this query. Clearly instead of 1000 you should put the maximum number of pages you want * the number of rows per page.

Objective c - SQLite selecting random row with another value

Basically I have a database of words,
This database contains a rowID(primary key), the word and word length as table columns.
I want to select a random row where length = x and get the word at that row.
This is for an iPhone game project and it is high priority that the queries are as fast as possible (the searches are made in a game).
For instance:
SELECT * FROM WordsDB WHERE >= (abs(random()) %% (SELECT max(rowid) FROM WordsDB)) LIMIT 1;
This query is really fast at selecting a random row a lot faster than ORDER BY RANDOM() LIMIT 1, however, if I add the word length to the query I get issues:
SELECT * FROM WordsDB WHERE length = 9 AND rowid >= (abs(random()) %% (SELECT max(rowid) FROM WordsDB)) LIMIT 1
Presumably because the random row will not always have a length of 9.
I was just wondering what would be the fastest / most efficient way of doing this.
Thanks for your time
Note: the 2 % symbols are because it is in objective c and the query is set as a string.
This one seems to work ok for me:
select * from WordsDB
where length = 9
limit (abs(random()) % (select count(rowid) from WordsDB
where length = 9)), 1;
note that length = 9 appears in both where clauses.
Add index on length if it appears to be slow.
Add an index to the WordsDB.length
create index if not exists WordsDBLengthIndex on WordsDB (length);
should make selection on this field much faster

How can I get a specific chunk of results?

Is it possible to retrieve a specific range of results? I know how to do TOP x but the result I will retrieve is WAY too big and will time out. I was hoping to be able to pick say the first 10,000 results then the next 10,000 and so on. Is this possible?
WITH Q AS (
SELECT ROW_NUMBER() OVER (ORDER BY ...some column) AS N, ...other columns
FROM ...some table
) SELECT * FROM Q WHERE N BETWEEN 1 AND 10000;
Read more about ROW_NUMBER() here: http://msdn.microsoft.com/en-us/library/ms186734.aspx
Practically all SQL DB implementations have a way of specifying the starting row to return, as well as the number of rows.
For example, in both mysql and postgres it looks like:
SELECT ...
ORDER BY something -- not required, but highly recommended
LIMIT 100 -- only get 100 rows
OFFSET 500; -- start at row 500
Note that normally you would include an ORDER BY to make sure your chunks are consistent
MS SQL Server (being a "pretend" DB) don't support OFFSET directly, but it can be coded using ROW_NUMBER() - see this SO post for more detail.

How to limit result set size for arbitrary query in Ingres?

In Oracle, the number of rows returned in an arbitrary query can be limited by filtering on the "virtual" rownum column. Consider the following example, which will return, at most, 10 rows.
SELECT * FROM all_tables WHERE rownum <= 10
Is there a simple, generic way to do something similar in Ingres?
Blatantly changing my answer. "Limit 10" works for MySql and others, Ingres uses
Select First 10 * from myTable
Ref
select * from myTable limit 10 does not work.
Have discovered one possible solution:
TIDs are "tuple identifiers" or row addresses. The TID contains the
page number and the index of the offset to the row relative to the
page boundary. TIDs are presently implemented as 4-byte integers.
The TID uniquely identifies each row in a table. Every row has a
TID. The high-order 23 bits of the TID are the page number of the page
in which the row occurs. The TID can be addressed in SQL by the name
`tid.'
So you can limit the number of rows coming back using something like:
select * from SomeTable where tid < 2048
The method is somewhat inexact in the number of rows it returns. It's fine for my requirement though because I just want to limit rows coming back from a very large result set to speed up testing.
Hey Craig. I'm sorry, I made a Ninja Edit.
No, Limit 10 does not work, I was mistaken in thinking it was standard SQL supported by everyone. Ingres uses (according to doc) "First" to solve the issue.
Hey Ninja editor from Stockholm! No worries, have confirmed that "first X" works well and a much nicer solution than I came up with. Thankyou!