Getting random rows in BigQuery with different limits? - google-bigquery

I have the following queries:
SELECT * FROM `datafusiontest-2897325.mergedquery.test_table` LIMIT 10
SELECT * FROM `datafusiontest-2897325.mergedquery.test_table` LIMIT 100
SELECT * FROM `datafusiontest-2897325.mergedquery.test_table` LIMIT 10000
I am getting a different top result for each query.

As your query does not specify an order, it is normal for the results to differ between runs - without an ORDER BY, BigQuery returns an arbitrary set of rows that meet the qualifying criteria, in no guaranteed order.
To get the same top n rows back every time, add an ORDER BY clause, for example:
SELECT *
FROM `datafusiontest-2897325.mergedquery.test_table`
ORDER BY date
LIMIT 10
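If the ordering column can contain duplicates, ties are still returned in arbitrary order, so adding a unique column as a tiebreaker makes the result fully deterministic. A minimal sketch, assuming a hypothetical unique id column:
SELECT *
FROM `datafusiontest-2897325.mergedquery.test_table`
ORDER BY date, id  -- `id` is assumed here; any unique column works as a tiebreaker
LIMIT 10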

Related

How to limit the returned results using an MDX query

I am using an MDX query and want to limit the huge result set it returns, but the following doesn't work. My intention is to limit the result to 10 rows only, to reduce the load:
SELECT {[Measures].[activityduration]} ON COLUMNS,
{([rig], 10)} ON ROWS
FROM activityhours
The error says:
The following is not a valid MDX query: No function matches signature '(<Dimension>, <Numeric Expression>)'
Can anybody help?
If you mean limiting to 10 rows of [rig] (since you talked about a huge number of records), then the below should help:
SELECT {[Measures].[activityduration]} ON COLUMNS,
TOPCOUNT([rig], 10) ON ROWS
FROM activityhours
It would return the first 10 members of [rig] in their natural order.
What does "limit the result to 10" mean exactly? Limit to 10 rows, or do you want to filter [rig] somehow based on 10?
I've guessed you just want members of [rig] where [Measures].[activityduration] is equal to 10:
SELECT {[Measures].[activityduration]} ON COLUMNS,
[rig]
HAVING [Measures].[activityduration] = 10
ON ROWS
FROM activityhours;

SQL COUNT - greater than some number without having to get the exact count?

There's a thread at https://github.com/amatsuda/kaminari/issues/545 talking about a problem with a Ruby pagination gem when it encounters large tables.
When the number of records is large, the pagination will display something like:
[1][2][3][4][5][6][7][8][9][10][...][end]
This can incur performance penalties when the number of records is huge, because getting an exact count of, say, 50M+ records takes time. However, all that needs to be known in this case is whether the count is greater than (number of pages to show) * (number of records per page).
Is there a faster SQL operation than getting the exact COUNT, which would merely assert that the COUNT is greater than some value x?
You could try with
SQL Server:
SELECT COUNT(*) FROM (SELECT TOP 1000 * FROM MyTable) X
MySQL:
SELECT COUNT(*) FROM (SELECT * FROM MyTable LIMIT 1000) X
With a little luck, SQL Server/MySQL will optimize this query. Instead of 1000, you should of course put the maximum number of pages you want * the number of rows per page.
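As a concrete sketch, suppose the pager shows at most 10 page links of 25 rows each (both numbers are just assumptions for illustration); the inner TOP/LIMIT caps the scan, so the outer COUNT can never exceed that threshold:
-- SQL Server: cap the count at 10 pages * 25 rows per page = 250
SELECT COUNT(*) AS capped_count
FROM (SELECT TOP 250 1 AS dummy FROM MyTable) X
-- MySQL equivalent:
-- SELECT COUNT(*) AS capped_count FROM (SELECT 1 FROM MyTable LIMIT 250) X
-- capped_count = 250 means "at least 250 rows"; anything less is already the exact count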

subquery using DISTINCT and LIMIT

In SQLite, when I do
SELECT DISTINCT idvar
FROM myTable
LIMIT 100
OFFSET 0;
the data returned are 100 rows with (the first) 100 distinct values of idvar in myTable. That's exactly what I expected.
Now, when I do
SELECT *
FROM myTable
WHERE idvar IN (SELECT DISTINCT idvar
                FROM myTable
                LIMIT 100
                OFFSET 0);
I would expect to get all the rows from myTable corresponding to those 100 distinct values of idvar (so the result could well have more than 100 rows if some idvar values occur more than once). What I actually get, however, is all the rows for however many distinct values of idvar add up to roughly 100 rows. I don't understand why.
Thoughts? How should I build a query that returns what I expected?
Context
I have a 50GB table, and I need to do some calculations using R. Since I can't load that much data into R for memory reasons, I want to work in chunks. It is important, however, that each chunk contains all the rows for a given level of idvar. That's why I use OFFSET and LIMIT in the query, and why I'm trying to make sure it returns all rows for each level of idvar.
I'm not sure about SQLite, but in other SQL variants the result of an un-ordered LIMIT query is not guaranteed to be the same every time, so you should also include an ORDER BY in there.
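For example, a sketch of the ordered version (ordering by idvar itself is an assumption here - any deterministic ordering works):
SELECT *
FROM myTable
WHERE idvar IN (SELECT DISTINCT idvar
                FROM myTable
                ORDER BY idvar
                LIMIT 100
                OFFSET 0);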
But a better idea may be to do a separate query at the beginning to read all of the distinct IDs into R, then split those into batches of 100 and run a separate query for each batch. That should be clearer, faster, and easier to debug.
Edit: example R code. Let's say you have 100k distinct IDs in a variable ids.
library(DBI)  # assumes `con` is an open DBI connection and `ids` holds the distinct idvar values

for (i in 1:1000) {
  tmp.ids <- ids[((i - 1) * 100 + 1) : (i * 100)]
  query <- paste0("SELECT * FROM myTable WHERE idvar IN (",
                  paste0(tmp.ids, collapse = ", "),
                  ")")
  chunk <- dbGetQuery(con, query)  # runs the query and fetches all rows for this batch of 100 ids
  # ... process `chunk` here ...
}

Query a query result, return only the TOP 100 rows

I start a query on an MS SQL Server with the following statement:
aquery.sql.text := 'select * from Mytable where <XXXXXXXXXXXXXXXXXXX>';
aquery.open;
repeat
  //........ process the current row here
  aquery.Next;   // advance to the next row, otherwise the loop never terminates
until aquery.EOF;
MyTable has 4 additional columns: x0, x1, y0, y1.
From the above SQL statement I want to get only the top 100 rows, but they should be the rows with the maximum area, where area = abs(x1-x0) * abs(y1-y0).
What is the best way to get only the 100 records with the largest area, while keeping the given basic SQL statement?
SELECT TOP 100 *
FROM Mytable
ORDER BY Abs(x1-x0)*Abs(y1-y0) DESC
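If the original WHERE filter has to stay in place, it can simply be combined with the ordering; the placeholder condition below is the one from the question:
SELECT TOP 100 *
FROM Mytable
WHERE <XXXXXXXXXXXXXXXXXXX>
ORDER BY Abs(x1-x0)*Abs(y1-y0) DESC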

How to use LIMIT in a query in Access, but not TOP

I would like to display 15 rows from a query, but not the first 15.
SELECT Abgänge.Vorgang, Abgänge.Date_SW_Drucken FROM Abgänge
WHERE Abgänge.Bezahlung = "Bar" LIMIT 34,15;
How can I transform this for Access 2010?
In MySQL the LIMIT clause works like this:
LIMIT from_record - 1, count_record
so LIMIT 34,15 skips the first 34 rows and returns the next 15.
You can't, because the Microsoft Access SELECT syntax has no support for an offset. One option is to select TOP offset + limit rows and skip the first offset rows manually in your application. BTW: using TOP or LIMIT without an ORDER BY is not advisable, as it can lead to inconsistent results.
You could also combine two queries with TOP: first select the offset + limit rows (49), then reverse the order and take only the limit rows (15), for example
SELECT TOP 15 ...
FROM (
SELECT TOP 49 ....
FROM sometable
ORDER BY somecolumn ASC
) a
ORDER BY somecolumn DESC
The only problem with this solution is that if the subquery returns fewer than 49 rows, the effective offset will be less than 34.
If you need the result in a different order, you may need to add an additional 'layer' that applies that order, as sketched below.
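A minimal sketch of that extra layer, re-sorting the final 15 rows back into ascending order (sometable and somecolumn are the placeholders from the example above):
SELECT b.*
FROM (
    SELECT TOP 15 a.*
    FROM (
        SELECT TOP 49 *
        FROM sometable
        ORDER BY somecolumn ASC
    ) a
    ORDER BY somecolumn DESC
) b
ORDER BY somecolumn ASC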