When I run my BigQuery query with LIMIT 5, it returns more rows. This happens whenever I set the limit greater than 3: for every limit, it returns more rows than requested.
This can happen if you are using BigQuery Legacy SQL and querying a table with a repeated field, for example as below:
#legacySQL
SELECT request.parameters.*
FROM [yourProject:yourDataset.yourTable]
LIMIT 3
What does it do? It limits the number of output rows to 3, but then it outputs all repeated values within those three rows, so you see more than 3 :o)
The way to check this is to run the query below and confirm that the real number of rows output is actually 3:
#legacySQL
SELECT COUNT(1) FROM (
SELECT request.parameters.*
FROM [yourProject:yourDataset.yourTable]
LIMIT 3
)
The best way to deal with this is to migrate to BigQuery Standard SQL.
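In Standard SQL you flatten repeated fields explicitly with UNNEST, so LIMIT applies to the rows you actually see. A minimal sketch, assuming request.parameters is the repeated field from the legacy query above (the table and field names are just the placeholders used there):
#standardSQL
SELECT param.*
FROM `yourProject.yourDataset.yourTable`,
  UNNEST(request.parameters) AS param  -- flatten the repeated field explicitly
LIMIT 3
Here LIMIT 3 applies after the flattening, so you get exactly three output rows.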
I have a table called COMPUTED_DATA. It contains 3 columns: data, cluster, and last_execution.
There is a job which runs every 2 weeks and inserts multiple data rows for a cluster along with its last_execution time.
My requirement is to get the data for a cluster for its most recent last_execution time. Currently I have written the queries like this in my code:
last_execution = SELECT DISTINCT(last_execution) FROM COMPUTED_DATA
WHERE cluster=1204 ORDER BY last_execution DESC LIMIT 1
The above query gets me the most recent last_execution
data = SELECT data FROM COMPUTED_DATA WHERE cluster=1204 AND
last_execution={last_execution}
This query uses that last_execution to get the data.
My question is: can this be combined into just one query? I am running this in my Spark cluster, so each SQL query is very time-expensive. Hence I want to combine this into one query. Please help.
EDIT: the second query, where I am getting the data from, returns multiple rows. Hence I cannot use a limit on the second query because that count is unknown.
Yes, it can:
SELECT
  data
FROM
  COMPUTED_DATA
WHERE
  cluster = 1204
  AND last_execution = (SELECT DISTINCT(last_execution) FROM COMPUTED_DATA
                        WHERE cluster = 1204 ORDER BY last_execution DESC LIMIT 1)
This isn't the most elegant way to write it, but it shows how to use a subquery in a WHERE clause.
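A slightly simpler variant of the same idea, assuming last_execution compares correctly with MAX() (e.g. a timestamp), replaces the DISTINCT/ORDER BY/LIMIT subquery with MAX():
SELECT data
FROM COMPUTED_DATA
WHERE cluster = 1204
  AND last_execution = (SELECT MAX(last_execution)  -- most recent run for this cluster
                        FROM COMPUTED_DATA
                        WHERE cluster = 1204)
Either form runs as a single query, so Spark only pays for one round trip.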
I am using SQL Server 2012. I have one table that contains 1.7 million records. I am just selecting all records using (SELECT * FROM table_name), but it takes 1 hour 2 minutes to fetch.
What should I do to fetch the records more quickly?
All you can do is limit your result set, either by using
TOP (100) or by adding a WHERE clause for the data you actually need, instead of
SELECT * FROM table
This will help you get only the relevant, limited data in less time.
Another thing you can do is select only the columns you need to get your desired results.
This will significantly reduce the fetching time.
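As a rough sketch combining both suggestions (the column names id, name, and created_at and the date filter are made up for illustration):
SELECT TOP (100) id, name, created_at  -- only the columns you actually need
FROM table_name
WHERE created_at >= '2024-01-01'       -- restrict to the rows you actually need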
I've been trying to download the m-lab dataset from BigQuery recently. There seems to be a limit where you can only query and get back around 1 million rows with one query. The m-lab dataset contains multiple billion records across many tables. I'd love to use queries like
bq query --destination_table=mydataset.table1 "select * from (select ROW_NUMBER() OVER() row_number, * from (select * from [measurement-lab:m_lab.2013_03] limit 10000000)) where row_number between 2000001 and 3000000;"
but it didn't work. Is there a workaround to make it work? Thanks a lot!
If you're trying to download a large table (like the m-lab table), your best option is to use an extract job. For example, run
bq extract 'mlab-project:dataset.table' 'gs://bucket/foo*'
This will extract the table to the Google Cloud Storage objects gs://bucket/foo000000000.csv, gs://bucket/foo000000001.csv, etc. The default format is CSV, but you can pass --destination_format=NEWLINE_DELIMITED_JSON to extract the table as JSON.
The other thing to mention is that you can read from the millionth row in BigQuery using the tabledata.list API to read from that particular offset (no query required!).
bq head -n 1000 -s 1000000 'm-lab-project:dataset.table'
will read 1000 rows starting at the 1000000th row.
I am querying my table to achieve pagination, but I do not know the total number of rows in the table.
select name from table where id = 1 limit 0, 10
Is there a way to find out the total number of rows that would have been returned if I had not used the LIMIT clause, without querying for the total count?
SQLite computes results on the fly when they are actually needed.
The only way to get the total count is to run the actual query (or better, SELECT COUNT(*)) without the LIMIT.
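For example, a count query that mirrors the paginated query above would keep the same WHERE clause and drop the LIMIT; this is just a sketch of the approach:
select count(*) from table where id = 1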
It depends on which back-end technology you are using. In PHP, mysql_num_rows() returns the number of rows without actually fetching the data.
Is there any practical limit to the number of rows a SELECT statement can fetch from a database, any database?
Assume I am running the query SELECT * FROM TableName and that table has more than 1,200,000 rows.
How many rows can it fetch? Is there a limit on this?
1,200,000 is not at all a big number; I have worked with way bigger result sets. As long as your memory can fit the result, you should have no problems.
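If the result does not fit in memory, a common workaround is to page through the table in batches instead of one huge SELECT; a minimal sketch, assuming an indexed id column and generic LIMIT/OFFSET syntax (details vary by database):
SELECT * FROM TableName
ORDER BY id           -- a stable order makes paging deterministic
LIMIT 10000 OFFSET 0  -- then OFFSET 10000, 20000, ... until no rows come back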