Google BigQuery LIMIT clause returning too many rows

In BigQuery, I am running a query on tables exported from Google Analytics (GA).
I cannot seem to get BigQuery to limit the results. Here is a sample query, quite basic:
SELECT * FROM [1111111.ga_sessions_20140318] LIMIT 20000
The result set comes back, but with 7 million+ rows! I have tried this several different ways, i.e. writing out to a table, just returning the result set, using cached results, not using cached results, etc.
No matter which table I query, it always returns the entire table.
This is basically the same as the sample query BigQuery generates when you click the Query Table button, except that I changed the LIMIT value from 1000 to 20000.
Anyone have any insight?

As noted by the comment on the original question:
"Is it possible that the number of rows shown at the bottom of the result set returned in big query is my 20000 main object records plus all the nested records?"
The answer is yes: BigQuery does apply the LIMIT, but when nested records are involved they are flattened in the output, so each of the 20,000 limited records can expand into several output rows.
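To make the arithmetic concrete, here is a toy model of the behaviour described above, not BigQuery's actual implementation. It assumes each top-level record is a GA session with a repeated nested field `hits` (the field names and data are illustrative), and that flattening emits one output row per nested element after the limit is applied.

```python
from __future__ import annotations

from typing import Any


def flatten(records: list[dict[str, Any]], repeated: str) -> list[dict[str, Any]]:
    """Emit one output row per element of the repeated (nested) field."""
    rows = []
    for record in records:
        top = {k: v for k, v in record.items() if k != repeated}
        children = record.get(repeated) or [None]  # keep records with no children
        for child in children:
            rows.append({**top, repeated: child})
    return rows


def limit_then_flatten(records, n, repeated):
    # Toy model: the limit counts top-level records; flattening happens afterwards.
    return flatten(records[:n], repeated)


# Hypothetical sessions, each with a different number of nested hits.
sessions = [
    {"visitId": 1, "hits": [{"hitNumber": 1}, {"hitNumber": 2}, {"hitNumber": 3}]},
    {"visitId": 2, "hits": [{"hitNumber": 1}, {"hitNumber": 2}]},
    {"visitId": 3, "hits": [{"hitNumber": 1}]},
]

rows = limit_then_flatten(sessions, 2, "hits")
print(len(rows))  # 2 sessions kept by the limit, 5 rows after flattening
```

Scaled up, 20,000 sessions with dozens or hundreds of hits each easily account for millions of output rows.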

Related

Using the SQL COUNT function or executing the search query directly: which is more efficient?

Let's say I have a very big database. If I execute a search query directly and then count the returned rows, would that be faster? Or should I use COUNT(searchquery) first and then execute a query like this:
SELECT *
FROM TABLE
WHERE bla='blabla'
OFFSET 0 FETCH NEXT 20 ROWS ONLY
I searched for this but couldn't find an answer.
Do the count in the database! It will be much faster.
First, a count(*) only returns one row and one value. That is much, much less data -- and much faster -- than returning all the rows.
Second, a count(*) does not reference any columns in the select, so the query can be better optimized. It might be possible to get the count without ever looking at the data pages.
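The difference between the two approaches can be sketched with Python's built-in sqlite3 module. The table `items` and its `bla` column are hypothetical stand-ins for the question's `TABLE` and `bla='blabla'`. Both options produce the same number, but option A transfers every matching row to the client first, while option B returns a single row holding a single value.

```python
import sqlite3

# Hypothetical table standing in for the question's TABLE / bla='blabla'.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, bla TEXT)")
conn.executemany(
    "INSERT INTO items (bla) VALUES (?)",
    [("blabla" if i % 3 == 0 else "other",) for i in range(10_000)],
)

# Option A: ship every matching row to the client, then count in Python.
rows = conn.execute("SELECT * FROM items WHERE bla = ?", ("blabla",)).fetchall()
client_count = len(rows)

# Option B: let the database count; only one row with one value comes back.
(db_count,) = conn.execute(
    "SELECT COUNT(*) FROM items WHERE bla = ?", ("blabla",)
).fetchone()

print(client_count, db_count)
```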
It looks like you are doing paging. You need the total count to display it and to calculate the total number of pages for the user, yes?
Then Gordon's answer is the one to use.
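A minimal sketch of that paging pattern, assuming a hypothetical `items` table and 0-based page numbers: one COUNT(*) query for the total, one query for the requested page, and the page count derived from the total. SQLite spells the page limit as `LIMIT ... OFFSET ...` rather than `OFFSET ... FETCH NEXT ... ROWS ONLY`, and an ORDER BY is included so that pages are stable between requests.

```python
import math
import sqlite3

PAGE_SIZE = 20  # mirrors FETCH NEXT 20 ROWS ONLY in the question

# Hypothetical data: 45 matching rows and 10 non-matching ones.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, bla TEXT)")
conn.executemany(
    "INSERT INTO items (bla) VALUES (?)",
    [("blabla",)] * 45 + [("other",)] * 10,
)


def get_page(conn, page, page_size=PAGE_SIZE):
    """Return (rows for the 0-based page, total matching rows, total pages)."""
    (total,) = conn.execute(
        "SELECT COUNT(*) FROM items WHERE bla = ?", ("blabla",)
    ).fetchone()
    rows = conn.execute(
        "SELECT id, bla FROM items WHERE bla = ? ORDER BY id LIMIT ? OFFSET ?",
        ("blabla", page_size, page * page_size),
    ).fetchall()
    total_pages = math.ceil(total / page_size)
    return rows, total, total_pages


rows, total, total_pages = get_page(conn, 0)
print(len(rows), total, total_pages)
```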

Different result size between SELECT * and SELECT COUNT(*) on Oracle

I am seeing strange behavior on an Oracle database. We do a huge insert of around 3.1 million records. Everything is fine so far.
Shortly after the insert finishes (around 1 to 10 minutes later), I execute two statements:
SELECT COUNT(*) FROM TABLE
SELECT * FROM TABLE
The result from the first statement is fine: it gives me the exact number of rows that were inserted.
The result from the second statement is the problem. Depending on timing, the number of rows returned is, for example, around 500K lower than the result from the first statement. The difference between the two results decreases over time.
So I have to wait 15 to 30 minutes before both statements return the same number of rows.
I already talked with the Oracle DBA about this issue, but he has no idea how this could happen.
Any ideas, questions or suggestions?
Update
When I select only an indexed column, I get the correct row count.
When I select a non-indexed column instead, I get the wrong row count again.
That doesn't sound like a bug to me. If I understood you correctly, it just takes time for Oracle to fetch the entire table; after all, 3 million rows is not a small amount.
COUNT, by contrast, returns a single record holding the total number of rows.
If, after some waiting, the number of records output equals the number the COUNT query returns, then everything is fine.
Have you already verified the following?
1- Count a single column instead of * (all columns) to verify both results.
2- Verify both queries' results by adding a WHERE clause and gradually selecting more rows by removing conditions, until you find the point where the two queries start returning different values.
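Both checks above can be sketched with sqlite3 as a stand-in for Oracle; the table `t`, its `note` column and the filter list are hypothetical. For each filter, going from narrow to broad, the sketch compares COUNT(*), COUNT(column) and the number of rows actually fetched by SELECT *. Keep in mind that COUNT(column) skips NULLs, so it legitimately differs from COUNT(*) when the column contains NULL values.

```python
import sqlite3


def compare_counts(conn, table, column, where="1=1"):
    """Return (COUNT(*), COUNT(column), rows fetched by SELECT *) for one filter.

    table, column and where are interpolated into the SQL directly, so they
    must be trusted, hard-coded strings (hypothetical names in this sketch).
    """
    count_star = conn.execute(f"SELECT COUNT(*) FROM {table} WHERE {where}").fetchone()[0]
    count_col = conn.execute(f"SELECT COUNT({column}) FROM {table} WHERE {where}").fetchone()[0]
    fetched = sum(1 for _ in conn.execute(f"SELECT * FROM {table} WHERE {where}"))
    return count_star, count_col, fetched


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, note TEXT)")
# Every 10th row has a NULL note, so COUNT(note) comes out lower than COUNT(*).
conn.executemany(
    "INSERT INTO t (id, note) VALUES (?, ?)",
    [(i, None if i % 10 == 0 else f"row {i}") for i in range(1, 101)],
)

# Step 2: start with a narrow filter and widen it gradually.
filters = ["id <= 10", "id <= 50", "1=1"]
results = [compare_counts(conn, "t", "note", w) for w in filters]
for w, r in zip(filters, results):
    print(w, r)
```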
I think you should check the execution plan to identify missing indexes and improve performance.
Add the missing indexes and check the result.
Why missing indexes are important:
To count rows, the Oracle engine does not need to go through a paging operation. But when fetching all the details from a table, it has to go through paging.
And the paging process depends on the indexes created on a table to fetch the data quickly and efficiently.
So to reduce the time your second statement takes, you should find the missing indexes and create them.
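Reading an execution plan can be illustrated with SQLite, which is only a stand-in here; in Oracle the equivalent is EXPLAIN PLAN FOR followed by DBMS_XPLAN.DISPLAY. The table `t`, column `bla` and index name `idx_t_bla` are hypothetical. Before the index exists, the plan for the filtered query shows a full scan; afterwards it shows a search using the new index.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Return the 'detail' column (the last one) of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, bla TEXT, payload TEXT)")
query = "SELECT * FROM t WHERE bla = ?"

before = plan(conn, query, ("blabla",))  # expect a full table scan
conn.execute("CREATE INDEX idx_t_bla ON t (bla)")
after = plan(conn, query, ("blabla",))   # expect a search via idx_t_bla

print(before)
print(after)
```

The exact wording of the detail strings varies between SQLite versions (for example "SCAN t" versus "SCAN TABLE t"), so compare the index name rather than the full text.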
How to Find Missing Indexes:
You can start with DBA_HIST_ACTIVE_SESS_HISTORY, and look at all statements that contain that type of hint.
From there, you can pull the index name coming from that hint, and then do a lookup on dba_indexes to see if the index exists, is valid, etc.

How do I avoid multiple queries for the pagination count and data retrieval?

I have a scenario where I get a count and then pass the count as a variable to a similar query to get the paginated records. So basically I am running a full query to get the total count (internally building the full table) and then using that count to display the same table, 10 rows per page. What options do I have to avoid running multiple queries like this?
Something like this, in pseudo-code:
select count {big table}
select big table where records are between count and count+10
Is there a sensible way to get the COUNT variable in the same query?
I am wondering how Google would handle a search: would it first find all the records, or just fetch the records without tracking the number of pages? Page numbers can't be computed in advance, as they depend on the variable sent by the user.
Edit: I have a similar question here https://dba.stackexchange.com/questions/161586/including-count-of-a-result-in-the-main-query
Regarding Google, they are likely to generate only the requested number of results (e.g. 10) and to estimate the count. The estimated count is very imprecise.
You can't have SQL Server count all results and get only a subset of them. There are 3 strategies to deal with this:
execute a counting and a data query
execute an unlimited data query and discard all but ten results on the client
execute an unlimited data query into a temp-table whose primary key is the row number. You can then count instantly (get the last row) and select any subset by rownumber with a single seek
Counting the data can be significantly cheaper because SQL Server can use different indexes or discard joins.
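The third strategy can be sketched in sqlite3 (a stand-in for SQL Server, where the same idea would use a #temp table and ROW_NUMBER()). The table `big`, the odd-id filter standing in for the user's search, and the page size are hypothetical, and the ROW_NUMBER() window function requires SQLite 3.25 or newer. The unlimited query runs once; afterwards the total is a single lookup on the last row number and every page is a range seek on the primary key.

```python
import sqlite3

PAGE_SIZE = 10

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO big (name) VALUES (?)", [(f"item {i}",) for i in range(1, 96)])

# Run the unlimited search once, numbering rows in the temp table's primary key.
conn.execute("CREATE TEMP TABLE results (rn INTEGER PRIMARY KEY, id INTEGER, name TEXT)")
conn.execute(
    "INSERT INTO results (rn, id, name) "
    "SELECT ROW_NUMBER() OVER (ORDER BY id), id, name FROM big WHERE id % 2 = 1"
)

# The count is just the last row number: one seek on the primary key.
last = conn.execute("SELECT rn FROM results ORDER BY rn DESC LIMIT 1").fetchone()
total = last[0] if last else 0


def page(conn, page_no, page_size=PAGE_SIZE):
    """Fetch a 0-based page by row-number range from the materialised results."""
    first = page_no * page_size + 1
    return conn.execute(
        "SELECT id, name FROM results WHERE rn BETWEEN ? AND ? ORDER BY rn",
        (first, first + page_size - 1),
    ).fetchall()


print(total, page(conn, 4))
```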

How to predict result set row count?

I have an application where I build a big SQL query dynamically for SQL Server 2008. The query is based on various search criteria the user might supply, such as search by lastname, firstname, ssn, etc.
The requirement is that if the user supplies a condition that would make the generated query return a lot of rows (more than a configurable maximum of N), the application must instead send back a message telling the user to refine the search, because the query would return too many rows.
I would not want to bring back, say, 5000 rows to the client and then discard that data just to show the user an error. What is an efficient way to tackle this?
Why not just show the first N rows AND the message? Limit the rows returned to N+1, and if the number of returned rows is > N, show the message :)
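A minimal sketch of the N+1 trick, using sqlite3 with a hypothetical `people` table and a `search` helper; in SQL Server 2008 the limit would be written as SELECT TOP (@n) rather than LIMIT. At most N+1 rows ever leave the database, and the extra row only serves as a "there is more" flag.

```python
import sqlite3

MAX_ROWS = 100  # the configurable N (hypothetical value)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, lastname TEXT, firstname TEXT)")
conn.executemany(
    "INSERT INTO people (lastname, firstname) VALUES (?, ?)",
    [("Smith", f"First{i}") for i in range(150)] + [("Jones", f"First{i}") for i in range(3)],
)


def search(conn, last_name, max_rows=MAX_ROWS):
    """Return (at most max_rows rows, whether the search matched more than max_rows)."""
    rows = conn.execute(
        "SELECT id, lastname, firstname FROM people WHERE lastname = ? ORDER BY id LIMIT ?",
        (last_name, max_rows + 1),  # fetch one extra row as a "too many" signal
    ).fetchall()
    too_many = len(rows) > max_rows
    return rows[:max_rows], too_many


rows, too_many = search(conn, "Smith")
if too_many:
    print(f"Showing the first {len(rows)} rows; please refine your search.")
```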
If you just want to check how many rows WOULD be returned by a query, select count(id) (or some other column name) instead of select *.

Does mysql_num_rows recount all the rows, or does it just grab a total after a select statement?

I wasn't sure whether it recounts the rows, or whether, after retrieving the whole result set, it just grabs the total after the query.
When you run a statement with the default (buffered) mysql_query(), the whole result set is transferred to the client, and mysql_num_rows() simply reports how many rows that result set holds. So no, the query is not re-run to get the count.
This has an interesting implication for queries with LIMIT: mysql_num_rows() returns the number of rows returned after LIMIT is applied. If you add the SQL_CALC_FOUND_ROWS keyword to your SELECT statement, a subsequent SELECT FOUND_ROWS() returns the number of rows that would have been returned if LIMIT were not used. This is helpful for paging, although both SQL_CALC_FOUND_ROWS and FOUND_ROWS() are deprecated as of MySQL 8.0.17.
Quoting the manual literally, it “retrieves the number of rows from a result set”. It obviously doesn't run a second query if that's your question (not sure what you mean by “recount all the rows”).