Difference between queries regarding BFS - Cypher

Could someone give me some hints about this? This query works:
match path=(n)-[*bfs..10]->(m)
return path limit 1;
However, I get a memory limit exceeded exception with this one:
match path=(n)-[*bfs..1000]->(m)
return path limit 1;
Shouldn't the second query return the same path as the first one, given the same dataset?

It should result in the same path, but it's not surprising that it hits a memory limit exception. Depending on the size and structure of your graph, allowing a path length of up to 1000 can consume a massive amount of RAM, because the number of candidate paths the traversal has to track grows considerably as you raise the maximum path length.

Related

Bigquery maximum processing data size allowance?

My question is: how much data are we allowed to process on BigQuery? I am using Stack Overflow's Kaggle dataset to analyze the data, and the text I am analyzing is around 27 GB. I just want to get the average length per entry, so I do:
query_length_text = """
SELECT
AVG(CHAR_LENGTH(title)) AS avg_title_length,
AVG(CHAR_LENGTH(body)) AS avg_body_length
FROM
`bigquery-public-data.stackoverflow.stackoverflow_posts`
"""
however this says:
Query cancelled; estimated size of 26.847077486105263 exceeds limit of 1 GB
I am only returning one float, so I know the result size isn't the problem. Does the 1 GB limit apply to the amount of data processed too? How do I process it in batches, say 1 GB at a time?
Kaggle sets a 1 GB scan limit on requests by default (to keep your monthly quota of 5 TB from running out), which is what causes this error. You can override it with the max_gb_scanned parameter like this:
df = bq_assistant.query_to_pandas_safe(QUERY, max_gb_scanned = N)
where N is the amount of data processed by your query, or any number higher than it.
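As a minimal sketch (assuming bq_assistant is a bq_helper.BigQueryHelper for the public Stack Overflow dataset, and reusing the query from the question), you can first estimate the scan size with the helper's estimate_query_size method and then pass a slightly larger cap:

import bq_helper

# Kaggle helper for the public Stack Overflow dataset (as in the question).
bq_assistant = bq_helper.BigQueryHelper(
    active_project="bigquery-public-data",
    dataset_name="stackoverflow",
)

query_length_text = """
SELECT
  AVG(CHAR_LENGTH(title)) AS avg_title_length,
  AVG(CHAR_LENGTH(body)) AS avg_body_length
FROM
  `bigquery-public-data.stackoverflow.stackoverflow_posts`
"""

# Estimate how many GB the query will scan, then allow a little headroom.
estimated_gb = bq_assistant.estimate_query_size(query_length_text)
df = bq_assistant.query_to_pandas_safe(query_length_text, max_gb_scanned=estimated_gb + 1)
print(df)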

ARRAY_AGG leading to OOM

I'm trying to run a pretty simple query, but it's failing with a "Resources exceeded" error.
I read in another post that the heuristic used to allocate the number of mixers could fail from time to time.
SELECT
response.auctionId,
response.scenarioId,
ARRAY_AGG(response) AS responses
FROM
rtb_response_logs.2016080515
GROUP BY
response.auctionId,
response.scenarioId
Is there a way to fix my query knowing that:
a response is composed of 38 fields (most of them being short strings)
the maximum count of responses per group is fairly low (165)
Query Failed
Error: Resources exceeded during query execution.
Job ID: teads-1307:bquijob_257ce97b_1566a6a3f27
It's a current limitation that arrays (produced by ARRAY_AGG or other means) must fit in the memory of a single machine. We've made a couple of recent improvements that should help to reduce the resources required for queries such as this, however. To confirm whether this is the issue, you could try a query such as:
SELECT
SUM(LENGTH(FORMAT("%t", response))) AS total_response_size
FROM
rtb_response_logs.2016080515
GROUP BY
response.auctionId,
response.scenarioId
ORDER BY total_response_size DESC LIMIT 1;
This formats the structs as strings as a rough heuristic of how much memory they would take to represent. If the result is very large, then perhaps we can restructure the query to use less memory. If the result is not very large, then some other issue is at play, and we'll look into getting it fixed :) Thanks!

Google BigQuery unable to process larger result set getting "Response too large to return" or "Resources exceeded during query execution"

I am currently working with a large table (~105M records) in a C# application.
When I query the table with an 'Order by' or 'Order Each by' clause, I get a "Resources exceeded during query execution" error.
If I remove the 'Order by' or 'Order Each by' clause, I get a "Response too large to return" error.
Here are the sample queries for the two scenarios (I am using the Wikipedia public table):
SELECT Id,Title,Count(*) FROM [publicdata:samples.wikipedia] Group EACH by Id, title Order by Id, Title Desc
SELECT Id,Title,Count(*) FROM [publicdata:samples.wikipedia] Group EACH by Id, title
Here are the questions I have:
What is the maximum size of Big Query Response?
How do we select all the records in Query Request not in 'Export Method'?
1. What is the maximum size of Big Query Response?
As mentioned in the quota policy, the maximum query response size is 10 GB compressed (unlimited when returning large query results).
2. How do we select all the records in Query Request not in 'Export Method'?
If you plan to run a query that might return larger results, you can set allowLargeResults to true in your job configuration.
Queries that return large results will take longer to execute, even if the result set is small, and are subject to additional limitations:
You must specify a destination table.
You can't specify a top-level ORDER BY, TOP or LIMIT clause. Doing so negates the benefit of using allowLargeResults, because the query output can no longer be computed in parallel.
Window functions can return large query results only if used in conjunction with a PARTITION BY clause.
Read more about how to paginate through the results here, and also see the BigQuery Analytics book, starting around page 200, where it is explained how Jobs::getQueryResults works together with the maxResults parameter and its blocking mode.
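For illustration only, here is a minimal sketch of setting this in a job configuration with the google-cloud-bigquery Python client; the destination table name is a placeholder, and allowLargeResults itself applies to legacy SQL queries (standard SQL queries generally don't need it as long as a destination table is specified):

from google.cloud import bigquery

client = bigquery.Client()

# Placeholder destination table; large results must be written to a table.
destination = bigquery.TableReference.from_string("my-project.my_dataset.wikipedia_counts")

job_config = bigquery.QueryJobConfig(
    destination=destination,
    allow_large_results=True,  # counterpart of allowLargeResults in the REST API
    use_legacy_sql=True,       # the sample query above uses the legacy dialect
)

sql = """
SELECT Id, Title, COUNT(*)
FROM [publicdata:samples.wikipedia]
GROUP EACH BY Id, Title
"""

job = client.query(sql, job_config=job_config)
job.result()  # wait for completion; rows are written to the destination table

Note that the ORDER BY from the first sample query is dropped here, per the limitation on top-level ORDER BY listed above.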
Update:
Query Result Size Limitations - When you run a normal query in BigQuery, the response size is limited to 10 GB of compressed data. Sometimes, it is hard to know what 10 GB of compressed data means. Does it get compressed 2x? 10x? The results are compressed within their respective columns, which means the compression ratio tends to be very good. For example, if you have one column that is the name of a country, there will likely be only a few different values. When you have only a few distinct values, this means that there isn't a lot of unique information, and the column will generally compress well. If you return encrypted blobs of data, they will likely not compress well because they will be mostly random. (This is explained in the book linked above, on page 220.)

What is the maximum permitted response data size?

In the API Docs section Browsing Table Data, there is a reference to the "permitted response data size"; however, that link is dead. Experimentation revealed that requests with maxResults=50000 are usually successful, but as I near maxResults=100000 I begin to get errors from the BigQuery server.
This is happening while I page through a large table (or set of query results), so after each page is received, I request the next one; it thus doesn't matter to me what the page size is, but it does affect the communication with BigQuery.
What is the optimal value for this parameter?
Here is some explanation: https://developers.google.com/bigquery/docs/reference/v2/jobs/query?hl=en
The maximum number of rows of data to return per page of results. Setting this flag to a small value such as 1000 and then paging through results might improve reliability when the query result set is large. In addition to this limit, responses are also limited to 10 MB. By default, there is no maximum row count, and only the byte limit applies.
To sum up: each response page is limited to 10 MB, and by default there is no maximum row count.
You can choose the value of the maxResults parameter based on how your application uses the data.
If you want to show the data in a report, set a low value so that the first page comes back quickly.
If you need to load the data into another app, you can use the maximum possible value (record size * row count < 10 MB).
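As a rough sketch with the google-cloud-bigquery Python client (the query and table here are placeholders), page_size plays the role of maxResults on the underlying API calls, so each page stays small while you iterate over the full result set:

from google.cloud import bigquery

client = bigquery.Client()

# Placeholder query over a large table.
sql = "SELECT Id, Title FROM `my-project.my_dataset.large_table`"

# page_size maps to maxResults on the underlying requests, so every
# response stays small and paging stays reliable.
rows = client.query(sql).result(page_size=1000)

total = 0
for page in rows.pages:  # one page per request to the BigQuery server
    total += page.num_items
    # process the rows in this page before the next one is fetched
print(f"Fetched {total} rows")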
You say that when you manually set maxResults = 100000 to page through the result set, you get errors from the BigQuery server. What errors do you get? Could you paste the error message?

SQL: Finding the size of a query result

So basically I'm doing a SQL SELECT query, but I want to know how much data I am pulling back (how many kilobytes). Is there any way to do this?
Actually, "Show Client Statistics" within SSMS query Editor Window will return the resultset size, Bytes Received from Server, etc
SELECT <your query's columns>
INTO dbo.MyTempTable
FROM <your query's source tables and joins>

EXEC sp_spaceused 'MyTempTable'

DROP TABLE dbo.MyTempTable
This will return the number of rows, reserved space, data space, index size, and unused space (in KB) for that table.
You can include the actual execution plan of the query in the Results window of SSMS, which will display an estimated row size for the results. Multiply that by the number of rows to get your result. Not sure how accurate the estimated row size is, though.
You can use sp_spaceused to get the size of a table. But I am unaware of any way to get the size of a query (of course that means little).
One way to get a quick estimate of the size would be to save the data as a text file. Obviously, there will be extra whitespace, but it would give you a general idea of how large the result is.
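For a programmatic rough estimate along the same lines (this is only an illustration in Python with pyodbc; the connection string and query are placeholders), you can fetch the rows and total the size of their text representation, much like saving the result to a text file:

import pyodbc

# Placeholder connection string; adjust driver, server, and database.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes;"
)
cursor = conn.cursor()
cursor.execute("SELECT name, object_id FROM sys.objects")  # replace with your query

total_bytes = 0
row_count = 0
for row in cursor:
    # Size of each value rendered as UTF-8 text, ignoring NULLs; this
    # overestimates a little, like a text-file export would.
    total_bytes += sum(len(str(value).encode("utf-8")) for value in row if value is not None)
    row_count += 1

print(f"{row_count} rows, roughly {total_bytes / 1024:.1f} KB as text")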