BigQuery cacheHit property - google-bigquery

I'm using the BigQuery API to run a query with the following code:
query = (
    'SELEC ...'
)
# API request - starts the query
query_job = client.query(
    query,
    location='US'
)
results = query_job.result()
The query works and outputs expected results.
However, I am not able to verify use of the cache.
Docs:
If you are using the BigQuery API, the cacheHit property in the query
result is set to true.
I am trying to access results.cacheHit, but it does not work:
AttributeError: 'RowIterator' object has no attribute 'cacheHit'
What am I doing wrong? How can I see the use of cache with my query?

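The quote you are using from the docs refers to the REST API (cacheHit is in the response of the getQueryResults method).
With the Python client, what you need instead is query_job.cache_hit: the property lives on the job object, not on the RowIterator returned by result().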
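A minimal sketch of that check, assuming the google-cloud-bigquery Python client, where the job object exposes cache_hit once the query has finished. The helper name run_and_report_cache is hypothetical, and client stands for whatever bigquery.Client instance you already have.

```python
def run_and_report_cache(client, sql, location="US"):
    """Run a query and report whether it was served from the cache.

    `client` is assumed to be a google.cloud.bigquery.Client; the helper
    name is hypothetical and only used for illustration.
    """
    query_job = client.query(sql, location=location)  # starts the query
    rows = list(query_job.result())  # blocks until the query completes
    # cache_hit is read from the job, not from the RowIterator
    return rows, bool(query_job.cache_hit)
```

Running the same query twice with caching enabled would normally be expected to report False on the first call and True on the second, although BigQuery skips the cache in several situations (for example, non-deterministic functions or a destination table).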

Related

Is there a way to execute text gremlin query with PartitionStrategy

I'm looking for a way to run a text query, e.g. "g.V().limit(1).toList()", while using PartitionStrategy in Apache TinkerPop.
I'm attempting to build a REST interface that runs queries on selected graph partitions only. I know how to run a raw query using Client, but I'm looking for an implementation where I can create a multi-tenant graph (https://tinkerpop.apache.org/docs/current/reference/#partitionstrategy) and query only selected tenants using a raw text query instead of a GLV. I'm able to query only selected partitions using gremlinpython, but I could not find a reference implementation for running a text query on a tenant.
Here is the tenant query implementation:
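from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.strategies import PartitionStrategy

connection = DriverRemoteConnection('ws://megamind-ws:8182/gremlin', 'g')
g = traversal().withRemote(connection)
partition = PartitionStrategy(partition_key="partition_key",
                              write_partition="tenant_a",
                              read_partitions=["tenant_a"])
partitioned_g = g.withStrategies(partition)
x = partitioned_g.V().limit(1).next()  # query on partition only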
Here is how I execute a raw query on the entire graph, but I'm looking for a way to run text-based queries on selected partitions only.
from gremlin_python.driver import client
client = client.Client('ws://megamind-ws:8182/gremlin', 'g')
results = client.submitAsync("g.V().limit(1).toList()").result().one()  # runs on entire graph
print(results)
client.close()
Any suggestions appreciated. TIA

It depends on how the backend store handles text mode queries, but for the query itself, essentially you just need to use the Groovy/Java style formulation. This will work with GremlinServer and Amazon Neptune. For other backends you will need to make sure that this syntax is supported. So from Python you would use something like:
client.submit('''
g.withStrategies(new PartitionStrategy(partitionKey: "_partition",
writePartition: "b",
readPartitions: ["b"])).V().count()''')
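For a REST interface that serves several tenants, the Groovy-style prefix can be generated per request. The following is a rough sketch, not part of gremlinpython: the helper name partitioned_query is hypothetical, and it assumes the partition values are trusted plain text without "$" characters (which Groovy would interpolate inside double quotes).

```python
import json


def partitioned_query(traversal, partition_key, write_partition, read_partitions):
    """Prefix a Gremlin text traversal with a PartitionStrategy.

    `traversal` is the part after "g.", for example "V().limit(1).toList()".
    json.dumps is used only to quote the values; for simple strings and
    lists of strings the JSON literal is also a valid Groovy literal.
    """
    strategy = (
        "new PartitionStrategy("
        f"partitionKey: {json.dumps(partition_key)}, "
        f"writePartition: {json.dumps(write_partition)}, "
        f"readPartitions: {json.dumps(list(read_partitions))})"
    )
    return f"g.withStrategies({strategy}).{traversal}"


query = partitioned_query(
    "V().limit(1).toList()", "partition_key", "tenant_a", ["tenant_a"]
)
print(query)
```

The resulting string can then be passed to the driver's Client, for example with client.submit(query).all().result(). Whether the backend accepts this syntax still depends on the store, as noted above.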

API parameters - filter with ARRAY_CONTAINS (cosmos db back end)

I have an API that I call to query a Cosmos DB and return records.
I can filter on a simple string in my api call like so:
// return objects where '_Subject' field equals "filterTest"
string getUrl = $"...baseApiPath/?$filter=_Subject+eq+'filterTest'";
This is working perfectly.
But I cannot figure out the filter syntax to make my API query be based on ARRAY_CONTAINS.
// return objects where '_Attachments' field CONTAINS "945afd138aasdf545a2d1";
How would I do that? Is there a general reference for API filter syntax somewhere?

If you're asking about how to query, a query against a property with an array of values looks like this:
SELECT * FROM c WHERE ARRAY_CONTAINS(c._Attachments, "945afd138aasdf545a2d1")
Another example in this answer.
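If the value comes from user input, it is usually safer to pass it as a query parameter than to paste it into the SQL text. A small sketch, assuming the azure-cosmos Python SDK, whose container.query_items accepts query and parameters arguments in this shape; the helper name array_contains_query is hypothetical.

```python
def array_contains_query(field, value):
    """Build a parameterized Cosmos DB SQL query that uses ARRAY_CONTAINS.

    Returns keyword arguments shaped for container.query_items(...) in the
    azure-cosmos SDK (assumed here; the call itself is not executed).
    """
    # Field names cannot be parameterized, so accept only simple identifiers.
    if not field.replace("_", "").isalnum():
        raise ValueError(f"unexpected field name: {field!r}")
    return {
        "query": f"SELECT * FROM c WHERE ARRAY_CONTAINS(c.{field}, @value)",
        "parameters": [{"name": "@value", "value": value}],
    }


spec = array_contains_query("_Attachments", "945afd138aasdf545a2d1")
print(spec["query"])
```

How this maps onto the $filter syntax of the API in the question depends on how that API translates filters into Cosmos DB queries, which is not shown here.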

Can Karate generate multiple query parameters with the same name?

I need to pass multiple query parameters with the same name in a URL, but I am having problems getting it to work with Karate. In my case, the URL should look like this:
http://mytestapi.com/v1/orders?sort=order.orderNumber&sort=order.customer.name,DESC
Notice 2 query parameters named "sort". I attempted to create these query string parameters with Karate, but only the last "sort" parameter gets created in the query string. Here are the ways I tried to do this:
Given path 'v1/orders'
And param sort = 'order.orderNumber'
And param sort = 'order.customer.name,DESC'
And header Authorization = authInfo.token
And method get
Then status 200
And:
Given path 'v1/orders'
And params sort = { sort: 'order.orderNumber', sort: 'order.customer.name,DESC' }
And header Authorization = authInfo.token
And method get
Then status 200
And:
Given path 'v1/order?sort=order.orderNumber&sort=order.customer.name,DESC'
And header Authorization = authInfo.token
And method get
Then status 200
The first two ways produce the same query string: ?sort=order.customer.name%2CDESC
The last example does not work because the ? gets encoded, which was expected and is explained in this post - Karate API Tests - Escaping '?' in the url in a feature file
It's clear that the second "sort" param is overriding the first and only one parameter is being added to the URL. I have gone through the Karate documentation, which is very good, but I have not found a way to add multiple parameters with the same name.
So, is there a way in Karate to set multiple URL query parameters with the same name?

Yes, you can generate multiple query parameters with the same name in Karate.
All values for the same key should be provided in an array:
Given path 'v1/orders'
And params {"sort":["order.orderNumber","order.customer.name,DESC"]}
And header Authorization = authInfo.token
And method get
Then status 200
To set a single parameter using param, it looks like this:
And param sort = ["order.orderNumber","order.customer.name,DESC"]
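To show what the array form should look like on the wire, here is the same idea in plain Python using only the standard library. This illustrates repeated query keys in general; it is not how Karate builds the URL internally.

```python
from urllib.parse import urlencode

# One key with a list of values; doseq=True emits the key once per value.
params = {"sort": ["order.orderNumber", "order.customer.name,DESC"]}
query_string = urlencode(params, doseq=True)
print(query_string)
# prints: sort=order.orderNumber&sort=order.customer.name%2CDESC
```

Both sort parameters appear in the result, with the comma percent-encoded as %2C, which matches the encoding seen in the question.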

BigQuery synchronous query is not returning any results

According to the BigQuery documentation listed at https://cloud.google.com/bigquery/querying-data#asynchronous-queries:
There are two types of querying via the BigQuery API. Synchronous and Asynchronous. Async works perfectly for me using the sample code provided, however synchronous does not.
The sample code I am referring to is shown if you click on the link above. What I noticed is that it does not actually wait until the results are available. If I insert a time.sleep(15) before the while True, my results are returned as expected. If not, it returns an empty result set.
The official documentation example uses the query:
"""SELECT word, word_count
FROM `bigquery-public-data.samples.shakespeare`
WHERE corpus = @corpus
AND word_count >= @min_word_count
ORDER BY word_count DESC;
"""
This query returns very quickly, however my query takes several seconds to return a result.
My question is, why does the documentation state that the run_sync_query command waits until the query completes, if the results are not actually accessible and no results are returned?
I cannot provide the query I used because it is a private data source. To reproduce, you just need a query that takes several seconds to run.

It looks like the request/call is timing out, not the query itself. The default timeout is 10 seconds. Try setting timeout_ms in your code.
For example (I'm going to assume you are using Python):
..[auth/client setup stuff]..
query = client.run_sync_query('<your_query>')
query.timeout_ms = 60000 #set the request timeout
query.use_legacy_sql = False
query.use_query_cache = True
query.run()
..[do something with the results]..
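If raising the timeout is not enough, another option is to poll until the job reports completion. The sketch below is library-agnostic: it assumes a callable, here named fetch_results (hypothetical), that returns a dict shaped like the REST getQueryResults response, which carries a jobComplete flag and a rows list.

```python
import time


def wait_for_rows(fetch_results, timeout_s=60.0, poll_interval_s=1.0):
    """Poll until the response says the job is complete, then return rows.

    `fetch_results` is a hypothetical zero-argument callable returning a
    dict with a "jobComplete" flag and, once complete, a "rows" list.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        response = fetch_results()
        if response.get("jobComplete"):
            return response.get("rows", [])
        if time.monotonic() >= deadline:
            raise TimeoutError("query did not complete in time")
        time.sleep(poll_interval_s)
```

An empty result from a request that timed out is therefore distinguishable from a genuinely empty result: only the latter has jobComplete set to true.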

ReleaseCumulativeFlowData and CardState

I'm trying to run a query against the ReleaseCumulativeFlowData object as follows:
((ReleaseObjectID = 12345) AND CardState="Accepted")
However, running the query results in the following error message:
OperationResultError
Could not read: could not read all instances of class
com.f4tech.slm.domain.reporting.ReleaseCumulativeFlowDataSet
Is this a bug in Rally?

WSAPI is very picky about the structure of the query. You have to include parentheses around chained query filters, so you would need something like the following:
((ReleaseObjectID = 12345) AND (CardState = "Accepted"))
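Since the parentheses are easy to get wrong by hand, the query string can be generated. A small sketch with a hypothetical helper, wsapi_and, that wraps each condition and nests the ANDs pairwise; the pairwise nesting for three or more conditions is an assumption about how WSAPI expects longer chains to be grouped.

```python
from functools import reduce


def wsapi_and(*conditions):
    """Combine conditions with AND, parenthesizing each one and each pair."""
    if not conditions:
        raise ValueError("at least one condition is required")
    wrapped = [f"({condition})" for condition in conditions]
    # Fold left to right so every AND has exactly two parenthesized operands.
    return reduce(lambda left, right: f"({left} AND {right})", wrapped)


query = wsapi_and('ReleaseObjectID = 12345', 'CardState = "Accepted"')
print(query)
# prints: ((ReleaseObjectID = 12345) AND (CardState = "Accepted"))
```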
