How to get the referenced tables of a query in BigQuery - google-bigquery

In the QueryJob class in BigQuery there is an attribute called referenced_tables.
The description of the attribute says that it returns a List[Dict] "describing the query plan, or an empty list if the query has not yet completed.".
The thing is, when I try to use the attribute I always get an empty list as the result, even for queries that reference only one table and process a relatively small amount of data, 11.27 KB for example.
What I find most curious is that when I check whether the query has completed, using the done method, it returns True, and even the to_dataframe() method works.
Does anyone know how I can solve this problem?
I want to list all the tables referenced in the query.
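One possible cause, offered as an assumption rather than a confirmed diagnosis: if the results were served from the query cache, the job statistics may not list any referenced tables. A minimal sketch with the google-cloud-bigquery Python client is below. It runs the query as a dry run with the cache disabled and then reads referenced_tables. The helper names qualified_names and referenced_tables, and the sample project, dataset, and table names, are hypothetical; the client call assumes default credentials and a default project are configured.

```python
from types import SimpleNamespace


def qualified_names(table_refs):
    """Turn TableReference-like objects into 'project.dataset.table' strings."""
    return [f"{t.project}.{t.dataset_id}.{t.table_id}" for t in table_refs]


def referenced_tables(sql):
    """Dry-run `sql` with the cache disabled and return the tables it references."""
    # Imported here so qualified_names() can be used without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()  # assumes default credentials and project
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)  # a dry run processes no data
    return qualified_names(job.referenced_tables)


# Offline check of the formatting helper with stand-in objects (hypothetical names).
sample = [
    SimpleNamespace(project="my-project", dataset_id="my_dataset", table_id="my_table")
]
names = qualified_names(sample)
print(names)
```

Calling referenced_tables("SELECT ...") against a real project would return one string per table. If the list is still empty for a normal (non-dry-run) job, comparing job.cache_hit between runs may help narrow it down.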

Related

Google Bigquery, WHERE clause based on JSON item

I've got a BigQuery import from a Firestore database where I want to query on a particular field from a document. This was populated via the firestore-bigquery extension and the document data is stored as a JSON string.
I'm trying to use a WHERE clause in my query that uses one of the fields from the JSON data. However this doesn't seem to work.
My query is as follows:
SELECT json_extract(data,'$.title') as title,p
FROM `table`
left join unnest(json_extract_array(data, '$.tags')) as p
where json_extract(data,'$.title') = 'technology'
data is the JSON object and title is an attribute of all of the items. The above query will run but yield 'no results' (There are definitely results there for the title in question as they appear in the table preview).
I've tried using WHERE title = 'technology' as well but this returns an error that title is an unrecognized field (hence the json_extract).
From my research this should work as a standard SQL JSON query but doesn't seem to work on BigQuery. Does anyone know of a way around this?
All I can think of is if I put the results in another table, but I don't know if that's a workable solution as the data is updated via the extension on an update, so I would need to constantly refresh my second table as well.
Edit
I'm wondering if configuring a view would help with this? Though ultimately I would like to query this based on different parameters and the docs here https://cloud.google.com/bigquery/docs/views suggest you can't reference query parameters in a view
I've since managed to work this out, and will share the solution for anyone else with the same problem.
The solution was to use JSON_VALUE in the WHERE clause instead, e.g.:
where JSON_VALUE(data,'$.title') = 'technology';
I'm still not sure if this is the best way to do this in terms of performance and cost so I will wait to see if anyone else leaves a better answer.
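The likely reason this works: JSON_EXTRACT returns the matched value as JSON text, so a string comes back with its double quotes and never equals the bare literal 'technology', while JSON_VALUE returns the scalar without the JSON quoting. The Python sketch below only mimics that difference with the standard json module. The functions json_extract_like and json_value_like are hypothetical stand-ins, not BigQuery APIs, and they handle a single top-level key rather than full JSONPath.

```python
import json


def json_extract_like(doc, key):
    """Mimic JSON_EXTRACT: return the value re-encoded as JSON text."""
    return json.dumps(json.loads(doc)[key])


def json_value_like(doc, key):
    """Mimic JSON_VALUE: return a scalar as plain text, None for arrays/objects."""
    value = json.loads(doc)[key]
    if isinstance(value, (dict, list)):
        return None
    return value if isinstance(value, str) else json.dumps(value)


data = '{"title": "technology", "tags": ["a", "b"]}'

extracted = json_extract_like(data, "title")  # keeps the JSON double quotes
scalar = json_value_like(data, "title")       # plain string

print(extracted == "technology")
print(scalar == "technology")
```

Under this reading, comparing the JSON_EXTRACT result against '"technology"' (with the inner double quotes) would also match, but JSON_VALUE states the intent more directly.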

Updating SQL from object with groovy

When you read in a result set in Groovy it comes in a collection of maps.
It seems like you should be able to update values inside those maps and write them back out, but I can't find anything built into Groovy that allows me to do so.
I'm considering writing a routine that writes a modified map back by iterating over the fields of one of the result objects, taking each key/value pair and using them to create the appropriate update statement. That could be annoying, so I was wondering if anyone else has done this or if it's already available in Groovy.
It seems like just a few lines of code so I'd rather not bring in hibernate for this. I'm just thinking a little "update" method that would allow:
def rows=sql.rows(query)
rows[0].name="newName"
update(sql, rows[0])
to update the first guy's name in the database. Anyone seen/created such a monster, or is something like this already built into Groovy Sql and I'm just missing it?
(I suppose you may have to point out to the update method which field is the key field, but that's doable...)
Using the rows method will actually read out all of the values into a List of GroovyRowResult so it's not really possible to update the data without creating an update method like the one you mention.
It's not really possible to do that in the generic case because your query can contain joins or a column reference that is an aggregate, etc.
However, if you're selecting from a single table, you can use the Sql.eachRow method with an updatable ResultSet and then use the underlying ResultSet interface to update rows as you iterate:
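import java.sql.ResultSet
// Ask the driver for an updatable, forward-only ResultSet
sql.resultSetConcurrency = ResultSet.CONCUR_UPDATABLE
sql.resultSetType = ResultSet.TYPE_FORWARD_ONLY
sql.eachRow(query) { row ->
    // row delegates to the underlying ResultSet
    row.updateString('name', 'newName')
    row.updateRow()
}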
Depending on the database/driver you use, you may not be able to create an updatable ResultSet.
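For the case where an updatable ResultSet is not available, the small "update" routine described in the question could look roughly like the following. This is a sketch in Python with the standard sqlite3 module rather than Groovy Sql, and it assumes the row came from a single table whose key column is known. The function name update_row and the person table are hypothetical.

```python
import sqlite3


def update_row(conn, table, row, key):
    """Write every non-key field of `row` back to `table`, matching on `key`."""
    columns = [c for c in row if c != key]
    assignments = ", ".join(f'"{c}" = ?' for c in columns)
    # Identifiers cannot be bound as parameters, so table and column names
    # must come from trusted code, never from user input.
    statement = f'UPDATE "{table}" SET {assignments} WHERE "{key}" = ?'
    params = [row[c] for c in columns] + [row[key]]
    return conn.execute(statement, params).rowcount


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("INSERT INTO person VALUES (1, 'oldName', 30)")

# Read rows as plain dicts, change one value, and write it back.
rows = [dict(r) for r in conn.execute("SELECT * FROM person")]
rows[0]["name"] = "newName"
changed = update_row(conn, "person", rows[0], key="id")

new_name = conn.execute("SELECT name FROM person WHERE id = 1").fetchone()["name"]
print(changed, new_name)
```

As the answer notes, this only makes sense for single-table results: a row produced by a join or an aggregate has no single table and key to write back to.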

API: Issue with exact match in deep queries

I'm querying for test results which are associated with any test set that has a particular tag.
However, this query does not work:
(TestSet.Tags.Name = "foo")
What does work is:
(TestSet.Tags.Name contains "foo")
I would think the first query should work if the second one returns matches with the tag "foo". I presume this is a bug?
I can get around this problem by using the second query, but of course the problem is that this can match a tag named "foo2" as well, so my query can have extra results (potentially many more) and I have to filter them out. Additionally, now I need to have my query fetch the "Tags" as well, so every result I get back is larger because of it.
Yes, as user1195996 suggested, this feels like a bug. The same queries work as expected against defects or user stories. Please work with Rally Support on this issue so we can work to correct it.
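Until the bug is fixed, the workaround from the question (query with contains, then discard near matches such as "foo2") amounts to a client-side filter on the exact tag name. A rough Python sketch follows. The result records are hypothetical plain dicts made up for illustration and are not the actual shape of a Rally API response.

```python
def has_exact_tag(result, tag_name):
    """True if any tag on the result's test set is named exactly `tag_name`."""
    tags = result.get("TestSet", {}).get("Tags", [])
    return any(tag.get("Name") == tag_name for tag in tags)


# Hypothetical results of a 'contains "foo"' query, for illustration only.
results = [
    {"id": 1, "TestSet": {"Tags": [{"Name": "foo"}]}},
    {"id": 2, "TestSet": {"Tags": [{"Name": "foo2"}]}},
    {"id": 3, "TestSet": {"Tags": [{"Name": "bar"}, {"Name": "foo"}]}},
]

# Keep only results whose test set carries the exact tag.
exact = [r["id"] for r in results if has_exact_tag(r, "foo")]
print(exact)
```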

MDX CurrentMember evaluation on Calculated Members - Weird Result

I'm using SQL Server Analysis Services.
I have a calculated member that, for now, just does this:
[MyDimension].[MyOnlyHierarchy].CurrentMember.Properties("MEMBER_UNIQUE_NAME")
Previously, I had just written [MyDimension].[MyOnlyHierarchy].CurrentMember.UniqueName. They should be the same anyway.
Now, I used SQL Profiler to get ahold of the query my application issues. For a simple calculated member in [MyDimension].[MyOnlyHierarchy] that just sums two different members, say with IDs 401 and 402, I get this result:
[MyDimension].[MyOnlyHierarchy].&[401][MyDimension].[MyOnlyHierarchy].&[402]
In other words, it is as if AS evaluates the underlying members and concatenates the results, rather than giving me the unique name of the calculated member...
The REALLY strange thing to me is that when I take the original query, and prepend the following:
WITH MEMBER [Measures].[GiveMeCalculatedMemberUniqueName]
AS
(
[MyDimension].[MyOnlyHierarchy].CurrentMember.Properties("MEMBER_UNIQUE_NAME")
)
...rest of query
I get the CORRECT results using this second measure! The context is the same (to me at least). Everything is the same... Yet the measure declared in the project file gives a different result than this inline calculated member.
What's going on here? Note, I've redeployed 10000 times, and I've checked the actual definition in the cube on the server and everything. It just doesn't make sense to me.
Calculations are evaluated based on solve order. Moving the definition down probably gave it the solve order it needed, which is why you now get the correct result. I have a short blog post on solve order here, but there are many more articles on the internet.
HTH
Funny... I've been sitting for about 2-3 hours with this, exploring every thought; then, after I post this question I decide to try one more thing:
move the calculated member definition to the bottom of the calculated members script file in the project.
This now gives the correct result.

Read number of columns and their type from query result table (in C)

I use a PostgreSQL database and connect to it from C. With help from dyntest.pgc I can get the number of columns and their (SQL3) types from the result table of a query.
The problem is that when the result table is empty, I can't fetch a row to get this data. Does anyone have a solution for this?
Query can be SELECT 1,2,3 - so, I think I can't use INFORMATION SCHEMA for this because there is no base table.
I'm not familiar with ecpg, but with libpq you should be able to call PQnfields to get the number of fields and then call the various PQf* routines (like PQftype, PQfname) to get detailed info. Those functions take a PGresult, which you have even if there are no rows.
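The same idea, that column metadata belongs to the result and not to any fetched row, can be shown with a runnable analogy. The sketch below uses Python's DB-API with the standard sqlite3 module instead of libpq and PostgreSQL, so it is an illustration of the principle and not the C code the question needs. Note that the sqlite3 module reports only column names in cursor.description, whereas PQftype in libpq would also give the type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No base table is involved, and the WHERE clause guarantees zero rows.
cursor = conn.execute("SELECT 1 AS a, 2 AS b, 3 AS c WHERE 1 = 0")

rows = cursor.fetchall()
# description holds one 7-item tuple per column; item 0 is the column name.
column_names = [col[0] for col in cursor.description]

print(len(rows), len(column_names), column_names)
```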
"Problem is that when result table is empty, I can't fetch a row to get this data. Does anyone have a solution for this?"
I'm not sure I really understand what you want, but it seems the answer is in the question. If the table is empty, there are no rows...
The only solution here seems to be to wait for a non-empty result table and then get the needed information.