How to query string inside a record in Google BigQuery? Docs not working - google-bigquery

I want to query a subset of a record in the bitcoin blockchain using the google bigquery database. I go here and click view dataset https://console.cloud.google.com/marketplace/details/bigquery-public-data/bitcoin-blockchain. Then, on the left sidebar, it seems you have to click the dropdown at 'bigquery-public-data', then click 'bitcoin_blockchain' then 'transactions'. Then on the right you have to click the button 'Query Table'. This is the only way I have found to select the table -- just copying and pasting the command below won't recreate the error.
Based on the table that appears following the above instructions, I noticed that outputs are arecord type. I would like to view only one string from inside the record. The string is called output_pubkey_base58.
So I read the docs, and the docs imply the command would be:
SELECT outputs.output_pubkey_base58 FROM `bigquery-public-data.bitcoin_blockchain.transactions` LIMIT 1000;
I get an error: Cannot access value on Array<Struct<output_satoshis ... .. I tried outputs[0].output_pubkey_base58, didn't work
The annoying thing is that this problem is in the same format as the first example, where they query the citiesLived.place parameter from inside the citiesLived record in the same kind of command. : https://cloud.google.com/bigquery/docs/legacy-nested-repeated

You need to unnest the array into a new variable.
SELECT o.output_pubkey_base58
FROM
`bigquery-public-data.bitcoin_blockchain.transactions`,
UNNEST (outputs) as o
LIMIT
1000

Feel the confusion here is about legacy SQL and standard SQL. UNNEST must be used in standard SQL as described in document: https://cloud.google.com/bigquery/docs/reference/standard-sql/migrating-from-legacy-sql#differences_in_repeated_field_handling
Selecting nested repeated leaf fields
Using legacy SQL, you can "dot" into a nested repeated field without needing to consider where the repetition occurs. In standard SQL, attempting to "dot" into a nested repeated field results in an error.

Related

Google Bigquery, WHERE clause based on JSON item

I've got a bigquery import from a firestore database where I want to query on a particular field from a document. This was populated via the firestore-bigquery extension and the document data is stored as a JSON string.
I'm trying to use a WHERE clause in my query that uses one of the fields from the JSON data. However this doesn't seem to work.
My query is as follows:
SELECT json_extract(data,'$.title') as title,p
FROM `table`
left join unnest(json_extract_array(data, '$.tags')) as p
where json_extract(data,'$.title') = 'technology'
data is the JSON object and title is an attribute of all of the items. The above query will run but yield 'no results' (There are definitely results there for the title in question as they appear in the table preview).
I've tried using WHERE title = 'technology' as well but this returns an error that title is an unrecognized field (hence the json_extract).
From my research this should work as a standard SQL JSON query but doesn't seem to work on Bigquery. Does anyone know of a way around this?
All I can think of is if I put the results in another table, but I don't know if that's a workable solution as the data is updated via the extension on an update, so I would need to constantly refresh my second table as well.
Edit
I'm wondering if configuring a view would help with this? Though ultimately I would like to query this based on different parameters and the docs here https://cloud.google.com/bigquery/docs/views suggest you can't reference query parameters in a view
I've since managed to work this out, and will share the solution for anyone else with the same problem.
The solution was to use JSON_VALUE in the WHERE clause instead e.g:
where JSON_VALUE(data,'$.title') = 'technology';
I'm still not sure if this is the best way to do this in terms of performance and cost so I will wait to see if anyone else leaves a better answer.

Difference between "Preview" and Query in BigQuery

I have the following table schema:
+-----+---------+----------+
+ chn | INTEGER | NULLABLE |
+-----+---------+----------|
+ size| STRING | NULLABLE |
+-----+---------+----------|
+ char| REPEATED| NULLABLE |
+-----+---------+----------|
+ ped | INTEGER | NULLABLE |
+-----+---------+----------
When I click on 'preview' in the Google BigQuery Web UI, I get the following result:
But when I query my table, I get this result:
It seems like "preview" is interpreting my repeated field as an array, I would want to get the same result in a query to limit the number of rows.
I did try to uncheck "Use Legacy SQL" which gave me the same result but the problem is that with my table, a same query takes ~1.0 sec to execute with "Use Legacy SQL" checked and ~12 seconds when it's unchecked.
I am looking for speed here so unfortunately, not using Legacy SQL is not an option...
Is there another way to render my repeated field like it does in the "preview" ?
Thanks for the help :)
In legacy SQL, BigQuery flattens the result of queries by default. This means two things:
All child fields of RECORD fields are propagated to the top-level, with their names changed from record.subrecord.leaf to record_subrecord_leaf. Parent records are removed from the schema.
All repeated fields are converted to fields of optional mode, with each repeated value expanded into its own row. (As a side note, this step is very similar to the FLATTEN function exposed in legacy SQL.)
What you see here is a product of #2. Each repeated value is becoming its own row (as you can see by the row count on the left-hand side in your two images) and the values from the other columns are, well, repeated for each new row.
You can prevent this behavior and receive "unflattened results" in a couple ways.
Using standard SQL, as you note in your original question. All standard SQL queries return unflattened results.
While using legacy SQL, setting the flattenResults parameter to false. This requires also specifying a destination table and setting allowLargeResults to false. These can be found in the Show Options panel beneath the query editor if you want to set them within the UI. Mikhail has some good suggestions for managing the temporary-ness of destination tables if you aren't interested in keeping them around.
I should note that there are a number of corner cases with legacy SQL with flattenResults set to false which might trip you up if you start writing more complex queries. A prominent example is that you can't output more than one independently repeated field in query results using legacy SQL, but you can output multiple with standard SQL. These issues are unlikely to be resolved in legacy SQL, and going forward we're suggesting people use standard SQL when they run into them.
If you could provide more details about your much slower query using standard SQL (e.g. job ID in legacy SQL, job ID in standard SQL, for comparison), I, and the rest of the BigQuery team, would be very interested in investigating further.
Is there another way to render my repeated field like it does in the
"preview" ?
To see original not-flattened output in Web UI for Legacy SQL, i used to set respective options (click Show Options) to actually write output to table with checked Allow Large Results and unchecked Flatten Results.
This actually not only saves result into table but also shows result in the same way as preview does (because it is actually preview of that table). To make sure that table gets removed afterwards - i have "dedicated" dataset (temp) with default expiration set to 1 day (or hour - depends on how aggressive you want to be with your junk), so you don't need to worry of that table(s) - it will get deleted automatically for you. Wanted to note: this was quite a common pattern for us to deal with and having to do extra settings was boring, so we ended up with our own custom UI that does all this for user automatically
What you see is called Flatten.
By default the UI flattens the query output, there is currently no option to show query results like you want. In order to produce unflatten results you must write to a table, but that's different thing.

SELECT query using LIKE property in Microsoft Access returns no results when it should

I'm sure I'm making some kind of rookie error here, but I have no idea what the problem is. I am trying to run a simple query on one table in a microsoft access database using the LIKE property to find records that have a certain text string in a particular field. More specifically, the table, called Catreqs, has a few fields, bib_num, MARC_336, MARC_337, and MARC_338. The MARC_336 field has a text string in it and I want a query that selects all the records for which that text string includes the characters "txt".
Here's my query:
SELECT [Catreqs].record_num, [Catreqs].MARC_336
FROM [Catreqs]
WHERE [Catreqs].MARC_336 Like '%txt%';
I should note that I created this query in MS Access design view and this is the query that was generated when I switched to SQL view. I am a little familiar with SQL and even less familiar with Access so this is actually my preferred way of dealing with it.
I've also tried using Like '*txt*' but that didn't return any results either. For reference, here is the entire text string these characters are in:
text txt rdacontent
Any suggestions thoughts on why this fails and how I can fix it?
Thanks!
In Access, for a string you must use the * character.
Check if [Catreqs] has rows where MARC_336 contains "txt".
This is the official documentation of Access:
https://support.office.com/en-us/article/Like-Operator-b2f7ef03-9085-4ffb-9829-eef18358e931?ui=it-IT&rs=en-001&ad=IT&omkt=en-001

Search for string in all tables

Is there some tool that can do that ?
Searching for given string in whole database ?
HeidiSQL can search an entire mysql host via a slick GUI.
Just press control+shift+f
It'll show a search window where you can select which databases, tables, views and field-types to search. Results are presented in a neat list.
You can get the results as SQL that selects the rows where your text was found.
...
Of course you can also do it the old-fashioned way:
Use mysqldump.exe to export to a file, and then use grep to find your searchtext.
You could query the metadata to find all of the tables and columns within the tables and then search each of them for the string.
Here's some links on how to pull the metadata out of mysql:
http://www.developerfusion.com/code/3945/get-metadata-on-mysql-databases/
http://forge.mysql.com/wiki/Metadata

Is there a way to parser a SQL query to pull out the column names and table names?

I have 150+ SQL queries in separate text files that I need to analyze (just the actual SQL code, not the data results) in order to identify all column names and table names used. Preferably with the number of times each column and table makes an appearance. Writing a brand new SQL parsing program is trickier than is seems, with nested SELECT statements and the like.
There has to be a program, or code out there that does this (or something close to this), but I have not found it.
I actually ended up using a tool called
SQL Pretty Printer. You can purchase a desktop version, but I just used the free online application. Just copy the query into the text box, set the Output to "List DB Object" and click the Format SQL button.
It work great using around 150 different (and complex) SQL queries.
How about using the Execution Plan report in MS SQLServer? You can save this to an xml file which can then be parsed.
You may want to looking to something like this:
JSqlParser
which uses JavaCC to parse and return the query string as an object graph. I've never used it, so I can't vouch for its quality.
If you're application needs to do it, and has access to a database that has the tables etc, you could run something like:
SELECT TOP 0 * FROM MY_TABLE
Using ADO.NET. This would give you a DataTable instance for which you could query the columns and their attributes.
Please go with antlr... Write a grammar n follow the steps..which is given in antlr site..eventually you will get AST(abstract syntax tree). For the given query... we can traverse through this and bring all table ,column which is present in the query..
In DB2 you can append your query with something such as the following, but 1 is the minimum you can specify; it will throw an error if you try to specify 0:
FETCH FIRST 1 ROW ONLY