We have several views across numerous projects and datasets in Google BigQuery. Is there a way to list all invalid views, i.e. to "re-validate" all views and get a list of the broken ones?
While it might not cover every kind of problem, I think I could execute each view with the dryRun parameter to determine its state (https://cloud.google.com/bigquery/docs/dry-run-queries). In that case I would need to enumerate all existing views (across all projects, or, as that may be a bad idea, at least within one project), trigger each one with the dryRun parameter, and store the results somewhere/somehow.
Hints on how to do that are appreciated.
Regards,
HerrB92
I am not aware of any built-in tools to do this, but it should be doable with some scripting.
The bq ls command will return a list of datasets; then for each dataset you can run bq ls <dataset> (or use SELECT * FROM dataset.INFORMATION_SCHEMA.TABLES WHERE TABLE_TYPE = 'VIEW'), and finally run each view with the --dry_run flag.
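If you would rather script the whole loop in one place, here is a minimal sketch using the Python client library (google-cloud-bigquery); the project IDs are placeholders, and it assumes your credentials can read every project:

# Flag views whose dry run fails.
from google.cloud import bigquery
from google.api_core.exceptions import GoogleAPIError

projects = ["my-project-1", "my-project-2"]  # placeholder project IDs
invalid_views = []
for project in projects:
    client = bigquery.Client(project=project)
    for dataset in client.list_datasets():
        for table in client.list_tables(dataset.dataset_id):
            if table.table_type != "VIEW":
                continue
            view_id = f"{project}.{dataset.dataset_id}.{table.table_id}"
            try:
                # A dry run validates the query without running it or incurring cost.
                client.query(f"SELECT * FROM `{view_id}`",
                             job_config=bigquery.QueryJobConfig(dry_run=True))
            except GoogleAPIError as e:
                invalid_views.append((view_id, str(e)))

for view_id, error in invalid_views:
    print(view_id, "->", error)

You could then write invalid_views wherever you want to keep the results, e.g. into a BigQuery table of its own.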
In Hive, typing show create table table_name does not show the full output, especially when the table has many columns. What command should be added to show the whole output?
Are you using the Hive or Beeline shell? The truncation might be related to the interface you use; shell tools in particular can be limited for long results.
Could you try it from Hue or a graphical desktop tool like DBeaver?
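If it is Beeline's table-formatted output doing the truncating, it may also be worth raising the display width before switching tools; a guess at a fix (the JDBC URL is a placeholder):

beeline -u jdbc:hive2://localhost:10000 --maxWidth=20000 -e "SHOW CREATE TABLE table_name"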
I need to list all tables in my BigQuery project, but I don't know how to do it; I searched but didn't find anything about it.
I need to know whether a table exists: if it exists I search for the record, and if not I create the table and insert the record.
Depending on where and how you want to do this, you can use the CLI, API calls or the client libraries. The documentation on listing tables has all the info.
As an example, if you want to list them using the command line interface, you can do it like this:
bq ls <project>:<dataset>
If you want to use normal SQL queries, you can use the INFORMATION_SCHEMA beta feature:
SELECT table_name FROM `<project>.<dataset>.INFORMATION_SCHEMA.TABLES`
(project is optional)
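Since you also want to create the table and insert the record when it does not exist, here is a rough sketch with the Python client library; the table name, schema and row are made up for illustration:

# Insert a row, creating the table first if it does not exist.
from google.cloud import bigquery
from google.api_core.exceptions import NotFound

client = bigquery.Client()
table_id = "my-project.my_dataset.my_table"  # placeholder

try:
    table = client.get_table(table_id)  # raises NotFound if the table is missing
except NotFound:
    schema = [bigquery.SchemaField("id", "INTEGER"),
              bigquery.SchemaField("name", "STRING")]
    table = client.create_table(bigquery.Table(table_id, schema=schema))

errors = client.insert_rows_json(table, [{"id": 1, "name": "example"}])
if errors:
    print("Insert failed:", errors)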
We have created a large number of views in BigQuery using standard SQL. Now we need to check these views for correctness.
Is there a bq command to get the SQL query with which these views were created in BigQuery?
Such a command would avoid the manual effort of checking each view for correctness.
Use the bq show command with the --view flag, e.g.:
bq show --view <project>:<dataset>.<view>
You can also use the --format=prettyjson flag (instead of --view) so you can easily get the query content when running a script, for example:
bq show --format=prettyjson <project>:<dataset>.<view>
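If you would rather not parse the JSON by hand, the Python client library exposes the view definition directly; a small sketch (project and dataset names are placeholders):

# Print the SQL behind every view in a dataset.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")
for item in client.list_tables("my_dataset"):
    if item.table_type == "VIEW":
        view = client.get_table(item.reference)
        print(f"-- {view.full_table_id}")
        print(view.view_query)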
I have several datasets within a BigQuery project which are populated by various job engines and applications. I would like to maintain a dashboard of the Last Modified dates for every table within our project, to monitor for job failures.
Are there any command line or SQL commands which could provide this list of Last Modified dates?
For a SQL command you could try this one:
#standardSQL
SELECT *, TIMESTAMP_MILLIS(last_modified_time)
FROM `dataset.__TABLES__`
WHERE table_id = 'table_id'
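To feed the dashboard across every dataset in the project, you could wrap that query in a small script; a sketch using the Python client library (the project ID is a placeholder):

# Collect the last-modified timestamp of every table in every dataset.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")
for dataset in client.list_datasets():
    query = (
        "SELECT table_id, TIMESTAMP_MILLIS(last_modified_time) AS last_modified "
        f"FROM `{client.project}.{dataset.dataset_id}.__TABLES__`"
    )
    for row in client.query(query).result():
        print(dataset.dataset_id, row.table_id, row.last_modified)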
I recommend, though, that you see whether you can log these errors at the application level; that way you can also understand why something didn't work as expected.
If you are already using GCP, you can make use of Stackdriver (it works on AWS as well). We started using it in our projects and I recommend giving it a try (we only tested it with Python applications, but it is probably quite similar for other clients).
I've just queried stacked GA4 data using the following code:
SELECT *, TIMESTAMP_MILLIS(last_modified_time)
FROM `analytics_#########.__TABLES__`
WHERE table_id LIKE 'events_2%'
I have kept the 2 in the LIKE pattern ('events_2%') so that the intraday tables (events_intraday_*) do not pull through as well.
I've run Hive on elastic mapreduce in interactive mode:
./elastic-mapreduce --create --hive-interactive
and in script mode:
./elastic-mapreduce --create --hive-script --arg s3://mybucket/myfile.q
I'd like to have an application (preferably in PHP, R, or Python) on my own server be able to spin up an elastic mapreduce cluster and run several Hive commands while getting their output in a parsable form.
I know that spinning up a cluster can take some time, so my application might have to do that in a separate step and wait for the cluster to become ready. But is there any way to do something like the following somewhat concrete, hypothetical example:
create Hive table customer_orders
run Hive query "SELECT dt, count(*) FROM customer_orders GROUP BY dt"
wait for result
parse result in PHP
run Hive query "SELECT MAX(id) FROM customer_orders"
wait for result
parse result in PHP
...
Does anyone have any recommendations on how I might do this?
You may use mrjob. It lets you write MapReduce jobs in Python 2.5+ and run them on several platforms, including Amazon EMR.
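To give a flavour of what that looks like, here is a sketch of an mrjob job doing the equivalent of your first Hive query over raw order files; the tab-separated input with the date in the first field is an assumption:

# Count orders per date with mrjob.
from mrjob.job import MRJob

class MRCountByDate(MRJob):
    def mapper(self, _, line):
        dt = line.split("\t")[0]  # assumes the date is the first field
        yield dt, 1

    def reducer(self, dt, counts):
        yield dt, sum(counts)

if __name__ == "__main__":
    MRCountByDate.run()

Running it with python count_by_date.py -r emr s3://mybucket/orders/ tells mrjob to spin up an EMR cluster for you and shut it down when the job finishes.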
An alternative is HiPy, an awesome project which should perhaps be enough for all your needs. The purpose of HiPy is to support programmatic construction of Hive queries in Python and easier management of queries, including queries with transform scripts.
HiPy enables grouping together in a single script of query construction, transform scripts and post-processing. This assists in traceability, documentation and re-usability of scripts. Everything appears in one place and Python comments can be used to document the script.
Hive queries are constructed by composing a handful of Python objects, representing things such as Columns, Tables and Select statements. During this process, HiPy keeps track of the schema of the resulting query output.
Transform scripts can be included in the main body of the Python script. HiPy will take care of providing the code of the script to Hive as well as of serialization and de-serialization of data to/from Python data types. If any of the data columns contain JSON, HiPy takes care of converting that to/from Python data types too.
Check out the Documentation for details!