I need to check whether a table exists in the Google Cloud bucket through a SQL query in Google BigQuery.
In T-SQL I do it this way:
if object_id('TABELA') is null
begin
create table Tabela (
campo tipo,
campo2 tipo
)
end
How do I perform this check in Google BigQuery?
Most DBMSs (including BigQuery) have an INFORMATION_SCHEMA that holds information about the objects in a database. This would normally be the place to look for answers to this type of question.
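As a rough sketch of how that could look in BigQuery standard SQL: the first statement checks INFORMATION_SCHEMA.TABLES for the table, and the second is a possible shortcut that creates the table only when it is missing. The dataset name mydataset and the column types STRING and INT64 are assumptions for illustration; the original only gives placeholder types.

```sql
-- Check whether the table exists.
-- mydataset is a hypothetical dataset name.
SELECT COUNT(*) > 0 AS table_exists
FROM mydataset.INFORMATION_SCHEMA.TABLES
WHERE table_name = 'Tabela';

-- Alternative: create the table only if it does not exist yet.
-- Column types are assumed; replace them with the real ones.
CREATE TABLE IF NOT EXISTS mydataset.Tabela (
  campo  STRING,
  campo2 INT64
);
```

CREATE TABLE IF NOT EXISTS replaces the T-SQL if/begin/end block with a single statement, so no separate existence check is needed when the goal is just to create a missing table.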
A coworker created a table in BigQuery using "create or replace table" function. Unfortunately the query wasn't documented. I was wondering if there's a way to see the underlying query behind the table or a way to access the edit history of the table?
Use the query below as an example:
select ddl
from `bigquery-public-data.utility_us.INFORMATION_SCHEMA.TABLES`
where table_name = 'us_county_area'
The result contains the DDL statement for the table in the ddl column.
There are quite a number of very useful INFORMATION_SCHEMA views to use.
I would try to find this query in the INFORMATION_SCHEMA jobs views:
SELECT * FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
job_type = 'QUERY' and
statement_type = 'CREATE_TABLE_AS_SELECT' and
lower(query) LIKE '%create%table%foo%' -- replace foo with your table name
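If the LIKE pattern matches too many jobs, one possible refinement is to filter on the job's destination table instead of the query text. This is only a sketch: it assumes the table lives in the US multi-region, and mydataset and foo are placeholder names. Keep in mind that the jobs views retain history for a limited period only, so an old CREATE statement may no longer be listed.

```sql
-- Find CREATE TABLE ... AS SELECT jobs that wrote to a specific table.
-- mydataset and foo are hypothetical; replace them with your own names.
SELECT creation_time, user_email, query
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE job_type = 'QUERY'
  AND statement_type = 'CREATE_TABLE_AS_SELECT'
  AND destination_table.dataset_id = 'mydataset'
  AND destination_table.table_id = 'foo'
ORDER BY creation_time DESC;
```

Ordering by creation_time puts the most recent definition first, which is usually the one that produced the current table.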
How do I drop a few tables (e.g. 1 - 3) using the output of a SELECT statement for the table names? This is probably standard SQL, but specifically I'm using Apache Impala SQL accessed via Apache Zeppelin.
So I have a table called tables_to_drop with a single column called "table_name". This will have one to a few entries in it, each with the name of another temporary table that was generated as the result of other processes. As part of my cleanup I need to drop these temporary tables whose names are listed in the "tables_to_drop" table.
Conceptually I was thinking of an SQL command like:
DROP TABLE (SELECT table_name FROM tables_to_drop);
or:
WITH subquery1 AS (SELECT table_name FROM tables_to_drop) DROP TABLE * FROM subquery1;
Neither of these works (syntax errors). Any ideas, please?
Even in standard SQL, it is not possible to do this the way you showed.
In standard SQL you would usually use dynamic SQL, which Impala doesn't support.
However, you can write an Impala script and run it in the Impala shell, but that is going to be complicated for such a task. If this is a one-time thing, I would generate the DROP statements with a SELECT and run them manually:
select concat('DROP TABLE IF EXISTS ',table_name) dropstatements
from tables_to_drop
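A small variation on the query above, offered as a sketch: appending a semicolon to each generated statement makes the output usable as a script without further editing. The assumption here is that the result is saved to a file and then run as a script (impala-shell accepts a query file through its -f option).

```sql
-- Generate one complete, terminated DROP statement per listed table.
-- Assumes table_name holds plain (optionally database-qualified) table names.
SELECT concat('DROP TABLE IF EXISTS ', table_name, ';') AS drop_statement
FROM tables_to_drop
ORDER BY table_name;
```

Reviewing the generated statements before running them is a cheap safeguard, since a wrong entry in tables_to_drop would otherwise drop a real table.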
I created an external table in Redshift and then added some data to the specified S3 folder. I can view all the data perfectly in Athena, but I can't seem to query it from Redshift. What's weird is that select count(*) works, so that means it can find the data, but it can't actually show anything. I'm guessing it's some mis-configuration somewhere, but I'm not sure what.
Some stuff that may be relevant (I anonymized some stuff):
create external schema spectrum_staging
from data catalog
database 'spectrum_db'
iam_role 'arn:aws:iam::############:role/RedshiftSpectrumRole'
create external database if not exists;
create external table spectrum_staging.errors(
id varchar(100),
error varchar(100))
stored as parquet
location 's3://mybucket/errors/';
My sample data is stored in s3://mybucket/errors/2018-08-27-errors.parquet
This query works:
db=# select count(*) from spectrum_staging.errors;
count
-------
11
(1 row)
This query does not:
db=# select * from spectrum_staging.errors;
id | error
----+-------
(0 rows)
Check your parquet file and make sure the column data types in the Spectrum table match up.
Then run SELECT pg_last_query_id(); after your query to get the query number and look in the system tables STL_S3CLIENT and STL_S3CLIENT_ERROR to find further details about the query execution.
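The sequence described above might look like the following sketch. The query ID 12345 is a placeholder: substitute the value returned by pg_last_query_id() in your own session.

```sql
-- 1. Run the query that returns no rows.
select * from spectrum_staging.errors;

-- 2. Get the ID of that query (run this immediately afterwards, in the same session).
select pg_last_query_id();

-- 3. Look up S3 client activity and errors for that ID.
--    12345 is a hypothetical query ID.
select * from stl_s3client where query = 12345;
select * from stl_s3client_error where query = 12345;
```

Copying the ID into the follow-up queries as a literal avoids depending on which statement the session considers "last" by the time the system tables are read.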
You don't need to define external tables when you have defined an external schema based on the Glue Data Catalog. Redshift Spectrum picks up all the tables that are in the Catalog.
What's probably going on there is that you somehow have two things with the same name and in one case it picks it up from the data catalog and in the other case it tries to use the external table.
Check these tables from the Redshift side to get a better view of what's there:
select * from SVV_EXTERNAL_SCHEMAS
select * from SVV_EXTERNAL_TABLES
select * from SVV_EXTERNAL_PARTITIONS
select * from SVV_EXTERNAL_COLUMNS
And check these tables for queries that use tables from the external schema:
select * from SVL_S3QUERY_SUMMARY
select * from SVL_S3LOG order by eventtime desc
select * from SVL_S3QUERY where query = xyz
select * from SVL_S3PARTITION where query = xyz
Was there ever a resolution for this? A year later, I have the same problem today.
Nothing stands out in terms of schema differences, but an error does exist:
select recordtime, file, process, errcode, linenum as line,
trim(error) as err
from stl_error order by recordtime desc;
/home/ec2-user/padb/src/sys/cg_util.cpp padbmaster 1 601 Compilation of segment failed: /rds/bin/padb.1.0.10480/data/exec/227/48844003/de67afa670209cb9cffcd4f6a61e1c32a5b3dccc/0
Not sure what this means.
I encountered a similar issue when creating an external table in Athena using the RegexSerDe row format. I was able to query this external table from Athena without any issues. However, when querying the external table from Redshift, the results were null.
Resolved by converting to parquet format as Spectrum cannot handle regular expression serialization.
See link below:
Redshift spectrum shows NULL values for all rows
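One way the conversion to Parquet mentioned above could be done is with a CREATE TABLE AS SELECT statement in Athena. This is a sketch under assumptions: errors_regex stands for the existing RegexSerDe table, errors_parquet for the new table, and the S3 location is a hypothetical path that must not already contain data.

```sql
-- Run in Athena: rewrite a RegexSerDe-backed table as Parquet.
-- Table names and the S3 path are hypothetical.
CREATE TABLE errors_parquet
WITH (
  format = 'PARQUET',
  external_location = 's3://mybucket/errors_parquet/'
) AS
SELECT *
FROM errors_regex;
```

Because the new table is registered in the same data catalog, it should then be visible from Redshift through the external schema.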
It seems you can run SELECT statements in BigQuery, but can you update only certain rows in a table through the API or from the web console?
Currently BigQuery only accepts SELECT statements. Updates to data need to be done via API, web UI, or CLI.
BigQuery is a WORM technology (append-only by design). It looks to me like you are not aware of this, as there is no option to UPDATE or DELETE a row.
To delete data, you could re-materialize the table without the desired rows:
SELECT *
FROM [mytable]
WHERE id NOT IN (SELECT id FROM [rows_to_delete])
To update data, you could follow a similar process:
SELECT * FROM (
SELECT *
FROM [mytable]
WHERE id NOT IN (SELECT id FROM [rows_to_update])
), (
SELECT *
FROM [rows_to_update]
)
AFAIK, re-materializing a table in BigQuery is fast enough compared to native updates/deletes on other analytical databases.
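The answers above describe BigQuery's legacy SQL dialect. BigQuery's standard SQL dialect has since gained DML statements, so changing specific rows in place is a possible alternative to re-materializing the table. A minimal sketch, assuming hypothetical names throughout (mydataset, the status column, and its values are not from the original):

```sql
-- Standard SQL DML. All dataset, table, and column names are hypothetical.

-- Update only the rows listed in rows_to_update.
UPDATE mydataset.mytable
SET status = 'done'
WHERE id IN (SELECT id FROM mydataset.rows_to_update);

-- Delete only the rows listed in rows_to_delete.
DELETE FROM mydataset.mytable
WHERE id IN (SELECT id FROM mydataset.rows_to_delete);
```

Both statements require a WHERE clause in BigQuery, which is why the row selection is expressed there rather than through a join.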
SQL Azure does not support SQL Server's Full Text Search feature.
Does this mean a text field cannot be indexed to handle substring searches?
For example, if I have a table Emails with a Message column, and I want to find all messages with both the words 'hello' and 'thanks' in them, will the standard index on the Message column allow me to do this?
CREATE TABLE Emails (
[Id] bigint NOT NULL,
[Message] nvarchar({some number}) NOT NULL
);
GO
CREATE NONCLUSTERED INDEX Messages_Emails ON Emails ([Message]);
My query (using Entity Framework) would look like:
var niceMessageQuery = Context.Emails.Where(e => e.Message.Contains("hello") && e.Message.Contains("thanks"));
Is there a better way to set up this query?
As of April 2015, Full-Text Search is available in Azure SQL Database server version V12: http://azure.microsoft.com/blog/2015/04/30/full-text-search-is-now-available-for-preview-in-azure-sql-database/.
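With Full-Text Search available, the two-word lookup from the question could be sketched as follows. The catalog name EmailsCatalog and the key index name PK_Emails are assumptions: the table in the question declares no key, and a full-text index needs a unique, single-column, non-nullable index to refer to.

```sql
-- Hypothetical catalog name.
CREATE FULLTEXT CATALOG EmailsCatalog;

-- Assumes a unique, single-column, non-nullable index named PK_Emails exists on [Id].
CREATE FULLTEXT INDEX ON Emails ([Message])
    KEY INDEX PK_Emails
    ON EmailsCatalog;

-- Find messages containing both words.
SELECT [Id], [Message]
FROM Emails
WHERE CONTAINS([Message], '"hello" AND "thanks"');
```

Note that CONTAINS matches whole words (and prefixes, when asked to), not arbitrary substrings, so it is not a drop-in replacement for a LIKE '%...%' search.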
I'm not familiar with Azure at all, but is it possible to use a subquery? The inner (sub) query finds all records that contain "hello", and the outer query uses that inner query as its dataset to search for "thanks".
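The subquery idea could be sketched like this, using the table and column names from the question. It is an illustration of the suggestion, not a tested optimization.

```sql
-- Inner query: messages containing "hello".
-- Outer query: keep only those that also contain "thanks".
SELECT e.[Id], e.[Message]
FROM (
    SELECT [Id], [Message]
    FROM Emails
    WHERE [Message] LIKE '%hello%'
) AS e
WHERE e.[Message] LIKE '%thanks%';
```

This is logically the same as a single WHERE clause with both LIKE conditions joined by AND, and a pattern with a leading wildcard cannot use an index seek on a standard index, so the table is still scanned either way.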