How to select from variable table name? - sql

I have a scheduled query set up, but I want to select FROM my_project.my_database.my_table_{todays_date} each day.
I found how to create variables in BigQuery like:
DECLARE todays_date STRING DEFAULT REPLACE(CAST(CURRENT_DATE AS STRING), '-', '').
(Date Format: YYYYMMDD (no underscore or hyphen))
But how could I query from this table each day?
`my_project.my_database.my_table_{todays_date}`

You can achieve this using queries with wildcards in the table name. The documentation here explains it very well:
https://cloud.google.com/bigquery/docs/querying-wildcard-tables
Additionally, if you want to restrict the query to a subset of the tables, you can filter on the _TABLE_SUFFIX pseudo-column. In a scheduled query, the @run_date parameter (a DATE) can supply the suffix. It has to be formatted as YYYYMMDD to match your table names, because a plain CAST to STRING yields YYYY-MM-DD:
SELECT *
FROM `my_project.my_dataset.my_table_*`
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', @run_date)
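If you would rather keep a declared variable, as in the question, another option is dynamic SQL. The following is a minimal sketch, assuming BigQuery scripting is available where the query runs; the table name is the one from the question, and the statement is built with FORMAT and run with EXECUTE IMMEDIATE:

```sql
-- Build today's suffix as YYYYMMDD (no hyphens), e.g. 20240131.
DECLARE todays_date STRING DEFAULT FORMAT_DATE('%Y%m%d', CURRENT_DATE());

-- Splice the suffix into the table name and run the resulting statement.
EXECUTE IMMEDIATE FORMAT("""
  SELECT *
  FROM `my_project.my_database.my_table_%s`
""", todays_date);
```

Because this names one exact table, the statement should fail if that day's table has not been created yet, so the wildcard form may be the more forgiving choice.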

Related

How to query all tables in dataset and add an identifier?

I use this query
SELECT * FROM `project.DATASET.*`
to select all the data in DATASET.
Is there a way to add a new column that identifies which table each record belongs to?
You're using a wildcard query, which supports a special _TABLE_SUFFIX pseudo-column. It is mostly used to filter the set of matched tables, but you can also project it as a result column.
More info here: https://cloud.google.com/bigquery/docs/querying-wildcard-tables
Something like this:
SELECT
_TABLE_SUFFIX as src_tbl,
*
FROM `project.dataset.*`

Using a UDF to query a table in Hive

I have the following UDF available on Hive to convert a time bigint to date,
to_date(from_utc_timestamp(from_unixtime(cast(listed_time/1000 AS bigint)),'PST'))
I want to use this UDF to query a table on a specific date. Something like,
SELECT * FROM <table_name>
WHERE date = '2020-03-01'
ORDER BY <something>
LIMIT 10
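I would suggest changing the logic: avoid applying the function to the column being filtered, because it is inefficient. The function has to be invoked for every row, which prevents the query from benefiting from an index.
Instead, convert the input date to an epoch value in milliseconds once (possibly with a UDF) and compare the raw listed_time column against it. Since a date covers a whole day, use a half-open range rather than an equality. This should look something like the following, assuming the session time zone is UTC, as the original expression implicitly does:
SELECT * FROM <table_name>
WHERE listed_time >= unix_timestamp(to_utc_timestamp('2020-03-01 00:00:00', 'PST')) * 1000
AND listed_time < unix_timestamp(to_utc_timestamp('2020-03-02 00:00:00', 'PST')) * 1000
ORDER BY <something>
LIMIT 10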
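To make the arithmetic concrete, here is a sketch with the day boundaries precomputed as literals. It assumes that listed_time holds epoch milliseconds and that 'PST' means UTC-8 on that date: midnight on 2020-03-01 PST is 08:00:00 UTC, which is 1583049600 seconds (1583049600000 ms) after the epoch, and the next midnight is 86400000 ms later.

```sql
-- Rows whose listed_time falls on 2020-03-01 in PST (UTC-8).
-- Lower bound: 2020-03-01 08:00:00 UTC = 1583049600000 ms since epoch.
-- Upper bound: lower bound + 86400000 ms (one day), exclusive.
SELECT * FROM <table_name>
WHERE listed_time >= 1583049600000
  AND listed_time <  1583136000000
ORDER BY <something>
LIMIT 10
```

The column is compared as stored, so no function has to run per row.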

How does one select all columns but rename some of them in one statement?

Normally I would write a statement like:
SELECT * FROM my_table;
But I have two columns, (both of date type), called 'created' and 'edited'. If I do select *, then the date in each of these columns will appear as:
2017-11-04T18:30:00.000Z
I would rather the date appear in DD/MON/YYYY.
To do that, I currently modify my SQL statement to:
SELECT column_name1,column_name2,column_name3,to_char(created, 'DD-MON-YYYY') as created,column_name4.... FROM my_table;
The problem is that although I can format the date, I have the problem of having to specify each column name in the statement. Is there some way I can select all the columns (but rename one or more columns using the method above), without having to specify each column name ?
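You can select the formatted columns explicitly and add t.* for everything else. Note that t.* still returns the raw created and edited columns as well, so the result contains two columns with each of those names; use different aliases (for example created_fmt) if your client cannot handle duplicates: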
select to_char(t.created,'DD-MON-YYYY') as created,to_char(t.edited,'DD-MON-YYYY') as edited, t.* from my_table t;

comparison of several attributes in one SQL query

I want to write a Teradata SQL query.
I have two attributes: date and name.
I want to compare both of them at once, using only ONE subquery. I want it to look something like:
date,name= SELECT date, name FROM ...
Is it possible? What does that syntax look like?
Have you tried the following:
SELECT {columns}
FROM {TableA}
WHERE (date,name) IN (SELECT date, name FROM {TableB});
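If the row-value form is not accepted in your environment, a correlated EXISTS is a common alternative that still uses a single subquery. This is only a sketch: {columns}, {TableA}, and {TableB} are the same placeholders as above, and the aliases a and b are introduced here for illustration.

```sql
-- Keep rows of TableA for which TableB has a row with the same date AND name.
SELECT {columns}
FROM {TableA} a
WHERE EXISTS (
  SELECT 1
  FROM {TableB} b
  WHERE b.date = a.date   -- first attribute
    AND b.name = a.name   -- second attribute, same subquery
);
```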

Bigquery how to query multiple tables of the same structure?

I have tables of the same structure, named by date, and I know I can query them like this:
SELECT column
FROM [xx.ga_sessions_20141019] ,[xx.ga_sessions_20141020],[xx.ga_sessions_20141021]
WHERE column = 'condition';
However I actually want to query various months of this data... so instead of listing them all in the same way as above, is there syntax that you can use that looks like:
SELECT column
FROM [xx.ga_sessions_201410*] ,[xx.ga_sessions_201411*]
WHERE column = 'condition';
Take a look at the table wildcard functions section of the BigQuery query reference. TABLE_DATE_RANGE or TABLE_QUERY will work for you here. Something like:
SELECT column
FROM TABLE_DATE_RANGE(xx.ga_sessions_,
TIMESTAMP('2014-10-19'),
TIMESTAMP('2014-10-21'))
WHERE column = 'condition';
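The functions above belong to BigQuery's legacy SQL. In standard SQL, the same kind of query is usually written with a wildcard table and the _TABLE_SUFFIX pseudo-column. A minimal sketch covering October and November 2014, assuming the tables are named ga_sessions_YYYYMMDD in dataset xx (my_column is a hypothetical column name):

```sql
-- Match every ga_sessions_* table, then keep only the suffixes
-- from 20141001 through 20141130 (string comparison on YYYYMMDD).
SELECT my_column
FROM `xx.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20141001' AND '20141130'
  AND my_column = 'condition';
```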