AWS Redshift Leader Node-Only Function with table reference

I have a requirement to return the server address, the server port, and the row count of a table in a single AWS Redshift query, i.e.:
select inet_server_addr(), inet_server_port(), count(*) from my_table;
ERROR: 0A000: Specified types or functions (one per INFO message) not
supported on Redshift tables.
I understand that this query does not work because I am trying to execute a leader node-only function in conjunction with a query that needs to access the compute nodes.
I am wondering, however, if there is a workaround available to get the information that I need in one query execution.
Note: Editing the above query to use common table expressions (CTEs), subqueries, views, scalar functions, etc. results in the same error message.
Note 2: PostgreSQL system information functions like inet_server_addr() are currently unsupported in AWS Redshift; however, they work when called without a table reference.
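For reference, the same functions succeed as long as no table is referenced (per Note 2):
select inet_server_addr(), inet_server_port();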

Related

Adding today's date to a table name when using CREATE TABLE in standard SQL (GBQ)

I am quite new to GBQ and any help is appreciated.
I have a query below:
#Standard SQL
create or replace table `xxx.xxx.applications`
as select * from `yyy.yyy.applications`
What I need to do is add today's date to the end of the table name, so it is something like xxx.xxx.applications_<today's date>;
basically, create the table with the name applications but with the date appended at the end.
I am writing a procedure to create a table every time it runs but need to add the date for audit purposes every time I create the table (as a backup).
I searched everywhere and can't find an exact answer. Is this possible in the Query Editor? I need to store this as a stored procedure.
Thanks in advance
BigQuery doesn't support dynamic SQL at the moment, which means that this kind of construction is not possible.
Currently BigQuery supports parameterized queries, but it's not possible to use parameters to dynamically change the source table's name, as you can see in the provided link.
BigQuery supports query parameters to help prevent SQL injection when
queries are constructed using user input. This feature is only
available with standard SQL syntax. Query parameters can be used as
substitutes for arbitrary expressions. Parameters cannot be used as
substitutes for identifiers, column names, table names, or other parts
of the query.
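To illustrate the distinction, a minimal hypothetical sketch (status_param, the status column, and table_name are made up for illustration):
-- allowed: a parameter standing in for an expression value
SELECT * FROM `yyy.yyy.applications` WHERE status = @status_param;
-- not allowed: a parameter standing in for an identifier such as a table name
-- SELECT * FROM @table_name;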
If you need to build a query based on some variable's value, I suggest that you use a script in shell, Python, or any other programming language to create the SQL statement and then execute it using the bq command.
Another approach could be to use the BigQuery client library in one of the supported languages instead of the bq command.
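For illustration, a minimal sketch of the statement such a script would generate and submit, assuming a YYYYMMDD suffix (20200101 is a hypothetical value computed by the script):
-- generated externally, then executed via bq query or a client library
create or replace table `xxx.xxx.applications_20200101`
as select * from `yyy.yyy.applications`;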

How do I generate a table name that contains today's date?

It may seem a little strange, but there are already tables with names for each date.
In my project, I have tables for each date to make statistics easier to handle.
Of course, I don't think this is always the best way, but this is the table structure for my project.
(It's a common technique in Google BigQuery and Amazon Athena. This question is about Google BigQuery)
So to get the data, I want to generate a table name containing today's date. If I use today's date, I can get the latest day's data without rewriting the code, even on the next day.
I tried, but the code didn't work.
Attempt 1 (did not work): CONCAT in FROM
SELECT
*
FROM
CONCAT('foo_', FORMAT_TIMESTAMP('%Y%m%d', CURRENT_TIMESTAMP(), 'Asia/Tokyo'))
Error:
Table-valued function not found: CONCAT at [4:3]
Attempt 2 (did not work): create temporary function
create temporary function getTableName() as (CONCAT('foo_', FORMAT_TIMESTAMP('%Y%m%d', CURRENT_TIMESTAMP(), 'Asia/Tokyo')));
Error:
CREATE TEMPORARY FUNCTION statements must be followed by an actual query.
Question
How do I generate a table name that contains TODAY's date?
In this case, I would recommend you to use wildcard tables in BigQuery, a feature available in standard SQL.
With wildcard tables you can use _TABLE_SUFFIX, which gives you the ability to filter/scan the tables matching the wildcard. The syntax would be as follows:
SELECT *
FROM `test-proj-261014.sample.test_*`
where _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', CURRENT_DATE)
I hope it helps.
Your first query should go like this:
select CONCAT('foo_', FORMAT_TIMESTAMP('%Y%m%d', CURRENT_TIMESTAMP(), 'Asia/Tokyo'))
For creating temporary function, use the below code:
create temp function getTableName() as
((select CONCAT('foo_', FORMAT_TIMESTAMP('%Y%m%d', CURRENT_TIMESTAMP(), 'Asia/Tokyo'))
));
select getTableName()
The error "CREATE TEMPORARY FUNCTION statements must be followed by an actual query." is because once the temporary functions are defined then you have to use the actual query to use that function and then the validity of function dies out. To define persistent UDFs and use them in multiple queries please go through the link to define permanent functions.You can reuse persistent UDFs across multiple queries, whereas you can only use temporary UDFs in a single query.

How do I query a specific range of Firebase's analytics table using Data Studio's date parameters?

I've been reading up on how to query a wildcard table in BigQuery, but Data Studio doesn't seem to recognize the _TABLE_SUFFIX keyword.
Google Data Studio on using parameters
Google BigQuery docs on querying wildcard tables
I'm trying to use the recently added date parameters for a custom query in Data Studio. The goal is to prevent the custom query from scanning all partitions to save time.
When using the following query:
SELECT
*
FROM
`project-name.analytics_196324132.events_*`
WHERE
_TABLE_SUFFIX BETWEEN DS_START_DATE AND DS_END_DATE
I receive the following error:
Unrecognized name: _TABLE_SUFFIX
I expected the suffix keyword to be recognized so that the custom query would be more efficient, but I get this error message. Does Data Studio not support this yet? Or is there another way?
It could be possible that you are setting the query in the wrong place. I created a data source from a custom query and the wildcard worked. The query I tested was the following, similar to yours, since _TABLE_SUFFIX is a wildcard available in standard SQL in BigQuery:
select
*
from
`training_project.training_dataset.table1_*`
where
_TABLE_SUFFIX BETWEEN '20190625' AND '20190626'
As per your comments, you are trying to add a query in the formula field of a custom parameter; however, the formula field only accepts basic math operations, functions, and branching logic.
The workaround I can see is to build a SELECT query and use it as a custom query in the data source definition, so that the query can calculate any extra fields in advance (steps 5, 6 and 7 from this tutorial).
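One possible sketch, assuming the date range parameters are enabled for the data source: in a BigQuery custom query, Data Studio's date parameters are referenced with an @ prefix, e.g.:
SELECT *
FROM `project-name.analytics_196324132.events_*`
-- @DS_START_DATE / @DS_END_DATE are injected by Data Studio as YYYYMMDD strings
WHERE _TABLE_SUFFIX BETWEEN @DS_START_DATE AND @DS_END_DATE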

Listagg Redshift DDL

I am trying to retrieve the DDL for a table in Redshift. I found this view, which lets me easily select the definition of any table. However, I need this information in one line, and I know that there is the LISTAGG function, but if I try this:
select listagg(ddl, ' ')
from admin.v_generate_tbl_ddl
where schemaname = 'schema'
and tablename = 'orders'
It's giving me this error:
Query execution failed
Reason: SQL Error [XX000]: ERROR: One or more of the used functions
must be applied on at least one user created tables. Examples of user
table only functions are LISTAGG, MEDIAN, PERCENTILE_CONT, etc
Can you please help me with how I can achieve this?
LISTAGG is a compute-node-only function.
But the query you run to get the table DDL runs only on the leader node, because it references only pg_* catalog tables.
According to AWS documentation
A query that references only catalog tables (tables with a PG prefix, such as PG_TABLE_DEF) or that does not reference any tables, runs exclusively on the leader node.
A query that uses a compute-node function and doesn't reference a user-defined table or Amazon Redshift system table returns the following error.
[Amazon] (500310) Invalid operation: One or more of the used functions must be applied on at least one user created table.
https://docs.aws.amazon.com/redshift/latest/dg/c_SQL_functions_compute_node_only.html
To sum up: since your query does not reference any user-created table, you cannot use LISTAGG here.
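For contrast, a minimal sketch where the same function is accepted because it is applied to a user-created table (my_user_table and col_name are hypothetical names):
-- my_user_table and col_name are hypothetical
select listagg(col_name, ', ') within group (order by col_name)
from my_user_table;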

Choose different division of Exact Online when using distributed query with Invantive SQL

I have a set of SQL statements using distributed option of Invantive SQL that extract shipped goods information from Exact Online and create for each serial number shipped a ticket in Freshdesk, together with the consumer as a contact.
This works fine when connected to Exact Online and Freshdesk under one logon code. However, the end user uses a different logon code. In that case, the set of SQL statements retrieves data from their test division in Exact Online instead of the correct production division.
When using no distributed option, I can change the division using:
use 123123
Where 123123 is the unique division number in the Exact Online country.
When connected both to Exact Online and Freshdesk, I get a:
itgenuse002: List of partitions could not be determined.
How can I enforce that the set of SQL statements is executed for a specific Exact Online division instead of the default one set at that moment for the logon code?
Sample SQL query that shows the problem:
create or replace table fulladdress#inmemorystorage -- STEP 1.
as
select acad.id
, acad.name
, acad.phone
, acad.email
, acad.addressline1 || ' ' || acad.postcode || ' ' || acad.city fulladdress
from ExactonlineREST..Accounts#eolnl acad
where acad.status = 'C'
The use statement shown is for databases with exactly one data container. In that case, there is only one data container that can handle the request, and everything runs smoothly.
With a distributed query in Invantive SQL, you need to tell the use statement which data container to use. Otherwise, the first data container will try to handle it (in this case probably Freshdesk, which has no concept of partitioning). That is similar to appending the data container alias to each table, as in:
select ...
from table#eolnl
join table2#freshdesk
on ...
Here eolnl and freshdesk specify where the tables should be looked up.
So, in this case use:
use 123123#eolnl
The same also holds for the set statement.
From your code it seems you have multiple data containers running in your connection. From the user interface you can only set the partitions on the default data container.
However, there is an easy code solution. You have to know the alias of the data container you want to set the partition for, and then use that alias in a use call (in this sample, 123123 is the partition you want to choose):
use 123123#eolnl
Or to use all partitions available:
use all#eolnl
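Putting this together with the sample query, a minimal sketch (123123 stands in for the desired production division number):
use 123123#eolnl

create or replace table fulladdress#inmemorystorage
as
select acad.id
,      acad.name
from   ExactonlineREST..Accounts#eolnl acad
where  acad.status = 'C'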