Iterative functions for Apache Impala - sql

I am working on my graduation project and it uses Impala,
so I want to ask: is there any way I can use constructs like 'for', 'if', 'while', etc. in Cloudera Impala?

@Atef Ibrahim you can use if():
if(boolean condition, type ifTrue, type ifFalseOrNull)
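A minimal sketch of how if() can appear in a query (the students table and its columns are hypothetical):
SELECT name,
       if(score >= 60, 'pass', 'fail') AS result   -- returns 'pass' when score >= 60, otherwise 'fail'
FROM students;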
For more info you can read the Impala Conditional Functions documentation.
Regarding the for/while loop statements, see:
Write a While loop in Impala SQL?

Related

Mulesoft not able to pass dynamic SQL queries based on environments

Hello, for demonstration purposes I trimmed down my actual SQL query.
I have a SQL query
SELECT *
FROM dbdev.training.courses
where dbdev is my DEV database name. When I migrate to the TEST environment, I want my query to change dynamically to
SELECT *
FROM dbtest.training.courses
I tried using input parameters like {env: p('db_name')} and using them in the query as
SELECT * FROM :env.training.courses
or
SELECT * FROM (:env).training.courses
but none of them worked. I don't want to keep my SQL query in a properties file.
Can you please suggest a way to write my SQL query dynamically based on environment?
The only alternative I can think of is to deploy separate JARs with different code for different environments.
You can set the value of the property to a variable and then use the variable with string interpolation.
Warning: creating dynamic SQL queries using any kind of string manipulation may expose your application to SQL injection security vulnerabilities.
Example:
#['SELECT * FROM $(vars.database default "dbtest").training.courses']
Actually, you can do a completely dynamic or partially dynamic query using the MuleSoft DB connector.
Please see this repo:
https://github.com/TheComputerClassroom/dynamicSQLGETandPATCH
Also, I'm about to post an update that allows joins.
At a high level, this is a "Query Builder" where the code that builds the query is written in DataWeave 2. I'm working on another version that allows joins between entities, too.
If you have questions, feel free to reply.
One way to do it is:
Create a variable before the DB Connector:
getTableName = ${env}.training.courses
Write the SQL query:
SELECT * FROM $(getTableName);

Adding today's date to the table name when using the CREATE TABLE function in standard SQL (GBQ)

I am quite new to GBQ and any help is appreciated.
I have a query below:
#standardSQL
create or replace table `xxx.xxx.applications`
as select * from `yyy.yyy.applications`
What I need to do is add today's date at the end of the table name, so it is something like xxx.xxx.applications_<todays date>;
basically create the applications table but append the date at the end of the name.
I am writing a procedure to create a table every time it runs, but I need to add the date every time I create the table, for audit purposes (as a backup).
I searched everywhere and can't find an exact answer. Is this possible in the Query Editor, as I need to store this as a proc?
Thanks in advance
BigQuery doesn't support dynamic SQL at the moment, which means that this kind of construction is not possible.
Currently BigQuery supports parameterized queries, but it's not possible to use parameters to dynamically change the source table's name, as you can see in the provided link.
BigQuery supports query parameters to help prevent SQL injection when queries are constructed using user input. This feature is only available with standard SQL syntax. Query parameters can be used as substitutes for arbitrary expressions. Parameters cannot be used as substitutes for identifiers, column names, table names, or other parts of the query.
If you need to build a query based on some variable's value, I suggest that you use some script in SHELL, Python or any other programming language to create the SQL statement and then execute it using the bq command.
Another approach could be using the BigQuery client library in some of the supported languages instead of the bq command.
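For illustration only, the statement such a script would generate and submit might look like the following (the date suffix is a hypothetical example):
create or replace table `xxx.xxx.applications_20200101`  -- today's date appended to the table name
as select * from `yyy.yyy.applications`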

Collect list function in hive and impala

I am using
concat_ws(' ', collect_list(field1)) as field1,
but the query does not run in Impala.
Does Impala not support this function?
If not, what is an alternative for a similar operation in Impala?
You can use the group_concat() function in Impala.
Please refer to https://www.cloudera.com/documentation/enterprise/5-5-x/topics/impala_group_concat.html
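As a rough sketch of the Impala equivalent (the table name and the id grouping column are hypothetical), group_concat() takes the value expression and an optional separator:
SELECT id,
       group_concat(field1, ' ') AS field1   -- concatenate field1 values per group, separated by spaces
FROM some_table
GROUP BY id;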

Table ranges with BigQuery's standard SQL

How can you query a range of timestamped tables with the new syntax? Using TABLE_DATE_RANGE returns the error Unhandled node type in from clause: TVF.
The latest version of BigQuery supports an equivalent of table wildcards with Standard SQL. The documentation is available here: https://cloud.google.com/bigquery/docs/wildcard-tables.
Also please take a look at this post:
Is there an equivalent of table wildcard functions in BigQuery with standard SQL?
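As an illustration (project, dataset, and date range here are hypothetical), a standard SQL wildcard query that replaces TABLE_DATE_RANGE looks like this:
SELECT *
FROM `myproject.mydataset.events_*`          -- matches events_20160101, events_20160102, ...
WHERE _TABLE_SUFFIX BETWEEN '20160101' AND '20160131';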

How to check if a table exists in Hive?

I am connecting to Hive via an ODBC driver from a .NET application. Is there a query to determine if a table already exists?
For example, in MSSQL you can query the INFORMATION_SCHEMA table and in Netezza you can query the _v_table table.
Any assistance would be appreciated.
Execute the following command: show tables in DB like 'TABLENAME'
If the table exists, its name will be returned, otherwise nothing will be returned.
This is done directly from Hive. For more options see this.
DB is the database in which you want to see if the table exists.
TABLENAME is the table name you seek.
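For example, with a hypothetical database mydb and table customers:
show tables in mydb like 'customers';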
What actually happens is that Hive queries its metastore (depends on your configuration but it can be in a standard RDBMS like MySQL) so you can optionally connect directly to the same metastore and write your own query to see if the table exists.
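As a sketch of that direct-metastore option (assuming a MySQL-backed metastore with the standard TBLS/DBS schema; the database and table names are hypothetical):
SELECT t.TBL_NAME
FROM TBLS t
JOIN DBS d ON t.DB_ID = d.DB_ID    -- DBS holds databases, TBLS holds tables in the Hive metastore
WHERE d.NAME = 'mydb'
  AND t.TBL_NAME = 'mytable';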
There are two approaches by which you can check that:
1.) As @dimamah suggested; just to add one point here, for this approach you need to
1.1) start the hiveserver before running the query
1.2) run two queries
1.2.1) USE <database_name>
1.2.2) SHOW TABLES LIKE 'table_name'
1.2.3) then check your result using the ResultSet.
2.) The second approach is to use the HiveMetastoreClient API, where you can directly use the API to check whether table_name exists in a particular database or not.
For further help please go through this Hive 11
When programming on Hive with Spark SQL, you can use the following method to check whether a Hive table exists.
if (hiveContext.hql("SHOW TABLES LIKE '" + tableName + "'").count() == 1) {
println(tableName + " exists")
}
If someone is using a shell script like me, then my answer could be useful. Assume that your table is in the default database.
table=your_hive_table
validateTable=$(hive --database default -e "SHOW TABLES LIKE '$table'")
if [[ -z $validateTable ]]; then
echo "Error:: $table cannot be found"
exit 1
fi
If you're using SparkSQL you can do the following.
if "table_name" in sqlContext.tableNames("db_name"):
...do something
http://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html#pyspark.sql.SQLContext.tableNames
Code similar to the one below can be found in many of my Spark notebooks:
stg_table_exists = sqlCtx.sql("SHOW TABLES IN " + stg_db) \
    .filter("tableName='%s'" % stg_tab_name) \
    .collect()
(split across lines for readability)
I wish Spark would have an API call to check the same.
If you're using a Scala Spark app and SparkSQL you can do the following:
if (spark.catalog.tableExists("tablename")) { /* do something */ }