Looking to set a reusable variable in Hive

I'm looking to set a variable like below, called today_date, and then be able to reuse it as a variable throughout the query. The below throws an error.
set today_date = date_format(date_sub(current_date, 1), 'YYYYMMdd')
select account
from table
where data_date = today_date

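The first command should end with a semicolon, and the variable should be defined in the hivevar namespace so that the reference below resolves:
set hivevar:today_date=date_format(date_sub(current_date, 1), 'YYYYMMdd');
And the variable should be used like this:
select account
from table
where data_date=${hivevar:today_date};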
The set command does not evaluate the expression; the text is substituted as is. The resulting query will be:
select account
from table
where data_date = date_format(date_sub(current_date, 1), 'YYYYMMdd');
If you want the variable to hold an already calculated value, calculate it in a shell and pass it to your Hive script, as in this answer: https://stackoverflow.com/a/37821218/2700344
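A minimal sketch of that precompute-and-pass approach, written in Python rather than shell. It computes yesterday's date as a plain yyyyMMdd string and builds a hive command line that hands it over with --hivevar. The script name query.hql and both helper names are assumptions made for illustration; inside the script the value would be referenced as '${hivevar:today_date}'.

```python
from datetime import date, timedelta


def yesterday_yyyymmdd(today: date) -> str:
    """Return the day before `today` formatted as yyyyMMdd."""
    return (today - timedelta(days=1)).strftime("%Y%m%d")


def build_hive_command(today: date, script: str = "query.hql") -> list[str]:
    """Build an argument list that passes the precomputed date to Hive.

    `script` is a hypothetical file name; --hivevar and -f are Hive CLI options.
    """
    value = yesterday_yyyymmdd(today)
    return ["hive", "--hivevar", f"today_date={value}", "-f", script]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(build_hive_command(date.today())))
```

Because the value is computed before Hive starts, the query receives a literal string rather than an expression that is re-evaluated wherever the variable is used.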

You still need to put a semicolon at the end of the set line, surround your variable with ${} and use the proper namespace.
Note that this will not execute the date_format() function when the variable is defined. When you use the variable, the SQL code is just copied as is. Think of it more as a macro than as a variable.
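The macro-like behaviour can be illustrated with a toy model in Python. This is not how Hive is implemented; the substitute function and the dictionary of variables are hypothetical, and they only mimic the idea that the stored text is pasted into the query without being evaluated.

```python
import re


def substitute(query: str, variables: dict[str, str]) -> str:
    """Replace each ${name} reference with the raw text stored under that name."""
    return re.sub(r"\$\{([^}]+)\}", lambda m: variables[m.group(1)], query)


# The definition is stored as plain text; nothing is evaluated at this point.
variables = {
    "hiveconf:today_date": "date_format(date_sub(current_date, 1), 'YYYYMMdd')",
}
query = "select account from table where data_date = ${hiveconf:today_date}"

expanded = substitute(query, variables)
print(expanded)
```

The expanded text still contains the date_format(...) call, which is why the function runs only when the final query executes.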
Furthermore, Hive has multiple variable namespaces.
The 2 easiest options are either to be less verbose when you define your variable but to be more verbose when you use it (hiveconf namespace):
set today_date = date_format(date_sub(current_date, 1), 'YYYYMMdd');
select account from table where data_date = ${hiveconf:today_date};
or the other way round (hivevar namespace):
set hivevar:today_date = date_format(date_sub(current_date, 1), 'YYYYMMdd');
select account from table where data_date = ${today_date};

How to select characters between wildcards in Azure Data Factory expression for Switch Block

I have 2 pipelines that are currently selected based on an IF condition, which works well. I now need to add a third pipeline, so I am using a Switch block instead. And instead of using the table name as the deciding factor, ideally I would use logic something like this:
does the sourcetablename value contain {FromDate} then goto pipeline A
does the sourcetablename value contain {passinid} then goto pipeline B
else goto pipeline C
The sourcetablename value comes with the initial gettablestoload lookup.
An example sourcetablename value is used for the API call in this instance, and would look like this for example: blahblah/content/{passinid}/user/blahblah
I am struggling with the expression for the switchblock. Previously I have matched on the last 10 characters of the tablename, this just seems a little bit tricky.
Here is an example of an expression to remove the last 46 characters, just to give you an idea where I am struggling up to:
@substring(activity('GetTablesToLoad').SourceTableName,0,sub(length(activity('GetTablesToLoad').SourceTableName),46))
Would anyone have an idea please?
If this was SQL it would be something like this:
DECLARE @text VARCHAR (500) = 'content/{passinid}/user'
SELECT SUBSTRING(@text, CHARINDEX('{', @text)
, CHARINDEX('}', @text) - CHARINDEX('{', @text) + LEN('}'))
Thank you.
You can use the if function to check for the required value directly. For demonstration, I have used a pipeline parameter called sourceTableName instead of the lookup value.
I used the following dynamic content as the expression for the Switch activity. You can replace pipeline().parameters.sourceTableName with the SourceTableName value from the lookup activity. The name of a switch case cannot start with {, so I have used the names without the braces.
@if(contains(pipeline().parameters.sourceTableName,'{passinid}'),'passinid',if(contains(pipeline().parameters.sourceTableName,'{FromDate}'),'FromDate','execute default'))
When the parameter value is blahblah/content/{passinid}/user/blahblah, the corresponding set variable3 is executed.
When the parameter value is blahblah/content/{FromDate}/user/blahblah, the corresponding set variable4 is executed.
If neither value is present (blahblah/content/user/blahblah), the default case is executed (the set variable2 activity).
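The branching performed by that expression can be mirrored in Python as a quick way to check which case a given value would select. The function name switch_case is hypothetical; the returned strings are the case names used in the expression above.

```python
def switch_case(source_table_name: str) -> str:
    """Mirror the nested if(contains(...)) expression: first match wins."""
    if "{passinid}" in source_table_name:
        return "passinid"
    if "{FromDate}" in source_table_name:
        return "FromDate"
    # No placeholder found, so the Switch activity falls through to its default.
    return "execute default"


if __name__ == "__main__":
    for value in (
        "blahblah/content/{passinid}/user/blahblah",
        "blahblah/content/{FromDate}/user/blahblah",
        "blahblah/content/user/blahblah",
    ):
        print(value, "->", switch_case(value))
```

As in the expression, the {passinid} check comes first, so a value containing both placeholders selects the passinid case.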

How can I use a variable on BigQuery SQL Editor?

While writing SQL queries on BigQuery UI, sometimes I do a lot of JOIN over multiple tables using many WHERE clauses with the same date condition for each table.
Whenever I need to see the results for a different date, I have to replace it at multiple locations in the query. I wonder if there is a simple way to use a variable in the BQ SQL Editor and pass it just once (top/bottom)?
This is true for all the complicated queries as we have to search throughout the query for variables to change.
Parameterized queries are not available in the Console, but you can use scripting instead.
According to your need, you can use DECLARE and/or SET. It is stated in the documentation that:
DECLARE: Declares a variable of the specified type. If the DEFAULT clause is specified, the variable is initialised with the value of the expression; if no DEFAULT clause is present, the variable is initialised with the value NULL.
The syntax is as follows:
#Declaring the variable's type and initialising the variable using DEFAULT
DECLARE variable STRING DEFAULT 'San Francisco';
SELECT * FROM `bigquery-public-data.san_francisco_bikeshare.bikeshare_regions`
#using the variable inside the where clause
WHERE name = variable
SET: Sets a variable to have the value of the provided expression, or sets multiple variables at the same time based on the result of multiple expressions.
And the syntax, as below:
#Declaring the variable using Declare and its type
DECLARE variable STRING;
#Initialising the variable
SET variable = 'San Francisco';
SELECT * FROM `bigquery-public-data.san_francisco_bikeshare.bikeshare_regions`
#using the variable inside the where clause
WHERE name = variable
In both examples above, I have queried the public dataset bigquery-public-data.san_francisco_bikeshare.bikeshare_regions. Both produce the same output:
Row region_id name
1 3 San Francisco
In addition to the above examples, and more specifically for your case, you can declare a DATE variable as follows:
DECLARE data1 DATE DEFAULT DATE(2019,02,15);
WITH sample AS(
SELECT DATE(2019,02,15) AS date_s, "Some other field!" AS string UNION ALL
SELECT DATE(2019,02,16) AS date_s, "Some other field!" AS string UNION ALL
SELECT DATE(2019,02,17) AS date_s, "Some other field!" AS string UNION ALL
SELECT DATE(2019,02,18) AS date_s, "Some other field!" AS string
)
SELECT * FROM sample
WHERE date_s = data1
And the output:
Row date_s string
1 2019-02-15 Some other field!

SSIS Variable in Data Flow Task used incorrectly

In the SSIS package I have several Data Flow Tasks in which I want to use one global input variable named KOERSEL to go back in time and fetch the data set for a specific point in time.
When I try to run it I am getting the error:
Syntax error, permission violation, or other nonspecific error.
When I change ? to 1 in the SQL command text, the code runs fine. So what am I missing?
DECLARE @dt DATETIMEOFFSET = SWITCHOFFSET(CONVERT(DATETIMEOFFSET, GETDATE()), '-04:00')
DECLARE @interval INT = ?
SET @interval = -1 * @interval
DECLARE @DATE_OPG DATE
SELECT @DATE_OPG = A.DWH_PR_DATO
FROM TABLE AS A
WHERE YEAR(A.DWH_PR_DATO)=YEAR(DATEADD(MONTH,@interval,@dt)) AND
MONTH(A.DWH_PR_DATO)=MONTH(DATEADD(MONTH,@interval,@dt))
ORDER BY A.DWH_PR_DATO DESC
SELECT DISTINCT COLUMN 1,
COLUMN 1,
COLUMN 1,
FROM TABLE 1
WHERE DATE_OPG=@DATE_OPG
UNION ALL
SELECT DISTINCT COLUMN 2,
COLUMN 2,
COLUMN 2,
FROM TABLE 2
WHERE DATE_OPG=@DATE_OPG
...
I don't think that the following error is the real issue.
Incorrect syntax near ')'.
The query parser was not able to parse the query because you added a minus sign before the question mark (?). In this answer I will try to clarify the main cause of the error you are seeing.
Parameter data type vs Variable data type
Based on the official OLEDB Source - Documentation:
The parameters are mapped to variables that provide the parameter values at run time. The variables are typically user-defined variables, although you can also use the system variables that Integration Services provides. If you use user-defined variables, make sure that you set the data type to a type that is compatible with the data type of the column that the mapped parameter references.
This implies that the parameter datatype is not related to the variable data type.
So when you use -? inside the SQL command, the query parser is not able to identify the parameter metadata, even if it is mapped to an integer variable.
You can check my answer at the link below; it contains more details and experiments:
Date calculation with parameter in SSIS is not giving the correct result
Solving the problem
(1) Force parameter data type
Try using the CAST() function to force the parameter data type, and assign it to a variable in the same way you declared @dt:
DECLARE @interval INT = CAST(? as INT)
--If you want to get a negative number else ignore the row below
SET @interval = -1 * @interval
DECLARE @dt DATETIMEOFFSET = SWITCHOFFSET(CONVERT(DATETIMEOFFSET,GETDATE()),'-04:00');
DECLARE @DATE_OPG DATE;
SELECT @DATE_OPG = DWH_PR_DATO
FROM TableName
WHERE YEAR(DWH_PR_DATO) = YEAR(DATEADD(MONTH,@interval ,@dt)) AND
MONTH(DWH_PR_DATO) = MONTH(DATEADD(MONTH,@interval ,@dt))
ORDER BY DWH_PR_DATO DESC
(2) Using Expressions
You can use Expressions while building the SQL Command:
Add a variable of type string (Example: @[User::strQuery])
Define an Expression within this variable:
"DECLARE @dt DATETIMEOFFSET = SWITCHOFFSET(CONVERT(DATETIMEOFFSET,GETDATE()),'-04:00');
DECLARE @DATE_OPG DATE;
SELECT @DATE_OPG = DWH_PR_DATO
FROM TableName
WHERE YEAR(DWH_PR_DATO) = YEAR(DATEADD(MONTH,-" + @[User::KOERSEL] + ",@dt)) AND
MONTH(DWH_PR_DATO) = MONTH(DATEADD(MONTH,-" + @[User::KOERSEL] + ",@dt))
ORDER BY DWH_PR_DATO DESC"
In the OLEDB Source, choose SQL Command from variable and select @[User::strQuery]
Experiments
I tried a similar query using the AdventureWorks database:
DECLARE @dt DATETIMEOFFSET = SWITCHOFFSET(CONVERT(DATETIMEOFFSET, GETDATE()), '-04:00')
DECLARE @interval INT = CAST(? as INT)
SET @interval = -1 * @interval
DECLARE @DATE_OPG DATE
SELECT @DATE_OPG = A.[ModifiedDate]
FROM [AdventureWorks2016CTP3].[HumanResources].[Employee] AS A
WHERE YEAR(A.[ModifiedDate])=YEAR(DATEADD(MONTH,@interval,@dt)) AND
MONTH(A.[ModifiedDate])=MONTH(DATEADD(MONTH,@interval,@dt))
ORDER BY A.[ModifiedDate] DESC
SELECT * FROM [AdventureWorks2016CTP3].[HumanResources].[Employee]
WHERE [ModifiedDate] = @DATE_OPG
And the query is parsed successfully.
Instead of -? use the following logic:
-1 * (CAST(? as int))
If you just want to pass the variable as a parameter without a negative sign, use:
(CAST(? as int))
You cannot put a negative sign directly on the parameter, because the query parser will then not be able to determine the parameter data type.
If it is still throwing an exception, check the following link; it contains a workaround:
Problem With Parameter Multiplied By Negative Value In Where Clause in OLE DB Source

Using a ScalarValue Function return value in another UDF

[CheckAtomicResultCriteria] is a Scalar value function that returns a BIT.
I want to use that function in the WHERE clause of another table-valued function, as below, but it doesn't work. What is the correct way of using this?
WHERE [CheckAtomicResultCriteria](parameters) = '1'
This doesn't work either:
WHERE (SELECT [CheckAtomicResultCriteria](parameters)) = '1'
When executing a scalar UDF, you must prefix its name with the schema.
So you need to call it like this:
WHERE dbo.CheckAtomicResultCriteria(parms) = '1'

Writing the content of a local variable back to the resultset column?

Is it possible, by using a stored procedure, to fetch an integer column value from resultset into a local variable, manipulate it there and then write it back to the resultset's column?
If so what would the syntax look like?
Something along the following lines should do the trick.
DECLARE @iSomeDataItem INT
SELECT @iSomeDataItem = TableColumnName
FROM TableName
WHERE ID = ?
--Do some work on the variable
SET @iSomeDataItem = @iSomeDataItem + 21 * 2
UPDATE TableName
SET TableColumnName = @iSomeDataItem
WHERE ID = ?
The downside of an implementation of this sort is that it only operates on one specific record; however, this may be what you are looking to achieve.
What you are looking for is probably more along the lines of a user-defined function that can be used in SQL just like any other built in function.
Not sure how this works in DB2, but for Oracle it would be something like this:
Create or replace Function Decrement (pIn Integer)
return Integer
Is
Begin
return pIn - 1;
end;
You could use this in a SQL, e.g.
Select Decrement (43)
From Dual;
should return the "ultimate answer" (42).
Hope this helps.
Thanks for the replies. I went another way and solved the problem without using a procedure. The core problem was to calculate a date using various column values, and the column values had to be converted to the right format. I solved it by using large "case - when" statements in the select.
Thanks again... :-)
Why not just do the manipulation within the update statement? You don't need to load it into a variable, manipulate it, and then save it.
update TableName
SET TableColumnName=TableColumnName + 42 /* or whatever manipulation you want */
WHERE ID = ?
also,
@iSomeDataItem + 21 * 2
is the same as:
@iSomeDataItem + 42
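Both points can be checked with a small Python sketch that uses an in-memory SQLite database as a stand-in for the real one. SQLite, the sample rows, and the run_demo helper are assumptions made for illustration; the table and column names follow the snippet above.

```python
import sqlite3


def run_demo() -> list[tuple[int, int]]:
    """Update one row in place and return all rows afterwards."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE TableName (ID INTEGER PRIMARY KEY, TableColumnName INTEGER)"
    )
    conn.executemany(
        "INSERT INTO TableName (ID, TableColumnName) VALUES (?, ?)",
        [(1, 0), (2, 100)],
    )
    # Single statement: no local variable, the arithmetic happens in the UPDATE.
    # Multiplication binds tighter than addition, so 21 * 2 is evaluated first.
    conn.execute(
        "UPDATE TableName SET TableColumnName = TableColumnName + 21 * 2 WHERE ID = ?",
        (1,),
    )
    rows = conn.execute(
        "SELECT ID, TableColumnName FROM TableName ORDER BY ID"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run_demo())
```

Only the row matched by the WHERE clause changes, and it changes by 42.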
The function idea is an unnecessary extra step, unless most of the following are true:
1) you will need to use this calculation in many places
2) the calculation is complex
3) the calculation can change