Adding a subquery to a CASE statement in Hive

I hope you can help. I have the below query, which has a case statement.
I want to say:
IF the domain is in the other table, then return the domain name, else, mark it as 'other'
I am using Hive & get the error:
Unsupported SubQuery Expression 'cleandomain': Currently SubQuery expressions are only allowed as Where Clause predicates
Is there some other way I can achieve the same?
SELECT *,
CASE
WHEN cleandomain IN (SELECT cleandomain
FROM keenek1.daily_top_doms) THEN cleandomain
ELSE 'other'
END AS status
FROM (SELECT hour,.....

One possible solution is the in_file(string str, string filename) function.
Put the list of domains in a text file, one domain per line, and call in_file in the CASE statement:
CASE
WHEN in_file(cleandomain,'file/path/daily_top_doms.txt') THEN cleandomain
ELSE 'other'
END AS status
Another solution is to aggregate the list of domains into an array in a subquery, cross join that single row to your query, and use array_contains(). This may work much faster if the list is not too big:
with dom as (
SELECT collect_set(cleandomain) dom
FROM keenek1.daily_top_doms
)
select
case when array_contains(d.dom, s.cleandomain) then s.cleandomain
else 'other'
end as status
from (your query) s cross join dom d --one row cross join
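A third option, not mentioned above, is a plain LEFT JOIN against the distinct domain list, testing the joined column for NULL in the CASE. The sketch below is only an illustration: it runs the SQL through Python's sqlite3 rather than Hive, and the hits table is a hypothetical stand-in for the elided inner query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE daily_top_doms (cleandomain TEXT);
INSERT INTO daily_top_doms VALUES ('a.com'), ('b.com'), ('a.com');
-- hits is a hypothetical stand-in for the inner "SELECT hour, ..." query
CREATE TABLE hits (hour INTEGER, cleandomain TEXT);
INSERT INTO hits VALUES (1, 'a.com'), (1, 'z.org'), (2, 'b.com');
""")

query = """
SELECT s.hour,
       CASE WHEN d.cleandomain IS NOT NULL THEN s.cleandomain
            ELSE 'other'
       END AS status
FROM hits s
LEFT JOIN (SELECT DISTINCT cleandomain FROM daily_top_doms) d
       ON s.cleandomain = d.cleandomain   -- DISTINCT avoids duplicating rows of s
ORDER BY s.hour, status
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The DISTINCT matters: without it, a domain listed twice in daily_top_doms would duplicate the matching rows of the main query.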

Related

Adding a condition to the showcase

The task is as follows. I have a code that has a huge number of attributes. And to one of the attributes, let's say this is the card type card_type='universal', you need to add the following condition:
case when card>='129897' and card<='293965' then 'unnamed' when card>='093750' and card<='903750' then 'personal' end as parameter
The attribute itself is as follows: select case when card_sybtype in ('VISA','MS') then 'universal'
At the same time, I do not need to output this to the final script, but I need this feature to be present in the script. That is, I need it to be linked only to the card type.
I think you can use a nested CASE WHEN.
If you don't want to select the parameter column, you can use a subquery to hide it. If you do want it later, just add it to the outer select list.
SELECT col1, col2
FROM
(SELECT
col1, col2,
case when card_sybtype in ('VISA','MS') THEN -- universal case
case when card>='129897' and card <='293965' then 'unnamed'
when card>='093750' and card <='903750' then 'personal' end
else null
end as param_adhoc
FROM tab
) rs
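To see how the nested CASE behaves, here is a small sketch run through Python's sqlite3 (not the asker's engine). The sample rows are made up, and col2 is dropped for brevity. Note that the two card ranges overlap, so the first matching branch ('unnamed') wins, and that the comparison is a string comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tab (col1 INTEGER, card_sybtype TEXT, card TEXT);
INSERT INTO tab VALUES
  (1, 'VISA', '200000'),   -- universal, inside both ranges
  (2, 'MS',   '500000'),   -- universal, only in the second range
  (3, 'AMEX', '200000'),   -- not universal
  (4, 'VISA', '950000');   -- universal, outside both ranges
""")

inner = """
SELECT col1,
       CASE WHEN card_sybtype IN ('VISA','MS') THEN      -- universal case
            CASE WHEN card >= '129897' AND card <= '293965' THEN 'unnamed'
                 WHEN card >= '093750' AND card <= '903750' THEN 'personal'
            END
       END AS param_adhoc
FROM tab
"""

# Outer query that exposes the derived column
with_param = conn.execute(
    f"SELECT col1, param_adhoc FROM ({inner}) rs ORDER BY col1").fetchall()
# Outer query that hides it: the column exists in the subquery only
hidden = conn.execute(
    f"SELECT col1 FROM ({inner}) rs ORDER BY col1").fetchall()
print(with_param)
print(hidden)
```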

Hive - SELECT inside WHEN clause of CASE function gives an error

I am trying to write a query in Hive with a Case statement in which the condition depends on one of the values in the current row (whether or not it is equal to its predecessor). I want to evaluate it on the fly, this way, therefore requiring a nested query, not by making it another column first and comparing 2 columns. (I was able to do the latter, but that's really second-best). Does anyone know how to make this work?
Thanks.
My query:
SELECT * ,
CASE
WHEN
(SELECT lag(field_with_duplicates,1) over (order by field_with_duplicates) FROM my_table b
WHERE b.id=a.id) = a.field_with_duplicates
THEN 'Duplicate'
ELSE ''
END as Duplicate_Indicator
FROM my_table a
Error:
java.sql.SQLException: org.apache.spark.sql.AnalysisException: cannot recognize input near 'SELECT' 'lag' '(' in expression specification; line 4 pos 9
Notes:
The reason I needed the complicated 'lag' function is that the unique Id's in the table are not consecutive, but I don't think that's where it's at: I tested by substituting another simpler inner query and got the same error message.
Speaking of 'duplicates', I did search on this issue before posting, but the only SELECT's inside CASE's I found were in the THEN statement, and if that works the same, it suggests mine should work too.
You do not need the subquery inside CASE:
SELECT a.* ,
CASE
WHEN prev_field_with_duplicates = field_with_duplicates
THEN 'Duplicate'
ELSE ''
END as Duplicate_Indicator
FROM (select a.*,
lag(field_with_duplicates,1) over (order by field_with_duplicates) as prev_field_with_duplicates
from my_table a
)a
Or you can even use lag() directly inside CASE, without any subquery (I'm not sure this works in all Hive versions):
CASE
WHEN lag(field_with_duplicates,1) over (order by field_with_duplicates) = field_with_duplicates
THEN 'Duplicate'
ELSE ''
END as Duplicate_Indicator
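A quick way to check the subquery form is to run it on a few rows. This sketch uses Python's sqlite3 (window functions need SQLite 3.25 or later), not Hive or Spark, and the sample data is invented. I also added id to the window ORDER BY as a tie-breaker so the result is deterministic; that is my assumption, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (id INTEGER, field_with_duplicates TEXT);
-- ids are deliberately non-consecutive, as in the question
INSERT INTO my_table VALUES (10,'x'), (17,'y'), (23,'x'), (31,'z'), (40,'y');
""")

query = """
SELECT id, field_with_duplicates,
       CASE WHEN prev_value = field_with_duplicates THEN 'Duplicate'
            ELSE ''
       END AS duplicate_indicator
FROM (SELECT t.*,
             lag(field_with_duplicates, 1)
               OVER (ORDER BY field_with_duplicates, id) AS prev_value
      FROM my_table t) a
ORDER BY field_with_duplicates, id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The first row of each group has a NULL or different predecessor, so the comparison is not true and the ELSE branch applies.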
Thanks to @MatBailie for the answer in his comment. Don't I feel silly...
Resolved

Multiple values for IN clause with CASE in Redshift

I am trying to run this query where IN clause uses CASE to choose values between two cases.
The issue is with the hard-coded values ('aaa','bbb'). I cannot put multiple values inside THEN so that they act as regular IN values. The hard-coded values will be dynamic, as I will pass a variable for them.
select kdo.field0
from tb1 data1 inner join tb2 kdo
on kdo.field1 = 'xxx'
and kdo.field2::DATE >='2017-08-01'::DATE
and kdo.field0
in (case when 'asd'!='' then 'aaa','bbb'
else tb2.field0 end);
Also, I used a sub-query select inside THEN to get specific hard code values but it is also of no avail. Using single hard-coded value obviously works as usual.
Move your CASE outside IN:
select kdo.field0
from tb1 data1 inner join tb2 kdo
on kdo.field1 = 'xxx'
and kdo.field2::DATE >='2017-08-01'::DATE
and case when 'asd'!='' then kdo.field0 in ('aaa','bbb')
else kdo.field0=tb2.field0 end;
However, I'm not sure what you mean by 'asd'!='', since 'asd' is a string literal and this comparison will always return true.
Also, the else tb2.field0 end); part of your statement is not a list of values, it's a column name, so I assume it translates to kdo.field0=tb2.field0: if the previous CASE branch is false, you want to check whether kdo.field0 equals any of the values in tb2.field0, which is basically a join condition.
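If the engine will not accept a boolean as a CASE result, the same switch can be written with plain AND/OR. The sketch below is an assumption-laden illustration: it runs in Python's sqlite3 rather than Redshift, uses a single simplified table, treats the flag as a bound parameter, and applies no restriction on field0 when the flag is empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tb2 (field0 TEXT, field1 TEXT);
INSERT INTO tb2 VALUES ('aaa','xxx'), ('bbb','xxx'), ('ccc','xxx'), ('aaa','yyy');
""")

query = """
SELECT field0
FROM tb2
WHERE field1 = 'xxx'
  AND ( (:flag <> '' AND field0 IN ('aaa','bbb'))  -- flag set: restrict to the list
        OR :flag = '' )                            -- flag empty: no restriction
ORDER BY field0
"""
filtered = [r[0] for r in conn.execute(query, {"flag": "asd"})]
unfiltered = [r[0] for r in conn.execute(query, {"flag": ""})]
print(filtered, unfiltered)
```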

Assign a case value to a column rather than an alias

This should be a simple one, but I have not found any solution:
The normal way is using an alias like this:
CASE WHEN ac_code='T' THEN 'time' ELSE 'purchase' END as alias
When using an alias in conjunction with UNION ALL, this causes a problem because the alias is not treated the same way as the other columns.
Using an alias to assign the value does not work. It is still treated as an alias, even though it has the column name.
CASE WHEN ac_code='T' THEN 'time' ELSE 'purchase' END as ac_subject
I want to assign a value to a column based on a condition.
CASE WHEN ac_code='T' THEN ac_subject ='time' ELSE ac_subject='purchase' END
Now I get the error message
UNION types character varying and boolean cannot be matched
How can I assign a value to a column in a case statement without using an alias in the column (shared by other columns in UNION)?
Here is the whole (simplified) query:
SELECT hr_id,
CASE WHEN hr_subject='' THEN code_name ELSE hr_subject END
FROM hr
LEFT JOIN code ON code_id=hr_code
WHERE hr_job='123'
UNION ALL
SELECT po_id,
CASE WHEN po_subject='' THEN code_name ELSE po_subject END
FROM po
LEFT JOIN code ON code_id=po_code
WHERE po_job='123'
UNION ALL
SELECT ac_id,
CASE WHEN ac_code='T' THEN ac_subject='time' ELSE ac_subject='purchase' END
FROM ac
WHERE ac_job='123'
There is no alias in your presented query. You are confusing terms. This would be a column alias:
CASE WHEN hr_subject='' THEN code_name ELSE hr_subject END AS ac_subject
In a UNION query, the column names of the returned set are taken from the first SELECT, and every appended SELECT has to match it in number of columns and in (compatible) data types. Column names in appended SELECTs (including aliases) are just noise and ignored. Maybe useful for documentation, nothing else.
The = operator does not assign anything in a SELECT query. It's the equality operator, which returns a boolean value: TRUE if both operands are equal, etc. So ac_subject='time' returns a boolean value. Hence your error message:
UNION types character varying and boolean cannot be matched
The only way to "assign" a value to a particular output column in this query is to include it at the right position in the SELECT list.
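The difference between comparing and returning a value is easy to see on two sample rows. This is only a sketch in Python's sqlite3, with made-up data; SQLite has no boolean type and reports comparisons as 1 or 0, whereas PostgreSQL returns a real boolean, which is what the UNION type check complains about.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ac (ac_id INTEGER, ac_code TEXT, ac_subject TEXT);
INSERT INTO ac VALUES (1, 'T', 'time'), (2, 'P', 'time');
""")

# Each branch is an equality test, so the CASE yields a truth value
comparison = conn.execute("""
SELECT ac_id,
       CASE WHEN ac_code = 'T' THEN ac_subject = 'time'
            ELSE ac_subject = 'purchase' END
FROM ac ORDER BY ac_id
""").fetchall()

# Each branch is a plain value, so the CASE yields a string
value = conn.execute("""
SELECT ac_id,
       CASE WHEN ac_code = 'T' THEN 'time' ELSE 'purchase' END
FROM ac ORDER BY ac_id
""").fetchall()

print(comparison)
print(value)
```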
The information in the question is incomplete, but I suspect you are also confusing the empty string ('') with the NULL value, a distinction that you need to understand before doing anything else with relational databases. If the columns actually hold NULL, you would rather use COALESCE to provide a default for NULL values:
SELECT hr_id, COALESCE(hr_subject, code_name) AS ac_subject
FROM hr
LEFT JOIN code ON code_id=hr_code
WHERE hr_job = '123'
UNION ALL
SELECT po_id, COALESCE(po_subject, code_name)
FROM po
LEFT JOIN code ON code_id=po_code
WHERE po_job = '123'
UNION ALL
SELECT ac_id, CASE WHEN ac_code = 'T' THEN 'time'::varchar ELSE 'purchase' END
FROM ac
WHERE ac_job = '123'
Just an educated guess, assuming type varchar. You should have added table qualification to column names to clarify their origin. Or table definitions to clarify everything.
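One caveat with COALESCE: it only replaces NULL, not the empty string. If the column can hold both, wrapping it in NULLIF(col, '') first is a common way to cover both cases. The sketch below shows the difference in Python's sqlite3; the single-table layout (code_name stored next to hr_subject) is a simplification of mine, not the asker's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- hypothetical simplified layout: code_name lives in the same table
CREATE TABLE hr (hr_id INTEGER, hr_subject TEXT, code_name TEXT);
INSERT INTO hr VALUES
  (1, NULL,   'fallback'),
  (2, '',     'fallback'),
  (3, 'real', 'fallback');
""")

rows = conn.execute("""
SELECT hr_id,
       COALESCE(hr_subject, code_name)             AS plain,       -- NULL only
       COALESCE(NULLIF(hr_subject, ''), code_name) AS with_nullif  -- NULL and ''
FROM hr
ORDER BY hr_id
""").fetchall()
for row in rows:
    print(row)
```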
The CASE expression is supposed to return a value, e.g. 'time'.
Your value is another expression, ac_subject = 'time', which is a boolean (true or false).
Is this on purpose? Does the other query you glue with UNION have a boolean in that place, too? Probably not, and this is what the DBMS complains about.
I found the problem.
CASE WHEN hr_subject='' THEN code_name ELSE hr_subject END
The columns code_name and hr_subject had different lengths. This caused the unpredictable result. I think that aliases can work now.
Thank you for your support.

SQL CASE returning two values

I'm writing my first SQL CASE statement and I have done some research on them. Obviously the actual practice is going to be a little different from what I read because of context and things of that nature. I understand HOW they work. I am just having trouble forming mine correctly. Below is my draft of the SQL statement where I am trying to return two values (either a code value from version A and its title, or a code value from version B and its title). I've been told that you can't return two values in one CASE statement, but I can't figure out how to rewrite this SQL statement to give me all the values that I need. Is there a way to use a CASE within a CASE (as in a CASE statement for each column)?
P.S. When pasting the code I removed the aliases just to make it more concise for the post
SELECT
CASE
WHEN codeVersion = A THEN ACode, Title
ELSE BCode, Title
END
FROM Code.CodeRef
WHERE ACode=@useCode OR BCode=@useCode
A case statement can only return one value. You can easily write what you want as:
SELECT (CASE WHEN codeVersion = 'A' THEN ACode
ELSE BCode
END) as Code, Title
FROM Code.CodeRef
WHERE @useCode in (ACode, BCode);
A case statement can only return a single column. In your scenario, that's all that is needed, as title is used in either outcome:
SELECT
CASE
WHEN codeVersion = 'A' THEN ACode
ELSE BCode
END as Code,
Title
FROM Code.CodeRef
WHERE ACode=@useCode OR BCode=@useCode
If you actually did need to apply the case logic to more than one column, then you'd need to repeat it.
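For illustration, repeating the CASE per column could look like the sketch below. It assumes each version has its own title column; ATitle and BTitle are hypothetical names, not from the question, and the SQL is run through Python's sqlite3 rather than the asker's database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- ATitle / BTitle are hypothetical per-version title columns
CREATE TABLE CodeRef (codeVersion TEXT, ACode TEXT, BCode TEXT,
                      ATitle TEXT, BTitle TEXT);
INSERT INTO CodeRef VALUES
  ('A', 'A1', 'B1', 'Title A1', 'Title B1'),
  ('B', 'A2', 'B2', 'Title A2', 'Title B2');
""")

rows = conn.execute("""
SELECT CASE WHEN codeVersion = 'A' THEN ACode  ELSE BCode  END AS Code,
       CASE WHEN codeVersion = 'A' THEN ATitle ELSE BTitle END AS Title
FROM CodeRef
ORDER BY Code
""").fetchall()
print(rows)
```

Each CASE expression yields exactly one value per row, so two output columns need two CASE expressions with the same condition.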
Here is what I normally use:
SELECT
CASE
WHEN codeVersion = 'A' THEN 'ACode'
WHEN codeVersion = 'B' THEN 'BCode'
ELSE 'Invalid Version'
END as 'Version',
Title
FROM Code.CodeRef
WHERE
CASE
WHEN codeVersion = 'A' THEN ACode
WHEN codeVersion = 'B' THEN BCode
ELSE 'Invalid Version'
END = 'Acode'
My suggestion uses an alias. A note on aliases: unfortunately you can't use the alias 'Version' in a WHERE or GROUP BY clause. You have to repeat the whole CASE expression there. I believe you can only use an alias in an ORDER BY.
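One way around repeating the CASE is to compute it once in a derived table and filter on the alias in the outer query. A minimal sketch, run through Python's sqlite3 with invented rows; the :use_code parameter stands in for the variable from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CodeRef (codeVersion TEXT, ACode TEXT, BCode TEXT, Title TEXT);
INSERT INTO CodeRef VALUES
  ('A', '100', '900', 'first'),
  ('B', '100', '200', 'second'),
  ('B', '300', '100', 'third');
""")

query = """
SELECT Code, Title
FROM (SELECT CASE WHEN codeVersion = 'A' THEN ACode ELSE BCode END AS Code,
             Title
      FROM CodeRef) c          -- the alias Code is defined here
WHERE Code = :use_code         -- and is an ordinary column out here
ORDER BY Title
"""
rows = conn.execute(query, {"use_code": "100"}).fetchall()
print(rows)
```

The second row is filtered out because its version is 'B' and its BCode is '200', even though its ACode matches.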