Hive - SELECT inside WHEN clause of CASE function gives an error - sql

I am trying to write a Hive query with a CASE expression whose condition depends on a value in the current row (whether or not it equals its predecessor). I want to evaluate this on the fly with a nested query, rather than first materializing the previous value as another column and then comparing the two columns. (I was able to do the latter, but that's really second-best.) Does anyone know how to make this work?
Thanks.
My query:
SELECT * ,
CASE
WHEN
(SELECT lag(field_with_duplicates,1) over (order by field_with_duplicates) FROM my_table b
WHERE b.id=a.id) = a.field_with_duplicates
THEN 'Duplicate'
ELSE ''
END as Duplicate_Indicator
FROM my_table a
Error:
java.sql.SQLException: org.apache.spark.sql.AnalysisException: cannot recognize input near 'SELECT' 'lag' '(' in expression specification; line 4 pos 9
Notes:
The reason I needed the complicated 'lag' function is that the unique IDs in the table are not consecutive, but I don't think that is the problem: I tested by substituting a simpler inner query and got the same error message.
Speaking of 'duplicates', I did search on this issue before posting, but the only SELECTs inside CASE expressions I found were in the THEN branch, and if that works the same way, it suggests mine should work too.

You do not need the subquery inside CASE:
SELECT s.* ,
CASE
WHEN s.prev_field_with_duplicates = s.field_with_duplicates
THEN 'Duplicate'
ELSE ''
END as Duplicate_Indicator
FROM (select a.*,
lag(field_with_duplicates,1) over (order by field_with_duplicates) as prev_field_with_duplicates
from my_table a
) s
Or you can use lag() directly inside the CASE, with no subquery at all (I'm not sure this works in all Hive versions):
CASE
WHEN lag(field_with_duplicates,1) over (order by field_with_duplicates) = field_with_duplicates
THEN 'Duplicate'
ELSE ''
END as Duplicate_Indicator
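The derived-table approach above can be tried out locally. The following is a minimal sketch that uses Python's built-in sqlite3 module rather than Hive (window functions need SQLite 3.25 or newer); the sample rows and the extra `id` tiebreaker in the window's ORDER BY are assumptions added for illustration, not part of the original query.

```python
import sqlite3

# In-memory database with hypothetical sample data (non-consecutive ids).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, field_with_duplicates TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(10, "a"), (25, "b"), (31, "a"), (47, "c"), (52, "b")],
)

# Compute the previous value in a derived table, then compare in the outer CASE.
query = """
SELECT s.id,
       s.field_with_duplicates,
       CASE
           WHEN s.prev_value = s.field_with_duplicates THEN 'Duplicate'
           ELSE ''
       END AS duplicate_indicator
FROM (SELECT t.*,
             lag(field_with_duplicates, 1)
                 OVER (ORDER BY field_with_duplicates, id) AS prev_value
      FROM my_table t) s
ORDER BY s.field_with_duplicates, s.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The first row of each run of equal values gets an empty indicator, because its predecessor is either NULL or a different value; every later row in the run is flagged.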

Thanks to @MatBailie for the answer in his comment. Don't I feel silly...
Resolved

Related

Oracle CASE missing right parenthesis for a "in" limit

I have a query I'm developing in Oracle for Spotfire. In the WHERE clause I have a CASE expression that makes a decision: if the condition is true, I'm trying to pass a list of items to match against a column. Below is what I have, but it throws a "missing right parenthesis" error and I cannot determine why. In short, when a variable evaluates to true (9>8 stands in for it in this example), I need the query to return only those items; otherwise it should return the entire column with no restriction.
Note: This works fine when only one item is passed, e.g. 'BOB', but as soon as there are multiple items, this error occurs.
and Column1 = (CASE When 9>8 Then ('BOB','TOM') Else Column1 END)
Case expressions are best avoided in the where clause. Instead, write the logic with AND and OR:
And (
(9>8 AND Column1 IN ('BOB','TOM'))
OR 9<=8 -- You say you check a variable here, don't forget to check for NULL
)
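Oracle has traditionally had no boolean type for use in SQL queries (one was only added in Oracle Database 23c), and a CASE expression must return a single scalar value, not a list such as ('BOB','TOM').
Instead, just use basic logic: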
and ( (9 > 8 and Column1 in ('BOB','TOM')) or
9 <= 8
)
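To see the AND/OR rewrite in action, here is a small sketch using Python's sqlite3 module instead of Oracle. The table name `people`, the sample names, and the `:apply_filter` bind parameter (standing in for the `9>8` test) are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (column1 TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?)",
    [("BOB",), ("TOM",), ("ANN",), ("SUE",)],
)

# The flag decides whether the IN list is applied; no CASE is needed.
query = """
SELECT column1
FROM people
WHERE (:apply_filter = 1 AND column1 IN ('BOB', 'TOM'))
   OR :apply_filter = 0
ORDER BY column1
"""

def names(apply_filter):
    """Run the query with the given flag and return the matching names."""
    return [r[0] for r in conn.execute(query, {"apply_filter": apply_filter})]

filtered = names(1)       # only the listed names
unfiltered = names(0)     # every row
null_flag = names(None)   # a NULL flag matches neither branch
print(filtered, unfiltered, null_flag)
```

The last call illustrates the NULL warning from the first answer: when the flag is NULL, both branches evaluate to unknown and no rows come back, so a real query may need an explicit NULL check.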

How to reference a column in SQL that has count?

How do I get the column "count(division)" instead of getting the actual number of counts?
select * from num_taught;
gets me this
select count(division) from num_taught;
gets me this, but I actually want the third column "count(division)" from the previous image
I want to know this because I'm doing this right now:
sql> select * from num_taught as a, num_taught as b
...> where a.count(division) = b.count(division);
Error: near "(": syntax error
but as you can see, there's a syntax error and I think it's because the code is not referencing the "count(division)" columns but actually finding the count instead.
My end goal is to output the "Titles" that have the same "Division" and have the same count(division).
So for example, the end table would have the rows "Chief Accountant", "Programmer Trainee", "Scrivener", "Technician", "Wizard". Since these are the rows that have a match in division and count(division)
Thanks!
What does DESC num_taught return? I am curious how the third column is populated - is it some kind of pseudo-column? You may want to try wrapping the column name in [], see: How to deal with SQL column names that look like SQL keywords?
i.e. try:
select [count(division)] from num_taught;
You need to escape the column name with double quotes (assuming it's SQLite, as you mentioned in the comments).
select "count(division)" from num_taught;
or:
select * from num_taught as a, num_taught as b
where a."count(division)" = b."count(division)";
If you don't quote it, you are calling the count function provided by your database system instead of referencing the column.
It's very unusual to name a column like this; in your case it might be either a trap set by your tutor or an error made while initializing the table.
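The difference between the quoted column and the aggregate function is easy to check with Python's sqlite3 module. This is only a sketch: the table layout and rows below are made up for illustration and are not the asker's actual data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with a column literally named count(division).
conn.execute(
    'CREATE TABLE num_taught (title TEXT, division TEXT, "count(division)" INTEGER)'
)
conn.executemany(
    "INSERT INTO num_taught VALUES (?, ?, ?)",
    [("Wizard", "Magic", 2), ("Scrivener", "Magic", 2), ("Clerk", "Admin", 1)],
)

# Double quotes: reads the stored column, one value per row.
stored = conn.execute(
    'SELECT "count(division)" FROM num_taught ORDER BY title'
).fetchall()

# No quotes: calls the aggregate and collapses the table to one row.
aggregated = conn.execute("SELECT count(division) FROM num_taught").fetchall()

print(stored)
print(aggregated)
```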
I think you just want a count(distinct):
select count(distinct division)
from num_taught;

adding a sub query to a case statement in hive

I hope you can help. I have the below query, which has a case statement.
I want to say:
IF the domain is in the other table, then return the domain name, else, mark it as 'other'
I am using Hive & get the error:
Unsupported SubQuery Expression 'cleandomain': Currently SubQuery expressions are only allowed as Where Clause predicates
Is there some other way I can achieve the same?
SELECT *,
CASE
WHEN cleandomain IN (SELECT cleandomain
FROM keenek1.daily_top_doms) THEN cleandomain
ELSE 'other'
END AS status
FROM (SELECT hour,.....
One possible solution is the in_file(string str, string filename) function.
Put the list of domains in a text file, one domain per line, and call in_file in the CASE expression:
CASE
WHEN in_file(cleandomain,'file/path/daily_top_doms.txt') THEN cleandomain
ELSE 'other'
END AS status
Another solution is to aggregate the list of domains into an array in a subquery, attach it with a cross join, and use array_contains(). This may work much faster if the list is not too big:
with dom as (
SELECT collect_set(cleandomain) dom
FROM keenek1.daily_top_doms
)
select
case when array_contains(d.dom, s.cleandomain) then s.cleandomain
else 'other'
end as status
from (your query) s cross join dom d --one row cross join
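Neither in_file nor array_contains exists outside Hive, so the sketch below swaps in a different technique that gives the same result: a LEFT JOIN against the distinct list of domains, with the CASE testing whether a match was found. It runs on Python's sqlite3 module; the `hits` table and all sample rows are hypothetical stand-ins for "your query".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_top_doms (cleandomain TEXT)")
conn.execute("CREATE TABLE hits (hour INTEGER, cleandomain TEXT)")  # stand-in for the inner query
conn.executemany(
    "INSERT INTO daily_top_doms VALUES (?)",
    [("a.com",), ("b.com",), ("a.com",)],
)
conn.executemany(
    "INSERT INTO hits VALUES (?, ?)",
    [(0, "a.com"), (1, "z.org"), (2, "b.com")],
)

# DISTINCT keeps duplicate domains in the lookup table from multiplying rows.
query = """
SELECT s.hour,
       CASE WHEN d.cleandomain IS NOT NULL THEN s.cleandomain
            ELSE 'other'
       END AS status
FROM hits s
LEFT JOIN (SELECT DISTINCT cleandomain FROM daily_top_doms) d
       ON d.cleandomain = s.cleandomain
ORDER BY s.hour
"""
rows = conn.execute(query).fetchall()
print(rows)
```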

Aggregate functions and OrderBy

I have a query like the following one:
SELECT case_filed_by,
office_code,
desg_code,
court_code,
court_case_no,
COUNT(office_code) as count
FROM registration_of_case
WHERE TRUE
AND SUBSTR(office_code, 1, 0) = SUBSTR('', 1, 0)
ORDER BY court_code, court_case_no
I am getting the following error:
ERROR: column "registration_of_case.case_filed_by" must appear in the GROUP BY clause or be used in an aggregate function LINE 1: SELECT case_filed_by,office_code,desg_code, court_code,court […]
As you describe in your comments, you actually want the number of selected rows as a separate field in your result set.
You can achieve this by using a subselect for the count and then joining the two queries.
Something like this:
SELECT case_filed_by,
office_code,
desg_code,
court_code,
court_case_no,
office_code_count
FROM registration_of_case,
(SELECT COUNT(office_code) AS office_code_count
FROM registration_of_case
WHERE TRUE
AND SUBSTR(office_code, 1, 0) = SUBSTR('', 1, 0)
) AS count_query
WHERE TRUE
AND SUBSTR(office_code, 1, 0) = SUBSTR('', 1, 0)
ORDER BY court_code, court_case_no
I couldn't test the query, but it should work or at least point you into the right direction.
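A reduced version of that idea can be checked with Python's sqlite3 module. This is a sketch under assumptions: the table is cut down to three columns, the rows are invented, and the WHERE filter is omitted. It also shows a window-function variant, COUNT(...) OVER (), which is not from the answer above but should yield the same rows on engines that support window functions (SQLite 3.25 or newer here).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE registration_of_case "
    "(office_code TEXT, court_code TEXT, court_case_no INTEGER)"
)
conn.executemany(
    "INSERT INTO registration_of_case VALUES (?, ?, ?)",
    [("A1", "C2", 7), ("A1", "C1", 9), ("B4", "C1", 3)],
)

# Variant 1: one-row count subquery joined to every detail row.
join_query = """
SELECT r.office_code, r.court_code, r.court_case_no, c.office_code_count
FROM registration_of_case r
CROSS JOIN (SELECT COUNT(office_code) AS office_code_count
            FROM registration_of_case) c
ORDER BY r.court_code, r.court_case_no
"""

# Variant 2 (swapped-in technique): window aggregate over the whole result.
window_query = """
SELECT office_code, court_code, court_case_no,
       COUNT(office_code) OVER () AS office_code_count
FROM registration_of_case
ORDER BY court_code, court_case_no
"""

join_rows = conn.execute(join_query).fetchall()
window_rows = conn.execute(window_query).fetchall()
print(join_rows)
```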
You are using COUNT(), which is an aggregate function, along with a number of fields that are not part of the GROUP BY (since there is none) or in the aggregate function (except office_code).
Now, in MySQL something like this is allowed (when ONLY_FULL_GROUP_BY is disabled) because the engine will select one record from the group and return that (although the query cannot control which one, that's usually okay). PostgreSQL clearly does not allow it. I don't use PostgreSQL, but I can work that much out from the error.
If PostgreSQL has a "non-strict" mode, I suggest you enable that; otherwise, either correct your query or change database types.
I would suggest an appropriate query if I knew what PostgreSQL does, and doesn't, allow.
Add a GROUP BY clause like this:
"group by case_filed_by,office_code,desg_code,court_code,court_case_no"
Now try executing it; it will work.
The simple logic is that if you want to use an aggregate function together with other columns of the table, you must group by those columns.
Check it out and comment if it works.

order by sql data using conditions (case) based on derived columns

I want to add a condition that sorts SQL data based on the value of derived columns, as follows:
SELECT DISTINCT sp.ID
, sp.Status
, sp.Rank
, sp.Price
, sp.SalePrice
, sp.Width
, sp.Height
, sp.QOH
, (sp.SalePrice*sp.QOH) As 'sp.Value'
, (sp.Price*sp.QOH) As 'sp.StandardValue'
FROM table
WHERE -- Conditions
ORDER BY
CASE WHEN 'sp.SalePrice' > 0 THEN 'sp.Value' END DESC,
CASE WHEN 'sp.SalePrice' = 0 THEN 'sp.StandardValue' END DESC
Gives this error:
Msg 145, Level 15, State 1, Line 1
ORDER BY items must appear in the select list if SELECT DISTINCT is specified.
If I try
ORDER BY
CASE WHEN sp.SalePrice > 0 THEN (sp.SalePrice*sp.QOH) As "sp.Value" END DESC,
CASE WHEN sp.SalePrice = 0 THEN (sp.Price*sp.QOH) As sp.StandardValue" END DESC
Gives error:
Incorrect syntax near the keyword 'As'.
It goes back to giving the same SELECT DISTINCT error if I remove the aliases (the "As ..." parts) from the ORDER BY clause and leave only the multiplication.
I'm going to add a bunch of suggestions about general conventions below, but as to the main problem consider what you're trying to do with this example:
My_Table:
id value
1 1
2 2
3 1
Now, try to show distinct value ordering by id. Which should it be?
value
1
2
or
value
2
1
So, you're asking SQL Server to do something that may be impossible or at least unclear.
Now, as to conventions... Maybe some of this is just how you've posted here, but I would make the following suggestions for your code:
Avoid using reserved and SQL language words for names. This would include table, rank, and status.
Avoid using special characters in names. This would include sp.value. Sure, you can do it with a quoted identifier, but some front-ends, etc. might not support them even if SQL does and you don't really buy anything by using them in most cases.
Use the quoted identifier when you have to quote names. If you absolutely must violate one of the above two suggestions, use the standard quoted identifiers for SQL Server, which are [ and ]. If you want to quote aliases, use these as well (and you shouldn't have to typically quote aliases BTW). This helps to avoid the problem that Mark B points out.
Your CASE statement can be better written as one ordering column. Also, you should include an ELSE in most cases to avoid unhandled conditions. This may not be needed here as long as you can't have NULLs or negative values in any of the involved columns.
CASE
WHEN sp.SalePrice > 0 THEN (sp.SalePrice*sp.QOH)
WHEN sp.SalePrice = 0 THEN (sp.Price*sp.QOH)
END
I would personally avoid using the table alias (which it looks like you accidentally left out of your query) as part of your column aliases. It makes it much more confusing IMO because it makes it look like that aliased column is actually a column in the table.
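The single ordering column suggested above can be sketched with Python's sqlite3 module. The `products` table name and its rows are hypothetical (the original query used a placeholder table), and SQLite is more permissive than SQL Server about ordering by an alias, so this only illustrates the CASE logic itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER, price INTEGER, sale_price INTEGER, qoh INTEGER)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [
        (1, 10, 0, 5),    # no sale price: 10 * 5 = 50
        (2, 20, 15, 2),   # on sale:       15 * 2 = 30
        (3, 8, 4, 20),    # on sale:        4 * 20 = 80
        (4, 40, 0, 1),    # no sale price: 40 * 1 = 40
    ],
)

# One CASE produces the sort value; it is selected too, so DISTINCT would still be legal.
query = """
SELECT id,
       CASE
           WHEN sale_price > 0 THEN sale_price * qoh
           WHEN sale_price = 0 THEN price * qoh
       END AS sort_value
FROM products
ORDER BY sort_value DESC
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Because the computed value appears in the select list, the same expression can serve as the ORDER BY item, which is what the SELECT DISTINCT rule requires.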
Your second example is closest as it specifies the columns correctly. However, you cannot alias columns in the order by. This would work:
ORDER BY
CASE WHEN sp.SalePrice > 0 THEN sp.SalePrice*sp.QOH END DESC,
CASE WHEN sp.SalePrice = 0 THEN sp.Price*sp.QOH END DESC
Or alternatively just use the alias you defined in the result set:
ORDER BY
CASE WHEN sp.SalePrice > 0 THEN [sp.Value] END DESC,
CASE WHEN sp.SalePrice = 0 THEN [sp.StandardValue] END DESC
Note the brackets []: they are required to reference the column name because you used a dotted name in the alias. Otherwise sp.Value would be taken to mean the Value column in the sp table.
Using ' around field names turns them into string literals. Either remove the quotes entirely, or use " instead.