impala replacing in-clause with inner join - sql

I have a query like this running in impala:
SELECT
COUNT(*) AS value
FROM
myTable
WHERE
mycolumn IN ('value1', 'value2',..... 'value_n')
It seems that when n is in the thousands, this query becomes quite slow. I am now trying
to replace the IN clause with an inner join, something like this:
SELECT
COUNT(*) AS value
FROM
myTable
INNER JOIN (SELECT UNNEST(['value1', 'value2', .. 'value_n']) AS myColumnName)
ON myColumn=myColumnName
Impala complains about the use of UNNEST there:
Query: SELECT UNNEST([1, 2, 3]) AS myColumnName
Query submitted at: 2022-12-29 12:08:08 (Coordinator: http://rpkgh21dev147:25000)
ERROR: ParseException: Syntax error in line 1:
SELECT UNNEST([1, 2, 3]) AS myColumnName
       ^
Encountered: A reserved word cannot be used as an identifier: UNNEST
Expected: ALL, CASE, CAST, DEFAULT, DISTINCT, EXISTS, FALSE, IF, INTERVAL, LEFT, NOT, NULL, REPLACE, RIGHT, STRAIGHT_JOIN, TRUNCATE, TRUE, IDENTIFIER
I am not able to figure out how to do this. What is the correct syntax in Impala? (Note that the n values don't come from another table, so I cannot replace the literal array with another SELECT.)
Thank you very much in advance.
Roberto
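One possible direction, as an untested sketch: Impala documents a VALUES clause that can be used as an inline view, so the literal list can be turned into a small derived table without UNNEST. The alias v is hypothetical, and 'value1' ... 'value_n' stand in for the real literals.

```sql
-- Sketch only: assumes the Impala version in use accepts VALUES as an inline view.
-- The column alias is given on the first row; later rows follow the same layout.
SELECT
  COUNT(*) AS value
FROM
  myTable
INNER JOIN (
  VALUES ('value1' AS myColumnName), ('value2'), ('value_n')
) AS v
ON myTable.myColumn = v.myColumnName
```

Unlike IN, an inner join counts a row of myTable once per matching literal, so the list should contain no duplicates if the count has to stay the same. If VALUES is not accepted, the same derived table can be written as SELECT 'value1' AS myColumnName UNION ALL SELECT 'value2' and so on.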

Related

SQL merge using a CAST column

I want to perform a SQL merge but on columns that have been CAST in the select statement.
I have tried this code:
CREATE TABLE test
AS
SELECT
a.ID, a.curr_bus_date, a.branch_no, a.account_no,
CAST(b.sortcodeinfo AS int) AS sortcode,
CAST(b.accountnumberinfo AS int) AS accountnumber,
b.curr_bus_date
FROM
TABLE1 AS a
LEFT JOIN
TABLE2 AS b ON a.branch_no = sortcode
AND a.account_no = accountnumber
AND a.curr_bus_date = b.curr_bus_date
WHERE
MONTH(curr_bus_date) = 11
AND YEAR(curr_bus_date) = 2022
And I get the error:
DatabaseError: An error occurred (InvalidRequestException) when calling the StartQueryExecution operation: line 2:37: mismatched input '.'. Expecting: ',', 'EXCEPT', 'FROM', 'GROUP', 'HAVING', 'INTERSECT', 'LIMIT', 'OFFSET', 'ORDER', 'UNION', 'WHERE',
I can't quite get the syntax correct for the cast variables I wish to merge on. I need to do this casting because the variables are of different types and otherwise won't merge. The volumes in the tables are too big to do the casting as a separate exercise.
The names created in the select clause are not yet available in the ON join conditions. You will have to repeat the cast expressions there.
Additionally, for the WHERE clause, this will be MUCH more efficient:
WHERE curr_bus_date >= '20221101' and curr_bus_date < '20221201'
Before, the database had to run the month and year functions for the curr_bus_date column on every row in the table, even rows you won't need. Using a function or otherwise altering a field like curr_bus_date also means the result of the expression no longer matches any index values. Any index on the column would be worthless for the query. Writing the conditions so the column values remain unaltered means indexes can still work, which can be HUGE for performance.
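Putting both points together, the query might look like the sketch below. It assumes an Athena/Presto-style engine (suggested by the StartQueryExecution error) and that curr_bus_date is a DATE column; the alias b_curr_bus_date is hypothetical and is only there so the new table does not end up with two columns of the same name.

```sql
-- Sketch only: the CAST expressions are repeated in ON because the aliases
-- sortcode and accountnumber are not visible there.
CREATE TABLE test
AS
SELECT
  a.ID, a.curr_bus_date, a.branch_no, a.account_no,
  CAST(b.sortcodeinfo AS int) AS sortcode,
  CAST(b.accountnumberinfo AS int) AS accountnumber,
  b.curr_bus_date AS b_curr_bus_date   -- hypothetical alias
FROM
  TABLE1 AS a
LEFT JOIN
  TABLE2 AS b ON a.branch_no = CAST(b.sortcodeinfo AS int)
  AND a.account_no = CAST(b.accountnumberinfo AS int)
  AND a.curr_bus_date = b.curr_bus_date
WHERE
  a.curr_bus_date >= DATE '2022-11-01'   -- assumes a DATE column
  AND a.curr_bus_date < DATE '2022-12-01'
```

Qualifying curr_bus_date with a. in the WHERE clause also removes the ambiguity between the two tables' columns of that name.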

Hive "with" clause syntax

The WITH syntax is absolutely not cooperating; I cannot get it to work. Here is a stripped-down version of it:
set hive.strict.checks.cartesian.product=false;
with
newd as (select
avg(risk_score_highest) risk_score_hi,
avg(risk_score_current) risk_score_cur,
from table1),
oldd as ( select
avg(risk_score_highest) risk_score_hi,
avg(risk_score_current) risk_score_cur,
from table2
where ds='2022-09-08')
select
(newd.risk_score_hi-oldd.risk_score_hi)/newd.risk_score_hi diff_risk_score_hi,
(newd.risk_score_cur-oldd.risk_score_cur)/newd.risk_score_cur diff_risk_score_cur,
from newd cross join oldd
order by 1 desc
Apache Hive Error
[Statement 2 out of 2] hive error: Error while compiling statement:
FAILED: SemanticException [Error 10004]: Invalid table alias or
column reference 'newd': (possible column names are:
diff_risk_score_hi, diff_risk_score_cur)
I had been following the general form shown here: https://stackoverflow.com/a/47351815/1056563
WITH v_text
AS
(SELECT 1 AS key, 'One' AS value),
v_roman
AS
(SELECT 1 AS key, 'I' AS value)
INSERT OVERWRITE TABLE ramesh_test
SELECT v_text.key, v_text.value, v_roman.value
FROM v_text JOIN v_roman
ON (v_text.key = v_roman.key);
I can not understand what I am missing to get the inline with views to work.
Update: My query (the first one on top) works in Presto (obviously with the set hive.strict.checks.cartesian.product=false; line removed). So Hive is apparently really hard to satisfy when it comes to WITH clauses. I tried about a dozen different ways of using the aliases.

REPLACE with JOIN - SQL

I need help understanding what I did wrong. I'm a beginner, so please excuse the simple question!
I have two tables in which I want to do a JOIN where, in one of the columns I had to use REPLACE to remove the text 'RIxRE' that does not interest me.
In table 1, this is the original text of the column id_notification: RIxRE-1787216-BSB and this is the text that returns when using REPLACE: 1787216-BSB
In table 2, this is the text that exists in the column: 1787216-BSB
However, I get the following error:
# 1054 - Unknown column 'a.id_not' in 'on clause'
SELECT *, REPLACE(a.id_notificacao,'RIxRE','') AS id_not
FROM robo_qualinet_cadastro_remedy a
JOIN (SELECT * FROM painel_monitoracao) b ON a.id_not = b.id_notificacao
You cannot reference a column alias defined in the SELECT list from the FROM clause (including ON conditions) or the WHERE clause (and possibly not from other clauses either, depending on the database).
So, repeat the expression:
SELECT *, REPLACE(rqcr.id_notificacao, 'RIxRE', '') AS id_not
FROM robo_qualinet_cadastro_remedy rqcr JOIN
painel_monitoracao pm
ON REPLACE(rqcr.id_notificacao, 'RIxRE', '') = pm.id_notificacao;
Notes:
Use table aliases that mean something, such as abbreviations for the table names.
The subquery is not necessary in the FROM clause.
I suspect that you have a problem with your data model if you need a REPLACE() for the JOIN condition, but that is a different issue from this question.
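If repeating the expression feels clumsy, an alternative is to compute the alias in a derived table first, so that it exists as a real column by the time the ON clause is evaluated. A minimal sketch using the same tables (the inner alias r is hypothetical):

```sql
-- Sketch only: id_not is defined inside the derived table, so the outer
-- ON clause can refer to it by name.
SELECT rqcr.*, pm.*
FROM (
  SELECT r.*, REPLACE(r.id_notificacao, 'RIxRE', '') AS id_not
  FROM robo_qualinet_cadastro_remedy r
) rqcr
JOIN painel_monitoracao pm
  ON rqcr.id_not = pm.id_notificacao;
```

Either way the join compares a computed value, so an index on robo_qualinet_cadastro_remedy.id_notificacao will not help with the match.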

Issue using MINUS clause in SQL

Using the MINUS clause between two statements gives me an error. Can someone help me with this?
Error is Msg 102, Level 15, State 1, Line 101
Incorrect syntax near 'MINUS'.
SELECT a from (SELECT DISTINCT(name) as a FROM hack WHERE name LIKE '%') a
MINUS
SELECT b from (SELECT DISTINCT(name) as b FROM hack WHERE name LIKE '[aeiou]%[aeiou]') b
MINUS exists in Oracle. Judging by your error message, I assume you are using SQL Server.
In SQL Server, EXCEPT is the correct replacement for MINUS.
SELECT DISTINCT name
FROM hack
WHERE name LIKE '%'
EXCEPT
SELECT DISTINCT name
FROM hack
WHERE name LIKE '[aeiou]%[aeiou]'
You can simplify the logic to:
SELECT DISTINCT name
FROM hack
WHERE name NOT LIKE '[aeiou]%[aeiou]'
A simple comparison should be much more efficient than multiple comparisons along with set operators.

SQL Server minus query gives issue?

I'm trying to compare the values in a table for differences (I suspect that two TankSystemIds contain the same data).
My query is
SELECT *
FROM [dbo].[vwRawSaleTransaction]
WHERE hdTankSystemId = 2782
MINUS
SELECT *
FROM [dbo].[vwRawSaleTransaction]
WHERE hdTankSystemId = 2380
But I get an error about syntax issues:
Incorrect syntax near 'minus'
But this is right[1]?
[1] https://www.techonthenet.com/sql/minus.php
As quoted in your link:
For databases such as SQL Server, PostgreSQL, and SQLite, use the EXCEPT operator to perform this type of query.
In your case, since it seems you are looking for duplicated data, INTERSECT should be used instead.
Also, an INTERSECT statement like
SELECT
EXPRESSION_1, EXPRESSION_2, ..., EXPRESSION_N
FROM
TABLE_A
INTERSECT
SELECT
EXPRESSION_1, EXPRESSION_2, ..., EXPRESSION_N
FROM
TABLE_B
can be written as (apart from duplicate rows and NULL values, which INTERSECT and a join treat differently)
SELECT
TABLE_A.EXPRESSION_1, TABLE_A.EXPRESSION_2, ..., TABLE_A.EXPRESSION_N
FROM
TABLE_A
INNER JOIN
TABLE_B
ON
TABLE_A.EXPRESSION_1 = TABLE_B.EXPRESSION_1
AND TABLE_A.EXPRESSION_2 = TABLE_B.EXPRESSION_2
.
.
.
AND TABLE_A.EXPRESSION_N = TABLE_B.EXPRESSION_N
If you use SELECT * on the same table with different WHERE conditions and then intersect the results, you will not get any rows, because the two result sets differ in the column used in the WHERE condition.
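To make the comparison meaningful, the filter column has to be left out of the select list. A minimal sketch, where Col1 and Col2 are hypothetical placeholders for the view's real columns:

```sql
-- Sketch only: rows that exist for tank system 2782 but not for 2380.
-- hdTankSystemId is not selected because it always differs between the two sets.
SELECT Col1, Col2
FROM [dbo].[vwRawSaleTransaction]
WHERE hdTankSystemId = 2782
EXCEPT
SELECT Col1, Col2
FROM [dbo].[vwRawSaleTransaction]
WHERE hdTankSystemId = 2380
```

Swapping EXCEPT for INTERSECT returns the rows the two tank systems have in common instead. To confirm the data is identical, run the EXCEPT in both directions and check that both results are empty.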