Impala: error while adding a variable in a query

I tried to set a variable in an Impala query, but I am getting the following error and I don't know how to solve it.
set var:id = "it"
select *
from prs_nafisa.rfm_data
where id=${VAR:id};
SQL Error [500051] [HY000]: [Cloudera][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, errorMessage:ParseException: Syntax error in line 1:
set var:id = "it"
^
Encountered: :
Expected: ADD, ALTER, AND, ARRAY, AS, ASC, BETWEEN, BIGINT, BINARY, BLOCK_SIZE, BOOLEAN, CACHED, CASCADE, CHANGE, CHAR, COMMENT, COMPRESSION, CROSS, DATE, DATETIME, DECIMAL, DEFAULT, DESC, DIV, REAL, DROP, ELSE, ENCODING, END, EXCEPT, FLOAT, FOLLOWING, FOR, FROM, FULL, GROUP, IGNORE, HAVING, ILIKE, IN, INNER, INTEGER, INTERSECT, IREGEXP, IS, JOIN, LEFT, LIKE, LIMIT, LOCATION, ||, MANAGEDLOCATION, MAP, MINUS, NOT, NULL, NULLS, OFFSET, ON, OR, ORDER, PARTITION, PARTITIONED, PRECEDING, PRIMARY, PURGE, RANGE, RECOVER, REGEXP, RENAME, REPLACE, RESTRICT, RIGHT, RLIKE, ROW, ROWS, SELECT, SET, SMALLINT, SORT, STORED, STRAIGHT_JOIN, STRING, STRUCT, TABLESAMPLE, TBLPROPERTIES, THEN, TIMESTAMP, TINYINT, TO, UNCACHED, UNION, UNSET, USING, VALUES, VARCHAR, WHEN, WHERE, WITH, COMMA, IDENTIFIER
CAUSED BY: Exception: Syntax error
), Query: set var:id = "it"
select * from prs_nafisa.rfm_data where id="it".

The ${var:name} substitution is a feature of impala-shell, not of Impala SQL itself, so it is not available through the JDBC driver. Define the variable when starting impala-shell and reference it in the query:
$ impala-shell --quiet --var=vid="it"
[impala] > select * from prs_nafisa.rfm_data where id=${var:vid};
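If a shell session is already open, the variable can also be defined from inside impala-shell with SET VAR. A minimal sketch, assuming the same table and value; note that the substitution is purely textual, so the quotes stay part of the value:
[impala] > SET VAR:vid="it";
[impala] > select * from prs_nafisa.rfm_data where id=${var:vid};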

Related

Msg 245, Level 16, State 1, Line 4 Conversion failed when converting the nvarchar value '239.6' to data type int

I have this query:
SELECT SerialNumber
FROM [ETEL-PRDSQL].[ERP10DBLIVE].[ERP].[SerialNo]
WHERE CustNum IN (2);
It's causing this error:
Msg 245, Level 16, State 1, Line 4
Conversion failed when converting the nvarchar value '239.6' to data type int.
The query works if I compare CustNum with a different value, but it fails when I try CustNum IN (2).
How can I fix this?
You have a varchar column named CustNum. The varchar values in this column may contain only digits, but that doesn't make them numbers! Then you compare this text column with the integer value 2. Again, the integer value 2 is not the same as the text value '2'. It's also not the same as the floating point value 2.0. These are all different, they have different types, and SQL Server must resolve any such differences before it can compare values.
Based on type precedence rules, SQL Server determines it needs to convert the text in the column to an integer, instead of vice versa. Once this determination is made for the query, if you have any data in the text column that is not integer-compatible, the query is going to fail.
It's important to understand this conversion happens separately from the conditional check in the WHERE clause, and is a prerequisite for that check. It's not enough to expect the WHERE condition to evaluate to FALSE for rows that do not convert. This is true even if you don't need the row, because SQL Server can't know you don't need that row until after it attempts the conversion!
In this case, we have the value 239.6. This value may be numeric, but it is not an integer, nor is it convertible to one. Therefore the query fails.
In addition to (eventually!) failing the query, this is absolutely awful for performance. SQL Server has to do this conversion for every row in the table... even rows you don't need. This is because SQL Server doesn't know which rows will match the WHERE clause until after it checks the conditional expression, and it needs to do this conversion in order to make that check. Worse still, the new converted value no longer matches your indexes, so any indexes you might have become worthless for this query. That cuts to the core of database performance.
If you don't like it, define your data types better, or try comparing the string with another string:
SELECT SerialNumber
FROM [ETEL-PRDSQL].[ERP10DBLIVE].[ERP].[SerialNo]
WHERE CustNum IN ('2');
The query might also run if you did this:
SELECT SerialNumber
FROM [ETEL-PRDSQL].[ERP10DBLIVE].[ERP].[SerialNo]
WHERE CustNum IN (2.0);
Now the type precedence rules will convert your text to a floating point type, and it's possible that will succeed if the rest of the values in the table are compatible. It's also possible this is closer to what you intend... but again, the performance here will be much worse.
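If the column has to stay text and you cannot guarantee that every value converts, a guarded comparison is another option. A minimal sketch using TRY_CONVERT (SQL Server 2012 and later), which returns NULL for values such as '239.6' that do not convert to int:
SELECT SerialNumber
FROM [ETEL-PRDSQL].[ERP10DBLIVE].[ERP].[SerialNo]
WHERE TRY_CONVERT(int, CustNum) = 2;
Note that this still attempts the conversion for every row, so the indexing caveat above applies just the same.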

How to avoid performance degradation when run query with cast in where clause?

I have a table with two varchar columns (name and value), and I have this query:
select * from attribute
where name = 'width' and cast( value as integer) > 12
This query works, but I suppose there could be an issue with the execution plan: an index built over the value column cannot be used, because the column is technically varchar but we convert it to an integer.
Are there ways to fix it?
P.S. I can't change the type to int because the database design implies that value could be of any type.
Performance should not be your first worry here.
Your statement is prone to failures. You read this as:
read all rows with name = 'width'
of these rows cast all values to integer and only keep those with a value greater than 12
But the DBMS is free to check conditions in the WHERE clause in any order. If the DBMS does that:
cast all values to integer and only keep the rows with a value greater than 12
of these rows keep all with name = 'width'
the first step will already cause a runtime error if there is a non-integer value in that table, which is likely.
So first get your query safe. The following should work:
select *
from
(
    select *
    from attribute
    where name = 'width'
) widths
where cast(value as integer) > 12;
This will still fail when your width values contain non-integers. So, to make this safe even in case of invalid data in the table, you may want to add a check in the subquery that the value only contains digits (a sketch follows at the end of this answer).
And yes, this won't become super-fast. You sacrifice speed (and data consistency checks) for flexibility with this data model.
What you can do, however, is create an index on both columns, so the DBMS can quickly find all width rows and then have the value directly at hand, before it accesses the table:
create index idx on attribute (name, value);
As far as I know, there is no fail-safe cast function in PostgreSQL. Otherwise you could use this and have a function index instead. I may be wrong, so maybe someone can come up with a better solution here.
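A minimal sketch of that digit-only check, assuming PostgreSQL's regular-expression match operator (~) and the table from the question:
select *
from
(
    select *
    from attribute
    where name = 'width'
      and value ~ '^[0-9]+$'   -- keep only values consisting solely of digits
) widths
where cast(value as integer) > 12;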

UNION ALL Statement Failing

I am trying to UNION ALL around 20 or so tables to consolidate into a single view. I keep getting an error that states:
'The numeric value XXXXX is not recognized'.
The error involves one column in each of the tables, but the data type for that column is VARCHAR(256) in every table. No matter what I cast the column to, I still get the same error.
The UNION ALL works perfectly if I comment that column out.
I've tried casting all columns to the same datatype, no luck.
I've tried commenting out the column in question, which works, but I need that column.
I've tried only UNION-ing a few of the tables, which sometimes works and sometimes doesn't, depending on the document type.
SELECT
CAST(QUICKBOOKS_MEXICO.BILL_LINE.DESCRIPTION AS VARCHAR(256)) AS DESCRIPTION
FROM QUICKBOOKS_MEXICO.BILL_LINE
UNION ALL
SELECT
CAST(QUICKBOOKS_EUROPE_BV.PURCHASE_LINE.DESCRIPTION AS VARCHAR(256)) AS DESCRIPTION
FROM QUICKBOOKS_EUROPE_BV.PURCHASE_LINE
The columns should seamlessly UNION.
Here is the error message:
Numeric value 'Exchange Gain Or Loss' is not recognized
It's worth mentioning that if I remove all the other fields BESIDES the column that is throwing the error from the query, it performs just fine. Truly baffling!
It seems to me that the error message you get is not related to this column, but to another column that you might be casting to NUMERIC, which instead contains the value 'Exchange Gain Or Loss'.
One way to ignore this conversion error is to use TRY_CAST instead of CAST: when a value cannot be converted to the intended data type, it simply returns NULL.
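A minimal sketch of that idea, assuming the failing cast is on some other column in the same SELECT list; the AMOUNT column name is illustrative, not taken from the original query:
SELECT
    CAST(DESCRIPTION AS VARCHAR(256)) AS DESCRIPTION,
    TRY_CAST(AMOUNT AS DECIMAL(18,2)) AS AMOUNT   -- returns NULL instead of failing
FROM QUICKBOOKS_MEXICO.BILL_LINE
UNION ALL
SELECT
    CAST(DESCRIPTION AS VARCHAR(256)) AS DESCRIPTION,
    TRY_CAST(AMOUNT AS DECIMAL(18,2)) AS AMOUNT
FROM QUICKBOOKS_EUROPE_BV.PURCHASE_LINE;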

Impala ODBC Driver Syntax Error (Encountered DECIMAL LITERAL)

I'm attempting to do a simple INSERT INTO statement on an Impala table that has the following schema:
field1 (date)
field2 (string)
field3 (string)
field4 (string)
field5 (string)
field6 (bigint)
I am using Impala pyODBC drivers to do this. Here's my query
INSERT INTO testdb.mydata VALUES('2018-06-20', 'field1', 'field2', 'field3', 'field4', 'field5', 1000000)
However, I keep getting the following error and I don't understand why! It's an AnalysisException / syntax error, which is super general, and I just cannot pinpoint where the issue is. I am following the format specified by Cloudera's documentation here: https://www.cloudera.com/documentation/enterprise/5-8-x/topics/impala_insert.html
AnalysisException: Syntax error in line 1:undefined: ...INTO
testdb.mydata VALUES('2018-06... ^ Encountered: DECIMAL LITERAL
Expected: ADD, ALTER, AND, ARRAY, AS, ASC, BETWEEN, BIGINT, BINARY,
BLOCK_SIZE, BOOLEAN, CACHED, CASCADE, CHANGE, CHAR, COMMENT,
COMPRESSION, CROSS, DATE, DATETIME, DECIMAL, DEFAULT, DESC, DIV, REAL,
DROP, ELSE, ENCODING, END, FLOAT, FOLLOWING, FROM, FULL, GROUP,
IGNORE, HAVING, ILIKE, IN, INNER, INTEGER, IREGEXP, IS, JOIN, LEFT,
LIKE, LIMIT, LOCATION, MAP, NOT, NULL, NULLS, OFFSET, ON, OR, ORDER,
PARTITION, PARTITIONED, PRECEDING, PRIMARY, PURGE, RANGE, RECOVER,
REGEXP, RENAME, REPLACE, RESTRICT, RIGHT, RLIKE, ROW, ROWS, SELECT,
SET, SMALLINT, SORT, STORED, STRAIGHT_JOIN, STRING, STRUCT,
TABLESAMPLE, TBLPROPERTIES, THEN, TIMESTAMP, TINYINT, TO, UNCACHED,
UNION, USING, VALUES, VARCHAR, WHEN, WHERE, WITH, COMMA, IDENTIFIER
CAUSED BY: Exception: Syntax error (110)
I also tried referencing the column names as follows:
INSERT INTO testdb.mydata(field1, field2, field3, field4, field5, field6) VALUES(....) but that yielded the same error message.
Other answers on here don't seem to address this specific error. Any guidance would be super appreciated, thanks!

Fetching records which have either of two values in an array column in postgres

I have an array_agg column in postgresql which has values like these:
"{Metabolic/Endocrinology}"
"{Cardiovascular}"
"{Oncology}"
"{Autoimmune/Inflammation}"
Basically, it is a string column aggregated with array_agg by an id.
Now I want to fetch all records from this table where either Oncology or Autoimmune/Inflammation is present.
I am doing something like this, but I am not sure why it is throwing an error.
select * from Table where id = ANY('{Oncology,Autoimmune/Inflammation}')
It throws the following error.
ERROR: operator does not exist: text[] = text
SQL state: 42883
Hint: No operator matches the given name and argument type(s). You may need to add explicit type casts.
Character: 67
Please note I have also tried casting with ::TEXT[] and it still gives an error.
You want to use the array-overlaps operator &&.
See array operators.
e.g.
select * from (
    VALUES
        (ARRAY['Oncology','Pediatrics']),
        (ARRAY['Autoimmune/Inflammation','Oncology']),
        (ARRAY['Autoimmune/Inflammation']),
        (ARRAY['Pediatrics']),
        (ARRAY[]::text[])
) "Table"(id)
where id && ARRAY['Oncology','Autoimmune/Inflammation'];
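Applied to the table from the question, the same overlap check is simply (a sketch, assuming the aggregated column really is text[]):
select * from Table where id && ARRAY['Oncology','Autoimmune/Inflammation'];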
By the way, I suggest using the SQL-standard ARRAY[...] constructor where possible.
Also, it's almost certainly a terrible idea to have an id column (presumably a primary key, if not, the name is confusing) defined as an array type.