I have a database with a view built on other views, which are themselves built on other views (a data engineer built the views, not me).
In Hive I can run the query below, but it's slow, so I want to use Impala:
select * from table limit 5;
In Impala I get an error, and I have tried INVALIDATE METADATA and REFRESH with no luck:
"ERROR: AnalysisException: No matching function with signature: lower(BIGINT)."
What could cause this? I have never seen this type of error before. Also, since the view is layered on other views, is there a way to inspect the definitions recursively with something like:
show create table;
To begin with, be aware that Hive and Impala are distinct products, with distinct SQL parsers, each supporting its own set of functions and features. Syntax that is valid in Hive may not be valid in Impala. Some table formats defined with Hive may not be supported by Impala (e.g. ORC, or Parquet with a BINARY column).
In this specific case, the Hive documentation appears to match the Impala documentation for the lower() function (caveat: check which versions you are using).
But there's a big catch: lower() takes a string and produces a string. It is not a number function. That smells like a gross mistake, such as confusing lower() -- convert some text to lowercase -- with floor() -- get the largest integer that is less than or equal to a decimal value.
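For example, if one of the underlying views contains something like the hypothetical lines below, Hive will silently coerce the BIGINT while Impala rejects it (the column and view names here are made up):

-- fails in Impala: lower() expects a STRING, not a BIGINT
select lower(account_id) from some_view;
-- if the intent was numeric truncation of a decimal, floor() is the function
select floor(account_balance) from some_view;
-- if lowercase text really was the goal, cast explicitly first
select lower(cast(account_id as string)) from some_view;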
Check with your so-called Data Engineer what he/she was trying to do, and make sure the views were properly tested (or are properly tested after the correction is made). Hive clearly applies some implicit type conversions that let queries run even when they make no sense and produce goofy results.
Folks
I am in the process of moving a decade-old back-end from DB2 9.5 to Oracle 19c.
I frequently see, in SQL queries and view definitions, bizarre timestamp(nullif('','')) constructs used instead of a plain null.
What is the point of doing so? Why would anyone in their right mind want to do so?
Disclaimer: my SQL skills are fairly mediocre. I might well miss something obvious.
It appears to create a NULL value with a TIMESTAMP data type.
The TIMESTAMP DB2 documentation states:
TIMESTAMP scalar function
The TIMESTAMP function returns a timestamp from a value or a pair of values.
TIMESTAMP(expression1, [expression2])
expression1 and expression2
The rules for the arguments depend on whether expression2 is specified and the data type of expression2.
If only one argument is specified it must be an expression that returns a value of one of the following built-in data types: a DATE, a TIMESTAMP, or a character string that is not a CLOB.
If you try to pass an untyped NULL to the TIMESTAMP function:
TIMESTAMP(NULL)
Then you get the error:
The invocation of routine "TIMESTAMP" is ambiguous. The argument in position "1" does not have a best fit.
To invoke the function, you need to pass a value of one of the required types (DATE, TIMESTAMP, or a non-CLOB string), which means that you need to coerce the NULL to have one of those types.
This could be:
TIMESTAMP(CAST(NULL AS VARCHAR(14)))
TIMESTAMP(NULLIF('',''))
Using NULLIF is more confusing but, if I have to try to make an excuse for using it, is slightly less to type than casting a NULL to a string.
The equivalent in Oracle would be:
CAST(NULL AS TIMESTAMP)
This also works in DB2 (and is even less to type).
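Put side by side (a quick sketch using DB2's VALUES syntax; all three produce a NULL with a timestamp-friendly type):

VALUES TIMESTAMP(NULLIF('',''));             -- the legacy style in question
VALUES TIMESTAMP(CAST(NULL AS VARCHAR(14)));
VALUES CAST(NULL AS TIMESTAMP);              -- simplest; also valid in Oracle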
It is not clear why - in any SQL dialect, no matter how old - one would use an argument like nullif('',''). Regardless of the result, that is a constant that can be calculated once and for all, and given as argument to timestamp(). Very likely, it should be null in any dialect and any version. So that should be the same as timestamp(null). The code you found suggests that whoever wrote it didn't know what they were doing.
One might need to write something like that - rather than a plain null - to get a null of a specific data type. Even though "theoretical" SQL says null does not have a data type, such an expression can be needed, for example in a view, to define the data type of the column it produces.
In Oracle you can use the cast() function, as MT0 demonstrated already - that is by far the most common and most elegant equivalent.
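For instance, a hypothetical view (table and column names are made up) where a bare null would leave the column type undefined:

CREATE VIEW order_audit AS
SELECT order_id,
       CAST(NULL AS TIMESTAMP) AS deleted_at  -- typed NULL pins the column's data type
FROM orders;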
If you want something much closer in spirit to what you saw in that old code, to_timestamp(null) will have the same effect. There is no reason, though, to pass anything more complicated than a plain null as the argument - nothing along the lines of that nullif() call.
Background:
Our group is going through a Cloudera upgrade to 6.1.1, and I have been tasked with determining how to handle the loss of implicit conversion across data types. See the link below for the relevant Release Note details.
https://docs.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_611_incompatible_changes.html#hive_union_all_returns_incorrect_data
Not only does this issue affect UNION ALL queries; we also have queries that perform comparisons between columns of different data types (e.g., STRING to BIGINT).
The group has decided that we do not want to change the underlying table metadata. So the solution is to accept potential data loss by using the CAST() function to cast the data. In the case of UNION ALL, we cast to the destination table's metadata. But when performing comparisons, I am trying to determine the simplest and easiest way to do so without getting erroneous results.
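For illustration, a hedged sketch of the UNION ALL handling (table and column names are made up; the destination column is assumed to be BIGINT):

INSERT INTO dest_table
SELECT CAST(id AS BIGINT), name FROM src_a
UNION ALL
SELECT CAST(id AS BIGINT), name FROM src_b;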
Question:
Can I simply cast everything to either STRING or VARCHAR() when performing the comparison? Are there any potential problems that might create incorrect results?
Update:
If there are problems with this approach, is there a correct solution to handle this?
Note: this is my first engagement working with Hadoop/Hive, and I have learned that what I know from RDBMS land does not always apply.
It is possible that you will have problems. For instance, if comparing a string to an int, then:
'1.00' = 1 --> true, because the values are compared as numbers
But as strings:
'1.00' = '1' --> false, because the values are compared as strings
You can get similar issues with dates, I think.
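To make the difference concrete, a hedged sketch in Hive syntax; if you need numeric semantics after moving to strings, cast back to a numeric type explicitly:

SELECT '1.00' = 1;                  -- true: compared as numbers
SELECT '1.00' = '1';                -- false: compared as strings
SELECT CAST('1.00' AS DOUBLE) = 1;  -- true: explicit numeric comparison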
Let me explain why I want to do this... I have built a Tableau dashboard that lets a user browse/search all of the tables & columns in our warehouse by schema, object type (table, view, materialized view), etc. I want to add a column that pulls a sample of the data from each column in each table - this is also done, but with one problem:
The resulting column is comprised of data of different types (VARCHAR2, LONG, etc.). I can get every type of data to conform to a single data type except LONG - it will not let me convert it to anything compatible with everything else (if that makes sense...). I simply need all the data types to coexist in a single column. I've tried many different things and have been reading up on the subject for about a week now, and it sounds like it just can't be done, but in my experience there is always a way... I figured I'd check with the gurus here before admitting defeat.
One of the things I've tried:
--Here, from two different tables, I'm pulling a single piece of data from a single column and attempting to merge into a single column called SAMPLE_DATA
--OTHER is LONG data type
--ORGN_NME is VARCHAR2 data type
select 'PLAN', 'OTHER', cast(substr(OTHER,1,2) as varchar2(4000)) as SAMPLE_DATA
  from sde.PLAN
union all
select 'BUS_ORGN', 'ORGN_NME', cast(substr(ORGN_NME,1,2) as varchar2(4000)) as SAMPLE_DATA
  from sde.BUS_ORGN;
Resulting error:
Lookup Error
ORA-00932: inconsistent datatypes: expected CHAR got LONG
How can I achieve this?
Thanks in advance
LONG datatypes are basically unusable by most applications. I made something similar when I wanted to search the contents of packages. The solution is to convert the LONG into a CLOB using a pipelined function. Adrian Billington's source code can be found here:
https://github.com/oracle-developer/dla
You end up with a view that you can query. I did not see any performance hit even when looking at large packages so it should work for you.
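If pulling in that library is more than you need, here is a minimal sketch of a simpler variant of the same idea (this is not Billington's code): dynamic SQL in PL/SQL is allowed to fetch a LONG into a VARCHAR2, which static SQL is not. The function name and parameters below are made up, and values longer than 32767 bytes would still need the CLOB route:

create or replace function long_to_varchar2(
  p_table  in varchar2,
  p_column in varchar2,
  p_rowid  in rowid
) return varchar2
as
  l_value varchar2(32767);
begin
  -- dynamic SQL may fetch a LONG into a VARCHAR2, unlike static SQL;
  -- dbms_assert guards against injection through the name parameters
  execute immediate
    'select ' || dbms_assert.simple_sql_name(p_column) ||
    ' from '  || dbms_assert.qualified_sql_name(p_table) ||
    ' where rowid = :rid'
    into l_value using p_rowid;
  return substr(l_value, 1, 4000);
end;
/

-- hypothetical usage in the original query:
select 'PLAN', 'OTHER', long_to_varchar2('sde.PLAN', 'OTHER', p.rowid) as SAMPLE_DATA
  from sde.PLAN p;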
I'm trying to extract data (using SPUFI) from a DB2 table to a file, with one of the output fields converting a decimal field to the same format as a COBOL comp field.
So e.g. today's date (20141007) would be ..ëõ
The SQL HEX function converts 20141007 to 013353CF, and doing a SELECT of x'013353CF' gives me the desired result, but obviously that's a constant, I'm trying to find an equivalent function.
Basically an inverse of the HEX function.
I've come across a couple of suggestions using user defined functions. Problem is, we've only recently upgraded to DB2 10 and new function mode isn't enabled yet, which means I don't have access to any control functions in a UDF.
I suspect I'm out of luck, but wondering if anyone has any suggestions.
I appreciate this is completely the wrong tool for the job, and it would be easier to just write a COBOL program to do it, but various constraints are preventing that. I'm limited to just SQL functions and possibly JCL.
I thought I had a solution using a recursive UDF to get around the lack of control functions, but that's not allowed either.
I have an Informix 11.70 database. I am unable to successfully execute this insert statement on a table.
INSERT INTO some_table(
col1,
col2,
text_col,
col3)
VALUES(
5,
50,
CAST('"id","title1","title2"
"row1","some data","some other data"
"row2","some data","some other"' AS TEXT),
3);
The error I receive is:
[Error Code: -9634, SQL State: IX000] No cast from char to text.
I found that I should add this statement in order to allow using newlines in text literals, so I added it above the same query I had already written:
EXECUTE PROCEDURE IFX_ALLOW_NEWLINE('t');
Still, I receive the same error.
I have also read the IBM documentation that says I could alternatively allow newlines by setting the ALLOW_NEWLINE parameter in the ONCONFIG file. I suppose that requires administrative access to the server to alter the config file, which I do not have, and I prefer not to rely on that setting.
Informix's TEXT (and BYTE) columns pre-date any standard, and are in many ways very peculiar types. TEXT in Informix is very different from TEXT found in other DBMS. One of the long-standing (over 20 years) problems with them is that there isn't a string literal notation that can be used to insert data into them. The 'No cast from char to text' is saying there is no explicit conversion from string literal to TEXT, either.
You have a variety of options:
Use LVARCHAR in the table (good if your values won't be longer than a few KiB, because the total row length is approximately 32 KiB). Maximum size of an LVARCHAR column is just under 32 KiB.
Use a programming language which can handle Informix 'locator' structures — in ESQL/C, the type used to hold a TEXT is loc_t.
Consider using CLOB instead. However, this has the same limitation (no string to CLOB conversion), but you'd be able to use the FILETOCLOB() function to get the information from a file on the client into the database (and LOTOFILE transfers information from the DB to a file on the client); see the sketch after this list.
If you can use LVARCHAR, that is by far the simplest alternative.
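A hedged sketch of the CLOB route (this assumes text_col's type is changed to CLOB, and 'data.txt' is a made-up file on the client holding the literal's contents):

-- load the value from a client-side file instead of a string literal
INSERT INTO some_table(col1, col2, text_col, col3)
VALUES(5, 50, FILETOCLOB('data.txt', 'client'), 3);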
I forgot to mention an important detail in the question - I use Java and the Hibernate ORM to access my Informix database, thus some of the suggested approaches (the loc_t handling in particular) in Jonathan Leffler's answer are unfortunately not applicable. Also, I need to store large data of dynamic length and I fear the LVARCHAR column would not be sufficient to hold it.
The way I got it working was to follow Michał Niklas's suggestion from his comment and use a PreparedStatement. This could potentially be explained by Informix handling the TEXT data type in its own manner.
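For reference, a minimal sketch of that approach in plain JDBC (connection setup and exact driver behavior are assumptions; with Hibernate you would map the column rather than write this by hand):

import java.io.StringReader;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class TextColumnInsert {
    static void insertText(Connection conn) throws SQLException {
        String csv = "\"id\",\"title1\",\"title2\"\n"
                   + "\"row1\",\"some data\",\"some other data\"\n"
                   + "\"row2\",\"some data\",\"some other\"";
        String sql = "INSERT INTO some_table(col1, col2, text_col, col3)"
                   + " VALUES(?, ?, ?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, 5);
            ps.setInt(2, 50);
            // binding the value as a character stream lets the driver hand
            // the data to the TEXT column without a char-to-text cast
            ps.setCharacterStream(3, new StringReader(csv), csv.length());
            ps.setInt(4, 3);
            ps.executeUpdate();
        }
    }
}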