Hive conditionally select column name

I have multiple tables with very similar schemas, except for one column that can have different names.
I want to make some complicated calculations using Hive and would like to have one piece of code for all tables, with possible parametrisation. For various reasons I can't parametrise queries from a language like Python or Scala, so I decided to go with pure Hive SQL.
I want to conditionally select the appropriate column, but it seems that Hive evaluates all parts of the conditional expression regardless of the condition.
What am I doing wrong?
DROP TABLE IF EXISTS `so_sample`;
CREATE TABLE `so_sample` (
`app_version` string
);
SELECT
if (true, app_version, software_version) AS firmware
FROM so_sample
;
Output:
Error: Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 2:25 Invalid table alias or column reference 'software_version': (possible column names are: app_version) (state=42000,code=10004)
Regards
Pawel

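Hive resolves every column reference during semantic analysis, before any row is evaluated, so a reference to a column that does not exist fails at compile time no matter what the if() condition is. Instead, try a regex column specification to select the column whatever its name (see "REGEX Column Specification" in the Hive LanguageManual Select page), and don't forget: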
set hive.support.quoted.identifiers=none;
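A minimal sketch of the regex approach, assuming each table has exactly one of the two columns. `so_sample`, `app_version` and `software_version` come from the question; the pattern is an illustration, not something verified on every Hive version. With quoted identifiers set to none, a backquoted name is treated as a Java regular expression matched against the whole column name:

```sql
SET hive.support.quoted.identifiers=none;

-- The backquoted name is a Java regex over column names, not a literal identifier.
-- It matches app_version in so_sample, and software_version in a table that has
-- that column instead, so the same query text compiles against both schemas.
SELECT `(app|software)_version`
FROM so_sample;
```

The selected column keeps its original name (app_version or software_version). I am not certain Hive accepts an AS alias on a regex column, so check that on your version before building the shared query around a single output name.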

Related

How can you filter Snowflake EXPLAIN USING TABULAR output when it's embedded in the TABLE function? Can you filter it at all?

I have a table named Posts that I would like to count and profile in Snowflake using the current Snowsight UI.
When I run EXPLAIN USING TABULAR, I can return the result set by combining the TABLE, RESULT_SCAN, and LAST_QUERY_ID functions, but any predicate, filter, or column reference seems to fail.
Is there a valid way to do this in Snowflake with the TABLE function, or is there another way to query the output of EXPLAIN USING TABULAR?
-- Works
EXPLAIN using TABULAR SELECT COUNT(*) from Posts;
-- Works
SELECT t.* FROM TABLE(RESULT_SCAN(LAST_QUERY_ID())) as t;
-- Does not work
SELECT t.* FROM TABLE(RESULT_SCAN(LAST_QUERY_ID())) as t where operation = 'GlobalStats';
-- invalid identifier 'OPERATION', the column does not seem recognized.
I tried the third example and expected the predicate to apply to the function output. I don't understand why the filter works on some TABLE() results and not others.
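You need to double-quote the column name. Snowflake resolves unquoted identifiers as uppercase (hence 'OPERATION' in the error), while the EXPLAIN output columns are lowercase:
where "operation" = 'GlobalStats'
From the documentation: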
Note that because the output column names from the DESC USER command
were generated in lowercase, the commands use delimited identifier
notation (double quotes) around the column names in the query to
ensure that the column names in the query match the column names in
the output that was scanned
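A sketch of the full sequence with quoted identifiers. The lowercase `"operation"` column comes from the question; `"partitionsTotal"` and `"partitionsAssigned"` are names I am assuming from Snowflake's documented EXPLAIN output, so compare them with the header row of your own result first:

```sql
EXPLAIN USING TABULAR SELECT COUNT(*) FROM Posts;

-- LAST_QUERY_ID() now refers to the EXPLAIN above.
-- Every column of the scanned result is referenced as a quoted, case-exact identifier.
SELECT t."operation", t."partitionsTotal", t."partitionsAssigned"
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID())) AS t
WHERE t."operation" = 'GlobalStats';
```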

psql column doesn't exist but it does

I am trying to select a single column in my data table using raw SQL in a postgresql database from the psql command line. I am getting an error message that says the column does not exist. Then it gives me a hint to use the exact column that I referenced in the select statement. Here is the query:
SELECT insider_app_ownershipdocument.transactionDate FROM insider_app_ownershipdocument;
Here is the error message:
ERROR: column insider_app_ownershipdocument.transactiondate does not exist
SELECT insider_app_ownershipdocument.transactionDate FROM in...
HINT: Perhaps you meant to reference the column "insider_app_ownershipdocument.transactionDate".
I have no idea why this is not working.
PostgreSQL automatically folds unquoted identifiers to lower case, although it does support case-sensitive names. So
SELECT insider_app_ownershipdocument.transactionDate FROM insider_app_ownershipdocument;
is equivalent to:
SELECT insider_app_ownershipdocument.transactiondate FROM insider_app_ownershipdocument;
You should protect the column name with double quotes to avoid this effect:
SELECT insider_app_ownershipdocument."transactionDate" FROM insider_app_ownershipdocument;
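A minimal reproduction of the folding, using a hypothetical table `demo` with one quoted, mixed-case column:

```sql
-- The quoted name is stored exactly as written, with a capital D.
CREATE TABLE demo ("transactionDate" date);

-- Unquoted: folded to transactiondate, so this fails with "column ... does not exist".
SELECT transactionDate FROM demo;

-- Quoted: matches the stored name exactly, so this works.
SELECT "transactionDate" FROM demo;
```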

Derby: run SQL script by redirecting standard input

I have the following join.sql script:
connect 'jdbc:derby:barra';
show tables;
create table sp500_univ as
select a.*,b.* from (select * from LEFT_SIDE) as a
left join (select * from RIGHT_SIDE) as b
on a.cmp_flg = b.cmp_flg2;
disconnect;
exit;
which I run with the following command:
java org.apache.derby.tools.ij < join.sql
and get the following output:
java org.apache.derby.tools.ij < join.sql
ij version 10.14
ij> ij> TABLE_SCHEM |TABLE_NAME |REMARKS
------------------------------------------------------------------------
APP |LEFT_SIDE |
APP |RIGHT_SIDE |
2 rows selected
ij> > > > ERROR 42X01: Syntax error: Encountered "<EOF>" at line 4, column 25.
Issue the 'help' command for general information on IJ command syntax.
Any unrecognized commands are treated as potential SQL commands and executed directly.
Consult your DBMS server reference documentation for details of the SQL syntax supported by your server.
ij> ij>
If I run this SQL directly from the IJ command line, it works.
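According to Derby's documentation this is not specific to running from a file: the CREATE TABLE ... AS form cannot load data from the SELECT. It requires a trailing WITH NO DATA clause, which is why the parser reports <EOF> right after the ON condition, and the WITH DATA option has not yet been implemented. From Derby's documentation: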
CREATE TABLE ... AS ...
With the alternate form of the CREATE TABLE statement, the column names and/or the
column data types can be specified by providing a query. The columns in the query
result are used as a model for creating the columns in the new table.
If no column names are specified for the new table, then all the columns in the
result of the query expression are used to create same-named columns in the new
table, of the corresponding data type(s). If one or more column names are specified
for the new table, then the same number of columns must be present in the result of
the query expression; the data types of those columns are used for the corresponding
columns of the new table.
The WITH NO DATA clause specifies that the data rows which result from evaluating the
query expression are not used; only the names and data types of the columns in the
query result are used. The WITH NO DATA clause must be specified; in a future
release, Derby may be modified to allow the WITH DATA clause to be provided, which
would indicate that the results of the query expression should be inserted into the
newly-created table. In the current release, however, only the WITH NO DATA form of
the statement is accepted.
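Applied to the script in the question, a sketch of the two-step workaround: create the empty table with WITH NO DATA, then fill it with INSERT ... SELECT using the same query. It assumes LEFT_SIDE and RIGHT_SIDE share no column names; otherwise `a.*, b.*` would produce duplicate column names and the CREATE TABLE would most likely fail.

```sql
connect 'jdbc:derby:barra';

-- Step 1: create sp500_univ from the query's column names and types only.
create table sp500_univ as
select a.*, b.* from (select * from LEFT_SIDE) as a
left join (select * from RIGHT_SIDE) as b
on a.cmp_flg = b.cmp_flg2
with no data;

-- Step 2: load the rows with the same query.
insert into sp500_univ
select a.*, b.* from (select * from LEFT_SIDE) as a
left join (select * from RIGHT_SIDE) as b
on a.cmp_flg = b.cmp_flg2;

disconnect;
exit;
```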

SELECT database.table.column in Hive

Is it possible to use
SELECT DB.TABLE.COLUMN from DB.TABLE
in Hive?
I know it's possible to alias DB.TABLE as follows
SELECT T1.COLUMN FROM DB.TABLE AS T1
But, is there any way in Hive to select a column fully qualified by its database and table name, as shown in the first query above? I've done this before in MySQL but I don't know if there's a way to make Hive work this way.
No, that is not possible in Hive, you will get an exception:
SemanticException [Error 10004]: Line 1:7 Invalid table alias or column reference 'DB': (possible column names are: col)
Your second SELECT statement, which uses a table alias, is valid.
To specify a database, either qualify the table names with database names ("db_name.table_name" starting in Hive 0.7) or issue the USE statement before the query statement (starting in Hive 0.6).
See the Hive language manual: LanguageManual Select.
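The two supported forms, sketched with placeholder names: `db`, `tbl` and `col` are hypothetical and stand in for DB, TABLE and COLUMN (TABLE itself is a reserved word):

```sql
-- Option 1: qualify the table with its database and reach the column through an alias.
SELECT t1.col FROM db.tbl AS t1;

-- Option 2: switch the current database, then qualify the column with the bare table name.
USE db;
SELECT tbl.col FROM tbl;
```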

Inserting new rows into table-1 based on constraints defined on table-2 and table-3

I want to append new rows to table-1 (d:\dl) based on the equality constraint lower(rdl.subdir) = lower(tr.n1), where rdl and tr would be aliases for the f:\rdl and f:\tr tables respectively.
I get a "Function name is missing )." error when running the following command in VFP9:
INSERT INTO d:\dl SELECT * FROM f:\rdl WHERE (select LOWER(subdir)FROM f:\rdl in (select LOWER(n1) FROM f:\tr))
I am using the IN syntax instead of the alias-based equality lower(rdl.subdir) = lower(tr.n1) because I do not know where to define aliases within this command.
In general, the best way to get something like this working is to first make the query work and give you the results you want, and then use it in INSERT.
In general, in SQL commands you assign aliases by putting them after the table name, with or without the keyword AS. In this case, you don't need aliases because the ones you want are the same as the table names and that's the default.
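Following that advice, a sketch of the query run on its own first, with explicit local aliases. `r` and `t` are hypothetical alias names, and the syntax assumes VFP 9: each line ends with `;` to continue the command, and `*` at the start of a line is a comment.

```sql
* Check the result of the SELECT before wrapping it in INSERT.
* The alias follows the table name, with AS (r) or without it (t).
SELECT r.* ;
    FROM f:\rdl AS r ;
    JOIN f:\tr t ;
    ON LOWER(r.subdir) = LOWER(t.n1)
```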
If what you're showing is your exact code and you're running it in VFP, the first problem is that you're missing the continuation character between lines.
You're definitely doing too much work, too. Try this (rdl.* keeps tr's fields out of the inserted rows):
INSERT INTO d:\dl ;
SELECT rdl.* ;
FROM f:\rdl ;
JOIN f:\tr ;
ON LOWER(rdl.subdir) = LOWER(tr.n1)
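If a subdir value can match more than one row in tr, the join inserts that rdl row once per match. A sketch of the IN form the question was aiming for, assuming VFP 9's subquery support accepts LOWER() in the subquery's select list (I believe it does, but have not verified it):

```sql
* Insert each rdl row whose lower-cased subdir appears among the
* lower-cased n1 values in tr; each qualifying rdl row is inserted once.
INSERT INTO d:\dl ;
    SELECT * ;
    FROM f:\rdl ;
    WHERE LOWER(subdir) IN (SELECT LOWER(n1) FROM f:\tr)
```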