I would like to run a Hive query that divides a column from one table by the total sum of a column from another table.
Do I have to join the tables?
The code below generates errors:
Select 100*(Num_files/total_Num_files) from jvros_p2, jvros_p3;
FAILED: Parse Error: line 1:75 mismatched input ',' expecting EOF near 'jvros_p2'
Yes, jvros_p3 is a single row single column table
Num_files is a column in jvros_p2 and total_Num_files is a single value in jvros_p3.
Your older version may be why your notation isn't working. Try this:
SELECT 100 * (Num_files / total_Num_files) FROM jvros_p2 JOIN jvros_p3;
I suspect that if you are eventually able to upgrade to at least 0.13, implicit join notation via comma-separated tables will be supported per HIVE-5558.
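As a rough illustration of the join suggested above, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Hive (an assumption made only so the example is runnable; Hive's parser and numeric rules differ). The table and column names come from the question, the sample values are invented. Because jvros_p3 holds a single row, the cross join simply attaches the total to every row of jvros_p2.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jvros_p2 (Num_files INTEGER)")
conn.execute("CREATE TABLE jvros_p3 (total_Num_files INTEGER)")

# Invented sample data: three rows whose values sum to 100.
conn.executemany("INSERT INTO jvros_p2 VALUES (?)", [(10,), (30,), (60,)])
# jvros_p3 is a single-row, single-column table holding the total.
conn.execute("INSERT INTO jvros_p3 SELECT SUM(Num_files) FROM jvros_p2")

# Explicit join instead of the comma notation. 100.0 forces floating-point
# division in SQLite, which would otherwise do integer division here.
rows = conn.execute(
    "SELECT 100.0 * Num_files / total_Num_files "
    "FROM jvros_p2 CROSS JOIN jvros_p3"
).fetchall()
percentages = [row[0] for row in rows]
print(percentages)
```

The same shape should carry over to Hive with JOIN in place of CROSS JOIN on versions that do not accept the comma notation.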
Related
I have multiple tables with very similar schema except one column, which can have different names.
I want to make some complicated calculations using Hive and would like to have a single piece of code for all tables, with parametrisation where possible. For various reasons, I can't parametrise the queries using a language like Python or Scala, so I decided to go with pure Hive SQL.
I want to conditionally select the appropriate column, but it seems that Hive evaluates all parts of a conditional expression regardless of the condition.
What did I do wrong?
DROP TABLE IF EXISTS `so_sample`;
CREATE TABLE `so_sample` (
`app_version` string
);
SELECT
if (true, app_version, software_version) AS firmware
FROM so_sample
;
Output:
Error: Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 2:25 Invalid table alias or column reference 'software_version': (possible column names are: app_version) (state=42000,code=10004)
Regards
Pawel
Try using a regex column specification to select the column that has different names. See the manual for more information, and don't forget:
set hive.support.quoted.identifiers=none;
SQLite
There are multiple databases, one database for each time period (i.e. quarter). The column headers in each table are the same. For some of the columns, the data is identical between databases (e.g. ID, Name, Address, State, Website, etc). For other columns, the column header is the same but the data in the column is different between databases.
The goal is to:
Select multiple columns from multiple databases, sum each column, and convert the output from 000000000 to $000,000,000,000 by adding three zeros to the output
(currently the data is represented in 000's).
Following is an iteration of queries that work, ending in the queries that fail.
Selecting one column from one database. This query works.
select dep
From AllReports19921231AssetsAndLiabilities;
output
"11005"
"34396"
"42244"
Adding a sum(columnName) method to this same query works.
select sum(dep)
From AllReports19921231AssetsAndLiabilities;
results: 3562807353
Attempting to sum(columnName) from multiple databases causes an error.
select sum(dep)
From AllReports19921231AssetsAndLiabilities,
AllReports19930331AssetsAndLiabilities;
error:
ambiguous column name: dep: select sum(dep)
From AllReports19921231AssetsAndLiabilities,
AllReports19930331AssetsAndLiabilities;
Using dot notation to attach a database to a column. Query works.
select AllReports19921231AssetsAndLiabilities.dep
From AllReports19921231AssetsAndLiabilities;
Output:
"11005"
"34396"
"42244"
However when I attempt to include dot notation and add sum(columnName) to the query, it fails.
select AllReports19921231AssetsAndLiabilities.sum(dep)
From AllReports19921231AssetsAndLiabilities;
I receive this error:
near "(": syntax error: select AllReports19921231AssetsAndLiabilities.sum(
What are correct ways to write this query?
The end goal is to select the same columns (e.g. col1, col2, col3, etc) from multiple databases (Q1, Q2, Q3, Q4).
Sum each column, add three zeros to the output, then convert from 000000000 to $000,000,000,000
Note: There are 103 databases (i.e. one for each time period/quarter).
select AllReports19921231AssetsAndLiabilities.sum(dep),
AllReports19930331AssetsAndLiabilities.sum(dep),
AllReports19930630AssetsAndLiabilities.sum(dep)
From AllReports19921231AssetsAndLiabilities,
AllReports19930331AssetsAndLiabilities,
AllReports19930630AssetsAndLiabilities;
The above query outputs an error:
near "(": syntax error: select AllReports19921231AssetsAndLiabilities.sum(
Your syntax is wrong:
select sum(AllReports19921231AssetsAndLiabilities.dep)
From AllReports19921231AssetsAndLiabilities
Learn to use aliases!
select sum(aal.dep)
From AllReports19921231AssetsAndLiabilities aal;
The query is much easier to write and to read. The table alias (whether the full table name or an abbreviation) is attached to the column name. In SQL, this results in a qualified column reference. The qualification specifies what table it is coming from.
The table alias is not attached to a function, because SQL does not currently allow tables to contain functions.
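A minimal sketch of one way to get a sum per table, assuming Python's built-in sqlite3 module and invented sample rows for the second table. The comma-separated FROM list in the question builds a cross product of the tables, which is why dep becomes ambiguous; one scalar subquery per table avoids that. Multiplying by 1000 and adding the $ and thousands separators is done in Python here, which is a choice of this sketch rather than something the answer prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Table names come from the question; the second table's values are invented.
sample_data = {
    "AllReports19921231AssetsAndLiabilities": [11005, 34396, 42244],
    "AllReports19930331AssetsAndLiabilities": [12000, 35000, 43000],
}
for table, values in sample_data.items():
    conn.execute(f"CREATE TABLE {table} (dep INTEGER)")
    conn.executemany(f"INSERT INTO {table} (dep) VALUES (?)", [(v,) for v in values])

# One scalar subquery per table, each with its own alias-qualified column.
# The aggregate wraps the qualified column: SUM(q.dep), never q.SUM(dep).
subqueries = [f"(SELECT SUM(q.dep) FROM {table} AS q)" for table in sample_data]
sums = conn.execute("SELECT " + ", ".join(subqueries)).fetchone()

# The data is stored in thousands, so scale by 1000 before formatting.
formatted = [f"${total * 1000:,}" for total in sums]
print(formatted)
```

With 103 quarterly tables, building the list of subqueries in a loop like this may be easier to maintain than writing each one by hand.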
I have the following join.sql script:
connect 'jdbc:derby:barra';
show tables;
create table sp500_univ as
select a.*,b.* from (select * from LEFT_SIDE) as a
left join (select * from RIGHT_SIDE) as b
on a.cmp_flg = b.cmp_flg2;
disconnect;
exit;
which I run with the following command:
java org.apache.derby.tools.ij < join.sql
and get the following output:
java org.apache.derby.tools.ij < join.sql
ij version 10.14
ij> ij> TABLE_SCHEM |TABLE_NAME |REMARKS
------------------------------------------------------------------------
APP |LEFT_SIDE |
APP |RIGHT_SIDE |
2 rows selected
ij> > > > ERROR 42X01: Syntax error: Encountered "<EOF>" at line 4, column 25.
Issue the 'help' command for general information on IJ command syntax.
Any unrecognized commands are treated as potential SQL commands and executed directly.
Consult your DBMS server reference documentation for details of the SQL syntax supported by your server.
ij> ij>
If I run this sql right from the command line in IJ it works.
Apparently, when running from a file you can't create a table and load data into it from a select statement. You need to add the WITH NO DATA clause. The WITH DATA option has not yet been implemented. From Derby's documentation:
CREATE TABLE ... AS ...
With the alternate form of the CREATE TABLE statement, the column names and/or the
column data types can be specified by providing a query. The columns in the query
result are used as a model for creating the columns in the new table.
If no column names are specified for the new table, then all the columns in the
result of the query expression are used to create same-named columns in the new
table, of the corresponding data type(s). If one or more column names are specified
for the new table, then the same number of columns must be present in the result of
the query expression; the data types of those columns are used for the corresponding
columns of the new table.
The WITH NO DATA clause specifies that the data rows which result from evaluating the
query expression are not used; only the names and data types of the columns in the
query result are used. The WITH NO DATA clause must be specified; in a future
release, Derby may be modified to allow the WITH DATA clause to be provided, which
would indicate that the results of the query expression should be inserted into the
newly-created table. In the current release, however, only the WITH NO DATA form of
the statement is accepted.
Two tables are identical in terms of table name, column names, datatype and size. These tables are located in separate databases, but I am currently logged in as the hr user.
insert into abc.employees select * from employees where employee_id=100;
I cannot share the original query from the corporate office.
Error starting at line 1 in command:
insert into abc.employees select * from employees where employee_id=100;
Error at Command Line:1 Column:25
Error report:
SQL Error: ORA-00913: too many values
00913. 00000 - "too many values"
*Cause:
*Action:
You should specify column names as below. It's good practice and will probably solve your problem:
insert into abc.employees (col1,col2)
select col1,col2 from employees where employee_id=100;
EDIT:
As you said employees has 112 columns (sic!), try running the select below to compare both tables' columns:
-- Columns of abc.employees that have no same-named column in the second owner's table
select ATC1.*
from ALL_TAB_COLUMNS ATC1
left join ALL_TAB_COLUMNS ATC2 on ATC2.COLUMN_NAME = ATC1.COLUMN_NAME
and ATC2.TABLE_NAME = ATC1.TABLE_NAME
and ATC2.owner = UPPER('2nd owner')
where ATC1.owner = UPPER('abc')
and ATC2.COLUMN_NAME is null
AND ATC1.TABLE_NAME = UPPER('employees')
and then you should update your tables to have the same structure.
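The idea of comparing column lists and then inserting with explicit columns can be sketched as follows. This uses Python's built-in sqlite3 module as a stand-in for Oracle (an assumption; SQLite reports a differently worded error instead of ORA-00913), and the table abc_employees and all column names and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: the source has one more column than the target.
conn.execute("CREATE TABLE employees (employee_id INTEGER, name TEXT, phone TEXT)")
conn.execute("CREATE TABLE abc_employees (employee_id INTEGER, name TEXT)")
conn.execute("INSERT INTO employees VALUES (100, 'King', '555-0100')")

def column_names(table):
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

source_cols = column_names("employees")
target_cols = column_names("abc_employees")
extra = [c for c in source_cols if c not in target_cols]

# SELECT * supplies three values for a two-column target, so this fails.
try:
    conn.execute(
        "INSERT INTO abc_employees SELECT * FROM employees WHERE employee_id = 100"
    )
    star_insert_failed = False
except sqlite3.Error:
    star_insert_failed = True

# Naming the shared columns on both sides makes the insert independent
# of how many columns each table has.
shared = ", ".join(c for c in source_cols if c in target_cols)
conn.execute(
    f"INSERT INTO abc_employees ({shared}) "
    f"SELECT {shared} FROM employees WHERE employee_id = 100"
)
copied = conn.execute("SELECT * FROM abc_employees").fetchall()
print(extra, star_insert_failed, copied)
```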
The 00947 message indicates that the record which you are trying to send to Oracle lacks one or more of the columns which was included at the time the table was created.
The 00913 message indicates that the record which you are trying to send to Oracle includes more columns than were included at the time the table was created.
You just need to check the number of columns and their types in both tables,
i.e. the tables that are involved in the SQL statement.
If you have 112 columns in a single table and you would like to insert data from the source table, you could do:
create table employees as select * from source_employees where employee_id=100;
Or, from SQL*Plus:
copy from source_schema/password insert employees using select * from
source_employees where employee_id=100;
For me this works perfectly:
insert into oehr.employees select * from employees where employee_id=99
I am not sure why you get the error. The error code you have posted indicates that the columns didn't match.
One good approach would be to use the answer @Parodo specified.
This is a bit late, but I have seen this problem occur when you want to insert or delete one row to/from the DB but you supply more than one row or more than one value.
E.g.:
You want to delete one row from the DB with a specific value, such as the id of an item, but you've queried a list of ids. You will then encounter the same exception message.
Regards.
I have two tables. Using the "except" operator I can find the mismatched values between the two tables.
EX:
(select * from test_table3) except (select * from test_table1)
Now I need to find which column the differing value is in. How can I find that column?
I am using SQL Server 2005.
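One possible approach, sketched here under stated assumptions: if both tables share a key column, the rows can be joined on that key and compared column by column. The example uses Python's built-in sqlite3 module as a stand-in for SQL Server; the key column id, the columns name and city, and the sample rows are all hypothetical. Note that a plain <> comparison skips pairs where either value is NULL, so NULLs would need extra handling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: both tables share the key column "id".
for table in ("test_table1", "test_table3"):
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO test_table1 VALUES (?, ?, ?)",
                 [(1, "Ann", "Oslo"), (2, "Bob", "Rome")])
conn.executemany("INSERT INTO test_table3 VALUES (?, ?, ?)",
                 [(1, "Ann", "Oslo"), (2, "Bob", "Paris")])

# EXCEPT only says which whole rows differ.
mismatched_rows = conn.execute(
    "SELECT * FROM test_table3 EXCEPT SELECT * FROM test_table1"
).fetchall()

# One SELECT per non-key column, glued together with UNION ALL, reports
# the key, the column name, and the value from each table.
compare_columns = ["name", "city"]
parts = [
    f"SELECT t3.id, '{col}' AS column_name, t1.{col} AS value_in_table1, "
    f"t3.{col} AS value_in_table3 "
    f"FROM test_table3 AS t3 JOIN test_table1 AS t1 ON t1.id = t3.id "
    f"WHERE t3.{col} <> t1.{col}"
    for col in compare_columns
]
differences = conn.execute(" UNION ALL ".join(parts)).fetchall()
print(mismatched_rows, differences)
```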