Creating a table on Hive - sql

CREATE TABLE DowJones (quarter int, stock string, StockDate date, open double, high double, low double, close double, volume double, percent_change_price double, percent_change_volume_over_last_wk double, previous_weeks_volume double, next_weeks_open double, next_weeks_close double, percent_change_next_weeks_price double, days_to_next_dividend int, percent_return_next_dividend double) row format delimited fields terminated by ‘,’;
Error I get:
Error while compiling statement: FAILED: ParseException line 1:431 mismatched input ',' expecting StringLiteral near 'by' in table row format's field separator [ERROR_STATUS]
New to SQL, so apologies in advance if it's a really obvious fix.

The quotes are the problem: the ‘,’ in your statement uses curly "smart quotes" (typically pasted in from a word processor), and Hive expects a plain ASCII single-quoted string literal, which is exactly why the parser complains "expecting StringLiteral near 'by'". Try like this...
row format delimited fields terminated by ','
Let us know
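For reference, here is the full statement from the question retyped with plain ASCII quotes (same columns, just reformatted one per line):
CREATE TABLE DowJones (
quarter int,
stock string,
StockDate date,
open double,
high double,
low double,
close double,
volume double,
percent_change_price double,
percent_change_volume_over_last_wk double,
previous_weeks_volume double,
next_weeks_open double,
next_weeks_close double,
percent_change_next_weeks_price double,
days_to_next_dividend int,
percent_return_next_dividend double
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';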

Related

Amazon Athena - mismatched input 'STORED'. Expecting: <EOF>

I am just playing around with Athena, and I tried following this link
https://awsfeed.com/whats-new/big-data/use-ml-predictions-over-amazon-dynamodb-data-with-amazon-athena-ml
Create an Athena table with geospatial data of neighborhood boundaries
I followed the code from the sample, plus looking at the picture.
However, this is where I ran into issues, and I had to change the code (shown further below) based on the error messages Athena was giving me. Now the current error is mismatched input 'STORED'. Expecting: <EOF>
FROM WEBSITE -
CREATE EXTERNAL TABLE <table name
"objectid" int,
"nh_code" int,
"nh_name" string,
"shapearea" double,
"shapelen" double,
"bb_west" double,
"bb_south" double,
"bb_east" double,
"bb_north" double,
"shape" string,
"cog_longitude" double,
"cog_latitude" double)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
I kept getting errors around ROW FORMAT and have tweaked it below
WITH (ROW = DELIMITED
,FIELDS = '\t'
,LINES = '\n'
)
STORED INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
The error messages started at ROW, and I've edited that part above. Now the error relates to STORED, so perhaps the changes I made were necessary; I am not sure. I am not very good with Athena, so I was just following the guide and hoping it would work. Any suggestions on what I am doing wrong?
Thanks.
You have a syntax error in your SQL; the first line should be:
CREATE EXTERNAL TABLE table_name (
There is a stray < in your example, table names can't have spaces, and there should be a ( to start the list of columns.
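Putting that together, a corrected statement would look something like this (a sketch: table_name and the LOCATION path are placeholders for your own values, and the double quotes around the column names are dropped since these identifiers don't need quoting; Athena also needs a LOCATION clause pointing at the data in S3):
CREATE EXTERNAL TABLE table_name (
objectid int,
nh_code int,
nh_name string,
shapearea double,
shapelen double,
bb_west double,
bb_south double,
bb_east double,
bb_north double,
shape string,
cog_longitude double,
cog_latitude double
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://your-bucket/neighborhood-data/';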

HiveQL How to convert milliseconds in decimal format to mm:ss.SSS

I'm trying to convert a decimal timestamp in milliseconds to mm:ss.SSS. The method only works for integers, and the output is not what I want:
select from_unixtime(cast(1911.13/1000 as bigint), 'mm:ss.SSS');
1911.13 milliseconds yields the output 00:01.000, which is not correct; it should be 00:01.911.
I have tried casting to double, but received an error:
select from_unixtime(cast(1911.13/1000 as double), 'mm:ss.SSS');
error:[Code: 10014, SQL State: 42000] Error while compiling statement: FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments ''mm:ss.SSS'': No matching method for class org.apache.hadoop.hive.ql.udf.UDFFromUnixTime with (double, string). Possible choices: FUNC(bigint) FUNC(bigint, string) FUNC(int) FUNC(int, string)
Any help will be appreciated!
Anything after the decimal point is milliseconds. So you can split the string into two parts: convert the part before the decimal point using from_unixtime, then concatenate the part after the decimal point to get the whole value.
select cast(from_unixtime(1911, 'mm:ss') as string) || rpad('.130', 4, '0') as col_ms
Please note: a fractional part of .1, .10, or .100 all mean 100 milliseconds, so I used rpad with '0' to pad it out to three digits.
You can write a generic SQL expression as well (x/y stands for your milliseconds value divided by 1000, e.g. 1911.13/1000):
select from_unixtime(cast(substr(cast(x/y as string), 1, instr(cast(x/y as string), '.') - 1) as bigint), 'mm:ss')
    || rpad(substr(cast(x/y as string), instr(cast(x/y as string), '.')), 4, '0') as col_ms;
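Applied to the number from the question (assuming 1911.13/1000 stringifies as '1.91113'), this gives the desired result:
select from_unixtime(cast(substr(cast(1911.13/1000 as string), 1, instr(cast(1911.13/1000 as string), '.') - 1) as bigint), 'mm:ss')
    || rpad(substr(cast(1911.13/1000 as string), instr(cast(1911.13/1000 as string), '.')), 4, '0') as col_ms;
-- the integer part 1 formats as 00:01, the fractional part .91113 is cut by rpad to .911, giving 00:01.911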

Data insert issue

So I had this problem when adding a CSV file to my HQL code and running it on HDFS.
I found that when inserting the data I get NULLs in the partition parts and some columns get lost. I tried many different ways to insert the data, but I still get these weird symbols and missing columns; it is as if it can't read the CSV file.
Here is a picture of the output (screenshot omitted), and here is the code:
CREATE database covid_db;
use covid_db;
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions=500;
set hive.exec.max.dynamic.partitions.pernode=500;
CREATE TABLE IF NOT EXISTS covid_db.covid_staging
(
Country STRING,
Total_Cases DOUBLE,
New_Cases DOUBLE,
Total_Deaths DOUBLE,
New_Deaths DOUBLE,
Total_Recovered DOUBLE,
Active_Cases DOUBLE,
Serious DOUBLE,
Tot_Cases DOUBLE,
Deaths DOUBLE,
Total_Tests DOUBLE,
Tests DOUBLE,
CASES_per_Test DOUBLE,
Death_in_Closed_Cases STRING,
Rank_by_Testing_rate DOUBLE,
Rank_by_Death_rate DOUBLE,
Rank_by_Cases_rate DOUBLE,
Rank_by_Death_of_Closed_Cases DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED by ','
STORED as TEXTFILE
LOCATION '/user/cloudera/ds/COVID_HDFS_LZ'
tblproperties ("skip.header.line.count"="1", "serialization.null.format" = "''");
CREATE EXTERNAL TABLE IF NOT EXISTS covid_db.covid_ds_partitioned
(
Country STRING,
Total_Cases DOUBLE,
New_Cases DOUBLE,
Total_Deaths DOUBLE,
New_Deaths DOUBLE,
Total_Recovered DOUBLE,
Active_Cases DOUBLE,
Serious DOUBLE,
Tot_Cases DOUBLE,
Deaths DOUBLE,
Total_Tests DOUBLE,
Tests DOUBLE,
CASES_per_Test DOUBLE,
Death_in_Closed_Cases STRING,
Rank_by_Testing_rate DOUBLE,
Rank_by_Death_rate DOUBLE,
Rank_by_Cases_rate DOUBLE,
Rank_by_Death_of_Closed_Cases DOUBLE
)
PARTITIONED BY (COUNTRY_NAME STRING)
STORED as TEXTFILE
LOCATION '/user/cloudera/ds/COVID_HDFS_PARTITIONED';
FROM
covid_db.covid_staging
INSERT INTO TABLE covid_db.covid_ds_partitioned PARTITION(COUNTRY_NAME)
SELECT *,Country WHERE Country is not null;
CREATE EXTERNAL TABLE covid_db.covid_final_output
(
TOP_DEATH STRING,
TOP_TEST STRING
)
PARTITIONED BY (COUNTRY_NAME STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED by ','
STORED as TEXTFILE
LOCATION '/user/cloudera/ds/COVID_FINAL_OUTPUT';
1st: You are checking the file contents, but the partition column is not stored in the file; it is stored in the metadata. Also, dynamically created partitions are directories in the format key=value. So the last column you see in the file is not the partition column; it is Rank_by_Death_of_Closed_Cases.
2nd: You did not specify a delimiter in the second table's DDL, nor a NULL format. The default delimiter is '\001' (Ctrl-A). You can specify a delimiter, for example TAB (\t), and the desired NULL format:
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
NULL DEFINED AS ''
STORED AS TEXTFILE;
But it is better not to redefine the NULL format if you want to be able to distinguish NULLs from empty strings.
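For example, the partitioned table from the question with an explicit delimiter would look like this (a sketch; the column list is unchanged from covid_ds_partitioned and abbreviated here):
CREATE EXTERNAL TABLE IF NOT EXISTS covid_db.covid_ds_partitioned
(
Country STRING,
Total_Cases DOUBLE,
-- ... remaining columns exactly as in the question ...
Rank_by_Death_of_Closed_Cases DOUBLE
)
PARTITIONED BY (COUNTRY_NAME STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/user/cloudera/ds/COVID_HDFS_PARTITIONED';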

Parse error while creating External Table

I am new to Hadoop. I am trying to create an EXTERNAL table in Hive.
The following is the query I am using:
CREATE EXTERNAL TABLE stocks (
exchange STRING,
symbol STRING,
ymd STRING,
price_open FLOAT,
price_high FLOAT,
price_low FLOAT,
price_close FLOAT,
volume INT,
price_adj_close FLOAT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 'hdfs:///data/stocks'
I am getting an error:
ParseException cannot recognize input near 'exchange' 'STRING' ',' in column specification
What am I missing? I tried reading the command - I don't think I am missing anything.
Because exchange is a keyword in Hive, you can't use exchange as your column name. If you want to use it anyway, just add backticks around exchange.
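A minimal sketch of the backtick approach (everything else as in the question, columns abbreviated):
CREATE EXTERNAL TABLE stocks (
`exchange` STRING,
symbol STRING,
-- ... remaining columns unchanged ...
price_adj_close FLOAT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 'hdfs:///data/stocks';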
exchange is a reserved keyword in Hive, so try using a different column name in its place:
CREATE TABLE stocks (
exchange1 STRING,
stock_symbol STRING,
stock_date STRING,
stock_price_open DOUBLE,
stock_price_high DOUBLE,
stock_price_low DOUBLE,
stock_price_close DOUBLE,
stock_volume DOUBLE,
stock_price_adj_close DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

how to calculate one column from other columns?

In monetdb I created a table:
create table extractedcatalog(id int, ra double, decl double, x double, y double, z double);
ra and decl are already inserted into the table; now I want to calculate x, y, z from the ra and decl columns. In SQL I executed this:
update extractedcatalog set x = (cos(radians(decl))*cos(radians(ra)));
but I got this response:
connection terminated
Is there any problem with my SQL query?
Thanks very much!
Please indicate your platform and MonetDB version. A "connection terminated" message indicates a loss of contact between your client and the server. You may look into the merovingian log file for further clues about the underlying cause.
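The UPDATE itself looks syntactically fine, so the crash is more likely environmental. For completeness, the usual unit-sphere conversion for all three columns would be (a sketch using the same radians/cos/sin functions as in the question):
UPDATE extractedcatalog SET x = cos(radians(decl)) * cos(radians(ra));
UPDATE extractedcatalog SET y = cos(radians(decl)) * sin(radians(ra));
UPDATE extractedcatalog SET z = sin(radians(decl));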