I ran into a problem when loading a CSV file with my HQL script on HDFS.
When I insert the data, I get NULLs in the partition column and some columns go missing. I have tried many different ways of inserting the data, but I still get strange symbols and lost columns, as if Hive cannot read the CSV file.
Here is the code:
CREATE database covid_db;
use covid_db;
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions=500;
set hive.exec.max.dynamic.partitions.pernode=500;
CREATE TABLE IF NOT EXISTS covid_db.covid_staging
(
Country STRING,
Total_Cases DOUBLE,
New_Cases DOUBLE,
Total_Deaths DOUBLE,
New_Deaths DOUBLE,
Total_Recovered DOUBLE,
Active_Cases DOUBLE,
Serious DOUBLE,
Tot_Cases DOUBLE,
Deaths DOUBLE,
Total_Tests DOUBLE,
Tests DOUBLE,
CASES_per_Test DOUBLE,
Death_in_Closed_Cases STRING,
Rank_by_Testing_rate DOUBLE,
Rank_by_Death_rate DOUBLE,
Rank_by_Cases_rate DOUBLE,
Rank_by_Death_of_Closed_Cases DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED by ','
STORED as TEXTFILE
LOCATION '/user/cloudera/ds/COVID_HDFS_LZ'
tblproperties ("skip.header.line.count"="1", "serialization.null.format" = "''");
CREATE EXTERNAL TABLE IF NOT EXISTS covid_db.covid_ds_partitioned
(
Country STRING,
Total_Cases DOUBLE,
New_Cases DOUBLE,
Total_Deaths DOUBLE,
New_Deaths DOUBLE,
Total_Recovered DOUBLE,
Active_Cases DOUBLE,
Serious DOUBLE,
Tot_Cases DOUBLE,
Deaths DOUBLE,
Total_Tests DOUBLE,
Tests DOUBLE,
CASES_per_Test DOUBLE,
Death_in_Closed_Cases STRING,
Rank_by_Testing_rate DOUBLE,
Rank_by_Death_rate DOUBLE,
Rank_by_Cases_rate DOUBLE,
Rank_by_Death_of_Closed_Cases DOUBLE
)
PARTITIONED BY (COUNTRY_NAME STRING)
STORED as TEXTFILE
LOCATION '/user/cloudera/ds/COVID_HDFS_PARTITIONED';
FROM
covid_db.covid_staging
INSERT INTO TABLE covid_db.covid_ds_partitioned PARTITION(COUNTRY_NAME)
SELECT *,Country WHERE Country is not null;
CREATE EXTERNAL TABLE covid_db.covid_final_output
(
TOP_DEATH STRING,
TOP_TEST STRING
)
PARTITIONED BY (COUNTRY_NAME STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED by ','
STORED as TEXTFILE
LOCATION '/user/cloudera/ds/COVID_FINAL_OUTPUT';
1st: You are checking the file contents, but the partition column is not stored in the file; it is stored in the metadata. Dynamically created partitions are also directories named in the format key=value. So the last column you see in the file is not the partition column; it is Rank_by_Death_of_Closed_Cases.
2nd: You did not specify a delimiter or a NULL format in the second table's DDL. The default delimiter is '\001' (Ctrl-A). You can specify a delimiter, for example TAB (\t), and the desired NULL format:
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
NULL DEFINED AS ''
STORED AS TEXTFILE;
But it is better not to redefine the NULL format if you want to be able to distinguish NULLs from empty strings.
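To check the first point from within Hive rather than by reading the data files, something like the following could be used. It is only a sketch against the tables from the question and assumes the dynamic-partition insert has already run.

```sql
-- Each dynamic partition is registered in the metastore and shows up
-- as a directory named country_name=<value> under the table location.
SHOW PARTITIONS covid_db.covid_ds_partitioned;

-- The partition column is not in the data files, but it can still be
-- selected like a regular column.
SELECT country, country_name, rank_by_death_of_closed_cases
FROM covid_db.covid_ds_partitioned
LIMIT 5;
```

If the partition values look correct here, the data was partitioned as intended and the odd symbols in the raw files are just the default Ctrl-A delimiter.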
Related
I have a table in AWS Athena.
CREATE EXTERNAL TABLE s3inventory(
bucket string,
key string,
version_id string,
is_latest boolean,
is_delete_marker boolean,
size string,
last_modified_date string,
e_tag string,
storage_class string,
is_multipart_uploaded boolean,
replication_status string,
encryption_status string,
object_lock_retain_until_date string,
object_lock_mode string,
object_lock_legal_hold_status string,
intelligent_tiering_access_tier string,
bucket_key_status string,
checksum_algorithm string
) PARTITIONED BY (
dt string
I need to sum the size field, but even after changing the field I still get a conversion error saying that the value cannot be summed:
SYNTAX_ERROR: line 1:8: Unexpected parameters (varchar) for function sum. Expected: sum(double) , sum(real) , sum(bigint) , sum(interval day to second) , sum(interval year to month) , sum(decimal(p,s))
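The error message says that sum received a varchar, which matches the DDL above where size is declared as string. One possible approach, sketched here under the assumption that the values in size are integer byte counts, is to cast the column inside the query. The grouping by dt and the alias total_size are illustrative choices, not part of the original question.

```sql
-- Cast the varchar column to a numeric type before summing.
-- TRY_CAST yields NULL (which SUM ignores) for values that are not
-- valid integers, such as empty strings, instead of failing the query.
SELECT dt,
       SUM(TRY_CAST(size AS bigint)) AS total_size
FROM s3inventory
GROUP BY dt;
```

Using CAST instead of TRY_CAST would make the query fail on the first value that cannot be converted, which can be useful for finding bad rows.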
I am just playing around with Athena, and I tried following this link
https://awsfeed.com/whats-new/big-data/use-ml-predictions-over-amazon-dynamodb-data-with-amazon-athena-ml
Create an Athena table with geospatial data of neighborhood boundaries
I followed the code based on the sample plus looking at the picture.
However, this is where I ran into issues, and I had to change the code as shown below based on the error messages Athena was giving me. The current error is: mismatched input 'STORED'. Expecting: <EOF>
FROM WEBSITE -
CREATE EXTERNAL TABLE <table name
"objectid" int,
"nh_code" int,
"nh_name" string,
"shapearea" double,
"shapelen" double,
"bb_west" double,
"bb_south" double,
"bb_east" double,
"bb_north" double,
"shape" string,
"cog_longitude" double,
"cog_latitude" double)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
I kept getting errors around ROW FORMAT and have tweaked it below
WITH (ROW = DELIMITED
,FIELDS = '\t'
,LINES = '\n'
)
STORED INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
The error messages started at ROW and I've edited above. Now the error code relates to STORED so perhaps the changes I made are necessary. I am not sure. I am not very good with Athena so I was just following the guide and was hoping it would work. Any suggestions on what I am doing wrong?
Thanks.
You have a syntax error in your SQL. The first line should be:
CREATE EXTERNAL TABLE table_name (
There is a stray < in your example, table names can't have spaces, and there should be a ( to start the list of columns.
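Putting that together, the full statement might look like the sketch below. The table name neighborhoods and the S3 path are placeholders invented for illustration, and the column names are written without quotes. The ROW FORMAT clause is the one from the website version, on the assumption that it was valid all along and only the first line caused the errors.

```sql
CREATE EXTERNAL TABLE neighborhoods (   -- hypothetical table name
  objectid int,
  nh_code int,
  nh_name string,
  shapearea double,
  shapelen double,
  bb_west double,
  bb_south double,
  bb_east double,
  bb_north double,
  shape string,
  cog_longitude double,
  cog_latitude double)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://example-bucket/neighborhoods/';  -- placeholder path
```

Athena needs a LOCATION clause for external tables, so the placeholder path has to be replaced with the folder that actually holds the data.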
I have an issue in BigQuery.
Problem statement:
We have a JSON file that we have to upload to a BigQuery table.
When it is loaded, a table is created.
TABLE DDL:
CREATE TABLE vb_reports
(
records ARRAY<STRUCT<rightData BYTES, mismatchEnd INT64, mismatchStart INT64, fields ARRAY, rightRecLength INT64, leftRecLength INT64, segments ARRAY<STRUCT<leftData BYTES, diffDesc STRING, endIndex INT64, startIndex INT64, rightData BYTES, segmentName STRING, segmentID STRING>>, leftData BYTES, recordNumber INT64>>,
iterationID INT64,
jobSummary STRING,
jobName STRING
)
The rightData and leftData columns are created as BYTES columns.
But when we load data from the JSON file, these columns are interpreted sometimes as strings and sometimes as bytes.
When BigQuery interprets these fields as strings, we get the error below:
Provided Schema does not match Table vb_reports Field records.segments.leftData has changed type from BYTES to STRING
We are loading the JSON file from the BigQuery console.
The problem is that even though we have byte data, BigQuery interprets it as a string.
Please help us create columns (leftData and rightData) of type BYTES that accept both bytes and string data.
Thanks in advance.
hayyal
CREATE TABLE DowJones (quarter int, stock string, StockDate date, open double, high double, low double, close double, volume double, percent_change_price double, percent_change_volume_over_last_wk double, previous_weeks_volume double, next_weeks_open double, next_weeks_close double, percent_change_next_weeks_price double, days_to_next_dividend int, percent_return_next_dividend double) row format delimited fields terminated by ‘,’;
Error I get:
Error while compiling statement: FAILED: ParseException line 1:431 mismatched input ',' expecting StringLiteral near 'by' in table row format's field separator [ERROR_STATUS]
New to SQL, so apologies in advance if it's a really obvious fix.
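Try it with straight quotes. The ‘,’ in your statement uses typographic quotes, which Hive does not recognize as a string literal, hence "expecting StringLiteral":
row format delimited fields terminated by ','
Let us know if that works.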
I am new to Hadoop. I am trying to create an EXTERNAL table in Hive.
The following is the query I am using:
CREATE EXTERNAL TABLE stocks (
exchange STRING,
symbol STRING,
ymd STRING,
price_open FLOAT,
price_high FLOAT,
price_low FLOAT,
price_close FLOAT,
volume INT,
price_adj_close FLOAT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 'hdfs:///data/stocks'
I am getting an error:
ParseException cannot recognize input near 'exchange' 'STRING' ',' in column specification.
What am I missing? I tried reading the command - I don't think I am missing anything.
exchange is a keyword in Hive, so you can't use it as a column name as-is. If you want to use it, put backticks around exchange.
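As a sketch of the backtick approach, the statement from the question could be written as follows. It assumes the default Hive setting that allows backtick-quoted identifiers in column names.

```sql
CREATE EXTERNAL TABLE stocks (
  `exchange` STRING,        -- backticks let the keyword act as an identifier
  symbol STRING,
  ymd STRING,
  price_open FLOAT,
  price_high FLOAT,
  price_low FLOAT,
  price_close FLOAT,
  volume INT,
  price_adj_close FLOAT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 'hdfs:///data/stocks';

-- The column has to be quoted the same way whenever it is referenced.
SELECT `exchange`, symbol FROM stocks LIMIT 5;
```

The trade-off is that every later query must also quote the column, which is why renaming it is often the simpler choice.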
Exchange is a reserved keyword in Hive, so try using a different name in its place:
Create table Stocks (exchange1 String, stock_symbol String, stock_date String, stock_price_open double, stock_price_high double, stock_price_low double, stock_price_close double, stock_volume double, stock_price_adj_close double) row format delimited fields terminated by ",";