Not getting the desired output with a Hive query

I have two semicolon-delimited input files, which I loaded into two tables. Both tables contain information about books, and I joined them on the ISBN field. To create the tables, I used the query below, which skips the header and reads semicolon-delimited files:
Create table books (ISBN STRING,BookTitle STRING,BookAuthor STRING,YearOfPublication STRING,Publisher STRING,ImageURLS STRING,ImageURLM STRING,ImageURLL STRING) row format delimited fields terminated by '\;' lines terminated by '\n' tblproperties ("skip.header.line.count"="1");
Now, when I run the query below, I do not get the desired output:
SELECT a.BookRating, COUNT(BookTitle)
FROM Books b
JOIN Rating a
on (b.ISBN = a.ISBN)
WHERE b.YearOfPublication = 2002
GROUP BY a.BookRating;
I am not getting any rows back. The terminal just shows OK after the query finishes. Please let me know what can be done. Thanks in advance.

The field delimiter in your DDL is not specified correctly.
You wrote
row format delimited fields terminated by '\;'
but it should be
row format delimited fields terminated by ';'
Try this and let me know whether it helps.

YearOfPublication is declared as a STRING column, so compare it with a string literal:
WHERE b.YearOfPublication = '2002'
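
As a rough illustration of the two suggestions above, here is a minimal Python sketch (not Hive itself) that reads semicolon-delimited text, skips the header line, and filters on a year that is kept as text. The sample rows and the function name books_from_year are made up for illustration; the real input files are not shown in the question.

```python
import csv
import io

# Hypothetical sample data; the real files are not shown in the question.
SAMPLE = (
    "ISBN;BookTitle;BookAuthor;YearOfPublication\n"
    "0001;First Book;A. Author;2002\n"
    "0002;Second Book;B. Author;1999\n"
    "0003;Third Book;C. Author;2002\n"
)

def books_from_year(text, year):
    """Return titles whose YearOfPublication equals `year` (compared as text)."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    next(reader)  # skip the header line, similar to skip.header.line.count=1
    return [row[1] for row in reader if row[3] == year]

titles_2002 = books_from_year(SAMPLE, "2002")
print(titles_2002)
```

In this Python model the year is a string, so passing the integer 2002 matches nothing, while the string "2002" matches two rows.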

Related

Error Message "HIVE_CURSOR_ERROR: Number of matching groups doesn't match the number of columns..."

I ran this in AWS Athena:
CREATE EXTERNAL TABLE IF NOT EXISTS `nina-nba-database`.`nina_nba_test` (
`Data` string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1',
'input.regex' = 'nina'
) LOCATION 's3://nina-gray/'
TBLPROPERTIES ('has_encrypted_data'='false');
However, when I try to query the table using the statement below:
SELECT * FROM "nina-nba-database"."nina_nba_table" limit 10;
It gives me this error:
HIVE_CURSOR_ERROR: Number of matching groups doesn't match the number of columns
This query ran against the "layla-nba-database" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: b96e4344-5bbe-4eca-9da4-70be11f8e87d
Would anyone be able to help?
The input.regex in your table definition does not look valid. With RegexSerDe, each capturing group in the regex becomes a column, so the number of groups must match the number of columns. If you want to read parts of each line as separate columns, specify a valid regex with one group per column; to understand more, refer to the Regex SerDe examples in the AWS documentation. If your use case is just to read delimited columnar data, you can instead create the table with the proper delimiter. For example, if your data is comma-separated, you can specify the delimiter as
...
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
...
Have a look at this example for more details.
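
The group-versus-column rule behind the error can be checked outside Athena. The sketch below uses Python's re module as a stand-in for the Java regex engine that RegexSerDe uses, which is an assumption that holds for simple patterns like these. The helper name and the pattern (.*) are hypothetical examples, not taken from the question.

```python
import re

def groups_match_columns(pattern, column_count):
    """Return True when the regex has exactly one capturing group per column."""
    return re.compile(pattern).groups == column_count

# The table in the question declares one column (`Data`).
print(groups_match_columns("nina", 1))   # no capturing group for the single column
print(groups_match_columns("(.*)", 1))   # one capturing group for the single column
```

The pattern 'nina' has no capturing groups at all, so it cannot supply a value for the one declared column.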

select row from orc snappy table in hive

I have created a table employee_orc, which is stored in ORC format with Snappy compression.
create table employee_orc(emp_id string, name string)
row format delimited fields terminated by '\t' stored as orc tblproperties("orc.compress"="SNAPPY");
I loaded data into the table using insert statements.
The employee_orc table has 1000 records.
When I run the query below, it shows all the records:
select * from employee_orc;
But when I run the query below, it returns zero results even though the record exists:
select * from employee_orc where emp_id = "EMP456";
Why am I unable to retrieve a single record from the employee_orc table?
The record does not exist, at least not with exactly that value. The values may look the same, but there is some difference. One possibility is spaces at the beginning or end of the string. To check for this, you can use LIKE:
where emp_id like '%EMP456%'
This might help you.
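
To make the whitespace idea concrete, here is a small Python sketch with made-up rows (the actual table contents are unknown). It compares an exact match, a match after trimming, and a substring match that behaves like the LIKE pattern above. In Hive, the trimmed comparison would roughly correspond to applying trim() to emp_id.

```python
# Hypothetical rows: the second emp_id has a trailing space.
ROWS = [("EMP455", "Ann"), ("EMP456 ", "Bob"), ("EMP4567", "Cid")]

def find_exact(rows, emp_id):
    """Rows whose id equals emp_id character for character."""
    return [name for key, name in rows if key == emp_id]

def find_trimmed(rows, emp_id):
    """Rows whose id equals emp_id once surrounding whitespace is removed."""
    return [name for key, name in rows if key.strip() == emp_id]

def find_containing(rows, emp_id):
    """Rows whose id contains emp_id anywhere, like LIKE '%...%'."""
    return [name for key, name in rows if emp_id in key]

print(find_exact(ROWS, "EMP456"))
print(find_trimmed(ROWS, "EMP456"))
print(find_containing(ROWS, "EMP456"))
```

Note that the substring match is broader than the trimmed match: it also returns ids such as EMP4567, so it is best used for diagnosis rather than as the final filter.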
For my part, I don't understand why you want to specify a delimiter for an ORC table. Are you confusing CSV with ORC, or external with managed tables?
I advise you to create your table differently:
create table employee_orc(emp_id string, name string)
stored as ORC
TBLPROPERTIES (
"orc.compress"="ZLIB");

Delimiter is not shown in table description on hive

When I run SHOW CREATE TABLE, I see the following delimiter:
ROW FORMAT DELIMITED FIELDS TERMINATED BY '
and when I run DESCRIBE EXTENDED table_name, I see:
parameters:{serialization.format, field.delim})
So is there a way to identify the delimiter of an existing table that shows the above?
ROW FORMAT DELIMITED - this line tells Hive that the data is delimited text, where each new line in a file is a new row.
FIELDS TERMINATED BY - this parameter tells Hive which character separates the fields within each row. If none is set, the default is used, which is Ctrl-A (\001).
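
One possible reason the delimiter looks empty in the output is that it is a non-printable character such as Ctrl-A; that is an assumption here, since the question does not show the raw bytes. The Python sketch below shows how such a character can be made visible and how a line splits on it. The helper name octal_escape and the sample line are made up for illustration.

```python
DEFAULT_FIELD_DELIM = "\x01"  # Ctrl-A, written as \001 in Hive DDL

def octal_escape(ch):
    """Render a single character as a backslash plus three octal digits."""
    return "\\" + format(ord(ch), "03o")

# Hypothetical line using the default delimiter.
line = "1" + DEFAULT_FIELD_DELIM + "alice"

print(repr(line))                         # repr makes the hidden byte visible
print(octal_escape(DEFAULT_FIELD_DELIM))  # octal form of Ctrl-A
print(line.split(DEFAULT_FIELD_DELIM))    # the two fields
```

Printing the line directly would show the two fields running together, which matches how an unprintable delimiter can seem to be missing from a table description.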

Hive MAP isn't reading input correctly

I am trying to create a table on this Mahout recommender system output data on S3.
703209355938578 [18519:1.5216354,18468:1.5127649,17962:1.5094717,18317:1.5075916]
828667482548563 [18070:1.0,18641:1.0,18632:1.0,18770:1.0,17814:1.0,18095:1.0]
1705358040772485 [18783:1.0,17944:1.0,18632:1.0,18770:1.0,18914:1.0,18386:1.0]
with this schema,
CREATE external table user_ad_reco (
userid bigint,
reco MAP<bigint , double>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':'
LOCATION
's3://xxxxx/data/RS/output/m05/';
But when I read the data back with Hive,
hive >
select * from user_ad_reco limit 10;
it gives output like this:
703209355938578 {18519:1.5216354,18468:1.5127649,17962:null}
828667482548563 {18070:1.0,18641:1.0,18632:1.0,18770:1.0,17814:null}
1705358040772485 {18783:1.0,17944:1.0,18632:1.0,18770:1.0,18914:null}
So the last key:value pair of the map input is missing from the output, and the last pair shown has a null value.
Can anyone help regarding this?
Reason for the nulls: input data with brackets is not parsed properly. Because of the brackets, the last map entry 1.5075916 is read as 1.5075916], which does not match the double type, so it becomes null.
703209355938578 [ 18519:1.5216354,18468:1.5127649,17962:1.5094717,18317:1.5075916 ]
Input data without brackets works cleanly (tested):
703209355938578 18519:1.5216354,18468:1.5127649,17962:1.5094717,18317:1.5075916
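
The type mismatch can be illustrated with a simplified Python model of delimited map parsing. This is only a sketch of the idea, not Hive's actual parser, and it does not reproduce Hive's exact output; the function names are made up. Items are split on ',' and ':' as in the table definition, and any key or value that fails to convert becomes None.

```python
def to_number(text, cast):
    """Convert text with `cast`, returning None when the text does not parse."""
    try:
        return cast(text)
    except ValueError:
        return None

def parse_map(text, item_delim=",", key_delim=":"):
    """Parse 'k:v,k:v' into a dict of int keys and float values."""
    result = {}
    for item in text.split(item_delim):
        key_text, _, value_text = item.partition(key_delim)
        result[to_number(key_text, int)] = to_number(value_text, float)
    return result

with_brackets = parse_map("[18519:1.5216354,18468:1.5127649,18317:1.5075916]")
without_brackets = parse_map("18519:1.5216354,18468:1.5127649,18317:1.5075916")

print(with_brackets)     # '1.5075916]' is not a valid float, so that value is None
print(without_brackets)  # every key and value converts
```

In this model the leading bracket also spoils the first key, because '[18519' is not a valid integer.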
Thanks #ramisetty. I did it in an indirect way: first I removed the two brackets, [ and ], from the map string, then created the schema on the string without the brackets.
CREATE EXTERNAL TABLE user_ad_reco_serde (
userid STRING,
reco_map STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "([0-9]+)\\s\\[([^]]+)]"
)
STORED AS TEXTFILE
LOCATION
's3://xxxxxx/data/RS/output/6m/2014-01-2014-05/';
CREATE external table user_ad_reco_plain(
userid bigint,
reco string)
LOCATION
's3://xxxxx/data/RS/output/6m_plain/2014-01-2014-05/';
CREATE external table user_ad_reco (
userid bigint,
reco MAP<bigint , double>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':'
LOCATION
's3://xxxxxx/data/RS/output/6m_plain/2014-01-2014-05/';
There might be some simpler way.
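
For comparison, the bracket-stripping step can also be sketched outside Hive. The Python snippet below reuses the same regex as the RegexSerDe table above and joins the two captured parts with the \001 delimiter that the final table expects. It assumes the user id and the bracketed list are separated by a single whitespace character such as a tab; the function name strip_brackets is made up.

```python
import re

# Same pattern as input.regex above: digits, one whitespace, bracketed list.
LINE_PATTERN = re.compile(r"([0-9]+)\s\[([^]]+)]")

def strip_brackets(line):
    """Return 'userid<\\x01>map-string' without brackets, or None if no match."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return None
    userid, reco = match.groups()
    return userid + "\x01" + reco

sample = "703209355938578\t[18519:1.5216354,18468:1.5127649]"
print(repr(strip_brackets(sample)))
```

Lines that do not fit the pattern return None here, so they can be counted or logged instead of silently turning into null fields.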

Values inserted in hive table with double quotes for string from csv file

I am loading a CSV file into a Hive table.
About the CSV file: column values are enclosed in double quotes and separated by commas.
Sample records from the CSV:
"4","good"
"3","not bad"
"1","very worst"
I created a Hive table with the following statement:
create external table currys(review_rating string, review_comment string) row format delimited fields terminated by ',';
The table was created.
I then loaded the data using the load data local inpath command, and it was successful.
When I query the table,
select * from currys;
The result is :
"4" "good"
"3" "not bad"
"1" "very worst"
instead of
4 good
3 not bad
1 very worst
The records are inserted with double quotes, which should not be there.
Please let me know how to get rid of these double quotes. Any help or guidance is highly appreciated.
Thanks beforehand!
Are you using a SerDe? If so, you can write a regex in the SERDEPROPERTIES to remove the quotes.
Or you can use the csv-serde from here and define the quote character.
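
The difference between plain delimiter splitting and quote-aware parsing can be seen in a short Python sketch. It uses the sample records from the question and Python's csv module as a stand-in for a quote-aware SerDe; it is an illustration of the idea, not of any particular SerDe's implementation.

```python
import csv
import io

SAMPLE = '"4","good"\n"3","not bad"\n"1","very worst"\n'

def naive_split(text):
    """Split each line on commas only; the quotes stay in the values."""
    return [line.split(",") for line in text.splitlines()]

def quote_aware_split(text):
    """Parse with a declared quote character; the quotes are removed."""
    return list(csv.reader(io.StringIO(text), delimiter=",", quotechar='"'))

print(naive_split(SAMPLE)[0])
print(quote_aware_split(SAMPLE)[0])
```

A plain delimited row format behaves like the first function, which is why the quotes end up stored as part of each string.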