apache hive loads null values instead of intergers - apache

I am new to apache hive and was running queries on sample data which is saved in a csv file as below:
0195153448;"Classical Mythology";"Mark P. O. Morford";"2002";"Oxford University Press";"//images.amazon.com/images/P/0195153448.01.THUMBZZZ.jpg";"http://images.amazon.com/images/P/0195153448.01.MZZZZZZZ.jpg";"images.amazon.com/images/P/0195153448.01.LZZZZZZZ.jpg"
and the table which i created is of form
hive> describe book;
OK
isbn bigint
title string
author string
year string
publ string
img1 string
img2 string
img3 string
Time taken: 0.085 seconds, Fetched: 8 row(s)
and the script which I used to create the table is:
create table book(isbn int,title string,author string, year string,publ string,img1 string,img2 string,img3 string) row format delimited fields terminated by '\;' lines terminated by '\n' location 'path';
When I try to retrieve the data from the table by using the following query:
select *from book limit 1;
I get the following result:
NULL "Classical Mythology" "Mark P. O. Morford" "2002" "Oxford University Press" "http://images.amazon.com/images/P/0195153448.01.THUMBZZZ.jpg" "images.amazon.com/images/P/0195153448.01.MZZZZZZZ.jpg" "images.amazon.com/images/P/0195153448.01.LZZZZZZZ.jpg"
Even though I specify the first column type as int or bigint the data into the table is getting loaded as NULL.
I tried searching on the internet and could figure out that I have to specify the row delimiter. I used that too but no change in the data from the table.
Is there anything that I am making a mistake... Please help.

Related

POSTGRESQL invalid input syntax for type integer

I am a novice user of PostgresQL. I found out in Internet interesting DB, which consists of 3 tables. Every of this table is in CSV format. However I could smoothly copy the first one to my sql database. Right now I have the following problem:
ERROR: invalid input syntax for type integer: "43950100,43950010,TUR,GUNES Senol (TUR),S,1,"RUSTU ",GK,"
CONTEXT: COPY worldcupsplayers, line 25645, column roundid: "43950100,43950010,TUR,GUNES Senol (TUR),S,1,"RUSTU ",GK,"
Every row in a table has the same structure, as follows:
43950100,43950010,TUR,GUNES Senol (TUR),S,1,"RUSTU ",GK,
I am wondering why in such row I have a problem (line 25645), while it looks as other ones.
I am trying to copy it with this command:
COPY worldcupsplayers(RoundID, MatchID, team_initials, coach_name, line_up, shirt_number, player_name, position, event)
FROM 'C:\Users\Public\worldcup\worldcupplayers.csv'
DELIMITER ','
CSV HEADER;
[table structure]
UPDATE: rows above and below
25643 - 43950100,43950008,SVN,KATANEC Srecko (SVN),N,23,BULAJIC,,
25644 - 43950100,43950010,BRA,SCOLARI Luiz Felipe (BRA),S,1,MARCOS,GK,
25645 - 43950100,43950010,TUR,GUNES Senol (TUR),S,1,"RUSTU ",GK,
25646 - 43950100,43950010,BRA,SCOLARI Luiz Felipe (BRA),S,2,CAFU,C,
As u may see they are all the same, with the commas every row has the same structure.

PgSQL insert into select after cast

I have a table with many columns stored as text. These columns have information which i am planning to cast and copy the data into a fresh table with the same column names but with correct data types ( float8 )
I tested the SELECT with CAST operator "::" and it works fine. All the columns are being converted as you can see in the picture
However , when I uncomment the INSERT statement ( above it ) to start writing in the target table, it throws an error. The target table has identical column names and only float8 column types.
The expression from the source table is indeed of type text but i am using the cast operator so why does it not work like before when only running the SELECT statement?
My query below:
INSERT INTO "PM"."new_VM_gcell_evolution_hourly_BSC"
(select
"CR3120:Channel_Assignment_Failures_All_Channels_Busy_or_Channel"::float8,
"DL_Mean_Quality"::float8,
"A312Ca:Failed_Assignments_during_MTC_on_the_A_Interface_Includi"::float8,
"nsp_Urban_TA_4number"::float8,
"A3129C:Failed_Assignments_First_Assignment,_Assignment_Timed_Ou"::float8,
"R3120C:Channel_Assignment_Failures_All_Channels_Busy_or_Channel"::float8,
"TSs_Interf_B5"::float8,
"R3120D:Channel_Assignment_Failures_All_Channels_Busy_or_Channel"::float8,
"HO_Out_Internal_Succ_Rate"::float8,
"CH_Req_Protocol_Undefinednbnumber"::float8,
"UL_Drop_Congnumber"::float8,
"Timing_Adv"::float8,
"DL_Qual"::float8,
"A3100C:Assignment_Requests_TCHH_Only"::float8,
"MS_to_BTS_max_distance"::float8,
"TSs_Interf_B2"::float8,
"UL_Qual"::float8,
"UL_Drop_EDGE_Rate"::float8,
"Avg_BTS_Power_Level_AMR"::float8,
"DL_Drop_N3105number"::float8,
"nsp_Urban_TA_2_R"::float8,
"CH_Req_CallReestabnbnumber"::float8,
"TCH_Drops_cause_Timing_Advance"::float8,
"TCH_Drops_per_Erlang"::float8,
"UL_Drop_Flushnumber"::float8,
"A312F:Number_of_Assignment_Failures_No_Abis_Resource_Available"::float8,
"UL_Drop_GPRS_Rate"::float8,
"nsp_Urban_TA_3number"::float8,
"Call_Drop_rate"::float8,
"DL_Fail_Assingnumber"::float8,
"SDCCH_Radio_Failures"::float8,
"HO_Cmd_UL_Qual"::float8,
"UL/DL_RxQualnumber"::float8,
"UL/DL_RxLevnumber"::float8,
"UL_Fail_EDGE_Rate"::float8,
"Call_Drop_Total"::float8,
"HO_Out_Succ_Rate"::float8,
"Date"::text,
"A3127E:Failed_Assignments_during_Call_Reestablishment_on_the_Um"::float8,
"UL_Drop_N3103number"::float8,
"Configured_TCH"::float8,
"SDCCH_Drop_Call_Rate_only_LU"::float8,
"A312M:Failed_Assignments_Reconnection_to_Old_Channels,_No_Chann"::float8,
"SDCCH_Blocking_Rate"::float8,
"TCH_Drops_cause_DL_FER"::float8,
"TCH_Fail_rate"::float8,
"DL_Drop_Preemnumber"::float8,
"FR_Traffic_totalErl"::float8,
"DL_Congnumber"::float8,
"TCH_Assign_Unsuccnumber"::float8,
"nsp_Urban_TA_0_R"::float8,
"DL_Drop_GPRS_Rate"::float8,
"HR_TraficErl"::float8,
"SDCCH_Cong_Rate"::float8,
"Abis_and_Ater_Interface_Anlysistimes"::float8,
"A3100B:Assignment_Requests_TCHF_Only"::float8,
"AS4300D:Mean_Uplink_Level_during_Radio_Link_Failure_SDCCHdB"::float8,
"nsp_Urban_TA_89number"::float8,
"TCH_Drops_cause_DownQual"::float8,
"SDCCH_Drop_RxLev"::float8,
"Max_UL_Pwr_Duration"::float8,
"nsp_Channel_Req_LUnumber"::float8,
"TCH_HR_Initialy_Config"::float8,
"HO_Inc_Cong_Rate"::float8,
"DL_RxLev_avgdBm"::float8,
"TCH_Assign_Requestsnumber"::float8,
"A312K:Failed_Assignments_First_Assignment,_No_Channel_Available"::float8,
"Call_Drop_TCH_Quality_Rate"::float8,
"AMR_HR_TraficErl"::float8,
"nsp_Urban_TA_67_R"::float8,
"UL_Fail_MSnumber"::float8,
"Time"::text,
"TCH_Drops_cause_UpLevel"::float8,
"AMR_FR_TraficErl"::float8,
"TCH_FR_Initialy_Config"::float8,
"UL_Drop_N3101number"::float8,
"DL_FERnumber"::float8,
"SDCCH_Drop_Call_Rate_wo_LU"::float8,
"DL_Mean_StrengthdB"::float8,
"Call_Drop_Abis"::float8,
"SDCCH_Initialy_Confignumber"::float8,
"Call_Drop_TCH_RxLev_Rate"::float8,
"TA_Meannumber"::float8,
"nsp_Urban_TA_67number"::float8,
"AS3240NA:Average_MS_Power_Level_of_NonAMR_Call"::float8,
"Load"::float8,
"A3129Q:Failed_Assignments_Reconnection_to_Old_Channels,_Timer_E"::float8,
"CH_Request_11_bit"::float8,
"AS4340D:Mean_TA_during_Radio_Link_Failure_SDCCH"::float8,
"S4210A:Uplink_Interference_Indication_Messages_SDCCH"::float8,
"UL_FERnumber"::float8,
"A3129P:Failed_Assignments_Reconnection_to_Old_Channels,_Timer_E"::float8,
"K3003A:Successful_SDCCH_Seizures_Call_Type"::float8,
"CH_Req_LAUnbnumber"::float8,
"SDCCH_Drop_Rate"::float8,
"GBSC"::text,
"CS_CSSR_with_SDCCH_blocks_wo_LU"::float8,
"A312Aa:Failed_Assignments_during_MOC_on_the_A_Interface_Includi"::float8,
"Avg_BTS_Power_Level"::float8,
"R3120A:Channel_Assignment_Failures_All_Channels_Busy_or_Channel"::float8,
"A312S:Failed_Assignments_Signaling_Channel"::float8,
"SD_Fail_MoC"::float8,
"UL_RxQual_avg"::float8,
"Call_Drop_HO"::float8,
"AS4330D:Mean_Downlink_Quality_during_Radio_Link_Failure_SDCCH"::float8,
"CH_Req_Emerg_Callsnbnumber"::float8,
"UL_RxQualnumber"::float8,
"UL_Fail_OtherCausenumber"::float8,
"A312Da:Failed_Assignments_during_Emergency_Call_on_the_A_Interf"::float8,
"UL_Drop_Preemnumber"::float8,
"TCH_Drops_cause_UpDown_FER"::float8,
"AS4320D:Mean_Downlink_Level_during_Radio_Link_Failure_SDCCHdB"::float8,
"A3129O:Failed_Assignments_First_Assignment,_Directed_Retry_Time"::float8,
"Max_DL_Pwr_Duration"::float8,
"UL_Drop_Suspendnumber"::float8,
"TCH_Assign_Congestionnumber"::float8,
"Transmission_Resource_Analysistimes"::float8,
"HO_Out_RxQual_rate"::float8,
"nsp_Urban_TA_1013number"::float8,
"A3129I:Failed_Assignments_Invalid_State"::float8,
"DL_Fail_OtherCausenumber"::float8,
"CR3129:Channel_Assignment_Failures_All_Channels_Busy_or_Channel"::float8,
"HO_Inc_Successnumber"::float8,
"A3129B:Failed_Assignments_First_Assignment,_Terrestrial_Resourc"::float8,
"Trafic_totalErl"::float8,
"nsp_urban_TA_1463all_R"::float8,
"HO_Out_RxQualnumber"::float8,
"DL_RxQualnumber"::float8,
"HO_Inc_Succ_Rate"::float8,
"SDCCH_Fail_Rate"::float8,
"SDCCH_Non_Radio_Drops"::float8,
"nsp_Urban_TA_0number"::float8,
"A3129E:Failed_Assignments_CIC_Unavailable"::float8,
"nsp_CH_Request_PSnumber"::float8,
"TCH_Assign_Unsucc_rate"::float8,
"Call_Drop_Radio"::float8,
"EFR_TraficErl"::float8,
"A312L:Failed_Assignments_Reconnection_to_Old_Channels,_No_Chann"::float8,
"TCH_Drops_cause_DownLevel"::float8,
"nsp_Urban_TA_5number"::float8,
"Abnormal_Terminals_Analysistimes"::float8,
"HO_Inc_Congnumber"::float8,
"CSSR_Rate"::float8,
"M3020C:Call_Drops_on_SDCCHQuality"::float8,
"RH333:Handover_Drop_Rate_of_TCH"::float8,
"UL_Drop_EDGE_Abisnumber"::float8,
"A3129J:Failed_Assignments_Invalid_Message"::float8,
"TCH_Availability"::float8,
"UL_Level"::float8,
"nsp_Urban_TA_4_R"::float8,
"UL_Drop_OtherCausenumber"::float8,
"HR_Traffic_Rate"::float8,
"DL_Level"::float8,
"CH_Request_8_bit"::float8,
"UL_Fail_MS_Assingnumber"::float8,
"DL_Fail_BSC_Commandnumber"::float8,
"A3129R:Failed_Assignments_Reconnection_to_Old_Channels,_Reconne"::float8,
"UL_Fail_BSC_Commandnumber"::float8,
"DL_Drop_Suspendnumber"::float8,
"nsp_urban_TA_1463allnumber"::float8,
"Direct_Retry"::float8,
"nsp_Urban_TA_89_R"::float8,
"CH_Req_PSnbnumber"::float8,
"A3129N:Failed_Assignments_Reconnection_to_Old_Channels,_Terrest"::float8,
"A312A:Failed_Assignments_First_Assignment,_No_Channel_Available"::float8,
"nsp_Urban_TA_2number"::float8,
"DL_RxQual_avg"::float8,
"UL_Mean_StrengthdB"::float8,
"DL_Drop_EDGE_Rate"::float8,
"A3129T:Failed_Assignments_No_Ater_Resource_Available"::float8,
"TSs_Interf_B1"::float8,
"A3129D:Failed_Assignments_Reconnection_to_Old_Channels,_Reconne"::float8,
"nsp_Channel_Req_MOCnumber"::float8,
"UL_Fail_GPRS_Rate"::float8,
"S4210B:Downlink_Interference_Indication_Messages_SDCCH"::float8,
"R3120E:Channel_Assignment_Failures_All_Channels_Busy_or_Channel"::float8,
"HO_Inc_Failnumber"::float8,
"DL_Drop_Flushnumber"::float8,
"S3655:Number_of_configured_TRXs_in_a_cell"::float8,
"TCH_Drops_cause_UpDown_Qual"::float8,
"CR3005:Number_of_Initially_Configured_Channels_Static_PDTCH_Sup"::float8,
"UL/DL_FERnumber"::float8,
"SDCCH_Blocking_"::float8,
"HO_Out_InterRAT_Succ_Rate"::text,
"CM30E:Call_Drops_on_SDCCH_Location_Updating"::float8,
"AS4310D:Mean_Uplink_Quality_during_Radio_Link_Failure_SDCCH"::float8,
"UL_Congnumber"::float8,
"TAnumber"::float8,
"Configured_SDCCH"::float8,
"Othernumber"::float8,
"TCH_Assign_Fail_Radionumber"::float8,
"HO_Out_Internal_Req_Nb"::float8,
"SDCCH_Cong_Nbnumber"::float8,
"UL_Mean_Quality"::float8,
"HO_Out_External_Req_Nb"::float8,
"nsp_Urban_TA_1013_R"::float8,
"SDCCH_Drops_wo_LU"::float8,
"Call_Drop_TCH_Qualitynumber"::float8,
"A3100A:Assignment_Requests_Signaling_Channel_TCH"::float8,
"Better_Cell"::float8,
"Call_Drop_TCH_RxLevnumber"::float8,
"nsp_Channel_Req_MTCnumber"::float8,
"DL_Fail_EDGE_Rate"::float8,
"ZTR104B:Call_Drop_Rate_on_SDCCH_Call_Type"::float8,
"Call_Drop_no_MR"::float8,
"DL_Drop_Congnumber"::float8,
"HO_Inc_InterRAT_reqnumber"::float8,
"TCH_Assign_Cong_rate"::float8,
"HO_Inc_Fail_Rate"::float8,
"TCH_Assign_Fail_sp_ver_unav"::float8,
"HO_Inc_InterRAT_unsucc"::float8,
"A3170A:Number_of_Completed_TCH_Assignments_CSFB_MOC"::float8,
"HO_Inc_Unsucc_Rate"::float8,
"TSs_Interf_B4"::float8,
"A3100K:Assignment_Requests_Signaling_Channel_SDCCH"::float8,
"TCH_Drops_cause_UpDown_Level"::float8,
"SDCCH_Fail_Nbnumber"::float8,
"TCH_Non_Radio_Drops"::float8,
"DL_Fail_MSnumber"::float8,
"HO_Cmd_DL_Qual"::float8,
"DL_Drop_OtherCausenumber"::float8,
"SMS_on_SDCCH"::float8,
"FERnumber"::float8,
"HO_Inc_Reqnumber"::float8,
"nsp_Urban_TA_3_R"::float8,
"M3020D:Call_Drops_on_SDCCHOther"::float8,
"HO_Out_External_Succ_Rate"::float8,
"M3020A:Call_Drops_on_SDCCHTA"::float8,
"TCH_Assign_Fail_Radio_rate"::float8,
"CR3001:Number_of_Initially_Configured_Channels_Static_PDCH"::float8,
"HR_Traffic_totalErl"::float8,
"CH_Req_MTCnbnumber"::float8,
"SDCCH_Drop_RxQual"::float8,
"UL_RxLev_avgdBm"::float8,
"CH_Req_MOCnbnumber"::float8,
"SDCCH_Dropnumber"::float8,
"A3169A:Failed_Assignments_Um_Cause"::float8,
"A3129H:Failed_Assignments_Clear_Commands_Sent_By_MSC"::float8,
"TCH_Radio_Drops"::float8,
"HO_Out_InterRAT_Req_Nb"::float8,
"HO_Inc_InterRAT_succnumber"::float8,
"DL_Drop_Abisnumber"::float8,
"TCH_Drops_cause_UL_FER"::float8,
"UL_RxLevelnumber"::float8,
"nsp_CH_Request_CSnumber"::float8,
"A3170B:Number_of_Completed_TCH_Assignments_CSFB_MTC"::float8,
"A3129F:Failed_Assignments_CIC_Allocated"::float8,
"DL_Fail_GPRS_Rate"::float8,
"M3020B:Call_Drops_on_SDCCHReceived_Level"::float8,
"nsp_Urban_TA_1number"::float8,
"SDCCH_Availability"::float8,
"Call_Drop_forced_HO"::float8,
"SDCCH_Drop_Others"::float8,
"Call_Drop_Equip."::float8,
"CH_Req_PS"::float8,
"SDCCH_Drop_Call_Rate"::float8,
"TCH_Drops_cause_UpQual"::float8,
"CH_Req_LMU_Reservednbnumber"::float8,
"DL_RxLevnumber"::float8,
"nsp_Urban_TA_1_R"::float8,
"A312Ea:Failed_Assignments_during_Call_Reestablishment_on_the_A_"::float8,
"TSs_Interf_B3"::float8,
"SDCCH_Drop_TimingAdvance"::float8,
"Avg_BTS_Power_Level_NAMR"::float8,
"TCH_Drops_cause_Other"::float8,
"FR_TraficErl"::float8,
"Avg_MS_Power_Level"::float8,
"HO_Outgoing_Requestsnumber"::float8,
"A3129G:Failed_Assignments_A_Interface_Failure"::float8,
"CH_Req_LAU"::float8,
"nsp_Urban_TA_5_R"::float8,
"HO_Inc_Unsuccessnumber"::float8,
"AS3240A:Average_MS_Power_Level_of_AMR_Call"::float8
from "PM"."VM_gcell_evolution_hourly_BSC_recovered")
It says "column is of type double but expression is of type text" so what's really happening is that it's trying to insert one of the expressions where you cast to ::text into a column that is of type double.
If the problem was converting from text to double, you'd get a different message. Besides, the SELECT worked fine, which means it didn't encounter any text data that it couldn't convert to double.
Since you don't specify the target table columns in your INSERT, and you have so many of them, most likely you missed a column or got the order wrong.
Honestly if the table has 270 columns and they have the same name in the source and destination tables, you should really generate the query using something like python from a list of columns, that will be faster than proofreading the 270 lines...

import a txt file with 2 columns into different columns in SQL Server Management Studio

I have a txt file containing numerous items in the following format
DBSERVER: HKSER
DBREPLICAID: 51376694590
DBPATH: redirect.nsf
DBTITLE: Redirect AP
DATETIME: 09.03.2015 09:44:21 AM
READS: 1
Adds: 0
Updates: 0
Deletes: 0
DBSERVER: HKSER
DBREPLICAID: 21425584590
DBPATH: redirect.nsf
DBTITLE: Redirect AP
DATETIME: 08.03.2015 09:50:20 PM
READS: 2
Adds: 0
Updates: 0
Deletes: 0
.
.
.
.
please see the source capture here
I would like to import the txt file into the following format in SQL
1st column 2nd column 3rd column 4th column 5th column .....
DBSERVER DBREPLICAID DBPATH DBTITLE DATETIME ......
HKSER 51376694590 redirect.nsf Redirect AP 09.03.2015 09:44:21 AM
HKSER 21425584590 redirect.nsf Redirect AP 08.03.2015 01:08:07 AM
please see the output capture here
Thanks a lot!
You can dump that file into a temporary table, with just a single text column. Once imported, you loop through that table using a cursor, storing into variables the content, and every 10 records inserting a new row to the real target table.
Not the most elegant solution, but it's simple and it will do the job.
Using Bulk insert you can insert these headers and data in two different columns and then using dynamic sql query, you can create a table and insert data as required.
For Something like this I'd probably use SSIS.
The idea is to create a Script Component (As a Transformation)
You'll need to manually define your Output cols (Eg DBSERVER String (100))
The Src is your File (read Normally)
The Idea is that you build your rows line by line then add the full row to the Output Buffer.
Eg
Output0Buffer.AddRow();
Then write the rows to your Dest.
If all files have a common format then you can wrap the whole thiing in a for each loop

Hive simple Regular expression

I am trying to check if all data within in a column is having a valid date.
create table dates (tm string, dt string) row format delimited fields terminated by '\t'
date.txt(sample data)
20181205 15
20171023 23
20170516 16
load data local inpath 'dates.txt' overwrite into table dates;
create temporary macro isitDate(s string)
case when regexp_extract(s,'((0[1-9]|[12][0-9]|3[01])',0) = ''
then false
else true
end;
select * from dates where isitDate(dt);
But select statement is giving below error-
Failed with exception
java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException:
Unable to execute method public java.lang.String
org.apache.hadoop.hive.ql.udf.UDFRegExpExtract.evaluate(java.lang.String,java.lang.String,java.lang.Integer)
on object org.apache.hadoop.hive.ql.udf.UDFRegExpExtract#66b45e1e of
class org.apache.hadoop.hive.ql.udf.UDFRegExpExtract with arguments
{15:java.lang.String, ((0[1-9]|[12][0-9]|3[01]):java.lang.String,
0:java.lang.Integer} of size 3
Is there something wrong with my regular expression.
made a stupid mistake, there is one extra opening bracket in macro

extract data in hive

I have data in a hive table column as below:
customer reason=Other#customer reason free text=Space#customer type=Regular#customer end date=2020-12-31 19:50:00#customer offering criterion=0#customer type=KK#Customer factor=0.00#customer period=0#customer type=TN#customer value=0#customer plan type=M#cttype type=KK#
I want to extract data value 0 against customer value.
I tried below query but it is giving full 'customer value=0' but i want only 0.
Please suggest.
select regexp_replace(regexp_extract(information,'customer period=[^#]*',0),'customer period=','') from detail;
The question is not clear at all.
If you want to limit the output, use LIMIT 1 (where 1 is number of rows you want to limit to)
Please give the following a shot,
---table creation----
create table ch7(custdetails map string,string>)
row format delimited
collection items terminated by '#'
map keys terminated by '=';
----data loading----
Load data local inpath 'your-file-location-in-local' overwrite into table ch7;
----data selection-----
select custdetails["customer value"] from ch7;
Hope this helps!!