Encoding problem: invalid byte sequence for "UTF8" encoding: 0xff

I have a CSV file with 19 fields; some of the fields contain Arabic text.
The file is encoded as UTF-8.
I created a table in PostgreSQL whose columns have the same names as the fields in the file.
The goal is to import the data from the CSV file into that table.
CREATE TABLE Dubai_201606 (
Case_Type_Arabic VARCHAR(100),
Case_Type VARCHAR(100),
Case_Number VARCHAR(20),
Abbreviation VARCHAR(50),
Notice_Date Date,
Notifier VARCHAR(100),
English VARCHAR(100),
Notifier_N_Ref VARCHAR(50),
Notifier_Licence_No int,
Notifier_Company_No VARCHAR(50),
Notifier_City VARCHAR(50),
Party VARCHAR(100),
Enlish_Party VARCHAR(100),
Party_N_Ref VARCHAR(50),
Party_Licence_No int,
Party_Company_No VARCHAR(50),
Party_City VARCHAR(50),
Subject_Arabic TEXT,
Subject_English TEXT
);
Then I used COPY to import the file into the table:
COPY Dubai_201606 FROM 'C:\Users\king-\OneDrive\Bureau\201606.csv' WITH CSV HEADER;
After running it, I got the following error:
ERROR: invalid byte sequence for encoding "UTF8": 0xff
CONTEXT: COPY dubai_201606, line 1
SQL state: 22021
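The question does not show the file's raw bytes, so this is only one possible explanation: 0xff never occurs in valid UTF-8, and a file that begins with the bytes FF FE carries a UTF-16 little-endian byte-order mark, which some Windows tools write when a file is saved as "Unicode". Since the error is reported on line 1, it may be worth checking the first bytes of the file. A minimal Python sketch of that check follows; the helper names sniff_bom and to_utf8 are hypothetical and not part of any library.

```python
import codecs

# Byte-order marks, with UTF-32 listed before UTF-16 because the
# UTF-32 LE mark (FF FE 00 00) starts with the UTF-16 LE mark (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(data: bytes):
    """Return the Python codec implied by a leading BOM, or None if there is none."""
    for bom, codec in BOMS:
        if data.startswith(bom):
            return codec
    return None


def to_utf8(data: bytes) -> bytes:
    """Re-encode BOM-marked data as UTF-8 without a BOM; return other data unchanged."""
    codec = sniff_bom(data)
    if codec is None:
        return data
    # These codecs consume the BOM while decoding, so it is not carried over.
    return data.decode(codec).encode("utf-8")
```

If sniff_bom reports "utf-16" for the first bytes of 201606.csv, writing the output of to_utf8 to a new file and pointing COPY at that file would be one way to proceed. If it reports None, the cause lies elsewhere and this sketch does not apply.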

Related

Is there any way to use BULK INSERT from a .txt file and choose specific columns to import into SQL Server?

I have a .txt file with around 80 columns and 1,900,000 rows. I want to import specific columns into a new SQL Server table using BULK INSERT, but I keep getting an error saying the column format is wrong. I also need to keep NULLs, since some columns contain NULL values. I created a new table containing only the columns I chose out of the 80. I also tried importing the data through the Import and Export Wizard, but it took around 10 hours, which seems strange; that is why I want to try importing it with a query.
My code:
CREATE TABLE IM20190930
(
FACILITY_NUMBER varchar(255),
CUSTOMER_NUMBER varchar(255),
BRANCH_CODE varchar(255),
ACCOUNT_STATUS varchar(255),
SEGMENT_RULE_ID varchar(255),
PD_SEGMENT varchar(255),
RATING_CODE varchar(255),
EXCHANGE_RATE varchar(255),
GROUP_SEGMENT varchar(255),
DOWNLOAD_DATE varchar(255),
OUTSTANDING varchar(255),
BI_COLLECTABILITY varchar(255),
DAY_PAST_DUE varchar(255)
)
BULK INSERT dbo.IM20190930
FROM 'c:\Users\Emily\Desktop\IM20190930.txt'
WITH
(FIELDTERMINATOR=',',
ROWTERMINATOR='|',
KEEPNULLS)
You can use the FORMATFILE = 'format_file_path' option to specify the mapping between the fields in the source file and the table columns.
A format file can be written as plain text or as XML.
(From the BULK INSERT documentation.)
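If writing a format file is not practical, a different approach (not the one the answer above describes) is to cut the file down to the wanted columns before loading it, so that the file and the 13-column table line up. The Python sketch below assumes the terminators from the question (',' between fields, '|' between rows) and that no field contains either character; select_columns and the column positions passed to it are hypothetical.

```python
def select_columns(text, keep, field_sep=",", row_sep="|"):
    """Keep only the fields at the zero-based positions in `keep` for every row.

    Empty fields stay empty, so KEEPNULLS can still load them as NULL.
    This naive split does not understand quoted fields.
    """
    rows = []
    for record in text.split(row_sep):
        if not record.strip():
            continue  # skip the empty chunk after a trailing row terminator
        fields = record.split(field_sep)
        rows.append(field_sep.join(fields[i] for i in keep))
    return row_sep.join(rows) + row_sep
```

For example, select_columns(text, [0, 2]) keeps the first and third field of each row. For a 1,900,000-row file it would be better to read and write in chunks rather than hold the whole text in memory; the sketch skips that for brevity.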

[Amazon](500310) Invalid operation: syntax error at end of input Position: 684;

CREATE EXTERNAL TABLE schema_vtvs_ai_ext.fire(
fire_number VARCHAR(50),
fire_year DATE,
assessment_datetime INTEGER,
size_class CHAR,
fire_location_latitude REAL,
fire_location_longitude REAL,
fire_origin VARCHAR(50),
general_cause_desc VARCHAR(50),
activity_class VARCHAR(50),
true_cause VARCHAR(50),
fire_start_date DATE,
det_agent_type VARCHAR(50),
det_agent VARCHAR(50),
discovered_date DATE,
reported_date DATE,
start_for_fire_date DATE,
fire_fighting_start_date DATE,
initial_action_by VARCHAR(50),
fire_type VARCHAR(50),
fire_position_on_slope VARCHAR(50),
weather_conditions_over_fire VARCHAR(50),
fuel_type VARCHAR(50),
bh_fs_date DATE,
uc_fs_date DATE,
ex_fs_date DATE
);
This is the SQL I wrote to add an external table to a Redshift schema, but it fails with the error below. I can't see where the error is.
[Amazon](500310) Invalid operation: syntax error at end of input Position: 684;
If your data is in Amazon S3, then you need to specify the file format (via STORED AS) and the path to data files in S3 (via LOCATION).
Here is an example query for CSV files with a one-line header:
create external table <external_schema>.<table_name> (...)
row format delimited
fields terminated by ','
stored as textfile
location 's3://mybucket/myfolder/'
table properties ('numRows'='100', 'skip.header.line.count'='1');
See the official documentation for details.

Relative path error when creating external tables

My task is to create some external tables using Hive Beeline, but I get a relative path error:
Relative path in absolute URI: hdfs://localhost:8020./user/bigdata) (state=08S01,code=1)
Aborting command set because "force" is false and command failed:
I am using an HQL script (by requirement) to create the external tables. My script is:
create external table ecustomer(
customer_id DECIMAL(3),
customer_code VARCHAR(5),
company_name VARCHAR(100),
contact_name VARCHAR(50),
contact_title VARCHAR(30),
city VARCHAR(30),
region VARCHAR(2),
postal_code VARCHAR(30),
country VARCHAR(30),
phone VARCHAR(30),
fax VARCHAR(30))
row format delimited fields terminated by '|'
stored as textfile location 'user/bigdata/ecustomer';
create external table eorder_detail(
order_id DECIMAL(5),
product_id DECIMAL(2),
customer_id DECIMAL(3),
salesperson_id DECIMAL(1),
unit_price DECIMAL(2,2),
quantity DECIMAL(2),
discount DECIMAL(1,1))
row format delimited fields terminated by '|'
stored as textfile location 'user/bigdata/eorder_detail';
create external table eproduct(
product_id DECIMAL(2),
product_name VARCHAR(50),
unit_price DECIMAL(2,2),
unit_in_stock DECIMAL(4),
unit_on_order DECIMAL(3),
discontinued VARCHAR(1))
row format delimited fields terminated by '|'
stored as textfile location 'user/bigdata/eproduct';
create external table esalesperson(
employee_id DECIMAL(1),
lastname VARCHAR(30),
firstname VARCHAR(30),
title VARCHAR(50),
birthdate VARCHAR(30),
hiredate VARCHAR(30),
notes VARCHAR(100))
row format delimited fields terminated by '|'
stored as textfile location 'user/bigdata/esalesperson';
create external table eorder(
order_id DECIMAL(5),
order_date VARCHAR(30),
ship_via DECIMAL(1),
ship_city VARCHAR(30),
ship_region VARCHAR(30),
ship_postal_code VARCHAR(30),
ship_country VARCHAR(30))
row format delimited fields terminated by '|'
stored as textfile location 'user/bigdata/eorder';
Then I execute this script in Beeline and get the error above. I have already created a folder on my Hadoop server for each table (ecustomer, eorder_detail, eproduct, esalesperson and eorder), and the table data has also been uploaded to the Hadoop server. Please help me resolve the error.
Try using an absolute path instead of a relative one, e.g. 'hdfs://localhost:8020/user/bigdata/ecustomer'. The location values in your script have no leading slash, so they are treated as relative paths.
create external table ecustomer(
customer_id DECIMAL(3),
customer_code VARCHAR(5),
company_name VARCHAR(100),
contact_name VARCHAR(50),
contact_title VARCHAR(30),
city VARCHAR(30),
region VARCHAR(2),
postal_code VARCHAR(30),
country VARCHAR(30),
phone VARCHAR(30),
fax VARCHAR(30))
row format delimited fields terminated by '|'
stored as textfile location 'hdfs://localhost:8020/user/bigdata/ecustomer';
...
[same for other DDLs]
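If the HQL script is generated rather than written by hand, the same fix can be applied to every location in one place. The Python sketch below is only an illustration: absolute_location is a hypothetical helper, the default namenode address is taken from the error message, and it assumes each relative location is meant to sit under the HDFS root, as in the answer above.

```python
import posixpath
from urllib.parse import urlparse


def absolute_location(location, namenode="hdfs://localhost:8020"):
    """Turn a table location into a full HDFS URI.

    Locations that already carry a scheme (hdfs://...) are returned unchanged.
    Anything else is treated as a path under the HDFS root (an assumption).
    """
    if urlparse(location).scheme:
        return location
    path = location if location.startswith("/") else "/" + location
    return namenode.rstrip("/") + posixpath.normpath(path)
```

With this, absolute_location('user/bigdata/eorder') gives 'hdfs://localhost:8020/user/bigdata/eorder', which can be substituted into the location clause of each DDL statement.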

FAILED: ParseException line 1:36 cannot recognize input near '1987'

I'm trying to create an external table in Hive with this statement:
CREATE EXTERNAL TABLE IF NOT EXISTS 1987(
YEAR INT,
MONTH INT,
DAYOFMONTH INT,
DAYOFWEEK INT,
DEPTIME INT,
CRS INT,
ARRTIME TIME,
CARRIER STRING,
FLIGHTNUM INT,
TAILNUM STRING,
ACTUALELAPSED INT,
CRSELAPSED INT,
AIRTIME INT,
ARRDELAY INT,
DEPDELAY INT,
ORIGIN STRING,
DEST STRING,
DISTANCE INT,
TAXIIN INT,
TAXIOUT INT,
CANCELLED INT,
CANCELLATIONCODE STRING,
DIVERTED INT,
CARRIERDELAY INT,
WEATHERDELAY INT,
NASDELAY INT,
SECURITYDELAY INT,
LATEAIRCRAFT INT,
Origin CHAR(1))
COMMENT 'AÑO 1987'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
location '/user/raj_ops/PROYECTO/'1987.csv';
But I get the following error:
org.apache.hive.service.cli.HiveSQLException: Error while compiling
statement: FAILED: ParseException line 1:36 cannot recognize input
near '1987' '(' 'YEAR' in table name
Does anyone know why?
Thanks :)
The location should be '/user/raj_ops/PROYECTO/' (the directory, without the file itself). If you have other files in the same location, move them to separate directories, such as /user/raj_ops/PROYECTO/1987/ for 1987, because a table is built on top of a directory, not a single file.
Also, the table name cannot start with digits. Put it in backquotes (`1987`) or rename it to something like year_1987.
I think you probably need to escape the table name with backticks if it's numeric:
`1987`
There is an extra quote in the location value. Remove it:
location '/user/raj_ops/PROYECTO/'1987.csv';
Should be
location '/user/raj_ops/PROYECTO/1987.csv';

External table syntax error KUP-01005 Oracle

I receive the error below every time I select from an external table that I have created.
ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
KUP-00554: error encountered while parsing access parameters
KUP-01005: syntax error: found "minussign": expecting one of: "badfile, byteordermark, characterset, column, data, delimited, discardfile, dnfs_enable, dnfs_disable, disable_directory_link_check, field, fields, fixed, io_options, load, logfile, language, nodiscardfile, nobadfile, nologfile, date_cache, dnfs_readbuffers, preprocessor, readsize, string, skip, territory, variable, xmltag"
KUP-01007: at line 4 column 23
29913. 00000 - "error in executing %s callout"
The external table itself is created successfully. Here is the script that creates it:
CREATE TABLE TB_CNEI_01C
(
NEW_OMC_ID VARCHAR(2),
NEW_OMC_NM VARCHAR(8),
NEW_BSS_ID VARCHAR(6),
NEW_BSS_NM VARCHAR(20),
OMC_ID VARCHAR(2),
OMC_NM VARCHAR(8),
OLD_BSS_ID VARCHAR(6),
OLD_BSS_NM VARCHAR(20),
DEPTH_NO INTEGER,
NE_TP_NO INTEGER,
OP_YN INTEGER,
FAC_ALIAS_NM VARCHAR(20),
FAC_GRP_ALIAS_NM VARCHAR(20),
SPC_VAL VARCHAR(4),
INMS_FAC_LCLS_CD VARCHAR(2),
INMS_FAC_MCLS_CD VARCHAR(3),
INMS_FAC_SCLS_CD VARCHAR(3),
INMS_FAC_SCLS_DTL_CD VARCHAR(2),
LDEPT_ID VARCHAR(3),
FAC_ID VARCHAR(15),
MME_IP_ADDR VARCHAR(20),
MDEPT_ID VARCHAR(4),
HW_TP_NM VARCHAR(20),
MME_POOL_NM VARCHAR(20),
BORD_CNT INTEGER,
FAC_DTL_CLSFN_NM VARCHAR(50),
INSTL_FLOOR_NM VARCHAR(20),
INSTL_LOC_NM VARCHAR(30)
)
ORGANIZATION EXTERNAL
(
TYPE oracle_loader
DEFAULT DIRECTORY EXTERNAL_DATA
ACCESS PARAMETERS
(
RECORDS DELIMITED BY NEWLINE
badfile EXTERNAL_DATA:'testTable.bad'
logfile EXTERNAL_DATA:'testTable.log'
CHARACTERSET x-IBM949
FIELDS TERMINATED BY ','
MISSING FIELD VALUES ARE NULL
(
NEW_OMC_ID VARCHAR(2),
NEW_OMC_NM VARCHAR(8),
NEW_BSS_ID VARCHAR(6),
NEW_BSS_NM VARCHAR(20),
OMC_ID VARCHAR(2),
OMC_NM VARCHAR(8),
OLD_BSS_ID VARCHAR(6),
OLD_BSS_NM VARCHAR(20),
DEPTH_NO INTEGER,
NE_TP_NO INTEGER,
OP_YN INTEGER,
FAC_ALIAS_NM VARCHAR(20),
FAC_GRP_ALIAS_NM VARCHAR(20),
SPC_VAL VARCHAR(4),
INMS_FAC_LCLS_CD VARCHAR(2),
INMS_FAC_MCLS_CD VARCHAR(3),
INMS_FAC_SCLS_CD VARCHAR(3),
INMS_FAC_SCLS_DTL_CD VARCHAR(2),
LDEPT_ID VARCHAR(3),
FAC_ID VARCHAR(15),
MME_IP_ADDR VARCHAR(20),
MDEPT_ID VARCHAR(4),
HW_TP_NM VARCHAR(20),
MME_POOL_NM VARCHAR(20),
BORD_CNT INTEGER,
FAC_DTL_CLSFN_NM VARCHAR(50),
INSTL_FLOOR_NM VARCHAR(20),
INSTL_LOC_NM VARCHAR(30)
)
)
LOCATION ('TB_CNEI_01C.csv')
);
I have checked all permissions for the data directory and the data files.
I had a few commented-out lines in my 'CREATE TABLE ..' script. I removed those lines and the error disappeared.
I got this suggestion from: http://www.orafaq.com/forum/t/182288/
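For a long script, removing the commented-out lines by hand is easy to get wrong. A minimal Python sketch of the same clean-up is shown here, assuming the comments are whole lines that start with "--"; strip_comment_lines is a hypothetical helper, and trailing comments after a statement are deliberately left alone.

```python
def strip_comment_lines(ddl: str) -> str:
    """Drop lines that consist only of a '--' comment; keep everything else as-is."""
    kept = [
        line
        for line in ddl.splitlines()
        if not line.lstrip().startswith("--")
    ]
    return "\n".join(kept)
```

Running the CREATE TABLE script through this before executing it reproduces the manual fix described above. It does not address other causes of KUP-01005.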
It seems the CHARACTERSET value (x-IBM949) is not valid because it contains a "-" character, which matches the "minussign" reported in the error.
You could try a character set name without that sign, such as AL32UTF8, US7ASCII or WE8MSWIN1252.