Unexpected END OF FILE while processing row - sql

Getting the following error when copying an input file into an empty db table. The input file only has 56732 rows; however, I am getting an error on row 56733:
continue
* * * * * * * * * *
copy table temptable
(
abc = c(3),
bcao = c(1),
cba = c(10),
test = c(1)nl
)
from 'tempfile'
Executing . . .
E_CO0024 COPY: Unexpected END OF FILE while processing row 56733.
E_CO002A COPY: Copy has been aborted.
Anyone have any ideas why it's trying to process an extra row? I have four other files in the exact same format with different data, and they process fine.
Have no idea why this is happening...

The most likely cause is that you have some spaces or similar after your final row of data. You have set a newline as the delimiter on test, so the file needs to end with a newline. Delete anything after your data that isn't the final newline.
As an example, consider the code below:
DECLARE GLOBAL TEMPORARY TABLE test (
v int
) ON COMMIT PRESERVE ROWS WITH NORECOVERY;
COPY test (
v = c(5)nl
) FROM 'J:\test.csv';
It will result in an error on line 4 for the following data:
34565
37457
35764
45685
and an error on line 5 for this data (punctuation is used to show the issue; in your own file it is probably a space or tab):
34565
37457
35764
45685
.

Related

trying to import csv file to table in sql

I have 4 csv files, each having 500,000 rows. I am trying to import the csv data into my Exasol database, but there is an error with the date column, and I have a problem with the unwanted first column in the files.
Here is an example CSV file:
unnamed:0 , time, lat, lon, nobs_cloud_day
0, 2006-03-30, 24.125, -119.375, 22.0
1, 2006-03-30, 24.125, -119.125, 25.0
The table I created to import the csv into is:
CREATE TABLE cloud_coverage_CONUS (
index_cloud DECIMAL(10,0)
,"time" DATE -- PRIMARY KEY
,lat DECIMAL(10,6)
,lon DECIMAL(10,6)
,nobs_cloud_day DECIMAL (3,1)
)
The command to import is
IMPORT INTO cloud_coverage_CONUS FROM LOCAL CSV FILE 'D:\uni\BI\project 1\AOL_DB_ANALYSIS_TASK1\datasets\cloud\cfc_us_part0.csv';
But I get this error:
SQL Error [42636]: java.sql.SQLException: ETL-3050: [Column=0 Row=0] [Transformation of value='Unnamed: 0' failed - invalid character value for cast; Value: 'Unnamed: 0'] (Session: 1750854753345597339) while executing '/* add path to the 4 csv files, that are in the cloud database folder*/ IMPORT INTO cloud_coverage_CONUS FROM CSV AT 'https://27.1.0.10:59205' FILE 'e12a96a6-a98f-4c0a-963a-e5dad7319fd5' ;'; 04509 java.sql.SQLException: java.net.SocketException: Connection reset by peer: socket write error
Alternatively, I use this table (without the first column):
CREATE TABLE cloud_coverage_CONUS (
"time" DATE -- PRIMARY KEY
,lat DECIMAL(10,6)
,lon DECIMAL(10,6)
,nobs_cloud_day DECIMAL (3,1)
)
And use this import code:
IMPORT INTO cloud_coverage_CONUS FROM LOCAL CSV FILE 'D:\uni\BI\project 1\AOL_DB_ANALYSIS_TASK1\datasets\cloud\cfc_us_part0.csv'(2 FORMAT='YYYY-MM-DD', 3 .. 5);
But I still get this error:
SQL Error [42636]: java.sql.SQLException: ETL-3052: [Column=0 Row=0] [Transformation of value='time' failed - invalid value for YYYY format token; Value: 'time' Format: 'YYYY-MM-DD'] (Session: 1750854753345597339) while executing '/* add path to the 4 csv files, that are in the cloud database folder*/ IMPORT INTO cloud_coverage_CONUS FROM CSV AT 'https://27.1.0.10:60350' FILE '22c64219-cd10-4c35-9e81-018d20146222' (2 FORMAT='YYYY-MM-DD', 3 .. 5);'; 04509 java.sql.SQLException: java.net.SocketException: Connection reset by peer: socket write error
(I actually do want to ignore the first column in the files.)
How can I solve this issue?
Solution:
IMPORT INTO cloud_coverage_CONUS FROM LOCAL CSV FILE 'D:\uni\BI\project 1\AOL_DB_ANALYSIS_TASK1\datasets\cloud\cfc_us_part0.csv' (2 .. 5) ROW SEPARATOR = 'CRLF' COLUMN SEPARATOR = ',' SKIP = 1;
I did not realise that MySQL is different from Exasol.
Looking at the first error message, a few things stand out. First, we see this:
[Column=0 Row=0]
This tells us the problem is with the very first value in the file. This brings us to the next thing, where the message even tells us what value was read:
Transformation of value='Unnamed: 0' failed
So it's failing to convert Unnamed: 0. You also provided the table definition, where we see the first column in the table is a decimal type.
This makes sense. Unnamed: 0 is not a decimal. For this to work, the CSV data MUST align with the data types for the columns in the table.
But we also see that this looks like a header row. Assuming everything else matches, we can fix it by telling the database to skip this first row. I'm not familiar with Exasol, but according to the documentation I believe the correct code will look like this:
IMPORT INTO cloud_coverage_CONUS
FROM LOCAL CSV FILE 'D:\uni\BI\project 1\AOL_DB_ANALYSIS_TASK1\datasets\cloud\cfc_us_part0.csv'
(2 FORMAT='YYYY-MM-DD', 3 .. 5)
ROW SEPARATOR = 'CRLF'
COLUMN SEPARATOR = ','
SKIP = 1;
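If the import then runs cleanly, a quick sanity check is worthwhile before loading the remaining files. A minimal sketch in plain SQL (it assumes each file should contribute roughly its 500,000 data rows and that the date column parsed):
SELECT COUNT(*) AS loaded_rows, MIN("time") AS first_day, MAX("time") AS last_day
FROM cloud_coverage_CONUS;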

syntax error using perl dbi on copy into command

I have a COPY INTO table command that includes multiple dollar signs in the SQL, all of which are escaped. If I print out the actual command from the script and execute it manually, it works perfectly, but when the Perl script runs it I get a syntax error. Below are what I'm trying to execute, the printout of the command, and then the SQL error, in turn. (I have assigned a $file in the script where it is inserting the data, so that dollar sign doesn't get escaped below.)
my $sql = "COPY INTO metaproc.control_table FROM (
SELECT SPLIT_PART(METADATA\$FILENAME,'/',4) as SEAT_ID,
\$1:auction_id_64 as AUCTION_ID_64,
DATEADD(S,\$1:date_time,'1970-01-01') as DATE_TIME,
\$1:user_tz_offset as USER_TZ_OFFSET,
\$1:creative_width as CREATIVE_WIDTH,
\$1:creative_height as CREATIVE_HEIGHT
FROM \#DBNAME.lnd.S3_PROD_ADIP/$file)
pattern = '\.*\.parquet' file_format = (TYPE = 'PARQUET' SNAPPY_COMPRESSION = TRUE)
ON_ERROR = 'SKIP_FILE_10%';";
my $sth = $dbh->prepare($sql);
$sth->execute;
COPY INTO metaproc.control_table FROM (
SELECT SPLIT_PART(METADATA$FILENAME,'/',4) as SEAT_ID,
$1:auction_id_64 as AUCTION_ID_64,
DATEADD(S,$1:date_time,'1970-01-01') as DATE_TIME,
$1:user_tz_offset as USER_TZ_OFFSET,
$1:creative_width as CREATIVE_WIDTH,
$1:creative_height as CREATIVE_HEIGHT
FROM #DBNAME.lnd.S3_PROD_ADIP/pr/appnexus/data_dt=20220217/19/STANDARD_20220218012146.gz.parquet)
pattern = '.*.parquet' file_format = (TYPE = 'PARQUET' SNAPPY_COMPRESSION = TRUE)
ON_ERROR = 'SKIP_FILE_10%';
SQL compilation error: syntax error line 3 at position 4 unexpected '?'. syntax error line 4 at position 13 unexpected '?'. syntax error line 4 at position 13 unexpected '?'.
COPY INTO DWH_AIR.LND_APN.LND_STANDARD_IMP_EVENT FROM (
SELECT SPLIT_PART(METADATA$FILENAME,'/',4) as SEAT_ID,
$1? as AUCTION_ID_64,
DATEADD(S,$1?,'1970-01-01') as DATE_TIME,
$1? as USER_TZ_OFFSET,
$1? as CREATIVE_WIDTH,
$1? as CREATIVE_HEIGHT
Line 3 position 4 is the question mark after the '$1' on the 3rd line. I don't get it; why is it removing the ':auction_id_64' part of the string?
It looks like the driver is interpreting the : as a bind variable placeholder, rather than as a path into the variant. Have you tried using bracket notation instead?
https://docs.snowflake.com/en/user-guide/querying-semistructured.html#bracket-notation
I believe it would look something like
COPY INTO metaproc.control_table FROM (
SELECT SPLIT_PART(METADATA$FILENAME,'/',4) as SEAT_ID,
$1['auction_id_64'] as AUCTION_ID_64,
DATEADD(S,$1['date_time'],'1970-01-01') as DATE_TIME,
$1['user_tz_offset'] as USER_TZ_OFFSET,
$1['creative_width'] as CREATIVE_WIDTH,
$1['creative_height'] as CREATIVE_HEIGHT
FROM #DBNAME.lnd.S3_PROD_ADIP/pr/appnexus/data_dt=20220217/19/STANDARD_20220218012146.gz.parquet)
pattern = '.*.parquet' file_format = (TYPE = 'PARQUET' SNAPPY_COMPRESSION = TRUE)
ON_ERROR = 'SKIP_FILE_10%';
I am uncertain whether this works or not, but if it doesn't, I will remove the answer.

Amazon S3 Select From not working

Amazon S3 has a new feature called select from which allows one to run simple SQL queries against simple data files - like CSV or JSON. So I thought I'd try it.
I created and uploaded the following CSV to my S3 bucket in Oregon (I consider this file to be extremely simple):
aaa,bbb,ccc
111,111,111
222,222,222
333,333,333
I indicated this was CSV with a header row and issued the following SQL:
select * from s3object s
...which worked as expected, returning:
111,111,111
222,222,222
333,333,333
Then I tried one of the provided sample queries, which failed:
select s._1, s._2 from s3object s
...the error message was "Some headers in the query are missing from the file. Please check the file and try again.".
Also tried the following, each time receiving the same error:
select aaa from s3object s
select s.aaa from s3object s
select * from s3object s where aaa = 111
select * from s3object s where s.aaa = 111
select * from s3object s where s._1 = 111
So any time my query references a column, either by name or by number, in either the SELECT or WHERE clause, I get the "headers in the query are missing" error. The AWS documentation provides no follow-up information on this error.
So my question is, what's wrong? Is there an undocumented requirement about the column headers? Is there an undocumented way to reference columns? Does the "Select From" feature have a bug in it?
I did the following:
Created a file with the contents you show above
Entered S3 Select on the file, and ticked File has header row
Changed no other settings
These queries did NOT work:
select s._1, s._2 from s3object s
select * from s3object s where s._1 = 111
The reason they didn't work is that the file contains headers, so the columns have actual names.
These queries DID work:
select aaa from s3object s
select s.aaa from s3object s
select * from s3object s where aaa = 111 (Gave empty result)
select * from s3object s where s.aaa = 111 (Gave empty result)
When I compared the values as strings in the last two queries, they returned the row as expected:
select * from s3object s where aaa = '111'
select * from s3object s where s.aaa = '111'
Getting back to this, on a whim I decided to replace this sample file with a new identical example file, and now I do not encounter the problem. In fact, I'm unable to replicate the problem that I originally posted.
I have a few theories: character encoding, end-of-line character, and the possible presence of an extra line in my original file, but I have been unable to re-create the original issue.
I've tried different editors to create the source file, Unix vs. Windows end-of-line characters, an extra line at the end, upper-case vs. lower-case column headers, and different regions. Everything works now, so I'm completely mystified as to why it did not work in the first place.
Life goes on. Thanks to everyone for your efforts.
S3 Select treats everything as a string. The queries
select * from s3object s where cast(aaa as int) = 111
select * from s3object s where cast(s.aaa as int) = 111
should return the expected results if the header-row option is checked/unchecked appropriately.
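To make the two addressing modes explicit, here is a short sketch against the same sample file: when the header option is enabled you reference columns by the names from the header row, and when it is disabled or ignored you reference them positionally as _1, _2, and so on.
-- header row enabled: names come from the header
select s.aaa, s.bbb from s3object s
-- header row disabled or ignored: positional references
select s._1, s._2 from s3object s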

Exporting data containing line feeds as CSV from PostgreSQL

I'm trying to export data from PostgreSQL to CSV.
First I created the query and tried exporting from pgAdmin with File -> Export to CSV. The CSV is wrong; for example, it contains:
The header: Field1;Field2;Field3;Field4
Now, the rows begin well, except that the last field is put on another line:
Example:
Data1;Data2;Data3;
Data4;
The problem is that I get an error when trying to import the data into another server.
The data is from a view I created.
I also tried
COPY view(field1,field2...) TO 'C:\test.csv' DELIMITER ',' CSV HEADER;
It exports the same file.
I just want to export the data to another server.
Edit:
When trying to import the CSV I get the error:
ERROR : Extra data after the last expected column. Context Copy
actions, line 3: <<"Data1, data2 etc.">>
So the first line is the header, the second line is the first row with data minus the last field, which is on the 3rd line, alone.
In order for you to export the file to another server, you have two options:
Creating a shared folder between the two servers, so that the database also has access to this directory.
COPY (SELECT field1,field2 FROM your_table) TO '[shared directory]' DELIMITER ',' CSV HEADER;
Triggering the export from the target server using the STDOUT of COPY. Using psql, you can achieve this by running the following command:
psql yourdb -c "COPY (SELECT * FROM your_table) TO STDOUT" > output.csv
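Since the goal is to get the data into another server, the STDOUT form can also be piped straight into the target database without writing an intermediate file. A sketch along the same lines (target_host and targetdb are placeholders for your own connection details, and the target table must already exist):
psql yourdb -c "COPY (SELECT * FROM your_table) TO STDOUT CSV HEADER" | psql -h target_host targetdb -c "COPY your_table FROM STDIN CSV HEADER"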
EDIT: Addressing the issue of fields containing line feeds (\n)
In case you want to get rid of the line feeds, use the REPLACE function.
Example:
SELECT E'foo\nbar';
?column?
----------
foo +
bar
(1 row)
Removing the line feed:
SELECT REPLACE(E'foo\nbaar',E'\n','');
replace
---------
foobaar
(1 row)
So your COPY should look like this:
COPY (SELECT field1,REPLACE(field2,E'\n','') AS field2 FROM your_table) TO '[shared directory]' DELIMITER ',' CSV HEADER;
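If the stray breaks come from Windows-style line endings, the field may contain carriage returns as well as line feeds; the same idea extends to both (still only a sketch, using the same placeholder names):
COPY (SELECT field1, REPLACE(REPLACE(field2, E'\r', ''), E'\n', '') AS field2 FROM your_table) TO '[shared directory]' DELIMITER ',' CSV HEADER;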
The export procedure described above is OK, e.g.:
t=# create table so(i int, t text);
CREATE TABLE
t=# insert into so select 1,chr(10)||'aaa';
INSERT 0 1
t=# copy so to stdout csv header;
i,t
1,"
aaa"
t=# create table so1(i int, t text);
CREATE TABLE
t=# copy so1 from stdout csv header;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> i,t
1,"
aaa"
>> >> >> \.
COPY 1
t=# select * from so1;
i | t
---+-----
1 | +
| aaa
(1 row)

Record in Hive is not complete when using Flume's Hive sink

I want to use Flume to collect data into a Hive database.
I have stored the data in Hive, but the data is not complete.
I want to insert records like the following:
1201,Gopal
1202,Manisha
1203,Masthanvali
1204,Kiran
1205,Kranthi
When I run Flume, bucket_00000 and bucket_00000_flush_length appear in HDFS (/user/hive/warehouse/test2.db/employee12/delta_0000501_0000600). (The database is test2, the table name is employee12.)
When I use "select * from employee12", it shows the following:
--------------------------------------------------------------------
hive> select * from employee12;
OK
(two lines follow)
1201 Gopal
1202
Time taken: 0.802 seconds, Fetched: 1 row(s)
----------------------------------------------------------------------
Can anyone help me figure out:
Why does it only show two rows?
Why does the second row only contain 1202?
How to setup correct config?
Flume config:
agenthive.sources = spooldirSource
agenthive.channels = memoryChannel
agenthive.sinks = hiveSink
agenthive.sources.spooldirSource.type=spooldir
agenthive.sources.spooldirSource.deserializer=org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
agenthive.sources.spooldirSource.spoolDir=/home/flume/flume_test_home/spooldir
agenthive.sources.spooldirSource.channels=memoryChannel
agenthive.sources.spooldirSource.basenameHeader=true
agenthive.sources.spooldirSource.basenameHeaderKey=basename
agenthive.sinks.hiveSink.type=hive
agenthive.sinks.hiveSink.hive.metastore = thrift://127.0.0.1:9083
agenthive.sinks.hiveSink.hive.database = test2
agenthive.sinks.hiveSink.hive.table = employee12
agenthive.sinks.hiveSink.round = true
agenthive.sinks.hiveSink.roundValue = 10
agenthive.sinks.hiveSink.roundUnit = second
agenthive.sinks.hiveSink.serializer = DELIMITED
agenthive.sinks.hiveSink.serializer.delimiter = ","
agenthive.sinks.hiveSink.serializer.serdeSeparator = ','
agenthive.sinks.hiveSink.serializer.fieldnames =eid,name
agenthive.sinks.hiveSink.channel=memoryChannel
agenthive.channels.memoryChannel.type=memory
agenthive.channels.memoryChannel.capacity=100
Hive create table statement:
create table if not exists employee12 (eid int,name string)
comment 'this is comment'
clustered by(eid) into 1 buckets
row format delimited
fields terminated by ','
lines terminated by '\n'
stored as orc
tblproperties('transactional'='true');
Try to use external tables. I found this article helpful when working on a similar setup.
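A minimal sketch of that approach (the location below is a hypothetical example; point it at a directory your Flume agent writes plain delimited text files to):
create external table if not exists employee12_ext (eid int, name string)
row format delimited
fields terminated by ','
lines terminated by '\n'
stored as textfile
location '/user/flume/employee12_ext';
Files landing in that directory are then readable directly with select * from employee12_ext, without going through the transactional ORC write path that the Hive sink uses.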