Pentaho | Issue with CSV file to Table output

I am working in Pentaho Spoon. I have a requirement to load CSV file data into a table.
I have used , as the delimiter in the CSV file. I can see the correct data in the preview of the CSV file input step, but when I try to insert the data with the Table output step, I get a data truncation error.
This is because one of my columns contains values like the following:
"2,ABC Squere".
As you can see, the column value itself contains a ",", so the value is being truncated and the error is thrown. How can I solve this problem?
I want to load rows with this kind of value into the table.

Here is one way of doing it:
test.csv
--------
colA,colB,colC
ABC,"2,ABC Squere",test
See the settings below. The key is to use the double quote (") as the enclosure and the comma (,) as the delimiter.

You can also change the delimiter, say to a pipe (|), while keeping the data as quoted text such as "1,Name"; the quoted value will then be treated as a single column.

Related

TYPE command: inserting a CSV file

I have a CSV file I'm looking to load into a T-SQL table using the "type" command.
Code: type yourfilename
When looking at the command prompt, it is breaking the file lines into two different rows and inserting them separately into my destination table.
EX.
"Manheim Chicago","Manheim","IL","199004520601","On
Block","2D4FV47V86H126473","2006","DODGE","MAGNUM 4X2 V6"
I want the solution to look like this:
Solution Pic: https://i.stack.imgur.com/Bkgf6.png
Where this would be one record in the table.
Question: does anyone know how to format a TYPE command so that it displays a full record without line breaks?

Issues loading CSV into BigQuery table

I'm trying to create a BigQuery table using a pretty simple CSV file I have stored in GCS.
I keep getting the same error over and over again:
Could not parse '1/1/2008' as datetime for field XXX
I've checked that the CSV file isn't corrupted, and I've managed to upload everything into one column, so the file is readable by BigQuery.
I've added the word NULL to any empty fields, thinking consecutive delimiters may be causing the issue, but I am still facing the same error.
I know data, I understand data and CSV files.
BigQuery cannot cast '1/1/2008' as a DATETIME; it expects something like '2008-01-01' instead.
So, you can either modify your CSV file or just use STRING for that XXX field and then translate it into DATETIME in your queries - like below
#standardSQL
SELECT PARSE_DATETIME('%d/%m/%Y', '1/1/2008')
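If you keep XXX as a STRING during the load, the same parse applies when reading the table later; a minimal sketch, assuming a hypothetical table mydataset.mytable with the raw value in a STRING column named xxx:
#standardSQL
SELECT
  PARSE_DATETIME('%d/%m/%Y', xxx) AS xxx_datetime  -- convert the raw string to DATETIME on the fly
FROM `mydataset.mytable`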

Specify multiple delimiters for Redshift copy command

Is there a way to specify multiple delimiters for the Redshift COPY command while loading data?
I have a data file in the following format:
1 | ab | cd | ef
2 | gh | ij | kl
I am using a command like this:
COPY MY_TBL
FROM 's3://s3-file-path'
iam_role 'arn:aws:iam::ddfjhgkjdfk'
manifest
IGNOREHEADER 1
gzip delimiter '|';
Fields are separated by | and records are separated by newlines. How do I copy this data into Redshift? The command above gives me a "delimiter not found" error.
No, delimiters are single characters.
From Data Format Parameters:
Specifies the single ASCII character that is used to separate fields in the input file, such as a pipe character ( | ), a comma ( , ), or a tab ( \t ).
You could import it with a pipe delimiter, then perform an UPDATE command that uses TRIM() to strip off the spaces.
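A rough sketch of that approach, reusing the COPY options from the question and assuming the text columns are named col_a, col_b and col_c (the real names will differ):
-- Load with the single-character pipe delimiter; the surrounding spaces come in with the data.
COPY MY_TBL
FROM 's3://s3-file-path'
iam_role 'arn:aws:iam::ddfjhgkjdfk'
manifest
IGNOREHEADER 1
gzip delimiter '|';
-- Then trim the leading/trailing spaces in place.
UPDATE MY_TBL SET col_a = TRIM(col_a), col_b = TRIM(col_b), col_c = TRIM(col_c);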
Your error above suggests that something in your data is causing the COPY command to fail. This could be a number of things, from the file encoding to some funky data in there. I've struggled with the "delimiter not found" error recently; it turned out to be the ESCAPE parameter combined with trailing backslashes in my data, which prevented my delimiter (\t) from being picked up.
Fortunately, there are a few steps you can take to help you narrow down the issue:
stl_load_errors - This system table contains details of any error logged by Redshift during the COPY operation. It should identify the row number in your data file that is causing the problem.
NOLOAD - This allows you to run your COPY command without actually loading any data into Redshift. It performs the COPY ANALYZE operation and will surface any errors in the stl_load_errors table.
FILLRECORD - This allows Redshift to "fill" any columns that it sees as missing in the input data. It is essentially meant for ragged-right data files, but it can be useful in diagnosing issues that lead to the "delimiter not found" error, since it lets you load your data into Redshift and then query the table to see where your columns start being out of place.
From the sample you've posted, your setup looks good, but obviously this isn't the entire picture. The options above (sketched below) should help you narrow down the offending row(s) and resolve the issue.
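A minimal sketch of those diagnostics, reusing the COPY from the question:
-- Validate the file without loading any rows (errors land in stl_load_errors).
COPY MY_TBL
FROM 's3://s3-file-path'
iam_role 'arn:aws:iam::ddfjhgkjdfk'
manifest
IGNOREHEADER 1
gzip delimiter '|'
NOLOAD;
-- Inspect the most recent load errors to find the offending line and column.
SELECT starttime, filename, line_number, colname, raw_field_value, err_reason
FROM stl_load_errors
ORDER BY starttime DESC
LIMIT 10;
FILLRECORD would go on the COPY itself in the same way, in place of NOLOAD, if you want the ragged rows loaded for inspection.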

Hue on Cloudera - NULL values (importing file)

Yesterday I installed the Cloudera QuickStart VM 5.8. After importing files from the database via Hue, some tables ended up with NULL values (the entire column). In the earlier steps the data was displayed properly, as it should be after the import.
First Pic.
Second Pic.
Can you run the command describe formatted table_name in the Hive shell and see what the field delimiter is, then go to the warehouse directory and check whether the delimiter in the data and in the table definition is the same? I am fairly sure it will not be the same, and that is why you see NULL.
I am assuming you have imported the data into the default warehouse directory.
Then you can do one of the following:
1) Delete your Hive table and create it again with the correct delimiter, as it appears in the actual data (row format delimited fields terminated by "your delimiter"), and give the location of your data file (see the sketch below);
or
2) Delete the data that was imported and run the sqoop import again, passing fields-terminated-by "the delimiter in the Hive table definition".
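A minimal sketch of option 1 in the Hive shell, assuming a comma-delimited file and illustrative table, column, and location names (adjust all of them to your data):
-- Drop the table whose delimiter does not match the file on disk.
DROP TABLE IF EXISTS my_table;
-- Recreate it with the delimiter actually used in the data, pointing at the data's directory.
CREATE EXTERNAL TABLE my_table (
  col_0 STRING,
  col_1 STRING,
  col_2 STRING,
  col_3 STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/user/hive/warehouse/my_table';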
First, check the data types of the second (col_1) and third (col_2) columns in the original database you are exporting from.
This cannot be a case of a wrong delimiter, else the fourth column (col_3) would not have been populated correctly, which it is.

Pentaho merge rows (diff) with text file inputs

I have a text file that I need to load into a database. I used Merge rows (diff).
I compared the text file input with a Table input step. I used Sorted merge to sort the columns for both the Text file input and Table input steps, then a Merge rows (diff) step followed by Synchronize after merge. My problem is that the first time I run my job it inserts the text file data into the database, and the second time it inserts the same rows into the database again. Can anyone please help me figure out what mistake I made?
use " Insert / Update " step in your transformation.. so it will avoid your duplication problem.
Insert/update Description