Pentaho Text file out separator - pentaho

I am using a Text file output step in Pentaho Kettle for extracting data from sql and putting into CSV files. I have specified comma as the content separator. But sometimes I receive the files with semicolon seperated values. Any body else has faced the issue? I have read semicolon seperated values is the default content seperator for CSV file formats. I believe the content seperator is set to default to semicolon. Is this because the content seperator is set to default by the spoon environment based on the input data?

open the text file output step, go to content tab, their you will find option called Separator their what ever you will specify it will come into your final result, by-default you will find semi-column over their so just change it to comma and your problem will get resolved...

Related

" replaced by ""

redshift unload command is replacing " by "".
example :
UNLOAD($$ select '"Jane"' as name $$)
TO 's3://s3-bucket/test_'
iam_role arn:aws:iam::xxxxxx:role/xxxxxx'
HEADER
CSV
DELIMITER ','
ALLOWOVERWRITE
The output looks like : ""Jane""
If I run the same command with select 'Jane' as name , the output shows without quote at all like Jane. But I need the output to be "Jane"
You are asking for the unloaded file to be in CSV format and CSV format says that if you want a double quote in your data you need to escape it with another double quote. See https://datatracker.ietf.org/doc/html/rfc4180
So Redshift is doing exactly as you requested. Now if you just want a comma delimited file then you don't want to use "CSV" as this will add all the necessary characters to make the file fully compliant with the CSV specification.
This choice will come down to what tool or tools are reading the file and if they expect an rfc compliant CSV or just a simple file where fields are separated by commas.
This is a gripe of mine - tools that say they read CSV but don't follow the spec. If you say CSV then follow the format. Or call what you read something different, like CDV - comma delimited values.

How to clear txt file having different Delimiter using SSIS package?

I have text file which is having ^(CAP) and ,(Comma) as a delimiter and after clearing i need to load to sql . I have tried my best to clear a source file
But still file is not cleaned as expectation .
Please find the below picture i have tried to correct the source file
But still file is not cleared as expected . Please find below uncleared file .
You have a variety of issues here.
You have identified the header row delimiter as a comma. A row delimiter is the, usually invisible, delimiter than indicates a row's worth of data has happened. Traditionally, this is an Operating System specific value but it's a Carriage Return (CR), Line Feed (LF) or Carriage Return/Line Feed.
Your source data is not a comma delimited file with caret/circumflex/cap text delimiters. You have a comma-space delimited file which SSIS doesn't support in the editor. However, if you hand edit the dtsx file as I outlined in How to read a flatfile with lowercase thorn as the delimiter to specify that it should use comma space ColumnDelimiter="_x002C__x0020_"
Given a truncated version of your source data
ListCode, CAS, Name
^216^, ^^, ^Coal Dust^
^216^, ^7782-24-5^, ^Graphite (Natural)^
^216^, ^^, ^Inert or Nuisance Dust^
and the comma (0x2C) space (0x20) edited into the raw dtsx connection manager, I was able to pull data as I believe you are expecting
You might also run into additional issues given your selection of code pages and not checking the Unicode button but that's beyond my ability to generate matching source data from an image.
Just replace the ^, ^ with ^,^
It looks like your source
CAS, SubName, ListCode, Type, CountryCode, ListName
^1000413-72-8^,^fasiglifam^,^447^,^Chemical Inventory^,^EU^,^ECICS Custom Tariff Codes^
^1000413-72-8^,^fasiglifam^,^0^,^^,^NN^,^SPHERA Global Substance List^
Then edit your connection manager with below details
[![enter image description here][2]][2]
It will work .
[2]: https://i.stack.imgur.com/0x89k.png

skipLeadingRows=1 in external table definition

In the below example, how can I set the skip leading row option?
bq --location=US query --external_table_definition=sales::Region:STRING,Quarter:STRING,Total_sales:INTEGER#CSV=gs://mybucket/sales.csv 'SELECT Region,Total_sales FROM sales;'
Regards,
Sreekanth
Flags options can be found under the installation home folder (I marked in bold below the flag you are looking for)
/google-cloud-sdk/platform/bq/bq.py:
--[no]allow_jagged_rows: Whether to allow missing trailing optional columns in
CSV import data.
--[no]allow_quoted_newlines: Whether to allow quoted newlines in CSV import
data.
-E,--encoding: : The character encoding used by the input
file. Options include:
ISO-8859-1 (also known as Latin-1)
UTF-8
-F,--field_delimiter: The character that indicates the boundary between
columns in the input file. "\t" and "tab" are accepted names for tab.
--[no]ignore_unknown_values: Whether to allow and ignore extra, unrecognized
values in CSV or JSON import data.
--max_bad_records: Maximum number of bad records allowed before the entire job
fails.
(default: '0')
(an integer)
--quote: Quote character to use to enclose records. Default is ". To indicate
no quote character at all, use an empty string.
--[no]replace: If true erase existing contents before loading new data.
(default: 'false')
--schema: Either a filename or a comma-separated list of fields in the form
name[:type].
--skip_leading_rows: The number of rows at the beginning of the source file to
skip.
(an integer)
--source_format: : Format of
source data. Options include:
CSV
NEWLINE_DELIMITED_JSON
DATASTORE_BACKUP

PIG LOAD filename

I am just trying to load an unstructured input file and add the filename. So what I want to get is two fields :
filename:chararray, inputrow:chararray.
I can load the filename if I have a field delimiter using pigstorage(';','-tagfile') but I do not want to delimit fields at this point I just want the string and the filename. How can I do this ?
B
The way to load in files without applying a delimiter, is to choose a delimiter that does not (cannot) occur in the file.
For example, if your file is separated by ; and cannot contain tabs \t you could do:
pigstorage('\t','-tagfile')

Import flat file containing commas/quotes into SAP BODS

Hi I have a row like following in .csv file
12346,abcded,ssadsadc,2013.04.04 08.42.31,8,"I would like to use an
existing project as a template for a new project for another Report
Suite but it just overwrites the existing project rather than creates
new one even when I use the ""Save As"" function.",Analyst,,5,"Hotel
Room,Literature,Open/ Create",,
the text string has " and , as part of the string. Hence I am not able to use " as text delimiter in SAP BODS file format.
Could somebody help me on this?
Use a delimiter that is not expected to be in your data (ex. ~ or | ) or a string of multiple characters (ex. $^$ )