How to clear txt file having different Delimiter using SSIS package? - sql

I have text file which is having ^(CAP) and ,(Comma) as a delimiter and after clearing i need to load to sql . I have tried my best to clear a source file
But still file is not cleaned as expectation .
Please find the below picture i have tried to correct the source file
But still file is not cleared as expected . Please find below uncleared file .

You have a variety of issues here.
You have identified the header row delimiter as a comma. A row delimiter is the, usually invisible, delimiter than indicates a row's worth of data has happened. Traditionally, this is an Operating System specific value but it's a Carriage Return (CR), Line Feed (LF) or Carriage Return/Line Feed.
Your source data is not a comma delimited file with caret/circumflex/cap text delimiters. You have a comma-space delimited file which SSIS doesn't support in the editor. However, if you hand edit the dtsx file as I outlined in How to read a flatfile with lowercase thorn as the delimiter to specify that it should use comma space ColumnDelimiter="_x002C__x0020_"
Given a truncated version of your source data
ListCode, CAS, Name
^216^, ^^, ^Coal Dust^
^216^, ^7782-24-5^, ^Graphite (Natural)^
^216^, ^^, ^Inert or Nuisance Dust^
and the comma (0x2C) space (0x20) edited into the raw dtsx connection manager, I was able to pull data as I believe you are expecting
You might also run into additional issues given your selection of code pages and not checking the Unicode button but that's beyond my ability to generate matching source data from an image.

Just replace the ^, ^ with ^,^
It looks like your source
CAS, SubName, ListCode, Type, CountryCode, ListName
^1000413-72-8^,^fasiglifam^,^447^,^Chemical Inventory^,^EU^,^ECICS Custom Tariff Codes^
^1000413-72-8^,^fasiglifam^,^0^,^^,^NN^,^SPHERA Global Substance List^
Then edit your connection manager with below details
[![enter image description here][2]][2]
It will work .
[2]: https://i.stack.imgur.com/0x89k.png

Related

Pentaho Text file out separator

I am using a Text file output step in Pentaho Kettle for extracting data from sql and putting into CSV files. I have specified comma as the content separator. But sometimes I receive the files with semicolon seperated values. Any body else has faced the issue? I have read semicolon seperated values is the default content seperator for CSV file formats. I believe the content seperator is set to default to semicolon. Is this because the content seperator is set to default by the spoon environment based on the input data?
open the text file output step, go to content tab, their you will find option called Separator their what ever you will specify it will come into your final result, by-default you will find semi-column over their so just change it to comma and your problem will get resolved...

Removing handling newlines in a simple text import class

I have an input file that I want to use the string SPLIT function on for each line, depending on the Type field. However, the description field sometimes has data that has new lines in it so it messes up my file reader since it uses streamreader's readline() function
Handled:
Type|Name|User|Description
Type|Name|User|Description
Unhandled:
Type|Name|User|Description line 1
Description Line 2
Type|Name|User|Description
Besides not being able to validate on 'Type' for each line and keep reading the file for when the next Type field appears, are there any ways folks can come up with to properly read this file?
My solution was to have the file maker replace newline characters in their description field with another unique character that I can later add back in. I'm still interested in solutions from the file reader's perspective though
I know I'm talking to myself a lot here, but I found another solution, which is to remove remove line feeds, since the output file creator wrote out carriage returns for each line.
You could easily set a conditional statement to see if the Split array contains more than one element, which would indicate that it's a line you want to parse.

Cannot upload CSV that starts with an integer

I'm stuck with what seems like a weird BigQuery bug : I cannot upload a CSV file that starts (first line, first column) by an integer.
Here's my schema : COL1:INTEGER,COL2:INTEGER,COL3:STRING
Here's my csv file content :
100,4,XXX
100,4,XXX
If I put the STRING column as first column, the upload is OK.
If I add a header and tell BigQuery to skip it during the import, the upload is ok too.
But with the CSV and schema above, BigQuery always complains : Line:1 / Field:1, Value cannot be converted to expected type.
Anyone knows what the problem is ?
Thank you in advance,
David
I could not reproduce this problem--I copied and pasted the content into a file and uploaded it with no problems.
Perhaps the uploaded file format is corrupted somehow? If there are extra bytes at the beginning of the file, those would be ignored in a header row but might result in this error is the first value of the first field is expected to be an integer. I'd recommend examining the actual binary data in the file to make sure there's nothing funny going on.
Also, are you doing this import via web UI, command-line tool, or API? Have you tried one of the other methods?

How to get data from a .rtf file or excel file into database(sqlite) in iphone sdk?

I had lots of data in a .rtf file(having usernames and passwords).How can I fetch that data into a table. I'm using sqlite3.
I had created a "userDatabase.sql" in that I had created a table "usersList" having fields "username","password". I want to get the list of data in the "list.rtf" file in to my table "usersList". Please help me .
Thanks in advance.
Praveena.
I would write a little parser. Re-save the .rtf as a txt-file and assume it look like this:
user1:pass1
user2:pass2
user5:pass5
Now do this (in your code):
open the .txt file (NSString -stringWithContentsOfFile:usedEncoding:error:)
read line by line
for each line, fetch user and password (NSArray -componentsSeparatedByString)
store user/password into your DB
Best,
Christian
Edit: for parsing excel-sheets I recommend export as CSV file and then do the same
Parsing RTF files is mostly trivial. They're actually text, not binary (like doc pdf etc).
Last I used it, I remember the file format wasn't too difficult either.
Example:
{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Calibri;}}
{\*\generator Msftedit 5.41.21.2510;}\viewkind4\uc1\pard\sa200\sl276\slmult1\lang9\f0\fs22 Username Password\par
Username2 Password2\par
UsernameN PasswordN\par
}
Do a regular expression match to get the last { ... } part. By sure to match { not \{.
Next, parse the text as you want, but keep in mind that:
everything starting with a \ is escaped, I would write a little function to unescape the text
the special identifier \par is for a new line
there are other special identifiers, such as \b which toggles bolding text
the color change identifier, \cfN changes the text color according to the color table defined in the file header. You would want to ignore this identifier since we're talking about plain text.

Append newline to flat-file schema in BizTalk 2006 R2

I have a flat-file schema that has a header and detail records. It looks something like this:
HDR**2401*XX0062484*22750***20081006000000*000*******
LIN**001*788-0538-001*4891-788538010*20000*EA**0000***
I need to append two blank lines at the end of the message. Right now, if I have multiple records I get the following output:
HDR**2401*XX0062484*22750***20081006000000*000*******
LIN**001*788-0538-001*4891-788538010*20000*EA**0000***
HDR**2401*XX0062484*22750***20081006000000*000*******
LIN**001*788-0538-001*4891-788538010*20000*EA**0000***
What I want to see happen is something like this:
HDR**2401*XX0062484*22750***20081006000000*000*******
LIN**001*788-0538-001*4891-788538010*20000*EA**0000***
HDR**2401*XX0062484*22750***20081006000000*000*******
LIN**001*788-0538-001*4891-788538010*20000*EA**0000***
I could build a custom pipeline component to do this, but I'm wondering if there is a simpler way of getting what I need?
You should be able to accomplish what you want by using the Delimiter properties of the flat file schema.
Based on your example file I created a schema with the following record structure:
<Schema>
<Root>
<HDRGroup>
<HDR>
<LIN>
If you click on the root node of your schema you should see a list of properties for this root node. One properties section has the header 'Flat File'. In this flat file section the first three properties you can set are Child Delimiter, Child Delimiter Type and Child Order.
This is where you configure the schema to create the blank lines (in this case CR LF but you can set different things as you need) For your example I set the following:
Child Delimiter: 0x0D 0x0A 0x0D 0x0A
Child Delimiter Type: Hexadecimal
Child Order: Infix
0x0D 0x0A is a carriage return line feed, so the above simply creates two blank lines, infixed between each child of the root node.
The <HDRGroup> then functions to make sure that each header and its lines is kept together. For its delimiter settings I set:
Child Delimiter: 0x0D 0x0A
Child Delimiter Type: Hexadecimal
Child Order: Postfix
The <HDR> and <LIN> records then contain the actual schema definition for your message lines, delimited with an asterisk.
This schema works for something that looks to me like what you have asked for - this sort of flatfile schema and how it parses a file is highly dependant of the little details, however, such as what type of line breaks there are and if there are line breaks at the end of the file.
The princples of using the delimiters will stand, you will likely find you need to tinker with the settings.
For anybody who cares, I finally caved in and wrote a custom pipeline component to accomplish this.