How can I load in a pipe (|) delimited text file that has columns that sometimes contain line breaks? - vb.net

I have built an SSIS package that loads several delimited text files into a SQL database. One of the files often contains line breaks inside a field, which breaks the standard data flow of a flat file source mapped to an ADO.NET destination, because each line break is treated as the start of a new row. The vendor sending the files does not want to make any edits to them and can't do XML at this time. Is there any way to fix this? I was thinking of writing a small VB.NET program that would correct the files so they work in the SSIS package, but I'm not sure how to write that logic. The file has 5 columns: the first 2 are big integers and always contain a long integer ID, then there is a small text column that contains one short word, then a date, and then a long comments field that is causing the problem. The comments field is sometimes blank (which is OK); the problem is the rows that have line breaks. I never know how many line breaks are in the comments: some have none, some have several, even multiple line breaks in a row, so I was wondering if this is even possible.
5787626|6547599|Approved|1/10/2017|Applicant request for fee waiver approved
5443221|7742812|Active|11/5/2013|
3430962|7643957|Re-Scheduled|5/25/2016|REVISED TERMS AND CONDITIONS REJECTED
Applicant has 30 DAYS To submit paperwork for extension.
34433624|7673715|Denied|1/24/2017|
34113575|7653748|Active|1/8/2014|New terms have been granted.
Sample File Format.

As long as there is logic that you can program/predict, it will be possible.
I would do it using a Script Component as a source, which means you don't need to rewrite the file before processing it. It also provides a lot of flexibility, e.g., you can store values in variables while iterating over multiple lines in the file, etc.
I posted another answer recently that gives a lot of detail on how to go about this: SSIS import a Flat File to SQL with the first row as header and last row as a total.
An example of holding the values in variables until the row is ready to be written:-
For this example I am writing three columns, ID1, ID2 and Comments. The file looks like this:
1|2|Comment1
Comment2
4|5|Comment3
Comment4
Comment5
6|7|Comment6
The Script Component contains the following method.
public override void CreateNewOutputRows()
{
System.IO.StreamReader reader = null;
try
{
bool readFirstLine = false;
int id1 = 0;
int id2 = 0;
string comments = null;
reader = new System.IO.StreamReader(Variables.FilePath); // this refers to a package variable that contains the file path
while (!reader.EndOfStream)
{
string line = reader.ReadLine();
if (line.Contains("|"))
{
if (readFirstLine)
{
Output0Buffer.AddRow();
Output0Buffer.ID1 = id1;
Output0Buffer.ID2 = id2;
Output0Buffer.Comments = comments;
}
else
{
readFirstLine = true;
}
string[] fields = line.Split('|');
id1 = Convert.ToInt32(fields[0]);
id2 = Convert.ToInt32(fields[1]);
comments = fields[2];
}
else
{
comments += " " + line;
}
if (reader.EndOfStream)
{
Output0Buffer.AddRow();
Output0Buffer.ID1 = id1;
Output0Buffer.ID2 = id2;
Output0Buffer.Comments = comments;
}
}
}
finally
{
// Release the file handle whether or not an exception was thrown
if (reader != null)
{
reader.Dispose();
}
}
}
The result set is:
ID1 ID2 Comments
=== === ========
1 2 Comment1 Comment2
4 5 Comment3 Comment4 Comment5
6 7 Comment6
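For the five-column file in the question, the same buffering idea can also be sketched as a standalone helper that repairs the lines before any import. This is only a rough sketch: the names `RecordMerger` and `MergeLines` are made up for illustration, and it assumes a line starts a new record when it has at least five pipe-delimited fields and the first two are integers. A comment continuation that happens to look like that would be misread as a new record.

```csharp
using System;
using System.Collections.Generic;

public static class RecordMerger
{
    // Assumption: a record starts with two integer IDs and has at least five fields.
    private static bool StartsRecord(string line)
    {
        string[] fields = line.Split('|');
        return fields.Length >= 5
            && long.TryParse(fields[0], out _)
            && long.TryParse(fields[1], out _);
    }

    // Joins each continuation line onto the record that precedes it.
    public static List<string> MergeLines(IEnumerable<string> lines)
    {
        var merged = new List<string>();
        foreach (string line in lines)
        {
            if (StartsRecord(line))
            {
                merged.Add(line);
            }
            else if (line.Trim().Length == 0)
            {
                // Blank lines (including repeated line breaks in a comment) are dropped.
                continue;
            }
            else if (merged.Count == 0)
            {
                // Nothing to attach to yet (e.g. a header line): keep the line as-is.
                merged.Add(line);
            }
            else
            {
                string previous = merged[merged.Count - 1];
                // No separating space when the comment field was empty so far.
                string separator = previous.EndsWith("|") ? "" : " ";
                merged[merged.Count - 1] = previous + separator + line.Trim();
            }
        }
        return merged;
    }
}
```

A call such as `File.WriteAllLines(outputPath, RecordMerger.MergeLines(File.ReadLines(inputPath)))` would then write a corrected copy that a normal flat file source can read; `inputPath` and `outputPath` are placeholders.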

Related

SSIS Import data that is NOT columnar into SQL

I am fairly new to SSIS and need a little help getting started. I have several reports that come out of our mainframe. The reports are not in a columnar format. The date record is at the top, then there might be some initial data, then there might be a little more. So I need to read in each line, look at what the text says, and figure out whether I need the data or should move on to the next row.
This is a VERY rough example of the report I want to import into a SQL table.
DATE: 01/08/2020 FACILITY NAME PAGE1
REVENUE USAGE FOR ACCOUNTING PERIOD 02
----TOTAL---- ----TOTAL---- ----OTHER---- ----INSURANCE---- ----INSURANCE2----
SERVICE CODE - 123456789 DESCRIPTION: WIDGETS
CURR 2,077
IP 0.0000 3 2,345 0.00
143
OP 0.0000 2 1,231 0.00
YTD 5
IP 0.0000
76
OP 0.0000
etc . . . .. .
SERVICE CODE
After the SERVICE CODE the data will start to repeat like it is above. This is the basic idea of a report.
I want to get the Date then the Service Code, Description, Current IP Volume, Current IP Dollar, Current OP Volume, Current OP Dollar, YTD IP Volume, YTD IP Dollar, YTD OP Volume, YTD OP Dollar . . then repeat.
Just to clarify, I am not asking anyone to do this for me. I want to learn how to do this. I have looked into how to do this, but every example I have found talks about doing it with a CSV, tab-delimited, or Excel file. I do not have that type of file, so I was asking what I need to look at. I currently use Monarch to format the file, but again, I want to learn more about SSIS and this is a perfect way to learn. Asking the vendor to redo the report is not an option, plus I want to learn how to do this. Thank you, I just wanted to get that out there.
Any help would be greatly appreciated.
Rodger
As stated in the comments, you could do this using a Script Task. The basic steps are:
Define a DataTable to store your data.
Use a StreamReader to read your report.
Process each line using a combination of conditionals, string methods, and parsing to extract the relevant fields.
Write the DataTable to the database using SqlBulkCopy.
The following would go inside your Main method in your script task:
//Define a table to store your data
var table = new DataTable
{
Columns =
{
{ "ServiceCode", typeof(string) },
{ "Description", typeof(string) },
{ "CurrentIPVolume", typeof(int) },
{ "CurrentIPDollar", typeof(decimal) },
{ "CurrentOPVolume", typeof(int) },
{ "CurrentOPDollar", typeof(decimal) },
{ "YTDIPVolume", typeof(int) },
{ "YTDIPDollar", typeof(decimal) },
{ "YTDOPVolume", typeof(int) },
{ "YTDOPDollar", typeof(decimal) }
}
};
var filePath = @"Your File Path";
using (var reader = new StreamReader(filePath))
{
string line = null;
DataRow row = null;
// As YTD and Curr are identical, we will need a flag later to mark our position within the record
bool ytdFlag= false;
//Loop through every line in the file
while ((line = reader.ReadLine()) != null)
{
//if the line is blank, move on to the next
if (string.IsNullOrWhiteSpace(line))
continue;
// If the line starts with service code, then it marks the start of a new record
if (line.StartsWith("SERVICE CODE"))
{
//If the current value for row is not null then this is
//not the first record, so we need to add the previous
//record to the table before continuing
if (row != null)
{
table.Rows.Add(row);
ytdFlag= false; // New record, reset YTD flag
}
row = table.NewRow();
//Split the line now based on known values:
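//RemoveEmptyEntries drops the empty string that precedes "SERVICE CODE - "
var tokens = line.Split(new string[] { "SERVICE CODE - ", "DESCRIPTION: "}, StringSplitOptions.RemoveEmptyEntries);
row[0] = tokens[0].Trim();
row[1] = tokens[1].Trim();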
}
if (line.StartsWith("CURR"))
{
//Process the row --> "CURR 2,077"
//Not sure what 2,077 is, but this will parse it
int i = 0;
if (int.TryParse(line.Substring(4).Trim().Replace(",", ""), out i))
{
//Do something with your int
Console.WriteLine(i);
}
}
if (line.StartsWith(" IP"))
{
//Start at after IP then split the line into the 4 numbers
var tokens = line.Substring(3).Split(new [] { " "}, StringSplitOptions.RemoveEmptyEntries);
//If we have gone past the CURR record, then add to the YTD columns
//Assumes tokens[1] is the volume and tokens[2] the dollar amount; adjust to the real layout
if (ytdFlag)
{
row[6] = int.Parse(tokens[1]);
row[7] = decimal.Parse(tokens[2]);
}
//Otherwise we are still in the CURR section:
else
{
row[2] = int.Parse(tokens[1]);
row[3] = decimal.Parse(tokens[2]);
}
}
if (line.StartsWith(" OP"))
{
//Start at after OP then split the line into the 4 numbers
var tokens = line.Substring(3).Split(new [] { " "}, StringSplitOptions.RemoveEmptyEntries);
//If we have gone past the CURR record, then add to the YTD columns
//Assumes tokens[1] is the volume and tokens[2] the dollar amount; adjust to the real layout
if (ytdFlag)
{
row[8] = int.Parse(tokens[1]);
row[9] = decimal.Parse(tokens[2]);
}
//Otherwise we are still in the CURR section:
else
{
row[4] = int.Parse(tokens[1]);
row[5] = decimal.Parse(tokens[2]);
}
//After we have processed an OP record, we must set the YTD Flag to true.
//Doesn't matter if it is the YTD OP record, since the flag will be reset
//By the next line that starts with SERVICE CODE anyway
ytdFlag= true;
}
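}
//The last record has no following SERVICE CODE line, so add it here
if (row != null)
{
table.Rows.Add(row);
}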
}
//Now that we have processed the file, we can write the data to a database
using (var sqlBulkCopy = new SqlBulkCopy("Your Connection String"))
{
sqlBulkCopy.DestinationTableName = "dbo.YourTable";
//If necessary add column mappings, but if your DataTable matches your database table
//then this is not required
sqlBulkCopy.WriteToServer(table);
}
This is a very quick example, far from the finished article, and I have done little or no testing, but it should give you the gist of how it could be done, and get you started on one possible solution.
It can definitely be cleaned up and refactored, but I have tried to make it as clear as possible what is going on, rather than trying to write the most efficient code ever. It should also (hopefully) demonstrate what a monumental pain this is to do, and that very minor report changes, such as an extra space before "OP", will break the whole thing.
So again, I would re-iterate, if you can get the data in a standard flat file format, with one line per record, you should. I do however appreciate that sometimes these things are out of your control, and I have had to write incredibly ugly import routines like this in the past, so I feel your pain if you can't get the data in a consumable format.

SSIS import a Flat File to SQL with the first row as header and last row as a total

I receive a text file that I have to import into a SQL table. I have to build an SSIS package because I will receive the flat file every day, with the first row as the Customer_ID, followed by the invoice details and then the total of the invoice.
Example :
30303
0000109291700080190432737000005Name of the product
0000210291700080190432737000010Name of the product
0000309291700080190432737000000Name of the product
003 000145
So let me Explain:
First 30303 is the Customer #
Other Rows Invoice Details
00001-> ROWID 092917-> DATE 000801904327->PROD 370->Trans 00010 -> AMOUNT
Name of the product
Last Row
003==>Total rows 000145==>Total of Invoice
Any clue?
I would use a Script Component as a source in a Data Flow Task. You can then use C# or VB.net to read the file, e.g., by using System.IO.StreamReader, in any way you wish. You can read a line at a time, store values in variables to write to every row (e.g., the customer number), etc. It's extremely flexible for complex files.
Here is an example script (C#) based on your data:
public override void CreateNewOutputRows()
{
System.IO.StreamReader reader = null;
try
{
bool line1Read = false;
int customerNumber = 0;
reader = new System.IO.StreamReader(Variables.FilePath); // this refers to a package variable that contains the file path
while (!reader.EndOfStream)
{
string line = reader.ReadLine();
if (!line1Read)
{
customerNumber = Convert.ToInt32(line);
line1Read = true;
}
else if (!reader.EndOfStream)
{
Output0Buffer.AddRow();
Output0Buffer.CustomerNumber = customerNumber;
Output0Buffer.RowID = Convert.ToInt32(line.Substring(0, 5));
Output0Buffer.Date = DateTime.ParseExact(line.Substring(5, 6), "MMddyy", System.Globalization.CultureInfo.CurrentCulture);
Output0Buffer.Prod = line.Substring(11, 12);
Output0Buffer.Trans = Convert.ToInt32(line.Substring(23, 3));
Output0Buffer.Amount = Convert.ToInt32(line.Substring(26, 5));
Output0Buffer.ProductName = line.Substring(31);
}
}
}
finally
{
// Release the file handle whether or not an exception was thrown
if (reader != null)
{
reader.Dispose();
}
}
}
The columns in 'Output 0' of the Script Component are configured as follows:
Name DataType Length
==== ======== ======
CustomerNumber four-byte signed integer [DT_I4]
RowID four-byte signed integer [DT_I4]
Date database date [DT_DBDATE]
Prod string [DT_STR] 12
Trans four-byte signed integer [DT_I4]
Amount four-byte signed integer [DT_I4]
ProductName string [DT_STR] 255
To implement this:
Create a string variable called 'FilePath' with your file path in it for the script to reference.
Create a Data Flow Task.
Add a Script Component to the Data Flow Task - you'll be asked what type it should be, select 'Source'.
Right-click the Script Component, click 'Edit'.
On the 'Script' pane, add the 'FilePath' variable to the 'ReadOnlyVariables' section.
On the 'Inputs and Outputs' pane, expand 'Output 0' and add columns to the 'Output Columns' section as per the above table.
On the 'Script' pane, click 'Edit Script', and then paste my code over the public override void CreateNewOutputRows() method (replacing it).
Your Script Component source is now configured, and you'll be able to use it like any other data source component. To write this data to a SQL Server table, add an OLE DB Destination to the Data Flow Task, and link the Script Component to that, configuring the columns appropriately.

Import CSV File Error : Column Value containing column delimiter

I am trying to import a CSV file into SQL Server using SSIS.
Here's an example of what the data looks like:
Student_Name,Student_DOB,Student_ID,Student_Notes,Student_Gender,Student_Mother_Name
Joseph Jade,2005-01-01,1,Good listener,Male,Amy
Amy Jade,2006-01-01,1,Good in science,Female,Amy
....
The CSV columns do not use text qualifiers (quotation marks).
I created a simple SSIS package to import the file into SQL Server, but sometimes the data in SQL looked like this:
Student_Name Student_DOB Student_ID Student_Notes Student_Gender Student_Mother_Name
Ali Jade 2004-01-01 1 Good listener Bad in science Male,Lisa
The reason is that the [Student_Notes] column sometimes contains a comma (,), which is also the column delimiter, so those rows are not imported correctly.
Any suggestions?
A word of warning: I'm not a regular C# coder.
But anyway this code does the following:
It opens a file called C:\Input.TXT
It searches each line. If the line has more than 5 commas, it takes all the extra commas out of the third last field (notes)
It writes the result to C:\Output.TXT - that's the one you need to actually import
There are many improvements that could be made:
Get file paths from connection managers
Error handling
An experienced C# programmer could probably do this in half the code
Keep in mind your package will need write access to the appropriate folder
public void Main()
{
// Search the file and remove extra commas from the third last field
// Extended from code at
// http://stackoverflow.com/questions/1915632/open-a-file-and-replace-strings-in-c-sharp
// Nick McDermaid
string sInputLine;
string sOutputLine;
string sDelimiter = ",";
String[] sData;
int iIndex;
// open the file for read
using (System.IO.FileStream inputStream = File.OpenRead("C:\\Input.txt"))
{
using (StreamReader inputReader = new StreamReader(inputStream))
{
// open the output file
using (StreamWriter outputWriter = File.AppendText("C:\\Output.txt"))
{
// Read each line
while (null != (sInputLine = inputReader.ReadLine()))
{
// Grab each field out
sData = sInputLine.Split(sDelimiter[0]);
if (sData.Length <= 6)
{
// 6 or less fields - just echo it out
sOutputLine = sInputLine;
}
else
{
// line has more than 6 pieces
// We assume all of the extra commas are in the notes field
// Put the first three fields together
sOutputLine =
sData[0] + sDelimiter +
sData[1] + sDelimiter +
sData[2] + sDelimiter;
// Put the middle notes fields together, excluding the delimiter
for (iIndex=3; iIndex <= sData.Length - 3; iIndex++)
{
sOutputLine = sOutputLine + sData[iIndex] + " ";
}
// Tack on the last two fields
sOutputLine = sOutputLine +
sDelimiter + sData[sData.Length - 2] +
sDelimiter + sData[sData.Length - 1];
}
// We've evaluated the correct line, now write it out
outputWriter.WriteLine(sOutputLine);
}
}
}
}
Dts.TaskResult = (int)Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success;
}
In the Flat File Connection Manager, define the file as a single column (DT_STR 8000).
Then add a Script Component to the Data Flow Task and add output columns (the same as in the example shown).
In the Script Component, split each row using the following code:
' Student_Name,Student_DOB,Student_ID,Student_Notes,Student_Gender,Student_Mother_Name
Dim strCells() as string = Row.Column0.Split(CChar(","))
Row.StudentName = strCells(0)
Row.StudentDOB = strCells(1)
Row.StudentID = strCells(2)
Row.StudentMother = strCells(strCells.Length - 1)
Row.StudentGender = strCells(strCells.Length - 2)
Dim strNotes as String = String.Empty
For I As Integer = 3 To strCells.Length - 3
strNotes &= strCells(I)
Next
Row.StudentNotes = strNotes
It worked fine for me.
If importing the CSV file is not a routine task:
Import the CSV file into Excel.
Find the bad rows with an Excel row filter and rewrite them.
Save the Excel file as tab-delimited TXT.
Import the TXT file with SSIS.
Otherwise, write a script that looks for commas in the Student_Notes column range.
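The two scripts above drop the extra commas from the notes. If the commas should be kept, the same assumption (every surplus comma belongs to Student_Notes, the fourth of six fields) can be applied the other way round. The following is only a sketch of that idea; `StudentLineParser` and `SplitLine` are hypothetical names, not part of SSIS.

```csharp
using System;

public static class StudentLineParser
{
    // Splits a line into six fields, assuming every surplus comma
    // belongs to the fourth field (Student_Notes).
    public static string[] SplitLine(string line)
    {
        const int expected = 6;    // number of columns in the file
        const int notesIndex = 3;  // zero-based position of Student_Notes
        string[] parts = line.Split(',');
        if (parts.Length <= expected)
        {
            // No surplus commas: return the pieces unchanged.
            return parts;
        }
        int extra = parts.Length - expected;
        var fields = new string[expected];
        // Copy the fields before the notes unchanged.
        Array.Copy(parts, 0, fields, 0, notesIndex);
        // Re-join the pieces that make up the notes, restoring their commas.
        fields[notesIndex] = string.Join(",", parts, notesIndex, extra + 1);
        // Copy the fields after the notes unchanged.
        Array.Copy(parts, notesIndex + extra + 1, fields, notesIndex + 1, expected - notesIndex - 1);
        return fields;
    }
}
```

For the problem row from the question, `SplitLine("Ali Jade,2004-01-01,1,Good listener,Bad in science,Male,Lisa")` yields six fields with the notes field holding `Good listener,Bad in science`.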

BigQuery Data.ErrorProto.Reason "stopped"

I'm inserting data with insertAll(), but TableDataInsertAllResponse.InsertErrors returns the same error for each row I have inserted.
The errors only give me the field
**Data.ErrorProto.Reason** which contains: **"stopped"**.
This is the method that calls insertAll():
public bool InsertAll(BigqueryService s, String datasetId, String tableId, List<TableDataInsertAllRequest.RowsData> data)
{
TabledataResource t = s.Tabledata;
TableDataInsertAllRequest req = new TableDataInsertAllRequest()
{
Kind = "bigquery#tableDataInsertAllRequest",
Rows = data /* Put the rows to upload to BigQuery here */
};
TableDataInsertAllResponse response = t.InsertAll(req, projectId, datasetId, tableId).Execute();
if (response.InsertErrors != null) return true;
return false;
}
What is happening? Why can't I upload the data?
*EDIT:* I realized that uploading fewer than 6 rows works correctly, but the row size is about 1.6 KB and the maximum row size is 20 KB.
Thanks,
Roger
Well, a few days ago I found the solution. When you stream data into BigQuery using the insertAll() method, you can stream multiple rows at once. If one of these rows is invalid, Data.ErrorProto.Reason for that row contains the actual error message, for example "Can't convert value to string.", and the other rows contain "stopped" in Data.ErrorProto.Reason.
If you ever see this error, you probably have inconsistencies in the row format.
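To find the row that actually caused the failure, the errors can be filtered so that only the entries whose reason is not "stopped" remain. The sketch below illustrates just that filtering step. `RowError` is a hypothetical stand-in type used to keep the example self-contained, not a class from the BigQuery client library, so the property names would need to be mapped onto whatever the real response exposes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one reported error: the index of the row in the
// request plus the reason and message reported for it.
public sealed class RowError
{
    public long RowIndex { get; set; }
    public string Reason { get; set; }
    public string Message { get; set; }
}

public static class InsertErrorFilter
{
    // Keeps only the errors whose reason is not "stopped", i.e. the rows
    // that were actually rejected rather than skipped along with them.
    public static List<RowError> RootCauses(IEnumerable<RowError> errors)
    {
        return errors
            .Where(e => !string.Equals(e.Reason, "stopped", StringComparison.OrdinalIgnoreCase))
            .ToList();
    }
}
```

Logging the row index and message of each remaining entry then points at the rows whose format needs fixing.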

How to get the last input ID in a textfile?

Can someone help me with my problem?
I'm having a hard time working out how to get the last input ID in a text file. My back end is a text file.
Thanks.
This is the sample content of my program's text file.
ID|CODE1|CODE2|EXPLAIN|CODE3|DATE|PRICE1|PRICE2|PRICE3|
02|JKDHG|hkjd|Hfdkhgfdkjgh|264|56.46.54|654 654.87|878 643.51|567 468.46|
03|DEJSL|hdsk|Djfglkdfjhdlf|616|46.54.56|654 654.65|465 465.46|546 546.54|
01|JANE|jane|Jane|251|56.46.54|534 654.65|654 642.54|543 468.74|
How would I get the last input ID so that the ID of the next input line doesn't go back to number 1?
Make a function that reads the file and returns a list of lines (strings), like this:
public static List<string> ReadTextFileReturnListOfLines(string strPath)
{
List<string> MyLineList = new List<string>();
try
{
// Create an instance of StreamReader to read from a file.
StreamReader sr = new StreamReader(strPath);
string line = null;
// Read and display the lines from the file until the end
// of the file is reached.
do
{
line = sr.ReadLine();
if (line != null)
{
MyLineList.Add(line);
}
} while (!(line == null));
sr.Close();
return MyLineList;
}
catch (Exception)
{
// Rethrow without resetting the stack trace
throw;
}
}
I am not sure whether
ID|CODE1|CODE2|EXPLAIN|CODE3|DATE|PRICE1|PRICE2|PRICE3|
is part of the file; if it is, adjust the index of the line you want accordingly, then get the element from the list (here MyStringList is the list returned by the function above):
string id = MyStringList[1].Split('|')[0];
If you're looking for the last (highest) number in the ID field, you could do it with a single line in LINQ:
Dim MaxID = (From line in File.ReadAllLines("file.txt")
Skip 1
Select line.Split("|")(0)).Max()
What this code does is get an array via File.ReadAllLines, skip the first line (which appears to be a header), split each line on the delimiter (|), take the first element from that split (which is the ID) and select the maximum value. Because the IDs are compared as strings, this relies on them being zero-padded to the same width.
In the case of your sample input, the result is "03".
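If the goal is the next ID to assign rather than the highest existing one, the IDs can be compared as numbers instead of strings. The following is a minimal sketch under that assumption; `IdHelper` and `NextId` are illustrative names, and it treats any line whose first field is not made up of digits only (such as the header) as not carrying an ID.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class IdHelper
{
    // Returns the highest numeric ID found in the first column plus one,
    // or 1 when no line carries a numeric ID.
    public static int NextId(IEnumerable<string> lines)
    {
        int max = 0;
        foreach (string line in lines)
        {
            string first = line.Split('|')[0];
            // NumberStyles.None accepts digits only, so the header ("ID")
            // and blank lines fail to parse and are skipped.
            if (int.TryParse(first, NumberStyles.None, CultureInfo.InvariantCulture, out int id) && id > max)
            {
                max = id;
            }
        }
        return max + 1;
    }
}
```

With the sample file from the question, `IdHelper.NextId(File.ReadLines("file.txt"))` returns 4, and `.ToString("D2")` formats it as `04` to match the existing two-digit IDs.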