SSIS Import data that is NOT columnar into SQL

I am fairly new to SSIS and need a little help getting started. I have several reports that come out of our mainframe, and they are not in a columnar format. The date record is at the top, then there might be some initial data, and then there might be a little more. So I need to read in each line, look at what the text says, and decide whether I need the data or should move on to the next row.
This is a VERY rough example of the report I want to import into a SQL table:
----TOTAL---- ----TOTAL---- ----OTHER---- ----INSURANCE---- ----INSURANCE2----
CURR 2,077
IP 0.0000 3 2,345 0.00
OP 0.0000 2 1,231 0.00
IP 0.0000
OP 0.0000
etc . . . .. .
After the SERVICE CODE line, the data starts to repeat as shown above. This is the basic idea of the report.
I want to get the Date, then the Service Code, Description, Current IP Volume, Current IP Dollar, Current OP Volume, Current OP Dollar, YTD IP Volume, YTD IP Dollar, YTD OP Volume, YTD OP Dollar, and then repeat.
Just to clarify, I am not asking anyone to do this for me; I want to learn how to do it. Every example I have found talks about doing this with a CSV, tab-delimited, or Excel file. I do not have that type of file, so I am asking what I need to look at. I currently use Monarch to format the file, but I want to learn more about SSIS, and this is a perfect way to learn. Asking the vendor to redo the report is not an option, and in any case I want to learn how to do this. Thank you, I just wanted to get that out there.
Any help would be greatly appreciated.

As stated in the comments, you could do this using a Script Task. The basic steps are:
1. Define a DataTable to store your data.
2. Use a StreamReader to read your report.
3. Process each line using a combination of conditionals, String methods, and parsing to extract the relevant fields from the relevant lines.
4. Write the DataTable to the database using SqlBulkCopy.
The following would go inside the Main method of your Script Task:
// Requires these directives at the top of ScriptMain: using System.Data; using System.Data.SqlClient; using System.IO;
//Define a table to store your data
var table = new DataTable
{
    Columns =
    {
        { "ServiceCode", typeof(string) },
        { "Description", typeof(string) },
        { "CurrentIPVolume", typeof(int) },
        { "CurrentIPDollar", typeof(decimal) },
        { "CurrentOPVolume", typeof(int) },
        { "CurrentOPDollar", typeof(decimal) },
        { "YTDIPVolume", typeof(int) },
        { "YTDIPDollar", typeof(decimal) },
        { "YTDOPVolume", typeof(int) },
        { "YTDOPDollar", typeof(decimal) }
    }
};
var filePath = @"Your File Path";
using (var reader = new StreamReader(filePath))
{
    string line = null;
    DataRow row = null;
    // As the YTD and CURR sections are identical, we need a flag to mark our position within the record
    bool ytdFlag = false;
    //Loop through every line in the file
    while ((line = reader.ReadLine()) != null)
    {
        //If the line is blank, move on to the next
        if (string.IsNullOrWhiteSpace(line))
            continue;
        // If the line starts with SERVICE CODE, then it marks the start of a new record
        if (line.StartsWith("SERVICE CODE"))
        {
            //If row is not null then this is not the first record,
            //so we need to add the previous record to the table before continuing
            if (row != null)
                table.Rows.Add(row);
            ytdFlag = false; // New record, reset YTD flag
            row = table.NewRow();
            //Split the line based on known labels; RemoveEmptyEntries drops the empty
            //string that precedes "SERVICE CODE - "
            var tokens = line.Split(new string[] { "SERVICE CODE - ", "DESCRIPTION: " }, StringSplitOptions.RemoveEmptyEntries);
            row[0] = tokens[0].Trim();
            row[1] = tokens[1].Trim();
        }
        else if (line.StartsWith("CURR"))
        {
            //Process the line --> "CURR 2,077"
            //Not sure what 2,077 is, but this will parse it
            int i = 0;
            if (int.TryParse(line.Substring(4).Trim().Replace(",", ""), out i))
            {
                //Do something with your int
            }
        }
        else if (row != null && line.StartsWith(" IP"))
        {
            //Start after "IP", then split the line into the 4 numbers
            var tokens = line.Substring(3).Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries);
            //Assumes the 2nd number is the volume and the 3rd is the dollar amount
            int volume = int.Parse(tokens[1].Replace(",", ""));
            decimal dollar = decimal.Parse(tokens[2]);
            //If we have gone past the CURR record, then add to the YTD columns
            if (ytdFlag)
            {
                row[6] = volume;
                row[7] = dollar;
            }
            else //Otherwise we are still in the CURR section
            {
                row[2] = volume;
                row[3] = dollar;
            }
        }
        else if (row != null && line.StartsWith(" OP"))
        {
            //Start after "OP", then split the line into the 4 numbers
            var tokens = line.Substring(3).Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries);
            //Assumes the 2nd number is the volume and the 3rd is the dollar amount
            int volume = int.Parse(tokens[1].Replace(",", ""));
            decimal dollar = decimal.Parse(tokens[2]);
            //If we have gone past the CURR record, then add to the YTD columns
            if (ytdFlag)
            {
                row[8] = volume;
                row[9] = dollar;
            }
            else //Otherwise we are still in the CURR section
            {
                row[4] = volume;
                row[5] = dollar;
            }
            //After we have processed an OP line, we must set the YTD flag to true.
            //It doesn't matter if it is the YTD OP line, since the flag will be reset
            //by the next line that starts with SERVICE CODE anyway
            ytdFlag = true;
        }
    }
    //Add the final record, which is not followed by another SERVICE CODE line
    if (row != null)
        table.Rows.Add(row);
}
//Now that we have processed the file, we can write the data to a database
using (var sqlBulkCopy = new SqlBulkCopy("Your Connection String"))
{
    sqlBulkCopy.DestinationTableName = "dbo.YourTable";
    //If necessary add column mappings, but if your DataTable matches your database table
    //then this is not required
    sqlBulkCopy.WriteToServer(table);
}
This is a very quick example, far from the finished article, and I have done little or no testing, but it should give you the gist of how it could be done, and get you started on one possible solution.
It can definitely be cleaned up and refactored, but I have tried to make it as clear as possible what is going on, rather than trying to write the most efficient code ever. It should also (hopefully) demonstrate what a monumental pain this is, and how very minor report changes, such as an extra space before "OP", will break the whole thing.
So again, I would reiterate: if you can get the data in a standard flat file format, with one line per record, you should. I do, however, appreciate that sometimes these things are out of your control, and I have had to write incredibly ugly import routines like this in the past, so I feel your pain if you can't get the data in a consumable format.
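The same line-by-line state machine can be sketched in plain JavaScript (the language used for the other examples on this page), which makes it easy to try the parsing rules on a few sample lines before committing them to a Script Task. This is a sketch under the answer's assumptions, not a tested port: the "SERVICE CODE - ... DESCRIPTION: ..." layout, the leading space before "IP"/"OP", and the position of the volume and dollar figures are all guesses about the real report, and the date line is simply skipped. parseReport and toNumber are hypothetical names.

```javascript
// Converts "2,345"-style figures to numbers by stripping thousands separators
function toNumber(text) {
  return Number(text.replace(/,/g, ""));
}

function parseReport(lines) {
  const records = [];
  let current = null; // record being built
  let ytd = false;    // false in the CURR block, true after the first OP line

  for (const line of lines) {
    if (line.trim() === "") continue; // skip blank lines

    if (line.startsWith("SERVICE CODE")) {
      if (current) records.push(current); // flush the previous record
      const m = /^SERVICE CODE - (.*?)\s*DESCRIPTION: (.*)$/.exec(line);
      current = {
        serviceCode: m ? m[1].trim() : "",
        description: m ? m[2].trim() : "",
      };
      ytd = false;
      continue;
    }
    if (current === null) continue; // e.g. the date line before the first record

    const kind = line.slice(0, 3); // " IP" or " OP"; anything else (CURR ...) is skipped
    if (kind !== " IP" && kind !== " OP") continue;
    const tokens = line.slice(3).trim().split(/\s+/);
    const prefix = (ytd ? "ytd" : "current") + kind.trim(); // e.g. "currentIP"
    current[prefix + "Volume"] = toNumber(tokens[1]); // assumed: 2nd number = volume
    current[prefix + "Dollar"] = toNumber(tokens[2]); // assumed: 3rd number = dollars
    if (kind === " OP") ytd = true; // the next IP/OP pair is year-to-date
  }
  if (current) records.push(current); // flush the last record
  return records;
}
```

Each record is written out only when the next SERVICE CODE line arrives, so the final record has to be flushed explicitly after the loop; forgetting that step silently drops the last service code in the report.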


columnSummary is not added

I am trying to add columnSummary to my Handsontable table, but the function does not seem to fire. The stretchH value is set properly, but the table does not react to the columnSummary option:
this.${stretchH: 'all',columnSummary: [
destinationRow: 0,
destinationColumn: 2,
reversedRowCoords: true,
type: 'custom',
customFunction: function(endpoint) {
}, false);
I have also tried with type:'sum' without any luck.
Thanks for all help and guidance!
columnSummary cannot be changed with updateSettings: GH #3597
You can only set columnSummary settings when Handsontable is initialized.
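As a sketch of what that looks like, the columnSummary array from the question can be moved into the settings object passed to the Handsontable constructor instead of being applied later. This assumes a Handsontable build that includes the ColumnSummary plugin; the sample data, the spare "Total" row, and the container id "hot" are hypothetical placeholders, and it is worth checking the ColumnSummary documentation for how the summed range is chosen.

```javascript
// columnSummary supplied at initialization rather than via updateSettings().
// Data, the spare "Total" row and the container id "hot" are hypothetical.
const settings = {
  data: [
    ["Apples", 1, 10],
    ["Pears", 2, 20],
    ["Total", null, null], // spare last row meant to hold the summary
  ],
  stretchH: "all",
  columnSummary: [
    {
      destinationRow: 0,        // with reversedRowCoords, counted from the bottom
      destinationColumn: 2,
      reversedRowCoords: true,
      type: "sum",
    },
  ],
};

// Only create the table in a browser page where Handsontable has been loaded
if (typeof Handsontable !== "undefined" && typeof document !== "undefined") {
  new Handsontable(document.getElementById("hot"), settings);
}
```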
One workaround would be to manage your own column summary, since the Handsontable one could give you some headaches. You could add one additional row to hold your arithmetic, but that is messy (it needs a fixed number of rows and does not work with filtering and sorting operations). Still, it could work well under some circumstances.
In my humble opinion, though, a column summary has to be fully functional, which means it needs to sit outside the table data. One option is to take the additional row mentioned above out of the table data "area", but then we would have to make that out-of-table row always look as if it were still in the table.
So instead of adding a new row, we can put our column summary inside the column headers:
Here is a working JSFiddle example.
Once the Handsontable table is rendered, we need to iterate through the columns and add a placeholder for our column summary right inside each header cell's HTML:
for (var i = 0; i < tableConfig.columns.length; i++) {
  var columnHeader = document.querySelectorAll('.ht_clone_top th')[i];
  if (columnHeader) { // Just to be sure the column header exists
    var summaryColumnHeader = document.createElement('div');
    summaryColumnHeader.className = 'custom-column-summary';
    columnHeader.appendChild(summaryColumnHeader);
  }
}
Now that our placeholders are set, we have to update them with some arithmetic results:
function setMySummaryHeaderCalc() {
  var printedData = hotInstance.getData();
  for (var i = 0; i < tableConfig.columns.length; i++) {
    var header = document.querySelectorAll('.ht_clone_top th')[i];
    var summaryColumnHeader = header && header.querySelector('.custom-column-summary'); // Get back our column summary for each column
    if (summaryColumnHeader) {
      var res = 0;
      printedData.forEach(function (row) {
        res += parseFloat(row[i]) || 0; // Add up the numeric data stored under that column
      });
      summaryColumnHeader.innerText = '= ' + res;
    }
  }
}
This function can then be called whenever the summary needs to be refreshed:
var hotInstance = new Handsontable(/* ... */);
setMySummaryHeaderCalc(); // When the Handsontable table is rendered
Handsontable.hooks.add('afterFilter', function (conditionsStack) { // When the Handsontable table is filtered
  setMySummaryHeaderCalc();
}, hotInstance);
Feel free to comment, I could improve my answer.
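The arithmetic behind the header summary can also be pulled out into a pure function, which makes it easy to test without rendering a table. In this sketch (columnSums is a hypothetical name), each cell is coerced with parseFloat, so string cells such as "12" are added as numbers rather than concatenated, and empty or non-numeric cells are skipped.

```javascript
// Sums every column of a 2D data array (for example hotInstance.getData()).
// parseFloat turns numeric strings into numbers, so "2" and "3" give 5, not "23";
// cells that are empty or not numeric contribute nothing.
function columnSums(rows, columnCount) {
  const sums = new Array(columnCount).fill(0);
  for (const row of rows) {
    for (let i = 0; i < columnCount; i++) {
      const value = parseFloat(row[i]);
      if (!Number.isNaN(value)) {
        sums[i] += value;
      }
    }
  }
  return sums;
}
```

A header-update routine could then call something like columnSums(hotInstance.getData(), tableConfig.columns.length) and write each result into the matching placeholder.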

How can I load in a pipe (|) delimited text file that has columns that sometimes contain line breaks?

I have built an SSIS package that loads several delimited text files into a SQL database. One of the files often contains line breaks inside a field, which breaks the standard data flow task (a Flat File Source mapped to a destination), because SSIS thinks it is on a new row when it reaches a line break. The vendor sending over the files will not make any changes to them and can't provide XML at this time. Is there any way to fix this? I was thinking of writing a small program that would correct the files so they would work in the SSIS package, but I am not sure how to write that logic. The file has 5 columns: the first 2 are big integers that always contain a long integer ID, then there is a small text column that contains one short word, then a date, and then a long comments field that is causing the problem. The comments field is sometimes blank (which is OK); the problem is the rows that have line breaks. I never know how many line breaks are in the comments: some have none, and some have several, even multiple line breaks in a row, so I was wondering if this is even possible.
5787626|6547599|Approved|1/10/2017|Applicant request for fee waiver approved
3430962|7643957|Re-Scheduled|5/25/2016|REVISED TERMS AND CONDITIONS REJECTED
Applicant has 30 DAYS To submit paperwork for extension.
34113575|7653748|Active|1/8/2014|New terms have been granted.
Sample File Format.
As long as there is logic that you can program/predict, it will be possible.
I would do it using a Script Component as a source, which means you don't need to rewrite the file before processing it. It also provides a lot of flexibility, e.g., you can store values in variables while iterating over multiple lines in the file, etc.
I posted another answer recently that gives a lot of detail on how to go about this: SSIS import a Flat File to SQL with the first row as header and last row as a total.
An example of holding the values in variables until the row is ready to be written:
For this example I am writing three columns, ID1, ID2 and Comments. The file looks like this:
The Script Component contains the following method.
public override void CreateNewOutputRows()
{
    System.IO.StreamReader reader = null;
    try
    {
        bool readFirstLine = false;
        int id1 = 0;
        int id2 = 0;
        string comments = null;
        reader = new System.IO.StreamReader(Variables.FilePath); // this refers to a package variable that contains the file path
        while (!reader.EndOfStream)
        {
            string line = reader.ReadLine();
            if (line.Contains("|"))
            {
                // A line with a pipe starts a new record, so write out the previous one first
                if (readFirstLine)
                {
                    Output0Buffer.AddRow(); // a new output row must be added before its columns are set
                    Output0Buffer.ID1 = id1;
                    Output0Buffer.ID2 = id2;
                    Output0Buffer.Comments = comments;
                }
                readFirstLine = true;
                string[] fields = line.Split('|');
                id1 = Convert.ToInt32(fields[0]);
                id2 = Convert.ToInt32(fields[1]);
                comments = fields[2];
            }
            else
            {
                // No pipe: this line continues the current record's comments
                comments += " " + line;
            }
            if (reader.EndOfStream)
            {
                Output0Buffer.AddRow(); // end of file: write out the final record
                Output0Buffer.ID1 = id1;
                Output0Buffer.ID2 = id2;
                Output0Buffer.Comments = comments;
            }
        }
    }
    finally
    {
        if (reader != null)
            reader.Close();
    }
}
The result set is:
ID1 ID2 Comments
=== === ========
1 2 Comment1 Comment2
4 5 Comment3 Comment4 Comment5
6 7 Comment6
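Checking line.Contains("|") treats any comment line that happens to contain a pipe as the start of a new record. A stricter rule, inferred from the sample file, is that a record starts only when a line begins with two integer IDs followed by pipes. The sketch below applies that rule in plain JavaScript so the logic can be checked independently of SSIS; RECORD_START and assembleRecords are hypothetical names, it uses the question's 5-column layout, and it keeps the original line breaks in the comments instead of replacing them with spaces.

```javascript
// A new record starts only where a line begins with two integer IDs and pipes
// (an assumption drawn from the sample file)
const RECORD_START = /^\d+\|\d+\|/;

function assembleRecords(text) {
  const records = [];
  // Drop a single trailing newline so it is not treated as an empty comment line
  const lines = text.replace(/\r?\n$/, "").split(/\r?\n/);
  for (const line of lines) {
    if (RECORD_START.test(line)) {
      const parts = line.split("|");
      records.push({
        id1: parts[0],
        id2: parts[1],
        status: parts[2],
        date: parts[3],
        // Re-join anything after the 4th pipe, so pipes inside comments survive
        comments: parts.slice(4).join("|"),
      });
    } else if (records.length > 0) {
      // Continuation line (possibly blank): keep the line break in the comments
      records[records.length - 1].comments += "\n" + line;
    }
  }
  return records;
}
```

In the Script Component, the same test could be written as System.Text.RegularExpressions.Regex.IsMatch(line, @"^\d+\|\d+\|") in place of line.Contains("|").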

Installable Trigger Failing with Test Add-On

I have been wrestling with an installable trigger issue for a couple of days now. All of my research indicates that an add-on should allow for an installable onEdit() trigger within a spreadsheet, but my attempts keep erroring out. I have simplified my project code a bit to exemplify my issue.
The error message:
Execution failed: Test add-on attempted to perform an action that is not allowed.
My code (listing functions in the order that they are called):
function onOpen() //creates custom menu for the evaluation tool ***FOR ADMINISTRATORS ONLY***
var ui = SpreadsheetApp.getUi();
ui.createMenu('Evaluation Menu') // Menu Title
.addItem('Create Installable OnEdit Trigger', 'createInstallableOnEditTrigger')
ui.createMenu('Evaluation Menu') // Menu Title
.addSubMenu(ui.createMenu('Manage Observations & Evidence')
.addSubMenu(ui.createMenu('Create New Observation')
.addItem('Formal', 'createNewFormalObservation')
.addItem('Informal', 'createNewInformalObservation')
function createInstallableOnEditTrigger() { // installable trigger to create employee look-up listener when user edits the EIN fields on the Documentation Sheet.
var ss = SpreadsheetApp.getActive();
function onEditListener(event) //this function continually listens to all edits, but only engages under certain conditions, such as when a timestamp is needed or the Documentation Sheet needs to be auto-populated
//Determine whether or not the conditions are correct for continuing this function
var sheetName = SpreadsheetApp.getActiveSpreadsheet().getActiveSheet().getName(); //determines the name of the currently active sheet
if (sheetName.indexOf("Evidence") > -1) // if the active sheet is an evidence collection sheet, a timestamp may be needed
populateEvidenceTimeStamp(event, sheetName);
else if (sheetName == "Documentation Sheet") //if the active sheet is the "Documentation Sheet" then auto-population and EIN lookups may be needed
employeeLookup(event, sheetName);
What am I missing? Any help is greatly appreciated!!
The code below has been added as requested by @Mogsdad.
populateEvidenceTimeStamp() is dependent upon generateTimeStamp() which is also included below:
function populateEvidenceTimeStamp(event, sheetName)
var evidenceColumnName = "Evidence";
var timeStampColumnName = "Timestamp";
var sheet = event.source.getSheetByName(sheetName);
var actRng = event.source.getActiveRange();
var indexOfColumnBeingEdited = actRng.getColumn();
var indexOfRowBeingEdited = actRng.getRowIndex();
var columnHeadersArr = sheet.getRange(3, 1, 1, sheet.getLastColumn()).getValues(); // grabs the column headers found in the 3rd row of the evidence sheet
var timeStampColumnIndex = columnHeadersArr[0].indexOf(timeStampColumnName); //determines the index of the Timestamp column based on its title
var evidenceColumnIndex = columnHeadersArr[0].indexOf(evidenceColumnName); evidenceColumnIndex = evidenceColumnIndex+1; //determines the index of the evidence column based on its title
var cell = sheet.getRange(indexOfRowBeingEdited, timeStampColumnIndex + 1); //determines the individual timestamp cell that will be updated
if (timeStampColumnIndex > -1 && indexOfRowBeingEdited > 3 && indexOfColumnBeingEdited == evidenceColumnIndex && cell.getValue() == "") // only create a timestamp if 1) the timeStampColumn exists, 2) you are not actually editing the row containing the column headers and 3) there isn't already a timestamp in the Timestamp column for that row
function generateTimeStamp()
var timezone = "GMT-7"; // Arizona's time zone
var timestamp_format = "MM.dd.yyyy hh:mm:ss a"; // timestamp format based on the Java SE SimpleDateFormat class.
var currTimeStamp = Utilities.formatDate(new Date(), timezone, timestamp_format);
return currTimeStamp;
Below is the employeeLookup() function which is dependent upon lookupEIN()
function employeeLookup(event, sheetName)
if(sheetName == "Documentation Sheet" && !PropertiesService.getDocumentProperties().getProperty('initialized')) // if the activeSheet is "Documentation Sheet" and the sheet has not yet been initialized
var actRng = event.source.getActiveRange();
Logger.log("employeeLookup(): actRng: "+actRng.getRow()+" , "+actRng.getColumn());
if(actRng.getRow() == 4 && actRng.getColumn() == 9 && event.source.getActiveRange().getValue() != "") //if the "Teacher EIN" cell is the active range and it's not empty
var ein = actRng.getValue();
clearDocumentationSheetTeacherProfile(); //first clear the teacher profile information to avoid the possibility of EIN/Teacher Info mismatch if previous search did not yield results
var teacherDataArr = lookupEIN(ein, "Teachers");
//write retrieved teacher data to Documentation Spreadsheet
var sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("Documentation Sheet");
sheet.getRange(5, 9, 1, 1).setValue(teacherDataArr[1]); // Teacher First Name
sheet.getRange(6, 9, 1, 1).setValue(teacherDataArr[2]); // Teacher Last Name
sheet.getRange(7, 9, 1, 1).setValue(teacherDataArr[3]); // Teacher Email
sheet.getRange(11, 9, 1, 1).setValue(teacherDataArr[4]); // School Name
sheet.getRange(11, 39, 1, 1).setValue(teacherDataArr[5]); // Site Code
sheet.getRange(10, 30, 1, 1).setValue(calculateSchoolYear()); //School Year
Logger.log("employeeLookup(): type:Teachers 'died. lookupEIN() did not return a valid array'"); //alert message already sent by lookUpEIN
else if (actRng.getRow() == 4 && actRng.getColumn() == 30 && actRng.getValue() != "" && !PropertiesService.getDocumentProperties().getProperty('initialized')) //if the "Observer EIN" cell is the active range
Logger.log("employeeLookup(): 'active range is Observer EIN'");
var ein = actRng.getValue();
clearDocumentationSheetObserverProfile(); //first clear the observer profile information to avoid the possibility of EIN/Observer Info mismatch if previous search did not yield results
var observerDataArr = lookupEIN(ein, "Observers");
//write retrieved observer data to Documentation Spreadsheet
var sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("Documentation Sheet");
sheet.getRange(5, 30, 1, 1).setValue(observerDataArr[1]); // Observer First Name
sheet.getRange(6, 30, 1, 1).setValue(observerDataArr[2]); // Observer Last Name
sheet.getRange(7, 30, 1, 1).setValue(observerDataArr[3]); // Observer Email
Logger.log("employeeLookup(): type:Observers 'died. lookupEIN() did not return a valid array'"); //alert message already sent by lookUpEIN
Logger.log("employeeLookup(): 'active range is not a trigger'");
//do nothing (not the right cell)
//Observer log has already been initialized and documentation cannot be altered. notify user
Logger.log("employeeLookup(): 'log already saved.... alerting user'");
function lookupEIN(ein, type)
Logger.log ("lookUpEIN(): 'engaged'");
var ss = SpreadsheetApp.openById(teacherObserverIndex_GID);
var sheet = ss.getSheetByName(type); //lookup type aligns with the individual sheet names on the teacherObserverIndex_GID document
var values = sheet.getDataRange().getValues();
var val = sheet.getDataRange();
for (var i = 1; i < values.length; i++)
if(values[i][0] == ein)
Logger.log ("lookUpEIN(): values[i]: "+values[i]);
return values[i];
Logger.log ("lookUpEIN(): 'no match found'");
//a match could not be found
Logger.log("An EIN match could not be found"); // create a feedback pop-up
einNotFoundDialogue(type); //alert user that there is a problem with the provided ein
Triggers can't be created when running a script with Test as add-on.
From the add-on testing documentation:
There are a number of things to keep in mind while testing add-ons:
Installable triggers are currently not supported when testing.
Functionality that depends on installable triggers will not be
Some possible workarounds:
For on open and on edit installable triggers, temporarily add simple triggers that call the functions of the installable triggers. This might only work if the function's execution time is less than the simple trigger limit.
Call the installable-trigger functions from a test function that creates an object emulating the corresponding event object.
Instead of using a stand-alone project, use a bound project. You might use CLASP or the Google Apps Script GitHub Assistant Chrome extension to make it easier to copy the code from the stand-alone project to a bound project.
How can I test a trigger function in GAS?
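For the event-object workaround, a minimal sketch: build an object with the onEdit event fields this code reads (mainly source, plus range and value) and pass it straight to the handler. makeFakeEditEvent, runWithFakeEdit and testOnEditListener are hypothetical helper names, and the fake object only mimics the fields set here, not the full event object.

```javascript
// Builds a stand-in for an onEdit event object with only the fields set here
function makeFakeEditEvent(source, range, value) {
  return { source: source, range: range, value: value };
}

// Calls an edit handler (e.g. onEditListener) with a fake event and returns
// the event so a test can inspect it afterwards
function runWithFakeEdit(handler, source, range, value) {
  var event = makeFakeEditEvent(source, range, value);
  handler(event);
  return event;
}

// In the bound Apps Script project, a test entry point could look like:
// function testOnEditListener() {
//   var ss = SpreadsheetApp.getActive();
//   var range = ss.getActiveRange();
//   runWithFakeEdit(onEditListener, ss, range, range.getValue());
// }
```

Select the cell you want to simulate editing before running such a test function, since onEditListener and the functions it calls read the active sheet and event.source.getActiveRange() rather than event.range.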
In my experience, onEdit() is not available when using Test as add-on.
I agree the documentation is not clear. It seems to refer only to "Installable Triggers", but I think it applies to all triggers except the "onInstall" trigger, which runs as soon as you start the test (see Testing Google Sheet Addon Triggers for more details).

Datatables sorting varchar

I have a SQL query that is pulling back results ordered correctly when I try the query in MSSQL Studio.
I am using DataTables and everything is working great apart from a sorting issue. I have some properties in the first column, and I would like to order them like this:
However what comes back is something like this:
I have looked through various posts, but nothing seems to work. I believe this must be something I should configure in the DataTables plugin, but I cannot find anything.
Could someone advise?
Your data contains both numbers and characters, so it will be sorted as a string by default. You should write your own plugin for sorting your data type. Have a look here and here to see how to write a plugin and how to use it with your table.
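To see the problem concretely, the sketch below sorts some hypothetical values in the "digits followed by letters" shape, first as plain strings and then with Intl.Collator using numeric: true, which compares runs of digits by their numeric value. The collator is an alternative technique to the numeric conversion in the plug-in below, not what DataTables does internally.

```javascript
// Hypothetical values in the "digits followed by letters" shape the plug-in targets
const values = ["10a", "2b", "1a", "1ab", "100k"];

// Default Array.prototype.sort compares strings code unit by code unit,
// so "100k" and "10a" land before "2b"
const asStrings = [...values].sort();
// asStrings: 100k, 10a, 1a, 1ab, 2b

// Intl.Collator with numeric: true compares runs of digits by numeric value;
// sensitivity "base" also makes the letter comparison case-insensitive
const collator = new Intl.Collator(undefined, { numeric: true, sensitivity: "base" });
const natural = [...values].sort(collator.compare);
// natural: 1a, 1ab, 2b, 10a, 100k
```

In a DataTables sorting plug-in without a "-pre" function, collator.compare(a, b) could likely serve as the "-asc" comparison and collator.compare(b, a) as the "-desc" one.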
Edit: got some time today to work with the datatable stuff. If you still need a solution, here you go:
//Sorting plug-in
jQuery.extend(jQuery.fn.dataTableExt.oSort, {
    "numchar-pre": function (str) {
        var patt = /^([0-9]+)([a-zA-Z]+)$/; //match data like 1a, 2b, 1ab, 100k etc.
        var matches = patt.exec($.trim(str));
        if (!matches) {
            return 0; //value is not in the expected format, so treat it as 0
        }
        var number = parseInt(matches[1], 10); //extract the number part
        var chars = matches[2].toLowerCase(); //extract the "character" part and make it case-insensitive
        var dec = 0;
        for (var i = 0; i < chars.length; i++) {
            //treat the characters as base-27 digits (a=1 ... z=26) so the fraction always stays below 1
            dec += (chars.charCodeAt(i) - 96) * Math.pow(27, -(i + 1));
        }
        return number + dec; //combine the two parts
    },
    //sort ascending
    "numchar-asc": function (a, b) {
        return a - b;
    },
    //sort descending
    "numchar-desc": function (a, b) {
        return b - a;
    }
});
//Automatic type detection plug-in
jQuery.fn.dataTableExt.aTypes.unshift(function (sData) {
    var patt = /^([0-9]+)([a-zA-Z]+)$/;
    var trimmed = $.trim(sData);
    if (patt.test(trimmed)) {
        return 'numchar';
    }
    return null;
});
You can use the automatic type detection function to have the data type detected automatically, or you can set the data type for the column explicitly:
"aoColumns": [{"sType": "numchar"}]

Couchdb views and many (thousands) document types

I'm studying CouchDB and I'm picturing a worst-case scenario:
for each document type I need 3 views, and this application can generate 10 thousand document types.
By "document type" I mean the structure of the document.
After a new document is inserted, CouchDB makes 3*10K calls to view functions searching for the right document type.
Is this true?
Is there a smarter solution than making a database for each doc type?
Document example (assume that no two documents have the same structure; in this example the data is under different keys):
Views example (in this example only one per doc type):
{
  "sensorX": {
    "map": "function(doc) { if (doc.type == 'sensorX') emit(null, doc.valueA) }"
  },
  "sensorY": {
    "map": "function(doc) { if (doc.type == 'sensorY') emit(null, doc.valueB) }"
  },
  "sensorZ": {
    "map": "function(doc) { if (doc.type == 'sensorZ') emit(null, doc.valueC) }"
  }
}
In CouchDB, the map() function runs against each new document the first time you request the view after that document is added, and the results are cached in the view index. Let me explain with a quick illustration.
1. You insert 100 documents into CouchDB.
2. You request the view. Now the map() function runs against the 100 documents and the results are cached.
3. You request the view again. The data is read from the view index; no documents have to be re-mapped.
4. You insert 50 more documents.
5. You request the view. The 50 new documents are mapped and merged into the index with the old 100 documents.
6. You request the view again. The data is read from the view index; no documents have to be re-mapped.
I hope that makes sense. If you're concerned about a big load being generated when a user requests a view after lots of new documents have been added, you could have your import process call the view (to re-map the new documents) and have the user's request for the view include stale=ok.
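The sequence above can be modelled with a toy view class. This illustrates the idea only and is not CouchDB's implementation: the "database" is an append-only array, ToyView is a hypothetical name, and the view remembers how many documents it has already mapped so that each query maps only the new ones.

```javascript
// Toy model of incremental view indexing (an illustration, not CouchDB's code).
// The "database" is an append-only array of documents in insertion order.
class ToyView {
  constructor(mapFn) {
    this.mapFn = mapFn;   // called as mapFn(doc, emit)
    this.indexedUpTo = 0; // how many documents are already in the index
    this.rows = [];       // cached index rows
    this.mapCalls = 0;    // number of times mapFn has run
  }

  // Brings the index up to date, then returns the cached rows
  query(db) {
    for (const doc of db.slice(this.indexedUpTo)) {
      this.mapCalls++;
      this.mapFn(doc, (key, value) => this.rows.push({ id: doc._id, key, value }));
    }
    this.indexedUpTo = db.length;
    return this.rows;
  }
}
```

In this model, each new document is mapped once, when the view is next queried, and never again on later queries.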
The CouchDB book is a really good resource for information on CouchDB.
James has a great answer.
It looks like you are asking the question "what are the values of documents of type X?"
I think you can do that with one view:
function(doc) {
  // _view/sensor_value
  var val_names = { "sensorX": "valueA"
                  , "sensorY": "valueB"
                  , "sensorZ": "valueC"
                  };
  var value_name = val_names[doc.type];
  if(value_name) {
    // e.g. "sensorX" -> "123"
    // or "sensorZ" -> "789"
    emit(doc.type, doc.value[value_name]);
  }
}
Now, to get all values for sensorX, you query /db/_design/app/_view/sensor_value with the parameter ?key="sensorX". CouchDB will return all values for sensorX, which come from each document's value.valueA field. (For sensorY, they come from value.valueB, etc.)
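To check a map function like this before saving it in a design document, you can run it in Node with a stub emit() that collects the emitted key/value pairs. The sketch assumes the answer's document shape, { type: ..., value: { valueA: ... } }; runMap and the sample documents are hypothetical, and CouchDB's own view server does more than this (sandboxing, serialization).

```javascript
// Runs a map function (given as source text, as stored in a design document)
// over sample documents, with a stub emit() that collects [key, value] pairs.
function runMap(mapSource, docs) {
  const rows = [];
  const emit = (key, value) => rows.push([key, value]);
  // Compile the source with `emit` in scope, then call it once per document
  const map = new Function("emit", "return " + mapSource)(emit);
  docs.forEach((doc) => map(doc));
  return rows;
}

// The answer's single sensor_value view, as a string
const sensorValueMap = `function (doc) {
  var val_names = { "sensorX": "valueA", "sensorY": "valueB", "sensorZ": "valueC" };
  var value_name = val_names[doc.type];
  if (value_name) {
    emit(doc.type, doc.value[value_name]);
  }
}`;
```

Calling runMap(sensorValueMap, docs) with a few sample documents shows exactly which rows the view would emit, including that documents of unknown types emit nothing.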
If you might have new document types in the future, something more general might be better:
function(doc) {
  if(doc.type && doc.value) {
    emit(doc.type, doc.value);
  }
}
That is very simple, and any document with a type and a value field will work. Then pick out valueA, valueB, etc. from the emitted values on the client side.
If using the client is impossible, use a _list function.
function(head, req) {
  // _list/sensor_val
  // Updating this will *not* cause the map/reduce view to re-build.
  var val_names = { "sensorX": "valueA"
                  , "sensorY": "valueB"
                  , "sensorZ": "valueC"
                  };
  var row;
  var doc_type, val_name, doc_val;
  while(row = getRow()) {
    doc_type = row.key;
    val_name = val_names[doc_type];
    doc_val = row.value[val_name];
    send("Doc " + row.id + " is type " + doc_type + " and value " + doc_val + "\n");
  }
}
Obviously use send() to send whichever format you prefer for the client (such as JSON).