Can someone help me with my problem? I'm having a hard time figuring out how to get the last input ID in a text file; my back end is a text file. Thanks.
This is the sample content of the text file my program uses:
ID|CODE1|CODE2|EXPLAIN|CODE3|DATE|PRICE1|PRICE2|PRICE3|
02|JKDHG|hkjd|Hfdkhgfdkjgh|264|56.46.54|654 654.87|878 643.51|567 468.46|
03|DEJSL|hdsk|Djfglkdfjhdlf|616|46.54.56|654 654.65|465 465.46|546 546.54|
01|JANE|jane|Jane|251|56.46.54|534 654.65|654 642.54|543 468.74|
How would I get the last input ID, so that the ID of the next input line doesn't reset back to number 1?
Make a function that reads the file and returns a list of lines (strings), like this:
public static List<string> ReadTextFileReturnListOfLines(string strPath)
{
    List<string> MyLineList = new List<string>();

    // The using block guarantees the StreamReader is closed and disposed,
    // even if an exception is thrown while reading.
    using (StreamReader sr = new StreamReader(strPath))
    {
        // Read the lines from the file until the end of the file is reached.
        string line;
        while ((line = sr.ReadLine()) != null)
        {
            MyLineList.Add(line);
        }
    }

    return MyLineList;
}
I am not sure whether the header line
ID|CODE1|CODE2|EXPLAIN|CODE3|DATE|PRICE1|PRICE2|PRICE3|
is part of the file, but if it is, you have to adjust the index of the element you want to get. Then take that element from the list and split it:
MyLineList[1].Split('|')[0]; // field 0 of the line is the ID
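Putting it together (a minimal sketch, assuming the sample file above is saved as data.txt and that the header line is included), you could scan the list for the highest existing ID and continue numbering from there:
List<string> MyLineList = ReadTextFileReturnListOfLines("data.txt"); // hypothetical path

int maxId = 0;
for (int i = 1; i < MyLineList.Count; i++) // index 0 is the header row
{
    // Field 0 of each data line is the ID.
    int id = int.Parse(MyLineList[i].Split('|')[0]);
    if (id > maxId) maxId = id;
}

int nextId = maxId + 1;               // 4 for the sample data above
string newId = nextId.ToString("D2"); // keep the zero padding, e.g. "04"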
If you're looking for the last (highest) number in the ID field, you could do it with a single LINQ query:
Dim MaxID = (From line In File.ReadAllLines("file.txt")
             Skip 1
             Select line.Split("|"c)(0)).Max()
This code gets an array of lines via File.ReadAllLines, skips the first line (which appears to be a header), splits each remaining line on the delimiter (|), takes the first element of each split (the ID), and selects the maximum value.
In the case of your sample input, the result is "03".
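The same query in C# (a sketch, assuming the same file.txt name; note that Max() on strings orders correctly here only because the IDs are zero-padded to the same width) would be:
using System.IO;
using System.Linq;

// Skip the header, project out the ID field of each line, and take the maximum.
string maxId = File.ReadAllLines("file.txt")
                   .Skip(1)
                   .Select(line => line.Split('|')[0])
                   .Max();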
I have the following code, which receives a list of names as a parameter. In the loop, I first assign the value at index 0 of the list to the local variable name. After that, I compare the remaining values from the list against name. If any non-equal value is received, I set the result variable a to 1 and fail the test case.
Below is the array list:
List<String> names= new ArrayList<String>();
names.add("John");
names.add("Mark");
Below is my Selenium test method:
public void test(List<String> names) {
    String name = null;
    int a = 0;
    for (String value : names) {
        if (name == null) {
            System.out.println("Value is null");
            name = value;
        } else if (name.equals(value)) {
            System.out.println("Received Same name");
            name = value;
        } else {
            a = 1;
            Assert.fail("Received different name in between");
        }
    }
}
How can I convert the above code into lambda expressions? I'm using the Cucumber data model, so I receive the data as a list from the feature file. Since I can't give a clearer explanation, I've just posted the example logic I need to convert to a lambda expression.
Here's the solution: it cycles through all the elements in your list, checking whether they are all the same.
You can try adding to or editing the list to get different outputs. I've written the logic; you can easily put it into a JUnit test.
List<String> names= new ArrayList<>();
names.add("John");
names.add("Mark");
String firstEntry = names.get(0);
boolean allMatch = names.stream().allMatch(name -> firstEntry.equals(name));
System.out.println("All names are the same: "+allMatch);
It sounds like you're checking for non-identical values: whenever the list contains a second distinct value, set a = 1 and fail the assertion. You can achieve this with:
List<String> names = new ArrayList<String>();
names.add("John");
names.add("Mark");

// distinct() combined with limit(2) short-circuits as soon as a
// second distinct value is seen, so the whole list is not scanned.
if (names.stream().distinct().limit(2).count() > 1) {
    a = 1;
    Assert.fail("Received different name in between");
} else {
    System.out.println("Received Same name");
}
I have a custom extractor, and I'm trying to log some messages from it.
I've tried obvious things like Console.WriteLine, but I cannot find where the output goes. However, I found some system logs in adl://<my_DLS>.azuredatalakestore.net/system/jobservice/jobs/Usql/.../<my_job_id>/.
How can I log something? Is it possible to specify a log file somewhere on the Data Lake Store or a Blob Storage account?
A recent release of U-SQL has added diagnostic logging for UDOs. See the release notes here.
// Enable the diagnostics preview feature
SET @@FeaturePreviews = "DIAGNOSTICS:ON";

// Extract as one column
@input =
    EXTRACT col string
    FROM "/input/input42.txt"
    USING new Utilities.MyExtractor();

@output =
    SELECT *
    FROM @input;

// Output the file
OUTPUT @output
TO "/output/output.txt"
USING Outputters.Tsv(quoting : false);
This was my diagnostic line from the UDO:
Microsoft.Analytics.Diagnostics.DiagnosticStream.WriteLine(System.String.Format("Concatenations done: {0}", i));
This is the whole UDO:
using System.Collections.Generic;
using System.IO;
using System.Text;
using Microsoft.Analytics.Interfaces;

namespace Utilities
{
    [SqlUserDefinedExtractor(AtomicFileProcessing = true)]
    public class MyExtractor : IExtractor
    {
        // Encoding and delimiters used to split the input into rows
        private readonly Encoding _encoding;
        private readonly byte[] _row_delim;
        private readonly char _col_delim;

        public MyExtractor()
        {
            _encoding = Encoding.UTF8;
            _row_delim = _encoding.GetBytes("\n\n");
            _col_delim = '|';
        }

        public override IEnumerable<IRow> Extract(IUnstructuredReader input, IUpdatableRow output)
        {
            string s = string.Empty;
            string x = string.Empty;
            int i = 0;

            foreach (var current in input.Split(_row_delim))
            {
                using (System.IO.StreamReader streamReader = new StreamReader(current, this._encoding))
                {
                    while ((s = streamReader.ReadLine()) != null)
                    {
                        //Strip any line feeds
                        //s = s.Replace("/n", "");
                        // Concatenate the lines
                        x += s;
                        i += 1;
                    }

                    Microsoft.Analytics.Diagnostics.DiagnosticStream.WriteLine(System.String.Format("Concatenations done: {0}", i));

                    //Create the output
                    output.Set<string>(0, x);
                    yield return output.AsReadOnly();

                    // Reset
                    x = string.Empty;
                }
            }
        }
    }
}
And these were my results found in the following directory:
/system/jobservice/jobs/Usql/2017/10/20.../diagnosticstreams
Good question. I have been asking myself the same thing. This is theoretical, but I think it would work (I'll update this answer if I find otherwise).
One very hacky way is to insert rows into a table with your log messages as a string column. Then you can select those rows out and filter on some log_producer_id column. You also get the benefit of logging when an early part of the script works but a later part does not, assuming the failure does not roll back. The table can also be dumped to a file at the end.
For the error cases, you can use the Job Manager in ADLA to open the job graph and then view the job output. The errors often contain detailed information for data-related problems (e.g., the row number in the file with the error and an octal/hex/ASCII dump of the row, with the issue marked with ###).
Hope this helps,
J
P.S. This isn't really a comment or an answer, since I don't have working code. Please provide feedback if the above ideas are wrong.
I receive a text file that I have to import into a SQL table. I have to come up with an SSIS package because I will receive the flat file every day, with the first row as the Customer_ID, then the invoice details, and then the total of the invoice.
Example:
30303
0000109291700080190432737000005Name of the product
0000210291700080190432737000010Name of the product
0000309291700080190432737000000Name of the product
003 000145
So let me explain:
First, 30303 is the customer #.
The other rows are the invoice details:
00001 -> ROWID, 092917 -> DATE, 000801904327 -> PROD, 370 -> TRANS, 00010 -> AMOUNT
Name of the product
The last row:
003 ==> Total rows, 000145 ==> Total of the invoice
Any clue?
I would use a Script Component as a source in a Data Flow Task. You can then use C# or VB.net to read the file, e.g., by using System.IO.StreamReader, in any way you wish. You can read a line at a time, store values in variables to write to every row (e.g., the customer number), etc. It's extremely flexible for complex files.
Here is an example script (C#) based on your data:
public override void CreateNewOutputRows()
{
    System.IO.StreamReader reader = null;
    try
    {
        bool line1Read = false;
        int customerNumber = 0;
        reader = new System.IO.StreamReader(Variables.FilePath); // this refers to a package variable that contains the file path
        while (!reader.EndOfStream)
        {
            string line = reader.ReadLine();
            if (!line1Read)
            {
                // The first line holds only the customer number.
                customerNumber = Convert.ToInt32(line);
                line1Read = true;
            }
            else if (!reader.EndOfStream)
            {
                // Detail lines are fixed width; the trailing total line is skipped
                // because this branch is not taken on the last line.
                Output0Buffer.AddRow();
                Output0Buffer.CustomerNumber = customerNumber;
                Output0Buffer.RowID = Convert.ToInt32(line.Substring(0, 5));
                Output0Buffer.Date = DateTime.ParseExact(line.Substring(5, 6), "MMddyy", System.Globalization.CultureInfo.CurrentCulture);
                Output0Buffer.Prod = line.Substring(11, 12);
                Output0Buffer.Trans = Convert.ToInt32(line.Substring(23, 3));
                Output0Buffer.Amount = Convert.ToInt32(line.Substring(26, 5));
                Output0Buffer.ProductName = line.Substring(31);
            }
        }
    }
    finally
    {
        // Always close the reader, whether or not an exception occurred.
        if (reader != null)
        {
            reader.Close();
            reader.Dispose();
        }
    }
}
The columns in 'Output 0' of the Script Component are configured as follows:
Name            DataType                           Length
====            ========                           ======
CustomerNumber  four-byte signed integer [DT_I4]
RowID           four-byte signed integer [DT_I4]
Date            database date [DT_DBDATE]
Prod            string [DT_STR]                    12
Trans           four-byte signed integer [DT_I4]
Amount          four-byte signed integer [DT_I4]
ProductName     string [DT_STR]                    255
To implement this:
Create a string variable called 'FilePath' with your file path in it for the script to reference.
Create a Data Flow Task.
Add a Script Component to the Data Flow Task - you'll be asked what type it should be, select 'Source'.
Right-click the Script Component, click 'Edit'.
On the 'Script' pane, add the 'FilePath' variable to the 'ReadOnlyVariables' section.
On the 'Inputs and Outputs' pane, expand 'Output 0' and add columns to the 'Output Columns' section as per the above table.
On the 'Script' pane, click 'Edit Script', and then paste my code over the public override void CreateNewOutputRows() method (replacing it).
Your Script Component source is now configured, and you'll be able to use it like any other data source component. To write this data to a SQL Server table, add an OLEDB Destination to the Data Flow Task, and link the Script Component to that, configuring the columns appropriately.
I have built an SSIS package that loads several delimited text files into a SQL database. One of the files often contains line breaks within a field, which breaks the standard data flow task of setting a flat file source and mapping to an ADO.NET destination, since it thinks it is on a new row whenever it reaches a line break. The vendor sending over the files is not willing to make any edits to the file and can't do XML at this time. Is there any way to fix this?
I was thinking of writing a small VB.NET program that would correct the files so they would work in the SSIS package, but I'm not sure how to write that logic. The file has 5 columns: the first 2 are big integers and always contain a long integer ID, then there is a small text column that contains one short word, then a date, and then a long comments field that is causing the problem.
The comments field is sometimes blank (which is OK); the problem is the rows that have line breaks. I never know how many line breaks are in the comments; some have none, some can have several, even multiple line breaks in a row, so I'm wondering if this is even possible.
5787626|6547599|Approved|1/10/2017|Applicant request for fee waiver approved
5443221|7742812|Active|11/5/2013|
3430962|7643957|Re-Scheduled|5/25/2016|REVISED TERMS AND CONDITIONS REJECTED
Applicant has 30 DAYS To submit paperwork for extension.
34433624|7673715|Denied|1/24/2017|
34113575|7653748|Active|1/8/2014|New terms have been granted.
Sample File Format.
As long as there is logic that you can program/predict, it will be possible.
I would do it using a Script Component as a source, which means you don't need to rewrite the file before processing it. It also provides a lot of flexibility, e.g., you can store values in variables while iterating over multiple lines in the file, etc.
I posted another answer recently that gives a lot of detail on how to go about this: SSIS import a Flat File to SQL with the first row as header and last row as a total.
Here's an example of holding the values in variables until the row is ready to be written.
For this example I am writing three columns, ID1, ID2 and Comments. The file looks like this:
1|2|Comment1
Comment2
4|5|Comment3
Comment4
Comment5
6|7|Comment6
The Script Component contains the following method.
public override void CreateNewOutputRows()
{
    System.IO.StreamReader reader = null;
    try
    {
        bool readFirstLine = false;
        int id1 = 0;
        int id2 = 0;
        string comments = null;
        reader = new System.IO.StreamReader(Variables.FilePath); // this refers to a package variable that contains the file path
        while (!reader.EndOfStream)
        {
            string line = reader.ReadLine();
            if (line.Contains("|"))
            {
                // A delimited line starts a new record, so flush the previous one first.
                if (readFirstLine)
                {
                    Output0Buffer.AddRow();
                    Output0Buffer.ID1 = id1;
                    Output0Buffer.ID2 = id2;
                    Output0Buffer.Comments = comments;
                }
                else
                {
                    readFirstLine = true;
                }
                string[] fields = line.Split('|');
                id1 = Convert.ToInt32(fields[0]);
                id2 = Convert.ToInt32(fields[1]);
                comments = fields[2];
            }
            else
            {
                // A line without delimiters is a continuation of the comments field.
                comments += " " + line;
            }
            if (reader.EndOfStream)
            {
                // Flush the final record.
                Output0Buffer.AddRow();
                Output0Buffer.ID1 = id1;
                Output0Buffer.ID2 = id2;
                Output0Buffer.Comments = comments;
            }
        }
    }
    finally
    {
        // Always close the reader, whether or not an exception occurred.
        if (reader != null)
        {
            reader.Close();
            reader.Dispose();
        }
    }
}
The result set is:
ID1  ID2  Comments
===  ===  ========
1    2    Comment1 Comment2
4    5    Comment3 Comment4 Comment5
6    7    Comment6
I have a CSV file generated with headings on the first row and data on the rest. The file varies each time, and I need all of these values for further usage. I'm using File.ReadAllLines(path) but need to ignore the header row. How do I accomplish this? Please help.
You should just start from the second line (index 1 of the returned string[]).
EDIT: This is better:
File.ReadAllLines(@"c:\test.txt").Skip(1);           // this will return an IEnumerable<string>
File.ReadAllLines(@"c:\test.txt").Skip(1).ToArray(); // this will return an array of strings (string[])
OLD
bool first = true;
StringBuilder sb = new StringBuilder();
File.ReadLines(@"c:\test.txt").ToList().ForEach(c =>
{
    // Skip the first line, then keep each remaining line (with its line break).
    if (first) first = false;
    else sb.AppendLine(c);
});
string res = sb.ToString();
This will essentially skip the first line. I don't know if there is a better way to do it.
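A lighter-weight variant (again a sketch, assuming the same c:\test.txt path) is to stick with File.ReadLines, which enumerates lazily, so the header is skipped without buffering the whole file in memory the way File.ReadAllLines does:
using System;
using System.IO;
using System.Linq;

class Program
{
    static void Main()
    {
        // ReadLines streams the file; Skip(1) drops the header row
        // before the remaining lines are enumerated.
        foreach (string line in File.ReadLines(@"c:\test.txt").Skip(1))
        {
            Console.WriteLine(line);
        }
    }
}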