I'm new to Visual Basic and have been tasked with creating an app that will read in various text files (.csv, .txt) and compare some of the data contained within.
I thought I would read in the files and convert them to DataTables. Once I had them in a DataTable, I figured I could remove the unnecessary rows/columns and then sort and compare the pertinent info for differences. The difficulty is that the various files are formatted differently, so I will need to get each type formatted correctly. Is this the best approach?
If so, I need help with the DataTables. I have read in a .csv and parsed it into a DataTable, but I'm having trouble with the logic/coding to get rid of the rows and columns that I don't need. Also, I'm not sure how to handle a row that has a cell with a comma-separated list of values that will need to be split into individual rows.
Thank you.
There are lots of ways to accomplish this. One way:
Read each file, then convert it to a string array in a common format, such as CSV. For each file, you can handle the issues of field location, field format, and multiple rows, and convert it to the common format. After you have the files in a consistent format, you can move them to a DataTable for sorting, comparison, etc.
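Here's a minimal sketch of that in VB.NET, assuming a file named data.csv with a header row, an unwanted column named "Notes", and a multi-value column named "Tags" — all of those names are placeholders for your own:

Imports System.Data
Imports Microsoft.VisualBasic.FileIO

Module CsvImport
    ' Read a CSV into a DataTable, drop an unneeded column, and split a
    ' comma-separated cell into one row per value.
    Function LoadCsv(path As String) As DataTable
        Dim table As New DataTable()
        Using parser As New TextFieldParser(path)
            parser.TextFieldType = FieldType.Delimited
            parser.SetDelimiters(",")
            ' The first line becomes the column names.
            For Each name In parser.ReadFields()
                table.Columns.Add(name)
            Next
            While Not parser.EndOfData
                table.Rows.Add(parser.ReadFields())
            End While
        End Using

        ' Remove a column you don't need ("Notes" is a placeholder).
        If table.Columns.Contains("Notes") Then table.Columns.Remove("Notes")

        ' Expand the comma-separated "Tags" cell into one row per value.
        Dim expanded As DataTable = table.Clone()
        For Each row As DataRow In table.Rows
            For Each tag In CStr(row("Tags")).Split(","c)
                Dim copy = expanded.NewRow()
                copy.ItemArray = row.ItemArray
                copy("Tags") = tag.Trim()
                expanded.Rows.Add(copy)
            Next
        Next
        Return expanded
    End Function
End Module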
I have a collection of Excel spreadsheets that are formatted... less than ideally.
I'm testing out some solutions involving SQLBulkCopy and OleDB, but I'm a bit concerned about how to handle the format of this sheet.
I was considering writing a custom Insert statement, but would like to see if there may be some easier way to implement a heuristic.
Below is a sample of the data I will be parsing:
The highlighted columns are the ones I'll be loading into the two tables. One table will hold order #s, and the other table will hold all the lines below that order number.
Any suggestions on tackling this would be lovely. The Excel sheets are hand-entered, so some weird cases exist (e.g., one order number with multiple carriers), which raises the question of whether I should treat the first row with the order number as a line in the database structure I designed.
I'm implementing this importer in VB.NET, to my dismay, to avoid being looked at funny by my coworkers :).
One approach would be to save the worksheet to a text file (e.g., CSV) and then use AWK to split it at the empty row. Some examples are in this SO answer: Bash how to split file on empty line with awk
You could then import the CSV files directly into the database.
Amusingly, if I wrote anything in VB.NET I'd definitely get looked at funny by my coworkers.
So I'd use a library called EPPlus to read the Excel file and not have to worry about converting it. How you do the blank-line detection is an open question; checking that the Value of ten cells on the row is Nothing or empty would suffice. Then take the next row as your parent, and proceed with subsequent rows as children until the next blank.
Take a look at this answer for more info on how to detect blank rows in Excel; if you get stuck turning any of the C# into VB, shoot us a question. Online converters exist because the two languages are the same thing under the hood.
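A rough sketch of that loop in VB.NET — the file name, worksheet index, and the ten-column blank check are all assumptions to adapt:

Imports System.IO
Imports OfficeOpenXml

Module OrderReader
    Sub ReadOrders()
        ' Note: EPPlus 5+ also requires setting ExcelPackage.LicenseContext first.
        Using package As New ExcelPackage(New FileInfo("orders.xlsx"))
            Dim ws = package.Workbook.Worksheets(1) ' 1-based in EPPlus 4, 0-based in 5+
            Dim expectParent = True
            For row = 1 To ws.Dimension.End.Row
                ' Treat the row as blank if the first ten cells are all Nothing.
                Dim blank = True
                For col = 1 To 10
                    If ws.Cells(row, col).Value IsNot Nothing Then
                        blank = False
                        Exit For
                    End If
                Next
                If blank Then
                    expectParent = True  ' the next non-blank row starts a new order
                ElseIf expectParent Then
                    ' insert this row into the orders table here
                    expectParent = False
                Else
                    ' insert this row into the order-lines table here
                End If
            Next
        End Using
    End Sub
End Module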
This is perhaps one of those frequently discussed questions whose solutions end up being specific to the actual system that outputs the data into a CSV file.
Is there a simple way to export data like 3332401187555, 9992401187000 into a CSV file in a way that, when later opened in Excel, the columns won't show them in "scientific" format? In case it matters, the data is retrieved directly by an SQL SELECT statement from any DBMS.
This also means that I've tried solutions like surrounding the values with apostrophes ('3332401187555') so that the Excel cell recognizes them as text and doesn't do any conversion/masking. I was wondering if there was a more elegant way without it actually being a pre-set Excel template with text data fields.
1. Try exporting the numbers prefixed with a single quote, e.g. '3332401187555.
2. In Excel, select the column containing the number values and then select Number in Format Cells.
You just have to save your file from Excel using the CSV file option, and you will have the file in the requested format.
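Another workaround, if neither of the above fits your export path, is to write each long number as the Excel text formula ="3332401187555"; when Excel opens the CSV it evaluates that to a text value, so it never switches to scientific notation. A small VB.NET sketch (the writer and field values are placeholders for however you emit your SELECT results):

Imports System.Collections.Generic
Imports System.IO
Imports System.Linq

Module CsvExport
    ' Wrap each field as ="value" so Excel treats it as text on open.
    Sub WriteRow(writer As StreamWriter, fields As IEnumerable(Of String))
        writer.WriteLine(String.Join(",", fields.Select(Function(f) "=""" & f & """")))
    End Sub
End Module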
I have a CSV file with more than 700 columns. I want only 175 of those columns to be inserted into an RDBMS table or a flat file using Pentaho (PDI). The source CSV file has variable columns, i.e. columns can be added or removed over time, but certain keywords in the column names remain constant throughout. I have the list of keywords present in the column names that have to be excluded, e.g. starts_with("avgbal_"), starts_with("emi_"), starts_with("delinq_prin_"), starts_with("total_utilization_"), starts_with("min_overdue_"), starts_with("payment_received_").
Any column containing the above keywords has to be excluded and should not be passed on to my RDBMS table or flat file. Is there any way to remove these columns by writing some SQL query in PDI? Selecting the specific 175 columns is not possible, as they are variable in nature.
I think your case is a good fit for metadata injection; you can refer to the example shared below:
https://help.pentaho.com/Documentation/7.1/0L0/0Y0/0K0/ETL_Metadata_Injection
Two things you need to be careful about:
1. Maintain the list of columns you need to push in.
2. Since the column names change, you may also face issues with the valid columns you want to import or work with. To handle this, make sure you generate the metadata file every time, so you are sure about the column names you want to push out from the flat file.
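For reference, the exclusion rule itself is simple to state; here it is sketched in VB.NET purely to illustrate the logic your metadata-generating step needs to implement (in PDI itself this would live in the transformation that builds the injected metadata):

Imports System.Collections.Generic
Imports System.Linq

Module ColumnFilter
    ' Keep only the columns whose names do not start with an excluded prefix.
    Function KeepColumns(allColumns As IEnumerable(Of String)) As List(Of String)
        Dim excludedPrefixes = {"avgbal_", "emi_", "delinq_prin_",
                                "total_utilization_", "min_overdue_",
                                "payment_received_"}
        Return allColumns.Where(
            Function(c) Not excludedPrefixes.Any(
                Function(p) c.StartsWith(p, StringComparison.OrdinalIgnoreCase))).ToList()
    End Function
End Module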
I want to extract table data from a PDF to Excel/CSV. How can I do this using Automation Anywhere?
Please find below a sample table from the PDF document.
There are multiple ways to extract data from PDFs.
You can extract raw data, formatted data, or create form fields if the layout is consistent.
If the layout is more random, you might want to take a look at IQ Bot, where there are predefined classifications for things like Orders etc.
I would lean toward using form fields if you have a standard layout but unusual characters, such as the double-quote (") used as the inches symbol, since that encoding doesn't map well with the raw/formatted options.
The raw format has some quirks where you don't always get all the characters you expect, such as a missing first letter of a data item.
The formatted option is good at capturing tabular columns as they go across the line.
I am doing a project that requires me to transfer data from one Notepad file to another (saved from Excel in tab-delimited form).
I have successfully done that; the only thing left is that I need to sort the data after transferring it.
For your information, I am transferring 5 columns from the first file to the second. I saved that information in five arrays.
How am I supposed to sort them after pasting?
I tried using the VB.NET Sort function, but that only sorts one array while the rest of the arrays won't follow.
I tried lines.sort as well, but the result is not satisfying. Any other ideas for sorting the data the way we normally would manually in Excel?
Any help will be very much appreciated.
One solution would be to create an object with 5 values in it. Then you would create a list of those objects (that way the values are all linked).
Then you would just do:
myList.Sort(Function(x, y) x.valueToSortBy.CompareTo(y.valueToSortBy))
This would give you a list of your objects sorted by the value you wanted.
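A fleshed-out version of that idea — the class, property, and array names below are placeholders, and the five arrays are assumed to already hold your transferred columns:

Public Class Record
    Public Property Name As String
    Public Property Quantity As Integer
    Public Property Price As Decimal
    Public Property Category As String
    Public Property Code As String
End Class

' Build one Record per index from the five parallel arrays...
Dim records As New List(Of Record)
For i = 0 To names.Length - 1
    records.Add(New Record With {
        .Name = names(i), .Quantity = quantities(i), .Price = prices(i),
        .Category = categories(i), .Code = codes(i)})
Next

' ...then sort the list by any one property; the other four values move with it.
records.Sort(Function(x, y) x.Name.CompareTo(y.Name))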