Best way to convert text file - vb.net

I am trying to make a conversion program which converts multiple text files from a CAD design into a file that the machine can read.
Each file has multiple values and is laid out like this:
X -0.0001
Y 1.0500
Z 1.5700
LOCATION 0.0050
Each file stands for a location that the machine is supposed to go to and do something. The output needs to look something like this:
X-0.0001Y1.0500Z1.5700L0.0050
Other information regarding position is here also.
So it's a fairly simple conversion, but what I'm wondering is the best way to go about it. Do I convert each file individually and then combine them? The other information has to go at the bottom of the file, so where there are multiple files it would go:
Location 1
Location 2
Location 1 parameters
location 2 parameters
I have tried a couple of different ways and still cannot come up with the best one.
Essentially, what I'm asking is: what is the best/most efficient way to convert these files? Sorry if this is confusing.
Note: I am using VB.NET as the programming language.

If this is a project of enormous scale (e.g. millions of files), you might want to look into something like MapReduce.
If not (which I'm guessing), I would suggest the following:
Parse each file sequentially, appending the results to each of TWO files. Finally, combine the two files and you're done.
LOCATIONS_FILE (FILE 1)
Location 1
Location 2
(etc)
METADATA_FILE (FILE 2)
location 1 params
location 2 params
(etc)
When all the files are parsed, append the contents of FILE 2 to those of FILE 1.
FINAL FILE
Location 1
Location 2
(etc)
location 1 params
location 2 params
(etc)
I don't use VB.NET, but the pseudocode would be something like:
fn parse_file(file, locations_filehandle, metadata_filehandle):
    file.extract_locations() -> append(locations_filehandle)
    file.extract_metadata() -> append(metadata_filehandle)

fn main():
    for file in files:
        parse_file(file, locations_filehandle, metadata_filehandle)
    finalfile = locations_filehandle.read() + metadata_filehandle.read()
    finalfile.writeToDisk()

main()
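The pseudocode above can be fleshed out into a runnable sketch. This is Python rather than VB.NET, but the per-file logic carries over directly; the per-line format and the "LOCATION" -> "L" collapse come from the question, while how the parameter block is obtained for each file is an assumption:

```python
def convert_location(text):
    """Collapse a file body like 'X -0.0001\nY 1.0500\nZ 1.5700\nLOCATION 0.0050'
    into the machine-readable form 'X-0.0001Y1.0500Z1.5700L0.0050'."""
    parts = []
    for line in text.splitlines():
        if line.strip():
            key, value = line.split()
            parts.append(key[0].upper() + value)  # 'LOCATION' collapses to 'L'
    return "".join(parts)

def combine(locations, parameter_blocks):
    """Mirror the two-file strategy: all converted locations first,
    then every parameter block at the bottom."""
    return "\n".join(locations + parameter_blocks)
```

In VB.NET the same shape would be a loop over `Directory.GetFiles(...)` accumulating two `List(Of String)` collections, then one `File.WriteAllLines` at the end.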

Related

PDF extract specific pages & merge with new filename

I have two PDF files (templates) from which I need to extract one page each and save them as a combined PDF. Each PDF has a filename with a different three-letter location indicator (e.g. LOC) for each of the agreement PDFs. I'm looking for a way to batch process these and save with the location indicator in the new combined filename. There are approx. 500 locations.
Example files:
Agreement1_LOC.pdf - extract pg 3
Agreement2_LOC.pdf - extract pg 1
Agreement1_AAA.pdf - extract pg 3
Agreement2_AAA.pdf - extract pg 1
Save as LOC_combined.pdf (in same or new dir)
I'm looking for a way to batch process or loop through a directory. If it's easier, I have a list of all the filenames in a .csv. I'm sure it could be done in Python, PowerShell, or even a batch file, but I'm not very familiar with these. Trying to learn with a real-life example.
Using PDFtk pro, I can do it one at a time.
pdftk A=Agreement1_LOC.pdf B=Agreement2_LOC.pdf cat A3 B1 output LOC_combined.pdf
I found batch files for merging, but none that save with a portion of the original filenames.
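One way to loop that pdftk command over every location code is a small Python wrapper. This is a sketch under a few assumptions: the PDFs sit in one directory, the code is whatever follows `Agreement1_` in the filename, and `pdftk` is on the PATH (the actual subprocess call is left commented out so the command list can be inspected first):

```python
import glob
import os
import subprocess

def build_commands(directory="."):
    """Build one 'pdftk A=... B=... cat A3 B1 output <code>_combined.pdf'
    command per three-letter location code found in the directory."""
    commands = []
    for path in sorted(glob.glob(os.path.join(directory, "Agreement1_*.pdf"))):
        code = os.path.basename(path)[len("Agreement1_"):-len(".pdf")]  # e.g. 'LOC'
        a = os.path.join(directory, "Agreement1_%s.pdf" % code)
        b = os.path.join(directory, "Agreement2_%s.pdf" % code)
        out = os.path.join(directory, "%s_combined.pdf" % code)
        commands.append(["pdftk", "A=" + a, "B=" + b,
                         "cat", "A3", "B1", "output", out])
    return commands

# for cmd in build_commands():
#     subprocess.run(cmd, check=True)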

Not able to filter files using pathGlobFilter

We are trying to read files from a directory in Azure Blob Storage based on a pattern, using the
pathGlobFilter option to select files. The directory contains the following files:
Sales_51820_14529409_T_7a3cc7d1d17261fd17e7e1fabd3.csv
Sales_51820_14529409_7a3cc7d1d17261fd17e7e1fabd3.csv
Sales_61820_17529409_7a3cc7d1d17261fd17e7e1fabd3.csv
Sales_61820_17529409_T_7a3cc7d1d17261fd17e7e1fabd3.csv
We need to process only those files which do not have "T" in the file name, i.e. only these two files:
Sales_51820_14529409_7a3cc7d1d17261fd17e7e1fabd3.csv
Sales_61820_17529409_7a3cc7d1d17261fd17e7e1fabd3.csv
But we are not able to read only these two files.
Here is the code:
df = spark.read.format("csv").schema(structSchema).options(header=False, inferSchema=True, sep='|', pathGlobFilter="Sales_\d{5}_\d{8}_[a-z0-9]+.csv$").load("wasbs://abc#xxxxx.blob.core.windows.net/abc/2022/02/11/")
Regards,
Rajib
Glob is not a standard regular expression; there are differences between them.
For example, glob has no repetition quantifiers such as \d{5}.
For details, see here.
Back to this question: a relatively clumsy workaround (I look forward to a more elegant solution from the experts) is to spell the digit classes out:
pathGlobFilter="Sales_[0-9][0-9][0-9][0-9][0-9]_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_[a-z0-9]*.csv"
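The workaround can be sanity-checked locally before pointing Spark at the blob store. Python's fnmatch implements similar glob semantics (Spark's pathGlobFilter uses Hadoop's glob syntax, which is close but not guaranteed identical, so treat this as an approximation); fnmatchcase is used because the uppercase "T" is what must be excluded:

```python
from fnmatch import fnmatchcase

# The workaround pattern: character classes instead of \d{n} quantifiers.
PATTERN = ("Sales_[0-9][0-9][0-9][0-9][0-9]"
           "_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]"
           "_[a-z0-9]*.csv")

def keep(names):
    """Return only the filenames the glob pattern accepts (case-sensitive)."""
    return [n for n in names if fnmatchcase(n, PATTERN)]
```

The "_T_" files fail because the character immediately after the eight digits must match `[a-z0-9]`, and uppercase "T" does not.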

Compare fields in 2 files and write only missing or different fields

I have two LDIF outputs prepared so that each object is on one line, with fields separated by "|", and all attribute fields sorted.
I want to write the first field (containing dn:) whenever a field is missing or has a different value in the second file. If it's missing I need an add marker, and if it's different a replace marker. If all fields are identical, nothing should be written.
My script with two loops works, but it is too slow for millions of rows. I'm trying this now with awk, but I don't know how to compare two files with it.
File 1
dn:abc|attribut a: 10|attribut b: 11|attribut c: 12
dn:xyz|attribut a: 10|attribut b: 11|attribut c: 12
File 2
dn:abc|attribut a: 10|attribut b: 11|attribut c: 12
dn:xyz|attribut a: 10|attribut c: 11
Needed Output
dn:xyz|add attribute b: 11|replace attribute c: 12
The line with dn:abc is identical in both files, so it's not written to the output file.
The line with dn:xyz has differences, so I need its first field, "dn:xyz". The next field, attribut a: 10, is identical, so there is nothing to do. The next field is missing in file 2, so I need "add attribut b: 11". The last field, attribut c:, is in both files but the value differs in file 2, so I need "replace attribut c: 12", the value from file 1.
This is not a direct answer to the question; however, from the comments it seems that the files being compared are created from standard LDIF-formatted files.
There already exist tools to take two LDIF files and output the changes needed to reconcile them. For example:
OpenDJ provides ldifdiff in its opendj-ldap-toolkit package
ldifdiff is a Go tool
ldiff is an unmaintained Perl script to "generate differences between two LDIF files"
ldap-diff - another Perl script
One of these tools is likely to be much more reliable than something written from scratch.
Some background reading on LDIF, including change records:
rfc2849
notes from Oracle
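That said, if you do end up scripting the comparison on these pipe-delimited one-line records yourself, the per-line logic is small. Here is a Python sketch following the question's examples (the field layout and the add/replace markers are taken from the samples; for millions of rows you would load the second file into a dict keyed on dn so each lookup is O(1)):

```python
def parse(line):
    """Split 'dn:abc|attribut a: 10|...' into (dn, {name: value})."""
    dn, *fields = line.rstrip("\n").split("|")
    attrs = {}
    for field in fields:
        name, _, value = field.partition(": ")
        attrs[name] = value
    return dn, attrs

def diff(line1, line2):
    """Return 'dn|add ...|replace ...', or None when the lines are identical."""
    dn, a1 = parse(line1)
    _, a2 = parse(line2)
    changes = []
    for name, value in a1.items():
        if name not in a2:
            changes.append("add %s: %s" % (name, value))      # missing in file 2
        elif a2[name] != value:
            changes.append("replace %s: %s" % (name, value))  # differing value
    return "|".join([dn] + changes) if changes else None
```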

PDF File Merge Based on Filename

I have large batches of pdf files that must be merged.
Folder1 filename explanation: invoice12-105767-1510781492.pdf - 105767 is the component that will match a PDF filename in Folder2.
"invoice12-" First section of the filename. This can sometimes be "invoice11-" or "invoice6-", so merging based on character length became challenging. The "invoicexx-" prefix is based on where in the system the file came from.
"105767" Second part of the filename. This is the key component for matching and merging. This will be the filename in Folder2 it belongs with.
"-1510781492.pdf" Third part of the filename. This is a system-generated unique ID, which can contain more or fewer characters.
Folder1:
invoice12-105767-1510781492.pdf
invoice12-105768-1510781484.pdf
invoice12-105769-1510781469.pdf
Folder2:
105767.pdf
105768.pdf
105769.pdf
OutputFolder:
Example: I don't want to merge all the files in both folders into one huge file. I need them merged based on the Folder2 filename (105767.pdf + invoice12-105767-1510781492.pdf), in that order specifically.
The final output should be three pdf files merged in order as follows:
105767.pdf + invoice12-105767-1510781492.pdf to make 1 file named 105767.pdf
105768.pdf + invoice12-105768-1510781484.pdf to make 1 file named 105768.pdf
105769.pdf + invoice12-105769-1510781469.pdf to make 1 file named 105769.pdf
I would appreciate any assistance with a way to automate this process. I merge over 800 files per day. This small automation would shave hours off my day and save my wrist from carpal tunnel.
I primarily use macOS 10.13.1. I have looked around in Mac's "Automator" program and cannot figure out how to get it to do what I need. (I did figure out a great way to split files into single pages.)
I downloaded pdftk server (since that is Mac compatible) but cannot figure out whether this type of match and merge is possible with that program.
I have Adobe Acrobat DC Professional and it does not seem to have this match and merge function.
I am even open to other paid programs. I just need a fairly future-proof way of getting this mundane task done through automation on my Mac.
You can take a look at the APDFL library examples that are provided with sample code. These libraries are supported on Mac, but are not free.
https://dev.datalogics.com/adobe-pdf-library/sample-program-descriptions/c1samples/#mergedocuments
Here is a snippet of the code you would need to use:
APDFLDoc doc1 ( csInputFileName1.c_str(), true);
APDFLDoc doc2 ( csInputFileName2.c_str(), true);

// Insert doc2's pages into doc1.
// Here, we've stated PDLastPage, which adds the pages just before the last page of the target.
// If we specify PDBeforeFirstPage instead, doc2's pages will be inserted at the head of doc1.
PDDocInsertPages ( doc1.getPDDoc(),
                   PDLastPage,
                   doc2.getPDDoc(),
                   0,
                   PDAllPages,
                   PDInsertAll,
                   NULL, NULL, NULL, NULL);

doc1.saveDoc ( csOutputFileName.c_str(), PDSaveFull | PDSaveLinearized);
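Since the question mentions that pdftk server is already installed, the match-and-merge can also be scripted around it. A Python sketch under the question's naming assumptions (the matching key is the second dash-separated component of the Folder1 filename, the Folder2 file goes first in the merge, and the actual subprocess call is left commented out):

```python
import glob
import os
import subprocess

def build_merge_commands(folder1, folder2, out_folder):
    """For each invoiceXX-KEY-UID.pdf in folder1 that has a matching
    KEY.pdf in folder2, build 'pdftk KEY.pdf invoice... cat output KEY.pdf'."""
    commands = []
    for path in sorted(glob.glob(os.path.join(folder1, "invoice*-*-*.pdf"))):
        key = os.path.basename(path).split("-")[1]   # e.g. '105767'
        base = os.path.join(folder2, key + ".pdf")
        if os.path.exists(base):
            out = os.path.join(out_folder, key + ".pdf")
            # Folder2 file first, then the invoice, as the question requires.
            commands.append(["pdftk", base, path, "cat", "output", out])
    return commands

# for cmd in build_merge_commands("Folder1", "Folder2", "OutputFolder"):
#     subprocess.run(cmd, check=True)
```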

Can you split a PDF 'package' into separate files with CF8 or CF9?

The cfpdf tag has lots of options but I can't seem to find one for splitting apart a PDF package into separate files which can be saved to the file system.
Is this possible?
There's not a direct command, but you can achieve what you want in very few lines of code by using action="merge" with the "pages" attribute. If you wanted to take a 20-page PDF and create 20 separate files, you could use getInfo to get the number of pages in the input document, then loop from 1 to that number and, in each iteration, do a merge from your input document to a new output document with pages="#currentPage#" (or whatever your loop counter is).