Using Macros to scan .odt file in libreoffice for specific content - libreoffice-basic

Total beginner here when it comes to macros in Libreoffice.
Let's say I have a text (.odt or .doc) that contains words (irrelevant) and two types of information. The information always starts with a "/". Information type 1 looks like /234hdf (numbers and letters) and information type 2 like /02395, so only numbers.
I want to use a script that filters for type 1 information and returns it.
Is using a macro the correct way to filter for this? If so, what language should I use and where do I start?
Thanks in advance!

Related

Access to Word Template

I am having issues figuring out the best way to do this:
I have a word template for an interview pre-night. What I need to do is fill out the word template with the interviewer and the people who are interviewing them. There will always be 1 interviewee but up to 12 interviewers. The part giving me issue is that there will not always be 12 interviewers so the area that the data is moved to needs to be dynamic. Should I create a table or bookmarks in Word and use VBA to move the data, or design the report in Access? Thanks for your help!
I am going to assume that what you are trying to do is complicated enough that "mail merge" won't work for you.
It really depends on whether you need the end result to be a Word document or an Access report. Both easily convert to PDF for document archiving. If you prefer to work with Word, add the key fields into your document with all the formatting necessary and then use VBA to do a search replace.
Two ways you could go about dealing with the 1-12 interviewers issue:
Use VBA to create one long segment containing all interviewers as a
single "field".
Add 12 sets of key fields and use search/replace
(through VBA of course not manually) to fill in the exiting
interviewer info and delete the key fields for the non-existing ones.

Mass Excel hyperlinks

I've looked around and can't seem to find any help for what I am looking to do.
I have a document that I am using to record data related to repairs and such on machines.
All of my entries are done in a numeric order.
I have to scan hard copies in and hyperlink to them from the excel sheet.
All the files are named to me in a numerical order as well that matches the number in column A.
Is there a way to do this as a formula?
You can use the HYPERLINK formula. The 1st parameter allows you to dynamically build up the URL to point to, so if you say you have the correct numbers already in a column, you can use that to build the URL. Provided the URL for each scanned document can uniquely be derived using only that number, of course.
If the scanned documents are on the web, then you can use e.g.:
=HYPERLINK("https://www.myserver.com/scans/"&A4&".pdf","Scan nr. "&A4)
If they're on your own computer (or on a network drive), then you can use e.g.:
=HYPERLINK("file:///D:/scans/"&A4&".pdf","Scan nr. "&A4)
--- EDIT ---
As Cyril pointed out, you can also just use e.g.:
=HYPERLINK("D:\scans\"&A4&".pdf","Scan nr. "&A4)
which makes it a bit more readable. Also note that Excel really likes to warn you when using these types of links ;)

Pentaho PDI how to validate source Excel metadata for the order and number of columns?

In my case, I need to process input data in Excel (xls and xlsx) format. I need to do a file level validation of the Excel file for the order and number of columns, before processing the row level data. If this file level validation is failed, then exclude this file and inform the concerned through mail.
Please guide me, with some sample or example, how to validate the excel files for metadata? I thought of placing a variable in kettle.properties with semicolon separated header fields and compare this with the source excel file. But not getting a way to extract only the header row from file as I want.
Please guide me.
Are column names on Row 1 of your file (or any other row reasonably close to row 1) and you know how many fields are in each, at most? If so, maybe you can get away with that.
Step 1: You need to understand how many rows may there be, what they may be called, what data types, etc.
Step 2: Read the first N rows of the file(s) ensuring the header row will be read; Filter everything that is not the header (how to? depends on the specific structure). Because you don't know what are the field names, just name them field0, ... field999 or whatever.
Step3: Work some magic on the headers; filtering based on position of certain fields, mapping field names to data types, etc.
Step4: Metadata injection. Using the information you already have from before, you create a template transformation that is generic in the sense that field names are not set up in the excel input step. The metadata injection allows you to set up that step in run time, depending on the entire logic you just applied on the headers.
This page has a couple example videos: http://wiki.pentaho.com/display/EAI/ETL+Metadata+Injection
I had to build something like that (only it was CSV files and not XLS) a while back and metadata injection allowed me to load every single file in one go with 100% mapping accuracy. Of course, the magic happens before, when you parse the header row.
Thanks nsousa for your answer.
I got to the required solution with the help of my colleague. Here what I did
(1) Read only the 1st row of the source Excel file as normal data (no header, limit 1) where the field names will be called as F1, F2 etc
(2) concat the fields (data) to get a pattern
(3) Match this pattern with acual metadata pattern, if they are matching, then excel file is passed
Good trick. Thanks.

Automate adding bookmarks to tables and then create an index

I have a program which outputs a collection of tables in a word document which I eventually want to post as an html file with bookmarks and an index. The tables are grouped by "Name:" where there is a 3 row table that contains detailed header information for a section of data, then there is a second table which can span multiple pages which contains the data for that section. There is then a page break so that the next sections header table is on a new page. This can occur for a variable number of sections numbers in the hundreds. I need to write a script that
searches my document for "Name:", which is unique and would not
appear anywhere but the header table,
grabs the text that follows "Name:" within that table cell (for example "Name: Line 1234)
replaces all the blanks in that text string with an underscore to
make it a suitable bookmark name,
creates a bookmark with the name,
goes back and creates an index at the front of the document
Saves the file as an html
I have a passing familiarity with VB for word, I have used it a bit in excel, but am by no means an expert. I would appreciate any advice on functions and objects that I should be using for this script.
Hey MikeV from what I can gather, your problem seems more conceptual, less specific. What I mean is, have you started yet? Or looking at a blank script page?
I'm relatively new to coding, so I get that myself. What I do is make a list of what I need to do (what you have). Then think of the code or psuedo-code that would go with each step. Then you can start to build your script. You don't have to start with step one (as step 2/3 is often the more interesting bit), but let's do that.
Now, you need to search for a text string containing "Name:". I am proficient with VBA in excel, but haven't done anything for word. So I'd look it up. Googling "VBA find word in word document" will bring you to this page, which shows you how to approach step one. So steal their code, alter it to fit your needs and move on to step 2. Repeat the process, and that's how you build your algorithm! :)
Just a FYI, typically StackOverflow is for specific questions with an answer that can be confirmed, whereas you asked for help building an algorithm. I'd reserve those questions for your programming professor or friend who can help.
cheers

Word VBA - Matching large selecting of text based keys with data. Embedded resource/text?

I have a pretty complex VBA plugin for Word written that automatically creates a report for me, using XML input, cycling through the X objects within the report to create the output. It is currently embedded into a Word Template file .DOCM.
I need to insert into the report a static list of text, based on the name of the item within the XML. For example, within my XML I have entries with a name BLAH1, BLAH2, BLAH3. Every time I see BLAH1, I need to match it with the static INSERT1, and BLAH2 match it with INSERT2, etc.
This seems simple enough, but her lies the problem...
It appears there are no Hashmap's in VBA without requiring external libraries, which I can't really rely on, since I can't install items on the machines where this will be running. As a result I can't store this reference data in a Hashmap as far as I can tell.
I can't seem to concatenate more than about 20 lines of strings together without hitting a max within VBA, and just parsing the chunk of text for what I need since there are about 1500 "lines" in my reference data, which greatly exceeds 20.
I also haven't found a way to embed a text, or any other type of file to hold this information within the file, and then parse the data.
I really would like to have everything within the single template file, without requiring additional text or other files to be bundled with the document. If there is no other option, I will go that route, but I wanted to see what create ideas people at Stackoverflow might have first ;-)
Have you considered using Word's Document Variables? They are name/value pairs stored invisibly within the document. (ActiveDocument.Variables("BLAH1").Value = "INSERT1" to create one, debug.print ActiveDocument.Variables("BLAH1").Value to retrieve a value (you have to use an error handler to detect non-existent indices if you go that route). Word can store (at least) hundreds of thousands of these things).