Cannot use capital letters in Word - malware

I just opened a Word document in .docx format, and I am unable to type the letter T as a capital. I can type it in other Word documents on the same computer.
The reason I am asking on this site is that I believe it may be malware.
Insight?

I would recommend downloading ComboFix (it can be installed from http://www.bleepingcomputer.com/download/combofix/) and starting by running it in Safe Mode.
Then you can do a full system scan. I recommend downloading Malwarebytes, SUPERAntiSpyware, and Spybot Search & Destroy. Start them, apply their updates, then do a full scan with all of them. If everything is fixed, uninstall all of them, since they are all trials (reinstall when needed).

Replace words/phrases in existing PDF or docx with other words

I am trying to make a dynamic PDF generator as a .NET Core API. I want to take an existing PDF or .docx file and edit it, replacing the current name (John Doe) with something replaceable like #NAME_PLACEHOLDER.
I then want to transform #NAME_PLACEHOLDER -> John Doe (or whatever is in the KeyValuePair or Dictionary<string, string>).
I am running this in a Docker environment, so I can easily execute commands, and I am willing to do that as well.
So far I have tried a few things:
1) pdf2htmlEX
Executes as pdf2htmlEX file.pdf
Does the job pretty well
Can be converted back to PDF using Google Chrome headless or similar
Problem: only the characters already used in the PDF are available for replacements. So if the PDF only uses the characters A, B, and C, a D will fall back to Times New Roman (or the default font)
2) LibreOffice ODT to PDF
This was pretty nice, because I could simply unzip the .odt file, open content.xml, search and replace, then save it as an .odt file again
Could be converted into PDF rather easily using soffice --convert-to pdf
LibreOffice is quite nice
Problem 1: Microsoft Word -> Save as ODT tends to break the formatting, so we have to use LibreOffice to go and change it back again
Problem 2: We don't want to move away from Microsoft's Office suite
3) HTML to PDF using Chrome Headless
What you see is what you get
By far the best option, if we're all developers and have unlimited time
Problem 1: Only our developers can make changes, since our marketing department does not know HTML
Problem 2: Our existing PDFs would have to be rewritten in HTML
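(For reference, the rendering step in #3 is a one-liner along the lines of chrome --headless --print-to-pdf=out.pdf in.html; the exact flag spelling varies by Chrome version, so treat that as a sketch rather than gospel.)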
As you can see, I have tried a bunch of things. None of them, except Chrome Headless, has lived up to my expectations. What I really like about #3 is that what you see is what you get: I can make the whole thing in HTML, press Ctrl+P, and basically see what it looks like as a finished PDF.
I am looking for a better solution, though. It can be paid or free. All I need is to swap words/phrases for other words dynamically, which apparently is a tough thing to do.
Thanks for clearly specifying what you've already tried. It helps a lot in providing a succinct answer.
The conversion is always tricky - I'm sure you know Word has trouble displaying/editing some Word documents itself.
I have experience regarding point #2 "LibreOffice ODT to PDF" and can suggest a few things to test:
Don't use Microsoft Word to do the docx -> odt conversion. As you know, it's not good. Use LibreOffice itself to do this step; the rest of your process remains the same.
For some documents, LibreOffice does doc -> odt much better, so you can instead work with the DOC format and get a better result without any other changes.
You won't be able to remove the devs from the process, but you can certainly reduce their role and allow your business/marketing teams more direct input simply by the following:
Get the starting-point document to the devs to run through the conversion process. The devs can "clean up" the document to make it convert nicely.
Make this version of the document the "official" starting point. The business or technical teams can load it, adjust it, and put it back into the process.
If possible, expose a test platform to the business teams so they can download, adjust, upload, and render to PDF. This cycle means they will be able to achieve more and, if they're good, do impressive stuff without any dev input.
The above steps simply mean: don't expect perfect conversion of arbitrarily complex documents. Starting from a working (even complex) baseline is great.
Some of that might show you that your #2 is actually going to get the best overall results.
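For the unzip/replace/re-zip step itself, here is a minimal sketch in Java (the file names and the placeholder map are assumptions; the same logic ports directly to .NET Core). One caveat: a strictly valid ODT wants the mimetype entry stored first and uncompressed, though LibreOffice is tolerant in practice.

    import java.io.*;
    import java.nio.charset.StandardCharsets;
    import java.util.Map;
    import java.util.zip.*;

    public class OdtReplace {
        public static void main(String[] args) throws IOException {
            // Hypothetical mapping; in the real service this would come from
            // the Dictionary<string, string> mentioned in the question.
            Map<String, String> replacements = Map.of("#NAME_PLACEHOLDER", "John Doe");

            try (ZipInputStream in = new ZipInputStream(new FileInputStream("template.odt"));
                 ZipOutputStream out = new ZipOutputStream(new FileOutputStream("filled.odt"))) {
                ZipEntry entry;
                while ((entry = in.getNextEntry()) != null) {
                    byte[] bytes = in.readAllBytes();
                    if (entry.getName().equals("content.xml")) {
                        // Search-and-replace in the document body only.
                        String xml = new String(bytes, StandardCharsets.UTF_8);
                        for (Map.Entry<String, String> r : replacements.entrySet()) {
                            xml = xml.replace(r.getKey(), r.getValue());
                        }
                        bytes = xml.getBytes(StandardCharsets.UTF_8);
                    }
                    out.putNextEntry(new ZipEntry(entry.getName()));
                    out.write(bytes);
                    out.closeEntry();
                }
            }
            // Then convert as before: soffice --headless --convert-to pdf filled.odt
        }
    }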
I hope that helps.

I'd like to recognize the text of all PDFs on my computer and save them without moving them from their locations. Is it possible?

I've tried using Adobe Acrobat X Pro to "recognize text in multiple files."
When I start this process and it asks for the directory, I chose C:\, my main hard drive.
It took hours to load, and when it did, the list of files it generated included Word documents as well. Adobe said I couldn't proceed until I removed the problem files.
Once I had removed all the PDFs Adobe flagged as having errors (like password protection) and the prompt remained, I assumed it meant the Word documents in the list.
So I manually removed those too. But Adobe still said that I couldn't proceed until the problem files were removed, even though there were no files remaining in the list that Adobe had flagged as having issues.
My firm is trying to make sure all the PDFs we have are searchable. Currently, some are and some aren't. Our goal is to make them all searchable without moving them from their varied locations.
I think you can do this using a combination of:
regular Java: to list all files in a directory that match a given criterion (e.g. the name ends with '.pdf')
iText: to iterate over each PDF document and extract all its images
Tess4J: a Java port of Tesseract (Google's OCR engine), to turn the extracted images back into text
Unless I am much mistaken, Tesseract even offers a crude version of this workflow for you, but only for one PDF at a time. So you'd still need some Windows/Linux scripting to pipe in all the files of a given directory.
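To give an idea of the shape of it, here is a rough Java sketch of the directory-listing half plus the Tess4J call. The root directory and tessdata path are assumptions, and OCRing image-only PDFs directly requires Tess4J's PDF support (Ghostscript/PDFBox) to be present; writing the recognized text back into the PDF itself would be the iText part, which is omitted here.

    import java.io.IOException;
    import java.nio.file.*;
    import java.util.stream.Stream;
    import net.sourceforge.tess4j.Tesseract;
    import net.sourceforge.tess4j.TesseractException;

    public class OcrAllPdfs {
        public static void main(String[] args) throws IOException {
            Tesseract tesseract = new Tesseract();
            // Assumed location of the tessdata language files; adjust to your install.
            tesseract.setDatapath("C:/Program Files/Tesseract-OCR/tessdata");

            // The "regular Java" part: list every *.pdf under an assumed root.
            // (Walking all of C:\ works too, but you'd have to handle access-denied errors.)
            try (Stream<Path> paths = Files.walk(Paths.get("C:/archive"))) {
                paths.filter(p -> p.toString().toLowerCase().endsWith(".pdf"))
                     .forEach(p -> {
                         try {
                             String text = tesseract.doOCR(p.toFile());
                             // Placeholder output: dump the text next to the PDF.
                             // Making the PDF itself searchable in place is the iText step.
                             Files.writeString(Paths.get(p + ".txt"), text);
                         } catch (TesseractException | IOException e) {
                             System.err.println("Failed on " + p + ": " + e.getMessage());
                         }
                     });
            }
        }
    }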

Acrobat Reader search fails for odd characters

I came across some odd characters in a PDF file. I tried searching to see if there were more, but Adobe Reader's search didn't even find the instance that I had cut and pasted from to start the search. What?? I used both the Ctrl+F search and the full Ctrl+Shift+F search.
Can someone help me understand why I can't search for these characters?
The computer I'm using is Windows 7, 64-bit, with US English locale settings.
The example text below should read "system's current state."
If it matters, I believe these odd characters are an artifact of Microsoft Word's "smart quotes". I can make them go away by not cutting and pasting from Word into the XSD/LaTeX PDF-generation tool my coworker set up. But I want to understand why Acrobat Reader won't let me find instances of them via search.

Mercurial version control with Word

This is a follow-up to "svn or mercurial version control of word documents".
I potentially want multiple non-programmers to be able to use version control on Word documents. I can configure Mercurial to look at the unzipped .docx files. What I want is as follows:
Read from .docx files (answered in that question, using a feature of Mercurial to unzip before comparing; awesome!)
Automatically merge documents whenever there are non-colliding changes. It appears from the previous answer that this is done using comparison tools.
Programmatically run Word on the two documents if there are collisions, comparing the two.
I have manually opened one file, then another, in Word to see what it was like. In my Word 2004 it seems a bit buggy, but I see from reviews that the feature is much improved in 2010.
I found this link:
http://office.microsoft.com/en-us/word-help/command-line-switches-for-microsoft-office-word-2007-HP010164010.aspx#BM1
for command lines, and now see that I can execute the command:
winword /q /f file1.docx /f file2.docx
The /q is for quiet and /f specifies a file. The docs don't say whether I can specify two files, but I tried it, and it loads the two in separate windows.
So the only thing I don't know is how to trigger word to compare the two.
Is the Word interaction a fairly easy scripting job, or does it involve binary APIs that I don't want to know about, like DCOM, ActiveX, etc.?
Digging around in the TortoiseHg directory, I found some example scripts implementing diff/merge of .doc files in the diff-scripts directory. There is an [extdiff] section in Mercurial.ini that can be configured to use these scripts. This may get you started.
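Roughly, the wiring looks like the following (the script name and path below are from my TortoiseHg install and may well differ on yours, so treat this as a sketch):

    [extensions]
    extdiff =

    [extdiff]
    ; 'docdiff' becomes a new command, e.g.: hg docdiff -r rev1 -r rev2 file.doc
    cmd.docdiff = wscript
    opts.docdiff = "C:\Program Files\TortoiseHg\diff-scripts\diff-doc.js"

The extdiff extension takes care of exporting the two revisions to temporary files and handing them to the script, which (if I read it right) then drives Word's compare view via automation.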

Looking for fast "Find in Files" program

I currently have a directory with 98,000 individual archive transaction files. I need to search those files for user-input strings and have the option to open the files as they are found, or at the end of the search. I'm currently using Notepad++ and, while functional, it's quite slow. I thought about writing my own, but I am only familiar with .NET and I'm a beginner. Also, I'm not sure how efficient that would be compared to NP++.
This tool would be used again and again, so the dev time would definitely be worth it if it came to that. Is there some other tool out there, already developed, that would accomplish this?
Agent Ransack
I've been using it for years.
I recommend AstroGrep, a grep utility for Windows. You can open files as it finds them, and it shows you the line where the match was found without having to open the file.
Assuming the archive transaction files are plain text, you can download Cygwin, an environment providing Unix tools for Windows.
Once that's done, you can open a new Cygwin Bash shell, do cd 'c:\foo' to get into the directory with your files, then do grep -F -r "my string" * to find your text. (The -F means it searches for that literal string as opposed to a regular expression, and -r means recursive.)
Possibly overkill, but you could index the folder using Lucene and keep the index up to date as transaction files are added; then searches will take a trivial amount of time, and you can get the file, line, and word position of every match for a given search string.
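The indexing side is smaller than it sounds; a minimal Java sketch, where the directory names and field names are made up for illustration:

    import java.io.IOException;
    import java.nio.file.*;
    import java.util.stream.Stream;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.FSDirectory;

    public class IndexTransactions {
        public static void main(String[] args) throws IOException {
            try (IndexWriter writer = new IndexWriter(
                     FSDirectory.open(Paths.get("index")),
                     new IndexWriterConfig(new StandardAnalyzer()));
                 Stream<Path> files = Files.list(Paths.get("archive"))) {
                for (Path p : (Iterable<Path>) files::iterator) {
                    Document doc = new Document();
                    // Store the path so a hit can be opened; index the contents for search.
                    doc.add(new StringField("path", p.toString(), Field.Store.YES));
                    doc.add(new TextField("contents", Files.readString(p), Field.Store.NO));
                    writer.addDocument(doc);
                }
            }
        }
    }

Searching is then an IndexSearcher plus a QueryParser over the "contents" field, and re-running the indexer (or calling updateDocument) keeps the index current as new transaction files land. Like the Cygwin answer above, this assumes the files are plain text.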