How do I replace part of a string with a lua filter in Pandoc, to convert from .md to .pdf? - pdf

I am writing markdown files in Obsidian.md and trying to convert them via Pandoc and LaTeX to PDF. Text itself works fine doing this, howerver, in Obsidian I use ==equal signs== to highlight something, however this doesn't work in LaTeX.
So I'd like to create a filter that either removes the equal signs entirely, or replaces it with something LaTeX can render, e.g. \hl{something}. I think this would be the same process.
I have a filter that looks like this:
return {
{
Str = function (elem)
if elem.text == "hello" then
return pandoc.Emph {pandoc.Str "hello"}
else
return elem
end
end,
}
}
this works, it replaces any instance of "hello" with an italicized version of the word. HOWEVER, it only works with whole words. e.g. if "hello" were part of a word, it wouldn't touch it. Since the equal signs are read as part of one word, it won't touch those.
How do I modify this (or, please, suggest another filter) so that it CAN replace and change parts of a word?
Thank you!
this works, it replaces any instance of "hello" with an italicized version of the word. HOWEVER, it only works with whole words. e.g. if "hello" were part of a word, it wouldn't touch it. Since the equal signs are read as part of one word, it won't touch those.
How do I modify this (or, please, suggest another filter) so that it CAN replace and change parts of a word?
Thank you!

A string like Hello, World! becomes a list of inlines in pandoc: [ Str "Hello,", Space, Str "World!" ]. Lua filters don't make matching on that particularly convenient: the best method is currently to write a filter for Inlines and then iterate over the list to find matching items.
For a complete example, see https://gist.github.com/tarleb/a0646da1834318d4f71a780edaf9f870.
Assuming we already found the highlighted text and converted it to a Span with with class mark. Then we can convert that to LaTeX with
function Span (span)
if span.classes:includes 'mark' then
return {pandoc.RawInline('latex', '\\hl{')} ..
span.content ..
{pandoc.RawInline('latex', '}')}
end
end
Note that the current development version of pandoc, which will become pandoc 3 at some point, supports highlighted text out of the box when called with
pandoc --from=markdown+mark ...
E.g.,
echo '==Hi Mom!==' | pandoc -f markdown+mark -t latex
⇒ \hl{Hi Mom!}

Related

VBA replace certain carriage

All.
I am used to programming VBA in Excel, but am new to the structures in Word.
I am working through a library of text files to update them. Many of them are either OCR documents, or were manually entered.
Each has a recurring pattern, the most common of which is unnecessary carriage returns.
For example, I am looking at several text files where there is a double return after each line. A search and replace of all double carriage returns removes all paragraph distinctions.
However, each line is approximately 30 characters long, and if I manually perform the following logic, it gives me a functional document.
If there is a double carriage return after 30+ characters, I replace them with a space.
If there were less than 30 characters prior to the double return, I replace them with a single return.
Can anyone help me with some rudimentary code that would help me get started on that? I could then modify it for each "pattern" of text documents I have.
e.g.
In this case, there are more than
thirty characters per line. And I
will keep going to illustrate this
example.
This would be a new paragraph, and
would be separated by another of
the single returns.
I want code that would return:
In this case, there are more than thirty character returns. And I will keep going to illustrate this example.
This would be a new paragraph, and would be separated by another of the single returns.
Let me know if anyone can throw something out that I can play with!
You can do this without code (which RegEx requires), simply using Word's own wildcard Find/Replace tools, where:
Find = ([!^13]{30,})[^13]{1,}
Replace = \1^32
and, to clean up the residual multi-paragraph breaks:
Find = [^13]{2,}
Replace = ^p
You could, of course, record the above as a macro...
Here is a RegEx that might work for you:
(\n\n)(?<!\.(\n\n))
The substitution is just a plain space, you can try it out (and modify / tweak it) here: https://regex101.com/r/zG9GPw/4
This 'pattern' tells the RegEx engine to look for the newline character \n which occurs x2 like this \n\n (worth noting this is from your question and might be different in your files, e.g. could be \r\n) and it assumes that a valid line break will be proceeded by a full stop: \..
In RegEx the full stop symbol is a single character wild card so it needs to be escaped with the '\' (n and r are normal characters, escaping them tells the RegEx engine they represent newline and return characters).
So... the expression is looking for a group of x2 newline characters but then uses a negative look-behind to exclude any matches where the previous character was a full stop.
Anyway, it's all explained on the site:
Here is how you could do a RegEx find and replace using NotePad++ (I'm not sure if it comes with RegEx or if a plugin is needed, either way it is easy). But you can set a location, filters (to target specific file types), and other options (such as search in sub-directories).
Other than that, as #MacroPod pointed out you could also do this with MS Word, document by document, not using any code :)

how to change text item delimiter in Keynote rich text so period and comma are not ignored (using Javascript for Automation)?

In short:
with iWork rich text objects, breaking the text up in words goes from:
"This... he said, is a sentence!"
to:
["This", "he", "said", "is", "a", "sentence"]
So: periods, comma and exclamation point have disappeared.
Similar to the AppleScript situation, but with Javascript for Automation it is unclear to me how to set the text item delimiter (plus: I am hoping it can be simpler than in the old days).
In detail:
I would like to modify rich text like:
testing [value] units <ignore this>
>>>
also ignore this
<<<
etc.
The text can contain variations in size/color/weight, which should be kept.
The result should be e.g.:
testing 123 units
etc.
When I go through the words (in my case: presenter notes in Keynote), I get:
["testing", "value", "units", "ignore", "this", "also", "ignore", "this", "etc"]
instead of:
["testing", "[value]", "units", "<ignore", "this>", ">>>", "also", "ignore", "this", "<<<", "etc."]
So: characters like ., [, and > don't show up, which makes it impossible to search/replace.
To get the words, I use:
words = Application("Keynote").documents[0].slides[0].presenterNotes.words
I also tried using whose() in combination with ignoring/considering (case, hyphens, punctuation), but the result is the same.
How can I get a list of words that include the non-alphanumeric characters?
To get the full text of a slide's notes, use the presenterNotes() method:
note = Application("Keynote").documents[0].slides[0].presenterNotes()
It's not exactly intuitive for me, but it works fine.

Does mIRC Scripting have an escape character?

I'm trying to write a simple multi-line Alias that says several predefined strings of characters in mIRC. The problem is that the strings can contain:
{
}
|
which are all used in the scripting language to group sections of code/commands. So I was wondering if there was an escape character I could use.
In lack of that, is there a method, or alternative way to be able to "say" multiple lines of these strings, so that this:
alias test1 {
/msg # samplestring}contains_chars|
/msg # _that|break_continuity}{
}
Outputs this on typing /test1 on a channel:
<MyName> samplestring}contains_chars|
<MyName> _that|break_continuity}{
It doesn't have to use the /msg command specifically, either, as long as the output is the same.
So basically:
Is there an escape character of sorts I can use to differentiate code from a string in mIRC scripting?
Is there a way to tell a script to evaluate all characters in a string as a literal? Think " " quotes in languages like Java.
Is the above even possible using only mIRC scripting?
"In lack of that, is there a method, or alternative way to be able to "say" multiple lines of these strings, so that this:..."
I think you have to have to use msg # every time when you want to message a channel. Alterativelty you can use the /say command to message the active window.
Regarding the other 3 questions:
Yes, for example you can use $chr(123) instead of a {, $chr(125) instead of a } and $chr(124) instead of a | (pipe). For a full list of numbers you can go to http://www.atwebresults.com/ascii-codes.php?type=2. The code for a dot is 46 so $chr(46) will represent a dot.
I don't think there is any 'simple' way to do this. To print identifiers as plain text you have to add a ! after the $. For example '$!time' will return the plain text '$time' as $time will return the actual value of $time.
Yes.

ANSI escape codes in GNU Smalltalk

I'm trying to make a console-based program that makes use of ANSI escape codes with GNU Smalltalk. I can't seem to figure out how to go about printing a string object formatted with ANSI escape codes. I've tried the following.
'\x1b[31mHi' displayNl
This prints the entire string, including the escape code, without any formatting. I would have expected this to print "Hi" in red (and then everything else in the console after that, as I didn't reset the color.)
After googling a bit, I was able to find a couple issues on mailing lists where people were trying to produce things like newlines using "\n". Most of the answers were using the Transcript object's cr method, but I didn't find anything about colors in the textCollector class.
It looks like it shouldn't be all that hard to create my own module in C to achieve this functionality, but I'd like to know if there's a better way first.
I'm aware of the ncurses bindings, but I'm not sure that'd be practical for just making certain pieces of text in the program colored. So, is there a standard way of outputting colored text to the terminal in GNU Smalltalk using ANSI escape sequences?
Ended up getting an answer on the GNU Smalltalk mailing list. Looks like you can use an interpolation operator to achieve this.
For example ('%1[31mHi' % #($<16r1B>)) displayNl. would change the color to red, and ('%1[34mHi' % #($<16r1B>)) displayNl. would change the color to blue.
Basically, the % operator looks for a sequences that look like "%(number)" and replaces them with the objects in the array to the right of the operator. In our case, the array has one item, which is the ascii escape character in hexadecimal. So the "%1" in "%1[31mHi' is being replaced with the escape character, and then printed.
(This answer was stolen almost verbatim from Paolo on the GNU Smalltalk mailing list.)

vb.net VB 2010 Underscore and small rectangles in string outputs?

I've made some good progress with my first attempt at a program, but have hit another road block. I'm taking standard output (as a string) froma console CMD window (results of dsquery piped to dsget) and have found small rectangles in the output. I tried using Regex to clean the little bastards but it seems they are related to the _ (underscore), which I need to keep (to return 2000/NT logins). Odd thing is - when I copy the caharcter and paste it into VS2K10 Express it acts like a carrige return??
Any ideas on finding out what these little SOB's are -- and how to remove them?
Going to try using /U or /A CMD switch next..
The square is often just used whenever a character is not displayable. The character could very well be a CR. You can use a Regular Expression to just get normal characters or remove the CR LF characters using string.replace.
You mentioned that you are using the string.replace function, and I am wondering if you are replacing the wrong character or something like that. If all your trying to do is remove a carriage return I would skip the regular expressions and stick with the string.replace.
Something like this should work...
strInputString = strInputString.replace(chr(13), "")
If not could you post a line or two of code.
On a side note, this might give some other examples....
Character replacement in strings in VB.NET