Inline math font size and equation spacing in Markdown-to-PDF conversion using pandoc

I'm using Vim and Markdown as an alternative to Obsidian. I convert Markdown to PDF with pandoc, and I would like the result to resemble Obsidian's PDF output as closely as possible, since I like how it looks.
In general I can make both PDFs look almost the same, but I have two problems: first, the inline math font is too big; second, the spacing before and after an equation is different.
Here are two screenshots: the first is the pandoc output, the second the Obsidian output.
To style the PDF I use a custom LaTeX snippet, which I include with pandoc -H style.tex ... during PDF compilation. With it I was able to change the spacing between the text and the section titles, as well as other things such as page margins. But I didn't find anything about styling math or equations from a template.
I've also tried writing the equation as $\small \vec{E}$, but that didn't work.
I think there must be a way to change the spacing from the LaTeX template. I know that pandoc uses the unicode-math package to typeset the LaTeX equations, but I didn't find anything on how to change the equation spacing or the font size.
EDIT: the style.tex file
% page setup
\usepackage[a4paper,
top=2cm,
bottom=1.75cm,
left=1.75cm,
right=1.75cm]{geometry}
\usepackage{titlesec}
\usepackage{fontspec}
% inline code (backticks in md)
% taken from https://jdhao.github.io/2019/05/30/markdown2pdf_pandoc/
\linespread{1.15}
\definecolor{bgcolor}{HTML}{e0e0e0}
\let\oldtexttt\texttt
\renewcommand{\texttt}[1]{
\colorbox{bgcolor}{\oldtexttt{#1}}
}
% change boldfont bold to extrabold
% \setmainfont[
% BoldFont={Inter-ExtraBold}
% ]{Inter}
% change regular font to light font
% \setmainfont{Inter light}
\newfontfamily\titlefont{Inter}[
UprightFont = *-Regular,
BoldFont = *-ExtraBold,
]
\newfontfamily\sectionsfont{Inter}[
UprightFont = *-Regular,
BoldFont = *-SemiBold,
]
\titleformat{\section}
{\titlefont\huge\bfseries}
{}
{0em}
{}
\titleformat{\subsection}
{\sectionsfont\LARGE\bfseries}
{}
{0em}
{}
\titleformat{\subsubsection}
{\sectionsfont\Large\bfseries}
{}
{0em}
{}
\titleformat{\paragraph}
{\sectionsfont\large\bfseries}{\theparagraph}{1em}{}
\titlespacing*{\paragraph}
{0pt}{3.25ex plus 1ex minus .2ex}{1.5ex plus .2ex}
\titlespacing*{\subsubsection}
{0pt}{2.5ex plus 1ex minus .2ex}{1.5ex plus .2ex}
\titleformat{\subparagraph}
{\normalfont\large\bfseries}{\theparagraph}{1em}{}
\titlespacing*{\subparagraph}
{0pt}{3.25ex plus 1ex minus .2ex}{1.5ex plus .2ex}
EDIT2: this is the part of the .tex output shown in the screenshot, obtained with:
pandoc --pdf-engine=xelatex file.md -o file.tex
eléctrica que efectúa el campo sobre la partícula. se puede calcular entonces como:
\[\frac{w_{a \rightarrow b}}{q_{0}} = - \int_{a}^{b} \vec{e} \cdot d\vec{l} = v_{b} - v_{a} = v_{ab}\]
donde \({q_{0}}\) es una pequeña carga puntual, \(v_{a}\) y \(v_{b}\) el potencial por unidad de carga de los puntos \(a\) y \(b\) respectivamente, \(\vec{e}\) el valor del campo eléctrico

Related

LaTeX printing listings in black/white only, although the rest is printed in color?

I'm currently having trouble printing the PDF generated from a LaTeX document. The rest of the document prints in color just fine, but for some reason all my listings are printed in black and white only. In the PDF itself the listings are displayed in color, just as expected. Is this normal LaTeX behavior or am I doing something wrong? I'm using the regular listings package with the style defined below. Minimal working example:
\documentclass[ twoside,openright,numbers=noenddot,%
toc=bibliography,toc=listof,%
footinclude=false,headinclude=false,cleardoublepage=empty,%
BCOR=5mm,paper=a4,fontsize=11pt,%DIV=14,%
ngerman%
]{scrreprt}
\RequirePackage[utf8]{inputenc}
\DeclareUnicodeCharacter{00A0}{~}
\RequirePackage[T1]{fontenc}
\newcommand{\currentVersion}{Version 2.1\xspace}
\newcounter{dummy}
\PassOptionsToPackage{ngerman}{babel}
\RequirePackage{babel}
\RequirePackage{csquotes}
\renewcaptionname{ngerman}{\listfigurename}{Abbildungen}
\renewcaptionname{ngerman}{\listtablename}{Tabellen}
\PassOptionsToPackage{fleqn}{amsmath}
\RequirePackage{amsmath}
\usepackage{geometry}
\geometry{a4paper,left=25mm,right=35mm,top=25mm,bottom=30mm}
\PassOptionsToPackage{dvipsnames}{xcolor}
\RequirePackage{xcolor}
\definecolor{ingwi}{cmyk}{.9,0,0,0}
\usepackage{textcomp}
\usepackage{scrhack}
\usepackage{xspace}
\usepackage{mparhack}
\PassOptionsToPackage{printonlyused}{acronym}
\usepackage{acronym}
\usepackage{booktabs}
\usepackage{multirow}
\usepackage[shadow]{todonotes}
\newcommand{\todox}[1]{\todo[inline, size=\small]{#1}}
\newcounter{todocounter}
\renewcommand{\todox}[2][]{\stepcounter{todocounter}\todo[inline, size=\small,caption={\thetodocounter: #2}, #1]{\renewcommand{\baselinestretch}{0.5}\selectfont\thetodocounter: #2\par}}
\usepackage{blindtext}
\counterwithout{footnote}{chapter}
\usepackage{tabularx}
\setlength{\extrarowheight}{3pt}
\usepackage{caption}
\captionsetup{format=plain,indention=1em,font=small}
\usepackage{subfig}
\usepackage{wrapfig}
\usepackage{listings}
\lstset{emph={trueIndex,root},emphstyle=\color{BlueViolet}}%\underbar} % for special keywords
\lstset{language=[LaTeX]Tex,%C++,
keywordstyle=\color{RoyalBlue},%\bfseries,
basicstyle=\small\ttfamily,
%identifierstyle=\color{NavyBlue},
commentstyle=\color{Green}\ttfamily,
stringstyle=\rmfamily,
numbers=none,%left,%
numberstyle=\scriptsize,%\tiny
stepnumber=5,
numbersep=8pt,
showstringspaces=false,
breaklines=true,
% frameround=ftff,
frame=single,
texcl=true,
belowcaptionskip=.75\baselineskip
%frame=L
}
\lstdefinestyle{Java}{
belowcaptionskip=1\baselineskip,
breaklines=true,
xleftmargin=\parindent,
language=Java,
texcl=true,
numbers=left,
numberstyle=\tiny,
stepnumber=1,
numbersep=8pt,
showstringspaces=false,
basicstyle=\footnotesize\ttfamily,
keywordstyle=\bfseries\color{blue},
commentstyle=\itshape\color{black!50!white},
morecomment=[s][\color{white}]{---}{+++},
morecomment=[s][\color{orange!90!black}]{@}{\ },
identifierstyle=\color{black},
stringstyle=\color{green!60!black}}
\lstdefinestyle{Xml}{
belowcaptionskip=1\baselineskip,
breaklines=true,
xleftmargin=\parindent,
language=Java,
texcl=true,
numbers=left,
numberstyle=\tiny,
stepnumber=1,
numbersep=8pt,
showstringspaces=false,
basicstyle=\footnotesize\ttfamily,
identifierstyle=\bfseries\color{black},
commentstyle=\itshape\color{black!50!white},
stringstyle=\color{green!60!black}}
\PassOptionsToPackage{pdftex,hyperfootnotes=false,pdfpagelabels}{hyperref}
\usepackage{hyperref}
\pdfcompresslevel=9
\pdfadjustspacing=1
\PassOptionsToPackage{pdftex}{graphicx}
\usepackage{graphicx}
\hypersetup{%
%draft, % = no hyperlinking at all (useful in b/w printouts)
pdfstartpage=1, pdfstartview=Fit,%
colorlinks=true, linktocpage=true,
%urlcolor=Black, linkcolor=Black, citecolor=Black, %pagecolor=Black,%
%urlcolor=brown, linkcolor=RoyalBlue, citecolor=green, %pagecolor=RoyalBlue,%
% uncomment the following line if you want to have black links (e.g., for printing)
colorlinks=false, pdfborder={0 0 0},
breaklinks=true, pdfpagemode=UseNone, pageanchor=true, pdfpagemode=UseOutlines,%
plainpages=false, bookmarksnumbered, bookmarksopen=true, bookmarksopenlevel=1,%
hypertexnames=true, pdfhighlight=/O,%nesting=true,%frenchlinks,%
pdftitle={test},%
pdfauthor={\textcopyright\ test, test},%
pdfsubject={},%
pdfkeywords={},%
pdfcreator={pdfLaTeX},%
pdfproducer={LaTeX with hyperref}%
}
\makeatletter
\@ifpackageloaded{babel}%
{%
\addto\extrasamerican{%
\renewcommand*{\figureautorefname}{Figure}%
\renewcommand*{\tableautorefname}{Table}%
\renewcommand*{\partautorefname}{Part}%
\renewcommand*{\chapterautorefname}{Chapter}%
\renewcommand*{\sectionautorefname}{Section}%
\renewcommand*{\subsectionautorefname}{Section}%
\renewcommand*{\subsubsectionautorefname}{Section}%
}%
\addto\extrasngerman{%
\renewcommand*{\chapterautorefname}{Kapitel}%
\renewcommand*{\sectionautorefname}{Abschnitt}%
\renewcommand*{\subsectionautorefname}{Abschnitt}%
\renewcommand*{\subsubsectionautorefname}{Abschnitt}%
\renewcommand*{\paragraphautorefname}{Absatz}%
\renewcommand*{\subparagraphautorefname}{Absatz}%
\renewcommand*{\footnoteautorefname}{Fu\"snote}%
\renewcommand*{\FancyVerbLineautorefname}{Zeile}%
\renewcommand*{\theoremautorefname}{Theorem}%
\renewcommand*{\appendixautorefname}{Anhang}%
\renewcommand*{\equationautorefname}{Gleichung}%
\renewcommand*{\itemautorefname}{Punkt}%
}
\providecommand{\subfigureautorefname}{\figureautorefname}%
}{\relax}
\makeatother
\PassOptionsToPackage{l2tabu,orthodox,abort}{nag}
\usepackage{nag}
\usepackage{enumitem}
\setdescription{font=\normalfont\bfseries}
\usepackage[activate={true,nocompatibility},final,tracking=true,kerning=true,spacing=true,factor=1100,stretch=10,shrink=10]{microtype}
\usepackage{mathpazo}
\setkomafont{disposition}{\bfseries}
\usepackage{adjustbox}
\usepackage{tabularx}
\usepackage{pifont}
\newcommand{\cmark}{\ding{51}}%
\newcommand{\xmark}{\ding{55}}%
\newcommand{\code}[1]{\texttt{\em{#1}}}
\begin{document}
\frenchspacing
\raggedbottom
\selectlanguage{ngerman}
\pagenumbering{roman}
\pagestyle{plain}
\cleardoublepage
\pagenumbering{arabic}
\pagestyle{headings}
\begin{minipage}[chbt]{0.95\textwidth}
\begin{lstlisting}[caption=test,captionpos=b,label={lst:example_facade},style=Java]
public class ExampleFacade implements CrudFacade<ExampleEntity> {
    @Override
    public ExampleEntity getSpecificEntity(String id) throws NoSuchDataSetException {
        // retrieve specific entity from database
        // ...
    }
}
\end{lstlisting}
\end{minipage}
\begin{figure}[hbt]
\centering
\includegraphics[width=0.6\textwidth]{insert your picture here}
\caption{test}
\label{fig:lifecycle}
\end{figure}
\end{document}
Download my PDF: https://www.file-upload.net/download-14777480/test.pdf.html
Edit: It seems as though this is only a problem with my particular printer for some reason I can't explain at all. When using the printer of a friend it works just fine, even with the original PDF, not just the working example. I'm using the Brother DCP-L3550CDW.

Rmarkdown knit pdf - getting underlined text instead of italic using *italic* (huxtable issue?)

In R Markdown, text between chunks that is formatted as italic with * * is knitted to PDF as underlined rather than italic when I print a huxtable.
Here is my example:
```
---
title: "<center><center>"
author: "<center> jd <center><br>"
date: "<center> `r Sys.Date()` <center>"
output:
  pdf_document:
    fig_caption: yes
    toc: yes
    toc_depth: 3
    number_sections: true
    latex_engine: xelatex
  html_document:
    code_folding: show
    df_print: paged
    theme: yeti
    highlight: tango
    toc: yes
    toc_float:
      collapsed: false
      smooth_scroll: false
    number_sections: true
fontsize: 10pt
---
This * * makes text *italic*.
```{r lib, message = FALSE}
library(huxtable)
library(tidyverse)
data(iris)
dt_hux <- iris[1:5,1:5] %>% as_hux() %>%
  set_font_size(8) %>% set_font("Arial") %>%
  set_bold(1, everywhere) %>%
  set_top_border(1, everywhere) %>%
  set_bottom_border(c(1, 6), everywhere)
```
Up to this point, using * * gives italic formatting in the knitted PDF (if the next chunk is not run).
But after the next chunk is run, * * underlines text (in the whole R Markdown document). Commenting out **dt_hux** returns the formatting to italic. Knitting to HTML also prints italic formatting, even with dt_hux.
```{r table}
options(huxtable.latex_use_fontspec = TRUE)
options(huxtable.print=print_latex)
dt_hux
```
```
Is there a solution to this issue? I need to print the huxtable in the PDF.
From the TeXnical perspective, the problem is that the ulem package is loaded without the normalem option. A couple of workarounds:
Use classoption: normalem (based on "Knitr hook to add code before \documentclass line in tex file to avoid options clash with xcolor"). Caveat: this passes the option to all packages, which might be undesired if another package uses the same option name (I'm not aware of any other package that uses this option, but just in case ...).
Add \normalem either as a header-include or at the start of your document.
This problem was fixed in huxtable 5.2.0, so you just need to update your package.

How to convert PDF with images which I don't care about to text?

I'm trying to convert PDFs to text files. The problem is that these PDFs contain images, which I don't care about (this is the type of file I want to extract: https://www.sia.aviation-civile.gouv.fr/pub/media/store/documents/file/l/f/lf_sup_2020_213_fr.pdf). Note that if I copy and paste with my mouse, it works quite well (except for the line breaks), so I'd guess that it's possible. Most of the answers I found online work pretty well on dummy PDFs with text only, but give especially bad results on the map.
For instance, something like this
from tika import parser # pip install tika
raw = parser.from_file('test2.pdf')
print(raw['content'])
works well for retrieving the text, but I get a lot of trash like this:
ERY
CTR
3
CH
A
which appears because of the map.
Something like this, which works by converting the PDF to images and then reading the images, faces the same problem (I found it in a very similar thread on Stack Overflow, but there is no answer):
import sys
import pytesseract as pt
from pdf2image import convert_from_path  # pip install pdf2image
from PIL import Image
def convert(name):
    # Render every PDF page to an image, then run OCR on each image
    pages = convert_from_path(name, dpi=200)
    for idx, page in enumerate(pages):
        page.save('page'+str(idx)+'.jpg', 'JPEG')
        quote = Image.open('page'+str(idx)+'.jpg')
        text = pt.image_to_string(quote, lang="fra")
        file_ex = open('page'+str(idx)+'.text', "w")
        file_ex.write(text)
        file_ex.close()
if __name__ == '__main__':
    convert(sys.argv[1])
Finally, I tried removing the images first and then using one of the solutions above, but it didn't work any better:
from tika import parser # pip install tika
from PyPDF2 import PdfFileWriter, PdfFileReader
# Remove the images
inputStream = open("lf_sup_2020_213_fr.pdf", "rb")
outputStream = open("test3.pdf", "wb")
src = PdfFileReader(inputStream)
output = PdfFileWriter()
for i in range(src.getNumPages()):
    output.addPage(src.getPage(i))
output.removeImages()
output.write(outputStream)
outputStream.close()
inputStream.close()
# Read from the PDF without images (test3.pdf, written above)
raw = parser.from_file('test3.pdf')
print(raw['content'])
Do you know how to solve this? It can be in any language.
Thanks
One approach you could try is to use a toolkit capable of parsing the text characters in the PDF then use the object properties to try and remove the unwanted map labels while keeping the text characters required.
For example, the ParsePages method from LEADTOOLS PDF toolkit (which is what I am familiar with since I work for the vendor of this toolkit) can be used to obtain the text from the PDF:
using (PDFDocument document = new PDFDocument(pdfFileName))
{
    PDFParsePagesOptions options = PDFParsePagesOptions.All;
    document.ParsePages(options, 1, -1);
    using (StreamWriter writer = File.CreateText(txtFileName))
    {
        IList<PDFObject> objects = document.Pages[0].Objects;
        writer.WriteLine("Objects: {0}", objects.Count);
        foreach (PDFObject obj in objects)
        {
            if (obj.TextProperties.IsEndOfLine)
                writer.WriteLine(obj.Code);
            else
                writer.Write(obj.Code);
        }
        writer.WriteLine("---------------------");
    }
}
This will obtain all the text in the PDF for the first page, with the unwanted results as you mentioned. Here is an excerpt below:
Objects: 3918
5
91L
F5
4
1 LF
N
OY
L2
1AM
TService
8
26
1de l’Information
0
B09SUP AIP 213/20
7
Aéronautique
Date de publication : 05 NOV
e-mail : sia.qualite@aviation-civile.gouv.fr
Internet : www.sia.aviation-civile.gouv.fr
141
17˚
82
N20
9Objet : Création de 4 zones réglementées temporaires (ZRT) pour l’exercice VOLOPS en région de Chambéry
En vigueur : Du mercredi 25 Novembre 2020 au vendredi 04 décembre 2020
More code can be used to examine the properties for each parsed character:
writer.WriteLine(" ObjectType: {0}", obj.ObjectType.ToString());
writer.WriteLine(" Bounds: {0}, {1}, {2}, {3}", obj.Bounds.Left, obj.Bounds.Top, obj.Bounds.Right, obj.Bounds.Bottom);
writer.WriteLine(" TextProperties.FontHeight: {0}", obj.TextProperties.FontHeight.ToString());
writer.WriteLine(" TextProperties.FontIndex: {0}", obj.TextProperties.FontIndex.ToString());
writer.WriteLine(" Code: {0}", obj.Code);
writer.WriteLine("------");
This will give the properties for each character:
Objects: 3918
ObjectType: Text
Bounds: -60.952693939209, 1017.25231933594, -51.8431816101074, 1023.71826171875
TextProperties.FontHeight: 7.10454273223877
TextProperties.FontIndex: 48
Code: 5
------
Using these properties, the unwanted text might be filtered out. For example, I noticed that the FontHeight for a good portion of the unwanted text is around 7 PDF units, so the first code might be altered to extract only text taller than 7.25 PDF units:
foreach (PDFObject obj in objects)
{
    if (obj.TextProperties.FontHeight > 7.25)
    {
        if (obj.TextProperties.IsEndOfLine)
            writer.WriteLine(obj.Code);
        else
            writer.Write(obj.Code);
    }
}
The extracted output would give a better result, an excerpt follows:
Objects: 3918
Service
de l’Information
SUP AIP 213/20
Aéronautique
Date de publication : 05 NOV
e-mail : sia.qualite@aviation-civile.gouv.fr
Internet : www.sia.aviation-civile.gouv.fr
Objet : Création de 4 zones réglementées temporaires (ZRT) pour l’exercice VOLOPS en région de Chambéry
En vigueur : Du mercredi 25 Novembre 2020 au vendredi 04 décembre 2020
Lieu : FIR : Marseille LFMM - AD : Chambéry Aix-Les-Bains LFLB, Chambéry Challes les Eaux LFLE
ZRT LE SIRE, MOTTE CASTRALE, ALLEVARD
*
C
D
E
In the end, with this approach you will have to come up with good criteria for filtering out the unwanted text without removing the text you need to keep.
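The same size-based filtering idea can be sketched in Python with PyMuPDF instead of LEADTOOLS. This is only a rough equivalent under stated assumptions: the function names are hypothetical, PyMuPDF reports a font size per text span rather than the per-character FontHeight shown above, and the 7.25 threshold is borrowed from the example and would likely need tuning for each document.

```python
from typing import Any, Dict, List


def filter_spans_by_size(page_dict: Dict[str, Any], min_size: float = 7.25) -> List[str]:
    """Return one string per text line, keeping only spans larger than min_size.

    page_dict is expected to have the layout produced by PyMuPDF's
    page.get_text("dict"): blocks -> lines -> spans, each span having
    "size" and "text" keys.
    """
    kept_lines = []
    for block in page_dict.get("blocks", []):
        # Image blocks have no "lines" key, so they contribute nothing.
        for line in block.get("lines", []):
            text = "".join(
                span["text"] for span in line["spans"] if span["size"] > min_size
            )
            if text.strip():
                kept_lines.append(text)
    return kept_lines


def extract_large_text(pdf_path: str, min_size: float = 7.25) -> List[str]:
    """Hypothetical helper: apply the filter to every page of a PDF file."""
    import fitz  # PyMuPDF; imported here so the filter above works without it

    lines: List[str] = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            lines.extend(filter_spans_by_size(page.get_text("dict"), min_size))
    return lines
```

Calling extract_large_text("lf_sup_2020_213_fr.pdf") would return the lines that survive the filter. As with the C# version, some map labels may still slip through, so bounds or font index could be added as further criteria.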

How to add latexbangla in book document class?

When I use the latexbangla package
\usepackage[banglamainfont=Kalpurush, banglattfont=Siyam Rupali,
feature=0, changecounternumbering=0]{latexbangla}
\newfontfamily{\bengalifontsf}[Script=Bengali]{Noto Sans Bengali}
in the following book document class, the header and footer do not work.
MWE:
\documentclass[twoside,12pt,english]{book}
%\usepackage[banglamainfont=Kalpurush, banglattfont=Siyam Rupali, feature=0, changecounternumbering=0]{latexbangla} % to write bengali in latex
\usepackage{babel}
\usepackage[utf8]{inputenc}
\usepackage{color}
\definecolor{marron}{RGB}{60,30,10}
\definecolor{darkgray}{RGB}{0,80,0}
\usepackage[demo]{graphicx}
\usepackage{wallpaper}
\usepackage{fancyhdr}
\usepackage{geometry}
\geometry{
tmargin=5cm,
bmargin=5cm,
lmargin=5cm,
rmargin=3cm,
headheight=1.5cm,
headsep=0.8cm,
footskip=0.5cm}
\usepackage{fourier-orns}
\newcommand{\ornpar}{\noindent \textcolor{darkgray}{ \raisebox{-1.9pt}[10pt][10pt]{\leafright} \hrulefill \raisebox{-1.9pt}[10pt][10pt]{\leafright \decofourleft \decothreeleft \aldineright \decotwo \floweroneleft \decoone}}}
\makeatletter
\def\headrule{{\color{darkgray}\raisebox{-2.1pt}[10pt][10pt]{\leafright} \hrulefill \raisebox{-2.1pt}[10pt][10pt]{~~~\decofourleft \decotwo\decofourright~~~} \hrulefill \raisebox{-2.1pt}[10pt][10pt]{ \leafleft}}}
\makeatother
\fancyhf{}
\renewcommand{\chaptermark}[1]{\markboth{#1}{}}
\renewcommand{\sectionmark}[1]{\markright{#1}}
\newcommand{\estcab}[1]{\itshape\textcolor{marron}{\nouppercase #1}}
\fancyhead[LO]{\estcab{\rightmark}} % bad when there is no section ~~~ \thesection
\fancyfoot[RE]{\ornpar \\ \large \sffamily\bf \textcolor{darkgray}{\thepage ~~~ \reflectbox{\leafNE}} \hfill}
\pagestyle{fancy}
\renewcommand{\footnoterule}{\vspace{-0.5em}\noindent\textcolor{marron}{\decosix \raisebox{2.9pt}{\line(1,0){100}} \lefthand} \vspace{.5em} }
\usepackage[hang,splitrule]{footmisc}
\addtolength{\footskip}{0.5cm}
\setlength{\footnotemargin}{0.3cm}
\setlength{\footnotesep}{0.4cm}
%\newfontfamily{\bengalifontsf}[Script=Bengali]{Noto Sans Bengali}
\begin{document}
\tableofcontents
\chapter{First Chapter}
\newpage
\section{New Section}
\end{document}
Without the latexbangla package, everything works fine. But I want to use latexbangla and keep the header and footer the same as shown below.
How do I do it?

How to extract Highlighted Parts from PDF files

Is there any way to extract highlighted text from a PDF file programmatically? Any language is welcome. I have found several libraries with Python, Java, and also PHP but none of them do the job.
To extract highlighted parts, you can use PyMuPDF. Here is an example which works with this pdf file:
Direct download
# Based on https://stackoverflow.com/a/62859169/562769
from typing import List, Tuple
import fitz  # install with 'pip install pymupdf'
def _parse_highlight(annot: fitz.Annot, wordlist: List[Tuple[float, float, float, float, str, int, int, int]]) -> str:
    points = annot.vertices
    quad_count = int(len(points) / 4)
    sentences = []
    for i in range(quad_count):
        # where the highlighted part is
        r = fitz.Quad(points[i * 4 : i * 4 + 4]).rect
        words = [w for w in wordlist if fitz.Rect(w[:4]).intersects(r)]
        sentences.append(" ".join(w[4] for w in words))
    sentence = " ".join(sentences)
    return sentence
def handle_page(page):
    wordlist = page.get_text("words")  # list of words on page
    wordlist.sort(key=lambda w: (w[3], w[0]))  # ascending y, then x
    highlights = []
    annot = page.first_annot
    while annot:
        if annot.type[0] == 8:  # 8 is the highlight annotation type
            highlights.append(_parse_highlight(annot, wordlist))
        annot = annot.next
    return highlights
def main(filepath: str) -> List:
    doc = fitz.open(filepath)
    highlights = []
    for page in doc:
        highlights += handle_page(page)
    return highlights
if __name__ == "__main__":
    print(main("PDF-export-example-with-notes.pdf"))
OK, after looking around I found a solution for exporting highlighted text from a PDF to a text file. It is not very hard:
First, highlight your text with the tool you like to use (in my case, I highlight while reading on an iPad using the GoodReader app).
Transfer your PDF to a computer and open it using Skim (a PDF reader, free and easy to find on the web).
In FILE, choose CONVERT NOTES and convert all the notes of your document to SKIM NOTES.
That's all: simply go to EXPORT and choose EXPORT SKIM NOTES. It will export a list of your highlighted text. Once opened, this list can be exported again to a txt file.
Not much work to do, and the result is fantastic.
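For the scripted route, the list of strings returned by main() in the PyMuPDF example can be written to a text file in much the same spirit as the Skim export. A minimal sketch, assuming the highlights are plain strings; save_highlights and the output file name are hypothetical names and not part of PyMuPDF or Skim:

```python
from pathlib import Path
from typing import Iterable


def save_highlights(highlights: Iterable[str], out_path: str) -> int:
    """Write each non-empty highlight on its own line and return how many were written."""
    cleaned = [h.strip() for h in highlights if h and h.strip()]
    body = "\n".join(cleaned) + ("\n" if cleaned else "")
    Path(out_path).write_text(body, encoding="utf-8")
    return len(cleaned)
```

For example, save_highlights(main("PDF-export-example-with-notes.pdf"), "highlights.txt") would put one highlighted passage per line in highlights.txt.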