How can I use a custom font for displaying Emojis in Google Colab using Seaborn graphs - matplotlib

I'm making a study about the use of emojis on Brazilian Twitter, I've manage to extract the data from the Twitter and made a pandas DataFrame that looks like this.
index,emojis
🚨,75
🤣,66
🇧🇷,55
😂,35
🚩,32
👏,25
👇,18
🤔,18
👏🏻,17
❤,15
The thing is that Seaborn default font does not have all the emojis, so my graph ends up looking something like this:
sns.barplot(x = data.index, y="emojis",data=data)
seaborn graph with a bad display of emojis
!wget "https://fonts.google.com/download?family=Noto%20Color%20Emoji"
!mv font_file.ttf /usr/share/fonts/truetype/
!unzip "download?family=Noto Color Emoji"
!fc-cache -f -v
#the error I get
mv: cannot stat 'font_file.ttf': No such file or directory
/usr/share/fonts: caching, new cache contents: 0 fonts, 1 dirs
/usr/share/fonts/truetype: caching, new cache contents: 0 fonts, 2 dirs
/usr/share/fonts/truetype/humor-sans: caching, new cache contents: 1 fonts, 0 dirs
/usr/share/fonts/truetype/liberation: caching, new cache contents: 16 fonts, 0 dirs
/usr/local/share/fonts: caching, new cache contents: 0 fonts, 0 dirs
/root/.local/share/fonts: skipping, no such directory
/root/.fonts: skipping, no such directory
/var/cache/fontconfig: cleaning cache directory
/root/.cache/fontconfig: not cleaning non-existent cache directory
/root/.fontconfig: not cleaning non-existent cache directory
fc-cache: succeeded
I tried this two methods that I've found here in stack overflow, The first one written above, I get an error "mv: cannot stat 'font_file.ttf': No such file or directory", that i have no idea how to solve.
!wget https://fonts.google.com/download?family=Noto%20Color%20Emoji -P /usr/local/lib/python3.6/dist-packages/matplotlib/mpl-data/fonts/ttf
matplotlib.font_manager._rebuild()
matplotlib.rc('font', family='Noto Color Emoji')
This second method did not gave me any errors but didn't change the font, maybe I'm missing a function to define the font for the graph, as I am using seaborn for the graph but changing the font on Matplotlib, but in my understanding it should work as seaborn is built over matplotlib.
If someone can shine a light in this problem I'll be very thankful for the help as I tried multiple suggestion from Stack overflow, using vs code in my computer and using Google Colab on the cloud, I could not change the fonts to display Emojis on my graph in either way, so my last resort was to ask a question here.

Related

BWA alignment "fail to locate the index files"

This question has been asked previously, but unfortunately for me the solutions posted did not resolve my issue. I am trying to use BWA to align my ddradseq paired end reads to a reference genome, and keep running into the issue of the program throwing the error [E::bwa_idx_load_from_disk] fail to locate the index files.
This is the code I am using:
# Load the BWA module:
module load bwa
bwa mem -t 2 genome/genome.fasta \ processrad_out/sample1.1.fq.gz processrad_out/sample1.2.fq.gz \
>bwa_out/sample1.sam
My data it stored in the 'processrad_out' directory, indexed genome in the 'genome' directory, and I want the output to be stored in the 'bwa_out' directory.
I have already indexed the genome, and have tried running my script in the indexed genome directly, but still seem to have the same issue. These are the indexed files I have:
LeachsGenome.fasta
LeachsGenome.fasta.ann
LeachsGenome.fasta.fai
LeachsGenome.fasta.sa
LeachsGenome.fasta.amb
LeachsGenome.fasta.bwt
LeachsGenome.fasta.pac
Previously when I indexed I was missing the .fai file but managed to correct that issue. Despite, the program is still having difficulty locating the indexed files. I have also tried re-indexing as that seems to have worked for some people but no such luck. I know my paths are correct and I can quite figure out what the correct solution is.
I am fairly new to bioinformatics and am trying to learn as much as I can. Any suggestions are welcome and highly appriciated!

trouble with utf-8 with julia and jupyterlab

I'm reading the csv file at https://github.com/VinitaSilaparasetty/julia-beginners/blob/master/data/nba/nba19-20.csv
I get a DataFrame and I save it as XLSX. When I try to read it in jupyterlab I get the error the file is not UTF-8 encoded and therefore the file is not read.
This is my code:
using HTTP, XLSX, CSV, DataFrames
df = CSV.read(HTTP.get("https://raw.githubusercontent.com/VinitaSilaparasetty/julia-beginners/master/data/nba/nba19-20.csv").body)
# first(df,5) # first shows the top five rows ok
XLSX.writetable("data/nba/nba19-20.XLSX", collect(eachcol(df)), names(df), overwrite = true)
The file is saved in my data folder. When I try to open it with jupyterlab, I get a pop up with the file is not UTF-8 encoded and the file is not opened.
When I try to open the file in Ubuntu (with LibreOffice) I do not see anything suspicious.
As I'm new to Julia I'm struggling to understand where the problem lies or how to fix it.
I tried to see if I could encode the dataframe in UTF-8 (after saving the file to disk) with
data = DataFrame(CSV.File(open(read,"data/nba/nba19-20.csv", enc"utf-8")))
But I did not see any change. Any suggestion is welcome.
Do you have the jupyterlab-spreadsheet plugin installed? JupyterLab by default doesn't support opening xlsx files (it isn't mentioned in the file formats list here for example).
See also this similar question involving Python pandas (which says pretty much the same thing).

Blender Command line importing files

I will run the script on the Blender command line. All I want to do is run the same script for several files. I have completed the steps to run a background file (.blend) and run a script in Blender, but since I have just loaded one file, I can not run the script on another file.
I looked up the Blender manual, but I could not find the command to import the file.
I proceeded to creating a .blend file and running the script.
blender -b background.blend -P pythonfile.py
In addition, if possible, I would appreciate it if you could tell me how to script the camera and track axes to track to contraint (Ctrl + T -> Track to constraint).
really thank you for reading my ask.
Blender can only have one blend file open at a time, any open scripts are cleared out when a new file is opened. What you want is a loop that starts blender for each blend file using the same script file.
On *nix systems you can use a simple shell script
#!/bin/sh
for BF in $(ls *.blend)
do
blender -b ${BF} -P pythonfile.py
done
A more cross platform solution is to use python -
from glob import glob
from subprocess import call
for blendFile in glob('*.blend'):
call([ 'blender',
'-b', blendFile,
'--python', 'pythonfile.py' ])
To add a Track-to constraint to Camera pointing it at Cube -
camera = bpy.data.objects['Camera']
c = camera.constraints.new('TRACK_TO')
c.target = bpy.data.objects['Cube']
c.track_axis = 'TRACK_NEGATIVE_Z'
c.up_axis = 'UP_Y'
This is taken from my answer here which also animates the camera going around the object.
bpy.context.view_layer.objects.active = CameraObject
bpy.ops.object.constraint_add(type='TRACK_TO')
CameraObject.constraints["Track To"].target = bpy.data.objects['ObjectToTrack']

How to load a CSV file from the Mayavi GUI?

I know how to read the CSV into numpy and do it from a Python script, and that is good enough for my use case.
But since it has a GUI with data loading functionality, I was expecting it would just work for such an universal data format.
So I tried to go on the menu:
File
Load data
Open file
but when I select a simple CSV file:
i=0; while [ "$i" -lt 10 ]; do echo "$i,$((2*i)),$((4*i))"; i=$((i+1)); done > main.csv
which contains:
0,0,0
1,2,4
2,4,8
3,6,12
4,8,16
5,10,20
6,12,24
7,14,28
8,16,32
9,18,36
an error popup shows on the GUI:
No suitable reader found for file /home/ciro/main.csv
Google led me to this interesting file in the source tree: https://github.com/enthought/mayavi/blob/e2569be1096be3deecb15f8fa8581a3ae3fb77d3/mayavi/tools/data_wizards/csv_loader.py but that just looks like an example of how to do it from a script.
Tested in Mayavi 4.6.2.
From the documentation
One needs to have some data or the other loaded before a Module or Filter may be used. Mayavi supports several data file formats most notably VTK data file formats. Alternatively, mlab can be used to load data from numpy arrays. For advanced information on data structures, refer to the Data representation in Mayavi section.
I've tested importing using the GUI on a Asus Laptop Intel CoreTM i7-4510U CPU # 2.00 GHz with 8 GBs de RAM, using Windows 10, both in and out of a Python virtualenv and always got the same problem:
It all points to CSV files not being directly supported, so had to find another workaround.
My favorite was to use a virtual environment and install on it mayavi, jupyterlab, PyQt5 and Pandas.
Then, using PowerShell, start a Jupyter notebook (jupyter notebook) > Upload > Select the .csv. This imported a 1,25 GBs (153543233 rows x 3 columns) .csv in around 20s, which then became available for usage.

Convert PDF text into outlines?

Does anybody know a way to vectorize the text in a PDF document? That is, I want each letter to be a shape/outline, without any textual content. I'm using a Linux system, and open source or a non-Windows solution would be preferred.
The context: I'm trying to edit some old PDFs, for which I no longer have the fonts. I'd like to do it in Inkscape, but that will replace all the fonts with generic ones, and that's barely readable. I've also been converting back and forth using pdf2ps and ps2pdf, but the font info stays there. So when I load it into Inkscape, it still looks awful.
Any ideas? Thanks.
To achieve this, you will have to:
Split your PDF into individual pages;
Convert your PDF pages into SVG;
Edit the pages you want
Reassemble the pages
This answer will omit step 3, since that's not programmable.
Splitting the PDF
If you don't want a programmatic way to split documents, the modern way would be with using stapler. In your favorite shell:
stapler burst file.pdf
Would generate {file_1.pdf,...,file_N.pdf}, where 1...N are the PDF pages. Stapler itself uses PyPDF2 and the code for splitting a PDF file is not that complex. The following function splits a file and saves the individual pages in the current directory. (shamelessly copying from the commands.py file)
import math
import os
from PyPDF2 import PdfFileWriter, PdfFileReader
def split(filename):
with open(filename) as inputfp:
inputpdf = PdfFileReader(inputfp)
base, ext = os.path.splitext(os.path.basename(filename))
# Prefix the output template with zeros so that ordering is preserved
# (page 10 after page 09)
output_template = ''.join([
base,
'_',
'%0',
str(math.ceil(math.log10(inputpdf.getNumPages()))),
'd',
ext
])
for page in range(inputpdf.getNumPages()):
outputpdf = PdfFileWriter()
outputpdf.addPage(inputpdf.getPage(page))
outputname = output_template % (page + 1)
with open(outputname, 'wb') as fp:
outputpdf.write(fp)
Converting the individual pages to SVG
Now to convert the PDFs to editable files, I'd probably use pdf2svg.
pdf2svg input.pdf output.svg
If we take a look at the pdf2svg.c file, we can see that the code in principle is not that complex (assuming the input filename is in the filename variable and the output file name is in the outputname variable). A minimal working example in python follows. It requires the pycairo and pypoppler libraries:
import os
import cairo
import poppler
def convert(inputname, outputname):
# Convert the input file name to an URI to please poppler
uri = 'file://' + os.path.abspath(inputname)
pdffile = poppler.document_new_from_file(uri, None)
# We only have one page, since we split prior to converting. Get the page
page = pdffile.get_page(0)
# Get the page dimensions
width, height = page.get_size()
# Open the SVG file to write on
surface = cairo.SVGSurface(outputname, width, height)
context = cairo.Context(surface)
# Now we finally can render the PDF to SVG
page.render_for_printing(context)
context.show_page()
At this point you should have an SVG in which all text has been converted to paths, and will be able to edit with Inkscape without rendering issues.
Combining steps 1 and 2
You can call pdf2svg in a for loop to do that. But you would need to know the number of pages beforehand. The code below figures the number of pages and does the conversion in a single step. It requires only pycairo and pypoppler:
import os, math
import cairo
import poppler
def convert(inputname, base=None):
'''Converts a multi-page PDF to multiple SVG files.
:param inputname: Name of the PDF to be converted
:param base: Base name for the SVG files (optional)
'''
if base is None:
base, ext = os.path.splitext(os.path.basename(inputname))
# Convert the input file name to an URI to please poppler
uri = 'file://' + os.path.abspath(inputname)
pdffile = poppler.document_new_from_file(uri, None)
pages = pdffile.get_n_pages()
# Prefix the output template with zeros so that ordering is preserved
# (page 10 after page 09)
output_template = ''.join([
base,
'_',
'%0',
str(math.ceil(math.log10(pages))),
'd',
'.svg'
])
# Iterate over all pages
for nthpage in range(pages):
page = pdffile.get_page(nthpage)
# Output file name based on template
outputname = output_template % (nthpage + 1)
# Get the page dimensions
width, height = page.get_size()
# Open the SVG file to write on
surface = cairo.SVGSurface(outputname, width, height)
context = cairo.Context(surface)
# Now we finally can render the PDF to SVG
page.render_for_printing(context)
context.show_page()
# Free some memory
surface.finish()
Assembling the SVGs into a single PDF
To reassemble you can use the pair inkscape / stapler to convert the files manually. But it is not hard to write code that does this. The code below uses rsvg and cairo. To convert from SVG and merge everything into a single PDF:
import rsvg
import cairo
def convert_merge(inputfiles, outputname):
# We have to create a PDF surface and inform a size. The size is
# irrelevant, though, as we will define the sizes of each page
# individually.
outputsurface = cairo.PDFSurface(outputname, 1, 1)
outputcontext = cairo.Context(outputsurface)
for inputfile in inputfiles:
# Open the SVG
svg = rsvg.Handle(file=inputfile)
# Set the size of the page itself
outputsurface.set_size(svg.props.width, svg.props.height)
# Draw on the PDF
svg.render_cairo(outputcontext)
# Finish the page and start a new one
outputcontext.show_page()
# Free some memory
outputsurface.finish()
PS: It should be possible to use the command pdftocairo, but it doesn't seem to call render_for_printing(), which makes the output SVG maintain the font information.
I'm afraid to vectorize the PDFs you would still need the original fonts (or a lot of work).
Some possibilities that come to mind:
dump the uncompressed PDF with pdftk and discover what the font names are, then look for them on FontMonster or other font service.
use some online font recognition service to get a close match with your font, in order to preserve kerning (I guess kerning and alignment are what's making your text unreadable)
try replacing the fonts manually (again pdftk to convert the PDF to a PDF which is editable with sed. This editing will break the PDF, but pdftk will then be able to recompress the damaged PDF to a useable one).
Here's what you really want - font substitution. You want some code/app to be able to go through the file and make appropriate changes to the embedded fonts.
This task is doable and is anywhere from easy to non-trivial. It's easy when you have a font that matches the metrics of the font in the file and the encoding used for the font is sane. You could probably do this with iText or DotPdf (the latter is not free beyond the evaluation, and is my company's product). If you modified pdf2ps, you could probably manage changing the fonts on the way through too.
If the fonts used in the file are font subsets that have creative reencoding, then you are in hell and will likely have all manner of pain doing the change. Here's why:
PostScript was designed at a point when there was no Unicode. Adobe used a single byte for characters and whenever you rendered any string, the glyph to draw was taken from a 256 entry table called the encoding vector. If a standard encoding didn't have what you wanted, you were encouraged to make fonts on the fly based on the standard font that differed only in encoding.
When Adobe created Acrobat, they wanted to make transition from PostScript as easy as possible so that font mechanism was modeled. When the ability to embed fonts into PDFs was added, it was clear that this would bloat the files, so PDF also included the ability to have font subsets. Font subsets are made by taking an existing font and removing all the glyphs that won't be used and re-encoding it into the PDF. The may be no standard relationship between the encoding vector and the code points in the file - all those may be changed. Instead, there may be an embedded PostScript function /ToUnicode which will translate encoded characters to a Unicode representation.
So yeah, non-trivial.
For the folks who come after me:
The best solutions I found were to use Evince to print as SVG, or to use the pdf2svg program that's accessible via Synaptic on Mint. However, Inkscape wasn't able to cope with the resulting SVGs--it entered an infinite loop with the error message:
File display/nr-arena-item.cpp line 323 (?): Assertion item->state & NR_ARENA_ITEM_STATE_BBOX failed
I'm giving up this quest for now, but maybe I'll try again in a year or two. In the meantime, maybe one of these solutions will work for you.