How to convert a Jupyter notebook to PDF via LaTeX?

I am trying to convert a Jupyter notebook to PDF via LaTeX using nbconvert, so that citations to articles stored in a separate .bib file are included automatically. I followed the tutorial/example here, which is suggested in the nbconvert documentation, here.
I have the following files in the same directory in which I am running the Jupyter notebook:
citations.tplx (the template to be used to include the bibliography)
references.bib (a .bib file containing the citations, taken from Google Scholar)
Inside the markdown cells, I use the following syntax to cite a work:
<cite data-cite="cortez2009modeling">(Cortez, 2009)</cite>
where the corresponding entry in the .bib file is:
@article{cortez2009modeling,
title={Modeling wine preferences by data mining from physicochemical properties},
author={Cortez, Paulo and Cerdeira, Ant{\'o}nio and Almeida, Fernando and Matos, Telmo and Reis, Jos{\'e}},
journal={Decision support systems},
volume={47},
number={4},
pages={547--553},
year={2009},
publisher={Elsevier}
}
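A citation that comes out as '?' in the PDF means LaTeX had no bibliography entry for that key when it compiled, either because BibTeX did not run or because a key does not match. Note that BibTeX only recognizes entries that start with @ (as in @article{...}). To rule out a key mismatch, a small standard-library check can compare the data-cite keys in the notebook's markdown cells with the entry keys in the .bib file. This is a sketch: the function names are hypothetical, and the regular expressions are simplifications that assume keys contain no commas or whitespace.

```python
import json
import re

# <cite data-cite="cortez2009modeling">(Cortez, 2009)</cite> -> cortez2009modeling
CITE_RE = re.compile(r'data-cite="([^"]+)"')
# @article{cortez2009modeling, -> cortez2009modeling (entries must start with @)
BIB_KEY_RE = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")


def notebook_markdown(path):
    """Concatenate the source of all markdown cells in an .ipynb file."""
    with open(path, encoding="utf-8") as fh:
        nb = json.load(fh)
    parts = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "markdown":
            src = cell.get("source", "")
            # nbformat stores a cell's source as one string or a list of lines
            parts.append("".join(src) if isinstance(src, list) else src)
    return "\n".join(parts)


def missing_citations(markdown_text, bib_text):
    """Return the cited keys that have no matching entry in the .bib text."""
    cited = set(CITE_RE.findall(markdown_text))
    defined = set(BIB_KEY_RE.findall(bib_text))
    return sorted(cited - defined)
```

Usage with the file names from the question: read references.bib into a string and call missing_citations(notebook_markdown("my_notebook.ipynb"), bib_text); an empty list means every cited key has an entry.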
In a new notebook, saved in the same location, I run the following command, also taken from the tutorial mentioned above:
%%bash
jupyter nbconvert --to latex --template citations.tplx --post pdf my_notebook.ipynb
I get a very long output full of warnings, but the essential error is:
ModuleNotFoundError: No module named 'pdf'
I also tried following other tutorials on the web. Even when a PDF was generated (using a slightly different nbconvert command), my citations were not resolved in the text (a question mark appeared instead) and there was no bibliography at the end of the document. A warning said there were 'problems' with BibTeX, but nothing more.
Here is the complete output of the command above:
Traceback (most recent call last):
File "/opt/anaconda3/bin/jupyter-nbconvert", line 11, in <module>
sys.exit(main())
File "/opt/anaconda3/lib/python3.8/site-packages/jupyter_core/application.py", line 254, in launch_instance
return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
File "/opt/anaconda3/lib/python3.8/site-packages/traitlets/config/application.py", line 844, in launch_instance
app.initialize(argv)
File "/opt/anaconda3/lib/python3.8/site-packages/traitlets/config/application.py", line 87, in inner
return method(app, *args, **kwargs)
File "/opt/anaconda3/lib/python3.8/site-packages/nbconvert/nbconvertapp.py", line 290, in initialize
super().initialize(argv)
File "/opt/anaconda3/lib/python3.8/site-packages/traitlets/config/application.py", line 87, in inner
return method(app, *args, **kwargs)
File "/opt/anaconda3/lib/python3.8/site-packages/jupyter_core/application.py", line 225, in initialize
self.parse_command_line(argv)
File "/opt/anaconda3/lib/python3.8/site-packages/traitlets/config/application.py", line 87, in inner
return method(app, *args, **kwargs)
File "/opt/anaconda3/lib/python3.8/site-packages/traitlets/config/application.py", line 713, in parse_command_line
self.update_config(self.cli_config)
File "/opt/anaconda3/lib/python3.8/site-packages/traitlets/config/configurable.py", line 220, in update_config
self._load_config(config)
File "/opt/anaconda3/lib/python3.8/site-packages/traitlets/config/configurable.py", line 190, in _load_config
warn(msg)
File "/opt/anaconda3/lib/python3.8/contextlib.py", line 120, in __exit__
next(self.gen)
File "/opt/anaconda3/lib/python3.8/site-packages/traitlets/traitlets.py", line 1214, in hold_trait_notifications
self.notify_change(change)
File "/opt/anaconda3/lib/python3.8/site-packages/traitlets/traitlets.py", line 1227, in notify_change
return self._notify_observers(change)
File "/opt/anaconda3/lib/python3.8/site-packages/traitlets/traitlets.py", line 1264, in _notify_observers
c(event)
File "/opt/anaconda3/lib/python3.8/site-packages/nbconvert/nbconvertapp.py", line 265, in _postprocessor_class_changed
self.postprocessor_factory = import_item(new)
File "/opt/anaconda3/lib/python3.8/site-packages/traitlets/utils/importstring.py", line 38, in import_item
return __import__(parts[0])
ModuleNotFoundError: No module named 'pdf'
---------------------------------------------------------------------------
CalledProcessError Traceback (most recent call last)
<ipython-input-22-c5829f9d50d0> in <module>
----> 1 get_ipython().run_cell_magic('bash', '', 'jupyter nbconvert --to latex --template citations.tplx --post pdf Orlando_Taddeo_CW.ipynb\n')
/opt/anaconda3/envs/tf/lib/python3.7/site-packages/IPython/core/interactiveshell.py in run_cell_magic(self, magic_name, line, cell)
2397 with self.builtin_trap:
2398 args = (magic_arg_s, cell)
-> 2399 result = fn(*args, **kwargs)
2400 return result
2401
/opt/anaconda3/envs/tf/lib/python3.7/site-packages/IPython/core/magics/script.py in named_script_magic(line, cell)
140 else:
141 line = script
--> 142 return self.shebang(line, cell)
143
144 # write a basic docstring:
/opt/anaconda3/envs/tf/lib/python3.7/site-packages/decorator.py in fun(*args, **kw)
230 if not kwsyntax:
231 args, kw = fix(args, kw, sig)
--> 232 return caller(func, *(extras + args), **kw)
233 fun.__name__ = func.__name__
234 fun.__doc__ = func.__doc__
/opt/anaconda3/envs/tf/lib/python3.7/site-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
185 # but it's overkill for just that one bit of state.
186 def magic_deco(arg):
--> 187 call = lambda f, *a, **k: f(*a, **k)
188
189 if callable(arg):
/opt/anaconda3/envs/tf/lib/python3.7/site-packages/IPython/core/magics/script.py in shebang(self, line, cell)
243 sys.stderr.flush()
244 if args.raise_error and p.returncode!=0:
--> 245 raise CalledProcessError(p.returncode, cell, output=out, stderr=err)
246
247 def _run_script(self, p, cell, to_close):
CalledProcessError: Command 'b'jupyter nbconvert --to latex --template citations.tplx --post pdf Orlando_Taddeo_CW.ipynb\n'' returned non-zero exit status 1.
Could anyone please shed some light on this? I really can't figure out why it does not work.
Thank you very much in advance.
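The traceback shows the value of --post ending up in _postprocessor_class_changed and then import_item('pdf'), so this nbconvert treats --post as the import path of a postprocessor class, not as an output format; the --post PDF style appears to come from much older releases. A minimal sketch of the alternative, assuming an nbconvert version (5.x) in which --to pdf writes the .tex file and runs LaTeX and BibTeX itself, and in which --template still accepts a .tplx file. nbconvert 6 reworked templates and passes a single template file with --template-file instead, so check jupyter nbconvert --help-all for your version. The helper names are hypothetical.

```python
import subprocess


def nbconvert_pdf_cmd(notebook, template=None):
    """Return argv for a direct notebook -> PDF conversion.

    Assumption: '--to pdf' makes nbconvert run LaTeX (and BibTeX) itself,
    so no separate '--post' step is needed.
    """
    cmd = ["jupyter", "nbconvert", "--to", "pdf"]
    if template is not None:
        cmd += ["--template", template]
    cmd.append(notebook)
    return cmd


def convert(notebook, template=None):
    # check=True raises CalledProcessError if nbconvert exits non-zero
    subprocess.run(nbconvert_pdf_cmd(notebook, template), check=True)


if __name__ == "__main__":
    # Only prints the command; call convert() to actually run it.
    print(" ".join(nbconvert_pdf_cmd("my_notebook.ipynb", "citations.tplx")))
```

From a %%bash cell this corresponds to jupyter nbconvert --to pdf --template citations.tplx my_notebook.ipynb. If BibTeX still reports problems, one possible cause worth checking is that the LaTeX run happens in a temporary directory where references.bib is not visible; an absolute path in \bibliography{...} or the BIBINPUTS environment variable are common workarounds.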

Related

Multiple Callbacks and 'TypeError'?

I am trying to run a Python program that generates visuals from an audio file. I'm a bit of a beginner here, so I've just been reverse-engineering issues and incompatibilities that have come up along the way.
Now I am faced with an error. The program runs successfully, but when it attempts to write/save the output video file, it gives me what look like multiple tracebacks, with a 'TypeError' at the very end:
Traceback (most recent call last):
File "visualize.py", line 400, in <module>
clip.write_videofile(outname,audio_codec='aac')
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/decorator.py", line 232, in fun
return caller(func, *(extras + args), **kw)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/moviepy/decorators.py", line 54, in requires_duration
return f(clip, *a, **k)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/decorator.py", line 232, in fun
return caller(func, *(extras + args), **kw)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/moviepy/decorators.py", line 135, in use_clip_fps_by_default
return f(clip, *new_a, **new_kw)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/decorator.py", line 232, in fun
return caller(func, *(extras + args), **kw)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/moviepy/decorators.py", line 22, in convert_masks_to_RGB
return f(clip, *a, **k)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/moviepy/video/VideoClip.py", line 307, in write_videofile
logger=logger)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/moviepy/video/io/ffmpeg_writer.py", line 216, in ffmpeg_write_video
ffmpeg_params=ffmpeg_params) as writer:
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/moviepy/video/io/ffmpeg_writer.py", line 88, in __init__
'-r', '%0.02f' % fps,
TypeError: must be real number, not NoneType
Is each of the listed files (and the lines within them) giving the same error, or does the TypeError at the end only apply to the most recent (lowest) file reference?
I would love to figure out how to resolve this. It seems that somewhere in these files a number is being referenced incorrectly.
Thanks so much for any help!!
I opened the most recent (lowest) file, 'ffmpeg_writer.py', and went to the referenced line. '%.02f' looked to me like it might be formatted incorrectly, so I tried adding a 0 before the decimal point, but I got the same error.
Not sure where to look...
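The traceback is a single error, not several: each File line is one frame of the call chain from clip.write_videofile(...) down to ffmpeg_writer.py, and the TypeError was raised in the last frame. The format string itself is fine; the message says fps was None when '%0.02f' % fps was evaluated. Since the chain passes through moviepy's use_clip_fps_by_default decorator, this suggests the clip has no fps set and none was passed in. The snippet below only reproduces the failing expression in plain Python; the function names are hypothetical.

```python
def ffmpeg_rate_arg(fps):
    # Same formatting as the '-r', '%0.02f' % fps argument in the traceback
    return ["-r", "%0.02f" % fps]


def describe(fps):
    """Return the ffmpeg rate argument, or the TypeError message it raises."""
    try:
        return " ".join(ffmpeg_rate_arg(fps))
    except TypeError as exc:
        return "TypeError: %s" % exc


if __name__ == "__main__":
    print(describe(None))  # raises internally, so the TypeError text is printed
    print(describe(24))    # -r 24.00
```

Assuming that diagnosis, passing a frame rate explicitly, for example clip.write_videofile(outname, fps=24, audio_codec='aac'), or setting clip.fps before writing should avoid the None; 24 is only a placeholder value.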

Unable to use latex in python plots - RuntimeError: LaTeX was not able to process the following string: b'lp'

I want to use LaTeX for the labels of a figure, but I get the error
RuntimeError: LaTeX was not able to process the following string:
b'lp'
Here is the full report generated by LaTeX:
and nothing more (no report is shown).
Edit: here is the full error:
Traceback (most recent call last):
File "C:\Users\Vincent\Anaconda3\lib\site-packages\matplotlib\backends\backend_qt5agg.py", line 197, in __draw_idle_agg
FigureCanvasAgg.draw(self)
File "C:\Users\Vincent\Anaconda3\lib\site-packages\matplotlib\backends\backend_agg.py", line 464, in draw
self.figure.draw(self.renderer)
File "C:\Users\Vincent\Anaconda3\lib\site-packages\matplotlib\artist.py", line 63, in draw_wrapper
draw(artist, renderer, *args, **kwargs)
File "C:\Users\Vincent\Anaconda3\lib\site-packages\matplotlib\figure.py", line 1144, in draw
renderer, self, dsu, self.suppressComposite)
File "C:\Users\Vincent\Anaconda3\lib\site-packages\matplotlib\image.py", line 139, in _draw_list_compositing_images
a.draw(renderer)
File "C:\Users\Vincent\Anaconda3\lib\site-packages\matplotlib\artist.py", line 63, in draw_wrapper
draw(artist, renderer, *args, **kwargs)
File "C:\Users\Vincent\Anaconda3\lib\site-packages\matplotlib\axes\_base.py", line 2426, in draw
mimage._draw_list_compositing_images(renderer, self, dsu)
File "C:\Users\Vincent\Anaconda3\lib\site-packages\matplotlib\image.py", line 139, in _draw_list_compositing_images
a.draw(renderer)
File "C:\Users\Vincent\Anaconda3\lib\site-packages\matplotlib\artist.py", line 63, in draw_wrapper
draw(artist, renderer, *args, **kwargs)
File "C:\Users\Vincent\Anaconda3\lib\site-packages\matplotlib\axis.py", line 1138, in draw
renderer)
File "C:\Users\Vincent\Anaconda3\lib\site-packages\matplotlib\axis.py", line 1078, in _get_tick_bboxes
extent = tick.label1.get_window_extent(renderer)
File "C:\Users\Vincent\Anaconda3\lib\site-packages\matplotlib\text.py", line 967, in get_window_extent
bbox, info, descent = self._get_layout(self._renderer)
File "C:\Users\Vincent\Anaconda3\lib\site-packages\matplotlib\text.py", line 353, in _get_layout
ismath=False)
File "C:\Users\Vincent\Anaconda3\lib\site-packages\matplotlib\backends\backend_agg.py", line 230, in get_text_width_height_descent
renderer=self)
File "C:\Users\Vincent\Anaconda3\lib\site-packages\matplotlib\texmanager.py", line 676, in get_text_width_height_descent
dvifile = self.make_dvi(tex, fontsize)
File "C:\Users\Vincent\Anaconda3\lib\site-packages\matplotlib\texmanager.py", line 423, in make_dvi
report))
RuntimeError: LaTeX was not able to process the following string:
b'lp'
Here is the full report generated by LaTeX:
Here is my code:
import numpy as np
from matplotlib import rc
import matplotlib.pyplot as plt
plt.close('all')
rc('text', usetex=True)  # route all text rendering through LaTeX
mu = np.linspace(0, 10, 100)
eta = mu**2
fig, ax = plt.subplots()
ax.plot(mu, eta, label=r'$\eta (\mu)$')
ax.set_title('Test')
ax.legend()
plt.show()  # display the figure when run as a plain script
I installed MiKTeX and added it to the PATH environment variable according to https://docs.alfresco.com/4.2/tasks/fot-addpath.html.
I checked the MiKTeX packages and found "miktex-dvipng-bin-x64-2.9" in the "\MiKTeX executables" category, so I assume dvipng is installed.
I downloaded Ghostscript, which I also added to the environment variables.
I compiled a LaTeX document to PDF using TeXworks and it worked fine, so I assume LaTeX is properly installed.
According to https://matplotlib.org/1.4.1/users/usetex.html, this is all I should need...
I tried deleting my .matplotlib/tex.cache directory.
I tried writing
import matplotlib as mpl
mpl.rcParams['text.usetex']=True
instead of
from matplotlib import rc
rc('text', usetex = True)
but it yielded the same result.
Switching
rc('text', usetex = True)
to
rc('text', usetex = False)
prevents the error, but then my labels are not rendered with LaTeX...
After a lot of googling, I'm out of ideas.
Could anyone help me please?
My configuration is:
- Python 3.6 (I run my code in spyder)
- MiKTeX 2.9
- Ghostscript 9.50
- Windows 10
- matplotlib 2.0.2 (added in an edit)
Edit: after upgrading to matplotlib 3.1.1, I get the following error:
Traceback (most recent call last):
File "C:\Users\Vincent\Anaconda3\lib\site-packages\matplotlib\texmanager.py", line 304, in _run_checked_subprocess
stderr=subprocess.STDOUT)
File "C:\Users\Vincent\Anaconda3\lib\subprocess.py", line 336, in check_output
**kwargs).stdout
File "C:\Users\Vincent\Anaconda3\lib\subprocess.py", line 403, in run
with Popen(*popenargs, **kwargs) as process:
File "C:\Users\Vincent\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 209, in __init__
super(SubprocessPopen, self).__init__(*args, **kwargs)
File "C:\Users\Vincent\Anaconda3\lib\subprocess.py", line 707, in __init__
restore_signals, start_new_session)
File "C:\Users\Vincent\Anaconda3\lib\subprocess.py", line 992, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\Vincent\Anaconda3\lib\site-packages\matplotlib\backends\backend_qt5.py", line 505, in _draw_idle
self.draw()
File "C:\Users\Vincent\Anaconda3\lib\site-packages\matplotlib\backends\backend_agg.py", line 388, in draw
self.figure.draw(self.renderer)
File "C:\Users\Vincent\Anaconda3\lib\site-packages\matplotlib\artist.py", line 38, in draw_wrapper
return draw(artist, renderer, *args, **kwargs)
File "C:\Users\Vincent\Anaconda3\lib\site-packages\matplotlib\figure.py", line 1709, in draw
renderer, self, artists, self.suppressComposite)
File "C:\Users\Vincent\Anaconda3\lib\site-packages\matplotlib\image.py", line 135, in _draw_list_compositing_images
a.draw(renderer)
File "C:\Users\Vincent\Anaconda3\lib\site-packages\matplotlib\artist.py", line 38, in draw_wrapper
return draw(artist, renderer, *args, **kwargs)
File "C:\Users\Vincent\Anaconda3\lib\site-packages\matplotlib\axes\_base.py", line 2607, in draw
self._update_title_position(renderer)
File "C:\Users\Vincent\Anaconda3\lib\site-packages\matplotlib\axes\_base.py", line 2556, in _update_title_position
if title.get_window_extent(renderer).ymin < top:
File "C:\Users\Vincent\Anaconda3\lib\site-packages\matplotlib\text.py", line 890, in get_window_extent
bbox, info, descent = self._get_layout(self._renderer)
File "C:\Users\Vincent\Anaconda3\lib\site-packages\matplotlib\text.py", line 291, in _get_layout
ismath="TeX" if self.get_usetex() else False)
File "C:\Users\Vincent\Anaconda3\lib\site-packages\matplotlib\backends\backend_agg.py", line 201, in get_text_width_height_descent
s, fontsize, renderer=self)
File "C:\Users\Vincent\Anaconda3\lib\site-packages\matplotlib\texmanager.py", line 448, in get_text_width_height_descent
dvifile = self.make_dvi(tex, fontsize)
File "C:\Users\Vincent\Anaconda3\lib\site-packages\matplotlib\texmanager.py", line 338, in make_dvi
texfile], tex)
File "C:\Users\Vincent\Anaconda3\lib\site-packages\matplotlib\texmanager.py", line 308, in _run_checked_subprocess
'found'.format(command[0])) from exc
RuntimeError: Failed to process string with tex because latex could not be found
For about the 5th time I uninstalled and reinstalled MiKTeX, and this time my problem is solved. When I ran my Python script, three windows opened one after another asking to install packages; they are listed below. After I accepted the installations, my script worked fine. When I had uninstalled and reinstalled MiKTeX before, I had also seen windows asking to install packages and accepted them, so I don't know why it worked this time and not before...
Anyway, my problem was solved by uninstalling and reinstalling MiKTeX, running my script, and accepting the package installations.
Window for installation of a file from type1cm
Window for installation of a file from iftex
Window for installation of a file from zhmetrics
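The matplotlib 3.1.1 traceback ends in a FileNotFoundError raised from subprocess while starting latex, meaning the Python process (here Spyder) could not find a latex executable on its PATH, even though TeXworks could compile a document. A quick way to see what that process actually sees is shutil.which. The sketch below looks up the external programs the matplotlib usetex guide lists (LaTeX, dvipng, Ghostscript); the candidate names, in particular gswin64c/gswin32c for Ghostscript on Windows, are assumptions to adjust for your install.

```python
import shutil

# Executables to look for (assumed names). latex and dvipng are used for
# on-screen (Agg) rendering with usetex; Ghostscript for PS/PDF output.
TOOLS = {
    "latex": ["latex"],
    "dvipng": ["dvipng"],
    "ghostscript": ["gs", "gswin64c", "gswin32c"],
}


def find_tools(tools=TOOLS):
    """Map each tool name to the first candidate found on PATH, or None."""
    found = {}
    for name, candidates in tools.items():
        found[name] = None
        for exe in candidates:
            path = shutil.which(exe)
            if path:
                found[name] = path
                break
    return found


if __name__ == "__main__":
    # Run this from the same interpreter (e.g. the Spyder console) that fails.
    for name, path in find_tools().items():
        print("%-12s %s" % (name, path or "NOT FOUND on PATH"))
```

A program that was already running when PATH was changed keeps the old value, so restarting Spyder after editing the environment variables is worth trying before reinstalling anything.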

spacy english model install is failing

Windows 10, Python 2.7 (32-bit), VC++ 32-bit.
Console running as administrator.
Installing the English model as instructed here fails.
I also tried German, and tried downloading and linking the model manually.
Something seems to be wrong with the spacy link command.
Does anyone know about this issue?
Traceback (most recent call last):
File "c:\python27\lib\runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "c:\python27\lib\runpy.py", line 72, in _run_code
exec code in run_globals
File "c:\python27\lib\site-packages\spacy\__main__.py", line 71, in <module>
plac.Interpreter.call(CLI)
File "c:\python27\lib\site-packages\plac_ext.py", line 1142, in call
print(out)
File "c:\python27\lib\site-packages\plac_ext.py", line 914, in __exit__
self.close(exctype, exc, tb)
File "c:\python27\lib\site-packages\plac_ext.py", line 952, in close
self._interpreter.throw(exctype, exc, tb)
File "c:\python27\lib\site-packages\plac_ext.py", line 964, in _make_interpreter
arglist = yield task
File "c:\python27\lib\site-packages\plac_ext.py", line 1139, in call
raise_(task.etype, task.exc, task.tb)
File "c:\python27\lib\site-packages\plac_ext.py", line 380, in _wrap
for value in genobj:
File "c:\python27\lib\site-packages\plac_ext.py", line 95, in gen_exc
raise_(etype, exc, tb)
File "c:\python27\lib\site-packages\plac_ext.py", line 966, in _make_interpreter
cmd, result = self.parser.consume(arglist)
File "c:\python27\lib\site-packages\plac_core.py", line 207, in consume
return cmd, self.func(*(args + varargs + extraopts), **kwargs)
File "c:\python27\lib\site-packages\spacy\__main__.py", line 45, in link
cli_link(origin, link_name, force)
File "c:\python27\lib\site-packages\spacy\cli\link.py", line 14, in link
symlink(origin, link_name, force)
File "c:\python27\lib\site-packages\spacy\cli\link.py", line 50, in symlink
link_path.symlink_to(model_path)
File "c:\python27\lib\site-packages\pathlib.py", line 1167, in symlink_to
self._accessor.symlink(target, self, target_is_directory)
TypeError: symlink() takes exactly 3 arguments (4 given)
I think it's a bug in pathlib, and has nothing to do with spacy.
You can work around it, but it's ugly.
Edit line 1167 of C:\Python27\lib\site-packages\pathlib.py and comment it out.
Re-run python -m spacy download en, then create the link manually as a directory junction:
cd C:\python27\lib\site-packages
mklink /j spacy\data\en en_core_web_sm\en_core_web_sm-1.2.0
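The junction step can also be scripted. A minimal sketch, written so it also runs on the Python 2.7 from the question: on Windows it calls mklink /J through cmd /c (mklink is a cmd.exe built-in, and a directory junction does not need symlink privileges); elsewhere it falls back to os.symlink. The helper names are hypothetical, and the paths in the usage comment are the version-specific ones from the answer.

```python
import os
import subprocess
import sys


def junction_cmd(link, target):
    # mklink is built into cmd.exe, so it has to be invoked via "cmd /c"
    return ["cmd", "/c", "mklink", "/J", link, target]


def link_model(link, target):
    """Create `link` pointing at the model directory `target`."""
    if sys.platform == "win32":
        # Directory junction: works without symlink privileges
        subprocess.check_call(junction_cmd(link, target))
    else:
        os.symlink(target, link)


# Usage, from C:\python27\lib\site-packages as in the answer:
# link_model(r"spacy\data\en", r"en_core_web_sm\en_core_web_sm-1.2.0")
```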

Writing to BigQuery from within a ParDo function

I would like to call a beam.io.Write(beam.io.BigQuerySink(..)) operation from within a ParDo function to generate a separate BigQuery table for each key in the PCollection (I'm using the Python SDK). Here are two similar threads, which unfortunately didn't help:
1) https://stackoverflow.com/questions/31156774/about-key-grouping-with-groupbykey
2) Dynamic table name when writing to BQ from dataflow pipelines
When I execute the following code, the rows for the first key get inserted into BigQuery and then the pipeline fails with the error below. I would really appreciate any suggestions on what I'm doing wrong or how to fix it.
Pipeline code:
rows = p | 'read_bq_table' >> beam.io.Read(beam.io.BigQuerySource(query=query))
class par_upload(beam.DoFn):
    def process(self, context):
        key, value = context.element
        ### This block causes issues ###
        value | 'write_to_bq' >> beam.io.Write(
            beam.io.BigQuerySink(
                'PROJECT-NAME:analytics.first_table',  # will be replaced by a dynamic name based on key
                schema=schema,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED
            )
        )
        ### End block ######
        return [value]
### Following part works fine ###
filtered = (rows | 'filter_rows' >> beam.Filter(lambda row: row['topic'] == 'analytics')
                 | 'apply_projection' >> beam.Map(apply_projection, projection_fields)
                 | 'group_by_key' >> beam.GroupByKey()
                 | 'par_upload_to_bigquery' >> beam.ParDo(par_upload())
                 | 'flat_map' >> beam.FlatMap(lambda l: l)  # this step is just for testing
            )
### This part works fine if I comment out the 'write_to_bq' block above
filtered | 'write_to_bq' >> beam.io.Write(
    beam.io.BigQuerySink(
        'PROJECT-NAME:analytics.another_table',
        schema=schema,
        write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE,
        create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED)
)
Error message:
INFO:oauth2client.client:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Attempting refresh to obtain initial access_token
INFO:root:Writing 1 rows to PROJECT-NAME:analytics.first_table table.
INFO:root:Final: Debug counters: {'element_counts': Counter({'CreatePInput0': 1, 'write_to_bq/native_write': 1})}
ERROR:root:Error while visiting par_upload_to_bigquery
Traceback (most recent call last):
File "split_events.py", line 137, in <module>
run()
File "split_events.py", line 132, in run
p.run()
File "/Users/dimitri/anaconda/lib/python2.7/site-packages/apache_beam/pipeline.py", line 159, in run
return self.runner.run(self)
File "/Users/dimitri/anaconda/lib/python2.7/site-packages/apache_beam/runners/direct_runner.py", line 102, in run
super(DirectPipelineRunner, self).run(pipeline)
File "/Users/dimitri/anaconda/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 98, in run
pipeline.visit(RunVisitor(self))
File "/Users/dimitri/anaconda/lib/python2.7/site-packages/apache_beam/pipeline.py", line 182, in visit
self._root_transform().visit(visitor, self, visited)
File "/Users/dimitri/anaconda/lib/python2.7/site-packages/apache_beam/pipeline.py", line 419, in visit
part.visit(visitor, pipeline, visited)
File "/Users/dimitri/anaconda/lib/python2.7/site-packages/apache_beam/pipeline.py", line 422, in visit
visitor.visit_transform(self)
File "/Users/dimitri/anaconda/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 93, in visit_transform
self.runner.run_transform(transform_node)
File "/Users/dimitri/anaconda/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 168, in run_transform
return m(transform_node)
File "/Users/dimitri/anaconda/lib/python2.7/site-packages/apache_beam/runners/direct_runner.py", line 98, in func_wrapper
func(self, pvalue, *args, **kwargs)
File "/Users/dimitri/anaconda/lib/python2.7/site-packages/apache_beam/runners/direct_runner.py", line 180, in run_ParDo
runner.process(v)
File "apache_beam/runners/common.py", line 133, in apache_beam.runners.common.DoFnRunner.process (apache_beam/runners/common.c:4483)
File "apache_beam/runners/common.py", line 139, in apache_beam.runners.common.DoFnRunner.process (apache_beam/runners/common.c:4311)
File "apache_beam/runners/common.py", line 150, in apache_beam.runners.common.DoFnRunner.reraise_augmented (apache_beam/runners/common.c:4677)
File "apache_beam/runners/common.py", line 137, in apache_beam.runners.common.DoFnRunner.process (apache_beam/runners/common.c:4245)
File "/Users/dimitri/anaconda/lib/python2.7/site-packages/apache_beam/typehints/typecheck.py", line 149, in process
return self.run(self.dofn.process, context, args, kwargs)
File "/Users/dimitri/anaconda/lib/python2.7/site-packages/apache_beam/typehints/typecheck.py", line 134, in run
result = method(context, *args, **kwargs)
File "split_events.py", line 73, in process
create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED
File "/Users/dimitri/anaconda/lib/python2.7/site-packages/apache_beam/transforms/ptransform.py", line 724, in __ror__
return self.transform.__ror__(pvalueish, self.label)
File "/Users/dimitri/anaconda/lib/python2.7/site-packages/apache_beam/transforms/ptransform.py", line 445, in __ror__
return _MaterializePValues(cache).visit(result)
File "/Users/dimitri/anaconda/lib/python2.7/site-packages/apache_beam/transforms/ptransform.py", line 105, in visit
return self._pvalue_cache.get_unwindowed_pvalue(node)
File "/Users/dimitri/anaconda/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 262, in get_unwindowed_pvalue
return [v.value for v in self.get_pvalue(pvalue)]
File "/Users/dimitri/anaconda/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 244, in get_pvalue
value_with_refcount = self._cache[self.key(pvalue)]
KeyError: "(4384177040, None) [while running 'par_upload_to_bigquery']"
Edit (after the first answer):
I didn't realise my value needs to be a PCollection.
I've changed my code to this now (which is probably very inefficient):
key_pipe = p | 'pipe_' + key >> beam.Create(value)
key_pipe | 'write_' + key >> beam.io.Write(beam.io.BigQuerySink(..))
This now works fine locally, but not with BlockingDataflowPipelineRunner :-(
The pipeline fails with the following error:
JOB_MESSAGE_ERROR: (979394c29490e588): Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 474, in do_work
work_executor.execute()
File "dataflow_worker/executor.py", line 901, in dataflow_worker.executor.MapTaskExecutor.execute (dataflow_worker/executor.c:24331)
op.start()
File "dataflow_worker/executor.py", line 465, in dataflow_worker.executor.DoOperation.start (dataflow_worker/executor.c:14193)
def start(self):
File "dataflow_worker/executor.py", line 469, in dataflow_worker.executor.DoOperation.start (dataflow_worker/executor.c:13499)
fn, args, kwargs, tags_and_types, window_fn = (
ValueError: too many values to unpack (expected 5)
In the similar threads, the only suggestion for doing BigQuery writes inside a ParDo was to use the BigQuery API directly, or a client library.
The code you wrote applies a write transform, beam.io.Write(beam.io.BigQuerySink(...)), inside a DoFn. Transforms like this are meant to be applied to a PCollection, such as filtered in the working code, while the pipeline is being constructed. In the non-working code they are applied to value, which inside process is just the grouped Python values for one key, not a PCollection.
I think the easiest option would be to look at the gcloud-python BigQuery function insert_data() and call it inside your ParDo.
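A different technique from the one in the answer, named plainly: in later Beam Python SDK releases, beam.io.WriteToBigQuery accepts a callable as its table argument and calls it per element to choose the destination table, which gives one table per key without writing from inside a DoFn (check the WriteToBigQuery documentation for your SDK version). The sketch below is only the routing function such a callable could be. The key field name, project and dataset are hypothetical placeholders, and restricting table names to letters, digits and underscores is a conservative choice.

```python
import re


def table_for_row(row, project="PROJECT-NAME", dataset="analytics", key_field="key"):
    """Return a 'project:dataset.table' spec for one row (a dict).

    Hypothetical naming scheme: one table per value of row[key_field].
    Any character other than a letter, digit or underscore becomes '_'.
    """
    name = re.sub(r"[^A-Za-z0-9_]", "_", str(row[key_field]))
    return "%s:%s.%s" % (project, dataset, name)
```

With such an SDK, usage would look roughly like rows | beam.io.WriteToBigQuery(table=table_for_row, schema=schema, write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND, create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED), applied to the projected rows before GroupByKey, since the sink writes individual rows and each row must carry its key field.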

matplotlib pgf: OSError: No such file or directory in subprocess.py

I am trying to use matplotlib to create a PGF file for LaTeX:
from matplotlib.pyplot import subplots
from numpy import linspace
x = linspace(0, 100, 30)
fig, ax = subplots(figsize = (10, 6))
ax.scatter(x, x)
fig.tight_layout()
fig.savefig('/home/mark/dicp/python/figure.pgf')
But I get OSError: [Errno 2] No such file or directory:
Traceback (most recent call last):
File "visualize/latex_figs.py", line 32, in <module>
fig.savefig('/home/mark/dicp/python/figure.pgf')
File "/usr/local/lib/python2.7/dist-packages/matplotlib/figure.py", line 1421, in savefig
self.canvas.print_figure(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/matplotlib/backend_bases.py", line 2220, in print_figure
**kwargs)
File "/usr/local/lib/python2.7/dist-packages/matplotlib/backend_bases.py", line 1957, in print_pgf
return pgf.print_pgf(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/matplotlib/backends/backend_pgf.py", line 818, in print_pgf
self._print_pgf_to_fh(fh, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/matplotlib/backends/backend_pgf.py", line 797, in _print_pgf_to_fh
RendererPgf(self.figure, fh),
File "/usr/local/lib/python2.7/dist-packages/matplotlib/backends/backend_pgf.py", line 409, in __init__
self.latexManager = LatexManagerFactory.get_latex_manager()
File "/usr/local/lib/python2.7/dist-packages/matplotlib/backends/backend_pgf.py", line 223, in get_latex_manager
new_inst = LatexManager()
File "/usr/local/lib/python2.7/dist-packages/matplotlib/backends/backend_pgf.py", line 305, in __init__
cwd=self.tmpdir)
File "/usr/lib/python2.7/subprocess.py", line 679, in __init__
errread, errwrite)
File "/usr/lib/python2.7/subprocess.py", line 1249, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
It does, however, write the beginning of the output file:
%% [whole bunch of comments]
\begingroup%
\makeatletter%
\begin{pgfpicture}%
\pgfpathrectangle{\pgfpointorigin}{\pgfqpoint{10.000000in}{6.000000in}}%
\pgfusepath{use as bounding box}%
I do not understand what an OSError: No such file or directory raised in subprocess.py has to do with this. The file I'm trying to save is writable. Am I misunderstanding something, or is this a bug I should report?
I also had this problem while running the example scripts. It occurs where backend_pgf.py first starts the default LaTeX command: the PGF backend uses xelatex by default (the pgf.texsystem setting), and if that program cannot be found, launching it fails with this OSError. If your problem is the same as mine, you have two options:
Add the key "pgf.texsystem": "pdflatex" (or "lualatex") to your matplotlib.rcParams. For example, add the following snippet to the top of your script:
import matplotlib
pgf_with_rc_fonts = {"pgf.texsystem": "pdflatex"}
matplotlib.rcParams.update(pgf_with_rc_fonts)
Make sure xelatex is installed and on your PATH, so the default LaTeX command works (on macOS or Linux, which xelatex should print a path).
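Both options come down to which TeX engine the Python process can actually launch. A small sketch that checks this from Python and picks the first available engine; it needs Python 3.3+ for shutil.which (not available on the Python 2.7 in the traceback, where which xelatex in a shell does the same check), and the helper name and candidate order are assumptions.

```python
import shutil


def pick_texsystem(candidates=("xelatex", "lualatex", "pdflatex")):
    """Return the first TeX engine found on PATH, or None if none is."""
    for exe in candidates:
        if shutil.which(exe):
            return exe
    return None


if __name__ == "__main__":
    engine = pick_texsystem()
    print(engine or "no TeX engine found on PATH")
```

If it returns an engine, matplotlib.rcParams.update({"pgf.texsystem": engine}) before savefig applies the first option; if it returns None, the PGF backend has nothing to launch, which would match the OSError from subprocess.py.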