I'm trying to follow some examples on Dask's website for Bags and DataFrames. The examples require dask.datasets, but as far as I can tell my installation doesn't include it: I get an error telling me that dask has no attribute datasets, and I don't see it in the dask folder.
I can't find help using Google or Dask's website. Where is datasets?
https://examples.dask.org/bag.html
https://examples.dask.org/dataframe.html
Thanks in advance.
dask.datasets was introduced in version 0.18.2, so check your version to make sure you have 0.18.2 or later. To install the latest release (as of today), run:
pip install dask==2.10.1
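A quick way to check whether an installed version is new enough is to compare it against 0.18.2, the version named above. This is only a minimal sketch: version_tuple and supports_datasets are hypothetical helper names, not part of dask, and the parsing is deliberately simple.

```python
def version_tuple(version):
    """Turn a string like '2.10.1' into (2, 10, 1), dropping suffixes such as 'rc1'."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def supports_datasets(version):
    """True if the version is at least 0.18.2 (the threshold quoted in the answer)."""
    return version_tuple(version) >= (0, 18, 2)


try:
    import dask
    # If this prints True, "import dask.datasets" should work,
    # e.g. dask.datasets.timeseries() as used in the linked examples.
    print(dask.__version__, supports_datasets(dask.__version__))
except ImportError:
    print("dask is not installed in this interpreter")
```

If the check prints False, upgrading with pip as shown above should make dask.datasets importable.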
I'm trying to use the vincent package to visualize my data (in pandas) in a Jupyter notebook, but I'm having trouble with my first attempt. Here is the code I used (copied from http://wrobstory.github.io/2013/10/mapping-data-python.html):
import vincent
import pandas
world_topo = r'world-countries.topo.json'
geo_data = [{'name': 'countries',
             'url': world_topo,
             'feature': 'world-countries'}]
vis = vincent.Map(geo_data=geo_data, scale=200)
vis.to_json('vega.json')
vis.display()
After I ran the code, nothing was displayed. I checked the type of vis:
vincent.charts.Map
I'm not sure how to proceed here. I'd appreciate any input on this problem.
I'm not sure how far along you are with this.
Assuming you just installed vincent with pip and ran the code in Python's IDLE, you might be missing two important steps:
AFAIK vincent only generates JSON, which is meant to be rendered with Vega in a Jupyter notebook.
To render with Vega you will need to install:
1) Jupyter and dependencies
2) Vega and dependencies
I was able to do so using these instructions.
Once Jupyter launched, a window opened in the browser. I had to choose 'Python3' under 'New' and put the code in the prompt on that page.
Alternatively, you can use this online Vega renderer. Please also see the Vega docs.
Note that vincent does not seem to be the latest technology for this purpose; its page points to Altair.
Also, I noticed that the JSON written to 'vega.json' by the code you posted, using the original data, does not render anywhere. That's a separate issue. It probably happens because the file uses an outdated format, but I am not sure.
I have limited experience with this technology but I was able to get graphs to render, specifically this, and it is also how it looked for me.
I know that this post is old but I found your error and I thought I would answer here to help future users of vincent as it has worked beautifully for me. I am working with the anaconda version of vincent and jupyter notebook.
First, you have to initialize vincent in your notebook
import vincent
vincent.core.initialize_notebook()
Your next problem is that your URL isn't actually pointing anywhere. For the world map topography you need:
world_topo="https://raw.githubusercontent.com/wrobstory/vincent_map_data/master/world-countries.topo.json"
A decent map printed out for me with those two changes.
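Before handing a topology source to vincent, it can help to check whether the string is a remote URL or a file that really exists next to the notebook. The sketch below uses only the standard library; describe_source is a hypothetical helper name and does not come from vincent.

```python
import os
from urllib.parse import urlparse

# The full URL suggested in the answer above.
WORLD_TOPO_URL = ("https://raw.githubusercontent.com/wrobstory/"
                  "vincent_map_data/master/world-countries.topo.json")


def describe_source(url_or_path):
    """Classify a topology source as 'remote', 'local', or 'missing'."""
    scheme = urlparse(url_or_path).scheme
    if scheme in ("http", "https"):
        return "remote"
    # Anything else is treated as a path relative to the working directory.
    return "local" if os.path.exists(url_or_path) else "missing"


print(describe_source(WORLD_TOPO_URL))
print(describe_source("world-countries.topo.json"))
```

If the second call reports 'missing', the bare file name from the question has nothing to load, which matches the blank output described there.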
The Python interpreter has been selected:
We can see that pyspark is available (installed via pip) and visible to that Python interpreter:
However, the Python interpreter does not recognize the pyspark package:
pyspark is the only package that seems to suffer from this issue: pandas, numpy, sklearn, etc. all work. So what is different about pyspark?
While the following is not really an answer to the original question, it is a middling - and only partial - workaround.
We need to add several environment variables to the Run configuration:
In particular, SPARK_HOME, PYSPARK_SUBMIT_ARGS (=pyspark-shell), and PYTHONPATH are needed.
Having to set these for every pyspark run configuration is inconvenient, but it works as a last resort.
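The same three variables can also be assembled in Python, which may be handy for checking the values before pasting them into a run configuration. This is a sketch under assumptions: spark_env is a hypothetical helper, /opt/spark is a placeholder install location, and the python/lib/py4j-*-src.zip layout is the usual one in a Spark distribution but should be verified locally. Note that Spark spells the variable PYSPARK_SUBMIT_ARGS.

```python
import glob
import os


def spark_env(spark_home):
    """Build the environment variables a pyspark run configuration needs."""
    python_dir = os.path.join(spark_home, "python")
    # The bundled py4j archive carries a version in its name, so glob for it.
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    return {
        "SPARK_HOME": spark_home,
        "PYSPARK_SUBMIT_ARGS": "pyspark-shell",
        "PYTHONPATH": os.pathsep.join([python_dir] + py4j_zips),
    }


# "/opt/spark" is a placeholder; substitute the real Spark directory.
for name, value in spark_env("/opt/spark").items():
    print(f"{name}={value}")
```

The printed name=value pairs can be copied into the environment variables field of the run configuration.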
I am new to Pandas, trying to learn the basics from lecture videos. In one of these the presenter demonstrates that one can call help on methods using ??.
For example, if I have loaded a dataframe df, then typing df.__getitem__?? should print the docstring as well as the source code to the console. This would be really great to have, but it doesn't work for me! I tried different variants of the command and also tried to find a comment online on this, without success.
What do I need to type in order to retrieve the docstring as well as the source code of a Pandas method? Thanks a lot for your help!
(I am using Python 3.5 and PyCharm in case that makes a difference)
I believe that your lecturer was using IPython, as it supports dynamic object information. For instance, when you run df.__getitem__?? in IPython you see the following:
I strongly recommend IPython for interactive Python development. You'll find a lot of devs using it for data exploration and analysis, and the notebook is really useful for saving your commands and their output.
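Outside IPython, the standard inspect module gives roughly the same information as ??. The sketch below uses json.dumps so that it runs without pandas; doc_and_source is a hypothetical helper name, and passing pandas.DataFrame.__getitem__ instead should work the same way when pandas is installed.

```python
import inspect
import json


def doc_and_source(obj):
    """Return (docstring, source code) for obj, using '' when either is unavailable."""
    doc = inspect.getdoc(obj) or ""
    try:
        source = inspect.getsource(obj)
    except (OSError, TypeError):
        # Built-ins and C extensions have no Python source to show.
        source = ""
    return doc, source


doc, source = doc_and_source(json.dumps)
print(doc.splitlines()[0] if doc else "<no docstring>")
print(source.splitlines()[0] if source else "<source unavailable>")
```

This works in a plain Python console, including the one in PyCharm, although the output is less nicely formatted than what IPython prints.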
I am increasingly irritated and frustrated by the TensorFlow documentation. When I search Google for documentation on
tf.reshape
I get directed to a generic page like here. I want to see the details of tf.reshape, not the entirety of the documentation.
Am I doing something wrong here?
Do not Google the TensorFlow documentation; use the TensorFlow Python reference documentation and Ctrl+F.
Probably the fastest way to use the TensorFlow documentation is:
http://devdocs.io/tensorflow~python/
Just type tf.reshape and you are done. It can also be used offline, and it updates the docs automatically.
Edit: even typing only res shows you the documentation.
Update for posterity:
With the new TensorFlow, the website is now indexed by Google, and it should also soon be indexed by other search engines.
I would suggest you use the GitHub repo as your documentation instead. https://github.com/tensorflow/tensorflow/tree/master/tensorflow/g3doc/api_docs/python/functions_and_classes
For example tf.reshape is in a single Markdown file https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/api_docs/python/functions_and_classes/shard0/tf.reshape.md
To search for the document you want, you could use the GitHub search under that functions_and_classes folder.
For example, the query
tf.reshape() path:tensorflow/g3doc/api_docs/python/functions_and_classes language:Markdown
https://github.com/tensorflow/tensorflow/search?utf8=✓&q=tf.reshape%28%29+path%3Atensorflow%2Fg3doc%2Fapi_docs%2Fpython%2Ffunctions_and_classes+language%3AMarkdown&type=Code
searches for tf.reshape() under the documentation folder.
I use the non-official Dash/Zeal docset for TensorFlow:
https://github.com/ppwwyyxx/dash-docset-tensorflow
It is a very convenient way of browsing the TensorFlow documentation offline and it solves the problem you are describing.
Is this what you are looking for? Using the search functionality of the browser helped me find it.
I assume that you have installed TensorFlow on your computer and that you know the name of the function you want to use.
If you use a Python IDE, you can jump directly to the declaration or definition of that function and read its usage and explanation there. It is the same documentation as online (although for some functions it is not very clear).
You can take the base URL of the TensorFlow documentation and append what you want to look up.
The base URL is:
https://www.tensorflow.org/api_docs/python/tf/
Append whatever you want to search for after the final /.
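The URL trick above can be wrapped in a small function. This is a sketch: tf_doc_url is a hypothetical helper, and it assumes that nested names such as tf.nn.relu map to nested paths under the same base URL.

```python
BASE_URL = "https://www.tensorflow.org/api_docs/python/tf/"


def tf_doc_url(name):
    """Map a dotted name like 'tf.reshape' to its page under the base URL."""
    parts = name.split(".")
    # The base URL already ends in 'tf/', so drop a leading 'tf'.
    if parts and parts[0] == "tf":
        parts = parts[1:]
    return BASE_URL + "/".join(parts)


print(tf_doc_url("tf.reshape"))
```

Pasting the printed URL into the browser skips the search step entirely.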
Since TensorFlow r1.1, a Google search for items like 'tf.shape' lists the appropriate page at the top of the search results.
This didn't work back in r0.10 and r0.11, maybe because there were many Markdown formatting issues in the TensorFlow docs themselves.
Since TensorFlow is still developing, the best way is to go through the TensorFlow API itself. It also helps to follow the slides at http://web.stanford.edu/class/cs20si/
My question is: How can I use my up to date pandas installation to build the pandas documentation?
I read pandas/doc/README.rst, which says to navigate to the pandas/doc directory and run python make.py html. Unfortunately, this does not work for me, as it seems to require having first done an in-place build of pandas. My (Windows) computer does not have the prerequisites for a development build, and that seems like an unnecessary burden when I just want to add some notes that improve the documentation.
The background for why I am asking this question is that earlier this week I posted a SO question about pandas hdf5 output. In discussing the answer with Jeff, he encouraged me to add some commentary to the documentation. So I forked the pandas repository and began to think about how I would add to the documentation. I am not interested in setting up a development build of the complete pandas installation. I would like to be able to modify the documentation, build it, and see my changes before submitting a pull request. Is there a reasonable way to do this?
Thanks!
Here are the pandas contributing guidelines & howto's.