Latest (master/snapshot) Spark documentation - either online or run locally [closed]

Is there a location where the latest Spark documentation has been built and is available?
For example, the 1.3.0 release candidate branch was cut five days ago, but it is not available from the Apache site - the newest there is the already-in-production 1.2.0.
Even better would be the output of an AMPLab Jenkins build. But maybe someone just publishes it regularly to a publicly accessible location?
Alternatively, what is the procedure to generate HTML from the Spark markdown sources? I can easily put up a local webserver to serve them.

Nightly publication of documentation snapshots and build artifacts is on the Spark roadmap; see https://issues.apache.org/jira/browse/SPARK-1517.

For the SECOND part of my question - how to generate the docs locally - the docs/README.md does have instructions.
[Screenshots: the Jekyll build output, and the generated site served at localhost:4000 showing Spark version 1.3.0, which is not released yet.]
The instructions are copied here:
The markdown code can be compiled to HTML using the Jekyll tool. Jekyll and a few dependencies must be installed for this to work. We recommend installing via the Ruby Gem dependency manager. Since the exact HTML output varies between versions of Jekyll and its dependencies, we list specific versions here in some cases:
$ sudo gem install jekyll
$ sudo gem install jekyll-redirect-from
Execute jekyll from the docs/ directory. Compiling the site with Jekyll will create a directory called _site containing index.html as well as the rest of the compiled files.
You can modify the default Jekyll build as follows:
# Skip generating API docs (which takes a while)
$ SKIP_API=1 jekyll build
# Serve content locally on port 4000
$ jekyll serve --watch
# Build the site with extra features used on the live page
$ PRODUCTION=1 jekyll build
Pygments
We also use pygments (http://pygments.org) for syntax highlighting in documentation markdown pages, so you will also need to install that (it requires Python) by running sudo pip install Pygments.
To mark a block of code in your markdown to be syntax highlighted by jekyll during the compile phase, use the following syntax:
{% highlight scala %}
// Your scala code goes here, you can replace scala with many other
// supported languages too.
{% endhighlight %}
Sphinx
We use Sphinx to generate Python API docs, so you will need to install it by running sudo pip install sphinx.
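Putting the pieces together, a minimal end-to-end run might look like this (a sketch based on the README excerpt above, assuming Ruby and Python are already installed and you start from a Spark source checkout):
$ sudo gem install jekyll jekyll-redirect-from
$ sudo pip install Pygments sphinx
$ cd docs
# Skip the API docs for a quick build
$ SKIP_API=1 jekyll build
# Or serve on http://localhost:4000 while editing
$ jekyll serve --watch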

Related

Possible to publish docusaurus cross-repo from multiple-repositories, to aggregate docs together?

My question: are there any Docusaurus features out of the box (beyond https://github.com/facebook/docusaurus/pull/764) that will make the following easier? (I've asked this here because their GitHub issue template tells me issues of that type will be closed, and to ask over here instead.)
In my company we have several different repositories containing documentation in markdown and also markdown generated from source-code documentation from a variety of different coding languages.
I would like to explore using docusaurus to define a central site, but pull in documentation from a number of different repositories.
I'd like to do that:
to get a centralised search index
to aid discoverability
to get a centrally owned consistent theme/UX
to publish onwards into Confluence so that non-technical users can find and browse content, if that becomes the company policy ( :( )
to retain all the advantages of docs-close-to-code
This is the structure that docusaurus expects:
docs/ # all documentation should be placed here
website/
  blog/
  build/ # on yarn run build
  core/
    Footer.js
  package.json
  pages/
  sidebars.json
  siteConfig.js
  static/
and this is the structure of published website that I'd like to end up with:
/v1/products/{product}/{version}/{language}/{content as from docs/}
# e.g.
/v1/products/spanner/{version}/en-GB/readme.html
/v1/internal/{gh-org}/{gh-repo}/{version}/{language}/{content as from docs/}
# e.g.
/v1/internal/my-org/my-repo/{version}/en-GB/readme.html
/v1/internal/my-org/my-repo/{version}/en-GB/proto-generated.html
(v1 is there because I predict I'll have forgotten something, and it lets me hedge against that and make later breaking-change redirects easier)
and I think therefore this is the intermediate structure I'll need to aggregate things into:
docs/
  product/
    language/
      prose|generated-lang
  gh-org/
    repo/
      language/
        prose|generated-lang
website/
  blog/
    product/
      language/
        prose|generated-lang
    gh-org/
      repo/
        language/
          prose|generated-lang
  core/
    Footer.js
  package.json
  pages/
    product/
      language/
        prose|generated-lang
    gh-org/
      repo/
        language/
          prose|generated-lang
  sidebars.json
  siteConfig.js
  static/
    product/
      language/
        prose|generated-lang
    gh-org/
      repo/
        language/
          prose|generated-lang
... does that hang together?
I can git clone via bash or submodules quite readily to arrange this; that's not particularly an issue. I want to know if there are things that exist already that will allow me to avoid needing to do that - e.g. native features of docs-site tools, bazel rules, whatever.
If you don't require a single-page app and don't need React (Docusaurus mentions this here), you can accomplish this using MkDocs as your static site generator and the multirepo plugin. Below are the steps to get it all set up. I assume you have Python installed and that you created a Python venv.
python -m pip install git+https://github.com/jdoiro3/mkdocs-multirepo-plugin
mkdocs new my-project
cd my-project
Add the below to your newly created mkdocs.yml. This will configure the plugin.
plugins:
  - multirepo:
      repos:
        - section: Repo1
          import_url: {Repo1 url}
        - section: Repo2
          import_url: {Repo2 url}
        - section: Repo3
          import_url: {Repo3 url}
Now, you can run mkdocs serve or mkdocs build, which will build a static site with all the documentation in one place.
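For example (standard MkDocs commands; by default the built site is written to ./site/):
$ mkdocs serve   # preview the aggregated site locally
$ mkdocs build   # write the static site to ./site/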
This will:
get a centralised search index to aid discoverability
get a centrally owned consistent theme/UX (I suggest using Material for MkDocs)
retain all the advantages of docs-close-to-code
A similar plugin could probably be written for docusaurus.
You can use a script to pull those md files, put them in the right location, and then build Docusaurus. You can do this automatically with GitHub Actions upon any change to one of your source repos.
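As a rough sketch of such a script (the org, repo names, and paths here are hypothetical placeholders):
#!/usr/bin/env bash
# Pull docs from each source repo into the Docusaurus docs/ tree, then build.
set -e
for repo in repo-a repo-b; do
  git clone --depth 1 "https://github.com/my-org/$repo.git" "/tmp/$repo"
  mkdir -p "docs/$repo"
  cp -R "/tmp/$repo/docs/." "docs/$repo/"
done
(cd website && yarn run build)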
Docusaurus has multi-instance support through the @docusaurus/plugin-content-docs plugin. Read more about it here: https://docusaurus.io/docs/docs-multi-instance.

How can I add an external, third-party dependency to a perl6 project?

Either I've missed it or there's no clear information about that topic.
Where should I look for Perl 6 libraries? CPAN.org? Or only http://modules.perl6.org?
When I've chosen one, how can I add it to my Perl 6 project?
If I find it on GitHub, how can I add it to my Perl 6 project?
Please make sure to read @smonff's answer as well for responses to questions 2 and 3.
Where should I look for perl 6 libraries?
modules.perl6.org.
When I've chosen one, how can I add it to my perl 6 project?
Use zef to install it on your local system.
Read the modules doc page for directions on using a module in your project.
If I find it [somewhere], how can I add it to my perl 6 project?
If zef can see it (and zef will usually be able to see a module if its repo is listed at modules.perl6.org) then zef should be able to install it. If not, contact the author or ask about it on #perl6.
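For a quick illustration (JSON::Fast is just an arbitrary module picked for the example):
$ zef search JSON::Fast    # find the module
$ zef install JSON::Fast   # install it on your local system
# then use it from your code:
$ perl6 -e 'use JSON::Fast; say to-json({ answer => 42 });'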
As an answer to points 2) and 3), you can take a look at 6pm. Its idea is to be NPM for Perl 6. It could also be compared to Perl 5's Carton. 6pm works over Zef.
$ 6pm init
# Install dependencies to ./perl6-modules and add it to META6.json
$ 6pm install Test::Meta --save
# Run a file using the local dependencies
$ 6pm exec-file test.p6
# Make your code always use 6pm by making it "use SixPM;"
$ perl6 test.p6
See the full documentation for more information.

Sphinx PDF output using latexpdf

I am trying to build Sphinx doc output as PDF rather than HTML. I can only use the tools which come with Sphinx, i.e. I cannot download additional tools like rst2pdf. I have tried 'make latexpdf', per the Sphinx documentation, which states it will produce a PDF in addition to the .tex files. However, I am only getting .tex files. What am I missing?
Sphinx uses LaTeX to export the documentation as a PDF file. Thus, the basic LaTeX dependencies needed to produce a PDF must be present on the system.
For example, on a system running Ubuntu 16.04, they can be downloaded and installed by:
apt-get install texlive-latex-recommended texlive-latex-extra texlive-fonts-recommended
If running Sphinx 1.6 or above on GNU/Linux or OSX, you may also need the latexmk package.
Reference: sphinx.builders.latex.LaTeXBuilder documentation.
After installing the above packages, running make latexpdf in the Sphinx project directory generates the documentation output as the PDF file ./_build/latex/<sphinx-project-name>.pdf
Note: if you still do not see a PDF file after running make latexpdf, check the output of the command for errors about missing LaTeX tools/files. Use the system package manager to identify and install the missing packages.
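Putting it together, a typical sequence on Ubuntu might look like this (a sketch; exact package names vary by distribution, and the project directory name is a placeholder):
$ sudo apt-get install texlive-latex-recommended texlive-latex-extra texlive-fonts-recommended latexmk
$ cd my-sphinx-project    # the directory containing the Sphinx Makefile
$ make latexpdf
$ ls _build/latex/*.pdf   # the generated PDF lands here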

Locating Apache source code for mod_perl on Mac

I get the following error in Terminal when I try to install mod_perl:
Please tell me where I can find your apache src
[../apache_x.x/src]
I've tried using cpan > install mod_perl (even forcing v2) AND I've tried just downloading it and doing $ perl Makefile.PL, but they both lead to the same error.
I'm trying to follow steps from http://bulknews.net/lib/mod_perl_guide/install.html or O'Reilly's CGI Programming with Perl, but the site says:
The first thing first is to download the Apache source code and unpack it into a directory -- the name of which you will need very soon
Mac comes with Apache, which is why I don't want to download it. But how can I find the apache src???
Update: Haven't checked, but did find apache2 folder in ~/private/var/log
Additional Info --- separate locations of mod_perl files:
I have an unzipped folder: mod_perl-1.31 in my ~/Downloads folder. (for manual install)
I found tar.gz files of mod_perl -1 and -2 in ~/G/GO/GOZER/mod_perl-1.31.tar.gz (or 2.04) (for cpan)
Should I delete these?
Let me know if there is any other info required to solve this, or if I somehow missed a post with this same question. Thanks a lot.
It's quite possible that Mac OS X doesn't ship with the apache source code (I'll be damned if I can find it.) I can find no references to it online or on my machine.
I am going to ignore the built-in Apache installation and install my own. This article discusses PHP and Apache on Mac OS X but I'll also be using mod_perl on my system and will adjust as necessary: http://www.phpied.com/installing-php-and-apache-on-mac-osx-that-was-pretty-easy/
Install it dynamically as a DSO.
https://perl.apache.org/docs/2.0/user/install/install.html#Dynamic_mod_perl
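Following that page, a DSO build against an existing Apache typically looks like this (a sketch; the mod_perl source directory and the apxs path are assumptions that vary per install):
$ cd mod_perl-2.0.x
# MP_APXS points at the apxs tool of the Apache you want to build against
$ perl Makefile.PL MP_APXS=/usr/sbin/apxs
$ make
$ make test
$ sudo make install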

How to keep synchronized, per-version documentation?

I am working on a small toy project that is getting more and more releases. Until now, the documentation was just a set of pages in the WordPress blog I set up for the project. However, as time passes, new releases come out, and I should update the online documentation to match the most recent release.
Unfortunately, if I do so, the docs for the previous releases will "disappear" as my doc pages are updated to the most recent version, so I decided to include the documentation in the release package and to keep the most recent documentation available online as a web page as well.
A trivial idea would be to wget the current docs from the WordPress pages, save them into SVN and therefore into the release package, repeating the procedure at every new release. Unfortunately, the HTML I get must be hacked by hand to fix the links (or I should hack WordPress to use BASE so that the HTML code is easily relocatable, something I don't want to do).
How should I handle the requirements of having at the same time:
user-browsable documentation for the proper version included in the downloadable package
most recent documentation available online (and properly styled with my web theme)
keep the SVN and the actual online contents synchronized (in WordPress, or something else that fits nicely with my WordPress setup)
easy to use
Thanks
Edit: started a bounty to see if I can lure more answers. I think this is a quite important issue, and it would be nice to have multiple hints and opinions for future readers.
I would check your pages into SVN, and then have your webserver update from its local SVN working copy when you're ready to release. Put everything into SVN: WordPress, CSS, HTML, etc.
Wget can convert all the links in the document for you; see the --convert-links option:
http://www.gnu.org/software/wget/manual/html_node/Advanced-Usage.html
Using this in conjunction with the other methods could yield a solution.
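For example (http://yoursite/docs/ is a placeholder URL):
$ wget --mirror --page-requisites --convert-links --no-parent http://yoursite/docs/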
I think there are two problems to be solved here
how and where to keep the documentation aligned with the code
where to publish the documentation
For 1, I think it's best to:
keep the documentation in a repository (SVN or Git or whatever you already use for the code) as a set of files, instead of in a DB, as it is easier to keep a history of changes (and possibly to stay on par with the code releases)
use an approach where the documentation is generated from a set of source files (you'd keep the sources in the repository) from which the html files for the distribution package or for publishing on the web are generated. The two could possibly differ, as on the web you'd need to keep some version information (in the URL) that you don't need when packaging a single release.
For 2, there are several tools that can generate a static site. One of them is Jekyll; it's written in Ruby and looks quite complete and customizable.
Assuming that you use a tool like Jekyll and keep the files and source in SVN, you might set up your repo this way:
repo/
  tags/
    rel1.0/
      source/
      documentation/
    rel2.0/
      source/
      documentation/
    rel3.0/
      source/
      documentation/
  trunk/
    source/
    documentation/
That is:
You keep the current documentation beside the source in the trunk
When you do a release you create a tag for the release
you configure your documentation generator to generate documentation for each of the repo/tags/relXX/documentation directories, such that the documentation for each release is put in a documentation_site/relXX/ directory
So to publish the documentation (point 2 above):
you copy the contents of the documentation_site directory onto the server, putting it in the same base dir as your WordPress install or linking to it from there, such that each release's docs can be accessed as: http://yoursite/project/docs/relXX/
you create a link to the current release documentation such that it can always be reached as http://yoursite/project/docs/current
The trick here is to publish the documentation always under a proper release identifier (in the URL, on the filesystem) and use a link (or a redirect) to make sure that the "current documentation" on the web server points to the current release.
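As a sketch of that publishing step (host, paths, and release name are hypothetical):
# Copy the generated per-release docs to the web server...
$ rsync -a documentation_site/ user@yoursite:/var/www/project/docs/
# ...and point "current" at the latest release
$ ssh user@yoursite 'ln -sfn /var/www/project/docs/rel3.0 /var/www/project/docs/current'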
I have seen some programs use Help & Manual. But I am a Mac user and have no experience with it to know if it's any good. I'm looking for a solution myself for Mac.
For my own projects, if that were a need, I would create a sub-dir for the documentation, and have all the files refer to each other relatively from that known base. For example,
index.html         -- refers to images/example.jpg
README
-- subdirs....
images/example.jpg
section/index.html -- links back to '../index.html',
                   -- refers to ../images/example.jpg
If the docs are included in the SVN/tarball download, then they are readable as-is. If they are generated from some original files, they would be pre-generated for a downloadable version.
Archive versions of the documentation can be unpacked/generated and placed into named directories (e.g. docs/v1.05/).
A simple PHP script can be written to list the subdirs of the /docs/ directory from the local disk and display them, highlighting the most recent, for example.
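The same listing could be sketched in a few lines of shell instead of PHP (paths are hypothetical):
# Emit an HTML list of the doc versions, newest first by version sort
$ ls -d docs/v*/ | sort -rV | awk '{ printf "<li><a href=\"%s\">%s</a></li>\n", $0, $0 }'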