I am documenting a library that has a Python component and a JavaScript component. The overall user documentation and the Python API documentation are in reStructuredText, processed with Sphinx. The JavaScript API documentation is in JSDoc and is processed with jsdoc-toolkit. The principal output format will be HTML. I am new to reST, Sphinx and JSDoc.
I have set up a build system so all the generated HTML pages are dumped into a single directory tree. I now need to insert into the main page (generated from reST) a link to the generated JavaScript documentation. This needs to be a relative link, since the docs may be located in different places on different installations. reST will automatically parse a full URL, but I can't figure out how to make it insert a relative link. Constructs like :ref: and :doc: don't seem to help, because they expect the target to be reST.
Any ideas?
Figured it out. The following inserts a relative reference to the document js/index.html:
`Javascript API <js/index.html>`_
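For reference, a minimal sketch of how that link can sit in the Sphinx master document, assuming the jsdoc-toolkit output is copied into a js/ subdirectory of the HTML build tree (the heading and toctree entries are placeholders):

Documentation
=============

.. toctree::
   :maxdepth: 2

   userguide
   python_api

The JavaScript side is documented separately: `Javascript API <js/index.html>`_.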
This is my first time using Ektron and I'm trying to implement JSON-LD schema scripts for each page. I have 68 scripts that I need to implement, each unique to its page.
I thought I would be able to implement these scripts through metadata, but now I'm unsure. Each script is over 1000 characters, and the HTML and meta tag types only allow 500 characters, so I'm assuming I'm in the wrong place. If anyone could shed some light it would be much appreciated.
Ektron's metadata isn't intended for large chunks of data / content. So, yes, you will find limits there.
Here are two things you might try as workarounds.
Most direct:
Use the Ektron Library. Go to the Library tab and click on the Root node and view Properties. Add an extension to allow you to upload your JSON-LD as a file. Use metadata on the content item to reference the uploaded file. Combine the two upon output.
If you want the JSON-LD to be editable within the CMS...
Gaming the platform a bit
Create a new SmartForm definition and include in it a single plain-text, multi-line field (not Rich text). This should hold your JSON-LD. Set up a folder and, if your version supports it (you didn't specify CMS version, so I will assume relatively recent), set the folder to be non-searchable so these things don't come up in site search results. Add a restriction to the folder to only allow the Smart Form definition you just created. Create your JSON-LD there using the plain-text field. You should be able to store up to 1MB.
As above, add your JSON-LD as text, then reference this item from the content where you want to use it.
The metadata in this case (and possibly the library one, though I'd have to test and I no longer have an Ektron development environment) will give you the Content ID of the object holding your JSON-LD. You'll have to make another API call, but that will give you the solution you appear to want.
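Whichever storage route you take, the output step is the same: your template writes the stored text into a JSON-LD script tag on the page. A rough sketch of what the rendered markup should end up as (the JSON body here is a placeholder; how you fetch the stored text depends on your Ektron version's API):

<script type="application/ld+json">
{ "@context": "https://schema.org", "@type": "WebPage", "name": "Example page" }
</script>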
I'm looking to create a central repository for all of our published API documentation using DocFx. I have documentation auto-generated via my build (using TFS) and published through my release (using Octopus) just fine for multiple individual sites. However, I want to pull it all together in one location. The thinking is that through a parent site you could filter content in any of the individual sites without having to drill down into them. Do you have a recommendation on how to do this?
Also, within this same documentation repository I want to provide the capability to search all of the metadata (project-level documentation) across the hundreds of projects in our portfolio. This will give our BA, DEV and QA teams easier access to what all our systems do. I like the "filtering" capability built into DocFx, but I want full-text search across all of the metadata. Do you have a recommendation for this functionality as well?
To change the location of the docfx output, edit the docfx.json file and specify the dest value. By default it is "dest": "_site". For more formatting guidance, reference: https://dotnet.github.io/docfx/tutorial/docfx.exe_user_manual.html.
Regarding full-text search, that is possible by simply ensuring the ExtractSearchIndex post-processor is invoked (in order to generate an index.json file of keywords) and that the global _enableSearch value is set to true in the docfx.json file. A snippet from that file would look like:
"postProcessors": [ "ExtractSearchIndex" ],
"globalMetadata": {
"_enableSearch": "true"
}
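Putting the pieces together, the build section of docfx.json might look roughly like this (the content globs and dest path are illustrative):

{
  "build": {
    "content": [ { "files": [ "**/*.md", "toc.yml" ] } ],
    "dest": "_site",
    "postProcessors": [ "ExtractSearchIndex" ],
    "globalMetadata": {
      "_enableSearch": "true"
    }
  }
}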
For your first question:
I think what you expect is something like the .NET API Browser. The source code behind that page is not open to the public, so you need to create this page yourself by collecting the xrefmap.yml from each of your sites and extracting the data you need into one page.
For your second question:
DocFX uses Lunr to scan all the output files and generate an index file called index.json for later search use. In your case, you want to limit the search scope to only the metadata you defined, which is not supported by DocFX by default. You can, however, use Lunr in your central place to search that metadata yourself: create a specific index.json for each project first, then have the central place collect them for its search page.
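As a rough sketch of that last idea, assuming each project's index.json is the flat map of { href, title, keywords } entries that ExtractSearchIndex produces (verify the exact shape against your DocFX version), the central search page could merge the per-project indexes and feed them to Lunr on the client:

// Hypothetical central search page script; project URLs are placeholders.
var projects = ['/projectA/index.json', '/projectB/index.json'];

Promise.all(projects.map(function (url) {
  return fetch(url).then(function (r) { return r.json(); });
})).then(function (indexes) {
  // Flatten the per-project maps into one list of documents
  var docs = [];
  indexes.forEach(function (map) {
    Object.keys(map).forEach(function (key) { docs.push(map[key]); });
  });

  // Build a single Lunr index over titles and keywords
  var idx = lunr(function () {
    var builder = this;
    builder.ref('href');
    builder.field('title');
    builder.field('keywords');
    docs.forEach(function (doc) { builder.add(doc); });
  });

  console.log(idx.search('your query here'));
});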
I am trying to parse a page on a wikia to get additional information for an Infobox Book template that is on the page. The problem is that I can only get the template's source instead of the transformed template on the page.
I'm using the following url as a base:
http://starwars.wikia.com/api.php?format=xml&action=expandtemplates&text={{Infobox%20Book}}&generatexml=1
The documentation doesn't really tell me how to point it to a specific page and parse the transformed template from the page. Is this even possible or do I need to parse it all myself?
To expand a template with the parameters from a given page, you will have to provide those parameters. There is no way for the API to know how the template is used in different pages (it could even be used twice!).
This works:
action=expandtemplates&text={{Infobox Book|book name=Lost Tribe of the Sith: Skyborn}}
You will, of course, have to keep adding all the parameters you want to parse (there are 14 in your example).
If you have templates that change automatically depending on which page they are (that is not the case here), e.g. by making use of magic words such as {{PAGENAME}}, you can add &page=Lost_Tribe_of_the_Sith:_Skyborn to your API call, to set the context the template should be expanded in.
If you do not know the parameters given, you can either:
1. Render the whole page with index.php?action=render&title=Lost_Tribe_of_the_Sith:_Skyborn, and parse the returned HTML to carve out the actual infobox
2. Fetch the wikicode (action=query&prop=revisions) and parse it to get the parameters given to the template, then supply them to the expandtemplates call
3. Start using an extension like Semantic MediaWiki, which allows you to treat your wiki more like a database
1 and 2 can go wrong in any number of ways, of course, as with a wiki you have, by definition, no way of knowing that the content is always entered in a consistent way.
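A rough sketch of option 2, using the Wikia endpoint from the question (the response layout assumed in the comments is the older prop=revisions JSON format; verify it against what the API actually returns, and note the naive regex shares the caveat above about inconsistent content):

// Sketch only; URLs, response shape and regex are assumptions to verify.
var api = 'http://starwars.wikia.com/api.php';
var title = 'Lost_Tribe_of_the_Sith:_Skyborn';

// 1. Fetch the raw wikitext of the page
fetch(api + '?format=json&action=query&prop=revisions&rvprop=content&titles=' + title)
  .then(function (r) { return r.json(); })
  .then(function (data) {
    var pages = data.query.pages;
    var wikitext = pages[Object.keys(pages)[0]].revisions[0]['*'];

    // 2. Carve out the infobox call; naive regex, breaks on nested templates
    var match = wikitext.match(/\{\{Infobox Book[\s\S]*?\}\}/);
    if (match) {
      // 3. Feed the full template call back into expandtemplates
      return fetch(api + '?format=json&action=expandtemplates&text=' + encodeURIComponent(match[0]));
    }
  })
  .then(function (r) { return r && r.json(); })
  .then(function (expanded) { console.log(expanded); });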
We have some PDFs as multimedia components, and I would like to know the URL before I publish the page where we are using the PDF. The component itself is already published.
I was trying to guess by looking at other examples: domain/en/multimedia/Name_pdf.pdf
But that didn't work...
Does anyone know the rule?
Thank you,
A lot depends on how you are handling the multimedia component within your templates, whether you are creating variants (probably not for PDF files) and, to a certain extent, what kind of delivery environment you are using and how it is configured.
If you are trying to output a link to your published binary from a template, you should probably use the out-of-the-box functionality: output the standard Tridion component link syntax from your templates and then use the other default TBBs to publish the component and resolve the link for you.
However, in a default, standard 2009 environment, with normal publishing, you should find that your file is simply published to the file system at
<Publication Images Path>\<Binary Filename>
With the link resolving to
http://<Your server>/<Publication Images URL>/<Binary Filename>
Note: in earlier versions of Tridion the binary filename was manipulated to include the TCM ID for uniqueness, but by 2009 this was no longer the case.
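To make that concrete with hypothetical values: if the Publication's Images URL is set to /en/multimedia and the multimedia component's binary file is Name_pdf.pdf, the resolved link would be http://www.example.com/en/multimedia/Name_pdf.pdf, which matches the pattern you guessed. If that URL doesn't work, check the Images URL setting on the Publication and whether your templates actually publish the binary.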
My university's website unfortunately does not provide feeds, but they keep publishing information there that is important for me (deadlines, exam dates, etc.) as links to PDFs in a certain section of the site.
How can I regularly scrape that section of the site and be notified (Growl, mail, or something similar)?
Normally I would use wget to mirror it, but how do I extract only parts of the website?
Is there a CLI tool that can extract parts of the XHTML via XPath or similar?
Try this:
wget --spider --server-response http://example.com
This will print the headers, which may include a Content-Length header. If that changes, you can notify yourself.
Edit: if it changes, you can download the whole HTML file and grep for a PDF link or whatever else you want to look for (maybe for "<div id='news'>(.*?)</div>").
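If you want to automate that, a small cron-able shell sketch along these lines would do (the URL, grep pattern and notification command are placeholders to adapt):

#!/bin/sh
# Download the page, compare the list of PDF links to the previous run,
# and mail the difference if anything changed.
URL="http://www.example.edu/exams/"
NEW=$(mktemp)
OLD="$HOME/.uni-page-pdfs"

wget -q -O - "$URL" | grep -o 'href="[^"]*\.pdf"' | sort > "$NEW"

if [ -f "$OLD" ] && ! cmp -s "$OLD" "$NEW"; then
    diff "$OLD" "$NEW" | mail -s "University page changed" you@example.com
fi
mv "$NEW" "$OLD"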
Mmm... You should take a look at QueryPath. QueryPath makes it easy to parse HTML. What if the HTML structure changes? What if you want specific elements of the page? QueryPath does the hard work for you. Do you like jQuery? QueryPath is like the jQuery of PHP.
See: http://www.ibm.com/developerworks/opensource/library/os-php-querypath/index.html?S_TACT=105AGX01&S_CMP=HP
See: http://querypath.org/
You might be interested in looking at Pjscrape (disclaimer: this is my project). It's a web-scraping tool built on PhantomJS, giving you full jQuery access to the page in a headless WebKit browser context. It makes it very easy to pull semi-structured data from webpages via the command line, particularly if the page you're scraping has a consistent structure for new elements.
For example, you can pull all the course titles from this course catalog with the following code:
pjs.addScraper(
// the page you're scraping
'http://www.ischool.berkeley.edu/courses/catalog',
// selector for elements you want to pull text from
'.views-row .views-field-title'
);
// suppress STDOUT logging
pjs.config('log', 'none');
Running this from the command line gives you JSON to STDOUT by default:
~> phantomjs /path/to/pjscrape.js my_script.js
["W10. Introduction to Information","24. Freshman Seminar", ...]
So it would be pretty simple to run this script on a regular basis, capture the output in a file, and then alert you when the new output doesn't match the previous scrape. You can also write your own scraper functions, so there's a lot of flexibility for more complex scraping if a simple selector won't do the trick.
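For instance, the scraper-function form mentioned above could pull both the link text and the URL of each PDF in the relevant section; the URL and selector here are assumptions for your university page, and the function runs in the page context with jQuery available as $:

pjs.addScraper(
  // the university page you're watching
  'http://www.example.edu/exams/',
  // scraper functions run in the page and can return structured data
  function() {
    return $('#news a[href$=".pdf"]').map(function() {
      return { title: $(this).text(), url: this.href };
    }).toArray();
  }
);
// suppress STDOUT logging, as above
pjs.config('log', 'none');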