What's the meaning of 'soup' in jsoup and Beautiful Soup? - beautifulsoup

What's the meaning of "soup" in jsoup and Beautiful Soup, and why is it called "soup"?

The library is Beautiful Soup, and it is named after so-called "tag soup", which Wikipedia defines as
"syntactically or structurally incorrect HTML written for a web page".
jsoup is a Java HTML parser that fills a similar role to Beautiful Soup.

According to Wikipedia, "Beautiful Soup is a Python library for parsing HTML documents (including having malformed markup, i.e. non-closed tags, so named after Tag soup)."
Both names come from tag soup.
Reference: http://en.wikipedia.org/wiki/Beautiful_Soup

Beautiful Soup is widely used for web scraping and is a good tool for extracting information from large, loosely structured documents. It is a Python library for pulling data out of HTML and XML files; the extracted articles and other content can then be collected into ordinary Python lists or dictionaries.
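
To make "tag soup" concrete, here is a minimal sketch of lenient parsing. Beautiful Soup is a third-party package, so to stay dependency-free this uses the standard library's html.parser module (one of the parser backends Beautiful Soup can be told to use). The class name TextCollector, the helper extract, and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects start tags and text from possibly malformed HTML."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # Record every opening tag, whether or not it is ever closed.
        self.tags.append(tag)

    def handle_data(self, data):
        # Keep only non-empty text, with surrounding whitespace removed.
        text = data.strip()
        if text:
            self.chunks.append(text)


def extract(markup):
    """Return (start tags, text chunks) found in the markup."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()  # flush any text still buffered at the end
    return parser.tags, parser.chunks


# Tag soup: <p> and <b> are never closed, and </i> has no opening tag.
TAG_SOUP = "<html><body><p>First<p>Second <b>bold</i> text"

tags, chunks = extract(TAG_SOUP)
print(tags)
print(chunks)
```

A strict XML parser would reject this input outright; a lenient parser just keeps going and hands back whatever it can recover. With Beautiful Soup installed, roughly the same text extraction should be available through BeautifulSoup(TAG_SOUP, "html.parser").get_text().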

Related

Beautiful Soup XML parsing safety

How dangerous is it to parse untrusted XML with Beautiful Soup?
defusedxml seems to be the go-to module for parsing untrusted XML; however, it chokes on malformed XML. I want the leniency of Beautiful Soup, but is it as trustworthy as defusedxml?
List of XML vulnerabilities, for reference:
https://docs.python.org/3.8/library/xml.html#xml-vulnerabilities
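
For anyone unsure what "chokes on malformed XML" means in practice, here is a small sketch of strict parsing. It uses the standard library's xml.etree.ElementTree only as a stand-in for a strict parser: it is not defusedxml and it is not hardened against the vulnerabilities listed above, so do not feed it untrusted input. The helper name is_well_formed is made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if a strict XML parser accepts the text, else False."""
    try:
        # NOTE: stand-in for a strict parser; not safe for untrusted XML.
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<a><b/></a>"))   # well-formed document
print(is_well_formed("<a><b></a>"))    # <b> is never closed
```

A strict parser refuses the second document entirely, whereas a lenient parser would try to repair it. That leniency is the convenience the question asks about, and it is a separate property from being safe against hostile input.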

XSL-FO Stylesheet template to generate a PDF manual

I want to create a user manual in A4 format for my programming tool, much like the example here. Is there a clean, well-formatted, open-source XSL-FO stylesheet for documentation that I can use as my template? It should support including:
tables
code samples
images
You can try DocBook, a powerful markup language for technical documentation.
For example, see its special DocBook features and the user-configurable parameters for XSL-FO output.
To use it out of the box, just apply docbook/fo/docbook.xsl to your valid DocBook XML file.

PDF generators from an XML template?

Are there any PDF generators, commercial or open source, that can be used for research purposes? I am looking for something like pdfnow.com, or a standalone desktop app, that lets me generate a PDF from an XML template. I have tried researching this, but there is a lot of ambiguity.
Applidok generates a PDF from an original (raw) PDF, a template definition, and dynamic/user data (e.g. from a form): http://go.applidok.com/en/howitworks.gz.html
The template format there is JSON, not XML, but the approach is the same.

How to create a product catalogue in PDF

In this case I have an XML data source and external image files that together represent a product catalogue. The basic structure of the XML document is as follows:
categories
subcategories
products
I'm looking for a tool to convert this data source into a PDF document, preferably with basic navigation and a hierarchical structure. I could probably do it by writing an XSLT stylesheet, or by writing code in a scripting language to generate a TeX document. Can anyone recommend a good LaTeX style for a product catalogue, or an open-source tool for generating PDF catalogues?
I suggest the well-known iText Java library for PDF generation.
You can load data from XML and generate any type of PDF.
You have to write Java classes to create these kinds of documents.
I'd use XSL-FO if I thought a transformation would work, or iText if I was writing a Java app and expressing it in code worked better.
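
Whichever tool ends up rendering the PDF, the first step is walking the category / subcategory / product hierarchy from the question. Here is a rough Python sketch of that step only. The element and attribute names (catalogue, category, subcategory, product, name, image) and the sample data are guesses, since the question does not show the real schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real document's names may differ.
CATALOGUE_XML = """
<catalogue>
  <category name="Tools">
    <subcategory name="Hand tools">
      <product name="Hammer" image="hammer.png"/>
      <product name="Screwdriver" image="screwdriver.png"/>
    </subcategory>
  </category>
  <category name="Garden">
    <subcategory name="Watering">
      <product name="Hose" image="hose.png"/>
    </subcategory>
  </category>
</catalogue>
"""


def outline(xml_text):
    """Flatten the hierarchy into (depth, name) rows in document order."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for category in root.findall("category"):
        rows.append((0, category.get("name")))
        for sub in category.findall("subcategory"):
            rows.append((1, sub.get("name")))
            for product in sub.findall("product"):
                rows.append((2, product.get("name")))
    return rows


rows = outline(CATALOGUE_XML)
for depth, name in rows:
    # Indentation stands in for the bookmark / heading level in the PDF.
    print("  " * depth + name)
```

The flat list of (depth, name) rows maps naturally onto PDF bookmarks or LaTeX sectioning commands, which is where the "basic navigation" would come from.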

dojo js library + jsdoc -> how to document the code?

How do the developers of Dojo create their documentation?
From the nightly builds you can get the uncompressed JS files with all the comments, and I'm sure there is some kind of documentation script that generates HTML or XML from them.
I guess they use jsdoc as this can be found in their utils folder, but I have no idea on how to use it. jsDoc toolkit uses different /**commenting**/ notations than the original dojo files.
Thanks for all your help
It's all done with a custom PHP parser and Drupal. If you look in util/docscripts/README and util/jsdoc/INSTALL you can get all the gory details about how to generate the docs.
It's different from jsdoc-toolkit or JSDoc (as you've discovered).
FWIW, I'm using jsdoc-toolkit as it's much easier to generate static HTML and there's lots of documentation about the tags on the Google Code page.
Also, just to be clear, I don't develop dojo itself. I just use it a lot at work.
There are two parts to the "dojo jsdoc" process. There is a parser, written in PHP, which generates XML and/or JSON for the entirety of the listed namespaces (defined in util/docscripts/modules, so you can add your own namespaces; there are basic usage instructions atop the file "generate.php"), and a Drupal part called "jsdoc", which installs as a Drupal module/plugin/whatever.
The Drupal aspect of it is just Dojo's basic view of this data. A well-crafted XSLT, or something that iterates over the JSON and produces HTML, would work just the same, though neither of these is provided by default (would love a contribution!). I shy away from the Drupal bit myself, though it has been running on api.dojotoolkit.org for some time now.
The doc parser is exposed so that you may use its inspection capabilities to write your own custom output as well. I use it to generate the Komodo .cix code completion in a [rather sloppy] PHP file util/docscripts/makeCix.php, which dumps information as found into an XML doc crafted to match the spec there. This could be modified to generate any kind of output you chose with a little finagling.
The doc syntax is all defined on the style guideline page:
http://dojotoolkit.org/reference-guide/developer/styleguide.html