Document Construction API

I'm currently building an API service that accepts input in HTTP requests, processes the information, uses a template engine (currently Jade) to parse the template files, and then outputs HTML, PDF, or an image.
I would like this service not to be bound to a database, as I don't see a need for one. The service has one goal: accept input and output the result in the desired format.
Currently I can't decide how to store and read my templates; it's a new world without a database.
Do I store them in a folder such as "templates", which I read each time I want a list of templates? I have no idea whether file locks will cause problems, or how.
Any suggestions?

Have a look at Express.js; it will let you set up a project with a good default directory structure. By default it stores Jade templates in 'views'. There will be no problems with file locking.
One other thing I would do is separate the API service from the view rendering. At the moment I use restify for pure REST services, since it is specifically geared toward that use case. The workflow would roughly look as follows:
'views' folder <--> Jade templates <--> Express <--> JSON data <--> REST API
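To illustrate why a plain folder is enough, here is a minimal, language-neutral sketch in Python rather than Node. It assumes templates are only ever read by the service, so concurrent requests are plain reads that do not block each other. The class name `TemplateStore`, the `.tmpl` extension, and the use of `string.Template` as a stand-in for Jade are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path
from string import Template


class TemplateStore:
    """Read-only template store backed by a directory instead of a database."""

    def __init__(self, root):
        self.root = Path(root)
        self._cache = {}  # template name -> (mtime_ns, compiled Template)

    def names(self):
        # Listing the directory on every call is a plain read of the folder.
        return sorted(p.stem for p in self.root.glob("*.tmpl"))

    def load(self, name):
        path = self.root / f"{name}.tmpl"
        mtime = path.stat().st_mtime_ns
        cached = self._cache.get(name)
        # Re-read the file only when it is new or its modification time changed.
        if cached is None or cached[0] != mtime:
            cached = (mtime, Template(path.read_text(encoding="utf-8")))
            self._cache[name] = cached
        return cached[1]

    def render(self, name, **data):
        return self.load(name).substitute(**data)


# Demo with a throwaway directory standing in for the "templates" folder.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "invoice.tmpl").write_text("Invoice for $customer", encoding="utf-8")
    store = TemplateStore(tmp)
    available = store.names()
    rendered = store.render("invoice", customer="ACME")
```

The modification-time check is one possible way to pick up edited templates without restarting the service; dropping the cache entirely and reading the file on each request would also work for small templates.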

Related

How to send data with html file from Express server

Currently, I'm making two round trips to my server. The first trip is to send the html file to my client via res.sendFile(). Once that html file is loaded in the client, I need to fetch the data for that page, so I have to make a second request to the server (sometimes using an IIFE to get the data immediately on page load), where I send the data back via res.send() or res.json().
From what I've read, it's not possible to do all of this in one step, so are two round trips to the server the best way (or only way) to render an html file and its data in the client?
The only other option I know of is to use a templating engine like Handlebars or EJS, but I don't think either one can handle the complex logic I need in the client. I tried Handlebars once, and the client logic was a mess.
If you need to fill your HTML page with dynamic data and want just one trip to your server, there is no option other than a template engine.
For me, the Pug template engine (formerly Jade) was good enough.
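One way to keep complex logic in client-side JavaScript while still making a single round trip is to let the server-side template do nothing except embed the data in the page. The sketch below shows the idea in Python with `string.Template` standing in for a real template engine; the global name `window.__DATA__` and the function `render_page` are hypothetical.

```python
import html
import json
from string import Template

# The page template only injects a title and a JSON payload; all other
# logic stays in client-side scripts that read window.__DATA__.
PAGE = Template("""<!doctype html>
<html>
  <body>
    <h1>$title</h1>
    <script>window.__DATA__ = $data_json;</script>
  </body>
</html>""")


def render_page(title, data):
    # Escape "</" so a string inside the payload cannot close the <script> tag.
    data_json = json.dumps(data).replace("</", "<\\/")
    return PAGE.substitute(title=html.escape(title), data_json=data_json)


page = render_page("Orders", {"items": [1, 2, 3]})
```

The client script then reads the embedded object on load instead of issuing a second request.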

Hybrid REST + stateless operations in an API

I'm implementing a RESTful API for what is essentially a document store, but am hitting a brick wall because I need a hybrid approach to one of the operations that can be performed on these documents.
Essentially, a user should be able to generate PDF versions of documents that are stored as JSON but also generate PDF versions of JSON strings that are passed arbitrarily (with no record in the database). The PDF reports never need to be stored anywhere, they are always generated on the fly.
My current API looks like:
/Documents
/Documents/1234
/Documents/1234?rev=4
Now, one way to implement the PDF generation would be to do:
/Documents/1234/Reports
or
/Reports/1234
But since we don't need to store PDFs (generated on the fly), both are reduced to only a GET operation, and it doesn't really act on a 'Report' object - which doesn't seem RESTful to me.
What complicates it further is that a user should be able to manually pass a JSON blob to the service and get a PDF. So something like:
/API/GeneratePDF
So does a separate stateless API make sense for this one operation? Maybe then redirect a request like /Reports/1234 to /API/GeneratePDF with the JSON blob for the 1234 document. It all seems a bit messy :)
The URL '/reports/123/' points to a 'report' resource, and it should not matter what backend operations act on it.
When thinking about a resource URL and its associated operations, the only relevant operations are GET, PUT, POST, and DELETE.
Then map the business operations (like generating a PDF report) to URL + HTTP operation + params.
In this case, map "generate PDF report" to "GET /reports/123/":
use-case-1: simple get report
GET /reports/123/
return: {pdf-report}
use-case-2: customized report
GET /reports/123/
param: {"json info passed along with the get operation"}
return: {pdf-report}
The backend can then detect whether there is input from the client to decide which specific backend operations should be taken to generate the report.
Hope this helps!
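The dispatch described above can be sketched as a single handler that checks for client-supplied JSON before falling back to the stored document. This is only an illustration in Python: `DOCUMENTS`, `render_pdf`, and `get_report` are hypothetical names, and `render_pdf` is a placeholder rather than a real PDF generator.

```python
import json

# Hypothetical in-memory stand-in for the document store.
DOCUMENTS = {"123": {"title": "Quarterly summary", "rev": 4}}


def render_pdf(document):
    # Placeholder: a real service would call a PDF library here.
    return ("PDF:" + json.dumps(document, sort_keys=True)).encode("utf-8")


def get_report(doc_id, body=None):
    """Handle GET /reports/<doc_id>/ and return (status, payload)."""
    if body:
        # use-case-2: the client passed its own JSON along with the request.
        document = json.loads(body)
    else:
        # use-case-1: look up the stored document.
        document = DOCUMENTS.get(doc_id)
        if document is None:
            return 404, b""
    return 200, render_pdf(document)


stored = get_report("123")
custom = get_report("123", '{"title": "Custom"}')
missing = get_report("999")
```

Nothing is persisted in either branch, which matches the requirement that reports are always generated on the fly.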

How to know the url from a pdf component? (Tridion2009)

We have some PDFs as multimedia components, and I would like to know the URL before I publish the page where we are using the PDF. The component is already published.
I tried guessing from other examples: domain/en/multimedia/Name_pdf.pdf
But it didn't work...
Does anyone know the rule?
Thank you,
A lot depends on how you are handling the multimedia component within your templates, whether you are creating variants (probably not for PDF files) and, to a certain extent, what kind of delivery environment you are using and how it is configured.
If you are trying to output a link to your published binary from a template, you should probably use the out-of-the-box functionality by outputting syntax like
text
from your templates and then using the other default TBBs to publish the component and resolve the link for you.
However, in a default, standard 2009 environment, with normal publishing, you should find that your file is simply published to the file system at
<Publication Images Path>\<Binary Filename>
With the link resolving to
http://<Your server>/<Publication Images URL>/<Binary Filename>
Note:
In earlier versions of Tridion the binary filename was manipulated to include the TCM ID for uniqueness, but by 2009 this was no longer the case.

Semantic store and entity hub

I am working on a content platform that should provide semantic features such as querying with SPARQL and providing rdf documents for the contained content.
I would be very thankful for some clarification on the following questions:
Did I get that right, that an entity hub can connect several semantic stores to a single point of access? And if not, what is the difference between a semantic store and an entity hub?
What frameworks would you use to store content documents as well as their semantic annotation?
It is important for the solution to be able to later on retrieve the document (html page / docs such as pdf, doc,...) and their annotated version.
Thanks in advance,
Chris
The only Entityhub I know of belongs to the Apache Stanbol project. Here is a paragraph from the original documentation explaining what the Entityhub does:
The Entityhub provides two main services. The Entityhub provides the
connection to external linked open data sites as well as using indexes
of them locally. Its services allow to manage a network of sites to
consume entity information and to manage entities locally.
Entityhub documentation:
http://incubator.apache.org/stanbol/docs/trunk/entityhub.html
The Enhancer component of Apache Stanbol extracts external entities related to the submitted content, using the linked open data sites managed by the Entityhub. These content enhancements are expressed as RDF data. It is then also possible to store those content items in Apache Stanbol and run SPARQL queries on top of the RDF enhancements. The Contenthub component of Apache Stanbol also provides faceted search functionality over the submitted content items.
Documentation of Apache Stanbol:
http://incubator.apache.org/stanbol/docs/trunk/
Access to running demos:
http://dev.iks-project.eu/
You can also send further questions to stanbol-dev AT incubator.apache.org.
Alternative suggestion...
Drupal 7 has in-built RDFa support for annotation and is more of a general purpose CMS than Semantic MediaWiki
In more detail...
I'm not really sure what you mean by entity hub. Where are you getting that definition from, or what do you mean by it?
Yes, one can easily write a system that connects to multiple semantic stores. Given the context of your question, I assume you are referring to RDF triple stores?
Any decent CMS should assign documents some form of unique/persistent ID, so even if the system you go with does not support semantic annotation natively, you could build your own extension for this. The extension would simply store annotations against the document's ID in whatever storage layer you choose (I'd assume a triple store would be appropriate), and then you can build appropriate query and presentation layers for querying and viewing this data as required.
http://semantic-mediawiki.org/wiki/Semantic_MediaWiki
Apache Stanbol
Do you want to implement a traditional CMS extended with some semantic capabilities, or do you want to build a semantic CMS? They could look the same, but these are actually two completely opposite approaches.
It is important for the solution to be able to later on retrieve the document (html page / docs such as pdf, doc,...) and their annotated version.
You can integrate Apache Stanbol with a JCR/CMIS compliant CMS like Alfresco. To get custom annotations, I suggest creating your own custom enhancement engine (maven archetype) based on your domain and adding it to the enhancement engine chain.
https://stanbol.apache.org/docs/trunk/components/enhancer/
Once this is done, you can use the REST API endpoints provided by Stanbol to retrieve the results in RDF/Turtle format.

Retrieving dynamic text from a website in vb.net (VS2008)

I want to be able to retrieve dynamic data from a web page (share prices). I started out by retrieving the HTML code before I realised that, as it is live data, the HTML code will be of little use. Although I am looking to capture specific data, all I wish to do is process a web page that I specify and get back the text of that page, not the HTML code. Basically, a copy and paste of the entire page would be great.
Any ideas would be really appreciated!
'Screen scraping' by parsing HTML is so early 2000s... what I would do is read up on Amazon's Mechanical Turk. You can develop a queued architecture where you submit URLs to this Mechanical Turk service. The service would automatically distribute these bits of work to users who would then do the dirty task of copying and pasting out the valuable stock quote information you require. Users around the world would anxiously await delivery of the next URL to their Mechanical Turk inbox... pining for the opportunity to copy/paste out another share price for your application. Sure, it might take a few minutes to update your prices, but hey, they would be HAND parsed by REAL people around the globe! Just think of the possibilities!
Well, the HTML contains the text of the website, so you "just" need to parse the HTML.
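As a rough illustration of "parse the HTML to get the text", here is a minimal sketch using Python's standard `html.parser` module rather than VB.NET. It assumes the text is present in the HTML as delivered, and it skips `script` and `style` contents; the class `TextExtractor`, the function `page_text`, and the sample markup are made up for the example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text of a page, ignoring script and style blocks."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def page_text(markup):
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return "\n".join(parser.chunks)


sample = (
    "<html><head><style>p{}</style></head><body>"
    "<h1>ACME</h1><p>Price: <b>12.34</b></p>"
    "<script>var x = 1;</script></body></html>"
)
text = page_text(sample)
```

This only works for content that is in the downloaded HTML; values filled in later by JavaScript will not appear.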
EDIT: If the data is not in the HTML but loaded dynamically, the situation is different. As far as I can see, you have two options:
Find out how the data is loaded (i.e. read the JavaScript on the page). If it is updated via some web service, you could query the same web service in your program.
Use a web browser to get the data and then get the dynamic HTML tree of the page. Maybe the WPF Webbrowser control can help you with this, but I'm not sure since I've never done this myself.
Is it possible to find this same data provided in a ready-to-consume format rather than scraping HTML for it? It seems like there's probably public web-services for stock quotes.
For example: A quick search for "Stock price webservice" turned up http://www.webservicex.net/stockquote.asmx; an ASMX web-service that is easy to consume in .NET.
In your Visual Studio project you should be able to add a reference to this service via the "Add Web Reference" command; the dialog you're given varies depending on whether your project is targeting .NET 2.0 or .NET 3.0/3.5.
I added a reference to the service named StockPriceProxy:
Public Function GetQuote(ByVal symbol As String) As String
    Using quoteService As New StockPriceProxy.StockQuote()
        Return quoteService.GetQuote(symbol)
    End Using
End Function