Get output of a template call in a page from MediaWiki API - api

I am trying to parse a page on a wikia to get additional information for an Infobox Book template that is on the page. The problem is that I can only get the template's source instead of the transformed template on the page.
I'm using the following url as a base:
http://starwars.wikia.com/api.php?format=xml&action=expandtemplates&text={{Infobox%20Book}}&generatexml=1
The documentation doesn't really tell me how to point it to a specific page and parse the transformed template from the page. Is this even possible or do I need to parse it all myself?

To expand a template with the parameters from a given page, you will have to provide those parameters. There is no way for the API to know how the template is used on different pages (it could even be used twice!).
This works:
action=expandtemplates&text={{Infobox Book|book name=Lost Tribe of the Sith: Skyborn}}
You will, of course, have to keep adding all the parameters you want to parse (there are 14 in your example).
If you have templates that change automatically depending on which page they are on (that is not the case here), e.g. by making use of magic words such as {{PAGENAME}}, you can add &page=Lost_Tribe_of_the_Sith:_Skyborn to your API call to set the context the template should be expanded in.
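For example, combining the base URL from the question with the page-context parameter (the page title is the one from the example above):

http://starwars.wikia.com/api.php?format=xml&action=expandtemplates&text={{Infobox%20Book}}&page=Lost_Tribe_of_the_Sith:_Skyborn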
If you do not know the parameters given, you can either:
1. Render the whole page with index.php?action=render&title=Lost_Tribe_of_the_Sith:_Skyborn, and parse the returned html to carve out the actual infobox
2. Fetch the wikicode (action=query&prop=revisions), parse out the parameters given to the template, and supply them to the expandtemplates call
3. Start using an extension like Semantic MediaWiki, which allows you to treat your wiki more like a database
Options 1 and 2 can go wrong in any number of ways, of course: with a wiki you have, by definition, no guarantee that the content is always entered in a consistent way.
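For illustration, here is a rough sketch of option 2 in browser JavaScript. The endpoints are the ones named above; the regex extraction is deliberately naive and assumes the template call contains no nested templates:

// Fetch a page's wikicode and carve out the {{Infobox Book ...}} call.
const api = 'http://starwars.wikia.com/api.php';

async function getInfoboxCall(title) {
  const url = api + '?format=json&action=query&prop=revisions&rvprop=content'
            + '&titles=' + encodeURIComponent(title);
  const data = await (await fetch(url)).json();
  const page = Object.values(data.query.pages)[0];
  const wikicode = page.revisions[0]['*'];
  // Naive: stops at the first '}}', so nested templates will break it
  const call = wikicode.match(/\{\{Infobox Book[\s\S]*?\}\}/);
  return call && call[0];
}

// The returned template call can then be fed back to action=expandtemplates as text=.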

Related

When can I trust my translated content in Vue?

I'd like to verify my understanding of v-html and XSS concerns when using Vue I18n.
We have a few classes of content:
User supplied - a user adds text to an input and the data will be displayed as-is on a page
Translator supplied - a translator using a CMS is adding content to translation files and we then show those values
Engineer supplied - an engineer is adding html to a translation file
#1 is outright untrustworthy
#2 could be trustworthy, but would require review (we will have generated translation files that will get added to a PR for the normal code review & QA process)
#3 is just trustworthy? Is there a difference between an engineer adding html in a template vs. a translation file? Or does the mere fact that html could be in a translation file open us up to some sort of exploit?
There is a section in the Vue I18n documentation that says:

...if you configure the tag in a locale message, there is a possibility of XSS vulnerabilities due to localizing with v-html="$t('term')".
Is there an inherent XSS risk when using "Trusted Content" in a client side translation file?
Edit: After thinking about this, it becomes dicey real quick. If we allow v-html on a section of translation because we know that, say, a hyperlink is ok, but we also interpolate something else like company_name, there could be bad content in there.
I did a little checking and it looks like Rails handles this differently. Even if a translation is marked as _html, the values supplied are still escaped; only the content itself is allowed to have html.
Example:
with name = "<b>Name</b>"
Rails -> foo_html: "hi {name} <u>underlined text</u>"
renders the underlined text, but the name is not bold and the displayed content still shows the literal <b> tags (the html inside a supplied argument is escaped)
Vue I18n -> with v-html -> foo: "hi {name} <u>underlined text</u>"
the text is underlined, and the name is bolded, which is not safe
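To make the difference concrete, here is a minimal sketch of the Vue I18n side (the message key and values are hypothetical):

const messages = { en: { foo: 'hi {name} <u>underlined text</u>' } };
// In a component template:
//   <p v-html="$t('foo', { name: '<b>Name</b>' })"></p>
// Vue I18n interpolates the argument into the message before v-html inserts
// the result as raw markup, so the <b> in the argument is rendered as html
// rather than escaped.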
You are asking several questions:
As long as content from #2 is audited for nefarious content, is it safe to use as raw html?
Yes and no. In theory, if you check every single message, know the context of every single message, and never have time constraints, so that people actually check every single message before each release, you are fine. In practice, people cut corners, or do not know that a particular message will be inserted as html, or do not understand how some string might be malformed html but get converted by the browser into valid html that is actually nefarious. Someone might get access to your CMS and change a translation string that you didn't expect to be changed. Someone might forge a form submission in your CMS, if it is not configured right, by tricking an employee into visiting a URL.
Or does the mere fact that html could be in a translation file open us up to some sort of exploit?
v-html is just that. It places html unfiltered in your document.
What is that possibility, how does it work?
v-html with translated strings creates unnecessary risk and extra overhead: it requires a copywriter with extensive technical knowledge to check every single translation message, where instead you could have a copywriter without any technical knowledge do that. The only thing you need to do is use the pattern outlined in the documentation, which allows anyone to change the translatable bits (which will be escaped) while keeping the html in your source control.
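For reference, this is roughly the component-interpolation pattern the documentation describes (shown with the v8-era <i18n> component; Vue I18n v9 renames it <i18n-t>):

<!-- messages: { term: 'I accept the {0}.', tos: 'Terms of Service' } -->
<i18n path="term" tag="label">
  <a :href="url" target="_blank">{{ $t('tos') }}</a>
</i18n>
<!-- The translatable strings stay escaped; the html lives in the template,
     under source control, so no v-html is needed. -->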

URL Query Parameter

I am a novice with APIs and I am aware of the major kinds of paths in a REST API: paths such as www.example.com/cars and query parameters such as www.example.com/cars?color=blue.
I just visited an e-commerce website and I am confused about the current path. I selected the category iphone-8 and got this url: https://www.example.fr/iphone-8.html
On the same page, I filter all phones with a price between 250 and 300 euros. This is the new url: https://www.example.fr/iphone-8.html#price=250&price=300
Does this url mean that the filter is only applied on the html because of the # and that there is therefore no api call for filtering?
Does this url mean that the filter is only applied on the html because of the # and that there is therefore no api call for filtering?
No, that doesn't follow.
The experiment to try would be to load the original page into your browser, turn on the developer tools used to watch the network traffic, and then perform your search.
What you may discover is that when you manipulate the filter controls on the web page, what's really happening under the covers is that JavaScript code is running and making calls to fetch data from some back-end endpoint, then re-rendering the web page on the client. The fragment is being updated so that, if you were to bookmark the link or copy it to another tab in your browser, the underlying JavaScript can reproduce "the same" results (by getting the search parameters from the fragment and repeating the search).
It should be possible to repeat those same calls directly from the browser itself. You won't necessarily get the HTML rendering, of course, but you'll probably be able to look at the filtered results in their own native representation (application/json, perhaps).
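For example, a sketch of how the page's script might recover the filter from the fragment (the key names simply mirror the example URL; the site's real code is unknown):

// location.hash === '#price=250&price=300'
const params = new URLSearchParams(window.location.hash.slice(1));
const [min, max] = params.getAll('price'); // ['250', '300']
// ...then fetch the filtered results from the back end and re-render.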
What's the advantage of using the fragment rather than the query parameters for price?
Fragments are not part of the absolute-URI, but the query part is.
Which is to say, the query part is still part of the identifier of a primary resource, and is part of the request-line that is sent to the server.
But fragments are used to identify secondary resources; resources embedded within some primary resource.
Consider:
https://www.rfc-editor.org/rfc/rfc3986#section-3.5
This identifies a secondary resource (specifically, section-3.5) that is included within a primary resource (an HTML representation of RFC 3986). So we "fetch" the secondary resource by first loading the primary resource (the whole RFC) and then use the fragment identifier and HTML processing rules to discover the appropriate element in the document.
The fragment part is strictly a client side concern.

Implementing JSON-LD Schema in Ektron, is it possible?

This is my first time using Ektron and I'm trying to implement JSON-LD schema scripts for each page. I have 68 scripts that I need to implement that are unique for each page.
I thought I would be able to implement these scripts through metadata, but now I'm unsure. Each script is over 1000 characters, and the html and meta tag types only allow 500 characters, so I'm assuming I'm in the wrong place. If anyone could shed some light it would be much appreciated.
Ektron's metadata isn't intended for large chunks of data / content. So, yes, you will find limits there.
Here are two things you might try as workarounds.
Most direct:
Use the Ektron Library. Go to the Library tab, click on the Root node, and view Properties. Add an extension to allow you to upload your JSON-LD as a file. Use metadata on the content item to reference the uploaded file. Combine the two upon output.
If you want the JSON-LD to be editable within the CMS...
Gaming the platform a bit:
Create a new SmartForm definition and include in it a single plain-text, multi-line field (not rich text). This will hold your JSON-LD. Set up a folder and, if your version supports it (you didn't specify the CMS version, so I will assume relatively recent), set the folder to be non-searchable so these items don't come up in site search results. Add a restriction to the folder to only allow the Smart Form definition you just created. Then create your JSON-LD there using the plain-text field. You should be able to store up to 1MB.
As above, add your JSON-LD as text, then use a reference to this item from the content where you want to use it.
The metadata in this case (and possibly in the library approach, though I'd have to test, and I don't have an Ektron environment for development anymore) will give you the Content ID of the object holding your JSON-LD. You'll have to make another API call, but this will give you the solution you appear to want.

Infinite amt of Google custom search boxes per website?

I've got a site where users can create groups (we call them games):
www.ongoingworlds.com/games/270/
www.ongoingworlds.com/games/287/ etc
Each of these games has its own user-generated content. I want to use a Google custom search for each game. But I can't see an easy way to amend the embed code to add a dynamic path, and I don't want to have to register multiple (hundreds of) GCSEs separately to get an embed code for each.
What would be the best way of allowing each of these URLs (above) to have their own GCSE?
You can search subparts of your site by using a combination of the site: operator and the webSearchQueryAddition parameter on the gcse element.
webSearchQueryAddition appends an additional search term to your user's query. If, for each of the "games", you change webSearchQueryAddition to point to the "game" base url, the search results will be restricted to that url. You can inject that parameter programmatically, e.g. with JavaScript, for each of the "games".
Documentation is here: https://developers.google.com/custom-search/docs/element#supported_attributes
And here is a working example:
http://jsfiddle.net/t2s5M/
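As a rough sketch, using the explicit-parsing hooks from the linked documentation (the container id and the site: value are illustrative):

<div id="game-search"></div>
<script>
  window.__gcse = {
    parsetags: 'explicit',
    callback: function () {
      google.search.cse.element.render({
        div: 'game-search',
        tag: 'search',
        attributes: {
          // Restrict results to the current game's base url
          webSearchQueryAddition: 'site:www.ongoingworlds.com' + location.pathname
        }
      });
    }
  };
</script>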

How to use regular urls without the hash symbol in spine.js?

I'm trying to achieve urls in the form of http://localhost:9294/users instead of http://localhost:9294/#/users
This seems possible according to the documentation but I haven't been able to get this working for "bookmarkable" urls.
To clarify, browsing directly to http://localhost:9294/users gives a 404 "Not found: /users"
You can turn on HTML5 History support in Spine like this:
Spine.Route.setup(history: true)
Passing the history: true argument to Spine.Route.setup() enables the fancy URLs without the hash.
The documentation for this is actually buried a bit, but it's here (second to last section): http://spinejs.com/docs/routing
EDIT:
In order to have urls that can be navigated to directly, you will have to do this server-side. For example, with Rails, you would have to build a way to take the path of the url (in this case "/users") and pass it to Spine accordingly. Here is an excerpt from the Spine docs:
However, there are some things you need to be aware of when using the History API. Firstly, every URL you send to navigate() needs to have a real HTML representation. Although the browser won't request the new URL at that point, it will be requested if the page is subsequently reloaded. In other words you can't make up arbitrary URLs, like you can with hash fragments; every URL passed to the API needs to exist. One way of implementing this is with server side support.

When browsers request a URL (expecting a HTML response) you first make sure on server-side that the endpoint exists and is valid. Then you can just serve up the main application, which will read the URL, invoking the appropriate routes. For example, let's say your user navigates to http://example.com/users/1. On the server-side, you check that the URL /users/1 is valid, and that the User record with an ID of 1 exists. Then you can go ahead and just serve up the JavaScript application.

The caveat to this approach is that it doesn't give search engine crawlers any real content. If you want your application to be crawl-able, you'll have to detect crawler bot requests, and serve them a 'parallel universe of content'. That is beyond the scope of this documentation though.
It's definitely a good bit of effort to get this working properly, but it CAN be done. It's not possible to give you a specific answer without knowing the stack you're working with.
I used the following rewrites as explained in this article.
http://www.josscrowcroft.com/2012/code/htaccess-for-html5-history-pushstate-url-routing/
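The rewrites in that article follow the standard HTML5 pushState fallback pattern; roughly (for Apache, assuming index.html is the single-page entry point):

RewriteEngine On
# Serve real files and directories as-is
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Everything else falls through to the app, which reads the URL and routes
RewriteRule ^ index.html [L]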