When can I trust my translated content in Vue? - vue.js

I'd like to verify my understanding of v-html and XSS concerns when using Vue I18n.
We have a few classes of content:
1. User supplied - a user adds text to an input and the data will be displayed as-is on a page
2. Translator supplied - a translator using a CMS is adding content to translation files and we then show those values
3. Engineer supplied - an engineer is adding HTML to a translation file
#1 is outright untrustworthy
#2 could be trustworthy, but would require review (we will have generated translation files that will get added to a PR for the normal code review & QA process)
#3 is just trustworthy? Is there a difference between an engineer adding HTML in a template vs. a translation file? Or does the mere fact that HTML could be in a translation file open us up to some sort of exploit?
There is a section on the Vue I18n documentation that says,
...if you configure the tag in a locale message, there is a
possibility of XSS vulnerabilities due to localizing with
v-html="$t('term')".
Is there an inherent XSS risk when using "Trusted Content" in a client side translation file?
Edit: After thinking about this, it becomes dicey real quick. If we allow v-html on a translated message because we know that, say, a hyperlink in it is OK, but the same message also displays something else like company_name, there could be bad content in there.
I did a little checking and it looks like Rails handles this differently. Even if a translation is marked as _html, the values supplied are still escaped, just the content itself is allowed to have html.
Example:
with name = "<b>Name</b>"
Rails -> foo_html: "hi {name} <u>underlined text</u>"
results in underlined text, but the name is not bold and the displayed content still has <b> around it (it ignores the html inside a supplied argument)
Vue I18n -> with v-html -> foo: "hi {name} <u>underlined text</u>"
the text is underlined, and the name is bolded, which is not safe

You are asking several questions
As long as content from #2 is audited for nefarious content, is it safe to use as raw html?
Yes and no. In theory, if you check every single message, know the context of every single message, and never have time constraints, so that people check every single message before each release, you are fine. In practice, people cut corners, or do not know that a particular message will be inserted as HTML, or do not realize that a string of malformed HTML can be converted by the browser into valid HTML that is actually nefarious. Someone might get access to your CMS and change a translation string that you didn't expect to be changed. Someone might forge a form submission in your CMS, if it is not configured right, by tricking an employee into visiting a URL.
Or does the mere fact that HTML could be in a translation file open us up to some sort of exploit?
v-html is just that. It places html unfiltered in your document.
What is that possibility, how does it work?
v-html with translated strings creates unnecessary risks and extra overhead by requiring a copywriter with extensive technical knowledge to check every single translation message, where instead you could have a copywriter without any technical knowledge do that. The only thing you need to do is use the pattern as outlined in the documentation, which allows anyone to change the translatable bits (which will be escaped), while keeping the html in your source control.
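The escaping behaviour the question attributes to Rails can be sketched in a few lines. This is a minimal Python illustration, not Vue I18n's or Rails' actual API: `render_message` is a hypothetical helper that treats the message template as trusted markup kept under source control and treats every interpolated value as untrusted.

```python
import html
import re


def render_message(template: str, **params: str) -> str:
    """Interpolate {name} placeholders, HTML-escaping every supplied value.

    Hypothetical helper for illustration only: the template is trusted,
    the parameters are not.
    """
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        # Escape the value so any markup inside it is displayed as text.
        return html.escape(str(params[name]))

    # A single pass over the template: escaped output is never re-scanned
    # for further placeholders.
    return re.sub(r"\{(\w+)\}", substitute, template)


message = "hi {name} <u>underlined text</u>"
rendered = render_message(message, name="<b>Name</b>")
print(rendered)
```

With the example from the question, the `<u>` from the template survives while the `<b>` in the supplied name is escaped and shown literally.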

Related

Implementing JSon-LD Schema in Ektron, is it possible?

This is my first time using Ektron and I'm trying to implement JSON-LD schema scripts for each page. I have 68 scripts to implement, each unique to its page.
I thought I would be able to implement these scripts through metadata, but now I'm unsure. Each script is over 1000 characters, while the HTML and meta tag types only allow 500 characters, so I'm assuming I'm in the wrong place. If anyone could shed some light, it would be much appreciated.
Ektron's metadata isn't intended for large chunks of data / content. So, yes, you will find limits there.
Here are two things you might try as workarounds.
Most direct:
Use the Ektron Library. Go to the Library tab and click on the Root node and view Properties. Add an extension to allow you to upload your JSON-LD as a file. Use metadata on the content item to reference the uploaded file. Combine the two upon output.
If you want the JSON-LD to be editable within the CMS...
Gaming the platform a bit
Create a new SmartForm definition and include in it a single plain-text, multi-line field (not Rich text). This should hold your JSON-LD. Set up a folder and, if your version supports it (you didn't specify CMS version, so I will assume relatively recent), set the folder to be non-searchable so these things don't come up in site search results. Add a restriction to the folder to only allow the Smart Form definition you just created. Create your JSON-LD there using the plain-text field. You should be able to store up to 1MB.
Same as above, add your JSON-LD as text then use a reference to this item from the content you want to use it.
The metadata in this case (and possibly in the library one, though I'd have to test and I don't have an Ektron environment for development anymore) will give you the Content ID for the object holding your JSON-LD. You'll have to make another API call, but that will give you the solution you appear to want from above.

Get output of a template call in a page from MediaWiki API

I am trying to parse a page on a wikia to get additional information from an Infobox Book template that is on the page. The problem is that I can only get the template's source instead of the transformed template on the page.
I'm using the following url as a base:
http://starwars.wikia.com/api.php?format=xml&action=expandtemplates&text={{Infobox%20Book}}&generatexml=1
The documentation doesn't really tell me how to point it to a specific page and parse the transformed template from the page. Is this even possible or do I need to parse it all myself?
To expand a template with the parameters from a given page, you will have to provide those parameters. There is no way for the API to know how the template is used in different pages (it could even be used twice!).
This works:
action=expandtemplates&text={{Infobox Book|book name=Lost Tribe of the Sith: Skyborn}}
You will, of course, have to keep adding all the parameters you want to parse (there are 14 in your example).
If you have templates that change automatically depending on which page they are on (that is not the case here), e.g. by making use of magic words such as {{PAGENAME}}, you can add &title=Lost_Tribe_of_the_Sith:_Skyborn to your API call to set the context the template should be expanded in.
If you do not know the parameters given, you can either:
1. Render the whole page with index.php?action=render&title=Lost_Tribe_of_the_Sith:_Skyborn, and parse the returned HTML to carve out the actual infobox
2. Fetch (action=query&prop=revisions) and parse the wikicode to get the parameters to the template, and supply them to the expandtemplates call
3. Start using an extension like Semantic MediaWiki, which allows you to treat your wiki more like a database
1 and 2 can go wrong in any number of ways, of course, as with a wiki you have, by definition, no way of knowing that the content is always entered in a consistent way.
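Building the expandtemplates request by hand gets tedious once there are 14 parameters, so here is a minimal Python sketch that assembles the template call and URL-encodes it. The endpoint is the one from the question; `template_call` and `expandtemplates_url` are hypothetical helper names, and the sketch does not escape `|` or `=` inside parameter values.

```python
from urllib.parse import urlencode

API_URL = "http://starwars.wikia.com/api.php"  # endpoint taken from the question


def template_call(name: str, params: dict) -> str:
    """Build wikitext such as {{Infobox Book|book name=...}}."""
    parts = [name] + [f"{key}={value}" for key, value in params.items()]
    return "{{" + "|".join(parts) + "}}"


def expandtemplates_url(name: str, params: dict) -> str:
    """Return the API URL that asks MediaWiki to expand the template call."""
    query = {
        "format": "xml",
        "action": "expandtemplates",
        "text": template_call(name, params),
    }
    # urlencode percent-encodes the braces, pipes and spaces in the wikitext.
    return API_URL + "?" + urlencode(query)


book = {"book name": "Lost Tribe of the Sith: Skyborn"}
wikitext = template_call("Infobox Book", book)
url = expandtemplates_url("Infobox Book", book)
print(url)
```

The `book` dictionary would be filled either by hand or from the parameters recovered by parsing the page's wikicode.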

Security - Protecting an insert statement from malicious code

I'm building a commenting system. The comment is sent to a stored procedure in SQL.
What is the best way to prevent html, script, or SQL queries to be injected into the table? I want to do this server-side.
For example:
INSERT INTO MyTable (UserID, Comment) VALUES (@UserID, @Comment)
What would be the best way to deal with the comment field and remove any potential HTML, scripts, or queries to prevent attacks? Or to drop the insert if it contains certain characters? Eventually I want the user to be able to insert a link though, which would render on the site as a clickable link...
Just new to this security stuff and obviously it's important.
Thank you so much.
Use parameterised statements (as you appear to be doing) with parameters for all variables and you have nothing to worry about from SQL injection.
HTML and JS injections are a concern to do with the page output phase, not database storage. Trying to do HTML escaping or validation in the database layer will be frustrating and fruitless: it's not the right place to be dealing with those concerns, you'll miss or mis-handle data, and the tools for string manipulation in SQL are weak.
Don't think in terms of detecting “attacks”, because blacklists will always fail. Instead aim to handle all text correctly, and then you'll be secure as a side effect of being accurate. Variable text that you drop into an HTML file needs to be HTML-escaped; variable text that you drop into a JavaScript string literal needs to be JS-escaped.
If you're using standard .NET templates, use the <%: syntax to HTML-escape text. Use that as your output tag instead of <%= and you'll be fine. Similarly, if you're using WebForms, use the controls whose Text property is automatically HTML-escaped. (Unfortunately this is inconsistent.) Where you have to generate markup directly, use HttpUtility.HtmlEncode explicitly.
Encoding for JavaScript string literals is a little trickier. There is HttpUtility.JavaScriptStringEncode, but JS strings commonly live inside HTML <script> blocks (making the </ sequence dangerous where it isn't in native JS), or in HTML inline event handlers (where you would need to JS-encode and then HTML-encode as well). It tends to be a better strategy to encode the data you want to send to JS in the DOM using regular HTML-escaping, for example in a data- attribute or an <input type="hidden">, and have the JS grab the value from the DOM.
If you really have to allow the user to input custom markup, then you'll need to filter it at input time to a small whitelist of approved elements and attributes. Use an existing HTML purifier library.
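The split described above (parameters at storage time, escaping at output time) can be sketched as follows. This is an illustration in Python with an in-memory SQLite database standing in for the SQL Server stored procedure, so the `?` placeholder syntax and the helper names `add_comment` and `render_comments` are assumptions, not the asker's stack.

```python
import html
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (UserID INTEGER, Comment TEXT)")


def add_comment(user_id: int, comment: str) -> None:
    # Placeholders keep the comment as data; it is never spliced into
    # the SQL text, so it cannot change the statement.
    conn.execute(
        "INSERT INTO MyTable (UserID, Comment) VALUES (?, ?)",
        (user_id, comment),
    )


def render_comments() -> str:
    # The text is stored unmodified and HTML-escaped only when it is
    # written into the page.
    rows = conn.execute("SELECT Comment FROM MyTable ORDER BY rowid").fetchall()
    return "\n".join(f"<p>{html.escape(comment)}</p>" for (comment,) in rows)


hostile = "<script>alert(1)</script>'); DROP TABLE MyTable; --"
add_comment(1, hostile)
stored = conn.execute("SELECT Comment FROM MyTable").fetchone()[0]
page = render_comments()
print(page)
```

The hostile string is stored byte for byte, the table survives, and the page shows the markup as visible text instead of executing it.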

Editing the head element on an old blog platform on a post-by-post basis. Is this impossible or am I missing something?

Sorry for being a total rookie.
I am trying to help my professor implement this advice:
Either as a courtesy to Forbes or a favor to yourself, you may want to include the rel="canonical" link element on your cross-posts. To do this, on the content you want to take the backseat in search engines, you add <link rel="canonical" href="URL"> in the head of the page. The URL should be for the content you want to be favored by search engines. Otherwise, search engines see duplicate content, grow confused, and then get upset. You can read more about the canonical tag here: http://www.mattcutts.com/blog/canonical-link-tag/. Have a great day!
The problem is I am having trouble figuring out how to edit the head element on a post-by-post basis. We are currently on a super old blogging platform (Movable Type 3.2 from 2005), so maybe it is not possible. But I'd like to know if that is likely the reason, so I'm not missing out on a workaround.
If anyone could point me in the right direction, I would greatly appreciate it!
Without knowing much about your installation, I'll give a general description, and hopefully it matches what you see and helps.
In Movable Type, each blog has a "Design" section where you can see and edit the templates for the blog. On this page, the templates that are published once are listed under "Index Templates," and the templates published multiple times, once per entry, per category, etc., are listed under "Archive Templates."
There probably is an archive template called "Entry" (could be renamed) publishing to a path like category/sub-category/entry-basename.php. This is the main template that publishes each entry. Click on this to open the template editor.
This template could be an entire HTML document, or it might have "includes" that look like <MTInclude module=""> or <$mt:Include module=""$> (MT supports varying tag styles).
You may find there is an included module that contains the <head> content, or it might just be right in that template. To "follow" the includes and see those templates, there should be links on the side of the included templates.
Once you find the <head> content, you can add a canonical link tag like this:
<mt:IfArchiveType type="Individual">
<mt:If tag="EntryPermalink">
<link rel="canonical" href="<$mt:EntryPermalink$>" />
</mt:If>
</mt:IfArchiveType>
Depending on your needs, you might want to customize this to output a specific URL structure for other types of content, like category listings. The above will just take care of telling search engines the preferred URL for each entry.
@Charlie: maybe I'm missing something, but your solution basically places a canonical link on each entry to… itself, which is a no-no for search engines (the link should point to another page that's considered the canonical one).
@user2359284: you need a way to define the canonical entry for those which need this link. As Shmuel suggested, either reuse an unused field or use a custom field plugin. Then you simply add that link in the header in the proper archive template that outputs your notes. Assuming that the Entry template includes the same header as other templates and, say, you're using the Keywords field to set the URL, the following code should work (the mt:IfArchiveType test simply ensures it's output in the proper context, which you don't need if your Entry template has its own code for the header):
<mt:IfArchiveType type="Individual">
<link rel="canonical" href="<$mt:EntryKeywords$>" />
</mt:IfArchiveType>

Which multilingual web design solution is fastest for the user, if this is indeed an issue?

Context:
I'm in the design phase of what I'm hoping will be a big website (lots of traffic, lots of users reading and writing to database).
I want to offer this website in the three languages I speak myself (English, French, and by the time I finish the website, I will hopefully have learned enough Spanish to offer that too)
Dilemma:
I'm wondering how I should go about offering these various languages (and perhaps more in the future).
Criteria:
Many methods exist for designing multi-language websites. I'm looking for the technique that will result in a faster browsing experience for the user.
Choices:
Currently, I can think of (and have read about) the following choices. They are sorted in order of preference up to now.
Store all language-specific strings in a database and fetch the right one depending on preferred language (members can choose which language they prefer), browser default language, and which language is selected during the current session, in that order.
Pros:
Most of the time, a single test at the beginning of the session confirms which language to use for the remainder of the session (stored in a SESSION variable). Otherwise, a user logging in also fetches the right language and keeps it until he/she logs out (no further tests). So the testing part should be pretty fast.
Cons:
I'm afraid that accessing the database all the time would be quite time-consuming (longer page load for the user), especially considering that lots of users could also be accessing the database at the same time for the same reason (getting the website text in the correct language), but also for posting comments and such.
Strings which include variables (e.g. "Hello " + user.name + ", how are you?") are harder to store because the variable (e.g. user name) changes for each user.
A direct link to a portal for a specific language would be ugly (e.g. www.site.com?lang=es)
Store all language-specific strings in a text file and fetch the right one depending on preferred language (members can choose which language they prefer), browser default language, and which language is selected during the current session, in that order.
Pros:
Most of the time, a single test at the beginning of the session confirms which language to use for the remainder of the session (stored in a SESSION variable). Otherwise, a user logging in also fetches the right language and keeps it until he/she logs out (no further tests). So the testing part should be pretty fast.
Cons:
I'm afraid that accessing the text file all the time would be quite time-consuming (longer page load for the user), especially considering that lots of users could also be accessing the file at the same time for the same reason (getting the website text in the correct language).
Strings which include variables (e.g. "Hello " + user.name + ", how are you?") are harder to store because the variable (e.g. user name) changes for each user.
I don't think multiple users could access the text file concurrently, though I may be wrong. If that's the case though, every user loading a page would have to wait for his/her turn to access the text file.
Fetching the very last string of the text file could take pretty long...
A direct link to a portal for a specific language would be ugly (e.g. www.site.com?lang=es)
Creating multiple versions of the website in separate folders, where each version is in a different language.
Pros:
No extra-treatment is needed for handling languages, so no extra waiting time.
Cons:
Maintaining the website will be like going to school: painful, long, makes you stupid after doing the same thing over and over again.
ugly url (e.g. www.site.com/es/ instead of www.site.com)
Additionally, the choices above could be combined with one or more of the following techniques:
Caching certain frequently requested pages (in a singleton or static PHP function?). Certain sentences could also be cached for every language.
Pros
Quicker access for frequently-requested pages.
Which pages need caching can be determined dynamically, with time.
Cons
I'm not sure about this one, but would this end up bloating the server's RAM?
Rewriting the URL could be used for many things.
A user looking for direct access to one language could do so using www.site.com/fr/somefile and would be redirected to www.site.com/somefile, but with the selected language being stored in a session variable.
Pros
Search engines like this because they have two different pages to show for two different languages
Cons
Bookmarking a page doesn't mean you'll end up with the right language when you come back, unless I put the language information in the URL (www.site.com/somefile?lang=fr)
A little more info
I usually use the following technologies to make a website:
PHP
SQL
XHTML
CSS
Javascript (and AJAX)
This being said, if a solution requires that I learn a new language or something, I'm very open to doing so. I have no deadline for this project and I do intend to learn a lot from doing it!
Conclusion:
What I'm looking for is a method that allows me to offer multiple languages while not increasing page load time and not going crazy when trying to maintain the website. If you guys/gals have other ideas I should consider, I will try adding them to my list. Another possibility is that I'm overdoing this. Maybe I won't gain enough time with these methods for it all to be worth it; I just don't know how to verify whether I need to worry about this or not. If you have any ideas for that, it would also help me.
Whether you use a database or a filesystem to store the translations, you should be loading the text all at once and then serving it from memory. Most applications will typically not have so much text that this becomes a problem. In Java or .NET this could be accomplished by storing the text in a singleton or static object. Then all the strings are in RAM and do not need to be loaded or parsed on each request. If your platform does not have a convenient way to keep data in RAM, you could run a separate caching application such as memcached.
The rest of your concerns can be mitigated by hiding the details. Build or find a framework that lets you load your translations and then look them up by some key. If you decide to switch to files or a database later, the rest of your code is unaffected. In the short term do whichever is easier for you. I've found that it's best to have a mix: it's easier to manage application text along with the source code in a version control system. But some text changes often, or needs to change without requiring a build+deployment cycle, and that text should be in the DB.
Finally, don't build strings by concatenating fragments around the substituted values. Use some kind of format string with placeholders, because otherwise your translators will go crazy trying to translate sentence fragments.
(Warning: Java code sample)
//WRONG
String msg = "Hello, " + username + ", welcome back.";
//RIGHT
String fmt = "Hello, %s, welcome back."; // in real code: load this string from a file or the db
String msg = String.format(fmt, username); // String.format is static: the format string is the first argument
Another person mentioned encoding the language in the URL. This is the preferred way to do it if you care what a search engine thinks of your site. Google recommends using different hostnames or a different subdirectory. This means that the language headers sent by the user can't be used for anything, except perhaps initially sending them to one landing page or another. You will need to determine the language for each request based on the incoming URL (this actually simplifies your code a lot later on). In Java I'd store the language code in the Request and just grab it whenever I need it.
The easiest way to handle language codes in the URL is to use rewriting. A client sends a request for www.yoursite.com/de/somepage and internally you rewrite the request to www.yoursite.com/somepage and store the language identifier somewhere. In Java each request has an HttpServletRequest object where you can store attributes for the lifecycle of the request. If your framework doesn't have anything like that, you can just add a parameter to the URL: www.yoursite.com/de/somepage => www.yoursite.com/somepage?lang=de. If you are using hostname-based languages, you can use hostnames such as de.yoursite.com or www.yoursite.de. There are pros and cons to this approach. For one thing, using country-code TLDs means registering additional domains and trying to figure out whether a country code is appropriate to represent a language (it's often not). Using different hostnames/domains means you have to consider under what domains cookies are stored. If you want a cookie-free subdomain you need to plan this carefully. But from the coding side a language-based hostname doesn't need any additional rewriting; you can read the hostname (it's the Host header in the HTTP request) and parse that to determine the language.
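Both ways of reading the language described above can be sketched in a few lines of Python. This is an illustration only: the set of supported languages, the default, and the function names `split_language` and `language_from_host` are hypothetical, and a real site would normally do the rewrite in the web server or framework router.

```python
SUPPORTED = {"en", "fr", "es", "de"}  # hypothetical set of site languages
DEFAULT = "en"


def split_language(path: str) -> tuple:
    """Split '/de/somepage' into ('de', '/somepage').

    Paths without a known language prefix are returned unchanged
    together with the default language.
    """
    segments = path.lstrip("/").split("/", 1)
    if segments[0] in SUPPORTED:
        rest = segments[1] if len(segments) > 1 else ""
        return segments[0], "/" + rest
    return DEFAULT, path


def language_from_host(host: str) -> str:
    """Read the language from a Host header such as 'de.yoursite.com'."""
    # Drop an optional port, then look at the left-most label only.
    label = host.split(":")[0].split(".")[0].lower()
    return label if label in SUPPORTED else DEFAULT


print(split_language("/de/somepage"))
print(language_from_host("de.yoursite.com"))
```

Either function gives the request handler a language code plus, for the path variant, the internal path to route on.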
Offer the initial page in a language depending on the Accept-Language HTTP header.
Let the user set the language in the current session and, if they're authenticated, in their user profile.
In your code and templates, mark strings as "translatable." You should have tools that gather all the strings from your codebase and let your translators translate them.
Have a layer which loads the translations from the database either individually or as a bundle, and apply them to the page which is loading. Cache these parts to make them fast -- every page load shouldn't make a hundred calls to the database for every translatable string.
Check out how Django does it -- it should be enlightening.
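The first step in the list above, picking the initial language from the Accept-Language header, can be sketched like this. It is a simplified Python illustration, not Django's implementation: `pick_language` is a hypothetical name, and it matches only on the primary subtag (so `fr-CA` is treated as `fr`).

```python
def pick_language(accept_language: str, supported: list, default: str) -> str:
    """Return the best supported language for an Accept-Language header."""
    candidates = []
    for item in accept_language.split(","):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # a missing q parameter means q=1
        for param in parts[1:]:
            key, _, value = param.strip().partition("=")
            if key == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # unparsable weight: treat as not acceptable
        candidates.append((quality, tag))

    # Highest weight first; the sort is stable, so ties keep header order.
    for quality, tag in sorted(candidates, key=lambda c: -c[0]):
        primary = tag.split("-")[0]
        if quality > 0 and primary in supported:
            return primary
    return default


site_languages = ["en", "fr", "es"]
print(pick_language("fr-CA,fr;q=0.9,en;q=0.8", site_languages, "en"))
```

The result would only seed the session; the user's explicit choice, stored in the session or profile, still takes precedence.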
"I'm afraid that accessing [the database/text file] all the time would be quite time-consuming"
It would be, but that's why you'd likely be using caching to some extent. Nearly all large sites are accessing data stored outside the HTML page itself and, as such, utilize caching techniques as needed.
Your question regarding speed really is irrelevant to having multiple languages. It's an issue of storing data (content) so it's easy to maintain and present to the user. Whether it's one language or 10 the problem is the same.
Create the most generic form of the site that you can. Import the translations from a database, with fallback, i.e. an order of languages: if a translation does not exist, use the next-best language (for German: German, Dutch, English, etc.).
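A fallback order like the German example can be sketched as a lookup over an in-memory catalog. This is a minimal Python illustration under stated assumptions: the catalog contents, the `FALLBACKS` table, and the `translate` helper are all hypothetical, and a real site would fill the catalog once from its database or files rather than hard-coding it.

```python
# Hypothetical catalog, loaded once at startup in a real application.
CATALOG = {
    "de": {"greeting": "Hallo, {name}!"},
    "nl": {"greeting": "Hallo, {name}!", "logout": "Uitloggen"},
    "en": {"greeting": "Hello, {name}!", "logout": "Log out", "help": "Help"},
}

# Order in which languages are tried for each requested language.
FALLBACKS = {"de": ["de", "nl", "en"]}


def translate(key: str, language: str, **values: str) -> str:
    """Look up key in the requested language, then in its fallbacks."""
    for candidate in FALLBACKS.get(language, [language, "en"]):
        text = CATALOG.get(candidate, {}).get(key)
        if text is not None:
            # Whole-sentence format strings keep the text translatable.
            return text.format(**values)
    return key  # last resort: show the key so the gap is visible


print(translate("greeting", "de", name="Ana"))
print(translate("logout", "de"))
print(translate("help", "de"))
```

Returning the key when nothing matches is one possible design choice; raising an error or logging the miss would work as well.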
You would solve performance issues by keeping caches of the dynamically created pages. [Check the dependent data and update if necessary]
The preferred language of a user is passed along in the HTTP request headers. Having a language selector plus query string would often be unnecessary.
Resource files would be one way to go. They are easier to send to translators. However, they can be difficult to reuse among multiple websites.
Databases are convenient because the database is the first thing that should be backed up on a website. They also have the benefit of being fast. However, if you have an extremely database-focused project, you may not want to put additional strain on your database.
For my solutions I want this:
The language should be indicated in the URL; it works better with Google indexing the page and with people following the links in Google's search results.
As much pre-generated translations as possible, for faster page-serving.
The first is quite easily done by having an URL like http://example.com/fr/and-so-on. URL rewriting can turn that into http://example.com/and-so-on?lang=fr which is potentially easier to handle.
For pre-generating translations, it is good to use an HTML template framework so you can generate translated templates from one set of source templates. A blunt approach is to generate a sed script from a language key-value file, and run that sed script on each template to get a translated version.
What remains then is to translate the dynamically generated parts of the pages. There are a few tools for that: Java has resource bundles, and GNU gettext is quite a nice tool.