I have created a Rails application that displays names of any kind: a name can belong to a person, a car, a vegetable, any material.
I wanted to include ingredient names like Crème Fraîche. Whenever I copy such a name from another web page and store it in my database, it is stored properly.
While displaying the name on a web page, though, I get strange characters like Cr�me Fra�che.
I have set the charset to UTF-8, but it still displays the name like this.
I checked my database and the name is stored properly, but on the page and in irb it displays like this.
I have spent nearly five days searching for a solution to this problem without success.
I hope this time I will get some help
Thanks in advance
Pankaj
Include the following meta tag in the <head> section of your page:
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1" />
This tells browsers to use ISO-8859-1 encoding to interpret the text on the page.
Even with the correct encoding declared in the web page, characters can still display corrupted on a machine where the required fonts are not installed, and obviously you cannot be sure what fonts are installed on the machines viewing your page.
Browsers may also need to be configured. For example, Firefox needs to be told in the Tools/Options/Content section to use the above character set by default.
To be absolutely certain your characters will display correctly, use the unambiguous HTML entity for each non-standard character, e.g. instead of Crème use Cr&egrave;me (the entity &egrave; renders as è).
This can have implications for web searches and for ordering functions in your database. You could store the text as Crème in your database and pass everything through a conversion function before delivering it to the HTML page (this will obviously introduce a slight delay in serving the page). You might also consider keeping two versions of your data: a raw version and a display version. New data would then pass through the conversion function, the converted version would be stored, and the converted data would be written to the HTML page.
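As a minimal sketch of such a conversion function, here in Ruby (the helper name and approach are illustrative, not part of the original answer):

    # Hypothetical helper: replace each non-ASCII character with its
    # numeric HTML entity so it displays regardless of page charset.
    def to_html_entities(text)
      text.each_char.map { |ch| ch.ord < 128 ? ch : "&##{ch.ord};" }.join
    end

    to_html_entities("Crème Fraîche")
    # => "Cr&#232;me Fra&#238;che"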
I'd like to verify my understanding of v-html and XSS concerns when using Vue I18n.
We have a few classes of content:
User supplied - a user adds text to an input and the data will be displayed as-is on a page
Translator supplied - a translator using a CMS is adding content to translation files and we then show those values
Engineer supplied - an engineer is adding html to a translation file
#1 is outright untrustworthy
#2 could be trustworthy, but would require review (we will have generated translation files that will get added to a PR for the normal code review & QA process)
Is #3 just trustworthy? Is there a difference between an engineer adding HTML in a template vs. a translation file? Or does the mere fact that HTML could be in a translation file open us up to some sort of exploit?
There is a section in the Vue I18n documentation that says:
...if you configure the tag in a locale message, there is a possibility of XSS vulnerabilities due to localizing with v-html="$t('term')".
Is there an inherent XSS risk when using "Trusted Content" in a client side translation file?
Edit: After thinking about this, it becomes dicey real quick. If we allow v-html on a section of translation because we know that, say, a hyperlink is OK, but we also interpolate something else like company_name, there could be bad content in there.
I did a little checking and it looks like Rails handles this differently. Even if a translation key is marked as _html, the values supplied as arguments are still escaped; only the translation content itself is allowed to contain HTML.
Example, with name = "<b>Name</b>":
Rails -> foo_html: "hi %{name} <u>underlined text</u>"
This results in underlined text, but the name is not bold and the displayed content still shows the literal <b> tags (Rails escapes HTML inside a supplied argument).
Vue I18n -> with v-html -> foo: "hi {name} <u>underlined text</u>"
Here the text is underlined and the name is bolded, which is not safe.
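A minimal Rails sketch of that behavior (the locale key and view call are illustrative):

    # config/locales/en.yml (illustrative):
    # en:
    #   foo_html: "hi %{name} <u>underlined text</u>"

    # Keys ending in _html are rendered as HTML, but interpolated
    # arguments are still escaped:
    t("foo_html", name: "<b>Name</b>")
    # renders: hi &lt;b&gt;Name&lt;/b&gt; <u>underlined text</u>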
You are asking several questions
As long as content from #2 is audited for nefarious content, is it safe to use as raw html?
Yes and no. In theory, if you check every single message, know the context of every single message, and never have time constraints (so people actually check every message before each release), you are fine. In practice, people cut corners, or do not know that a particular message will be inserted as HTML, or do not understand how some string might be malformed HTML that the browser nonetheless converts to valid HTML that is actually nefarious. Someone might get access to your CMS and change a translation string you didn't expect to be changed. Someone might forge a form submission in your CMS, if it is not configured correctly, by tricking an employee into visiting a URL.
Or is the mere fact that html could be in a translation file open us up to some sort of exploit?
v-html is just that: it places HTML unfiltered in your document.
What is that possibility, how does it work?
v-html with translated strings creates unnecessary risk and extra overhead: it requires a copywriter with extensive technical knowledge to check every single translation message, where instead you could have a copywriter without any technical knowledge do that. The only thing you need to do is use the pattern outlined in the documentation, which lets anyone change the translatable bits (which will be escaped) while keeping the HTML in your source control.
I am trying to parse a page on a wikia to get additional information for an Infobox Book template that is on the page. The problem is that I can only get the template's source instead of the transformed template on the page.
I'm using the following url as a base:
http://starwars.wikia.com/api.php?format=xml&action=expandtemplates&text={{Infobox%20Book}}&generatexml=1
The documentation doesn't really tell me how to point it at a specific page and parse the transformed template from that page. Is this even possible, or do I need to parse it all myself?
To expand a template with the parameters from a given page, you will have to provide those parameters. There is no way for the API to know how the template is used on different pages (it could even be used twice!).
This works:
action=expandtemplates&text={{Infobox Book|book name=Lost Tribe of the Sith: Skyborn}}
You will, of course, have to keep adding all the parameters you want to parse (there are 14 in your example).
If you have templates that change automatically depending on which page they are (that is not the case here), e.g. by making use of magic words such as {{PAGENAME}}, you can add &page=Lost_Tribe_of_the_Sith:_Skyborn to your API call, to set the context the template should be expanded in.
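As a minimal Ruby sketch of such a call (the endpoint and parameters are taken from the question; error handling is omitted):

    require "net/http"
    require "uri"

    # Expand the template with explicit parameters, as above.
    uri = URI("http://starwars.wikia.com/api.php")
    uri.query = URI.encode_www_form(
      format: "xml",
      action: "expandtemplates",
      text:   "{{Infobox Book|book name=Lost Tribe of the Sith: Skyborn}}"
    )
    puts Net::HTTP.get(uri)  # raw XML containing the expanded wikitext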
If you do not know the parameters given, you can either:
Render the whole page with index.php?action=render&title=Lost_Tribe_of_the_Sith:_Skyborn, and parse the returned html to carve out the actual infobox
Fetch (action=query&prop=revisions) and parse the wikicode to get the parameters to the template, and supply them to the expandtemplates call
Start using an extension like Semantic MediaWiki, that allows you to treat your wiki more like a database
Options 1 and 2 can go wrong in any number of ways, of course, since with a wiki you have, by definition, no guarantee that content is always entered in a consistent way.
I have a page tab app that I am hosting. I have both http and https supported. While I receive a signed_request package as expected, after I decode it, it does not contain page information. That data is simply missing.
I verified that matching schemes are being used (https) among Facebook, my hosted site, and even the 'go-between', Facebook's static page handler.
I also created a new application with page tab support but got the same result: simply no page information in the signed_request.
Any other causes people can think of?
I add the app to the page tab using this link:
https://www.facebook.com/dialog/pagetab?app_id=176236832519816&next=https://www.intelligantt.com/Facebook/application.html
Here is the page tab I am using (Note: requires permissions):
https://www.facebook.com/pages/School-Auction-Test-2/154869721351873?id=154869721351873&sk=app_176236832519816
Here is the decoded signed_request I am receiving:
{"algorithm":"HMAC-SHA256","code":!REMOVED!,"issued_at":1369384264,"user_id":"1218470256"}
5/25 Update: I thought maybe the canvas app URLs didn't match the page tab URLs, so I spent several hours going through scenarios where both had a trailing slash or not, where both had a trailing ? or not, and with query parameters or not.
I also tried changing the 'next' value when creating the page tab to the canvas app URL and to the page tab URL.
No success on either count.
I did read that seeing the 'code' value in the signed_request means Facebook either couldn't match my URLs or that I'm capturing the second request. However, given all the URL permutations I went through, I believe the URLs match. I also subscribed to 'auth.authResponseChange', which should give me the very first authResponse, which should contain the signed_request with page.id in it (but doesn't).
If I had any reputation, I'd add a bounty to this.
Thanks.
I've just spent ~5 hours on this exact same problem and posted a prior answer that was incorrect. Here's the deal:
As you pointed out, signed_request appears to be missing the page data if your tab is implemented in pure JavaScript as a static HTML page (with a *.htm extension).
I repeated the exact same test, on the exact same page, but wrapped my HTML page (including the JS) within a Perl script (with a *.cgi extension)... and voilà, signed_request has the page info.
Although confusing (and this should be better documented by Facebook as a design choice), it may make some sense: it would be impossible to validate the signed_request wholly within JavaScript without placing your secret key in scope (and therefore revealing it to a potential attacker).
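For reference, a minimal server-side sketch of that validation, here in Ruby (the secret and helper name are illustrative; the scheme is Facebook's documented HMAC-SHA256 over the base64url-encoded payload):

    require "openssl"
    require "base64"
    require "json"

    APP_SECRET = "your-app-secret"  # hypothetical; never expose client-side

    def decode_signed_request(signed_request)
      sig_b64, payload_b64 = signed_request.split(".", 2)
      sig      = Base64.urlsafe_decode64(sig_b64)
      expected = OpenSSL::HMAC.digest(OpenSSL::Digest.new("SHA256"), APP_SECRET, payload_b64)
      # (use a constant-time comparison in production)
      raise "bad signature" unless sig == expected
      JSON.parse(Base64.urlsafe_decode64(payload_b64))
    end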
It would be much easier with the PHP SDK, but if you just want to use JavaScript, maybe this will help:
Facebook Registration - Reading the data/signed request with Javascript
Also, you may want to check out this: https://github.com/diulama/js-facebook-signed-request
Simply put, you can't get the full params with the JavaScript signed_request; use the PHP SDK to get the full signed_request, and record the values you need into JavaScript variables.
With the PHP SDK, after instantiation, use the facebook object as follows:

    $signed_request = $facebook->getSignedRequest();
    var_dump($signed_request);

This is just for debugging, but you'll see that the printed array contains many values that you won't get with the JS SDK, for security reasons.
Hope that helps anyone who needs it; it seems this issue takes at least 3 hours for everyone who runs into it.
As an example, the Rails parameterize method would create a string like so:
"hello-there-joe-smith" == "Hello There Joe.Smith".parameterize
For legacy reasons, a project I am working on requires uppercase letters as well as periods to be available in a particular URL parameter.
Why would this ever be a problem?
Clarification
The URL type I'm talking about is what is used instead of an id, commonly known as a slug.
Would a Rails app with the following URL run into any issues: http://example.com/Smith.Joe?
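For context, a minimal sketch of such a slug in Rails (the model and column names are illustrative):

    class Person < ApplicationRecord
      # Use the stored slug (e.g. "Smith.Joe") instead of the id in URLs.
      def to_param
        slug
      end
    end

    # In the controller, look records up by slug instead of id:
    # Person.find_by!(slug: params[:id])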
This will be a problem both in terms of SEO and browser caching (and hence performance).
Search engines are case sensitive, so the same URL in different case will be treated as two different URLs.
Caching in browsers like IE is case sensitive, so e.g. if you access your page as MYPAGE.aspx and somewhere in code you write it as mypage.aspx, IE will treat them as two different pages and fetch from the server instead of from the cache.
Dashes should be fine, but underscores should be avoided: http://www.mattcutts.com/blog/dashes-vs-underscores/
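One way to keep mixed-case slugs while avoiding duplicate URLs is to redirect any case variant to the stored canonical form; a rough Rails sketch (controller, model, and column names are illustrative):

    class PeopleController < ApplicationController
      def show
        @person = Person.find_by!("LOWER(slug) = ?", params[:id].downcase)
        # Send case variants to the single canonical URL.
        if @person.slug != params[:id]
          redirect_to person_path(@person), status: :moved_permanently
        end
      end
    end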
I'm guessing a site like Stack Overflow doesn't keep an HTML file around for every question ever asked. Instead, server-side code creates the page every time a question is clicked on (I think). Is it possible for search engines to index every question on Stack Overflow, or would a page per question need to be kept in the directory so the search engine can crawl it?
Yes. Search engines can index dynamically generated pages no problem. In fact, from the search engine bot's perspective, it can't really even distinguish between a dynamically generated page and a static one.
You might be interested in the Dynamic URLs vs. static URLs post on the Official Google Webmaster Central Blog.
Yes, it's perfectly possible - when a link is followed, the server returns HTML just like for any other web page. The only difference is that the server generated it, rather than a person.
As far as the client (be it a browser or search engine) is concerned, there is no difference between a server-generated page and a static file. They're virtually indistinguishable (depending on how the page is generated, it might be missing Last-Modified headers, etc). As such, yes, search engines can index generated pages without a problem.
That said, there is something to be said for giving them a hint. Using sitemaps, for example, gives a search engine a nice listing of all your pages, so it's less likely to miss them. More importantly, it can summarize last modified times, to focus the search engine's attention on what has changed recently. This isn't mandatory, but it does help - regardless of whether the pages are static HTML or generated.
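A minimal Ruby sketch of generating such a sitemap (the URLs and timestamps are placeholders; real entries would come from your database):

    require "time"

    # [url, last_modified] pairs; illustrative data.
    pages = [
      ["http://example.com/questions/1", Time.now],
      ["http://example.com/questions/2", Time.now - 86_400],
    ]

    entries = pages.map do |url, lastmod|
      "  <url><loc>#{url}</loc><lastmod>#{lastmod.utc.iso8601}</lastmod></url>"
    end

    sitemap = <<~XML
      <?xml version="1.0" encoding="UTF-8"?>
      <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      #{entries.join("\n")}
      </urlset>
    XML

    File.write("sitemap.xml", sitemap)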
Any link that uses a GET can be followed by most crawlers. Anything that requires a POST will generally be ignored.
The mechanism for generating the page is irrelevant.
Yes, as long as it is not restricted by robots.txt or meta tags. A search engine requests a web page like a normal user; no one has access to the server-side code (unless your site is hacked).
Search engines can see pretty much anything on a given Web page that isn't hidden behind client-side code (i.e., JavaScript).
So, if there's a URL that you can enter into your browser's address bar to get this page, and this page is linked to from somewhere, a search engine will find it and "see" the same content that you do. The fact that the page was generated dynamically by a server is irrelevant to a search engine, since what is sent to a browser upon requesting a URL is still just an HTML file.
In other words, that HTML file doesn't exist in the same form on the server - i.e., it's actually server-side code that generates HTML, not a static HTML file - but that's not what a search engine crawls and indexes; it follows links to document URLs that are exactly what you see in your browser's address bar.