Unicode Blog Article title through API - Shopify

I am having an issue with the Shopify API when creating a new Article object for a Blog. I have an app that imports merchants' third-party blogs into Shopify. Some of these blogs are non-English, so they contain plenty of Unicode characters. I can encode the body of these posts just fine using XML character references, but I cannot encode the titles. For example, here is a sample blog in Russian imported into a Shopify test store: http://heller-sawayn5574.myshopify.com/blogs/unicode (original blog here: http://recen-zist.livejournal.com/)
You can see that the body is displayed correctly, but the title is not and appears garbled. Shopify takes the XML-encoded characters and displays them in the title literally, instead of converting them back to Unicode as it does in the body of the article.
If I log into the store admin and change the title manually to include Unicode characters, it displays correctly, so the issue only happens when creating the blog post via the API. However, any post created with Unicode characters in the title can then no longer be edited through the API because of this error.

I found a workaround by not XML-encoding the title characters and instead leaving them as UTF-8. I was using the Python Shopify API, with which sending UTF-8 was not previously possible, forcing me to encode using XML. For background on the problem, see this discussion:
https://groups.google.com/forum/?hl=en&fromgroups=#!topic/shopify-app-discuss/T5gee1A_2lE
The workaround is to update the pyactiveresource dependency to version 1.0.2.
There is still the question of why XML character encoding works for the blog post body but not for the blog title. But as long as there's another way to do it, this shouldn't matter much.
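One possible explanation for the body/title difference, offered only as a hypothesis: if the client library XML-escapes each field value when it serializes the request (an assumption about its internals), hand-made character references get escaped a second time and reach Shopify as literal text. The body might still look right because it is rendered as HTML, where a leftover `&#1055;` displays as a character, while the title is shown as plain text. The sketch below uses Python's standard `xml.etree.ElementTree` as a stand-in for that serializer; the `<article>`/`<title>` layout and the helper names are simplified, hypothetical choices, not Shopify's actual payload.

```python
import xml.etree.ElementTree as ET


def article_xml(title):
    """Serialize a minimal <article><title> document (hypothetical layout).

    ElementTree escapes "&", "<" and ">" in text, as any XML serializer must.
    """
    article = ET.Element("article")
    ET.SubElement(article, "title").text = title
    return ET.tostring(article, encoding="utf-8")


def title_seen_by_server(payload):
    """Parse the payload back, as the receiving XML parser would."""
    return ET.fromstring(payload).find("title").text


title = "Привет"  # Russian for "hello"
# The question's approach: replace non-ASCII characters with XML character references.
pre_encoded = title.encode("ascii", "xmlcharrefreplace").decode("ascii")

# The "&" of each reference is escaped again, so the receiver gets the literal "&#1055;..." text.
print(title_seen_by_server(article_xml(pre_encoded)))
# Plain Unicode text survives the round trip unchanged (the UTF-8 workaround).
print(title_seen_by_server(article_xml(title)))
```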

Related

How to access all text from a website, including the a tag?

I'm trying to extract all the article text from the following site:
https://www.phonearena.com/reviews/Samsung-Galaxy-S9-Plus-Review_id4494
I tried findAll(text=True), but it extracts a lot of useless information.
So I tried findAll(text=True, recursive=False), but it ignores text data in certain tags like <a>. What's the most effective way of extracting the text in this case?
The website appears to build its content with JavaScript: the body content is loaded after requests has already retrieved the HTTP response, so you need to simulate a real browser page request. This is possible with the Python Selenium WebDriver module.
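Once the rendered HTML is available (for example from Selenium's `driver.page_source`), text nested in inline tags such as `<a>` can be kept by collecting every piece of text inside a paragraph rather than only its direct children, which is what `recursive=False` restricts. Below is a minimal sketch using only the standard library's `html.parser`, assuming the article text lives in `<p>` elements (an assumption about the page's markup); the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class ParagraphTextExtractor(HTMLParser):
    """Collect the full text of each <p>, including text nested in <a>, <b>, etc."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = 0   # depth of currently open <p> elements
        self._skip = 0   # depth of currently open <script>/<style> elements
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip += 1
        elif tag == "p":
            self._in_p += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip:
            self._skip -= 1
        elif tag == "p" and self._in_p:
            self._in_p -= 1
            if self._in_p == 0:
                # Collapse runs of whitespace left over from the markup.
                text = " ".join("".join(self._buf).split())
                if text:
                    self.paragraphs.append(text)
                self._buf = []

    def handle_data(self, data):
        # Keep text from any descendant of <p>, not just its direct children.
        if self._in_p and not self._skip:
            self._buf.append(data)


def extract_paragraphs(html):
    parser = ParagraphTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs
```

Feeding it `driver.page_source` after the page has loaded would return one string per paragraph, with link text included.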

How should I provide DocuSign with a PDF?

We're using Python and the Requests library to add PDFs to DocuSign envelopes using the Add document method of the REST API v2:
    response = requests.put(
        '<base URL>/envelopes/<envelope ID>/documents/<next document ID>',
        files={'document': <the PDF file object>},  # <- added to the request's body
        headers=self._get_headers(
            {
                'Content-Disposition': 'document; filename="the-file.pdf";'
            }
        ),
        timeout=60
    )
This has worked for us in most cases, except that about 1 in 100 PDFs isn't accepted via the API. When this problem occurs, we tell our users to upload the PDFs directly through the DocuSign UI, which works. This prompted us (with the help of support) to look at the Document params link that appears above the example request on the Add document page linked above. That page shows a documentBase64 attribute and a number of other fields. How can I supply the document in this format, with all the fields specified? Should I replace the file in my call above with files={'document': <JSON-encoded object>}, or something else? I can't figure out how to add a document OTHER than the way we're currently doing it. Is there another way I'm missing?
It looks like there are now two different ways to add a document to a draft envelope with the REST API:
1. Use a multipart request, where the first part contains the JSON body and each subsequent part contains a document's bytes in un-encoded format. An example of this approach is shown on pages 136-137 of the REST API guide (http://www.docusign.com/sites/default/files/REST_API_Guide_v2.pdf).
2. Use a normal (non-multipart) request, and supply the document bytes in base64-encoded format as the value of the documentBase64 property for each document object in the request. (This looks to be new as of the recent December 2013 API release.)
Based on the info in your question, I suspect you're currently using approach #1. As described above, the main difference between the two approaches is the general structure of the request; in addition, approach #1 expects document bytes to be un-encoded, while approach #2 expects them to be base64-encoded. I suspect your issue has to do with file encoding: if you're using approach #1 and any of the files are encoded, you'll likely have issues.
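For approach #2, the document travels inside an ordinary JSON body instead of a `files=` multipart part. The Python sketch below only builds that body. `documentBase64` comes from the Document params mentioned in the question; `documentId` and `name` are assumed DocuSign Document fields, and the wrapping `"documents"` list and the helper name are assumptions about the request shape, so check them against the API reference before relying on them.

```python
import base64
import json


def build_documents_payload(pdf_bytes, name, document_id):
    """Build a JSON-ready body carrying one PDF as base64 (approach #2).

    Hypothetical helper: the "documents" wrapper and the documentId/name
    field names are assumptions; only documentBase64 comes from the question.
    """
    # Approach #2 expects base64 text, so encode the raw PDF bytes here.
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return {
        "documents": [
            {
                "documentId": str(document_id),
                "name": name,
                "documentBase64": encoded,
            }
        ]
    }


# Usage sketch: in real code the bytes would come from open(path, "rb").read(),
# binary mode, so nothing re-encodes them before base64.
payload = build_documents_payload(b"%PDF-1.4 ...", "the-file.pdf", 1)
body = json.dumps(payload)
```

With Requests, such a body could be sent as `requests.put(url, json=payload, headers=..., timeout=60)`, where the target URL (for example an envelope's documents collection) is an assumption to verify. Dropping `files=` also removes the multipart wrapping, which is the structural difference between the two approaches described above.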

Posting an image on Tumblr using OAuth

I'm trying to post an image on Tumblr using OAuth. I'm using Objective-C, but a general solution would be very helpful as well.
I can post an image that is online (using the "source" parameter). However, I can't post an image from the client (using the "data" parameter), and a few issues confuse me.
1) Should I use "multipart/form-data" or "application/x-www-form-urlencoded"? I've seen conflicting claims on this issue.
2) What should I put in my post body, and what should I put in my base string? According to the OAuth specification, if I use "multipart/form-data" then I don't need to add the parameters that I put in the post body (like "type" and "caption") to the base string, but even when I succeeded in posting with "source", it only worked if I added those parameters to the base string as well.
3) In what format should I add the image to the body? If I need to also add it to the base string, then in what format should I add it there?
Thanks!
The problem is Tumblr's "unusual" implementation of OAuth (OAuth issues). It's not likely to be possible with the "old" OAuth library (from code.google.com/p/oauth/). I ended up using one of two solutions: crossbreeding OAuth with ASIFormDataRequest (which is not likely to work with multiple images), or integrating TumblrUploadr. Tumblr is also likely to work better with the new OAuth library (from here), but that library will definitely conflict with ShareKit if you use it.
Concerning your questions:
1) TumblrUploadr uses application/x-www-form-urlencoded, so it is likely to be this one.
2) With either of the solutions above, you should just pass the UIImageJPEGRepresentation of your image. TumblrUploadr has its own URL-encoding; for ASIFormDataRequest, I'm not sure.
3) URL-encoded binary data, as the Tumblr API documentation says. I didn't investigate ASIFormDataRequest deeply, so I'm not sure whether you need to add it to the base string.
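The base-string rule behind question 2 comes from OAuth 1.0 (RFC 5849, section 3.4.1.3): parameters from the request body are included in the signature base string only when the body is application/x-www-form-urlencoded, while a multipart/form-data body contributes nothing. If the image is sent URL-encoded as the "data" parameter, as suggested above, then "type", "caption" and "data" all belong in the base string. Below is a minimal Python sketch of HMAC-SHA1 signing under that rule. It is generic OAuth 1.0 code, not Tumblr's or TumblrUploadr's implementation; the function names are hypothetical, and nonce/timestamp generation and the Authorization header are left out.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def pct(value):
    """Percent-encode per RFC 5849: only A-Z a-z 0-9 - . _ ~ stay as-is."""
    if isinstance(value, str):
        value = value.encode("utf-8")
    # quote() on bytes escapes every byte outside the unreserved set,
    # so raw image bytes can be passed directly.
    return quote(value, safe="~")


def signature_base_string(method, url, params):
    """Build the OAuth 1.0 signature base string.

    url: the base URI without a query string.
    params: (name, value) pairs from the query string, the oauth_* protocol
    parameters (minus oauth_signature) and, only for a form-urlencoded body,
    the body parameters such as type, caption and data.
    """
    # Encode first, then sort by name and then by value.
    encoded = sorted((pct(name), pct(value)) for name, value in params)
    normalized = "&".join(f"{name}={value}" for name, value in encoded)
    return "&".join([method.upper(), pct(url), pct(normalized)])


def hmac_sha1_signature(base_string, consumer_secret, token_secret=""):
    """Sign the base string with HMAC-SHA1 and return the base64 digest."""
    key = f"{pct(consumer_secret)}&{pct(token_secret)}".encode("ascii")
    digest = hmac.new(key, base_string.encode("ascii"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")
```

With a multipart request, the same functions would simply be called without the body parameters, which is the distinction the question runs into.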

What is Twitter's definition of a URL?

I already asked the same question over at dev.twitter.com, but I didn't get an answer there, so maybe someone here on SO has run into the same issue and has an answer.
In my application I count the characters the user enters to compose a tweet. However, if the user enters a URL, it will be shortened automatically (by Twitter's API) when the tweet is posted, so I have to replace the length of the URL with the length of the resulting t.co URL in my character counter.
The problem now is: what is Twitter's definition of a URL, so that I know when to adapt my character counter and when not to? For example, www.verylongexampleurl.de gets shortened, while verylongexampleurl.de (without the www) doesn't, but verylongexampleurl.com does get shortened again.
I couldn't find any documentation, but maybe I missed it. All hints are appreciated.
Quoting from dev.twitter.com:
Need help parsing tweet text?
Take a look on the Twitter text processing library we’re using for auto linking and extraction of usernames, lists & hashtags.
Ruby: https://github.com/twitter/twitter-text-rb
Java: https://github.com/twitter/twitter-text-java
Javascript: https://github.com/twitter/twitter-text-js
The actual specification (tests) can be found here: https://github.com/twitter/twitter-text-conformance/blob/master/autolink.yml
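The counting step the question describes (swapping each URL's length for the t.co length) is simple arithmetic once the URL positions are known. Deciding which substrings count as URLs is the part best left to twitter-text, for example `extractUrlsWithIndices` in twitter-text-js. The Python sketch below assumes the spans come from such an extractor. `tco_length` is a hypothetical parameter for the t.co link length, passed in rather than hard-coded because it is a Twitter-side value. Counting with `len()` is a simplification of Twitter's own character rules.

```python
def adjusted_length(text, url_spans, tco_length):
    """Tweet length after each URL span is replaced by a t.co link.

    url_spans: (start, end) index pairs of detected URLs, end exclusive,
    e.g. as reported by a twitter-text URL extractor (assumed input).
    tco_length: assumed length of one shortened t.co link.
    """
    length = len(text)
    for start, end in url_spans:
        # Remove the original URL's length and add the t.co length instead.
        length += tco_length - (end - start)
    return length


text = "see www.verylongexampleurl.de now"
# Hypothetical extractor output: the URL occupies text[4:29].
print(adjusted_length(text, [(4, 29)], tco_length=23))
```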

Twitter API Strange Encoded Chars Returned

I'm building a little app that uses the twitter search api: http://search.twitter.com/search.json?q=funny
Everything is working great, but sometimes tweets contain Chinese characters, which mess up my PHP script. On my site I use UTF-8 meta headers. Can someone tell me how to convert the following characters, which were output by the Twitter API, to a readable format?
EXAMPLE OUTPUT:
\u525B\u624D\u5728igfw.tk\u770B\u5230\u6709\u4E00\u500B\u535A\u5BA2\u63D0\u4F9B\u6BCF\u534A\u5C0F\u6642\u6539\u5BC6\u78BC\u7684ssh,\u62FF\u4F86\u7DF4\u7FD2\u4E00\u4E0Bbash\u8173\u672C\uFF0C\u65B7\u958B\u5F8C\u6703\u81EA\u52D5\u91CD\u65B0\u9023\u63A5\uFF0C\u4F7F\u7528Ctrl-C\u9000\u51FA,Cygwin\u4E0A\u7DE8\u5BEB\uFF0CLinux\u8A72\u6C92\u554F\u984C\uFF0CMAC\u81EA\u884C\u6E2C\
The \uXXXX sequences are Unicode escape codes for the characters.
You could use a table of values mapping them to UTF-8, or pack the string as JSON and use json_decode, which already has that behavior.
You can see an example of how to do the conversion in the source of the decode() method of the PEAR Services_JSON package, one implementation that comes to mind:
http://pear.php.net/package/Services_JSON/redirected
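The second suggestion, wrapping the text as a JSON string and letting the decoder resolve the escapes, looks like this in Python (in PHP, json_decode plays the same role). This is a sketch that assumes the fragment contains no unescaped double quotes or control characters, and the function name is hypothetical. In practice it is usually simpler to run the whole API response through a JSON parser, which decodes these escapes automatically.

```python
import json


def decode_unicode_escapes(fragment):
    """Decode JSON-style \\uXXXX escapes in a text fragment.

    The fragment is wrapped in double quotes to form a JSON string literal,
    so the JSON decoder resolves every \\uXXXX escape (surrogate pairs
    included). Assumes no unescaped quotes or control characters.
    """
    return json.loads('"' + fragment + '"')


sample = r"\u525B\u624D\u5728igfw.tk\u770B\u5230"  # start of the tweet above
print(decode_unicode_escapes(sample))
```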