Twitter API Strange Encoded Chars Returned

I'm building a little app that uses the Twitter search API: http://search.twitter.com/search.json?q=funny
Everything works great, except that my PHP script breaks when tweets contain Chinese characters. My site uses UTF-8 meta headers. Can someone tell me how to convert the following characters, as output by the Twitter API, into a readable format?
EXAMPLE OUTPUT:
\u525B\u624D\u5728igfw.tk\u770B\u5230\u6709\u4E00\u500B\u535A\u5BA2\u63D0\u4F9B\u6BCF\u534A\u5C0F\u6642\u6539\u5BC6\u78BC\u7684ssh,\u62FF\u4F86\u7DF4\u7FD2\u4E00\u4E0Bbash\u8173\u672C\uFF0C\u65B7\u958B\u5F8C\u6703\u81EA\u52D5\u91CD\u65B0\u9023\u63A5\uFF0C\u4F7F\u7528Ctrl-C\u9000\u51FA,Cygwin\u4E0A\u7DE8\u5BEB\uFF0CLinux\u8A72\u6C92\u554F\u984C\uFF0CMAC\u81EA\u884C\u6E2C\

The \uXXXX sequences are Unicode escape codes for the characters.
You could use a lookup table that maps each code to its UTF-8 bytes, or wrap the string as JSON and pass it to json_decode, which already decodes these escapes.
For an example of how to do the conversion by hand, see the decode() source in the PEAR Services_JSON package, which is one implementation that comes to mind:
http://pear.php.net/package/Services_JSON/redirected
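The "wrap it as JSON and decode it" approach can be sketched as follows. This is an illustration in Python rather than PHP, where json_decode plays the same role. It assumes the text is already JSON-escaped, as it is when it comes straight from the API, so it contains no bare double quotes. The variable names are made up for the example.

```python
import json

# A short prefix of the escaped text from the question. The raw string keeps
# the backslashes literally, just as they arrive from the API.
escaped = r"\u525B\u624D\u5728igfw.tk\u770B\u5230"

# Wrapping the text in double quotes makes it a valid JSON string literal;
# the JSON parser then turns each \uXXXX escape into the real character.
decoded = json.loads('"' + escaped + '"')

print(decoded)                  # the readable text
print(decoded.encode("utf-8"))  # the same text as UTF-8 bytes for the page
```

Plain ASCII runs such as igfw.tk pass through unchanged, so only the escapes are affected.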

Related

REST API method that accepts multiple file uploads and additional arguments

I'm attempting to create a REST API method that accepts multiple file uploads with some additional arguments. This API method will be called from web forms, web services, and mobile apps.
Is there a standard I should be following with regards to how the method takes these parameters in?
So far, I've considered the following two approaches:
JSON body: file data included as base64-encoded fields within the JSON object. Fine if called from other web services, but troublesome when called from an HTML form?
multipart/form-data: easy to use with HTML forms, but problematic when calling from web services or mobile apps?
I know that either of the two approaches would work, but I'd like to implement this the correct way (if there is one) according to current standards. Any ideas?
Do modern JS libraries/frameworks make it easy to POST HTML forms to web APIs as JSON objects?
Yes, there are plenty of libraries that convert a file to base64.
In my opinion, you should choose based on your requirements. Exchanging data in multipart format should be more efficient than a base64 JSON string, although this article shows that the size difference is small.
With JSON, on the other hand, you can pass several other variables alongside the file and read them easily.
Besides, if the file is an image, browsers understand data URIs (base64-encoded images), so there is no need to transform them when the client is a browser.
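To get a feel for the size trade-off, here is a rough sketch that measures how much a file grows when it is base64-encoded into a JSON body. It measures the raw payload only, before any transport compression. The field names ("file", "title") and the helper name are hypothetical, not part of any standard.

```python
import base64
import json
import os


def json_payload_size(data: bytes, extra_fields: dict) -> int:
    """Size in bytes of a JSON body carrying `data` as a base64 field."""
    body = dict(extra_fields)
    body["file"] = base64.b64encode(data).decode("ascii")  # hypothetical field name
    return len(json.dumps(body).encode("utf-8"))


raw = os.urandom(300_000)  # stand-in for a 300 kB upload
encoded_len = len(base64.b64encode(raw))

# base64 emits 4 output bytes for every 3 input bytes.
print(len(raw), encoded_len, encoded_len / len(raw))
print(json_payload_size(raw, {"title": "holiday photo"}))
```

The extra arguments travel in the same object as the file, which is the convenience the JSON approach buys in exchange for the larger body.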

How do I use Google's Vision API to convert a PDF (non-searchable) to a searchable PDF?

From what I've seen, Google's Vision API lets you perform OCR on a PDF, but it returns only the detected text in a JSON format. What I need is a searchable (OCR'd) PDF file in return. Is this possible?
Notice that the OutputConfig type doesn't have any metadata field to configure the resulting file's format, so, as you are already aware, the API returns a JSON response. You have two options: get the JSON data from the API and explore any of the following repositories for JSON-to-PDF conversion, or run a specialized tool such as OCRmyPDF, which serves exactly this purpose, directly on your source PDF and avoid the API altogether.

Unicode Blog Article title through API

I am having an issue with the Shopify API when creating a new Article object for a Blog. I have an app which imports merchants' third-party blogs into Shopify. Some of these blogs are non-English, and so contain plenty of Unicode characters. I can encode the body of these posts just fine using XML character replacement, but I cannot encode the titles. For example, here is a sample blog in Russian imported into a Shopify test store: http://heller-sawayn5574.myshopify.com/blogs/unicode (original blog here: http://recen-zist.livejournal.com/)
You can see that the body is displayed properly, but the title isn't and appears garbled. Shopify takes the XML-encoded characters and displays them literally in the title, instead of converting them back to Unicode as it does in the body of the article.
If I log into the store admin and manually change the title to include Unicode characters, it displays correctly. So the issue only happens when creating the blog post via the API. However, any post created with Unicode characters in the title is then non-editable using the API due to this error.
I found a workaround: don't XML-encode the title characters, but leave them as UTF-8. I was using the Python Shopify API, where UTF-8 was not previously possible, which forced me to encode using XML. For background on the problem, see this discussion:
https://groups.google.com/forum/?hl=en&fromgroups=#!topic/shopify-app-discuss/T5gee1A_2lE
The workaround is to update the pyactiveresource dependency to version 1.0.2.
That leaves the question of why XML character encoding works for the blog post body but not for the title. But as long as there's another way to do it, this shouldn't matter much.
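The difference between the two encodings discussed here can be sketched in plain Python. The example does not call the Shopify API, and the title is a made-up value. It only shows what each form looks like and why XML character references appear literally unless the receiver un-escapes them.

```python
import html

title = "Привет"  # hypothetical article title

# XML numeric character references: pure ASCII, readable only after the
# receiver converts the references back into characters.
xml_encoded = title.encode("ascii", "xmlcharrefreplace").decode("ascii")

# Plain UTF-8: the characters themselves, as bytes.
utf8_encoded = title.encode("utf-8")

print(xml_encoded)                            # what shows up if nothing un-escapes it
print(html.unescape(xml_encoded) == title)    # round trip through the references
print(utf8_encoded.decode("utf-8") == title)  # round trip through UTF-8
```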

What is Twitter's definition of an URL?

I already asked the same question over at dev.twitter.com, however, I didn't get an answer there. So maybe someone here on SO ran into the same issue and has an answer.
In my application I count the length of the characters the user enters to compose a tweet. However, if the user enters an URL, this will be shortened automatically (by Twitter's API) when posting the tweet. So I have to replace the length of the URL with the length of the resulting t.co URL in my character counter.
The problem now is: what is Twitter's definition of a URL, so that I know when to adapt my character counter and when not to? For example, www.verylongexampleurl.de gets shortened, while verylongexampleurl.de (without the www) doesn't, but verylongexampleurl.com does get shortened again.
I couldn't find any documentation, but maybe I missed it. All hints are appreciated.
Quoting from dev.twitter.com:
Need help parsing tweet text?
Take a look on the Twitter text processing library we’re using for auto linking and extraction of usernames, lists & hashtags.
Ruby: https://github.com/twitter/twitter-text-rb
Java: https://github.com/twitter/twitter-text-java
Javascript: https://github.com/twitter/twitter-text-js
The actual specification (tests) can be found here: https://github.com/twitter/twitter-text-conformance/blob/master/autolink.yml
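For illustration only, a character counter along the lines described in the question might look like the sketch below. It rests on two loudly labelled assumptions: the fixed t.co length (23 here is a placeholder, not a value taken from this thread) and a deliberately naive URL pattern that is not Twitter's definition. The twitter-text libraries above are the place to get the real rules.

```python
import re

# ASSUMPTION: every shortened link counts as a fixed number of characters.
# 23 is a placeholder; the real value should come from Twitter.
TCO_LENGTH = 23

# ASSUMPTION: a naive pattern that only matches a scheme or a "www." prefix.
# It is NOT Twitter's definition of a URL.
NAIVE_URL = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def estimated_length(text: str) -> int:
    """Length of `text` with each detected URL counted as TCO_LENGTH."""
    without_urls, url_count = NAIVE_URL.subn("", text)
    return len(without_urls) + url_count * TCO_LENGTH


print(estimated_length("see www.verylongexampleurl.de now"))
print(estimated_length("see verylongexampleurl.com now"))
```

The second call shows the weakness of a hand-rolled pattern: the bare .com domain from the question is not detected, so it is counted at its full length. That is exactly the kind of case the conformance tests cover.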

Google Reader API HTTP Response parsing (Objective C)

Using the API, trying to get items in a specific feed returns this:
{"direction":"ltr","id":"feed/http://arstechnica.com/index.rssx","title":"Ars Technica","description":"The Art of Technology","self":[{"href":"http://www.google.com/reader/api/0/stream/contents/feed/http://arstechnica.com/index.rssx?ot\u003d1273193172856169\u0026r\u003dn\u0026xt\u003duser/-/state/com.google/read\u0026n\u003d4\u0026ck\u003d1273193873\u0026client\u003diPadReader"}],"alternate":[{"href":"http://arstechnica.com/index.php","type":"text/html"}],"updated":1273193873,"items":[]}
They look like key/value pairs but it’s plain text with UTF8 String encoding and won’t encode into a dictionary. I’m using Objective-C and I’m not sure where to go from here. So far I’ve been able to parse the XML response for unread items, but parsing the plain-text doesn’t look feasible. What is your practice?
It looks like JSON markup. You'll want to use a JSON parser for Objective-C.
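To show what a JSON parser does with this response, here is a small sketch in Python rather than Objective-C, using a trimmed copy of the text from the question. It assumes the real response uses plain straight quotes. The parser returns a dictionary and decodes the \u003d and \u0026 escapes to = and & along the way.

```python
import json

# Trimmed copy of the response from the question. The raw string keeps the
# \u003d and \u0026 escapes exactly as they were received.
response_text = r'''{"direction":"ltr","title":"Ars Technica","self":[{"href":"http://www.google.com/reader/api/0/stream/contents/feed/http://arstechnica.com/index.rssx?ot\u003d1273193172856169\u0026r\u003dn"}],"updated":1273193873,"items":[]}'''

feed = json.loads(response_text)  # a plain dictionary of keys and values

print(feed["title"])
print(feed["self"][0]["href"])  # the escapes are now ordinary = and & characters
print(len(feed["items"]))
```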