I need to ensure that the body of a MailItem object is encoded as UTF-8, but I cannot figure out what the current encoding is, and I have not found any documentation on this. Even the HTMLBody does not seem to specify it; at least I couldn't find it.
All strings in the Outlook Object Model are UTF-16; you can easily convert them to UTF-8. What exactly are you trying to do, and why?
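As a rough illustration of the UTF-16 to UTF-8 conversion mentioned above, here is a minimal Python sketch. It does not touch Outlook at all; the function name utf16_to_utf8 and the choice of little-endian UTF-16 without a byte order mark are assumptions made for the example.

```python
def utf16_to_utf8(utf16_bytes):
    """Re-encode UTF-16 (little-endian, no BOM) bytes as UTF-8 bytes."""
    # Decode to an abstract text string first, then encode to the target.
    text = utf16_bytes.decode("utf-16-le")
    return text.encode("utf-8")


# Hypothetical body text, standing in for whatever the mail item contains.
sample = "Grüße".encode("utf-16-le")
converted = utf16_to_utf8(sample)
```

In most languages the string type already hides the UTF-16 storage, so in practice only the final encode step is needed when writing the body out.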
I'm trying to use Twisted in a web-app, and I'm coming across an interesting issue. I'm very new to Twisted, so I'm not sure if I'm seeing a bug in Twisted, or if I just am not using it correctly.
Theoretically, going by the example, a File resource object can be used both to serve files from a directory and to provide the directory listing. So assuming I have the variables (port, reportsDir) defined elsewhere before the code snippet, I do the following:
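from twisted.internet import reactor
from twisted.web.resource import Resource
from twisted.web.server import Site
from twisted.web.static import File
rootResource = Resource()
rootResource.putChild("reports", File(reportsDir))
reactor.listenTCP(port, Site(rootResource))
reactor.run(installSignalHandlers=False)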
Now, when I access '/reports' on my host I get a message "Request did not return bytes" in my browser, with a bunch of stuff that was obviously produced by Twisted but that also contains a print of a u'.....' string literal, which in fact has the directory listing in it. So the DirectoryLister is obviously creating the listing HTML, but it isn't seen as valid by something in Twisted. It doesn't seem to like the unicode string, which was in fact produced by Twisted itself.
Do I need to set some other configuration item to get it to convert the unicode string to the necessary bytes object (or whatever), or some other approach?
Many thanks,
-D
Well, it seems the issue is that Python will promote the result of a string format operation to unicode if any of the source strings was unicode. In my case, "reportsDir" was unicode because it came from an XML file, and that sent it down the error path.
Changing the above line:
rootResource.putChild("reports", File(reportsDir))
to:
rootResource.putChild("reports", File(reportsDir.encode('ascii', 'ignore')))
fixed the issue. I would however suggest that the Twisted developers do a check for unicode in the constructor for File, or in the DirectoryLister simply check for unicode, and if it is then return the ascii-encoded version.
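One caveat with encode('ascii', 'ignore') is that it silently drops any non-ASCII character from the path. The sketch below, written for Python 3, contrasts that with os.fsencode, which converts a text path to bytes using the filesystem encoding. The helper names lossy_ascii and path_to_bytes are made up for illustration, and whether File expects bytes or text depends on the Twisted and Python versions in use, so treat this as an assumption rather than Twisted's documented contract.

```python
import os


def lossy_ascii(path):
    """Mirror the workaround above: non-ASCII characters are discarded."""
    return path.encode("ascii", "ignore")


def path_to_bytes(path):
    """Convert a text path to bytes using the filesystem encoding."""
    # os.fsencode leaves bytes untouched and encodes str paths.
    return os.fsencode(path)


# The umlaut is lost with the 'ignore' handler, which changes the path.
dropped = lossy_ascii("reports/März")
plain = path_to_bytes("reports")
```

If the directory name can contain non-ASCII characters, the lossy variant would point at a directory that does not exist.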
I have a problem stripping the formatting out of a notes table.
Here is an example:
";\red31\green73\blue125;
\viewkind4\uc1\ltrpar\f0\fs20 USEFUL TEXT BODY \cf1\f3
\ltrpar\f0\fs17
"
How do I get rid of that stuff? I want to play it safe and not simply replace everything after a '\'.
Many thanks,
Rick
You're making it quite difficult for yourself by not replacing '\'.
If you look at http://other9.tripod.com/Refs/easy-rtf.html you will see that there are different RTF codes and that the codes have no fixed length.
Additionally, unlike HTML, there is no required "closing" tag, which makes it even more difficult.
The only thing I can think of is to record all possible RTF codes (or use an RTF parser library) and hence be able to recognize if a \ is or is not RTF code.
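As a rough sketch of the "recognize whether a \ starts an RTF code" idea, the Python below treats a backslash followed by letters (plus an optional number and one trailing space) as a control word, and keeps the escaped literals \\, \{ and \}. The function name strip_rtf_controls is hypothetical, and this is a simplification, not a full RTF parser: it does not decode \'hh hex escapes or \uN characters, and it does not drop destination groups such as font or color tables, so leftovers like the ";" separators from a color table would remain.

```python
import re

# Control word: backslash, letters, optional signed number, optional
# single trailing space. Control symbol: backslash plus any one character.
# Bare braces are group delimiters.
TOKEN = re.compile(r"\\([A-Za-z]+)(-?\d+)? ?|\\(.)|[{}]", re.DOTALL)


def strip_rtf_controls(text):
    """Remove RTF control words and group braces, keeping escaped literals."""
    def replace(match):
        symbol = match.group(3)
        # \\ , \{ and \} stand for the literal characters themselves.
        if symbol in ("\\", "{", "}"):
            return symbol
        return ""
    return TOKEN.sub(replace, text)


cleaned = strip_rtf_controls(r"\f0\fs20 USEFUL TEXT BODY \cf1\f3")
```

For anything beyond simple notes, an existing RTF parser library is likely to be more reliable than extending this pattern.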
I am trying to parse RSS/Atom feeds in my Rails app, but I have run into some serious problems with non-ASCII characters, e.g. the German umlauts ÄÖÜ or ß. Some feeds in the wild use proper UTF-8, but some others make me cry. The general problem is:
I must be able to parse any feed, whatever encoding it might have. The "loss" of characters is not an option (though it's my current status), because I do some text and language analysis with the feed items.
What I use so far:
FeedZirra for fetching and parsing the feeds, which works well so far. I also "sanitize" the values I get from FeedZirra.
HTMLEntities (gem) for unescaping special characters, like "&Auml;" which means "Ä"
rCharDet19 (gem) to figure out which encoding the feed might have, and
string.encode! to convert from whatever it is to UTF-8
Ruby 1.9.3 (latest) and Rails 3.2.8 on Ubuntu Linux 12.04
The problem is that I literally have no idea what I'm doing wrong.
def self.sanitize_encoding_and_htmlentities str
  cd = CharDet.detect str
  s = str.encode(:invalid => :replace, :undef => :replace, :replace => '')
  coder = HTMLEntities.new
  coder.decode(s)
end
This is my current sanitize method. As a sample feed I use
http://www.N24.de/2/index.rss
So far, the "special" characters get replaced completely. This is the only variant I found that works without raising an error about invalid bytes. I changed the encode call slightly, because I read in the Ruby docs that without a target encoding given, encode should transcode to the app's default_internal encoding, which is UTF-8 in my case. The CharDet call is only there in case it turns out to be useful for later changes.
I used the magic_encoding gem, so every file in my project should have the encoding magic comment on the first line. My database is sqlite3 with UTF-8.
As of 2012, is there anything I should look at? Did I do anything really wrong?
Thanks for help!
EDIT:
The feeds may be RSS of any kind, Atom, and/or just invalid XML. The encoding may be UTF-8, something different, or declared as "utf-8" while actually being some windows-XXX encoding, and so on. I really need a solution that covers all of this.
Also, the fetching/parsing must be as fast as possible, which is why I picked FeedZirra.
My current idea is to get the feed content, replace every character in the "title" and "description" nodes with HTML entities if possible, use the encode! method to switch to UTF-8, and then unescape the HTML entities. After this, special characters should be kept, I think, but I can't get something like this working at the moment. Might this be a good approach?
Finally I found the main problem:
FeedZirra already returns UTF-8 when accessing entries and their attributes. But I used the sanitize method to access attributes, which returns ASCII-8BIT and weird characters escaped as HTML entities.
So I kicked all the sanitizing and encoding stuff out of my code, and now it just works. It seems that FeedZirra has something built in to transcode the feeds if necessary.
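For the case raised in the question where a feed declares "utf-8" but is really some windows-XXX encoding, one common fallback strategy can be sketched as follows. This is Python rather than the Ruby stack used above, it is not what FeedZirra does internally, and the function name decode_feed_bytes and the cp1252 default are assumptions for the example.

```python
def decode_feed_bytes(raw, declared="utf-8", fallback="cp1252"):
    """Decode raw feed bytes, trying the declared encoding first."""
    for encoding in (declared, fallback):
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown encoding: try the next candidate.
            continue
    # latin-1 maps every byte to a character, so this never fails,
    # although the result may contain the wrong characters.
    return raw.decode("latin-1")


# A feed that claims UTF-8 but was actually written as Windows-1252.
mislabelled = "Äß".encode("cp1252")
text = decode_feed_bytes(mislabelled)
```

Trying strict UTF-8 first works reasonably well because text in legacy single-byte encodings rarely happens to be valid UTF-8.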
I'm working in iOS and trying to pass some content to a web server via an NSURLRequest. On the server I have a PHP script set up to accept the request string and convert it into a JSON object using the Zend_Json framework. The issue I am having is that whenever the character "ø" is in any part of the request parameters, the request string is cut short by one character.
Request string before going to server.
[{"description":"Blah blah","type":"Russebuss","name":"Roscoe Simulator","appVersion":"1.0.20","osVersion":"IOS 5.1","phone":"5555555","country":"Østfold","udid":"bed164974ea0d436a43f3cdee0e005a1"}]
Request string on server before any parsing
[{"description":"Blah blah","type":"Russebuss","name":"Roscoe Simulator","appVersion":"1.0.20","osVersion":"IOS 5.1","phone":"5555555","country":"Nord-Trøndelag","udid":"bed164974ea0d436a43f3cdee0e005a1"}
Everything looks exactly the same except that the final closing ] is missing. I think it's having an issue when converting the string to UTF-8, but I'm not sure of the correct way to fix it.
Does anyone have any ideas why this is happening?
First of all, do not trust the Xcode console in such cases. You never know which encoding the console is actually using.
Second, escape the invalid characters before you build your JSON string. The easiest way is probably to make sure you use the same Unicode encoding, such as UTF-8, throughout.
Third, if there are still invalid characters, use a JSON library with a parser (it handles the encoding). Validate the output by parsing it back to e.g. an NSString, or validate it manually using a web form like http://jsonformatter.curiousconcept.com/
The worst way is to replace the individual characters in the string, build your JSON, and convert back. One way to do this could be to replace e.g. a German ä with its Unicode representation U+00E4 (http://www.utf8-chartable.de/).
That's the way I do it. I am glad that I never needed to go further than step three, and that is the step you should do anyway to keep your code simple.
Please try to use Zend's internal JSON encoding:
Zend_Json::$useBuiltinEncoderDecoder = true;
should fix your issue.
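A request body that loses exactly one trailing character per "ø" is consistent with a length being computed in characters rather than in UTF-8 bytes, since "ø" and "Ø" each take two bytes. The thread does not confirm that this is the cause here, so the Python sketch below is only an illustration of that effect under that assumption; the payload is a cut-down, hypothetical version of the request shown above.

```python
import json

payload = [{"country": "Østfold"}]

# Keep the non-ASCII character as-is, then encode the body as UTF-8.
body_text = json.dumps(payload, ensure_ascii=False)
body_bytes = body_text.encode("utf-8")

char_count = len(body_text)    # length in characters
byte_count = len(body_bytes)   # length in bytes, one more than above

# Sending only char_count bytes cuts off the final "]".
truncated = body_bytes[:char_count]

# With the default ensure_ascii=True the character becomes a \uXXXX
# escape, so character count and byte count are the same.
ascii_text = json.dumps(payload)
```

If this is what is happening, the usual fix is to take the length from the encoded data (the byte count) when setting the request body and its Content-Length.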
I'm working on a licensing system for my application. I'd like to put all licensing information (licensee name, expiration date, and enabled features) into an object, encrypt that object with a private key, then represent the encrypted data as a single text string which I can send via email to my customers.
I've managed to get the encrypted data into a byte stream, but I don't know how to convert that byte stream into a text value -- something that contains no control characters or whitespace. Can anyone offer advice on how to do that? I've been researching the Encoding class, but I can't find a text-only encoding.
I'm using .NET 2.0 -- mostly VB, but I can do C# also.
Use Base64 to convert it to a text string that can be decoded back again; in .NET, Convert.ToBase64String and Convert.FromBase64String do this. It is great for representing arbitrary binary data in a text-friendly manner: the output uses only upper- and lower-case A-Z, the digits 0-9, '+', '/', and '=' for padding.
BinHex is an example of one way to do that. It may not be exactly what you want -- for example, you might want to encode your data such that it's impossible to inadvertently spell words in your string, and you may or may not care about maximizing the density of information. But it's an example that may help you come up with your own encoding.
I've found Base32 useful for license keys before. There are some C# implementations linked from this answer. My own license code is based on this implementation, which avoids ambiguous characters to make it easier to retype the keys.
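To make the Base64 and Base32 suggestions concrete, here is a small Python sketch of turning arbitrary bytes into a whitespace-free text key and back. It uses the standard Base32 alphabet from Python's base64 module, not the ambiguity-avoiding variant mentioned above, and the function names encode_license and decode_license are made up for the example. The bytes used are a stand-in for real encrypted data.

```python
import base64


def encode_license(blob):
    """Encode bytes as Base32 text (A-Z and 2-7), without '=' padding."""
    return base64.b32encode(blob).decode("ascii").rstrip("=")


def decode_license(text):
    """Reverse encode_license, restoring padding and ignoring case."""
    # Base32 works in blocks of 8 characters, so pad back to a multiple of 8.
    padded = text.upper() + "=" * (-len(text) % 8)
    return base64.b32decode(padded)


# Stand-in for the encrypted byte stream from the question.
encrypted = bytes([0, 255, 16, 32, 200, 7, 99])
key = encode_license(encrypted)
restored = decode_license(key.lower())
```

Base32 output is longer than Base64 for the same data, but it is case-insensitive and free of punctuation, which helps when a customer has to retype the key.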