Twisted.web File directory listing issues

I'm trying to use Twisted in a web app, and I'm coming across an interesting issue. I'm very new to Twisted, so I'm not sure if I'm seeing a bug in Twisted or if I'm just not using it correctly.
Going by the example, a File resource object can be used both to serve files from a directory and to provide the directory listing. So, assuming I have the variables (port, reportsDir) defined elsewhere before this code snippet, I do the following:
rootResource = Resource()
rootResource.putChild("reports", File(reportsDir))
reactor.listenTCP(port, Site(rootResource))
reactor.run(installSignalHandlers=False)
Now, when I access '/reports' on my host I get a "Request did not return bytes" message in my browser, along with a bunch of output that was obviously produced by Twisted but that also contains a printed u'.....' string literal which does, in fact, hold the directory listing. So the DirectoryLister is clearly creating the listing HTML, but it isn't being seen as valid by something in Twisted; that something doesn't seem to like the unicode string, even though the string was produced by Twisted itself.
Do I need to set some other configuration item to get it to convert the unicode string to the necessary bytes object (or whatever), or do I need to take some other approach?
Many thanks,
-D

Well, it seems the issue is that Python will promote the result of a string-formatting operation to unicode if any of the source strings was unicode. In my case, "reportsDir" was unicode because it came from an XML file, and that sent it down the error path.
Changing the above line:
rootResource.putChild("reports", File(reportsDir))
to:
rootResource.putChild("reports", File(reportsDir.encode('ascii', 'ignore')))
fixed the issue. I would, however, suggest that the Twisted developers either check for unicode in the File constructor, or simply have the DirectoryLister check for unicode and, if so, return the ASCII-encoded version.
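For illustration, here is a minimal sketch of that workaround in context, assuming Python 2 and that reportsDir may arrive as a unicode object (as it did here, coming from XML). The as_bytes_path helper is made up for this example; it encodes with the filesystem encoding so that non-ASCII path components survive where the filesystem supports them, rather than being silently dropped:

import sys
from twisted.internet import reactor
from twisted.web.resource import Resource
from twisted.web.server import Site
from twisted.web.static import File

def as_bytes_path(path):
    # File (and therefore DirectoryLister) behaves better with a byte-string path,
    # so encode unicode explicitly instead of relying on implicit promotion.
    if isinstance(path, unicode):
        return path.encode(sys.getfilesystemencoding() or 'ascii')
    return path

rootResource = Resource()
rootResource.putChild("reports", File(as_bytes_path(reportsDir)))  # port, reportsDir as in the question
reactor.listenTCP(port, Site(rootResource))
reactor.run(installSignalHandlers=False)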


gulp-newer vs gulp-changed

What are the differences between them?
gulp-newer:
gulp.src(imgSrc)
.pipe(newer(imgDest))
.pipe(imagemin())
.pipe(gulp.dest(imgDest));
gulp-changed:
gulp.src(SRC)
.pipe(changed(DEST))
// ngmin will only get the files that
// changed since the last time it was run
.pipe(ngmin())
.pipe(gulp.dest(DEST));
It seems gulp-changed is more powerful, because it provides an option
hasChanged: changed.compareLastModifiedTime
I hope it's not too late to answer this question. I have had to evaluate both of them at the source-code level for a recent project, and here is my take.
gulp-newer
At its core, this plugin compares the source and dest files' modified times (see the Node API) to decide whether the source file is newer than the dest file, or whether there is no dest file at all. Here is the relevant code in the plugin:
var newer = !destFileStats || srcFile.stat.mtime > destFileStats.mtime;
gulp-changed
By default, this plugin also uses a file's modified time to decide which files to pass through the stream
function compareLastModifiedTime(stream, cb, sourceFile, targetPath) {}
but it goes one step further by offering an option to compare the file's content SHA1 hash:
function compareSha1Digest(stream, cb, sourceFile, targetPath) {}
This information is nicely documented.
Conclusion
So, theoretically speaking, if you use gulp-changed's default hasChanged: changed.compareLastModifiedTime, the two plugins should be about equally fast. If you use gulp-changed's hasChanged: changed.compareSha1Digest, it's reasonable to expect gulp-changed to be a bit slower, because it has to compute a SHA1 hash of the file content. I didn't benchmark, but I'm also interested in seeing some numbers.
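To make the speed argument concrete, here is a rough sketch, in Python rather than JavaScript and purely for illustration (the function names are made up), of what the two strategies boil down to: an mtime check needs only a stat() per file, while a content-hash check has to read both files end to end.

import hashlib
import os

def newer(src, dest):
    # mtime strategy: one stat() per file, no file contents are read
    return (not os.path.exists(dest)
            or os.path.getmtime(src) > os.path.getmtime(dest))

def sha1_of(path):
    with open(path, 'rb') as f:
        return hashlib.sha1(f.read()).hexdigest()

def content_changed(src, dest):
    # hash strategy: both files must be read completely to compute the digests
    return not os.path.exists(dest) or sha1_of(src) != sha1_of(dest)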
Which to choose
gulp-changed, purely because of the developer behind it (sindresorhus). If one day this awesome man decides that he will stop supporting his gulp plugins, I think I will stop using gulp altogether.
Joking aside, though, gulp-changed's source code is gulp-y, while gulp-newer's source reads pretty much like just another node module's source with lots of promises. So another +1 for gulp-changed :)
HUGE EDIT
gulp-changed only works with a 1:1 source:dest mapping. If you need many:1, e.g. when using it with gulp-concat, choose gulp-newer instead.
May I suggest gulp-newy, in which you can manipulate the path and filename in your own function? Then just use that function as the callback to newy(). This gives you complete control over the files you would like to compare.
This allows 1:1 or many-to-1 comparisons.
newy(function(projectDir, srcFile, absSrcFile) {
    // do whatever you want to here.
    // construct your absolute path, change filename suffix, etc.
    // then return /foo/bar/filename.suffix as the file to compare against
})
In order to answer this question you would have to compare both plugins' source code.
It seems that gulp-changed has more options, as you said, is more widely used (more downloads) and has more contributors, so it is likely to be better maintained and refactored, since it is used more.
One thing that can make a difference, going by their documentation:
In the example for gulp-newer, it is used like this:
gulp.task('default', function() {
gulp.watch(imgSrc, ['images']);
});
So it seems that once this task is running, it will only notice files that change while the watcher is active.
For gulp-changed, they say it "will only get the files that changed since the last time it was run". So (and I didn't try this in a working example) gulp-changed processes all the files and passes on only the ones that have changed since the last execution; it seems it will always "look" at all files and internally (MD5 hash? No clue, I didn't check the source) decide whether a file has changed since the last run. You don't need a watcher for that.
All of this comes only from reading their official documentation.
An "in the wild" test would be very welcome!

Feed Encoding Problems Ruby 1.9

I am trying to parse RSS/Atom feeds in my Rails app, but I have encountered some serious problems with non-ASCII characters, e.g. the German umlauts ÄÖÜ or ß. Some feeds in the wild use proper UTF-8, but others make me cry. The general problem is:
I must be able to parse any feed, whatever encoding it might have. The "loss" of characters is not an option (though that is my current status), because I do text and language analysis on the feed items.
What I use so far:
FeedZirra for fetching and parsing the feeds; works well so far. I also "sanitize" the values I get from FeedZirra.
HTMLEntities (gem) for unescaping special characters, like "&Auml;", which means "Ä"
rCharDet19 gem, to figure out which encoding the feed might have, and to:
string.encode! to convert from whatever it is to UTF-8
Ruby 1.9.3 (latest) and Rails 3.2.8 on Ubuntu Linux 12.04
The problem is that I literally have no idea what I'm doing wrong.
def self.sanitize_encoding_and_htmlentities str
  cd = CharDet.detect str
  s = str.encode(:invalid => :replace, :undef => :replace, :replace => '')
  coder = HTMLEntities.new
  coder.decode(s)
end
This is my current sanitize method. As a sample feed I use
http://www.N24.de/2/index.rss
So far, the "special" characters get stripped out completely. This is the only variant I found that just works without raising an error due to invalid bytes. I changed the encode call slightly, because I read in the Ruby docs that, when no encoding is given, encode should "translate" to the app's default_internal encoding, which is UTF-8 in my case. CharDet is there just in case it becomes useful for related changes.
I used the magic_encoding gem, so every file in my project should have the encoding comment on the first line. My database is SQLite3 with UTF-8.
As of 2012, is there anything I should look at? Did I do anything really wrong?
Thanks for help!
EDIT:
The feeds may be RSS of any kind, Atom, and/or just invalid XML. The encoding may be UTF-8, something different, or may just claim "utf-8" while actually being some windows-XXX encoding, and so on. I really need a solution that handles all of this.
Also, the fetching/parsing must be as fast as possible; that's why I picked FeedZirra.
My current idea is to get the feed content, replace every character in the "title" and "description" nodes with HTML entities if possible, use the encode! method to switch to UTF-8, and then unescape the HTML entities. After this, the special characters should be kept, I think, but I can't get something like this working at the moment. Might this be a good approach?
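For what it's worth, here is that flow as a rough sketch in Python rather than Ruby, purely as an illustration of detect → transcode → unescape (the chardet package stands in for rCharDet19 and html.unescape for HTMLEntities):

import chardet
import html

def to_utf8_text(raw_bytes):
    # Guess the source encoding, transcode to UTF-8 (replacing anything invalid),
    # then unescape HTML entities such as &Auml; back into real characters.
    guessed = chardet.detect(raw_bytes).get('encoding') or 'utf-8'
    text = raw_bytes.decode(guessed, errors='replace')
    return html.unescape(text)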
Finally I found the main problem:
FeedZirra already returns UTF-8 when accessing entries and their attributes. But I used the sanitize method to access attributes, which returns ASCII-8BIT with the weird characters escaped as HTML entities.
However, I kicked all the sanitizing and encoding stuff out of my code, and now it just works. It seems that FeedZirra has something built in to transcode the feeds if necessary.

iOS JSON escaping special characters

I'm working in iOS and trying to pass some content to a web server via an NSURLRequest. On the server I have a PHP script set up to accept the request string and convert it into a JSON object using the Zend_JSON framework. The issue I am having is that whenever the character "ø" is in any part of the request parameters, the request string is cut short by one character.
Request string before going to server.
[{"description":"Blah blah","type":"Russebuss","name":"Roscoe Simulator","appVersion":"1.0.20","osVersion":"IOS 5.1","phone":"5555555","country":"Østfold","udid":"bed164974ea0d436a43f3cdee0e005a1"}]
Request string on server before any parsing
[{"description":"Blah blah","type":"Russebuss","name":"Roscoe Simulator","appVersion":"1.0.20","osVersion":"IOS 5.1","phone":"5555555","country":"Nord-Trøndelag","udid":"bed164974ea0d436a43f3cdee0e005a1"}
Everything looks exactly the same except that the final closing ] is missing. I'm thinking it's having an issue when converting the string to UTF-8, but I'm not sure of the correct way to fix it.
Does anyone have any ideas why this is happening?
First of all, do not trust the Xcode console in such cases; you never know which encoding the console is actually using.
Second, escape the invalid characters before you build your JSON string. The easiest way would probably be to make sure you are using the same Unicode representation, like UTF-8, all the time.
Third, if there are still invalid characters, use a JSON library with a parser (it handles the encoding). Validate the output by parsing it back into e.g. an NSString, or validate the output manually with a web form like http://jsonformatter.curiousconcept.com/
The crudest way is to replace the individual characters in the string, build your JSON, and convert back. One way to do this could be to replace e.g. a German ä with its Unicode representation U+00E4 (http://www.utf8-chartable.de/).
That's the way I do it. I am glad that I never needed to go further than step three, and that is the step you should do anyway to keep your code simple.
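As a quick, language-neutral illustration of the "same representation everywhere" point (sketched in Python only because it is compact; the payload is just an example built from the question's data): a standard JSON encoder can either escape non-ASCII characters such as ø as \uXXXX sequences or emit them as raw UTF-8, and both forms are valid JSON for a server-side parser.

import json

payload = [{"country": "Østfold", "type": "Russebuss"}]

# Escaped form: every non-ASCII character becomes a \uXXXX sequence.
print(json.dumps(payload))                      # ... "country": "\u00d8stfold" ...
# Raw form: the string stays UTF-8; make sure it is also transmitted as UTF-8.
print(json.dumps(payload, ensure_ascii=False))  # ... "country": "Østfold" ...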
Please try using Zend's built-in JSON encoder:
Zend_Json::$useBuiltinEncoderDecoder = true;
This should fix your issue.

Issues getting all data from an image file using Lua io.read('*a')

I'm trying to get all the data from an image file (jpg/jpeg/gif/png/bmp etc.) using Lua's io.read() function, but I'm not having much luck, as it only seems to read a small piece of the data.
As a side note, all plain text files are read just fine, so I'm assuming that the problem is with character encoding or some such thing.
Example:
local data
local fileHandle
fileHandle = io.open ( 'pic.jpg')
data = fileHandle:read('*a')
print(data)
If you're on Windows, open the file in binary mode: io.open('pic.jpg', 'rb'). In text mode, the Windows runtime translates line endings and treats the first 0x1A (Ctrl-Z) byte as end-of-file, which is why you only get a small piece of a binary file.
Also, it is a good idea to wrap io.open() in assert() to catch errors (or otherwise handle them, of course): assert(io.open('pic.jpg', 'rb')).

vb.net character set

According to MSDN, VB.NET uses this extended character set. In my experience, it actually uses this:
What am I missing? Why does it say it uses the one and uses the other?
Am I doing something wrong?
Is there some sort of conversion tool to the original character set?
This behaviour is defined in the documentation of the Chr command:
The returned value depends on the code page for the current thread, which is contained in the ANSICodePage property of the TextInfo class in the System.Globalization namespace. You can obtain ANSICodePage by specifying System.Globalization.CultureInfo.CurrentCulture.TextInfo.ANSICodePage.
So, the output of Chr for values greater than 127 is system-dependent. If you want reproducible results, create the desired instance of Encoding by calling Encoding.GetEncoding(String), then use Encoding.GetChars(Byte()) to convert your numeric values into characters.
If you go up one level in the chart linked in your question, you will see that they do not claim that this chart is always the output of the Chr command:
The characters that appear in Windows above 127 depend on the selected typeface.
The charts in this section show the default character set for a console application.
Your application is a WinForm application, not a console application. Even in the console, the character set used can be changed (for example, by using the chcp command), hence the word "default".
For detailed information about the encodings used in .net, I recommend the following MSDN article: Character Encoding in the .NET Framework.
The first character set is Code Page 437 (CP437); the second looks like Code Page 1252 (CP1252), also known as Windows Latin-1.
I'd guess VB.Net is simply picking up the default encoding for the PC.
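As a quick illustration of how the same byte value maps to different characters under those two code pages (sketched in Python simply because it ships codecs for both; the byte value is just an example):

# Byte 0xE4 is the Greek capital sigma in CP437, but 'ä' in CP1252 (Windows Latin-1).
b = bytes([0xE4])
print(b.decode('cp437'))   # Σ
print(b.decode('cp1252'))  # ä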
How did you write all this? Usually, when you use an output stream function, you can specify the encoding that goes with it.
Edit: I know this is not C#, but you can see the idea...
You'd have to set the encoding of your filestream, by doing something like this:
Setting the encoding when creating the filestream