S3's boto is returning NoSuchKey while trying to copy an existing key

I've created a key on S3.
mykey.exists() returns true
mykey.get_contents_to_filename() generates a file that is correct
But:
mykey.copy('bucket', '/backup/file')
returns:
NoSuchKey
The Specified key does not exist.
Key = mykey
It looks like I'm using boto 2.0b4
If the key exists, why am I getting a NoSuchKey error?
What am I missing?
Edit: changed the backslashes in the key name to the forward slashes that I am actually using.

I have a theory: because Amazon S3 is eventually consistent, one request can see the key (.exists() == True) while another request lands on a different S3 server that does not yet know about the new key. That is an inconsistent read, which is the difficulty with eventually consistent data stores. This is known behavior for S3 with a PUT followed by a HEAD/GET, and I expect it to hold for COPY as well. After a usually short (but indefinite) period of time, all requests will see your key; normally this takes only a second or two. Put a 30-second sleep in your code between the exists() check and the copy. Does it still happen?
The issue is described here: https://forums.aws.amazon.com/thread.jspa?threadID=21634&tstart=0
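For concreteness, here is roughly what that experiment could look like with boto 2.x (a minimal sketch; the bucket and key names are placeholders, and connect_s3() assumes your credentials are already configured):
import time
import boto

conn = boto.connect_s3()
bucket = conn.get_bucket('bucket')
key = bucket.get_key('mykey')   # returns None if S3 can't see the key yet
if key is not None:
    time.sleep(30)                     # wait out the propagation window
    key.copy('bucket', 'backup/file')  # does the copy still raise NoSuchKey?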

I think you may be running into an issue with your key name. The backslash characters in the string '\backup\file' are actually interpreted as string escapes, so '\b' is replaced with the ASCII backspace character and '\f' with the ASCII formfeed (see the Python documentation on string literals for details). While that probably isn't what you intended, it really should still work; however, there was a bug in the escaping of key names in boto 2.0b4 (now fixed in GitHub master) that is preventing this from working.
If you actually want your key name to be "\backup\file", try specifying it as r'\backup\file' in Python. This treats it as a raw string, and no escape processing will occur.
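A quick interpreter session shows the difference (plain Python, nothing boto-specific):
>>> '\backup\file'
'\x08ackup\x0cile'
>>> r'\backup\file'
'\\backup\\file'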

Related

Can we set multiple DCM_SpecificCharacterSet while importing records using DICOM?

Currently, I am using the code below to set parameters for retrieving data from PACS.
DcmDataset findParams = DcmDataset();
findParams.putAndInsertString(DCM_QueryRetrieveLevel, "SERIES");
findParams.putAndInsertString(DCM_SpecificCharacterSet, "ISO_IR 192");
However, I just wanted to check whether we can support multiple character sets when importing data at the same time. The code would look something like the line below; I am trying to find out whether this is possible or not, as I don't have the facilities to verify it.
findParams.putAndInsertString(DCM_SpecificCharacterSet, "ISO_IR 192", "ISO_IR 100");
I think that what you want to express is that "this Query SCU can accept responses in the following character sets". This is plainly not possible. See a discussion in the DICOM newsgroup for reference. It ends with a proposal to add character set negotiation to the association negotiation. But such a supplement has not been submitted yet, and I am not aware of anyone working on it currently.
The semantics of the attribute Specific Character Set (0008,0005) in the context of the Query Retrieve Service Class:
PS3.4, C.4.1.1.3.1 Request Identifier Structure
Conditionally, the Attribute Specific Character Set (0008,0005). This Attribute shall be included if expanded or replacement character sets may be used in any of the Attributes in the Request Identifier. It shall not be included otherwise.
I.e. it describes nothing but the character encoding of your request dataset.
and
C.4.1.1.3.2 Response Identifier Structure
Conditionally, the Attribute Specific Character Set (0008,0005). This Attribute shall be included if expanded or replacement character sets may be used in any of the Attributes in the Response Identifier. It shall not be included otherwise. The C-FIND SCP is not required to return responses in the Specific Character Set requested by the SCU if that character set is not supported by the SCP. The SCP may return responses with a different Specific Character Set.
I.e. you cannot control the character set in which the SCP will send you the responses. Surprising but a matter of fact.
Sending multiple values for the attribute is possible, but it has different semantics: it means that the request contains characters from different character sets, which are switched using the Code Extension Techniques defined in ISO 2022. An illustrative example of how this would look and what it would mean can be found in PS3.5, H.3.2.
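For illustration, here is roughly what a multi-valued Specific Character Set looks like, sketched with the Python pydicom library rather than the DCMTK C++ API from the question (in DCMTK the equivalent would, as far as I know, be a single backslash-delimited value string, since putAndInsertString takes one value string):
from pydicom.dataset import Dataset

ds = Dataset()
ds.QueryRetrieveLevel = 'SERIES'
# Two values do NOT mean "either of these is acceptable"; they mean the
# dataset uses ISO 2022 code extensions to switch between the two sets.
ds.SpecificCharacterSet = ['ISO 2022 IR 100', 'ISO 2022 IR 144']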
What implementors usually do to avoid character set compatibility issues is configure "the one and only" character set for a particular installation (i.e. a hospital) in a locale setting defined at system setup. It works pretty well; for example, an installation in Russia will very likely support Cyrillic (ISO_IR 144) or Unicode (ISO_IR 192) or both. In the case of "both", you can select the character set that you prefer when configuring your system.

Amazon S3 URL + being encoded to %2B?

I've got Amazon s3 integrated with my hosting account at WP Engine. Everything works great except when it comes to files with + characters in them.
For example in the following case when a file is named: test+2.pdf
http://support.mcsolutions.com/wp-content/uploads/2011/11/test+2.pdf = does not work.
The following URL is the Amazon URL. Notice the + character is encoded. Is there a way to prevent/change this?
http://mcsolutionswpe.s3.amazonaws.com/mcsupport/wp-content/uploads/2011/11/test%2b2.pdf
Other URLs work fine:
Amazon -> http://mcsolutionswpe.s3.amazonaws.com/mcsupport/wp-content/uploads/2011/11/test2.pdf
Website -> http://support.mcsolutions.com/wp-content/uploads/2011/11/test2.pdf
If I understand your question correctly, then no, there is no way to really change this.
The cause appears to be an unfortunate design decision made in S3 many years ago, one that cannot be fixed now because it would break too many other things: S3 uses an incorrect variant of URL-escaping (which includes, but is not quite limited to, "percent-encoding") in the path part of the URL, where the object's key is sent.
In the query string (the optional part of a URL after the ?, but before the fragment, which, if present, begins with #), the + character is considered equivalent to [SPACE] (ASCII Dec 32, Hex 0x20).
...but in the path of a URL, this is not supposed to be the case.
...but in S3's implementation, it is.
So + doesn't actually mean +, it means [SPACE]... and therefore + can't also mean +... which means that a different expression is required to convey +, and that value is %2B, the URL-escaped value of + (ASCII Dec 43, Hex 0x2B).
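Python's standard library happens to implement both sets of rules, which makes the distinction easy to see (Python 3; the file name is the one from the question):
from urllib.parse import quote, unquote, unquote_plus

print(quote('test+2.pdf'))         # test%2B2.pdf (path-style escaping, what S3 expects)
print(unquote('test+2.pdf'))       # test+2.pdf   (path rules: '+' is just a plus)
print(unquote_plus('test+2.pdf'))  # test 2.pdf   (query-string rules: '+' is a space)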
When you upload your files, the + is converted by the code you're using (assuming it understands this quirk, as apparently it does) into the format S3 expects (%2B)... and so it must be requested using %2B when you download the files.
Strangely, but not surprisingly, if you store the file in S3 with a space in the path, you can actually request it with a +, a space, or even %20, and all three of these should fetch the file. So if seeing the + in the path is what you want, you can sort of work around the issue by saving the file with a space instead, though this workaround deserves to be described as a "hack" if ever a workaround did. This tactic will not work with libraries that generate pre-signed GET URLs, unless they are specifically designed to ignore the standard behavior of S3 and do what you want instead; but for public links, it should be essentially equivalent.

How to identify Drive ID?

The new Google Drive Android API has two types of string IDs available, the 'resource' ID and the 'encoded' ID.
'encoded' id from DriveId.encodeToString()
"DriveId:CAESHDBCMW1RVVcyYUZKZmRhakIzMDBVbXMYjAUgssy8yYFRTTNKRU55"
'resource' id from DriveId.getResourceId()
"UW2aFJfdajB3M3JENy00Ums0B1mQ"
In the process I end up with a string that can contain either one of them (the result of some timing issues). My question is:
If I need to 'parse' the string in order to identify the type, is there a characteristic I can rely on? For instance:
'encoded' id will always start with 'DriveId:' substring
'resource' id will have some length limit
can I abuse the error return from 'decodeFromString()'?
or should I prepend my own tag to the string? What would be the minimal 'safe' tag (i.e. one that will never appear at the beginning of these IDs)?
Please point me in the right direction so I don't have to re-do it with the next release.
I have run into yet another issue that should be mentioned here so others don't waste time falling into the same pit. The 'resourceID' can be ported and will remain unique for the object it identifies, whereas the 'encodedID' has only 'device' scope. This means that you CAN'T transfer an 'encodedID' to another device (with the same account) and try to retrieve a file/folder with it. So I assume it is unique to a Google Play Services instance.
Please do not rely on any formatting of either ID type. These are subject to change without notice.
If you need to use both, and track the differences between them you should have your own method of doing so within your app.
Really, you should probably always just store the encoded ID, since this one is always guaranteed to be present, and if it contains a resourceId, it's easy to get back out.
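If you do end up tracking the two kinds yourself, the scheme can be as small as a fixed prefix of your own choosing. A sketch (in Python for brevity; a real Android app would do the same in Java, and the 'enc:'/'res:' tags are invented here, not part of any API):
ENCODED_TAG = 'enc:'
RESOURCE_TAG = 'res:'

def tag(kind, drive_id):
    # kind is ENCODED_TAG or RESOURCE_TAG; the type is recorded when the
    # ID is obtained, never guessed later from the string's format
    return kind + drive_id

def untag(tagged):
    for kind in (ENCODED_TAG, RESOURCE_TAG):
        if tagged.startswith(kind):
            return kind, tagged[len(kind):]
    raise ValueError('untagged drive id string: %r' % tagged)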

Twisted.web File directory listing issues

I'm trying to use Twisted in a web-app, and I'm coming across an interesting issue. I'm very new to Twisted, so I'm not sure if I'm seeing a bug in Twisted, or if I just am not using it correctly.
In theory, going by the example, a File resource object can be used both to serve files from a directory and to provide the directory listing. So, assuming I have the variables (port, reportsDir) defined elsewhere before the code snippet, I do the following:
from twisted.internet import reactor
from twisted.web.resource import Resource
from twisted.web.server import Site
from twisted.web.static import File
rootResource = Resource()
rootResource.putChild("reports", File(reportsDir))
reactor.listenTCP(port, Site(rootResource))
reactor.run(installSignalHandlers=False)
Now, when I access '/reports' on my host, I get a message "Request did not return bytes" in my browser, along with a bunch of output that was obviously produced by Twisted, but which also contains a u'.....' string literal that in fact has the directory listing in it. So the DirectoryLister is obviously creating the listing HTML, but it isn't seen as valid by something in Twisted. It doesn't seem to like the unicode string, which was in fact produced by Twisted itself.
Do I need to set some other configuration item to get it to convert the unicode string to the necessary bytes object (or whatever), or some other approach?
Many thanks,
-D
Well, it seems the issue is that Python will promote any string to unicode if any source string in a formatting operation was unicode. In my case, reportsDir was unicode because it came from an XML file, and that sent it down the error path.
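The promotion is easy to reproduce in a plain Python 2 session (the values here are made up):
>>> 'reports/%s' % u'2011'
u'reports/2011'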
Changing the above line:
rootResource.putChild("reports", File(reportsDir))
to:
rootResource.putChild("reports", File(reportsDir.encode('ascii', 'ignore')))
fixed the issue. I would, however, suggest that the Twisted developers check for unicode in the constructor for File, or simply check for unicode in the DirectoryLister and, if found, return the ASCII-encoded version.

Is it safe to convert a mysqlpp::sql_blob to a std::string?

I'm grabbing some binary data out of my MySQL database. It comes out as a mysqlpp::sql_blob type.
It just so happens that this BLOB is a serialized Google Protobuf. I need to de-serialize it so that I can access it normally.
This gives a compile error, since ParseFromString() is not intended for mysqlpp::sql_blob types:
protobuf.ParseFromString( record.data );
However, if I force the cast, it compiles OK:
protobuf.ParseFromString( (std::string) record.data );
Is this safe? I'm particularly worried because of this snippet from the mysqlpp documentation:
"Because C++ strings handle binary data just fine, you might think you can use std::string instead of sql_blob, but the current design of String converts to std::string via a C string. As a result, the BLOB data is truncated at the first embedded null character during population of the SSQLS. There’s no way to fix that without completely redesigning either String or the SSQLS mechanism."
Thanks for your assistance!
Judging by that quote, it doesn't look like it would be a problem: it's basically saying that if a null character is found in the blob, the string will stop there, and ASCII strings won't have random nulls in the middle of them. However, this might present a problem for internationalization (multibyte character sets may have nulls in the middle).
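To make the caveat from the mysqlpp documentation concrete: a conversion that goes through a C string stops at the first zero byte, and binary data such as a serialized protobuf can legitimately contain zero bytes. A quick illustration of the effect (in Python, since only the byte values matter; the blob contents are made up):
blob = b'\x08\x01\x00\x12\x03abc'      # binary data with an embedded NUL
as_c_string = blob.split(b'\x00')[0]   # what a through-a-C-string copy keeps
print(len(blob), len(as_c_string))     # 8 vs 2: everything after the NUL is lost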