exist-db not reading gzip files

I have just installed a new eXist-db instance and I want to use it to parse XML files that are compressed with gzip.
It is my understanding that eXist-db has the capability to perform this kind of operation, but I keep getting an "invalid MIME type" error.
I've added a new MIME type in the mime-types.xml file with the following parameters:
<mime-type name="application/zip" type="binary">
    <description>GZIP archive</description>
    <extensions>.gz</extensions>
</mime-type>
But I keep getting the same reading error.
Could somebody point me in the right direction? Am I missing something?
Thanks!
G.

eXist-db can only query XML data that has been parsed, processed, and indexed into eXist-db's internal storage format. This means the data needs to be decompressed before it can be queried; a GZIPped XML document stored in the database is considered a binary blob and cannot be queried.
Once the GZIP file is stored in the database, you can use the compression:unzip() function to uncompress the document. The resulting XML document can then be stored in the database and queried.
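If it helps, here is a minimal sketch of the same workflow done outside the database in Python: decompress the .gz locally, then store the plain XML through eXist-db's REST interface. The file name, server URL, collection path, and credentials below are assumptions for illustration, not taken from the question.
import gzip
import requests

# Decompress the gzipped XML locally (hypothetical file name).
with gzip.open("data.xml.gz", "rb") as f:
    xml_bytes = f.read()

# Store the plain XML in eXist-db via its REST interface (assumed
# endpoint and default admin account; adjust for your setup).
resp = requests.put(
    "http://localhost:8080/exist/rest/db/mycol/data.xml",
    data=xml_bytes,
    headers={"Content-Type": "application/xml"},
    auth=("admin", ""),
)
resp.raise_for_status()
Once stored this way, the document is parsed and indexed like any other XML resource and can be queried normally.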

Related

error: failed to encode '---------_dict.sql' from UTF-8 to Windows-1250

When I clone the repository, I get this error from Git:
error: failed to encode '---------_dict.sql' from UTF-8 to Windows-1250
Then when I try to commit and push, I get the same error for the same .sql files. Does anyone have an idea? Has someone had a similar problem? Could it be related to the .gitattributes file, which contains
*.sql text working-tree-encoding=Windows-1250
This error message means that some part of the conversion failed, most likely because the contents of the file cannot be converted to Windows-1250. It's likely that the file contains UTF-8 sequences corresponding to Unicode characters that have no representation in Windows-1250.
You should contact the author of the repository, notify them of this problem, and ask them to fix it. On your local system, you can add a .git/info/attributes file with the following to force the files to UTF-8 instead:
*.sql text working-tree-encoding=UTF-8
Note that if you do this, you must ensure that the files you check in are actually UTF-8 and not Windows-1250.
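To see exactly which characters are the problem, a small sketch like the following can scan a file; the file name here is a placeholder, since the real one is redacted in the error message.
# Find which lines of a UTF-8 file cannot be re-encoded as Windows-1250,
# which is the conversion Git performs on checkout with this attribute.
from pathlib import Path

text = Path("some_dict.sql").read_text(encoding="utf-8")  # placeholder name
for lineno, line in enumerate(text.splitlines(), start=1):
    try:
        line.encode("cp1250")  # Python's codec name for Windows-1250
    except UnicodeEncodeError as exc:
        print(f"line {lineno}: {exc}")
Any line it prints contains at least one character that Windows-1250 cannot represent.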

Load compressed data from Amazon S3 to Postgres using datastage

I am trying to load data stored in .gz format in S3 into a PostgreSQL server using DataStage. I am using the ODBC connector on the target (database) side. I am able to load uncompressed data from S3 into PostgreSQL, but no luck with compressed data so far. I have tried the Expand stage, but either it isn't helping or I am not using it correctly. Without the Expand stage the data arrives, but the job tries to read the compressed data directly, fails, and throws an error:
Amazon_S3_0,1: com.ascential.e2.common.CC_Exception: Failed to initialize the parser: The row delimiter was not found within the first 132 bytes of the file. Ensure that the Row delimiter property matches the row delimiter of the file.
at com.ibm.iis.cc.cloud.CloudLogger.createCCException(CloudLogger.java:196)
at com.ibm.iis.cc.cloud.CloudStage.processReadAndParse(CloudStage.java:1591)
at com.ibm.iis.cc.cloud.CloudStage.process(CloudStage.java:680)
at com.ibm.is.cc.javastage.connector.CC_JavaAdapter.run(CC_JavaAdapter.java:443)
Amazon_S3_0,1: Failed to initialize the parser: The row delimiter was not found within the first 132 bytes of the file. Ensure that the Row delimiter property matches the row delimiter of the file. (com.ibm.iis.cc.cloud.CloudLogger::createCCException, file CloudLogger.java, line 196)
If someone has come across this, please share your valuable inputs.
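One way to narrow this down is to inspect the object outside DataStage and confirm it really is a gzip stream with the row delimiter you expect. A hedged boto3 sketch (bucket and key names are placeholders):
# Fetch the object, check the gzip magic bytes, and inspect the first rows.
import gzip
import boto3

s3 = boto3.client("s3")
obj = s3.get_object(Bucket="my-bucket", Key="path/to/data.csv.gz")
raw = obj["Body"].read()

print("gzip magic bytes:", raw[:2] == b"\x1f\x8b")  # True for real gzip
text = gzip.decompress(raw)
print(text[:200])  # eyeball the first rows and the actual row delimiter
If the magic-byte check fails, the file isn't actually gzip; if it succeeds, the error suggests the connector is parsing the compressed bytes directly, so decompression (for example via the Expand stage) has to happen before the parser sees the data.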

Boto3: upload file from base64 to S3

How can I directly upload a base64 encoded file to S3 with boto3?
object = s3.Object(BUCKET_NAME,email+"/"+save_name)
object.put(Body=base64.b64decode(file))
I tried to upload the base64-encoded file like this, but the resulting file is broken. Directly uploading the string without base64-decoding it doesn't work either.
Is there anything similar to set_contents_from_string() from boto2?
I just fixed the problem and found out that the way of uploading was correct, but the base64 string was invalid because it still contained the data URI prefix data:image/jpeg;base64, at the start. Removing that prefix solved the problem.
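A short sketch of that fix (the function, variable, and bucket names are illustrative, not from the question):
import base64
import boto3

s3 = boto3.resource("s3")

def upload_base64(file_data, key):
    # Drop a "data:image/jpeg;base64," style data-URI prefix if present.
    if file_data.startswith("data:") and "," in file_data:
        file_data = file_data.split(",", 1)[1]
    s3.Object("my-bucket", key).put(Body=base64.b64decode(file_data))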
If you read the documentation on Object.put carefully, you will see this:
response = object.put(
    ACL='private',
    ...,
    Body=b'bytes'|file,
    ...,
)
Body only accepts a file object or bytes; any other type will fail. base64.b64decode doesn't read from a file object automatically, so you must read the data yourself and pass it to the decoder.
# FIX
object.put(Body=base64.b64decode(file.read()))
As a reminder, always post the stack trace.

In Kettle, using Text File Input to read a CSV file from a tar.gz file doesn't work. Where might it be wrong?

I have a CSV file that is tarred and gzipped, so I have test.tar.gz.
I would like to read the CSV file through Text File Input.
I tried the VFS path tar:gz:file://C:/test/test.tar.gz!/test.tar! with a wildcard like ".*\.csv".
But it doesn't read the file successfully.
It throws this exception:
org.apache.commons.vfs.FileNotFolderException:
Could not list the contents of
"tar:gz:file:///C:/test/test.tar.gz!/test.tar!/"
because it is not a folder.
I am using Windows 8.1 and PDI 5.2.
What might be wrong?
For compressed CSV reading, the "Text File Input" step in Pentaho Kettle only supports the first file inside a compressed archive (either a Zip or GZip file). Check the compression section of the Pentaho Wiki.
Now for your issue: try removing the wildcard entry, since only the first file inside the zip/gzip file will be read (as explained above).
I have placed sample code covering reading both zip and gzip files. Check it here.
Hope it helps :)
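If pre-processing outside Kettle is an option, a small Python sketch can pull the CSV out of the tar.gz so Text File Input only has to read a plain file. The paths mirror the question but are otherwise illustrative:
import csv
import tarfile

# Open the gzipped tar and read the first CSV member, mirroring the
# first-file-only behavior described above.
with tarfile.open("C:/test/test.tar.gz", "r:gz") as tar:
    for member in tar.getmembers():
        if member.name.endswith(".csv"):
            extracted = tar.extractfile(member)
            reader = csv.reader(line.decode("utf-8") for line in extracted)
            for row in reader:
                print(row)
            break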

How can I determine if a file upload is a valid CSV file - or at least text - in ColdFusion 8?

I have a form which allows a user to upload a file to the server. How can I validate that the uploaded file is in fact the expected format (CSV, or at least validate that it is a text file) in ColdFusion 8?
For simple formats like CSV, just check it yourself, for example via a regex.
<cffile action="read" file="#uploadedFile#" variable="contents" charset="UTF-8">
<cfset LooksLikeCSV = REFind("^([^;]*;)+[^;]*$", contents)>
You can add further checks for file size limits or forbidden characters.
For other file formats, you can check for header signatures that occur in the first few bytes of the file.
You could even write a full parser for your expected file format. For CSV validation, you could do a ListToArray() on CR/LF and check each line individually against a regex. XML should be pretty straightforward as well; just try passing it to XmlParse(). Binary formats like images are a little more difficult, but libraries exist for those as well.
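The same checks translate to any language. Here is a quick standalone sketch in Python (not ColdFusion, and the function name is made up) showing the binary-byte test plus a parse attempt:
import csv

def looks_like_csv(path, delimiter=";"):
    with open(path, "rb") as f:
        data = f.read()
    # NUL bytes mean it is almost certainly not a text file.
    if b"\x00" in data[:4096]:
        return False
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    # Try parsing and require a consistent column count across lines.
    rows = [r for r in csv.reader(text.splitlines(), delimiter=delimiter) if r]
    return bool(rows) and len({len(r) for r in rows}) == 1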
I don't know if it can help you, but Ben Nadel wrote excellent posts about CSV:
http://www.bennadel.com/blog/483-Parsing-CSV-Data-Using-ColdFusion.htm
http://www.bennadel.com/blog/976-Regular-Expressions-Make-CSV-Parsing-In-ColdFusion-So-Much-Easier-And-Faster-.htm
http://www.bennadel.com/blog/501-Parsing-CSV-Values-In-To-A-ColdFusion-Query.htm
I think it's as simple as specifying the accept value in cffile. Unfortunately, the CF8 docs don't list the value as part of the cffile reference; it's under file management.
<cffile action="upload" filefield="filename" destination="#destination#" accept="text/csv">
CF8 » Controlling the type of file uploaded