BZip2 decompress - SQL

Is there a way to decompress a BZip2 compressed string in MS SQL? Other than using xp_cmdshell and running it through bzip2.exe?
I have a string like BZh41AY&SY3‹Ï¬€ !˜„]ÉáB#Î/>°, which is simply 'test' compressed.

You could use SQL CLR to do this. Note that the built-in GZipStream class only handles gzip/deflate, not BZip2, so the actual decompression would need a third-party library (for example SharpZipLib) if you go that route.
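For what it's worth, the decompression step itself is straightforward once a BZip2 library is in hand. Here is a minimal sketch in plain C using libbz2 (illustrative only: an actual SQL CLR function would be written in a .NET language, and the fixed buffer sizes below are assumptions):

#include <stdio.h>
#include <bzlib.h>    /* link with -lbz2 */

/* Reads BZip2-compressed bytes from stdin and writes the decompressed
   text to stdout ("test" for the example string above). */
int main(void)
{
    char compressed[64 * 1024];     /* assumed large enough for the input */
    unsigned int compressed_len =
        (unsigned int)fread(compressed, 1, sizeof compressed, stdin);

    char plain[64 * 1024];          /* assumed large enough for the output */
    unsigned int plain_len = sizeof plain;

    int rc = BZ2_bzBuffToBuffDecompress(plain, &plain_len,
                                        compressed, compressed_len,
                                        0 /* small */, 0 /* verbosity */);
    if (rc != BZ_OK) {
        fprintf(stderr, "BZip2 decompress failed: %d\n", rc);
        return 1;
    }
    fwrite(plain, 1, plain_len, stdout);
    return 0;
}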

Related

OpenVMS: Extracting RMS Indexed file to Windows as a sequential flat file

I haven't used OpenVMS for 20+ years. It was my 1st OS. I've been asked if it is possible to copy the data from RMS files on an OpenVMS server to Windows as a text file, so that it's readable.
No one has experience or knowledge of the record structures etc.
The files are xyz.DAT and are relative files. I'm hoping the .DAT files are fixed length.
My 1st attempt would be to try and use Datatrieve (DTR), but I get an error that the image isn't loaded.
Thought it might be as easy as using CONVERT/FDL=nnnn.FDL and changing the Relative organization to Sequential, but the file still seems to be unreadable.
Is there an easy way to stream an RMS indexed file to a flat ASCII file?
I used to use COBOL and C to access the data in the past, but had lots of libraries to help....
I've noticed some solutions may use ODBC to connect, but I'm not sure what I can or cannot install on the server.
I can FTP using FileZilla to the server....
Another plan is writing a C application to read the file and output it as strings..... or DCL too..... it doesn't have to be quick...
Any ideas?
As mentioned before, the simple solution MIGHT be to just use: $ TYPE/OUT=test.TXT test.DAT
This will handle Relative and Indexed files alike.
It is much the same as $ CONVERT/FDL=NL: test.DAT test.TXT
Both will just read records from the source and transfer the bytes, byte for byte, to the records in a sequential file.
FTP in ASCII mode will transfer that nicely to windows.
You can also use an 'inline' FDL file to generate a 'unix' LF file like:
$ conv /fdl="record; format stream_lf" test.DAT test.TXT
Or CR-LF file using:
$ conv /fdl="record; format stream" test.DAT test.TXT
Both can be transferred in Binary or ASCII mode with FTP.
MOSTLY, because this really only works well for a TEXT ONLY source .DAT file.
There should be no CR, LF, FF or NUL characters in the source or things will break.
As 'habo' points out, use DUMP /RECORD=COUNT=3 to see how 'readable' the source data is.
If you spot 'binary' data using DUMP then you will need to find a record definition somewhere which maps bytes to integers, floating points or dates as needed.
These definitions can be COBOL LIB files or BASIC MAPs, and are often stored IN the CDD (Common Data Dictionary) or indeed in DATATRIEVE .DIC DICTIONARIES.
To use such a definition you likely need a program that just reads, follows the 'map', and writes/prints as text. Normally that's not too hard, notably not when you can find an example program on the server to tweak.
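(If it does come to a small program, here is a minimal sketch of the 'write a C application' route mentioned in the question, using the OpenVMS C RTL's record I/O and assuming the records are plain text; the file names are placeholders, and decoding binary fields against a record definition is left out.)

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    char rec[32767 + 1];    /* maximum RMS record size plus terminator */
    FILE *in  = fopen(argc > 1 ? argv[1] : "TEST.DAT", "r");
    FILE *out = fopen(argc > 2 ? argv[2] : "TEST.TXT", "w");

    if (in == NULL || out == NULL) {
        perror("fopen");
        return EXIT_FAILURE;
    }

    /* In record mode the C RTL returns one RMS record per fgets() call,
       regardless of the file organization. */
    while (fgets(rec, sizeof rec, in) != NULL)
        fputs(rec, out);

    fclose(in);
    fclose(out);
    return EXIT_SUCCESS;
}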
If it is just one or two 'suspect' byte ranges, then you can create a DCL loop to read and write and use F$EXTRACT to select the chunks you like.
If you want further help, kindly describe in words what kind of data is expected and perhaps provide the output from DUMP for 3 or 5 rows.
Good luck!
Hein.

ORC format - PIG - dependent libraries

I understand that to write into ORC format with Snappy compression from a Pig script,
I should use OrcStorage('-c SNAPPY')
I need your help: what is the SET command or necessary library I need to include to enable storing the result dataset in ORC format?
Please help.
Subra
Check what Pig version you are using.
ORC storage is available from Pig 0.14 as a built-in function.
Check the examples:
https://pig.apache.org/docs/r0.14.0/func.html#OrcStorage
UPDATE
This Pig script just works fine:
data = LOAD 'SO/date.txt' USING PigStorage(' ') AS (ts:chararray);
STORE data INTO 'orc/snappy' using OrcStorage('-c SNAPPY');
data_orc = LOAD 'orc/snappy' using OrcStorage('-c SNAPPY');
DUMP data_orc;
You don't even need to REGISTER the kryo jar: since it is not used directly from the Pig script, the register would just be optimized out. It is used via reflection, so you have to add the kryo jar to the classpath instead, like this:
pig -latest -useHCatalog -cp ./kryo-2.24.0.jar orc.pig

exist-db not reading gzip files

I have just installed a new eXist-db and I want to use it to parse XML files that are actually compressed with gzip.
It is my understanding that eXist-db has the capability to perform this kind of operation, but I keep getting the error 'MIME type invalid'.
I've added a new MIME type in the mime-types.xml file with the following parameters:
<mime-type name="application/zip" type="binary">
<description>GZIP archive</description>
<extensions>.gz</extensions>
</mime-type>
But I keep getting the same reading error.
Could somebody point me in the right direction? Am I missing something?
Thanks!
G.
eXist-db can only work on XML data that has been parsed and processed (and indexed) into the eXist-db internal storage format. This means that the data needs to be decompressed before it can be queried; a GZIPped XML document stored in the database is considered to be 'a binary blob' and cannot be queried.
When the GZIP file is stored in the database, you can use the compression:unzip() function (link) to uncompress the document. The document can then be stored in the database.

Does the libhdfs C/C++ API support reading/writing compressed files?

I have found somebody saying, around 2010, that libhdfs does not support reading/writing gzip files.
I downloaded the newest hadoop-2.0.4 and read hdfs.h. There are no compression-related arguments there either.
Now I am wondering whether it supports reading compressed files these days?
If not, how can I make a patch for libhdfs to make it work?
Thanks in advance.
Best Regards
Haiti
As far as I know, libhdfs only uses JNI to access HDFS. If you are familiar with the HDFS Java API, libhdfs is just a wrapper around org.apache.hadoop.fs.FSDataInputStream. So it cannot read compressed files directly right now.
I guess that you want to access files in HDFS from C/C++. If so, you can use libhdfs to read the raw file, and a C/C++ compression library to decompress the content; the compressed file format is the same as anywhere else. For example, if the files are compressed with LZO, then you can use the LZO library to decompress them.
But if the file is a sequence file, then you may need to use JNI to access it, as that is a Hadoop-specific format. I have seen Impala do similar work before, but it's not out-of-the-box.
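For reference, a minimal sketch of the libhdfs half of that (the "default" namenode argument and the /data/file.gz path are placeholders; error handling is kept short):

#include <stdio.h>
#include <fcntl.h>     /* O_RDONLY */
#include "hdfs.h"      /* libhdfs */

int main(void)
{
    /* "default" with port 0 picks up the configured fs.defaultFS */
    hdfsFS fs = hdfsConnect("default", 0);
    if (fs == NULL) {
        fprintf(stderr, "hdfsConnect failed\n");
        return 1;
    }

    /* libhdfs hands back the raw, still-compressed bytes */
    hdfsFile in = hdfsOpenFile(fs, "/data/file.gz", O_RDONLY, 0, 0, 0);
    if (in == NULL) {
        fprintf(stderr, "hdfsOpenFile failed\n");
        hdfsDisconnect(fs);
        return 1;
    }

    static char buf[4096 * 4096];
    tSize readlen = hdfsRead(fs, in, buf, sizeof buf);

    /* ... feed buf / readlen into zlib, lzo, etc. to decompress ... */

    hdfsCloseFile(fs, in);
    hdfsDisconnect(fs);
    return 0;
}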
Thanks for the reply. Using libhdfs to read the raw file and then zlib to inflate the content can work. The file used gzip. I used code like this:
/* buf holds the readlen compressed bytes read from HDFS;
   buf1 is a 16 MB output buffer for the decompressed data. */
z_stream gzip_stream;
gzip_stream.zalloc = (alloc_func)0;
gzip_stream.zfree = (free_func)0;
gzip_stream.opaque = (voidpf)0;
gzip_stream.next_in = (Bytef *)buf;
gzip_stream.avail_in = readlen;
gzip_stream.next_out = (Bytef *)buf1;
gzip_stream.avail_out = 4096 * 4096;
/* 16 + MAX_WBITS tells zlib to expect a gzip wrapper around the deflate data */
ret = inflateInit2(&gzip_stream, 16 + MAX_WBITS);
if (ret != Z_OK) {
    printf("inflate init error\n");
}
ret = inflate(&gzip_stream, Z_NO_FLUSH);
ret = inflateEnd(&gzip_stream);
printf("the buf \n%s\n", buf1);
return buf1;   /* return the decompressed buffer, not the compressed input */

How to Store BLOB data in Sqlite Using Tcl

I have a Tcl/Tk application that has an SQLite back-end. I pretty much understand the syntax for inserting, manipulating, and reading string data; however, I do not understand how to store pictures or files into SQLite with Tcl.
I do know I have to create a column that holds BLOB data in SQLite. I just don't know what to do on the Tcl side of things. If anyone knows how to do this or has a good reference to suggest, I would really appreciate it.
Thank you,
Damion
In my code, I basically open the file as a binary, load its content into a Tcl variable, and stuff that into the SQLite db. So, something like this...
# load the file's contents
set fileID [open $file RDONLY]
fconfigure $fileID -translation binary
set content [read $fileID]
close $fileID
# store the data in a blob field of the db
$db eval {INSERT OR REPLACE INTO files (content) VALUES ($content)}
Obviously, you'll want to season to taste, and your table will probably contain additional columns...
The incrblob command looks like what you want: http://sqlite.org/tclsqlite.html#incrblob
The "incrblob" method
This method opens a TCL channel that
can be used to read or write into a
preexisting BLOB in the database. The
syntax is like this:
dbcmd incrblob ?-readonly?? ?DB? TABLE COLUMN ROWID
The command returns a new TCL channel
for reading or writing to the BLOB.
The channel is opened using the
underlying sqlite3_blob_open()
C-langauge interface. Close the
channel using the close command of
TCL.
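The Tcl channel returned by incrblob sits on top of SQLite's C blob interface mentioned above. For reference, the same read done directly through that C API looks roughly like this (a sketch; the files table and content column reuse the earlier example, and the rowid is whatever row you stored):

#include <stdio.h>
#include <sqlite3.h>

/* Read the BLOB stored in files.content for a given rowid via the
   sqlite3_blob_* interface that the incrblob method wraps. */
int read_blob(sqlite3 *db, sqlite3_int64 rowid)
{
    sqlite3_blob *blob;
    int rc = sqlite3_blob_open(db, "main", "files", "content",
                               rowid, 0 /* read-only */, &blob);
    if (rc != SQLITE_OK) {
        fprintf(stderr, "blob open failed: %s\n", sqlite3_errmsg(db));
        return rc;
    }

    int size = sqlite3_blob_bytes(blob);
    char *data = sqlite3_malloc(size);
    if (data != NULL)
        rc = sqlite3_blob_read(blob, data, size, 0);

    /* ... use data / size here ... */

    sqlite3_free(data);
    sqlite3_blob_close(blob);
    return rc;
}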