OpenVMS: Extracting an RMS indexed file to Windows as a sequential flat file - openvms

I haven't used OpenVMS for 20+ years. It was my first OS. I've been asked whether it is possible to copy the data from RMS files on an OpenVMS server to Windows as a text file, so that it's readable.
No-one has experience or knowledge of the record structures etc.
The files are xyz.DAT and are relative files. I'm hoping the dat files are fixed length.
My 1st attempt would be to try and use Datatrieve (DTR) but get an error that the image isn't loaded.
I thought it might be as easy as using CONVERT/FDL=nnnn.FDL, changing the organization from Relative to Sequential. The resulting file still seems to be unreadable.
Is there an easy way to stream an RMS index file to a flat ASCII file?
I used to use COBOL and C to access the data in the past, but had lots of libraries to help.
I've noticed that some solutions use ODBC to connect, but I'm not sure what I can or cannot install on the server.
I can FTP to the server using FileZilla.
Another plan is to write a C application (or DCL) that reads a file and writes each record out as a string. It doesn't have to be quick.
Any ideas?
Has mentioned before

The simple solution MIGHT be to just use: $ TYPE/OUT=test.TXT test.DAT.
This will handle Relative and Indexed files alike.
It is much the same as $ CONVERT / FDL=NL: test.DAT test.TXT
Both will just read records from the source and transfer the bytes, byte for byte, to the records in a sequential file.
FTP in ASCII mode will transfer that nicely to windows.
You can also use an 'inline' FDL file to generate a 'unix' LF file like:
$ conv /fdl="record; format stream_lf" test.DAT test.TXT
Or CR-LF file using:
$ conv /fdl="record; format stream" test.DAT test.TXT
Both can be transferred in binary or ASCII mode with FTP.
I say MIGHT because this really only works well for a TEXT ONLY source .DAT file.
There should be no CR, LF, FF or NUL characters in the source or things will break.
As 'habo' points out, use DUMP /RECORD=COUNT=3 to see how 'readable' the source data is.
If you spot 'binary' data using DUMP then you will need to find a record definition somewhere which maps bytes to integers, floating-point values or dates as needed.
These definitions can be COBOL LIB files or BASIC MAPs, and are often stored in the CDD (Common Data Dictionary) or indeed in DATATRIEVE .DIC dictionaries.
To use such a definition you likely need a program that reads each record following the 'map' and writes/prints it as text. Normally that's not too hard, especially when you can find an example program on the server to tweak.
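Once a record definition is found, the "read following the map, write as text" step could also be done on the Windows side after a binary-mode FTP transfer. Here is a minimal Python sketch of that idea. The 18-byte layout, the field names and the sample data are all made up for illustration; the real layout has to come from the record definition. It also assumes fixed-length records with no extra per-record bytes in the transferred file. VAX and Alpha store integers little-endian, hence the "<" in the format string. VAX floating-point fields are not IEEE and would need their own conversion, so this sketch sticks to text and integers.

```python
import struct

# Hypothetical 18-byte record layout (an assumption, not taken from the real files):
#   bytes 0-9   item name, ASCII, space padded
#   bytes 10-13 quantity, 32-bit signed integer, little-endian
#   bytes 14-17 price in cents, 32-bit signed integer, little-endian
RECORD = struct.Struct("<10sii")


def decode_records(raw: bytes):
    """Yield one tab-separated text line per complete fixed-length record."""
    for offset in range(0, len(raw) - RECORD.size + 1, RECORD.size):
        name, quantity, cents = RECORD.unpack_from(raw, offset)
        yield f"{name.decode('ascii').rstrip()}\t{quantity}\t{cents / 100:.2f}"


# Made-up sample data standing in for the bytes of a transferred .DAT file.
sample = RECORD.pack(b"WIDGET    ", 3, 1250) + RECORD.pack(b"GADGET    ", 12, 99)
for line in decode_records(sample):
    print(line)
```

For a real file, replace the sample with the bytes read via open(path, "rb") and adjust the format string to match the definition you found.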
If it is just one or two 'suspect' byte ranges, then you can create a DCL loop to read and write and use F$EXTRACT to select the chunks you like.
If you want further help, kindly describe in words what kind of data is expected and perhaps provide the output from DUMP for 3 or 5 rows.
Good luck!
Hein.


Not able to filter files using pathGlobFilter

We are trying to read files from a directory in Azure Blob Storage based on a pattern. We are using the
pathGlobFilter option to select files. The directory contains the following files:
Sales_51820_14529409_T_7a3cc7d1d17261fd17e7e1fabd3.csv
Sales_51820_14529409_7a3cc7d1d17261fd17e7e1fabd3.csv
Sales_61820_17529409_7a3cc7d1d17261fd17e7e1fabd3.csv
Sales_61820_17529409_T_7a3cc7d1d17261fd17e7e1fabd3.csv
We need to process only those files which do not have "T" in the file name, i.e. only these two files:
Sales_51820_14529409_7a3cc7d1d17261fd17e7e1fabd3.csv
Sales_61820_17529409_7a3cc7d1d17261fd17e7e1fabd3.csv
But we are not able to read only these two files.
Here is the code,
df = spark.read.format("csv").schema(structSchema).options(header=False,inferSchema=True,sep='|',pathGlobFilter= "Sales_\d{5} _ \d{8}_[a-z0-9]+.csv$").load("wasbs://abc#xxxxx.blob.core.windows.net/abc/2022/02/11/")
Regards,
Rajib
Glob syntax is not regular-expression syntax; there are differences between them.
For example, glob has no repetition counts, so \d{5} does not mean "five digits".
For details, see: here
Back to the question: here is a rather crude workaround, and I look forward to a better solution from the experts.
pathGlobFilter="Sales_[0-9][0-9][0-9][0-9][0-9]_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_[a-z0-9]*.csv"
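To sanity-check a glob like this without a cluster, one option is Python's fnmatch module. This is only an approximation: fnmatch is not the Hadoop glob implementation that Spark uses, but the features used here (character classes and *) behave the same way in both. The file names are the four from the question.

```python
from fnmatch import fnmatchcase

# Same pattern as above, split over two lines for readability.
PATTERN = (
    "Sales_[0-9][0-9][0-9][0-9][0-9]_"
    "[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_[a-z0-9]*.csv"
)

FILES = [
    "Sales_51820_14529409_T_7a3cc7d1d17261fd17e7e1fabd3.csv",
    "Sales_51820_14529409_7a3cc7d1d17261fd17e7e1fabd3.csv",
    "Sales_61820_17529409_7a3cc7d1d17261fd17e7e1fabd3.csv",
    "Sales_61820_17529409_T_7a3cc7d1d17261fd17e7e1fabd3.csv",
]

# fnmatchcase is case-sensitive, so the uppercase "T" fails the [a-z0-9] class.
selected = [name for name in FILES if fnmatchcase(name, PATTERN)]
for name in selected:
    print(name)
```

The "T" files are excluded because the character right after the second underscore must be a lowercase letter or a digit.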

How to do an incremental read of binary files

TL;DR: can I do an incremental read of binary files with Red or Rebol?
I would like to use Red to process some large (13MB to 2GB) structured binary files (Kurzweil synthesizer files). I've used other languages (C, Go, Tcl, Ruby, Dart) to walk through these files, and now I'd like to do the same with Red or Rebol.
Is there a way to incrementally read binary files, byte by byte? All I see is read/binary which seems to slurp the entire file at once (or a part of a file).
I'll need to jump around a little bit, too (either peek at the next byte, or skip to the end of a section, or skip past variable length strings to the start of data).
(Yes, I could make some helpers that tracked the position and used read/part/seek.)
I would like to make a call to the low level OS read/seek if that is possible - something new to learn.
This is on macos, but a portable solution would be great.
Thanks!
PS: "open/read %abc" gives an error "*** Script Error: open does not allow file! for its port argument", even though the help message says the port argument is "port [port! file! url! block!]"
Rebol has ports for that, which are planned for 0.7.0 release in Red. So, current I/O is very basic and buffer-only, and open is a preliminary stub.
I would like to make a call to the low level OS read/seek if that is possible - something new to learn.
You can leverage the Rebol or Red/System FFI as a learning exercise.
Here is how you would do it in Rebol:
>> file: open/direct/binary %file.dat
>> until [none? probe copy/part file 20]
#{732F7072696E74657253657474696E6773312E62}
#{696E504B01022D00140006000800000021006149}
#{0910890100001103000010000000000000000000}
...
#{000000006A290000646F6350726F70732F617070}
#{2E786D6C504B0506000000000D000D0068030000}
#{292C00000000}
none
>> close file
first file or pick file 1 will return the next byte value (integer!)
This even works with text files: open/lines/direct, in that case copy/part file 20 will return 20 lines, or you can use pick file 1 or first file to get the next line.
Soon this will be available on Red too.
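For comparison with the other languages mentioned in the question, here is the same incremental pattern (read a fixed-size chunk, peek at the next byte, skip ahead) as a small Python sketch. The helper names read_chunks and peek_byte are made up for this example, and io.BytesIO stands in for a real file opened with open("file.dat", "rb") so that it runs anywhere.

```python
import io


def read_chunks(stream, size=20):
    """Yield successive chunks of at most `size` bytes until end of file."""
    while True:
        chunk = stream.read(size)
        if not chunk:
            break
        yield chunk


def peek_byte(stream):
    """Return the next byte as an integer without consuming it, or None at EOF."""
    position = stream.tell()
    data = stream.read(1)
    stream.seek(position)
    return data[0] if data else None


# 50 bytes of made-up data in place of a real binary file.
stream = io.BytesIO(bytes(range(50)))
first = peek_byte(stream)       # look at byte 0, position stays at 0
stream.seek(10, io.SEEK_CUR)    # skip 10 bytes forward from the current position
sizes = [len(chunk) for chunk in read_chunks(stream)]
print(first, sizes)
```

Only one chunk is held in memory at a time, which is the property the Rebol /direct port gives you.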

Open a .cfile from rtl_sdr after convert with GNU Radio

I have a binary file (capture.bin) from the rtl_sdr tool. I convert it to a .cfile with this manual http://sdr.osmocom.org/trac/wiki/rtl-sdr#Usingthedata
How can I get at the data in this file? The goal is to get numerical output from the source. Is this possible?
That actually is covered by a GNU Radio FAQ entry.
What is the file format of a file_sink? How can I read files produced by a file sink?
All files are in pure binary format. Just bits. That’s it. A floating point data stream is saved as 32 bits in the file, one after the other. A complex signal has 32 bits for the real part and 32 bits for the imaginary part. Reading back a complex number means reading in 32 bits, saving that to the real part of a complex data structure, and then reading in the next 32 bits as the imaginary part of the data structure. And just keep reading the data.
Take a look at the Octave and Python files in gr-utils for reading in data using Octave and Python’s Scipy module.
The exception to the format is when using the metadata file format. These files are produced by the File Meta Sink block (http://gnuradio.org/doc/doxygen/classgr_1_1blocks_1_1file__meta__sink.html) and read by the File Meta Source block. See the manual page on the metadata file format for more information about how to deal with these files.
A one-line Python command to read the entire file into a numpy array is:
f = scipy.fromfile(open("filename"), dtype=scipy.uint8)
Replace the dtype with scipy.int16, scipy.int32, scipy.float32, scipy.complex64 or whatever type you were using.
Update
scipy.fromfile is deprecated and slated for removal in SciPy 2.0, so use the numpy library instead:
f = numpy.fromfile(open("filename", "rb"), dtype=numpy.uint8)
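The "32 bits real, then 32 bits imaginary" description from the FAQ can also be followed literally with nothing but the standard library. The sketch below assumes the .cfile holds little-endian float32 pairs (the usual case on x86 machines); read_cfile is a made-up helper name, and the temporary file only exists so the example runs on its own.

```python
import os
import struct
import tempfile


def read_cfile(path, count=None):
    """Read interleaved float32 I/Q pairs and return a list of complex numbers."""
    with open(path, "rb") as handle:
        data = handle.read() if count is None else handle.read(8 * count)
    usable = len(data) - len(data) % 8  # ignore a trailing partial sample
    return [complex(i, q) for i, q in struct.iter_unpack("<ff", data[:usable])]


# Write two made-up samples to a temporary file, then read them back.
with tempfile.NamedTemporaryFile(suffix=".cfile", delete=False) as tmp:
    tmp.write(struct.pack("<4f", 1.0, -1.0, 0.5, 0.25))
    demo_path = tmp.name

samples = read_cfile(demo_path)
os.remove(demo_path)
print(samples)
```

For large captures numpy.fromfile with dtype=numpy.complex64 is the faster route; this version is mainly useful for seeing exactly how the bytes map to numbers.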

How to split sql in MAC OSX?

Is there any app (or even a script) for Mac to split SQL files?
I have a large file which I have to upload to a host that doesn't support files over 8 MB.
*I don't have SSH access
You can use this : http://www.ozerov.de/bigdump/
Or
Use this command to split the sql file
split -l 5000 ./path/to/mysqldump.sql ./mysqldump/dbpart-
The split command takes a file and breaks it into multiple files. The -l 5000 part tells it to split the file every five thousand lines. The next bit is the path to your file, and the next part is the path you want to save the output to. Files will be saved as whatever filename you specify (e.g. “dbpart-”) with an alphabetical letter combination appended.
Now you should be able to import your files one at a time through phpMyAdmin without issue.
More info http://www.webmaster-source.com/2011/09/26/how-to-import-a-very-large-sql-dump-with-phpmyadmin/
This tool should do the trick: MySQLDumpSplitter
It's free and open source.
Unlike the accepted answer to this question, this app will always keep extended inserts intact so the precise form of your query doesn't matter; the resulting files will always have valid SQL syntax.
Full disclosure: I am a shareholder of the company that hosts this program.
The UploadDir feature in phpMyAdmin could help you, if you have FTP access and can modify your phpMyAdmin's configuration (or are allowed to install your own instance of phpMyAdmin).
http://docs.phpmyadmin.net/en/latest/config.html?highlight=uploaddir#cfg_UploadDir
You can split into working SQL statements with:
csplit -s -f db-part db.sql "/^# Dump of table/" "{99}"
Which makes up to 99 files named 'db-part[n]' from db.sql
You can use "CREATE TABLE" or "INSERT INTO" instead of "# Dump of ..."
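If csplit is not available, the same "start a new part at every marker line" idea is easy to sketch in Python. The function name split_on_marker and the tiny sample dump are made up for illustration. The default marker is the "# Dump of table" comment used above, which only exists if your dump tool writes it, so change it to match your file.

```python
import re


def split_on_marker(lines, marker=r"^# Dump of table"):
    """Group lines into chunks, starting a new chunk at each marker line."""
    pattern = re.compile(marker)
    chunks, current = [], []
    for line in lines:
        if pattern.match(line) and current:
            chunks.append(current)
            current = []
        current.append(line)
    if current:
        chunks.append(current)
    return chunks


# Made-up dump: header, then two tables.
dump = [
    "-- header\n",
    "# Dump of table users\n",
    "CREATE TABLE users (id INT);\n",
    "INSERT INTO users VALUES (1),(2);\n",
    "# Dump of table orders\n",
    "CREATE TABLE orders (id INT);\n",
]
parts = split_on_marker(dump)
print([len(part) for part in parts])
```

With a real dump, pass the open file object as lines and write each chunk out with writelines to a numbered file. Like csplit, anything before the first marker ends up in the first part.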
Also: Avoid installing any programs or uploading your data into any online service. You don't know what will be done with your information!

Determining if two rar files are part of the same set

Let's say I have two files, (name).n.rar and (name).n+1.rar, which appear to be part of the same set (same size, etc). Is there any easy way to tell if they're actually part of the same set, without first downloading the full set? Currently the only way I can tell is by downloading an instance of every file and then seeing if WinRAR gives me an error when I try to extract them.
(And on a related note, assuming there is such a method, can I do the same without having adjacent parts?)
Ideally there's an existing program that can do this, but I can code my own if necessary.
Further notes: These are two sets of archives of the same file. They appear identical to obvious checks: filenames are subsequent, contents are sane, sizes are identical, same number of parts. I then receive a full set of files. If they're not from the same set, I can't unrar them - though it seems that WinRAR will proceed to 100% before giving me the CRC error (file corrupt.)
New Answer
All tests were made using WinRAR 5.01 32-bit. Since the algorithm should remain the same, the following statements should be valid for any previous version. Feel free to comment if you know that's not true.
Here is a short summary of the chat discussion. I tried to pack a file larger than 1GB several times; then I mixed up the parts and tried to extract the archives: it worked. So the size of the file was not the problem.
I thought about three possible explanations for the problem:
The architecture influences the packing process: different people packed the files on different machines, so mixing their parts results in an error;
Different people packed the files with slightly different part sizes (for example 250 MB and 250000 KB). This would have been noticed in the file properties, though;
Files were corrupted during the download: re-downloading them would confirm this hypothesis.
I was most curious about the first one: could the architecture influence the packing process?
I found out the answer is yes, it can. Here are the steps to repeat the experiment:
Pack your files in an archive, giving a precise part size, in computer A;
Pack the exact same files, giving the exact same part size, on computer B, which has a different architecture (e.g. an Intel processor versus an AMD processor). TODO: check whether this experiment still holds with similar architectures, e.g. Intel i7 and Intel i5;
Transfer one (or more, if you wish, but of course not all of them!) parts from computer B to computer A. Remember to delete those files from computer A before the transfer;
Place all the files in the same directory, check if they all have the same name (e.g. "AAA part1", "AAA part2"...);
Extract them;
Enjoy your CRC Error!.
Tests were made using an Intel i7-3632QM and an AMD FX 6300.
I suspect that the compressed data is the same in both cases, but the CRC code is different.
Old Answer
There is a way indeed. During my Computer Science studies, we had a Computer Forensics class. I learned that every file has a static beginning (a header, we could say) that lets a program recognize its type and how to decode it. To see it, you just have to open the file with a text editor (Notepad++ is the best so far, I guess).
For example, jpeg images begin with ÿØÿá.
I tried to store a video in some split .rar files, and finding out whether they are part of the same archive was simpler than I thought.
Every rar file begins with Rar!. On the second or third line, the name of the file stored in the archive should appear: in my case, myVideo.mp4. If all your archives contain that filename, they're probably part of the same archive.
Things get harder if there are several files in the archive and you don't know their names. In fact, if there is more than one file, the RAR file structure is as follows:
File 1:
Rar!
NUL NUL NUL //Random things here
NUL NUL NUL NUL NUL myVideo.mp4 NUL NUL NUL NUL
//Random things here. If the dimensions of the file exceed the archive,
//the next file will begin with the same name.
//Let's assume that this is happening.
EOF
File 2:
Rar!
NUL NUL NUL //Random things here
NUL NUL myVideo.mp4 NUL NUL NUL
//This time the file is complete. Since there is still space in the archive,
//it will add another file
NUL NUL NUL NUL mySecondVideo.mp4 NUL NUL NUL NUL
EOF
Let's assume that at the end of the second archive, mySecondVideo hasn't been fully compressed yet.
File 3:
Rar!
NUL NUL NUL
NUL NUL NUL NUL mySecondVideo.mp4 NUL
NUL NUL NUL
NUL myTextFile.txt
NUL NUL NUL mySecondTextFile.txt NUL
EOF
If mySecondTextFile.txt isn't yet fully compressed, my fourth file will begin with its name.
I hope it's clear, I tried to keep it as simple as possible. In the case of more files, I would start from the last archive. I'd write down the first filename found on that file and I'd search it in the previous one. If I found that name, I'd repeat the sequence until the first archive.
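The text-editor check described above can be automated with a few lines of Python. This is only a rough sketch: the function names and the fake volume bytes are made up, and the byte search only finds names that are stored as plain ASCII in unencrypted headers. The two signatures are the standard RAR 4.x and RAR 5 markers. As the new answer shows, two different sets made from the same file would both pass this check, so a match is a hint rather than proof.

```python
RAR4_MAGIC = b"Rar!\x1a\x07\x00"      # signature of RAR 1.5 to 4.x archives
RAR5_MAGIC = b"Rar!\x1a\x07\x01\x00"  # signature of RAR 5 archives


def rar_version(data: bytes):
    """Return 5 or 4 if data starts with a RAR signature, otherwise None."""
    if data.startswith(RAR5_MAGIC):
        return 5
    if data.startswith(RAR4_MAGIC):
        return 4
    return None


def mentions_name(data: bytes, name: str) -> bool:
    """True if data looks like a RAR volume and contains the name as raw bytes."""
    return rar_version(data) is not None and name.encode("ascii") in data


# Fake volume contents for illustration; a real check would read the first
# few kilobytes of each downloaded part with open(path, "rb").read(4096).
volume = RAR4_MAGIC + b"\x00\x00\x00myVideo.mp4\x00\x00"
print(rar_version(volume), mentions_name(volume, "myVideo.mp4"))
```

Since only the start of each volume is needed, this works on a partial download of a part.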
I'm not familiar with RAR-format that much, but in case you decide to write your program in Java I can recommend using 7-Zip-JBinding.
http://sevenzipjbind.sourceforge.net/
http://sevenzipjbind.sourceforge.net/basic_snippets.html#open-multipart-rar-archives
You can download the first n+1 parts of the archive and then call the extract() method, ignoring the output data and only paying attention to the
IArchiveExtractCallback.setOperationResult(ExtractOperationResult)
calls (checking that the CRC was ok) and monitoring which files get opened through
IArchiveOpenVolumeCallback.getStream(java.lang.String)
If volume n+2 gets requested, you can conclude that volume n+1 was the right one.
(I'm not 100% sure about this conclusion, but I would give it a try)