I'm working on a tool to generate TSV files for import into a database using bcp.exe, and I'd like to validate my output. I can do this by comparing the file I generate to the files produced by exporting from an existing database using bcp. My problem is that the ordering can sometimes differ between files. I'd like a tool that will tell me just whether there are lines that have no exact match in a pair of files, irregardless of the order of the lines.
'Irregardless' of whether 'irregardless' is a word...
The reliable way to do that comparison is to sort the two files into the same order, and then do a file comparison. Since you mention 'bcp.exe', that sounds more like Windows and probably MS SQL Server than Unix and Sybase.
I'd probably use Cygwin and either diff or comm to compare (and sort to order) the files, or any equivalent Unix workalike toolset (MKS, ...). Other people might recommend other tools. It depends, in part, on how many differences you think you're likely to find normally, and how you will handle them after you find them. Is a GUI output necessary? Also, you face a problem tracking the differences back to specific line numbers in the unsorted data files.
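If a scripted toolset is not an option, a minimal sketch of the same order-independent comparison in C++ (treating each file as a multiset of lines and reporting lines with no exact match in the other file) might look like this:

#include <fstream>
#include <iostream>
#include <map>
#include <string>

// Count how often each distinct line occurs in a file.
static std::map<std::string, long> CountLines(const char* path)
{
    std::map<std::string, long> counts;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line))
        ++counts[line];
    return counts;
}

// Look up a line's count without inserting it.
static long CountOf(const std::map<std::string, long>& counts, const std::string& line)
{
    auto it = counts.find(line);
    return it == counts.end() ? 0 : it->second;
}

int main(int argc, char* argv[])
{
    if (argc != 3)
    {
        std::cerr << "usage: tsvdiff fileA fileB\n";
        return 2;
    }

    const auto a = CountLines(argv[1]);
    const auto b = CountLines(argv[2]);

    // Report lines of A with no (or too few) exact matches in B, and vice versa.
    for (const auto& entry : a)
        if (CountOf(b, entry.first) < entry.second)
            std::cout << "< " << entry.first << '\n';

    for (const auto& entry : b)
        if (CountOf(a, entry.first) < entry.second)
            std::cout << "> " << entry.first << '\n';

    return 0;
}

That gives you the unmatched lines, but like the sort-and-compare approach it cannot tell you the original line numbers in the unsorted files.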
I suppose this is somewhat of an extension of the question asked here.
However, I believe the linked OP's reason for reading a file with SQL Developer and my own are different. I am learning SQL and databases and am attempting to create a model database (as in, I won't be editing the data after insertion, just setting up search queries and whatnot). I want it to be large (over 100,000 entries), so I've written a C++ program that writes randomly generated entries for the database to a .txt file (one entry per line) instead of hard-coding the insertion of each entry. Now what I want to do is read the .txt file in SQL Developer and insert its contents into a table.
My problem lies in the fact that I am not able to create directories. I am using a university oracle connection and I do not have the privileges to actually make a directory so that I can use UTL_FILE on my .txt file as was answered in the linked question. Assuming there is no way for me to gain this permission, is there an alternate way I can accomplish reading a .txt file for data for my table? Is there a better way to go about creating "dummy data" for my database?
What I ended up doing to insert my mock data was change the way the .txt file was formatted. Instead of having my C++ code write the data one entry per row, I made the code write SQL statements to the .txt file, as I think @toddlermenot was suggesting, more or less. After the C++ code had written as many inserts-with-mock-entries as I needed to the text file, I just copy/pasted it into SQL Developer and achieved the desired results.
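A minimal sketch of that kind of generator, assuming a hypothetical people(id, name, age) table; the column names and the mock values are made up for illustration:

#include <cstdlib>
#include <fstream>
#include <string>

int main()
{
    std::ofstream out("inserts.txt");
    const int kEntries = 100000;

    for (int i = 0; i < kEntries; ++i)
    {
        // Hypothetical mock values; replace with your own random generators.
        std::string name = "name_" + std::to_string(i);
        int age = 18 + std::rand() % 60;

        out << "INSERT INTO people (id, name, age) VALUES ("
            << i << ", '" << name << "', " << age << ");\n";
    }
    return 0;
}

The resulting file can then be pasted into SQL Developer and run as a script.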
My problem is a classic case of making the process more complicated than it needed to be.
Also, even though I did not use the method, @Multisync provided an interesting way to go about achieving my goal. I had no idea SQL had tools for me to generate mock data. Thanks for introducing me to that.
I am wondering if it makes sense to use SQL database files, e.g. SQLite (which I believe is single-file based?), as project files in my software. The project files contain basic information as well as multiple records (spectra) with lists of parameters (floating-point values) and lists of measurement data (also floating point).
I currently use my own binary format, which is a pain to maintain. I tried to use XML which works very well, but the file sizes explode (500 kB before, 7.5 MB as XML).
Now I wonder if I can structure SQL databases to contain this kind of information and effectively load and save this data in my .NET software.
I am not very experienced with SQL, so:
Can SQL tables contain sub-tables (like subnodes in XML) or be linked to other tables?
E.g. can I make a table for the record, with sub-tables for the lists of measurement data and parameters?
Will this be more efficient than XML in terms of storage space?
I went with a SQLite database. It can easily be integrated into .NET using the System.Data.SQLite project, which can even be used with AnyCPU builds.
It is working very nicely, both performance- and storage-space-wise.
You still need to take a lot of care with different versions of your databases. If you try to save data with a new schema into a database file created with an older schema, some columns or tables might not exist. You need to implement a migration method to a new database file for this.
The real advantage is that it is an open format. I stand behind the premise that the data a user saves is theirs and does not need to be hidden in an obscure file structure if the latter does not bring any significant advantages to the table.
If the user can no longer use your software, he or she can still access all data, using other tools like the Database Browser for SQLite if need be.
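To make the "linked tables instead of sub-nodes" idea concrete, here is a minimal sketch of such a schema; it uses the SQLite C API rather than System.Data.SQLite, and the table/column names (record, parameter, measurement) are just illustrative guesses at the structure described in the question:

#include <cstdio>
#include <sqlite3.h>

int main()
{
    sqlite3* db = nullptr;
    if (sqlite3_open("project.db", &db) != SQLITE_OK)
        return 1;

    // One row per record; parameters and measurements live in child tables
    // that reference it, which plays the role of XML subnodes.
    const char* schema =
        "PRAGMA foreign_keys = ON;"
        "CREATE TABLE IF NOT EXISTS record ("
        "  id   INTEGER PRIMARY KEY,"
        "  name TEXT);"
        "CREATE TABLE IF NOT EXISTS parameter ("
        "  record_id INTEGER REFERENCES record(id),"
        "  value     REAL);"
        "CREATE TABLE IF NOT EXISTS measurement ("
        "  record_id INTEGER REFERENCES record(id),"
        "  value     REAL);";

    char* err = nullptr;
    if (sqlite3_exec(db, schema, nullptr, nullptr, &err) != SQLITE_OK)
    {
        std::fprintf(stderr, "schema error: %s\n", err);
        sqlite3_free(err);
    }

    sqlite3_close(db);
    return 0;
}

For the versioning concern above, SQLite's PRAGMA user_version is one common place to keep a schema version number that you can check before migrating.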
We're approaching the migration of legacy OpenVMS RMS files into a relational database (both MS SQL 2012 and Oracle 10g are available).
I wonder if there are:
Tools to retrieve schema of indexed files
Tools to parse indexed files
Tools to deal with custom RMS data formats (zoned decimals etc)
as a bundle/API/Library
Perhaps I should change the approach?
There are several tools available, notably through ODBC vendors (I work for one: Attunity).
1 >> Tools to retrieve schema of indexed files
Please clarify: are you looking for just the record/column layout and indexes within the files, or also for relationships between files?
1a) How are the files currently being used? Cobol, Basic, Fortran programs? Datatrieve?
They will be using some data definition method, so you want a tool which can exploit that.
Connx and Attunity Connect can 'import' CDD definitions, BASIC MAP files, and Cobol copybooks. Variants are typically covered as well. I have written many a (Perl/awk) script to convert special definitions to XML.
1b) Analyze/RMS, or a program calling RMS XABs, can get the available index information. Attunity Connect will know how to map those onto the fields from 1a).
1c) There is no formal, stored relationship between (indexed) files on OpenVMS. That's all in the program logic. However, a modestly smart Perl/awk/DCL script can often generate a table of likely foreign/primary keys by looking at matches between field names and datatypes.
How many files / layouts / gigabytes are we talking about?
2 >> Tools to parse indexed files
Please clarify. Once the structure is known (question 1), the parsing is done by reading with that structure, right? You never, ever want to have to understand the indexed file internals; just tell RMS to fetch records.
3 >> Tools to deal with custom RMS data formats (zoned decimals etc) as a bundle/API/Library
Again, please clarify. Once the structure is known, just use the 'right' tool to read with that structure, and surely it will honor the detailed data definitions.
(I know it is quite simple to write one yourself, just thought there would be something in the industry)
Famous last words... 'quite simple'. Entire companies have been built, and thrive, doing just that for general cases. I admit that for specific cases it can be relatively straightforward, but 'the devil is in the details'.
In the Attunity Connect case we have a UDT (user-defined type) to handle the 'odd' cases, often involving DATES. Dates stored as integers, as strings, or as units since xxx are all available out of the box, but for example some use -1 to mean 'some high date', which needs some help to be stored in a DB.
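Purely as an illustration of the kind of fix-up such a UDT does (the exact sentinel rule and target format are site-specific, so this mapping is an assumption):

#include <string>

std::string RmsDateToIso(int yyyymmdd)
{
    // Map the '-1 = open-ended' sentinel to a concrete high date the DB accepts.
    if (yyyymmdd == -1)
        return "9999-12-31";
    std::string s = std::to_string(yyyymmdd);   // e.g. 20121231
    return s.substr(0, 4) + "-" + s.substr(4, 2) + "-" + s.substr(6, 2);
}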
All the databases have some bulk load tool (BCP, SQL*Loader).
As long as you can deliver data conforming to what those expect (tabular, comma-separated, quoted-or-not, escaped-or-not) you should be in good shape.
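As a rough sketch of "deliver what the loaders expect", assuming a made-up Record struct, this is about all the formatting involved (wrap each text field in quotes and double any embedded quotes):

#include <fstream>
#include <string>

struct Record            // hypothetical extracted record
{
    std::string name;
    double      amount;
};

// Quote a field and double any embedded quotes, per the usual CSV rules.
static std::string Quoted(const std::string& field)
{
    std::string out = "\"";
    for (char c : field)
    {
        if (c == '"')
            out += '"';              // write the quote twice
        out += c;
    }
    out += '"';
    return out;
}

int main()
{
    std::ofstream out("load_me.csv");
    Record r = { "O'Brien, \"Bob\"", 42.0 };
    out << Quoted(r.name) << ',' << r.amount << '\n';
    return 0;
}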
The EGH tool Vselect may be a handy, and high-performance, way to bulk-read indexed files, filter and format some of the data, and spit out sequential files for the DB loaders. It can read RMS indexed files faster than RMS can! (It has its own metadata language, though.)
Attunity offers full access and replication services.
They include CDC (change data capture) to not only load the data, but also keep it up to date in near real time. That's useful for 'evolution' versus 'revolution'.
Check out Attunity 'Replicate'. Once you have a data dictionary, just point to the tables desired (include/exclude filters), point to a target DB, and click to replicate. Of course there are options for (global or per-table) transformations (like combining AREA-CODE+EXCHANGE+NUMBER into a single phone number, or adding a modified-date column).
Will this be a single big switch conversion, or is there desire to migrate the data and keep the old systems alive for days, months, years perhaps, all along keeping the data in close sync?
Hope this helps some,
Hein van den Heuvel.
OP: Perhaps I should change the approach? Probably.
You might consider finding data migration vendors, some of which likely have off-the-shelf solutions, if not as a COTS tool then more likely packaged as a service (I don't think this is a big market).
What this won't help you with is what I think of as the much bigger problem with the application code: who is going to change all the code that makes RMS calls into the corresponding code that makes relational DB calls? How will that entity ("Joe Programmer", or some tool) know where the data migrated to, so that he can write the correct call? What are you going to do about the fact that the data representation is likely to change?
Ideally you'd like an automated migration tool that moves the data itself (and therefore knows the data layouts and representation changes) and makes the corresponding code changes. You can look for these kinds of vendors, too.
This question is in continuation to my previous question related to File I/O.
I am using RFile to open a file and read/write data to it. Now, my requirement is such that I would have to modify certain fields within the file. I separate each field within a record with a colon and each record with a newline. Sample is below:
abc@def.com:Albert:1:2
def@ghi.com:Alice:3:1
Suppose I want to replace the '3' in the second record with '2'. I am finding it difficult to overwrite a specific field in the file using RFile, because RFile does not provide its users with such a facility.
Because of this, to modify a record I have to delete the contents of the file and re-serialize (that is, loop through the in-memory representation of the records and write them to the file). Doing this every time a record's value changes is quite expensive, as there are hundreds of records and the changes can be quite frequent.
I searched around for alternatives and found CPermanentFileStore. But I feel the API is hard to use as I am not able to find any source on the Internet that demonstrates its use.
Is there a way around this? Please help.
Depending on which version(s) of Symbian OS you are targeting, you could store the information in a relational database. Since v9.4, Symbian OS includes an SQL implementation (based on the open-source SQLite engine).
Using normal files for this type of record takes a lot of effort no matter the operating system. To be able to do this efficiently you need to reserve space in the file for expansion of each record; otherwise you need to rewrite the entire file if a record value changes from, say, 9 to 10. Also, storing a lookup table in the file will make it possible to jump directly to a record using RFile::Seek.
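A minimal sketch of that reserve-space idea in standard C++, assuming an invented fixed record layout; the same seek-then-overwrite pattern applies with RFile::Seek and RFile::Write on Symbian:

#include <fstream>
#include <iomanip>

const int KRecordSize   = 64;  // fixed, padded record length (assumption)
const int KField3Offset = 48;  // byte offset of the third field (assumption)
const int KField3Width  = 8;   // field padded to 8 characters

// Overwrite one field of one record in place, without rewriting the file.
void SetField3(std::fstream& aFile, int aRecordIndex, int aNewValue)
{
    aFile.seekp(aRecordIndex * KRecordSize + KField3Offset);
    aFile << std::left << std::setw(KField3Width) << aNewValue;
}

int main()
{
    std::fstream file("records.dat",
                      std::ios::in | std::ios::out | std::ios::binary);
    SetField3(file, 1, 2);   // change field 3 of the second record to 2
    return 0;
}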
CPermanentFileStore simplifies the actual reading and writing of the file, but basically does what you would otherwise have to do yourself. A database may be a better choice in this instance. If you don't want to use a database, I think using stores would be a better solution.
My company has a legacy micro-simulation program that simulates a population and changes to that population over a period of years.
For each year, the program produces a binary file with a record for each individual that holds their characteristics (e.g., age, marital status, income ... about 20 fields).
We currently have several utility programs that read these files and produce summary reports. Problem is that each time somebody wants a new report, a new utility program has to be written.
Changing the program so that the records are stored in a database instead of binary files is out of the question (I have asked ... several times). I have written a few programs that import the binary files into a database and then run queries on the tables I have created. The problem here is that it always takes longer to import the data and run the query than it does to run a utility program written in C++ that just reads the records one by one and accumulates the desired data. Often the binary files contain over 30 million records, and the import step alone takes forever.
So here is my question. Is there anything out there that would allow me to specify the structure of my binary file and then run SQL queries on the file? I think you can use ODBC to run queries on plain text files, but I've never seen anything like that for binary files.
If there isn't anything available, what are the steps I would need to take to build something that could run a query directly on my file? I understand this would probably be way beyond my ability, but it can't hurt to know where I would need to start.
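For reference, the one-pass pattern those C++ utilities follow is roughly this; the Person layout and field names here are invented, since the real records have about 20 fields:

#include <cstdint>
#include <cstdio>

#pragma pack(push, 1)
struct Person                    // must match the simulator's record layout exactly
{
    std::uint8_t age;
    std::uint8_t maritalStatus;
    double       income;
    // ... about 20 fields in the real file
};
#pragma pack(pop)

int main()
{
    std::FILE* f = std::fopen("population_2020.bin", "rb");
    if (!f)
        return 1;

    long   count = 0;
    double totalIncome = 0.0;

    Person p;
    while (std::fread(&p, sizeof p, 1, f) == 1)
    {
        if (p.age >= 65)         // example "query": retirees only
        {
            ++count;
            totalIncome += p.income;
        }
    }
    std::fclose(f);

    std::printf("%ld records, average income %.2f\n",
                count, count ? totalIncome / count : 0.0);
    return 0;
}

Anything that let you run SQL directly against the file would effectively need that same struct definition as its "schema".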
OpenAccess is a toolkit that you can use to build ODBC or JDBC drivers for arbitrary systems. Disclaimer: I've not used it, and another division of my company sells it.
It's possible using SSIS: Loading Binary Files into SQL Server Using SSIS
This might also be of interest: Reading and Writing Files in SQL Server using T-SQL
I do not have much experience with LINQ, but couldn't you use InteropServices to parse the binary files into C# objects and then query them with LINQ?