Storing spreadsheets in a database - sql

I am attempting to create a relational database to hold data from experiments that produce CSV files of results. This would let me look up an experiment based on date, author, experimental values, etc.
However, I am not sure how to integrate the experiments, each of which generates a separate CSV file, into the relational database.
Would it be possible to store a CSV file in a column of the database, or would it be better to hold just the name of the file?

This is a bit long for a comment.
In general, databases have the ability to store large objects (usually BLOBs -- binary large objects).
Whether this meets your needs depends on several factors. I would say that the first is accessibility of the data. Storing the files in the database has some advantages:
Anyone with access to the database has access to the data.
To repeat: users do not need separate access to a file system.
The same API can be used for the metadata and for the underlying data.
You have more control over the contents -- the underlying file cannot be deleted without deleting the row in the database, for instance.
The data is automatically included in backups and restores.
Of course, there are downsides as well, some of which are related to the above:
With a separate file, it is simpler to update the file, if that is necessary.
Storing the data in a database imposes overheads (although you might be able to get around this by compressing the data).
If the application is already file-based and you are adding a database component, then changing the application to support the database could be cumbersome.
I'm sure these lists are not complete. The point is that there is no "right" answer. It depends on your needs.
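To make the two options concrete, here is a minimal sketch of both schemas (SQLite dialect; all table and column names are hypothetical):

    -- Option 1: store the CSV contents directly in a BLOB column.
    CREATE TABLE experiments (
        id       INTEGER PRIMARY KEY,
        author   TEXT NOT NULL,
        run_date TEXT NOT NULL,   -- ISO-8601, e.g. '2013-05-17'
        csv_data BLOB NOT NULL    -- the raw bytes of the CSV file
    );

    -- Option 2: store only a reference to the file on disk.
    CREATE TABLE experiments_by_path (
        id       INTEGER PRIMARY KEY,
        author   TEXT NOT NULL,
        run_date TEXT NOT NULL,
        csv_path TEXT NOT NULL    -- e.g. '/data/run42.csv'; the file lives outside the DB
    );

With Option 1 the searchable metadata (author, run_date) and the data travel together; with Option 2 the database only guarantees the metadata, and the file can disappear independently.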

Related

Using SQL database files as project files

I am wondering if it makes sense to use multiple SQL database files, like SQLite (which I believe is single-file based?), as project files in my software. The project files contain basic information as well as multiple records (spectra) with lists of parameters (floating-point values) and lists of measurement data (also floating point).
I currently use my own binary format, which is a pain to maintain. I tried to use XML which works very well, but the file sizes explode (500 kB before, 7.5 MB as XML).
Now I wonder if I can structure SQL databases to contain this kind of information and effectively load and save this data in my .NET software.
I am not very experienced with SQL, so:
Can SQL tables contain sub-tables (like subnodes in XML) or be linked to other tables?
E.g., can I make a table for the record, with sub-tables for the lists of measurement data and parameters?
Will this be more efficient than XML in terms of storage space?
I went with a SQLite database. It can easily be used from .NET via the System.Data.SQLite project, which even works with AnyCPU builds.
It is working very nicely, in terms of both performance and storage space.
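To answer the sub-table question directly: SQL has no literal sub-tables, but foreign keys express the same nesting. A minimal sketch of what such a schema might look like (the names are hypothetical, not taken from the original project):

    CREATE TABLE record (
        id   INTEGER PRIMARY KEY,
        name TEXT
    );

    CREATE TABLE parameter (
        id        INTEGER PRIMARY KEY,
        record_id INTEGER NOT NULL REFERENCES record(id),  -- the 'sub-table' link
        value     REAL
    );

    CREATE TABLE measurement (
        id        INTEGER PRIMARY KEY,
        record_id INTEGER NOT NULL REFERENCES record(id),
        value     REAL
    );

Each spectrum becomes one row in record, and its lists of parameters and measurement values become child rows keyed by record_id.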
You still need to take a lot of care with different versions of your databases. If you try to save a new schema into a database file that uses an older schema, some columns or tables might not exist. You need to implement a method for migrating to a new database file to handle this.
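In SQLite, one lightweight way to track the schema version is the user_version pragma; a sketch of the idea (the version numbers are of course up to you):

    PRAGMA user_version;       -- read the version stored in the file; 0 if never set
    -- if the file reports version 1 and the application expects version 2:
    ALTER TABLE record ADD COLUMN comment TEXT;
    PRAGMA user_version = 2;   -- record that the migration ran

The application checks the pragma on open and runs the appropriate migration steps before touching the data.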
The real advantage is that it is an open format. I stand behind the premise that the data a user saves is theirs, and it does not need to be hidden in an obscure file structure if the latter does not bring any significant advantages to the table.
If the user can no longer use your software, he or she can still access all the data using other tools, like the Database Browser for SQLite, if need be.

Migrating RMS to RDB

We're approaching the migration of legacy OpenVMS RMS files into a relational database (both MS SQL 2012 and Oracle 10g are available).
I wonder if there are:
Tools to retrieve schema of indexed files
Tools to parse indexed files
Tools to deal with custom RMS data formats (zoned decimals etc)
as a bundle/API/Library
Perhaps I should change the approach?
There are several tools available, notably through ODBC vendors (I work for one: Attunity).
1 >> Tools to retrieve schema of indexed files
Please clarify: are you looking for just the record/column layout and indexes within the files, or also the relationships between files?
1a) How are the files currently being used? Cobol, Basic, Fortran programs? Datatrieve?
They will be using some data definition method, so you want a tool which can exploit that.
Connx and Attunity Connect can 'import' CDD definitions, BASIC MAP files, and Cobol copybooks. Variants are typically covered as well. I have written many a (Perl/awk) script to convert a special definition to XML.
1b) Analyze/RMS, or a program calling RMS XABs, can get the available index information. Attunity Connect will know how to map those onto the fields from 1a).
1c) There is no formal, stored relationship between (indexed) files on OpenVMS. That's all in the program logic. However, a modestly smart Perl/awk/DCL script can often generate a table of likely foreign/primary keys by looking at field-name and datatype matches.
How many files / layouts / gigabytes are we talking about?
2 >> Tools to parse indexed files
Please clarify. Once the structure is known (question 1), the parsing is done by reading with that structure, right? You never, ever want to deal with the indexed-file internals. Just tell RMS to fetch records.
3 >> Tools to deal with custom RMS data formats (zoned decimals etc) as a bundle/API/Library
Again, please clarify. Once the structure is known, just use the 'right' tool to read with that structure, and surely it will honor the detailed data definitions.
(I know it is quite simple to write one yourself, just thought there would be something in the industry)
Famous last words... 'quite simple'. Entire companies have been built, and thrive, doing just that for general cases. I admit that for specific cases it can be relatively straightforward, but 'the devil is in the details'.
In the Attunity Connect case we have a UDT (User-Defined data Type) to handle the 'odd' cases, often involving dates. Dates in integers, in strings, and as units since xxx are all available out of the box, but, for example, some use -1 to mean 'some high date', which needs some help to be stored in a DB.
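As a sketch of the kind of transformation such a UDT performs, here is an Oracle-flavored expression; the staging table, the raw_date column, and the 'days since 1970-01-01' encoding are all assumptions for illustration:

    SELECT CASE
             WHEN raw_date = -1
               THEN TO_DATE('9999-12-31', 'YYYY-MM-DD')       -- sentinel for 'some high date'
             ELSE TO_DATE('1970-01-01', 'YYYY-MM-DD') + raw_date
           END AS load_date
    FROM   staging_rms;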
All the databases have some bulk-load tool (BCP, SQL*Loader).
As long as you can deliver data conforming to what those expect (tabular, comma-separated, quoted or not, escapes or not) you should be in good shape.
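For example, on the MS SQL 2012 side a load of such a delimited file might look like this (the path and table name are hypothetical):

    BULK INSERT dbo.customer
    FROM 'C:\migration\customer.csv'
    WITH (FIELDTERMINATOR = ',',
          ROWTERMINATOR   = '\n',
          FIRSTROW        = 2);   -- skip the header row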
The EGH tool Vselect may be a handy, and high performance, way to bulk read indexed files, filter and format some and spit out sequential files for the DB loaders. It can read RMS indexed file faster than RMS can! (It has its own metadata language though!)
Attunity offers full access and replication services.
They include CDC (change data capture), not only to load the data but also to keep it up to date in near real time. That's useful for 'evolution' versus 'revolution'.
Check out Attunity 'Replicate'. Once you have a data dictionary, just point to the tables desired (include/exclude filters), point to a target DB, and click to replicate. Of course there are options for (global or per-table) transformations (like an AREA-CODE+EXCHANGE+NUMBER to a single phone number, or adding a modified-date column).
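A sketch of the flavor of such a per-table transformation, expressed in plain SQL (the column names are hypothetical):

    SELECT area_code || exchange || line_number AS phone_number,
           CURRENT_TIMESTAMP                    AS modified_date
    FROM   customer_phone;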
Will this be a single big switch conversion, or is there desire to migrate the data and keep the old systems alive for days, months, years perhaps, all along keeping the data in close sync?
Hope this helps some,
Hein van den Heuvel.
OP: Perhaps I should change the approach? Probably.
You might consider finding data-migration vendors, some of which likely have off-the-shelf solutions, if not as a COTS tool then more likely packaged as a service (I don't think this is a big market).
What this won't help you with is what I think of as the much bigger problem with the application code: who is going to change all the code that makes RMS calls into the corresponding code that makes relational DB calls? How will the entity ("Joe Programmer", or some tool) know where the data migrated to, so that he can write the correct call? What are you going to do about the fact that the data representation is likely to change?
Ideally you'd like an automated migration tool that will move the data itself (and therefore knows the data layouts and representation changes) and will make the corresponding code changes. You can look for these kinds of vendors, too.

Merging CoreData SQL files

I've been wrestling with this problem for quite some time now and have yet to figure out the most efficient way of dealing with it. Here are the details:
I have an app that uses Core Data to store the content the app shows. The app downloads the content in the form of a SQLite database and attempts to merge it with its local version. This is necessary because the downloaded content often needs to be combined with content that the user downloaded previously.
To make things more complicated, on my end I also need a way of combining these files so they're clean to download (in other words, no extraneous relations or isolated objects in the Core Data file). I have already built the editor for this, but again I run into the problem of merging these SQLite files.
I'd like to find a better way of combining these SQLite databases, if one even exists. I've seen that you can add many different store files using a persistent store coordinator, but coordinating all of the correct stores into a single download package becomes more difficult and dangerous.
The question is: what is the best way to use multiple SQL stores and either make them into one convenient .sqlite file or have them operate seamlessly?
First, don't think of this as an SQLite problem. It is a Core Data problem. If you use Core Data and SQLite in the same paragraph, you have already lost. CD does (sometimes) use SQLite as its backing store, but that knowledge doesn't help you solve this problem.
When I have had to solve the problem of combining static and user-generated data, I have usually used two different data models, with a unique ID on the static side under my control. Any combined references between static and live data I handle programmatically, which has worked fine because the user data is tiny compared to my static data.
You might also investigate fetched properties, which allow you to obtain values from a different persistent store.
I think merging your original static data with updated static data is the wrong way to go. That sort of operation will take a long time on the device.
Can you use three different persistent stores? The first would be bundled with your application, available immediately. The second would be updated data, downloaded from a server, and would be a complete replacement for the first. Finally you'd have a user-data store, connected to the master data either with fetched properties or your roll-your-own unique IDs.
It's also possible that Core Data is the wrong hammer for this particular nail. If you have a lot of SQLite expertise, and if you really are more comfortable working with SQLite than with Core Data, just skip Core Data. Do the entire thing in direct SQLite.
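If you do go the direct-SQLite route, merging two store files is plain SQL via ATTACH. A sketch, assuming a hypothetical items table with unique ids; note that this deliberately ignores Core Data's own bookkeeping tables, which is exactly why mixing the two approaches is discouraged above:

    ATTACH DATABASE 'downloaded.sqlite' AS incoming;
    INSERT INTO main.items
        SELECT * FROM incoming.items
        WHERE  id NOT IN (SELECT id FROM main.items);  -- skip rows already present
    DETACH DATABASE incoming;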

Should XML or sqlite3 be used?

I just started iOS development and am currently developing an application that just reads data from a server and displays it on the screen. What I am not sure of is whether to use XML or sqlite3 to store the data. Which method is preferable, and why? Thanks in advance.
It is important to remember they are two different things, suited to different tasks. Choose the one that fits the problem. (In this case I would likely use XML or just plain text, because it sounds like a simple download cache. Either the raw response could be kept, or perhaps the data could first be transformed into objects and then automatically serialized into XML or whatnot. In any case, keep it simple.)
XML is (at the very core) a markup format. XML documents are a (hopefully well-defined) structure. There is a large set of tooling that supports manipulation and querying within a hierarchical "document" model. I use XML a good bit for a serialization format and also use it for local caching if appropriate (e.g. there are no non-hierarchical relationships). XML is often loaded entirely into memory (e.g. a DOM) for manipulation.
SQLite is a relational database that is designed around tables and relationships between sets of tables. Being able to run (complex) queries is where a relational database really shines. SQLite is also very fast and can process large data sets that can't all fit in memory. Columns in SQLite can also contain text (read: XML), so the approaches are not mutually exclusive.
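As an illustration of where the relational model pays off, a query like the following is a few lines of SQL but a chore against an XML cache (the articles table and its columns are hypothetical):

    SELECT   author, COUNT(*) AS article_count
    FROM     articles
    WHERE    published >= '2012-01-01'
    GROUP BY author
    ORDER BY article_count DESC
    LIMIT    10;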
Happy coding.
It probably all depends on how the data is processed after it is stored. If the data must be sorted, queried with specific selections, etc., then SQLite is the better solution.
A second, less important concern is how much data will be stored; if it's just one "table" with 10 rows, then SQLite is probably too much for it.
If you want to read data from the server and display it on screen, and don't need to save it locally, then use XML.
If you want to store it locally and don't want to fetch it from the server, then use XML files or a SQLite database in your project.
If you want to fetch it from the server and also store it locally, then first use XML to fetch the data and then use SQLite to store it locally.
And look at pst's answer for the difference between the two.

Transferring data from a database of unknown structure to a database of known structure

I have a few challenges I need help on. I need to pull data in to my SQL database from arbitrary sources.
The details are: I know the exact structure of my database and the structure will not change. When I do take in new data, it will occur only one time, at the time I set up an instance of my database. I will make many instances of my database and each time it will have to pull data from a different source, and those sources will be structured in different ways.
The data will most likely contain thousands of rows of records. The data source will most likely be held in Excel or Access, more rarely in Word, and even more rarely in a SQL database.
I can assume that most of the core data will be the same, just put in different locations. It will follow a general grouping regardless of how it is held.
Essentially, I'm transferring data from legacy systems to a SQL system and this must be done for many groups and they need their own private instance of the database.
Any thoughts on how I would do this? How hard would it be to write a program that would do most of this for me?
This is definitely a real-world question. Is it possible to write a program that will do most of this? Not most of this, I think, but perhaps some of it.
For each table in your target system, create a view that displays the source data you expect to be able to insert. Choose column names that make it easy to tell what has to be done; most likely you'll choose column names that match the target columns in your INSERT statement. Save your INSERT statements as stored procedures.
Now, when you are given a new source of data in a new format, you will still have to recreate your views, but once the views are displaying the right data under your chosen column names, you can run your stored procedures without change.
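A sketch of the pattern on SQL Server, with hypothetical source and target names; only the view is rewritten when a new source arrives:

    -- Adapt whatever the new source looks like to the fixed target shape.
    CREATE VIEW v_person_load AS
    SELECT src.full_name AS name,
           src.birth_dt  AS birth_date
    FROM   source_people AS src;
    GO

    -- The loading procedure never changes between sources.
    CREATE PROCEDURE load_person AS
    BEGIN
        INSERT INTO person (name, birth_date)
        SELECT name, birth_date FROM v_person_load;
    END;
    GO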
I have a similar type of project where data is being retrieved from Access, .ini files, file modification dates, and MySQL. I scrape this data every morning and basically append it to a fixed SQL Server schema.
I create a DataTable and, as I iterate over a set of directories, insert the data into new rows. Once the DataTable is complete, I perform a bulk copy to append it to the database.
I hope that helps you out a bit. I know my project doesn't cover all the aspects of your question, but I also don't have a DBA to provide views, stored procedures, etc., nor the additional time to devote to such things. Not the most favorable of conditions, but that's the way it is.
HTH...
The best way of solving this problem is with an ETL (Extract-Transform-Load) solution. A good choice is SSIS, which is part of Microsoft's BI suite.