Using a SQL database file as project files - sql

I am wondering if it makes sense to use multiple SQL database files like sqllite (which I believe is single file based?) as project files in my software. The project files contain basic information as well as multiple records (spectra) with lists of parameters (floating point values) and lists of measurement data (also floating point).
I currently use my own binary format, which is a pain to maintain. I tried to use XML which works very well, but the file sizes explode (500 kB before, 7.5 MB as XML).
Now I wonder if I can structure SQL databases to contain this kind of information and effectively load and save this data in my .NET software.
(I am not very experienced in SQL) so:
Can SQL tables contain sub-tables (like subnodes in XML) or be linked to other tables?
E.g. Can I make a table for the record, and this table has subtables for the lists of measurement data and parameters?
Will this be more efficient than XML in terms of storage space?

I went with a SQLite database. It can be easily implemented into .NET using the System.Data.SQLite Project, that can even be used with AnyCPU Builds.
It is working very nicely, both performance and storage space wise.
You still need to take a lot of care with different versions of your databases. If you try and save a new scheme into a database using an older scheme, some columns or tables might not exist. You need to implement a migration method to a new database file for this.
The real advantage is, that it is an open format, and I stand behind the premise, that the stuff a user saves is his, and does not need to be hidden in an obscure, file structure, if the latter does not bring any significant advantages to the table.
If the user can no longer use your software, he or she can still access all data, using other tools like the Database Browser for SQLite if need be.

Related

Storing spreadsheets in a database

I am attempting to create a relational database to hold data from experiments which return csv files filled with data. This would allow me to search up an experiment I want based on date, author, experimental values etc.
However, I am not sure how to implement the relational database with the experiments which each generate seperate csv files.
Would it be possible to have a csv file as a column of the database or would it just be better to hold the name of the file?
This is a bit long for a comment.
In general, databases have the ability to store large objects (usually, "BLOB"s -- binary large objects).
Whether this meets your needs depends on several factors. I would say that the first is accessibility to the data. Storing the strings in the database has some advantages:
Anyone with access to the database has access to the data.
To repeat: Users do not need separate user access to a file system.
The same API can be used for the metadata and for the underlying data.
You have more controls over the contents -- the underlying file cannot be deleted without deleting the row in the database, for instance.
The data is automatically included in backups and restores.
Of course, there are downsides as well, some of which are related to the above:
With a separate file, it is simpler to update the file, if that is necessary.
Storing the data in a database imposes overheads (although you might be able to get around this by compressing the data).
If the application is already file-based and you are added a database component, then changing the application to support the database could be cumbersome.
I'm sure these lists are not complete. The point is that there is no "right" answer. It depends on your needs.

Migrating RMS to RDB

We're approaching the migration of legacy OpenVMS RMS files into relational database (both MS SQL 2012 and Oracle 10g are available).
I wonder if there are:
Tools to retrieve schema of indexed files
Tools to parse indexed files
Tools to deal with custom RMS data formats (zoned decimals etc)
as a bundle/API/Library
Perhaps I should change the approach?
There are several tools available, notably through ODBC vendors (I work for one: Attunity).
1 >> Tools to retrieve schema of indexed files
Please clarify. Looking for just record/column layout and indexes within the files or also relationships between files.
1a) How are the files currently being used? Cobol, Basic, Fortran programs? Datatrieve?
They will be using some data definition method, so you want a tool which can exploit that.
Connx, and Attunity Connect can 'import' CDD definitions, BASIC - MAP files, Cobol Copybooks. Variants are typically covered as well. I have written many a (perl/awk) script to convert special definition to XML.
1b ) Analyze/RMS, or a program with calling RMS XAB's can get available index information. Atunity connect will know how to map those onto the fields from 1a)
1c ) There is no formal, stored, relationship between (indexed) files on OpenVMS. That's all in the program logic. However, some modestly smart Perl/Awk/DCL script can often generate a tablem of likely foreign/primary keys by looking at filed names and datatypes matches.
How many files / layouts / gigabytes are we talking about?
2 >> Tools to parse indexed files
Please clarify? Once the structure is known (question 1), the parsing is done by reading using that structure right? You never ever want to understand the indexed file internals. Just tell RMS to fetch records.
3 >> Tools to deal with custom RMS data formats (zoned decimals etc) as a bundle/API/Library
Again, please clarify. Once the structure is known just use the 'right' tool to read using that structure and surely it will honor the detailed data definitions.
(I know it is quite simple to write one yourself, just thought there would be something in the industry)
Famous last words... 'quite simple'. Entire companies have been build and thrive doing just that for general cases. I admit that for specific cases it can be relatively straightforward, but 'the devil is in the details'.
In the Attunity Connect case we have a UDT (User Defined data Type) to handle the 'odd' cases, often involving DATES. Dates in integers, in strings, as units since xxx are all available out of the box, but for example some have -1 meaning 'some high date' which needs some help to be stored in a DB.
All the databases have some bulk load tool (BCP, SQL$LOADER).
As long as you can deliver data conforming to what those expect (tabular, comma-seperated, quoted-or-not, escapes-or-not) you should be in good shape.
The EGH tool Vselect may be a handy, and high performance, way to bulk read indexed files, filter and format some and spit out sequential files for the DB loaders. It can read RMS indexed file faster than RMS can! (It has its own metadata language though!)
Attunity offers full access and replication services.
They include a CDC (change data capture) to not a only load the data, but to also keep it up to date in near-real-time. That's useful for 'evolution' versus 'revolution'.
Check out Attunity 'Replicate'. Once you have a data dictionary, just point to the tables desired (include, exlude filters), point to a target DB and click to replicate. Of course there are options for (global or per-table) transformations (like an AREA-CODE+EXHANGE+NUMBER to single phone number, or adding a modified date columns ).
Will this be a single big switch conversion, or is there desire to migrate the data and keep the old systems alive for days, months, years perhaps, all along keeping the data in close sync?
Hope this helps some,
Hein van den Heuvel.
OP: Perhaps I should change the approach? Probably.
You might consider finding data migration vendors, some which likely have off-the-shelf solutions, if not as a COTS tool, more likely packaged as a service (I don't think this is a big market).
What this won't help you with is what I think of as much bigger problem with the application code: who is going to change all the code that is making RMS calls, in the corresponding code that makes relational DB calls? How will the entity ("Joe Programmer", or some tool), know where the data migrated to, so that he can write the correct call? What are you doing to do about the fact that the data representation is like to change?
Ideally you'd like an automated migration tool, that will move the data itself (therefore knows that datalayouts and representation changes), and will make the code changes that correspond. You can look for these kind of vendors, too.

should xml or sqlite3 be used?

I just started iOS development am currently developing an application that just reads data from a server and displays it onto the screen. What I am not sure of is whether to use XML or sqlite3 to store the data. Which method should be more preferred and why? thanks in advance.
It is important to remember they are two different things, suited to different tasks. Choose the one that fits the problem. (In this case I would likely use XML or "just plain text" because it sounds like just a simple download-cache. Either the raw response could be kept or, perhaps the data already transformed into objects and then automatically serialized into XML or whatnot. In any case, keep it simple.)
XML is (at the very core) a markup format. XML documents are a (hopefully well-defined) structure. There is a large set of tooling that supports manipulation and querying within a hierarchical "document" model. I use XML a good bit for a serialization format and also use it for local caching if appropriate (e.g. there are no non-hierarchical relationships). XML is often loaded entirely into memory (e.g. a DOM) for manipulation.
SQLite is a relational database that is designed around tables and relationships between sets of tables. Being able to run (complex) queries is where a relational database really shines. SQLite is also very fast and can process large data-sets which can't all fit in memory. Columns in SQLite can also contain text (read: XML) so the approaches are not orthogonal.
Happy coding.
Probably all depends on how data is processed after it was stored. If data must be sorted, uses specific selection etc. then, sqlite is better solution.
Second, not so important, concern is how much data will be stored, if it's just one "table" with 10 rows then sqlite is probably too much for it.
If you want to read data from server and want to display on screen and don't need to save it locally then use XML.
If you want to store it locally and don't want to fetch from server then use XML files or sqlite database in your project.
If you want to fetch from server and also to store it locally then first use XML to fetch data and then use sqlite to store it locally.
and look at #pst answer for what is the difference between them.

Is there an editor for inserting/editing rows into a Core Data DB?

I've created a Core Data schema in xcode (3.2.5 if it matters) so I have the .xcdatamodel file with the proper entities and relations.
Now - How can I insert data, edit data and/or delete data from it, NOT from within the code ?
Like what phpMyAdmin is for MySql.
Thanks.
Core Data is meant to be used programmatically. Once you run the app once, it should create a file somewhere on disk (exactly where is probably specified in the AppDelegate class). It is likely that this file will be a SQLite database, but it doesn't have to be (the point of Core Data is to abstract your data away from the file format used to store it). It could also be an XML file or a binary file.
If it's a SQLite file, then you can open it in your favorite SQLite editor.
HOWEVER
The schema used in the SQLite format is not documented. If you go mucking around in it, you might get stuff to work, but it's also very likely that you could irreparably screw it up. (If it's an XML file or a binary file, you're probably totally out of luck)
In the end, Core Data is supposed to be used programmatically. To use it in a different way (such as what you're asking for) would be to use it in a way for which it was not intended and therefore not designed.
I don't know if you already solved your problem, but there's this SQLite Manager plug-in for firefox: http://code.google.com/p/sqlite-manager/
I haven't tried importing data or using the INSERT command to insert individual rows, but you could give it a try. It's free and works very well for me as is.
There's quite a few database management tools available for sqlite that allow you to do this. I've tried a few but to be honest none of them have impressed me that much as yet.
Would be great to have something like Toad available.
Anyway, find wherever your database file is, then drop it onto whichever application.
You can then add, delete, and edit rows and columns.
Of course, you will need to maintain any foreign keys and such like.
I find the generated Core Data models to be pretty easy to understand.
Example tools are SQLite Database Browser (free), SQLiteManager (not free), and Base. A quick Google search should reveal those and a few more.
I normally use SQLite Database Browser although it does crash occasionally.
See Christian Kienle's Core data editor. It's not free, but is designed to work directly with core data models and stores via Apple's API, supports binary data, builds relationships and even triggers validation, etc. I've found it's worth the $20.

Getting a significant amount of data into a SQL Server (Express) database at time of deployment

For most database-backed projects I've worked on, there is a need to get "startup" or test data into the database before deploying the project. Examples of startup data: a table that lists all the countries in the world or a table that lists a bunch of colors that will be used to populate a color palette.
I've been using a system where I store all my startup data in an Excel spreadsheet (with one table per worksheet), then I have a utility script in SQL that (1) creates the database, (2) creates the schemas, (3) creates the tables (including primary and foreign keys), (4) connects to the spreadsheet as a linked server, and (5) inserts all the data into the tables.
I mostly like this system. I find it very easy to lay out columns in Excel, verify foreign key relationships using simple lookup functions, perform concatenation operations, copy in data from web tables or other spreadsheets, etc. One major disadvantage of this system is the need to sync up the columns in my worksheets any time I change a table definition.
I've been going through some tutorials to learn new .NET technologies or design patterns, and I've noticed that these typically involve using Visual Studio to create the database and add tables (rather than scripts), and the data is typically entered using the built-in designer. This has me wondering if maybe the way I'm doing it is not the most efficient or maintainable.
Questions
In general, do you find it preferable to build your whole database via scripts or a GUI designer, such as SSMSE or Visual Studio?
What method do you recommend for populating your database with startup or test data and why?
Clarification
Judging by the answers so far, I think I should clarify something. Assume that I have a significant amount of data (hundreds or thousands of rows) that needs to find its way into the database. This data could be sourced from various places, such as text files, spreadsheets, web tables, etc. I've received several suggestions to script this process using INSERT statements, but is this really viable when you're talking about a lot of data?
Which leads me to...
New questions
How would you write a SQL script to take the country data on this page and insert it into the database?
With Excel, I could just copy/paste the table into a worksheet and run my utility script, and I'd basically be done.
What if you later realized you needed a new column, CapitalCity?
With Excel, I could take that information from this page, paste it into Excel, and with a quick text-to-column manipulation, I'd have the data in the format I need.
I honestly didn't write this question to defend Excel as the best way or even a good way to get data into a database, but the answers so far don't seem to be addressing my main concern--how to get all this data into your database. Writing a script with hundreds of INSERT statements by hand would be extremely time consuming and error prone. Somehow, this script needs to be machine generated, but how?
I think your current process is fine for seeding the database with initial data. It's simple, easy to maintain, and works for you. If you've got a good database design with adequate constraints then it doesn't really matter how you seed the initial data. You could use an intermediate tool to generate scripts but why bother?
SSIS has a steep learning curve, doesn't work well with source control (impossible to tell what changed between versions), and is very finicky about type conversions from Excel. There's also an issue with how many rows it reads ahead to determine the data type -- you're in deep trouble if your first x rows contain numbers stored as text.
1) I prefer to use scripts for several reasons.
• Scripts are easy to modify, and plus when I get ready to deploy my application to a production environment, I already have the scripts written so I'm all set.
• If I need to deploy my database to a different platform (like Oracle or MySQL) then it's easy to make minor modifications to the scripts to work on the target database.
• With scripts, I'm not dependent on a tool like Visual Studio to build and maintain the database.
2) I like good old fashioned insert statements using a script. Again, at deployment time scripts are your best friend. At our shop, when we deploy our applications we have to have scripts ready for the DBA's to run, as that's what they expect.
I just find that scripts are simple, easy to maintain, and the "least common denominator" when it comes to creating a database and loading up data to it. By least common denominator, I mean that the majority of people (i.e. DBA's, other people in your shop that might not have visual studio) will be able to use them without any trouble.
The other thing that's important with scripts is that it forces you to learn SQL and more specfically DDL (data definition language). While the hand-holding GUI tools are nice, there's no substitute for taking the time to learn SQL and DDL inside out. I've found that those skills are invaluable to have in almost any shop.
Frankly, I find the concept of using Excel here a bit scary. It obviously works, but it's creating a dependency on an ad-hoc data source that won't be resolved until much later. Last thing you want is to be in a mad rush to deploy a database and find out that the Excel file is mangled, or worse, missing entirely. I suppose the severity of this would vary from company to company as a function of risk tolerance, but I would be actively seeking to remove Excel from the equation, or at least remove it as a permanent fixture.
I always use scripts to create databases, because scripts are portable and repeatable - you can use (almost) the same script to create a development database, a QA database, a UAT database, and a production database. For this reason it's equally important to use scripts to modify existing databases.
I also always use a script to create bootstrap data (AKA startup data), and there's a very important reason for this: there's usually more scripting to be done afterward. Or at least there should be. Bootstrap data is almost invariably read-only, and as such, you should be placing it on a read-only filegroup to improve performance and prevent accidental changes. So you'll generally need to script the data first, then make the filegroup read-only.
On a more philosophical level, though, if this startup data is required for the database to work properly - and most of the time, it is - then you really ought to consider it part of the data definition itself, the metadata. For that reason, I don't think it's appropriate to have the data defined anywhere but in the same script or set of scripts that you use to create the database itself.
Test data is a little different, but in my experience you're usually trying to auto-generate that data in some fashion, which makes it even more important to use a script. You don't want to have to manually maintain an ad-hoc database of millions of rows for testing purposes.
If your problem is that the test or startup data comes from an external source - a web page, a CSV file, etc. - then I would handle this with an actual "configuration database." This way you don't have to validate references with VLOOKUPS as in Excel, you can actually enforce them.
Use SQL Server Integration Services (formerly DTS) to pull your external data from CSV, Excel, or wherever, into your configuration database - if you need to periodically refresh the data, you can save the SSIS package so it ends up being just a couple of clicks.
If you need to use Excel as an intermediary, i.e. to format or restructure some data from a web page, that's fine, but the important thing IMO is to get it out of Excel as soon as possible, and SSIS with a config database is an excellent repeatable method of doing that.
When you are ready to migrate the data from your configuration database into your application database, you can use SQL Server Management Studio to generate a script for the data (in case you don't already know - when you right click on the database, go to Tasks, Generate Scripts, and turn on "Script Data" in the Script Options). If you're really hardcore, you can actually script the scripting process, but I find that this usually takes less than a minute anyway.
It may sound like a lot of overhead, but in practice the effort is minimal. You set up your configuration database once, create an SSIS package once, and refresh the config data maybe once every few months or maybe never (this is the part you're already doing, and this part will become less work). Once that "setup" is out of the way, it's really just a few minutes to generate the script, which you can then use on all copies of the main database.
Since I use an object-relational mapper (Hibernate, there is also a .NET version), I prefer to generate such data in my programming language. The ORM then takes care of writing things into the database. I don't have to worry about changing column names in the data because I need to fix the mapping anyway. If refactoring is involved, it usually takes care of the startup/test data also.
Excel is an unnecessary component of this process.
Script the current version the database components that you want to reuse, and add the script to your source control system. When you need to make changes in the future, either modify the entities in the database and regenerate the script, or modify the script and regenerate the database.
Avoid mixing Visual Studio's db designer and Excel as they only add complexity. Scripts and SQL Management Studio are your friends.