SQLite diagram ERD for unknown number of tables - sql

I have a GPS Android app that I have made. It uses a SQLite database on the SD card for storing the location data. I am trying to make up an ERD (entity–relationship diagram) for the database. This is where I am having problems. The database has one master table for the tracks and one for the waypoint groups (a collection of one or more waypoints). These tables do not have the location data in them, just the name of the track or waypoint group, the start/stop time and date, and a uid. For each row in these tables a new table is made that contains the latitude and longitude info. In the sub table each row is one point or vertex, and the sub table name is the uid of the "master" table plus "t_" or "w_" for tracks or waypoints. This is what I came up with using https://www.draw.io:
http://s10.postimg.org/usqsrwjmx/Untitled_Diagram.png
(sorry I do not have the 10 rep points to post an image, lol)
I think the composition link between the tables is right, as the sub table only exists if there is a row in the master table. If the master row is deleted, the corresponding sub table is also deleted. But how do I show that there is a sub table for every row in the master table? It is also a little weird because there is no need for an FK in the sub table, as the table name provides this function.
I don't think that I want to change my database structure at this late date as the app is to be sent out for testing shortly, but I would be interested in other (superior) database designs for this problem.

In a relational database, data is linked to data. In your database, however, data is also linked to table names, so you are mixing content with structure.
As far as I know you cannot display this in an ERD. You cannot implement it correctly in your DBMS either (such that your DBMS knows about the structure and helps you with appropriate constraints).
You are not using your RDBMS properly. So in spite of the late date: if I were in your place, I would change this into a proper relational database model.
EDIT: A proper relational model would simply be to have only one track_detail table with the track id in a column rather than in its name. Same for waypoint_detail with the master's waypoint id. No big change actually.
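A minimal sketch of that single-detail-table layout, assuming hypothetical table and column names (track, track_detail, track_id); the waypoint tables would follow the same pattern:

-- Master table: one row per track (name, start/stop time, uid).
CREATE TABLE track (
    track_id   INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    start_time INTEGER,
    stop_time  INTEGER
);

-- One detail table for ALL tracks; the track_id column replaces the per-row table name.
-- Deleting a master row removes its points (requires PRAGMA foreign_keys = ON in SQLite).
CREATE TABLE track_detail (
    track_id  INTEGER NOT NULL REFERENCES track(track_id) ON DELETE CASCADE,
    seq       INTEGER NOT NULL,   -- order of the point within the track
    latitude  REAL NOT NULL,
    longitude REAL NOT NULL,
    PRIMARY KEY (track_id, seq)
);

In an ERD this becomes an ordinary one-to-many relationship between track and track_detail, which is exactly what standard notation can express.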

Related

Should I apply type 2 history to tables with duplicate keys?

I'm working on a data warehouse project using BigQuery. We're loading daily files exported from various mainframe systems. Most tables have unique keys which we can use to create the type 2 history, but some tables, e.g. a ledger/positions table, can have duplicate rows. These files contain the full data extract from the source system every day.
We're currently able to maintain a type 2 history for most tables without knowing the primary keys, as long as all rows in a load are unique, but we have a challenge with tables where this is not the case.
One person on the project has suggested that the way to handle it is to "compare duplicates", meaning that if the DWH table has 5 identical rows and the staging table has 6 identical rows, then we just insert one more, and if it is the other way around, we just close one of the records in the DWH table (by setting the end date to now). This could be implemented by adding an extra "sub row" key to the dataset like this:
ROW_NUMBER() OVER (PARTITION BY <all data columns> ORDER BY SystemTime) AS data_row_nr
I've tried to find out if this is good practice or not, but without any luck. Something about it just seems wrong to me, and I can't see what unforeseen consequences can arise from doing it like this.
Can anybody tell me what the best way to go is when dealing with full loads of ledger data on a daily basis, for which we want to maintain some kind of history in the DWH?
No, I do not think it would be a good idea to introduce an artificial primary key based on all columns plus the index of the duplicated row.
You will solve the technical problem, but I doubt there will be much business value.
First of all you should distinguish: the tables you get with a primary key are dimensions, and for those you can recognise changes and build history.
But the tables without a PK are most probably fact tables (i.e. transaction records), which are typically not fully loaded but loaded based on some delta criterion.
Anyway, you will never be able to recognise an update in those records; the only possible change is an insert (deletes are typically not relevant, as the data warehouse keeps a longer history than the source system).
So my to-do list:
Check if the duplicates are intended or illegal.
Try to find a delta criterion to load the fact tables.
If everything fails, build the primary key out of all columns plus a single attribute holding the number of the duplicate, and build the history from that (see the sketch below).
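A minimal sketch of that fallback in BigQuery Standard SQL, assuming a hypothetical staging table stg_ledger and stand-in column names for "all data columns":

-- Number identical rows within each daily load so duplicates get distinct keys.
SELECT
  t.*,
  ROW_NUMBER() OVER (
    PARTITION BY account_id, position_date, amount   -- stand-ins for "all data columns"
    ORDER BY SystemTime
  ) AS data_row_nr
FROM stg_ledger AS t;

The combination of all data columns plus data_row_nr can then be compared against the open rows in the DWH table to decide whether to insert one more duplicate or close one.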

Use DB Relation To Avoid Redundancy

I have designed an ERD of movies and TV series which is confidential. I can give you an overview of the database.
It has more than 20 tables (more will be added later) and it is normalized. I have tables like Movie, Actors, TV Series, Director, Producer, etc. These tables contain the most important information and are connected (by foreign keys and junction tables like MovieActor, MovieDirector, etc.).
So the scenario is like this:
1) The standard "starting" database should have Actors, Directors, Producers, Music Composers, Genres, Resolution Types… pre-populated and pre-defined by the Admin.
2) Every user creating his personal movie collection will start off his database with all the pre-defined data, but if he wants to, he may add further data to his personal database. These changes will only affect his database and not the standard "starting" database (which was defined by the Admin).
3) The Admin should have a separate view to add Actors, Directors, Producers… that will become part of the standard "starting" database. Any further changes made to this database will be available to the users as updates.
Suggested Solution
Question
The suggested solution seems to require that I create a new database for each user, which does not seem possible. My question is: how can I adapt the suggested solution so that it becomes effective and feasible? I would prefer to handle the situation by using database relations, not by separate storage.
You wouldn't create multiple databases, you would simply add an ownerId field to all relevant tables - admin would have ownerId = 0, indicating the row is part of the 'starting database' and new admin entries are instantly available to users.
In any output for a user where you want to display the starting data and their own, you would add WHERE (ownerId = 0 OR ownerId = userId) to the appropriate query, or if they need to see just their own, just ownerId = userId.
Presumably, they would be able to create relationships between their own data or 'starting' data and this approach should still work.
Foreign keys will still work but deleting will delete user data - basically you should only ever add to the starting data, not take away or you will run into problems.
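A minimal sketch of the ownerId pattern, assuming a hypothetical Actor table and a userId parameter supplied by the application:

-- ownerId = 0 marks rows belonging to the admin-defined "starting" data.
CREATE TABLE Actor (
    actorId INTEGER PRIMARY KEY,
    name    VARCHAR(200) NOT NULL,
    ownerId INTEGER NOT NULL DEFAULT 0
);

-- Starting data plus the current user's own additions:
SELECT actorId, name
FROM Actor
WHERE ownerId = 0 OR ownerId = :userId;

The same ownerId column would go on every table a user is allowed to extend (Director, Genre, and so on).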

Accepted methodology when using multiple Sqlite databases

Question
What is the accepted way of using multiple databases that record information about the same object that will ultimately end up living in one central database?
Example
There is one main SQL database about trees.
This database holds information about unique trees from all over the UK.
To collect the information a blank Sqlite database is created (with the same schema) and taken to the tree on a phone.
The collected information is then stored in the Sqlite database until it is brought back and transferred into the main database.
Now this works fine as long as there is only one Sqlite database out for any one tree at a time.
However, if two people wanted to collect different information for the same tree at the same time, when they both came back and attempted to transfer their data in to the main database, there would be collisions on their primary key constraints.
ID Schemes (with example data)
There is a tree table which has a unique identifier called TreeID
TreeID - TreeName - Location
1001 - Teddington Field - Plymouth
Branch table
BranchID - BranchName - TreeID
1001-10001 - 1st Branch - 1001
1001-10002 - 2nd Branch - 1001
Leaf table
LeafID - LeafName - BranchId
1001-10001-1 - Bedroom - 1001-10001
1001-10002-2 - Bathroom - 1001-10001
Possible ideas
Assign each database 1000 unique IDs, and then once they come back in, since the IDs have already been assigned, the IDs from each database won't collide.
Downfall
This isn't very dynamic and could fail if one database overruns its preassigned IDs.
Is there another way to achieve the same flexibility but without the downfall mentioned above?
So, as an answer:
On the master DB, store an extra ID field identifying the source/collection database that the dataset was collected on, as well as the tree ID.
(src01, 1001), (src02, 1001)
This also allows you to link back easily to the collection source of the information, which is likely going to be a future requirement. Now, you may or may not want to autogenerate another sequence ID key value on the master DB's table (I wouldn't, but that's because I am not that fond of surrogate keys), but I would definitely keep track of the source/tree ID it was originally collected with in the field, separately from any master DB unique key considerations.
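A minimal sketch of that composite key, assuming a hypothetical source_id column ('src01', 'src02', …) and using the Branch table from the question as the example:

-- Each collection database is assigned a source_id; including it in the key
-- means branches collected on different phones for the same tree cannot collide.
CREATE TABLE Branch (
    source_id  TEXT NOT NULL,   -- e.g. 'src01', 'src02'
    BranchID   TEXT NOT NULL,   -- the ID generated in the field, e.g. '1001-10001'
    BranchName TEXT,
    TreeID     INTEGER NOT NULL,
    PRIMARY KEY (source_id, BranchID)
);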
Apparently you are talking about auto-generated IDs for related objects, not the IDs for the trees themselves. Two different people collecting information about the same tree, starting from the same starting set, end up generating the same IDs independently. The two sets of generated IDs cannot coexist in the same DB.
Since you want to keep all the new data, one possible solution is to avoid using the field-generated IDs in the central database at all. When each set of data comes in, take the data that was added in the field and programmatically add it to the central DB in a way equivalent to how it was added in the field, letting the central DB autogenerate its own IDs.
This requires a mechanism to distinguish newly-collected data from old, but that might be as simple as a timestamp.
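A minimal sketch of that re-insert approach, assuming the central store is also SQLite for the sake of the example, that the field copy is attached as field_db, and that a hypothetical collected_at timestamp column marks newly collected rows:

-- Central table: the key is autogenerated here; the field-generated ID is kept only as a reference.
CREATE TABLE Branch (
    BranchPK   INTEGER PRIMARY KEY,   -- autogenerated by the central DB
    FieldID    TEXT,                  -- the ID created on the phone, e.g. '1001-10001'
    BranchName TEXT,
    TreeID     INTEGER NOT NULL
);

ATTACH DATABASE 'field_copy.sqlite' AS field_db;

-- Re-insert only the rows collected since the last sync.
INSERT INTO Branch (FieldID, BranchName, TreeID)
SELECT BranchID, BranchName, TreeID
FROM field_db.Branch
WHERE collected_at > :last_sync;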

creating lookups between two databases in vb.net

Here is a quick overview. I have an employee database with ID, name, phone number, division, location, etc. It all gets stored in a table in a database called EMP. Now I have another database, central to other apps that may use it for lookups, which contains all the divisions and locations. I want to use a lookup for division and location, and this resides in another database on the same server. When I use a datagrid and bound controls for my employee table, I can change the columns to drop-downs, but I want to point them to the other database's lookup tables while still creating the foreign relationship, so that when I update the lookup (source) it will update the other applications' DB. What's the easiest way to do lookups against tables in another database, to pull back information and set it? Any ideas?
Assuming you are following NORMAL FORM, the only thing on the employee database that references the Location and Division tables is an ID, which would never change, only be deleted. Therefore you can create DELETE triggers on these tables that will do whatever you think it should do to the employee table when a Location or Division is deleted.
Just make sure you have the security setup so whoever can delete from the Division/Location tables can also perform the coded action on the employee table.
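A minimal sketch of such a trigger, assuming SQL Server, a lookup database containing dbo.Division, and a hypothetical EMP.dbo.Employee table with a nullable DivisionID column:

-- In the lookup database: when a division is deleted, clear the reference on employees.
CREATE TRIGGER trg_Division_Delete
ON dbo.Division
AFTER DELETE
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE e
    SET e.DivisionID = NULL
    FROM EMP.dbo.Employee AS e
    JOIN deleted AS d ON e.DivisionID = d.DivisionID;
END;

Whether the right action is to NULL the column, block the delete, or cascade it is a business decision; the trigger is just the place to implement it.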

Difference between a db view and a lookuptable

When I create a view I can base it on multiple columns from different tables.
When I want to create a lookup table I need information from one table, for example the foreign key of an order table, to get customer details from another table. I can create a view with parameters to make sure it will get all the data that I need. I could also - from what I have been reading - make a lookup table. What is the difference in this case, and when should I choose a lookup table? I hope this isn't a bad question; I'm not very into DBs yet ;).
Creating a view gives you a "live" representation of the data as it is at the time of querying. This comes at the cost of higher load on the server, because it has to determine the values for every query.
This can be expensive, depending on table sizes, database implementations and the complexity of the view definition.
A lookup table on the other hand is usually filled "manually", i.e. not every query against it will cause an expensive operation to fetch values from multiple tables. Instead your program has to take care of updating the lookup table should the underlying data change.
Usually lookup tables lend themselves to things that change seldom but are read often. Views on the other hand - while more expensive to execute - are more current.
I think your usage of "lookup table" is slightly awry. In normal parlance a lookup table is a code or reference data table. It might consist of a CODE and a DESCRIPTION or a code expansion. The purpose of such tables is to provide a list of permitted values for restricted columns, things like CUSTOMER_TYPE or PRIORITY_CODE. This category of table is often referred to as "standing data" because it changes very rarely, if at all. The value of defining this data in lookup tables is that they can be used in foreign keys and to populate dropdowns and lists of values.
What you are describing is a slightly different scenario:
I need information from one table, for example the foreign key of an order table, to get customer details from another table
Both these tables are application data tables. Customer and Order records are dynamic. Now it is obviously valid to retrieve additional data from the Customer table to display along side the Order data, and in that sense Customer is a "lookup table". More pertinently it is the parent table of Order, because it has the primary key referenced by the foreign key on Order.
By all means build a view to capture the joining logic between Order and Customer. Such views can be quite helpful when building an application that uses the same joined tables in several places.
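A minimal sketch of such a view, assuming hypothetical Orders and Customer tables linked by a CustomerID foreign key:

-- Captures the Order -> Customer join once so every screen can reuse it.
CREATE VIEW OrderWithCustomer AS
SELECT o.OrderID,
       o.OrderDate,
       c.CustomerID,
       c.CustomerName,
       c.City
FROM Orders AS o
JOIN Customer AS c ON c.CustomerID = o.CustomerID;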
Here's an example of a lookup table. We have a system that tracks Jurors; one of the tables is JurorStatus. This table contains all the valid StatusCodes for Jurors:
Code: Value
WS : Will Serve
PP : Postponed
EM : Excuse Military
IF : Ineligible Felon
This is a lookup table for the valid codes.
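A minimal sketch of that lookup table and the foreign key that restricts jurors to the valid codes (table and column names are illustrative):

CREATE TABLE JurorStatus (
    StatusCode  CHAR(2) PRIMARY KEY,     -- e.g. 'WS', 'PP', 'EM', 'IF'
    Description VARCHAR(50) NOT NULL
);

CREATE TABLE Juror (
    JurorID    INTEGER PRIMARY KEY,
    JurorName  VARCHAR(100) NOT NULL,
    StatusCode CHAR(2) NOT NULL REFERENCES JurorStatus(StatusCode)
);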
A view is like a query.
Read this tutorial and you may find helpful info on when a lookup table is needed:
SQL: Creating a Lookup Table
Just learn to write SQL queries to get exactly what you need. No need to create a view! Views are not good to use in many instances, especially if you start to base them on other views, which will kill performance. Do not use views just as shorthand for query writing.