I've hit a bit of a roadblock in a current project I'm working on. I don't have a lot of web developers in my office, and as a matter of fact the only other web dev just went on vacation. Anyway, I was wondering if anyone could help me with structuring two of my Postgres tables.
The user needs to be able to create custom data tables, one for each specific program (a parent record). The form I've set up for these tables lets you add or remove inputs based on how many fields you need, and then specify the name, data_type, etc.
My initial idea was to create a new table in the DB each time a user created one of these custom tables. The other web dev, who has built something similar, said it would be better to create a fields table that stores each custom field's information, and then a data table that stores every cell of data tied to a field ID.
I understand having the fields table so that I can retrieve just the field information and build my front-end tables and edit forms dynamically, but I'm a little confused about how to get the data into the data table. I'm used to having an array of objects, with each object representing an entire row. With this method it stores each cell of data instead of a row of data, and I don't know the best way to select and organize it on the backend.
Data for these tables will be imported from CSV files formatted to match the custom table structure. I got a suggestion on Reddit to use JSON to store each row's data, but I'm wondering how I'd be able to do sorting and filtering with that. My current table structure, from before I got the JSON suggestion, is listed below. I'm guessing that if I went that route I would remove the fieldId column and instead use the field name as the JSON key, storing that field's data with it.
fields
id -- name -- program_id -- type -- required -- position -- createdAt -- updatedAt
data
id -- fieldId -- data -- createdAt -- updatedAt
So I guess my question is: does this sound like the right way to structure these tables for my needs, and if so, can I still perform sorting and filtering on it?
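For illustration, if I went the JSON route I'm picturing something roughly like the sketch below (PostgreSQL; the table and key names are just placeholders of mine, not a final design), with the kind of sorting and filtering I'd still need to do:

-- one row per imported CSV row; the jsonb column holds {"field name": value, ...}
CREATE TABLE custom_rows (
    id         bigserial PRIMARY KEY,
    program_id bigint NOT NULL,
    data       jsonb  NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now(),
    updated_at timestamptz NOT NULL DEFAULT now()
);

-- filtering and sorting on keys inside the jsonb document
SELECT id, data
FROM custom_rows
WHERE program_id = 42
  AND data->>'status' = 'active'
ORDER BY data->>'last_name';

-- an expression index can speed up a frequently used sort/filter key
CREATE INDEX custom_rows_last_name_idx ON custom_rows ((data->>'last_name'));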
Question
What is the accepted way of using multiple databases that record information about the same object that will ultimately end up living in one central database?
Example
There is one main SQL database about trees.
This database holds information about unique trees from all over the UK.
To collect the information, a blank SQLite database (with the same schema) is created and taken to the tree on a phone.
The collected information is stored in that SQLite database until it is brought back and transferred into the main database.
Now this works fine as long as there is only one SQLite database out for any one tree at a time.
However, if two people wanted to collect different information for the same tree at the same time, when they both came back and attempted to transfer their data into the main database, there would be collisions on their primary key constraints.
ID Schemes (with example data)
There is a tree table which has a unique identifier called TreeID
TreeID - TreeName - Location
1001 - Teddington Field - Plymouth
Branch table
BranchID - BranchName - TreeID
1001-10001 - 1st Branch - 1001
1001-10002 - 2nd Branch - 1001
Leaf table
LeafID - LeafName - BranchId
1001-10001-1 - Bedroom - 1001-10001
1001-10002-2 - Bathroom - 1001-10002
Possible ideas
Assign each database 1000 unique IDs up front; then, when the data comes back in, the IDs from each database won't collide because they have already been assigned.
Downfall
This isn't very dynamic and could fail if one database overruns its preassigned IDs.
Is there another way to achieve the same flexibility but without the downfall mentioned above?
So, as an answer:
On the master DB, store an extra ID field identifying the source/collection database that the dataset was collected on, as well as the tree ID.
(src01, 1001), (src02, 1001)
This also allows you to link back easily to the collection source of the information, which is likely going to be a future requirement. Now, you may or may not want to autogenerate another sequence ID key value on the master DB's table (I wouldn't, but that's because I am not that fond of surrogate keys), but I would definitely keep track of the source/tree ID it was originally collected with in the field, separately from any master DB unique key considerations.
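For example, a minimal sketch of what that could look like on the master DB (column names here are mine, purely illustrative):

-- each collected record keeps the id of the database it was collected on
CREATE TABLE tree_collection (
    source_db varchar(10) NOT NULL,  -- e.g. 'src01', 'src02'
    tree_id   int         NOT NULL,  -- the TreeID as collected in the field
    tree_name varchar(100),
    location  varchar(100),
    PRIMARY KEY (source_db, tree_id) -- (src01, 1001) and (src02, 1001) no longer collide
);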
Apparently you are talking about auto-generated IDs for related objects, not the IDs for the trees themselves. Two different people collecting information about the same tree, starting from the same starting set, end up generating the same IDs independently. The two sets of generated IDs cannot coexist in the same DB.
Since you want to keep all the new data, one possible solution is to avoid using the field-generated IDs in the central database at all. When each set of data comes in, take the data that was added in the field and programmatically add it to the central DB in a way equivalent to how it is added in the field, letting the central DB autogenerate its own IDs.
This requires a mechanism to distinguish newly-collected data from old, but that might be as simple as a timestamp.
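A rough sketch of that import step (generic SQL; the table names and the :last_import_time parameter are invented for illustration), assuming the central branch table autogenerates its own IDs and each field record carries a collected_at timestamp:

-- discard the field-generated BranchID; the central table generates a fresh one
INSERT INTO central_branch (branch_name, tree_id)
SELECT f.branch_name, f.tree_id
FROM field_branch AS f
WHERE f.collected_at > :last_import_time;  -- only rows collected since the last sync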
I have a GPS Android app that I have made. It uses an SQLite database on the SD card for storing the location data. I am trying to make up an ERD (entity-relationship diagram) for the database, and this is where I am having problems. The database has one master table for the tracks and one for the waypoint groups (a collection of one or more waypoints). These tables do not contain the location data, just the name of the track or waypoint group, the start/stop time and date, and a uid. For each row in these tables a new table is made that contains the latitude and longitude info. In the sub table each row is one point or vertex, and the sub table name is the uid of the "master" table plus "t_" or "w_" for tracks or waypoints. This is what I came up with using https://www.draw.io:
http://s10.postimg.org/usqsrwjmx/Untitled_Diagram.png
(sorry I do not have the 10 rep points to post an image, lol)
I think that the composition link between the tables is right, as the sub table only exists if there is a row in the master table. If the master row is deleted, the corresponding sub table is also deleted. But how do I show that there is a sub table for every row in the master table? It is also a little weird because there is no need for an FK in the sub table, as the table name provides this function.
I don't think that I want to change my database structure at this late date as the app is to be sent out for testing shortly, but I would be interested in other (superior) database designs for this problem.
In a relational database, data is linked to data. In your database, however, data is also linked to table names. Thus you are mixing content with structure.
As far as I know you cannot express this in an ERD. Nor can you implement it properly in your DBMS (such that your DBMS knows about the structure and helps you with appropriate constraints).
You are not using your RDBMS properly. So in spite of the late date: if I were in your place, I would change this into a proper relational database model.
EDIT: A proper relational model would simply have one track_detail table with the track ID in a column rather than in the table name, and likewise a waypoint_detail table with the master's waypoint ID. Not a big change, actually.
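In SQLite terms, a minimal sketch of that normalised model (names approximate the ones in the question; foreign keys need PRAGMA foreign_keys = ON to be enforced):

CREATE TABLE track (
    track_id   INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    start_time TEXT,
    stop_time  TEXT
);

-- one detail table for all tracks; the parent id is a column, not part of a table name
CREATE TABLE track_detail (
    track_id  INTEGER NOT NULL REFERENCES track(track_id) ON DELETE CASCADE,
    seq       INTEGER NOT NULL,  -- position of the point within the track
    latitude  REAL NOT NULL,
    longitude REAL NOT NULL,
    PRIMARY KEY (track_id, seq)
);
-- waypoint / waypoint_detail would follow the same pattern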
I'm starting a new application and was wondering what the best method of logging is. Some tables in the database will need to have every change recorded, and the user that made the change. Other tables may just need to have the last modified time recorded.
In previous applications I've used different methods to do this but want to hear what others have done.
I've tried the following:
Add a "modified" date-time field to the table to record the last time it was edited.
Add a secondary table just for recording changes in a primary table. Each row in the secondary table represents a changed field in the primary table. So one record update in the primary could create several records in the secondary table.
Add a table similar to no. 2, but have it record edits across three or four tables, referencing the table the edit relates to in an additional field.
What methods do you use, and what would you recommend?
Also, what is the best way to record deleted data? I never like the idea that a user can permanently delete a record from the DB, so usually I have a boolean 'deleted' field which is set to true when the record is deleted, and the record is then filtered out of all queries at the model level. Any other suggestions on this?
Last one: what is the best method for recording user activity? At the moment I have a table which records logins/logouts/password changes etc. and, depending on what the action is, gives it a code of 1, 2, 3, etc.
Hope I haven't crammed too much into this question. Thanks.
I know it's a very old question, but I wanted to add a more detailed answer, as this is the first link I got when googling about DB logging.
There are basically two ways to log data changes:
on the application server layer
on the database layer.
If you can, just use logging on the application server side. It is much clearer and more flexible.
If you need to log on the database layer you can use triggers, as @StanislavL said. But triggers can slow down your database and limit you to storing the change log in the same database.
Also, you can look at transaction log monitoring.
For example, in PostgreSQL you can use the logical replication mechanism to stream changes in JSON format from your database to anywhere.
In a separate service you can then receive, handle and log the changes in any form and in any database (for example, just put the JSON you get into Mongo).
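As a rough PostgreSQL sketch (assuming the third-party wal2json output plugin is installed and wal_level = logical is set; a real service would normally consume the slot via pg_recvlogical or the streaming protocol rather than polling with SQL):

-- create a logical replication slot that emits row changes as JSON
SELECT * FROM pg_create_logical_replication_slot('audit_slot', 'wal2json');

-- peek at the pending changes without consuming them
SELECT lsn, xid, data
FROM pg_logical_slot_peek_changes('audit_slot', NULL, NULL);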
You can add triggers to any tracked table to listen for insert/update/delete. In the triggers, just check the NEW and OLD values and write them to a special table with columns:
table_name
entity_id
modification_time
previous_value
new_value
user
It's hard to figure out which user made the change, but it is possible if you add a changed_by column to the tables you listen to.
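A simplified PostgreSQL sketch of that trigger approach (names are illustrative; it stores whole rows as jsonb rather than one log row per changed column, and assumes each tracked table has an id and a changed_by column):

CREATE TABLE change_log (
    table_name        text        NOT NULL,
    entity_id         bigint      NOT NULL,
    modification_time timestamptz NOT NULL DEFAULT now(),
    previous_value    jsonb,
    new_value         jsonb,
    changed_by        text
);

CREATE OR REPLACE FUNCTION log_change() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO change_log (table_name, entity_id, new_value, changed_by)
        VALUES (TG_TABLE_NAME, NEW.id, to_jsonb(NEW), NEW.changed_by);
        RETURN NEW;
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO change_log (table_name, entity_id, previous_value, new_value, changed_by)
        VALUES (TG_TABLE_NAME, NEW.id, to_jsonb(OLD), to_jsonb(NEW), NEW.changed_by);
        RETURN NEW;
    ELSE  -- DELETE
        INSERT INTO change_log (table_name, entity_id, previous_value, changed_by)
        VALUES (TG_TABLE_NAME, OLD.id, to_jsonb(OLD), OLD.changed_by);
        RETURN OLD;
    END IF;
END;
$$ LANGUAGE plpgsql;

-- attach it to each tracked table (use EXECUTE PROCEDURE on PostgreSQL 10 and older)
CREATE TRIGGER orders_audit
AFTER INSERT OR UPDATE OR DELETE ON orders
FOR EACH ROW EXECUTE FUNCTION log_change();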
Edit: Let me completely rephrase this, because I'm not sure there's an XML way like I was originally describing.
Yet another edit: This needs to be a repeatable process, and it has to be able to be set up in a way that it can be called from C# code.
In database A, I have a set of tables, related by PKs and FKs. A parent table, with child and grandchild tables, let's say.
I want to copy a set of rows from database A to database B, which has identically named tables and fields. For each table, I want to insert into the same table in database B. But I can't be constrained to use the same primary keys. The copy routine must create new PKs for each row in database B, and must propagate those to the child rows. I'm keeping the same relations between the data, in other words, but not the same exact PKs and FKs.
How would you solve this? I'm open to suggestions. SSIS isn't completely ruled out, but it doesn't look to me like it'll do this exact thing. I'm also open to a solution in LINQ, or using typed DataSets, or using some XML thing, or just about anything that'll work in SQL Server 2005 and/or C# (.NET 3.5). The best solution wouldn't require SSIS, and wouldn't require writing a lot of code. But I'll concede that this "best" solution may not exist.
(I didn't make this task up myself, nor the constraints; this is how it was given to me.)
I think the SQL Server utility tablediff.exe might be what you are looking for.
See also this thread.
First, let me say that SSIS is your best bet. But, to answer the question you asked...
I don't believe you will be able to get away with creating new IDs all around; well, you could, but you would need to keep the original IDs to use for lookups.
The best you can get is one insert statement per table. Here is an example of the code to do the SELECTs that get you the data from your XML sample:
declare @xml xml
set @xml='<People Key="1" FirstName="Bob" LastName="Smith">
  <PeopleAddresses PeopleKey="1" AddressesKey="1">
    <Addresses Key="1" Street="123 Main" City="St Louis" State="MO" ZIP="12345" />
  </PeopleAddresses>
</People>
<People Key="2" FirstName="Harry" LastName="Jones">
  <PeopleAddresses PeopleKey="2" AddressesKey="2">
    <Addresses Key="2" Street="555 E 5th St" City="Chicago" State="IL" ZIP="23456" />
  </PeopleAddresses>
</People>
<People Key="3" FirstName="Sally" LastName="Smith">
  <PeopleAddresses PeopleKey="3" AddressesKey="1">
    <Addresses Key="1" Street="123 Main" City="St Louis" State="MO" ZIP="12345" />
  </PeopleAddresses>
</People>
<People Key="4" FirstName="Sara" LastName="Jones">
  <PeopleAddresses PeopleKey="4" AddressesKey="2">
    <Addresses Key="2" Street="555 E 5th St" City="Chicago" State="IL" ZIP="23456" />
  </PeopleAddresses>
</People>
'

-- shred the People nodes into a relational result set
select t.b.value('./@Key', 'int')                PeopleKey,
       t.b.value('./@FirstName', 'nvarchar(50)') FirstName,
       t.b.value('./@LastName', 'nvarchar(50)')  LastName
from @xml.nodes('//People') t(b)

-- shred the Addresses nodes; ../../ walks up to the owning People node
select t.b.value('../../@Key', 'int')             PeopleKey,
       t.b.value('./@Street', 'nvarchar(50)')     Street,
       t.b.value('./@City', 'nvarchar(50)')       City,
       t.b.value('./@State', 'char(2)')           [State],
       t.b.value('./@ZIP', 'char(5)')             Zip
from @xml.nodes('//Addresses') t(b)
What this does is take the nodes from the XML and parse out the data. To get the related People key for the addresses we use ../../ to go up the chain.
Dump the XML approach and use the import wizard / SSIS.
By far the easiest way is Red Gate's SQL Data Compare. You can set it up to do just what you described in a minute or two.
I love Red Gate's SQL Compare and Data Compare too, but as far as I can tell it won't meet his requirement of changing the primary keys.
If cross-database queries/linked servers are an option, you could do this with a stored procedure that copies the records from the parent/child tables in DB A into temporary tables on DB B, then adds a column for the new primary key to the temp child table, which you would update after inserting the headers.
My question is: if the records don't have the same primary key, how do you tell whether it's a new record? Is there some other candidate key? If these are new tables, why can't they have the same primary key?
I have created the same thing with a set of stored procedures.
Database B will have its own primary keys, but it stores Database A's primary keys for debugging purposes. This means I can have more than one Database A!
Data is copied via a linked server. Not too fast; SSIS is faster. But SSIS is not for beginners, and it is not easy to code something that works with changing source tables.
And it is easy to call a stored procedure from C#.
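Roughly the shape of that approach (SQL Server; the linked server, database and table names here are made up for illustration):

-- Database B keeps its own identity key but remembers where each row came from
CREATE TABLE dbo.Person (
    PersonID       int IDENTITY(1,1) PRIMARY KEY,
    SourceDatabase sysname NOT NULL,  -- which "Database A" the row came from
    SourcePersonID int     NOT NULL,  -- the primary key it had over there
    FirstName      nvarchar(50),
    LastName       nvarchar(50)
);

-- copy over a linked server using four-part names
INSERT INTO dbo.Person (SourceDatabase, SourcePersonID, FirstName, LastName)
SELECT 'DatabaseA', p.PersonID, p.FirstName, p.LastName
FROM LinkedServerA.DatabaseA.dbo.Person AS p;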
I'd script it in a stored procedure, using INSERTs to do the hard work. Your code will take the PKs from Table A (presumably via SCOPE_IDENTITY()) - I assume that the PK for Table A is an identity field?
You could use temporary tables, cursors or you might prefer to use the CLR - it might lend itself to this kind of operation.
I'd be surprised to find a tool that could do this off the shelf with either a) pre-determined keys, or b) identity fields (clearly Tables B & C don't have them).
Are you clearing the destination tables each time and then starting again? That will make a big difference to the solution you need to implement. If you are doing a complete re-import each time then you could do something like the following:
Create a temporary table or table variable to record the old and new primary keys for the parent table.
Insert the parent table data into the destination and use the OUTPUT clause to capture the new IDs, inserting them with the old IDs into the temp table (a sketch follows these steps).
NOTE: Using the output clause is efficient and allows you to do the insert in bulk without cycling through each record to be inserted.
Insert the child table data. Join to the temp table to retrieve the new foreign key required.
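A rough T-SQL sketch of those steps (table names invented). One wrinkle: with a plain INSERT ... SELECT the OUTPUT clause cannot reference columns of the source table, so a common way to capture the old-to-new key map in one pass is a MERGE whose match condition is never true:

DECLARE @keymap TABLE (OldParentID int, NewParentID int);

-- insert the parent rows and record old id -> newly generated id
MERGE INTO DatabaseB.dbo.Parent AS tgt
USING (SELECT ParentID, ParentName FROM DatabaseA.dbo.Parent) AS src
    ON 1 = 0                                -- never matches, so every source row is inserted
WHEN NOT MATCHED THEN
    INSERT (ParentName) VALUES (src.ParentName)
OUTPUT src.ParentID, inserted.ParentID INTO @keymap (OldParentID, NewParentID);

-- insert the child rows, translating the foreign key through the map
INSERT INTO DatabaseB.dbo.Child (ParentID, ChildName)
SELECT km.NewParentID, c.ChildName
FROM DatabaseA.dbo.Child AS c
JOIN @keymap AS km ON km.OldParentID = c.ParentID;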
The above process could be done using T-SQL Script, C# code or SSIS. My preference would be for SSIS.
If you are adding each time then you may need to keep a permanent table to track the relationship between source database primary keys and destination database primary keys (at least for the parent table). If you needed to keep this kind of data out of the destination database, you could get SSIS to store/retrieve it from some kind of logging database or even a flat file.
You could probably avoid the above scenario if there is a combination of fields in the parent table that can be used to uniquely identify that record and therefore "find" the primary key for that record in the destination database.
I think most likely what I'm going to use is typed datasets. It won't be a generalized solution; we'll have to regenerate them if any of the tables change. But based on what I've been told, that's not a problem; the tables aren't expected to change much.
Datasets will make it reasonably easy to loop through the data hierarchically and refresh PKs from the database after insert.
When dealing with similar tasks I simply created a set of stored procedures to do the job.
As the task you specified is pretty custom, you are not likely to find a "ready to use" solution.
Just to give you some hints:
If the databases are on different servers, use linked servers so you can access both source and destination tables simply through T-SQL
In the stored procedure:
Identify the parent items that need to be copied - you said that the primary keys are different, so you need to use unique constraints instead (you should be able to define them if the tables are normalised); see the sketch after these steps
Identify the child items that need to be copied based on the identified parents; to check whether some of them are already in the destination DB, use the unique constraints approach again
Identify the grandchild items (same logic as with parent-child)
Copy data over starting with the lowest level (grandchildren, children, parents)
There is no need for cursors etc.; simply store the intermediate results in a temporary table (or a table variable if working within one stored procedure)
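For instance (illustrative names), matching on a unique/natural key so rows that already exist in the destination are skipped:

-- copy only the parent rows that are not already present in the destination
INSERT INTO DestDB.dbo.Parent (NaturalKey, ParentName)
SELECT s.NaturalKey, s.ParentName
FROM SourceServer.SourceDB.dbo.Parent AS s
WHERE NOT EXISTS (SELECT 1
                  FROM DestDB.dbo.Parent AS d
                  WHERE d.NaturalKey = s.NaturalKey);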
That approach worked for me pretty well.
You can of course add a parameter to the main stored procedure so you can either copy all new records or only the ones that you specify.
Let me know if that is of any help.