Help me with my SQL project (please)

For this grading period, my CS teacher left us an open-choice project involving SQL and Delphi/VB.
I ended up with the assignment of designing and building a program that lets users, through a GUI in Delphi/VB, insert and read hurricane data stored in a database (the latest SQL Server, by the way). However, there are a few catches.
Three tables are required: Hurricanes, Hurricane_History, and Category
The Category table is not meant to be modified, and it contains the columns 'Min. Speed', 'Max. Speed', and 'Category'. The idea is that a hurricane with a rotational speed of X falls into category Y if X is within the minimum and maximum speed of category Y.
The Hurricanes table is meant to be modified by the end-user, through the Delphi/VB GUI. It contains the following columns: 'Name', 'Day', 'Time', 'Rotational_Speed', 'Movement_Speed', 'Latitude', 'Longitude', and 'Photo'.
Then there is the Hurricane_History table, which contains 'Name', 'Category', 'Starting_DateTime', 'Ending_DateTime', 'Starting Latitude', 'Starting Longitude', 'Ending Latitude', 'Ending Longitude'. This table is not meant to be directly modified, but rather automatically populated through SQL (I figure using SQL triggers and stored procedures).
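For reference, here is roughly how I have the three tables defined (column names simplified to underscores; the types are my best guesses at what fits):

CREATE TABLE Category (
    Category  INT PRIMARY KEY,
    Min_Speed INT NOT NULL,
    Max_Speed INT NOT NULL
);

CREATE TABLE Hurricanes (
    Name             VARCHAR(50) PRIMARY KEY,
    [Day]            DATE,
    [Time]           TIME,
    Rotational_Speed INT,
    Movement_Speed   INT,
    Latitude         DECIMAL(9,6),
    Longitude        DECIMAL(9,6),
    Photo            VARBINARY(MAX) NULL  -- optional picture
);

CREATE TABLE Hurricane_History (
    Name               VARCHAR(50) PRIMARY KEY REFERENCES Hurricanes(Name),
    Category           INT,
    Starting_DateTime  DATETIME,
    Ending_DateTime    DATETIME,
    Starting_Latitude  DECIMAL(9,6),
    Starting_Longitude DECIMAL(9,6),
    Ending_Latitude    DECIMAL(9,6),
    Ending_Longitude   DECIMAL(9,6)
);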
What the program should end up doing is the following: the user opens the visual app and enters information for a certain hurricane. Since only the Hurricanes table is meant to be modified, the user would insert the Name, Day, Time, current rotational speed, current movement speed, current latitude, current longitude, and, optionally, a picture.
If the user enters a hurricane that does not exist yet, then it would create a new hurricane with the corresponding data in the Hurricane_History table. If he enters data for a hurricane that already exists, then the data for that hurricane should be updated, and stored into the corresponding Hurricane_History row. Furthermore, the current category of the hurricane should be automatically populated with SQL using the data that was stored in the Category table.
So far, I have the three tables, the columns, the Delphi GUI, the connections (between Delphi and SQL Server), etc.
What I'm having a really hard time with is the SQL triggers and stored procedures needed to generate the data in the Hurricane_History table. Here are my algorithms: the first for populating the category, and the second for populating the data of the Hurricane_History table:
create trigger determine_category on Hurricanes for insert, update as
*when a value is inserted into Hurricanes.Rotational_Speed, match it against the corresponding row in the Category table, and write the matching category into the Category column of the hurricane's Hurricane_History row.*
create trigger populate_data on Hurricanes for insert, update as
*if Hurricanes.Name already exists, perform an update instead of an insert: use Hurricanes.Day and Hurricanes.Time as Hurricane_History.Ending_DateTime, use Hurricanes.Latitude and Hurricanes.Longitude as Hurricane_History.Ending_Latitude and Hurricane_History.Ending_Longitude, and set the Category using the determine_category logic.*
*if Hurricanes.Name does not exist, create a record in Hurricane_History using the data from the newly inserted Hurricanes record, populating the Category using the determine_category logic.*
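In rough T-SQL, I picture the second one looking something like this (completely untested; the [Day]/[Time] combination and the category subquery are guesses on my part):

CREATE TRIGGER populate_data ON Hurricanes
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Existing hurricanes: close out the history row with the latest reading
    UPDATE hh
    SET Ending_DateTime  = CAST(i.[Day] AS DATETIME) + CAST(i.[Time] AS DATETIME),
        Ending_Latitude  = i.Latitude,
        Ending_Longitude = i.Longitude,
        Category         = (SELECT c.Category FROM Category c
                            WHERE i.Rotational_Speed BETWEEN c.Min_Speed AND c.Max_Speed)
    FROM Hurricane_History hh
    INNER JOIN inserted i ON i.Name = hh.Name;

    -- New hurricanes: open a history row
    INSERT INTO Hurricane_History
        (Name, Category, Starting_DateTime, Starting_Latitude, Starting_Longitude)
    SELECT i.Name,
           (SELECT c.Category FROM Category c
            WHERE i.Rotational_Speed BETWEEN c.Min_Speed AND c.Max_Speed),
           CAST(i.[Day] AS DATETIME) + CAST(i.[Time] AS DATETIME),
           i.Latitude,
           i.Longitude
    FROM inserted i
    WHERE NOT EXISTS (SELECT 1 FROM Hurricane_History hh WHERE hh.Name = i.Name);
END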
What I need help with is getting from these ideas to working SQL code, so I was wondering if anyone might want to help me through this.
Thanks a bunch!
EDIT:
I just whipped up a simple stored procedure for determining the category. What I don't know how to do is use the result/output of the stored procedure as an insertion value. Does anyone have insight on how to do it?
CREATE PROCEDURE determine_category
@speed INT
AS
-- a speed falls in a category when it lies between that category's bounds
SELECT Category FROM Category
WHERE @speed BETWEEN Min_Speed AND Max_Speed

First, since you're using SQL Server and you can use stored procedures, don't use a trigger. It's not necessary. If your teacher needs justification, here's an article from SQL Server MVP Tom LaRock which discusses issues with handling triggers.
Second, as far as how to write the stored procedures, think about how to handle all the functionality logically. You've said you need to do the following:
Read existing hurricane information
Update existing hurricane information
Insert a new hurricane into the database
Your application should handle all of those as separate paths. And you need to think about the functionality before you write your first bit of T-SQL code. That means you have to have an interface which presents existing information. You're going to have to display the hurricanes existing in the database. Then once the user selects the one to get more information on, you'll have to pull back the hurricane history information. So I know in that situation I have two different data retrievals based on user input. That tells me I need to build the GUI interface to handle that progression logically and display the information in a way the user can use. And it also tells me I've got to build two different stored procedures. The second one will be passed some information identifying the hurricane to retrieve data on (which would be the primary key).
Now roll through the rest of the application's functionality. That should get you started.
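For example, those two retrieval procedures could be as simple as this sketch (assuming Name is the primary key and the column names from the question):

CREATE PROCEDURE get_hurricanes
AS
SELECT Name, [Day], [Time], Rotational_Speed, Movement_Speed
FROM Hurricanes;
GO

CREATE PROCEDURE get_hurricane_history
    @name VARCHAR(50)
AS
SELECT Category, Starting_DateTime, Ending_DateTime,
       Starting_Latitude, Starting_Longitude,
       Ending_Latitude, Ending_Longitude
FROM Hurricane_History
WHERE Name = @name;

The first populates the selection list; the second runs when the user picks a hurricane.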

Rather than use triggers to do this, I would be more inclined to perform logical DML SQL statements inside transactions. Triggers, whilst sometimes proving useful, are not really necessary in this scenario (unless they are required for your coursework).
As a first approach, think about what is required to complete the application -
A UI layer to present data to the user, allow a user to search, insert, update (and possibly delete) hurricane data.
In this layer, we'll most likely want to
1. Present users with a list of previous hurricanes, perhaps with some key details displayed, and give users the ability to select a particular hurricane and see all the details.
2. Give users the ability to insert new hurricane data. Think about how the category will be displayed for a user to choose, and how input will be taken from this layer and ultimately end up in the data layer. Think also about how, and if, we should validate the user input. What needs to be validated? Guarding against SQL injection, ensuring values are within permitted ranges and lengths, etc. If this were a real application, user input validation would be a necessity.
A data layer used to store the data in a defined entity relationship.
A data access layer used to perform all data access logic in regard to manipulating the application data.
A business logic layer containing the classes required for the application. It will contain any rules associated with the entities and will be used to present data to the UI layer.
We could take an extremely simplified approach and have the UI layer call straight into the data layer through stored procedures, which would then act as our data access layer and our business logic layer, since they encapsulate the rules about whether a hurricane record already exists and needs updating or a new record needs creating, and possibly some validation too.
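As a sketch of that simplified approach (using the tables from the question, with Name as the key; the Photo column and error handling are omitted for brevity), a single save procedure wrapped in a transaction might look like:

CREATE PROCEDURE save_hurricane
    @name       VARCHAR(50),
    @day        DATE,
    @time       TIME,
    @rot_speed  INT,
    @move_speed INT,
    @latitude   DECIMAL(9,6),
    @longitude  DECIMAL(9,6)
AS
BEGIN
    SET NOCOUNT ON;

    -- Derive the category from the read-only lookup table
    DECLARE @category INT;
    SELECT @category = Category
    FROM Category
    WHERE @rot_speed BETWEEN Min_Speed AND Max_Speed;

    BEGIN TRANSACTION;

    IF EXISTS (SELECT 1 FROM Hurricanes WHERE Name = @name)
    BEGIN
        -- Known hurricane: refresh its current state and close out the history row
        UPDATE Hurricanes
        SET [Day] = @day, [Time] = @time,
            Rotational_Speed = @rot_speed, Movement_Speed = @move_speed,
            Latitude = @latitude, Longitude = @longitude
        WHERE Name = @name;

        UPDATE Hurricane_History
        SET Category         = @category,
            Ending_DateTime  = CAST(@day AS DATETIME) + CAST(@time AS DATETIME),
            Ending_Latitude  = @latitude,
            Ending_Longitude = @longitude
        WHERE Name = @name;
    END
    ELSE
    BEGIN
        -- New hurricane: create it in both tables
        INSERT INTO Hurricanes (Name, [Day], [Time], Rotational_Speed,
                                Movement_Speed, Latitude, Longitude)
        VALUES (@name, @day, @time, @rot_speed, @move_speed, @latitude, @longitude);

        INSERT INTO Hurricane_History (Name, Category, Starting_DateTime,
                                       Starting_Latitude, Starting_Longitude)
        VALUES (@name, @category,
                CAST(@day AS DATETIME) + CAST(@time AS DATETIME),
                @latitude, @longitude);
    END

    COMMIT TRANSACTION;
END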

Re: Inserting sproc output into a table. Use the following general syntax:
INSERT INTO table (field1, field2, field3)
EXEC yourSproc @param1, @param2
In the insert documentation, search for execute_statement for details.
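For example, to capture the output of the determine_category procedure from the question, insert it into a table variable and read it back out (the @speed value here is made up):

DECLARE @result TABLE (Category INT);

-- The procedure's SELECT output becomes the inserted rows
INSERT INTO @result (Category)
EXEC determine_category @speed = 120;

SELECT Category FROM @result;

From there the value can be used in the subsequent UPDATE or INSERT against Hurricane_History.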

Related

Custom user defined database fields, what is the best solution?

To keep this as short as possible I'm going to use an example.
So let's say I have a simple database that has the following tables:
company - ( "idcompany", "name", "createdOn" )
user - ( "iduser", "idcompany", "name", "dob", "createdOn" )
event - ( "idevent", "idcompany", "name", "description", "date", "createdOn" )
Many users can be linked to a single company as well as to multiple events, and many events can be linked to a single company. All companies, users and events have the columns shown above in common. However, what if I wanted to give my customers the ability to add custom fields to both their users and their events for any unique extra information they wish to store? These extra fields would be on a company-wide basis, not on a per-record basis (so a company adding a custom field to their users would add it to all of their users, not just one specific user). The custom fields also need to be searchable and reportable, ideally automatically with some sort of report wizard. Considering the database is expected to have lots of traffic as well as lots of custom fields, what is the best solution for this?
My current research and findings in possible solutions:
To have generic placeholder columns such as "custom1", "custom2" etc.
** This is not viable as there will eventually be too many custom columns and there will be too many NULL values stored in the database
To have 3x tables per current table. eg: user, user-custom-field, user-custom-field-value. The user table being the same. The user-custom-field table containing the information about the new field such as name, data type etc. And the user-custom-field-value table containing the value for the custom field
** This one is more of a contender if it were not for its complexity and table size implications. I think it will be impossible to avoid a user-custom-field table if I want to automatically report on these fields, as I will have to store the information on how to report on them there. However, in order to pull almost any data you would have to do a million joins on the user-custom-field-value table, plus you're now storing column data as rows, which in a database expected to have a lot of traffic as well as a lot of custom fields would soon cause a problem.
Create a new user and event table for each new company that is added to the system removing the company id from within those tables and instead using it in the table name ( eg user56, 56 being the company id ). Then allowing the user to trigger DB commands that add the new custom columns to the tables giving them the power to decide if it has a default value or auto increments etc.
** Every time I have seen this solution it has instantly been shut down by people saying it would be unmanageable, as you would eventually get thousands of tables. However, nobody really explains what they mean by unmanageable. Firstly, as far as my understanding goes, more tables is actually more efficient and produces faster search times, as the tables are much smaller. Secondly, yes, I understand that making any common table changes would be difficult, but all you would have to do is run a script that changes the tables for each company. Finally, I actually see benefits in this method, as it would separate company data, making it impossible for one company to accidentally access another's data via a potential bug, plus it would potentially give the ability to back up and restore company data individually. If someone could elaborate on why this is perceived as a bad idea, it would be appreciated.
Convert fully or partially to a NoSQL database.
** Honestly I have no experience with schemaless databases and don't really know how dynamic user defined fields on a per record basis would work ( although I know it's possible ). If someone could explain the implications of the switch or differences in queries and potential benefits that would be appreciated.
Create a JSON column in each table that requires extra fields. Then add the extra fields into that JSON object.
** The issue I have with this solution is that it is nearly impossible to filter data via the custom columns. You would not be able to report on these columns and until you have received and processed them you don't really know what is in them.
Finally, if anyone has a solution not mentioned above, or any thoughts or disagreements on any of my notes, please tell me, as this is all I have been able to find or figure out for myself.
A typical solution is to have a JSON (or XML) column that contains the user-defined fields. This would be an additional column in each table.
This is the most flexible. It allows:
New fields to be created at any time.
No modification to the existing table to do so.
Supports any reasonable type of field, including types not readily available in SQL (e.g. arrays).
On the downside,
There is no validation of the fields.
Some databases support JSON but do not support indexes on them.
JSON is not "known" to the database for things like foreign key constraints and table definitions.
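As a rough sketch of this in MySQL 5.7+ (other databases have equivalent JSON functions; the tshirt_size and dietary_reqs fields are hypothetical):

ALTER TABLE user ADD COLUMN custom_fields JSON NULL;

-- Store the company-defined extras as one JSON document per row
UPDATE user
SET custom_fields = '{"tshirt_size": "L", "dietary_reqs": "vegan"}'
WHERE iduser = 42;

-- Filter on a custom field (without an index this scans the table)
SELECT iduser, name
FROM user
WHERE custom_fields->>'$.tshirt_size' = 'L';

-- MySQL can index JSON indirectly through a generated column
ALTER TABLE user ADD COLUMN tshirt_size VARCHAR(10)
    GENERATED ALWAYS AS (custom_fields->>'$.tshirt_size') VIRTUAL;
CREATE INDEX idx_user_tshirt ON user (tshirt_size);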

Oracle Audit Trail to get the list of columns which got updated in last transaction

Consider a table (Student) under a schema, say Candidates (not a DBA account):
Student { RollNumber: VARCHAR2(10), Name: VARCHAR2(100), Class: VARCHAR2(5), ... }
Let us assume that the table already contains some valid data.
I executed an update query to modify the name and class of the Student table
UPDATE STUDENT SET Name = 'ASHWIN' , CLASS = 'XYZ'
WHERE ROLLNUMBER = 'AQ1212'
Followed by another update query in which I am updating some other fields
UPDATE STUDENT SET Math_marks = 100, PHY_marks = 95, CLASS = 'XYZ'
WHERE ROLLNUMBER = 'AQ1212'
Since I modified different columns in two different queries, I need to fetch the list of columns that got updated in the last transaction. I am pretty sure that Oracle must be maintaining this in some table logs which could be accessed by a DBA. But I don't have DBA access.
All I need is the list of columns that got updated in the last transaction under the Candidates schema. I DO NOT have DBA rights.
Please suggest some ways to do this.
NOTE: Above I described a simple table, but in actuality I have 8-10 tables for which I need to do this auditing, where a key factor, let's say ROLLNUMBER, acts as a foreign key for all the other tables. Writing triggers for all the tables would be complex, so please help me out if there exists some other way to fetch the same.
"I am pretty sure that oracle must be maintaining this in some table logs which could be accessed by DBA."
Actually, no, not by default. An audit trail is a pretty expensive thing to maintain, so Oracle does nothing out of the box. It leaves us to decide what we want to audit (actions, objects, granularity) and then to switch on auditing for those things.
Oracle requires DBA access to enable the built-in functionality, so that may rule it out for you anyway.
Auditing is a very broad topic, with lots of things to consider and configure. The Oracle documentation devotes a big chunk of the Security manual to auditing; start with its Introduction to Auditing. For monitoring updates to specific columns, what you're talking about is Fine-Grained Auditing (FGA).
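For illustration, a fine-grained audit policy on just those columns looks roughly like this (a sketch; creating it requires EXECUTE on DBMS_FGA, which, per the above, your DBA would have to grant):

BEGIN
  DBMS_FGA.ADD_POLICY(
    object_schema   => 'CANDIDATES',
    object_name     => 'STUDENT',
    policy_name     => 'STUDENT_UPD_AUDIT',
    audit_column    => 'NAME,CLASS,MATH_MARKS,PHY_MARKS',
    statement_types => 'UPDATE');
END;
/

-- Audited statements then show up in the FGA audit trail
SELECT sql_text, timestamp
FROM   dba_fga_audit_trail
WHERE  object_name = 'STUDENT';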
"I have got 8-10 tables ... Writing triggers would be a complex for all tables."
Not necessarily. The triggers will all resemble each other, so you could build a code generator using the data dictionary view USER_TAB_COLUMNS to customise some generic boilerplate text.
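As a sketch of that generator idea, the per-column body of each trigger can be emitted straight from the dictionary (log_change is a hypothetical logging procedure you would write once; the surrounding trigger shell is left to you):

SELECT 'IF UPDATING(''' || column_name || ''') THEN '
    || 'log_change(''' || table_name  || ''', ''' || column_name || '''); END IF;'
FROM   user_tab_columns
WHERE  table_name = 'STUDENT'
ORDER  BY column_id;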

What is the best method of logging data changes and user activity in an SQL database?

I'm starting a new application and was wondering what the best method of logging is. Some tables in the database will need to have every change recorded, and the user that made the change. Other tables may just need to have the last modified time recorded.
In previous applications I've used different methods to do this but want to hear what others have done.
I've tried the following:
Add a "modified" date-time field to the table to record the last time it was edited.
Add a secondary table just for recording changes in a primary table. Each row in the secondary table represents a changed field in the primary table. So one record update in the primary could create several records in the secondary table.
Add a table similar to no. 2, but one that records edits across three or four tables, referencing the table each edit relates to in an additional field.
What methods do you use and recommend?
Also, what is the best way to record deleted data? I never like the idea that a user can permanently delete a record from the DB, so usually I have a boolean field 'deleted' which is changed to true when the record is deleted, and then it's filtered out of all queries at the model level. Any other suggestions on this?
Last one: what is the best method for recording user activity? At the moment I have a table which records logins/logouts/password changes etc., and depending on what the action is, gives it a code (1, 2, 3, etc.).
Hope I haven't crammed too much into this question. Thanks.
I know it's a very old question, but I wanted to add a more detailed answer, as this is the first link I got when googling about DB logging.
There are basically two ways to log data changes:
on application server layer
on database layer.
If you can, just use logging on the server side. It is much clearer and more flexible.
If you need to log on the database layer you can use triggers, as @StanislavL said. But triggers can slow down your database performance and limit you to storing the change log in the same database.
Also, you can look at the transaction log monitoring.
For example, in PostgreSQL you can use the logical replication mechanism to stream changes in JSON format from your database to anywhere.
In a separate service you can receive, handle, and log the changes in any form and in any database (for example, just put the JSON you got into Mongo).
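A minimal sketch of that in PostgreSQL (assuming wal_level = logical and the third-party wal2json output plugin is installed; the accounts table is hypothetical):

-- One-time setup: create a slot that decodes WAL into JSON
SELECT * FROM pg_create_logical_replication_slot('audit_slot', 'wal2json');

-- Make a change somewhere...
UPDATE accounts SET balance = balance - 100 WHERE id = 1;

-- ...then consume the change stream as JSON
SELECT data FROM pg_logical_slot_get_changes('audit_slot', NULL, NULL);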
You can add triggers to any tracked table to listen for insert/update/delete. In the triggers, just check the NEW and OLD values and write them to a special table with columns
table_name
entity_id
modification_time
previous_value
new_value
user
It's hard to figure out which user made the change, but it's possible if you add a changed_by column to the table you're listening to.
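A minimal sketch of such a trigger in PostgreSQL, for a hypothetical accounts table with id, balance, and a changed_by column as suggested above:

CREATE TABLE change_log (
    table_name        text,
    entity_id         bigint,
    modification_time timestamptz NOT NULL DEFAULT now(),
    previous_value    text,
    new_value         text,
    changed_by        text
);

CREATE OR REPLACE FUNCTION log_balance_change() RETURNS trigger AS $$
BEGIN
    -- Only log when the tracked column actually changed
    IF NEW.balance IS DISTINCT FROM OLD.balance THEN
        INSERT INTO change_log (table_name, entity_id, previous_value, new_value, changed_by)
        VALUES (TG_TABLE_NAME, OLD.id, OLD.balance::text, NEW.balance::text, NEW.changed_by);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER accounts_audit
AFTER UPDATE ON accounts
FOR EACH ROW EXECUTE FUNCTION log_balance_change();  -- PostgreSQL 11+; older versions use EXECUTE PROCEDURE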

Db design for data update approval

I'm working on a project where we need to have data entered or updated by some users go through a pending status before being added into 'live data'.
Whilst preparing the data, the user can save incomplete records. Whilst the data is in the pending status, we don't want it to affect the rules imposed on users editing the live data, e.g. a user working on the live data should not run up against a unique constraint when entering the same data that is already in the pending status.
I envisage that sets of data updates will be grouped into a 'data submission', and the data will be re-validated and corrected/rejected/approved when someone quality-controls the submission.
I've thought about two scenarios with regards to storing the data:
1) Keeping the pending-status data in the same table as the live data, but adding a flag to indicate its status. I could see issues here with having to remove constraints or make required fields nullable to support the 'incomplete' data. Then there is the issue of how to handle updating existing data: you would have to add a new row for an update and link it back to the existing 'live' row. This seems a bit messy to me.
2) Add new tables that mirror the live tables and store the data in there until it has been approved. This would allow me to keep full control over the existing live tables while the 'pending' tables can be abused with whatever the user feels he wants to put in there. The downside of this is that I will end up with a lot of extra tables/SPs in the db. Another issue I was thinking about was how might a user link between two records, whereby the record linked to might be a record in the live table or one in the pending table, but I suppose in this situation you could always take a copy of the linked record and treat it as an update?
Neither solutions seem perfect, but the second one seems like the better option to me - is there a third solution?
Your option 2 very much sounds like the best idea. If you want to use referential integrity and all the nice things you get with a DBMS, you can't have the pending data in the same table. But there is no need for the data to be unstructured: pending data is still structured, and presumably you want the db to play its part in enforcing rules even on this data. Even if you didn't, pending data fits well into a standard table structure.
A separate set of tables sounds the right answer. You can bring the primary key of the row being changed into the pending table so you know what item is being edited, or what item is being linked to.
I don't know your situation exactly so this might not be appropriate, but an idea would be to have a separate table for storing the batch of edits that are being made, because then you can quality control a batch, or submit a batch to live. Each pending table could have a batch key so you know what batch it is part of. You'll have to find a way to control multiple pending edits to the same rows (if you want to) but that doesn't seem too tricky a problem to solve.
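A sketch of that shape, with hypothetical names (the live table keeps its constraints; the pending mirror relaxes them and carries both the live key and a batch key):

CREATE TABLE submission_batch (
    batch_id     INT PRIMARY KEY,
    submitted_by VARCHAR(50) NOT NULL,
    status       VARCHAR(20) NOT NULL DEFAULT 'PENDING'  -- PENDING / APPROVED / REJECTED
);

CREATE TABLE customer (
    customer_id INT PRIMARY KEY,
    email       VARCHAR(100) NOT NULL UNIQUE
);

CREATE TABLE customer_pending (
    pending_id  INT PRIMARY KEY,
    batch_id    INT NOT NULL REFERENCES submission_batch (batch_id),
    customer_id INT NULL REFERENCES customer (customer_id),  -- NULL means a brand-new record
    email       VARCHAR(100) NULL  -- nullable and not unique: incomplete drafts are allowed
);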
I'm not sure if this fits but it might be worth looking into 'Master Data Management' tools such as SQL Server's Master Data Services.
'Unit of work' is a good name for 'data submission'.
You could serialize it to a different place, like (non-relational) document-oriented database, and only save to relational DB on approval.
It depends on how many of the live data constraints still need to apply to the unapproved data.
I think the second option is better. To manage it, you can create a view that combines both tables and work with this structure through the view.
Another good approach is to use an XML column in a separate table to store the necessary data (because the quantity and names of the columns are unknown). You can create just one table with an XML column and a "Type" column to determine which table each document relates to.
The first scenario seems good to me.
Add a Status column to the table. There is no need to remove NOT NULL constraints; just add a function that checks the required fields based on the flag, e.g. if the flag is 1 (incomplete) NULL is allowed, otherwise it is not.
Regarding your second doubt: do you want to append the data, or update the whole record?

What is the preferred way to store custom fields in a SQL database?

My friend is building a product to be used by different independent medical units.
The database stores a vast collection of measurements taken at different times, like the temperature, blood pressure, etc...
Let us assume these are held in a table called exams with columns temperature, pressure, etc... (as well as id, patient_id and timestamp). Most of the measurements are stored as floats, but some are of other types (strings, integers...)
While many of these measurements are handled by their product, it needs to allow the different medical units to record and process other, custom measurements. A very nifty UI allows the administrator to edit these custom fields, specifying their name, type, possible range of values, etc...
He is unsure as to how to store these custom fields.
He is leaning towards a separate table (say a table custom_exam_data with fields like exam_id, custom_field_id, float_value, string_value, ...)
I worry that this will make searching both more difficult to achieve and less efficient.
I am leaning towards modifying the exam table directly (while avoiding conflicts on column names with some scheme like prefixing all custom fields with an underscore or naming them custom_1, ...)
He worries about modifying the database dynamically and having different schemas for each medical unit.
Hopefully some people with more experience can weigh in on this issue.
Notes:
he is using Ruby on Rails, but I think this question is pretty much framework agnostic, except for the fact that he is looking for solutions in SQL databases only.
I simplified the problem a bit since the custom fields need to be available for more than one table, but I believe this doesn't really impact the direction to take.
(added) A very generic reporting module will need to search, sort, generate stats, etc., on this data, so it is required that this data be stored in columns of the appropriate type
(added) User inputs will be filtered, for the standard fields as well as for the custom fields. For example, numbers will be checked within a given range (can't have a temperature of -12 or +444), etc... Thus, conversion to the appropriate SQL type is not a problem.
I've had to deal with this situation many times over the years, and I agree with your initial idea of modifying the DB tables directly, and using dynamic SQL to generate statements.
Creating string UserAttribute or Key/Value columns sounds appealing at first, but it leads to the inner-platform effect where you end up having to re-implement foreign keys, data types, constraints, transactions, validation, sorting, grouping, calculations, et al. inside your RDBMS. You may as well just use flat files and not SQL at all.
SQL Server's INFORMATION_SCHEMA views let you inspect table schemas at runtime, and dynamic DDL lets you modify them. Real columns give you full type checking, constraints, transactions, calculations, and everything else you need already built in; don't reinvent it.
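As a sketch of the dynamic DDL involved (the field name and type here stand in for hypothetical admin input; QUOTENAME guards the identifier):

DECLARE @fieldName SYSNAME      = N'blood_glucose';  -- hypothetical admin-defined field
DECLARE @sqlType   NVARCHAR(50) = N'FLOAT';

DECLARE @sql NVARCHAR(MAX) =
    N'ALTER TABLE exams ADD ' + QUOTENAME(@fieldName) + N' ' + @sqlType + N' NULL;';

EXEC sp_executesql @sql;  -- QUOTENAME protects the identifier from injection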
It's strange that so many people come up with ad-hoc solutions for this when there's a well-documented pattern for it:
Entity-Attribute-Value (EAV) Model
Two alternatives are XML and Nested Sets. XML is easier to manage but generally slow. Nested Sets usually require some type of proprietary database extension to do without making a mess, like CLR types in SQL Server 2005+. They violate first-normal form, but are nevertheless the fastest-performing solution.
Microsoft Dynamics CRM achieves this by altering the database design each time a change is made. Nasty, I think.
I would say a better option would be to consider an attribute table. Even though these are often frowned upon, it gives you the flexibility you need, and you can always create views using dynamic SQL to pivot the data out again. Just make sure you always use LEFT JOINs and FKs when creating these views, so that the Query Optimizer can do its job better.
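A sketch of such a pivot view over the custom_exam_data table proposed in the question (the two custom_field_id values are hypothetical):

CREATE VIEW exam_with_customs AS
SELECT e.id, e.patient_id, e.temperature, e.pressure,
       bg.float_value  AS blood_glucose,  -- custom_field_id 1 (hypothetical)
       nn.string_value AS nurse_notes     -- custom_field_id 2 (hypothetical)
FROM exams e
LEFT JOIN custom_exam_data bg ON bg.exam_id = e.id AND bg.custom_field_id = 1
LEFT JOIN custom_exam_data nn ON nn.exam_id = e.id AND nn.custom_field_id = 2;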
I have seen a use of your friend's idea in a commercial accounting package. The table was split into two, first contained fields solely defined by the system, second contained fields like USER_STRING1, USER_STRING2, USER_FLOAT1 etc. The tables were linked by identity value (when a record is inserted into the main table, a record with same identity is inserted into the second one). Each table that needed user fields was split like that.
Well, whenever I need to store some unknown type in a database field, I usually store it as String, serializing it as needed, and also store the type of the data.
This way, you can have any kind of data, working with any type of database.
I would be inclined to store the measurement in the database as a string (varchar), with another column identifying the measurement type. My reasoning is that the value will, presumably, come from the UI as a string, and casting it to any other datatype may introduce corruption before the user input gets stored.
The downside is that when you go to filter result sets by some measurement metric you will still have to perform a cast, but at least the storage and persistence mechanism is not introducing corruption.
I can't tell you the best way but I can tell you how Drupal achieves a sort of schemaless structure while still using the standard RDBMSs available today.
The general idea is that there's a schema table with a list of fields. Each row really only has two columns, the 'table':String column and the 'column':String column. For each of these columns it actually defines a whole table with just an id and the actual data for that column.
The trick really is that when you are working with the data, it's never more than one join away from the bundle table that lists all the possible columns, so you end up not losing as much speed as you might otherwise think. This will also allow you to expand much further than just a few medical companies, unlike the custom_ prefix you were proposing.
MySQL is very fast at returning row data for short rows with few columns. In this way this scheme ends up fairly quick while allowing you lots of flexibility.
As to search, my suggestion would be to index the page content instead of the database content. Use Solr to parse through rendered pages and hold links to the actual page instead of trying to search through the database using clever SQL.
Define two new tables: custom_exam_schema and custom_exam_data.
custom_exam_data has an exam_id column, plus an additional column for every custom attribute.
custom_exam_schema would have a row to describe how to interpret each of the columns of the custom_exam_data table. It would have columns like name, type, minValue, maxValue, etc.
So, for example, to create a custom field to track the number of fingers a person has, you would add ('fingerCount', 'number', 0, 10) to custom_exam_schema and then add a fingerCount column to the custom_exam_data table.
Someone might say it's bad to change the database schema at run time, but I'd argue that configuring these custom fields is part of setup and won't happen too often. Still, this method lets you handle changes at any time and doesn't risk messing around with your core table schemas.
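In SQL, that setup step would be something like this sketch (using the schema columns named above):

INSERT INTO custom_exam_schema (name, type, minValue, maxValue)
VALUES ('fingerCount', 'number', 0, 10);

ALTER TABLE custom_exam_data ADD fingerCount INT NULL;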
Let's say that your friend's database has to store data values from multiple sources, such as demographic values, diagnoses, interventions, physiognomic values, physiologic exam values, hospitalisation values, etc.
He might also need to define choices. Let's say his database is missing race, and the unit staff need the patient's race (some diseases are more likely in certain populations); they might want a drop-down with several choices.
I would propose another table to hold these choices, e.g. a 'Custom_field_choices' table, which at some point is exactly the same thing under a different name.
Considering that the database:
- needs to be flexible
- must let data from multiple tables be added and customized
- should keep the integrity of the main structure of your database, for distribution and uniformity purposes
- must support limits on data, with alarms and warnings
- must record units (10 kg or 10 pounds?)
- can offer a selection of choices for a value
- can carry different rights (from simple user to admin)
- might be needed to generate reports without modifying the code (automation)
- might be needed for cross-reference analysis within the system without modifying the code
the custom table would be my solution; modifying each table would end up being too risky.
I would store those custom fields in a table where each record holds one (dataType, dataValue, dataUnit) triple per row, so there would be a one-to-many relation from a sample to its data. You can also create a table to record all the custom types you use. For example:
create table DataType
(
    id int primary key,
    name varchar(100) not null unique,
    description text,
    uri varchar(255)  -- can be used for an ontology
);

create table DataRecord
(
    id int primary key,
    sample_id int not null,    -- reference to the sample
    dataType_id int not null,  -- references DataType
    value varchar(100),        -- the value as a string
    unit varchar(50)           -- g, mg/ml, etc.; could also reference a units table, just like DataType
);