Can I create new table on SQL without specifying the datatype? - sql

I want to create new table with empty columns and specify the datatype later. Is it possible? I try to do so on myscompiler.io and it works. I don't know if it's just possible in such site or is it actually possible to create that once I use other tools to write my SQL.

No. The SQL syntax requires that a table be well-defined, with column names and data types. This is true in every database that I can think of.
You could possibly do what you want in one of three ways:
Your database might support some sort of generic type which you could use to define the column. For instance, SQL Server has a sql_variant type.
You could define the table with a specific type such as a string and change the type later using alter table.
You could define the table with a single primary key column and add columns as you decide what they are.
I don't recommend any of these approaches. Instead, I would suggest that you need to re-think how your application is structured. Tables represent entities and entities have properties. Generally when using databases, these things are known before you start doing any work. There may be some cases where dynamic table creation is useful, but that is definitely not the common approach when using databases.

Related

Should tablevalued parameters in TSQL have database constraints?

Microsoft SQL Server has table valued parameters that help in encapsulating multiple values of an entity into single type. Refer here
Assume that CRUD APIs use this type to trigger different stored procedures. Should the checks on mandatory values (like field X is required while creation) be added to create_stored_procedure or these must be added as constraints like 'NOT NULL' to table type.
What is the suggestion or best practice from maintainability perspective ?
It depends... The whole purpose of any constraint is to control the the data being inserted... Adding constraints to your TVPs will help insure that you are getting only valid values. Of course that means invalid values will generate errors. So, the question becomes, where and how do you want to handle invalid values?
As a general rule, I create my TVP tables, the same way I create any other table... All columns have a datatype that is appropriate for the data being captured and set to NOT NULL unless there is a specific reason to leave it NULLable.

How to join a table within a user defined function whose name is provided as parameter?

Context
I have three tables in my SQL Server database: 1) School, 2) College, 3) University.
Then I have another table: Tags.
Each of the three tables (School, College, University) can have Tags associated with them. For which purpose I have three association tables: SchoolTags, CollegeTags, UniversityTags.
Problem
I am trying to create a user-defined function that will take the name of association table as parameter (i.e. 'SchoolTags') and the Id of the entity (school/college/university) and will return a list of tags associated with that entityId.
The issue I am having is I have got to join Tags with a table whose name will come in as parameter. For that I am creating a dynamic query. And we can not run dynamic queries in SQL Server user-defined functions.
Question
Any idea how can that be acheived?
Note: I want separate association tables as I have created and do not want to convert them into a generic association table and I do not want to add If-Else based on table names in my function so that if a new association table is created, I do not need to update my function.
I am using Microsoft SQL Server.
Whatever language you are using, you would probably use if:
begin
if table = 'school' then
begin
. . .
end;
else if table = 'college' then
. . .
end;
The exact syntax depends on the scripting language for the database you are using.
What you desire is impossible. You cannot pass a table name as a parameter to a UDF and use dynamic sql in the UDF to then create and execute a statement that is specific to the table passed as the argument. You already know that you have no choice but to use if-else statements in your UDF to achieve your goal - it is your pipe-dream of "never having to update (or verify) your code when the schema changes" (yes - I rephrased it to make your issue more obvious) that is a problem.
There are likely to be other ways of implementing some useful functionality - but I suggest that you are thinking too far ahead and trying to implement generic functions without a clear purpose. And that is a very difficult and trouble-prone path that requires sophisticated tsql skills.
And to re-iterate the prior responses, you have a schema problem. You purposely created three different entities - and now you want a common function to use with any of them. So before you spend much time on this particular aspect, you should take some time to think carefully about how you intend to use (i.e., write queries against) these tables. If you find yourself using unions frequently to combine these entities into a common resultset, then you have might have a mismatch between your actual business and your model (schema) of it.
Consider normalizing your database in related, logical groupings for one EducationInstitution table and one JoinEducTags table. Those tables sound like they maintain the same structure but of different typology and hence should be saved in one table with different Type field for School, College, University, etc.
Then, add necessary constraints, primary/foreign keys for the one-to-many relationship between all three sets:
You never want to keep restructuring your schema (i.e., add tables) for each new type. With this approach, your user-defined function would just need to receive value parameters not identifiers like tables to be run in dynamic querying. Finally, this approach scales better with efficient storage. And as you will see normalization saves on complex querying.

SQL database design: storing the type of a row

I am designing a database to contain a table reference, with a column type that is one of several predefined values (e.g., book, movie, magazine, etc.). I intend the range of possible values to expand over time (e.g. if I realize that I missed the academic_paper type, I want to be able to put that in).
The easiest solution would seem to be to simply store a string representing the type into the table. But this sounds like it would result in a lot of wasted space.
The other solution I thought of is creating a new table reference_types, which the type column references in its foreign key. This seems to have the added benefit of ensuring valid foreign keys (so that I won't accidentally mistype a "magzine" somewhere in my code), possible allow for faster queries for all media of a certain type (since integer comparisons should be much faster than string comparisons), but also slow my application down a bit as joins would be required whenever I need the reference type, and probably complicate logic because of those extra joins.
What are your thoughts on schema design for this problem?
Your second solution is the correct one. Create a secondary table to store your reference types and link them using a foreign key.
For further reading on this subject the search term you'd want to use is 'database normalisation'.
Create the reference_types table. And in your references table use integer and also add a reference_type_name field.
You can query the references table to get the integer key and print its name when needed without performing a join to the other table, and still use that table to perfom other operations, just keep both tables with equal type names.
I know it sonds redundant, but it's really the fastest way to do a simple query by int key and have it all together.
It depends, if you will want to add some other information to reference types, then use the second approach. If not, use the first one because it's faster and the information stored is only a string (you can always select unique to retrieve your types). Read this article for more info.

SQL: Advantages of an ENUM vs. a one-to-many relationship?

I very rarely see ENUM datatypes used in the wild; a developer almost always just uses a secondary table that looks like this:
CREATE TABLE officer_ranks (
id int PRIMARY KEY
,title varchar NOT NULL UNIQUE);
INSERT INTO officer_ranks VALUES (1,'2LT'),(2,'1LT'),(3,'CPT'),(4,'MAJ'),(5,'LTC'),(6,'COL'),(7,'BG'),(8,'MG'),(9,'LTG'),(10,'GEN');
CREATE TABLE officers (
solider_name varchar NOT NULL
,rank int NOT NULL REFERENCES officer_ranks(id) ON DELETE RESTRICT
,serial_num varchar PRIMARY KEY);
But the same thing can also be shown using a user-defined type / ENUM:
CREATE TYPE officer_rank AS ENUM ('2LT', '1LT','CPT','MAJ','LTC','COL','BG','MG','LTG','GEN');
CREATE TABLE officers (
solider_name varchar NOT NULL
,rank officer_rank NOT NULL
,serial_num varchar PRIMARY KEY);
(Example shown using PostgreSQL, but other RDBMS's have similar syntax)
The biggest disadvantage I see to using an ENUM is that it's more difficult to update from within an application. And it might also confuse an inexperienced developer who's used to using a SQL DB simply as a bit bucket.
Assuming that the information is mostly static (weekday names, month names, US Army ranks, etc) is there any advantage to using a ENUM?
Example shown using PostgreSQL, but other RDBMS's have similar syntax
That's incorrect. It is not an ISO/IEC/ANSI SQL requirement, so the commercial databases do not provide it (you are supposed to provide Lookup tables). The small end of town implement various "extras", but do not implement the stricter requirements, or the grunt, of the big end of town.
We do not have ENUMs as part of a DataType either, that is absurd.
The first disadvantage of ENUMs is that is it non-standard and therefore not portable.
The second big disadvantage of ENUMs is, that the database is Closed. The hundreds of Report Tools that can be used on a database (independent of the app), cannot find them, and therefore cannot project the names/meanings. If you had a normal Standard SQL Lookup table, that problem is eliminated.
The third is, when you change the values, you have to change DDL. In a Normal Standard SQL database, you simply Insert/Update/Delete a row in the Lookup table.
Last, you cannot easily get a list of the content of the ENUM; you can with a Lookup table. More important, you have a vector to perform any Dimension-Fact queries with, eliminating the need for selecting from the large Fact table and GROUP BY.
I don't see any advantage in using ENUMS.
They are harder to maintain and don't offer anything that a regular lookup table with proper foreign keys wouldn't allow you to do.
A disadvantage of using something like an ENUM is that you can't get a list of all the available values if they don't happen to exist in your data table, unless you hard-code the list of available values somewhere. For example, if in your OFFICERS table you don't happen to have an MG on post there's no way to know the rank exists. Thus, when BG Blowhard is relieved by MG Marjorie-Banks you'll have no way to enter the new officer's rank - which is a shame, as he is the very model of a modern Major General. :-) And what happens when a General of the Army (five-star general) shows up?
For simple types which will not change I've used domains successfully. For example, in one of my databases I've got a yes_no_domain defined as follows:
CREATE DOMAIN yes_no_dom
AS character(1)
DEFAULT 'N'::bpchar
NOT NULL
CONSTRAINT yes_no_dom_check
CHECK ((VALUE = ANY (ARRAY['Y'::bpchar, 'N'::bpchar])));
Share and enjoy.
ENUMS are very-very-very useful! You just have to know how to use them:
An ENUM uses only 2 Bytes of storage.
No need for additional constraint (as replacement for FK).
Cheaper changes of Values compared to natural values in FKs.
No need for additional JOIN
ENUMs are ordered, ex you can compare if Monday < Friday, or January is < June or Project Initiation is < Payroll.
Thus if you have a fixed list of string values, which you want to use, an ENUM is a better solution compared to a lookup table. Let's say you need to List Amino-Acids in your products, with their respective weight. Today there are ~20 Amino Acids. If you would store their full names, you'd need much more space each time then 2 Bytes. The other option is to use artificial keys and to link to a foreign table. But how would the foreign Table look like? Would it have 2 columns: ID and Amino Acid Name? And you would join that table every time? What if your main table has >40 such fields? Querying that table would involve >40 Joins.
If your database hosts 1600 Tables, 400 of which are lookup tables which just replace ENUMs, your devs will waste lots of time navigating through them (in addition to the JOINs). Yes, you can work with prefixes, schemas and such.... but why not just kick those tables out?
ENUMS are Enumerated lists / ordered. That means that if you have values which are ordered, you are actually saving the hassle of maintaining a 3 columns lookup table.
The question is rather: why do I need lookup tables then?
Well, the answer is easy:
When your values are changing often
When you need to store more additional attributes --> The lookup table corresponds to a full fledged data object, and not a lookup list.
When you need it quick and dirty
And now the funny thing:
Lookup Tables and ENUMS are not complete replacements for each other!!!!
If you have a list, where the PK is single-column natural key. The list can grow or the values can change their names (for some reason), then you could define an ENUM and use it for both: PK in lookup and FK in main tables!
Example benefit:
you have to change the name of a lookup key. Without using the ENUM the DBMS will have to cascade the changes to all tables, where you use this value and not just your lookup table. If you are using ENUM, then you just change the value of ENUM, and there are no changes to the data.
A small advantage may lie in the fact, that you have a sort of UDT when creating an ENUM. A user defined type can be reused formally in many other database objects, e.g. in views, other tables, other types, stored procedures (in other RDBMS), etc.
Another advantage is for documentation of the allowed values of a field. Examples:
A yes/no field
A male/female field
A mr/mrs/ms/dr field
Probably a matter of taste. I prefer ENUMs for these kinds of fields, rather than foreign keys to lookup tables for such simple concepts.
Yet another advantage may be that when you use code generation or ORMs like jOOQ in Java, you can use that ENUM to generate a Java enum class from it, instead of joining the lookup table, or working with the ENUM literal's ID
It's a fact, though, that only few RDBMS support a formal ENUM type. I only know of Postgres and MySQL. Oracle or DB2 don't have it.
Advantages:
Type safety for stored procedures: will raise a type error if argument can not be coerced into the type. Like: select court_martial('3LT') would raise a type error automatically.
Custom coalition order: In your example, officers could be sorted without a ranking id.
Generally speaking, enum is better for things that don't change much, and it uses slightly fewer resources, since there's no FK checks or anything like to execute on insert etc.
Using a lookup table is more elegant and or traditional and it's much easier to add and remove options than an enum. It's also easier to mass change the values than an enum.
Well, you don't see, because usually developers are using enums in programming languages such as Java, and the don't have their counterparts in database design.
In database such enums are usually text or integer fields, with no constraints. Database enums will not be translated into Java/C#/etc. enums, so the developers see no gain in this.
There are very many very good database features which are rarely used because most ORM tools are too primitive to support them.
Another benefit of enums over a lookup table is that when you write SQL functions you get type checking.

What is the preferred way to store custom fields in a SQL database?

My friend is building a product to be used by different independent medical units.
The database stores a vast collection of measurements taken at different times, like the temperature, blood pressure, etc...
Let us assume these are held in a table called exams with columns temperature, pressure, etc... (as well as id, patient_id and timestamp). Most of the measurements are stored as floats, but some are of other types (strings, integers...)
While many of these measurements are handled by their product, it needs to allow the different medical units to record and process other custom measurements. A very nifty UI allows the administrator to edit these customs fields, specify their name, type, possible range of values, etc...
He is unsure as to how to store these custom fields.
He is leaning towards a separate table (say a table custom_exam_data with fields like exam_id, custom_field_id, float_value, string_value, ...)
I worry that this will make searching both more difficult to achieve and less efficient.
I am leaning towards modifying the exam table directly (while avoiding conflicts on column names with some scheme like prefixing all custom fields with an underscore or naming them custom_1, ...)
He worries about modifying the database dynamically and having different schemas for each medical unit.
Hopefully some people which more experience can weigh in on this issue.
Notes:
he is using Ruby on Rails but I think this question is pretty much framework agnostic, except from the fact that he is only looking for solutions in SQL databases only.
I simplified the problem a bit since the custom fields need to be available for more than one table, but I believe this doesn`t really impact the direction to take.
(added) A very generic reporting module will need to search, sort, generate stats, etc.. of this data, so it is required that this data be stored in the columns of the appropriate type
(added) User inputs will be filtered, for the standard fields as well as for the custom fields. For example, numbers will be checked within a given range (can't have a temperature of -12 or +444), etc... Thus, conversion to the appropriate SQL type is not a problem.
I've had to deal with this situation many times over the years, and I agree with your initial idea of modifying the DB tables directly, and using dynamic SQL to generate statements.
Creating string UserAttribute or Key/Value columns sounds appealing at first, but it leads to the inner-platform effect where you end up having to re-implement foreign keys, data types, constraints, transactions, validation, sorting, grouping, calculations, et al. inside your RDBMS. You may as well just use flat files and not SQL at all.
SQL Server provides INFORMATION_SCHEMA tables that let you create, query, and modify table schemas at runtime. This has full type checking, constraints, transactions, calculations, and everything you need already built-in, don't reinvent it.
It's strange that so many people come up with ad-hoc solutions for this when there's a well-documented pattern for it:
Entity-Attribute-Value (EAV) Model
Two alternatives are XML and Nested Sets. XML is easier to manage but generally slow. Nested Sets usually require some type of proprietary database extension to do without making a mess, like CLR types in SQL Server 2005+. They violate first-normal form, but are nevertheless the fastest-performing solution.
Microsoft Dynamics CRM achieves this by altering the database design each time a change is made. Nasty, I think.
I would say a better option would be to consider an attribute table. Even though these are often frowned upon, it gives you the flexibility you need, and you can always create views using dynamic SQL to pivot the data out again. Just make sure you always use LEFT JOINs and FKs when creating these views, so that the Query Optimizer can do its job better.
I have seen a use of your friend's idea in a commercial accounting package. The table was split into two, first contained fields solely defined by the system, second contained fields like USER_STRING1, USER_STRING2, USER_FLOAT1 etc. The tables were linked by identity value (when a record is inserted into the main table, a record with same identity is inserted into the second one). Each table that needed user fields was split like that.
Well, whenever I need to store some unknown type in a database field, I usually store it as String, serializing it as needed, and also store the type of the data.
This way, you can have any kind of data, working with any type of database.
I would be inclined to store the measurement in the database as a string (varchar) with another column identifying the measurement type. My reasoning is that it will presumably, come from the UI as a string and casting to any other datatype may introduce a corruption before the user input get's stored.
The downside is that when you go to filter result-sets by some measurement metric you will still have to perform a casting but at least the storage and persistence mechanism is not introducing corruption.
I can't tell you the best way but I can tell you how Drupal achieves a sort of schemaless structure while still using the standard RDBMSs available today.
The general idea is that there's a schema table with a list of fields. Each row really only has two columns, the 'table':String column and the 'column':String column. For each of these columns it actually defines a whole table with just an id and the actual data for that column.
The trick really is that when you are working with the data it's never more than one join away from the bundle table that lists all the possible columns so you end up not losing as much speed as you might otherwise think. This will also allow you to expand much farther than just a few medical companies unlike the custom_ prefix you were proposing.
MySQL is very fast at returning row data for short rows with few columns. In this way this scheme ends up fairly quick while allowing you lots of flexibility.
As to search, my suggestion would be to index the page content instead of the database content. Use Solr to parse through rendered pages and hold links to the actual page instead of trying to search through the database using clever SQL.
Define two new tables: custom_exam_schema and custom_exam_data.
custom_exam_data has an exam_id column, plus an additional column for every custom attribute.
custom_exam_schema would have a row to describe how to interpret each of the columns of the custom_exam_data table. It would have columns like name, type, minValue, maxValue, etc.
So, for example, to create a custom field to track the number of fingers a person has, you would add ('fingerCount', 'number', 0, 10) to custom_exam_schema and then add a column named fingerCount to the exam table.
Someone might say it's bad to change the database schema at run time, but I'd argue that configuring these custom fields is part of set up and won't happen too often. Still, this method lets you handle changes at any time and doesn't risk messing around with your core table schemas.
lets say that your friend's database has to store data values from multiple sources such as demogrphic values, diagnosis, interventions, physionomic values, physiologic exam values, hospitalisation values etc.
He might have as well to define choices, lets say his database is missing the race and the unit staff need the race of the patient (different races are more unlikely to get some diseases), they might want to use a drop down with several choices.
I would propose to use an other table that would have these choices or would you just use a "Custom_field_choices" table, which at some point is exactly the same but with a different name.
Considering that the database :
- needs to be flexible
- that data from multiple tables can be added and be customized
- that you might want to keep the integrity of the main structure of your database for distribution and uniformity purpose
- that data MUST have a limit and alarms and warnings
- that data must have units ( 10 kg or 10 pounds) ?
- that data can have a selection of choices
- that data can be with different rights (from simple user to admin)
- that these data might be needed to generate reports without modifying the code (automation)
- that these data might be needed to make cross reference analysis within the system without modifying the code
the custom table would be my solution, modifying each table would end up being too risky.
I would store those custom fields in a table where each record ( dataType, dataValue, dataUnit ) would use in one row. So there would be a relation oneToMany from one sample to the data. You can also create a table to record all the kind of cutsom types you would use. For example:
create table DataType
(
id int primary key,
name varchar(100) not null unique
description text,
uri varchar(255) //<-- can be used for an ONTOLOGY
)
create table DataRecord
(
id int primary key,
sample_id int not null,//<-- reference to the sample
dataType_id int not null, //<-- references DataType
value varchar(100),//<-- the value as string
unit varchar(50)//<-- g, mg/ml, etc... but it could also be a link to a table describing the units just like DataType
)