In PostgreSQL, efficiently using a table for every row in another table

I am sorry for the imprecise terminology in my question, but I am not too familiar with SQL. Despite searching the internet for a good number of hours, I could not find how to do what I want efficiently, though that may be because I do not know the right terms. Here is the question:
I want to create a table, say Forms, in which each Form row has an ID, some metadata, and a pointer(?) to that form's own table, let's say Form12, which directs me to the Form12 table. I need this because every form has a different number of columns, with different names and types, depending on the user's configuration for that particular form.
So I thought I could put the table ID of Form12 as a column in the Forms table. Is this approach considered OK, or is there a better way to do it?
Thank you for your time.

Storing the names of tables in a column is generally not a good solution in a relational database: to use that information at all, you need dynamic SQL.
I would instead ask why you cannot store the information in a single table or a well-defined set of tables. Postgres has lots of options to help with this:
NULL data values, so columns do not need to be filled in.
Table inheritance, so tables can share columns.
JSON columns to support a flexible set of columns (see the sketch after this list).
Entity-attribute-value (EAV) data models, which allow for lots of flexibility.
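Of these, a JSON column is often the simplest fit for "each form has its own columns." A minimal sketch, assuming a reasonably recent Postgres (jsonb, 9.4+); the table and field names here are illustrative, not from the question:

```sql
-- One Forms table; the per-form, user-configured fields live in a
-- flexible jsonb column instead of a separate table per form.
CREATE TABLE forms (
    form_id    serial PRIMARY KEY,
    form_name  text NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now(),
    fields     jsonb NOT NULL DEFAULT '{}'
);

INSERT INTO forms (form_name, fields)
VALUES ('Form12', '{"age": 42, "city": "Oslo"}');

-- ->> extracts a jsonb value as text; ? tests for key existence.
SELECT form_id, fields->>'city' AS city
FROM forms
WHERE fields ? 'city';
```

A GIN index on the jsonb column (`CREATE INDEX ON forms USING gin (fields);`) keeps such key lookups efficient as the table grows.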

Related

one table with a lot of rows or a lot of tables with a view? on SQL Server

My question is about what is more efficient when querying and inserting, since the number of records (rows) in my table will grow a lot.
I would like to know which is more efficient: placing all the data in a single table, or partitioning it across tables and using a view and trigger to read and insert records.
As already mentioned, take a look at database normalization.
SQL is a way to work with relational databases and is built on the idea that we should have many tables linked to each other through relationships. I therefore recommend multiple tables, because you will be able to reuse data (for example a user's name and surname) through IDs rather than copying that data every time a user performs some action on your platform and you need to insert or update some information; a minimal sketch follows.
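Here is that idea with invented table names (Postgres-flavored SQL):

```sql
-- User details are stored once and referenced by ID, rather than
-- copied into every action row.
CREATE TABLE users (
    user_id serial PRIMARY KEY,
    name    text NOT NULL,
    surname text NOT NULL
);

CREATE TABLE user_actions (
    action_id   serial PRIMARY KEY,
    user_id     int NOT NULL REFERENCES users (user_id),
    action_type text NOT NULL,
    created_at  timestamptz NOT NULL DEFAULT now()
);

-- Reuse the stored name via a join instead of duplicating it.
SELECT u.name, u.surname, a.action_type, a.created_at
FROM user_actions a
JOIN users u USING (user_id);
```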
Hope this helps!

Best Practice - Should I make one table or two for two similar sets of data?

I need a table to store types of tests. I've been provided with two Excel spreadsheets: one for microbial tests, one for pathogens. Microbial has 5 columns and Pathogens has 10; the 5 Microbial columns appear in both, so one table simply has 5 extra columns.
Just to give you an idea, the table columns would be something like this:
**Microbial**
Test, Method, IncubationStage1
**Pathogens**
Test, Method, IncubationStage1, IncubationStage2, Enrichment
So is it better to have one table for Microbial and one for Pathogens, or one table for Tests that holds both? Is it bad to have Microbial rows in a table where I know for certain only half the columns will be used? Or is it better to keep related items in the same table and separate them with a "Type" column?
Obviously both will work fine, but I'm wondering which is better.
The answer to these sorts of questions is always "it depends."
In my opinion, if you think you'll ever want to aggregate the data by test or by method across pathogenic and microbial types, then you should certainly put the data in the same table with an additional column that differentiates them.
You also could potentially better "normalize" your tables like this:
Table1: ExperimentID_PK, ExperimentTypeID_FK, Test, Method
Table2: MeasurementRecordID_PK, ExperimentID_FK, Timestamp, other metadata about the record
Table3: MeasurementID_PK, MeasurementTypeID_FK, MeasurementValue, MeasurementRecordID_FK
Table4: MeasurementTypeID_PK, metadata about measurement types
Table5: ExperimentTypeID_PK, metadata about experiment types
... where all the leaf data elements point back to their parent data elements through foreign keys, and then you'd join data together in SQL statements, with indexes applied for optimal performance based on the types of queries you wanted to make. Obviously one of your rows in the question would end up appearing as multiple rows across multiple tables in this schema, and only at query time could they conceivably be reunited into individual rows (e.g. bound by MeasurementRecordID).
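As a hedged sketch of what that five-table layout could look like in DDL (Postgres-flavored; all names and types are illustrative):

```sql
CREATE TABLE experiment_types (
    experiment_type_id serial PRIMARY KEY,
    description        text NOT NULL           -- metadata about experiment types
);

CREATE TABLE experiments (
    experiment_id      serial PRIMARY KEY,
    experiment_type_id int NOT NULL REFERENCES experiment_types,
    test               text NOT NULL,
    method             text NOT NULL
);

CREATE TABLE measurement_records (
    measurement_record_id serial PRIMARY KEY,
    experiment_id         int NOT NULL REFERENCES experiments,
    recorded_at           timestamptz NOT NULL -- plus other record metadata
);

CREATE TABLE measurement_types (
    measurement_type_id serial PRIMARY KEY,
    name                text NOT NULL          -- e.g. 'IncubationStage2'
);

CREATE TABLE measurements (
    measurement_id        serial PRIMARY KEY,
    measurement_type_id   int NOT NULL REFERENCES measurement_types,
    measurement_record_id int NOT NULL REFERENCES measurement_records,
    measurement_value     text NOT NULL
);

-- Reunite one record's measurements into rows at query time,
-- bound by measurement_record_id as described above.
SELECT e.test, e.method, mt.name AS measurement, m.measurement_value
FROM measurements m
JOIN measurement_records mr USING (measurement_record_id)
JOIN experiments e USING (experiment_id)
JOIN measurement_types mt USING (measurement_type_id)
WHERE mr.measurement_record_id = 1;
```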
But there are other patterns too: in NoSQL land, normalization can be the enemy. Slicing and dicing data sets turns out to be easier in some domains if the data is stored in a more bloated format that makes query structures more obvious. So it comes down to thinking through your use cases.

multiple different record types within same file must be converted to sql table

One for the SQL data definition gurus:
I have a mainframe file that contains about 35-100 different record types. Depending on the record type, the layout is redefined and any column can become a different length or type. I am not really wanting to split this thing into 35-100 different tables and relate them together. I did find that Postgres has %ROWTYPE with cursor- or table-based records, but in all the examples the data looked the same. How can I set up a table that would handle this, and what SQL queries would be needed to return the data? It doesn't have to be Postgres, but that was the only thing I could find that looked similar to my problem.
I would just make a table with all TEXT fields at first. TEXT is variable-length, so it only takes up the space it needs, and it performs very well. From there, you may find it quicker to move the data into better-formed tables, if certain data is better served by a more specific data type.
It's easier to do it in this order because bulk insert with COPY is very picky, so with TEXT you just worry about the number of columns and get the data in there. A sketch follows.
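A sketch of that workflow, with invented names (Postgres syntax; COPY FROM a server-side path needs appropriate permissions, or use \copy from psql):

```sql
-- Stage everything as TEXT so COPY only has to match column count,
-- not types.
CREATE TABLE staging (
    record_type text,
    col1        text,
    col2        text,
    col3        text
);

COPY staging FROM '/tmp/mainframe_export.csv' WITH (FORMAT csv);

-- Later, move one record type into a properly typed table.
CREATE TABLE type_a_records (
    id     int,
    amount numeric,
    note   text
);

INSERT INTO type_a_records (id, amount, note)
SELECT col1::int, col2::numeric, col3
FROM staging
WHERE record_type = 'A';
```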
EDIT: I'm referring to Postgres with this answer. Not sure if you wanted another DB specific answer.

SQL lookup in SELECT statement

I've got an SQL Express database I need to extract some data from. I have three fields: ID, NAME, DATA. In the DATA column there are values like "654;654;526" - yes, semicolons included. Those numbers relate to another table with two fields, ID and NAME; the numbers in the DATA column relate to the ID field in the second table. How can I do a replace or lookup in SQL so that instead of getting "654;654;526" I get the NAME values instead?
See the photo; it might explain this better:
http://i.stack.imgur.com/g1OCj.jpg
Redesign the database unless this is a third-party database you are supporting. This will never be a good design and should never have been built this way. This is one of those times you bite the bullet and fix it before things get worse, which they will. You need a related table to store the values in. One of the very first rules of database design is never to store more than one piece of information in a field.
And hopefully those aren't your real field names; they are atrocious too. You need more descriptive field names.
Since it is a third-party database, you need to look up a split function or create your own. You will want to transform the data to a relational form in a temp table or table variable to use in the join later.
The following may help: How to use GROUP BY to concatenate strings in SQL Server?
This can be done, but it won't be nice. You should create a scalar-valued function that takes in the string of IDs and returns a string of names.
This denormalized structure is similar to the way values were stored in the quasi-object-relational database known as PICK. Cool database, in many respects ahead of its time, though in other respects, a dinosaur.
If you want to return the multiple names as a delimited string, it's easy to do with a scalar function. If you want to return the multiple rows as a table, your engine has to support functions that return a TABLE type. A sketch of the scalar approach follows.
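A hedged sketch of the scalar-function route in T-SQL; the table and column names are invented (this assumes the second table is called dbo.Lookup with ID and NAME fields, and the first dbo.MainTable):

```sql
-- Walks the semicolon-separated ID list and builds a
-- semicolon-separated list of the matching names.
CREATE FUNCTION dbo.IdsToNames (@ids varchar(max))
RETURNS varchar(max)
AS
BEGIN
    DECLARE @result varchar(max);
    DECLARE @id varchar(20), @pos int;
    SET @result = '';

    WHILE LEN(@ids) > 0
    BEGIN
        SET @pos = CHARINDEX(';', @ids);
        IF @pos = 0
        BEGIN
            SET @id = @ids; SET @ids = '';
        END
        ELSE
        BEGIN
            SET @id = LEFT(@ids, @pos - 1);
            SET @ids = SUBSTRING(@ids, @pos + 1, LEN(@ids));
        END

        SELECT @result = @result
            + CASE WHEN @result = '' THEN '' ELSE ';' END
            + NAME
        FROM dbo.Lookup
        WHERE ID = CAST(@id AS int);
    END

    RETURN @result;
END;
GO

-- Usage: names instead of raw IDs.
SELECT ID, dbo.IdsToNames(DATA) AS NAMES
FROM dbo.MainTable;
```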

Sql Design Question

I have a table with 25 columns where 20 columns can have null values for some (30-40%) rows.
Now what is the cost of having rows with 20 null columns? Is this OK?
Or
is it a good design to have another table store those 20 columns, with a reference back to the first table?
That way I will only write to the second table when there are values.
I am using SQL server 2005. Will migrate to 2008 in future.
Only 20 columns are varchar; the rest are smallint and smalldatetime.
What I am storing:
These columns store different attributes of the row they belong to. These attributes can sometimes be null.
The table will hold around a billion rows.
Please comment.
You should describe the type of data you are storing. It sounds like some of those columns should be moved to another table.
For example, if you have several columns that represent repeated values of the same type of data, then I would say move them to another table. On the other hand, if you need this many columns to describe different types of data, then you may need to keep the table as it is.
So it kind of depends on what you are modelling.
Are there some circumstances where some of those columns are required? If so, then perhaps you should use some form of inheritance. For instance, if this were information about patients in a hospital, and there was some data that only made sense for female patients, then you could create a FemalePatients table with those columns. Those columns that must always be collected for female patients could then be declared NOT NULL in that separate table.
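A sketch of that inheritance pattern (hypothetical hospital schema; all names invented):

```sql
-- Attributes that only apply to female patients move to a 1:1
-- child table, where they can be declared NOT NULL.
CREATE TABLE Patients (
    PatientID int PRIMARY KEY,
    Name      varchar(100) NOT NULL,
    Sex       char(1) NOT NULL
);

CREATE TABLE FemalePatients (
    PatientID     int PRIMARY KEY REFERENCES Patients (PatientID),
    LastMammogram smalldatetime NOT NULL,
    Gravida       smallint NOT NULL
);
```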
It depends on the data types (40 nullable ints will take basically the same space as 40 non-nullable ints, regardless of the values). In SQL Server, the storage is fairly space-efficient with ordinary techniques, and in 2008 you also have the SPARSE feature.
If you do split the table vertically with an optional 1:1 relationship, there is a possibility of wrapping the two tables with a view and adding triggers on the view to make it updatable and hide the underlying implementation.
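A minimal sketch of that view-plus-trigger idea in T-SQL, assuming the wide table has been split into hypothetical Core and Extra tables joined 1:1 on ID:

```sql
CREATE VIEW dbo.WideView
AS
SELECT c.ID, c.ColA, e.ColB
FROM dbo.Core c
LEFT JOIN dbo.Extra e ON e.ID = c.ID;
GO

-- The LEFT JOIN makes the view non-updatable directly, so an
-- INSTEAD OF trigger routes inserts to the underlying tables.
CREATE TRIGGER dbo.WideView_Insert
ON dbo.WideView
INSTEAD OF INSERT
AS
BEGIN
    INSERT INTO dbo.Core (ID, ColA)
    SELECT ID, ColA FROM inserted;

    -- Only write the optional table when there are values.
    INSERT INTO dbo.Extra (ID, ColB)
    SELECT ID, ColB FROM inserted
    WHERE ColB IS NOT NULL;
END;
GO
```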
So there are plenty of options, many of which can be implemented after you see the data load and behavior.
Create tables based on the distinct sets of attributes you have. So if some of your columns do not apply to some of the data, it would make sense to put that data in a table which doesn't have those columns. As far as possible, avoid repeating the same attribute in multiple tables. Make sure your data is in at least Boyce-Codd / 5th Normal Form and you won't go far wrong.