Difference between an array and single-column in SQL

For databases that support arrays -- for example, Postgres -- what would be the difference between the following two items:
`name`  `field_a` (array column)
------  ------------------------
Tom     [1, 2, 3]

And:

`name`  `field_a` (single column)
------  -------------------------
Tom     1
Tom     2
Tom     3
The above would be two 'variations' of combining two tables:

name table:
`name`
------
Tom

numbers table:
`field_a`
---------
1
2
3
If the array version and the single-column version are not interchangeable, what are the main differences between the two?

An array stores the data in a single row, and it needs some kind of processing (different for each database) before you can access, search, or sort a particular value. It reduces the size of the table, since the repeating data sits in a single row, but it ultimately costs more processing time when it comes to updating, searching, sorting, and most other operations on the data.
A single value in each row is generally preferred in databases, because it is easy to find a record, update a particular value, sort the data, and so on.
So, in my view, only inserting an array is faster than inserting the individual values, and it saves some table space; all other operations will be more time-consuming. It is therefore better to store individual values in each row.
Databases are designed to handle single values, and operations on single values are faster than operations on arrays.
A simple example of the extra complexity, taken from your question: replace 2 with 5 in field_a for name = 'Tom'.
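In Postgres syntax, for example (a sketch assuming a table t with the name and field_a columns from the question):

-- Single-column version: a plain UPDATE with a WHERE clause.
UPDATE t SET field_a = 5 WHERE name = 'Tom' AND field_a = 2;

-- Array version: the whole array has to be rewritten, here with array_replace().
UPDATE t SET field_a = array_replace(field_a, 2, 5) WHERE name = 'Tom';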

Another way of thinking about it is that an array column named column is effectively column0, column1, column2, etc., which the DB handles for you, whereas the normalized table (1st Normal Form) spreads the values across rows.
It is, however, harder to enforce a fixed-size array in a normalized structure. You can enforce a maximum by defining a third table with the numbers 0, 1, 2 and foreign-keying the child table onto it, as sketched below. You cannot enforce a minimum this way (except in certain DBMSs with database-level constraints).
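A minimal sketch of that maximum-size trick (table and column names here are illustrative):

-- The third table holds the allowed positions 0, 1, 2.
CREATE TABLE positions (
    pos INTEGER PRIMARY KEY CHECK (pos IN (0, 1, 2))
);
INSERT INTO positions (pos) VALUES (0), (1), (2);

-- The child table can now hold at most three values per name.
CREATE TABLE numbers (
    name    VARCHAR(100) NOT NULL,
    pos     INTEGER NOT NULL REFERENCES positions (pos),
    field_a INTEGER NOT NULL,
    PRIMARY KEY (name, pos)   -- each position used at most once per name
);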
Rarely are fixed-size arrays actually necessary. In the majority of cases where they are used, they simply break 1st Normal Form.

Related

SQL Database Design Approach

Approach 1 - We have different factories with the same code-base structure, but different factories store different sets of information: Factory A stores a parameter and its value (e.g. Current = 110), while Factory B stores different parameters (e.g. Voltage = 10, etc.) and their values. So we decided to go with a separate table per factory to avoid NULL values, and defined the parameters as columns (e.g. Table_For_Factory_A has columns Current, ParameterB, etc.). The values of these parameters are stored as rows, so if there are 10 parameters, one row is inserted into the table with all their values.
Approach 2 - Now we are seeing a different approach from one of our team members: keep one big table for all factories with a differentiating column (Factory Name), and store each parameter and its value as a row in that single table. So if there are 10 parameters for one factory, 10 rows are inserted, and the table grows quickly.
E.g. the table columns are Parameter, Value, FactoryName, ...
Kindly suggest which approach is better and why. We believed that keeping the tables smaller, with minimal inserts, would boost performance and reduce load on the database server. Also, in Approach 2 the parameter name is repeated every time a user updates or saves it, which is not good as per normalization. Please help us decide on an approach.
factories table
---------------
id
name
...
parameters table
----------------
id
name
...
factory_values table
--------------------
factory_id
parameter_id
value
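As DDL, that design might look like the following sketch (the column types are assumptions; add a timestamp column to factory_values if each parameter is recorded more than once):

create table factories (
    id   integer primary key,
    name varchar(100) not null
);

create table parameters (
    id   integer primary key,
    name varchar(100) not null
);

-- One row per (factory, parameter) pair; no NULL-filled columns.
create table factory_values (
    factory_id   integer not null references factories (id),
    parameter_id integer not null references parameters (id),
    value        numeric not null,
    primary key (factory_id, parameter_id)
);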

SQL Server : selecting row with column that contains multiple comma delimited values

I have a table (a) that contains imported data, and one of the values in that table needs to be joined to another table (b) based on that value. In table b, that value is sometimes part of a comma-separated list, stored as a varchar. This is the first time I have dealt with a database column that contains multiple pieces of data. I didn't design it, and I don't believe it can be changed, although I believe it should be.
For example:
Table a:
column_1
12345
67890
24680
13579
Table b:
column_1
12345,24680
24680,67890
13579
13579,24680
So I am trying to join these tables together based on this number (and two others), but when I run my query, I only get the rows that contain 13579, and none of the rest.
Any ideas how to accomplish this?
Storing lists as a comma-delimited data structure is a sign of bad design, particularly when storing ids, which are presumably integers in their native format.
Sometimes, though, working with such data is necessary. Here is a method:
select *
from a join
     b
     on ',' + b.column_1 + ',' like '%,' + cast(a.column_1 as varchar(255)) + ',%'
This will not perform particularly well, because the query will not take advantage of any indexes.
The idea is to put the delimiter (,) at the beginning and end of b.column_1. Every value in the column then has a comma before and after. Then, you can search for the match in a.column_1 with commas appended. The commas ensure that 10 does not match 100.
If possible, you should consider an alternative way to represent the data. If you know there are at most two values, you might consider having two columns in a. In general, though, you would have a "join" table, with a separate row for each pair.
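A sketch of that junction-table layout (the table and column names here are illustrative):

-- One row per (b-row, id) pair instead of a comma-delimited varchar.
create table b_values (
    b_id     int not null,  -- key of the original row in table b
    column_1 int not null,  -- a single id per row
    primary key (b_id, column_1)
);

-- The join can then use indexes normally:
select *
from a join
     b_values bv
     on bv.column_1 = a.column_1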

SQL query: have results into a table named the results name

I have a very large table I would like to split up into smaller tables. I would like to make it so that when I run a DISTINCT, it makes a table for every distinct name; the name of each table will be the data in one of the fields.
EX:
A --------- Data 1
A --------- Data 2
B --------- Data 3
B --------- Data 4
would result in 2 tables, one named A and another named B, and the entire row of data would be copied into the corresponding table.
select distinct [name] from [maintable]
-make table for each name
-select [name] from [maintable]
-copy into table name
-drop row from [maintable]
Any help would be great!
I would advise you against this.
One solution is to create indexes, so you can access the data quickly. If you have only a handful of names, though, this might not be particularly effective, because each index value would match almost all of the records.
Another solution is something called partitioning. The exact mechanism differs from database to database, but the underlying idea is the same. Different portions of the table (as defined by name in your case) would be stored in different places. When a query is looking only for values for a particular name, only that data gets read.
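For instance, a minimal sketch using declarative list partitioning (Postgres 10+ syntax; the table and column names are illustrative, other databases use different syntax):

CREATE TABLE maintable (
    name TEXT NOT NULL,
    data TEXT
) PARTITION BY LIST (name);

CREATE TABLE maintable_a PARTITION OF maintable FOR VALUES IN ('A');
CREATE TABLE maintable_b PARTITION OF maintable FOR VALUES IN ('B');

-- A query that filters on name only reads the matching partition:
SELECT * FROM maintable WHERE name = 'A';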
Generally, it is bad design to have multiple tables with exactly the same data columns. Here are some reasons:
Adding a column, changing a type, or adding an index has to be done n times instead of once.
It is very hard to enforce a primary key constraint on a column across the tables -- you lose the primary key.
Queries that touch more than one name become much more complicated.
Insertions and updates are more complex, because you have to first identify the right table. This often results in overuse of dynamic SQL for otherwise basic operations.
Although there may be some simplifications (security comes to mind), most databases have other mechanisms that are superior to splitting the data into separate tables.
What you want is
CREATE TABLE new_table
AS (SELECT .... /* the data that you want in this table */);

Sorting across a row in Microsoft Access

What I need is to re-arrange the columns in a table by the order specified in a row.
So if I had:
one four two three
1 4 2 3
How could I get:
one two three four
1 2 3 4
I have considered creating a new table, looking at each element and its neighbour individually, copying the lowest element to the new table, and repeating throughout the table until all the elements have been moved.
Would this method work?
If so is it necessary I do it in VBA (I don't have much experience with this)?
Or is there a method in SQL?
Thanks for any help.
SQL is based on the relational model of data. One of the principles of the relational model is that the order of columns is meaningless.
But if you absolutely have to do this in Access, use a query, a form, or a report. You can put the columns in any order you like in any of these three, and it won't affect the base table at all.
If the order of items is important, they are typically stored in rows, not columns. For example, a table with the fields StudentID, ExamID, and ExamDate can be sorted by StudentID and ExamDate to give a useful order, regardless of the order of entry. Furthermore, a crosstab query will allow the presentation of data in columns.
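For example, a crosstab query in Access SQL might look like the following sketch (the Exams table and its fields are assumptions, matching the example above; Access SQL does not support inline comments):

TRANSFORM First(ExamDate)
SELECT StudentID
FROM Exams
GROUP BY StudentID
PIVOT ExamID;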
If the order of columns has become important, it is nearly always an indication of an error in the table design. You may wish to read Fundamentals of Relational Database Design, Paul Litwin, 2003

Bending the rules of UNIQUE column SQLITE

I am working with an extensive amount of third-party data. Each data set has items with unique identifiers, so it is very easy for me to use a UNIQUE column in SQLite to enforce some data integrity.
Out of thousands of records, I have an id from third-party source A matching 2 unique ids from third-party source B.
Is there a way of bending the rules and allowing a duplicate entry in a unique column? If not, how should I reorganise my data to take care of this single edge case?
UPDATE:
CREATE TABLE "trainer" (
    "id"            INTEGER PRIMARY KEY AUTOINCREMENT,
    "name"          TEXT NOT NULL,
    "betfair_id"    INTEGER NOT NULL UNIQUE,
    "racingpost_id" INTEGER NOT NULL UNIQUE
);
Problem data:
Miss Beverley J Thomas http://www.racingpost.com/horses/trainer_home.sd?trainer_id=20514
Miss B J Thomas http://www.racingpost.com/horses/trainer_home.sd?trainer_id=11096
vs. Miss Beverley J. Thomas http://form.horseracing.betfair.com/form/trainer/1/00008861
Both Racingpost entries (my primary data source) match a single Betfair entry. This is the only such case (so far) out of thousands of records.
If racingpost should have had only one match, this is an error condition.
If racingpost is allowed to have two matches per id, you must either keep two ids, select one of them, or combine the data.
Since racingpost is your primary source, keeping two ids may make sense. However, if you want to improve upon that data set, combining the data or selecting the most useful record may be more accurate. The real question is how much data overlaps between the two records and, when it does overlap, whether you can detect that reliably. If the overlap is small, or you can detect an overlap reliably, then combining makes more sense. If the overlap is large and you cannot detect it reliably, then selecting the most recently updated record, or keeping two ids, is more useful.
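If two racingpost ids per trainer can be legitimate, one way to reorganise the schema (a sketch based on the table in the question) is to move racingpost_id into its own table, so a trainer can have any number of them while each id stays unique:

CREATE TABLE "trainer" (
    "id"         INTEGER PRIMARY KEY AUTOINCREMENT,
    "name"       TEXT NOT NULL,
    "betfair_id" INTEGER NOT NULL UNIQUE
);

-- One row per racingpost id; several rows may point at the same trainer.
CREATE TABLE "trainer_racingpost" (
    "racingpost_id" INTEGER PRIMARY KEY,
    "trainer_id"    INTEGER NOT NULL REFERENCES "trainer" ("id")
);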