Normalization of multiple similar tables - sql

I'm quite new to all this tech stuff, so excuse me in advance for any mistakes.
My question is about data normalization. I'm using pgAdmin 4 for this task.
I have multiple tables, one for each year, each containing multiple columns. I wish to normalize this data in order to run further queries. The data is in this form:
Table 1
| id | name1 | code1| code2 | year|
| 1 | Peter | 111 | 222 | 2007|
Table 2
| id | name1 | code1| code2 | year|
| 2 | Peter | 111 | 223 | 2008|
So my tables are similar, but with some different data each year.
I have broken them down so that I have multiple tables, each containing only one column of information:
name1_table
| id | name1 |
And I have done this for every column. Now I need to link it all together. Am I heading in the right direction, or have I gone off in a bad one?
What is the next step and, if possible, what is the code I need to use?

The easiest way to combine two tables with identical schemas is to create a new third table with the same schema and copy all the records into it.
Something like this:
INSERT INTO Table3 SELECT * FROM Table1;
INSERT INTO Table3 SELECT * FROM Table2;
Or, if you simply need a combined query result, you can use UNION (plain UNION drops duplicate rows; use UNION ALL to keep them):
SELECT * FROM Table1
UNION
SELECT * FROM Table2;
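Both options can be tried end to end in a small sketch. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for PostgreSQL, and the column types are assumptions, since the question only shows sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Column definitions are assumed; the question only shows sample rows.
schema = "(id INTEGER, name1 TEXT, code1 INTEGER, code2 INTEGER, year INTEGER)"

# Two yearly tables with identical schemas, plus an empty target table.
for name in ("table1", "table2", "table3"):
    cur.execute(f"CREATE TABLE {name} {schema}")
cur.execute("INSERT INTO table1 VALUES (1, 'Peter', 111, 222, 2007)")
cur.execute("INSERT INTO table2 VALUES (2, 'Peter', 111, 223, 2008)")

# Option 1: copy every row into the third table.
cur.execute("INSERT INTO table3 SELECT * FROM table1")
cur.execute("INSERT INTO table3 SELECT * FROM table2")
copied = cur.execute("SELECT * FROM table3 ORDER BY year").fetchall()

# Option 2: combine at query time without creating a new table.
combined = cur.execute(
    "SELECT * FROM table1 UNION ALL SELECT * FROM table2 ORDER BY year"
).fetchall()

print(copied)
print(combined)
```

Option 1 leaves a single table to index and query afterwards; option 2 leaves the yearly tables in place and has to be repeated in every query.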

You are not headed in the right direction. The best approach is simply to store all the data in one table and to use indexes and/or partitions to access particular rows.
Sometimes this is not possible, notably because the tables have different formats. Possible solutions:
- Break the existing tables into similarity sets based on columns, and create one table for each similarity set.
- Create a table based on the most recent definition of the table, NULLing out columns that don't exist in historical tables.
- Use a facility such as JSON for columns that have changed over time.
- Use a facility such as inheritance for columns that have changed over time.
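As a rough illustration of the "one table plus an index" advice, the sketch below keeps every year in a single table and filters on the year column. It uses Python's sqlite3 module rather than PostgreSQL, and the table name `records` and index name `idx_records_year` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One table for all years; `records` is a hypothetical name.
cur.execute(
    "CREATE TABLE records ("
    " id INTEGER PRIMARY KEY, name1 TEXT, code1 INTEGER,"
    " code2 INTEGER, year INTEGER NOT NULL)"
)
# An index on year lets the database find one year's rows without a full scan.
cur.execute("CREATE INDEX idx_records_year ON records (year)")

cur.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?, ?)",
    [(1, "Peter", 111, 222, 2007), (2, "Peter", 111, 223, 2008)],
)

# Rows for a single year.
rows_2008 = cur.execute(
    "SELECT id, name1, code2 FROM records WHERE year = ?", (2008,)
).fetchall()

# A cross-year question that would need a UNION with one table per year.
history = cur.execute(
    "SELECT year, code2 FROM records WHERE name1 = ? ORDER BY year", ("Peter",)
).fetchall()

print(rows_2008)
print(history)
```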

Related

Which query in database is better?

Suppose I have one table and filter by type, like the WordPress database:
// general table
id | title | content | type
---------------------------
1 | hello | some... | post
---------------------------
2 | image | con | image
select * from post where type = 'post'
or
// table post
id | title | content
--------------------
1 | hello | some...
select * from post
//table image
id | title | content
---------------------
2 | image | con
select * from image
So my question is: is it better to make more tables, or a single table for my database?
The idea in database design is to have one table per "entity" -- hence the name, "entity-relationship modeling".
It seems reasonable to think that "images" and "posts" are quite different things and should go into their own tables.
That does not mean that "more tables are better". It means that "the appropriate tables are best". In particular, it is generally a bad idea to split an entity across multiple tables.
It is always better to have one table per business entity. Business requirements keep changing, and it becomes hard to maintain the data in a single table. So Posts and Images should be 2 tables. No second thoughts.
Multiple tables are better for a large project; for a small project, the general table is better.
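To make the two designs from the question concrete, here is a small side-by-side sketch using Python's sqlite3 module. The table name `general` for the single-table design is an assumption; the question never names that table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Design A: one general table, filtered by a type column.
# `general` is a hypothetical table name.
cur.execute(
    "CREATE TABLE general (id INTEGER PRIMARY KEY, title TEXT, content TEXT, type TEXT)"
)
cur.executemany(
    "INSERT INTO general VALUES (?, ?, ?, ?)",
    [(1, "hello", "some...", "post"), (2, "image", "con", "image")],
)
posts_a = cur.execute(
    "SELECT id, title, content FROM general WHERE type = 'post'"
).fetchall()

# Design B: one table per entity, so no type filter is needed.
cur.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT, content TEXT)")
cur.execute("CREATE TABLE image (id INTEGER PRIMARY KEY, title TEXT, content TEXT)")
cur.execute("INSERT INTO post VALUES (1, 'hello', 'some...')")
cur.execute("INSERT INTO image VALUES (2, 'image', 'con')")
posts_b = cur.execute("SELECT id, title, content FROM post").fetchall()

print(posts_a)
print(posts_b)
```

Both queries return the same rows here. The difference shows up later, when posts and images start to need different columns.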

Update a MS Access field with the column count of another, not JOINed table

What I'm trying to do is create an update query in MS Access 2013 for a table separate from the actual data tables (meaning that there is no connection between the data table and the statistics table) to store some statistics (e.g. Count of records) that need to be stored for further calculations and later use.
I've looked up a bunch of tutorials in the past few days on this, with no luck of finding a solution to my problem, as all solutions included joining the tables, which - in my case - is irrelevant, as the table to-be-calculated-on is temporary with constantly changing data, thus I always want to count every record, find the max in the whole temp table, etc. on a given date (like logging).
The structure of statisticsTable:
| statDate (Date/time) | itemCount (integer) | ... |
----------------------------------------------------
| 01/01/2017 | 50 | ... |
| 02/01/2017 | 47 | ... |
| 03/01/2017 | 43 | ... |
| ... | ... | ... |
What I want to do, in semi-gibberish code:
UPDATE statisticsTable
SET itemCount = (SELECT Count(*) FROM tempTable)
WHERE statDate = 01/01/2017;
This should update the itemCount field of 01/01/2017 in the statisticsTable with the current row count of the temp table.
I know that this might not be the standard OR the correct use of MS Access or any DBMS in general, however, my assignment is rather limited, meaning I can't (shouldn't) modify any table structures, connections or the database structure in general, only create the update query that works as described above.
Is it possible to update a table's field value with the output of a query calculating on another table, WITHOUT joining the two tables in MS Access?
EDIT 1:
After further research, the function DCount() might be able to give the results I'm looking for, I will test it.
EDIT: I wrote a way more complicated answer that might not have even worked in Access (it would work in MS SQL-Server). Anyway.
What you need is a join criterion that is always true on which to base your update. You can just use IS NOT NULL:
SELECT s.*, a.itemCount
FROM statisticsTable as s
INNER JOIN
(
SELECT count(*) as itemCount
from tempTable
) as a
on s.[some field that is always populated] is not null
and a.itemCount is not null
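For comparison, the statement the question describes is valid in engines that accept a scalar subquery in SET. The sketch below runs it with Python's sqlite3 module, not Access; Access may reject this form as a non-updateable query, which is presumably why the question's edit turns to DCount(). The ISO date strings and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dates are stored as ISO text here; that format is an assumption.
cur.execute("CREATE TABLE statisticsTable (statDate TEXT PRIMARY KEY, itemCount INTEGER)")
cur.execute("CREATE TABLE tempTable (item TEXT)")
cur.executemany(
    "INSERT INTO statisticsTable VALUES (?, ?)",
    [("2017-01-01", 0), ("2017-01-02", 47)],
)
cur.executemany("INSERT INTO tempTable VALUES (?)", [("a",), ("b",), ("c",)])

# No join: the scalar subquery counts every row of tempTable,
# and the WHERE clause picks the single statistics row to overwrite.
cur.execute(
    "UPDATE statisticsTable"
    " SET itemCount = (SELECT COUNT(*) FROM tempTable)"
    " WHERE statDate = ?",
    ("2017-01-01",),
)

stats = cur.execute(
    "SELECT statDate, itemCount FROM statisticsTable ORDER BY statDate"
).fetchall()
print(stats)
```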

SQL Server 2012 Query to extract subsets of data

I'm trying to normalize some data to 2NF:
Refid | Reason
------|---------
1 | Admission
1 | Advice and Support
1 | Behaviour
As you can see, one person might have multiple reasons, so I need another table in the following format:
Refid | Reason1 | Reason2 | Reason3 | ETC...
------|-----------|--------------------|-----------
1 | Admission | Advice and Support | Behaviour
But I don't know how to write a query to extract the data and write it into a new table like this. The reasons don't have dates or other criteria that would put them in any particular order. All reasons are assigned at the time of referral.
Thanks for your help. (SQL Server 2012)
You are modelling a many-to-many relationship.
You need 3 tables:
- One for Reasons (say ReasonID and Reason)
- One for each entity identified by RefID (say RefID and ReferenceOtherData)
- A junction (or intersection) table with the keys (RefID, ReasonID)
This way:
- Multiple reasons can apply to one Ref entity
- Multiple Refs can have the same reason
You turn repeated columns into rows.
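The three tables can be built from the flat data roughly as follows. This is a sketch in Python's sqlite3 module rather than SQL Server 2012, and the table names `flat`, `Refs`, `Reasons` and `RefReasons` are hypothetical; only the column names come from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
-- The original flat data from the question (table name assumed).
CREATE TABLE flat (Refid INTEGER, Reason TEXT);
INSERT INTO flat VALUES (1, 'Admission'), (1, 'Advice and Support'), (1, 'Behaviour');

CREATE TABLE Refs (RefID INTEGER PRIMARY KEY, ReferenceOtherData TEXT);
CREATE TABLE Reasons (ReasonID INTEGER PRIMARY KEY, Reason TEXT UNIQUE);
CREATE TABLE RefReasons (
    RefID INTEGER NOT NULL REFERENCES Refs (RefID),
    ReasonID INTEGER NOT NULL REFERENCES Reasons (ReasonID),
    PRIMARY KEY (RefID, ReasonID)
);

-- One row per distinct referral and per distinct reason.
INSERT INTO Refs (RefID) SELECT DISTINCT Refid FROM flat;
INSERT INTO Reasons (Reason) SELECT DISTINCT Reason FROM flat;

-- The junction table pairs each referral with each of its reasons.
INSERT INTO RefReasons (RefID, ReasonID)
SELECT f.Refid, r.ReasonID
FROM flat AS f
JOIN Reasons AS r ON r.Reason = f.Reason;
""")

reasons_for_1 = cur.execute(
    "SELECT r.Reason FROM RefReasons AS rr"
    " JOIN Reasons AS r ON r.ReasonID = rr.ReasonID"
    " WHERE rr.RefID = 1 ORDER BY r.Reason"
).fetchall()
print(reasons_for_1)
```

A referral with a fourth or fifth reason only adds rows to the junction table; no Reason4 or Reason5 column is ever needed.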

How to bond N database table with one master-table?

Let's assume that I have N tables for N bookstores. I have to keep data about books in a separate table for each bookstore, because each table has a different schema (the number and types of columns differ); however, there is a set of columns common to all the bookstore tables.
Now I want to create one "MasterTable" with only few columns.
| MasterTable |
| id | title| isbn|
| 1 | abc | 123 |
| MasterToBookstores |
|m_id | tb_id | p_id |
| 1 | 1 | 2 |
| 1 | 2 | 1 |
| BookStore_Foo |
|p_id| title| isbn| date | size|
| 1 | xyz | 456 | 1998 | 3KB |
| 2 | abc | 123 | 2003 | 4KB |
| BookStore_Bar |
|p_id| title| isbn| publisher | Format |
| 1 | abc | 123 | H&K | PDF |
| 2 | mnh | 986 | Amazon | MOBI |
My question: is it right to keep data this way? What are the best practices for this and similar cases? Can I give each Bookstore table a numeric alias, which would help me manage the whole set of tables?
Is there a better way of doing such thing?
I think you are confusing the concepts of "store" and "book".
From your comments and the example data, it appears the problem is in having different sets of attributes for books, not stores. If so, you'll need a structure similar to this:
[Diagram: BOOK as the base entity with subtypes BOOK1, BOOK2 and BOOK3]
The symbol in the diagram denotes inheritance [1]. BOOK is the "base class" and BOOK1/BOOK2/BOOK3 are various "subclasses" [2]. This is a common strategy when entities share a set of attributes or relationships [3]. For a fuller explanation of this concept, please search for "Subtype Relationships" in the ERwin Methods Guide.
Unfortunately, inheritance is not directly supported by most current relational databases, so you'll need to transform this hierarchy into plain tables. There are generally 3 strategies for doing so, as described in these posts:
- Interpreting ER diagram
- Parent and Child tables - ensuring children are complete
- Supertype-subtype database design
NOTE: The structure above allows various book types to be mixed inside the same bookstore. Let me know if that's not desirable (i.e. you need exactly one type of book in any given bookstore)...
[1] Aka. category, subclassing, subtyping, generalization hierarchy etc.
[2] I.e. types of books, depending on which attributes they require.
[3] In this case, books of all types are in the many-to-many relationship with stores.
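One common way to flatten such a hierarchy into plain tables is a base table plus one table per subtype that shares the base table's key. The sketch below shows that approach in Python's sqlite3 module; it is not necessarily the variant the linked posts recommend, and the names `book`, `book_sized`, `book_published` and `pub_year` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign key checks off by default
cur = conn.cursor()

cur.executescript("""
-- Base table: the attributes every book has.
CREATE TABLE book (
    book_id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    isbn TEXT NOT NULL
);
-- One table per subtype, keyed by the base table's id.
CREATE TABLE book_sized (
    book_id INTEGER PRIMARY KEY REFERENCES book (book_id),
    pub_year INTEGER,
    size TEXT
);
CREATE TABLE book_published (
    book_id INTEGER PRIMARY KEY REFERENCES book (book_id),
    publisher TEXT,
    format TEXT
);

INSERT INTO book VALUES (1, 'abc', '123'), (2, 'mnh', '986');
INSERT INTO book_sized VALUES (1, 2003, '4KB');
INSERT INTO book_published VALUES (2, 'Amazon', 'MOBI');
""")

# LEFT JOINs rebuild a wide view; attributes a book lacks come back as NULL.
rows = cur.execute("""
    SELECT b.title, s.size, p.format
    FROM book AS b
    LEFT JOIN book_sized AS s ON s.book_id = b.book_id
    LEFT JOIN book_published AS p ON p.book_id = b.book_id
    ORDER BY b.book_id
""").fetchall()
print(rows)
```

The many-to-many link between books and stores from footnote [3] would be a separate junction table and is left out here.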
If at least two columns are used by all the other tables, then you could have a base table for all books and add more tables for the rest of the data, using the id from the base table.
UPDATE:
If you use Entity Framework to connect to your DB, I suggest you try this:
Create your entity model, something like this:
[Image: entity model diagram]
Then let Entity Framework generate the database for you (Update database from Model). Note that this uses inheritance in the model, not in the database.
Let me know if you have questions.
Suggest data model:
1. Have a master database, which stores the master data.
2. The dimension tables in the master database are replicated to your distributed bookstore databases using transactional replication.
3. You can use updatable subscribers; merge replication is also a good choice.
4. Each distributed bookstore database still works independently, while master data is merged back either by merge replication or by updatable subscribers.
5. If you want to guarantee master data integrity, you can use read-only subscribers and transactional replication to distribute master data to the distributed databases. In this design you need stored procedures in the master database to register your dimension data. Make sure there is no double-hop issue.
I would suggest you to have two tables:
bookStores:
id name someMoreColumns
books:
id bookStore_id title isbn date publisher format size someMoreColumns
It's easy to see the relationship here: a bookStore has many books.
Note that I'm putting all the columns from all of your BookStore tables into just one table, even if a row from some table has no value for some column.
Why I prefer this way:
1) Across all the data from the BookStore tables, only a few columns will never have a value in the books table (for example, size and format if you don't have an e-book version). The other columns can be filled in later (you can set a date for your e-books, but that column is missing from BookStore_Bar, which seems to hold the e-books). This way you can keep much more detailed information about all your books if you ever want to update it.
2) If you have a bunch of BookStore tables, let's say 12, you will not be able to handle your data easily. If you want to run a query against all your books (which means against all your tables), you have at least three options:
First: run the query manually against each of the 12 tables and then merge the results;
Second: write one query with 12 joins, or list all 12 tables in the FROM clause, to query all your data;
Third: depend on some script, stored procedure or software to do the first or second option for you.
I like to be able to work with my data as easily as possible, without depending on some other script or software unless I really need it.
3) In MySQL (which I know best) you can partition your books table. Partitioning is an advanced data management feature that distributes a table's data across several files on disk instead of the single file a table normally uses. It is very useful when handling a large amount of data in one table, and it can speed up queries, depending on how the data is distributed. Let's see an example:
Let's say you already have 12 distinct bookStores, but under my database model. Each row in your books table is associated with one of the 12 bookStores. If you partition your data by bookStore_id, it will be almost the same as having 12 tables, because you can create a partition for each bookStore_id, so each partition holds only the rows that match its bookStore_id.
Let's say you want to query the books table for bookStore_id in (1, 4, 9). If the query only needs those three partitions to produce the desired output, the others will not be read, and it will be about as fast as querying each separate table.
You can drop a partition without affecting the others. You can add new partitions to handle new bookStores. You can subpartition a partition. You can merge two partitions. In a nutshell, you can turn your single books table into an easy-to-handle, multi-storage table.
Side Effects:
1) I don't know everything about table partitioning, so refer to the documentation to learn the important points of creating and managing partitions.
2) Protect your data with regular backups (dumps), as the books table will probably become very large.
I hope it helps you!
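The two-table model can be sketched as below, using the sample rows from the question. The sketch uses Python's sqlite3 module, which has no table partitioning, so a plain index on bookStore_id stands in for the MySQL partitions; the index name and the store names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
CREATE TABLE bookStores (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE books (
    id INTEGER PRIMARY KEY,
    bookStore_id INTEGER NOT NULL REFERENCES bookStores (id),
    title TEXT, isbn TEXT, date INTEGER,
    publisher TEXT, format TEXT, size TEXT
);
-- Stand-in for partitioning by bookStore_id (SQLite has no partitions).
CREATE INDEX idx_books_store ON books (bookStore_id);

INSERT INTO bookStores VALUES (1, 'Foo'), (2, 'Bar');
-- Columns a store does not track are simply left NULL.
INSERT INTO books (bookStore_id, title, isbn, date, size)
VALUES (1, 'xyz', '456', 1998, '3KB'), (1, 'abc', '123', 2003, '4KB');
INSERT INTO books (bookStore_id, title, isbn, publisher, format)
VALUES (2, 'abc', '123', 'H&K', 'PDF'), (2, 'mnh', '986', 'Amazon', 'MOBI');
""")

# One query covers every store; no per-store tables to join or merge.
stores_with_abc = cur.execute(
    "SELECT s.name FROM books AS b"
    " JOIN bookStores AS s ON s.id = b.bookStore_id"
    " WHERE b.isbn = '123' ORDER BY s.name"
).fetchall()

# Restricting to selected stores is a filter on bookStore_id.
foo_titles = cur.execute(
    "SELECT title FROM books WHERE bookStore_id IN (1) ORDER BY title"
).fetchall()

print(stores_with_abc)
print(foo_titles)
```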

How do I relate multiple rows in two tables?

I have two tables:
table1:
id(int) | stuff(text)
-------------------------
1 | foobarfoobarfoo
2 | blahfooblah
3 | foo
table2:
id(int) | otherstuff(text)
--------------------------
1 | foo
2 | bar
3 | blah
A row in table1 can have more than one of foo, bar etc. And, each row in table2 can appear in more than one row of table1.
Which is the better way of keeping this straight? Should I create a third table like this:
table3:
id_from2(int) | id_from1(int)
-----------------------------
1 | 1
1 | 2
1 | 3
2 | 1
3 | 2
Or should I add a column of type array to table1 and table2 to keep track of the same information?
Yes, using junction tables is the correct way of implementing many-to-many relations in RDBMS.
You can add more attributes to your junction table (i.e. table3) if necessary. For example, if the relations are ordered, you can add a third field that specifies an ordering of the (table1, table2) combinations. Here is a link to an answer on Stack Overflow that gives a nice detailed example of a many-to-many table.
This is a standard many-to-many design; the most flexible solution would be a third table with id associations, as you have shown.
Can't agree more. Your design of adding a third table is correct.
A relation table is the best way to model many-to-many relationships. You did it well.
So you want a many-to-many relationship? One row from table 1 can relate to several rows in table 2, and the other way around? Yes, use a third table like you said; it's a best practice. Also add an auto-increment primary key column, just to be safe.
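The junction table from the question can be exercised in both directions with a short sketch in Python's sqlite3 module. The composite primary key on table3 is a choice made for this sketch; the last answer suggests a separate auto-increment key instead, and either approach works.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, stuff TEXT);
CREATE TABLE table2 (id INTEGER PRIMARY KEY, otherstuff TEXT);
-- Junction table: each (id_from2, id_from1) pair may appear only once.
CREATE TABLE table3 (
    id_from2 INTEGER NOT NULL REFERENCES table2 (id),
    id_from1 INTEGER NOT NULL REFERENCES table1 (id),
    PRIMARY KEY (id_from2, id_from1)
);

INSERT INTO table1 VALUES (1, 'foobarfoobarfoo'), (2, 'blahfooblah'), (3, 'foo');
INSERT INTO table2 VALUES (1, 'foo'), (2, 'bar'), (3, 'blah');
INSERT INTO table3 VALUES (1, 1), (1, 2), (1, 3), (2, 1), (3, 2);
""")

# Direction 1: which table2 entries belong to table1 row 1?
parts_of_row1 = cur.execute(
    "SELECT t2.otherstuff FROM table3 AS t3"
    " JOIN table2 AS t2 ON t2.id = t3.id_from2"
    " WHERE t3.id_from1 = 1 ORDER BY t2.id"
).fetchall()

# Direction 2: which table1 rows contain 'foo' (table2 id 1)?
rows_with_foo = cur.execute(
    "SELECT t1.id FROM table3 AS t3"
    " JOIN table1 AS t1 ON t1.id = t3.id_from1"
    " WHERE t3.id_from2 = 1 ORDER BY t1.id"
).fetchall()

print(parts_of_row1)
print(rows_with_foo)
```

With array columns on both tables, the same fact would be stored twice and the two copies would have to be kept in sync by hand.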