Pivot Way or Straight Way in SQL - sql

I have following association in pivot way.
| DOCID | Note1 | Note2 | Note3 |
|-------|-------|-------|-------|
| 1 | N11 | N21 | N31 |
| 2 | N12 | NULL | N32 |
| 3 | N13 | N23 | N33 |
| 4 | N14 | N24 | NULL |
| 5 | NULL | N25 | N35 |
Other way of storing above is as below.
| DOCID | Field | Value |
|-------|---------|-------|
| 1 | Note1 | N11 |
| 1 | Note2 | N21 |
| 1 | Note3 | N31 |
| 2 | Note1 | N12 |
| 2 | Note3 | N32 |
| 3 | Note1 | N13 |
| 3 | Note2 | N23 |
| 3 | Note3 | N33 |
| 4 | Note1 | N14 |
| 4 | Note2 | N24 |
| 5 | Note2 | N25 |
| 5 | Note3 | N35 |
which of the above two option is better.
I might have more null values. in that case 2nd option seems better. as it will have less records.
but when I have 10 million records, it will be multiplied by notes (in our case it will be (30 million - null) records).
So considering performance for fetching associated records. which option is better and why?
I will have more notes associated with DocIDs.

"Better" is often subjective. In this case, though, I think one method is generally better than the other.
The second approach is the better approach -- one row per document/note pair. In general, when you have columns that are only distinguished by a number -- but otherwise contain the same things -- then the data model is suspect. There may be good reasons for representing the data across columns, but the structure should be questioned. If you still need it, then fine.
Consider a simple query such as which ids have a particular note. In the first representation, you need to check all three columns. This makes it hard to use an index. And, it negates the value of columnar storage.
If the business changes and you suddenly want 4 notes per docid -- or want to limit them to 2 -- then the table needs to be restructured. That is an expensive process.
I'm not sure what the notes refer to. But if they represent a foreign key relationship to another table, then the pivoted version needs to maintain multiple foreign key relationships -- for essentially the same purpose.

Related

Select data from multiple existing tables dynamically

I have tables "T1" in the database that are broken down by month of the form (table_082020, table_092020, table_102020). Each contains several million records.
+----+----------+-------+
| id | date | value |
+----+----------+-------+
| 1 | 20200816 | abc |
+----+----------+-------+
| 2 | 20200817 | xyz |
+----+----------+-------+
+----+----------+-------+
| id | date | value |
+----+----------+-------+
| 1 | 20200901 | cba |
+----+----------+-------+
| 2 | 20200901 | zyx |
+----+----------+-------+
There is a second table "T2" that stores a reference to the primary key of the first one and actually to the table itself only without the word "table_".
+------------+--------+--------+--------+--------+
| rec_number | period | field1 | field2 | field3 |
+------------+--------+--------+--------+--------+
| 777 | 092020 | aaa | bbb | ccc |
+------------+--------+--------+--------+--------+
| 987 | 102020 | eee | fff | ggg |
+------------+--------+--------+--------+--------+
| 123456 | 082020 | xxx | yyy | zzz |
+------------+--------+--------+--------+--------+
There is also a third table "T3", which is the ratio of the period and the table name.
+--------+--------------+
| period | table_name |
+--------+--------------+
| 082020 | table_082020 |
+--------+--------------+
| 092020 | table_092020 |
+--------+--------------+
| 102020 | table_102020 |
+--------+--------------+
Tell me how you can combine 3 tables to get dynamic data for several periods. For example: from 15082020 to 04092020, where the data will be located in different tables, respectively
There really is no good reason for storing data in this format. It makes querying a nightmare.
If you cannot change the data format, then add a view each month that combines the data:
create view t as
select '202010' as YYYYMM, t.*
from table_102020
union all
select '202008' as YYYYMM, t.*
from table_092020
union all
. . .;
For a once-a-month effort, you can spend 10 minutes writing the code and do so with a calendar reminder. Or, better yet, set up a job that uses dynamic SQL to generate the code and run this as a job after the underlying tables are using.
What should you be doing? Well, 5 million rows a months isn't actually that much data. But if you are concerned about it, you can use table partitioning to store the data by month. This can be a little tricky; for instance, the primary key needs to include the partitioning key.

How to design tables to allow for multi-field query on one row

I am very new to database design and am using MS Access to try achieve my task. I am trying to create a database design that will allow for the name and description of two items to be queried
on a single row of information. Here is the problem: certain items are converted to other particular items -
any item can have multiple conversions performed on it, and all conversions will have two (many) items involved.
In this sense, we have a many-to-many relationship which necessitates the use of an intermediate table. My
tables must be structured in a way that allows for me to, in one row, query the Item ID's and names
of which items were involved in conversions.
My current table layout is as follows:
Items
+--------+----------+------------------+--+
| ItemID*| ItemName | ItemDescription | |
+--------+----------+------------------+--+
| 1 | DESK | WOOD, 4 LEG | |
| 2 | SHELF | WOOD, SOLID BASE | |
| 3 | TABLE | WOOD, 4 LEG | |
+--------+----------+------------------+--+
ItemConversions
+------------------+--------------+
| ConversionID(CK) | Item1_ID(CK) |
+------------------+--------------+
| 1 | 2 |
| 2 | 2 |
| 3 | 1 |
+------------------+--------------+
Conversions
+---------------+----------+----------+
| ConversionID* | Item1_ID | Item2_ID |
+---------------+----------+----------+
| 1 | 2 | 1 |
| 2 | 2 | 3 |
| 3 | 1 | 3 |
+---------------+----------+----------+
What I want is for it to be possible to achieve the kind of query I described above, though I don't think
my current layout is going to work for this, since the tables are only being joined on Item1_ID. Any advice
would be appreciated, hopefully my tables are not too specific and this is easily understandable.
A sample query output might look like this:
+--------------+----------+----------+----------+----------+
| ConversionID | Item1_ID | ItemName | Item2_ID | ItemName |
+--------------+----------+----------+----------+----------+
| 1 | 2 | SHELF | 1 | DESK |
+--------------+----------+----------+----------+----------+
I got it working how I wanted to with the help of June7's suggestion - I didn't know you could add in tables
multiple times in the query design page (very useful!). As for the tables, I edited the layout so that I have only
Items and Conversions (I deleted ItemConversions). Using the AS sql command I was able to write a query that pulls
the data I want from the tables. The table and query layout can be seen below:
Items
+--------+----------+------------------+--+
| ItemID*| ItemName | ItemDescription | |
+--------+----------+------------------+--+
| 1 | DESK | WOOD, 4 LEG | |
| 2 | SHELF | WOOD, SOLID BASE | |
| 3 | TABLE | WOOD, 4 LEG | |
+--------+----------+------------------+--+
Conversions
+---------------+----------+----------+
| ConversionID* | Item1_ID | Item2_ID |
+---------------+----------+----------+
| 1 | 2 | 1 |
| 2 | 2 | 3 |
| 3 | 3 | 1 |
+---------------+----------+----------+
Query:
SELECT
Conversions.ConversionID,
Conversions.Item1_ID,
Conversions.Item2_ID,
Items.ItemName,
Items_1.ItemName,
FROM
(
Conversions
INNER JOIN
Items
ON Conversions.Item1_ID = Items.ItemID
)
INNER JOIN
Items AS Items_1
ON Conversions.Item2_ID = Items_1.ItemID;

How to add data or change schema to production database

I am new to working with databases and I want to make sure I understand the best way to add or remove data from a database without making a mess of any related data.
Here is a scenario I am working with:
I have a Tags table, with an Identity ID column. The Tags can be selected via the web application to categorize stories that are submitted by a user. When the database was first seeded; like tags were seeded in order together. As you can see all the Campuses (cities) were 1-4, the Colleges (subjects) are 5-7, and Populations are 8-11.
If this database is live in production and the client wants to add a new Campus (City) tag, what is the best way to do this?
All the other city tags are sort of organized at the top, it seems like the only option is to insert any new tags at to bottom of the table, where they will end up taking whatever the next ID available is. I suppose this is fine because the Display category column will allow us to know which categories these new tags actually belong to.
Is this typical? Is there better ways to set up the database or handle this situation such that everything remains more organized?
Thank you
+----+------------------+---------------+-----------------+--------------+--------+----------+
| ID | DisplayName | DisplayDetail | DisplayCategory | DisplayOrder | Active | ParentID |
+----+------------------+---------------+-----------------+--------------+--------+----------+
| 1 | Albany | NULL | 1 | 0 | 1 | NULL |
| 2 | Buffalo | NULL | 1 | 1 | 1 | NULL |
| 3 | New York City | NULL | 1 | 2 | 1 | NULL |
| 4 | Syracuse | NULL | 1 | 3 | 1 | NULL |
| 5 | Business | NULL | 2 | 0 | 1 | NULL |
| 6 | Dentistry | NULL | 2 | 1 | 1 | NULL |
| 7 | Law | NULL | 2 | 2 | 1 | NULL |
| 8 | Student-Athletes | NULL | 3 | 0 | 1 | NULL |
| 9 | Alumni | NULL | 3 | 1 | 1 | NULL |
| 10 | Faculty | NULL | 3 | 2 | 1 | NULL |
| 11 | Staff | NULL | 3 | 3 | 1 | NULL |
+----+------------------+---------------+-----------------+--------------+--------+----------+
The terms "top" and "bottom" which you use aren't really applicable. "Albany" isn't at the "Top" of the table - it's merely at the top of the specific view you see when you query the table without specifying a meaningful sort order. It defaults to a sort order based on the Id or an internal ROWID parameter, which isn't the logical way to show this data.
Data in the table isn't inherently ordered. If you want to view your tags organized by their category, simply order your query by DisplayCategory (and probably by DisplayOrder afterwards), and you'll see your data properly organized. You can even create a persistent View that sorts it that way for your convenience.

SAP Business Objects Cross Table Data Duplication

I'm using Business Objects to construct a simple report on whether a unit is on or off for a given day. When constructing a vertical table, the data is correct and looks like such:
Unit ID | Status | Date
1 | On | 2016-09-10
1 | On | 2016-09-11
1 | Off | 2016-09-12
2 | Off | 2016-09-10
2 | Off | 2016-09-11
2 | On | 2016-09-12
However the cross table I've created, with columns of "date" and rows of "Unit ID" is duplicating Unit ID and having an entire row of 'On' followed by an entire row of 'Off' like:
____| 2016-09-10 | 2016-09-11 | 2016-09-12
1 | On | On | On
1 | Off | Off | Off
2 | On | On | On
2 | Off | Off | Off
instead of what it should be as:
____| 2016-09-10 | 2016-09-11 | 2016-09-12
1 | On | On | Off
2 | Off | Off | On
Any suggestions as to why it's doing this? The table isn't particularly useful if it has these duplicate rows and I can't understand why it's resulting in this odd table.
Turns out what happened is the "Status" field was a dimension type, but the cross table requires the data field to be a measure type. Simply making a new variable that was a measure equal to "Status" solved the issue.

Merge computed data from two tables back into one of them

I have the following situation (as a reduced example). Two tables, Measures1 and Measures2, each of which store an ID, a Weight in grams, and optionally a Volume in fluid onces. (In reality, Measures1 has a good deal of other data that is irrelevant here)
Contents of Measures1:
+----+----------+--------+
| ID | Weight | Volume |
+----+----------+--------+
| 1 | 100.0000 | NULL |
| 2 | 200.0000 | NULL |
| 3 | 150.0000 | NULL |
| 4 | 325.0000 | NULL |
+----+----------+--------+
Contents of Measures2:
+----+----------+----------+
| ID | Weight | Volume |
+----+----------+----------+
| 1 | 75.0000 | 10.0000 |
| 2 | 400.0000 | 64.0000 |
| 3 | 100.0000 | 22.0000 |
| 4 | 500.0000 | 100.0000 |
+----+----------+----------+
These tables describe equivalent weights and volumes of a substance. E.g. 10 fluid ounces of substance 1 weighs 75 grams. The IDs are related: ID 1 in Measures1 is the same substance as ID 1 in Measures2.
What I want to do is fill in the NULL volumes in Measures1 using the information in Measures2, but keeping the weights from Measures1 (then, ultimately, I can drop the Measures2 table, as it will be redundant). For the sake of simplicity, assume that all volumes in Measures1 are NULL and all volumes in Measures2 are not.
I can compute the volumes I want to fill in with the following query:
SELECT Measures1.ID, Measures1.Weight,
(Measures2.Volume * (Measures1.Weight / Measures2.Weight))
AS DesiredVolume
FROM Measures1 JOIN Measures2 ON Measures1.ID = Measures2.ID;
Producing:
+----+----------+-----------------+
| ID | Weight | DesiredVolume |
+----+----------+-----------------+
| 4 | 325.0000 | 65.000000000000 |
| 3 | 150.0000 | 33.000000000000 |
| 2 | 200.0000 | 32.000000000000 |
| 1 | 100.0000 | 13.333333333333 |
+----+----------+-----------------+
But I am at a loss for how to actually insert these computed values into the Measures1 table.
Preferably, I would like to be able to do it with a single query, rather than writing a script or stored procedure that iterates through every ID in Measures1. But even then I am worried that this might not be possible because the MySQL documentation says that you can't use a table in an UPDATE query and a SELECT subquery at the same time, and I think any solution would need to do that.
I know that one workaround might be to create a new table with the results of the above query (also selecting all of the other non-Volume fields in Measures1) and then drop both tables and replace Measures1 with the newly-created table, but I was wondering if there was any better way to do it that I am missing.
UPDATE Measures1
SET Volume = (Measures2.Volume * (Measures1.Weight / Measures2.Weight))
FROM Measures1 JOIN Measures2
ON Measures1.ID = Measures2.ID;