Is it possible to convert a table with many columns into several tables of two columns without losing data?
I will show what I mean:
Let's say I have a table:
+------------+----------+-------------+
|country code| site | advertiser |
+------------+----------+-------------+
| US | facebook | Cola |
| US | yahoo | Pepsi |
| FR | facebook | BMW |
| FR | yahoo | BMW |
+------------+----------+-------------+
The number of rows equals (number of countries) × (number of sites), and the advertiser column takes its value from a limited list of advertisers.
Is it possible to transform the 3 columns table to several tables with 2 columns without losing data?
If I create two tables like this I will surely lose data:
+------------+------------+
|country code| advertiser |
+------------+------------+
| US | Cola,Pepsi |
|-------------------------|
| FR | BMW |
+-------------------------+
+------------+------------+
| site | advertiser |
+------------+------------+
| facebook | Cola,BMW |
|-------------------------|
| yahoo | Pepsi,BMW |
+-------------------------+
But if I add a third "connection" table, will that help keep all the data and make it possible to recreate the original table?
+--------------+--------------------+
| country code | site |
+--------------+--------------------+
| US | facebook,yahoo |
|-----------------------------------|
| FR | facebook,yahoo |
+-----------------------------------+
Whether the table you specify can be 'converted' into multiple tables is determined by whether the table is in fifth normal form, i.e. whether every non-trivial join dependency in it is implied by the candidate keys.
If the table is in fifth normal form then it cannot be decomposed any further without loss. If the table is not in fifth normal form then it is in one of the lower normal forms and can be further normalized into fifth normal form by 'converting' it into multiple tables.
A table's normal form is determined by the column dependencies. These are determined by the meaning of the table, i.e. what the table represents in the real world. You have not stated what the meaning of this table is, so whether this particular table can be converted into multiple tables is unknown.
You need to understand the process of normalization; using it, together with the column dependencies in your table, you should be able to determine whether it is possible to convert a table with many columns into several tables of two columns without losing data.
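To make the join-dependency point concrete, here is a hedged sketch (generic SQL; the projection and column names are illustrative, not taken from your schema). Each two-column projection stores one value pair per row rather than a comma-separated list, and the three-way join tries to rebuild the original table:
-- Illustrative names; each projection table holds one pair per row.
SELECT cs.country_code, cs.site, ca.advertiser
FROM country_site AS cs
INNER JOIN country_advertiser AS ca ON ca.country_code = cs.country_code
INNER JOIN site_advertiser AS sa ON sa.site = cs.site
                                AND sa.advertiser = ca.advertiser;
With your sample rows this particular three-way join happens to return exactly the original four rows, but whether it is guaranteed to do so for every set of rows the table may ever hold is precisely the join-dependency question described above.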
You may be looking for Entity-Attribute-Value (EAV). It is certainly much better than your proposal for keeping values organized, and it does not require searching a delimited field to determine whether a value is present.
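For illustration only, a minimal EAV sketch (generic SQL; the names are hypothetical, and EAV brings trade-offs of its own):
-- Hypothetical EAV layout; entity_id identifies one row of the original wide table.
CREATE TABLE placement_attributes (
    entity_id int,
    attribute varchar(50),   -- 'country code', 'site' or 'advertiser'
    value     varchar(50)
);
-- The original row (US, facebook, Cola) becomes three attribute/value rows:
INSERT INTO placement_attributes VALUES (1, 'country code', 'US');
INSERT INTO placement_attributes VALUES (1, 'site', 'facebook');
INSERT INTO placement_attributes VALUES (1, 'advertiser', 'Cola');
-- "Which entities advertise Cola?" is an equality test, not a search inside a delimited string:
SELECT entity_id
FROM placement_attributes
WHERE attribute = 'advertiser' AND value = 'Cola';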
I am doing an integration with a customer's ERP. The database tables are designed so that columns sharing the same name in any table must have the same data type.
With this premise, I would like to write a SQL statement or a stored procedure that pulls data from several source tables, in a given order, always matching the column names, into two target tables. As it is highly likely that the ERP vendor will add new columns without notifying my department, I need the column lists to be obtained dynamically.
All this is to generate a single record in one table (in this case, the header data of a purchase to a supplier), and several rows in another table (the items of the purchase).
My idea is to have an auxiliary table where I put the information coming from my system, and then, execute that SQL/procedure to consolidate the information into the ERP purchase tables.
Let's take an example.
My tables would have information similar to this
(Purchase header)
ExternalOrderId | SupplierCode | PurchaseDate | PurchaseStatus | FiscalYear | Series
--------------------------------------------------------------------------------------------
ABCD | 00001 | 2021-12-11 12:00:00 | DRAFT | 2021 | S
(Purchase items)
ExternalOrderId | ArticleCode | ItemOrder | Units
--------------------------------------------------
ABCD | 1234 | 1 | 2
ABCD | 2345 | 5 | 4
ABCD | 3456 | 10 | 10
ABCD | 1234 | 15 | 3 (very important, same article can be repeated multiple times in one purchase)
.....
ABCD | 9999 | 100 | 10
A very important step is to take the fiscal year, series and number from a table of counters; the counter must be incremented after the process (a sketch of that increment follows the example below).
Example of the "Counters" table (note that there may be several numbers for one type, depending on the series and the fiscal year):
Type | FiscalYear | Series | LastNumber
----------------------------------------------------
SupplierPurchase | 2021 | S | 26
SupplierPurchase | 2021 | A | 60
SupplierPurchase | 2021 | B | 15
SaleOrder | 2021 | S | 19
SaleOrder | 2021 | X | 200
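A hedged sketch of that increment, assuming SQL Server (T-SQL) and the Counters columns shown above; the OUTPUT clause bumps the counter and hands back the claimed number in a single atomic statement, which matters if two purchases are consolidated concurrently:
-- T-SQL assumed; column names taken from the Counters example above.
DECLARE @claimed TABLE (Number int);

UPDATE Counters
SET LastNumber = LastNumber + 1
OUTPUT inserted.LastNumber INTO @claimed (Number)
WHERE Type = 'SupplierPurchase'
  AND FiscalYear = 2021
  AND Series = 'S';

-- @claimed now holds 27 for the sample data; use it as the purchase Number.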
Table "Accounting data".
SupplierCode | AdditionalColumn1 | AdditionalColumn2 | AdditionalColumn3
-------------------------------------------------------------------------
00001 | AC1A | AC2A | AC3A
Table "Company data".
SupplierCode | AdditionalColumn2 | AdditionalColumn3 | AdditionalColumn4
-------------------------------------------------------------------------
00001 | AC2B | AC3B | AC4B
Table "Supplier data".
SupplierCode | AdditionalColumn3 | AdditionalColumn5
-----------------------------------------------------
00001 | AC3C | AC5C
In this case the result should be something like this: for columns with the same name, the data coming from the last table read should be kept. For example, AdditionalColumn1 will have the value from the first table (AC1A), because it is the only table with that column name, and AdditionalColumn3 will take the data from the last one (AC3C).
The final result should look something like this:
Purchase Header
FiscalYear | Series | Number | SupplierCode | AdditionalColumn1 | AdditionalColumn2 | AdditionalColumn3 | AdditionalColumn4 | AdditionalColumn5 | PurchaseStatus | PurchaseDate | ExternalPurchaseID
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2021 | S | 27 | 00001 | AC1A | AC2B | AC3C | AC4B | AC5C | DRAFT | 2021-12-11 12:00:00 | ABCD
Note that the purchase number is 27, because in the counters table the last number used for the series "S" was 26. After creating this row, the counter must be set to 27.
In the case of the purchase items, it would be the same, obtaining the data from:
The purchase header created in the previous step.
Data from the Articles table
Data from another table with additional information about the articles.
The data from the purchase items table that I generated earlier.
But in this case, instead of being a single record, it will be a record for each item that I reflect in my auxiliary table, matching the info by the item's "ArticleCode".
I could do all this through application code, but I would like to abstract away from the programming language and keep all this logic in the database, to get a very fast, transactional process that can be retried in case of failure. Besides, as I said, the columns will be dynamic, since the ERP provider can create new columns at any time. This way I will not have to worry about escaping possible Unicode characters, and I can be sure that the data types are respected at all times.
It would be nice if I could get a boolean flag set on my auxiliary table to indicate that the purchase has been consolidated correctly.
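For the overall shape of that process, a hedged skeleton (T-SQL assumed; AuxPurchase, Consolidated and the order id are placeholder names, not your real ones) might look like:
BEGIN TRY
    BEGIN TRANSACTION;

    -- 1. claim the next counter value (see the Counters sketch above)
    -- 2. insert the purchase header row
    -- 3. insert one row per purchase item
    -- 4. flag the auxiliary row so the purchase is not consolidated twice
    UPDATE AuxPurchase
    SET Consolidated = 1
    WHERE ExternalOrderId = 'ABCD';

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
    THROW;  -- surface the error so the caller can retry
END CATCH;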
Thanks in advance
EDIT
As @JeroenMostert said in a response, this question is too vague as asked. The purpose of my question is to know how to take the column names obtained, for example from INFORMATION_SCHEMA.COLUMNS, for a table A and use them in a query, but only the ones that intersect with the columns of a table B, and to do this several times with several tables so that I can generate the purchase header. Then I want to use the same process (and the resulting data) to generate the purchase rows.
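For the EDIT specifically, here is a hedged sketch (T-SQL assumed, SQL Server 2017+ for STRING_AGG; SourceTableA and TargetTableB are placeholder names) of intersecting the column lists of two tables and using the result in a dynamic INSERT ... SELECT:
DECLARE @cols nvarchar(max), @sql nvarchar(max);

-- Columns that exist in both tables, quoted to survive unusual names.
SELECT @cols = STRING_AGG(QUOTENAME(a.COLUMN_NAME), ', ')
FROM INFORMATION_SCHEMA.COLUMNS AS a
INNER JOIN INFORMATION_SCHEMA.COLUMNS AS b
        ON b.COLUMN_NAME = a.COLUMN_NAME
WHERE a.TABLE_NAME = 'SourceTableA'
  AND b.TABLE_NAME = 'TargetTableB';

SET @sql = N'INSERT INTO TargetTableB (' + @cols + N') '
         + N'SELECT ' + @cols + N' FROM SourceTableA;';

EXEC sys.sp_executesql @sql;
This shows only the column-intersection mechanics; for the header you would reuse the same @cols technique to build an UPDATE per subsequent source table, so that the last table read wins, and likewise to build the per-item INSERTs.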
My table currently has a number of similar numerical columns I'd like to nest under a common label.
My current table is something like:
| Week | Seller count, total | Seller count, churned | Seller count, resurrected |
| ---- | ------------------- | --------------------- | ------------------------- |
| 1 | 100 | 10 | 4 |
| 2 | 105 | 12 | 5 |
And I'd like it to be:
| | Seller count |
| Week | Total | Churned | Resurrected |
| ---- | ----- | ------- | ----------- |
| 1 | 100 | 10 | 4 |
| 2 | 105 | 12 | 5 |
I've seen examples of this, including a related instructional video, but this video hides the actual creation of the nested object (called "Segment").
I also tried creating a hierarchy by dragging items in the "Data" tab on top of one another. This function appears to only be possible for dimensions (categorical data), not measures (numerical data) like mine.
Even so, I can drag my column names from the measures side onto the dimensions side to get them to be considered dimensions. Then I can drag to nest and create the hierarchy. But then when I drag the top item of the hierarchy ("Seller count" in the example below) into the "Columns" field, I get the warning "the field being added contains 92,000 members, and maximum recommended is 1,000". It thinks this is categorical data, and is maybe planning to create a subheading for each value (100, 105, etc.), instead of the desired hierarchy sub-items as subheadings.
Any idea how to accomplish this simple hierarchical restructuring of my column labels?
Actually, this is data restructuring, and Tableau isn't best suited for it. Still, it is a simple case and you can do it like this:
I recreated a table like yours in Excel and imported it into Tableau.
Rename the three columns (remove "Seller count" from their names).
Select these three columns at once and choose Pivot to transform them.
Rename the resulting columns again.
Create a text table in Tableau, as you have shown in the question.
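If you would rather reshape the data before it reaches Tableau, the same pivot can be expressed in SQL; here is a hedged sketch, assuming a source table named seller_counts with simplified column names (not your exact ones):
-- Unpivot the three measure columns into (week, segment, seller_count) rows.
SELECT week, 'Total' AS segment, seller_count_total AS seller_count FROM seller_counts
UNION ALL
SELECT week, 'Churned' AS segment, seller_count_churned AS seller_count FROM seller_counts
UNION ALL
SELECT week, 'Resurrected' AS segment, seller_count_resurrected AS seller_count FROM seller_counts;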
I have a table that has a user_id and a new record for each return reason for that user, as shown here:
| user_id | return_reason |
|--------- |-------------- |
| 1 | broken |
| 2 | changed mind |
| 2 | overpriced |
| 3 | changed mind |
| 4 | changed mind |
What I would like to do is generate a foreign key for each applicable combination of values in a new table and apply that key to the user_id in a second table, effectively creating a many-to-many relationship. The result would look like this:
Dimension Table ->
| reason_id | return_reason |
|----------- |--------------- |
| 1 | broken |
| 2 | changed mind |
| 2 | overpriced |
| 3 | changed mind |
Fact Table ->
| user_id | reason_id |
|--------- |----------- |
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 3 |
My thought process is to iterate through the table with a cursor, but this seems like a standard problem that should have a more efficient solution. Is there a specific name for this type of problem? I also thought about pivoting and unpivoting, but that didn't seem too clean either. Any help or reference to articles on how to approach this is appreciated.
The problem concerns data normalization and relational integrity. Your concept doesn't really make sense - the Dimension table shows two different reasons with the same ID, and the Fact table loses a record. The conventional schema for this many-to-many relationship would be three tables like:
Users table (info about users; UserID is unique)
Reasons table (info about reasons; ReasonID is unique)
UserReasons junction table (associates users with reasons - your existing table). Assuming a user could be associated with the same reason multiple times, you probably also need ReturnDate and OrderID_FK fields in UserReasons.
So, you need to replace the reason description in the first table (UserReasons) with a ReasonID. Add a long integer field ReasonID_FK to that table to hold the ReasonID key.
To build Reasons table based on current data, use DISTINCT:
SELECT DISTINCT return_reason INTO Reasons FROM UserReasons
In the new table, rename the return_reason field to ReasonDescription and add an autonumber field ReasonID.
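If you prefer to script that step too, a hedged Access SQL sketch for the autonumber column (the rename itself is easiest in table Design view):
ALTER TABLE Reasons ADD COLUMN ReasonID COUNTER;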
Now run an UPDATE query to populate the ReasonID_FK field in UserReasons:
UPDATE UserReasons INNER JOIN Reasons ON UserReasons.return_reason = Reasons.ReasonDescription SET UserReasons.ReasonID_FK = Reasons.ReasonID
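Before removing anything, a quick sanity check (same names as above) confirms that every row found a matching reason:
SELECT COUNT(*) AS UnmatchedRows
FROM UserReasons
WHERE ReasonID_FK IS NULL;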
When all looks good, delete the return_reason field.
I am new to Access and am trying to develop a query that will allow me to count the number of occurrences of one word in each field from a table with 15 fields.
The table simply stores test results for employees. There is one table that stores the employee identification - id, name, etc.
The second table has 15 fields - A1 through A15 with the words correct or incorrect in each field. I need the total number of incorrect occurrences for each field, not for the entire table.
Is there an answer through Query Design, or is code required?
A solution, whether through Query Design or code, would be greatly appreciated!
Firstly, one of the reasons that you are struggling to obtain the desired result for what should be a relatively straightforward request is that your data does not follow database normalisation rules; consequently, you are working against the natural operation of an RDBMS when querying your data.
From your description, I assume that the fields A1 through A15 are answers to questions on a test.
By representing these as separate fields within your database, aside from the inherent difficulty in querying the resulting data (as you have discovered), if ever you wanted to add or remove a question to/from the test, you would be forced to restructure your entire database!
Instead, I would suggest structuring your table in the following way:
Results
+------------+------------+-----------+
| EmployeeID | QuestionID | Result |
+------------+------------+-----------+
| 1 | 1 | correct |
| 1 | 2 | incorrect |
| ... | ... | ... |
| 1 | 15 | correct |
| 2 | 1 | correct |
| 2 | 2 | correct |
| ... | ... | ... |
+------------+------------+-----------+
This table would be a junction table (a.k.a. linking / cross-reference table) in your database, supporting a many-to-many relationship between the tables Employees & Questions, which might look like the following:
Employees
+--------+-----------+-----------+------------+------------+-----+
| Emp_ID | Emp_FName | Emp_LName | Emp_DOB | Emp_Gender | ... |
+--------+-----------+-----------+------------+------------+-----+
| 1 | Joe | Bloggs | 01/01/1969 | M | ... |
| ... | ... | ... | ... | ... | ... |
+--------+-----------+-----------+------------+------------+-----+
Questions
+-------+------------------------------------------------------------+--------+
| Qu_ID | Qu_Desc | Qu_Ans |
+-------+------------------------------------------------------------+--------+
| 1 | What is the meaning of life, the universe, and everything? | 42 |
| ... | ... | ... |
+-------+------------------------------------------------------------+--------+
With this structure, if ever you wish to add or remove a question from the test, you can simply add or remove a record from the table, without needing to restructure your database or rewrite any of the queries, forms, or reports which depend upon the existing structure.
Furthermore, since the result of an answer is likely to be a binary correct or incorrect, then this would be better (and far more efficiently) represented using a Boolean True/False data type, e.g.:
Results
+------------+------------+--------+
| EmployeeID | QuestionID | Result |
+------------+------------+--------+
| 1 | 1 | True |
| 1 | 2 | False |
| ... | ... | ... |
| 1 | 15 | True |
| 2 | 1 | True |
| 2 | 2 | True |
| ... | ... | ... |
+------------+------------+--------+
Not only does this consume less memory in your database, but this may be indexed far more efficiently (yielding faster queries), and removes all ambiguity and potential for error surrounding typos & case sensitivity.
With this new structure, if you wanted to see the number of correct answers for each employee, the query can be something as simple as:
select results.employeeid, count(*)
from results
where results.result = true
group by results.employeeid
Alternatively, if you wanted to view the number of employees answering each question correctly (for example, to understand which questions most employees got wrong), you might use something like:
select results.questionid, count(*)
from results
where results.result = true
group by results.questionid
The above are obviously very basic example queries, and you would likely want to join the Results table to an Employees table and a Questions table to obtain richer information about the results.
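For instance, a hedged sketch of such a join (reusing the illustrative field names from the tables above; adjust them to your real schema) that lists each employee's incorrectly answered questions:
select employees.emp_fname, employees.emp_lname, questions.qu_desc
from (results
inner join employees on employees.emp_id = results.employeeid)
inner join questions on questions.qu_id = results.questionid
where results.result = false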
Contrast the above with your current database structure -
Per your original question:
The second table has 15 fields - A1 through A15 with the words correct or incorrect in each field. I need the total number of incorrect occurrences for each field, not for the entire table.
Assuming that you want to view the number of incorrect answers by employee, you are forced to use an incredibly messy query such as the following:
select
employeeid,
iif(A1='incorrect',1,0)+
iif(A2='incorrect',1,0)+
iif(A3='incorrect',1,0)+
iif(A4='incorrect',1,0)+
iif(A5='incorrect',1,0)+
iif(A6='incorrect',1,0)+
iif(A7='incorrect',1,0)+
iif(A8='incorrect',1,0)+
iif(A9='incorrect',1,0)+
iif(A10='incorrect',1,0)+
iif(A11='incorrect',1,0)+
iif(A12='incorrect',1,0)+
iif(A13='incorrect',1,0)+
iif(A14='incorrect',1,0)+
iif(A15='incorrect',1,0) as IncorrectAnswers
from
YourTable
Here, notice that the answer numbers are also hard-coded into the query, meaning that if you decide to add a new question or remove an existing question, not only would you need to restructure your entire database, but queries such as the above would also need to be rewritten.
I've created a form in PHP that collects basic information. I have a list box that allows multiple items to be selected (e.g. housing, rent, food, water). If multiple items are selected, they are stored in a field called Needs, separated by commas.
I have created a report ordered by the person's needs. The people who have only one need are sorted correctly, but the people who have multiple needs are sorted exactly by the string passed to the database (e.g. "housing, rent, food, water"), which is not what I want.
Is there a way to separate the multiple values in this field using SQL, counting each need occurrence as 1, so that no comma-delimited strings appear in the results?
Your database is not in the first normal form. A non-normalized database will be very problematic to use and to query, as you are actually experiencing.
In general, you should be using at least the following structure. It can still be normalized further, but I hope this gets you going in the right direction:
CREATE TABLE users (
user_id int,
name varchar(100)
);
CREATE TABLE users_needs (
need varchar(100),
user_id int
);
Then you should store the data as follows:
-- TABLE: users
+---------+-------+
| user_id | name |
+---------+-------+
| 1 | joe |
| 2 | peter |
| 3 | steve |
| 4 | clint |
+---------+-------+
-- TABLE: users_needs
+---------+----------+
| need | user_id |
+---------+----------+
| housing | 1 |
| water | 1 |
| food | 1 |
| housing | 2 |
| rent | 2 |
| water | 2 |
| housing | 3 |
+---------+----------+
Note how the users_needs table is defining the relationship between one user and one or many needs (or none at all, as for user number 4.)
To normalise your database further, you should also use another table called needs, and as follows:
-- TABLE: needs
+---------+---------+
| need_id | name |
+---------+---------+
| 1 | housing |
| 2 | water |
| 3 | food |
| 4 | rent |
+---------+---------+
Then the users_needs table should just refer to a candidate key of the needs table instead of repeating the text.
-- TABLE: users_needs (instead of the previous one)
+---------+----------+
| need_id | user_id |
+---------+----------+
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 1 | 2 |
| 4 | 2 |
| 2 | 2 |
| 1 | 3 |
+---------+----------+
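A minimal sketch of the corresponding definitions (generic SQL; it assumes users.user_id is, or is made, the primary key of the users table):
CREATE TABLE needs (
  need_id int PRIMARY KEY,
  name varchar(100)
);
CREATE TABLE users_needs (
  user_id int,  -- references users.user_id (assumed to be the primary key)
  need_id int,
  FOREIGN KEY (user_id) REFERENCES users (user_id),
  FOREIGN KEY (need_id) REFERENCES needs (need_id)
);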
You may also be interested in checking out the following Wikipedia article for further reading about repeating values inside columns:
Wikipedia: First normal form - Repeating groups within columns
UPDATE:
To fully answer your question, if you follow the above guidelines, sorting, counting and aggregating the data should then become straight-forward.
To sort the result-set by needs, you would be able to do the following:
SELECT users.name, needs.name
FROM users
INNER JOIN users_needs ON (users_needs.user_id = users.user_id)
INNER JOIN needs ON (needs.need_id = users_needs.need_id)
ORDER BY needs.name;
You would also be able to count how many needs each user has selected, for example:
SELECT users.name, COUNT(users_needs.need_id) AS number_of_needs
FROM users
LEFT JOIN users_needs ON (users_needs.user_id = users.user_id)
GROUP BY users.user_id, users.name
ORDER BY number_of_needs;
I'm a little confused by the goal. Is this a UI problem or are you just having trouble determining who has multiple needs?
The number of needs is the difference:
Len([Needs]) - Len(Replace([Needs],',','')) + 1
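For example, as a calculated column in a query (hedged; YourTable is a placeholder, and in MySQL the equivalent functions are LENGTH and REPLACE):
SELECT Needs,
       Len([Needs]) - Len(Replace([Needs], ',', '')) + 1 AS NeedCount
FROM YourTable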
Can you provide more information about the Sort you're trying to accomplish?
UPDATE:
I think these Oracle-based posts may have what you're looking for: post and post. The only difference is that you would probably be better off using the method I list above to find the number of comma-delimited pieces rather than doing the translate(...) that the author suggests. Hope this helps - it's Oracle-based, but I don't see why the same approach couldn't be adapted.