SQL Merge multiple tables with different columns into one

I need to create one master table from 5 tables. The difficulty is that the same column may have a different name in each table.
For simplicity, I'm just going to give an example with 2 tables:
+----+----+
| 1 | 2 |
+----+----+
| PO | P |
| VE | V |
| TE | TE |
| LO | LO |
| IN | |
| D | |
| X | |
| Y | |
| | A |
| | B |
| | C |
+----+----+
So, as you can see, PO doesn't have the same column name as the corresponding column in table 2, yet they are the same record. I need to aggregate these 2 tables into one master.
What I did was begin with the table that has the most repeated columns, and I am trying to merge the other tables into it. When a column is found in only one table, I want the other fields to display null. Also, I don't want any duplicates. Hope someone can help me out!
Cheers

yet they are the same record.
No, they are not.
They could, however, represent different views of the same business entities. To "merge" them you must first specify what the JOIN criterion between them shall be.
Given it is
one.PO = two.P.
Then you must write a SQL statement like
SELECT one.PO AS ID,
one.VE,
/*same for TE, LO, IN, D, X, Y, */
two.A,
two.B,
two.C
INTO t_what_the_frak_the_new_table_shall_be_called
FROM t_what_the_frak_table_1_is_called AS one
JOIN t_what_the_frak_table_2_is_called AS two
ON one.PO = two.P;
GO
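The INNER JOIN above keeps only rows that match in both tables. Since the question also asks for NULLs where a column exists in only one table, and for no duplicates, a full outer join may be closer to that requirement. Below is a minimal sketch using Python's built-in sqlite3 module. The sample rows and the reduced column set (PO/P, VE/V, D, A) are hypothetical, and the full outer join is emulated with two LEFT JOINs and a UNION so that it also runs on older SQLite versions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, reduced versions of tables 1 and 2 from the question.
CREATE TABLE one (PO INTEGER, VE TEXT, D TEXT);
CREATE TABLE two (P INTEGER, V TEXT, A TEXT);
INSERT INTO one VALUES (1, 've1', 'd1'), (2, 've2', 'd2');
INSERT INTO two VALUES (2, 've2', 'a2'), (3, 've3', 'a3');
""")

rows = conn.execute("""
-- Every row of table one, with table two's extra column where a match exists.
SELECT one.PO AS ID, one.VE AS VE, one.D AS D, two.A AS A
FROM one
LEFT JOIN two ON one.PO = two.P
UNION
-- Rows that exist only in table two; columns unique to table one become NULL.
SELECT two.P, two.V, NULL, two.A
FROM two
LEFT JOIN one ON one.PO = two.P
WHERE one.PO IS NULL
ORDER BY 1
""").fetchall()

for row in rows:
    print(row)
```

UNION (as opposed to UNION ALL) also drops exact duplicate rows, which matches the "no duplicates" requirement for this sample data.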

Related

SQL - combining two rows of data into one with a common identifier

I am working on a project where I have to solve the following problem.
Goal:
If there are two rows that share the same identifier but have different additional data, how can I combine all of that data into one row with individual columns?
Example:
Database:
| ID | Rating | Rating Provider|
--------------------------------
| 5055 | A+ | Moodys |
---------------------------------
| 5055 | Bb+ | SNP |
Desired End Result:
| ID | Moodys | SNP |
--------------------
| 5055 | A+ | Bb+ |
I believe you simply need a Pivot -
SELECT *
FROM YOUR_TABLE
PIVOT(MAX(Rating)
FOR Rating_Provider IN ('Moodys' AS Moodys, 'SNP' AS SNP));
Quantnesto, I believe that what you are looking for is a JOIN. You have the information in different databases, right?
You SELECT all the fields that you want from the different tables:
SELECT a.ID,a.Moodys,B.SNP
FROM DataBase a
JOIN Database b on a.ID = b.ID
And that's it.
There are different kinds of JOINs; for further information let me know, and I can explain each type.
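PIVOT is not available in every database engine. A portable alternative is conditional aggregation, sketched below with Python's built-in sqlite3 module. The table name ratings and the column name rating_provider are assumptions made for the example, not names from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table holding the two sample rows from the question.
CREATE TABLE ratings (id INTEGER, rating TEXT, rating_provider TEXT);
INSERT INTO ratings VALUES (5055, 'A+', 'Moodys'), (5055, 'Bb+', 'SNP');
""")

pivoted = conn.execute("""
-- One output row per id; each CASE picks out the rating of one provider,
-- and MAX collapses the group to that single non-NULL value.
SELECT id,
       MAX(CASE WHEN rating_provider = 'Moodys' THEN rating END) AS Moodys,
       MAX(CASE WHEN rating_provider = 'SNP' THEN rating END) AS SNP
FROM ratings
GROUP BY id
""").fetchall()

print(pivoted)
```

Like PIVOT, this requires the list of providers to be known when the query is written.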

Oracle SQL query comparing multiple rows with same identifier

I'm honestly not sure how to title this - so apologies if it is unclear.
I have two tables I need to compare. One table contains tree names and nodes that belong to that tree. Each Tree_name/Tree_node combo will have its own line. For example:
Table: treenode
| TREE_NAME | TREE_NODE |
|-----------|-----------|
| 1 | A |
| 1 | B |
| 1 | C |
| 1 | D |
| 1 | E |
| 2 | A |
| 2 | B |
| 2 | D |
| 3 | C |
| 3 | D |
| 3 | E |
| 3 | F |
I have another table that contains names of queries and what tree_nodes they use. Example:
Table: queryrecord
| QUERY | TREE_NODE |
|---------|-----------|
| Alpha | A |
| Alpha | B |
| Alpha | D |
| BRAVO | A |
| BRAVO | B |
| BRAVO | D |
| CHARLIE | A |
| CHARLIE | B |
| CHARLIE | F |
I need to write a SQL query where I input the QUERY name and it returns any ‘TREE_NAME’ that includes all the nodes associated with the query. So if I input ‘ALPHA’, it would return TREE_NAME 1 & 2. If I ask it for CHARLIE, it would return nothing.
I only have read access, and don’t believe I can create temp tables, so I’m not sure if this is possible. Any advice would be amazing. Thank you!
You can use group by and having as follows:
Select t.tree_name
From treenode t
join queryrecord q
on t.tree_node = q.tree_node
WHERE q.query = 'ALPHA'
Group by t.tree_name
Having count(distinct t.tree_node)
= (Select count(distinct q.tree_node) From queryrecord q WHERE q.query = 'ALPHA');
Using an IN condition (a semi-join, which saves time over a join):
with prep (tree_node) as (select tree_node from queryrecord where query = :q)
select tree_name
from treenode
where tree_node in (select tree_node from prep)
group by tree_name
having count(*) = (select count(*) from prep)
;
:q in the prep subquery (in the with clause) is the bind variable to which you will assign the various QUERY values at runtime.
EDIT
I don't generally set up the test case on online engines; but in a comment below this answer, the OP said the query didn't work for him. So, I set up the example on SQLFiddle, here:
http://sqlfiddle.com/#!4/b575e/2
A couple of notes: for some reason, SQLFiddle thinks table names should be at most eight characters, so I had to change the second table name to queryrec (instead of queryrecord). I changed the name in the query, too, of course. And, second, I don't know how I can give bind values on SQLFiddle; I hard-coded the name 'Alpha'. (Note also that in the OP's sample data, this query value is not capitalized, while the other two are; of course, text values in SQL are case sensitive, so one should pay attention when testing.)
You can do this with a join and aggregation. The trick is to count the number of nodes in queryrecord before joining:
select qr.query, t.tree_name
from (select qr.*,
count(*) over (partition by query) as num_tree_node
from queryrecord qr
) qr join
treenode t
on t.tree_node = qr.tree_node
where qr.query = 'ALPHA'
group by qr.query, t.tree_name, qr.num_tree_node
having count(*) = qr.num_tree_node;
Here is a db<>fiddle.
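To check the approach against the sample data without an Oracle instance, the following sketch loads the two tables from the question into an in-memory SQLite database (via Python's built-in sqlite3 module) and runs the semi-join plus HAVING variant. The helper name trees_for_query is hypothetical, and the query value is passed as 'Alpha' because that is how it is stored in the sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE treenode (tree_name INTEGER, tree_node TEXT);
CREATE TABLE queryrecord (query TEXT, tree_node TEXT);
""")
conn.executemany("INSERT INTO treenode VALUES (?, ?)", [
    (1, "A"), (1, "B"), (1, "C"), (1, "D"), (1, "E"),
    (2, "A"), (2, "B"), (2, "D"),
    (3, "C"), (3, "D"), (3, "E"), (3, "F"),
])
conn.executemany("INSERT INTO queryrecord VALUES (?, ?)", [
    ("Alpha", "A"), ("Alpha", "B"), ("Alpha", "D"),
    ("BRAVO", "A"), ("BRAVO", "B"), ("BRAVO", "D"),
    ("CHARLIE", "A"), ("CHARLIE", "B"), ("CHARLIE", "F"),
])

def trees_for_query(conn, query_name):
    """Return the tree names that contain every node used by query_name."""
    sql = """
        SELECT t.tree_name
        FROM treenode t
        WHERE t.tree_node IN (SELECT tree_node FROM queryrecord WHERE query = ?)
        GROUP BY t.tree_name
        -- A tree qualifies only if it matched as many distinct nodes
        -- as the query uses in total.
        HAVING COUNT(DISTINCT t.tree_node) =
               (SELECT COUNT(DISTINCT tree_node) FROM queryrecord WHERE query = ?)
        ORDER BY t.tree_name
    """
    return [row[0] for row in conn.execute(sql, (query_name, query_name))]

alpha_trees = trees_for_query(conn, "Alpha")
charlie_trees = trees_for_query(conn, "CHARLIE")
print(alpha_trees, charlie_trees)
```

Only SELECT statements are needed against the real tables, so the same query shape should work with read-only access.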

Access text count in query design

I am new to Access and am trying to develop a query that will allow me to count the number of occurrences of one word in each field from a table with 15 fields.
The table simply stores test results for employees. There is one table that stores the employee identification - id, name, etc.
The second table has 15 fields - A1 through A15 with the words correct or incorrect in each field. I need the total number of incorrect occurrences for each field, not for the entire table.
Is there an answer through Query Design, or is code required?
The solution, whether Query Design, or code, would be greatly appreciated!
Firstly, one of the reasons that you are struggling to obtain the desired result for what should be a relatively straightforward request is because your data does not follow database normalisation rules, and consequently, you are working against the natural operation of a RDBMS when querying your data.
From your description, I assume that the fields A1 through A15 are answers to questions on a test.
By representing these as separate fields within your database, aside from the inherent difficulty in querying the resulting data (as you have discovered), if ever you wanted to add or remove a question to/from the test, you would be forced to restructure your entire database!
Instead, I would suggest structuring your table in the following way:
Results
+------------+------------+-----------+
| EmployeeID | QuestionID | Result |
+------------+------------+-----------+
| 1 | 1 | correct |
| 1 | 2 | incorrect |
| ... | ... | ... |
| 1 | 15 | correct |
| 2 | 1 | correct |
| 2 | 2 | correct |
| ... | ... | ... |
+------------+------------+-----------+
This table would be a junction table (a.k.a. linking / cross-reference table) in your database, supporting a many-to-many relationship between the tables Employees & Questions, which might look like the following:
Employees
+--------+-----------+-----------+------------+------------+-----+
| Emp_ID | Emp_FName | Emp_LName | Emp_DOB | Emp_Gender | ... |
+--------+-----------+-----------+------------+------------+-----+
| 1 | Joe | Bloggs | 01/01/1969 | M | ... |
| ... | ... | ... | ... | ... | ... |
+--------+-----------+-----------+------------+------------+-----+
Questions
+-------+------------------------------------------------------------+--------+
| Qu_ID | Qu_Desc | Qu_Ans |
+-------+------------------------------------------------------------+--------+
| 1 | What is the meaning of life, the universe, and everything? | 42 |
| ... | ... | ... |
+-------+------------------------------------------------------------+--------+
With this structure, if ever you wish to add or remove a question from the test, you can simply add or remove a record from the table without needing to restructure your database or rewrite any of the queries, forms, or reports which depend upon the existing structure.
Furthermore, since the result of an answer is likely to be a binary correct or incorrect, then this would be better (and far more efficiently) represented using a Boolean True/False data type, e.g.:
Results
+------------+------------+--------+
| EmployeeID | QuestionID | Result |
+------------+------------+--------+
| 1 | 1 | True |
| 1 | 2 | False |
| ... | ... | ... |
| 1 | 15 | True |
| 2 | 1 | True |
| 2 | 2 | True |
| ... | ... | ... |
+------------+------------+--------+
Not only does this consume less memory in your database, but this may be indexed far more efficiently (yielding faster queries), and removes all ambiguity and potential for error surrounding typos & case sensitivity.
With this new structure, if you wanted to see the number of correct answers for each employee, the query can be something as simple as:
select results.employeeid, count(*)
from results
where results.result = true
group by results.employeeid
Alternatively, if you wanted to view the number of employees answering each question correctly (for example, to understand which questions most employees got wrong), you might use something like:
select results.questionid, count(*)
from results
where results.result = true
group by results.questionid
The above are obviously very basic example queries, and you would likely want to join the Results table to an Employees table and a Questions table to obtain richer information about the results.
Contrast the above with your current database structure -
Per your original question:
The second table has 15 fields - A1 through A15 with the words correct or incorrect in each field. I need the total number of incorrect occurrences for each field, not for the entire table.
Assuming that you want to view the number of incorrect answers by employee, you are forced to use an incredibly messy query such as the following:
select
employeeid,
iif(A1='incorrect',1,0)+
iif(A2='incorrect',1,0)+
iif(A3='incorrect',1,0)+
iif(A4='incorrect',1,0)+
iif(A5='incorrect',1,0)+
iif(A6='incorrect',1,0)+
iif(A7='incorrect',1,0)+
iif(A8='incorrect',1,0)+
iif(A9='incorrect',1,0)+
iif(A10='incorrect',1,0)+
iif(A11='incorrect',1,0)+
iif(A12='incorrect',1,0)+
iif(A13='incorrect',1,0)+
iif(A14='incorrect',1,0)+
iif(A15='incorrect',1,0) as IncorrectAnswers
from
YourTable
Here, notice that the answer numbers are also hard-coded into the query, meaning that if you decide to add a new question or remove an existing question, not only would you need to restructure your entire database, but queries such as the above would also need to be rewritten.
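Returning to the original request (the number of incorrect answers for each question rather than for each employee), the normalised Results table makes this a single GROUP BY. The sketch below uses Python's built-in sqlite3 module rather than Access, with hypothetical sample data and 1/0 standing in for True/False:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalised results table; result is 1 for correct and 0 for incorrect.
CREATE TABLE results (employeeid INTEGER, questionid INTEGER, result INTEGER);
INSERT INTO results VALUES
    (1, 1, 1), (1, 2, 0), (1, 3, 1),
    (2, 1, 1), (2, 2, 0), (2, 3, 0);
""")

incorrect_per_question = conn.execute("""
-- Summing a CASE expression (instead of filtering in WHERE) keeps
-- questions that nobody got wrong in the output with a count of 0.
SELECT questionid,
       SUM(CASE WHEN result = 0 THEN 1 ELSE 0 END) AS incorrect_count
FROM results
GROUP BY questionid
ORDER BY questionid
""").fetchall()

print(incorrect_per_question)
```

Adding a sixteenth question then only adds rows to the table; the query itself does not change.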

1 to Many Query: Help Filtering Results

Problem: SQL Query that looks at the values in the "Many" relationship, and doesn't return values from the "1" relationship.
Tables Example: (this shows two different tables).
+---------------+----------------------------+-------+
| Unique Number | <-- Table 1 -- Table 2 --> | Roles |
+---------------+----------------------------+-------+
| 1 | | A |
| 2 | | B |
| 3 | | C |
| 4 | | D |
| 5 | | |
| 6 | | |
| 7 | | |
| 8 | | |
| 9 | | |
| 10 | | |
+---------------+----------------------------+-------+
When I run my query, I get multiple, unique numbers that show all of the roles associated to each number like so.
+---------------+-------+
| Unique Number | Roles |
+---------------+-------+
| 1 | C |
| 1 | D |
| 2 | A |
| 2 | B |
| 3 | A |
| 3 | B |
| 4 | C |
| 4 | A |
| 5 | B |
| 5 | C |
| 5 | D |
| 6 | D |
| 6 | A |
+---------------+-------+
I would like to be able to run my query and be able to say, "When the role of A is present, don't even show me the unique numbers that have the role of A".
Maybe if SQL could look at the roles and say, WHEN role A comes up, grab unique number and remove it from column 1.
Based on what I would "like" to happen (I put that in quotations as this might not even be possible) the following is what I would expect my query to return.
+---------------+-------+
| Unique Number | Roles |
+---------------+-------+
| 1 | C |
| 1 | D |
| 5 | B |
| 5 | C |
| 5 | D |
+---------------+-------+
UPDATE:
Query Example: I am querying 8 tables, but I condensed it to 4 for simplicity.
SELECT
c.UniqueNumber,
cp.pType,
p.pRole,
a.aRole
FROM c
JOIN cp ON cp.uniqueVal = c.uniqueVal
JOIN p ON p.uniqueVal = cp.uniqueVal
LEFT OUTER JOIN a ON a.uniqueVal = p.uniqueVal
WHERE
--I do some basic filtering to get to the relevant clients data but nothing more than that.
ORDER BY
c.uniqueNumber
Table sizes: these tables can have anywhere from 50,000 rows to 500,000+
Pretending the table name is t and the column names are alpha and numb:
SELECT t.numb, t.alpha
FROM t
LEFT JOIN t AS s ON t.numb = s.numb
AND s.alpha = 'A'
WHERE s.numb IS NULL;
You can also do a subselect:
SELECT numb, alpha
FROM t
WHERE numb NOT IN (SELECT numb FROM t WHERE alpha = 'A');
Or one of the following if the subselect is materializing more than once (pick the one that is faster, i.e., the one with the smaller subtable size):
SELECT t.numb, t.alpha
FROM t
JOIN (SELECT numb FROM t GROUP BY numb HAVING SUM(alpha = 'A') = 0) AS s USING (numb);
SELECT t.numb, t.alpha
FROM t
LEFT JOIN (SELECT numb FROM t GROUP BY numb HAVING SUM(alpha = 'A') > 0) AS s USING (numb)
WHERE s.numb IS NULL;
But the first one is probably faster and better[1]. Any of these methods can be folded into a larger query with multiple additional tables being joined in.
[1] Straight joins tend to be easier to read and faster to execute than queries involving subselects. The exceptions are rare for self-referential joins, since they require a large mismatch in the size of the tables. You might hit one, though, if the number of rows that reference the 'A' alpha value is exceptionally small and the column is properly indexed.
There are many ways to do it, and the trade-offs depend on factors such as the size of the tables involved and what indexes are available. On general principles, my first instinct is to avoid a correlated subquery such as another, now-deleted answer proposed, but if the relationship table is small then it probably doesn't matter.
This version instead uses an uncorrelated subquery in the where clause, in conjunction with the not in operator:
select num, role
from one_to_many
where num not in (select otm2.num from one_to_many otm2 where otm2.role = 'A')
That form might be particularly effective if there are many rows in one_to_many, but only a small proportion have role A. Of course you can add an order by clause if the order in which result rows are returned is important.
There are also alternatives involving joining inline views or CTEs, and some of those might have advantages under particular circumstances.
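As a quick check of the two main approaches, the sketch below loads the sample result set from the question into an in-memory SQLite database (Python's built-in sqlite3 module) and runs both the NOT IN form and the LEFT JOIN anti-join. The table and column names one_to_many, num and role follow the second answer and are placeholders for the real schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE one_to_many (num INTEGER, role TEXT)")
conn.executemany("INSERT INTO one_to_many VALUES (?, ?)", [
    (1, "C"), (1, "D"), (2, "A"), (2, "B"), (3, "A"), (3, "B"),
    (4, "C"), (4, "A"), (5, "B"), (5, "C"), (5, "D"), (6, "D"), (6, "A"),
])

# Uncorrelated subquery: exclude every number that has role A anywhere.
not_in_rows = conn.execute("""
    SELECT num, role
    FROM one_to_many
    WHERE num NOT IN (SELECT num FROM one_to_many WHERE role = 'A')
    ORDER BY num, role
""").fetchall()

# Anti-join: keep only rows for which no role-A partner row exists.
anti_join_rows = conn.execute("""
    SELECT t.num, t.role
    FROM one_to_many t
    LEFT JOIN one_to_many s ON s.num = t.num AND s.role = 'A'
    WHERE s.num IS NULL
    ORDER BY t.num, t.role
""").fetchall()

print(not_in_rows)
print(anti_join_rows)
```

Both queries return the same rows for this data. Which one is faster on tables with 50,000 to 500,000+ rows depends on the engine and the available indexes, so it is worth comparing the execution plans on the real data.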

Retrieve comma delimited data from a field

I've created a form in PHP that collects basic information. I have a list box that allows multiple items selected (i.e. Housing, rent, food, water). If multiple items are selected they are stored in a field called Needs separated by a comma.
I have created a report ordered by the person's needs. The people who only have one need are sorted correctly, but the people who have multiple needs are sorted exactly as the string passed to the database (i.e. housing, rent, food, water) --> which is not what I want.
Is there a way to separate the multiple values in this field using SQL to count each need instance/occurrence as 1 so that there are no comma delimitations shown in the results?
Your database is not in the first normal form. A non-normalized database will be very problematic to use and to query, as you are actually experiencing.
In general, you should be using at least the following structure. It can still be normalized further, but I hope this gets you going in the right direction:
CREATE TABLE users (
user_id int,
name varchar(100)
);
CREATE TABLE users_needs (
need varchar(100),
user_id int
);
Then you should store the data as follows:
-- TABLE: users
+---------+-------+
| user_id | name |
+---------+-------+
| 1 | joe |
| 2 | peter |
| 3 | steve |
| 4 | clint |
+---------+-------+
-- TABLE: users_needs
+---------+----------+
| need | user_id |
+---------+----------+
| housing | 1 |
| water | 1 |
| food | 1 |
| housing | 2 |
| rent | 2 |
| water | 2 |
| housing | 3 |
+---------+----------+
Note how the users_needs table is defining the relationship between one user and one or many needs (or none at all, as for user number 4.)
To normalise your database further, you should also use another table called needs, as follows:
-- TABLE: needs
+---------+---------+
| need_id | name |
+---------+---------+
| 1 | housing |
| 2 | water |
| 3 | food |
| 4 | rent |
+---------+---------+
Then the users_needs table should just refer to a candidate key of the needs table instead of repeating the text.
-- TABLE: users_needs (instead of the previous one)
+---------+----------+
| need_id | user_id |
+---------+----------+
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 1 | 2 |
| 4 | 2 |
| 2 | 2 |
| 1 | 3 |
+---------+----------+
You may also be interested in checking out the following Wikipedia article for further reading about repeating values inside columns:
Wikipedia: First normal form - Repeating groups within columns
UPDATE:
To fully answer your question, if you follow the above guidelines, sorting, counting and aggregating the data should then become straight-forward.
To sort the result-set by needs, you would be able to do the following:
SELECT users.name, needs.name
FROM users
INNER JOIN users_needs ON (users_needs.user_id = users.user_id)
INNER JOIN needs ON (needs.need_id = users_needs.need_id)
ORDER BY needs.name;
You would also be able to count how many needs each user has selected, for example:
SELECT users.name, COUNT(users_needs.need_id) as number_of_needs
FROM users
LEFT JOIN users_needs ON (users_needs.user_id = users.user_id)
GROUP BY users.user_id, users.name
ORDER BY number_of_needs;
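If the comma-delimited data already exists, it has to be split once to populate the normalised table. The following is a minimal sketch of such a migration using Python's built-in sqlite3 module. The legacy_users table and its contents are hypothetical stand-ins for the existing form data, and the simpler users_needs layout (need text plus user_id) is used:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical existing table with the comma-delimited Needs field.
CREATE TABLE legacy_users (user_id INTEGER, name TEXT, needs TEXT);
CREATE TABLE users_needs (need TEXT, user_id INTEGER);
INSERT INTO legacy_users VALUES
    (1, 'joe', 'housing, water, food'),
    (2, 'peter', 'housing,rent,water'),
    (3, 'steve', 'housing'),
    (4, 'clint', '');
""")

# Split each delimited string and store one row per (need, user) pair.
legacy_rows = conn.execute("SELECT user_id, needs FROM legacy_users").fetchall()
for user_id, needs in legacy_rows:
    for need in needs.split(","):
        need = need.strip()
        if need:  # skip empty pieces, e.g. for users with no needs
            conn.execute("INSERT INTO users_needs VALUES (?, ?)", (need, user_id))

# Each need occurrence now counts as 1, with no commas in the result.
need_counts = conn.execute("""
    SELECT need, COUNT(*) AS occurrences
    FROM users_needs
    GROUP BY need
    ORDER BY need
""").fetchall()

print(need_counts)
```

The same splitting could be done in the PHP form handler at insert time, so that new submissions never store a delimited string in the first place.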
I'm a little confused by the goal. Is this a UI problem or are you just having trouble determining who has multiple needs?
The number of needs is the difference:
Len([Needs]) - Len(Replace([Needs],',','')) + 1
Can you provide more information about the Sort you're trying to accomplish?
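The length-difference expression above can also be written in plain SQL wherever LENGTH and REPLACE are available. Here is a small sketch in SQLite through Python's built-in sqlite3 module; the people table and its rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table with the comma-delimited Needs field.
CREATE TABLE people (name TEXT, needs TEXT);
INSERT INTO people VALUES
    ('joe', 'housing,rent,food,water'),
    ('steve', 'housing');
""")

counts = conn.execute("""
-- Number of commas (length lost by removing them) plus one.
SELECT name,
       LENGTH(needs) - LENGTH(REPLACE(needs, ',', '')) + 1 AS number_of_needs
FROM people
ORDER BY name
""").fetchall()

print(counts)
```

Note that this expression reports 1 for an empty string, so rows with no needs at all would have to be handled separately.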
UPDATE:
I think these Oracle-based posts may have what you're looking for: post and post. The only difference is that you would probably be better off using the method I list above to find the number of comma-delimited pieces rather than doing the translate(...) that the author suggests. Hope this helps - it's Oracle-based, but I don't see .