How do I structure my SQL query to prevent the return of duplicate rows with related data?


I need some help with an SQL query. I have a database table that has related data in other tables. When I query the table, it returns duplicate rows for every row of related data, i.e.:
|-------------|         |-------------|         |-------------|
| Cars        |         | Options     |         | Value       |
|-------------| ------> |-------------| ------> |-------------|
| CarId       |         | OptionsId   |         | ValueId     |
| CarMake     |         | OptionName  |         | CostValue   |
| CarModel    |         | Confirmed   |         | CarId       |
|-------------|         | CarId       |         | OptionsId   |
       |                |-------------|         |-------------|
       |
       |                |-------------|
       ---------------> | Warranty    |
                        |-------------|
                        | WarrantyId  |
                        | WarrantyType|
                        | CarId       |
                        |-------------|
The query that I have made, which was designed in the query builder of SSMS (because of this it is not using aliases and has the 3 stage naming convention, this will be changed) is as follows:
SELECT dbo.Cars.CarId,
dbo.Cars.Make,
dbo.Cars.Model,
dbo.Options.OptionName,
dbo.Warranty.WarrantyType,
dbo.Value.CostValue
FROM dbo.Cars
LEFT JOIN dbo.Options ON dbo.Cars.CarId = dbo.Options.CarId
LEFT JOIN Value ON Options.OptionsId = Value.OptionsId
LEFT JOIN dbo.Warranty on dbo.Cars.CarId = dbo.Warranty.CarId
Executing this query as it stands returns my data, however, for cars with multiple options I receive duplicate rows i.e.
Id | Make | Model | Option Name | Warranty Type | Value
27 | Ford | Fiesta | Heated Seats | Static | 500
27 | Ford | Fiesta | Front Fog Lights | Static | 400
I've been looking around for possible answers to this question and found that the proposed solution is to use the keyword DISTINCT or to create a subquery. I added DISTINCT to my query but the same data was returned, probably because the two option rows are each distinct in their own right (I'm guessing here).
I'm happy to use a subquery but not sure how to apply that to my above query code. All I want to do here is return one single row for each car with the highest option value i.e.
27 | Ford | Fiesta | Heated Seats | Static | 500
Can anyone help me write this query? I think I've included everything in this question but if I can offer more, please let me know.

Instead of joining the table Value which gives you multiple rows,
you must join this query:
SELECT
dbo.Value.CarId,
dbo.Value.OptionsId,
MAX(dbo.Value.CostValue) AS CostValue
FROM dbo.Value
GROUP BY dbo.Value.CarId, dbo.Value.OptionsId
which gives you, from the table Value, the maximum CostValue for each car and option.
So try this:
SELECT dbo.Cars.CarId,
dbo.Cars.Make,
dbo.Cars.Model,
dbo.Options.OptionName,
v.CostValue,
dbo.Warranty.WarrantyType
FROM dbo.Cars
LEFT JOIN dbo.Options ON dbo.Cars.CarId = dbo.Options.CarId
INNER JOIN (
SELECT
dbo.Value.CarId,
dbo.Value.OptionsId,
MAX(dbo.Value.CostValue) AS CostValue
FROM dbo.Value
GROUP BY dbo.Value.CarId, dbo.Value.OptionsId
) AS v ON Options.OptionsId = v.OptionsId
LEFT JOIN dbo.Warranty on dbo.Cars.CarId = dbo.Warranty.CarId

You can try the following, using a window function:
with cte as(
SELECT dbo.Cars.CarId,
dbo.Cars.Make,
dbo.Cars.Model,
dbo.Options.OptionName,
Value.CostValue,
row_number() over(partition by dbo.Cars.CarId,
dbo.Cars.Make,
dbo.Cars.Model order by Value.CostValue desc) rn
FROM dbo.Cars
LEFT JOIN dbo.Options ON dbo.Cars.CarId = dbo.Options.CarId
LEFT JOIN Value ON Options.OptionsId = Value.OptionsId
LEFT JOIN dbo.Warranty on dbo.Cars.CarId = dbo.Warranty.CarId
) select * from cte where rn=1
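To see the ROW_NUMBER idea return exactly one row per car, here is a minimal sketch that runs it against an in-memory SQLite database from Python. SQLite (3.25 or later, for window functions) stands in for SQL Server here, the OptionsId/ValueId/WarrantyId values are made up for illustration, and the column list and the helper name top_option_per_car are my own assumptions rather than anything from the question.

```python
import sqlite3

def top_option_per_car(conn):
    """Return one row per car: the option with the highest CostValue."""
    sql = """
    WITH ranked AS (
        SELECT c.CarId, c.Make, c.Model, o.OptionName, w.WarrantyType, v.CostValue,
               -- number each car's rows, most expensive option first
               ROW_NUMBER() OVER (PARTITION BY c.CarId
                                  ORDER BY v.CostValue DESC) AS rn
        FROM Cars c
        LEFT JOIN Options  o ON o.CarId = c.CarId
        LEFT JOIN Value    v ON v.OptionsId = o.OptionsId
        LEFT JOIN Warranty w ON w.CarId = c.CarId
    )
    SELECT CarId, Make, Model, OptionName, WarrantyType, CostValue
    FROM ranked
    WHERE rn = 1
    ORDER BY CarId
    """
    return conn.execute(sql).fetchall()

# Sample data modelled on the question; all key values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Cars     (CarId INTEGER PRIMARY KEY, Make TEXT, Model TEXT);
CREATE TABLE Options  (OptionsId INTEGER PRIMARY KEY, OptionName TEXT, CarId INTEGER);
CREATE TABLE Value    (ValueId INTEGER PRIMARY KEY, CostValue INTEGER,
                       CarId INTEGER, OptionsId INTEGER);
CREATE TABLE Warranty (WarrantyId INTEGER PRIMARY KEY, WarrantyType TEXT, CarId INTEGER);
INSERT INTO Cars     VALUES (27, 'Ford', 'Fiesta');
INSERT INTO Options  VALUES (1, 'Heated Seats', 27), (2, 'Front Fog Lights', 27);
INSERT INTO Value    VALUES (1, 500, 27, 1), (2, 400, 27, 2);
INSERT INTO Warranty VALUES (1, 'Static', 27);
""")

rows = top_option_per_car(conn)
print(rows)
```

Partitioning by CarId alone is enough in this sketch because Make and Model depend on CarId; partitioning by all three columns, as in the answer, gives the same groups.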

Related

SQL Join to the latest record in MS ACCESS

I want to join tables in MS Access in such a way that it fetches only the latest record from one of the tables. I've looked at the other solutions available on the site, but discovered that they only work for other versions of SQL. Here is a simplified version of my data:
PatientInfo Table:
+-----+------+
| ID | Name |
+-----+------+
| 1 | John |
| 2 | Tom |
| 3 | Anna |
+-----+------+
Appointments Table
+----+-----------+
| ID | Date |
+----+-----------+
| 1 | 5/5/2001 |
| 1 | 10/5/2012 |
| 1 | 4/20/2018 |
| 2 | 4/5/1999 |
| 2 | 8/8/2010 |
| 2 | 4/9/1982 |
| 3 | 7/3/1997 |
| 3 | 6/4/2015 |
| 3 | 3/4/2017 |
+----+-----------+
And here is a simplified version of the results that I need after the join:
+----+------+------------+
| ID | Name | Date |
+----+------+------------+
| 1 | John | 4/20/2018 |
| 2 | Tom | 8/8/2010 |
| 3 | Anna | 3/4/2017 |
+----+------+------------+
Thanks in advance for reading and for your help.

You can use aggregation and JOIN:
select pi.id, pi.name, max(a.date)
from appointments as a inner join
patientinfo as pi
on a.id = pi.id
group by pi.id, pi.name;

Something like this:
select P.ID, P.name, max(A.Date) as Dt
from PatientInfo P inner join Appointments A
on P.ID=A.ID
group by P.ID, P.name

Both Bing's and Gordon's answers work if your summary only needs one field (the Max(Date)), but it gets trickier if you also want to report other fields from the joined table, since you would need to either aggregate them or group by them as well.
E.g., if you want your summary to also include the assessment they were given at their last appointment, GROUP BY is not the way to go.
A more versatile structure may be something like
SELECT Patient.ID, Patient.Name, Appointment.Date, Appointment.Assessment
FROM Patient INNER JOIN Appointment ON Patient.ID=Appointment.ID
WHERE Appointment.Date = (SELECT Max(Appointment.Date) FROM Appointment WHERE Appointment.ID = Patient.ID)
;
As an aside, you may want to think about whether you should use a field named 'ID' to refer to the ID of another table (in this case, the Appointment.ID field refers to the Patient.ID). You may make your db more readable if you leave the 'ID' field as an identifier specific to that table and refer to that field in other tables as OtherTableID or similar, i.e. PatientID in this case. Or go all the way and include the name of the actual table in its own ID field.
Edited after comment:
Not quite sure why it would crash. I just ran an equivalent query on 2 tables I have which are about 10,000 records each and it was pretty much instantaneous. Are your ID fields (i) unique numbers and (ii) indexed?
Another structure which should do the same thing (adapted for your field names and assuming that there is an ID field in Appointments which is unique) would be something like:
SELECT PatientInfo.UID, PatientInfo.Name, Appointments.StartDateTime, Appointments.Assessment
FROM PatientInfo INNER JOIN Appointments ON PatientInfo.UID = Appointments.PatientFID
WHERE Appointments.ID = (SELECT TOP 1 ID FROM Appointments WHERE Appointments.PatientFID = PatientInfo.UID ORDER BY StartDateTime DESC)
;
But that is starting to look a bit contrived. On my data they both produce the same result (as they should!) and are both almost instantaneous.
Always difficult to troubleshoot Access when it crashes - I guess you see no error codes or similar? Is this against a native .accdb database or another server?
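For anyone who wants to check the correlated-subquery structure against the sample data in the question, here is a small sketch run through SQLite from Python. SQLite is only a stand-in for Access, and the dates are stored as ISO strings (YYYY-MM-DD) so that MAX compares them correctly as text; both of those are assumptions of this sketch, not part of the original tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PatientInfo  (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Appointments (ID INTEGER, Date TEXT);  -- ID refers to PatientInfo.ID
INSERT INTO PatientInfo VALUES (1, 'John'), (2, 'Tom'), (3, 'Anna');
INSERT INTO Appointments VALUES
    (1, '2001-05-05'), (1, '2012-10-05'), (1, '2018-04-20'),
    (2, '1999-04-05'), (2, '2010-08-08'), (2, '1982-04-09'),
    (3, '1997-07-03'), (3, '2015-06-04'), (3, '2017-03-04');
""")

# Keep only the appointment whose date equals that patient's latest date.
latest = conn.execute("""
SELECT p.ID, p.Name, a.Date
FROM PatientInfo p
INNER JOIN Appointments a ON a.ID = p.ID
WHERE a.Date = (SELECT MAX(a2.Date)
                FROM Appointments a2
                WHERE a2.ID = p.ID)
ORDER BY p.ID
""").fetchall()

for row in latest:
    print(row)
```

Because the whole appointment row survives the filter, any other column of Appointments (such as an assessment) could be added to the SELECT list without touching a GROUP BY.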

Oracle SQL query comparing multiple rows with same identifier

I'm honestly not sure how to title this - so apologies if it is unclear.
I have two tables I need to compare. One table contains tree names and nodes that belong to that tree. Each Tree_name/Tree_node combo will have its own line. For example:
Table: treenode
| TREE_NAME | TREE_NODE |
|-----------|-----------|
| 1 | A |
| 1 | B |
| 1 | C |
| 1 | D |
| 1 | E |
| 2 | A |
| 2 | B |
| 2 | D |
| 3 | C |
| 3 | D |
| 3 | E |
| 3 | F |
I have another table that contains names of queries and what tree_nodes they use. Example:
Table: queryrecord
| QUERY | TREE_NODE |
|---------|-----------|
| Alpha | A |
| Alpha | B |
| Alpha | D |
| BRAVO | A |
| BRAVO | B |
| BRAVO | D |
| CHARLIE | A |
| CHARLIE | B |
| CHARLIE | F |
I need to create an SQL where I input the QUERY name, and it returns any ‘TREE_NAME’ that includes all the nodes associated with the query. So if I input ‘ALPHA’, it would return TREE_NAME 1 & 2. If I ask it for CHARLIE, it would return nothing.
I only have read access, and don’t believe I can create temp tables, so I’m not sure if this is possible. Any advice would be amazing. Thank you!

You can use group by and having as follows:
Select t.tree_name
From tree_node t
join query_record q
on t.tree_node = q.tree_node
WHERE q.query = 'ALPHA'
Group by t.tree_name
Having count(distinct t.tree_node)
= (Select count(distinct q.tree_node) From query_record q WHERE q.query = 'ALPHA');

Using an IN condition (a semi-join, which saves time over a join):
with prep (tree_node) as (select tree_node from queryrecord where query = :q)
select tree_name
from treenode
where tree_node in (select tree_node from prep)
group by tree_name
having count(*) = (select count(*) from prep)
;
:q in the prep subquery (in the with clause) is the bind variable to which you will assign the various QUERY values at runtime.
EDIT
I don't generally set up the test case on online engines; but in a comment below this answer, the OP said the query didn't work for him. So, I set up the example on SQLFiddle, here:
http://sqlfiddle.com/#!4/b575e/2
A couple of notes: for some reason, SQLFiddle thinks table names should be at most eight characters, so I had to change the second table name to queryrec (instead of queryrecord). I changed the name in the query, too, of course. And, second, I don't know how I can give bind values on SQLFiddle; I hard-coded the name 'Alpha'. (Note also that in the OP's sample data, this query value is not capitalized, while the other two are; of course, text values in SQL are case sensitive, so one should pay attention when testing.)
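The same IN-plus-HAVING query can also be tried locally. Below is a sketch using SQLite from Python as a stand-in for Oracle, with the sample data from the question; the helper name trees_covering is hypothetical, and the column name query is double-quoted only as a precaution in SQLite.

```python
import sqlite3

def trees_covering(conn, query_name):
    """Return the tree names that contain every node used by the given query."""
    sql = """
    WITH prep (tree_node) AS (
        SELECT tree_node FROM queryrecord WHERE "query" = :q
    )
    SELECT tree_name
    FROM treenode
    WHERE tree_node IN (SELECT tree_node FROM prep)
    GROUP BY tree_name
    HAVING COUNT(*) = (SELECT COUNT(*) FROM prep)
    ORDER BY tree_name
    """
    return [row[0] for row in conn.execute(sql, {"q": query_name})]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE treenode    (tree_name INTEGER, tree_node TEXT);
CREATE TABLE queryrecord ("query" TEXT, tree_node TEXT);
INSERT INTO treenode VALUES
    (1,'A'),(1,'B'),(1,'C'),(1,'D'),(1,'E'),
    (2,'A'),(2,'B'),(2,'D'),
    (3,'C'),(3,'D'),(3,'E'),(3,'F');
INSERT INTO queryrecord VALUES
    ('Alpha','A'),('Alpha','B'),('Alpha','D'),
    ('BRAVO','A'),('BRAVO','B'),('BRAVO','D'),
    ('CHARLIE','A'),('CHARLIE','B'),('CHARLIE','F');
""")

print(trees_covering(conn, "Alpha"))    # trees 1 and 2 contain A, B and D
print(trees_covering(conn, "CHARLIE"))  # no tree contains A, B and F
```

Note that the sketch relies on (tree_name, tree_node) and (query, tree_node) pairs being unique, as they are in the sample data; with duplicates, COUNT(DISTINCT ...) would be the safer comparison.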

You can do this with a join and aggregation. The trick is to count the number of nodes in query_record before joining:
select qr.query, t.tree_name
from (select qr.*,
count(*) over (partition by query) as num_tree_node
from query_record qr
) qr join
tree_node t
on t.tree_node = qr.tree_node
where qr.query = 'ALPHA'
group by qr.query, t.tree_name, qr.num_tree_node
having count(*) = qr.num_tree_node;
Here is a db<>fiddle.

Make a 1 to 1 multi-field SQL join where only some of the values match

I am trying to build a table that will be used as a conversion chart. I aim to make a simple join with this conversion table on multiple fields (8 in my case), and get a result. I will try to simplify the examples as much as I can because the original chart is a 40x10 matrix.
Let's say that I have these two (I know they don't make much sense and have bad design but they are just examples):
supply_conversion_chart
---
supply (integer)
customer_id (integer)
product_id (integer)
size (varchar)
purchase_type (varchar)
purchases
---
customer_id (integer)
product_id (integer)
size (varchar)
purchase_type (varchar)
and conversion chart would look something like this:
| supply | customer_id | product_id | size | purchase_type |
|--------|--------------|------------|----------|---------------|
| 100 | 1 | anything | anything | online |
| 101 | 1 | anything | anything | offline |
| 102 | other than 1 | anything | anything | online |
| 103 | 1 | 5 | XXL | online |
The main goal was to get an exact supply value with a simple join, something like:
SELECT supply
FROM purchases p
JOIN supply_conversion_chart scc ON
p.customer_id = scc.customer_id AND
p.product_id = scc.product_id AND
p.size = scc.size AND
p.purchase_type = scc.purchase_type;
Let's say that these are the records on purchases table:
| customer_id | product_id | size | purchase_type |
|-------------|------------|------|---------------|
| 1 | 3 | M | online |
| 1 | 5 | S | offline |
| 12345 | 4 | XL | online |
| 1 | 5 | XXL | online |
| 4353 | null | M | online |
I would expect the first record's supply value to be 100, the second's 101, the third's 102, the fourth's 103, and the fifth's 102. However, as far as I know, SQL won't be able to do a proper join on any of these records except the fourth one, which fully matches supply 103 in the supply_conversion_chart table. I don't know if it is even possible to do a join on multiple fields when some of those fields do not fully match.
My approach is probably faulty and there are better ways to get the results I am trying to achieve but I don't even know where to start. What should I do?
Note that the original chart is much bigger than the provided example, and that I will be doing a join on 8 different fields.

One approach is a lateral join:
select p.*, scc.*
from purchases p left join lateral
(select scc.*
from supply_conversion_chart scc
where (scc.customer_id = p.customer_id or scc.customer_id is null) and
(scc.product_id = p.product_id or scc. product_id is null) and
(scc.size = p.size or scc.size is null) and
(scc.purchase_type = p.purchase_type or scc.purchase_type is null)
-- prefer the most specific row: count the columns that are not wildcards (NULL)
order by ( (scc.customer_id is not null)::int +
(scc.product_id is not null)::int +
(scc.size is not null)::int +
(scc.purchase_type is not null)::int
) desc
limit 1
) scc on true;
Note: This represents "everything" as NULL. It doesn't have special logic for "customer other than 1". However, it does show you how to implement basically what you are trying to do.
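If you want to experiment with the "most specific row wins" idea without a Postgres instance, here is a sketch using SQLite from Python. SQLite has no LATERAL, so this swaps in a correlated scalar subquery that returns only the supply column; as in the note above, "anything" is stored as NULL, and "other than 1" is simplified to a NULL wildcard that loses to more specific rows. The chart and purchase rows are the ones from the question under those assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supply_conversion_chart (
    supply INTEGER, customer_id INTEGER, product_id INTEGER,
    size TEXT, purchase_type TEXT);
CREATE TABLE purchases (
    customer_id INTEGER, product_id INTEGER, size TEXT, purchase_type TEXT);
-- NULL means "anything"
INSERT INTO supply_conversion_chart VALUES
    (100, 1,    NULL, NULL,  'online'),
    (101, 1,    NULL, NULL,  'offline'),
    (102, NULL, NULL, NULL,  'online'),
    (103, 1,    5,    'XXL', 'online');
INSERT INTO purchases VALUES
    (1,     3,    'M',   'online'),
    (1,     5,    'S',   'offline'),
    (12345, 4,    'XL',  'online'),
    (1,     5,    'XXL', 'online'),
    (4353,  NULL, 'M',   'online');
""")

# For each purchase, pick the matching chart row with the most non-wildcard columns.
supplies = [row[0] for row in conn.execute("""
SELECT (SELECT scc.supply
        FROM supply_conversion_chart scc
        WHERE (scc.customer_id   = p.customer_id   OR scc.customer_id   IS NULL)
          AND (scc.product_id    = p.product_id    OR scc.product_id    IS NULL)
          AND (scc.size          = p.size          OR scc.size          IS NULL)
          AND (scc.purchase_type = p.purchase_type OR scc.purchase_type IS NULL)
        ORDER BY ((scc.customer_id IS NOT NULL) + (scc.product_id IS NOT NULL)
                + (scc.size IS NOT NULL) + (scc.purchase_type IS NOT NULL)) DESC
        LIMIT 1) AS supply
FROM purchases p
ORDER BY p.rowid
""")]

print(supplies)
```

Two chart rows with the same number of non-NULL columns that both match a purchase would tie, and which one LIMIT 1 picks is then unspecified; a real chart would need an explicit tie-breaker.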

MS Access SQL join only display extra records

I'm trying to display left over records after matching one-to-one rows. How do I display extra/left over records after joining two tables?
Suppose I have two tables, A and B. They both display the same transactions at the end of the day. However, Table A has more detail about the records but is late in getting updated. Table B, on the other hand, has limited information about transactions but is updated several hours before Table A.
I need a query that can return which records have yet to appear in Table A from Table B.
TABLE A
+-------+-----+---------+----------+---------------------------+
| NAME | ID | AMOUNT | TYPE | PROCESSED TIMESTAMP |
+-------+-----+---------+----------+---------------------------+
| ABC | 123 | -420.07 | PURCHASE | 2018-09-06-08.26.32.000000|
| ABC | 123 | 420.07 | REFUND | 2018-09-06-07.12.18.000000|
| BBC | 456 | -5.00 | PURCHASE | 2018-09-06-10.25.13.000000|
+-------+-----+---------+----------+---------------------------+
TABLE B
+----+----------+---------------------------+
| ID | AMOUNT | RECEIVED TIMESTAMP |
+----+----------+---------------------------+
|123 | -420.07 | 2018-09-05-09.26.15.000000|
|123 | 420.07 | 2018-09-05-08.12.03.000000|
|123 | -420.07 | 2018-09-05-08.40.00.000000|
|456 | -5.00 | 2018-09-05-08.45.00.000000|
+----+----------+---------------------------+
QUERY RESULTS
+----+----------+
| ID | AMOUNT |
+----+----------+
|123 | -420.07 |
+----+----------+
I can manage to find all the records related to the ID that is "off balance" but I need only the specific records that are extra:
SELECT * FROM b
WHERE id
IN
(SELECT d.id AS id
FROM
(SELECT * FROM
(SELECT id, ROUND(SUM(amount),2) AS balance FROM a GROUP BY id) c
RIGHT JOIN
(SELECT id, ROUND(SUM(amount),2) AS balance FROM b GROUP BY id) d
ON c.id = d.id
WHERE c.balance <> d.balance))
Yields...
+----+----------+
| ID | AMOUNT |
+----+----------+
|123 | -420.07 |
|123 | 420.07 |
|123 | -420.07 |
+----+----------+

You need to read up more on joins. There are 3 basic joins which can make life much simpler.
INNER JOIN: First, this is not asked, but the query you have provided for finding off balance items is too complicated. It can be simplified by an inner join.
Inner join is a set operation which will basically get data from both tables (set) which match the condition.
select * from
(select id, sum(amount) as balance from a group by id) group_A
INNER JOIN (select id, sum(amount) as balance from b group by id) group_B
ON group_A.id = group_B.id
WHERE group_A.balance <> group_B.balance
LEFT/RIGHT OUTER JOIN: A left outer join returns all the data that is present in both sets, plus the data that is in the left set but not in the right set. A right join is essentially the same operation on the right set. It is important to note that the extra records pulled in this way are null on the side where they do not exist.
Since you want the records that are present in table B but not in A, there are multiple ways of achieving this. One is to get the records present in both tables (inner join) and then get all the records in table B that are not in that inner join, using the definitions of group_A/group_B from the inner join example:
select id from b where id not in (
select group_A.id from group_A INNER JOIN group_B on group_A.id = group_B.id)
Or you can do a right outer join and use the fact that the fields fetched from table A come back as null to filter for the required IDs:
select group_B.id from group_A RIGHT OUTER JOIN group_B ON group_A.id = group_B.id
where group_A.id is null
Please use the primary keys on the joins to get the correct results as mentioned by user #ComputerVersteher

I think you should add a primary key column.
Without one, I can't match the data between tables A and B, and I can't tell these 2 rows in table B apart:
+----+----------+---------------------------+
| ID | AMOUNT | RECEIVED TIMESTAMP |
+----+----------+---------------------------+
|123 | -420.07 | 2018-09-05-09.26.15.000000|<-
|123 | 420.07 | 2018-09-05-08.12.03.000000|
|123 | -420.07 | 2018-09-05-08.40.00.000000|<-
|456 | -5.00 | 2018-09-05-08.45.00.000000|
+----+----------+---------------------------+
I added a new column (deal_no) and got it working:
https://www.db-fiddle.com/f/3GfZoQwGhBLf7YWf2RucBF/4
select tmp_B.deal_no, tmp_B.id, tmp_B.amount, tmp_A.deal_no
from tmp_B
left outer join tmp_A
on tmp_A.deal_no = tmp_B.deal_no
where tmp_A.deal_no is null
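Since the fiddle link may not stay alive, here is a self-contained sketch of the same LEFT JOIN ... IS NULL anti-join, run through SQLite from Python. The deal_no values are hypothetical (I simply numbered the rows of table B 1 to 4 and assumed table A has not yet received deal 3), so which of the two identical -420.07 rows counts as "extra" is an assumption of the sample data, not something the original tables can tell you.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tmp_A (deal_no INTEGER PRIMARY KEY, name TEXT, id INTEGER,
                    amount REAL, type TEXT);
CREATE TABLE tmp_B (deal_no INTEGER PRIMARY KEY, id INTEGER, amount REAL);
-- deal_no is a made-up key shared by both tables
INSERT INTO tmp_A VALUES
    (1, 'ABC', 123, -420.07, 'PURCHASE'),
    (2, 'ABC', 123,  420.07, 'REFUND'),
    (4, 'BBC', 456,   -5.00, 'PURCHASE');
INSERT INTO tmp_B VALUES
    (1, 123, -420.07),
    (2, 123,  420.07),
    (3, 123, -420.07),
    (4, 456,   -5.00);
""")

# Rows of tmp_B with no partner in tmp_A: the joined tmp_A columns are NULL.
missing_from_a = conn.execute("""
SELECT b.deal_no, b.id, b.amount
FROM tmp_B b
LEFT JOIN tmp_A a ON a.deal_no = b.deal_no
WHERE a.deal_no IS NULL
ORDER BY b.deal_no
""").fetchall()

print(missing_from_a)
```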

Oracle 10 SQL: FULL JOIN through Cross Reference Table

http://sqlfiddle.com/#!4/24637/1
I have three tables, (better details/data shown in sqlfiddle link), one replacing another, and a cross reference table in between. One of the fields in each of the table uses the cross reference (version), and another one of the fields in each of the tables is the same (changeID).
I need a query that, when passed a list of new_version values, returns new_version + NEW_changeType, along with the equivalent original_version + OLD_changeType (if there is an old version equivalent), plus any old changeIDs that were 'missed' in the conversion of data.
TABLES (fields on the same line are equivalent)
OLD_table | XREF_table | NEW_Table
original_version | original_version |
changeID | | changeID
OLD_changeType | |
| new_version | new_version
| | NEW_changeType
DATA
111,1,CT1 | 111,AAA | AAA,1,ONE
111,2,CT2 | 222,BBB | AAA,2,TWO
222,1,CT1 | 333,DDD | BBB,1,ONE
222,2,CT2 | | BBB,2,TWO
222,3,CT3 | | CCC,1,ONE
333,1,CT1 | |
444,1,CT1 | |
If passed the following list, the result set should look like so (order doesn't matter):
AAA,BBB,CCC
| NEW_VERSION | NEW_CHANGE_TYPE| ORIGINAL_VERSION | CHANGEID | OLD_CHANGE_TYPE |
|-------------|----------------|------------------|----------|-----------------|
| AAA | ONE | 111 | 1 | CT1 |
| AAA | TWO | 111 | 2 | CT2 |
| BBB | ONE | 222 | 1 | CT1 |
| BBB | TWO | 222 | 2 | CT2 |
| CCC | ONE | (null) | (null) | (null) |
| (null) | (null) | 222 | 3 | CT3 |
I'm having trouble getting ALL the data required. I've played with the following query, however I seem to either 1) miss a row or 2) get additional rows not matching the requirements.
The queries I've played with are as follows.
select
a.new_version,
a.Change_type,
c.original_version,
c.changeID,
c.OLD_Change_type
from NEW_TABLE a
LEFT OUTER JOIN XREF_TABLE b on a.new_version = b.new_version
FULL OUTER JOIN OLD_TABLE c on
b.original_version = c.original_version and a.changeID = c.changeID
where (b.new_version in ('AAA','BBB','CCC') or b.new_version is null);
select
a.new_version,
a.Change_type,
c.original_version,
c.changeID,
c.OLD_Change_type
from NEW_TABLE a
FULL JOIN XREF_TABLE b on a.new_version = b.new_version
FULL JOIN OLD_TABLE c on
b.original_version = c.original_version and a.changeID = c.changeID
where (a.new_version in ('AAA','BBB','CCC'));
The first returns one 'extra' row with the 333,DDD data, which is not specified from the input.
The second returns one row too few (the one with the changeID from the old table that was "missed" when this data was converted over).
Any thoughts or suggestions on how to solve this?

First inner join old_table and xref_table, as you are not interested in any old_table entries without an xref_table entry. Then full outer join new_table. In your WHERE clause be aware that new_table.new_version can be null, so use coalesce to use xref_table.new_version in this case to limit your results to AAA, BBB and CCC. That's all.
select
coalesce(n.new_version, x.new_version) as new_version,
n.change_type,
o.original_version,
o.changeid,
o.old_change_type
from old_table o
inner join xref_table x
on x.original_version = o.original_version
full outer join new_table n
on n.new_version = x.new_version
and n.changeid = o.changeid
where coalesce(n.new_version, x.new_version) in ('AAA','BBB','CCC')
order by 1,2,3,4,5
;
Here is your fiddle: http://sqlfiddle.com/#!4/24637/11.
BTW: Better never use random aliases like a, b and c that don't indicate what table is meant. That makes the query harder to understand. Use the table's first letter(s) or an acronym instead.
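To double-check which rows the inner-join-then-full-join logic keeps, here is a sketch against the sample data using SQLite from Python. Older SQLite versions lack FULL OUTER JOIN, so this swaps in an emulation: a LEFT JOIN from the old/xref side, plus a UNION ALL of the new_table rows that have no partner. That is a different formulation from the Oracle query above, and the lower-case table and column names are assumptions modelled on it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE old_table  (original_version INTEGER, changeid INTEGER, old_change_type TEXT);
CREATE TABLE xref_table (original_version INTEGER, new_version TEXT);
CREATE TABLE new_table  (new_version TEXT, changeid INTEGER, change_type TEXT);
INSERT INTO old_table VALUES
    (111,1,'CT1'),(111,2,'CT2'),(222,1,'CT1'),(222,2,'CT2'),
    (222,3,'CT3'),(333,1,'CT1'),(444,1,'CT1');
INSERT INTO xref_table VALUES (111,'AAA'),(222,'BBB'),(333,'DDD');
INSERT INTO new_table VALUES
    ('AAA',1,'ONE'),('AAA',2,'TWO'),('BBB',1,'ONE'),('BBB',2,'TWO'),('CCC',1,'ONE');
""")

rows = conn.execute("""
-- part 1: every old row that has an xref entry, with its new row if one exists
SELECT x.new_version, n.change_type, o.original_version, o.changeid, o.old_change_type
FROM old_table o
INNER JOIN xref_table x ON x.original_version = o.original_version
LEFT JOIN new_table n ON n.new_version = x.new_version
                     AND n.changeid = o.changeid
WHERE x.new_version IN ('AAA','BBB','CCC')
UNION ALL
-- part 2: new rows with no matching old row (the other half of the full join)
SELECT n.new_version, n.change_type, NULL, NULL, NULL
FROM new_table n
WHERE n.new_version IN ('AAA','BBB','CCC')
  AND NOT EXISTS (SELECT 1
                  FROM old_table o
                  INNER JOIN xref_table x ON x.original_version = o.original_version
                  WHERE x.new_version = n.new_version
                    AND o.changeid = n.changeid)
""").fetchall()

for row in rows:
    print(row)
```

As with the coalesce in the answer, the "missed" old row (222, 3, CT3) comes back with new_version BBB taken from the cross reference rather than NULL, while its change_type stays NULL.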
