I need to show all records for a specific value if ANY one of those records has another specific value. Essentially: if field3 = 'b', what is field1? Then show all records with that value of field1, regardless of their field3 value.
Record Number | External Id | Letter
1 | 123456 | a
2 | 123456 | b
3 | 123456 | c
4 | 456852 | t
5 | 456852 | b
Record 2 has a letter value of 'b', so I want to look at its external id, which is 123456. Now I want to pull all records for that external id, regardless of whether the other records have a letter value of 'b'.
Use EXISTS and a correlated subquery:
SELECT *
FROM mytable t
WHERE
    t.letter = 'b'
    OR EXISTS (
        SELECT 1
        FROM mytable t1
        WHERE
            t1.record_number != t.record_number
            AND t1.external_id = t.external_id
            AND t1.letter = 'b'
    )
Another option is to use a window function:
SELECT record_number, external_id, letter
FROM (
    SELECT
        t.*,
        MAX(CASE WHEN letter = 'b' THEN 1 END) OVER (PARTITION BY external_id) AS mx
    FROM mytable t
) x
WHERE mx = 1
Output (demo on DB Fiddle):
record_number | external_id | letter
------------: | ----------: | :-----
1 | 123456 | a
2 | 123456 | b
3 | 123456 | c
4 | 456852 | t
5 | 456852 | b
Use exists, but don't worry about filtering in the outer query:
select t.*
from t
where exists (select 1
from t t2
where t2.external_id = t.external_id and t2.letter = 'b'
);
With an index on (external_id, letter), I would expect this to have very good performance.
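To check that the EXISTS approach and the window-function approach return the same rows, here is a minimal sketch that runs both against the sample data. It uses SQLite through Python's built-in sqlite3 module as a stand-in for whatever database the question targets (the window-function query needs SQLite 3.25 or later). The table name mytable follows the first answer, and record 6 with external id 999999 is a hypothetical extra row, added only so that something gets filtered out.

```python
import sqlite3

# Sample rows from the question, plus one hypothetical row (record 6)
# whose external_id has no 'b' row, so the filter has something to exclude.
ROWS = [
    (1, 123456, "a"),
    (2, 123456, "b"),
    (3, 123456, "c"),
    (4, 456852, "t"),
    (5, 456852, "b"),
    (6, 999999, "x"),  # hypothetical, not in the question
]

# Keep a row if any row with the same external_id has letter 'b'.
EXISTS_SQL = """
SELECT t.record_number
FROM mytable t
WHERE EXISTS (SELECT 1
              FROM mytable t2
              WHERE t2.external_id = t.external_id
                AND t2.letter = 'b')
ORDER BY t.record_number
"""

# Same filter with a window function: mx is 1 for every row of a group
# that contains a 'b', and NULL otherwise.
WINDOW_SQL = """
SELECT record_number
FROM (SELECT t.*,
             MAX(CASE WHEN letter = 'b' THEN 1 END)
                 OVER (PARTITION BY external_id) AS mx
      FROM mytable t) x
WHERE mx = 1
ORDER BY record_number
"""


def run(sql):
    """Load the sample rows into a fresh in-memory table and return record numbers."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (record_number INTEGER, external_id INTEGER, letter TEXT)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)
    result = [row[0] for row in conn.execute(sql)]
    conn.close()
    return result


exists_ids = run(EXISTS_SQL)
window_ids = run(WINDOW_SQL)
print(exists_ids)  # [1, 2, 3, 4, 5]
print(window_ids)  # [1, 2, 3, 4, 5]
```

Both queries drop only the hypothetical record 6, since its group has no 'b'.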
table 1
id | name | gender
1 | ABC | M
2 | CDE | M
3 | FGH | M
table 2
id | name | gender
4 | BAC | F
5 | DCE | F
6 | GFH | F
How can I produce output like this in an Oracle database?
id | name | gender
1 | ABC | M
2 | CDE | M
3 | FGH | M
4 | BAC | F
5 | DCE | F
6 | GFH | F
Use UNION [ALL]:
select * from table1
union all
select * from table2;
P.S. UNION removes duplicate rows from the combined result, while UNION ALL simply concatenates the rows of both SELECT statements, keeping any duplicates.
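The difference between UNION and UNION ALL is easy to see with a single duplicated row. This sketch uses SQLite through Python's sqlite3 module rather than Oracle (so no FROM dual is needed), and the extra (1, 'ABC', 'M') row in table2 is hypothetical, added only to create a duplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, name TEXT, gender TEXT)")
conn.execute("CREATE TABLE table2 (id INTEGER, name TEXT, gender TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)",
                 [(1, "ABC", "M"), (2, "CDE", "M"), (3, "FGH", "M")])
# Rows 4-6 come from the question; (1, 'ABC', 'M') is a hypothetical
# duplicate of a table1 row, added only to show the difference.
conn.executemany("INSERT INTO table2 VALUES (?, ?, ?)",
                 [(4, "BAC", "F"), (5, "DCE", "F"), (6, "GFH", "F"),
                  (1, "ABC", "M")])

# UNION ALL keeps every row from both sides.
union_all = conn.execute(
    "SELECT * FROM table1 UNION ALL SELECT * FROM table2 ORDER BY id").fetchall()
# UNION removes duplicate rows from the combined result.
union = conn.execute(
    "SELECT * FROM table1 UNION SELECT * FROM table2 ORDER BY id").fetchall()
print(len(union_all), len(union))  # 7 6
```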
If you really need to "join" the two tables, a FULL JOIN works:
with a as (
select 1 id, 'ABC' name, 'M' gender from dual union all
select 2 id, 'CDE' name, 'M' gender from dual union all
select 3 id, 'FGH' name, 'M' gender from dual ),
b as (
select 4 id, 'BAC' name, 'F' gender from dual union all
select 5 id, 'DCE' name, 'F' gender from dual union all
select 6 id, 'GFH' name, 'F' gender from dual )
select coalesce(a.id, b.id) id,
coalesce(a.name, b.name) name,
coalesce(a.gender, b.gender) gender
from a
full join b
on a.id = b.id
/* if name, gender not in pk */
-- and a.name = b.name
-- and a.gender = b.gender
;
In this case rows with duplicate IDs are merged into one, and COALESCE returns the first non-null value of the name and gender columns.
You could also use GREATEST, LEAST, etc. instead of COALESCE, but note that in Oracle these return NULL if any argument is NULL, so rows found in only one of the tables would get NULL values.
P.S. Be careful if the tables don't have a primary key!
I am working on a project where I must examine duplicate records and decide which of the records to keep. There is a general set of criteria to be met, based on the attributes we are looking at. The following tables show the relationships between the criteria.
Table1
+--------+-----+-------+-------+------+--------+
| dup_id | idm | ucode | great | good | yo2005 |
+--------+-----+-------+-------+------+--------+
| a      |   1 |     6 | yes   | yes  | yes    |
| a      |   2 |     1 | no    | yes  | yes    |
| a      |   3 |     1 | no    | no   | yes    |
| b      |   4 |     1 | yes   | yes  | no     |
| b      |   5 |     1 | no    | no   | no     |
| c      |   6 |     7 | no    | no   | yes    |
| c      |   7 |     1 | yes   | no   | no     |
| d      |   8 |     6 | no    | yes  | no     |
| d      |   9 |     1 | yes   | no   | no     |
| e      |  10 |     3 | yes   | no   | yes    |
| e      |  11 |     4 | no    | yes  | no     |
| f      |  12 |     1 | yes   | yes  | yes    |
| f      |  13 |     1 | yes   | no   | yes    |
| g      |  14 |     1 | no    | no   | yes    |
| g      |  15 |     1 | yes   | no   | no     |
+--------+-----+-------+-------+------+--------+
Table 2
+-----+--------+
| ido | yo1998 |
+-----+--------+
|   1 | yes    |
|   2 | no     |
|   3 | no     |
|   4 | no     |
|   5 | no     |
|   6 | no     |
|   7 | no     |
|   8 | yes    |
|   9 | yes    |
|  10 | yes    |
|  11 | yes    |
|  12 | yes    |
|  13 | no     |
|  14 | yes    |
|  15 | no     |
+-----+--------+
The tables have other records we would like to keep, but these are the main ones that fit the criteria.
Table1
• dup_id - the ID of a group of duplicate records; a group can have 2 or more records
• idm - the ID of records in Table1; matches ido in Table2
• ucode - a duplicate flag from a previous classification. A value of 6 means the record is considered a duplicate (but for some reason the new algorithm accepted it as a non-duplicate)
• great - this field is preferred because it was verified at some point
• good - this field is preferred, but has not been verified
• yo2005 - data collected in 2005
Table2
• ido - the ID of records in Table2; matches idm in Table1
• yo1998 - data collected in 1998
The issue is that we have so many records to sift through. What I have been attempting is to develop a query for each criterion to filter down the data we need to look at.
The criteria
The order of importance of the criteria is as follows:
• ucode - if one of the records in a dupid has ucode = 6, it is already known to be a duplicate record, so the other ucodes take precedence. For example, dupid d has 2 records, so we know that the correct one is idm=8. If our table has 10,000 records, this may pick up 2000 of them, which leaves us with 8000 to be manually examined.
• great - this is the 2nd level of importance for us. If great = yes, then we would like this record to be selected from any records that were not resolved by the first query. For example, of the 8000 left from the query above, this might pick up another 1000, leaving us with 7000 to be manually examined.
• good - this is the 3rd level of importance to us. If great = no but good = yes, then this would be our choice for anything not resolved earlier. For example, of the 7000 left from the query above, this might pick up another 500, leaving us with 6500 to be manually examined.
• At this point we have 2 tables involved: our 4th priority is that both yo2005 and yo1998 = yes. For example, of the 6500 left in the query above, this might pick up another 1000, leaving us with 5500 to be manually examined.
• If they are not both yes, yo2005 = yes is our 5th priority. For example, of the 5500 left in the query above, this might pick up another 2000, leaving us with 3500 to be manually examined.
• yo1998 = 'yes' is our final priority. For example, of the 3500 left in the query above, this might pick up another 1000, leaving us with 2500 to be manually examined.
As you can see, this would cut out a great deal of the manual examination of the records.
Ideally, there would be 2 output tables: one for all the records that fit the criteria (which is 7500 records), perhaps with a new field recording the justification, i.e. which criterion the record was selected by. We would also need another table containing the records that did not meet any of the criteria, so that we can investigate those further to decide which is the duplicate. Unfortunately, I am not very well versed in SQL, so I don't even know if something like this is possible.
Thank you for your time.
You can write all of these in SQL. Below is the ucode one. It selects every dupid that has exactly two records, one of which has ucode = 6, and then picks the other record:
SELECT *
FROM t1
WHERE ucode <> 6
  AND dupid IN
      (SELECT dupid
       FROM t1
       INNER JOIN t2
          ON t1.idm = t2.ido
       GROUP BY dupid
       HAVING COUNT(*) = 2
          AND EXISTS
              (SELECT 1
               FROM t1 sub
               WHERE ucode = 6
                 AND sub.dupid = t1.dupid))
This one will give you all the records marked as great whose ucode is not 6:
SELECT *
FROM t1
WHERE great = 'yes'
AND ucode <> 6
This one will give you all the records marked as good that do not have a record in the same dupid marked as great, excluding those with ucode = 6:
SELECT *
FROM t1
WHERE good = 'yes'
  AND ucode <> 6
  AND NOT EXISTS
      (SELECT 1
       FROM t1 sub
       WHERE great = 'yes'
         AND sub.dupid = t1.dupid)
This one finds all records where yo2005 = yes and ucode is not 6, in dupids where no record has great = yes or good = yes:
SELECT *
FROM t1
WHERE yo2005 = 'yes'
  AND ucode <> 6
  AND NOT EXISTS
      (SELECT 1
       FROM t1 sub
       WHERE (great = 'yes'
              OR good = 'yes')
         AND sub.dupid = t1.dupid)
Finally, this one shows the records where yo1998 = yes and all other conditions fail:
SELECT *
FROM t1
INNER JOIN t2
   ON t1.idm = t2.ido
WHERE yo1998 = 'yes'
  AND ucode <> 6
  AND NOT EXISTS
      (SELECT 1
       FROM t1 sub
       WHERE (great = 'yes'
              OR good = 'yes'
              OR yo2005 = 'yes')
         AND sub.dupid = t1.dupid)
Hopefully these will be useful to you!
I am not sure how you are going to use this, but it may help. I believe it gives you the two tables in your last paragraph (combined into one): "priority" is a number corresponding to your six criteria, and rows that meet none of them get a NULL priority.
with
table_1 ( dup_id, idm, ucode, great, good, yo2005 ) as (
select 'a', 1, 6, 'yes', 'yes', 'yes' from dual union all
select 'a', 2, 1, 'no' , 'yes', 'yes' from dual union all
select 'a', 3, 1, 'no' , 'no' , 'yes' from dual union all
select 'b', 4, 1, 'yes', 'yes', 'no' from dual union all
select 'b', 5, 1, 'no' , 'no' , 'no' from dual union all
select 'c', 6, 7, 'no' , 'no' , 'yes' from dual union all
select 'c', 7, 1, 'yes', 'no' , 'no' from dual union all
select 'd', 8, 6, 'no' , 'yes', 'no' from dual union all
select 'd', 9, 1, 'yes', 'no' , 'no' from dual union all
select 'e', 10, 3, 'yes', 'no' , 'yes' from dual union all
select 'e', 11, 4, 'no' , 'yes', 'no' from dual union all
select 'f', 12, 1, 'yes', 'yes', 'yes' from dual union all
select 'f', 13, 1, 'yes', 'no' , 'yes' from dual union all
select 'g', 14, 1, 'no' , 'no' , 'yes' from dual union all
select 'g', 15, 1, 'yes', 'no' , 'no' from dual
),
table_2 ( ido, yo1998 ) as (
select 1, 'yes' from dual union all
select 2, 'no' from dual union all
select 3, 'no' from dual union all
select 4, 'no' from dual union all
select 5, 'no' from dual union all
select 6, 'no' from dual union all
select 7, 'no' from dual union all
select 8, 'yes' from dual union all
select 9, 'yes' from dual union all
select 10, 'yes' from dual union all
select 11, 'yes' from dual union all
select 12, 'yes' from dual union all
select 13, 'no' from dual union all
select 14, 'yes' from dual union all
select 15, 'no' from dual
)
select t1.dup_id, t1.idm, t1.ucode, t1.great, t1.good, t1.yo2005, t2.yo1998,
case when ucode = 6 then 1
when great = 'yes' then 2
when good = 'yes' then 3
when yo2005 = 'yes' then case when yo1998 = 'yes' then 4
else 5
end
when yo1998 = 'yes' then 6
end as priority
from table_1 t1 left outer join table_2 t2 on t1.idm = t2.ido
order by dup_id, priority
;
Output:
DUP_ID IDM UCODE GREAT GOOD YO2005 YO1998 PRIORITY
------ --- ----- ----- ---- ------ ------ --------
a      1   6     yes   yes  yes    yes    1
a      2   1     no    yes  yes    no     3
a      3   1     no    no   yes    no     5
b      4   1     yes   yes  no     no     2
b      5   1     no    no   no     no
c      7   1     yes   no   no     no     2
c      6   7     no    no   yes    no     5
d      8   6     no    yes  no     yes    1
d      9   1     yes   no   no     yes    2
e      10  3     yes   no   yes    yes    2
e      11  4     no    yes  no     yes    3
f      12  1     yes   yes  yes    yes    2
f      13  1     yes   no   yes    no     2
g      15  1     yes   no   no     no     2
g      14  1     no    no   yes    yes    4
15 rows selected
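The same CASE-based classification can be tried outside Oracle. Below is a minimal sketch using SQLite via Python's sqlite3 module (an assumption; the answer itself targets Oracle). It loads the sample rows, computes the priority per idm, and splits the result into the two lists the question asked for: rows that met some criterion, and rows whose priority is NULL and still need manual review.

```python
import sqlite3

# Sample rows from the question: (dup_id, idm, ucode, great, good, yo2005)
TABLE_1 = [
    ("a", 1, 6, "yes", "yes", "yes"), ("a", 2, 1, "no", "yes", "yes"),
    ("a", 3, 1, "no", "no", "yes"), ("b", 4, 1, "yes", "yes", "no"),
    ("b", 5, 1, "no", "no", "no"), ("c", 6, 7, "no", "no", "yes"),
    ("c", 7, 1, "yes", "no", "no"), ("d", 8, 6, "no", "yes", "no"),
    ("d", 9, 1, "yes", "no", "no"), ("e", 10, 3, "yes", "no", "yes"),
    ("e", 11, 4, "no", "yes", "no"), ("f", 12, 1, "yes", "yes", "yes"),
    ("f", 13, 1, "yes", "no", "yes"), ("g", 14, 1, "no", "no", "yes"),
    ("g", 15, 1, "yes", "no", "no"),
]
# Table 2: ido values whose yo1998 is 'yes'; every other ido is 'no'.
YES_1998 = {1, 8, 9, 10, 11, 12, 14}
TABLE_2 = [(ido, "yes" if ido in YES_1998 else "no") for ido in range(1, 16)]

# First matching WHEN wins, so the order of the branches is the priority order.
PRIORITY_SQL = """
SELECT t1.idm,
       CASE WHEN t1.ucode = 6 THEN 1
            WHEN t1.great = 'yes' THEN 2
            WHEN t1.good = 'yes' THEN 3
            WHEN t1.yo2005 = 'yes' THEN
                 CASE WHEN t2.yo1998 = 'yes' THEN 4 ELSE 5 END
            WHEN t2.yo1998 = 'yes' THEN 6
       END AS priority
FROM table_1 t1
LEFT OUTER JOIN table_2 t2 ON t1.idm = t2.ido
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (dup_id TEXT, idm INTEGER, ucode INTEGER,"
             " great TEXT, good TEXT, yo2005 TEXT)")
conn.execute("CREATE TABLE table_2 (ido INTEGER, yo1998 TEXT)")
conn.executemany("INSERT INTO table_1 VALUES (?, ?, ?, ?, ?, ?)", TABLE_1)
conn.executemany("INSERT INTO table_2 VALUES (?, ?)", TABLE_2)

priority = dict(conn.execute(PRIORITY_SQL).fetchall())  # idm -> priority
# The two "output tables" from the question:
matched = sorted(idm for idm, p in priority.items() if p is not None)
needs_review = sorted(idm for idm, p in priority.items() if p is None)
print(needs_review)  # [5]
```

The per-row priorities match the Oracle output above; only idm 5 meets none of the criteria.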
ADDED: Here is one way to use this (as a subquery) to further analyze the results. DUP_ID = a and d don't appear in the output at all, since each has a row with UCODE = 6; for every other DUP_ID, the row with the best (lowest-numbered) PRIORITY is selected. If there are ties, an arbitrary row among those with the best PRIORITY for that DUP_ID is selected.
with
table_1 ( dup_id, idm, ucode, great, good, yo2005 ) as (
....
),
table_2 ( ido, yo1998 ) as (
....
),
final ( dup_id, idm, ucode, great, good, yo2005, yo1998, priority ) as (
select t1.dup_id, t1.idm, t1.ucode, t1.great, t1.good, t1.yo2005, t2.yo1998,
case when ucode = 6 then 1
when great = 'yes' then 2
when good = 'yes' then 3
when yo2005 = 'yes' then case when yo1998 = 'yes' then 4
else 5
end
when yo1998 = 'yes' then 6
end as priority
from table_1 t1 left outer join table_2 t2 on t1.idm = t2.ido
),
o ( dup_id, idm, ucode, great, good, yo2005, yo1998, priority, rn ) as (
select dup_id, idm, ucode, great, good, yo2005, yo1998, priority,
row_number() over (partition by dup_id order by priority)
from final
)
select dup_id, idm, ucode, great, good, yo2005, yo1998, priority
from o
where rn = 1 and priority > 1;
DUP_ID IDM UCODE GREAT GOOD YO2005 YO1998 PRIORITY
------ --- ----- ----- ---- ------ ------ --------
b      4   1     yes   yes  no     no     2
c      7   1     yes   no   no     no     2
e      10  3     yes   no   yes    yes    2
f      12  1     yes   yes  yes    yes    2
g      15  1     yes   no   no     no     2
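The ROW_NUMBER step can be reproduced in SQLite 3.25+ through Python's sqlite3 module, used here only as a stand-in for Oracle. One difference matters: SQLite sorts NULLs first in ascending order while Oracle sorts them last, so this sketch orders by `priority IS NULL` first to keep Oracle's behavior. It also adds idm as a tie-breaker, an assumption that makes the choice for ties (dup_id f) deterministic instead of arbitrary. The input rows are the dup_id, idm and priority values from the first output above, and final_rows is a hypothetical table name.

```python
import sqlite3

# (dup_id, idm, priority) from the first output above; None = no criterion met.
FINAL_ROWS = [
    ("a", 1, 1), ("a", 2, 3), ("a", 3, 5),
    ("b", 4, 2), ("b", 5, None),
    ("c", 6, 5), ("c", 7, 2),
    ("d", 8, 1), ("d", 9, 2),
    ("e", 10, 2), ("e", 11, 3),
    ("f", 12, 2), ("f", 13, 2),
    ("g", 14, 4), ("g", 15, 2),
]

PICK_SQL = """
SELECT dup_id, idm, priority
FROM (SELECT dup_id, idm, priority,
             ROW_NUMBER() OVER (
                 PARTITION BY dup_id
                 -- NULL priorities last (Oracle's default), then best priority,
                 -- then lowest idm as a deterministic tie-breaker
                 ORDER BY priority IS NULL, priority, idm
             ) AS rn
      FROM final_rows)
WHERE rn = 1 AND priority > 1
ORDER BY dup_id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE final_rows (dup_id TEXT, idm INTEGER, priority INTEGER)")
conn.executemany("INSERT INTO final_rows VALUES (?, ?, ?)", FINAL_ROWS)
picked = conn.execute(PICK_SQL).fetchall()
for row in picked:
    print(row)
```

Without the `priority IS NULL` term, SQLite would rank idm 5 (NULL priority) first for dup_id b, and the `priority > 1` filter would then drop b entirely.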
I have found it quite hard to word what I want to do in the title so I will try my best to explain now!
I have two tables which I am using:
Master_Tab and Parts_Tab
Parts_Tab has the following information:
Order_Number | Completed | Part_Number |
1 | Y | 64 |
2 | N | 32 |
3 | Y | 42 |
1 | N | 32 |
1 | N | 5 |
Master_Tab has the following information:
Order_Number|
1 |
2 |
3 |
4 |
5 |
I want to generate a query which will return ALL of the Order_Numbers listed in the Master_Tab on the following conditions...
For each Order_Number I want to check the Parts_Tab table to see if there are any parts which aren't complete (Completed = 'N'), and count the number of uncompleted parts the order has against it. If an Order_Number has no uncompleted parts, or is not in Parts_Tab at all, I want the count to be 0.
So the table that would be generated would look like this:
Order_Number | Count_of_Non_Complete_Parts|
1 | 2 |
2 | 1 |
3 | 0 |
4 | 0 |
5 | 0 |
I was hoping that using a different kind of join on the tables would do this but I am clearly missing the trick!
Any help is much appreciated!
Thanks.
I have used COALESCE to convert NULL to zero where necessary. Depending on your database platform, you may need to use another method, e.g. ISNULL or CASE.
select mt.Order_Number,
       coalesce(ptc.Count, 0) as Count_of_Non_Complete_Parts
from Master_Tab mt
left outer join (
    select Order_Number, count(*) as Count
    from Parts_Tab
    where Completed = 'N'
    group by Order_Number
) ptc on mt.Order_Number = ptc.Order_Number
order by mt.Order_Number
You are looking for a LEFT JOIN.
SELECT mt.order_number, count(part_number) AS count_noncomplete_parts
FROM master_tab mt LEFT JOIN parts_tab pt
ON mt.order_number=pt.order_number AND pt.completed='N'
GROUP BY mt.order_number;
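It is also possible to put pt.completed='N' into a WHERE clause, but you have to be careful of NULLs. Instead of the AND you can have
WHERE pt.completed='N' OR pt.completed IS NULL
but note that this variant drops orders whose parts are all complete (order 3 in the example): their joined rows have completed='Y' and are filtered out. Keeping the condition in the ON clause avoids that.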
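The two placements of the Completed filter can be compared directly. This minimal sketch runs both variants against the sample data in SQLite via Python's sqlite3 module, chosen only for illustration; a LEFT JOIN behaves the same way in other databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE master_tab (order_number INTEGER)")
conn.execute("CREATE TABLE parts_tab (order_number INTEGER, completed TEXT,"
             " part_number INTEGER)")
conn.executemany("INSERT INTO master_tab VALUES (?)", [(n,) for n in range(1, 6)])
conn.executemany("INSERT INTO parts_tab VALUES (?, ?, ?)",
                 [(1, "Y", 64), (2, "N", 32), (3, "Y", 42), (1, "N", 32), (1, "N", 5)])

# Filter inside ON: orders without incomplete parts still get a row, counted as 0.
ON_SQL = """
SELECT mt.order_number, COUNT(pt.part_number)
FROM master_tab mt
LEFT JOIN parts_tab pt
  ON mt.order_number = pt.order_number AND pt.completed = 'N'
GROUP BY mt.order_number
ORDER BY mt.order_number
"""

# Filter in WHERE: order 3 joins to its completed part, WHERE removes that row,
# and order 3 disappears from the result.
WHERE_SQL = """
SELECT mt.order_number, COUNT(pt.part_number)
FROM master_tab mt
LEFT JOIN parts_tab pt
  ON mt.order_number = pt.order_number
WHERE pt.completed = 'N' OR pt.completed IS NULL
GROUP BY mt.order_number
ORDER BY mt.order_number
"""

on_counts = dict(conn.execute(ON_SQL).fetchall())
where_counts = dict(conn.execute(WHERE_SQL).fetchall())
print(on_counts)     # {1: 2, 2: 1, 3: 0, 4: 0, 5: 0}
print(where_counts)  # {1: 2, 2: 1, 4: 0, 5: 0}
```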
SELECT mt.Order_Number,
       COALESCE(SUM(tbl.Incomplete), 0) AS Count_of_Non_Complete_Parts
FROM Master_Tab mt
LEFT JOIN (
    SELECT Order_Number,
           CASE WHEN Completed = 'N' THEN 1 ELSE 0 END AS Incomplete
    FROM Parts_Tab
) tbl ON mt.Order_Number = tbl.Order_Number
GROUP BY mt.Order_Number
Add a WHERE clause to the outer query if you need to filter for specific order numbers.
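One detail in the SUM(CASE ...) approach: orders 4 and 5 have no rows in Parts_Tab, so after the LEFT JOIN the SUM only sees NULLs and returns NULL rather than 0. This sketch (SQLite via Python's sqlite3 module, used only for illustration) runs the query with a plain SUM and with the SUM wrapped in COALESCE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE master_tab (order_number INTEGER)")
conn.execute("CREATE TABLE parts_tab (order_number INTEGER, completed TEXT,"
             " part_number INTEGER)")
conn.executemany("INSERT INTO master_tab VALUES (?)", [(n,) for n in range(1, 6)])
conn.executemany("INSERT INTO parts_tab VALUES (?, ?, ?)",
                 [(1, "Y", 64), (2, "N", 32), (3, "Y", 42), (1, "N", 32), (1, "N", 5)])

# {total} is filled in with the aggregate expression to compare.
SUM_SQL = """
SELECT mt.order_number, {total}
FROM master_tab mt
LEFT JOIN (SELECT order_number,
                  CASE WHEN completed = 'N' THEN 1 ELSE 0 END AS incomplete
           FROM parts_tab) tbl
  ON mt.order_number = tbl.order_number
GROUP BY mt.order_number
ORDER BY mt.order_number
"""

# Plain SUM: orders 4 and 5 only have NULL from the LEFT JOIN, so SUM is NULL.
raw = conn.execute(SUM_SQL.format(total="SUM(tbl.incomplete)")).fetchall()
# COALESCE turns that NULL into the 0 the question asks for.
fixed = conn.execute(
    SUM_SQL.format(total="COALESCE(SUM(tbl.incomplete), 0)")).fetchall()
print(raw)    # [(1, 2), (2, 1), (3, 0), (4, None), (5, None)]
print(fixed)  # [(1, 2), (2, 1), (3, 0), (4, 0), (5, 0)]
```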
I think it's easiest to use a subquery here. This should be self-explanatory; if not, feel free to ask any questions.
CREATE TABLE #Parts
(
Order_Number int,
Completed char(1),
Part_Number int
)
CREATE TABLE #Master
(
Order_Number int
)
INSERT INTO #Parts
SELECT 1, 'Y', 64 UNION ALL
SELECT 2, 'N', 32 UNION ALL
SELECT 3, 'Y', 42 UNION ALL
SELECT 1, 'N', 32 UNION ALL
SELECT 1, 'N', 5
INSERT INTO #Master
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5 UNION ALL
SELECT 6
SELECT M.Order_Number,
       ISNULL(Totals.NonCompletedCount, 0) AS Count_of_Non_Complete_Parts
FROM #Master M
LEFT JOIN (SELECT P.Order_Number, COUNT(*) AS NonCompletedCount
           FROM #Parts P
           WHERE P.Completed = 'N'
           GROUP BY P.Order_Number) Totals
    ON Totals.Order_Number = M.Order_Number
ORDER BY M.Order_Number
I have a table with the following structure:
timestamp | name | value
0 | john | 5
1 | NULL | 3
8 | NULL | 12
12 | john | 3
33 | NULL | 4
54 | pete | 1
180 | NULL | 4
400 | john | 3
401 | NULL | 4
592 | anna | 2
Now what I am looking for is a query that will give me the sum of the values for each name, treating the NULLs in between (ordered by timestamp) as the first non-null name further down the list, as if the table were as follows:
timestamp | name | value
0 | john | 5
1 | john | 3
8 | john | 12
12 | john | 3
33 | pete | 4
54 | pete | 1
180 | john | 4
400 | john | 3
401 | anna | 4
592 | anna | 2
and then I would query SUM(value), name from this table GROUP BY name. I have thought about it and tried, but I can't come up with a proper solution. I have looked at recursive common table expressions and think the answer may lie there, but I haven't been able to understand them properly.
These tables are just examples, and I don't know the timestamp values in advance.
Could someone give me a hand? Help would be very much appreciated.
With Inputs As
(
    Select 0 As [timestamp], 'john' As Name, 5 As value
    Union All Select 1, NULL, 3
    Union All Select 8, NULL, 12
    Union All Select 12, 'john', 3
    Union All Select 33, NULL, 4
    Union All Select 54, 'pete', 1
    Union All Select 180, NULL, 4
    Union All Select 400, 'john', 3
    Union All Select 401, NULL, 4
    Union All Select 592, 'anna', 2
)
, NamedInputs As
(
    Select I.timestamp
        , Coalesce(I.Name
            , (
                Select I3.Name
                From Inputs As I3
                Where I3.timestamp = (
                    Select Max(I2.timestamp)
                    From Inputs As I2
                    Where I2.timestamp < I.timestamp
                        And I2.Name Is Not Null
                )
            )) As name
        , I.value
    From Inputs As I
)
Select NI.name, Sum(NI.Value) As Total
From NamedInputs As NI
Group By NI.name
By the way, something much faster than any of these queries would be to correct the data first: update the name column to have the proper value, make it non-nullable, and then run a simple GROUP BY to get your totals.
Additional solution (reuses the Inputs CTE above):
Select Coalesce(I.Name, I2.Name) As name, Sum(I.value) As Total
From Inputs As I
    Left Join (
        Select I1.timestamp, Max(I2.timestamp) As LastNameTimestamp
        From Inputs As I1
            Left Join Inputs As I2
                On I2.timestamp < I1.timestamp
                And I2.Name Is Not Null
        Group By I1.timestamp
    ) As Z
        On Z.timestamp = I.timestamp
    Left Join Inputs As I2
        On I2.timestamp = Z.LastNameTimestamp
Group By Coalesce(I.Name, I2.Name)
You don't need a CTE, just a simple correlated subquery:
select t.timestamp, ISNULL(t.name, (
select top(1) i.name
from inputs i
where i.timestamp < t.timestamp
and i.name is not null
order by i.timestamp desc
)), t.value
from inputs t
And to sum from there:
select name, SUM(value) as totalValue
from
(
select t.timestamp, ISNULL(t.name, (
select top(1) i.name
from inputs i
where i.timestamp < t.timestamp
and i.name is not null
order by i.timestamp desc
)) as name, t.value
from inputs t
) N
group by name
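The correlated-subquery approach can be checked against the sample data with SQLite via Python's sqlite3 module, using LIMIT 1 in place of T-SQL's TOP(1) and a column renamed to ts (both assumptions of this sketch). One thing worth checking: the queries in these answers fill each NULL from the previous named row, while the expected table in the question fills it from the next named row. The sketch runs both directions so the totals can be compared.

```python
import sqlite3

# Sample rows from the question: (timestamp, name, value)
ROWS = [(0, "john", 5), (1, None, 3), (8, None, 12), (12, "john", 3),
        (33, None, 4), (54, "pete", 1), (180, None, 4), (400, "john", 3),
        (401, None, 4), (592, "anna", 2)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inputs (ts INTEGER, name TEXT, value INTEGER)")
conn.executemany("INSERT INTO inputs VALUES (?, ?, ?)", ROWS)

# {cmp} and {direction} are filled from fixed strings below (never user input).
FILL_SQL = """
SELECT filled_name, SUM(value)
FROM (SELECT COALESCE(t.name,
                      (SELECT i.name
                       FROM inputs i
                       WHERE i.ts {cmp} t.ts AND i.name IS NOT NULL
                       ORDER BY i.ts {direction}
                       LIMIT 1)) AS filled_name,
             t.value
      FROM inputs t)
GROUP BY filled_name
"""


def totals(cmp, direction):
    """Sum values per name after filling NULL names from a neighbouring row."""
    sql = FILL_SQL.format(cmp=cmp, direction=direction)
    return dict(conn.execute(sql).fetchall())


# Previous named row: what the queries in the answers compute.
backward = totals("<", "DESC")
# Next named row: what the expected table in the question shows.
forward = totals(">", "ASC")
print(backward)
print(forward)
```

With the forward fill, a trailing NULL row with no later named row would stay NULL, and with the backward fill the same applies to leading NULL rows.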
I hope I won't be embarrassed by offering you this little recursive CTE query of mine as a solution to your problem.
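;WITH
numbered_table AS (
    SELECT
        timestamp, name, value,
        rownum = ROW_NUMBER() OVER (ORDER BY timestamp)
    FROM your_table
),
filled_table AS (
    -- anchor: the first row; rownum is carried along so that
    -- the recursive step can find the next row
    SELECT
        rownum,
        timestamp,
        name,
        value
    FROM numbered_table
    WHERE rownum = 1
    UNION ALL
    -- recursive step: the next row keeps its own name or inherits the previous one
    SELECT
        nt.rownum,
        nt.timestamp,
        name = ISNULL(nt.name, ft.name),
        nt.value
    FROM numbered_table nt
    INNER JOIN filled_table ft ON nt.rownum = ft.rownum + 1
)
SELECT *
FROM filled_table
/* or go ahead aggregating instead */
OPTION (MAXRECURSION 0) /* the default limit of 100 recursion levels is exceeded once the table has more than 101 rows */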
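The recursive CTE can also be exercised in SQLite (3.25+ for ROW_NUMBER) through Python's sqlite3 module. This sketch keeps the same shape: a numbered CTE, an anchor on row 1, and a recursive step that joins each row to the previous one by rownum, carrying rownum through both parts so the join has a column to use. It assumes a few adaptations for SQLite: WITH RECURSIVE instead of the leading semicolon, COALESCE instead of ISNULL, and the timestamp column renamed to ts.

```python
import sqlite3

# Sample rows from the question: (timestamp, name, value)
ROWS = [(0, "john", 5), (1, None, 3), (8, None, 12), (12, "john", 3),
        (33, None, 4), (54, "pete", 1), (180, None, 4), (400, "john", 3),
        (401, None, 4), (592, "anna", 2)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (ts INTEGER, name TEXT, value INTEGER)")
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", ROWS)

RECURSIVE_SQL = """
WITH RECURSIVE
numbered_table AS (
    SELECT ts, name, value,
           ROW_NUMBER() OVER (ORDER BY ts) AS rownum
    FROM your_table
),
filled_table (rownum, ts, name, value) AS (
    -- anchor: the first row by timestamp
    SELECT rownum, ts, name, value
    FROM numbered_table
    WHERE rownum = 1
    UNION ALL
    -- recursive step: the next row keeps its own name or inherits the previous one
    SELECT nt.rownum, nt.ts, COALESCE(nt.name, ft.name), nt.value
    FROM numbered_table nt
    JOIN filled_table ft ON nt.rownum = ft.rownum + 1
)
SELECT name, SUM(value)
FROM filled_table
GROUP BY name
"""

totals = dict(conn.execute(RECURSIVE_SQL).fetchall())
print(totals)
```

Like the other answers, this fills each NULL from the previous named row, so john also collects the values at timestamps 33 and 401.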