Select only the "most complete" record - sql

I need to solve the following problem.
Let's suppose I have a table with 4 fields called a, b, c, d.
I have the following records:
-------------------------------------
a | b | c | d
-------------------------------------
1 | 2 |   |     row 1
1 | 2 | 3 | 4   row 2
1 | 2 |   | 4   row 3
1 | 2 | 3 |     row 4
As you can see, rows 1, 3 and 4 are "sub-records" of row 2.
I would like to extract only the 2nd row.
Could you help me, please?
Thanks in advance for the answer.
EDIT: I need to be more specific.
I could also have the following cases:
-------------------------------------
a | b | c | d
-------------------------------------
1 | 2 |   |     row 1
1 | 2 |   | 4   row 2
1 |   |   | 4   row 3
where I need to extract the 2nd row,
-------------------------------------
a | b | c | d
-------------------------------------
1 | 2 |   |     row 1
1 | 2 | 3 |     row 2
1 |   | 3 |     row 3
and again I need to extract the 2nd row.
The same applies when only a pair of columns is filled:
a | b | c | d
-------------------------------------
1 |   |   |     row 1
1 |   | 3 |     row 2
  |   | 3 |     row 3
and so on for the other examples.
(Of course, it's not always the 2nd row.)

With NOT EXISTS you can filter out every record for which a more complete duplicate exists.
create table abcd (
a int,
b int,
c int,
d int
);
insert into abcd (a, b, c, d) values
(1, 2, null, null)
,(1, 2, 3, 4)
,(1, 2, null, 4)
,(1, 2, 3, null)
,(2, 3, null,null)
,(2, 3, null, 5)
,(2, null, null, 5)
,(3, null, null, null)
,(3, null, 5, null)
,(null, null, 5, null);
SELECT *
FROM abcd AS t
WHERE NOT EXISTS
(
select 1
from abcd as d
where (t.a is null or d.a = t.a)
and (t.b is null or d.b = t.b)
and (t.c is null or d.c = t.c)
and (t.d is null or d.d = t.d)
and (case when t.a is null then 0 else 1 end +
case when t.b is null then 0 else 1 end +
case when t.c is null then 0 else 1 end +
case when t.d is null then 0 else 1 end) <
(case when d.a is null then 0 else 1 end +
case when d.b is null then 0 else 1 end +
case when d.c is null then 0 else 1 end +
case when d.d is null then 0 else 1 end)
);
a | b | c | d
-: | ---: | ---: | ---:
1 | 2 | 3 | 4
2 | 3 | null | 5
3 | null | 5 | null

You will need to compute a "completion index" for each row. In the example you provided, you might use something along the lines of:
(CASE WHEN a IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN b IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN c IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN d IS NULL THEN 0 ELSE 1 END) AS CompletionIndex
Then SELECT the top 1 ordered by CompletionIndex in descending order.
This is obviously not very scalable across a large number of columns. But if you have a large number of sparsely populated columns you might consider a row-based rather than column-based structure for your data. That design would make it much easier to count the number of non-NULL values for each entity.
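To try the completion index end to end, here is a minimal sketch that runs it through Python's built-in sqlite3 module. The table name abcd is an assumption, and LIMIT 1 is the SQLite spelling of "top 1" (SQL Server would use SELECT TOP 1).

```python
import sqlite3

# Rows from the first example in the question; None stands for NULL.
ROWS = [
    (1, 2, None, None),
    (1, 2, 3, 4),
    (1, 2, None, 4),
    (1, 2, 3, None),
]

# "abcd" is an assumed table name, not something given in the question.
QUERY = """
SELECT a, b, c, d,
       (CASE WHEN a IS NULL THEN 0 ELSE 1 END) +
       (CASE WHEN b IS NULL THEN 0 ELSE 1 END) +
       (CASE WHEN c IS NULL THEN 0 ELSE 1 END) +
       (CASE WHEN d IS NULL THEN 0 ELSE 1 END) AS CompletionIndex
FROM abcd
ORDER BY CompletionIndex DESC
LIMIT 1
"""


def most_complete(rows):
    """Load rows into an in-memory table and return the row with the highest index."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE abcd (a INT, b INT, c INT, d INT)")
        conn.executemany("INSERT INTO abcd VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchone()
    finally:
        conn.close()


print(most_complete(ROWS))  # (1, 2, 3, 4, 4)
```

Note that this returns a single row for the whole table, so it only fits the case where all rows belong to one group, and ties are broken arbitrarily.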

Most complete rows, by your definition, are the ones with the fewest null columns:
SELECT * FROM tablename
WHERE (
(CASE WHEN a IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN b IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN c IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN d IS NULL THEN 0 ELSE 1 END)
) =
(SELECT MAX(
(CASE WHEN a IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN b IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN c IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN d IS NULL THEN 0 ELSE 1 END))
FROM tablename)

Hmmm . . . I think you can use not exists:
with t as (
select t.*, row_number() over (order by a) as id
from t
)
select t.*
from t
where not exists (select 1
from t t2
where ((t2.a is not distinct from t.a or t2.a is not null and t.a is null) and
(t2.b is not distinct from t.b or t2.b is not null and t.b is null) and
(t2.c is not distinct from t.c or t2.c is not null and t.c is null) and
(t2.d is not distinct from t.d or t2.d is not null and t.d is null)
) and
t2.id <> t.id
);
The logic is that a row is kept only if no more specific row exists whose values match it.
Here is a db<>fiddle.
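The "no more specific row" rule can also be spelled out in plain Python, which may make the condition easier to read. This is only a sketch of the idea, not a translation of the query above: the function names are made up, and it uses a strict "fills at least one extra column" test instead of the row id comparison.

```python
from typing import List, Optional, Sequence, Tuple

Row = Tuple[Optional[int], ...]


def is_more_specific(t2: Row, t: Row) -> bool:
    """True if t2 agrees with t wherever t has a value and fills at least one of t's NULLs."""
    compatible = all(x is None or x == y for x, y in zip(t, t2))
    fills_a_gap = any(x is None and y is not None for x, y in zip(t, t2))
    return compatible and fills_a_gap


def most_complete(rows: Sequence[Row]) -> List[Row]:
    """Keep only the rows for which no more specific row exists."""
    return [t for t in rows if not any(is_more_specific(t2, t) for t2 in rows)]


# Sample data taken from the first answer; None stands for NULL.
rows = [
    (1, 2, None, None), (1, 2, 3, 4), (1, 2, None, 4), (1, 2, 3, None),
    (2, 3, None, None), (2, 3, None, 5), (2, None, None, 5),
    (3, None, None, None), (3, None, 5, None), (None, None, 5, None),
]
print(most_complete(rows))
# [(1, 2, 3, 4), (2, 3, None, 5), (3, None, 5, None)]
```

The comparison is quadratic in the number of rows, which is fine for illustration but is also what the SQL has to do without a helpful index.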

As Gordon Linoff mentioned, we do need something like NOT EXISTS here.
Edit: using EXCEPT also helps.
This might work:
SELECT * from table1
EXCEPT
(
SELECT t1.*
FROM table1 t1
JOIN table1 t2
ON COALESCE(t1.a, t2.a, -1) = COALESCE(t2.a, -1)
AND COALESCE(t1.b, t2.b, -1) = COALESCE(t2.b, -1)
AND COALESCE(t1.c, t2.c, -1) = COALESCE(t2.c, -1)
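AND COALESCE(t1.d, t2.d, -1) = COALESCE(t2.d, -1)
-- t2 must fill at least one column that t1 leaves NULL;
-- without this, every row matches itself and is removed
AND ((t1.a IS NULL AND t2.a IS NOT NULL)
OR (t1.b IS NULL AND t2.b IS NOT NULL)
OR (t1.c IS NULL AND t2.c IS NOT NULL)
OR (t1.d IS NULL AND t2.d IS NOT NULL))
)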
Here, t1 is every subset row.
Note: this assumes -1 is a sentinel value that does not occur in any column.
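For anyone who wants to try the EXCEPT idea, here is a small sketch using Python's sqlite3 module. Two things in it are my assumptions rather than part of the answer: the explicit condition that t2 fills at least one column t1 leaves NULL (without it a row would join to itself and be subtracted), and the missing parentheses around the second SELECT, because SQLite does not accept them in a compound query.

```python
import sqlite3

QUERY = """
SELECT a, b, c, d FROM table1
EXCEPT
SELECT t1.a, t1.b, t1.c, t1.d
FROM table1 t1
JOIN table1 t2
  ON COALESCE(t1.a, t2.a, -1) = COALESCE(t2.a, -1)
 AND COALESCE(t1.b, t2.b, -1) = COALESCE(t2.b, -1)
 AND COALESCE(t1.c, t2.c, -1) = COALESCE(t2.c, -1)
 AND COALESCE(t1.d, t2.d, -1) = COALESCE(t2.d, -1)
 AND (   (t1.a IS NULL AND t2.a IS NOT NULL)
      OR (t1.b IS NULL AND t2.b IS NOT NULL)
      OR (t1.c IS NULL AND t2.c IS NOT NULL)
      OR (t1.d IS NULL AND t2.d IS NOT NULL))
"""


def rows_without_superset(rows):
    """Return the set of rows that are not a sub-record of any other row."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE table1 (a INT, b INT, c INT, d INT)")
        conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", rows)
        return set(conn.execute(QUERY).fetchall())
    finally:
        conn.close()


# One of the cases from the question's edit; -1 must not appear in the data.
case = [(1, 2, None, None), (1, 2, None, 4), (1, None, None, 4)]
print(rows_without_superset(case))  # {(1, 2, None, 4)}
```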

Related

How to get 2 values of same num to 2 separate columns

I got table like this:
num |type|value
--------------
1 | a | 5
1 | b | 7
3 | c | 9
2 | a | 6
2 | b | 9
and want this kind of result:
num| value (a) | value (b)
-------------------------
1 | 5 | 7
2 | 6 | 9
You can use a self-join which will also remove the rows with just one value (num = 3 in your sample data)
select t1.num, t1.value as value_a, t2.value as value_b
from the_table t1
join the_table t2 on t1.num = t2.num and t2.type = 'b'
where t1.type = 'a'
You can use GROUP BY and CASE, as in:
select
num,
max(case when type = 'a' then value end) as value_a,
max(case when type = 'b' then value end) as value_b
from t
group by num
I'd join the table on itself, once for a and once for b
SELECT a.num, a.value, b.value
FROM mytable a
JOIN mytable b ON a.num = b.num AND a.type = 'a' AND b.type = 'b'
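The difference between the self-join and the GROUP BY variant is easiest to see by running both on the sample data. A minimal sketch with Python's sqlite3 module follows; the table name the_table comes from the first answer, and the ORDER BY clauses are added only to make the output deterministic.

```python
import sqlite3

ROWS = [(1, "a", 5), (1, "b", 7), (3, "c", 9), (2, "a", 6), (2, "b", 9)]

SELF_JOIN = """
SELECT t1.num, t1.value AS value_a, t2.value AS value_b
FROM the_table t1
JOIN the_table t2 ON t1.num = t2.num AND t2.type = 'b'
WHERE t1.type = 'a'
ORDER BY t1.num
"""

GROUPED = """
SELECT num,
       MAX(CASE WHEN type = 'a' THEN value END) AS value_a,
       MAX(CASE WHEN type = 'b' THEN value END) AS value_b
FROM the_table
GROUP BY num
ORDER BY num
"""


def run(query):
    """Run one of the queries against a fresh in-memory copy of the sample table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE the_table (num INT, type TEXT, value INT)")
        conn.executemany("INSERT INTO the_table VALUES (?, ?, ?)", ROWS)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


print(run(SELF_JOIN))  # [(1, 5, 7), (2, 6, 9)]
print(run(GROUPED))    # [(1, 5, 7), (2, 6, 9), (3, None, None)]
```

So the GROUP BY version keeps num = 3 with two NULLs; a HAVING clause or an outer filter would be needed to drop it and match the requested result exactly.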

SQL Query for below scenario for multiple rows

I need the output to match the expected result column shown below.
The filtering has to work across multiple rows.
The rule is (x.attr2 = '1' AND x.attr3 = '1') AND (x.attr2='' AND x.attr3='2'): then the expected column value is true, and in all other cases it is false.
It's MS SQL.
Key Atr2 Atr3 expected result
111 1 1 TRUE
111 2 2
112 1 4 FALSE
113 1 4 FALSE
113 2 2
114 1 1 FALSE
Check the below script-
IF OBJECT_ID('[Sample]') IS NOT NULL
DROP TABLE [Sample]
CREATE TABLE [Sample]
(
[Key] INT NOT NULL,
Attr1 INT NOT NULL,
Attr2 INT NOT NULL,
Attr3 INT NOT NULL
)
GO
INSERT INTO [Sample] ([Key],Attr1,Attr2,Attr3)
VALUES (111,62,1,1),
(111,62,2,2),
(112,62,1,4),
(113,62,1,4),
(113,62,2,2),
(114,62,1,1)
--EXPECTED_RESULT:
SELECT S.*,CASE WHEN T.[KEY] IS NOT NULL THEN 'TRUE' ELSE 'FALSE' END AS Expected_Result
FROM [Sample] S LEFT JOIN
(
SELECT T.[KEY] FROM
(
SELECT x.*,
ROW_NUMBER() OVER( PARTITION BY x.[KEY],x.attr1 ORDER BY x.attr2,x.attr3) AS r_no
--,CASE WHEN (x.attr2 = 1 AND x.attr3 = 1) OR (x.attr2 = 2 AND x.attr3 = 2)
--then 'TRUE' else 'FALSE' end as expected_result
FROM [Sample] x WHERE x.attr2=x.attr3
) T WHERE T.r_no>1
) T ON S.[KEY]=T.[KEY]
This query:
select [key] from tablename
group by [key]
having sum(case when atr2 = '1' and atr3 = '1' then 1 else 0 end) > 0
and sum(case when atr2 = '2' and atr3 = '2' then 1 else 0 end) > 0
and count(*) = 2
uses conditional aggregation to find the keys for which the result should be true.
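As a quick check of the grouping step on its own, here is a sketch that runs it in SQLite through Python. The columns are stored as integers here, which is an assumption (the query above compares against strings), and the ORDER BY is only there for a stable result.

```python
import sqlite3

# Sample data from the question: (key, atr2, atr3).
ROWS = [(111, 1, 1), (111, 2, 2), (112, 1, 4), (113, 1, 4), (113, 2, 2), (114, 1, 1)]

QUERY = """
SELECT [key]
FROM tablename
GROUP BY [key]
HAVING SUM(CASE WHEN atr2 = 1 AND atr3 = 1 THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN atr2 = 2 AND atr3 = 2 THEN 1 ELSE 0 END) > 0
   AND COUNT(*) = 2
ORDER BY [key]
"""


def matching_keys(rows):
    """Return the keys that have exactly the (1,1) row and the (2,2) row."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE tablename ([key] INT, atr2 INT, atr3 INT)")
        conn.executemany("INSERT INTO tablename VALUES (?, ?, ?)", rows)
        return [r[0] for r in conn.execute(QUERY).fetchall()]
    finally:
        conn.close()


print(matching_keys(ROWS))  # [111]
```

Only key 111 survives: 113 has no (1,1) row, 114 has no (2,2) row, and the COUNT(*) = 2 check would reject a key that carries any extra row.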
So join it to the table like this:
select t.*,
case when g.[key] is null then 'FALSE' else 'TRUE' end result
from tablename t left join (
select [key] from tablename
group by [key]
having sum(case when atr2 = '1' and atr3 = '1' then 1 else 0 end) > 0
and sum(case when atr2 = '2' and atr3 = '2' then 1 else 0 end) > 0
and count(*) = 2
) g on g.[key] = t.[key]
See the demo.
Results:
> Key | Atr2 | Atr3 | result
> --: | ---: | ---: | :-----
> 111 | 1 | 1 | TRUE
> 111 | 2 | 2 | TRUE
> 112 | 1 | 4 | FALSE
> 113 | 1 | 4 | FALSE
> 113 | 2 | 2 | FALSE
> 114 | 1 | 1 | FALSE

Making a conditional aggregate

I have a tricky grouping problem that comes from our business requirements. I have a table with values like this:
----------------------------
| NAME | TYPE | VALUE |
----------------------------
| N1 | T1 | V1 |
| N1 | T2 | V2 |
| N1 | NULL | V3 |
| N2 | T2 | V4 |
| N2 | NULL | V5 |
| N3 | NULL | V6 |
-----------------------------
I need to group it in a way that,
The first level grouping will be by name.
At the second level,
When the available types are T1, T2 and NULL, group T1 and NULL together and have T2 grouped separately.
When the available types are T2 and NULL, group NULL with T2.
When NULL is the only available type, just have it as it is.
The expected O/P for the above table is,
----------------------------
| N1 | T1 | V1+V3 |
| N1 | T2 | V2 |
| N2 | T2 | V4+V5 |
| N3 | NULL | V6 |
-----------------------------
How can I achieve this in Snowflake SQL? A solution for any other database is also welcome, so that I can find an equivalent in Snowflake.
The following query should work:
SELECT t1.NAME, COALESCE(TYPE, MIN_TYPE), SUM(VALUE)
FROM mytable AS t1
JOIN (
SELECT NAME, MIN(TYPE) AS MIN_TYPE
FROM mytable
GROUP BY NAME
) AS t2 ON t1.NAME = t2.NAME
GROUP BY t1.NAME, COALESCE(TYPE, MIN_TYPE)
The query uses a derived table in order to extract the MIN(TYPE) value per NAME. Using COALESCE we can then convert NULL to either T1 or T2.
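Here is a small sketch of that query run in SQLite from Python, with made-up numeric values 1 to 6 standing in for V1 to V6. It relies on two things: MIN ignores NULLs, and 'T1' sorts before 'T2', so the NULL rows fall into T1 when it exists and into T2 otherwise. The column aliases are mine.

```python
import sqlite3

# (name, type, value); the numbers are placeholders for V1..V6.
ROWS = [
    ("N1", "T1", 1), ("N1", "T2", 2), ("N1", None, 3),
    ("N2", "T2", 4), ("N2", None, 5),
    ("N3", None, 6),
]

QUERY = """
SELECT t1.name, COALESCE(t1.type, t2.min_type) AS grp_type, SUM(t1.value) AS total
FROM mytable AS t1
JOIN (
    SELECT name, MIN(type) AS min_type
    FROM mytable
    GROUP BY name
) AS t2 ON t1.name = t2.name
GROUP BY t1.name, COALESCE(t1.type, t2.min_type)
ORDER BY t1.name, grp_type
"""


def grouped_totals(rows):
    """Return (name, type, total) with NULL-typed rows folded into the smallest type."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE mytable (name TEXT, type TEXT, value INT)")
        conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(grouped_totals(ROWS))
# [('N1', 'T1', 4), ('N1', 'T2', 2), ('N2', 'T2', 9), ('N3', None, 6)]
```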
Edit:
You can create a pivoted version of the expected result set using the following query:
SELECT NAME,
CASE
WHEN T1SUM IS NULL THEN 0
ELSE COALESCE(T1SUM, 0) + COALESCE(NULLSUM,0)
END AS T1SUM,
CASE
WHEN T1SUM IS NULL AND T2SUM IS NOT NULL
THEN COALESCE(T2SUM, 0) + COALESCE(NULLSUM,0)
ELSE COALESCE(T2SUM, 0)
END AS T2SUM,
CASE
WHEN T1SUM IS NULL AND T2SUM IS NULL THEN COALESCE(NULLSUM,0)
ELSE 0
END AS NULLSUM
FROM (
SELECT NAME,
SUM(CASE WHEN TYPE = 'T1' THEN VALUE END) AS T1SUM,
SUM(CASE WHEN TYPE = 'T2' THEN VALUE END) AS T2SUM,
SUM(CASE WHEN TYPE IS NULL THEN VALUE END) AS NULLSUM
FROM mytable
GROUP BY NAME) AS t
Giorgos's answer gives the totals in pivoted form, that is, one row per name rather than one row per case. This can be written more simply.
With this data:
WITH data_table(name, type, value) AS (
SELECT * FROM VALUES
(10, 1, 100 ),
(10, 2, 200 ),
(10, null, 400 ),
(11, 2, 100 ),
(11, null, 200 ),
(12, null, 100 )
)
and this SQL
SELECT name
,SUM(IFF(type=1, value, null)) as t1_val
,SUM(IFF(type=2, value, null)) as t2_val
,SUM(IFF(type is null, value, null)) as tnull_val
,IFF(t1_val is not null, t1_val + zeroifnull(tnull_val), null) as c1_sum
,IFF(t1_val is not null, t2_val, t2_val + zeroifnull(tnull_val)) as c2_sum
,IFF(t1_val is null AND t2_val is null, tnull_val, null) as c3_sum
FROM data_table
GROUP BY 1;
we get:
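NAME | T1_VAL | T2_VAL | TNULL_VAL | C1_SUM | C2_SUM | C3_SUM
---- | ------ | ------ | --------- | ------ | ------ | ------
10 | 100 | 200 | 400 | 500 | 200 | null
11 | null | 100 | 200 | null | 300 | null
12 | null | null | 100 | null | null | 100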
This shows that for name 10 the null sum is combined with the type 1 sum, for name 11 the null sum is combined with the type 2 sum, and for name 12 we get the null sum by itself.
We can unpivot these values if we wish by joining to a mini table with 3 rows, like so:
SELECT d.name,
p.c2 as type,
case p.c1
WHEN 1 then d.c1_sum
WHEN 2 then d.c2_sum
ELSE d.c3_sum
end as value
FROM (
SELECT name
,SUM(IFF(type=1, value, null)) as t1_val
,SUM(IFF(type=2, value, null)) as t2_val
,SUM(IFF(type is null, value, null)) as tnull_val
,IFF(t1_val is not null, t1_val + zeroifnull(tnull_val), null) as c1_sum
,IFF(t1_val is not null, t2_val, t2_val + zeroifnull(tnull_val)) as c2_sum
,IFF(t1_val is null AND t2_val is null, tnull_val, null) as c3_sum
FROM data_table
GROUP BY 1
) AS d
JOIN (
SELECT column1 as c1, column2 as c2
FROM VALUES (1,'T1'),(2,'T2'),(null,'null')
) AS p
ON ((d.c1_sum is not null AND p.c1 = 1)
OR (d.c2_sum is not null AND p.c1 = 2)
OR (d.c3_sum is not null AND p.c1 is null))
ORDER BY 1,2;
which gives the original requested output:
NAME | TYPE | VALUE
---- | ---- | -----
10 | T1 | 500
10 | T2 | 200
11 | T2 | 300
12 | null | 100

join 2 tables in oracle sql

Here is the configuration I am starting with:
DROP TABLE ruleset1;
CREATE TABLE ruleset1 (id int not null unique,score_rule1 float default 0.0,score_rule2 float default 0.0,score_rule3 float default 0.0);
DROP TABLE ruleset2;
CREATE TABLE ruleset2 (id int not null unique,score_rule1 float default 0.0,score_rule2 float default 0.0,score_rule3 float default 0.0);
insert into ruleset1 (id, score_rule1, score_rule2, score_rule3) values (0,0.8,0,0);
insert into ruleset1 (id, score_rule1, score_rule2, score_rule3) values (1,0,0.1,0);
insert into ruleset2 (id, score_rule1, score_rule2, score_rule3) values (0,0,0,0.3);
insert into ruleset2 (id, score_rule1, score_rule2, score_rule3) values (2,0,0.2,0);
What I have now is 2 tables:
ruleset1:
| ID | SCORE_RULE1 | SCORE_RULE2 | SCORE_RULE3
================================================
| 0 | 0.8 | 0 | 0
| 1 | 0 | 0.1 | 0
and ruleset2:
| ID | SCORE_RULE1 | SCORE_RULE2 | SCORE_RULE3
================================================
| 0 | 0 | 0 | 0.3
| 2 | 0 | 0.2 | 0
and I want to outer join them and calculate the mean of non zero columns, like this:
| ID | Average
================
| 0 | 0.55
| 1 | 0.1
| 2 | 0.2
My current query is:
select * from ruleset1 full outer join ruleset2 on ruleset1.id = ruleset2.id;
which gives an ugly result:
| ID | SCORE_RULE1 | SCORE_RULE2 | SCORE_RULE3 | ID | SCORE_RULE1 | SCORE_RULE2 | SCORE_RULE3
============================================================================================
| 0 | .8 | 0 | 0 | 0 | 0 | 0 | .3
| - | - | - | - | 2 | 0 | .2 | 0
| 1 | 0 | .1 | 0 | - | - | - | -
Can anyone help with a better query please?
Thank you very much!
Of course avg doesn't ignore zeroes, only NULLs, thus NULLIF(column, 0) could be used.
But as you got denormalized data you can simply normalize it on-the-fly:
select id, avg(score)
from
(
select id, score_rule1 score
from ruleset1 where score_rule1 <> 0
union all
select id, score_rule2 from ruleset1 where score_rule2 <> 0
union all
select id, score_rule3 from ruleset1 where score_rule3 <> 0
union all
select id, score_rule1 from ruleset2 where score_rule1 <> 0
union all
select id, score_rule2 from ruleset2 where score_rule2 <> 0
union all
select id, score_rule3 from ruleset2 where score_rule3 <> 0
) dt
group by id;
To avoid five Unions you could use only one and do some additional logic:
select id, sum(score) / sum(score_count)
from
(
select id, score_rule1 + score_rule2 + score_rule3 score,
case when score_rule1 = 0 then 0 else 1 end +
case when score_rule2 = 0 then 0 else 1 end +
case when score_rule3 = 0 then 0 else 1 end score_count
from ruleset1
union all
select id, score_rule1 + score_rule2 + score_rule3 score,
case when score_rule1 = 0 then 0 else 1 end +
case when score_rule2 = 0 then 0 else 1 end +
case when score_rule3 = 0 then 0 else 1 end score_count
from ruleset2
) dt
group by id;
This assumes there are no NULLs in the score_rule columns.
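The NULLIF remark at the top of this answer can be tried out as well. Below is a minimal sketch in SQLite via Python: it stacks the six score columns with UNION ALL and lets AVG skip the zeros by turning them into NULL. The helper name and the rounding to six decimals are my additions, the latter only to keep floating-point noise out of the output.

```python
import sqlite3

RULESET1 = [(0, 0.8, 0.0, 0.0), (1, 0.0, 0.1, 0.0)]
RULESET2 = [(0, 0.0, 0.0, 0.3), (2, 0.0, 0.2, 0.0)]

QUERY = """
SELECT id, AVG(NULLIF(score, 0)) AS average
FROM (
    SELECT id, score_rule1 AS score FROM ruleset1
    UNION ALL SELECT id, score_rule2 FROM ruleset1
    UNION ALL SELECT id, score_rule3 FROM ruleset1
    UNION ALL SELECT id, score_rule1 FROM ruleset2
    UNION ALL SELECT id, score_rule2 FROM ruleset2
    UNION ALL SELECT id, score_rule3 FROM ruleset2
) dt
GROUP BY id
ORDER BY id
"""


def non_zero_averages():
    """Return (id, average of non-zero scores) per id, rounded to 6 decimals."""
    conn = sqlite3.connect(":memory:")
    try:
        for name, rows in (("ruleset1", RULESET1), ("ruleset2", RULESET2)):
            conn.execute(
                f"CREATE TABLE {name} (id INT, score_rule1 REAL, score_rule2 REAL, score_rule3 REAL)"
            )
            conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?, ?)", rows)
        return [(i, round(avg, 6)) for i, avg in conn.execute(QUERY).fetchall()]
    finally:
        conn.close()


print(non_zero_averages())  # [(0, 0.55), (1, 0.1), (2, 0.2)]
```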
Here's an example with PostgreSQL that you could adapt with Oracle (sorry, SQLFiddle's Oracle isn't cooperating). Thanks to Juan Carlos Oropeza's suggestion, the code below runs on Oracle well: http://rextester.com/DVP59353
select
r.id,
sum(coalesce(r1.score_rule1,0) +
coalesce(r1.score_rule2,0) +
coalesce(r1.score_rule3,0) +
coalesce(r2.score_rule1,0) +
coalesce(r2.score_rule2,0) +
coalesce(r2.score_rule3,0)
)
/
sum(case when coalesce(r1.score_rule1,0) <> 0 then 1 else 0 end +
case when coalesce(r1.score_rule2,0) <> 0 then 1 else 0 end +
case when coalesce(r1.score_rule3,0) <> 0 then 1 else 0 end +
case when coalesce(r2.score_rule1,0) <> 0 then 1 else 0 end +
case when coalesce(r2.score_rule2,0) <> 0 then 1 else 0 end +
case when coalesce(r2.score_rule3,0) <> 0 then 1 else 0 end) as Average
from
(select id from ruleset1
union
select id from ruleset2) r
left join ruleset1 r1 on r.id = r1.id
left join ruleset2 r2 on r.id = r2.id
group by r.id
SQLFiddle with PostgreSQL version is here: http://sqlfiddle.com/#!15/24e3f/1.
This example combines id from both tables using a union. Doing so allows the same ID in both ruleset1 and ruleset2 to appear just once in the result. r is an alias given to this generated table.
All the ids are then left joined with both tables. During the summation process, it is possible that the NULL values resulting from left join may impact the result. So the NULLs are coalesced to zero in the math.
dnoeth's answer is the easy and clean one.
Here I was just playing with COALESCE and NVL2:
select COALESCE(r.ID, s.ID),
COALESCE(r.score_rule1, 0) +
COALESCE(r.score_rule2, 0) +
COALESCE(r.score_rule3, 0) +
COALESCE(s.score_rule1, 0) +
COALESCE(s.score_rule2, 0) +
COALESCE(s.score_rule3, 0) as sum,
-- count only the scores that are neither NULL nor zero
NVL2(NULLIF(r.score_rule1, 0), 1, 0) +
NVL2(NULLIF(r.score_rule2, 0), 1, 0) +
NVL2(NULLIF(r.score_rule3, 0), 1, 0) +
NVL2(NULLIF(s.score_rule1, 0), 1, 0) +
NVL2(NULLIF(s.score_rule2, 0), 1, 0) +
NVL2(NULLIF(s.score_rule3, 0), 1, 0) as tot
from ruleset1 r
full outer join ruleset2 s
on r.id = s.id;
Then your avg is sum/tot
UNION ALL your two tables, unpivot, change the zeros into NULL with NULLIF, and use the standard AVG() aggregate function:
select id, avg(nullif(value, 0)) as avg_value from (
select * from ruleset1
union all
select * from ruleset2
)
unpivot ( value for column_name in (score_rule1, score_rule2, score_rule3))
group by id
order by id
;
ID AVG_VALUE
---------- ----------
0 .55
1 .1
2 .2
SELECT s.id, AVG(s.score)
FROM(
SELECT id,score_rule1+score_rule2+score_rule3 as score
FROM ruleset2
UNION ALL
SELECT id,(score_rule1+score_rule2+score_rule3) as score
FROM ruleset1) s
group by s.id

SQL: Get multiple line entries linked to one item?

I have a table:
ID | ITEMID | STATUS | TYPE
1 | 123 | 5 | 1
2 | 123 | 4 | 2
3 | 123 | 5 | 3
4 | 125 | 3 | 1
5 | 125 | 5 | 3
Any item can have 0 to many entries in this table. I need a query that will tell me whether all of an ITEM's entries have a status of either 5 or 4. For example, for the table above, I would like to end up with the result:
ITEMID | REQUIREMENTS_MET
123 | TRUE --> true because all statuses are either 5 or 4
125 | FALSE --> false because it has a status of 3 and a status of 5.
If the 3 was a 4 or 5, then this would be true
What would be even better is something like this:
ITEMID | MET_REQUIREMENTS | NOT_MET_REQUIREMENTS
123 | 3 | 0
125 | 1 | 1
Any idea how to write a query for that?
Fast, short, simple:
SELECT itemid
,count(status = 4 OR status = 5 OR NULL) AS met_requirements
,count(status < 4 OR status > 5 OR NULL) AS not_met_requirements
FROM tbl
GROUP BY itemid
ORDER BY itemid;
Assuming all columns to be integer NOT NULL.
Builds on basic boolean logic:
TRUE OR NULL yields TRUE
FALSE OR NULL yields NULL
And NULL is not counted by count().
->SQLfiddle demo.
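The boolean trick can be checked quickly from Python with sqlite3. Keep in mind this is only an approximation of the PostgreSQL behaviour: SQLite has no separate boolean type and represents TRUE as 1, but its three-valued OR and its count() treat NULL the same way. The table name tbl is taken from the query above; the function name is mine.

```python
import sqlite3

# (id, itemid, status, type) rows from the question.
ROWS = [(1, 123, 5, 1), (2, 123, 4, 2), (3, 123, 5, 3), (4, 125, 3, 1), (5, 125, 5, 3)]

COUNTS = """
SELECT itemid,
       count(status = 4 OR status = 5 OR NULL) AS met_requirements,
       count(status < 4 OR status > 5 OR NULL) AS not_met_requirements
FROM tbl
GROUP BY itemid
ORDER BY itemid
"""


def demo():
    """Return the raw OR-with-NULL results and the per-item counts."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE tbl (id INT, itemid INT, status INT, type INT)")
        conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?)", ROWS)
        logic = conn.execute("SELECT 1 OR NULL, 0 OR NULL").fetchone()
        counts = conn.execute(COUNTS).fetchall()
        return logic, counts
    finally:
        conn.close()


logic, counts = demo()
print(logic)   # (1, None): TRUE OR NULL is TRUE, FALSE OR NULL is NULL
print(counts)  # [(123, 3, 0), (125, 1, 1)]
```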
SELECT a.ITEMID
FROM (SELECT ITEMID, MIN(STATUS) AS MINSTATUS, MAX(STATUS) AS MAXSTATUS FROM TABLE_NAME GROUP BY ITEMID) a
WHERE a.MINSTATUS >= 4 AND a.MAXSTATUS <= 5
One way of doing this would be
SELECT t1.itemid, NOT EXISTS(SELECT 1
FROM mytable t2
WHERE itemid=t1.itemid
AND status NOT IN (4, 5)) AS requirements_met
FROM mytable t1
GROUP BY t1.itemid
UPDATE: for your updated requirement, you can have something like:
SELECT itemid,
sum(CASE WHEN status IN (4, 5) THEN 1 ELSE 0 END) as met_requirements,
sum(CASE WHEN status IN (4, 5) THEN 0 ELSE 1 END) as not_met_requirements
FROM mytable
GROUP BY itemid
simple one:
select
"ITEMID",
case
when min("STATUS") in (4, 5) and max("STATUS") in (4, 5) then 'True'
else 'False'
end as requirements_met
from table1
group by "ITEMID"
better one:
select
"ITEMID",
sum(case when "STATUS" in (4, 5) then 1 else 0 end) as MET_REQUIREMENTS,
sum(case when "STATUS" in (4, 5) then 0 else 1 end) as NOT_MET_REQUIREMENTS
from table1
group by "ITEMID";
WITH dom AS (
SELECT DISTINCT item_id FROM items
)
, yes AS ( SELECT item_id, COUNT(*) AS good_count FROM items WHERE status IN (4,5) GROUP BY item_id
)
, no AS ( SELECT item_id, COUNT(*) AS bad_count FROM items WHERE status NOT IN (4,5) GROUP BY item_id
)
SELECT d.item_id
, COALESCE(y.good_count,0) AS good_count
, COALESCE(n.bad_count,0) AS bad_count
FROM dom d
LEFT JOIN yes y ON y.item_id = d.item_id
LEFT JOIN no n ON n.item_id = d.item_id
;
Can be done with an outer join, too:
WITH yes AS ( SELECT item_id, COUNT(*) AS good_count FROM items WHERE status IN (4,5) GROUP BY item_id)
, no AS ( SELECT item_id, COUNT(*) AS bad_count FROM items WHERE status NOT IN (4,5) GROUP BY item_id)
SELECT COALESCE(y.item_id, n.item_id) AS item_id
, COALESCE(y.good_count,0) AS good_count
, COALESCE(n.bad_count,0) AS bad_count
FROM yes y
FULL JOIN no n ON n.item_id = y.item_id
;
Nevermind, it was actually easy to do:
select ITEM_ID ,
sum (case when STATUS >= 3 then 1 else 0 end ) as met_requirements,
sum (case when STATUS < 3 then 1 else 0 end ) as not_met_requirements
from TABLE as d
group by ITEM_ID