SQL Server Weird Grouping Scenario by multiple columns and OR - sql

I have a weird grouping scenario and am having trouble finding the best way to do the grouping in SQL.
Imagine we have the following one table
CREATE TABLE Item
(
KeyId VARCHAR(1) NOT NULL,
Col1 INT NULL,
Col2 INT NULL,
Col3 INT NULL
)
GO
INSERT INTO Item (KeyId, Col1, Col2, Col3)
VALUES
('a',1,2,3),
('b',5,4,3),
('c',5,7,6),
('d',8,7,9),
('e',11,10,9),
('f',11,12,13),
('g',20,22,21),
('h',23,22,24)
I need to group records in this table so that if Col1 OR Col2 OR Col3 is the same for two records, then these two records should be in the same group, and there should be chaining.
In other words, with the data above, record 'a' (the first record) has Col3 = 3 and record 'b' (the second record) also has Col3 = 3, so these two should be in one group. But record 'b' has the same Col1 as record 'c', so record 'c' should be in the same group as 'a' and 'b'. Record 'd' has the same Col2 as 'c', so it should also be in that group. Similarly, 'e' shares its Col3 with 'd', and 'f' shares its Col1 with 'e'.
On the other hand records 'g' and 'h' will be in one group (because they have the same Col2 = 22), but this group will be different from the group for records 'a','b','c','d','e','f'.
The result of the query should be something like
KeyId GroupId
'a' 1
'b' 1
'c' 1
'd' 1
'e' 1
'f' 1
'g' 2
'h' 2
There is probably a way of doing this with loops or cursors, but I started thinking about a cleaner way and it seems quite difficult.

Here you go:
with g (rootid, previd, level, keyid, col1, col2, col3) as (
select keyid, '-', 1, keyid, col1, col2, col3 from item
union all
select g.rootid, g.keyid, g.level + 1, i.keyid, i.col1, i.col2, i.col3
from g
join item i on i.col1 = g.col1 or i.col2 = g.col2 or i.col3 = g.col3
where i.keyid > g.keyid
),
m (keyid, rootid) as (
select keyid, min(rootid) from g group by keyid
)
select * from m;
Result:
keyid rootid
----- ------
a a
b a
c a
d a
e a
f a
g g
h g
Note: Keep in mind that SQL Server by default limits recursive CTEs to 100 levels of recursion, which here caps how long a chain of linked rows within a group can be. So although the approach above works, there are clear limits to what SQL Server will process. If you reach this limit you'll get the message:
The maximum recursion 100 has been exhausted before statement completion.
If this happens, consider adding the clause option (maxrecursion 32767) to the query.
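As a cross-check outside the database, here is a minimal union-find sketch in Python (my own illustration, not part of the answer above). It treats every shared Col1, Col2 or Col3 value as a link and labels each group by its smallest KeyId. One thing worth noting: the recursive CTE above only follows links towards larger keyid values, so for data where two rows are linked only through a later row (for example 'a' and 'b' each sharing a column with 'c' but not with each other) it would appear to leave 'b' in its own group. The sketch below does not depend on key order. The names `group_rows` and `ROWS` are hypothetical.

```python
# Sample rows from the question: (KeyId, Col1, Col2, Col3)
ROWS = [
    ("a", 1, 2, 3), ("b", 5, 4, 3), ("c", 5, 7, 6), ("d", 8, 7, 9),
    ("e", 11, 10, 9), ("f", 11, 12, 13), ("g", 20, 22, 21), ("h", 23, 22, 24),
]


def group_rows(rows):
    """Map each key to the smallest key in its connected group."""
    parent = {key: key for key, *_ in rows}

    def find(key):
        # Follow parent links up to the root, shortening the path on the way.
        while parent[key] != key:
            parent[key] = parent[parent[key]]
            key = parent[key]
        return key

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller key as root so group labels are deterministic.
            low, high = sorted((root_a, root_b))
            parent[high] = low

    first_seen = {}  # (column position, value) -> first key that had it
    for key, *cols in rows:
        for pos, value in enumerate(cols):
            if value is None:
                continue  # NULLs never match each other, as in SQL
            union(key, first_seen.setdefault((pos, value), key))

    return {key: find(key) for key, *_ in rows}


groups = group_rows(ROWS)
print(groups)
```

For the sample data this puts 'a' through 'f' under 'a' and 'g', 'h' under 'g', matching the result table above.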

Related

Break out nested data within SQL, criteria across multiple rows (similar to dcast in R)

I'm trying to write a simple query to take a data set that looks like this:
ID | Col2
X B
X C
Y B
Y D
and return this:
ID | Col2 | Col3
X B C
Y B D
Essentially, I have an ID column that can have either B, C, or D in Col2. I am trying to identify which IDs only have B and D. I have a query to find both, but not only that combination. Query:
select ID, Col2
from Table1
where ID in (
select ID from Table1
group by ID
having count(distinct Col2) = 2)
order by ID
Alternatively, I could use help in finding a way to filter that query on B and D and leave off B and C. I have seen perhaps a self join, but am not sure how to implement that.
Thanks!
EDIT: Most of the data set has, for a given ID, all three of B, C, and D. The goal here is to isolate the IDs that are missing one, namely missing C.
I am trying to identify which IDs only have B and D. I have a query to find both
If this is what you want, you don't need multiple columns:
select id
from table1
where col2 in ('B', 'D')
group by id
having count(distinct col2) = 2;
If you want only 'B' and 'D' and no others, then:
select id
from table1
group by id
having sum(case when col2 = 'B' then 1 else 0 end) > 0 AND
sum(case when col2 = 'D' then 1 else 0 end) > 0 AND
sum(case when col2 not in ('B', 'D') then 1 else 0 end) = 0;
If each ID has at most two distinct values, you can also easily pivot the values using aggregation:
select id, min(col2), nullif(max(col2), min(col2))
from table1
group by id;
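To try the "B and D and nothing else" filter and the MIN/MAX pivot on concrete rows, here is a small sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for whatever database you use, and the ID 'Z' (which has all three values) is added for illustration and is not in the question's sample.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO table1 (id, col2) VALUES (?, ?)",
    [("X", "B"), ("X", "C"),
     ("Y", "B"), ("Y", "D"),
     ("Z", "B"), ("Z", "C"), ("Z", "D")],  # Z is a hypothetical extra ID
)

# IDs that have B, have D, and have no other value.
only_b_and_d = [row[0] for row in conn.execute("""
    SELECT id FROM table1 GROUP BY id
    HAVING SUM(CASE WHEN col2 = 'B' THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN col2 = 'D' THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN col2 NOT IN ('B', 'D') THEN 1 ELSE 0 END) = 0
""")]

# MIN/MAX pivot: only faithful when an ID has at most two distinct values.
pivot = conn.execute("""
    SELECT id, MIN(col2), NULLIF(MAX(col2), MIN(col2))
    FROM table1 GROUP BY id ORDER BY id
""").fetchall()

print(only_b_and_d)
print(pivot)
```

Only 'Y' passes the filter. In the pivot, 'Z' shows up as B and D because its middle value C is dropped, which is why the pivot is best combined with the filter.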

How can I access the output from the first select statement?

I have a table Like this
Col1 | Col2
-----------
a | d
b | e
c | a
Now I want to create a statement to get an output like this:
First| Second
-------------------
a | Amsterdam
b | Berlin
c | Canada
...
So far I have this construct, which is not working:
SELECT *
FROM(
SELECT DISTINCT
CASE
when Col1 IS NULL then 'NA'
else Col1
END
FROM Table1
UNION
SELECT DISTINCT
CASE
when Col2 IS NULL then 'NA'
else Col2
END
FROM Table1
) AS First
,
(
SELECT DISTINCT
when First= 'a' then 'Amsterdam'
when First= 'b' then 'Berlin'
when First= 'c' then 'Canada'
) AS Second
;
Can you help me with that?
Sorry I have to edit my question to be more specific.
Not as familiar with DB2... I'll lookup if it has a concat function in a sec... and it does.
SELECT First, case when first = 'a' then
concat('This is a ',first)
when first = 'b' then
concat('To Be or not to ',first)
else
concat('This is a ',first) end as Second
FROM (
SELECT coalesce(col1, 'NA') as First
FROM Table1
UNION
SELECT coalesce(col2, 'NA')
FROM Table1) SRC
WHERE first <> 'NA'
What this does is generate a single inline view called src with one column called first. If col1 or col2 of the table is null, it substitutes 'NA' for that value. The outer query then concatenates the desired text with first, excluding records whose first value is 'NA'.
Or if you just create an inline table with the desired values and join in...
SELECT First, x.b as Second
FROM (
SELECT coalesce(col1, 'NA') as First
FROM Table1
UNION
SELECT coalesce(col2, 'NA')
FROM Table1) SRC
INNER JOIN (select a,b
from (values ('a', 'This is a'),
('b', 'To B or not to' ),
('c', 'I like cat whose name starts with')) as x(a,b)) X
on X.a = src.first
WHERE first <> 'NA'
Personally I find the 2nd option easier to read. That said, if a, b and c have a real meaning, I would think you'd want that stored in a table somewhere so other queries can use it. Code seems like a bad place to store data like this that could change.
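Here is a runnable sketch of the lookup-join idea, using Python's sqlite3 module as a stand-in for DB2 (an assumption on my part; the SQL is kept close to standard). The lookup rows come from the desired output in the question, the extra row with a NULL col1 is added only to exercise the 'NA' filter, and the names `lookup`, `code` and `label` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO table1 (col1, col2) VALUES (?, ?)",
    [("a", "d"), ("b", "e"), ("c", "a"), (None, "b")],  # last row is hypothetical
)

rows = conn.execute("""
    WITH lookup(code, label) AS (
        VALUES ('a', 'Amsterdam'), ('b', 'Berlin'), ('c', 'Canada')
    ),
    src(code) AS (
        -- UNION removes duplicates across both columns
        SELECT COALESCE(col1, 'NA') FROM table1
        UNION
        SELECT COALESCE(col2, 'NA') FROM table1
    )
    SELECT src.code, lookup.label
    FROM src
    INNER JOIN lookup ON lookup.code = src.code
    WHERE src.code <> 'NA'
    ORDER BY src.code
""").fetchall()

print(rows)
```

With an INNER JOIN, the values 'd' and 'e' drop out because they have no lookup entry. A LEFT JOIN would keep them with a NULL label instead.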
Assuming you want
a this is a a
b this is a b
c this is a c
d this is a d
e this is a e
Thanks to xQbert, I could solve this problem like this:
SELECT FirstRow, concat
(
CASE FirstRow
WHEN 'AN' then 'amerstdam'
WHEN 'G' then 'berlin'
ELSE 'NA'
END, ''
) AS SecondRow
FROM(
Select coalesce (Col1, 'NA') as FirstRow
FROM Table1
UNION
Select coalesce (Col2, 'NA')
FROM Table1) SRC
WHERE FirstRow <> 'NA'
;

Getting parent data if child data is null in Oracle hierarchical table

In Oracle 10g I have the following hierarchical table:
corp_id, parent_corp_id, col1, col2, col3
I want to flatten out the structure such that we get the first row's data where col1 OR col2 OR col3 is not null.
So for example, suppose I have:
corp_id = 1
parent_corp_id = null
col1 = 5
col2 = NULL
col3 = NULL
corp_id = 3
parent_corp_id = 1
col1 = NULL
col2 = NULL
col3 = NULL
the results of this query would give me:
corp_id, parent_corp_id, col1, col2, col3
3 , 1 , 5 , NULL, NULL
Another scenario:
Suppose I put col2 = 6 where corp_id = 3
Well, then the result set should be:
corp_id, parent_corp_id, col1, col2, col3
3 , 1 , NULL, 6, NULL
In other words, if the child has data in one of these three columns we grab it. Otherwise, we try the parent and so on and so forth. Shouldn't be more than 3 levels deep but it could have 3 levels to look into.
Pretty new to hierarchical queries, so pardon me if this is a rudimentary question.
Use the coalesce() function, which returns the first non-null value in its list:
select
c.corp_id,
c.parent_corp_id,
coalesce(c.col1, p.col1) col1,
coalesce(c.col2, p.col2) col2,
coalesce(c.col3, p.col3) col3
from mytable c
left join mytable p on p.corp_id = c.parent_corp_id
To get the "first row that has a not-null value", add:
where coalesce(c.col1, p.col1, c.col2, p.col2, c.col3, p.col3) is not null
and rownum = 1
You do need a hierarchical query (with the connect by clause), because a parent can have a child that is itself the parent of another child (although your example data doesn't bring that into play). However, the requirement to show the 'first not null col1', the 'first not null col2', etc. is a separate issue from the hierarchical relationship altogether.
Try the following, I added some additional sample data to the fiddle (check the DDL on the left side) for illustrative purposes.
It looks like in your expected output you don't want to show the highest level parents, which is why I put "where s.parent_corp_id is not null" at the end. If you actually do want to show those, take that line out.
Otherwise, this will show the col1/col2/col3 values based on their group. Notice how in the example 2 is a high level parent and has 4 as a child, and 4 is also a parent of 8. So corp_id 8 and 4 are part of the same branch and they therefore show the same col1/col2/col3 values, and those are, based on your requirements, the first not null value of each throughout the branch.
Fiddle: http://sqlfiddle.com/#!4/ef218/14/0
with sub as
(select corp_id,
parent_corp_id,
col1,
col2,
col3,
level as lvl,
rownum - level as grp
from tbl
connect by prior corp_id = parent_corp_id
start with parent_corp_id is null),
col1_lvl as
(select grp, col1
from sub s
where s.lvl = (select min(x.lvl)
from sub x
where x.col1 is not null
and x.grp = s.grp)),
col2_lvl as
(select grp, col2
from sub s
where s.lvl = (select min(x.lvl)
from sub x
where x.col2 is not null
and x.grp = s.grp)),
col3_lvl as
(select grp, col3
from sub s
where s.lvl = (select min(x.lvl)
from sub x
where x.col3 is not null
and x.grp = s.grp))
select s.corp_id, s.parent_corp_id, c1.col1, c2.col2, c3.col3
from sub s
left join col1_lvl c1
on s.grp = c1.grp
left join col2_lvl c2
on s.grp = c2.grp
left join col3_lvl c3
on s.grp = c3.grp
where s.parent_corp_id is not null
If this doesn't provide the output you're expecting based on the sample data I used please provide the expected output for the data I used in the DDL on the fiddle.
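The question's second scenario (col1 stays NULL once the child has col2 = 6) reads as if the whole row should come from the nearest corp, starting with the row itself, that has data in any of the three columns. The sketch below shows that reading. It uses Python's sqlite3 module and a recursive CTE purely so it can be run as is; both are assumptions, and on Oracle 10g the same walk up the tree would have to be written with connect by. The table name `corp` and corp_id 7 (a third level) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE corp (
    corp_id INTEGER PRIMARY KEY, parent_corp_id INTEGER,
    col1 INTEGER, col2 INTEGER, col3 INTEGER)""")
conn.executemany(
    "INSERT INTO corp VALUES (?, ?, ?, ?, ?)",
    [(1, None, 5, None, None),
     (3, 1, None, None, None),
     (7, 3, None, None, None)],  # corp 7 is a hypothetical third level
)

NEAREST_ROW_WITH_DATA = """
WITH RECURSIVE walk(corp_id, parent_corp_id, anc_id, depth) AS (
    -- depth 0 is the row itself, then one step up per iteration
    SELECT corp_id, parent_corp_id, corp_id, 0 FROM corp
    UNION ALL
    SELECT w.corp_id, w.parent_corp_id, a.parent_corp_id, w.depth + 1
    FROM walk w
    JOIN corp a ON a.corp_id = w.anc_id
    WHERE a.parent_corp_id IS NOT NULL
),
candidates AS (
    -- ancestors (or self) that have data in at least one column
    SELECT w.corp_id, w.parent_corp_id, w.depth, a.col1, a.col2, a.col3
    FROM walk w
    JOIN corp a ON a.corp_id = w.anc_id
    WHERE COALESCE(a.col1, a.col2, a.col3) IS NOT NULL
)
SELECT c.corp_id, c.parent_corp_id, c.col1, c.col2, c.col3
FROM candidates c
WHERE c.depth = (SELECT MIN(x.depth) FROM candidates x
                 WHERE x.corp_id = c.corp_id)
ORDER BY c.corp_id
"""


def flatten(connection):
    """Return one row per corp, filled from its nearest row with data."""
    return connection.execute(NEAREST_ROW_WITH_DATA).fetchall()


scenario_1 = flatten(conn)
conn.execute("UPDATE corp SET col2 = 6 WHERE corp_id = 3")
scenario_2 = flatten(conn)
print(scenario_1)
print(scenario_2)
```

A corp with no data anywhere along its chain of parents is left out of the result in this sketch. Picking values column by column instead, as the coalesce() and per-column answers do, would give col1 = 5 and col2 = 6 for corp 3 in the second scenario.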

How to write query to return value regardless of existence?

Given this:
with data_row as (select 1 as col_1 from dual)
select 'Y' as row_exists from dual where exists
(select null
from data_row
where col_1 in (2,1))
How can I get this?
Col_1 Row_exists
--------------------
1 Y
2 N
In order to get a row of output, you need a row of input. You want to get the second row with a "2", but there is no table with that value.
The approach is to generate a table that has the values that you want, and then use left outer join to find which match:
with data_row as (
select 1 as col_1
from dual
),
what_i_care_about as (
select 1 as col from dual union all
select 2 from dual
)
select wica.col,
(case when dr.col_1 is NULL then 'N' else 'Y' end) as row_exists
from what_i_care_about wica left outer join
data_row dr
on wica.col = dr.col_1;
You cannot do directly what you want -- which is to create a row for each missing value in the in list. If you have a lot of values and they are consecutive numeric, then you can use connect by or a recursive CTE to generate the values.
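For the consecutive-numbers case, the generated list can replace the hand-written union all. A minimal sketch follows, run through Python's sqlite3 module rather than Oracle (an assumption made so it is runnable as is), with a recursive CTE producing 1..upper. The names `wanted` and `existence_report` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_row (col_1 INTEGER)")
conn.execute("INSERT INTO data_row (col_1) VALUES (1)")


def existence_report(connection, upper):
    """Return (value, 'Y' or 'N') for every integer from 1 to upper."""
    return connection.execute("""
        WITH RECURSIVE wanted(col) AS (
            SELECT 1
            UNION ALL
            SELECT col + 1 FROM wanted WHERE col < ?
        )
        SELECT w.col,
               CASE WHEN d.col_1 IS NULL THEN 'N' ELSE 'Y' END AS row_exists
        FROM wanted w
        LEFT JOIN data_row d ON d.col_1 = w.col
        ORDER BY w.col
    """, (upper,)).fetchall()


report = existence_report(conn, 3)
print(report)
```

The generated list drives the output and the left join only decorates it, so every wanted value gets a row whether or not it exists in data_row.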

SQL query grouped parameter maximum

Let's say I had two columns in a database, col1 and col2. Column 2 is the time, Column 1 something. In my query, I want to do the following:
I want to SELECT * from my table and group the results by col1. However, I only want those entries where for the grouped col1 there is no value of col2 higher than a certain value. Meaning that, I only want those col1-s for which col2 does not exceed a certain value.
If, for instance, I had four rows, as follows:
ROW1: col1 = val1, col2 = 3
ROW2: col1 = val1, col2 = 5
ROW3: col1 = val2, col2 = 3
ROW4: col1 = val2, col2 = 4
And I do not want the time for any of them to exceed 4. Then, as a result, I would want only ROW3 or ROW4; which one does not matter, because col1 is the same and the rows are grouped. But rows 1 and 2 are grouped by col1's value "val1", and in one of them col2 DOES exceed 4, so I do not want either of them.
SELECT col1 FROM table GROUP BY col1 HAVING MAX(col2) <= 4
Because you want only the common value (col1) from the group, you can use GROUP BY. When you do a GROUP BY (aggregate) query, you can use the HAVING clause to apply a filter to the aggregated data set.
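To see why the filter belongs in HAVING and not in WHERE, here is a small sketch on the four sample rows, using Python's sqlite3 module as a stand-in database (an assumption) and a hypothetical table name `t`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO t (col1, col2) VALUES (?, ?)",
    [("val1", 3), ("val1", 5), ("val2", 3), ("val2", 4)],
)

# HAVING looks at the whole group: val1 is dropped because its maximum is 5.
kept_groups = [row[0] for row in conn.execute(
    "SELECT col1 FROM t GROUP BY col1 HAVING MAX(col2) <= 4")]

# WHERE filters row by row: val1 survives through its row with col2 = 3.
where_only = [row[0] for row in conn.execute(
    "SELECT DISTINCT col1 FROM t WHERE col2 <= 4 ORDER BY col1")]

print(kept_groups)
print(where_only)
```

The HAVING version keeps only val2, while the row-level WHERE version still returns val1.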
I am not sure I got the point (my English is not good).
I think a sub-query is the best choice.
Note: this example should work with MySQL ...
SELECT *
FROM table
WHERE col1 IN
(SELECT col1 FROM table GROUP BY col1 HAVING MAX(col2) < 5)
ORDER BY col1
CREATE TABLE x (
t TIME NOT NULL,
v INT NOT NULL );
INSERT INTO x VALUES
('13:14:00', 24),
('13:14:00', 27),
('13:14:00', 29),
('17:12:00', 14),
('17:12:00', 20),
('17:12:00', 24);
SELECT t, MAX(v) AS mv FROM x
GROUP BY t
HAVING mv <= 25;
Or do I misunderstand the question?