Record Flattening in SQL


I've been trying to find the answer to this without success. This may be because I don't know the right term to search for.
I've looked at some flattening suggestions using Pivot or Cross Apply but none of the examples I've looked at seem to do exactly what I want.
I basically need the most efficient way to flatten records in a table (because it's over a table with millions of records).
(Please excuse the formatting, I wasn't sure of the best way to make it readable).
Here's an example of what the original records look like :-
MainId--ID1--ID2--ID3
20241--0--2881--0
20241--0--2871--0
20241--0--2884--0
20241--1580--0--0
20241--1588--0--0
20241--0--0--1205
20241--0--0--1001
20241--0--0--1268
20241--0--0--1311
And here is what I need them to end up like :-
MainId--ID1--ID2--ID3
20241--1580--2881--1205
20241--1588--2871--1001
20241--0--2884--1268
20241--0--0--1311
So, I just need them to be the fewest number of records for each MainId.
It actually doesn't matter which IDs are in which record as there is no relation between them. They just need to be related to the correct MainId.
NB. This is a simplified example. The table in question can actually have up to 10 different ID columns, but if I get it working for 3 IDs, I should be able to extend this out to more.
Thanks in advance.
Regards,
Jason

This is a little tricky, but you can do it using union all and aggregation. However, you need a column that specifies the ordering if you care about the actual order in the results:
select main_id,
       max(id1) as id1, max(id2) as id2, max(id3) as id3
from ((select main_id, id1, null as id2, null as id3,
              row_number() over (partition by main_id order by ?) as seqnum
       from t
       where id1 <> 0
      ) union all
      (select main_id, null as id1, id2, null as id3,
              row_number() over (partition by main_id order by ?) as seqnum
       from t
       where id2 <> 0
      ) union all
      (select main_id, null as id1, null as id2, id3,
              row_number() over (partition by main_id order by ?) as seqnum
       from t
       where id3 <> 0
      )
     ) t
group by main_id, seqnum;
Each branch keeps only its own ID column and enumerates the non-zero values in it; the outer query then combines the values that share a sequence number on a single row. The ? is for the ordering column. If you don't care about the ordering, just use main_id.
Note: This turns the 0s into NULLs -- which seems useful to me. You can use coalesce() if you really want them as 0s.
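To try the union all approach above on the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module (it assumes an SQLite build with window function support, 3.25 or later). Ordering each branch by the ID value itself and the derived-table alias x are assumptions made for the demo, not something the question specifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (main_id int, id1 int, id2 int, id3 int)")
conn.executemany("insert into t values (?, ?, ?, ?)", [
    (20241, 0, 2881, 0), (20241, 0, 2871, 0), (20241, 0, 2884, 0),
    (20241, 1580, 0, 0), (20241, 1588, 0, 0),
    (20241, 0, 0, 1205), (20241, 0, 0, 1001),
    (20241, 0, 0, 1268), (20241, 0, 0, 1311),
])

# Each branch numbers the non-zero values of one ID column per main_id;
# grouping by (main_id, seqnum) then packs one value per column into a row.
query = """
select main_id,
       coalesce(max(id1), 0) as id1,
       coalesce(max(id2), 0) as id2,
       coalesce(max(id3), 0) as id3
from (select main_id, id1, null as id2, null as id3,
             row_number() over (partition by main_id order by id1) as seqnum
      from t where id1 <> 0
      union all
      select main_id, null, id2, null,
             row_number() over (partition by main_id order by id2)
      from t where id2 <> 0
      union all
      select main_id, null, null, id3,
             row_number() over (partition by main_id order by id3)
      from t where id3 <> 0) x
group by main_id, seqnum
order by main_id, seqnum
"""
flattened = conn.execute(query).fetchall()
for row in flattened:
    print(row)
```

The nine input rows collapse to four, which is the minimum here because ID3 has four non-zero values. Which values end up sharing a row depends only on the chosen ordering.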

In SQL, datasets / tables don't have an implicit order. There isn't a "first" record, unless you can enforce an order using an ORDER BY clause.
This means that there's currently no way to determine that 2881 should be in the same row as 1580 (2881 could just as easily be put in the same row as 1588.)
To resolve this, I'm going to assume you actually have an additional column:
MainId SubID ID1 ID2 ID3
20241 1 0 2881 0
20241 2 0 2871 0
20241 3 0 2884 0
20241 1 1580 0 0
20241 2 1588 0 0
20241 1 0 0 1205
20241 2 0 0 1001
20241 3 0 0 1268
20241 4 0 0 1311
Then it's a matter of aggregating the rows together...
SELECT
    MainID,
    SubID,
    COALESCE(MAX(ID1), 0) AS ID1,
    COALESCE(MAX(ID2), 0) AS ID2,
    COALESCE(MAX(ID3), 0) AS ID3
FROM
    yourTable
GROUP BY
    MainID,
    SubID
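As a quick check of the aggregation above, the following sketch loads the assumed SubID table into an in-memory SQLite database through Python's sqlite3 module and runs the same GROUP BY. The SubID values are the ones assumed in this answer, not data from the original question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable (MainID INT, SubID INT, ID1 INT, ID2 INT, ID3 INT)"
)
conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?, ?, ?)", [
    (20241, 1, 0, 2881, 0), (20241, 2, 0, 2871, 0), (20241, 3, 0, 2884, 0),
    (20241, 1, 1580, 0, 0), (20241, 2, 1588, 0, 0),
    (20241, 1, 0, 0, 1205), (20241, 2, 0, 0, 1001),
    (20241, 3, 0, 0, 1268), (20241, 4, 0, 0, 1311),
])

# Rows that share (MainID, SubID) are merged; MAX picks the non-zero value.
merged = conn.execute("""
    SELECT MainID, SubID,
           COALESCE(MAX(ID1), 0) AS ID1,
           COALESCE(MAX(ID2), 0) AS ID2,
           COALESCE(MAX(ID3), 0) AS ID3
    FROM yourTable
    GROUP BY MainID, SubID
    ORDER BY MainID, SubID
""").fetchall()
for row in merged:
    print(row)
```

With these SubID values the output reproduces the four rows the question asks for, with SubID as an extra column.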

Related

Updating column according to index within group

In our databases we have a table called conditions which references a table called attributes.
So it looks like this (ignoring some other columns that aren't relevant to the question)
id  attribute_id  execution_index
1   1000          1
2   1000          2
3   1000          1
4   2000          1
5   2000          2
6   2000          2
In theory the combination of attribute_id and execution_index should always be unique, but in practice they're not, and the software ends up essentially using the id to decide which comes first between two conditions with the same execution index. We want to add a uniqueness constraint to the table, but before we do that we need to update the execution indexes. So essentially we want to group them by attribute_id, order them by execution_index then id, and give them new execution indexes so that it becomes
id  attribute_id  execution_index
1   1000          1
2   1000          3
3   1000          2
4   2000          1
5   2000          2
6   2000          3
I'm not sure how to do this without just ordering by attribute_id, execution_index, id and then iterating through incrementing the execution_index by 1 each time and resetting it to be 1 whenever the attribute_id changes. (That would work but it'd be slow and someone is going to have to run this script on several dozen databases so I'd rather it didn't take more than a couple of seconds per database.)
Really I'd like to do something along the lines of
UPDATE c
SET c.execution_index = [this needs to be the index within the group somehow]
FROM condities c
GROUP BY c.attribute_id
ORDER BY c.execution_index asc, c.id asc
But I don't know how to make that actually work.

It looks like you can use an updatable CTE:
with cte as (
select *,
Row_Number() over(partition by attribute_id order by execution_index, id) new
from conditions
)
update cte set execution_index = new
I would suggest adding a new column and first updating that and checking the results are as expected.

WITH cte AS
(
SELECT
*,
ROW_NUMBER() OVER
(
PARTITION BY attribute_id
ORDER BY execution_index, id
) AS RowNum
FROM condities
)
UPDATE cte
SET execution_index = RowNum
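The renumbering logic can be tried on the sample rows with Python's sqlite3 module. This is only a sketch under a different engine: the answers above update through a CTE, which SQL Server accepts, whereas SQLite does not allow an UPDATE against a CTE, so this version first stores the new numbers in a temporary table (the name new_index is hypothetical) and then copies them back. Materialising the numbers first also means they are computed from the original values only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table conditions "
    "(id integer primary key, attribute_id int, execution_index int)"
)
conn.executemany("insert into conditions values (?, ?, ?)", [
    (1, 1000, 1), (2, 1000, 2), (3, 1000, 1),
    (4, 2000, 1), (5, 2000, 2), (6, 2000, 2),
])

# Step 1: compute the new index per attribute_id from the untouched data.
conn.execute("""
    create temp table new_index as
    select id,
           row_number() over (partition by attribute_id
                              order by execution_index, id) as rn
    from conditions
""")

# Step 2: copy the precomputed numbers back, matching on the primary key.
conn.execute("""
    update conditions
    set execution_index = (select rn from new_index
                           where new_index.id = conditions.id)
""")

renumbered = conn.execute(
    "select id, attribute_id, execution_index from conditions order by id"
).fetchall()
for row in renumbered:
    print(row)
```

Inspecting new_index before running the update plays the same role as the suggestion above to write to a new column first and check the results.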

SQL Compare Rows With Duplicate IDs and Return One With Lowest Sequence Number

Reaching out for help. I've seen plenty of answers on how to use DUPLICATE, but not quite how I need it. Let's say I have the result of query that looks like the following.
query result
Incident_No Open_Approval_Step Approval_ID
------------- -------------------- -------------------
1 3 Tech
1 4 Cust_Serv
2 1 Incident_Recorder
2 2 Estimation
2 3 Tech
3 4 Cust_Serv
3 5 Mgmt
3 6 Closure
And I need one row for each incident number with the smallest numbered approval step. So the result should look like this.
filtered query result
Incident_No Open_Approval_Step Approval_ID
------------- -------------------- -------------------
1 3 Tech
2 1 Incident_Recorder
3 4 Cust_Serv
Edit This is what I came up with in the end
SELECT DISTINCT
MIN(OPEN_APPROVAL_STEP) OVER(PARTITION BY INCIDENT_NO ORDER BY OPEN_APPROVAL_STEP ASC) AS CUR_APP_STEP,
INCIDENT_NO
FROM T

You can use row_number():
select *
from (
select
t.*,
row_number() over(partition by incident_no order by open_approval_step) rn
from mytable t
) t
where rn = 1
With just one extra column apart from the incident number and approval step, another option is aggregation and Oracle's keep syntax:
select
incident_no,
min(open_approval_step) open_approval_step,
min(approval_id) keep(dense_rank first order by open_approval_step) approval_id
from mytable
group by incident_no
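The row_number() variant above is portable to other engines with window functions. As an illustration, this sketch runs it against the sample rows using Python's sqlite3 module; the keep (dense_rank first ...) form is Oracle syntax and is not exercised here. The subquery alias ranked is introduced for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table mytable "
    "(incident_no int, open_approval_step int, approval_id text)"
)
conn.executemany("insert into mytable values (?, ?, ?)", [
    (1, 3, "Tech"), (1, 4, "Cust_Serv"),
    (2, 1, "Incident_Recorder"), (2, 2, "Estimation"), (2, 3, "Tech"),
    (3, 4, "Cust_Serv"), (3, 5, "Mgmt"), (3, 6, "Closure"),
])

# Number the rows of each incident by approval step and keep the first one.
lowest_steps = conn.execute("""
    select incident_no, open_approval_step, approval_id
    from (select t.*,
                 row_number() over (partition by incident_no
                                    order by open_approval_step) as rn
          from mytable t) ranked
    where rn = 1
    order by incident_no
""").fetchall()
for row in lowest_steps:
    print(row)
```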

If you have just three columns, you can easily use aggregation:
select incident_no, min(open_approval_step),
min(approval_id) keep (dense_rank first order by open_approval_step)
from t
group by incident_no;

Using IF or Case with multiple in SQL Statement

I want to do something like the following. This query works:
Select ID, number, cost from table order by number
A number can appear two or more times, but the cost is always the same:
1 A33 66.50
2 A34 73.50
3 A34 73.50
But I want to have
1 A33 66.50
2 A34 73.50
3 A34 0
I want to change it in the Sql to 0
I tried distinct or if then else.
I want to do something like this
declare #oldcost int;
Select ID, number,
if(cost=#oldcost) then
cost=0;
else
cost=cost;
end if
#oldcost=cost;
from table order by number
How can I do it in SQL?

You can use window functions and a case expression:
select ID, number,
(case when row_number() over (partition by number order by id) = 1
then cost else 0
end) as cost
from table
order by number, id;
Note that SQL generally does not take ordering into account, so results can be returned in any order -- and even with an order by, rows with the same keys can be in any order (and in different orders on different executions).
Hence, the order by includes id as well as number so you get the cost on the "first" row for each number.
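Here is a small sketch of the case expression above, run through Python's sqlite3 module on the three rows from the question. The table is called items here (a hypothetical name), because table itself is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table items (id int, number text, cost real)")
conn.executemany("insert into items values (?, ?, ?)", [
    (1, "A33", 66.50), (2, "A34", 73.50), (3, "A34", 73.50),
])

# Only the first row (lowest id) of each number keeps its cost.
costs = conn.execute("""
    select id, number,
           case when row_number() over (partition by number order by id) = 1
                then cost else 0
           end as cost
    from items
    order by number, id
""").fetchall()
for row in costs:
    print(row)
```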

SQL: How to exclude group from result set by one of the elements, not using subqueries

Input:
id group_id type_id
1 1 aaaaa
2 1 BAD
3 2 bbbbb
4 2 ccccc
5 3 ddddd
6 3 eeeee
7 3 aaaaa
I need to output the group_ids that consist only of members for which type_id <> 'BAD'. A whole group with at least one BAD member should be excluded.
Use of subqueries (or CTE or NOT EXISTS or views or T-SQL inline functions) is not allowed!
Use of except is not allowed!
Use of cursors is not allowed.
Any solutions which trick the rules above are appreciated. Any RDBMS is ok.
Bad example solution producing correct results, (using except):
select distinct group_id
from input
except
select group_id
from input
where type_id = 'bad'
group by group_id, type_id
Output:
group_id
2
3

I would just use group by and having:
select group_id
from input
group by group_id
having min(type_id) = 'good' and max(type_id) = min(type_id);
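This particular version assumes that every acceptable member has type_id = 'good' and that type_id does not take on NULL values. With the sample data above, where the acceptable type_id values vary, it returns no rows; in that case, count the 'BAD' members in the having clause instead.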
EDIT:
If you are looking for one bad, then just do:
select group_id
from input
where type_id = 'bad'
group by group_id;

Group by group_id and count occurrences of 'BAD':
select group_id
from mytable
group by group_id
having count(case when type_id = 'BAD' then 'count me' end) = 0;
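The conditional count can be checked against the sample input with Python's sqlite3 module. This is a minimal sketch; the table name mytable follows the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (id int, group_id int, type_id text)")
conn.executemany("insert into mytable values (?, ?, ?)", [
    (1, 1, "aaaaa"), (2, 1, "BAD"),
    (3, 2, "bbbbb"), (4, 2, "ccccc"),
    (5, 3, "ddddd"), (6, 3, "eeeee"), (7, 3, "aaaaa"),
])

# count() ignores NULLs, so the case expression counts only 'BAD' members;
# a group survives when that count is zero.
clean_groups = conn.execute("""
    select group_id
    from mytable
    group by group_id
    having count(case when type_id = 'BAD' then 'count me' end) = 0
    order by group_id
""").fetchall()
print(clean_groups)
```

No subquery, except, or cursor is involved: the filter is applied entirely in the having clause.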

Update rows in table

I have a table (Fruits) with following column
Fruit_Name(varchar2(10)) | IsDuplicate Number(1)
Mango 0
Orange 0
Mango 0
What I have to do is update the IsDuplicate column to 1 for one row of each distinct Fruit_Name, i.e.
Fruit_Name(varchar2(10)) | IsDuplicate Number(1)
Mango 1
Orange 1
Mango 0
How should I do this?

This should do it as far as I can tell
update fruits
set is_duplicate =
(
select case
when dupe_count > 1 and row_num = 1 then 1
else 0
end as is_dupe
from (
select f2.fruit_name,
count(*) over (partition by f2.fruit_name) as dupe_count,
row_number() over (partition by f2.fruit_name order by f2.fruit_name) as row_num,
rowid as row_id
from fruits f2
) ft
where ft.row_id = fruits.rowid
and ft.fruit_name = fruits.fruit_name
)
Edit
But instead of actually updating the table, why don't you create a view that returns the information. Depending on the size of the table it might be more efficient.
create view fruit_dupe_view
as
select fruit_name,
case
when dupe_count > 1 and row_num = 1 then 1
else 0
end as is_duplicate
from (
select fruit_name,
count(*) over (partition by fruit_name) as dupe_count,
row_number() over (partition by fruit_name order by fruit_name) as row_num
from fruits
) ft

Straight and simple -- you can't. Not with vanilla SQL. SQL is a set-based processing language, and you do things in sets. There is no way for SQL to know which one of your many Mangos should be tagged 1. You can probably tag one of them with 1 using windowing functions or ROWNUM etc. in a SELECT, but I don't think it can be done with an UPDATE.
In other words, your table lacks a unique key in the first place, so it is not something that SQL is designed to process.
However, you may try adding a sequential primary key to each row. Then you can easily write an UPDATE query to set to 1 all the rows with COUNT > 1 and key = MIN(key).
In other words, you really have to look at your database design. Relational databases are not supposed to contain "duplicates". The fact that you need to mark something as a duplicate means that your tables are designed wrong in the first place. The database should not even allow duplicates to enter its data.
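The suggestion above of adding a sequential key and then updating the rows with COUNT > 1 and key = MIN(key) could look roughly like this. It is a sketch in Python's sqlite3 module, not Oracle; the column names fruit_key and is_duplicate are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table fruits ("
    "fruit_key integer primary key autoincrement, "
    "fruit_name text, "
    "is_duplicate int default 0)"
)
conn.executemany(
    "insert into fruits (fruit_name) values (?)",
    [("Mango",), ("Orange",), ("Mango",)],
)

# Flag the lowest key of every name that occurs more than once.
conn.execute("""
    update fruits
    set is_duplicate = 1
    where fruit_key in (select min(fruit_key)
                        from fruits
                        group by fruit_name
                        having count(*) > 1)
""")

flags = conn.execute(
    "select fruit_name, is_duplicate from fruits order by fruit_key"
).fetchall()
print(flags)
```

With the having clause in place, Orange stays at 0 because it occurs only once. Dropping that clause flags the first row of every distinct name, which is the output shown in the question.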
