Derive groups of records that match over multiple columns, but where some column values might be NULL - sql

I would like an efficient means of deriving groups of matching records across multiple fields. Let's say I have the following table:
CREATE TABLE cust
(
id INT NOT NULL,
class VARCHAR(1) NULL,
cust_type VARCHAR(1) NULL,
terms VARCHAR(1) NULL
);
INSERT INTO cust
VALUES
(1,'A',NULL,'C'),
(2,NULL,'B','C'),
(3,'A','B',NULL),
(4,NULL,NULL,'C'),
(5,'D','E',NULL),
(6,'D',NULL,NULL);
What I am looking to get is the set of IDs for which matching values unify a set of records over the three fields (class, cust_type and terms), so that I can apply a unique ID to the group.
In the example, records 1-4 constitute one match group over the three fields, while records 5-6 form a separate match.
The following does the job:
SELECT
DISTINCT
a.id,
DENSE_RANK() OVER (ORDER BY max(b.class),max(b.cust_type),max(b.terms)) AS match_group
FROM cust AS a
INNER JOIN
cust AS b
ON
a.class = b.class
OR a.cust_type = b.cust_type
OR a.terms = b.terms
GROUP BY a.id
ORDER BY a.id
id match_group
-- -----------
1 1
2 1
3 1
4 1
5 2
6 2
**But, is there a better way?** Running this query on a table of over a million rows is painful...
As Graham pointed out in the comments, the above query doesn't satisfy the requirements if another record is added that would group all the records together.
The following values should be grouped together in one group:
INSERT INTO cust
VALUES
(1,'A',NULL,'C'),
(2,NULL,'B','C'),
(3,'A','B',NULL),
(4,NULL,NULL,'C'),
(5,'D','E',NULL),
(6,'D',NULL,NULL),
(7,'D','B','C');
Would yield:
id match_group
-- -----------
1 1
2 1
3 1
4 1
5 1
6 1
7 1
...because the class value of D groups records 5, 6 and 7. The terms value of C matches records 1, 2 and 4 to that group, and the cust_type value B (or the class value A) pulls in record 3.
Hopefully that all makes sense.

I don't think you can do this with a (recursive) Select.
I did something similar (trying to identify unique households) using a temporary table and repeated updates, with the following logic:
For each of class, cust_type and terms, get the minimum id per value and write it back to the temp table:
update temp
from
(
SELECT
class, -- similar for cust_type & terms
min(id) as min_id
from temp
group by class
) x
set id = min_id
where temp.class = x.class
and temp.id <> x.min_id
;
Repeat all three updates until none of them updates a row.
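The repeated min(id) updates amount to finding connected components: two records are linked when they share a non-NULL value in the same column. Outside the database, the same grouping can be sketched with a union-find structure. This is a different technique from the temp-table loop, and the function name match_groups is hypothetical. It assumes the rows fit in memory and that NULL arrives as None.

```python
def match_groups(rows):
    """Map each id to a match-group number.

    rows: iterable of (id, class, cust_type, terms); None stands for NULL.
    """
    parent = {}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller id as the root, mirroring the min(id) rule.
            if ra < rb:
                parent[rb] = ra
            else:
                parent[ra] = rb

    first_seen = {}  # (column index, value) -> first id carrying that value
    for rec_id, *values in rows:
        parent[rec_id] = rec_id
        for col, value in enumerate(values):
            if value is None:
                continue  # NULL never matches anything
            key = (col, value)
            if key in first_seen:
                union(rec_id, first_seen[key])
            else:
                first_seen[key] = rec_id

    # Number the groups 1, 2, ... in order of their smallest id.
    roots = sorted({find(i) for i in parent})
    rank = {root: n for n, root in enumerate(roots, start=1)}
    return {i: rank[find(i)] for i in sorted(parent)}


six = [
    (1, "A", None, "C"),
    (2, None, "B", "C"),
    (3, "A", "B", None),
    (4, None, None, "C"),
    (5, "D", "E", None),
    (6, "D", None, None),
]
print(match_groups(six))
# {1: 1, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2}
print(match_groups(six + [(7, "D", "B", "C")]))
# {1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1}
```

Keying on (column index, value) keeps a class of 'B' from matching a cust_type of 'B', which is how the join conditions in the question behave.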

Related

Update a column in table which has a temp id with real id from the same column

I have come across an unusual situation: I have a column called id that may hold a temp id until the final id comes through, like this:
id   temp id
1    null
2    1
6    null
7    6
I want a query that updates the table to:
id   temp id
2    null
2    1
7    null
7    6
Basically, once an id has a temp id associated with it, we update every row that still carries that temp id to the real id.
Any idea if this can be achieved? I tried using CASE statements in the UPDATE's SET clause, but this didn't work for me, and there are thousands of such records.
The temp id becoming redundant later is not an issue: that id cannot repeat, and we will use only id for analysis.
You can use an update:
update t
set id = (select t2.id from t t2 where t2.tempid = t.id)
where t.tempid is null;
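One thing to watch with this update: a scalar subquery that finds no row yields NULL, so a row with a null tempid that no other row points at would have its id overwritten with NULL. Below is a minimal sketch in Python with sqlite3 (chosen only for a quick test, since the question does not name a database) that adds an EXISTS guard. The guard and the extra row with id 9 are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, tempid INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    # The first four rows come from the question; (9, NULL) is an extra
    # row that no other row references through tempid.
    [(1, None), (2, 1), (6, None), (7, 6), (9, None)],
)

conn.execute(
    """
    UPDATE t
    SET id = (SELECT t2.id FROM t AS t2 WHERE t2.tempid = t.id)
    WHERE t.tempid IS NULL
      -- Guard: only touch rows that some other row actually points at.
      AND EXISTS (SELECT 1 FROM t AS t2 WHERE t2.tempid = t.id)
    """
)

rows = conn.execute("SELECT id, tempid FROM t ORDER BY rowid").fetchall()
print(rows)
# [(2, None), (2, 1), (7, None), (7, 6), (9, None)]
```

Without the EXISTS condition, the last row would come back as (None, None).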

Update multiple rows based on unique values in another column in same table

I have a table, [dbo].[Suppliers], with two columns: Name and CompanyID.
I need to update the CompanyID values ONLY for Unique Names.
UPDATE [dbo].[Suppliers]
SET CompanyId = 46
WHERE Name IN
(
SELECT DISTINCT Name
FROM [dbo].[Suppliers]
);
i.e.
Trying to get this
Name CompanyID
A 5
B 5
C 5
A 5
To look like:
Name CompanyID
A 6
B 6
C 6
A 5
Unfortunately, my query above is not doing the trick.
Appreciate any and all help. Thanks.
You can use a Common Table Expression to add a row number to each name, then update that CTE but specify only the first row for each name...
WITH
uniquely_identified AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY name ORDER BY companyID) AS name_row_id,
*
FROM
[dbo].[Suppliers]
)
UPDATE
uniquely_identified
SET
CompanyId = 46
WHERE
name_row_id = 1
;
Example: https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=4b5eba30b3bed71216ec678e9cffa6b9

SQL Query to get specific child records

I have a requirement to get child table records based on parent table search criteria, but they need to be distinct and the output should be as described below:
Table A has three rows. Row 1 holds the generic rules, Row 2 is for a specific Category, and Row 3 is for a specific Branch, Category and Sub-Category.
My output should consist of the rules, resolved from specific to generic.
Below are the rules for the output:
Input to the query will be Branch, Category and Sub-Category
Each record set in Table A comprises 3 rows
Row 1 has Branch, with Category and Sub-Category as NULL
Row 2 has Branch and Category, with Sub-Category as NULL
Row 3 has Branch, Category and Sub-Category
Each row in a record set of Table A has child records in Table B
The record with Branch only (Row 1) has generic child records, and these can also appear as child records of Row 2 and Row 3
The record with Branch and Category but Sub-Category as NULL (Row 2) has child records in Table B that override the child records of Row 1
The record with Branch, Category and Sub-Category (Row 3) has child records in Table B that override the child records of Row 1 and Row 2
All child records of Rows 1, 2 and 3 are part of the output, but if a child is present in Row 3, the output uses the Row 3 child record even if it is also present in other rows
If a child record is present in Rows 1 and 2 but not in Row 3, the output uses the child record of Row 2
If a child record is present in Row 1 but not in Rows 2 and 3, it is part of the output
Now,
In the sample output, 'Pay' is present in Rows 1, 2 and 3, but the output uses the child record of Row 3, as it overrides both Row 1 and Row 2
'Discount' is present in Rows 1 and 3, but the output includes the child of Row 3
'Items' is not among the children of Row 1 or Row 2, but as it is present in Row 3 it is part of the output
'Pairs' is only part of Row 2, and as it is not overridden by Row 3 it is part of the output as it is
I have tried the following query, but it does not give the required output:
SELECT DISTINCT RULE,
value
FROM siebel.b rxm
WHERE par_row_id IN (SELECT row_id
FROM siebel.a
WHERE ( branch = 'Civil'
AND category = 'C.M. (Civil)'
AND sub_category IS NULL )
OR ( branch = 'Civil'
AND category = 'C.M. (Civil)'
AND sub_category = 'Pauper' )
OR ( branch = 'Civil'
AND category IS NULL
AND sub_category IS NULL ))
I am using Oracle as RDBMS.
Schema statements:
Create Table A (ROW_ID int, BRANCH varchar(50), CATEGORY varchar(50), SUB_CATEGORY varchar(50));
Create Table B (PAR_ROW_ID int, RULE varchar(50), Value varchar(50));
INSERT INTO A (ROW_ID, BRANCH)
VALUES (1,'Civil');
INSERT INTO A (ROW_ID, BRANCH, CATEGORY)
VALUES (2,'Civil','C.M. (Civil)');
INSERT INTO A (ROW_ID, BRANCH, CATEGORY, SUB_CATEGORY)
VALUES (3,'Civil','C.M. (Civil)','Pauper');
INSERT INTO B (PAR_ROW_ID, RULE, VALUE)
VALUES (1,'Pay','10');
INSERT INTO B (PAR_ROW_ID, RULE, VALUE)
VALUES (1,'Days','25');
INSERT INTO B (PAR_ROW_ID, RULE, VALUE)
VALUES (1,'Discount','20');
INSERT INTO B (PAR_ROW_ID, RULE, VALUE)
VALUES (2,'Pairs','5');
INSERT INTO B (PAR_ROW_ID, RULE, VALUE)
VALUES (2,'Pay','30');
INSERT INTO B (PAR_ROW_ID, RULE, VALUE)
VALUES (3,'Pay','15');
INSERT INTO B (PAR_ROW_ID, RULE, VALUE)
VALUES (3,'Discount','20');
INSERT INTO B (PAR_ROW_ID, RULE, VALUE)
VALUES (3,'items','30');
SELECT MAX( par_row_id ) AS par_row_id,
rule,
MAX( value ) KEEP ( DENSE_RANK LAST ORDER BY par_row_id ) AS value
FROM table_b
GROUP BY rule
Or:
SELECT par_row_id,
rule,
value
FROM (
SELECT b.*,
ROW_NUMBER() OVER ( PARTITION BY rule ORDER BY par_row_id DESC ) AS rn
FROM table_b b
)
WHERE rn = 1;
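Both queries above pick, for each rule, the child row with the highest par_row_id. A minimal Python sketch of that precedence logic follows. It assumes, as the queries do, that a higher par_row_id means a more specific parent row. The function name resolve_rules is hypothetical, and the rows are the sample data from the question.

```python
def resolve_rules(child_rows):
    """Return {rule: (par_row_id, value)}, keeping the most specific row.

    child_rows: iterable of (par_row_id, rule, value).
    Assumption: a larger par_row_id belongs to a more specific parent.
    """
    resolved = {}
    # Walk from generic to specific so later rows overwrite earlier ones.
    for par_row_id, rule, value in sorted(child_rows, key=lambda r: r[0]):
        resolved[rule] = (par_row_id, value)
    return resolved


table_b = [
    (1, "Pay", "10"),
    (1, "Days", "25"),
    (1, "Discount", "20"),
    (2, "Pairs", "5"),
    (2, "Pay", "30"),
    (3, "Pay", "15"),
    (3, "Discount", "20"),
    (3, "items", "30"),
]

for rule, (par_row_id, value) in sorted(resolve_rules(table_b).items()):
    print(rule, par_row_id, value)
```

If the row ids did not happen to increase with specificity, the sort key would have to be derived from the parent row instead, for example from how many of category and sub_category are non-NULL.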

How to delete only one record from a table?

I have one table in my database
Id Name
-------------------------
1 1 a
2 1 a
3 1 a
4 2 b
5 2 b
6 2 b
This is my database table. It has 6 rows and 2 columns, Id and Name.
In this table the Id field is not a primary key, and I want to delete row number 2 from the table.
After deleting row 2, I want the output to look like this:
Id Name
-------------------------
1 1 a
3 1 a
4 2 b
5 2 b
6 2 b
Is it possible?
Your ID should be unique but here is the sql to delete all IDs that are 2.
Delete FROM table WHERE table.Id=2;
Replace 'table' with your table name.
Edit:
It appears like you want to delete the second result. I don't know why but here is the sql:
with rn AS
(
SELECT *, rn = ROW_NUMBER() OVER (ORDER BY (SELECT 0))
FROM table
)
DELETE
FROM rn
WHERE rn = 2
There must be some criterion by which you can identify the rows of a table. That is the primary key. How do you know that the order of the rows stays the same? Your table is not even sortable; I mean, you can't be sure that the same SELECT statement returns the rows in the same order.
That's why I'd answer that you CAN'T delete only and exactly record number two, because your table has no order. The same SELECT could return different rows in the 2nd position.
If the Id field must have such values, you could add a surrogate primary key.

Return IDs that have ALL of a list of corresponding values

I have a table with 2 columns:
id id2
1 1
1 2
1 3
2 1
2 2
2 4
3 2
3 3
3 4
I want to return the ids that have every value in a given list of id2 values, for example (1, 2, 4), not just some of them.
In the above case it would return id = 2. Is this possible?
select id
from MyTable
where id2 in (1, 2, 4)
group by id
having count(distinct id2) = 3 --this must match the number of elements in IN clause
Update:
If the list of IDs is variable, then you should create an additional table that contains the varying sets of IDs, which you can then JOIN against to do your filtering.
Are you alluding to relational division? e.g. the supplier who supplies all products, the pilot that can fly all the planes in the hangar, etc.?
If so, this article has many example implementations in SQL.
Do a self-join to test different rows on the same table in one go:
SELECT t0.id
FROM t AS t0
JOIN t AS t1 ON t1.id=t0.id
JOIN t AS t2 ON t2.id=t1.id
WHERE t0.id2=1
AND t1.id2=2
AND t2.id2=4
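For a list whose length varies, the GROUP BY / HAVING form is the easier one to parameterize, because the count in HAVING can be computed from the list itself. Here is a minimal sketch using Python's sqlite3 module. The helper name ids_having_all is hypothetical, and SQLite is only an assumption for the sake of a runnable test; the question does not name a database.

```python
import sqlite3


def ids_having_all(conn, wanted):
    """Return ids in MyTable that have every id2 value in `wanted`."""
    wanted = sorted(set(wanted))  # duplicates would break the count check
    if not wanted:
        raise ValueError("wanted must contain at least one value")
    placeholders = ",".join("?" for _ in wanted)
    sql = (
        "SELECT id FROM MyTable "
        f"WHERE id2 IN ({placeholders}) "
        "GROUP BY id "
        "HAVING COUNT(DISTINCT id2) = ? "  # must equal the list length
        "ORDER BY id"
    )
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER, id2 INTEGER)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 4), (3, 2), (3, 3), (3, 4)],
)

print(ids_having_all(conn, [1, 2, 4]))  # [2]
print(ids_having_all(conn, [2, 3]))     # [1, 3]
```

The self-join version needs one extra join per list element, so it has to be regenerated whenever the list length changes.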