Find matching first 7 chars to identify duplicates

Find matching first 7 chars to identify duplicates - sql

I'm trying to identify duplicate state_num that are failing validation. The R is causing issues with validation, but I want to just search the first 7 characters and find the duplicate values, so that it returns the row that has an R in the string and the row that doesn't. The column is a type: char(15) But when trying to run a query it is not finding the matching 7 characters. My table only showing how it should look, its not showing what is actually being returned. It basically is just finding the state and only finding non R state_num in results. It should be returning around 480 rows but is returning like 20k rows and not just showing the duplicates
I've tried querying a bunch of different ways but i've spen the last hour only being able to return the R row if i ad AND state_num[8] = 'R' to the end of the query. Which defeats what I'm trying to find the duplicate first 7 characters. This is an informix db.
My Query:
SELECT id_ref, cont_ref, formatted, state_num, type, state
FROM state_form sf1
WHERE EXISTS (select cont_ref, san
FROM state_form sf2
WHERE sf1.cont_ref = sf2.cont_ref and left(sf1.state_num,7) = LEFT(sf2.state_num,7)
GROUP BY cont_ref, state_num
HAVING COUNT(state_num) > 1)
AND state = 'MT';
This is what I'd like my results to return:
id_ref
cont_ref
formatted
state_num
type
state
658311
5237
71-75011R
7175011R
Y
MT
1459
5237
71-75011
7175011
I
MT
7501
555678
99-67894
9967894
I
MT
345443
555678
99-67894R
9967894R
Y
MT

Here are a couple options producing the same results. This may need to be changed if you need to identify the 8th character as something such as a Letter. That is, this will also catch 12345678 and 1234567.
create table my_data (
id_ref integer,
cont_ref integer,
state_num varchar(20),
type varchar(5),
state varchar(5)
);
insert into my_data values
(1, 5237, '7175011R', 'Y', 'MT'),
(2, 5237, '7175011', 'I', 'MT'),
(3, 6789, '7878787', 'Y', 'CA'),
(4, 6789, '7878787R', 'I', 'CA'),
(5, 555678, '9967894', 'I', 'MT'),
(6, 555678, '9967894R', 'Y', 'MT'),
(7, 98765, '123456', 'I', 'MT');
Query #1
with dupes as (
select cont_ref
from my_data
where state = 'MT'
group by cont_ref, left(state_num, 7)
having count(*) > 1
)
select m.id_ref, m.cont_ref, m.state_num, m.type, m.state
from my_data m
join dupes d
on m.cont_ref = d.cont_ref;
Query #2
select m.id_ref, m.cont_ref, m.state_num, m.type, m.state
from my_data m
where m.cont_ref in (
select cont_ref
from my_data
where state = 'MT'
group by cont_ref, left(state_num, 7)
having count(*) > 1
);
id_ref
cont_ref
state_num
type
state
1
5237
7175011R
Y
MT
2
5237
7175011
I
MT
5
555678
9967894
I
MT
6
555678
9967894R
Y
MT
View on DB Fiddle
UPDATE
If Informix does not want to group by left(column, 7), then you could get the target cont_ref values using this. Here's the CTE method, but you could also do with sub-query.
with dupes as (
select cont_ref
from (
select cont_ref, left(state_num, 7) as left_seven
from my_data
where state = 'MT'
)z
group by cont_ref
having count(*) > 1
)
select m.*
from my_data m
join dupes d
on m.cont_ref = d.cont_ref;

Related

How to select just the rows in a table that fit a criteria , avoiding duplicates in SQL

So I have a df like follows:
USER Value object
0001 V V
0002 A NULL
0002 C C
0003 A NULL
0004 A NULL
0004 A NULL
0003 V V
So I basically want USER to be the unique id for each row of this DF. If there is an A in the Value column, I only want it if that's the only option for the ID. So there are two 002's, I only want to see the instance where it is not A , so C.
Because 0004 doesn't have a non-A Value, I'll take the A.
Final result:
USER Value
0001 V
0002 C
0003 V
0004 A

I think you are looking for the following:
select user,
'A' as value
from tbl
group by user
having sum(case when value = 'A' then 1 else 0 end) > 0
and sum(case when value <> 'A' then 1 else 0 end) = 0
union all
select user,
value
from tbl
where value <> 'A'
order by user;
See Fiddle:
http://www.sqlfiddle.com/#!9/b28f4c/2/0
The desired result is achieved with your example data. However, your example data does not contain any users having more than one non-A value row. The above query will keep all of them. If you only want to keep one or some, explain how to pick which you want.

This will return the one Value per tuple, returning A at last resort (if A is the smallest of the potential values):
select USER, max(Value) as value from Table
group by User
or, this might return multiple users if they have several tuples with different object (when not null)
select distinct user, coalesce(object, value)
from table ;

Here's a solution if you don't like typing :-)
select
distinct USR
,VAL
from
TBL
qualify
max(ascii(VAL)) over (partition by USR ) = ascii(VAL)
Copy|Paste|Run in snowflake:
CREATE or replace TABLE tbl( USR varchar(4), VAL varchar(1), OBJ varchar(4));
INSERT INTO tbl (USR,VAL,OBJ)
VALUES
('0001', 'V', 'V'),
('0002', 'A', NULL),
('0002', 'C', 'C'),
('0003', 'A', NULL),
('0004', 'A', NULL),
('0004', 'A', NULL),
('0003', 'V', 'V');
select
distinct USR
,VAL
from
TBL
qualify
max(ascii(VAL)) over (partition by USR ) = ascii(VAL);

You can try the following if you are using SQL-Server
select distinct USER
,Value
from
(
select *,rank() over (partition by USER order by Value desc) as ranking
from your_table_name
) as t
where ranking =1

Group by absorb NULL unless it's the only value

I'm trying to group by a primary column and a secondary column. I want to ignore NULL in the secondary column unless it's the only value.
CREATE TABLE #tempx1 ( Id INT, [Foo] VARCHAR(10), OtherKeyId INT );
INSERT INTO #tempx1 ([Id],[Foo],[OtherKeyId]) VALUES
(1, 'A', NULL),
(2, 'B', NULL),
(3, 'B', 1),
(4, 'C', NULL),
(5, 'C', 1),
(6, 'C', 2);
I'm trying to get output like
Foo OtherKeyId
A NULL
B 1
C 1
C 2
This question is similar, but takes the MAX of the column I want, so it ignores other non-NULL values and won't work.
I tried to work out something based on this question, but I don't quite understand what that query does and can't get my output to work
-- Doesn't include Foo='A', creates duplicates for 'B' and 'C'
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY [Foo] ORDER BY [OtherKeyId]) rn1
FROM #tempx1
)
SELECT c1.[Foo], c1.[OtherKeyId], c1.rn1
FROM cte c1
INNER JOIN cte c2 ON c2.[OtherKeyId] = c1.[OtherKeyId] AND c2.rn1 = c1.rn1
This is for a modern SQL Server: Microsoft SQL Server 2019

You can use a GROUP BY expression with HAVING clause like below one
SELECT [Foo],[OtherKeyId]
FROM #tempx1 t
GROUP BY [Foo],[OtherKeyId]
HAVING SUM(CASE WHEN [OtherKeyId] IS NULL THEN 0 END) IS NULL
OR ( SELECT COUNT(*) FROM #tempx1 WHERE [Foo] = t.[Foo] ) = 1
Demo

Hmmm . . . I think you want filtering:
select t.*
from #tempx1 t
where t.otherkeyid is not null or
not exists (select 1
from #tempx1 t2
where t2.foo = t.foo and t2.otherkeyid is not null
);

My actual problem is a bit more complicated than presented here, I ended up using the idea from Barbaros Özhan solution to count the number of items. This ends up with two inner queries on the data set with two different GROUP BY. I'm able to get the results I need on my real dataset using a query like the following:
SELECT
a.[Foo],
b.[OtherKeyId]
FROM (
SELECT
[Foo],
COUNT([OtherKeyId]) [C]
FROM #tempx1 t
GROUP BY [Foo]
) a
JOIN (
SELECT
[Foo],
[OtherKeyId]
FROM #tempx1 t
GROUP BY [Foo], [OtherKeyId]
) b ON b.[Foo] = a.[Foo]
WHERE
(b.[OtherKeyId] IS NULL AND a.[C] = 0)
OR (b.[OtherKeyId] IS NOT NULL AND a.[C] > 0)

SQL Server loop through a table for every 5 rows

I need to write a stored procedure or table function to return a new data table as a new data source.
I wish to loop through the original table for every 5 rows base on the invoice ID column (it's possible not start from 1), the first 5 rows add to the left of the new table and the second 5 rows add to the right of the new table, the third 5 rows to the left and so on.
For example, Here is the original table:
Here is the expect table:
Thanks in advance!

declare #rowCount int = 5;
with cte as (
select *,( (IN_InvoiceID-1) / #rowCount ) % 2 group1
,( (IN_InvoiceID-1) / #rowCount ) group2
,IN_InvoiceID % #rowCount group3
from T
)
select * from cte
select T1.INID,T1.IN_InvoiceID,T1.IN_InvoiceAmount,T2.INID,T2.IN_InvoiceID,T2.IN_InvoiceAmount
from CTE T1
left join CTE T2 on T2.group1 = 1 and T1.group2 = T2.group2-1 and T1.group3 = T2.group3
where T1.group1 = 0
Test DDL
CREATE TABLE T
([INID] varchar(38), [IN_InvoiceID] int, [IN_InvoiceAmount] int)
;
INSERT INTO T
([INID], [IN_InvoiceID], [IN_InvoiceAmount])
VALUES
('DB3E17E6-35C5-41:121-93B1-F809BF6B2972', 1, 2999),
('3212F048-8213-4FCC-AB64-121485B77D4E43', 2, 3737),
('E3526373-A204-40F5-801C-7F8302A4E5E2', 3, 3175),
('76CC9C19-BF79-4E8A-8034-A33805AD3390', 4, 391),
('EC7A2FBC-B62D-4865-88DE-A8097975F125', 5, 1206),
('52AD3046-21331-4F0A-BD1D-67F232C54244', 6, 402),
('CA48F132-A9F5-4516-9E58-CDEE6644AAD1', 7, 1996),
('02E10C31-CAB2-4220-B66A-CEE5E67A9378', 8, 3906),
('98F1EEFF-B07A-4B65-87F4-E165264284DD', 9, 2575),
('91EBDD8B-B73C-470C-8900-DD66078483DB', 10, 2965),
('6E2490E5-C4DE-4833-877F-1590F7BDC1B8', 11, 1603),
('00985921-AC3C-4E3E-BAE1-7F58302F831A', 12, 1302)
;
Result:

Could you please check article Display Data in Multiple Columns using SQL showing with example case how a database developer can show the list of data rows in a columnar mode using Row_Number() function and mode arithmetic expression
You need to add additional columns from the same row that is different in the sample

Seems as if you want to split the table into 2 tables with alternating 5 rows. An easy way to do this would be:
Take data into a temp table having an extra column (lets say
grouping_id)
Update the grouping id so that each 5 rows have the same id. You can
use in_invoiceId % 5 (the nod function). After this step the first 5
rows will have grouping_id 0, next 5 will have 1, next will have 2
(assuming your invoice id is incremented +1 for all rows).
You can just do a normal select with where clause for odd and even grouping_id

Ideally, you can manage with the 2 tables Master and detail table.
But due to my curiosity, I am able to solve and give the answer as
Declare #table table(id int identity, invoice_id int)
; WITH Numbers AS
(
SELECT n = 1
UNION ALL
SELECT n + 1
FROM Numbers
WHERE n+1 <= 50
)
insert into #table SELECT n
FROM Numbers
Select (a.id )%5 ,* from #table a join #table b on a.id+5 = b.id and a.id != b.id
;WITH Numbers AS
(
SELECT n = 1, o = 5
UNION ALL
SELECT n + 10, o = o+10
FROM Numbers
WHERE n+1 <= 50
)
select a.id ParentId,a.invoice_id ParentInvoiceId, --b.n, b.o,
c.invoice_id childInvoiceID from #table a
join Numbers b on a.id between b.n and b.o
left join #table c on a.id + 5 = c.id

Here is my solution
First i create grps based on whether the in_invoiceid is divisible by 5 or not.(Ignore the remainders)
After that i create a category to indicate between alternative groups(ie by checking if the remainder is 0 or otherise)
Then its a matter of dense_ranking the records on the basis of the category field ordered by in_invoiceid
Lastly a join with category=1 rows with same dense_rank as those records in category=0
create table Invoicetable(IN_ID varchar(100), IN_InvoiceID int)
INSERT INTO Invoicetable (IN_ID, IN_InvoiceID)
VALUES
('2345-BCDE-6645-1DDF', 1),
('2345-BCDE-6645-3DDF', 2),
('2345-BCDE-6645-4DDF', 3),
('2345-BCDE-6645-5DDF', 4),
('2345-BCDE-6645-6DDF', 5),
('2345-BCDE-6645-7DDF', 6),
('2345-BCDE-6645-aDDF', 7),
('2345-BCDE-6645-sDDF', 8),
('2345-BCDE-6645-dDDF', 9),
('2345-BCDE-6645-dDDF', 10),
('2345-BCDE-6645-dDDF', 11),
('2345-BCDE-6645-dDDF', 12);
with data
as (
select *
,(in_invoiceid-1)/5 as grp
,case when ((in_invoiceid-1)/5)%2=0 then '1' else '0' end as category
,dense_rank() over(partition by case when ((in_invoiceid-1)/5)%2=0 then '1' else '0' end
order by in_invoiceid) as rnk
from invoicetable a
)
select *
from data a
left join data b
on a.rnk=b.rnk
and b.category=0
where a.category=1
Here is db fiddle link.
https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=287f101737c580ca271940764b2536ae

You may try with the following approach. Dividing the table is done with (((ROW_NUMBER() OVER (ORDER BY IN_InvoiceID) - 1) / 5) % 2 = 0) which groups records in left and right groups.
CREATE TABLE #InvoiceTable(
IN_ID varchar(24),
IN_InvoiceID int
)
INSERT INTO #InvoiceTable (IN_ID, IN_InvoiceID)
VALUES
('2345-BCDE-6645-1DDF', 1),
('2345-BCDE-6645-3DDF', 2),
('2345-BCDE-6645-4DDF', 3),
('2345-BCDE-6645-5DDF', 4),
('2345-BCDE-6645-6DDF', 5),
('2345-BCDE-6645-7DDF', 6),
('2345-BCDE-6645-aDDF', 7),
('2345-BCDE-6645-sDDF', 8),
('2345-BCDE-6645-dDDF', 9),
('2345-BCDE-6645-dDDF', 10),
('2345-BCDE-6645-dDDF', 11),
('2345-BCDE-6645-dDDF', 12);
WITH cte AS (
SELECT
IN_ID,
IN_InvoiceID,
CASE
WHEN (((ROW_NUMBER() OVER (ORDER BY IN_InvoiceID) - 1) / 5) % 2 = 0) THEN 'L'
ELSE 'R'
END AS IN_Position
FROM #InvoiceTable
),
cteL AS (
SELECT IN_ID, IN_InvoiceID, ROW_NUMBER() OVER (ORDER BY IN_InvoiceID) AS IN_RowNumber
FROM cte
WHERE IN_Position = 'L'
),
cteR AS (
SELECT IN_ID, IN_InvoiceID, ROW_NUMBER() OVER (ORDER BY IN_InvoiceID) AS IN_RowNumber
FROM cte
WHERE IN_Position = 'R'
)
SELECT cteL.IN_ID, cteL.IN_InvoiceID, cteR.IN_ID, cteR.IN_InvoiceID
FROM cteL
LEFT JOIN cteR ON (cteL.IN_RowNumber = cteR.IN_RowNumber)
Output:
IN_ID IN_InvoiceID IN_ID IN_InvoiceID
2345-BCDE-6645-1DDF 1 2345-BCDE-6645-7DDF 6
2345-BCDE-6645-3DDF 2 2345-BCDE-6645-aDDF 7
2345-BCDE-6645-4DDF 3 2345-BCDE-6645-sDDF 8
2345-BCDE-6645-5DDF 4 2345-BCDE-6645-dDDF 9
2345-BCDE-6645-6DDF 5 2345-BCDE-6645-dDDF 10
2345-BCDE-6645-dDDF 11 NULL NULL
2345-BCDE-6645-dDDF 12 NULL NULL

Combining rows to create two columns of data

I'm a bit confused on how to do this query properly. I have a table that looks like this. Where district 0 represent a value that should be applied to all district (global).
[ district ] [ code ] [ value ]
1 A 11
1 C 12
2 A 13
2 B 14
0 B 15
I have built a query (below) to combine the "global value" on each district.
[ district ] [ code ] [ district value ] [ global value ]
1 A 11 null -> row 1
1 B null 15 -> row 2
1 C 12 null -> row 3
2 A 13 null -> row 4
2 B 14 15 -> row 5
2 C null null -> row 6 (optional)
I did it by joining on the list of all possible district/code.
select all_code.district, all_code.code, table_d.value, table_g.value
from (select distinct b.district, a.code
from temp_table a
inner join (select distinct district
from temp_table
where district <> 0) b
on 1 = 1) all_code
left join temp_table table_d
on table_d.code = all_code.code
and table_d.district = all_code.district
left join temp_table table_g
on table_g.code = all_code.code
and table_g.district = 0
This query works great but seems pretty ugly. Is there a better way of doing this? (note that I don't care if row #6 is there or not).
Here's a script if needed.
create table temp_table
(
district VARCHAR2(5) not null,
code VARCHAR2(5) not null,
value VARCHAR2(5) not null
);
insert into temp_table (district, code, value)
values ('1', 'A', '11');
insert into temp_table (district, code, value)
values ('1', 'C', '12');
insert into temp_table (district, code, value)
values ('2', 'A', '13');
insert into temp_table (district, code, value)
values ('2', 'B', '14');
insert into temp_table (district, code, value)
values ('0', 'B', '15');

Here is one of the options. Since you are on 10g you can make use of partition outer join(partition by() clause) to fill the gaps:
with DCodes(code) as(
select 'A' from dual union all
select 'B' from dual union all
select 'C' from dual
),
DGlobal(code, value1) as(
select code
, value
from temp_table
where district = 0
)
select tt.district
, dc.code
, tt.value
, dg.value1 as global_value
from temp_table tt
partition by(tt.district)
right join DCodes dc
on (dc.code = tt.code)
left join DGlobal dg
on (dg.code = dc.code)
where tt.district != 0
order by 1, 2
Result:
DISTRICT CODE VALUE GLOBAL_VALUE
-------- ---- ----- ------------
1 A 11
1 B 15
1 C 12
2 A 13
2 B 14 15
2 C

I would argue that a lot of the "ugliness" comes from a lack of lookup tables for district and code. Without an authoritative source for those, you have to fabricate one from the values that are in use (hence the sub-queries with distinct).
In terms of cleaning up the query you have, the best I can come up with is to remove an unnecessary sub-query and use the proper syntax for the cross join:
SELECT a.district,
b.code,
c.value1,
d.value1
FROM (SELECT DISTINCT district FROM temp_table WHERE district <> 0) a
CROSS JOIN (SELECT DISTINCT code FROM temp_table) b
LEFT JOIN temp_table c
ON b.code = c.code AND a.district = c.district
LEFT JOIN temp_table d
ON b.code = d.code AND d.district = 0
ORDER BY district, code

Joining a list of values with table rows in SQL

Suppose I have a list of values, such as 1, 2, 3, 4, 5 and a table where some of those values exist in some column. Here is an example:
id name
1 Alice
3 Cindy
5 Elmore
6 Felix
I want to create a SELECT statement that will include all of the values from my list as well as the information from those rows that match the values, i.e., perform a LEFT OUTER JOIN between my list and the table, so the result would be like follows:
id name
1 Alice
2 (null)
3 Cindy
4 (null)
5 Elmore
How do I do that without creating a temp table or using multiple UNION operators?

If in Microsoft SQL Server 2008 or later, then you can use Table Value Constructor
Select v.valueId, m.name
From (values (1), (2), (3), (4), (5)) v(valueId)
left Join otherTable m
on m.id = v.valueId
Postgres also has this construction VALUES Lists:
SELECT * FROM (VALUES (1, 'one'), (2, 'two'), (3, 'three')) AS t (num,letter)
Also note the possible Common Table Expression syntax which can be handy to make joins:
WITH my_values(num, str) AS (
VALUES (1, 'one'), (2, 'two'), (3, 'three')
)
SELECT num, txt FROM my_values
With Oracle it's possible, though heavier From ASK TOM:
with id_list as (
select 10 id from dual union all
select 20 id from dual union all
select 25 id from dual union all
select 70 id from dual union all
select 90 id from dual
)
select * from id_list;

the following solution for oracle is adopted from this source. the basic idea is to exploit oracle's hierarchical queries. you have to specify a maximum length of the list (100 in the sample query below).
select d.lstid
, t.name
from (
select substr(
csv
, instr(csv,',',1,lev) + 1
, instr(csv,',',1,lev+1 )-instr(csv,',',1,lev)-1
) lstid
from (select ','||'1,2,3,4,5'||',' csv from dual)
, (select level lev from dual connect by level <= 100)
where lev <= length(csv)-length(replace(csv,','))-1
) d
left join test t on ( d.lstid = t.id )
;
check out this sql fiddle to see it work.

Bit late on this, but for Oracle you could do something like this to get a table of values:
SELECT rownum + 5 /*start*/ - 1 as myval
FROM dual
CONNECT BY LEVEL <= 100 /*end*/ - 5 /*start*/ + 1
... And then join that to your table:
SELECT *
FROM
(SELECT rownum + 1 /*start*/ - 1 myval
FROM dual
CONNECT BY LEVEL <= 5 /*end*/ - 1 /*start*/ + 1) mypseudotable
left outer join myothertable
on mypseudotable.myval = myothertable.correspondingval

Assuming myTable is the name of your table, following code should work.
;with x as
(
select top (select max(id) from [myTable]) number from [master]..spt_values
),
y as
(select row_number() over (order by x.number) as id
from x)
select y.id, t.name
from y left join myTable as t
on y.id = t.id;
Caution: This is SQL Server implementation.
fiddle

For getting sequential numbers as required for part of output (This method eliminates values to type for n numbers):
declare #site as int
set #site = 1
while #site<=200
begin
insert into ##table
values (#site)
set #site=#site+1
end
Final output[post above step]:
select * from ##table
select v.id,m.name from ##table as v
left outer join [source_table] m
on m.id=v.id

Suppose your table that has values 1,2,3,4,5 is named list_of_values, and suppose the table that contain some values but has the name column as some_values, you can do:
SELECT B.id,A.name
FROM [list_of_values] AS B
LEFT JOIN [some_values] AS A
ON B.ID = A.ID

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Find matching first 7 chars to identify duplicates - sql

Related

How to select just the rows in a table that fit a criteria , avoiding duplicates in SQL

Group by absorb NULL unless it's the only value

SQL Server loop through a table for every 5 rows

Combining rows to create two columns of data

Joining a list of values with table rows in SQL

Categories

Resources