How to check for duplicated addresses in tables KNA1 and KNB1? - abap

I need to check if there are duplicated records in the address tables.
The customers are stored in the following tables:
KNA1 Master data with address, global view
KNB1 Master data for company code
We stored the previous customer number in the field KNB1-ALTKN.
Now the idea is to find all duplicated records by this number, but records only count as duplicates if the KUNNR is different. Because the same customer can exist in several company codes (BUKRS), there will certainly be more than one entry with the same ALTKN.
So the condition is: different KUNNR with the same ALTKN means a duplicated record.
Can someone please help me do this?

I know this is an old question, but there's a much easier solution that can be expressed as a single SQL query, assuming it works (some SQL features are not available in ABAP Open SQL).
To get the filter criteria, group by the keys you use for finding duplicates (in this case ALTKN, BUKRS) and keep only the groups where count( * ), the number of records in the group, is higher than 1.
Then use that filter to get the actual results you need.
Here's a full query for this particular case:
SELECT kunnr FROM kna1 AS outer_kna1
  WHERE EXISTS (
    "what you select here doesn't matter
    SELECT 1 FROM kna1 INNER JOIN knb1 ON kna1~kunnr = knb1~kunnr
      WHERE kna1~kunnr = outer_kna1~kunnr
      GROUP BY knb1~altkn, knb1~bukrs "Keys by which duplicates are searched for
      HAVING COUNT( * ) > 1 "Condition to only select duplicates (record count in a group is more than 1)
  )
  INTO TABLE @lt_duplicate_account_data.
lt_duplicate_account_data will contain the customer numbers (KUNNR). If you also need ALTKN and BUKRS, add a join to the primary query and the corresponding conditions to the sub-query:
SELECT outer_kna1~kunnr, outer_knb1~altkn, outer_knb1~bukrs FROM kna1 AS outer_kna1
  INNER JOIN knb1 AS outer_knb1 ON outer_kna1~kunnr = outer_knb1~kunnr "added join
  WHERE EXISTS (
    "what you select here doesn't matter
    SELECT knb1~altkn FROM kna1 INNER JOIN knb1 ON kna1~kunnr = knb1~kunnr
      WHERE kna1~kunnr = outer_kna1~kunnr
        AND knb1~altkn = outer_knb1~altkn AND knb1~bukrs = outer_knb1~bukrs "added conditions
      GROUP BY knb1~altkn, knb1~bukrs "Keys by which duplicates are searched for
      HAVING COUNT( * ) > 1 "Condition to only select duplicates (record count in a group is more than 1)
  )
  INTO TABLE @lt_duplicate_account_data.
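The condition from the question (the same ALTKN carried by more than one distinct KUNNR) can be tried outside SAP. The following is only a sketch using Python's sqlite3 module; the table is a three-column stand-in for KNB1 and the sample rows are made up, so it illustrates the grouping idea rather than a tested Open SQL statement.

```python
import sqlite3

# Hypothetical stand-in for KNB1: only the three columns the question mentions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE knb1 (kunnr TEXT, bukrs TEXT, altkn TEXT)")
conn.executemany(
    "INSERT INTO knb1 VALUES (?, ?, ?)",
    [
        ("1000", "0001", "A1"),  # same customer in two company codes:
        ("1000", "0002", "A1"),  # not a duplicate
        ("2000", "0001", "B7"),  # two different customers share B7:
        ("2001", "0001", "B7"),  # duplicate
        ("3000", "0001", ""),    # no previous number: ignored
    ],
)

# An ALTKN counts as duplicated only if more than one distinct KUNNR carries it.
duplicates = conn.execute(
    """
    SELECT kunnr, bukrs, altkn
    FROM knb1
    WHERE altkn IN (
        SELECT altkn
        FROM knb1
        WHERE altkn <> ''
        GROUP BY altkn
        HAVING COUNT(DISTINCT kunnr) > 1
    )
    ORDER BY altkn, kunnr, bukrs
    """
).fetchall()

print(duplicates)
```

Counting distinct KUNNR values per ALTKN, instead of counting rows, is what keeps a customer that exists in several company codes from being reported.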

I suppose in this case you should create a selection screen that takes select-options for BUKRS as the selection parameter.
You also need a type like this:
TYPES: BEGIN OF tty_mytype,
         kunnr  TYPE kunnr,
         kunnr2 TYPE kunnr,
         altkn  TYPE altkn,
       END OF tty_mytype.
Then you can do a full table scan on KNA1 into a hashed table (TYPE HASHED TABLE OF tty_mytype).
Join the tables into the hashed table using a full table scan; just select KUNNR from KNA1 and ALTKN from KNB1. That's it.
Use an outer join.
Then you have to sort the results, preferably by ALTKN and KUNNR, I think.
Then you need another buffer table of the same type.
Loop over the first table and collect into the buffer table all KUNNRs that share the same ALTKN.
In the buffer table, the first KUNNR field holds the first KUNNR found for an ALTKN, and the second field (kunnr2) holds the current one from the loop, where the current KUNNR differs from the previous KUNNR but the ALTKN equals the previous ALTKN.
During the loop you can always compare the current KUNNR/ALTKN against the previous one.
Do not forget to handle the first and the last loop pass.
Was this helpful ?
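The sort-and-compare idea described above can be sketched in a few lines of Python. This is an illustration only: the (KUNNR, ALTKN) pairs are invented, and the function name find_duplicates is hypothetical. Each result tuple mirrors the three fields of tty_mytype (first KUNNR, second KUNNR, ALTKN).

```python
from itertools import groupby
from operator import itemgetter

# Made-up (KUNNR, ALTKN) pairs, as they might come out of the join.
rows = [
    ("2001", "B7"),
    ("1000", "A1"),
    ("2000", "B7"),
    ("1000", "A1"),  # same customer again (another company code)
    ("4000", "C3"),
]


def find_duplicates(pairs):
    """Return (first_kunnr, other_kunnr, altkn) for every ALTKN shared by different KUNNRs."""
    result = []
    # Drop repeated pairs, then sort by ALTKN first and KUNNR second.
    ordered = sorted(set(pairs), key=itemgetter(1, 0))
    for altkn, group in groupby(ordered, key=itemgetter(1)):
        kunnrs = [kunnr for kunnr, _ in group]
        first = kunnrs[0]
        for other in kunnrs[1:]:
            result.append((first, other, altkn))
    return result


pairs_found = find_duplicates(rows)
print(pairs_found)
```

Grouping the sorted rows takes care of the first and last loop pass, which the manual previous/current comparison has to handle explicitly.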

Finding all duplicate records would be accomplished by this code. It looks up all ALTKN for a company code and checks for duplicates:
DATA: BEGIN OF duplicate,
        kunnr TYPE knb1-kunnr,
        bukrs TYPE knb1-bukrs,
        altkn TYPE knb1-altkn,
      END OF duplicate.
DATA: duplicates LIKE TABLE OF duplicate.
DATA: BEGIN OF altkn_rec,
        altkn TYPE knb1-altkn,
        kunnr TYPE knb1-kunnr,
        bukrs TYPE knb1-bukrs,
      END OF altkn_rec.
DATA: altkn_recs LIKE TABLE OF altkn_rec.
DATA: g_bukrs              TYPE bukrs,
      previous_record_line TYPE i,
      o_alv                TYPE REF TO cl_salv_table.
SELECT bukrs FROM t001 INTO g_bukrs.
  "Get all reference numbers for company code
  SELECT altkn kunnr bukrs
    FROM knb1
    INTO TABLE altkn_recs
    WHERE bukrs = g_bukrs
      AND altkn <> space "customers without a previous number are not duplicates
    ORDER BY altkn.
  "Loop over all customer reference numbers, look for duplicates
  LOOP AT altkn_recs INTO altkn_rec.
    AT NEW altkn.
      "Remember the line where the current ALTKN group starts
      previous_record_line = sy-tabix.
    ENDAT.
    AT END OF altkn.
      "More than one line in the group means the ALTKN is duplicated.
      "Checking at the end of each group also covers the last group.
      IF sy-tabix > previous_record_line.
        "Duplicate found: report the first record of the group
        READ TABLE altkn_recs INDEX previous_record_line
          INTO altkn_rec.
        MOVE-CORRESPONDING altkn_rec TO duplicate.
        APPEND duplicate TO duplicates.
      ENDIF.
    ENDAT.
  ENDLOOP.
ENDSELECT.
IF duplicates IS NOT INITIAL.
  cl_salv_table=>factory( IMPORTING r_salv_table = o_alv
                          CHANGING  t_table      = duplicates ).
  o_alv->display( ).
ELSE.
  WRITE 'No Duplicates Found'.
ENDIF.

Related

get Value that does not exist in another table and vice versa

I have two tables: lu_timepoint, which holds the default timepoints, and an operational table called tbl_data.
tbl_data contains details about a candidate and the timepoint at which he has to come in for a lab test. The timepoints range from -30 mins to 24 hrs.
lu_timepoint is the lookup table for the default timepoints.
I need to write a query that checks whether each timepoint in tbl_data exists in lu_timepoint; if it does not, a column called checked should be false.
Likewise, if a timepoint in lu_timepoint does not exist in tbl_data, checked should be false; otherwise it should be true.
I tried a left join, but I'm getting too many rows because of an incorrect join condition.
Below is the code I used to get all candidate IDs whose timepoint does not match the other table:
select distinct PT, PCTPT
from tbl_data s
left join lu_Timepoint t
on s.STUDY = t.Study
where s.PCTPT = t.Timepoint
If you want the records that don't exist in the joined table and vice versa, you can use a FULL OUTER JOIN, which returns the unmatched rows from both tables as well as the matches.
Specifying the database you are using and providing the table structures and some sample data would help to build the final query.
I have found a solution. I did a full outer join between lu_timepoint and tbl_data and got the values that do not exist in one of the tables.
Below is the query I used.
select Candidate, CPEVENT, Test_Number, DosedTime, DoseTime, ExpectedTime,
    s.Timepoint as tmpt,
    t.Timepoint as tmpt1,
    CASE WHEN t.Timepoint IS NULL THEN 'Not Collected'
         WHEN s.timepoint IS NULL THEN 'Not Collected'
         ELSE 'Collected' END as Timepoint_Collection,
    case when t.timepoint is null THEN s.timepoint
         WHEN s.timepoint IS NULL THEN t.Timepoint
         WHEN s.timepoint = t.TIMEPOINT THEN s.timepoint END as Timepoint
from vw_data s
FULL OUTER JOIN lu_pk_Timepoint t
    on s.PCTPT = t.Timepoint AND s.STUDY = t.Study
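To see how the "checked" flag falls out of such a join, here is a small sketch with Python's sqlite3 module. The table layouts and rows are invented for illustration. Because older SQLite builds lack FULL OUTER JOIN, the sketch emulates it with a LEFT JOIN plus the unmatched lookup rows appended via UNION ALL; on a database that supports FULL OUTER JOIN directly, that part is unnecessary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE lu_timepoint (study TEXT, timepoint TEXT);
    CREATE TABLE tbl_data (study TEXT, pt TEXT, pctpt TEXT);
    INSERT INTO lu_timepoint VALUES ('S1', '-30 min'), ('S1', '1 hr'), ('S1', '24 hr');
    INSERT INTO tbl_data VALUES ('S1', 'P01', '-30 min'), ('S1', 'P01', '2 hr');
    """
)

# Emulated full outer join: all data rows (matched or not),
# followed by the lookup rows that no data row matched.
rows = conn.execute(
    """
    SELECT s.pctpt AS data_tp, t.timepoint AS lookup_tp
    FROM tbl_data s
    LEFT JOIN lu_timepoint t
           ON s.study = t.study AND s.pctpt = t.timepoint
    UNION ALL
    SELECT NULL, t.timepoint
    FROM lu_timepoint t
    WHERE NOT EXISTS (
        SELECT 1 FROM tbl_data s
        WHERE s.study = t.study AND s.pctpt = t.timepoint
    )
    """
).fetchall()

# checked is True only when the timepoint is present on both sides.
checked = sorted(
    (data_tp or lookup_tp, data_tp is not None and lookup_tp is not None)
    for data_tp, lookup_tp in rows
)
print(checked)
```

Note that the join condition contains both the study and the timepoint; putting the timepoint comparison into a WHERE clause instead would turn the outer join back into an inner join.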

SQL INNER JOIN vs. WHERE ID IN(...) not the same results

I was surprised by the outcome of these two queries. I was expecting the same result from both. I have two tables that share a common field, but there is no relationship set up. Table A has a field EventID varchar(10) and table B has a field XXNumber varchar(15).
Values from table B column XXNumber are referenced in table A column EventID. Even though XXNumber can hold 15 chars, none of the 179K rows of data is longer than 10 chars.
So the requirement was:
"To avoid duplicate table B and table A entries, if the XXNumber is contained in a table A "Event ID" number, then it should not be counted."
To see how many common records I have, I ran this query first (call it query alpha):
SELECT dbo.TableB.XXNumber FROM dbo.TableB WHERE dbo.TableB.XXNumber in
( select distinct dbo.TableA.EventId FROM dbo.TableA )
The result was 5322 rows.
The following query (call it query delta):
SELECT DISTINCT dbo.TableB.XXNumber, dbo.TableA.EventId
FROM dbo.TableB INNER JOIN dbo.TableA ON dbo.TableB.XXNumber = dbo.TableA.EventId
returned 4308 rows.
Shouldn't the resulting number of rows be the same?
The WHERE ID IN () version will select all rows that match each distinct value in the list (regardless of whether you code DISTINCT inside the inner select or not; that's irrelevant). If a given value appears in the parent table more than once, you'll get multiple rows selected from the parent table for that single value found in the child table.
The INNER JOIN version will select each row from the parent table once for every successful join, so if there are 3 rows in the child table with the value, and 2 in the parent, then there will be 6 rows in the result for that value.
To make them "the same", add DISTINCT to your main select.
To explain what you're seeing, we'd need to know more about your actual data.
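The row counts described above can be reproduced with a tiny example. This sketch uses Python's sqlite3 module with invented data: the value 'E1' appears 3 times in the child table (TableA) and 2 times in the parent table (TableB).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE TableA (EventId TEXT);
    CREATE TABLE TableB (XXNumber TEXT);
    INSERT INTO TableA VALUES ('E1'), ('E1'), ('E1'), ('E9');
    INSERT INTO TableB VALUES ('E1'), ('E1'), ('E2');
    """
)


def count(sql):
    """Number of rows a query returns."""
    return len(conn.execute(sql).fetchall())


in_rows = count(
    "SELECT XXNumber FROM TableB WHERE XXNumber IN (SELECT EventId FROM TableA)"
)
join_rows = count(
    "SELECT B.XXNumber FROM TableB B INNER JOIN TableA A ON B.XXNumber = A.EventId"
)
distinct_in = count(
    "SELECT DISTINCT XXNumber FROM TableB WHERE XXNumber IN (SELECT EventId FROM TableA)"
)
distinct_join = count(
    "SELECT DISTINCT B.XXNumber FROM TableB B INNER JOIN TableA A ON B.XXNumber = A.EventId"
)
print(in_rows, join_rows, distinct_in, distinct_join)
```

IN returns each matching parent row once (2 rows), the join returns one row per matching pair (2 x 3 = 6 rows), and with DISTINCT on the outer select both collapse to the single value.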

Delete duplicates with no primary key

I want to delete rows with a duplicated value in one column (Product), which will then be used as the primary key.
The column is of type nvarchar, and we don't want to have two rows for one product.
The database is a large one, with thousands of rows we need to remove.
When querying for the duplicates, we want to keep the first item and remove the second one as the duplicate.
There is no primary key yet; we want to create it after removing the duplicates.
Then the Product column could be our primary key.
The database is SQL Server CE.
I tried several methods, mostly getting an error similar to:
There was an error parsing the query. [ Token line number = 2,Token line offset = 1,Token in error = FROM ]
A method I tried:
DELETE FROM TblProducts
FROM TblProducts w
INNER JOIN (
SELECT Product
FROM TblProducts
GROUP BY Product
HAVING COUNT(*) > 1
)Dup ON w.Product = Dup.Product
The approach I would prefer, which I am trying to learn and adapt to my code, is something like this
(it's not correct yet):
SELECT Product, COUNT(*) TotalCount
FROM TblProducts
GROUP BY Product
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC
--
;WITH cte -- These 3 lines are the lines I have more doubt on them
AS (SELECT ROW_NUMBER() OVER (PARTITION BY Product
ORDER BY ( SELECT 0)) RN
FROM Word)
DELETE FROM cte
WHERE RN > 1
If you have two DIFFERENT records with the same Product column, then you can SELECT the unwanted records with some criterion, e.g.
CREATE TABLE victims AS
SELECT MAX(entryDate) AS date, Product, COUNT(*) AS dups FROM ProductsTable WHERE ...
GROUP BY Product HAVING dups > 1;
Then you can do a DELETE JOIN between ProductsTable and victims.
Alternatively, you can select Product only, and then do a DELETE with some other JOIN condition, for example an invalid CustomerId, a NULL EntryDate, or anything else. This works if you know that there is one and only one valid copy of each Product, and all the others are recognizable by their invalid data.
Suppose you instead have IDENTICAL records (or you have both identical and non-identical ones, or you may have several dupes for some product and you don't know which). You run exactly the same query. Then you run a SELECT on ProductsTable for all products matching the product codes to be deduped, grouping by Product and choosing a suitable aggregate function for all other fields (if the rows are identical, any aggregate should do; otherwise I usually try MAX or MIN). This will "save" exactly one row for each product.
At that point you run the DELETE JOIN and kill all the duplicated products. Then simply reimport the saved and deduped subset into the main table.
Of course, between the DELETE JOIN and the INSERT SELECT, the DB will be in an inconsistent state: every product with at least one duplicate will have disappeared entirely.
Another way which should work in MySQL:
-- Create an empty table
CREATE TABLE deduped AS SELECT * FROM ProductsTable WHERE false;
CREATE UNIQUE INDEX deduped_ndx ON deduped(Product);
-- DROP duplicate rows, Joe the Butcher's way
INSERT IGNORE INTO deduped SELECT * FROM ProductsTable;
ALTER TABLE ProductsTable RENAME TO ProductsBackup;
ALTER TABLE deduped RENAME TO ProductsTable;
-- TODO: Copy all indexes from ProductsTable on deduped.
NOTE: the way above DOES NOT WORK if you want to distinguish "good records" and "invalid duplicates". It only works if you have redundant DUPLICATE records, or if you do not care which row you keep and which you throw away!
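The same copy-with-unique-index idea can be tried in SQLite, where INSERT OR IGNORE plays the role of MySQL's INSERT IGNORE. The sketch below uses Python's sqlite3 module and made-up rows; it is an illustration of the technique, not a statement about SQL Server CE, which may not offer an equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ProductsTable (Product TEXT, Description TEXT);
    INSERT INTO ProductsTable VALUES
        ('CBPD10', 'C-Beam Prj'), ('CBPD10', 'C-Beam Prj'), ('CBPD11', 'C Proj Mk2');

    -- Empty copy with a unique index on the future key column.
    CREATE TABLE deduped AS SELECT * FROM ProductsTable WHERE 0;
    CREATE UNIQUE INDEX deduped_ndx ON deduped(Product);

    -- Rows that would violate the unique index are skipped silently.
    INSERT OR IGNORE INTO deduped SELECT * FROM ProductsTable;

    ALTER TABLE ProductsTable RENAME TO ProductsBackup;
    ALTER TABLE deduped RENAME TO ProductsTable;
    """
)

remaining = conn.execute(
    "SELECT Product, Description FROM ProductsTable ORDER BY Product"
).fetchall()
backup_count = conn.execute("SELECT COUNT(*) FROM ProductsBackup").fetchone()[0]
print(remaining, backup_count)
```

The original rows stay available in ProductsBackup, which makes it easy to check what was dropped before the backup is removed.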
EDIT:
You say that "duplicates" have invalid fields. In that case you can modify the above with a sorting trick:
SELECT * FROM ProductsTable ORDER BY Product, FieldWhichShouldNotBeNULL IS NULL;
Then, if there is only one row for a product, all well and good: it will get selected. If there are more, the one for which (FieldWhichShouldNotBeNULL IS NULL) is FALSE (i.e. the one where the field is actually not null, as it should be) will be selected first and inserted. All others will bounce silently, due to the IGNORE clause, against the uniqueness of Product. Not a really pretty way to do it (and check I didn't mix up true and false in my clause!), but it ought to work.
EDIT
actually more of a new answer
This is a simple table to illustrate the problem
CREATE TABLE ProductTable ( Product varchar(10), Description varchar(10) );
INSERT INTO ProductTable VALUES ( 'CBPD10', 'C-Beam Prj' );
INSERT INTO ProductTable VALUES ( 'CBPD11', 'C Proj Mk2' );
INSERT INTO ProductTable VALUES ( 'CBPD12', 'C Proj Mk3' );
There is no index yet, and no primary key. We could still declare Product to be primary key.
But something bad happens. Two new records get in, and both have NULL description.
Yet, the second one is a valid product since we knew nothing of CBPD14 before now, and therefore we do NOT want to lose this record completely. We do want to get rid of the spurious CBPD10 though.
INSERT INTO ProductTable VALUES ( 'CBPD10', NULL );
INSERT INTO ProductTable VALUES ( 'CBPD14', NULL );
A rude DELETE FROM ProductTable WHERE Description IS NULL is out of the question, it would kill CBPD14 which isn't a duplicate.
So we do it like this. First get the list of duplicates:
SELECT Product, COUNT(*) AS Dups FROM ProductTable GROUP BY Product HAVING Dups > 1;
We assume that: "There is at least one good record for every set of bad records".
We check this assumption by positing the opposite and querying for it. If all is copacetic we expect this query to return nothing.
SELECT Dups.Product FROM ProductTable
RIGHT JOIN ( SELECT Product, COUNT(*) AS Dups FROM ProductTable GROUP BY Product HAVING Dups > 1 ) AS Dups
ON (ProductTable.Product = Dups.Product
AND ProductTable.Description IS NOT NULL)
WHERE ProductTable.Description IS NULL;
To further verify, I insert two records that represent this mode of failure; now I do expect the query above to return the new code.
INSERT INTO ProductTable VALUES ( 'AC5', NULL ), ( 'AC5', NULL );
Now the "check" query indeed returns,
AC5
So, the generation of Dups looks good.
I proceed now to delete all duplicate records that are not valid. If there are duplicate, valid records, they will stay duplicate unless some condition may be found, distinguishing among them one "good" record and declaring all others "invalid" (maybe repeating the procedure with a different field than Description).
But ay, there's a rub. Currently, you cannot delete from a table and select from the same table in a subquery ( http://dev.mysql.com/doc/refman/5.0/en/delete.html ). So a little workaround is needed:
CREATE TEMPORARY TABLE Dups AS
SELECT Product, COUNT(*) AS Duplicates
FROM ProductTable GROUP BY Product HAVING Duplicates > 1;
DELETE ProductTable FROM ProductTable JOIN Dups USING (Product)
WHERE Description IS NULL;
Now this will delete all invalid records, provided that they appear in the Dups table.
Therefore our CBPD14 record will be left untouched, because it does not appear there. The "good" record for CBPD10 will be left untouched because it's not true that its Description is NULL. All the others - poof.
Let me state again that if a record has no valid records and yet it is a duplicate, then all copies of that record will be killed - there will be no survivors.
To avoid this, one may first SELECT (using the query above, the check "which should return nothing") the rows representing this mode of failure into another TEMPORARY TABLE, then INSERT them back into the main table after the deletion (using transactions might be in order).
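The whole procedure, including the rescue of products that have no valid row at all, can be walked through on the sample data. This is a sketch in SQLite via Python's sqlite3 module, not MySQL: the DELETE ... JOIN is rewritten as DELETE ... WHERE Product IN (...), and the Rescued table name is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ProductTable (Product TEXT, Description TEXT);
    INSERT INTO ProductTable VALUES
        ('CBPD10', 'C-Beam Prj'), ('CBPD11', 'C Proj Mk2'), ('CBPD12', 'C Proj Mk3'),
        ('CBPD10', NULL), ('CBPD14', NULL),
        ('AC5', NULL), ('AC5', NULL);

    CREATE TEMPORARY TABLE Dups AS
        SELECT Product, COUNT(*) AS Duplicates
        FROM ProductTable GROUP BY Product HAVING COUNT(*) > 1;

    -- Save one copy of each duplicated product that has no valid row (here: AC5).
    CREATE TEMPORARY TABLE Rescued AS
        SELECT DISTINCT Product, Description FROM ProductTable
        WHERE Product IN (SELECT Product FROM Dups)
          AND Product NOT IN (
              SELECT Product FROM ProductTable WHERE Description IS NOT NULL);

    -- Delete the invalid rows, but only for products listed in Dups.
    DELETE FROM ProductTable
    WHERE Description IS NULL AND Product IN (SELECT Product FROM Dups);

    -- Put the rescued rows back.
    INSERT INTO ProductTable SELECT Product, Description FROM Rescued;
    """
)

final = conn.execute(
    "SELECT Product, Description FROM ProductTable ORDER BY Product"
).fetchall()
print(final)
```

CBPD14 is untouched because it never appears in Dups, the valid CBPD10 row survives, and AC5 comes back as a single row instead of vanishing.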
Create a new table by scripting the old one out and renaming it. Also script all objects (indexes etc.) from the old table to the new one. Insert the keepers into the new table. If your database is in the bulk-logged or simple recovery model, this operation will be minimally logged. Drop the old table and then rename the new one to the old name.
The advantage of this over a delete will be that the insert can be minimally logged. Deletes do double work because not only does the data get deleted, but the delete has to be written to the transaction log. For big tables, minimally logged inserts will be much faster than deletes.
If it's not that big and you have some downtime, and you have SQL Server Management Studio, you can put an identity field on the table using the GUI. Now you have a situation like your CTE, except the rows themselves are truly distinct. So now you can do the following:
SELECT MIN(lhs.MyTempIDField)
FROM
    table_a lhs
    join table_a rhs
        on lhs.field1 = rhs.field1
        and lhs.field2 = rhs.field2 [etc]
WHERE
    lhs.MyTempIDField <> rhs.MyTempIDField
GROUP BY
    lhs.field1, lhs.field2 [etc]
This gives you the 'good' row of each set of duplicates, i.e. the one to keep. Now you can wrap this query with a DELETE FROM query.
DELETE FROM lhs
FROM table_a lhs
join table_a rhs
on lhs.field1 = rhs.field1
and lhs.field2 = rhs.field2 [etc]
WHERE
lhs.MyTempIDField <> rhs.MyTempIDField
and lhs.MyTempIDField not in (
SELECT MIN(lhs.MyTempIDField)
FROM
table_a lhs
join table_a rhs
on lhs.field1 = rhs.field1
and lhs.field2 = rhs.field2 [etc]
WHERE
lhs.MyTempIDField <> rhs.MyTempIDField
GROUP BY
lhs.field1, lhs.field2 etc
)
Try this:
DELETE FROM TblProducts
WHERE Product IN
(
SELECT Product
FROM TblProducts
GROUP BY Product
HAVING COUNT(*) > 1)
This suffers from the defect that it deletes ALL the records with a duplicated Product. What you probably want to do is delete all but one of each group of records with a given Product. It might be worthwhile to copy all the duplicates to a separate table first, and then somehow remove duplicates from that table, then apply the above, and then copy remaining products back to the original table.
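Deleting all but one row per Product needs some way to tell the copies apart. The sketch below shows the idea in SQLite through Python's sqlite3 module, where the implicit rowid stands in for a temporary identity column; that is an assumption of this example, since SQL Server CE has no rowid and would need an added identity column instead. The Note column and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE TblProducts (Product TEXT, Note TEXT);
    INSERT INTO TblProducts VALUES
        ('apple', 'first'), ('pear', 'only'), ('apple', 'second'), ('apple', 'third');

    -- Keep the lowest rowid per Product, delete every other row.
    DELETE FROM TblProducts
    WHERE rowid NOT IN (SELECT MIN(rowid) FROM TblProducts GROUP BY Product);
    """
)

kept = conn.execute(
    "SELECT Product, Note FROM TblProducts ORDER BY Product"
).fetchall()
print(kept)
```

Once only one row per Product is left, the column can be declared the primary key.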

Excluding results from an SQL search based on the contents of another column

The table I'm using has 3 important columns on which to exclude entries in a table. I'll call them:
SampleTable:
Thing_Type (varchar)
Thing_ID (int)
Parent_ID (int)
I want to find all 'leaf' entries of a specific type. However, it is possible that an entry of that type has a child that is not of that type, so I don't want to exclude anything before first filtering the table to only that type. Then I want to include all entries whose ID is not present anywhere in the Parent_ID column.
There's no EXISTS in the tool I'm using (not that I'm sure it would help).
Ignoring the fact that the tool I'm using doesn't like the following, and that it might not be syntactically correct, here's what I feel it should be like.
SELECT * FROM (
SELECT *
FROM SampleTable
WHERE SampleTable.Thing_Type = 'DesiredType'
)
WHERE Thing_ID NOT IN Parent_ID
I'm pretty sure this is wrong but I'm not sure how to make it right.
First, NOT IN has to go against a set, not a single value, so NOT IN parent_id doesn't actually make sense. This is one way to approach this problem:
SELECT
    thing_type,
    thing_id,
    parent_id
FROM
    SampleTable T1
WHERE
    thing_type = 'DesiredType' AND
    thing_id NOT IN (
        SELECT parent_id
        FROM SampleTable T2
        WHERE T2.thing_type = 'DesiredType'
          AND T2.parent_id IS NOT NULL -- a NULL in the list would make NOT IN match nothing
    )
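A quick way to check the leaf query is to run it on a handful of rows. The sketch below uses Python's sqlite3 module with invented data; it also runs the query without the IS NOT NULL guard to show why root entries with a NULL Parent_ID matter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE SampleTable (Thing_Type TEXT, Thing_ID INTEGER, Parent_ID INTEGER);
    INSERT INTO SampleTable VALUES
        ('DesiredType', 1, NULL),  -- root, has children 2 and 3
        ('DesiredType', 2, 1),     -- only child is of another type: still a leaf
        ('DesiredType', 3, 1),     -- leaf
        ('OtherType',   4, 2),
        ('OtherType',   5, NULL);
    """
)

LEAF_QUERY = """
    SELECT Thing_ID
    FROM SampleTable T1
    WHERE Thing_Type = 'DesiredType'
      AND Thing_ID NOT IN (
          SELECT Parent_ID
          FROM SampleTable T2
          WHERE T2.Thing_Type = 'DesiredType' {guard}
      )
    ORDER BY Thing_ID
"""

with_guard = [
    row[0]
    for row in conn.execute(LEAF_QUERY.format(guard="AND T2.Parent_ID IS NOT NULL"))
]
without_guard = [row[0] for row in conn.execute(LEAF_QUERY.format(guard=""))]
print(with_guard, without_guard)
```

Without the guard, the NULL parent of the root row ends up in the NOT IN list and the comparison is never true, so the query returns nothing.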

How to count number of records of a field in internal table in select statment?

I have an internal table of type ty_marc.
This internal table has two fields, matnr and werks_d.
I want to count the number of materials manufactured in the plant (marc-werks), based on the entry given by the user.
I wrote the code as follows:
if so_matnr is not initial.
select matnr werks from marc
into table it_marc
where matnr in so_matnr.
endif.
loop at it_marc into w_marc.
  write :/ w_marc-matnr, w_marc-werks. "how to count the total number of materials, e.g. material numbers 100-100 to 100-110?
endloop.
I want to count the total number of records per material and display the count in another field of the same internal table.
Note: there could be 10 records for material number 100-100, so I want the count 10 in another field of the same internal table; 100-110 could have n records, and then the count in that field should be n.
There are two easy options.
If you don't actually care about the plant (werks), use the GROUP BY clause and the COUNT function in your SELECT statement. Something like:
select matnr, count( * )
  from marc
  where matnr in @so_matnr
  group by matnr
  into table @it_marc_count.
The structure of it_marc_count needs an integer field in the second position (and matnr in the first, obviously).
If you do need werks, the easiest way is to sort it_marc by matnr and then use the AT END OF and SUM constructs inside the LOOP AT loop (or something similar). The examples at the end of "Processing Table Entries in Loops" should get you started.
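Both options can be illustrated outside ABAP. The following sketch uses Python's sqlite3 module with a two-column stand-in for MARC and made-up rows: the first query gives one count per material, the second keeps every plant row and attaches the per-material count to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE marc (matnr TEXT, werks TEXT);
    INSERT INTO marc VALUES
        ('100-100', '1000'), ('100-100', '1100'), ('100-100', '1200'),
        ('100-110', '1000');
    """
)

# Option 1: one row per material with its record count.
counts = conn.execute(
    "SELECT matnr, COUNT(*) FROM marc GROUP BY matnr ORDER BY matnr"
).fetchall()

# Option 2: keep matnr and werks, and add the count as a third field.
rows_with_count = conn.execute(
    """
    SELECT m.matnr, m.werks, c.cnt
    FROM marc m
    JOIN (SELECT matnr, COUNT(*) AS cnt FROM marc GROUP BY matnr) c
      ON c.matnr = m.matnr
    ORDER BY m.matnr, m.werks
    """
).fetchall()
print(counts)
print(rows_with_count)
```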