I am trying to delete duplicates from an internal table, comparing all columns except some of them. Obviously I can list all the columns that I want to compare using COMPARING, but that would not look good in the code.
So let's say there are 100 columns and I want to exclude 2 of them from the comparison.
How can I achieve that in a smart way?
You could use the DELETE ADJACENT DUPLICATES statement, which lets you define which columns are compared. You just have to sort the itab by those columns before this operation.
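A minimal sketch of that pattern, assuming a hypothetical structure whose compared fields are f1, f2, and f3 (the excluded columns are simply left out of both lists):

SORT itab BY f1 f2 f3.                      " sort by the compared fields only
DELETE ADJACENT DUPLICATES FROM itab
  COMPARING f1 f2 f3.                       " excluded columns are not listed

With ~100 fields, the field list for the SORT could also be built once from the structure's components via RTTI (CL_ABAP_STRUCTDESCR) and passed as a dynamic sort specification; the COMPARING list, however, is usually written out statically.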
The data has over 50 columns, and one or more columns contain a value, say "US". I want to find out those column names. I can check the columns one by one, but that is time-consuming. Does anyone have an efficient way to do it? Thank you so much!
My requirement is simple...
I have a table in SQL with more than 300 columns in it, and I need to compare the data present in those columns with the source table.
I can't waste time on every run finding the column I want to compare by dragging across the grid.
So my question is: is there any shortcut or code to jump to a specific column in a table as large as this one?
Any help is highly appreciated.
There are multiple ways to compare two tables based on the requirement.
If your source is accessible to you, then a simple MINUS will do. (Note that MINUS only returns rows of the first query that are missing from the second, so run it in both directions for a complete comparison.)
In Oracle:
SELECT COL1,COL2,COL3.... FROM SOURCE
MINUS
SELECT COL1,COL2,COL3.... FROM TARGET
In Informatica you can generate an MD5 hash over the columns you want to compare from both Source and Target and check whether the two hashes match.
What is the most (time) efficient way of removing all exact duplicates from an unsorted standard internal table (non-deep structure, arbitrarily large)?
All I can think of is simply sorting the entire thing by all of its fields before running DELETE ADJACENT DUPLICATES FROM itab COMPARING ALL FIELDS. Is there a faster or preferred alternative? Will this cause problems if the structure mixes alphanumeric fields with numerics?
To provide context, I'm trying to improve performance on some horrible select logic in legacy programs. Most of these run full table scans on 5-10 joined tables, some of them self-joining. I'm left with hundreds of thousands of rows in memory and I'm fairly sure a large portion of them are just duplicates. However, changing the actual selects is too complex and would require /ex[tp]ensive/ retesting. Just removing duplicates would likely cut runtime in half but I want to make sure that the deduplication doesn't add too much overhead itself.
I would investigate two methods:
Store the original index in an auxiliary field, SORT BY the fields you want to compare (possibly using STABLE), DELETE ADJACENT DUPLICATES, then re-SORT BY the stored index.
Use a HASHED TABLE keyed on the fields you want to compare and LOOP through the data table. Use READ TABLE ... TRANSPORTING NO FIELDS on the hashed table to find out whether the value already exists; if so, remove the row, otherwise add the values to the hashed table. (Both methods are sketched below.)
I'm not sure about the relative performance, but I would recommend running SAT (the ABAP runtime analysis) on a plausible data set for both methods and comparing the results.
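A minimal sketch of both methods, assuming a hypothetical line type with two compared fields f1 and f2 plus an auxiliary index field idx (the two blocks are alternatives, not steps to run in sequence):

TYPES: BEGIN OF ty_row,
         f1  TYPE c LENGTH 10,   " fields relevant for the comparison
         f2  TYPE i,
         idx TYPE i,             " auxiliary field for method 1
       END OF ty_row.
TYPES: BEGIN OF ty_key,          " lookup key for method 2
         f1 TYPE c LENGTH 10,
         f2 TYPE i,
       END OF ty_key.
DATA: itab   TYPE STANDARD TABLE OF ty_row,
      keys   TYPE HASHED TABLE OF ty_key WITH UNIQUE KEY f1 f2,
      key_wa TYPE ty_key.
FIELD-SYMBOLS <row> TYPE ty_row.

" Method 1: remember the original order, sort, dedupe, restore the order.
LOOP AT itab ASSIGNING <row>.
  <row>-idx = sy-tabix.
ENDLOOP.
SORT itab STABLE BY f1 f2.       " STABLE keeps the earliest occurrence first
DELETE ADJACENT DUPLICATES FROM itab COMPARING f1 f2.
SORT itab BY idx.                " restore the original row order

" Method 2: one pass over the table with a hashed lookup.
LOOP AT itab ASSIGNING <row>.
  READ TABLE keys WITH TABLE KEY f1 = <row>-f1 f2 = <row>-f2
       TRANSPORTING NO FIELDS.
  IF sy-subrc = 0.               " key already seen -> duplicate
    DELETE itab.                 " deletes the current loop line
  ELSE.
    key_wa-f1 = <row>-f1.
    key_wa-f2 = <row>-f2.
    INSERT key_wa INTO TABLE keys.
  ENDIF.
ENDLOOP.

Method 2 avoids the two extra sorts but pays for the secondary table; which one wins depends on the data, hence the measurement advice above.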
Suppose there are entries with the same key and you run:
SORT itab BY key.
DELETE ADJACENT DUPLICATES FROM itab COMPARING key.
Does anyone know which one will be deleted by DELETE ADJACENT DUPLICATES ... COMPARING key?
The first one or the second one?
From the F1 help on DELETE ADJACENT DUPLICATES:
In the case of several double lines following one another, all the lines - except for the first - are deleted.
So the second (identical) line should be deleted.
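A tiny worked example (types and values are made up) that shows this behaviour:

TYPES: BEGIN OF ty_row,
         key   TYPE c LENGTH 1,
         value TYPE c LENGTH 10,
       END OF ty_row.
DATA: itab TYPE STANDARD TABLE OF ty_row,
      wa   TYPE ty_row.

wa-key = 'A'. wa-value = 'first'.  APPEND wa TO itab.
wa-key = 'A'. wa-value = 'second'. APPEND wa TO itab.

SORT itab BY key.
DELETE ADJACENT DUPLICATES FROM itab COMPARING key.
" itab now contains a single row: key = 'A', value = 'first' -
" the first of the adjacent duplicates survives.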
Instead of sorting a standard table, you could consider declaring another internal table as a sorted table of the same line type, with a unique key on the fields you're comparing, to eliminate the duplicates. It's faster, keeps your original table unchanged, and, in my opinion, makes your code more readable because it's easier to understand which rows are kept and which are not. Example:
LOOP AT itab ASSIGNING <itab_row>.
  INSERT <itab_row> INTO TABLE sorted_itab. " rejected (sy-subrc = 4) if the unique key is already present
ENDLOOP.
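For completeness, the declarations this snippet relies on might look like this (line type and key fields are hypothetical):

TYPES: BEGIN OF ty_row,
         f1 TYPE c LENGTH 10,    " fields that define a duplicate
         f2 TYPE i,
         f3 TYPE string,         " field ignored by the comparison
       END OF ty_row.
DATA: itab        TYPE STANDARD TABLE OF ty_row,
      sorted_itab TYPE SORTED TABLE OF ty_row
                  WITH UNIQUE KEY f1 f2.
FIELD-SYMBOLS <itab_row> TYPE ty_row.

One thing to keep in mind: if the original order matters, note that sorted_itab is ordered by its key, not by the original sequence.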
If the data in your itab are fetched from the database, it's better to use the ORDER BY addition in the SELECT and then apply DELETE ADJACENT DUPLICATES. Sorting costs O(n log n), and it's better to let the DBMS do this type of operation than to do it in ABAP.
Obviously, if you can do the DISTINCT or GROUP BY in SQL, you avoid both the SORT and the DELETE ADJACENT DUPLICATES, and that should solve most of the performance problems.
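A minimal sketch of both variants, assuming a hypothetical database table ztab and an itab whose line type has the fields f1 and f2:

" Variant 1: let the database order the result, then dedupe in ABAP.
SELECT f1 f2 FROM ztab
  INTO TABLE itab
  ORDER BY f1 f2.
DELETE ADJACENT DUPLICATES FROM itab COMPARING f1 f2.

" Variant 2: let the database drop the duplicates itself.
SELECT DISTINCT f1 f2 FROM ztab
  INTO TABLE itab.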
I am creating a project in VB.NET in which one of the reports requires that the names of employees be displayed as column names, and whatever work they have done for a stated period should appear in rows below that particular column.
So, as is clear, the columns will have to be added at runtime. I am using an ODBC data source to populate the grid. Also, since a loop has to be done to find the work done by each employee individually, the number of rows under one column might be smaller or larger than the number of rows in the next column.
Is there a way to create an empty DataTable and then update its contents column by column, rather than by adding new rows?
A table consists of rows and columns: the rows hold the data, the columns define the data.
So you have no other choice than to add at least as many rows as your longest column will need. You can just fill in empty values in the other columns. That should give you the view you need.
Wouldn't it be better to simply switch the table orientation?
If most of your columns are names, or maybe some kind of grouping, I don't know, then you'd have one column for each piece of data you could display, and you'd add the rows with the names and stats dynamically, which is more common.
I'm only suggesting that because I don't know your full table structure.