Most performant way to filter an internal table based on a where condition - abap

So far, I always used this to get specific lines from an internal table:
LOOP AT it_itab INTO ls_itab WHERE place = 'NEW YORK'.
  APPEND ls_itab TO it_anotherItab.
  " or alternatively:
  INSERT ls_itab INTO TABLE it_anotherItab.
ENDLOOP.
However, with 7.40 there seem to be REDUCE, FOR, LINES OF and FILTER. FILTER requires a sorted or hashed key, which isn't the case in my example. So I guess only FOR is an option.
DATA(it_anotherItab) = VALUE t_itab( FOR wa IN it_itab WHERE ( place = 'LONDON' )
( col1 = wa-col2 col2 = wa-col3 col3 = ....... ) ).
The questions are:
Do both indeed do the same thing? Is the 2nd one an APPEND or an INSERT?
Is it possible in the second variant to use the whole structure instead of specifying every column? Like just ( wa )?
Is the second example faster?

Regarding your comment, you can also define a sorted secondary key on a standard table. Just look at this example:
TYPES:
  BEGIN OF t_line_s,
    name1 TYPE name1,
    name2 TYPE name2,
    ort01 TYPE ort01,
  END OF t_line_s,
  t_tab_tt TYPE STANDARD TABLE OF t_line_s
    WITH NON-UNIQUE EMPTY KEY
    WITH NON-UNIQUE SORTED KEY place_key COMPONENTS ort01. "<<<
DATA(i_data) = VALUE t_tab_tt( ). " fill table with test data
DATA(i_london_only) = FILTER #(
i_data
USING KEY place_key " we want to use the secondary key
WHERE ort01 = CONV #( 'london' ) " stupid conversion rules...
).
" i_london_only contains the filtered entries now
UPDATE:
In my quick & dirty performance test, FILTER is slow on first call but beats the LOOP-APPEND variant afterwards.
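For reference, here is a minimal sketch (my reconstruction, not the original test code) of the LOOP-APPEND baseline, reusing i_data and t_tab_tt from the example above:
DATA i_london_loop TYPE t_tab_tt.
LOOP AT i_data INTO DATA(ls_line) WHERE ort01 = 'london'.
  APPEND ls_line TO i_london_loop.
ENDLOOP.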
UPDATE 2:
Found the reason today...
... the administration of a non-unique secondary table key is updated at the next explicit use of the secondary table key (lazy update).
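So the first access via place_key pays for building the key index. A small sketch of how one might warm it up before measuring (my assumption, not part of the original test):
" one throw-away read via the secondary key triggers the lazy key build
LOOP AT i_data TRANSPORTING NO FIELDS USING KEY place_key WHERE ort01 = 'london'.
  EXIT.
ENDLOOP.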


Do not Update the Values in Merge statement if old values do not change while update in Merge

MERGE PFM_EventPerformance_MetaData AS TARGET
USING
(
SELECT
[InheritanceMeterID] = @InheritanceMeterPointID
,[SubHourlyScenarioResourceID] = @SubHourlyScenarioResourceID
,[MeterID] = @MeterID--internal ID
,[BaselineID] = @BaselineID--internal ID
,[UpdateUtc] = GETUTCDATE()
)
AS SOURCE ON
TARGET.[SubHourlyScenarioResourceID] = SOURCE.[SubHourlyScenarioResourceID]
AND TARGET.[MeterID] = SOURCE.[MeterID]--internal ID
AND TARGET.[BaselineID] = SOURCE.[BaselineID]--internal ID
WHEN MATCHED THEN UPDATE SET
@MetaDataID = TARGET.ID--get preexisting ID when exists (must populate one row at a time)
,InheritanceMeterID = SOURCE.InheritanceMeterID
,[UpdateUtc] = SOURCE.[UpdateUtc]
WHEN NOT MATCHED
THEN INSERT
(
[InheritanceMeterID]
,[SubHourlyScenarioResourceID]
,[MeterID]--internal ID
,[BaselineID]--internal ID
)
VALUES
(
SOURCE.[InheritanceMeterID]
,SOURCE.[SubHourlyScenarioResourceID]
,SOURCE.[MeterID]--internal ID
,SOURCE.[BaselineID]--internal ID
);
In the above query I do not want to update the values in the Target table if there is no change in the old values. I am not sure how to achieve this, as I have rarely used the MERGE statement. Please help me with the solution. Thanks in advance.
This is done best in two stages.
Stage 1: Merge Update on condition
SO Answer from before (Thanks to @Laurence!)
Stage 2: hash key condition to compare
Limits: max 4000 characters, including column separator characters
A rather simple way to compare multiple columns in one condition is to use a computed column on both sides generated by HASHBYTES(<algorithm>, <columns>).
This moves writing lots of code from the merge statement to the table generation.
Quick example:
CREATE TABLE dbo.Test
(
id_column int NOT NULL,
dsc_name1 varchar(100),
dsc_name2 varchar(100),
num_age tinyint,
flg_hash AS HashBytes( 'SHA1',
Cast( dsc_name1 AS nvarchar(4000) )
+ N'•' + dsc_name2 + N'•' + Cast( num_age AS nvarchar(3) )
) PERSISTED
)
;
Comparing the flg_hash columns between source and destination makes the comparison quick, as it is just a comparison between two 20-byte varbinary columns.
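For illustration, a minimal sketch of how the hash column could be used in the WHEN MATCHED condition; the staging table dbo.Test_Staging (with the same computed flg_hash column) is an assumption for this sketch, not part of the original answer:
MERGE dbo.Test AS TARGET
USING dbo.Test_Staging AS SOURCE  -- hypothetical source table with the same computed flg_hash column
  ON TARGET.id_column = SOURCE.id_column
WHEN MATCHED AND TARGET.flg_hash <> SOURCE.flg_hash  -- skip the update when nothing actually changed
  THEN UPDATE SET
    dsc_name1 = SOURCE.dsc_name1,
    dsc_name2 = SOURCE.dsc_name2,
    num_age   = SOURCE.num_age
WHEN NOT MATCHED
  THEN INSERT (id_column, dsc_name1, dsc_name2, num_age)
       VALUES (SOURCE.id_column, SOURCE.dsc_name1, SOURCE.dsc_name2, SOURCE.num_age);
Keep in mind that if either side's hash is NULL (e.g. because one of the concatenated columns is NULL), the <> comparison is not true and the row is skipped, so NULL handling in the hash expression matters.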
A couple of caveats (caveat emptor) for working with HashBytes:
The function only works for a total of 4000 nvarchar characters
The trade-off for the short comparison code is that the hash expression has to be generated with the columns in the same order in all the views and tables involved
There is a duplicate collision chance of around 2^50+ for SHA1 - as a security mechanism it is now considered insecure, and a few years ago MS tried to drop SHA1 as an algorithm
Columns added to tables and views later can be overlooked by the comparison if the HashBytes expression is not amended to include them
Overall I found that comparing multiple columns directly can overload my server engines, but I have never had an issue with hash key comparisons

Modify a db_table type table of a db

I created a database table with ID, firstname, lastname.
I created the following program:
data: db_table type table of ztabletest. "Create my db data
select * from z6148tabletest into table db_table. "Fill my db data
data: modifiedLine type z6148tabletest. "Create my new line
modifiedLine-firstname = 'hey'.
modifiedLine-lastname = 'test'.
Now I want to modify the line at index 2 in my db table.
So I'm trying to do something like:
modify ztabletest from table db_table values modifiedLine at index 2.
I don't understand the logic for modifying.
To insert something I just do:
insert INTO ztabletest VALUES modifiedLine.
So here the logic is simple because I add in my table the values.
Can you explain the logic for modifying a line?
A database table has no "index". The order of the table rows is unspecified. When you do a SELECT without an ORDER BY, the database can give you the rows in whatever order it feels like. Most SQL databases tend to always give you the same order, but that's for their convenience, not for yours. Especially SAP HANA tends to be very moody in this regard.
But what database tables do have is a primary key. The primary key can be thought of as a unique identifier of each table row. So when you make the primary key a number, you can simulate an index pretty well. I assume that this is the purpose of the field "ID" in your database table and that you therefore marked it as "key" when you defined it.
INSERT adds a new line when no line with the same key values exists. When there already is one, it fails with sy-subrc = 4.
modifiedLine-id = 2.
INSERT ztabletest FROM modifiedLine.
UPDATE changes an existing table line with the same key values. When no line with these primary key values exists in the table, it fails with sy-subrc = 4.
modifiedLine-id = 2.
UPDATE ztabletest FROM modifiedLine.
or alternatively, the more "traditional SQL"-like syntax with SET and WHERE:
UPDATE ztabletest
  SET firstname = 'hey'
      lastname = 'test'
  WHERE id = 2.
MODIFY is the combination of INSERT and UPDATE (also known as an "upsert"). It checks if the line is already there. When it's there, it modifies the line. When it isn't, it inserts it.
modifiedLine-id = 2.
MODIFY ztabletest FROM modifiedLine.
Which is basically a shorthand for:
modifiedLine-id = 2.
UPDATE ztabletest FROM modifiedLine.
IF sy-subrc = 4.
INSERT ztabletest FROM modifiedLine.
ENDIF.

Insert strategy for tables with one-to-one relationships in Teradata

In our data model, which is derived from the Teradata industry models, we observe a common pattern, where the superclass and subclass relationships in the logical data model are transformed into one-to-one relationships between the parent and the child table.
I know you can roll-up or roll-down the attributes to end up with a single table but we are not using this option overall. At the end what we have is a model like this:
Where City Id references a Geographical Area Id.
I am struggling with a good strategy to load the records in these tables.
Option 1: I could select the max(Geographical Area Id) and calculate the next Ids for a batch insert and reuse them for the City Table.
Option 2: I could use an Identity column in the Geographical Area Table and retrieve it after I insert every record in order to use it for the City table.
Any other options?
I need to assess the solution in terms of performance, reliability and maintenance.
Any comment will be appreciated.
Kind regards,
Paul
When you say "load the records into these tables", are you talking about a one-time data migration or a function that creates records for new Geographical Area/City?
If you are looking for a surrogate key and are OK with gaps in your ID values, then use an IDENTITY column and specify the NO CYCLE clause, so it doesn't repeat any numbers. Then just pass NULL for the value and let TD handle it.
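As a rough sketch of what the identity approach could look like (table and column names are purely illustrative, not from your model; here GENERATED ALWAYS is used, so the identity column is simply omitted from the INSERT):
CREATE TABLE Geographical_Area
(
    Geographical_Area_Id INTEGER GENERATED ALWAYS AS IDENTITY
        (START WITH 1 INCREMENT BY 1 NO CYCLE),
    Area_Name VARCHAR(100)
)
PRIMARY INDEX (Geographical_Area_Id);
-- Teradata assigns the identity value; the column is omitted from the insert
INSERT INTO Geographical_Area (Area_Name) VALUES ('Western Europe');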
If you do need sequential IDs, then you can just maintain a separate "NextId" table and use that to generate ID values. This is the most flexible way and would make it easier for you to manage your BATCH operations. It requires more code/maintenance on your part, but is more efficient than doing a MAX() + 1 on your data table to get your next ID value. Here's the basic idea:
BEGIN TRANSACTION
Get the "next" ID from a lookup table
Use that value to generate new ID values for your next record(s)
Create your new records
Update the "next" ID value in the lookup table and increment it by the # rows newly inserted (you can capture this by storing the value in the ACTIVITY_COUNT value variable directly after executing your INSERT/MERGE statement)
Make sure to LOCK the lookup table at the beginning of your transaction so it can't be modified until your transaction completes
END TRANSACTION
Here is an example from Postgres that you can adapt to TD:
CREATE TABLE NextId (
IDType VARCHAR(50) NOT NULL,
NextValue INTEGER NOT NULL,
PRIMARY KEY (IDType)
);
INSERT INTO Users(UserId, UserType)
SELECT
COALESCE(
src.UserId, -- Use UserId if provided (i.e. update existing user)
ROW_NUMBER() OVER(ORDER BY CASE WHEN src.UserId IS NULL THEN 0 ELSE 1 END ASC) +
(id.NextValue - 1) -- Use newly generated UserId (i.e. create new user)
)
AS UserIdFinal,
src.UserType
FROM (
-- Bulk Upsert (get source rows from JSON parameter)
SELECT src.FirstName, src.UserId, src.UserType
FROM JSONB_TO_RECORDSET(pUserDataJSON->'users') AS src(FirstName VARCHAR(100), UserId INTEGER, UserType CHAR(1))
) src
CROSS JOIN (
-- Get next ID value to use
SELECT NextValue
FROM NextId
WHERE IdType = 'User'
FOR UPDATE -- Use "Update" row-lock so it is not read by any other queries also using "Update" row-lock
) id
ON CONFLICT(UserId) DO UPDATE SET
UserType = EXCLUDED.UserType;
-- Increment UserId value
UPDATE NextId
SET NextValue = NextValue + COALESCE(NewUserCount,0)
WHERE IdType = 'User'
;
Just change the locking statement to Teradata syntax (LOCK TABLE NextId FOR WRITE) and add an ACTIVITY_COUNT variable after your INSERT/MERGE to capture the # rows affected. This assumes you're doing all this inside a stored procedure.
Let me know how it goes...

change ID number to smooth out duplicates in a table

I have run into this problem that I'm trying to solve: Every day I import new records into a table that have an ID number.
Most of them are new (never seen in the system before), but some are coming in again. What I need to do is append an alpha character to the end of the ID number if the number is found in the archive, but only if the data in the row is different from the data in the archive. This needs to be done sequentially: if 12345 is seen a 2nd time with different data, I change it to 12345A; if 12345 is seen again and is again different, I change it to 12345B, and so on.
Originally I tried using a while loop that would put all the 'seen again' records in a temp table, then assign A the first time, delete those, assign B to what's left, delete those, etc., till the temp table was empty, but that hasn't worked out.
Alternately, I've been thinking of trying subqueries as in:
update table
set IDNO = (select max(IDNO) from archive) + 1
Any suggestions?
How about this as an idea? Mind you, this is basically pseudocode so adjust as you see fit.
With "src" as the table that all the data will ultimately be inserted into, and "TMP" as your temporary table.. and this is presuming that the ID column in TMP is a double.
do
update tmp set id = id + 0.01 where id in (select id from src);
until no_rows_changed;
alter table TMP change id into id varchar(255);
update TMP set id = concat(int(id), chr((id - int(id)) * 100 + 64));
insert into SRC select * from tmp;
What happens when you get to 12345Z?
Anyway, change the table structure slightly, here's the recipe:
Drop any indices on ID.
Split ID (apparently varchar) into ID_Num (long int) and ID_Alpha (varchar, not null). Make the default value for ID_Alpha an empty string ('').
So, 12345B (varchar) becomes 12345 (long int) and 'B' (varchar), etc.
Create a unique, ideally clustered, index on columns ID_Num and ID_Alpha.
Make this the primary key. Or, if you must, use an auto-incrementing integer as a pseudo primary key.
Now, when adding new data, finding duplicate ID numbers is trivial and the last ID_Alpha can be obtained with a simple max() operation (see the sketch below).
Resolving duplicate ID's should now be an easier task, using either a while loop or a cursor (if you must).
But, it should also be possible to avoid the "Row by agonizing row" (RBAR), and use a set-based approach. A few days of reading Jeff Moden articles, should give you ideas in that regard.
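As a rough sketch of the max() lookup described in the recipe above (table and column names are illustrative):
-- for each incoming ID_Num, find the highest suffix already used in the archive
SELECT a.ID_Num, MAX(a.ID_Alpha) AS Last_Alpha
FROM archive a
WHERE a.ID_Num IN (SELECT t.ID_Num FROM tempimport t)
GROUP BY a.ID_Num;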
Here is my final solution:
update a
set IDnum=b.IDnum
from tempimporttable a inner join
(select * from archivetable
where IDnum in
(select max(IDnum) from archivetable
where IDnum in
(select IDnum from tempimporttable)
group by left(IDnum,7)
)
) b
on b.IDnum like a.IDnum + '%'
WHERE
*row from tempimport table = row from archive table*
to set incoming rows to the same IDnum as old rows, and then
update a
set patient_account_number = case
when len((select max(IDnum) from archive where left(IDnum,7) = left(a.IDnum,7)))= 7 then a.IDnum + 'A'
else left(a.IDnum,7) + char(ascii(right((select max(IDnum) from archive where left(IDnum,7) = left(a.IDnum,7)),1))+1)
end
from tempimporttable a
where not exists ( *select rows from archive table* )
I don't know if anyone wants to delve too far into this, but I appreciate constructive criticism...

SQL query select from table and group on other column

I'm phrasing the question title poorly as I'm not sure what to call what I'm trying to do but it really should be simple.
I've a link / join table with two ID columns. I want to run a check before saving new rows to the table.
The user can save attributes through a webpage but I need to check that the same combination doesn't exist before saving it. With one record it's easy as obviously you just check if that attributeId is already in the table, if it is don't allow them to save it again.
However, if the user chooses a combination of that attribute and another one then they should be allowed to save it.
Here's an image of what I mean:
So if a user now tried to save an attribute with an ID of 1 it will stop them, but I need it to also stop them if they tried IDs of 1 and 10, so long as both 1 and 10 had the same productAttributeId.
I'm explaining this confusingly, but I'm hoping the image will clarify what I need to do.
This should be simple so I presume I'm missing something.
If I understand the question properly, you want to prevent the combination of AttributeId and ProductAttributeId from being reused. If that's the case, simply make them a combined primary key, which is by nature UNIQUE.
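For the first suggestion, a minimal sketch (assuming the join table is called MyJoinTable, as in the code further down, and that both columns are NOT NULL):
ALTER TABLE MyJoinTable
    ADD CONSTRAINT PK_MyJoinTable PRIMARY KEY (ProductAttributeId, AttributeId);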
If that's not feasible, create a stored procedure that runs a query against the join for instances of the AttributeId. If the query returns 0 instances, insert the row.
Here's some light code to present the idea (may need to be modified to work with your database):
DECLARE @cnt int;
SELECT @cnt = COUNT(1) FROM MyJoinTable WHERE AttributeId = @RequestedID;
IF @cnt = 0
BEGIN
    INSERT INTO MyJoinTable ...
END
You can control your inserts via a stored procedure. My understanding is that
users can select a combination of Attributes, such as
just 1
1 and 10 together
1,4,5,10 (4 attributes)
These need to enter the table as a single "batch" against a (new?) productAttributeId
So if (1,10) was chosen, this needs to be blocked because 1-2 and 10-2 already exist.
What I suggest
The stored procedure should take the attributes as a single list, e.g. '1,2,3' (comma separated, no spaces, just integers)
You can then use a string splitting UDF or an inline XML trick (as shown below) to break it into rows of a derived table.
Test table
create table attrib (attributeid int, productattributeid int)
insert attrib select 1,1
insert attrib select 1,2
insert attrib select 10,2
Here I use a variable, but you can incorporate it as an SP input param
declare @t nvarchar(max) set @t = '1,2,10'
select top(1)
t.productattributeid,
count(t.productattributeid) count_attrib,
count(*) over () count_input
from (select convert(xml,'<a>' + replace(@t,',','</a><a>') + '</a>') x) x
cross apply x.x.nodes('a') n(c)
cross apply (select n.c.value('.','int')) a(attributeid)
left join attrib t on t.attributeid = a.attributeid
group by t.productattributeid
order by count_attrib desc
Output
productattributeid count_attrib count_input
2 2 3
The 1st column gives you the productattributeid that has the most matches
The 2nd column gives you how many attributes were matched using the same productattributeid
The 3rd column is how many attributes exist in the input
If you compare the last 2 columns (count_attrib vs count_input):
match - you can use the productattributeid to attach to the product which has all these attributes
don't match - then you need to do an insert to create a new combination