I'm writing a function module that exports a table with some data, T_DATA[].
At one point I loop through an internal table (T_ENTRIES[]) and use a READ with BINARY SEARCH on T_DATA[].
Before the loop, T_DATA[] is sorted by the key I use in the READ statement.
For some reason, the read often fails even when the same key exists in both tables.
If I remove the BINARY SEARCH, it works fine.
Is this a common problem with tables declared as EXPORTING parameters of a function module?
I ask because when I move the table (T_DATA[]) to a different internal table and use the binary search on that, it works fine.
Thank you!
SORT t_patient_list[] BY kunnr.

LOOP AT lt_cov_entry[] ASSIGNING FIELD-SYMBOL(<ls_cov_entry>).
  READ TABLE t_patient_list[]
       ASSIGNING FIELD-SYMBOL(<fs_patient_list>)
       WITH KEY kunnr = <ls_cov_entry>-kunnr
       BINARY SEARCH.
  IF sy-subrc <> 0.
    CLEAR ls_patient_record.
    MOVE-CORRESPONDING <ls_cov_entry> TO ls_patient_record.
    APPEND ls_patient_record TO t_patient_list[].
  ELSE.
    <fs_patient_list>-hosp_type = <ls_cov_entry>-hosp_type.
  ENDIF.
ENDLOOP.
It does not work with BINARY SEARCH because the APPEND ruins the ordering. The best solution is to define t_patient_list as a SORTED table keyed by kunnr.¹
If you cannot do that, you should use INSERT ... INDEX sy-tabix instead of APPEND, since even a failed READ TABLE sets sy-tabix to the correct insertion point:
SORT t_patient_list[] BY kunnr.

LOOP AT lt_cov_entry[] ASSIGNING FIELD-SYMBOL(<ls_cov_entry>).
  READ TABLE t_patient_list[]
       ASSIGNING FIELD-SYMBOL(<fs_patient_list>)
       WITH KEY kunnr = <ls_cov_entry>-kunnr
       BINARY SEARCH.
  IF sy-subrc <> 0.
    CLEAR ls_patient_record.
    MOVE-CORRESPONDING <ls_cov_entry> TO ls_patient_record.
    INSERT ls_patient_record INTO t_patient_list[] INDEX sy-tabix.
  ELSE.
    <fs_patient_list>-hosp_type = <ls_cov_entry>-hosp_type.
  ENDIF.
ENDLOOP.
¹ If you can change it to a SORTED table, you still need to replace APPEND with INSERT, just without the INDEX addition.
The problem is that you are appending to the table while you are reading it. After you append something, it might no longer be sorted, so BINARY SEARCH can no longer be expected to work reliably.
Possible workaround:
If the definition of the table t_patient_list is under your control, add a secondary sorted key to its declaration:
DATA t_patient_list TYPE TABLE OF patients
     WITH NON-UNIQUE SORTED KEY key_kunnr COMPONENTS kunnr.
(You could use an even faster UNIQUE HASHED KEY if you can guarantee that kunnr contains only unique values.)
Then explicitly use that key when searching:
READ TABLE t_patient_list[]
     ASSIGNING FIELD-SYMBOL(<fs_patient_list>)
     WITH TABLE KEY key_kunnr COMPONENTS kunnr = <ls_cov_entry>-kunnr.
(The addition BINARY SEARCH isn't required here because using a sorted key implies a binary search.)
Secondary keys are updated whenever the table is changed, so you can rely on them staying consistent.
More hackish workaround
If the definition of the table t_patient_list is not under your control, you have to re-sort the table after changing it:
IF sy-subrc <> 0.
  CLEAR ls_patient_record.
  MOVE-CORRESPONDING <ls_cov_entry> TO ls_patient_record.
  APPEND ls_patient_record TO t_patient_list[].
  SORT t_patient_list[] BY kunnr.
ELSE.
  <fs_patient_list>-hosp_type = <ls_cov_entry>-hosp_type.
ENDIF.
But you might want to measure whether the performance penalty of sorting after every APPEND outweighs what you save by using BINARY SEARCH.
Related
I'm working on an enhancement implementation in ZXMBCU10, which is called from a custom program a couple of levels down the execution path. Inside ZXMBCU10 I want to access a table in the parent program, which I do as follows.
Declare the parent program name:
DATA: ex_tbl_name TYPE char100 VALUE '(ZPROGRAM)G_TAB'.
Get the value through field-symbol assignment:
FIELD-SYMBOLS: <fs> TYPE any.
ASSIGN (ex_tbl_name) TO <fs>.
Then I check for successful assignment (which is true).
IF <fs> IS ASSIGNED.
The problem I have is how to read the data in the <fs> field symbol.
I've tried LOOP and READ TABLE, but get the following errors
(both READ TABLE and LOOP are included here just to show the syntax check results).
LOOP:
Internal table "<FS>" has no header line - one of the additions "INTO
wa", "ASSIGNING", "REFERENCE INTO", "TRANSPORTING NO FIELDS" required.
READ TABLE:
You cannot use explicit or implicit index operations on tables with
types "HASHED TABLE" or "ANY TABLE". "<FS>" has the type "ANY TABLE".
It is possible that the "TABLE" addition was not specified before
"<FS>".
LOOP AT
The error about LOOP AT (Internal table "<FS>" has no header line - one of the additions "INTO wa", "ASSIGNING", "REFERENCE INTO", "TRANSPORTING NO FIELDS" required) means that you didn't indicate the "result" part of LOOP AT, i.e. ASSIGNING, REFERENCE INTO, etc. (as the message says).
For a field symbol, LOOP AT alone is always invalid; if it were a variable instead of a field symbol, it would be obsolete syntax, because it would imply the use of a header line.
LOOP AT <fs>. " always invalid !
A valid syntax could be as follows. You must declare the field symbol as an internal table (with at least the word TABLE, or by referring to a table type). Any category of internal table is supported by LOOP AT (hashed, sorted, standard), so you can use TYPE ANY TABLE:
DATA: ex_tbl_name TYPE char100 VALUE '(ZPROGRAM)G_TAB'.
FIELD-SYMBOLS: <fs> TYPE ANY TABLE.
ASSIGN (ex_tbl_name) TO <fs>.
LOOP AT <fs> ASSIGNING FIELD-SYMBOL(<line>).
ENDLOOP.
READ TABLE
The error about READ TABLE (You cannot use explicit or implicit index operations on tables with types "HASHED TABLE" or "ANY TABLE". "<FS>" has the type "ANY TABLE". It is possible that the "TABLE" addition was not specified before "<FS>") means that you used READ TABLE ... INDEX ..., and INDEX can only be used with an internal table of category SORTED or STANDARD.
The next code is invalid because of the combination of ANY TABLE and READ TABLE INDEX: <fs> could possibly be a hashed internal table (who knows), in which case READ TABLE INDEX would fail, hence the compile error:
DATA: ex_tbl_name TYPE char100 VALUE '(ZPROGRAM)G_TAB'.
FIELD-SYMBOLS: <fs> TYPE ANY TABLE. " <=== impossible with READ TABLE INDEX !
ASSIGN (ex_tbl_name) TO <fs>.
READ TABLE <fs> ASSIGNING FIELD-SYMBOL(<line>) INDEX 1. " <=== impossible with ANY TABLE !
Solution: to use READ TABLE <fs> INDEX ..., you may declare the field symbol as SORTED, STANDARD, or INDEX (the latter is a generic type covering both SORTED and STANDARD).
This code is valid:
DATA: ex_tbl_name TYPE char100 VALUE '(ZPROGRAM)G_TAB'.
FIELD-SYMBOLS: <fs> TYPE INDEX TABLE.
ASSIGN (ex_tbl_name) TO <fs>.
READ TABLE <fs> ASSIGNING FIELD-SYMBOL(<line>) INDEX 1.
Of course, it's assumed that G_TAB is an "index" table, not a hashed table!
PS: in your code you used INTO DATA(lv_fs), but with a generic internal table ASSIGNING is usually preferred.
Change the field symbol type to ANY TABLE instead of ANY.
I'm using MS Access, so file size is a real constraint (2 GB, I think). Am I saving space in the most efficient way?
tbl1: tbl_NamesDescs
    pid_NamesDescs   <- AutoNumber
    ColName          <- text field, Indexed: Yes (No Duplicates)
    Descs            <- text field

tbl2: tbl_HistStatsSettings
    pid_HistStatsSettings   <- AutoNumber
    Factor                  <- text field
    etc... (other fields)
So using the two tables above: tbl2 has ~800k records, and all of Factor's unique possibilities are listed in ColName (i.e. there is a one-to-many relationship between ColName and Factor, respectively). When I look at the tables in Datasheet view, I see all of the names listed (full text) in both Factor and ColName.
Question:
Is that the best way to save space? I would think that Factor should be a list of indices (numbers, not text) corresponding to ColName.
In other words, wouldn't it be more file-space efficient to populate Factor with the pid_NamesDescs AutoNumbers, since numbers are smaller than text? If that is true, what is the best way to make this happen (either steps in MS Access or VBA is what I am after here)?
EDIT: added table names and pid names as they really exist
Yes, putting the FactorID as a number instead of text will save space. I can't really answer whether it's the "best" way, but it will definitely save space.
The easiest way to do this is to run the following query:
UPDATE tbl2 LEFT JOIN tbl1 ON tbl2.Factor = tbl1.ColName
SET tbl2.Factor = CStr(tbl1.pid_NamesDescs)
WHERE Not IsNull(tbl1.ColName)
Then, in design view, change the datatype of Factor to Long. I'd also rename the field to FactorID and rename ColName to Factor; I'd make some other changes to the column/table names for clarity (although you may be using placeholder names here).
Or make a helper column (a long integer, as you suggested in the comments), update the helper field, and then delete the original field.
Then go into the Relationships window and add a relationship between tbl1.pid_NamesDescs and tbl2.FactorID.
After this, Compact and Repair the database to reduce the size.
*EDIT to add portion about adding the relationship between the tables.
In addition to normalization, also check all your text fields. The default is 255 characters for Short Text. When you are storing fewer than 255 characters in a text field, make sure the field size is set to no more than what you typically store; after changing it, perform a compact and repair to reduce the file size. Where possible, use Short Text over Long Text.
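For instance, a minimal Access SQL sketch of shrinking the field sizes (the 50-character size is just an assumption; pick whatever covers your actual data):

ALTER TABLE tbl_NamesDescs ALTER COLUMN ColName TEXT(50);
ALTER TABLE tbl_HistStatsSettings ALTER COLUMN Factor TEXT(50);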
Also consider a split database approach, where the data is in the back end and your UI and VBA are in the front end.
I am going to create:
a table for storing IDs and unique text values (which are expected to be large)
a stored procedure which will take a text value as an input parameter (it will check whether the value exists in the above table and return the corresponding ID if it does, or insert a new record if not and return the new ID as well)
I want to optimize the search of text values by using a hash of the text, with an index created on it. So, during the search I expect the non-clustered index to be used (not the clustered index).
I decided to use HASHBYTES with SHA2_256, and I am wondering whether there are any differences/benefits if I store the hash value as BINARY(32) or as NVARCHAR(16)?
You can't reasonably store a hash value as characters, because binary data is not text. Various text processing and comparison functions interpret those characters; for example, trailing whitespace is sometimes ignored, leading to incorrect results.
Since you've got 32 totally random, unstructured bytes to store, binary(32) is the most natural format, and it is also the fastest one.
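A minimal T-SQL sketch of the whole idea, assuming SHA2_256 and BINARY(32); all object names here are made up for illustration:

CREATE TABLE dbo.TextValues
(
    Id        INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    TextValue NVARCHAR(MAX) NOT NULL,
    -- HASHBYTES is deterministic, so the computed column can be persisted and indexed
    TextHash  AS CAST(HASHBYTES('SHA2_256', TextValue) AS BINARY(32)) PERSISTED
);

CREATE NONCLUSTERED INDEX IX_TextValues_TextHash ON dbo.TextValues (TextHash);
GO

CREATE PROCEDURE dbo.GetOrCreateTextValueId
    @TextValue NVARCHAR(MAX),
    @Id        INT OUTPUT
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @Hash BINARY(32) = HASHBYTES('SHA2_256', @TextValue);

    -- Seek on the narrow hash index first, then compare the full text to rule out collisions
    SELECT @Id = Id
    FROM dbo.TextValues
    WHERE TextHash = @Hash AND TextValue = @TextValue;

    IF @Id IS NULL
    BEGIN
        INSERT INTO dbo.TextValues (TextValue) VALUES (@TextValue);
        SET @Id = SCOPE_IDENTITY();
    END
END

(Note that before SQL Server 2016, HASHBYTES input is limited to 8000 bytes, so very large values may need special handling.)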
I have a huge table with 2 columns: Id and Title. Id is a bigint, and I'm free to choose the type of the Title column: varchar, char, text, whatever. The Title column contains random text strings like "abcdefg", "q", "allyourbasebelongtous", with a maximum of 255 chars.
My task is to get strings by a given substring. Substrings also have random length and can be at the start, middle, or end of a string. The most obvious way to perform it:
SELECT * FROM t WHERE Title LIKE '%abc%'
I don't care about INSERT; I only need to do fast selects. What can I do to perform the search as fast as possible?
I use MS SQL Server 2008 R2; full-text search will be useless, as far as I can see.
If you don't care about storage, you can create another table with partial Title entries, one beginning with each trailing substring (up to 255 entries per original title).
That way you can index these substrings and match only against the beginning of a string, which should greatly improve performance.
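A rough T-SQL sketch of that approach, using the question's table t (Id, Title); the helper table and index names are assumptions:

CREATE TABLE dbo.TitleSuffixes
(
    Id     BIGINT       NOT NULL,   -- points back to t.Id
    Suffix VARCHAR(255) NOT NULL
);
CREATE NONCLUSTERED INDEX IX_TitleSuffixes_Suffix ON dbo.TitleSuffixes (Suffix);
GO

-- One row per trailing substring: 'abc' produces 'abc', 'bc', 'c'
INSERT INTO dbo.TitleSuffixes (Id, Suffix)
SELECT t.Id, SUBSTRING(t.Title, n.n, 255)
FROM t
JOIN (SELECT TOP (255) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
      FROM sys.all_objects) AS n ON n.n <= LEN(t.Title);
GO

-- '%abc%' becomes an index-friendly prefix search on the suffixes
SELECT DISTINCT t.*
FROM t
JOIN dbo.TitleSuffixes s ON s.Id = t.Id
WHERE s.Suffix LIKE 'abc%';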
If you want to use less space than Randy's answer and there is considerable repetition in your data, you can create an N-ary tree data structure where each edge is the next character, and hang each string and trailing substring in your data on it.
Number the nodes in depth-first order. Then create a table with up to 255 rows for each of your records, holding the Id of your record and the node id in the tree that matches the string or trailing substring. When you do a search, you find the node id that represents the string you are searching for (and thereby all of its trailing continuations) and do a range search.
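A sketch of what the tables and the range search could look like; the key property of depth-first numbering is that a node's subtree occupies a contiguous id range (every name here is invented for illustration):

-- Tree nodes numbered in depth-first order; a subtree is the range [NodeId, MaxDescendantId]
CREATE TABLE dbo.TrieNodes
(
    NodeId          INT NOT NULL PRIMARY KEY,
    MaxDescendantId INT NOT NULL
);

-- One row per record and per string/trailing substring hung on the tree
CREATE TABLE dbo.RecordNodes
(
    Id     BIGINT NOT NULL,   -- the record's Id in t
    NodeId INT    NOT NULL
);
CREATE NONCLUSTERED INDEX IX_RecordNodes_NodeId ON dbo.RecordNodes (NodeId);
GO

-- @searchNode: the node reached by walking the tree along the search string
DECLARE @searchNode INT = 42;

SELECT DISTINCT r.Id
FROM dbo.TrieNodes n
JOIN dbo.RecordNodes r
  ON r.NodeId BETWEEN n.NodeId AND n.MaxDescendantId
WHERE n.NodeId = @searchNode;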
Sounds like you've ruled out all good alternatives.
You already know that your query
SELECT * FROM t WHERE TITLE LIKE '%abc%'
won't use an index; it will do a full table scan every time.
If you were sure that the string was at the beginning of the field, you could do
SELECT * FROM t WHERE TITLE LIKE 'abc%'
which would use an index on Title.
Are you sure full text search wouldn't help you here?
Depending on your business requirements, I've sometimes used the following logic:
Do a "begins with" query (LIKE 'abc%') first, which will use an index.
Depending on whether any rows are returned (or how many), conditionally move on to the "harder" search that will do the full scan (LIKE '%abc%'), as sketched below.
Depends on what you need, of course, but I've used this in situations where I can show the easiest and most common results first, and only move on to the more difficult query when necessary.
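A minimal T-SQL version of that fallback logic (t and Title are from the question; the variable name is an assumption):

DECLARE @term VARCHAR(255) = 'abc';

-- Step 1: index-friendly prefix match
SELECT * FROM t WHERE Title LIKE @term + '%';

-- Step 2: only fall back to the full scan when step 1 found nothing
IF @@ROWCOUNT = 0
    SELECT * FROM t WHERE Title LIKE '%' + @term + '%';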
You can add a computed column to the table: TitleLength AS LEN(Title) PERSISTED. This stores the length of the Title column; create an index on it.
Also add another computed column: ReverseTitle AS REVERSE(Title) PERSISTED.
Now when someone searches for a keyword, check whether the length of the keyword is the same as TitleLength; if so, do an "=" search. If the keyword is shorter than TitleLength, do a LIKE: first Title LIKE 'abc%', then ReverseTitle LIKE 'cba%'. This is similar to Brad's approach, i.e. you only run the next, more difficult query if required.
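A sketch of those columns and the search, following the answer's own logic (index names are assumed):

ALTER TABLE t ADD TitleLength AS LEN(Title) PERSISTED;
ALTER TABLE t ADD ReverseTitle AS REVERSE(Title) PERSISTED;
CREATE INDEX IX_t_TitleLength ON t (TitleLength);
CREATE INDEX IX_t_ReverseTitle ON t (ReverseTitle);
GO

DECLARE @kw VARCHAR(255) = 'abc';

-- Same-length candidates can use '=' instead of LIKE
SELECT * FROM t WHERE TitleLength = LEN(@kw) AND Title = @kw;

-- Longer titles: try both prefix searches, on the title and on its reverse
SELECT * FROM t
WHERE TitleLength > LEN(@kw)
  AND (Title LIKE @kw + '%' OR ReverseTitle LIKE REVERSE(@kw) + '%');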
Also, if the 80-20 rule applies to your keywords/substrings (i.e. most of the searches are on a minority of the keywords), you can consider doing some sort of caching. For example, say you find that many users search for the keyword "abc" and this keyword search returns records with ids 20, 22, 24, 25 - you can store this in a separate table and have it indexed.
Now when someone searches for a keyword, first look in this "cache" table to see if the search was already performed by an earlier user. If so, there is no need to look again in the main table; simply return the results from the "cache" table.
You can also combine the above with SQL Server full-text search (assuming you have a valid reason not to use it on its own). You could use full-text search first to shortlist the result set, and then run a SQL query against your table to get exact results, using the ids returned by the full-text search as a parameter along with your keyword.
All this is obviously assuming you have to use SQL. If not, you can explore something like Apache Solr.
You can also create an indexed view, a newer SQL Server feature: create an index on the column that you need to search and use that view in your searches afterwards; that will give you faster results.
Use an ASCII charset with a clustered index on the char column.
The charset influences search performance because of the data size, both in RAM and on disk; the bottleneck is often I/O.
Your column is at most 255 characters long, so you can use a normal index on your char field rather than full-text, which is faster. Do not select unnecessary columns in your SELECT statement.
Lastly, add more RAM to the server and increase the cache size.
Put a primary key on the specific column and index it as a clustered index.
Then search using any method (wildcard, =, or anything else); the search is closer to optimal because the table is already clustered, so the engine knows where to look (the column is already in sorted order).
I have three tables:

Results:
    TestID
    TestCode
    Value

Tests:
    TestID
    TestType
    SysCodeID

SystemCodes:
    SysCodeID
    ParentSysCodeID
    Description
The question I have is about when the user is entering data into the Results table.
The formatting code, when the row gets focus, changes the Value field to a dropdown combo box if the TestCode is of type SystemList. The dropdown lists all the system codes whose ParentSysCodeID equals the test's SysCodeID. When the user chooses a value in the list, it translates into a number which goes into the Value field.
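In SQL terms, the dropdown's row source is something like this (a sketch against the tables above; the variable is an assumption):

DECLARE @CurrentTestID INT = 1;  -- the test of the row being edited

SELECT sc.SysCodeID, sc.Description
FROM SystemCodes AS sc
JOIN Tests AS t ON sc.ParentSysCodeID = t.SysCodeID
WHERE t.TestID = @CurrentTestID;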
The datatype of the Results.Value field is integer. I made it an integer instead of a string because, for reporting, it is easier to do calculations and sorting on a number, and there are issues when you put integer/decimal values into a string field. Also, when the system was being designed, they only wanted numbers in there.
The users now want to put strings into the Value field as well as numbers/values from a list, and I'm wondering what the best way of doing that would be.
Would it be bad practice to convert the field over to a string and then store both strings and integers in the same field? There are different issues related to this one, but I'm not sure if any are a really big deal.
Should I add another column of string datatype to the table, and if the test is a string type, put the data the user enters into that field instead?
Another option would be to create a 1-1 relationship to another table: if the user types a string into the Value field, it is added to the new table with a numeric key.
Anyone have any interesting ideas?
What about treating Results.Value as if it were a numeric ValueCode that becomes a foreign key referencing another table containing each ValueCode and the string that matches it?
CREATE TABLE ValueCodes
(
    Value   INTEGER NOT NULL PRIMARY KEY,
    Meaning VARCHAR(32) NOT NULL UNIQUE
);

CREATE TABLE Results
(
    TestID   ...,
    TestCode ...,
    Value    INTEGER NOT NULL FOREIGN KEY REFERENCES ValueCodes
);
You continue storing integers as you do now, but they become references to a limited set of values in the ValueCodes table. Most of the existing values would appear as an integer such as 100 paired with a string representing the same value, "100". New codes can be added as needed.
Are you saying that they want to do free-form text entry? If that's the case, they will ruin the ability to do meaningful reporting on the field, because I can guarantee that they will not consistently enter the strings.
If they are going to be entering one of several preset strings (for example, grades of A, B, C, etc.) then make a lookup table for those strings which maps to numeric values for sorting, evaluating, averaging, etc.
If they really want to be able to start entering free-form text and you can't dissuade them from it, add another column along the lines of other_entry, and have a predefined value meaning "other" to put in your Value column. That way, when you're doing reporting, you can either roll up all of those random "other" values or simply ignore them. Make sure that you add the "other" code to your SystemCodes table so that you can keep a foreign key between it and the Results table; if you don't already have one, you should definitely consider adding one.
Good luck!
The users now want to put strings into the value field as well as numbers/values from a list and I'm wondering what the best way of doing that would be.
It sounds like the users want to add new testcodes. If that is the case, why not just add them to your existing testcode table and keep your existing format?
Would it be bad practice to convert the field over to a string and then store both strings and integers in the same field? There are different issues related to this one, but I'm not sure if any are a really big deal.
No, it's not a big deal. Often PO numbers or invoice numbers contain numbers or a combination of letters and numbers. You are right, however, about the database performing better on a number field than on a string; but if you index the string field, the database ends up doing its scans on numeric indexes anyway.
The problems you had with decimals as strings probably have to do with floating-point data types, in which the server essentially approximates the value of the field and only retains accuracy to a certain number of digits. This can lead to a whole host of rounding errors if you care about the digits. You can avoid that issue by using currency fields or the like, which have fixed decimal accuracy. lol I learned this the hard way.
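A quick T-SQL illustration of that floating-point drift (values chosen arbitrarily):

DECLARE @f REAL          = 0.1,   -- approximate binary floating point
        @d DECIMAL(10,2) = 0.1;   -- exact fixed-point decimal

SELECT CAST(@f AS DECIMAL(20,18)) AS FloatStored,   -- shows the drift, e.g. 0.100000001490116119
       @d                         AS DecimalStored; -- exactly 0.10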
Tom H. did a great job addressing everything else.
I think the easiest way to do it would be to convert Results.Value to a "string" (char, varchar, whatever). Yes, this ruins the ability to do numeric sorting (and you won't be able to do a cast or convert on the column any longer since text will be intermingled with integer values), but I think any other method would be too complex to maintain properly. (For example, in the 1-1 case you mentioned, is that integer value the actual value or a foreign key to the string table? Now we need another column to determine that.)
I would create the extra column for string values. It's not true normalization but it's the easiest to implement and to work with.
Using the same field for both numbers and strings would work to as long as you don't plan on doing anything with the numbers like summing or sorting.
The extra table approach while good from a normalization standpoint is probably overly complex.
I'd convert the Value field to a string and add a column indicating what datatype it should be treated as for post-processing and reporting.
SQL Server at least has an IsNumeric function you can use:
ORDER BY IsNumeric(Results.Value) DESC,
CASE WHEN IsNumeric(Results.Value) = 1 THEN Len(Results.Value) ELSE 99 END,
Results.Value
One of two solutions comes to mind; it kind of depends on what you're doing with the numbers. If they just represent a choice of some kind, pick the first. If you need to do math on them (sorting, conversion, etc.), pick the second.
Change the column to be a varchar, and then either put numbers or text in it. Sorting numerically will suck, but hey, it's one column.
Have both a varchar column for the text and an int column for the number. Use a view to hide the differences and to control the sorting if necessary. You can coalesce the two columns together if you don't care whether you're looking at numbers or text.
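A sketch of that two-column-plus-view idea (the column and view names are assumed):

CREATE TABLE Results
(
    TestID    INT NOT NULL,
    TestCode  INT NOT NULL,
    NumValue  INT NULL,            -- populated for numeric results
    TextValue VARCHAR(255) NULL    -- populated for free-form text results
);
GO

-- The view hides which column is populated; COALESCE folds them together
CREATE VIEW ResultsView
AS
SELECT TestID,
       TestCode,
       COALESCE(CAST(NumValue AS VARCHAR(255)), TextValue) AS Value,
       NumValue   -- still available for numeric sorting and math
FROM Results;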