How is the internal table sorted by default?

So I was wondering: when I declare
lt_table TYPE STANDARD TABLE OF mara.
is it the same as
lt_table TYPE STANDARD TABLE OF mara WITH DEFAULT KEY.
Or are the standard table keys selected differently when DEFAULT KEY is not declared?

That's the same, as explained in the ABAP documentation:
If no explicit primary key is defined for a standard table, it automatically has a standard key.
You get the standard key either by specifying DEFAULT KEY (first point below) or by specifying nothing at all (the other two):
The standard key can be declared as follows:
Explicitly, using the additions UNIQUE|NON-UNIQUE KEY of the statements TYPES, DATA and so on, where the addition DEFAULT KEY is specified instead of the list of components.
Implicitly, if no explicit primary key specification is made in the declaration of a standard table with the statement DATA.
Implicitly, if a standard table type with a generic primary table key is specified behind TYPE in the statement DATA.
EDIT May 31st, 2022: there can be some confusion about the meaning of the "keys of a standard table", which could lead people to think that the table is sorted and that access is therefore faster.
That's wrong.
Access will be faster only if you explicitly sort your internal table with SORT itab BY comp1 comp2 (once, because it's time-consuming), and then use READ TABLE itab WITH KEY comp1 = ... comp2 = ... BINARY SEARCH.
Declaring the primary key (default key or explicit components) of a standard table is only a way to avoid mentioning the components after SORT, READ TABLE, etc., and the ABAP documentation recommends stating them explicitly after SORT, READ TABLE, etc. anyway.
Consequently, I see no benefit in declaring the primary key of a standard table.
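For illustration, a minimal sketch of that explicit sort-then-binary-search pattern on the table from the question (the material number value is made up):
DATA lt_table TYPE STANDARD TABLE OF mara.
" ... fill lt_table ...
" Sort once by the component(s) used for the lookups (the expensive step)
SORT lt_table BY matnr.
" BINARY SEARCH requires the table to be sorted by exactly these components
READ TABLE lt_table INTO DATA(ls_mara)
     WITH KEY matnr = 'MAT-0001'
     BINARY SEARCH.
IF sy-subrc = 0.
  " row found, available in ls_mara
ENDIF.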
NB: COLLECT works only based on the primary key of the standard table, so here there is no choice, unless you replace COLLECT with code like this, for instance:
" Look for an existing line with the same key components; sy-subrc is 4 if none is found
ASSIGN itab[ c1 = line-c1 c2 = line-c2 ] TO FIELD-SYMBOL(<exist_line>).
IF sy-subrc = 0.
  " The line exists: aggregate, as COLLECT would do
  <exist_line>-counter = <exist_line>-counter + line-counter.
ELSE.
  INSERT line INTO TABLE itab.
ENDIF.
If you want to use a sorted table for faster access, prefer declaring the table with TYPE SORTED TABLE or TYPE HASHED TABLE (or any alternative syntax that gives you secondary keys): the table will really be kept sorted, accesses are faster, and the compiler will issue better warnings and errors for SORT (an error, because the table is already sorted), READ TABLE, etc., than it does for a standard table (only some warnings, and only if you use ATC).
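As a rough sketch, such declarations could look like this (the secondary key name is made up; matnr and mtart are MARA fields):
DATA lt_sorted TYPE SORTED TABLE OF mara WITH UNIQUE KEY matnr.
DATA lt_hashed TYPE HASHED TABLE OF mara WITH UNIQUE KEY matnr.
" Standard table with a secondary sorted key for fast reads by material type
DATA lt_std TYPE STANDARD TABLE OF mara WITH EMPTY KEY
     WITH NON-UNIQUE SORTED KEY by_type COMPONENTS mtart.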
For more information, see ABAP documentation - itab - Selection of the Table Category

Related

Behavior of a SORT without BY on standard internal tables? Is it safe?

What exactly does the SORT statement without key specification do when run on a standard internal table? As per the documentation:
If no explicit sort key is entered using the addition BY, the internal table itab is sorted by the primary table key. The priority of the sort is based on the order in which the key fields are specified in the table definition. In standard keys, the sort is prioritized according to the order of the key fields in the row type of the table. If the primary table key of a standard table is empty, no sort takes place. If this is known statically, the syntax check produces a warning.
With the primary table key being defined as:
Each internal table has a primary table key that is either a self-defined key or the standard key. For hashed tables, the primary key is a hash key, for sorted tables, the primary key is a sorted key. Both of these table types are key tables for which key access is optimized and the primary key thus has its own administration. The key fields of these tables are write-protected when you access individual rows. Standard tables also have a primary key, but the corresponding access is not optimized, there is no separate key administration, and the key fields are not write-protected.
And for good measure, the standard key is defined as:
Primary table key of an internal table, whose key fields in a structured row type are all table fields with character-like data types and byte-like data types. If the row type contains substructures, these are broken down into elementary components. The standard key for non-structured row types is the entire table row if the row type itself is not a table type. If there are no corresponding table fields, or the row type itself is a table type, the standard key from standard tables is empty or contains no key fields.
All of which mainly just confuses me as I'm not sure if I can really rely on the basic SORT statement to provide a reliable or safe result. Should I really just avoid it in all situations or does it have a purpose if used properly?
By extension, if I want to run a DELETE ADJACENT DUPLICATES FROM itab COMPARING ALL FIELDS, when would it be safe to do so after a simple SORT itab.? Only if I added a key on all fields? Without an explicit key, only if I have an internal table with clike and xsequence columns? If I want to execute that DELETE statement, what is the optimal SORT statement to run on the internal table?
SORT without BY should be avoided in all situations, because it "makes the program difficult to understand and possibly unpredictable" (as the ABAP documentation puts it). I think that if you don't mention BY, a static check in the Code Inspector raises a warning. You should use SORT itab BY table_line, where table_line is a special name (a "pseudo component") meaning "all fields of the line".
Not your question, but you may also define the internal table with primary and secondary keys, so that you don't need to sort explicitly; DELETE ADJACENT DUPLICATES can be used with any of those keys (see the sketch below).
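A minimal sketch of both options; the secondary key name and its components are hypothetical:
" Deterministic: sort by the whole line, then compare all fields
SORT itab BY table_line.
DELETE ADJACENT DUPLICATES FROM itab COMPARING ALL FIELDS.
" Alternative: if itab was declared with a sorted secondary key, e.g.
"   ... WITH NON-UNIQUE SORTED KEY full_key COMPONENTS comp1 comp2,
" no explicit SORT is needed before removing duplicates along that key
DELETE ADJACENT DUPLICATES FROM itab USING KEY full_key.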
Internal tables can have keys that are either inherited from the structure the itab is based on or specified explicitly. As the documentation says, SORT without BY sorts by the primary key, and that is safe assuming the internal table is designed correctly.
I think this is meant as a dynamic feature to be used with smart table key design. If done correctly, SORT without BY lets your program adapt to future table key changes (so if your key changes, the sort changes with it). Problems might arise when the key is modified in an odd way.
As a rule of thumb:
The more specific your program code is, the less prone to errors (and safer) it is.
So sorting by key_id and key_date will always produce the same sort by those two fields.
Dynamic components make an application more flexible, but tend to produce (often hard-to-notice) bugs when the things they rely on are modified.
So if you take the previous example with two key fields and add one in the middle (say key_is_active between the two existing fields), the sorting results might change in a way you did not expect.
If you had an algorithm that processes rows based on the date, it might be broken by that change.
In your particular case with delete adjacent I would follow Sandra Rossi's advice.

Confusing t-sql exam answer about sequence or uniqueidentifier

I found a t-sql question and its answer. It is too confusing. I could use a little help.
The question is:
You develop a database application. You create four tables. Each table stores different categories of products. You create a Primary Key field on each table.
You need to ensure that the following requirements are met:
The fields must use the minimum amount of space.
The fields must be an incrementing series of values.
The values must be unique among the four tables.
What should you do?
A. Create a ROWVERSION column.
B. Create a SEQUENCE object that uses the INTEGER data type.
C. Use the INTEGER data type along with IDENTITY
D. Use the UNIQUEIDENTIFIER data type along with NEWSEQUENTIALID()
E. Create a TIMESTAMP column.
The given answer is D, but I think the more suitable answer is B, because a sequence will use less space than a GUID and it satisfies all the requirements.
D is a wrong answer, because NEWSEQUENTIALID doesn't guarantee "an incrementing series of values" (second requirement).
NEWSEQUENTIALID()
Creates a GUID that is greater than any GUID
previously generated by this function on a specified computer since
Windows was started. After restarting Windows, the GUID can start
again from a lower range, but is still globally unique.
I'd say that B (sequence) is the correct answer: a sequence fulfils all three requirements, as long as you don't restart/recycle it manually, and I think it is the easiest way to meet them.
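A minimal T-SQL sketch of that approach; the sequence, table and column names here are made up:
CREATE SEQUENCE dbo.ProductIdSeq AS INT START WITH 1 INCREMENT BY 1;

CREATE TABLE dbo.Books (
    ProductId INT NOT NULL
        CONSTRAINT DF_Books_ProductId DEFAULT (NEXT VALUE FOR dbo.ProductIdSeq)
        CONSTRAINT PK_Books PRIMARY KEY,
    Title NVARCHAR(200) NOT NULL
);
-- Movies, Magazines, ... are defined the same way and draw from the same sequence,
-- so the generated values form one incrementing series that is unique across all four tables.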
Between the choices provided, B is the correct answer, since it meets all requirements:
ROWVERSION is a bad choice for a primary key, as stated in MSDN:
Every time that a row with a rowversion column is modified or inserted, the incremented database rowversion value is inserted in the rowversion column. This property makes a rowversion column a poor candidate for keys, especially primary keys. Any update made to the row changes the rowversion value and, therefore, changes the key value. If the column is in a primary key, the old key value is no longer valid, and foreign keys referencing the old value are no longer valid.
TIMESTAMP is deprecated, as stated in that same page:
The timestamp syntax is deprecated. This feature will be removed in a future version of Microsoft SQL Server. Avoid using this feature in new development work, and plan to modify applications that currently use this feature.
An IDENTITY column does not guarantee uniqueness, unless all its values are only ever generated automatically (you can use SET IDENTITY_INSERT to insert values manually), nor does it guarantee uniqueness across tables.
A GUID is practically guaranteed to be unique per system, so if a GUID is the primary key for all 4 tables it ensures uniqueness across them. The one requirement it doesn't fulfill is storage size: its storage size is quadruple that of int (16 bytes instead of 4).
A SEQUENCE, when not declared with CYCLE, guarantees uniqueness and has the lowest storage size.
The sequence of numeric values is generated in an ascending or descending order at a defined interval and can be configured to restart (cycle) when exhausted.
However,
I would actually probably choose a different option altogether: create a base table with a single identity column and link it with a 1:1 relationship to all the category tables. Then use an INSTEAD OF INSERT trigger on each category table that first inserts a record into the base table, then uses SCOPE_IDENTITY() to get the value and inserts it as the primary key of the category table.
This will enforce uniqueness as well as make it possible to use a single foreign key reference between the categories and products.
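A rough sketch of what that could look like (all names are made up, and the trigger is deliberately simplified, so it only handles single-row inserts):
CREATE TABLE dbo.ProductBase (
    ProductId INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_ProductBase PRIMARY KEY
);

CREATE TABLE dbo.Books (
    ProductId INT NOT NULL
        CONSTRAINT PK_Books PRIMARY KEY
        CONSTRAINT FK_Books_ProductBase REFERENCES dbo.ProductBase (ProductId),
    Title NVARCHAR(200) NOT NULL
);
GO

CREATE TRIGGER dbo.TR_Books_Insert ON dbo.Books
INSTEAD OF INSERT
AS
BEGIN
    -- Allocate one id from the shared base table, then insert the book with that id
    INSERT INTO dbo.ProductBase DEFAULT VALUES;
    INSERT INTO dbo.Books (ProductId, Title)
    SELECT SCOPE_IDENTITY(), Title FROM inserted;
END;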
The issue has been discussed extensively in the past, in general:
http://blog.codinghorror.com/primary-keys-ids-versus-guids/
Requirement #3 is why a SEQUENCE could run into issues, as there is a higher risk of collision and a lower number of possible rows in each table.

SQL database design: storing the type of a row

I am designing a database to contain a table reference, with a column type that is one of several predefined values (e.g., book, movie, magazine, etc.). I intend the range of possible values to expand over time (e.g. if I realize that I missed the academic_paper type, I want to be able to put that in).
The easiest solution would seem to be to simply store a string representing the type into the table. But this sounds like it would result in a lot of wasted space.
The other solution I thought of is creating a new table reference_types, which the type column references in its foreign key. This seems to have the added benefit of ensuring valid foreign keys (so that I won't accidentally mistype a "magzine" somewhere in my code), and possibly allow for faster queries for all media of a certain type (since integer comparisons should be much faster than string comparisons), but it would also slow my application down a bit as joins would be required whenever I need the reference type, and probably complicate logic because of those extra joins.
What are your thoughts on schema design for this problem?
Your second solution is the correct one. Create a secondary table to store your reference types and link them using a foreign key.
For further reading on this subject the search term you'd want to use is 'database normalisation'.
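A minimal sketch of that layout; the column names and types are just an assumption:
CREATE TABLE reference_types (
    reference_type_id INT NOT NULL PRIMARY KEY,
    name              VARCHAR(50) NOT NULL UNIQUE  -- 'book', 'movie', 'magazine', ...
);

CREATE TABLE reference (
    reference_id      INT NOT NULL PRIMARY KEY,
    reference_type_id INT NOT NULL REFERENCES reference_types (reference_type_id),
    title             VARCHAR(200) NOT NULL
);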
Create the reference_types table, and in your references table use the integer key and also add a reference_type_name field.
You can then query the references table to get the integer key and print its name when needed without performing a join to the other table, and still use that table to perform other operations; just keep the type names equal in both tables.
I know it sounds redundant, but it's really the fastest way to do a simple query by int key and have it all together.
It depends: if you will want to add some other information to the reference types, then use the second approach. If not, use the first one, because it's faster and the information stored is only a string (you can always select distinct to retrieve your types). Read this article for more info.

Why is prefixing column names with the table name a convention?

If this is a duplicate I am sorry, I tried looking but this is an odd question to word.
I have seen this convention in many databases, but it seems redundant to me. I have found a few answers that say it is to reduce confusion during complex joins, but this doesn't seem like a sufficient reason. If you are making complex joins, make aliases. Do joins really represent such a common task that we should make standard tasks like selects, inserts, and updates redundant?
I don't think there is actually a convention of prefixing column names with the table name.
As Philippe Grondier details, the 'proper' approach to data modelling is to first create a dictionary of data element names. Following the international standard ISO 11179 guidelines:
[Object] [Qualifier] Property RepresentationTerm
you end up with data elements that are fully qualified. Here the qualifier elements Object, Qualifier and sometimes Property are in combination what you consider to be the 'prefix'.
When the data model is implemented in SQL, the table name can provide the context, which leads the designer to drop the qualifying terms from the column name. I think this is the convention you prefer.**
In other words, in the convention you are questioning it is not that the table name has been prefixed to the column name, rather it is that the qualifying terms have been retained.
** Whether or not yours or any other is a good convention is subjective, and Stack Overflow is not the place for such discussion. However, I will mention in passing that retaining qualification terms does have practical benefits (as well as being theoretically sound), e.g. SQL's NATURAL JOIN lends itself to columns that are named consistently throughout the schema.
It is true that such "developed column names" methods are widely used for column naming, where, for example, Tbl_Person will have an id_Person primary key column and a personName text column.
Though it might seem at first quite painful to write 'developed' column names like "id_Person", "personName", "personAddress", etc., everything gets clearer when you have to write SELECTs on multiple tables, which is something that happens each time you open a form or a report.
There is also a theoretical/historical dimension to this "developed column names" method. Early relational database theories and methods (like MERISE) proposed, as a first step, building the so-called "data dictionary", i.e. the list of all data to be manipulated by the application/database.
This dictionary has to be established even before any "Entity-Relation" model is proposed. Data names/descriptions then have to be fully developed, in order to avoid confusion between 'similar' data entries, like, for example, "companyName" and "personName".
Thus, the "developed column names" convention reflects the fact that, at the data level, similar columns (such as Company.name and Person.name) are not as equivalent as they seem. Though they both look like they are there to hold a name, one of them is made to hold a company name, while the other is made to hold a person's name!
This convention can then be considered a way to reflect the exact meaning of each of the database's columns, or of each entry in the data dictionary.
I've never seen the full table name prefixed, but usually at least an abbreviation. And you're exactly right, it's for simplicity in joins and the like. It's easier to write ur_id all the time than it is to write id sometimes and userrights.id other times, for example. It's not that uncommon to need to access more than one table at a time.
Join is part of a select, so that comparison doesn't hold.
That aside, I don't think you should prefix fields with the table name, except for primary keys. I like to give every table a surrogate key, which I name after the table. So the table 'Orders' will get an 'OrderId' PK, and an order line will have a foreign key OrderId pointing to the order. That way, the field names are the same across tables, and you can tell by the name which data they represent. You could name the field just 'Id' in all tables, but then you have to read the alias to see which id you mean. Some queries I wrote are over 400 lines; you don't want to rely on table aliases alone. A little context in the field name itself does help.
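For example, a small sketch of that naming scheme (the tables here are hypothetical):
CREATE TABLE Orders (
    OrderId INT NOT NULL PRIMARY KEY
);

CREATE TABLE OrderLines (
    OrderLineId INT NOT NULL PRIMARY KEY,
    OrderId     INT NOT NULL REFERENCES Orders (OrderId)  -- same name as the referenced PK
);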
It's not a convention; some people do it, some people don't. More often I see an ID column prefixed with the table name, but no other columns. Some (all?) DBs also allow prefixing with the table name in queries, but it's neither required, nor part of the actual column name.
In addition to what others said, it also makes things simpler in the presence of identifying relationships (a.k.a. identifying FOREIGN KEYs).
An identifying relationship "migrates" the parent's primary key into a part of the child's primary key. The prefix ensures there will be no collision, so you won't need to rename the migrated fields, even when there are multiple levels of identifying relationships. For example:
PARENT:
  PARENT_NAME PK
CHILD:
  PARENT_NAME PK, FK referencing PARENT
  CHILD_NAME PK
GRANDCHILD:
  PARENT_NAME PK, FK referencing CHILD
  CHILD_NAME PK, FK referencing CHILD
  GRANDCHILD_NAME PK
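In SQL terms, a rough sketch of the same schema (the column types are an assumption):
CREATE TABLE parent (
    parent_name VARCHAR(50) NOT NULL PRIMARY KEY
);

CREATE TABLE child (
    parent_name VARCHAR(50) NOT NULL REFERENCES parent (parent_name),
    child_name  VARCHAR(50) NOT NULL,
    PRIMARY KEY (parent_name, child_name)
);

CREATE TABLE grandchild (
    parent_name     VARCHAR(50) NOT NULL,
    child_name      VARCHAR(50) NOT NULL,
    grandchild_name VARCHAR(50) NOT NULL,
    PRIMARY KEY (parent_name, child_name, grandchild_name),
    -- the parent's migrated key keeps its name at every level
    FOREIGN KEY (parent_name, child_name) REFERENCES child (parent_name, child_name)
);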
Keeping the same name throughout the whole data model avoids any confusion as to what the field means and where it came from.
On the other hand, prefixing can take a toll on readability, so I usually take a compromise: prefix primary key fields but leave other fields unprefixed.
I dislike such naming conventions. They encourage sloth, specifically the use of unqualified column references in queries. Use an alias for each table in your query and qualify each column reference with the appropriate alias.
The only such naming convention I like has to do with primary/foreign keys:
I like to name primary keys something clever, like id.
I like to prefix the names of foreign key columns with the name of the table containing the primary key.
It makes for much more legible SQL, IMHO. An example:
create table foo
(
id int not null primary key ,
...
)
create table bar
(
id int not null primary key ,
foo_id int not null foreign key references foo (id) ,
...
)
select *
from foo foo
join bar bar on bar.foo_id = foo.id
This scheme falls down, of course, when you get to compound keys. But I like it. YMMV.

Recommended structure for table that has FK for 3 other tables

I have a table that will contain information for 3 other tables. The design I have is that this table will have a column that tells the object's ID and another column that tells the object's type (and thus the table that the row refers to).
Two questions:
a) Is that the best design or is there something else more widely accepted?
b) What is the recommended procedure to ensure that IDs are valid for the given object's type?
If I understood your question correctly, each row in your table links to exactly one of the three other tables.
Your approach (type field + one foreign key field) is a valid design, and it's useful if you want to create a general-purpose table that contains meta-information about your data (e.g. a list of records that should be retransmitted for replication).
Another approach, which might be more suitable for real application-level data, would be to have three columns, each being a foreign key to one of the three tables, and to add a constraint that requires exactly two of those fields to be null (see the sketch after this list). This has the following advantages:
The three FKs do not need to have the same data type.
The JOIN syntax becomes more natural (not involving the type field).
You can add referential integrity constraints on those FK columns.
You don't need to ensure correctness of the type field -- in fact, you don't need the type field at all. The type is determined implicitly by the one FK column which is not null.
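A minimal sketch of that layout; the table and column names are made up:
CREATE TABLE info (
    info_id     INT NOT NULL PRIMARY KEY,
    book_id     INT NULL REFERENCES books (book_id),
    movie_id    INT NULL REFERENCES movies (movie_id),
    magazine_id INT NULL REFERENCES magazines (magazine_id),
    -- Exactly one of the three FK columns must be set; the other two stay NULL
    CHECK ( (CASE WHEN book_id     IS NOT NULL THEN 1 ELSE 0 END)
          + (CASE WHEN movie_id    IS NOT NULL THEN 1 ELSE 0 END)
          + (CASE WHEN magazine_id IS NOT NULL THEN 1 ELSE 0 END) = 1 )
);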
a) I'm supposing you have a one-to-many relationship between objects and object types. In a normal design you'd have a reference from the object type column in the objects table to the primary key of the object types table.
b) I would enforce referential integrity in the relationship properties (this depends on the DBMS you are using). It's also up to you whether to use cascading on updates and deletes. That way, an update or a delete of the primary key in the object types table is reflected in the objects table, updating its foreign key column (the object type column) or deleting the records that have that object type.
The basics of DB schema design are easy, but in more complicated situations it can be really hard to figure out what's best. There is a lot of personal subjectivity that can come into play here, and even performance can be a factor in denormalizing a design.
Disclaimer aside, my personal recommendation is to never use a column to store more than one kind of FK, i.e. a column for FKs should store FKs that point to only a single table. If you don't do this, you have to map the cascade of that column's data into multiple sub-select queries inside your code, and it can get messier than you expected. Your "problem no. 2" (ensuring validity between type and FK) is just the beginning of a whole world of pain that will cascade throughout your source code.
Assuming you change the design to use one field per FK reference, I would also check whether each FK field in your main "information-holding table" will actually be valid for every record. If not, I would move the FK columns that are only applicable some of the time out to a separate table.