When to use FIELD-SYMBOLS in ABAP

I don't quite understand when to use FIELD-SYMBOLS. They are like pointers in C/C++, but there we use them everywhere. Simply:
What is the difference between using
DATA gt_mara LIKE TABLE OF mara.
DATA gs_mara LIKE LINE OF gt_mara.
LOOP AT gt_mara INTO gs_mara.
  "code
ENDLOOP.
and
DATA gt_mara LIKE TABLE OF mara.
FIELD-SYMBOLS <gs_mara> LIKE LINE OF gt_mara.
LOOP AT gt_mara ASSIGNING <gs_mara>.
  "code
ENDLOOP.

Your question boils down to Value vs. Reference Semantics; I'll quote some parts of that article:
Reference semantics are A Good Thing. We can’t live without pointers. We just don’t want our software to be One Gigantic Rats Nest Of Pointers. In C++, you can pick and choose where you want reference semantics (pointers/references) and where you’d like value semantics (where objects physically contain other objects etc). In a large system, there should be a balance. However if you implement absolutely everything as a pointer, you’ll get enormous speed hits.
Objects near the problem skin are larger than higher level objects. The identity of these “problem space” abstractions is usually more important than their “value.” Thus reference semantics should be used for problem-space objects.
In ABAP you'd model "problem-space objects" either as structures or as classes, so those are the values for which one would generally use reference semantics. Class instances can only be accessed through references, which can also be used to reference structures. So where reference semantics are needed in ABAP, one would generally use data references instead of the more limited field symbols.
Now with references you usually have the problem of memory management: as long as a value on the heap is referenced it cannot be garbage collected, and references into the stack can only be accessed until the stack frame is popped off. Thus there is always some overhead for managing the referenced memory and checking whether references are still valid. The ABAP documentation states:
From a technical perspective, the field symbols are implemented by references or pointers, which are comparable to references in data reference variables. A data reference variable is declared in the same way as every other data object and the memory area for the reference it contains is in the data area of the ABAP program. However, the pointer assigned to a field symbol is exclusively managed by the ABAP runtime environment and is located in a memory area, which cannot be accessed directly in an ABAP program.
Additionally, (local) field symbols can only be declared inside a procedure (and not, e.g., as an instance member), so the lifetime of a field symbol never exceeds that of a local variable. The same holds for field symbols assigned to table lines, as long as nothing is deleted from the table. These guarantees probably allow the ABAP kernel to handle field symbols more efficiently in such cases (e.g. this discussion).
So field symbols make sense where performance is crucial and copying a structure² is far more expensive than passing around a reference (while you still want to keep value semantics). The prime example is loops over large tables. When writing back into the table, a field symbol is also more convenient than copying the values back with MODIFY.
TLDR: Prefer FIELD-SYMBOLS inside of loops!
²When passing around other large data objects such as internal tables or strings, it would also always make sense to pass a reference instead of copying. Fortunately, ABAP already does that under the hood for these types, and only copies when they're mutated (to keep the value semantics).

With a LOOP ... INTO structure, the structure is a copy of the table line. At the beginning of each loop iteration, the content of the table line is copied into the structure. You can change the structure, but those changes won't be reflected in the table. When you want to change the content of the table, you have to write the structure back explicitly with the MODIFY statement.
LOOP AT itab INTO structure.
  structure-betrh = structure-betrw * exchange_rate.
  MODIFY itab FROM structure INDEX sy-tabix.
ENDLOOP.
With a LOOP ... ASSIGNING <field_symbol> (or LOOP ... REFERENCE INTO reference), the field-symbol points directly at the table line. You can change it, and those changes are applied directly to the table.
LOOP AT itab ASSIGNING <field_symbol>.
  <field_symbol>-betrh = <field_symbol>-betrw * exchange_rate.
ENDLOOP.
This is also faster when you are only reading the table, because no time is wasted on unnecessary copying of data.
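For completeness, here is a minimal sketch of the REFERENCE INTO variant mentioned above (ty_line is a hypothetical name for itab's line type):
DATA lr_line TYPE REF TO ty_line.  " ty_line: hypothetical line type of itab
LOOP AT itab REFERENCE INTO lr_line.
  " dereference with ->; as with a field symbol, changes go straight into the table line
  lr_line->betrh = lr_line->betrw * exchange_rate.
ENDLOOP.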

Why assignment is faster than APPEND LINES OF?

I'm currently learning ABAP. Can anyone explain why t_table2 = t_table1 is significantly faster than APPEND LINES OF t_table1 TO t_table2?
t_table1, t_table2 are internal tables
In addition to the answers by Zero and Cameron Smith, there's also a concept called "table sharing" (AKA "copy-on-write"), which delays the copy until either the source or the target internal table is changed.
To simplify a lot, one could think of the assignment as a copy of 8 bytes (the address of the source internal table). In practice, most of the time one of the 2 internal tables will be changed afterwards (otherwise, why would there be a copy in the code!), so the final performance is often almost the same; it's just that sometimes there's a benefit, when the code was "badly" written and the copy was never actually needed.
Memory allocation
When you define an internal table with DATA, the kernel allocates space for more than one row in memory, so the rows are stored together. Likewise, every time you fill up those rows, another, bigger batch is allocated.
You can see this in memory dumps: in one such dump, 16 rows had been allocated.
When you copy with APPEND LINES OF, the kernel copies line-by-line.
When you just say itab1 = itab2, it gets copied in blocks.
How much faster
Based on the information above, you might think line-by-line is 16 times slower. It is not: in practice, depending on row width, number of lines, kernel version and many other things, it is just 10-30% slower.
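If you want to measure this yourself, here is a minimal benchmark sketch (MARA is just an example source; GET RUN TIME FIELD measures in microseconds):
DATA: t_table1 TYPE STANDARD TABLE OF mara,
      t_table2 TYPE STANDARD TABLE OF mara,
      lv_start TYPE i,
      lv_end   TYPE i,
      lv_diff  TYPE i.

SELECT * FROM mara INTO TABLE t_table1 UP TO 100000 ROWS.

GET RUN TIME FIELD lv_start.
t_table2 = t_table1.                  " block copy (possibly deferred by table sharing)
GET RUN TIME FIELD lv_end.
lv_diff = lv_end - lv_start.
WRITE: / 'Assignment:', lv_diff, 'microseconds'.

CLEAR t_table2.
GET RUN TIME FIELD lv_start.
APPEND LINES OF t_table1 TO t_table2. " row-by-row append
GET RUN TIME FIELD lv_end.
lv_diff = lv_end - lv_start.
WRITE: / 'APPEND LINES OF:', lv_diff, 'microseconds'.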
I can't say this is a full reason (there's probably more going on behind the scenes that I don't know), but some of the reasons definitely include the following.
A thing to note here: on small to medium data sets the difference in speed is negligible.
t_table2 = t_table1 just takes all of the data and copies it, overwriting t_table2 (it does NOT append). In some cases (such as when passing parameters) the data does not even get copied: the same data may be used, and a copy will only be produced if t_table2 needs to be changed.
APPEND LINES OF t_table1 TO t_table2 is basically a loop, which appends records row by row.
The reason I mention the append is that overwriting a table can be as simple as copying the data (or, in rare cases, just a data reference) from a to b, while an append performs checks on whether the table is sorted, indexed and such. Even if the table is in its most basic state, appending to an internal table is a slightly more complex procedure than overwriting a variable.

How should I store update history (auditing)

I am wondering how to properly store object update information (i.e. auditing). My constraints are:
I have one type of object with 10-25 properties that will be updated; these properties are well defined.
Each property is expected to be updated hundreds to hundreds of thousands of times over the life of the object (a hard limit can be set if needed).
The number of objects cannot be limited.
Most accesses will only read the last 100 updates.
My purist instinct wants me to put all updates in a table (per property). I can then query + merge the results to get my history, trying to keep it in normal form as much as possible.
On the other hand, I'd like to keep updates close to their parent object (I will never mix updates from 2 objects). This would make it easier to shard (keeping updates and objects on the same machine).
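For illustration only, the "purist" layout described above (one update table per property, then query + merge) might look like this; all names and types are hypothetical:
-- One update table per property (here: a hypothetical "color" property)
CREATE TABLE color_updates (
  object_id  BIGINT      NOT NULL,
  new_value  VARCHAR(64) NOT NULL,
  updated_at TIMESTAMP   NOT NULL,
  PRIMARY KEY (object_id, updated_at)
);

-- "query + merge": combine the per-property histories, newest first
SELECT 'color' AS property, updated_at, new_value
  FROM color_updates WHERE object_id = :id
UNION ALL
SELECT 'size' AS property, updated_at, new_value
  FROM size_updates WHERE object_id = :id
 ORDER BY updated_at DESC
 FETCH FIRST 100 ROWS ONLY;  -- "most accesses will only read the last 100 updates"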

Resetting Unused column to used in Oracle

A column in a table is marked as UNUSED. I want to make it a regular (used) column again, so that I can access it with SELECT.
I know that the next step after UNUSED is DROP.
But is there any way that I can retrieve the data from the UNUSED column?
"Is there any way that I can retrieve the data from the UNUSED column?"
No. The SET UNUSED syntax is a convenience for DBAs. Dropping a column is potentially a resource-intensive exercise. Marking it as UNUSED is a lot quicker, so it allows them to withdraw a column from use in busy times and run DROP UNUSED when the database is quieter. But the data is as lost as if they had just dropped the column.
The only way to retrieve the data would be to restore the column, through one of the various Flashback features (depending on what you've got configured) or else RMAN (or whichever Backup/Recovery solution you have in place).
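For reference, the workflow described above looks like this (the table and column names are hypothetical):
-- Quick, metadata-only: withdraw the column from use during busy hours
ALTER TABLE orders SET UNUSED COLUMN legacy_flag;

-- Later, when the database is quieter: the expensive physical drop
ALTER TABLE orders DROP UNUSED COLUMNS;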

NSMutableArray or NSMutableDictionary, which one is a better choice as the model for a turn based game?

I'm making a turn-based game; take Civilization as an example. All units on the map are instances of the Agent class, and I'm designing an AgentQueue class to control the order of the units' actions in each turn.
I can come up with two options for the model base: one is NSMutableArray, the other is NSMutableDictionary.
Plan A:
NSMutableArray contains the references to each Agent instance.
How to get next Agent: in the order of index in this array.
Insert and delete Agent: Use NSMutableArray's methods to insert or remove object at a given index.
When all the indexes have been called in one turn, the flag is reset and a new turn begins from index 0.
Plan B:
NSMutableDictionary contains the references to each Agent. The keys are Agent IDs, which will be generated in order or randomly when a new Agent is inserted. Each Agent instance has two references, called PreviousAgent and NextAgent.
How to get next Agent: use currentAgent.NextAgent
Insert and delete Agent: Add/remove an Agent instance in the dictionary, and modify the PreviousAgent/NextAgent of the related Agents (actually only three agents will be modified: the one being added/removed, the previous one, and the next one).
In the beginning pick one Agent as the last Agent. When this one is called, one turn is done and a new turn begins.
I was planning to use Plan A. However, considering that the whole array/dictionary will contain around 400 to 90000 agents, and that there will be lots of inserting and deleting happening, efficiency is very important. NSMutableArray modifies the index of every following object when you add or remove one that is not the last object, which sounds very expensive to me. An array is fast at finding the object at a given index, but finding an object in a dictionary with a given key is not slow either (I've read that a dictionary is actually a hash table, which is very quick at lookups). Then I came up with Plan B, which avoids modifying other objects.
Please help me to figure out which one is better in terms of efficiency. And let me know if I'm having the right thinking during the design process. Thanks!
Your assumptions about how NSMutableArray works are wrong.
Inserting or deleting an item from NSMutableArray takes time proportional to the distance to the start or the end of the array, whichever is shorter. Adding or removing close to the start or close to the end is fast.
But then of course your assumption that finding an object in an array is fast is also wrong. It's linear in the distance from the start of the array (or a full search if the object is not present). If you want fast access, that is NSDictionary. It's a hash table.
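If you do go with Plan B (dictionary plus linked agents), the constant-time unlink described in the question might look like this (a sketch under ARC; all names are the question's or hypothetical):
#import <Foundation/Foundation.h>

@interface Agent : NSObject
@property (nonatomic, strong) NSNumber *agentID;
// weak links avoid retain cycles; the dictionary holds the strong references
@property (nonatomic, weak) Agent *previousAgent;
@property (nonatomic, weak) Agent *nextAgent;
@end

@implementation Agent
@end

// Removing an agent touches only three objects, as the question describes
static void RemoveAgent(NSMutableDictionary *agents, Agent *agent) {
    agent.previousAgent.nextAgent = agent.nextAgent;
    agent.nextAgent.previousAgent = agent.previousAgent;
    [agents removeObjectForKey:agent.agentID];
}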

What is the cost of object reference in Scala?

Assume we build an object to represent some network (social, wireless, whatever). So we have some 'node' object to represent the KIND of network; different nodes might have different behaviors and so forth. The network has a MutableList of nodes.
But each node has neighbors, and these neighbors are also nodes. So somewhere, there has to be a list, per node, of all of the neighbors of that node--or such a list has to be generated on the fly whenever it is needed. If the list of neighbors is stored in the node objects, is it cheaper to store it (a) as a list of nodes, or (b) as a list of numbers that can be used to reference nodes out of the network?
Some code for clarity:
//approach (a)
class Network {
  val nodes = new MutableList[Node]
  // other stuff //
}
class Node {
  val neighbors = new MutableList[Node]
  // other stuff //
}
//approach (b)
class Network {
  val nodes = new MutableList[Node]
  val indexed_list = //(some function to get an indexed list of nodes)
  //other stuff//
}
class Node {
  val neighbors = new MutableList[Int]
  //other stuff//
}
Approach (a) seems like the easiest. My first question is whether this is costly in Scala 2.8, and the second is whether it breaks the principle of DRY?
Short answer: premature optimization is the root of etc. Use the clean reference approach. When you have performance issues there's no substitute for profiling and benchmarking.
Long answer: Scala uses the exact same reference machinery as Java, so this is really a JVM question more than a Scala question. Formally, the JVM spec doesn't say one word about how references are implemented. In practice they tend to be word-sized or smaller pointers that either point to an object or index into a table that points to the object (the latter helps garbage collectors).
Either way, an array of refs is about the same size as an array of ints on a 32-bit VM, or about double on a 64-bit VM (unless compressed oops are used). That doubling might be important to you or might not.
If you go with the ref based approach, each traversal from a node to a neighbor is a reference indirection. With the int based approach, each traversal from a node to a neighbor is a lookup into a table and then a reference indirection. So the int approach is more expensive computationally. And that's assuming you put the ints into a collection that doesn't box the ints. If you do box the ints then it's just pure craziness because now you've got just as many references as the original AND you've got a table lookup.
Anyway, if you go with the reference based approach then the extra references can make a bit of extra work for a garbage collector. If the only references to nodes lie in one array then the gc will scan that pretty damn fast. If they're scattered all over in a graph then the gc will have to work harder to track them all down. That may or may not affect your needs.
From a cleanliness standpoint the ref based approach is much nicer. So go with it and then profile to see where you're spending your time. That or benchmark both approaches.
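To make that concrete, here is a minimal sketch of the reference-based approach (using ArrayBuffer rather than MutableList, with a hypothetical connect helper):
import scala.collection.mutable.ArrayBuffer

class Node {
  val neighbors = ArrayBuffer.empty[Node] // direct references: one indirection per hop
}

class Network {
  val nodes = ArrayBuffer.empty[Node]

  // hypothetical helper: wire two nodes together symmetrically
  def connect(a: Node, b: Node): Unit = {
    a.neighbors += b
    b.neighbors += a
  }
}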
The question is - what kind of cost? Memory-wise, the b) approach would probably end up consuming more memory, since you have both the mutable lists and the boxed integers in those lists, plus another global structure holding all the indices. Also, it would probably be slower, because you need several levels of indirection to reach the neighbour node.
One important note - as soon as you start storing integers into mutable lists, they will undergo boxing. So, you will have a list of heap objects in both cases. To avoid this, and furthermore to conserve memory, in the b) approach you would have to keep a dynamically grown array of integers that are the indices of the neighbours.
Now, even if you modify approach b) as suggested above, and make sure the indexed list in the Network class is really an efficient structure (a direct lookup table or a hash table), you would still pay an indirection cost to find your Node, and memory consumption would still be higher. The only benefit I see is if you're concerned about running out of memory: you could keep some sort of table of weak references in the indexed_list, and recreate a Node object when you need it but can no longer find it there.
This is, of course, just a hypothesis, you would have to profile/benchmark your code to see the difference.
My suggestion would be to use something like an ArrayBuffer in Node and use it to store direct references to nodes.
If memory is a concern and you want to take the b) approach together with weak references, then I would further suggest rolling your own dynamically grown integer array for the neighbours, to avoid the boxing that comes with ArrayBuffer[Int].
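Such a hand-rolled buffer might look like this (a sketch: a plain Array[Int] underneath, so the indices stay unboxed):
// Grows like ArrayBuffer but stores primitive Ints without boxing
final class IntBuffer(initialCapacity: Int = 16) {
  private var data  = new Array[Int](initialCapacity)
  private var count = 0

  def +=(value: Int): Unit = {
    if (count == data.length) {          // full: double the backing array
      val bigger = new Array[Int](data.length * 2)
      Array.copy(data, 0, bigger, 0, count)
      data = bigger
    }
    data(count) = value
    count += 1
  }

  def apply(i: Int): Int = data(i)
  def length: Int = count
}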