Creating a local variable instead of calling a method to get data - is it more efficient?

My question is about writing efficient code. Please let me know which of the two approaches below is more efficient.
There's a method call to get an object, for example:
relationship.getCommerceItem()
But we need to call this method multiple times, sometimes even within a single statement. So I'm planning to create a local variable to store the return value and use it in place of the repeated method calls, like below:
CommerceItem commerceItem = relationship.getCommerceItem();
Now, which approach is more efficient, and why?
Consider that this code will be executed in an environment that receives many thousands of requests.

It depends on whether or not the logic in the called method needs to be run every time. In other words, does the return value change between calls?
If not, saving it in a variable saves you the resources needed for the repeated method calls (which is the most sensible thing to do, IMO).
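For illustration, a minimal Java sketch of the two versions; CommerceItem, getQuantity() and process() are illustrative stand-ins, not taken from the question:
// Repeated calls: the getter runs every time it appears.
if (relationship.getCommerceItem() != null
        && relationship.getCommerceItem().getQuantity() > 0) {
    process(relationship.getCommerceItem());
}
// Hoisted into a local variable: the getter runs exactly once,
// and every later read is a plain local-variable access.
CommerceItem commerceItem = relationship.getCommerceItem();
if (commerceItem != null && commerceItem.getQuantity() > 0) {
    process(commerceItem);
}
Note that the JIT can often inline and eliminate repeated calls to a trivial getter on its own, so the local variable is mainly a readability and correctness win: the value cannot change between reads.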

Completely reject a GET event in ABAP

I would like to know how many times the GET event would happen without actually calling it (or calling it only once).
So far, I know how to get the total number of iterations: lines( (ldb_include)node_table[] ). But this only works once GET has been called, and once GET has been called it iterates through node_table; if it has 8798237 entries, they will all be PUT. Since I already have the number of iterations (which is all I need), I don't want all the values to be put.
I can leave the GET by using REJECT, but that only skips to the next iteration... currently, I don't know how to quit the GET completely.
I've tried using STOP, but it raises the END-OF-SELECTION event immediately, which is not the idea...
AT SELECTION-SCREEN OUTPUT.
  " process the selection screen
START-OF-SELECTION.
GET <node_tab>.
  " lv_total = lines( (ldb_include)node_table[] )
  " some sort of REJECT applied to all GET events
  " continue processing the rest of the code, using lv_total
END-OF-SELECTION.
  " display the output
I can achieve it using a flag, like:
IF first_execution = abap_true.
  " process it
ELSE.
  REJECT <node_tab>.
ENDIF.
But this would still iterate through all the GET events, which defeats the purpose. I would like to understand whether there is a smarter (possibly more elegant) way to process only the first GET and skip all the others.
It's like having a database view with joined tables where one of the tables is not needed: how do you make the program read the view but tell the database not to read that one table...
Impossible!
The only solution is to copy and adapt the Logical Database. As simple as that.
And, of course, logical databases have been obsolete for a long time, so prefer a database join or anything else better.
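For the concrete case above, where only the row count is needed, a join lets the database compute the count without transferring any rows. A minimal sketch, assuming hypothetical tables ztab_header and ztab_item standing in for the logical database's nodes:
" Ask the database for the count directly; nothing is PUT or fetched.
SELECT COUNT(*)
  FROM ztab_header AS h
  INNER JOIN ztab_item AS i
    ON i~header_key = h~header_key
  INTO @DATA(lv_total).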

Using pymc.potential to prevent evaluation of a function at meaningless parameter values

I am building a PyMC model which must evaluate a very CPU-expensive function (up to 1 second per call on very decent hardware). I am trying to limit the explored parameter space to meaningful solutions by means of a potential (the sum of a list of my variables has to stay within a given range). This works, but I noticed that even when my potential returns an infinite value and forbids the parameter choice, the expensive function still gets evaluated. Is there a way to prevent that? Can one force the sampler to use a given evaluation sequence (pick up the necessary variables, check whether the potential is OK, and proceed only if allowed)?
I thought of moving the check inside the function itself and using it to determine whether to proceed or return immediately, but is there a better way?
Jean-François
I am not aware of a way of ordering the evaluation of the potentials. This might not be the best way of doing it, but you might be able to check whether the parameters are within reasonable bounds at the beginning of the simulation. If they are not, you can return a value that will cause your posterior to be zero.
Another option is to create a function for your likelihood. At the beginning of this function you could check whether the parameters are within reasonable limits. If they are not, you can return -inf without running your simulation. If they are reasonable, you can run your model and calculate the log(p).
This is definitely not an elegant solution but it should work.
Full disclosure - I am not by any means a pymc expert.
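A minimal PyMC 2-style sketch of that second suggestion; expensive_simulation, observed_data, lower, upper, and sigma are all placeholders standing in for the real problem:
import numpy as np
import pymc

lower, upper, sigma = 2.0, 8.0, 0.5   # assumed bounds and noise level
observed_data = np.zeros(10)          # stand-in for the real data

def expensive_simulation(a, b):       # stand-in for the ~1 s/call model
    return np.full(10, a + b)

a = pymc.Uniform('a', 0.0, 10.0)
b = pymc.Uniform('b', 0.0, 10.0)

@pymc.observed
def likelihood(value=observed_data, a=a, b=b):
    # Guard clause: reject before paying for the expensive call.
    if not (lower < a + b < upper):
        return -np.inf
    prediction = expensive_simulation(a, b)
    return -0.5 * np.sum(((value - prediction) / sigma) ** 2)

model = pymc.MCMC([a, b, likelihood])
Because the bounds check sits at the top of the likelihood, the sampler pays for expensive_simulation only when the parameters are admissible.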

Is there a way for VBA UDF to "know" what other functions will be run?

Assume I have a UDF that will be used in a worksheet 100,000+ times. Is there a way, within the function, for it to know how many more times it is going to be called in the batch? Basically, what I want to do is have every function call add its work to a shared to-do list, so that I can do something like:
IF remaining functions to be executed after this one = 0 then ...
Is there a way to do this?
Background:
I want to make a UDF that performs SQL queries with the user just giving parameters (date, hour, node, type). This is pretty easy to make if you're willing to actually execute the SQL query every time the function runs. I know it's easy because I did this, and it was ridiculously slow. My new idea is to have the function first check whether the data it is looking for exists in a global cache variable and, if it doesn't, add the request to a global "job-list" variable.
What I want is for the last function called to then go through the job list, perform the fewest possible SQL queries, and fill the global cache variable. Once the cache variable is full, it would trigger a table refresh so that all the other functions get called again; on the subsequent call they'll find the data they need in the cache.
Firstly:
VBA UDF performance is extremely sensitive to the way the UDF is coded:
see my series of posts about writing efficient VBA UDFs:
http://fastexcel.wordpress.com/2011/06/13/writing-efficient-vba-udfs-part-3-avoiding-the-vbe-refresh-bug/
http://fastexcel.wordpress.com/2011/05/25/writing-efficient-vba-udfs-part-1/
You should also consider using an Array UDF to return multiple results:
http://fastexcel.wordpress.com/2011/06/20/writing-efiicient-vba-udfs-part5-udf-array-formulas-go-faster/
Secondly:
The 12th post in this series outlines using the AfterCalculate event and a cache
http://fastexcel.wordpress.com/2012/12/05/writing-efficient-udfs-part-12-getting-used-range-fast-using-application-events-and-a-cache/
Basically, the approach you need is for the UDF to check the cache and, if the data is not current or available, add a request to the queue. Then use the AfterCalculate event to process the queue and, if necessary, trigger another recalculation.
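A minimal VBA sketch of that shape, with hypothetical names throughout (Cache, JobList, QueryNode); the AfterCalculate handler that drains JobList, runs the batched SQL, and triggers the recalculation is assumed to live in a class module holding a WithEvents Application reference, as described in the linked post:
Public Cache As Object        ' Scripting.Dictionary keyed by parameters
Public JobList As Collection  ' cache misses to batch after calculation

Public Function QueryNode(dt As Date, hr As Long, node As String) As Variant
    Dim key As String
    key = dt & "|" & hr & "|" & node
    If Cache Is Nothing Then Set Cache = CreateObject("Scripting.Dictionary")
    If JobList Is Nothing Then Set JobList = New Collection
    If Cache.Exists(key) Then
        QueryNode = Cache(key)      ' hit: serve from the cache
    Else
        JobList.Add key             ' miss: queue for one batched query
        QueryNode = CVErr(xlErrNA)  ' placeholder until the next recalc
    End If
End Function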
Performing 100,000 SQL queries from an Excel spreadsheet seems like a poor design. Creating a caching mechanism on top of them seems to compound the problem, making it more complicated than it probably needs to be. There are some circumstances where this might be appropriate, but I would consider other design approaches instead.
The most obvious is to take the data from the Excel spreadsheet and load it into a table in the database. Then use the database to do the processing on all the rows at once. The final step is to read the results back into Excel.
I find that the best way to get large numbers of rows from Excel into a database is to save the Excel file as CSV and bulk insert it.
This approach may not work for your problem. In general, though, set-based approaches running in the database are going to perform much better.
As for the caching mechanism, if you have to go down that route, I can imagine a function with the following pseudo-code:
Check whether the input values are in the cache.
If so, read the values from the cache.
Else, do the complex processing.
Load the resulting values into the cache.
This logic could go in the function. As #Bulat suggests, though, it is probably better to add an additional caching layer around the function.

What state is saved between rerunning queries in Linqpad?

What state is saved between reruns of queries in LINQPad? I presumed none, so running a script twice should give the same results both times.
However, run the C# Program below twice in the same LINQPad tab. You'll find that the first time it prints an empty list, and the second time a list containing the message 'hey'. What's going on?
System.ComponentModel.TypeDescriptor.GetAttributes(typeof(String)).OfType<ObsoleteAttribute>().Dump();
System.ComponentModel.TypeDescriptor.AddAttributes(typeof(String),new ObsoleteAttribute("hey"));
LINQPad caches the application domain between queries, unless you request otherwise in Edit | Preferences (or press Ctrl+Shift+F5 to clear the app domain). This means that anything stored in static variables will be preserved between queries, assuming the types are identical between runs. This is why you're seeing the additional TypeDescriptor attribute in your code, and it also explains why you often see a performance advantage on subsequent query runs (since many things are cached one way or another in static variables).
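You can see the cached app domain directly with a static field. A minimal sketch: run it twice as a C# Program in the same tab, and the counter keeps climbing until you press Ctrl+Shift+F5.
void Main()
{
    (++runCount).Dump("Runs in this app domain");
}

static int runCount; // survives between runs while the app domain is cached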
You can take advantage of this explicitly with LINQPad's Cache extension method:
var query = <someLongRunningQuery>.Cache();
query.Select (x => x.Name).Dump();
Cache() is a transparent extension method that returns exactly what it was fed if the input was not already seen in a previous query. Otherwise, it returns the enumerated result from the previous query.
Hence, if you change the second line and re-execute the query, it will run quickly, since the long-running query's results will be supplied from the cache instead of having to be re-computed.

groovy sql eachRow and rows methods

I am new to Grails and Groovy.
Can anyone please explain to me the difference between these two groovy sql methods
sql.eachRow
sql.rows
Also, which is more efficient?
I am working on an application that retrieves data from the database (the result set is very large) and writes it to a CSV file or returns it in JSON format.
I was wondering which of the two methods mentioned above to use to get this done faster and more efficiently.
Can anyone please explain to me the difference between these two groovy sql methods: sql.eachRow, sql.rows
It's difficult to tell exactly which two methods you're referring to, because there are a large number of overloaded versions of each method. However, in all cases, eachRow returns nothing:
void eachRow(String sql, Closure closure)
whereas rows returns a list of rows:
List rows(String sql)
So if you use eachRow, the closure passed in as the second parameter should handle each row, e.g.
sql.eachRow("select * from PERSON where lastname = 'murphy'") { row ->
println "$row.firstname"
}
whereas if you use rows, the rows are returned and must therefore be handled by the caller, e.g.
sql.rows("select * from PERSON where lastname = 'murphy'").each { row ->
    println "$row.firstname"
}
Also, which is more efficient?
This question is almost unanswerable. Even if I had implemented these methods myself, there's no way of knowing which one will perform better for you, because I don't know:
what hardware you're using
what JVM you're targeting
what version of Groovy you're using
what parameters you'll be passing
whether this method is a bottleneck for your application's performance
or any of the other factors that influence a method's performance that cannot be determined from the source code alone. The only way you can get a useful answer to the question of which method is more efficient for you is by measuring the performance of each.
Despite everything I've said above, I would be amazed if the performance difference between these two was in any way significant, so if I were you, I would choose whichever one you find more convenient. If you find later on that this method is a performance bottleneck, try using the other one instead (but I'll bet you a dollar to a dime it makes no difference).
If we set aside minor syntax differences, there is one difference that seems important. Let's consider
sql.rows("select * from my_table").each { row -> doIt(row) }
vs
sql.eachRow("select * from my_table") { row -> doIt(row) }
The first one opens a connection, retrieves the results, closes the connection, and returns them. Now you can iterate over the results while the connection is released. The drawback is that you now have the entire result list in memory, which in some cases might be a lot.
eachRow, on the other hand, opens a connection and, while keeping it open, executes your closure for each row. If your closure operates on the database and requires another connection, your code will consume two connections from the pool at the same time. The connection used by eachRow is released after it iterates through all the resulting rows. Also, if you don't perform any database operations but the closure takes a while to execute, you will be blocking one database connection until eachRow completes.
I am not 100% sure, but eachRow possibly allows you to avoid keeping all the resulting rows in memory and to access them through a cursor - this may depend on the database driver.
If you don't perform any database operations inside your closure, the closure executes fast, and the result list is big enough to impact memory, then I'd go for eachRow. If you do perform DB operations inside the closure, or each closure call takes significant time while the result list stays manageable, then go for rows.
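For the CSV case in the question, that logic points to eachRow: stream each row straight to the file so the full result set never sits in memory. A minimal sketch, assuming a hypothetical person table and an in-memory H2 connection:
import groovy.sql.Sql

def sql = Sql.newInstance('jdbc:h2:mem:demo', 'sa', '', 'org.h2.Driver')
new File('out.csv').withWriter { w ->
    sql.eachRow('select id, name from person') { row ->
        w.writeLine "${row.id},${row.name}"   // one row at a time
    }
}
sql.close()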
They differ in signature only; both support result-set paging, so both will be efficient. Use whichever fits your code.