I need to perform some queries on a rather large table. How do I check whether a query has finished?
The main problem is that the queries can take up to 10 minutes, and I want to tell the user, and hence the web browser, that it's still running. Simply waiting for $sth->fetch* to finish is not an option, as it will "pause" the script until there is data to be fetched.
I checked the documentation but there seems to be no function like $dbh->has_finished() or $dbh->has_data().
Asynchronous database queries are possible with an event loop. I suggest you take a look at AnyEvent::DBI. The trick is to use a condition variable: the query is executed asynchronously, and when it finishes it invokes a callback that broadcasts on the condition variable. You can then use $cv->ready to check whether the query is finished.
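A minimal sketch of that pattern, assuming a MySQL DSN; the credentials, table, and query are placeholders:

use AnyEvent;
use AnyEvent::DBI;

my $cv  = AnyEvent->condvar;
my $dbh = new AnyEvent::DBI "DBI:mysql:database=test", "user", "password";

# the query runs asynchronously; the callback fires when it completes
$dbh->exec("SELECT id, total FROM big_table WHERE type_id = ?", 2, sub {
    my ($dbh, $rows, $rv) = @_;
    $cv->send($rows);    # broadcast on the condition variable
});

# elsewhere you can poll without blocking:
#   if ($cv->ready) { ... finished, results are in $cv->recv ... }
# or block until the result arrives:
my $rows = $cv->recv;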
When we run a query in the BigQuery environment, the results are cached in a temporary table. When we run the same query again, subsequent runs fetch the results from that cache for the next 24 hours, with some exceptions. Now my use case is: on a subsequent run, I want to know from which jobId the cached results came, i.e. the job from the first run of the query.
I have checked all the Java docs related to queries but didn't find that info. There is the cacheHit variable, which tells you whether the query was served from the cache or not. I want to go one step further and find out from which jobId the results were fetched. I expected that this method might expose that info, but I always get a null value from it. I also want to know what parentJob means in the BigQuery context.
It's unclear why you'd even care about this other than as a technical exercise. If you want to build your own application caching layer, that's a different concern. More details about query caching can be found at https://cloud.google.com/bigquery/docs/cached-results.
The easiest way to do this would probably be to traverse jobs.list until you find a job that has the same destination table (it will be prefaced with an anonymous prefix) and where the cacheHit stat is false or not present.
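A rough sketch of that traversal with the google-cloud-bigquery Java client; findSourceJob and targetTable are hypothetical names, and paging options and error handling are omitted:

import com.google.cloud.bigquery.*;

public class CacheSourceFinder {

    // targetTable: the anonymous destination table of your cache-hit run
    public static Job findSourceJob(BigQuery bigquery, TableId targetTable) {
        for (Job job : bigquery.listJobs().iterateAll()) {
            JobConfiguration config = job.getConfiguration();
            if (!(config instanceof QueryJobConfiguration)) {
                continue;    // only query jobs write cached result tables
            }
            TableId dest = ((QueryJobConfiguration) config).getDestinationTable();
            JobStatistics.QueryStatistics stats = job.getStatistics();
            Boolean cacheHit = stats.getCacheHit();
            // same destination table, and not itself served from cache
            if (targetTable.equals(dest) && (cacheHit == null || !cacheHit)) {
                return job;
            }
        }
        return null;    // original job not found (e.g. past retention)
    }
}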
Your inquiry about parentJob is unrelated to this exercise. It's for finding all the child jobs created as part of a script or multi-statement execution. More information about this can be found at https://cloud.google.com/bigquery/docs/reference/standard-sql/scripting-concepts.
Let's say we have a SQL Server database and a table Orders with columns id, date, total, type_id. This table is being updated very frequently (about every 5 seconds) and I would like to use an event listener, which is triggered on every INSERT.
Even more: I would like the event listener to send the newly inserted data to the listening function. Furthermore, I would like the trigger to work with a SQL WHERE condition (although this last part with the WHERE clause is not mandatory for me), so for example only trigger for a specific data type: WHERE Orders.type_id=2.
AFAIK one would use SQL Server notifications for this. If so, I would like to understand it more deeply. The way I understand it, the trigger would run every 5 seconds; will the trigger wait for the script to finish its work?
What does this mean for the performance - will my database be much slower? How can this be measured?
And what is the best way to implement this? I've read that there are different approaches - using C# or VB.
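For reference, a minimal C# sketch of the SQL Server Query Notifications (SqlDependency) route mentioned above; the connection string is a placeholder, and Service Broker must be enabled on the database:

using System;
using System.Data.SqlClient;

class OrderListener
{
    // hypothetical connection string
    const string ConnStr = "Data Source=.;Initial Catalog=Shop;Integrated Security=true";

    static void Main()
    {
        SqlDependency.Start(ConnStr);
        Subscribe();
        Console.ReadLine();    // keep listening until Enter is pressed
        SqlDependency.Stop(ConnStr);
    }

    static void Subscribe()
    {
        var conn = new SqlConnection(ConnStr);
        // notification queries need two-part table names and an explicit column list
        var cmd = new SqlCommand(
            "SELECT id, [date], total, type_id FROM dbo.Orders WHERE type_id = 2", conn);
        var dep = new SqlDependency(cmd);
        dep.OnChange += (s, e) =>
        {
            conn.Dispose();
            Subscribe();       // a notification fires only once: re-subscribe
        };
        conn.Open();
        using (var reader = cmd.ExecuteReader())
        {
            while (reader.Read()) { /* hand the current rows to your handler */ }
        }
    }
}

The notification only tells you that something changed; re-running the query fetches the new rows and re-arms the subscription.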
I'm wondering whether to use Hangfire to schedule an update task. I want this task to run continuously: when the task finishes, I want to execute it again.
I don't know whether the best method is a RecurringJob or a loop with a BackgroundJob.
What do you recommend? Are there any other options?
You can use a RecurringJob, which triggers every x minutes or x hours (based on your requirement; you can set a CRON expression) and runs the task after every such interval. You need to use it in conjunction with the DisableConcurrentExecution attribute so that multiple instances of the same task are not triggered; this attribute also ensures that a second instance is only processed once the first one has completed.
Alternatively, you can use BackgroundJob to enqueue a task, but that processes the job only once. In this approach you need to write some code that checks the job's status and re-enqueues the same job once the first run has completed.
I would suggest that the best way is to use RecurringJob.AddOrUpdate in conjunction with DisableConcurrentExecution.
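A minimal sketch of that combination; the UpdateTask class, job id, and schedule are illustrative:

using Hangfire;

public class UpdateTask
{
    // a second run is only processed once the previous one has finished
    [DisableConcurrentExecution(timeoutInSeconds: 600)]
    public void Run()
    {
        // ... do the update work here ...
    }
}

// at application startup, after configuring Hangfire:
RecurringJob.AddOrUpdate<UpdateTask>(
    "update-task",      // stable recurring-job id
    t => t.Run(),
    Cron.Minutely());   // or any CRON expression that fits your interval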
Assume I have a UDF that will be used in a worksheet 100,000+ times. Is there a way, within the function, for it to know how many more times it is going to be called in the batch? Basically what I want to do is have every function create a to-do list of work to do. I want to do something like:
IF remaining functions to be executed after this one = 0 then ...
Is there a way to do this?
Background:
I want to make a UDF that will perform SQL queries with the user just giving parameters (date, hour, node, type). This is pretty easy to make if you're willing to actually execute the SQL query every time the function is run. I know it's easy because I did this, and it was ridiculously slow. My new idea is to have the function first check whether the data it is looking for exists in a global cache variable and, if it doesn't, add it to a global "job list" variable.
What I want is for the last function call to go through the job list, perform the fewest possible SQL queries, and fill the global cache variable. Once the cache variable is full, it would do a table refresh to make all the other functions get called again, since on the subsequent call they'll find the data they need in the cache.
Firstly:
VBA UDF performance is extremely sensitive to the way the UDF is coded:
see my series of posts about writing efficient VBA UDFs:
http://fastexcel.wordpress.com/2011/05/25/writing-efficient-vba-udfs-part-1/
http://fastexcel.wordpress.com/2011/06/13/writing-efficient-vba-udfs-part-3-avoiding-the-vbe-refresh-bug/
You should also consider using an Array UDF to return multiple results:
http://fastexcel.wordpress.com/2011/06/20/writing-efiicient-vba-udfs-part5-udf-array-formulas-go-faster/
Secondly:
The 12th post in this series outlines using the AfterCalculate event and a cache
http://fastexcel.wordpress.com/2012/12/05/writing-efficient-udfs-part-12-getting-used-range-fast-using-application-events-and-a-cache/
Basically, the approach you would need is for the UDF to check the cache and, if the data is not current or available, add a request to the queue. Then use the after-calculation event to process the queue and, if necessary, trigger another recalculation.
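A bare-bones sketch of that pattern; Cache, Queue, CachedQuery, and RunBatchedQueries are all hypothetical names:

' in a standard module
Public Cache As Object    ' Scripting.Dictionary: key -> cached result
Public Queue As Object    ' Scripting.Dictionary: keys still to fetch

Public Function CachedQuery(ByVal Key As String) As Variant
    If Cache Is Nothing Then Set Cache = CreateObject("Scripting.Dictionary")
    If Queue Is Nothing Then Set Queue = CreateObject("Scripting.Dictionary")
    If Cache.Exists(Key) Then
        CachedQuery = Cache(Key)        ' hit: serve straight from the cache
    Else
        Queue(Key) = True               ' miss: queue it for AfterCalculate
        CachedQuery = CVErr(xlErrNA)    ' placeholder until the next recalc
    End If
End Function

' in a class module holding "Private WithEvents App As Application"
Private Sub App_AfterCalculate()
    If Queue Is Nothing Then Exit Sub
    If Queue.Count = 0 Then Exit Sub
    RunBatchedQueries Queue, Cache      ' fewest SQL queries, fill the cache
    Queue.RemoveAll
    Application.CalculateFull           ' UDFs now find their data in the cache
End Sub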
Performing 100,000 SQL queries from an Excel spreadsheet seems like a poor design. Creating a caching mechanism on top of them seems to compound the problem, making it more complicated than it probably needs to be. There are some circumstances where this might be appropriate, but I would consider other design approaches instead.
The most obvious is to take the data from the Excel spreadsheet and load it into a table in the database. Then use the database to do the processing on all the rows at once. The final step is to read the result back into Excel.
I find that the best way to get large numbers of rows from Excel into a database is to save the Excel file as csv and bulk insert them.
This approach may not work for your problem. In general, though, set-based approaches running in the database are going to perform much better.
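For example, a rough sketch of the csv-then-bulk-insert step, assuming SQL Server; the file, table, and column names are illustrative:

BULK INSERT dbo.ExcelParams
FROM 'C:\temp\params.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);

-- then answer all 100,000 lookups in one set-based query
SELECT p.[date], p.[hour], p.node, p.type, d.value
FROM dbo.ExcelParams AS p
JOIN dbo.SourceData AS d
  ON d.[date] = p.[date] AND d.[hour] = p.[hour]
 AND d.node = p.node AND d.type = p.type;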
As for the caching mechanism, if you have to go down that route, I can imagine a function with the following pseudo-code:
Check if input values are in cache.
If so, read values from cache.
Else do complex processing.
Load values in cache.
This logic could go in the function. As @Bulat suggests, though, it is probably better to add an additional caching layer around the function.
I have a query in MS Access whose WHERE clause is:
WHERE (((tb_KonzeptFunktionen.Konzept)=[Formulare]![frm_Fahrzeug]![ID]));
It takes a long time to run, but when I delete this WHERE clause the query runs in less than a second.
Can I say that passing [Formulare]![frm_Fahrzeug]![ID] as a parameter is not efficient? Or is looking up the control value slowing it down? If so, how can I solve this problem?
The db engine should retrieve the control's value almost instantaneously. If that WHERE condition slows down your query significantly, it is more likely due to extra work the db engine must perform to retrieve the matching rows. You can check this assumption by temporarily substituting a static known value in place of the control's value.
WHERE tb_KonzeptFunktionen.Konzept=1;
If the version with the static value is equally slow, create an index on tb_KonzeptFunktionen.Konzept and try again.
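In Access you can add the index in table Design View, or run a one-off DDL query like this (the index name is arbitrary):

CREATE INDEX idx_Konzept ON tb_KonzeptFunktionen (Konzept);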