RavenDB and stale indexes on application startup

RavenDB and stale indexes on application startup - ravendb

I have an application that:
Uses an embedded RavenDB database
With the Monotonic read option
With static indexes
I've noticed that when the application starts up, the indexes are all temporarily stale even if they weren't stale when the application shutdown (cleanly) the previous run.
For example, I run the app, insert 10 records, and wait for the indexes to be up to date. I query it and see the results I expect. Then I close the app, start it again, and query, and I get no results. A second later the indexes have updated and I can now get results back. If I repeat this, sometimes I get 10 results, sometimes 0 results, sometimes 2 results.
My assumption was that when querying an index at time T, I would always see "at least as consistent" results at T + 1. However if the server restarts during this time I seem to see less consistent results at T + 1. This isn't eventual consistency, it's eventual inconsistency! :)
My questions are:
Was my assumption wrong/stupid?
Is there a way to prevent this from happening?
Is this behavior just a starting up thing, or could this potentially happen at any time? For example, could I render 10 results, then on refresh 0 results, and then again 10 results the next refresh just because Raven decided it wanted to re-index everything?

To answer your questions,
Was my assumption wrong/stupid?
No, as i understand it your assumption is correct, you may see stale results, but shouldn't see less stale results, unless the index was rebuilt from scratch between T & T+1 and you hit it just after the index was reset.
Is there a way to prevent this from happening?
The only way i know of is to be very sure that the indexes were cleanly closed. There are usually lock files left in the index folders that indicate if this isn't the case. The debug level logs from Raven will indicate if a rebuild is initiated as well.
If all else fails, you may need to have a process that blocks until all indexes (or the ones you care about) are fresh before proceeding with the rest of the application start-up process.
Is this behavior just a starting up thing, or could this potentially happen at any time? For example, could I render 10 results, then on refresh 0 results, and then again 10 results the next refresh just because Raven decided it wanted to re-index everything?
Yes, this is possible if the index is reset, but as far as i know that will only happen (after startup) if manually triggered through the API or management studio.

Related

Load balancing SQL reads while batch-processing?

Given an SQL table with timestamped records. Every once in a while an application App0 does something like foreach record in since(certainTimestamp) do process(record); commitOffset(record.timestamp), i.e. periodically it consumes a batch of "fresh" data, processes it sequentially and commits success after each record and then just sleeps for reasonable time (to accumulate yet another batch). That works perfect with single instance.. however how to load balance multiple ones?
In exactly the same environment App0 and App1 concurrently competite for the fresh data. The idea is that ready query executed by the App0 must not overlay with the same read query executed by the App1 - such that they never try to process the same item. In other words, I need SQL-based guarantees that concurrent read queries return different data. Is that even possible?
P.S. Postgres is preferred option.

The problem description is rather vague on what App1 should do while App0 is processing the previously selected records.
In this answer, I make the following assumptions:
all Apps somehow know what the last certainTimestamp is and it is the same for all Apps whenever they start a DB query.
while App0 is processing, say the 10 records it found when it started working, new records come in. That means, the pile of new records with respect to certainTimestamp grows.
when App1 (or any further App) starts, the should process only those new records with respect to certainTimestamp that are not yet being handled by other Apps.
yet, if on App fails/crashes, the unfinished records should be picked the next time another App runs.
This can be achieved by locking records in many SQL databases.
One way to go about this is to use
SELECT ... FOR UPDATE SKIP LOCKED
This statement, in combination with the range-selection since(certainTimestamp) selects and locks all records matching the condition and not being locked currently.
Whenever a new App instance runs this query, it only gets "what's left" to do and can work on that.
This solves the problem of "overlay" or working on the same data.
What's left is then the definition and update of the certainTimestamp.
In order to keep this answer short, I don't go into that here and just leave the pointer to the OP that this needs to be thought through properly to avoid situations where e.g. a single record that cannot be processed for some reason keeps the certainTimestamp at a permanent minimum.

IBM i SQL - dump plan cache

I run heavy query on IBM i. First time it takes a long time, Subsequent times are much faster. It seems to be creating temporary index. How can I remove this index, so I can re-test like the first time?

Use the Visual Explain (VE) tool in the Run SQL Scripts component of ACS to see the differences between runs.
If indeed the issue is a system maintained temporary index (MTI), you can track it down via the schema's tooling in ACS and delete it if you so desire.
However, an MTI only gets deleted by the system when the system reboots (IPL).
So if you seeing differences without rebooting the server, I suspect the differences are caused by psuedo-closing. By default, once the DB see's the same query a few times (3 is the default), instead of hard closing it's cursors, it will psuedo-close them.
Again, VE will show "hard opens" and "pseudo opens".
To get the pseduo closed cursors to hard close, simply disconnect and reconnect.

Bigquery Cache Not Working

I noticed that BigQuery no longer cache the same query even I have chosen to use cache in the GUI (both Alpha and Classic). I didn't edit the query at all, just keep clicking run query button and every time GUI executed the query without using cache results.
It happens to my PHP script as well. Before, it was enable to use cache and came back with results very quick and now it executes the query every time even the same query has been executed minutes ago. I can confirm the behaviour in the logs.
I am wondering if there is anything changed in the last few weeks? Or some kind of account level settings control this? Because it was working fine for me.

As per official docs here cache is disable when:
...any of the tables referenced by the query have recently received
streaming inserts...
Even if you are streaming to one partition, and then querying to another, this will invalidate caching functionality for the whole table. There is this feature request opened where it is requested to be able to hit cache when doing streaming inserts to one partition but querying a different partition.
EDIT***:
After some investigation I've found out that some months ago there was an issue going on which was allowing to hit the cache even streaming inserts were being made. This was not expected behavior, and therefore it got solved in May. I guess this is the change you have experienced and what you are talking about.
Docs have not changed related to this, and they aren't/weren't incorrect. Just the previous behavior was the incorrect one.

UI testing fails with error "Failed to get snapshot within 15.0s"

I am having a table view with a huge amount of cells. I tried to tap a particular cell from this table view. But the test ending with this error:
Failed to get snapshot within 15.0s
I assume that, the system will take snapshot of the whole table view before accessing its element. Because of the huge number of cells, the snapshot taken time was not enough (15 secs is may be the system default time).
I manually set the sleep time / wait time (I put 60 secs). Still I can not access the cells after 60 seconds!!
A strange thing I found is, before accessing the cell, I print the object in Debugger like this:
po print XCUIApplication().cells.debugDescription
It shows an error like
Failed to get snapshot within 15.0s
error: Execution was interrupted, reason: internal ObjC exception breakpoint(-3)..
The process has been returned to the state before expression evaluation.
Again if I print the same object using
po print XCUIApplication().cells.debugDescription
Now it will print all cells in the tableview in Debugger view.
No idea about why this happens. Does anyone faced similar kind of issues? Help needed!!

I assume that, the system will take snapshot of the whole table view before accessing its element.
Your assumption is correct but there is even more to the story. The UI test requests a snapshot from the application. The application takes this snapshot and then sends the snapshot to the test where the test finally evaluates the query. For really large snapshots (like your table view) that means that:
The snapshot takes a long time for the application to generate and
the snapshot takes a long time to send back to the test for query evaluation.
I'm at WWDC 2017 right now and there is a lot of good news about testing - specifically some things that address your exact problem. I'll outline it here but you should go watch WWDC 2017 Session 409 and skip to timestamp 17:10.
The first improvement is remote queries. This is where the test will transmit the query to the application, the application will evaluate that query remotely from the test and then only transmit back the result of that query. Expect a marginal improvement from this enhancement ~20% faster and ~30% less memory.
The second improvement is query analysis. This enhancement will reduce the size of snapshots taken by using a minimum attribute set for taking snapshots. This means that full snapshots of the view are not taken by default when evaluating a query. Example is if you're querying to tap a button, the snapshot is going to be limited to buttons within the view. This means writing less ambiguous queries are even more important. I.e. if you want to tap a navigation bar button, specify it in the query like app.navigationBars.buttons["A button"]. You'll see even more performance improvement from this enhancement ~50% faster and ~35% less memory
The last and most notable (and dangerous) improvement is what they're calling the First Match API. This comes with some trade offs/risks but offers the largest performance gain. It offers a new .firstMatch property that returns the first match for a XCUIElement query. Ambiguous matches that result in test failures will not occur when using .firstMatch, so you run the risk of evaluating or performing an action on an XCUIElement that you didn't intend to. Expect a performance increase of ~10x faster and no memory spiking at all.
So, to answer your question - update to Xcode 9, macOS High Sierra and iOS 11. Utilize .firstMatch where you can with highly specific queries and your issue of timing out snapshots should be solved. In fact the time out issue you're experiencing might already be solved with the general improvements you'll receive from remote queries and query analysis without you having to use .firstMatch!

mysql slow on first query, then fast for related queries

I have been struggling with a problem that only happens when the database has been idle for a period of time for the data queried. The first query will be extremely slow, on the order of 30 seconds and then related queries will be fast like 0.1 seconds. I am assuming this is related to caching, but I have been unable to find the cause of it.
Changing the mysql variables tmp_table_size, max_heap_table_size to a larger size had no effect except to create the temp tables in memory.
I do not think this is related to the query itself as it is well indexed and after the first slow query, variants of the same query do not show up in the slow query log. I am most interested in trying to determine the cause of this or a way to reset the offending cache so I can troubleshoot the issue.

Pages of the innodb data files get cached in the innodb buffer pool. This is what you'd expect. Reading files is slow, even on good hard drives, especially random reads which is mostly what databases see.
It may be that your first query is doing some kind of table scan which pulls a lot of pages into the buffer pool, then accessing them is fast. Or something similar.
This is what I'd expect.
Ideally, use the same engine for all tables (exceptions: system tables, temporary tables (perhaps) and very small tables or short-lived ones). If you don't do this then they have to fight for ram.
Assuming all your tables are innodb, make the buffer pool use up to 75% of the server's physical ram (assuming you don't run too many other tasks on the machine).
Then you will be able to fit around 12G of your database into ram, so once it's "warmed up", the "most used" 12G of your database will be in ram, where accessing it is nice and fast.
Some users of mysql tend to "warm up" production servers following a restart by sending them queries copied from another machine for a while (these will be replication slaves) until they add them into their production pool. This avoids the extreme slowness seen while the cache is cold. For example, Youtube does this (or at least it used to; Google bought them and they may now use Google-fu)

MySQL Workbench:
The below isn't 100% related to this SO question, but the symptoms are very related and this is the first SO result when searching for "mysql workbench slow" or related terms, so hopefully it's useful for others.
Clear the query history! - following the process at MySql workbench query history ( last executed query / queries ) i.e. create / alter table, select, insert update queries to clear MySQL Workbench's query history really sped up the program for me.
In summary: change the Output pane to History Output, right click on a Date and select Delete All Logs.
The issue I was experiencing was "slow first query" in that it would take a few seconds to load the results even though the duration/fetch were well under 1 second. After clearing my query history, the duration/fetch times stayed the same (well under 1 second, so no DB behavior actually changed), but now the results loaded instantly rather than after a few second delay.

Is anything else running on your mysql server? My thought is that maybe after the first query, your table is still cached in memory. Once it's idle, another process is causing it to be de-cached. Just a guess though.
How much memory do you have any what else is running?

I had an SSIS package that was timing out. The query was very simple, from a single MySQL table, but it sometimes returned a lot of records and would sometimes take a few minutes initially to execute, then only a few milliseconds afterwards if I wanted to query it again. We were stuck with the ADO connection, which meant it would time out after 30 seconds, so about half the databases we were trying to load were failing.
After beating my head against the wall I tried performing an initial query first; very simple and only returning a few rows. Since it was so simple it executed fast and set the table in the cache for faster querying. In the next step of the package I would do the more complex query which returned the large data set that kept timing out. Problem solved - all tables loaded. I may start doing this on a regular basis, the complex queries execute much faster by doing a simple query first.

Ttry and compare the output of "vmstat 1" on the linux command line when running the query after a period of time, vs when you re-run it and get results fast. Specifically check the "bi" column (that's the kb read from disk per second).
You may find the operating system is caching the disk blocks in the fast case (and thus a low "bi" figure), but not in the slow case (and hence a large "bi" figure).
You might also find that vmstat shows high/low cpu usage in either case. If it's low when fast, and disk throughput is also low, then your system may still be returning a cached query, even though you've indicated the relevant config value is set to zero. Perhaps check the output of show engine innodb status and SHOW VARIABLES and confirm.
innodb_buffer_pool_size may also be set high (it should be...), which would cache the blocks even before the OS can return them.
You might also find that "key_buffer" is set high - this would cache the keys in the indexes, which could make your select blindingly fast.
Try check the mysql performance blog site for lots of useful info.

I had issue when MySQL 5.6 was slow on first query after idle period. This was a connection problem, not MySQL instance problem, e.g. if you run MYSQL Query Browser execute "select * from some_queue", leave it alone for a couple of hours, then execute any query, it runs slow, while at the same time processes on server or new instance of Browser will select from same tables instantly.
Adding skip-host-cache, skip-name-resolve to my.ini file solved this problem.
I don't know why is that. Why I tried this: MySQL 5.1 without those options was slowly establishing connections from other networks (e.g. server is 192.168.1.100, 192.168.1.101 connects fast, 192.168.2.100 connects slow), MySQL 5.6 didn't have such problem to start with so we didn't add these to my.ini initially.
UPD: Solved half the cases, actually. Setting wait_timeout to maximum integer fixed the other half. Maybe I even now can remove skip-host-cache, skip-name-resolve and it won't slow down in 100% of the cases

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas