Bigquery Cache Not Working - google-bigquery

I noticed that BigQuery no longer cache the same query even I have chosen to use cache in the GUI (both Alpha and Classic). I didn't edit the query at all, just keep clicking run query button and every time GUI executed the query without using cache results.
It happens to my PHP script as well. Before, it was enable to use cache and came back with results very quick and now it executes the query every time even the same query has been executed minutes ago. I can confirm the behaviour in the logs.
I am wondering if there is anything changed in the last few weeks? Or some kind of account level settings control this? Because it was working fine for me.

As per official docs here cache is disable when:
...any of the tables referenced by the query have recently received
streaming inserts...
Even if you are streaming to one partition, and then querying to another, this will invalidate caching functionality for the whole table. There is this feature request opened where it is requested to be able to hit cache when doing streaming inserts to one partition but querying a different partition.
EDIT***:
After some investigation I've found out that some months ago there was an issue going on which was allowing to hit the cache even streaming inserts were being made. This was not expected behavior, and therefore it got solved in May. I guess this is the change you have experienced and what you are talking about.
Docs have not changed related to this, and they aren't/weren't incorrect. Just the previous behavior was the incorrect one.

Related

Measuring the averaged elapsed time for SQL code running in google BigQuery

As BigQuery is a shared resource, it is possible that one gets different values running the same code on BigQuery. OK one option that I always use is to turn off caching in Query Settings, Cache preference. This way queries will not be cached. The problem with this setting is that if you refresh the browser or leave it idle, that Cache Preference box will be ticked again.
Anyhow I had a discussion with some developers that are optimizing the code. In a nutshell, they take slow running code, run it 5 times and get an average, then following optimization then run the code again 5 times to get an average value for optimized SQL. Details are not clear to me. However, my preference would be (all in BQ console)
create a user session
turn off sql caching
On BQ console paste the slow running code;
On the same session paste the optimized code
Run the codes (separated by ";")
This will ensure that any systematics like BQ busy/overloaded, slow connection etc will affect "BOTH" SQL piece equally and the systematics will be cancelled out. In my option one only need to run it once as caching is turned off as well. Running 5 times to get an average looks excessive and superfluous?
Appreciate any suggestions/feedback
Thanks
Measuring the time is one way, the other way to see if the query has been optimized is the understanding of the query plan and how slots are used effectively.
I've been with BigQuery more than 6 years, and what you describe was never used by me. In BigQuery actually what matters is reducing the costs, and that can be done iteratively rewriting the query, and using partitioning/clustering/materialized views, caching/temporary tables.

IBM i SQL - dump plan cache

I run heavy query on IBM i. First time it takes a long time, Subsequent times are much faster. It seems to be creating temporary index. How can I remove this index, so I can re-test like the first time?
Use the Visual Explain (VE) tool in the Run SQL Scripts component of ACS to see the differences between runs.
If indeed the issue is a system maintained temporary index (MTI), you can track it down via the schema's tooling in ACS and delete it if you so desire.
However, an MTI only gets deleted by the system when the system reboots (IPL).
So if you seeing differences without rebooting the server, I suspect the differences are caused by psuedo-closing. By default, once the DB see's the same query a few times (3 is the default), instead of hard closing it's cursors, it will psuedo-close them.
Again, VE will show "hard opens" and "pseudo opens".
To get the pseduo closed cursors to hard close, simply disconnect and reconnect.

RavenDB taking forever to show updates

I'm starting to assess our company using RavenDB for storing some stuff that doesn't really belong in a relational database (we're traditionally a SQL Server shop). I installed RavenDB locally on my machine, created a database, added a document. Nice!
Being a DBA, I decided to see how backups/restores work. I backed up my database, deleted it, then restored it from the backup. After refreshing my admin screen, I saw my database. I clicked on it, and got a message that the database doesn't exist.
After a couple hours, I tried again. Still doesn't exist. A full day later, I walk into work, and try again. This time the database works. I've had similar situations with updating documents. The update seems to take anywhere between 1 second - several hours to show an update...
Is this normal for RavenDB?? Am I completely misconfigured?? I run SQL Server on my local machine and it's lightning-fast, so I can't imagine updating a single document could take that long. As-is, I can't imagine recommending we use RavenDB for anything.
Are you querying using indexes or getting documents by ID? Documents should be updated immediately (ACID). If indexes are slow to update (check their status using RavenDB Studio), it could be a configuration problem or something external like an anti-virus software can cause them to update slowly.
Apparently, at least for the document-update latency, the default for caching in queries is enabled, so I was getting cached results.
Jeffery,
No, that isn't normal by a long short. You should be able to immediately see what was changed.
Note that certain AV products will interfere with the HTTP pipeline and can affect RavenDB's usage. The studio will also auto update things only every 5 seconds (to reduce UI jitter), but that is about it.
Restoring a database (from the same machine), should take only as long as it take to copy the files (pure I/O bound operation).
If this is from another machine using a different version of Windows, we might need to run a check on the file, which can take a bit of time, but that doesn't sound like your scenario

Django PostgreSQL Database [Data Race?]

I have a Django application hosted on Heroku using the PostgreSQL database addon. Upon performing a GET request for the front page, my applications performs a SQL query to extract some necessary display information. I also create a subprocess with Popen on each GET request.
However, when I notice that the number of GET requests increasing to around once every second, I being erroring at the statement model.objects.get(id="----"). I get an OperationalError ; I'm assuming that either my free plan on heroku isn't keeping up or my database isn't keeping up.
In this case, I don't want to leave Heroku's free plan, but I was wondering if I did, would I need to create more workers? Upgrade my database? What are ways to diagnose the issue? And why would a simple SQL query cause issues as the number of requests increase to an interval of around once every second? Does this seem reasonable?
My hack solution was just to sleep the view whenever I catch an OperationalError. Any other approaches recommended?

Is there a way to query the Apple System Log incrementally?

I want to display some contents of the system log using the Apple System Log API. Is there a way to query the log database incrementally, ie. only get the messages that are new since the last query? The aslresponse_next call only seems to iterate over messages that existed at the time of the query.
I think I could fake incremental querying by setting a minimum time filter, but the approach seems prone to various timing issues, so I’d rather have something more official if possible.