Google Datastore Emulator - testing

For a test, we start a local instance of the cloud-datastore-emulator with the LocalDatastoreHelper class, provided by Google.
The interesting observation we made is that, when we run against a "live" store hosted on Google Cloud, we can insert data with our code and then find it again by performing a GQL query like
SELECT [..] WHERE myfield = true
But:
When we run the same code against the emulator running locally, the insert works, but the query does not. A findAll() works fine, so inserting and reading seem to work in general. Could the indexes be missing?
I have been reading the documentation for hours now and have not found any hint as to why this might happen.
Can anybody help?

Most likely, what you are running into is eventual consistency. By default, the Datastore Emulator simulates a consistency of 0.9 (the fraction of eventually consistent operations that succeed). Note that most queries are eventually consistent; the exceptions are queries whose WHERE clause is on a key and ancestor queries. I believe you are just lucky that the "live" store is returning the result. If you run the test enough times at different times of the day, it may well fail to return the results; it all depends on timing and on how long it takes to update the indexes.
That said, the Datastore Emulator has an option to specify the consistency level it should simulate. This can be done with the following command:
gcloud beta emulators datastore start --data-dir=/my/data/dir --host-port localhost:9999 --consistency 1.0
A consistency level of 1.0 would guarantee consistent reads. I'm not sure if there is an option to set the consistency level with LocalDatastoreHelper.
Again, queries against the "live" Datastore are always eventually consistent, except in the few cases mentioned above.
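To make the reasoning in this answer concrete, here is a toy Python model (not the emulator's actual implementation) of a store where key lookups are strongly consistent but the index behind property queries catches up lazily. The `consistency` parameter only loosely mimics the emulator's `--consistency` flag; `ToyEventualStore` and its methods are hypothetical names.

```python
import random


class ToyEventualStore:
    """Toy store: entity writes are visible to key lookups immediately,
    while property-query index updates are applied lazily.

    Hypothetical illustration only; it does not reproduce the emulator's code.
    For simplicity, overwritten values are never removed from the index.
    """

    def __init__(self, consistency, seed=None):
        self.consistency = consistency  # chance a pending index update lands per query
        self.entities = {}              # key -> properties (strongly consistent)
        self.index = {}                 # (property, value) -> set of keys
        self.pending = []               # index updates not yet applied
        self.rng = random.Random(seed)

    def put(self, key, props):
        self.entities[key] = dict(props)
        for prop, value in props.items():
            self.pending.append((prop, value, key))

    def get(self, key):
        # Lookup by key: always sees the latest write.
        return self.entities.get(key)

    def query(self, prop, value):
        # Each pending index update is applied with probability `consistency`;
        # the others stay invisible to this query and may land later.
        still_pending = []
        for p, v, k in self.pending:
            if self.rng.random() < self.consistency:
                self.index.setdefault((p, v), set()).add(k)
            else:
                still_pending.append((p, v, k))
        self.pending = still_pending
        return sorted(self.index.get((prop, value), set()))


store = ToyEventualStore(consistency=1.0)
store.put("a", {"myfield": True})
print(store.query("myfield", True))  # with 1.0, the query always sees "a"
```

With `consistency=0.9`, the same query returns an empty list roughly one time in ten right after the write, which is the kind of flakiness that lets a test pass against one backend and fail against another.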

So as not to confuse people with my original question, I'm answering it myself, as we have found the source of our problem.
Unfortunately, the difference between live and local was not caused by a difference in the datastore's behavior; it was rooted in our code. :-(
As far as I can see, all our earlier assumptions about indexes not being generated were wrong. In that respect, local behaves exactly like live.
Thank you for contributing!

Related

Can I take advantage of Yugabyte's compatibility?

Yugabyte seems to support Redis, Cassandra and SQL queries. Do they work with each other? For example, can I write data with Cassandra API and later perform SQL queries against them?
These APIs do not work with each other as is, meaning you would not be able to query YCQL data from YSQL. This is because not all data types are present in every API, and they often have different semantics.
That said, we get asked this a lot, and the plan is to enable this scenario using a foreign data wrapper. So, in effect, you would be able to "import" the YCQL table into the YSQL side and use it there. Note that PostgreSQL already has a bunch of these wrappers (for example, see the generic list of PG FDWs; it has entries for Cassandra and Redis). The idea is to reuse/enhance these and get them to work out of the box.
If you're interested, please open a GitHub issue and we can continue there. Would love to understand your use-case better to make sure we are able to address it and work with you closely on this.

BigQuery Classic UI fails to show estimated cost and always shows "This query will process 0 B when run."

I noticed this issue; it can be quite bad for users who don't check the size of the queried table(s) on their own.
No matter how heavy the query is in terms of bytes to be processed, I always see "This query will process 0 B when run."
It looks like this feature recently broke in the Classic UI. What can I do?
Note: this is not really a programming question, but I thought it was critical enough to make users aware of this (hopefully temporary) issue.
Try the bigquery-mate Chrome browser plugin: https://chrome.google.com/webstore/detail/bigquery-mate/nepgdloeceldecnoaaegljlichnfognh?hl=en-US
Or, an even better solution for your org:
https://potens.io/products/#goliath
The best option you have is either to use the new BigQuery UI (I double-checked, and it is not broken there) or to use third-party tools such as Potens.io.
Update:
This has been fixed; the estimated bytes and the corresponding cost are now available in the Classic UI.

Trigger action realtime based on keyword in Logs

I have a requirement to trigger an action (like calling a RESTful service) when a keyword is found in the logs. The trigger would have to be fairly real-time. I was evaluating open-source solutions like GrayLog2, the ELK stack (which I believe can't analyse in real time), fluentd, etc., but wanted to know your opinion. It would be great if the tool also allowed setting up keyword rules to eliminate false positives, and if it were easy to set up.
I hope this makes sense and apologies if this has been discussed elsewhere!
You can try Massalyzer. It's a real-time analyzer too, very fast (up to 10 million lines per second), and the free demo version lets you analyze logs of unlimited size.
So, I tried a Logstash + Graylog2 combination for the scenario I described in the question, and it works quite well. I had to tweak a few things to make Logstash work with Graylog2, especially around capturing the right log levels. I will try this out in a heavily loaded clustered environment and update my findings here.
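For comparison with the tools discussed, the core of a keyword trigger with exclusion rules fits in a few lines of Python. This is a hypothetical sketch, not how Graylog2 or Logstash implement it; `KeywordRule`, `watch` and `on_match` are illustrative names, and in practice `lines` would come from tailing the log file or from a log shipper.

```python
from __future__ import annotations

import re
from collections.abc import Callable, Iterable
from dataclasses import dataclass, field


@dataclass
class KeywordRule:
    """Fires when a line matches `include` and none of the `exclude` patterns.

    The exclude list is one simple way to suppress known false positives,
    e.g. an ERROR line that is expected and harmless.
    """
    name: str
    include: re.Pattern[str]
    exclude: list[re.Pattern[str]] = field(default_factory=list)

    def matches(self, line: str) -> bool:
        if not self.include.search(line):
            return False
        return not any(p.search(line) for p in self.exclude)


def watch(lines: Iterable[str], rules: list[KeywordRule],
          on_match: Callable[[str, str], None]) -> int:
    """Check each incoming line against every rule and call `on_match`
    (for instance, a function that calls a REST service) for each hit.
    Returns the number of triggers fired."""
    fired = 0
    for line in lines:
        for rule in rules:
            if rule.matches(line):
                on_match(rule.name, line)
                fired += 1
    return fired


rule = KeywordRule("db-down", re.compile(r"\bERROR\b.*database"),
                   exclude=[re.compile(r"retrying")])
hits = []
watch(["INFO started",
       "ERROR database unreachable",
       "ERROR database timeout, retrying"],
      [rule], lambda name, line: hits.append((name, line)))
print(hits)  # only the "unreachable" line triggers
```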

Strategies for Fixing Problems / Tweaking NHibernate Apps in Production

First off, I am not a DBA, but I do work in an environment where DBAs tune or change the production database from time to time in ways that do not require an application rebuild/redeployment. Usually these changes consist of reworking indexes, changing procs, and sometimes changing the table structure in minor ways (usually abstracted from the app via procs).
Obviously, a team should strive to catch performance problems with NHibernate before they get into production, using things like NHProf, SQL Profiler, and load tests. That being said, are there strategies that allow some amount of tweaking once the code is built and running in production? Using stored procedures 100% of the time seems like it would allow the most flexibility for the DBAs, but that would obviously kill much of NHibernate's efficiency. From what I've read, updatable views (in SQL Server) don't work that well with NHibernate either (this may or may not be true).
I've read quite a bit about NHibernate and experimented with it over the years, but I have never put it into practice in a production environment. I have yet to come across a set of "best practices" that allow for maximum tweaking once deployed.
As an NHibernate user, how are you and your team dealing with issues when they arise in production? My production environment is made up of ASP.NET apps and SQL Server, but I don't think the answers need to be restricted to that platform.
I am in a similar position, and in order to keep our DBA happy, I did the following:
Wrote some of the queries in HQL and others in SQL (especially the performance-sensitive ones).
Externalized those queries to files, one file per query.
When the app needs to execute one of these queries, it loads the appropriate file, optionally runs it through a pre-processor, and executes it.
With this approach, the DBA could theoretically tweak the queries just by modifying those files. That's quite similar to having stored procedures.
In practice, it's up to you to decide if you'll really give the DBA access to those files (if you catch my drift...)
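The file-per-query approach can be sketched as follows. The answer concerns NHibernate/.NET, but for brevity this hypothetical sketch is in Python; `QueryStore`, the directory layout, and the use of `string.Template` as the pre-processor are all assumptions.

```python
from pathlib import Path
from string import Template


class QueryStore:
    """Loads externalized queries, one file per query (e.g. queries/find_orders.sql).

    Hypothetical layout: file name = query name + suffix. Files are re-read on
    every call, so an edited query takes effect without a redeploy.
    """

    def __init__(self, directory, preprocess=None):
        self.directory = Path(directory)
        self.preprocess = preprocess  # optional callable: str -> str

    def load(self, name, suffix=".sql"):
        text = (self.directory / f"{name}{suffix}").read_text(encoding="utf-8")
        if self.preprocess is not None:
            text = self.preprocess(text)
        return text


def template_preprocessor(values):
    """Example pre-processor that fills in $placeholders such as $schema.
    Meant for trusted settings only; pass user values as bound query parameters."""
    return lambda text: Template(text).substitute(values)
```

For example, `QueryStore("queries", template_preprocessor({"schema": "sales"})).load("find_orders")` would return the text of `queries/find_orders.sql` with `$schema` replaced; the application then passes that text to its ORM or database driver. Re-reading on every call keeps the sketch simple; a real app might cache the text and watch the files for changes.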
IMHO the DBA should just use the DBMS's profiling tools and report her findings back to the devs (as in "there's this query that is running 20 times/sec and does 10 joins. Is that really necessary? Can it be cached? Do you really need all those joins? Can we denormalize this?", etc.).
I'm not in the deploy phase yet, but on my current project I've already come up against this, and my solution so far has been to replace my queries with stored procs. As long as the shape of the data coming back from the DB remains the same, it's not a big deal. Yeah, you do lose some of the agility you enjoyed during development, but I'm not sure it's as bad as it initially sounds. You'll have a code push when you first make the change, of course, and from that point on it's just proc changes.
You can use a profiler like NHProf to see the SQL queries executed, so you can show them to a DBA. This tool can also detect some problems, such as N+1 selects.
Using a second-level cache can be useful: http://web.archive.org/web/20110514214657/http://blogs.hibernatingrhinos.com/nhibernate/archive/2008/11/09/first-and-second-level-caching-in-nhibernate.aspx

PostgreSQL performance monitoring tool

I'm setting up a web application with a FreeBSD PostgreSQL back-end. I'm looking for some database performance optimization tool/technique.
Database optimization is usually a combination of two things:
Reduce the number of queries to the database
Reduce the amount of data that needs to be looked at to answer queries
Reducing the number of queries is usually done by caching non-volatile or less important data (e.g. "Which users are online?" or "What are the latest posts by this user?") inside the application (if possible) or in an external, more efficient datastore (memcached, Redis, etc.). If you have information that is very write-heavy (e.g. hit counters) and doesn't need ACID semantics, you can also think about moving it out of the Postgres database into more efficient data stores.
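As a minimal sketch of the caching idea, assuming a per-process cache is acceptable for data such as "Which users are online?", a time-based cache-aside helper in Python might look like this. `TTLCache` and `fetch_online_users` are hypothetical names, and the 30-second TTL is arbitrary.

```python
import time


class TTLCache:
    """Minimal in-process cache-aside helper for non-volatile data.

    A cached value is reused until it is older than `ttl` seconds, then
    `loader` (typically a database query) is called again. memcached or Redis
    would play the same role shared across processes; this sketch is per-process.
    """

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable for testing
        self._data = {}     # key -> (stored_at, value)

    def get_or_load(self, key, loader):
        now = self.clock()
        hit = self._data.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]          # fresh entry: no database round trip
        value = loader()           # missing or stale: go to the database
        self._data[key] = (now, value)
        return value


def fetch_online_users():
    # Hypothetical stand-in for a real query against the database.
    return ["alice", "bob"]


online = TTLCache(ttl=30)
print(online.get_or_load("online_users", fetch_online_users))
```

The trade-off is staleness: for up to `ttl` seconds, users may see slightly outdated data, which is why this fits the "less important" data mentioned above rather than anything that must be exact.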
Optimizing query runtime is trickier: this can amount to creating special indexes (or indexes in the first place), changing (possibly denormalizing) the data model, or changing the fundamental approach the application takes when it comes to working with the database. See, for example, the "Pagination done the Postgres way" talk by Markus Winand on how to rethink the concept of pagination to make it more database-efficient.
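The technique usually meant by rethinking pagination is keyset (or "seek") pagination: instead of `OFFSET n`, remember the last row shown and continue from there. A minimal sketch follows, using Python's built-in `sqlite3` as a stand-in for Postgres (the SQL shape carries over); the `posts` table is hypothetical, and this is not code from the talk.

```python
import sqlite3


def page_offset(conn, page, size):
    # OFFSET pagination: the database must still step over page * size rows.
    return conn.execute(
        "SELECT id, title FROM posts ORDER BY id LIMIT ? OFFSET ?",
        (size, page * size),
    ).fetchall()


def page_after(conn, last_id, size):
    # Keyset ("seek") pagination: continue after the last id already shown.
    # With an index on the ORDER BY column, the database can jump straight there.
    return conn.execute(
        "SELECT id, title FROM posts WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, size),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO posts (id, title) VALUES (?, ?)",
                 [(i, f"post {i}") for i in range(1, 11)])

first = page_after(conn, 0, 3)              # ids 1, 2, 3
second = page_after(conn, first[-1][0], 3)  # ids 4, 5, 6
print(second == page_offset(conn, 1, 3))    # True: same rows, different access path
```

Both functions return the same page here; the difference shows up on deep pages of large tables, where the cost of `OFFSET` grows with the page number while the keyset query stays an index lookup.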
Measuring queries the slow way
But to understand which queries should be looked at first you need to know how often they are executed and how long they run on average.
One approach is to log all (or only "slow") queries, including their runtime, and then parse the query log. A good tool for this is pgfouine, which has already been mentioned in this discussion; it has since been replaced by pgbadger, which is written in a friendlier language, is much faster, and is more actively maintained.
Both pgfouine and pgbadger need query logging enabled, which can cause a noticeable performance hit on the database or lead to disk space trouble. On top of that, parsing the log with the tool can take quite some time and won't give you up-to-date insight into what is going on in the database.
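At its core, the log-based approach boils down to pulling durations out of log lines and aggregating them per statement. The sketch below is a much-simplified, hypothetical Python illustration of that idea, not how pgfouine or pgbadger work internally. It assumes single-line statements logged via `log_min_duration_statement` in the usual `duration: ... ms  statement: ...` form, and ignores whatever `log_line_prefix` puts in front.

```python
import re
from collections import defaultdict

# Matches the tail of lines such as
# "2013-01-01 12:00:00 UTC LOG:  duration: 12.345 ms  statement: SELECT ..."
LINE_RE = re.compile(r"duration: (?P<ms>\d+(?:\.\d+)?) ms\s+statement: (?P<sql>.*)$")


def summarize(lines):
    """Return {statement: (count, total_ms)}, ordered by total time, highest first."""
    stats = defaultdict(lambda: [0, 0.0])
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # other log lines, multi-line continuations, etc.
        entry = stats[m.group("sql").strip()]
        entry[0] += 1
        entry[1] += float(m.group("ms"))
    ranked = sorted(stats.items(), key=lambda kv: kv[1][1], reverse=True)
    return {sql: (count, total) for sql, (count, total) in ranked}


sample = [
    "2013-01-01 12:00:00 UTC LOG:  duration: 10.5 ms  statement: SELECT * FROM users",
    "2013-01-01 12:00:01 UTC LOG:  duration: 2.0 ms  statement: SELECT 1",
    "2013-01-01 12:00:02 UTC LOG:  duration: 4.5 ms  statement: SELECT * FROM users",
    "2013-01-01 12:00:03 UTC LOG:  checkpoint starting: time",
]
for sql, (count, total) in summarize(sample).items():
    print(f"{total:8.1f} ms  {count:3d}x  {sql}")
```

Note that this groups by the exact statement text, so `WHERE id = 1` and `WHERE id = 2` are counted separately; real tools normalize literals first.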
Speeding it up with extensions
To address these shortcomings, there are now two extensions that track query performance directly in the database: pg_stat_statements (which is only helpful in version 9.2 or newer) and pg_stat_plans. Both extensions offer the same basic functionality: tracking how often a given "normalized query" (the query string minus all expression literals) has been run and how long it took in total. Because this is done while the query is actually running, it is very efficient; the measured overhead was less than 5% in synthetic benchmarks.
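To illustrate what "normalized query" means, here is a toy Python version of the bookkeeping: strip literals, then accumulate the call count and total time per normalized string. The real extensions work on the parsed query inside the server rather than on regexes over SQL text, so treat `normalize` and `StatementStats` as hypothetical illustrations only.

```python
import re
from collections import defaultdict

# Very rough literal stripping, for illustration only.
_STRING = re.compile(r"'(?:[^']|'')*'")     # 'text', including '' escapes
_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")  # 42, 1.5 (but not the 1 in t1)
_SPACE = re.compile(r"\s+")


def normalize(sql):
    """Replace string and numeric literals with '?' and collapse whitespace."""
    sql = _STRING.sub("?", sql)
    sql = _NUMBER.sub("?", sql)
    return _SPACE.sub(" ", sql).strip()


class StatementStats:
    """Accumulates call count and total runtime per normalized query."""

    def __init__(self):
        self.calls = defaultdict(int)
        self.total_ms = defaultdict(float)

    def record(self, sql, ms):
        key = normalize(sql)
        self.calls[key] += 1
        self.total_ms[key] += ms


stats = StatementStats()
stats.record("SELECT * FROM t1 WHERE id = 1", 2.0)
stats.record("SELECT * FROM t1 WHERE id = 2", 3.0)
print(dict(stats.calls))  # both calls share one normalized key
```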
Making sense of the data
The list of queries itself is very "dry" from an information perspective. There has been work on a third extension, pg_statsinfo (along with pg_stats_reporter), that tries to address this and offer a nicer representation of the data, but it's a bit of an undertaking to get it up and running.
To offer a more convenient solution to this problem, I started working on a commercial project that is focused on pg_stat_statements and pg_stat_plans and augments the collected information with lots of other data pulled out of the database. It's called pganalyze, and you can find it at https://pganalyze.com/.
To offer a concise overview of interesting tools and projects in the Postgres monitoring area, I also started compiling a list on the Postgres Wiki, which is updated regularly.
pgfouine works fairly well for me. And it looks like there's a FreeBSD port for it.
I've used pgtop a little. It is quite crude, but at least I can see which query is running for each process ID.
I tried pgfouine, but if I remember correctly, it's an offline tool.
I also tail the psql.log file and set the logging criteria down to a level where I can see the problem queries.
#log_min_duration_statement = -1 # -1 is disabled, 0 logs all statements
# and their durations, > 0 logs only
# statements running at least this time.
I also use EMS Postgres Manager for general admin work. It doesn't do anything for you, but it makes most tasks easier and makes reviewing and setting up your schema simpler. I find that when using a GUI, it is much easier for me to spot inconsistencies (like a missing index, field criteria, etc.). It's one of only two programs for which I'm willing to run VMware on my Mac.
Munin is quite simple yet effective for getting trends of how the database evolves and performs over time. With the standard Munin kit you can, among other things, monitor the size of the database, number of locks, number of connections, sequential scans, size of the transaction log, and long-running queries.
It is easy to set up and get started with, and if needed you can write your own plugins quite easily.
Check out the latest postgresql plugins that are shipped with Munin here:
http://munin-monitoring.org/browser/branches/1.4-stable/plugins/node.d/
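To give an idea of how small a custom Munin plugin can be, here is a hypothetical one in Python that graphs the number of PostgreSQL connections. It relies on Munin's plugin protocol (a `config` argument prints the graph definition, a plain run prints `field.value` lines) and assumes `psql` can connect as the plugin's user without a password; the plugin and field names are made up, and the shipped postgres plugins linked above are the better starting point.

```python
#!/usr/bin/env python3
"""Hypothetical Munin plugin reporting the number of PostgreSQL connections.

Munin calls a plugin with the argument "config" to get the graph definition,
and with no argument to fetch the current value ("<field>.value <number>").
"""
import subprocess
import sys


def config_lines():
    return [
        "graph_title PostgreSQL connections",
        "graph_vlabel connections",
        "graph_category postgresql",
        "connections.label connections",
    ]


def value_lines(count):
    # "U" tells Munin the value is unknown (e.g. the query failed).
    return [f"connections.value {'U' if count is None else count}"]


def current_connections():
    # -A unaligned, -t tuples only, -w never prompt for a password.
    try:
        out = subprocess.run(
            ["psql", "-A", "-t", "-w", "-c", "SELECT count(*) FROM pg_stat_activity"],
            check=True, capture_output=True, text=True, timeout=10,
        ).stdout
        return int(out.strip())
    except (OSError, subprocess.SubprocessError, ValueError):
        return None


def main(argv):
    if len(argv) > 1 and argv[1] == "config":
        print("\n".join(config_lines()))
    else:
        print("\n".join(value_lines(current_connections())))


if __name__ == "__main__":
    main(sys.argv)
```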
Well, the first thing to do is try all your queries from psql using "explain" and see if there are sequential scans that can be converted to index scans by adding indexes or rewriting the query.
Other than that, I'm as interested in the answers to this question as you are.
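The idea of checking plans for sequential scans can be tried without a Postgres server. The sketch below uses Python's built-in `sqlite3` as a stand-in: SQLite's `EXPLAIN QUERY PLAN` reports a full table scan before an index exists and an index search afterwards. The Postgres counterparts are `Seq Scan` and `Index Scan` (or `Bitmap Heap Scan`) nodes, although the Postgres planner may still choose a sequential scan for small tables. The table, the index name, and the exact plan wording (which varies between SQLite versions) are assumptions.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 [(i % 100, i * 1.5) for i in range(1000)])

query = "SELECT total FROM orders WHERE customer_id = ?"
before = plan(conn, query, (42,))  # expect a full scan of orders
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(conn, query, (42,))   # expect a search using idx_orders_customer
print("before:", before)
print("after: ", after)
```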
Check out Lightning Admin; it has a GUI for capturing log statements. It's not perfect, but it works great for most needs. http://www.amsoftwaredesign.com
DBTuna http://www.dbtuna.com/postgresql_monitor.php has recently started supporting PostgreSQL monitoring. We use it extensively for MySQL monitoring, so if it provides the same for Postgres then it should be a good fit for you too.