SQL Performance: LIKE vs IN

In my logging database I have several qualifiers specifying the logged action. A few of them are built like 'Item.Action', e.g. 'Customer.Add'. I wonder which approach will be faster if I want to get all logging items that start with 'Customer.':
SELECT * FROM log WHERE action LIKE 'Customer.%'
or
SELECT * FROM log WHERE action IN ('Customer.Add', 'Customer.Delete', 'Customer.Update', 'Customer.Export', 'Customer.Import')
I use PostgreSQL.

It depends on the indexes on the log table. Most likely, both queries will have the same performance. To check, use EXPLAIN or EXPLAIN ANALYZE. Queries with the same execution plan (the output of EXPLAIN) will have the same performance.
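Before comparing speed, it may be worth confirming that the two predicates really select the same rows. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for PostgreSQL (chosen only so the example is self-contained; plans and timings from it say nothing about PostgreSQL). The table contents and the 'Customer.Archive' action are hypothetical.

```python
import sqlite3

# The five actions listed in the IN (...) version of the query.
ACTIONS = ("Customer.Add", "Customer.Delete", "Customer.Update",
           "Customer.Export", "Customer.Import")


def fetch_both(extra_actions=()):
    """Run the LIKE and IN queries against a tiny in-memory log table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, action TEXT)")
    # One unrelated row ('Order.Add') plus any hypothetical extra actions.
    rows = [(a,) for a in ACTIONS + ("Order.Add",) + tuple(extra_actions)]
    conn.executemany("INSERT INTO log (action) VALUES (?)", rows)

    like_rows = conn.execute(
        "SELECT action FROM log WHERE action LIKE 'Customer.%' ORDER BY id"
    ).fetchall()

    placeholders = ", ".join("?" for _ in ACTIONS)
    in_rows = conn.execute(
        f"SELECT action FROM log WHERE action IN ({placeholders}) ORDER BY id",
        ACTIONS,
    ).fetchall()

    conn.close()
    return like_rows, in_rows
```

With only the five listed actions present, both queries return the same rows. Once a new action such as 'Customer.Archive' is logged, the LIKE query picks it up and the IN list does not, so the choice between them is partly about intent and not only about speed.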

Related

How to calculate accumulated sum of query timings?

I have a SQL file that runs many queries. I want to see the accumulated total time of all queries. I know that if I turn on timing, or call
\timing
query 1;
query 2;
query 3;
...
query n;
at the beginning of the script, it will show the time each query takes to run. However, I need the accumulated result for all queries, without having to add them up manually.
Is there a systematic way? If not, how can I fetch the interim times and store them in a variable?
The pg_stat_statements module is a good way to track execution statistics.
First, add pg_stat_statements to shared_preload_libraries in the postgresql.conf file. To find where this .conf file is located on your filesystem, run show config_file;
shared_preload_libraries = 'pg_stat_statements'
Restart the Postgres database.
Create the extension:
CREATE EXTENSION pg_stat_statements;
The module now provides a view, pg_stat_statements, which helps you analyze various query execution metrics.
Reset the statistics collected so far before running your queries.
SELECT pg_stat_statements_reset();
Now, execute your script file containing queries.
\i script_file.sql
You can now get the timing statistics of all the queries executed. To get the total time taken, simply run
select sum(total_time) from pg_stat_statements
where query !~* 'pg_stat_statements';
The time you get is in milliseconds, which may be converted to the desired format using Postgres date/time functions. On PostgreSQL 13 and later, the column is named total_exec_time instead of total_time.
If you want to time the whole script, on Linux or macOS you can use the time utility to launch the script.
The measurement in this case is a bit more than the sum of the raw query times, because it includes the overhead of starting and running the psql command. On my system this overhead is around 20 ms.
$ time psql < script.sql
…
real 0m0.117s
user 0m0.008s
sys 0m0.007s
The real value is the time it took to execute the whole script, including the aforementioned overhead.
The approach in this answer is a crude, simple client-side way to measure the runtime of the overall script. It is not suited to measuring server-side execution times with millisecond precision. It might still be sufficient for many use cases.
Kaushik Nayak's solution is a much more precise way to time execution directly on the server. It also provides much more insight into the execution (e.g. query-level times).
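For the "without adding them up manually" part of the question, a third client-side option is to capture the psql output with \timing enabled and sum the reported times afterwards. The following is a minimal sketch, assuming the timing lines have the usual "Time: 12.345 ms" form with a decimal point; the function name total_time_ms is a hypothetical helper, not part of psql.

```python
import re

# Matches lines such as "Time: 12.345 ms" or "Time: 1234.567 ms (00:01.235)".
# Assumption: the number uses a decimal point, not a locale-specific comma.
TIME_LINE = re.compile(r"^Time:\s+([0-9]+(?:\.[0-9]+)?)\s+ms\b")


def total_time_ms(psql_output: str) -> float:
    """Sum every 'Time: ... ms' line found in captured psql output."""
    total = 0.0
    for line in psql_output.splitlines():
        match = TIME_LINE.match(line.strip())
        if match:
            total += float(match.group(1))
    return total
```

One way to use it is to redirect the script output to a file (for example psql -f script.sql > out.txt, with \timing at the top of the script) and pass the file contents to total_time_ms. Like the time utility, this measures what the client sees, so it includes network round trips.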

EXASOL Explain Analyse Query

I want to get the query plan in an Exasol database to check the total execution time, memory, and CPU usage. Profiling in Exasol is complex and difficult to understand.
Is there any way to get the query plan like EXPLAIN ANALYZE in PostgreSQL, or any other simple way?
Please explain how to read the query plan in Exasol without executing the query.
You can check the EXASOL User Manual about profiling a query. I agree it's a bit cumbersome :)
Or you can use the scripts I wrote to get an EXPLAIN-like command: exasol-explain
This may be useful for anyone trying to use exasol-explain. One field is missing from the select statement in exasol-explain/scripts/sqlprofile.lua. After the temp_db_ram_peak field, the following should be added:
max(PERSISTENT_DB_RAM_PEAK) as PERSISTENT_DB_RAM_PEAK
Otherwise "explain" and "explain_this" return the error "incorrect numbers of result column".

Simple queries take very long

When I execute a query for the first time in DBeaver it can take up to 10-15 seconds to display the result. In SQLDeveloper those queries only take a fraction of that time.
For example:
Simple "select column1 from table1" statement
DBeaver: 2006ms,
SQLDeveloper: 306ms
Example 2 (the other way around, so there's no server-side caching):
Simple "select column1 from table2" statement
SQLDeveloper: 252ms,
DBeaver: 1933ms
DBeaver's status box says:
1. Fetch resultset
2. Discover attribute column1
3. Find attribute column1
4. Late bind attribute column1
Steps 2, 3 and 4 take most of the query execution time.
I'm using Oracle 11g, SQLDeveloper 4.1.1.19 and DBeaver 3.5.8.
See http://dbeaver.jkiss.org/forum/viewtopic.php?f=2&t=1870
What could be the cause?
DBeaver looks up some metadata related to objects in your query.
On an Oracle DB, it queries catalog tables such as
SYS.ALL_ALL_TABLES / SYS.ALL_OBJECTS - only once after connection, for the first query you execute
SYS.ALL_TAB_COLS / SYS.ALL_INDEXES / SYS.ALL_CONSTRAINTS / ... - I believe each time you query a table not used before.
Version 3.6.10 introduced an option to enable/disable a hint used in those queries. Disabling the hint made a huge difference for me. The option is in the Oracle Properties tab of the connection edit dialog. Have a look at issue 360 on DBeaver's GitHub for more info.
The best way to get insight is to perform a database trace.
Run the query a few times first to eliminate caching effects.
Then repeat the following steps in both IDEs:
Activate the trace
ALTER SESSION SET tracefile_identifier = test_IDE_xxxx;
alter session set events '10046 trace name context forever, level 12'; /* binds + waits */
Replace xxxx with a string that identifies the test. You will see this string as part of the trace file name.
Use level 12 to see the wait events and bind variables.
Run the query
Close the connection
This is important so that nothing else gets traced.
Examine the two trace files to see:
which statements were executed
how many rows were fetched
how much time elapsed in the DB
The client (IDE) is responsible for the rest of the time.
This should give you enough evidence to tell whether one IDE behaves differently from the other, or whether the DB statements issued are simply different.
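The comparison above can be roughed out with a small script. The following is a minimal sketch, assuming the raw trace contains PARSE/EXEC/FETCH call lines with comma-separated fields where e is the elapsed time in microseconds, r is the row count and dep is the recursion depth; the function names and the sample line in the comment are illustrative, not real output. Oracle's tkprof utility produces a fuller report from the same file.

```python
import re

# Assumed shape of a database-call line in a 10046 trace (illustrative only):
#   FETCH #1234:c=0,e=115,p=0,cr=3,cu=0,mis=0,r=1,dep=0,og=1,tim=...
CALL_LINE = re.compile(r"^(PARSE|EXEC|FETCH) #\d+:(.*)$")


def summarize_trace(lines):
    """Return (elapsed_ms, rows_fetched) over top-level calls (dep=0)."""
    elapsed_us = 0
    rows = 0
    for line in lines:
        match = CALL_LINE.match(line.strip())
        if not match:
            continue
        fields = dict(
            item.split("=", 1)
            for item in match.group(2).split(",")
            if "=" in item
        )
        # Assumption: recursive SQL (dep > 0) is skipped to avoid counting
        # its time a second time on top of the calling statement.
        if fields.get("dep") != "0":
            continue
        elapsed_us += int(fields.get("e", 0))
        if match.group(1) == "FETCH":
            rows += int(fields.get("r", 0))
    return elapsed_us / 1000.0, rows


def client_side_ms(wall_clock_ms, db_elapsed_ms):
    """Time not explained by database calls, attributed to the client."""
    return wall_clock_ms - db_elapsed_ms
```

Running summarize_trace over the trace file of each IDE and subtracting the result from the time the IDE reports gives a rough estimate of the client-side share. Wait events such as network round trips are not included in this sum, so treat the result as an approximation.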

How to run a query for a map job only in Apache hive

If I write a query in Apache Hive, it executes a MapReduce job behind the scenes. How can I run a map-only job in Hive?
Thanks
Certain optimized queries do in fact require only a map phase. You can provide a MAPJOIN hint in Hive to achieve the same; this is recommended for small secondary tables:
SELECT /*+ MAPJOIN(...) */ * FROM ...
This was a question I was asked in an interview. I didn't know the answer at the time, but I figured it out later.
The following query runs a map-only job. Simply selecting column values runs a map-only job, so no reducer is needed in this scenario.
select id,salary from tableA;
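Why a plain column selection needs no reducer can be illustrated outside Hive. The following is a conceptual sketch in plain Python, not Hive code; the function names and sample rows are hypothetical. A projection handles each row independently, while an aggregate has to bring rows with the same key together, which is the job of the shuffle and reduce phases.

```python
def map_only_select(records, columns):
    """Project each record independently, as a mapper would."""
    for record in records:
        yield {name: record[name] for name in columns}


def grouped_sum(records, key, value):
    """Contrast: a GROUP BY-style total must combine rows sharing a key."""
    totals = {}
    for record in records:
        totals[record[key]] = totals.get(record[key], 0) + record[value]
    return totals
```

In map_only_select no row depends on any other row, so the work can be split across mappers and written out directly. In grouped_sum every row for a given key must end up in the same place, which is what a reducer provides.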

MySQL VIEW vs. embedded query, which one is faster?

I'm going to optimize a MySQL embedded query with a view, but I'm not sure whether it will have any effect:
SELECT id FROM (SELECT * FROM t) AS sub;
I want to convert it to:
CREATE VIEW v AS SELECT * FROM t;
SELECT id FROM v;
I've heard about "indexed views" in SQL Server, but I'm not sure about MySQL. Any help would be appreciated. Thanks!
Indexed views in SQL Server are generally called "materialized views", which MySQL does not support. MySQL's VIEW support is rather limited in comparison to other vendors - the restrictions are listed in their documentation.
A normal view is merely a prepared SQL statement - there's no difference between using the two examples you provided. In some cases, the WHERE clause when selecting from a View can be pushed into the VIEW query by the optimizer, but it's completely out of your control.
The view might be faster (it probably is), but why don't you just test it? Or run an EXPLAIN against both queries to see how they will execute?
It's about the same. Whether it is fast or not depends on your indexes.
MySQL caches query results, so as long as your queries are the same between executions and the underlying dataset is unchanged (no new records added), it will return cached results on the next execution. Note that the query cache was removed in MySQL 8.0, so this applies only to older versions.
The select statement will be run each time you fetch a view.
A view behaves a bit differently, see Create View