Oracle - allowing the LIKE clause being configured by another select - sql

In a script I'm running the following query on an Oracle database:
select nvl(max(to_char(DATA.VALUE)), 'OK')
from DATA
where DATA.FILTER like (select DATA2.FILTER from DATA2 where DATA2.FILTER2 = 'WYZ')
In the actual script it's a bit more complicated, but you get the idea. ;-)
DATA2.FILTER contains the filter which needs to applied on DATA as a LIKE -clause. The idea is to have it as generic as possible, meaning it should be possible to filter on:
FILTER%
%FILTER
FI%LTER
%FILTER%
FILTER (as if the clause was DATA.FILTER = (select DATA2.FILTER from DATA2 where DATA2.FILTER2 = 'WYZ')
The system the script runs on does not allow stored procedures to run for this kind of task, also I can't make the scrip build the query directly before running it.
Whatever data needs to be fetched, it has to be done with one single query.
I've already tried numerous solutions I found online, but no matter what I do, I seem to be misisng the mark.
What am I doing wrong here?

This one?
select nvl(max(to_char(DATA.VALUE)), 'OK')
from DATA
JOIN DATA2 on DATA.FILTER LIKE DATA2.FILTER
where DATA2.FILTER2 = 'WYZ'
Note: The performance of this query is not optimal, because for table DATA Oracle will perform always a "Full Table Scan" - but it works.

Related

How do you identify what where clause will be filtered first in Oracle and how to control it

I have a problem where the fix is to exchange what gets filtered first, but I'm not sure if that is even possible and not knowledgeable enough how it works.
To give an example:
Here is a table
When you filter this using the ff query:
select * from pcparts where Parts = 'Monitor' and id = 255322 and Brand = 'Asus'
by logic this will be correct as the Asus component with a character in its ID will be filtered and will prevent an ORA-01722 error.
But to my experience this is inconsistent.
I tried using the same filtering in two different DB connections, the first one didn't get the error (as expected) but other one got an ORA-01722 error.
Checking the explain plan the difference in the two DB's is the ff:
I was thinking if its possible to make sure that the Parts got filtered first before the ID but I'm unable to find anything when i was searching, is this even possible, if not, what is a fix for this issue without relying on using TO_CHAR
I assume you want to (sort of) fix a buggy program without changing the source code.
According to your image, you are using "Filter Predicates", this normally means Oracle isn't using index (though I don't know what displays execution plans this way).
If you have an index on PARTS, Oracle will probably use this index.
create index myindex on mytable (parts);
If Oracle thinks this index is inefficient, it may still use full table scan. You may try to 'fake' Oracle into thinking this an efficient index by lying about the number of distinct values (the more distinct values, the more efficient)
exec dbms_stats.set_index_stats(ownname => 'myname', indname => 'myindex', numdist => 100000000)
Note: This WILL impact performance of other querys using this table
"Fix" is rather simple: take control over what you're doing.
It is evident that ID column's datatype is VARCHAR2. Therefore, don't make Oracle guess, instruct it what to do.
No : select * from pcparts where Parts = 'Monitor' and id = 255322 and Brand = 'Asus'
Yes: select * from pcparts where Parts = 'Monitor' and id = '255322' and Brand = 'Asus'
--------
VARCHAR2 column's value enclosed into single quotes

ms access query (ms access freezes)

I have this report and need to add totals for each person (the red circle)
existing report
new report
I cannot change the existing report so I export data from MS SQL to MS Access and create a new report there. I got it working for one employee but have trouble with a query which would for multiple employees.
This query extract data use as input:
SELECT [TIME].[RCD_NUM], [TIME].[EMP_ID], [TIME].[PPERIOD], [TIME].[PRUN], [TIME].[TDATE], [TIME].[PC], [TIME].[RATE], [TIME].[HOURS], [TIME].[AMOUNT], [TIME].[JOB_ID], [TIME].[UPDATED], [TIME].[UPDATED_BY], [TIME].[LOG_DATE], [TIME].[ORIGINAL_REC_NUM]
FROM [TIME]
WHERE ((([TIME].[EMP_ID])=376) And (([TIME].[TDATE])<=#12/31/2006# And ([TIME].[TDATE])>=#1/1/2006#) And (([TIME].[PC])<599));
this query populates the report:
SELECT *
FROM TIME1
WHERE RCD_NUM = (SELECT Max(RCD_NUM) FROM [TIME1] UQ WHERE UQ.PPERIOD = [TIME1].PPERIOD AND UQ.PC = [TIME1].PC);
the problem is if I remove EMP_ID from the first query like this
SELECT [TIME].[RCD_NUM], [TIME].[EMP_ID], [TIME].[PPERIOD], [TIME].[PRUN], [TIME].[TDATE], [TIME].[PC], [TIME].[RATE], [TIME].[HOURS], [TIME].[AMOUNT], [TIME].[JOB_ID], [TIME].[UPDATED], [TIME].[UPDATED_BY], [TIME].[LOG_DATE], [TIME].[ORIGINAL_REC_NUM]
FROM [TIME]
WHERE ((([TIME].[TDATE])<=#12/31/2006# And ([TIME].[TDATE])>=#1/1/2006#) And (([TIME].[PC])<599));
then the second query doesn't work and ms access freezes when running this query.
any help/idea please?
Caveat: I won't pretend to know the precise cause of the problem, but I have had to repeatedly refactor queries in Access to get them working even though the original SQL statements are completely valid in regards to syntax and logic. Sometimes I've had to convolute a sequence of queries just to avoid bugs in Access. Access is often rather dumb and will simply (re)execute queries and subqueries exactly as given without optimization. At other times Access will attempt to combine queries by performing some internal optimizations, but sometimes those introduce frustrating bugs. Something as simple as a name change or column reordering can be the difference between a functioning query and one that crashes or freezes Access.
First consider:
Can you leave the data on SQL Server and link to the results in Access (rather than export/importing it into Access)? Even if you need or prefer to use Access for creating the actual report, you could use all the power of SQL Server for querying the data--it is likely less buggy and more efficient.
Common best practice is to create SQL Server stored procedures that return just what data you need in Access. A pass-through query is created in Access to retrieve the data, but all data operations are performed on the server.
Perhaps this is just a performance issue where limiting the set by [EMP_ID] selects a small subset, but the full table is large enough to "freeze" Access.
How long have you let Access remain frozen before killing the process? Be patient... like many, many minutes (or hours). Start it in the morning and check after lunch. :) It might eventually return a result set. This does not imply it is tolerable or that there is no other solution, but it can be useful to know if it eventually returns data or not.
How many possible records are there?
Are the imported data properly indexed? Add indexes to all key fields and those which are used in WHERE clauses.
Is the database located on a network share or is it local? Try copying the database to a local drive.
Other hints:
Try the BETWEEN operator for dates in the WHERE clause.
Try refactoring the "second" query by performing a join in the FROM clause rather than the WHERE clause. In doing this, you may also want to save the subquery as a named query (just as [TIME1] is saved). Whether or not a query is saved or embedded in another statement CAN change the behavior of Access (see caveat) even though the results should be identical.
Here's a version with the embedded aggregate query. Notice how all column references are qualified with the source. Some of the original query's columns do not have a source alias prefixing the column name. Remember the caveat... such picky details can affect Access behavior.:
SELECT TIME1.*
FROM TIME1 INNER JOIN
(SELECT UQ.PPERIOD, UQ.PC, Max(UQ.RCD_NUM) As Max_RCD_NUM
FROM [TIME1] UQ
GROUP BY UQ.PPERIOD, UQ.PC) As TIMEAGG
ON (TIME1.PPERIOD = TIMEAGG.PPERIOD) And (TIME1.PC = TIMEAGG.PC)
AND (TIME1.RCD_NUM = TIMEAGG.Max_RCD_NUM)

How to make Hive Terminal show rows (not just headers) after code is run?

As of now, Hive Terminal is showing only column headers after a create table code is run. What settings should I change to make Hive Terminal show few rows also, say first 100 rows?
Code I am using to create table t2 from table t1 which resides in the database (I don't know how t1 is created):
create table t2 as
select *
from t1
limit 100;
Now while development, I am writing select * from t2 limit 100; after each create table section to get the rows with headers.
You cannot
The Hive Create Table documentation does not mention anything about showing records. This, combined with my experience in Hive makes me quite confident that you cannot achieve this by mere regular config changes.
Of course you could tap into the code of hive itself, but that is not something to be attempted lightly.
And you should not want to
Changing the create command could lead to all kinds of problems. Especially because unlike the select command, it is in fact an operation on metadata, followed by an insert. Both of these normally would not show you anything.
If you would create a huge table, it would be problematic to show everything. If you choose always to just show the first 100 rows, that would be inconsistent.
There are ways
Now, there are some things you could do:
Change hive itself (not easy, probably not desirable)
Do it in 2 steps (what you currently do)
Write a wrapper:
If you want to automate things and don't like code duplication, you can look into writing a small wrapper function to call the create and select based on just the input of source (and limit) and destination.
This kind of wrapper could be written in bash, python, or whatever you choose.
However, note that if you like executing the commands ad-hoc/manually this may not be suitable, as you will need to start a hive JVM each time you run such a program and thus response time is expected to be slow.
All in all you are probably best off just doing the create first and select second.
The below command mentioned seems to be correct to show the first 100 rows:
select * from <created_table> limit 100;
Paste the code you have written to create the table will help to diagnose the issue in hand!!
Nevertheless , check if you have correctly mentioned the delimiters for the elements, key-value pairs, collection items etc while creating the table.
If you have not defined them correctly you might end up with having only the first row(header) being shown.

Out of the two sql queries below , suggest which one is better one. Single query with join or two simple queries?

Assuming result of first query in A) (envelopecontrolnumber,partnerid,docfileid) = (000000400, 31,35)
A)
select envelopecontrolnumber, partnerid, docfileid
from envelopeheader
where envelopeid ='LT01ENV1107010000050';
select count(*)
from envelopeheader
where envelopecontrolnumber = '000000400'
and partnerid= 31 and docfileid<>35 ;
or
B)
select count(*)
from envelopeheader a
join envelopeheader b on a.envelopecontrolnumber = b.envelopecontrolnumber
and a.partnerid= b.partnerid
and a.envelopeid = 'LT01ENV1107010000050'
and b.docfileid <> a.docfileid;
I am using the above query in a sql function. I tried the queries in pgAdmin(postgres), it shows 16ms for A) and B). When I tried queries from B) separately on pgadmin. It still shows 16 ms separately for each one - making 32ms for B) - Which is wrong because when you run both the queries in one go from B), it shows 16 ms. Please suggest which one is better. I am using postgres database.
The time displayed includes time to :
send query to server
parse query
plan query
execute query
send results back to client
process all results
Try a simple query like "SELECT 1". You'll probably get 16 ms too.
It's quite likely you are simply measuring the ping time to your server.
If you want to know how much time on the server a query uses, you need EXPLAIN ANALYZE.
Option 1:
Run query A.
Get results.
Use these results to create query B.
Send query B.
Get results.
Option 2:
Run combined query AB.
Get results.
So, if you are using this from a client, connecting to Postgres, use the second option. There is an overhead for sending a query to the db and getting results back.
If you are using it inside an SQL function or procedure, the difference is probably negligible. I would still use the second option though. And in either case, I would check that queries B or AB are optimized (checked query plan, if indexes are used, etc).
Go option 1: the two queries are unrelated, so more efficient to do them separately.
Option A will be faster since you are interested in the count.
The join will create a temporary structure for join the data based on conditions and then performs the counting operation.
Hence option A is better and faster.

How bad is my query?

Ok I need to build a query based on some user input to filter the results.
The query basically goes something like this:
SELECT * FROM my_table ORDER BY ordering_fld;
There are four text boxes in which users can choose to filter the data, meaning I'd have to dynamically build a "WHERE" clause into it for the first filter used and then "AND" clauses for each subsequent filter entered.
Because I'm too lazy to do this, I've just made every filter an "AND" clause and put a "WHERE 1" clause in the query by default.
So now I have:
SELECT * FROM my_table WHERE 1 {AND filters} ORDER BY ordering_fld;
So my question is, have I done something that will adversely affect the performance of my query or buggered anything else up in any way I should be remotely worried about?
MySQL will optimize your 1 away.
I just ran this query on my test database:
EXPLAIN EXTENDED
SELECT *
FROM t_source
WHERE 1 AND id < 100
and it gave me the following description:
select `test`.`t_source`.`id` AS `id`,`test`.`t_source`.`value` AS `value`,`test`.`t_source`.`val` AS `val`,`test`.`t_source`.`nid` AS `nid` from `test`.`t_source` where (`test`.`t_source`.`id` < 100)
As you can see, no 1 at all.
The documentation on WHERE clause optimization in MySQL mentions this:
Constant folding:
(a<b AND b=c) AND a=5
-> b>5 AND b=c AND a=5
Constant condition removal (needed because of constant folding):
(B>=5 AND B=5) OR (B=6 AND 5=5) OR (B=7 AND 5=6)
-> B=5 OR B=6
Note 5 = 5 and 5 = 6 parts in the example above.
You can EXPLAIN your query:
http://dev.mysql.com/doc/refman/5.0/en/explain.html
and see if it does anything differently, which I doubt. I would use 1=1, just so it is more clear.
You might want to add LIMIT 1000 or something, when no parameters are used and the table gets large, will you really want to return everything?
WHERE 1 is a constant, deterministic expression which will be "optimized out" by any decent DB engine.
If there is a good way in your chosen language to avoid building SQL yourself, use that instead. I like Python and Django, and the Django ORM makes it very easy to filter results based on user input.
If you are committed to building the SQL yourself, be sure to sanitize user inputs against SQL injection, and try to encapsulate SQL building in a separate module from your filter logic.
Also, query performance should not be your concern until it becomes a problem, which it probably won't until you have thousands or millions of rows. And when it does come time to optimize, adding a few indexes on columns used for WHERE and JOIN goes a long way.
TO improve performance, use column indexes on fields listen in "WHERE"
Standard SQL Injection Disclaimers here...
One thing you could do, to avoid SQL injection since you know it's only four parameters is use a stored procedure where you pass values for the fields or NULL. I am not sure of mySQL stored proc syntax, but the query would boil down to
SELECT *
FROM my_table
WHERE Field1 = ISNULL(#Field1, Field1)
AND Field2 = ISNULL(#Field2, Field2)
...
ORDRE BY ordering_fld
We've been doing something similiar not too long ago and there're a few things that we observed:
Setting up the indexes on the columns we were (possibly) filtering, improved performance
The WHERE 1 part can be left out completely if the filters're not used. (not sure if it applies to your case) Doesn't make a difference, but 'feels' right.
SQL injection shouldn't be forgotten
Also, if you 'only' have 4 filters, you could build up a stored procedure and pass in null values and check for them. (just like n8wrl suggested in the meantime)
That will work - some considerations:
About dynamically built SQL in general, some databases (Oracle at least) will cache execution plans for queries, so if you end up running the same query many times it won't have to completely start over from scratch. If you use dynamically built SQL, you are creating a different query each time so to the database it will look like 100 different queries instead of 100 runs of the same query.
You'd probably just need to measure the performance to find out if it works well enough for you.
Do you need all the columns? Explicitly specifying them is probably better than using * anyways because:
You can visually see what columns are being returned
If you add or remove columns to the table later, they won't change your interface
Not bad, i didn't know this snippet to get rid of the 'is it the first filter 3' question.
Tho you should be ashamed of your code ( ^^ ), it doesn't do anything to performance as any DB Engine will optimize it.
The only reason I've used WHERE 1 = 1 is for dynamic SQL; it's a hack to make appending WHERE clauses easier by using AND .... It is not something I would include in my SQL otherwise - it does nothing to affect the query overall because it always evaluates as being true and does not hit the table(s) involved so there aren't any index lookups or table scans based on it.
I can't speak to how MySQL handles optional criteria, but I know that using the following:
WHERE (#param IS NULL OR t.column = #param)
...is the typical way of handling optional parameters. COALESCE and ISNULL are not ideal because the query is still utilizing indexes (or worse, table scans) based on a sentinel value. The example I provided won't hit the table unless a value has been provided.
That said, my experience with Oracle (9i, 10g) has shown that it doesn't handle [ WHERE (#param IS NULL OR t.column = #param) ] very well. I saw a huge performance gain by converting the SQL to be dynamic, and used CONTEXT variables to determine what to add. My impression of SQL Server 2005 is that these are handled better.
I have usually done something like this:
for(int i=0; i<numConditions; i++) {
sql += (i == 0 ? "WHERE " : "AND ");
sql += dbFieldNames[i] + " = " + safeVariableValues[i];
}
Makes the generated query a little cleaner.
One alternative i sometimes use is to build the where clause an an array and then join them together:
my #wherefields;
foreach $c (#conditionfields) {
push #wherefields, "$c = ?",
}
my $sql = "select * from table";
if(#wherefields) { $sql.=" WHERE " . join (" AND ", #wherefields); }
The above is written in perl, but most languages have some kind of join funciton.