I am performing a simple like query such as
SELECT * FROM table WHERE column LIKE '%searchterm%';
on the same data imported to a SQLite DB and a Postgres DB.
However, the number of results varies between the two databases.
I tried googling, but I couldn't really find out whether there are any major implementation differences.
One of the main differences you'll find is that Postgres LIKE queries are case-sensitive, while SQLite's are not (at least for ASCII characters). You'll need to use ILIKE to get a case-insensitive match in Postgres.
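For example (table and column names here are hypothetical), a quick sketch of how the two databases treat the same pattern and how to make the behaviour line up:

-- In Postgres, LIKE is case-sensitive, so 'SearchTerm' would not match '%searchterm%';
-- SQLite's default LIKE would match it for ASCII characters.
SELECT * FROM items WHERE name LIKE '%searchterm%';

-- Case-insensitive match in Postgres:
SELECT * FROM items WHERE name ILIKE '%searchterm%';

-- Portable alternative that behaves the same in both databases:
SELECT * FROM items WHERE lower(name) LIKE '%searchterm%';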
I have a large table (100 million rows) which is properly indexed in a traditional RDBMS system (Oracle, MySQL, Postgres, SQL Server, etc.). I would like to perform a SELECT query which can be formulated with either of the following criteria options:
One that can be represented by a single criterion:
LIKE "T40%"
which only looks for matches at the beginning of the string field due to the wildcard
or
One that requires a list of say 200 exact criteria:
WHERE IN("T40.x21","T40.x32","T40.x43")
etc.
All other things being equal, which should I expect to be more performant?
Assuming that both queries return the same set of rows (i.e. the list of items that you supply in the IN expression is exhaustive), you should expect almost identical performance, perhaps with some advantage for the LIKE query.
RDBMS engines have long used index searches for begins-with LIKE queries, so LIKE 'T40%' will produce records after an index search.
Your IN query would be optimized for an index search as well, perhaps giving the RDBMS tighter lower and upper bounds. However, there would be an additional filtering step to eliminate records outside your IN list, which is a waste of CPU cycles under the assumption that all rows would be returned anyway.
If you parameterize your queries, the second one also becomes harder to pass to the RDBMS from your host program. All other things being equal, I would use LIKE.
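One way to check this on your own system (a sketch only; the syntax shown is Postgres's EXPLAIN, and the table, column, and index names are hypothetical; Oracle and SQL Server have EXPLAIN PLAN / SHOWPLAN equivalents):

-- With an index on code, both plans should show an index (range) scan;
-- the IN version additionally filters down to the listed values.
-- (In Postgres, a text_pattern_ops index or C collation may be needed
-- for the LIKE predicate to use the index at all.)
EXPLAIN SELECT * FROM big_table WHERE code LIKE 'T40%';
EXPLAIN SELECT * FROM big_table WHERE code IN ('T40.x21', 'T40.x32', 'T40.x43');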
I would suggest going with the LIKE operator; if the search term itself contains wildcard characters, the ESCAPE option can be used with a '\' symbol so they are matched literally, making the match on the character string more exact.
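For what it's worth, a small sketch of the ESCAPE clause referred to above (table and column names hypothetical); it only matters when the search term itself contains wildcard characters:

-- Match codes that literally start with 'T40%': the backslash makes the first % a literal character
SELECT * FROM big_table WHERE code LIKE 'T40\%%' ESCAPE '\';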
Quest Software\Knowledge Xpert states:
If two identical SQL statements vary because an identical table has two different aliases, then the SQL is different and will not be shared.
What sense does this make?
I understand that if I have table A and table B and I fail to alias an ambiguous column, what I'm trying to do is mathematically ambiguous, but the names of the aliases themselves shouldn't matter, should they? Why would SQL/Oracle care that table A's alias is FOO in one statement and BAR in another when determining, for caching purposes, whether they are identical?
Along similar lines, why should whitespace or word case matter at all?
"SQL cannot be shared within the SGA unless it is absolutely identical. Statement components that must be the same include:
Word case (uppercase and lowercase characters)
Whitespace
Underlying schema objects"
Underlying schema objects make sense, because mathematically that really is something different. Is the idea that I might be an idiot and have columns named "Foo", "FOO", and "foo", and we don't want to accidentally cache?
I think it's to avoid the extra overhead of "normalizing" each SQL statement before creating a SQL_ID.
The SQL_ID is a hash of the SQL statement. To do what you are asking, the SQL parser would have to do extra work (for limited benefit) to normalize each statement into a uniform form that would compare exactly with an equivalent statement that had mixed case, extra spaces, etc.
I think these restrictions are due to the SQL processing mechanism Oracle uses. It calculates a hash value of the query text, and if this hash matches one stored in the SGA, the hard-parsing steps are avoided. More details are here.
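A hedged illustration of the effect (the EMPLOYEES table here is hypothetical): the two statements below are logically identical but differ only in case, so they hash to different SQL_IDs and each gets its own cursor and parse.

SELECT * FROM employees WHERE department_id = 10;
select * from employees where department_id = 10;

-- Both show up as separate cursors in the shared pool:
SELECT sql_id, sql_text
FROM   v$sql
WHERE  upper(sql_text) LIKE 'SELECT * FROM EMPLOYEES WHERE DEPARTMENT_ID = 10%';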
I am using Oracle 11g. My requirement is to compare the data of two different DBs. There are around 350 tables in each DB, and of these, approximately 40 tables have more than 1 million records. For the data comparison, I wrote a Perl script that compares using hashes and tested it with a few files. I also tried the Unix awk command to check performance, asked this forum about a Unix solution, and got excellent help.
Now my problem is to find out the best way to extract data from Tables to files.
Both DBs have the same number of tables, and each table has the same number of columns in both DBs, i.e. the layout in both DBs is exactly the same.
The options I have thought of and searched for are:
1) using SQL*Loader - I think performance will be bad in this case
2) using Data Pump - not sure if I can extract a subset of columns via SQL using Data Pump and write them to a text file
3) using BULK COLLECT - same as above. Is it possible to extract each table, and from each table a set of columns? If yes, how can it be done, and what would the performance be?
4) SQL*Plus or anything else. I cannot download any software for this onto my machine.
Basic SQL for selecting a set of columns from each table in both DBs can be written easily. I am looking for the best approach to export the data into flat files.
Please suggest.
You can do it in the database using DBMS_COMPARISON.
http://docs.oracle.com/cd/E11882_01/appdev.112/e25788/d_comparison.htm#CHDHEFFJ
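A minimal sketch of how DBMS_COMPARISON might be used, assuming a database link called REMOTE_DB already exists and both schemas contain the same table (the comparison, schema, and table names below are hypothetical; the package also needs a usable index, typically the primary key, on the compared columns):

DECLARE
  consistent  BOOLEAN;
  scan_info   DBMS_COMPARISON.COMPARISON_TYPE;
BEGIN
  -- Define the comparison between the local table and the one over the DB link
  DBMS_COMPARISON.CREATE_COMPARISON(
    comparison_name => 'CMP_MYTABLE',
    schema_name     => 'LOCAL_SCHEMA',
    object_name     => 'MYTABLE',
    dblink_name     => 'REMOTE_DB');

  -- Run it; perform_row_dif => TRUE records the individual differing rows
  consistent := DBMS_COMPARISON.COMPARE(
    comparison_name => 'CMP_MYTABLE',
    scan_info       => scan_info,
    perform_row_dif => TRUE);

  IF consistent THEN
    DBMS_OUTPUT.PUT_LINE('Tables match');
  ELSE
    DBMS_OUTPUT.PUT_LINE('Differences found; scan id ' || scan_info.scan_id);
  END IF;
END;
/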
The fundamental approach that is least expensive (to the developer, certainly) is to compare sets of things rather than string representations of files. Nothing can compare sets of things faster (and in less code) than the database itself. Database links, together with wise use of the MINUS and INTERSECT operators, are a very powerful means to this end.
Try the SQL below; this should be the fastest approach, as you will be working inside the database. Access the table in the other database over the DB link.
select *
from
(
  ( select * from Table_In_Schema1
    minus
    select * from Table_In_Schema2 )
  union all
  ( select * from Table_In_Schema2
    minus
    select * from Table_In_Schema1 )
)
I built sqlite3 from source to include FTS3 support and then created a new table in an existing SQLite database containing 1.5 million rows of data, using
CREATE VIRTUAL TABLE data USING FTS3(codes text);
Then used
INSERT INTO data(codes) SELECT originalcodes FROM original_data;
Then queried each table with
SELECT * FROM original_data WHERE originalcodes='RH12';
This comes back instantly, as I have an index on that column.
The query on the FTS3 table
SELECT * FROM data WHERE codes='RH12';
takes almost 28 seconds.
Can someone help explain what I have done wrong, as I expected this to be significantly quicker?
The documentation explains:
FTS tables can be queried efficiently using SELECT statements of two different forms:
Query by rowid. If the WHERE clause of the SELECT statement contains a sub-clause of the form "rowid = ?", where ? is an SQL expression, FTS is able to retrieve the requested row directly using the equivalent of an SQLite INTEGER PRIMARY KEY index.
Full-text query. If the WHERE clause of the SELECT statement contains a sub-clause of the form "<column> MATCH ?", FTS is able to use the built-in full-text index to restrict the search to those documents that match the full-text query string specified as the right-hand operand of the MATCH clause.
If neither of these two query strategies can be used, all queries on FTS tables are implemented using a linear scan of the entire table.
For an efficient query, you should use
SELECT * FROM data WHERE codes MATCH 'RH12'
but this will find all records that contain the search string.
To do 'normal' queries efficiently, you have to keep a copy of the data in a normal table.
(If you want to save space, you can use a contentless or external content table.)
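A rough sketch of the external content variant (note that the content= option is an FTS4 feature, so this assumes the build also includes FTS4):

-- The FTS index refers back to original_data instead of storing a second copy of the text
CREATE VIRTUAL TABLE data USING fts4(content="original_data", originalcodes);

-- Populate / rebuild the full-text index from the content table
INSERT INTO data(data) VALUES('rebuild');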
You should read the documentation more carefully.
Any query against a virtual FTS table using WHERE col = 'value' will be slow (except for a query against ROWID), but a query using WHERE col MATCH 'value' will use the full-text index and be fast.
I'm not an expert on this, but here are a few things to think about.
Your test is flawed (I think). You are contrasting a scenario where you have an exact text match (the index can be used on original_data; nothing is going to outperform this scenario) with an equality on the FTS3 table (I'm not sure that FTS3 even comes into play in this type of query). If you want to compare apples to apples (to see the benefit of FTS3), you'll want to compare a LIKE operation on original_data against the FTS3 MATCH operation on data.
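In other words, a fairer comparison (using the tables from the question) would be something like:

-- Substring search on the plain table: the index on originalcodes cannot help here
SELECT * FROM original_data WHERE originalcodes LIKE '%RH12%';

-- Token search on the FTS3 table: uses the full-text index
SELECT * FROM data WHERE codes MATCH 'RH12';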
I need to extract information from a text field which can contain one of many values. The SQL looks like:
SELECT fieldname
FROM table
WHERE bigtextfield LIKE '%val1%'
OR bigtextfield LIKE '%val2%'
OR bigtextfield LIKE '%val3%'
.
.
.
OR bigtextfield LIKE '%valn%'
My question is: how efficient is this when the number of values approaches the hundreds, and possibly thousands? Is there a better way to do this?
One solution would be to create a new table/column with just the values I'm after and doing the following:
SELECT fieldname
FROM othertable
WHERE value IN ('val1', 'val2', 'val3', ... 'valn')
I imagine this is a lot more efficient, as it only has to do exact string matching. The problem with this is that it will be a lot of work to keep this table up to date.
By the way, I'm using MS SQL Server 2005.
This functionality is already present in most SQL engines, including MS SQL Server 2005. It's called full-text indexing; here are some resources:
developer.com: Understanding SQL Server Full-Text Indexing
MSDN article: Introduction to Full-Text Search
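Once a full-text index exists on the column, the query from the question could be expressed with CONTAINS instead of the chain of LIKEs. A sketch (assuming the full-text catalog and index have already been created on bigtextfield; the table name is bracketed because it comes straight from the question and TABLE is a reserved word):

SELECT fieldname
FROM   [table]
WHERE  CONTAINS(bigtextfield, '"val1" OR "val2" OR "val3"');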
I don't think the main problem is the number of criteria values, but the sheer fact that a WHERE clause with bigtextfield LIKE '%val1%' can never really be very efficient, even with just a single value.
The trouble is that if you have a wildcard like '%' at the beginning of your search term, all the indexes are out the window and cannot be used any more.
So you're basically searching each and every entry in your table, doing a full table scan in the process. Your performance then depends mostly on the number of rows in your table.
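To illustrate (assuming bigtextfield is of an indexable type and actually has an index; the table name is bracketed since it comes from the question):

-- Can use an index seek: the pattern is anchored at the start of the value
SELECT fieldname FROM [table] WHERE bigtextfield LIKE 'val1%';

-- Cannot: the leading % forces a scan of every row (or of the whole index)
SELECT fieldname FROM [table] WHERE bigtextfield LIKE '%val1%';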
I would support intgr's recommendation - if you need to do this frequently, have a serious look at fulltext indexing.
This will inevitably require a full scan (over the table or over an index) with a filter.
The IN condition won't help here, since it does not work with LIKE.
You could do something like this:
SELECT  *
FROM    master
WHERE   EXISTS
        (
        SELECT  NULL
        FROM    [values]
        WHERE   master.name LIKE '%' + [values].value + '%'
        )
but this will hardly be more efficient.
All literal conditions will be transformed into a Constant Scan, which is just like selecting from the same table, but built in memory.
The best solution for this is to redesign: get rid of the field that stores multiple values and make it a related table instead. Storing multiple values in one field violates one of the first rules of database design.
You should not be storing multiple values in one field, and dead-slow queries like this are the reason why. If you can't do that, then full-text indexing is your only hope.
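A rough sketch of what that redesign might look like (names are hypothetical, reusing the master table from the EXISTS example above): each value gets its own row in a child table, and the search becomes an indexable IN or join instead of LIKE '%...%'.

-- One row per value instead of many values packed into one field
CREATE TABLE master_values (
    master_id INT          NOT NULL REFERENCES master(id),
    value     VARCHAR(100) NOT NULL
);
CREATE INDEX ix_master_values_value ON master_values (value);

-- The search is now an exact-match lookup that can use the index
SELECT DISTINCT m.fieldname
FROM   master m
JOIN   master_values mv ON mv.master_id = m.id
WHERE  mv.value IN ('val1', 'val2', 'val3');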