Many identical Exists queries on ActiveRecord save?

I have a table with 200,000 rows.
When I do an insert/update via an ActiveRecord model, I see close to 20 identical Exists queries, each taking nearly 100 ms!
Domain Load (0.5ms) SELECT "domains".* FROM "domains" WHERE "domains"."name" = 'sgsgroup.in' LIMIT 1
(0.1ms) BEGIN
Domain Exists (90.7ms) SELECT 1 AS one FROM "domains" WHERE LOWER("domains"."name") = LOWER('sgsgroup.in') LIMIT 1
Domain Exists (89.4ms) SELECT 1 AS one FROM "domains" WHERE LOWER("domains"."name") = LOWER('sgsgroup.in') LIMIT 1
Domain Exists (91.6ms) SELECT 1 AS one FROM "domains" WHERE LOWER("domains"."name") = LOWER('sgsgroup.in') LIMIT 1
[...]
Domain Exists (89.7ms) SELECT 1 AS one FROM "domains" WHERE LOWER("domains"."name") = LOWER('sgsgroup.in') LIMIT 1
Domain Exists (89.2ms) SELECT 1 AS one FROM "domains" WHERE LOWER("domains"."name") = LOWER('sgsgroup.in') LIMIT 1
SQL (0.6ms) INSERT INTO "domains" (....
I already have an index on "name" in the domains table. Any ideas what's happening here and how to optimize these record updates?
Also, is it normal to have identical queries like these on a record update?

Well, it could be down to a case-insensitive uniqueness validation on the name. Subsequent executions ought to be cached, though -- I don't know why they're not.
You might like to see if your RDBMS supports creating an index on LOWER(name), as it's possible that a regular index on name is not being used for these queries.
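For example, on PostgreSQL (which the quoted identifiers in your log suggest), an expression index would let those LOWER(...) lookups be served by an index; a minimal sketch, with a made-up index name:
-- expression index so LOWER("name") comparisons can use an index
-- (PostgreSQL syntax; adjust for your RDBMS)
CREATE INDEX index_domains_on_lower_name ON domains (LOWER(name));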

Related

Query with JOIN or WHERE (SELECT COUNT(*) ...) >= 1?

I have a database schema which contains about 20 tables. For the sake of my question, I'll simplify it to only 3 tables:
* posts
    id
    title
    ...
* posts_users
    post_id
    user_id
    status (draft, published, etc.)
    ...
* users
    id
    username
    ...
For reasons outside the scope of this question, posts and users have a many-to-many relationship, and the status field is part of posts_users (it could have been in the posts table).
I'd like to get the published posts, and I hesitate between two kinds of query:
SELECT posts.*
FROM posts
INNER JOIN posts_users ON posts_users.post_id = posts.id
WHERE status = 'published'
or
SELECT posts.*
FROM posts
WHERE (
    SELECT COUNT(*)
    FROM posts_users
    WHERE post_id = posts.id
      AND status = 'published'
) >= 1
(I have simplified my question, but in reality posts are linked to far more data to filter on.)
My DB is SQLite. My questions are:
What is the difference?
Which way of querying is best in terms of performance?
These queries have different semantics: The first query returns multiple rows if more than one user has published a post (if that is even possible).
The SQLite query optimizer usually cannot rewrite very much, so what you write is likely to be how it is executed. So your second query will count all matching posts_users entries, which is not necessary if you only want to find out whether there is at least one. It would be better to use EXISTS for that.
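For illustration, the EXISTS form of your second query would look like this:
SELECT posts.*
FROM posts
WHERE EXISTS (SELECT 1
              FROM posts_users
              WHERE post_id = posts.id
                AND status = 'published');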
An even simpler way to write the second query would be:
SELECT *
FROM posts
WHERE id IN (SELECT post_id
             FROM posts_users
             WHERE status = 'published');
(This is one case where SQLite will rewrite it as a correlated subquery, if it estimates it to be more efficient.)
Ultimately, all these queries have to look up the same rows and will have similar performance; what matters most is that you have proper indexes. (But in this case, if most posts are published, an index on status would not help.)
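For example, a composite index that lets the subquery seek on status and return post_id without touching the table might look like this (a sketch; the index name is my own):
-- covers both the filter column and the join column of posts_users
CREATE INDEX idx_posts_users_status_post_id ON posts_users (status, post_id);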
The performance of these queries depends on the row and column counts of your tables.
Query 1 performs a join; in the worst case it can produce:
Output.rows = tableA.rows * tableB.rows
Output.columns = tableA.columns + tableB.columns
Query 2, with the COUNT subquery, only ever returns rows and columns from posts:
Output.rows <= tableA.rows
Output.columns = tableA.columns
I recommend query 2 for better performance.

SQL Server 2008 - Slow With Filter A Specific Value

I am facing a weird issue and hoping that someone can advise me on where/how to troubleshoot it.
Basically, I have a big query joining multiple tables, wrapped in an outer query that does the filtering. This query returns really fast when selecting everything (no filter) or when filtering with Region = 1.
So if I run these queries, they run fast as normal.
Select * from (
select query...... join multiple tables.
Return all records.
) a
OR
Select * from (
select query...... join multiple tables.
Return all records with RegionID = 1.
) a
Where Region = 1
However, when I change it to Region = 2, it is very slow. Note that the Region value is either 1 or 2. I am not sure why it becomes slow when I just change the value from 1 to 2. Any thoughts?
Select * from (
select query...... join multiple tables.
Return all records with RegionID = 2.
) a
Where Region = 2
Thank you,
Maybe you executed the "Region = 1" query several times and its result has been cached.
Have you tried clearing the cache?
CHECKPOINT;
GO
DBCC DROPCLEANBUFFERS;
GO
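Note that DBCC DROPCLEANBUFFERS only clears cached data pages; if a cached execution plan is the suspect instead (an assumption, not something established above), the plan cache can be cleared as well:
-- clears all cached query plans (use with care on production servers)
DBCC FREEPROCCACHE;
GO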

Select data excluding a subset of records with LINQ with minimal overhead

I initialize a GridView with the top 250 most recent records (from the Documents table).
I developed an "add all" button in order to be able to load all entries (on average the total number of returned records varies from 400 to 3000).
Instead of loading the complete record set, I would prefer an incremental approach, adding only the remaining records and not the ones already loaded.
However if I use the following query:
SELECT d.ItemID FROM Documents d
WHERE NOT EXISTS
( SELECT ItemID FROM Documents WHERE CategoryID = d.CategoryID )
I would need to pass the 250 IDs from the client to the server, execute the query and the subquery, and then return the result.
How could I optimize this procedure? Or, in this case, is it fine to just return the whole record set directly?
Will the top 250 records be the same between the first and second query? In other words, is the data likely to change between the first call and the second?
If not, you could simply use the first query to exclude rows from the second (assuming T-SQL):
SELECT
    d.ItemID
FROM
    Documents d
WHERE
    d.ItemID NOT IN
    (
        -- this would be whatever your first query was
        SELECT TOP 250 ItemID
        FROM Documents
        WHERE CategoryID = d.CategoryID
        ORDER BY ItemID DESC
    )

How to do it in mysql: If id=idold THEN UPDATE status=1

I would like to compare two tables and then update a column if some logic is true.
In pseudo code:
SELECT * FROM users, usersold IF users.id=usersold.id THEN UPDATE users.status=1;
Is there a way to do it in MySQL?
UPDATE users u
SET status = 1
WHERE EXISTS (SELECT id FROM usersold WHERE id = u.id)
Alternate version:
UPDATE users
SET status = 1
WHERE id IN (SELECT id FROM usersold)
You should test and, depending on your database, you may find one performs better than the other, although I expect any decent database will optimize them to be much the same anyway.
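MySQL also supports a multi-table UPDATE with a join, which is worth benchmarking alongside the two versions above; a sketch using the same tables:
-- join-based variant of the same update
UPDATE users u
INNER JOIN usersold o ON o.id = u.id
SET u.status = 1;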

Bulk delete in a performant, incremental fashion

We want to delete some matching rows within the same table, and this seems to have a performance issue as the table has 1 billion rows.
Since it is an Oracle database, we can use PL/SQL to delete incrementally, but we want to see what options are available using just SQL to improve the performance.
DELETE
FROM schema.adress
WHERE key = 6776
  AND matchSequence = 1
  AND EXISTS
  (
    SELECT 1
    FROM schema.adress t2
    WHERE t2.flngEntityKey = 9909
      AND t2.matchType = 'NEW'
      AND t2.matchType = schema.adress.matchType
      AND t2.key = schema.adress.key
      AND t2.sequence = schema.adress.sequence
  )
Additional details
Cardinality is 900 Million rows
No triggers
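If a single SQL statement turns out to be too slow or to strain undo, a common incremental pattern is a batched PL/SQL delete; a sketch assuming the same predicate as above and an arbitrary batch size of 10,000 rows:
BEGIN
  LOOP
    DELETE FROM schema.adress
    WHERE key = 6776
      AND matchSequence = 1
      AND EXISTS
      (
        SELECT 1
        FROM schema.adress t2
        WHERE t2.flngEntityKey = 9909
          AND t2.matchType = 'NEW'
          AND t2.matchType = schema.adress.matchType
          AND t2.key = schema.adress.key
          AND t2.sequence = schema.adress.sequence
      )
      AND ROWNUM <= 10000;       -- batch size is an arbitrary assumption
    EXIT WHEN SQL%ROWCOUNT = 0;  -- stop when nothing is left to delete
    COMMIT;                      -- commit between batches to release undo
  END LOOP;
  COMMIT;
END;
/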