What is a better way of writing this Oracle SQL query?

Somehow, this does not seem very efficient. Can it be optimized or made more efficient?
SELECT DISTINCT p.col1 from table1 p where p.col1 not in
(SELECT DISTINCT o.col1 from table1 o where o.col2 = 'ABC')
For example: select all supermarkets that do not have product = soap.

You want all col1 values where col2 is never 'ABC'. You can approach this with aggregation:
select p.col1
from table1 p
group by p.col1
having sum(case when p.col2 = 'ABC' then 1 else 0 end) = 0;
Why is this faster? Well, there are cases where it won't be, but it often will be. A SELECT DISTINCT is doing an aggregation anyway, so other methods that use joins or IN are adding extra work. That extra work is only worth it if it significantly reduces the amount of data being processed.
Also, NOT IN is semantically dangerous. If any value of col1 is NULL where col2 = 'ABC', then the NOT IN comparison is never true and all data is filtered out; that is, the query returns no rows at all (an empty result is computed very quickly, but it is hardly what you want). This formulation assumes that col1 is never NULL in this case.
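If you'd rather keep the NOT IN form, a minimal guard (using the same table1 as above) is to filter the NULLs out of the subquery:
select distinct p.col1
from table1 p
where p.col1 not in
      (select o.col1 from table1 o where o.col2 = 'ABC' and o.col1 is not null);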
Finally, if you have a list of col1 values that is already unique, then the fastest method is probably:
select c.col1
from col1table c
where not exists (select 1 from table1 o where o.col1 = c.col1 and o.col2 = 'ABC')
For this query, an index on table1(col1, col2) is optimal for performance.
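For reference, that index could be created like this (the index name is just an example):
create index table1_col1_col2_ix on table1 (col1, col2);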

Did you try just querying with a NOT clause?
i.e.
select distinct col1 from table1 where col2 <> 'ABC'
(Note that this is not quite equivalent: a col1 value that has rows for both 'ABC' and other col2 values will still be returned, whereas the original query excludes it.)

I would structure that along the lines of:
select supermarkets.*
from supermarkets
where not exists (
select 1
from product_in_supermarkets
where product_in_supermarkets.supermarket_id = supermarkets.id and
product_in_supermarkets.product_type = 'soap')
Have an index on:
product_in_supermarkets(supermarket_id, product_type)
for best performance.
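That index could be created as follows (the index name here is illustrative):
create index pis_supermarket_product_ix
    on product_in_supermarkets (supermarket_id, product_type);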
Now having said that, it could be that under the right circumstances a NOT EXISTS and a NOT IN query get transformed to be the same, and an anti-join would be executed. Semantically I like the correlated subquery with not exists, as I think it better represents the intent of the query.
NOT IN is also susceptible to unexpected effects should there be a null value in the projection from the subquery, as no value can be said to be not in a list that includes NULL (including NULL).
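You can see this with a one-line experiment in Oracle; the subquery projects a NULL, so the NOT IN predicate can never be true:
select 'never returned' from dual
where 1 not in (select null from dual);
-- returns no rows: 1 <> NULL evaluates to UNKNOWN, so the row is filtered out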

I think you should consider creating an index on col1.
I'd also try using
select distinct p.col1 from table1 p where not exists
(select distinct o.col1 from table1 o where o.col1 = p.col1 and o.col2 = 'ABC');
Also, depending on the number of rows and the data's entropy, dropping the DISTINCT from the inner query can sometimes be a useful trade-off.
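For example, the NOT EXISTS variant without the inner DISTINCT (same tables as above) lets the database stop at the first matching row:
select distinct p.col1
from table1 p
where not exists
      (select 1 from table1 o where o.col1 = p.col1 and o.col2 = 'ABC');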


Returning only duplicate rows from two tables

Every thread I've seen so far has been about checking for duplicate rows and avoiding them. I'm trying to get a query to only return the duplicate rows. I thought it would be as simple as a subquery, but I was wrong. Then I tried the following:
SELECT * FROM a
WHERE EXISTS
(
SELECT * FROM b
WHERE b.id = a.id
)
That was a bust too. How do I return only the duplicate rows? I'm currently going through two tables, but I'm afraid there is a large number of duplicates.
Use this query; it may be better if you compare only the relevant columns.
SELECT * FROM a
INTERSECT
SELECT * FROM b
I am sure your posted code would work too, like:
SELECT * FROM a
WHERE EXISTS
(
SELECT 1 FROM b WHERE id = a.id
)
You can also do an INNER JOIN:
SELECT a.* FROM a
JOIN b on a.id = b.id;
You can also use an IN operator:
SELECT * FROM a where id in (select id from b);
If none of those fit, you can use UNION ALL (provided both tables satisfy the union-compatibility restriction) along with the ROW_NUMBER() function; an id that appears more than once across the two tables gets a row number greater than 1:
SELECT * FROM (
    SELECT xx.*,
           ROW_NUMBER() OVER(PARTITION BY id ORDER BY id) AS rn
    FROM (
        select * from a
        union all
        select * from b) xx ) yy
WHERE rn > 1;
Note: there's an ambiguity as to what you mean by a duplicate row, and whether you're talking about duplicate keys, or all fields being the same. My answer deals with all fields being the same; some of the others are assuming it's just the keys. It's unclear which you intend.
You might try
SELECT a.id, a.col1, a.col2 FROM a INNER JOIN b ON a.id = b.id
WHERE a.col1 = b.col1 AND a.col2 = b.col2
adding in other columns as necessary. The database engine should be intelligent enough to do the comparisons on the indexed columns first, so it'll be efficient as long as you don't have rows that are different only on lots of non-indexed fields. (If you do, then I don't think anything will do it particularly efficiently.)
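If the non-key columns can themselves be NULL, the plain equality comparisons will miss rows where both sides are NULL; a NULL-safe sketch of the same comparison (using the same hypothetical columns) would be:
SELECT a.id, a.col1, a.col2
FROM a INNER JOIN b ON a.id = b.id
WHERE (a.col1 = b.col1 OR (a.col1 IS NULL AND b.col1 IS NULL))
  AND (a.col2 = b.col2 OR (a.col2 IS NULL AND b.col2 IS NULL));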

Sum multiple columns using a subquery

I'm trying to play with Oracle's DB.
I'm trying to sum two columns from the same row and output a total on the fly.
However, I can't seem to get it to work. Here's the code I have so far.
SELECT a.name , SUM(b.sequence + b.length) as total
FROM (
SELECT a.name, a.sequence, b.length
FROM tbl1 a, tbl2 b
WHERE b.sequence = a.sequence
AND a.loc <> -1
AND a.id='10201'
ORDER BY a.location
)
The inner query works, but I can't seem to make the new query and the subquery work together.
Here's a sample table I'm using:
...[name][sequence][length]...
...['aa']['100000']['2000']...
...
...['za']['200000']['3001']...
And here's the output I'd like:
[name][ total ]
['aa']['102000']
...
['za']['203001']
Help much appreciated, thanks!
SUM() sums numbers across rows. Instead, replace it with sequence + length.
...or if there is the possibility of NULL values occurring in either the sequence or length columns, use: COALESCE(sequence, 0) + COALESCE(length, 0).
Or, if your intention was indeed to produce a running total (i.e. aggregating the sum of all the totals and lengths for each user), add a GROUP BY a.name after the end of the subquery.
BTW: you shouldn't be referencing the internal aliases used inside a subquery from outside of that subquery. Some DB servers allow it (I don't have convenient access to an Oracle server right now, so I can't test it), but it's not really good practice.
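Putting the first two suggestions together (row-wise addition plus COALESCE), a sketch against the asker's tbl1 and tbl2 might look like:
SELECT t1.name,
       COALESCE(t1.sequence, 0) + COALESCE(t2.length, 0) AS total
FROM tbl1 t1
JOIN tbl2 t2 ON t2.sequence = t1.sequence
WHERE t1.loc <> -1
  AND t1.id = '10201';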
I think what you are after is something like:
SELECT A.name,
SUM(B.sequence + B.length) AS total
FROM Tbl1 A
INNER JOIN Tbl2 B
ON B.sequence = A.sequence
WHERE A.loc <> -1
AND A.id = '10201'
GROUP BY A.name
ORDER BY MIN(A.location)
Your query with the subquery fails for several reasons:
You use the table alias a in the outer query, but it is defined only inside the subquery.
The same goes for the table alias b.
You have a sum() in the select clause with unaggregated columns, but no group by.
In addition, you have an order by in the subquery which is allowed syntactically, but ignored.
Here is a better way to write the query without a subquery:
SELECT t1.name, (t1.sequence + t2.length) as total
FROM tbl1 t1 join
tbl2 t2
on t1.sequence = t2.sequence
where t1.loc <> -1 AND t1.id = '10201'
ORDER BY t1.location;
Note the use of proper join syntax, the use of aliases that make sense, and the simple calculation at this level.
Here is a version with a subquery:
select name, (sequence + length) as total
from (SELECT t1.name, t1.sequence, t2.length
FROM tbl1 t1 join
tbl2 t2
on t1.sequence = t2.sequence
where t1.loc <> -1 AND t1.id = '10201'
) t
ORDER BY location;
Note that the ORDER BY goes at the outer level. Also, I gave the subquery an alias; this is not strictly required, but it is typically a good idea.

How to do a SUM across two unrelated tables?

I'm trying to sum on two unrelated tables with postgres. With MySQL, I would do something like this :
SELECT SUM(table1.col1) AS sum_1, SUM(table2.col1) AS sum_2 FROM table1, table2
This should give me a table with two column named sum_1 and sum_2. However, postgres doesn't give me any result for this query.
Any ideas?
SELECT (SELECT SUM(table1.col1) FROM table1) AS sum_1,
(SELECT SUM(table2.col1) FROM table2) AS sum_2;
You can also write it as:
SELECT t1.sum_c1, t1.sum_c2, t2.sum_t2_c1
FROM
(
SELECT SUM(col1) sum_c1,
SUM(col2) sum_c2
FROM table1
) t1
FULL OUTER JOIN
(
SELECT SUM(col1) sum_t2_c1
FROM table2
) t2 ON 1=1;
The FULL JOIN is used with a dud condition so that either subquery could produce no results (empty) without causing the greater query to have no result.
I don't think the query as you have written it would produce the result you expected, because it does a CROSS JOIN between table1 and table2, which inflates each SUM by the number of rows in the other table; for example, with 3 rows in table1 and 4 rows in table2 the join produces 12 rows, so SUM(table1.col1) counts each value four times. Note also that if either table is empty, the CROSS JOIN produces zero rows to aggregate, so the sums come back NULL.
To combine multiple aggregates from multiple tables, use CROSS JOIN:
SELECT sum_1, sum_2, sum_3, sum_4
FROM
(SELECT sum(col1) AS sum_1, sum(col2) AS sum_2 FROM table1) t1
CROSS JOIN
(SELECT sum(col3) AS sum_3, sum(col4) AS sum_4 FROM table2) t2
There is always exactly one row from each of the subqueries, even with no rows in the source tables, so a CROSS JOIN (or even just a lowly comma between the subqueries, the harder-to-read shorthand for a cross join with lower precedence) is the simplest way.
Note that this produces a cross join between single aggregated rows, not a cross join between individual rows of multiple tables like your incorrect statement in the question would - thereby multiplying each other.
I suggest something like the following, although I haven't tried it.
select sum1, sum2
from
(select sum(col1) sum1 from table1) t1,
(select sum(col1) sum2 from table2) t2;
The idea is to create two inline views, each with one row in it, and then do a Cartesian join between them.
SELECT SUM(table1_column1 + table2_column1)
FROM table1
JOIN table2
ON table1_id= table2_id
WHERE account_no='${account_no}'
(From an Express.js app with PostgreSQL, called via Postman; ${account_no} is a template placeholder.)

DB2 performance of coalesce and inner join

I have a pretty poorly performing query (which I inherited) and I'm not too sure how to optimize it... As far as I understand, it sets the value of a second column to the value of the first column concatenated with a value from another table, where a relationship is found.
update table1 set
col2 = col1 || coalesce ((
select table2.the_column_wanted from table2 where table2.fk = table1.pk and
table2.flag = 'Y'))
where flag = 'Y' and pk in ( select distinct fk from table2 );
The exact speed issue depends on the characteristics of your table, but in general I would look into MERGE for this sort of problem. Something like this:
MERGE INTO table1 USING table2
ON table1.pk = table2.fk and
table1.flag = 'Y' and
table2.flag = 'Y'
WHEN MATCHED THEN UPDATE SET
table1.col2 = table1.col1 || table2.the_column_wanted;
The query as written is very questionable. You should probably take a good look at all of the code you have "inherited."
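If MERGE is not an option, the inherited UPDATE can at least be made valid: COALESCE requires a second argument as the fallback value (an empty string here, which is an assumption about the intended behaviour):
update table1
set col2 = col1 || coalesce(
        (select table2.the_column_wanted
         from table2
         where table2.fk = table1.pk
           and table2.flag = 'Y'), '')
where flag = 'Y'
  and pk in (select fk from table2);
-- assumes at most one matching table2 row per pk, as the original does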

SQLite table aliases affecting the performance of queries

How does SQLite treat an alias internally?
Does creating a table alias internally create a copy of the table, or does it just refer to the same table without creating a copy?
When I create multiple aliases of the same table in my code, performance of the query is severely hit!
In my case, I have one table, call it MainTable, with just two columns, name and value.
I want to select multiple values in one row as different columns. For example:
Name: a,b,c,d,e,f
Value: p,q,r,s,t,u
such that a corresponds to p and so on.
I want to select values for names a,b,c and d in one row => p,q,r,s
So I write a query
SELECT t1.name, t2.name, t3.name, t4.name
FROM MainTable t1, MainTable t2, MainTable t3, MainTable t4
WHERE t1.name = 'a' and t2.name = 'b' and t3.name = 'c' and t4.name = 'd';
This way of writing the query kills performance as the size of the table increases, as rightly pointed out by Larry.
Is there any efficient way to retrieve this result? I am bad at SQL queries :(
If you list the same table more than once in your SQL statement and do not supply conditions on which to JOIN the tables, you are creating a cartesian JOIN in your result set and it will be enormous:
SELECT * FROM MyTable A, MyTable B;
If MyTable has 1,000 records, this will create a result set with one million records. Any other selection criteria you include will then have to be evaluated across all one million records.
I'm not sure that's what you're doing (your question is very unclear), but it may be a start on solving your problem.
Updated answer now that the poster has added the query that is being executed.
You're going to have to get a little tricky to get the results you want. You need to use CASE and MAX and, unfortunately, the syntax for CASE is a little verbose:
SELECT MAX(CASE WHEN name='a' THEN value ELSE NULL END),
MAX(CASE WHEN name='b' THEN value ELSE NULL END),
MAX(CASE WHEN name='c' THEN value ELSE NULL END),
MAX(CASE WHEN name='d' THEN value ELSE NULL END)
FROM MainTable WHERE name IN ('a','b','c','d');
Please give that a try against your actual database and see what you get (of course, you want to make sure the column name is indexed).
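For instance, such an index could be created like this (the index name is just an example):
CREATE INDEX idx_maintable_name ON MainTable(name);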
Assuming you have table dbo.Customers with a million rows
SELECT * from dbo.Customers A
does not result in a copy of the table being created.
As Larry pointed out, the query as it stands is doing a cartesian product across your table four times, which, as you have observed, kills your performance.
The updated question states that the desire is to have four values from different queries in a single row. That's fairly simple, assuming this syntax is valid for SQLite.
You can see that the following four queries, run one after another, produce the desired values, but in four separate rows (note they select value, the column you actually want, while filtering on name):
SELECT t1.value
FROM MainTable t1
WHERE t1.name='a';
SELECT t2.value
FROM MainTable t2
WHERE t2.name='b';
SELECT t3.value
FROM MainTable t3
WHERE t3.name='c';
SELECT t4.value
FROM MainTable t4
WHERE t4.name='d';
The trick is simply to run them as subqueries, so there are five queries in all: one driver query and four subqueries doing the work. This pattern only works if each subquery returns exactly one row.
SELECT
(
SELECT t1.value
FROM MainTable t1
WHERE t1.name='a'
) AS t1_value
,
(
SELECT t2.value
FROM MainTable t2
WHERE t2.name='b'
) AS t2_value
,
(
SELECT t3.value
FROM MainTable t3
WHERE t3.name='c'
) AS t3_value
,
(
SELECT t4.value
FROM MainTable t4
WHERE t4.name='d'
) AS t4_value
Aliasing a table results in a reference to the original table that exists for the duration of the SQL statement.
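If you want to confirm that no copy is made, SQLite's EXPLAIN QUERY PLAN shows each alias resolving to the same underlying table (a quick sketch against the MainTable above):
EXPLAIN QUERY PLAN
SELECT t1.value, t2.value
FROM MainTable t1, MainTable t2
WHERE t1.name = 'a' AND t2.name = 'b';
-- both aliases show up as SEARCH/SCAN of MainTable itself; no copy is created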