How should this SQL query be constructed? (Postgres) - sql

I've never had to check for something like this before, so I'm hoping y'all can help. I'm not sure if it's best to create a temporary table, then check if an item is in it or if there's a better way.
Each row in the table represent an object that may have a parent. The parent is stored in the same table. In this example, the parent_id field holds the primary key of a row's parent. I'm trying to select all rows in the table which have the type column set to a particular value and whose parent rows have a 'z' in the field_b column. The part in parenthesis obviously needs work...
SELECT s_id, s_text, s_parent_id
FROM sections
WHERE s_derivedtype >= 10000 AND
if this returns anything
SELECT s_id
FROM sections
WHERE s_id = {the s_parent_id from the first query) AND s_flags LIKE '%z%'
I've updated this to hopefully be easier to read...
What's the most efficient way to do this? I'm expecting something like 100k rows returned out of 18m in the table, so decent performance is non-trivial.

SELECT key_field, field_a
FROM
t s
inner join
t p on p.key_field = s.key_field
WHERE
s.type = 1
AND p.field_b LIKE '%z%'

try
SELECT t1.key_field, t1.field_a
FROM tbl AS t1
WHERE t1.type = 1 AND parent_id = (SELECT t2.id FROM tbl AS t2
WHERE t2.id = t1.parent_id
AND t2.field_b LIKE '%z%')
or
SELECT t1.key_field, t1.field_a
FROM tbl AS t1
INNER JOIN tbl AS t2 ON t2.id = t1.parent_id
WHERE t1.type = 1
AND t2.field_b LIKE '%z%'

Related

Cross joining tables to see which partners in one table have a report from another table [duplicate]

table1 (id, name)
table2 (id, name)
Query:
SELECT name
FROM table2
-- that are not in table1 already
SELECT t1.name
FROM table1 t1
LEFT JOIN table2 t2 ON t2.name = t1.name
WHERE t2.name IS NULL
Q: What is happening here?
A: Conceptually, we select all rows from table1 and for each row we attempt to find a row in table2 with the same value for the name column. If there is no such row, we just leave the table2 portion of our result empty for that row. Then we constrain our selection by picking only those rows in the result where the matching row does not exist. Finally, We ignore all fields from our result except for the name column (the one we are sure that exists, from table1).
While it may not be the most performant method possible in all cases, it should work in basically every database engine ever that attempts to implement ANSI 92 SQL
You can either do
SELECT name
FROM table2
WHERE name NOT IN
(SELECT name
FROM table1)
or
SELECT name
FROM table2
WHERE NOT EXISTS
(SELECT *
FROM table1
WHERE table1.name = table2.name)
See this question for 3 techniques to accomplish this
I don't have enough rep points to vote up froadie's answer. But I have to disagree with the comments on Kris's answer. The following answer:
SELECT name
FROM table2
WHERE name NOT IN
(SELECT name
FROM table1)
Is FAR more efficient in practice. I don't know why, but I'm running it against 800k+ records and the difference is tremendous with the advantage given to the 2nd answer posted above. Just my $0.02.
SELECT <column_list>
FROM TABLEA a
LEFTJOIN TABLEB b
ON a.Key = b.Key
WHERE b.Key IS NULL;
https://www.cloudways.com/blog/how-to-join-two-tables-mysql/
This is pure set theory which you can achieve with the minus operation.
select id, name from table1
minus
select id, name from table2
Here's what worked best for me.
SELECT *
FROM #T1
EXCEPT
SELECT a.*
FROM #T1 a
JOIN #T2 b ON a.ID = b.ID
This was more than twice as fast as any other method I tried.
Watch out for pitfalls. If the field Name in Table1 contain Nulls you are in for surprises.
Better is:
SELECT name
FROM table2
WHERE name NOT IN
(SELECT ISNULL(name ,'')
FROM table1)
You can use EXCEPT in mssql or MINUS in oracle, they are identical according to :
http://blog.sqlauthority.com/2008/08/07/sql-server-except-clause-in-sql-server-is-similar-to-minus-clause-in-oracle/
That work sharp for me
SELECT *
FROM [dbo].[table1] t1
LEFT JOIN [dbo].[table2] t2 ON t1.[t1_ID] = t2.[t2_ID]
WHERE t2.[t2_ID] IS NULL
You can use following query structure :
SELECT t1.name FROM table1 t1 JOIN table2 t2 ON t2.fk_id != t1.id;
table1 :
id
name
1
Amit
2
Sagar
table2 :
id
fk_id
email
1
1
amit#ma.com
Output:
name
Sagar
All the above queries are incredibly slow on big tables. A change of strategy is needed. Here there is the code I used for a DB of mine, you can transliterate changing the fields and table names.
This is the strategy: you create two implicit temporary tables and make a union of them.
The first temporary table comes from a selection of all the rows of the first original table the fields of which you wanna control that are NOT present in the second original table.
The second implicit temporary table contains all the rows of the two original tables that have a match on identical values of the column/field you wanna control.
The result of the union is a table that has more than one row with the same control field value in case there is a match for that value on the two original tables (one coming from the first select, the second coming from the second select) and just one row with the control column value in case of the value of the first original table not matching any value of the second original table.
You group and count. When the count is 1 there is not match and, finally, you select just the rows with the count equal to 1.
Seems not elegant, but it is orders of magnitude faster than all the above solutions.
IMPORTANT NOTE: enable the INDEX on the columns to be checked.
SELECT name, source, id
FROM
(
SELECT name, "active_ingredients" as source, active_ingredients.id as id
FROM active_ingredients
UNION ALL
SELECT active_ingredients.name as name, "UNII_database" as source, temp_active_ingredients_aliases.id as id
FROM active_ingredients
INNER JOIN temp_active_ingredients_aliases ON temp_active_ingredients_aliases.alias_name = active_ingredients.name
) tbl
GROUP BY name
HAVING count(*) = 1
ORDER BY name
See query:
SELECT * FROM Table1 WHERE
id NOT IN (SELECT
e.id
FROM
Table1 e
INNER JOIN
Table2 s ON e.id = s.id);
Conceptually would be: Fetching the matching records in subquery and then in main query fetching the records which are not in subquery.
First define alias of table like t1 and t2.
After that get record of second table.
After that match that record using where condition:
SELECT name FROM table2 as t2
WHERE NOT EXISTS (SELECT * FROM table1 as t1 WHERE t1.name = t2.name)
I'm going to repost (since I'm not cool enough yet to comment) in the correct answer....in case anyone else thought it needed better explaining.
SELECT temp_table_1.name
FROM original_table_1 temp_table_1
LEFT JOIN original_table_2 temp_table_2 ON temp_table_2.name = temp_table_1.name
WHERE temp_table_2.name IS NULL
And I've seen syntax in FROM needing commas between table names in mySQL but in sqlLite it seemed to prefer the space.
The bottom line is when you use bad variable names it leaves questions. My variables should make more sense. And someone should explain why we need a comma or no comma.
I tried all solutions above but they did not work in my case. The following query worked for me.
SELECT NAME
FROM table_1
WHERE NAME NOT IN
(SELECT a.NAME
FROM table_1 AS a
LEFT JOIN table_2 AS b
ON a.NAME = b.NAME
WHERE any further condition);

SQL subquery to joins -

Is it possible to remove the subquery from this SQL?
Table has 2 attributes "id" and "field"
Many field could have the same Id.
These table has many registers with the same Id and different Value
In need get all same Id values using one of them like filter.
select *
from Table
where id = (select id from Table where value = 'someValue')
I think it could be really easy but I don't know how to do.
Self Join can be done
select T.Id,T.Field
from Table T
INNER JOIN Table TT
ON T.ID = TT.ID
AND TT.Value = 'someValue'
Not sure if you over simplified your example too much but you could make this a little simpler.
select *
from Table
where value = 'someValue'
This should work
select T1.* from Table T1 JOIN Table T2 ON T1.id = T2.id AND T2.value = 'someValue'
Edited (Correct Answer):
What I assume your problem is:
You have a value. Let´s pretend it´s "testValue". Now you want to get the id of this value and find all other datasets with the same id.
What has to be cleared is that, "ID" is not the Primary Key and is not Unique.
You should be able to solve this by a simple self join:
select t.* from Table t right join Table tt on tt.id = t.id where tt.value = 'someValue';
So because of the join you will get a result that returns simply the table. With the where clause you shrink the result to your value. You should get the set of ids.
Old Answer:
This should do the trick:
select * from Table a inner join Table2 b on a.id = b.id where b.value = 'someValue';
You mentioned only one table in your question. I think this must be a mistake. If not, you have to change only the Table2 in my query. But that would have no sense as you could do a simple query, too:
select * from Table where value = 'someValue';
this would be the result of the first query with a self join.

How to simplify this query with sql joins?

my_table has 4 columns: id integer, value integer, value2 integer, name character varying
I want all the records that:
have the same value2 as a record which name is 'a_name'
have a field value inferior to the one of a record which name is 'a_name'
And I have satisfying results with the following query:
select t.id
from my_table as t
where t.value < ( select value from my_table where name = 'a_name')
and s.value2 = (select value2 from my_table where name = 'a_name');
But is it possible to simplify this query with sql joins ?
Joining on the same table is still too much intricate in my mind. And I try to understand with this example.
What I happened so far trying, is a result full of dupplicates:
select t2.id
from my_table as t
inner join my_table as t2 on t2.value2 = t.value2
where t2.value < ( select value from my_table where name = 'a_name');
I think this will solve your problem.
select t1.id
from my_table as t1
join my_table as t2
on t1.value2 = t2.value2
and t2.name = 'a_name'
and t1.value < t2.value
You should use self join instead of inner join see this
http://msdn.microsoft.com/en-us/library/ms177490%28v=sql.105%29.aspx
You can always get distinct results by calling "SELECT distinct t2.id ..."
However, that will not enhance your understanding of inner joins. If you are willing, keep reading on. Let's start by getting all records with name = 'a_name'.
SELECT a.*
FROM my_table as a
WHERE a.name = 'a.name';
A simpler way to perform your inner joins is to understand that the result for the above query is yet another table, formally known as a relation. You can think of it as joining on the same table, but an easier way to think of it is as "joining on the result of this query". Lets put this to the test.
SELECT other.id
FROM my_table as a,
INNER JOIN my_table as other ON other.value2 = a.value2
WHERE a.name = 'a_name'
AND other.value < a.value;
If the first query (all rows with name = 'a_name') has many results, you stand a good chance of the second query having duplicates, because the inner join between aliases 'a' and 'other' is a subset of their cross product.
Edits: Grammar, Clarity
please try this
select t.id
from my_table as t
inner join
(select value from my_table where name = 'a_name')t1 on t.value<t1.value
inner join
(select value2 from my_table where name = 'a_name')t2 on t.value2=t2.value2

SQLite table aliases effecting the performance of queries

How does SQLite internally treats the alias?
Does creating a table name alias internally creates a copy of the same table or does it just refers to the same table without creating a copy?
When I create multiple aliases of the same table in my code, performance of the query is severely hit!
In my case, I have one table, call it MainTable with namely 2 columns, name and value.
I want to select multiple values in one row as different columns. for example
Name: a,b,c,d,e,f
Value: p,q,r,s,t,u
such that a corresponds to p and so on.
I want to select values for names a,b,c and d in one row => p,q,r,s
So I write a query
SELECT t1.name, t2.name, t3.name, t4.name
FROM MainTable t1, MainTable t2, MainTable t3, MainTable t4
WHERE t1.name = 'a' and t2.name = 'b' and t3.name = 'c' and t4.name = 'd';
This way f writing the query kills the performance when size of the table increases as rightly pointed above by Larry.
Is there any efficient way to retrieve this result. I am bad at SQL queries :(
If you list the same table more than once in your SQL statement and do not supply conditions on which to JOIN the tables, you are creating a cartesian JOIN in your result set and it will be enormous:
SELECT * FROM MyTable A, MyTable B;
if MyTable has 1000 records, will create a result set with one million records. Any other selection criteria you include will then have to be evaluated across all one million records.
I'm not sure that's what you're doing (your question is very unclear), but it may be a start on solving your problem.
Updated answer now that the poster has added the query that is being executed.
You're going to have to get a little tricky to get the results you want. You need to use CASE and MAX and, unfortunately, the syntax for CASE is a little verbose:
SELECT MAX(CASE WHEN name='a' THEN value ELSE NULL END),
MAX(CASE WHEN name='b' THEN value ELSE NULL END),
MAX(CASE WHEN name='c' THEN value ELSE NULL END),
MAX(CASE WHEN name='d' THEN value ELSE NULL END)
FROM MainTable WHERE name IN ('a','b','c','d');
Please give that a try against your actual database and see what you get (of course, you want to make sure the column name is indexed).
Assuming you have table dbo.Customers with a million rows
SELECT * from dbo.Customers A
does not result in a copy of the table being created.
As Larry pointed out, the query as it stands is doing a cartesian product across your table four times which, as you has observed, kills your performance.
The updated ticket states the desire is to have 4 values from different queries in a single row. That's fairly simple, assuming this syntax is valid for sqllite
You can see that the following four queries when run in serial produce the desired value but in 4 rows.
SELECT t1.name
FROM MainTable t1
WHERE t1.name='a';
SELECT t2.name
FROM MainTable t2
WHERE t2.name='b';
SELECT t3.name
FROM MainTable t3
WHERE t3.name='c';
SELECT t4.name
FROM MainTable t4
WHERE t4.name='d';
The trick is to simply run them as sub queries like so there are 5 queries: 1 driver query, 4 sub's doing all the work. This pattern will only work if there is one row returned.
SELECT
(
SELECT t1.name
FROM MainTable t1
WHERE t1.name='a'
) AS t1_name
,
(
SELECT t2.name
FROM MainTable t2
WHERE t2.name='b'
) AS t2_name
,
(
SELECT t3.name
FROM MainTable t3
WHERE t3.name='c'
) AS t3_name
,
(
SELECT t4.name
FROM MainTable t4
WHERE t4.name='d'
) AS t4_name
Aliasing a table will result a reference to the original table that exists for the duration of the SQL statement.

SQL - using a value in a nested select

Hope the title makes some kind of sense - I'd basically like to do a nested select, based on a value in the original select, like so:
SELECT MAX(iteration) AS maxiteration,
(SELECT column
FROM table
WHERE id = 223652
AND iteration = maxiteration)
FROM table
WHERE id = 223652;
I get an ORA-00904 invalid identifier error.
Would really appreciate any advice on how to return this value, thanks!
It looks like this should be rewritten with a where clause:
select iteration,
col
from tbl
where id = 223652
and iteration = (select max(iteration) from tbl where id = 223652);
You can circumvent the problem alltogether by placing the subselect in an INNER JOIN of its own.
SELECT t.iteration
, t.column
FROM table t
INNER JOIN (
SELECT id, MAX(iteration) AS iteration
FROM table
WHERE id = 223652
) tm ON tm.id = t.id AND tm.iteration = t.iteration
Since you're using Oracle, I'd suggest using analytic functions for this:
SELECT * FROM (
SELECT col,
iteration,
row_number() over (partition by id order by iteration desc) rn
FROM tab
WHERE id = 223652
) WHERE rn = 1
do it like this:
with maxiteration as
(
SELECT MAX(iteration) AS maxiteration
FROM table
WHERE id = 223652
)
select
column,
iteration
from
table
where
id = 223652
AND iteration = maxiteration
;
Not 100% sure on Oracle syntax, but isn't it something like:
select iteration, column from table where id = 223652 order by iteration desc limit 1
I would approach this problem in a slightly different way. You're basically looking for the row that has no other iterations greater than it. There are at least 3 ways I can think of to do this:
SELECT
T1.iteration AS maxiteration,
T1.column
FROM
Table T1
WHERE
T1.id = 223652 AND
NOT EXISTS
(
SELECT *
FROM Table T2
WHERE
T2.id = 223652 AND
T2.iteration > T1.iteration
)
Or...
SELECT
T1.iteration AS maxiteration,
T1.column
FROM
Table T1
LEFT OUTER JOIN Table T2 ON
T2.id = T1.id AND
T2.iteration > T1.iteration
WHERE
T1.id = 223652 AND
T2.id IS NULL
Or...
SELECT
T1.iteration AS maxiteration,
T1.column
FROM
Table T1
INNER JOIN (SELECT id, MAX(iteration) AS maxiteration FROM Table T2 GROUP BY id) SQ ON
SQ.id = T1.id AND
SQ.maxiteration = T1.iteration
WHERE
T1.id = 223652
EDIT: I didn't see the ORA error the first time reading the question and it wasn't tagged as Oracle specific. I think that there may be some differences in the syntax and use of aliases in Oracle, so you may need to tweak some of the above queries.
The Oracle error is telling you that it doesn't know what maxiteration is, because the column alias isn't available yet inside the subquery. You need to refer to it by the table alias and column name instead of the column alias I believe.
You do something like
select maxiteration,column from table a join (select max(iteration) as maxiteration from table where id=1) b using (id) where b.maxiteration=a.iteration;
This could of course return multiple rows for one maxiteration unless your table has a constraint against it.