Inserting values generated from query into a blank column - sql

I am trying to count the number of points (stored in table2) that are found in each polygon of table 1. The query works but I have tried to alter it to add the valus generated to a blank column in table 1.
So far it only works by appending the results to the bottom of the table. Any help? To summarise I am trying to add values generated from this query into and add them into table1. At the moment the query inserts them into the blank column in table 1, but no matched against the ID, but appended at the bottom.
INSERT INTO table1(field3)
SELECT COUNT(table2.id) AS count1
FROM table1 LEFT JOIN table2
ON ST_Contains(table1.geom,table2.geom)
GROUP BY table1.id;

The only change I made here was to switch your left join to an inner join. In the case where a geometry in table1 contains no geometries in table2, the value of field3 will stay null, so you might want to start by doing an "update table1 set field3 = 0" first (it can turn out to be a bit faster doing that in two steps depending on how many features you have and how many points each geometry has).
update table1 a
set field3 = b.count1
from
(
SELECT table1.id,
COUNT(table2.id) AS count1
FROM table1
JOIN table2
ON ST_Contains(table1.geom,table2.geom)
GROUP BY table1.id
) b
where a.id = b.id
Alternative:
update table1 a
set field3 = b.count1
from
(
SELECT table1.id,
COUNT(table2.id) AS count1
FROM table1
left JOIN table2
ON ST_Contains(table1.geom,table2.geom)
GROUP BY table1.id
) b
where a.id = b.id
Also, this site just showed up on reddit this morning. I haven't spent much time digging through it but it looks promising as (yet another) resource for learning sql (in a postgres-specific environment).
Edit: I'm starting to doubt myself with regards to the two step approach that I first posted - I think it's almost entirely wrong about the performance, so I included an alternative query.

Related

Cross joining tables to see which partners in one table have a report from another table [duplicate]

table1 (id, name)
table2 (id, name)
Query:
SELECT name
FROM table2
-- that are not in table1 already
SELECT t1.name
FROM table1 t1
LEFT JOIN table2 t2 ON t2.name = t1.name
WHERE t2.name IS NULL
Q: What is happening here?
A: Conceptually, we select all rows from table1 and for each row we attempt to find a row in table2 with the same value for the name column. If there is no such row, we just leave the table2 portion of our result empty for that row. Then we constrain our selection by picking only those rows in the result where the matching row does not exist. Finally, We ignore all fields from our result except for the name column (the one we are sure that exists, from table1).
While it may not be the most performant method possible in all cases, it should work in basically every database engine ever that attempts to implement ANSI 92 SQL
You can either do
SELECT name
FROM table2
WHERE name NOT IN
(SELECT name
FROM table1)
or
SELECT name
FROM table2
WHERE NOT EXISTS
(SELECT *
FROM table1
WHERE table1.name = table2.name)
See this question for 3 techniques to accomplish this
I don't have enough rep points to vote up froadie's answer. But I have to disagree with the comments on Kris's answer. The following answer:
SELECT name
FROM table2
WHERE name NOT IN
(SELECT name
FROM table1)
Is FAR more efficient in practice. I don't know why, but I'm running it against 800k+ records and the difference is tremendous with the advantage given to the 2nd answer posted above. Just my $0.02.
SELECT <column_list>
FROM TABLEA a
LEFTJOIN TABLEB b
ON a.Key = b.Key
WHERE b.Key IS NULL;
https://www.cloudways.com/blog/how-to-join-two-tables-mysql/
This is pure set theory which you can achieve with the minus operation.
select id, name from table1
minus
select id, name from table2
Here's what worked best for me.
SELECT *
FROM #T1
EXCEPT
SELECT a.*
FROM #T1 a
JOIN #T2 b ON a.ID = b.ID
This was more than twice as fast as any other method I tried.
Watch out for pitfalls. If the field Name in Table1 contain Nulls you are in for surprises.
Better is:
SELECT name
FROM table2
WHERE name NOT IN
(SELECT ISNULL(name ,'')
FROM table1)
You can use EXCEPT in mssql or MINUS in oracle, they are identical according to :
http://blog.sqlauthority.com/2008/08/07/sql-server-except-clause-in-sql-server-is-similar-to-minus-clause-in-oracle/
That work sharp for me
SELECT *
FROM [dbo].[table1] t1
LEFT JOIN [dbo].[table2] t2 ON t1.[t1_ID] = t2.[t2_ID]
WHERE t2.[t2_ID] IS NULL
You can use following query structure :
SELECT t1.name FROM table1 t1 JOIN table2 t2 ON t2.fk_id != t1.id;
table1 :
id
name
1
Amit
2
Sagar
table2 :
id
fk_id
email
1
1
amit#ma.com
Output:
name
Sagar
All the above queries are incredibly slow on big tables. A change of strategy is needed. Here there is the code I used for a DB of mine, you can transliterate changing the fields and table names.
This is the strategy: you create two implicit temporary tables and make a union of them.
The first temporary table comes from a selection of all the rows of the first original table the fields of which you wanna control that are NOT present in the second original table.
The second implicit temporary table contains all the rows of the two original tables that have a match on identical values of the column/field you wanna control.
The result of the union is a table that has more than one row with the same control field value in case there is a match for that value on the two original tables (one coming from the first select, the second coming from the second select) and just one row with the control column value in case of the value of the first original table not matching any value of the second original table.
You group and count. When the count is 1 there is not match and, finally, you select just the rows with the count equal to 1.
Seems not elegant, but it is orders of magnitude faster than all the above solutions.
IMPORTANT NOTE: enable the INDEX on the columns to be checked.
SELECT name, source, id
FROM
(
SELECT name, "active_ingredients" as source, active_ingredients.id as id
FROM active_ingredients
UNION ALL
SELECT active_ingredients.name as name, "UNII_database" as source, temp_active_ingredients_aliases.id as id
FROM active_ingredients
INNER JOIN temp_active_ingredients_aliases ON temp_active_ingredients_aliases.alias_name = active_ingredients.name
) tbl
GROUP BY name
HAVING count(*) = 1
ORDER BY name
See query:
SELECT * FROM Table1 WHERE
id NOT IN (SELECT
e.id
FROM
Table1 e
INNER JOIN
Table2 s ON e.id = s.id);
Conceptually would be: Fetching the matching records in subquery and then in main query fetching the records which are not in subquery.
First define alias of table like t1 and t2.
After that get record of second table.
After that match that record using where condition:
SELECT name FROM table2 as t2
WHERE NOT EXISTS (SELECT * FROM table1 as t1 WHERE t1.name = t2.name)
I'm going to repost (since I'm not cool enough yet to comment) in the correct answer....in case anyone else thought it needed better explaining.
SELECT temp_table_1.name
FROM original_table_1 temp_table_1
LEFT JOIN original_table_2 temp_table_2 ON temp_table_2.name = temp_table_1.name
WHERE temp_table_2.name IS NULL
And I've seen syntax in FROM needing commas between table names in mySQL but in sqlLite it seemed to prefer the space.
The bottom line is when you use bad variable names it leaves questions. My variables should make more sense. And someone should explain why we need a comma or no comma.
I tried all solutions above but they did not work in my case. The following query worked for me.
SELECT NAME
FROM table_1
WHERE NAME NOT IN
(SELECT a.NAME
FROM table_1 AS a
LEFT JOIN table_2 AS b
ON a.NAME = b.NAME
WHERE any further condition);

How to select a value that can come from two different tables?

First, SQL is not my strength. So I need help with the following problem. I'll simplify the table contents to describe the problem.
Let's start with three tables : table1 with columns id_1 and value, table2 with columns id_2 and value, and table3 with columns id_3 and value. As you'll notice, a field value appears in all three tables, while ids have different column names. Modifying column names is not an option because they are used by Java legacy code.
I need to set table3.value using table1.value or table2.value according to the fields table1.id_1, table2.id_2 and table3.id_3.
My last attempt, which describes what I try to do, is the following:
UPDATE table3
SET value=(IF ((SELECT COUNT(\*) FROM table1 t1 WHERE t1.id_1=id_3) > 0)
SELECT value FROM table1 t1 WHERE t1.id_1=id_3
ELSE IF ((SELECT COUNT(\*) FROM table2 t2 WHERE t2.id_2=id_3)) > 0)
SELECT value FROM table2 t2 WHERE t2.id_2=id_3)
Here are some informations about the tables and the update.
This update will be included in an XML file used by Liquibase.
It must work with Oracle or SQL Server.
An id from table3.id_3 can be found at most once in table1.id_1 or in table2.id_2, but not in both tables simultaneously.
If table3.id_3 is not found in table1.id_1 nor in table2.id_2, table3.value remains null.
As you can imagine, my last attempt failed. In that case, the IF command was not recognized during the Liquibase update. If anyone has any ideas how to deal with this, I'd appreciate. Thanks in advance.
I don't know Oracle very well, but a SQL Server approach would be the following using COALESCE() and OUTER JOINs.
Update T3
Set Value = Coalesce(T1.Value, T2.Value)
From Table3 T3
Left Join Table2 T2 On T3.Id_3 = T2.Id_2
Left Join Table1 T1 On T3.Id_3 = T1.Id_1
The COALESCE() will return the first non-NULL value from the LEFT JOIN to tables 1 and 2, and if a record was not found in either, it would be set to NULL.
It is Siyual's UPDATE written with MERGE operator.
MERGE into table_1
USING (
SELECT COALESCE(t2.value, t3.value) as value, t1.id_1 as id
FROM table_1 t1, table_2 t2, table_3 t3
WHERE t2.id_2 = t3.id_3 and t1.id_1 = t2.id_2
) t on (table_1.id_1 = t.id)
WHEN MATCHED THEN
UPDATE SET table_1.value = t.value
This should work in Oracle.
In Oracle
UPDATE table3 t
SET value=COALESCE((SELECT value FROM table1 t1 WHERE t1.id_1=t.id_3),
(SELECT value FROM table2 t2 WHERE t2.id_2=t.id_3))
Given your assumption #3, you can use union all to put together tables 1 and 2 without running the risk of duplicating information (at least for the id's of interest). So a simple merge solution like the one below should work (in all DB products that implement the merge operation).
merge into table3
using (
select id_2 as id, value from table2
union all
select id_3, value from table 3
) t
on table3.id_3 = t.id
when matched
then update set table3.value = t.value;
You may want to test the various solutions and see which is most effective for your specific tables.
(Note: merge should be more efficient than the update solution using coalesce, at least when relatively few of the id's in table3 have a match in the other tables. This is because the update solution will re-insert NULL where NULL was already stored when there is no match. The merge solution avoids this unnecessary activity.)

SQL aggregate function returning inflated values on joined table

I'm breaking my head here where I'm going wrong.
The following query:
SELECT SUM(table1.col1) FROM table1
returns value x.
And the following query:
SELECT SUM(table1.col1) FROM table2 RIGHT OUTER JOIN table1 ON table2.ID = table1.ID
returns value y. (I need the Join for the other data of table2). Why is the 2nd example returning a different value than in the first?
Make life easier on yourself, your colleagues that will support your code, and your clients by temporarily ignoring the existence of RIGHT OUTER JOIN. Use Table1 as the "from table" instead of table2.
Then, If aggregating, you will often find it necessary to do this BEFORE joining, so that the numbers are accurate. e.g.
SELECT T1.SUMCOL1
FROM (
SELECT id, SUM(col1) as SUMCOL1 FROM Table1 GROUP BY id
) T1
LEFT OUTER JOIN table2 T2 on T1.id = T2.ID
Obvious answer is because table2 is many to table1's one. That is, there are multiple rows in table2 for one id in table1. You may also be eliminating rows from table1 if the id isn't present in table2.
Compare:
SELECT COUNT(*) FROM table1
To:
SELECT COUNT(*) FROM table2 RIGHT OUTER JOIN table1 ON table2.ID = table1.ID
If you get different results, you're aggregating duplicates or eliminating rows from table1.
If you want to avoid this, you'll need to use a subquery.

Comparing two datasets SQL SSRS 2005

I have two datasets on two seperate servers. They both pull one column of information each.
I would like to build a report showing the values of the rows that only appear in one of the datasets.
From what I have read, it seems I would like to do this on the SQL side, not the reporting side; I am not sure how to do that.
If someone could shed some light on how that is possible, I would really appreciate it.
You can use the NOT EXISTS clause to get the differences between the two tables.
SELECT
Column
FROM
DatabaseName.SchemaName.Table1
WHERE
NOT EXISTS
(
SELECT
Column
FROM
LinkedServerName.DatabaseName.SchemaName.Table2
WHERE
Table1.Column = Table2.Column --looks at equalities, and doesn't
--include them because of the
--NOT EXISTS clause
)
This will show the rows in Table1 that don't appear in Table2. You can reverse the table names to find the rows in Table2 that don't appear in Table1.
Edit: Made an edit to show what the case would be in the event of linked servers. Also, if you wanted to see all of the rows that are not shared in both tables at the same time, you can try something as in the below.
SELECT
Column, 'Table1' TableName
FROM
DatabaseName.SchemaName.Table1
WHERE
NOT EXISTS
(
SELECT
Column
FROM
LinkedServerName.DatabaseName.SchemaName.Table2
WHERE
Table1.Column = Table2.Column --looks at equalities, and doesn't
--include them because of the
--NOT EXISTS clause
)
UNION
SELECT
Column, 'Table2' TableName
FROM
LinkedServerName.DatabaseName.SchemaName.Table2
WHERE
NOT EXISTS
(
SELECT
Column
FROM
DatabaseName.SchemaName.Table1
WHERE
Table1.Column = Table2.Column
)
You can also use a left join:
select a.* from tableA a
left join tableB b
on a.PrimaryKey = b.ForeignKey
where b.ForeignKey is null
This query will return all records from tableA that do not have corresponding records in tableB.
If you want rows that appear in exactly one data set and you have a matching key on each table, then you can use a full outer join:
select *
from table1 t1 full outer join
table2 t2
on t1.key = t2.key
where t1.key is null and t2.key is not null or
t1.key is not null and t2.key is null
The where condition chooses the rows where exactly one match.
The problem with this query, though, is that you get lots of columns with nulls. One way to fix this is by going through the columns one by one in the SELECT clause.
select coalesce(t1.key, t2.key) as key, . . .
Another way to solve this problem is to use a union with a window function. This version brings together all the rows and counts the number of times that key appears:
select t.*
from (select t.*, count(*) over (partition by key) as keycnt
from ((select 'Table1' as which, t.*
from table1 t
) union all
(select 'Table2' as which, t.*
from table2 t
)
) t
) t
where keycnt = 1
This has the additional column specifying which table the value comes from. It also has an extra column, keycnt, with the value 1. If you have a composite key, you would just replace with the list of columns specifying a match between the two tables.

Getting distinct rows from a left outer join

I am building an application which dynamically generates sql to search for rows of a particular Table (this is the main domain class, like an Employee).
There are three tables Table1, Table2 and Table1Table2Map.
Table1 has a many to many relationship with Table2, and is mapped through Table1Table2Map table. But since Table1 is my main table the relationship is virtually like a one to many.
My app generates a sql which basically gives a result set containing rows from all these tables. The select clause and joins dont change whereas the where clause is generated based on user interaction. In any case I dont want duplicate rows of Table1 in my result set as it is the main table for result display. Right now the query that is getting generated is like this:
select distinct Table1.Id as Id, Table1.Name, Table2.Description from Table1
left outer join Table1Table2Map on (Table1Table2Map.Table1Id = Table1.Id)
left outer join Table2 on (Table2.Id = Table1Table2Map.Table2Id)
For simplicity I have excluded the where clause. The problem is when there are multiple rows in Table2 for Table1 even though I have said distinct of Table1.Id the result set has duplicate rows of Table1 as it has to select all the matching rows in Table2.
To elaborate more, consider that for a row in Table1 with Id = 1 there are two rows in Table1Table2Map (1, 1) and (1, 2) mapping Table1 to two rows in Table2 with ids 1, 2. The above mentioned query returns duplicate rows for this case. Now I want the query to return Table1 row with Id 1 only once. This is because there is only one row in Table2 that is like an active value for the corresponding entry in Table1 (this information is in Mapping table).
Is there a way I can avoid getting duplicate rows of Table1.
I think there is some basic problem in the way I am trying to solve the problem, but I am not able to find out what it is. Thanks in advance.
Try:
left outer join (select distinct YOUR_COLUMNS_HERE ...) SUBQUERY_ALIAS on ...
In other words, don't join directly against the table, join against a sub-query that limits the rows you join against.
You can use GROUP BY on Table1.Id ,and that will get rid off the extra rows. You wouldn't need to worry about any mechanics on join side.
I came up with this solution in a huge query and it this solution didnt effect the query time much.
NOTE : I'm answering this question 3 years after its been asked but this may help someone i believe.
You can re-write your left joins to be outer applies, so that you can use a top 1 and an order by as follows:
select Table1.Id as Id, Table1.Name, Table2.Description
from Table1
outer apply (
select top 1 *
from Table1Table2Map
where (Table1Table2Map.Table1Id = Table1.Id) and Table1Table2Map.IsActive = 1
order by somethingCol
) t1t2
outer apply (
select top 1 *
from Table2
where (Table2.Id = Table1Table2Map.Table2Id)
) t2;
Note that an outer apply without a "top" or an "order by" is exactly equivalent to a left outer join, it just gives you a little more control. (cross apply is equivalent to an inner join).
You can also do something similar using the row_number() function:
select * from (
select distinct Table1.Id as Id, Table1.Name, Table2.Description,
rowNum = row_number() over ( partition by table1.id order by something )
from Table1
left outer join Table1Table2Map on (Table1Table2Map.Table1Id = Table1.Id)
left outer join Table2 on (Table2.Id = Table1Table2Map.Table2Id)
) x
where rowNum = 1;
Most of this doesn't apply if the IsActive flag can narrow down your other tables to one row, but they might come in useful for you.
To elaborate on one point: you said that there is only one "active" row in Table2 per row in Table1. Is that row not marked as active such that you could put it in the where clause? Or is there some magic in the dynamic conditions supplied by the user that determines what's active and what isn't.
If you don't need to select anything from Table2 the solution is relatively simply in that you can use the EXISTS function but since you've put TAble2.Description in the clause I'll assume that's not the case.
Basically what separates the relevant rows in Table2 from the irrelevant ones? Is it an active flag or a dynamic condition? The first row? That's really how you should be removing duplicates.
DISTINCT clauses tend to be overused. That may not be the case here but it sounds like it's possible that you're trying to hack out the results you want with DISTINCT rather than solving the real problem, which is a fairly common problem.
You have to include activity clause into your join (and no need for distinct):
select Table1.Id as Id, Table1.Name, Table2.Description from Table1
left outer join Table1Table2Map on (Table1Table2Map.Table1Id = Table1.Id) and Table1Table2Map.IsActive = 1
left outer join Table2 on (Table2.Id = Table1Table2Map.Table2Id)
If you want to display multiple rows from table2 you will have duplicate data from table1 displayed. If you wanted to you could use an aggregate function (IE Max, Min) on table2, this would eliminate the duplicate rows from table1, but would also hide some of the data from table2.
See also my answer on question #70161 for additional explanation