postgres - select with "using" - sql

I found a very useful delete query that will delete duplicates based on specific columns:
DELETE FROM table USING table alias
WHERE table.field1 = alias.field1 AND table.field2 = alias.field2 AND
table.max_field < alias.max_field
How to delete duplicate entries?
However, is there an equivalent SELECT query that filters the same way? I tried USING, but without success.
Thank you.

You can join your table with itself using the specific columns, field1 and field2, and then filter based on a comparison between max_field on both tables.
select t1.*
from mytable t1
join mytable t2 on (t1.field1 = t2.field1 and t1.field2 = t2.field2)
where t1.max_field < t2.max_field;
You will get all the duplicates whose max_field is not the greatest.
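If you want to check what the self-join returns before relying on it, you can run it against a tiny in-memory table. This is only a sketch: it uses Python's built-in sqlite3 module instead of PostgreSQL, and the sample rows are made up. I also added DISTINCT, which is not in the query above, because a row that is smaller than several other rows in its group would otherwise be listed once per match.

```python
import sqlite3

# In-memory database with made-up sample rows (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (field1 TEXT, field2 TEXT, max_field INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("a", "x", 1),
        ("a", "x", 2),
        ("a", "x", 3),  # greatest max_field in the (a, x) group
        ("b", "y", 5),  # no duplicate at all
    ],
)

# Same self-join as above; DISTINCT collapses rows that match more than once.
rows = conn.execute(
    """
    SELECT DISTINCT t1.*
    FROM mytable t1
    JOIN mytable t2
      ON t1.field1 = t2.field1 AND t1.field2 = t2.field2
    WHERE t1.max_field < t2.max_field
    ORDER BY t1.max_field
    """
).fetchall()
print(rows)
```

These are exactly the rows the DELETE ... USING statement from the question would remove.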

Related

How to join a table with itself with two records per id?

So, I have a table with the following structure:
id | columnA | columnB
1  | Yes     |
1  |         | No
I want to combine the row into a single row, so it ends up like this:
id | columnA | columnB
1  | Yes     | No
I believe a self join here would work like this:
SELECT t1.columnA , t2.columnB
FROM table1 t1, table1 t2
where t1.id = t2.id
But is there a way to do this without specifying the columns? I have a table that has 100 columns and I'm trying to see if I can accomplish this without listing out all the columns.
Use the below query to get the column name with aggregation (Query created using information schema to get the column names). Write a select using the result and run the query.
select
case when column_name='Id' then column_name
else concat(',Max(', column_name,')') end as Name
from information_schema.columns
where table_name = 'Table1';
You will get something like below as output, where A and B are the column names.
Id
,Max(A)
,Max(B)
Then convert the result into a query:
Select
Id
,Max(A)
,Max(B)
from Table1 Group by Id
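The two manual steps (list the columns, then paste them into a SELECT) can also be scripted. Here is a minimal sketch in Python with the built-in sqlite3 module. Note the assumptions: SQLite exposes column names through PRAGMA table_info rather than information_schema, the helper name build_collapse_query is made up, and the sample rows mirror the table in the question with empty cells stored as NULL.

```python
import sqlite3


def build_collapse_query(conn, table, key):
    """Build 'SELECT key, MAX(col), ... FROM table GROUP BY key' from the column list."""
    # PRAGMA table_info returns one row per column; index 1 holds the column name.
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    aggregates = [f'MAX("{c}") AS "{c}"' for c in columns if c != key]
    select_list = ", ".join([f'"{key}"'] + aggregates)
    return f'SELECT {select_list} FROM "{table}" GROUP BY "{key}"'


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, columnA TEXT, columnB TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, "Yes", None), (1, None, "No")],
)

query = build_collapse_query(conn, "table1", "id")
print(query)

# MAX() skips NULLs, so each group collapses to its single non-NULL value per column.
result = conn.execute(query).fetchall()
print(result)
```

With 100 columns the generated statement gets long, but you never have to type it by hand.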
is there a way to do this without specifying the columns?
You can use using to answer your question.
SELECT t1.columnA , t2.columnB
FROM table1 t1 JOIN
table1 t2
USING (id);
To get the data you want, use aggregation:
SELECT id, MAX(t1.columnA), MAX(t1.columnB)
FROM table1 t1
GROUP BY id;
Use your JOIN, but replace the column list with *:
SELECT t1.*, t2.*
FROM table1 t1, table1 t2
where t1.id = t2.id

Extract all tables and respective columns from long SQL Query

The task I am trying to solve is to get all tables out of a long SQL query and its respective columns.
E.g.
SELECT
t1.id, t1.gender, t1.name,
t2.age, t2.salary
FROM table1 t1
LEFT JOIN table2 t2
ON t1.id = t2.id
Wanted output:
{'table1': ['id', 'gender', 'name'], 'table2': ['age', 'salary']}
I considered using string splitting etc. getting all table names and based on the alias (if available) get the columns.
But this is getting way too complicated if there are multiple joins and maybe also UNIONs.
Is there an available library or smart way to do that?
If it's only for 1 query I would advise to use MS Excel and filter on the table alias. Generate the select statement via MS Excel and you could create something like this:
SELECT
'table1:', t1.id, t1.gender, t1.name,
'table2:',t2.age, t2.salary
FROM table1 t1
LEFT JOIN table2 t2
ON t1.id = t2.id
In case this helps: take the column names from ALL_TAB_COLUMNS and pivot them. This is still not the exact result you want.
select * from (
select TABLE_NAME,COLUMN_NAME from ALL_TAB_COLUMNS where TABLE_NAME in
('Table1','Table2'))
pivot
(
max(table_name)
for COLUMN_NAME in ('gender','name','age','salary')
)
order by 1;
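Coming back to the original question of pulling the mapping out of the query text itself: for queries shaped like the example, where every table has an alias and every selected column is alias-qualified, a couple of regular expressions get you surprisingly far. The sketch below is exactly that, a sketch. The function name tables_and_columns is made up, it only looks at columns in the SELECT list, and it will get confused by subqueries, unaliased tables, string literals, or aliases reused across UNION branches. For anything beyond simple queries a dedicated SQL parser is the safer route.

```python
import re
from collections import defaultdict


def tables_and_columns(sql):
    """Map each table to the alias-qualified columns found in the SELECT list(s)."""
    flags = re.IGNORECASE | re.DOTALL

    # "FROM <table> [AS] <alias>" and "JOIN <table> [AS] <alias>".
    alias_to_table = {
        alias: table
        for table, alias in re.findall(
            r"\b(?:FROM|JOIN)\s+(\w+)(?:\s+AS)?\s+(\w+)", sql, flags
        )
    }

    result = defaultdict(list)
    # Each SELECT ... FROM span is one select list (one per UNION branch).
    for select_list in re.findall(r"\bSELECT\b(.*?)\bFROM\b", sql, flags):
        for alias, column in re.findall(r"\b(\w+)\.(\w+)", select_list):
            table = alias_to_table.get(alias, alias)  # fall back to the raw prefix
            if column not in result[table]:
                result[table].append(column)
    return dict(result)


sql = """
SELECT
t1.id, t1.gender, t1.name,
t2.age, t2.salary
FROM table1 t1
LEFT JOIN table2 t2
ON t1.id = t2.id
"""
result = tables_and_columns(sql)
print(result)
```

Columns that only appear in the ON clause (t2.id here) are deliberately ignored, which matches the wanted output in the question.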

How to select a value that can come from two different tables?

First, SQL is not my strength. So I need help with the following problem. I'll simplify the table contents to describe the problem.
Let's start with three tables : table1 with columns id_1 and value, table2 with columns id_2 and value, and table3 with columns id_3 and value. As you'll notice, a field value appears in all three tables, while ids have different column names. Modifying column names is not an option because they are used by Java legacy code.
I need to set table3.value using table1.value or table2.value according to the fields table1.id_1, table2.id_2 and table3.id_3.
My last attempt, which describes what I try to do, is the following:
UPDATE table3
SET value=(IF ((SELECT COUNT(*) FROM table1 t1 WHERE t1.id_1=id_3) > 0)
SELECT value FROM table1 t1 WHERE t1.id_1=id_3
ELSE IF ((SELECT COUNT(*) FROM table2 t2 WHERE t2.id_2=id_3)) > 0)
SELECT value FROM table2 t2 WHERE t2.id_2=id_3)
Here is some information about the tables and the update:
This update will be included in an XML file used by Liquibase.
It must work with Oracle or SQL Server.
An id from table3.id_3 can be found at most once in table1.id_1 or in table2.id_2, but not in both tables simultaneously.
If table3.id_3 is not found in table1.id_1 nor in table2.id_2, table3.value remains null.
As you can imagine, my last attempt failed. In that case, the IF command was not recognized during the Liquibase update. If anyone has any ideas how to deal with this, I'd appreciate it. Thanks in advance.
I don't know Oracle very well, but a SQL Server approach would be the following using COALESCE() and OUTER JOINs.
Update T3
Set Value = Coalesce(T1.Value, T2.Value)
From Table3 T3
Left Join Table2 T2 On T3.Id_3 = T2.Id_2
Left Join Table1 T1 On T3.Id_3 = T1.Id_1
The COALESCE() will return the first non-NULL value from the LEFT JOIN to tables 1 and 2, and if a record was not found in either, it would be set to NULL.
This is Siyual's UPDATE rewritten as a MERGE statement:
MERGE INTO table3
USING (
SELECT t3.id_3 AS id, COALESCE(t1.value, t2.value) AS value
FROM table3 t3
LEFT JOIN table1 t1 ON t1.id_1 = t3.id_3
LEFT JOIN table2 t2 ON t2.id_2 = t3.id_3
) t ON (table3.id_3 = t.id)
WHEN MATCHED THEN
UPDATE SET table3.value = t.value
This should work in Oracle.
In Oracle
UPDATE table3 t
SET value=COALESCE((SELECT value FROM table1 t1 WHERE t1.id_1=t.id_3),
(SELECT value FROM table2 t2 WHERE t2.id_2=t.id_3))
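If you want to convince yourself that the COALESCE form handles all three cases from the question (match in table1, match in table2, no match at all), here is a small sketch you can run. It uses Python's built-in sqlite3 module as a stand-in, not Oracle or SQL Server, and the sample data is made up. The UPDATE is the same correlated-subquery form, written without a table alias on the target so that it stays portable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (id_1 INTEGER, value TEXT);
    CREATE TABLE table2 (id_2 INTEGER, value TEXT);
    CREATE TABLE table3 (id_3 INTEGER, value TEXT);
    INSERT INTO table1 VALUES (1, 'from table1');
    INSERT INTO table2 VALUES (2, 'from table2');
    INSERT INTO table3 (id_3) VALUES (1), (2), (3);
    """
)

# COALESCE returns the first non-NULL argument; a subquery with no match yields NULL.
conn.execute(
    """
    UPDATE table3
    SET value = COALESCE(
        (SELECT t1.value FROM table1 t1 WHERE t1.id_1 = table3.id_3),
        (SELECT t2.value FROM table2 t2 WHERE t2.id_2 = table3.id_3)
    )
    """
)

rows = conn.execute("SELECT id_3, value FROM table3 ORDER BY id_3").fetchall()
print(rows)
```

The row with id_3 = 3 has no match in either table, so its value stays NULL, as required.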
Given your assumption #3, you can use union all to put together tables 1 and 2 without running the risk of duplicating information (at least for the id's of interest). So a simple merge solution like the one below should work (in all DB products that implement the merge operation).
merge into table3
using (
select id_1 as id, value from table1
union all
select id_2, value from table2
) t
on (table3.id_3 = t.id)
when matched
then update set table3.value = t.value;
You may want to test the various solutions and see which is most effective for your specific tables.
(Note: merge should be more efficient than the update solution using coalesce, at least when relatively few of the id's in table3 have a match in the other tables. This is because the update solution will re-insert NULL where NULL was already stored when there is no match. The merge solution avoids this unnecessary activity.)

Delete from table A joining on table A in Redshift

I am trying to write the following MySQL query in PostgreSQL 8.0 (specifically, using Redshift):
DELETE t1 FROM table t1
LEFT JOIN table t2 ON (
t1.field = t2.field AND
t1.field2 = t2.field2
)
WHERE t1.field > 0
PostgreSQL 8.0 does not support DELETE FROM table USING. The examples in the docs say that you can reference columns in other tables in the where clause, but that doesn't work here as I'm joining on the same table I'm deleting from. The other example is a subselect query, but the primary key of the table I'm working with has four columns so I can't see a way to make that work either.
Amazon Redshift was forked from Postgres 8.0, but it is a very different beast. The manual states that the USING clause is supported in DELETE statements.
Just use the modern form:
DELETE FROM tbl
USING tbl t2
WHERE t2.field = tbl.field
AND t2.field2 = tbl.field2
AND t2.pkey <> tbl.pkey -- exclude self-join
AND tbl.field > 0;
This assumes JOIN instead of LEFT JOIN in your MySQL statement, since a LEFT JOIN would not make any sense there. I also added the condition AND t2.pkey <> tbl.pkey to make it a useful query: it excludes rows joining to themselves, pkey being the primary key column.
What this query does:
Delete all rows where at least one other row exists in the same table with the same not-null values in field and field2. All such duplicates are deleted without leaving a single row per set.
To keep (for example) the row with the smallest pkey per set of duplicates, use t2.pkey < tbl.pkey.
An EXISTS semi-join (as @wilplasser already hinted) might be a better choice, especially if multiple rows could be joined (a row can only be deleted once anyway):
DELETE FROM tbl
WHERE field > 0
AND EXISTS (
SELECT 1
FROM tbl t2
WHERE t2.field = tbl.field
AND t2.field2 = tbl.field2
AND t2.pkey <> tbl.pkey
);
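To illustrate the keep-one-row variant together with EXISTS, here is a small runnable sketch. Assumptions: it uses Python's built-in sqlite3 module rather than Redshift, the sample rows are made up, and the comparison is t2.pkey < tbl.pkey so that the row with the smallest pkey in each duplicate set survives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (pkey INTEGER PRIMARY KEY, field INTEGER, field2 TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [
        (1, 10, "a"),  # kept: smallest pkey of its duplicate set
        (2, 10, "a"),  # deleted
        (3, 10, "a"),  # deleted
        (4, 20, "b"),  # kept: no duplicate
        (5, 0, "c"),   # kept: field > 0 is false
        (6, 0, "c"),   # kept: field > 0 is false, even though it is a duplicate
    ],
)

# A row is deleted if an older duplicate (smaller pkey) exists.
conn.execute(
    """
    DELETE FROM tbl
    WHERE field > 0
      AND EXISTS (
          SELECT 1
          FROM tbl t2
          WHERE t2.field = tbl.field
            AND t2.field2 = tbl.field2
            AND t2.pkey < tbl.pkey
      )
    """
)

remaining = [row[0] for row in conn.execute("SELECT pkey FROM tbl ORDER BY pkey")]
print(remaining)
```

Swapping < for <> gives the behaviour of the original statement, where every member of a duplicate set is removed.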
I don't understand the MySQL syntax, but you probably want this:
DELETE FROM mytable t1
WHERE t1.field > 0
-- don't need this self-join if {field,field2}
-- are a candidate key for mytable
-- (in that case, the exists-subquery would detect _exactly_ the
-- same tuples as the ones to be deleted, which always succeeds)
-- AND EXISTS (
-- SELECT *
-- FROM mytable t2
-- WHERE t1.field = t2.field
-- AND t1.field2 = t2.field2
-- )
;
Note: For testing purposes, you can replace the DELETE keyword by SELECT * or SELECT COUNT(*), and see which rows would be affected by the query.

Comparing two datasets SQL SSRS 2005

I have two datasets on two separate servers. They both pull one column of information each.
I would like to build a report showing the values of the rows that only appear in one of the datasets.
From what I have read, it seems I would like to do this on the SQL side, not the reporting side; I am not sure how to do that.
If someone could shed some light on how that is possible, I would really appreciate it.
You can use the NOT EXISTS clause to get the differences between the two tables.
SELECT
Column
FROM
DatabaseName.SchemaName.Table1
WHERE
NOT EXISTS
(
SELECT
Column
FROM
LinkedServerName.DatabaseName.SchemaName.Table2
WHERE
Table1.Column = Table2.Column --looks at equalities, and doesn't
--include them because of the
--NOT EXISTS clause
)
This will show the rows in Table1 that don't appear in Table2. You can reverse the table names to find the rows in Table2 that don't appear in Table1.
Edit: Made an edit to show what the case would be in the event of linked servers. Also, if you wanted to see all of the rows that are not shared in both tables at the same time, you can try something as in the below.
SELECT
Column, 'Table1' TableName
FROM
DatabaseName.SchemaName.Table1
WHERE
NOT EXISTS
(
SELECT
Column
FROM
LinkedServerName.DatabaseName.SchemaName.Table2
WHERE
Table1.Column = Table2.Column --looks at equalities, and doesn't
--include them because of the
--NOT EXISTS clause
)
UNION
SELECT
Column, 'Table2' TableName
FROM
LinkedServerName.DatabaseName.SchemaName.Table2
WHERE
NOT EXISTS
(
SELECT
Column
FROM
DatabaseName.SchemaName.Table1
WHERE
Table1.Column = Table2.Column
)
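Here is a small runnable sketch of the two-directional NOT EXISTS query, in case you want to try the pattern before pointing it at real servers. Assumptions: both tables live in one in-memory SQLite database (via Python's built-in sqlite3 module) instead of on linked SQL Server instances, and the column name val and the sample values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (val TEXT);
    CREATE TABLE table2 (val TEXT);
    INSERT INTO table1 VALUES ('a'), ('b'), ('c');
    INSERT INTO table2 VALUES ('b'), ('c'), ('d');
    """
)

# Rows only in table1, then rows only in table2, each labelled with its source.
rows = conn.execute(
    """
    SELECT val, 'Table1' AS TableName
    FROM table1
    WHERE NOT EXISTS (SELECT 1 FROM table2 WHERE table2.val = table1.val)
    UNION
    SELECT val, 'Table2' AS TableName
    FROM table2
    WHERE NOT EXISTS (SELECT 1 FROM table1 WHERE table1.val = table2.val)
    ORDER BY val
    """
).fetchall()
print(rows)
```

The values 'b' and 'c' appear in both tables, so neither half of the UNION returns them.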
You can also use a left join:
select a.* from tableA a
left join tableB b
on a.PrimaryKey = b.ForeignKey
where b.ForeignKey is null
This query will return all records from tableA that do not have corresponding records in tableB.
If you want rows that appear in exactly one data set and you have a matching key on each table, then you can use a full outer join:
select *
from table1 t1 full outer join
table2 t2
on t1.key = t2.key
where t1.key is null and t2.key is not null or
t1.key is not null and t2.key is null
The where condition chooses the rows where exactly one side has a match.
The problem with this query, though, is that you get lots of columns with nulls. One way to fix this is by going through the columns one by one in the SELECT clause.
select coalesce(t1.key, t2.key) as key, . . .
Another way to solve this problem is to use a union with a window function. This version brings together all the rows and counts the number of times that key appears:
select t.*
from (select t.*, count(*) over (partition by key) as keycnt
from ((select 'Table1' as which, t.*
from table1 t
) union all
(select 'Table2' as which, t.*
from table2 t
)
) t
) t
where keycnt = 1
This has the additional column specifying which table the value comes from. It also has an extra column, keycnt, with the value 1. If you have a composite key, you would just replace key with the list of columns that specify a match between the two tables.
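The window-function version can be tried out the same way. This is a sketch under a few assumptions: it runs on SQLite through Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer), the key column is named key_col to avoid keyword clashes, and the sample keys are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (key_col INTEGER);
    CREATE TABLE table2 (key_col INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (2), (3), (4);
    """
)

# Stack both tables, count occurrences per key, keep keys that occur exactly once.
rows = conn.execute(
    """
    SELECT which, key_col
    FROM (
        SELECT u.*, COUNT(*) OVER (PARTITION BY key_col) AS keycnt
        FROM (
            SELECT 'Table1' AS which, key_col FROM table1
            UNION ALL
            SELECT 'Table2' AS which, key_col FROM table2
        ) u
    ) counted
    WHERE keycnt = 1
    ORDER BY key_col
    """
).fetchall()
print(rows)
```

Note that this relies on the key being unique within each table; a key repeated inside one table would get a count above 1 and be filtered out even if the other table lacks it.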