ORACLE - Select ROWNUM with distinct multiple COLUMNS - sql

Hi, I am invoking SQL queries from a Java application. Since the table has a huge amount of data, it takes a long time to process on the Java side, so I am now fetching the records 1000 at a time:
SELECT t1.col1,t1.col2,t2.col1,t2.col2 FROM(SELECT rownum AS rn,
t1.col1,
t1.col2,
t2.col1,
t2.col2 FROM table1 t1,
table2 t2 WHERE t1.id=t2.id) WHERE rn BETWEEN ? AND ?;
But I have one more query which needs distinct values, like below:
SELECT t1.col1,t1.col2,t2.col1,t2.col2 FROM(SELECT rownum AS rn,
distinct t1.col1,
t1.col2,
t2.col1,
t2.col2 FROM table1 t1,
table2 t2 WHERE t1.id=t2.id) WHERE rn BETWEEN ? AND ?;
But this query gives an error: it does not allow rownum AS rn together with distinct. Could you please help us resolve this use case?

In an Oracle database, the DISTINCT keyword is only allowed directly after the SELECT keyword or inside an aggregate function such as COUNT.
Furthermore, your SQL will lead to inconsistent results, since in an Oracle database the order of records is not guaranteed without an explicit ORDER BY clause.
You also cannot access the table aliases of an inner select from the outer query, so you have to apply column aliases if columns in the different tables have the same name.
The best solution would be to add another layer of nested selects:
SELECT t1_col1, t1_col2, t2_col1, t2_col2
FROM (
SELECT rownum AS rn, inner_tab.*
FROM (
SELECT distinct t1.col1 AS t1_col1,
t1.col2 AS t1_col2,
t2.col1 AS t2_col1,
t2.col2 AS t2_col2
FROM table1 t1,
table2 t2
WHERE t1.id=t2.id
ORDER BY 1 -- you have to decide!
) inner_tab
) WHERE rn BETWEEN ? AND ?;
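The layering above (deduplicate first, number the rows second, filter on the row number last) can be tried out with a small sketch. It runs against an in-memory SQLite database from Python rather than Oracle, so ROW_NUMBER() OVER (ORDER BY ...) stands in for rownum plus the inner ORDER BY; the sample rows and the fetch_page helper are assumptions made up for illustration.

```python
import sqlite3

# Hypothetical miniature versions of table1/table2 from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, col1 TEXT, col2 TEXT);
    CREATE TABLE table2 (id INTEGER, col1 TEXT, col2 TEXT);
    INSERT INTO table1 VALUES (1,'a','x'), (2,'a','x'), (3,'b','y'), (4,'c','z');
    INSERT INTO table2 VALUES (1,'p','q'), (2,'p','q'), (3,'r','s'), (4,'t','u');
""")

# Innermost level: DISTINCT. Middle level: row numbering over a fixed order.
# Outermost level: keep only the requested window of row numbers.
PAGE_SQL = """
    SELECT t1_col1, t1_col2, t2_col1, t2_col2
    FROM (
        SELECT ROW_NUMBER() OVER (ORDER BY t1_col1, t1_col2, t2_col1, t2_col2) AS rn,
               inner_tab.*
        FROM (
            SELECT DISTINCT t1.col1 AS t1_col1, t1.col2 AS t1_col2,
                            t2.col1 AS t2_col1, t2.col2 AS t2_col2
            FROM table1 t1
            JOIN table2 t2 ON t1.id = t2.id
        ) inner_tab
    )
    WHERE rn BETWEEN ? AND ?
    ORDER BY rn
"""

def fetch_page(first_row, last_row):
    """Return the distinct rows numbered first_row..last_row (inclusive)."""
    return conn.execute(PAGE_SQL, (first_row, last_row)).fetchall()

page1 = fetch_page(1, 2)
page2 = fetch_page(3, 4)
print(page1)
print(page2)
```

The ids 1 and 2 produce the same four column values, so the distinct set has three rows and the second page holds only one. Ordering by all selected columns makes the numbering stable between calls, which is what keeps consecutive pages from overlapping.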

Related

How can I nest multiple tables into one while modifying them?

I have multiple tables which I need to merge into one after performing some operations into each one of them.
A first nesting was achieved thanks to a (working) "WITH" statement:
With
T1 as (Select col1, col2, col3,...
from *database*
where *condition*)
Select t2.col1, t2.col2, t2.col3, ...
From(
Select
d.col1, d.col2, d.col3,...
from *d*
where *conditions*
Group by d.col1, d.col2, d.col3,...) t2
Inner join T1
on t1.z = t2.x
Where t2.col1 = *condition*
and *conditions*
Group by t2.col1, t2.col2, t2.col3, ...
The problem arises when I try to expand on this and add more layers to the nest.
I have tried to do the following (changes to the previous code are marked in between "**"):
With
T1 as (Select col1, col2, col3,...
from *database*
where *condition*)**,**
**T2 as (**
Select t2.col1, t2.col2, t2.col3, ...
From(
Select
d.col1, d.col2, d.col3,...
from *d*
where *conditions*
Group by d.col1, d.col2, d.col3,...) t2
Inner join T1
on t1.z = t2.x
Where t2.col1 = *condition*
and *conditions*
Group by t2.col1, t2.col2, t2.col3, ...
**)**
**Select t3.col1 as qw, t3.col2 as qe, t3.col3 as qr,...**
**FROM(**
**Select**
**c.col1,**
**c.col2,**
**c.col3, ...**
**from *c***
**where *conditions) t3***
**Inner join t1**
**on t1.col3 = t3.qr**
**where t3.qe = *condition***
**group by t3.qw, t3.qe, t3.qr,...**
In return, I get the following error:
"t3.qr": invalid identifier
Does anybody know what the issue is and how I can fix it? I need to figure out how to nest multiple tables in some way because, after these ones, more tables will have to be added.
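When we write a query, the clauses have to appear in the right order:
First the CTE: with cte_alias as (select ... from ...)
Then the SELECT: select column_1, column_2, possibly with aggregate or window functions such as SUM, MAX, ROW_NUMBER() etc. There should be only one SELECT that is not in brackets.
FROM tables, sub-queries, CTEs etc., with JOIN if needed
WHERE conditions, which must be boolean (either true or false)
GROUP BY
ORDER BY
Only the SELECT is obligatory in MySQL.
SELECT 'Hello' AS "speech"; is a valid MySQL query.
This is also why "t3.qr" is an invalid identifier: qw, qe and qr are aliases given in the outer SELECT list, while the subquery t3 itself only exposes col1, col2 and col3, so the ON, WHERE and GROUP BY clauses have to reference t3.col1, t3.col2 and t3.col3.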

How to join two tables on distinct values of a column?

SELECT table1.*
,address
,job
FROM table1
JOIN table2 ON table2.name = table1.name
The above query also returns rows for duplicate values of name. How can I change the query to get only one row for each distinct value of the name column?
I am using SQL Server
You can easily accomplish this with row_number window function. See query below:
select t1.id, t1.name, t1.pets, t2.address, t2.job
from (
select *,
row_number() over (partition by [name] order by id) rn
from Table1
) t1
join table2 t2 on t1.name = t2.name
where t1.rn = 1
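As a rough check of the row_number approach, the sketch below runs the same query shape against an in-memory SQLite database from Python (not SQL Server, so the [name] brackets are dropped). The sample rows are assumptions; only the table and column names come from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT, pets TEXT);
    CREATE TABLE table2 (name TEXT, address TEXT, job TEXT);
    -- 'ann' appears twice in table1; only her lowest id should survive.
    INSERT INTO table1 VALUES (1,'ann','cat'), (2,'ann','dog'), (3,'bob','fish');
    INSERT INTO table2 VALUES ('ann','1 Main St','chef'), ('bob','2 Oak Ave','pilot');
""")

rows = conn.execute("""
    SELECT t1.id, t1.name, t1.pets, t2.address, t2.job
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY name ORDER BY id) AS rn
        FROM table1
    ) t1
    JOIN table2 t2 ON t1.name = t2.name
    WHERE t1.rn = 1
    ORDER BY t1.id
""").fetchall()

for row in rows:
    print(row)
```

The ORDER BY inside the window decides which duplicate is kept; here it is the row with the smallest id per name.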
I would recommend a lateral join -- apply -- for this purpose:
SELECT t1.*, t2.address, t2.job
FROM table2 t2 CROSS APPLY
(SELECT TOP (1) t1.*
FROM table1 t1
WHERE t2.name = t1.name
) t1;
Normally, the subquery would have an ORDER BY to specify the ordering. Otherwise the result is indeterminate.
This is often faster than using window functions for this purpose.

Compare a two column with the same table to remove duplicate

I have a sample table with two columns. I need to compare column 1 and column 2 against the other records of the same table and remove rows that are reversed duplicates, i.e. where (column 1, column 2) of one row equals (column 2, column 1) of another.
I tried a self join and a case condition, but it's not working.
If I understand correctly, you can run a simple select like this if you have all reversed pairs in the table:
select col1, col2
from t
where col1 < col2;
If you have some singletons, then:
select col1, col2
from t
where col1 < col2 or
(col1 > col2 and
not exists (select 1
from t t2
where t2.col1 = t.col2 and
t2.col2 = t.col1
)
);
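To see what the second query keeps, here is a small sketch that runs it against an in-memory SQLite database from Python. The four sample rows are assumptions: one reversed pair, one row that is already in ascending order, and one descending singleton.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (col1 INTEGER, col2 INTEGER);
    -- (1,2)/(2,1) is a reversed pair; (3,4) and (6,5) have no partner.
    INSERT INTO t VALUES (1, 2), (2, 1), (3, 4), (6, 5);
""")

pairs = conn.execute("""
    SELECT col1, col2
    FROM t
    WHERE col1 < col2
       OR (col1 > col2
           AND NOT EXISTS (SELECT 1
                           FROM t t2
                           WHERE t2.col1 = t.col2
                             AND t2.col2 = t.col1))
    ORDER BY col1
""").fetchall()

print(pairs)
```

Of the reversed pair only (1, 2) remains, while the singleton (6, 5) is kept because no (5, 6) row exists. Note that a row with col1 = col2 satisfies neither branch of the WHERE clause, so such rows would need their own condition if they occur.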
You can use the except operator.
"EXCEPT returns distinct rows from the left input query that aren't output by the right input query."
SELECT C1, C2 FROM table
Except
SELECT C2, C1 FROM table
Example with your given data set : dbfiddle
I am posting the answer based on oracle database and also the columns are string/varchar:
delete from table where rowid in (
select rowid from table
where column1 || column2 =column2 || column1 )
Feel free to provide more input and we can tweak the answer.
Okay. There might be a simpler way of doing this but this might work as well. {table} is to be replaced with your table name.
;with orderedtable as (select t1.col1, t1.col2, ROW_NUMBER() OVER(ORDER BY t1.col1, t1.col2 ASC) AS rownum
from (select distinct t2.col1, t2.col2 from {table} t2) as t1)
select f1.col1, f1.col2
from orderedtable f1
left join orderedtable f2 on f1.col1 = f2.col2 and f1.col2 = f2.col1 and f1.rownum < f2.rownum
where f2.rownum is null
The SQL below will get the reversed col1 and col2 rows:
select
distinct t2.col1,t1.col2
from
table t1
join
table t2 on t1.col1 = t2.col2 and t1.col2 = t2.col1
Once we have these reversed rows, we can exclude them with a left join. The complete SQL is:
select
t.col1,t.col2
from
table t
left join
(
select
distinct t2.col1,t1.col2
from
table t1
join
table t2 on t1.col1 = t2.col2 and t1.col2 = t2.col1
) tmp on t.col1 = tmp.col1 and t.col2 = tmp.col2
where
tmp.col1 is null
Is it clear?

Left Join without duplicate rows 1 to 1 join. Make each row in one table only join to one row in the other table

I'm trying to join table 1 to table 2 to get table 3 (see desired output). However, I can't seem to get it to work: the table only contains one value, so there are many possible matches for each row. A left join doesn't seem to work.
I found this: Left Join without duplicate rows from left table
which seems to match my use case, but Outer Apply is not in PrestoDB.
I essentially want to match each row in T1 with a single one in T2.
If I understand correctly, you can use row_number():
select t1.*, t2.col3
from t1 left join
(select t2.*, row_number() over (partition by col2 order by col3 nulls last) as seqnum
from t2
) t2
on t2.col2 = t1.col2 and t2.seqnum = 1;
If you don't have proper keys you get an m:n-join instead of 1:n. You can calculate a row number for both tables which acts (in combination with col2) as key for the following join:
select t1.col1, t1.col2, t2.col3
from
(
select t1.*,
row_number() over (partition by col2 order by col2) as rn
from t1
) as t1
left join
(
select t2.*,
row_number() over (partition by col2 order by col2) as rn
from t2
) as t2
on t1.col2 = t2.col2
and t1.rn = t2.rn;
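The pairing by (col2, row number) can be sketched as follows, using an in-memory SQLite database from Python instead of Presto. The sample rows are assumptions, and the window ORDER BY uses col1 and col3 (rather than col2) purely so that this sketch produces a deterministic pairing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (col1 TEXT, col2 INTEGER);
    CREATE TABLE t2 (col2 INTEGER, col3 TEXT);
    INSERT INTO t1 VALUES ('a', 1), ('b', 1), ('c', 2), ('d', 3);
    INSERT INTO t2 VALUES (1, 'x'), (1, 'y'), (2, 'z');
""")

# Number the rows inside each col2 group on both sides, then join on
# (col2, rn) so every t1 row is matched with at most one t2 row.
matched = conn.execute("""
    SELECT a.col1, a.col2, b.col3
    FROM (
        SELECT t1.*,
               ROW_NUMBER() OVER (PARTITION BY col2 ORDER BY col1) AS rn
        FROM t1
    ) AS a
    LEFT JOIN (
        SELECT t2.*,
               ROW_NUMBER() OVER (PARTITION BY col2 ORDER BY col3) AS rn
        FROM t2
    ) AS b
      ON a.col2 = b.col2
     AND a.rn = b.rn
    ORDER BY a.col1
""").fetchall()

for row in matched:
    print(row)
```

A plain join on col2 would return both 'x' and 'y' for each of 'a' and 'b'; with the row number in the join key each row is used once, and 'd' keeps a NULL because t2 has no row for col2 = 3.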

Best self join technique when checking for duplicates

I'm trying to optimize a query that is in production and is taking a long time. The goal is to find duplicate records based on matching field values and then delete them. The current query uses a self join via inner join on t1.col1 = t2.col1, then a where clause to check the values.
select * from table t1
inner join table t2 on t1.col1 = t2.col1
where t1.col2 = t2.col2 ...
What would be a better way to do this? Or is it all the same based on indexes? Maybe
select * from table t1, table t2
where t1.col1 = t2.col1 and t1.col2 = t2.col2 ...
this table has 100m+ rows.
MS SQL, SQL Server 2008 Enterprise
select distinct t2.id
from table1 t1 with (nolock)
inner join table1 t2 with (nolock) on t1.ckid=t2.ckid
left join table2 t3 on t1.cid = t3.cid and t1.typeid = t3.typeid
where
t2.id > #Max_id and
t2.timestamp > t1.timestamp and
t2.rid = 2 and
isnull(t1.col1,'') = isnull(t2.col1,'') and
isnull(t1.cid,-1) = isnull(t2.cid,-1) and
isnull(t1.rid,-1) = isnull(t2.rid,-1) and
isnull(t1.typeid,-1) = isnull(t2.typeid,-1) and
isnull(t1.cktypeid,-1) = isnull(t2.cktypeid,-1) and
isnull(t1.oid,'') = isnull(t2.oid,'') and
isnull(t1.stypeid,-1) = isnull(t2.stypeid,-1)
and (
(
t3.uniqueoid = 1
)
or
(
t3.uniqueoid is null and
isnull(t1.col1,'') = isnull(t2.col1,'') and
isnull(t1.col2,'') = isnull(t2.col2,'') and
isnull(t1.rdid,-1) = isnull(t2.rdid,-1) and
isnull(t1.stid,-1) = isnull(t2.stid,-1) and
isnull(t1.huaid,-1) = isnull(t2.huaid,-1) and
isnull(t1.lpid,-1) = isnull(t2.lpid,-1) and
isnull(t1.col3,-1) = isnull(t2.col3,-1)
)
)
Why self join: this is an aggregate question.
Hope you have an index on col1, col2, ...
--DELETE table
--WHERE KeyCol NOT IN (
select
MIN(KeyCol) AS RowToKeep,
col1, col2
from
table
GROUP BY
col1, col2
HAVING
COUNT(*) > 1
--)
However, this will take some time. Have a look at bulk delete techniques
You can use ROW_NUMBER() to find duplicate rows in one table.
You can check here
The two methods you give should be equivalent. I think most SQL engines would do exactly the same thing in both cases.
And, by the way, this won't work. You have to have at least one field that is different or every record will match itself.
You might want to try something more like:
select col1, col2, col3
from table
group by col1, col2, col3
having count(*)>1
For a table with 100m+ rows, using GROUP BY together with a holding table will perform better, even though it translates into four queries.
STEP 1: create a holding key:
SELECT col1, col2, col3=count(*)
INTO holdkey
FROM t1
GROUP BY col1, col2
HAVING count(*) > 1
STEP 2: Push all the duplicate entries into the holddups. This is required for Step 4.
SELECT DISTINCT t1.*
INTO holddups
FROM t1, holdkey
WHERE t1.col1 = holdkey.col1
AND t1.col2 = holdkey.col2
STEP 3: Delete the duplicate rows from the original table.
DELETE t1
FROM t1, holdkey
WHERE t1.col1 = holdkey.col1
AND t1.col2 = holdkey.col2
STEP 4: Put the unique rows back in the original table. For example:
INSERT t1 SELECT * FROM holddups
To detect duplicates, you don't need to join:
SELECT col1, col2
FROM table
GROUP BY col1, col2
HAVING COUNT(*) > 1
That should be much faster.
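A minimal sketch of the join-free detection, run against an in-memory SQLite database from Python rather than SQL Server. The table name records, the id key column and the sample rows are assumptions for illustration; the follow-up DELETE that keeps the lowest id per group is one common way to act on the result, not necessarily what the production query should do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE records (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT);
    INSERT INTO records VALUES
        (1,'a','x'), (2,'a','x'), (3,'b','y'), (4,'a','x'), (5,'c','z');
""")

# Detection: one pass with GROUP BY, no self join.
duplicates = conn.execute("""
    SELECT col1, col2, COUNT(*) AS copies
    FROM records
    GROUP BY col1, col2
    HAVING COUNT(*) > 1
""").fetchall()
print(duplicates)

# Clean-up (assumed policy): keep the row with the smallest id in each group.
conn.execute("""
    DELETE FROM records
    WHERE id NOT IN (SELECT MIN(id) FROM records GROUP BY col1, col2)
""")
remaining_ids = [r[0] for r in conn.execute("SELECT id FROM records ORDER BY id")]
print(remaining_ids)
```

On a table with 100m+ rows a single large DELETE like this may still be slow or lock-heavy, so it is only meant to show the logic on toy data.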
In my experience, SQL Server performance is really bad with OR conditions. Probably it is not the self join but that with table3 that causes the bad performance. But without seeing the plan, I would not be sure.
In this case, it might help to split your query into two:
One with a WHERE condition t3.uniqueoid = 1 and one with a WHERE condition for the other conditions on table3, and then use UNION ALL to append one to the other.