Get unique rows from a cross reference table where all items reference each other - sql

I have a cross-reference table of part numbers (PN). There are two columns, PN and ALT_PN. All part numbers cross-reference each other.
I need to create a report showing only the unique pairs in this table. For example, show only that A has an alternate of B, and not that B is an alternate of A.
I found solutions for MySQL that work, but they don't work in Oracle 11g.
create table temp (id integer primary key, PN varchar(10), Alt_PN varchar(10));
insert into temp values(1,'A','B');
insert into temp values(2,'B','A');
insert into temp values(3,'X','Y');
insert into temp values(4,'Y','X');
insert into temp values(5,'C','D');
insert into temp values(6,'C','E');
insert into temp values(7,'D','C');
insert into temp values(8,'D','E');
insert into temp values(9,'E','C');
insert into temp values(10,'E','D');
I only want to return the rows with IDs 1, 3, 5, 6 and 8.

If all part numbers cross-reference each other, just do:
select t.*
from temp t
where t.pn < t.alt_pn;
This will return one row from each pair, and works on any type that supports comparison.
If you are concerned that not all pairs are present in both directions, you can do:
select t.*
from (select t.*,
             row_number() over (partition by least(t.pn, t.alt_pn), greatest(t.pn, t.alt_pn)
                                order by t.pn) as seqnum
      from temp t
     ) t
where seqnum = 1;
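Both ideas can be tried out on the sample data. The following is a minimal sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in for Oracle: the table name xref is hypothetical, and since SQLite has no LEAST/GREATEST, its two-argument MIN/MAX are used with a GROUP BY instead of row_number().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "xref" is a hypothetical name for the cross-reference table.
conn.execute("CREATE TABLE xref (id INTEGER PRIMARY KEY, pn TEXT, alt_pn TEXT)")
conn.executemany(
    "INSERT INTO xref VALUES (?, ?, ?)",
    [
        (1, "A", "B"), (2, "B", "A"), (3, "X", "Y"), (4, "Y", "X"),
        (5, "C", "D"), (6, "C", "E"), (7, "D", "C"), (8, "D", "E"),
        (9, "E", "C"), (10, "E", "D"),
    ],
)

# Case 1: every pair is stored in both directions, so keep the ordered half.
ordered_ids = [
    row[0]
    for row in conn.execute("SELECT id FROM xref WHERE pn < alt_pn ORDER BY id")
]

# Case 2: a pair may be stored in one direction only. Group on the normalized
# pair (smaller value, larger value) and keep the lowest id of each group.
normalized_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT MIN(id) AS keep_id
        FROM xref
        GROUP BY MIN(pn, alt_pn), MAX(pn, alt_pn)
        ORDER BY keep_id
        """
    )
]

print(ordered_ids)
print(normalized_ids)
```

On this data both queries pick the same rows; they only differ when a pair is missing its mirror row.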

Related

A sql query to create multiple rows in different tables using inserted id

I need to insert a row into one table and use this row's id to insert two more rows into a different table within one transaction. I've tried this
begin;
insert into table default values returning table.id as C;
insert into table1(table1_id, column1) values (C, 1);
insert into table1(table1_id, column1) values (C, 2);
commit;
But it doesn't work. How can I fix it?
updated
You need a CTE, and you don't need a begin/commit to do it in one transaction:
WITH inserted AS (
INSERT INTO ... RETURNING id
)
INSERT INTO other_table (id)
SELECT id
FROM inserted;
Edit:
To insert two rows into a single table using that id, you could do that two ways:
two separate INSERT statements, one in the CTE and one in the "main" part
a single INSERT which joins on a list of values; a row will be inserted for each of those values.
With these tables as the setup:
CREATE TEMP TABLE t1 (id INTEGER);
CREATE TEMP TABLE t2 (id INTEGER, t TEXT);
Method 1:
WITH inserted1 AS (
INSERT INTO t1
SELECT 9
RETURNING id
), inserted2 AS (
INSERT INTO t2
SELECT id, 'some val'
FROM inserted1
RETURNING id
)
INSERT INTO t2
SELECT id, 'other val'
FROM inserted1
Method 2:
WITH inserted AS (
INSERT INTO t1
SELECT 4
RETURNING id
)
INSERT INTO t2
SELECT id, v
FROM inserted
CROSS JOIN (
VALUES
('val1'),
('val2')
) vals(v)
If you run either, then check t2, you'll see it will contain the expected values.
If you are on SQL Server, SCOPE_IDENTITY() returns the last identity value inserted in the current scope:
insert into table1(columnName)values('stack2');
insert into table_2 values(SCOPE_IDENTITY(),'val1','val2');
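When the inserts are issued from application code, the generated id can also be captured by the driver and reused. This is a minimal sketch, assuming SQLite through Python's sqlite3 module rather than the database from the question; the parent and child table names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: one parent row, two child rows that reference it.
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE child (parent_id INTEGER, column1 INTEGER)")

# The connection used as a context manager commits on success and rolls
# back if an exception is raised, so all three inserts succeed or fail together.
with conn:
    cur = conn.execute("INSERT INTO parent DEFAULT VALUES")
    new_id = cur.lastrowid  # id generated for the parent row
    conn.executemany(
        "INSERT INTO child (parent_id, column1) VALUES (?, ?)",
        [(new_id, 1), (new_id, 2)],
    )

child_rows = conn.execute(
    "SELECT parent_id, column1 FROM child ORDER BY column1"
).fetchall()
print(child_rows)
```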

Hive- Delete duplicate rows using ROW_NUMBER()

How can I delete duplicates using row_number() without listing all the columns of the table? I have a Hive table with 50+ columns. To delete duplicates based on 2 columns, these are the steps I followed:
Create a temp table: create table temptable as select * from (select *, row_number() over (partition by col1, col2 order by col1) as rn from maintable) t where rn = 1
Insert overwrite table maintable select * from temptable
But the insert fails because the new column rn is present in temptable. To avoid this column I would have to list all the rest of the columns.
And there is no drop-column option in Hive. You would have to use REPLACE COLUMNS, which again requires listing all the remaining columns.
So is there a better way to delete duplicates in Hive based on 2 columns?
Spell out all column names from the original table for insert overwrite as the query computes a new column. No temp table is needed for this.
insert overwrite table maintable
select col1, col2, col3 -- ... col50
from (select t.*,
             -- number the rows within each (col1, col2) group
             row_number() over (partition by col1, col2 order by col1, col2) as rn
      from maintable t
     ) t
where rn = 1
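The role of PARTITION BY in this kind of de-duplication can be checked on a small table. The following is a minimal sketch, assuming SQLite 3.25 or later (through Python's sqlite3 module) as a stand-in for Hive; the three-column maintable and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Illustrative stand-in for the 50-column table: col1/col2 form the key.
conn.execute("CREATE TABLE maintable (col1 TEXT, col2 TEXT, col3 INTEGER)")
conn.executemany(
    "INSERT INTO maintable VALUES (?, ?, ?)",
    [("a", "x", 1), ("a", "x", 2), ("b", "y", 3), ("b", "z", 4), ("b", "z", 5)],
)

# With PARTITION BY, numbering restarts for each (col1, col2) group,
# so rn = 1 keeps exactly one row per group.
kept = conn.execute(
    """
    SELECT col1, col2, col3
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY col3) AS rn
          FROM maintable t)
    WHERE rn = 1
    ORDER BY col1, col2
    """
).fetchall()

# Without PARTITION BY, the whole table is one sequence,
# so rn = 1 keeps a single row in total.
unpartitioned_count = conn.execute(
    """
    SELECT COUNT(*)
    FROM (SELECT ROW_NUMBER() OVER (ORDER BY col1, col2) AS rn FROM maintable)
    WHERE rn = 1
    """
).fetchone()[0]

print(kept)
print(unpartitioned_count)
```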

T-SQL: select * from table where column in (...) with duplicates without using union all

I have a dataset in Excel with a few thousand ids. I need a few columns from a database to match these ids, but some ids are listed twice in the Excel list (and they need to be there twice). I'm trying to write a query with an IN clause, but it automatically filters out the duplicates. I want the duplicates as well, otherwise I need to manually rearrange the data when merging the Excel and SQL results.
Is there any way to do something like
SELECT *
FROM table
WHERE id IN (
.. list of thousands ids
)
that also returns the duplicates, without using UNION ALL and without firing thousands of separate queries at the database?
You need to use a left join if you want to keep the duplicates. If the ordering is important, then you should include that information as well.
Here is one method:
select t.*
from (values (1, id1), (2, id2), . . .
) ids(ordering, id) left join
table t
on t.id = ids.id
order by ids.ordering;
An alternative is to load the ids into a temporary table with an identity column to capture the ordering:
-- Create the table
create table #ids (
    ordering int identity(1, 1) primary key,
    id int -- assumed type; use the type of table.id
);
-- Insert the ids (one row per id, in the desired order)
insert into #ids (id)
select @id;
# Use them in the query
select t.*
from #ids ids left join
table t
on t.id = ids.id
order by ids.ordering;
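The difference between IN and a join against an ordered id list can be seen on a tiny example. This is a minimal sketch, assuming SQLite through Python's sqlite3 module instead of SQL Server; the items table, its rows, and the wanted list are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical lookup table standing in for the real one.
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)", [(1, "one"), (2, "two"), (3, "three")]
)

# The id list as it appears in the spreadsheet: 3 twice, 99 not in the table.
wanted = [3, 1, 3, 99]

# Load the ids into a temporary table; the auto-increment column
# records the original position of each id.
conn.execute(
    "CREATE TEMP TABLE ids (ordering INTEGER PRIMARY KEY AUTOINCREMENT, id INTEGER)"
)
conn.executemany("INSERT INTO ids (id) VALUES (?)", [(i,) for i in wanted])

# IN only tests membership, so each matching row comes back once.
in_rows = conn.execute(
    "SELECT id, name FROM items WHERE id IN (3, 1, 3, 99) ORDER BY id"
).fetchall()

# The left join returns one output row per listed id, in list order.
join_rows = conn.execute(
    """
    SELECT ids.id, t.name
    FROM ids LEFT JOIN items t ON t.id = ids.id
    ORDER BY ids.ordering
    """
).fetchall()

print(in_rows)
print(join_rows)
```

Ids with no match still appear in the join result, with NULL (None in Python) for the table's columns, which keeps the output aligned with the spreadsheet rows.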
If I understand this correctly this is exactly the way IN is supposed to work...
DECLARE @tbl TABLE(value INT, content VARCHAR(100));
WITH RunningNummber AS
(
    SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS Nmbr
    FROM sys.objects
)
INSERT INTO @tbl
SELECT Nmbr, 'Content for ' + CAST(Nmbr AS VARCHAR(100))
FROM RunningNummber;
--This ...
SELECT * FROM @tbl WHERE value IN(1,3,5);
-- ... is the same as this:
SELECT * FROM @tbl WHERE value IN(1,1,1,1,3,3,5,1,3,5);
If you want to combine two result-sets you have to join them...
In my opinion, it is better to import the list of thousands of ids into a table and use a subquery against it to get the information you need.
Once all the ids are in that table, you can also use T-SQL to delete duplicate values and avoid future problems.

finding duplicates and removing but keeping one value [duplicate]

I currently have a URL redirect table in my database that contains ~8000 rows and ~6000 of them are duplicates.
I was wondering if there is a way to delete these duplicates based on a certain column's value. I want to use my "old_url" column to find the duplicates, and so far I have used
SELECT old_url
,DuplicateCount = COUNT(1)
FROM tbl_ecom_url_redirect
GROUP BY old_url
HAVING COUNT(1) > 1 -- more than one value
ORDER BY COUNT(1) DESC -- sort by most duplicates
However, I'm not sure how to remove them now, as I don't want to lose every row, just the duplicates. The duplicate rows match almost completely, except that the new_url is sometimes different and the url_id (GUID) is different each time.
In my opinion ranking functions and a CTE are the easiest approach:
WITH CTE AS
(
SELECT old_url
,Num = ROW_NUMBER()OVER(PARTITION BY old_url ORDER BY DateColumn ASC)
FROM tbl_ecom_url_redirect
)
DELETE FROM CTE WHERE Num > 1
Change ORDER BY DateColumn ASC as needed to determine which record is kept and which records are deleted. In this case all newer duplicates are deleted and the oldest record is kept.
If your table has a primary key then this is easy:
BEGIN TRAN
CREATE TABLE #T(Id INT, OldUrl VARCHAR(MAX))
INSERT INTO #T VALUES
(1, 'foo'),
(2, 'moo'),
(3, 'foo'),
(4, 'moo'),
(5, 'foo'),
(6, 'zoo'),
(7, 'foo')
-- Keep the lowest Id for each OldUrl and delete the rest
DELETE FROM #T WHERE Id NOT IN (
    SELECT MIN(Id)
    FROM #T
    GROUP BY OldUrl)
SELECT * FROM #T
DROP TABLE #T
ROLLBACK
Here is a sample that deletes duplicate records identified by a GUID; I hope it helps.
DECLARE @t1 TABLE
(
    DupID UNIQUEIDENTIFIER,
    DupRecords NVARCHAR(255)
)
INSERT INTO @t1 VALUES
(NEWID(),'A1'),
(NEWID(),'A1'),
(NEWID(),'A2'),
(NEWID(),'A1'),
(NEWID(),'A3')
Now @t1 contains duplicated records, each with its own GUID.
;WITH CTE AS(
    SELECT DupID, DupRecords, Rn = ROW_NUMBER()
        OVER (PARTITION BY DupRecords ORDER BY DupRecords)
    FROM @t1
)
DELETE FROM @t1 WHERE DupID IN (SELECT DupID FROM CTE WHERE Rn > 1)
With the query above, the duplicated records are deleted from @t1. ROW_NUMBER() is used to number the records within each group of duplicates, so that every row after the first can be removed.
SELECT * FROM @t1

How to Generate row number as same as inserted without create or insert or cte

I've this script:
Create table #temp (id int,name varchar(10),city varchar(10),sal int)
Insert into #temp
Select 2,'kishor','hyd', 100
Union all
Select 3,'kumar','sec', 200
Union all
Select 4,'santosh','kp', 300
Union all
Select 1,'sudeep','myp', 300
Now I want to generate a row number that matches the order in which the data was inserted, using a single SELECT statement and without using CREATE, INSERT, CTE or UPDATE commands.
The row number column should keep its values even after the result is sorted in any order.
You should add an auto-increment (identity) column to your table to record the insertion order.
Then use, for example, ROW_NUMBER() over that column to get a row number:
select #temp.*,
ROW_NUMBER() over (order by <your autoincrement field here> ) as RowNumber
from #temp
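The effect of ordering ROW_NUMBER() by an auto-increment column can be sketched as follows, assuming SQLite 3.25 or later (through Python's sqlite3 module) rather than SQL Server; the table name emp and the column name seq are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "seq" is a hypothetical auto-increment column that records insertion order.
conn.execute(
    """
    CREATE TABLE emp (
        seq  INTEGER PRIMARY KEY AUTOINCREMENT,
        id   INTEGER,
        name TEXT,
        city TEXT,
        sal  INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO emp (id, name, city, sal) VALUES (?, ?, ?, ?)",
    [
        (2, "kishor", "hyd", 100),
        (3, "kumar", "sec", 200),
        (4, "santosh", "kp", 300),
        (1, "sudeep", "myp", 300),
    ],
)

# The row number follows insertion order (seq), while the result set
# itself is sorted by id; the numbers stay attached to their rows.
rows = conn.execute(
    """
    SELECT id, name, ROW_NUMBER() OVER (ORDER BY seq) AS rn
    FROM emp
    ORDER BY id
    """
).fetchall()

print(rows)
```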