I have a table containing the following columns:
seq_no | detail_no | user_id | guid_key
and another table with the following:
header | guid_key | date_entered | login_details | summary_transaction | trailer
Now I wish to map the two tables together such that the final answer is this way:
The first row should be value from header and subsequent rows should be values from seq_no, detail_no and user_id. There will be multiple rows of seq_no, detail_no and user_id. The last row should be the trailer.
The first table contains multiple rows that I need to reference to multiple rows in the second table. I'm new to SQL programming. I've looked up some many-to-many relationships but am unable to find an efficient way to do this. I am using a GUID generator to write unique keys to both tables. However, the key is not unique per row, but rather per set of rows — one key for a set of data.
SQL is not a presentation tool, which is what you're trying to use it as here. You're also trying to make two result sets present as one (you want columns from table 1 on some rows and columns from table 2 on others). You could do something like this:
create table #temp (ID int identity(1,1), col1 varchar(200), col2 varchar(200) /* etc. */)

insert into #temp (col1, col2 /* etc. */)
select seq_no, guid_key, header, date_entered, login_details, summary_transaction
from table1 inner join table2 on table1.guid_key = table2.guid_key
where trailer is null

insert into #temp (col1, col2 /* etc. */)
select seq_no, guid_key, detail_no, user_id
from table1

insert into #temp (col1, col2 /* etc. */)
select seq_no, guid_key, trailer, date_entered, login_details, summary_transaction
from table1 inner join table2 on table1.guid_key = table2.guid_key
where trailer is not null

select * from #temp order by col1, ID
So you enter all the headers first, followed by the details, followed by the trailer. When you then order by seq_no (col1 here) and the identity column ID, the rows come out in the desired order.
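If it helps to see the idea end to end, here is a minimal runnable sketch of the same sort-key trick, using Python's built-in SQLite rather than T-SQL. Table and column names follow the question; the exact per-section columns and sample values are made up for illustration:

```python
import sqlite3

# A "section" number forces the layout: header = 0, details = 1, trailer = 2.
def build_file_rows(conn):
    return conn.execute("""
        SELECT section, line FROM (
            SELECT guid_key, 0 AS section, header AS line FROM table2
            UNION ALL
            SELECT t1.guid_key, 1,
                   t1.seq_no || '|' || t1.detail_no || '|' || t1.user_id
            FROM table1 t1
            UNION ALL
            SELECT guid_key, 2, trailer FROM table2
        )
        ORDER BY guid_key, section, line
    """).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (seq_no, detail_no, user_id, guid_key);
    CREATE TABLE table2 (header, guid_key, trailer);
    INSERT INTO table1 VALUES (1, 10, 'u1', 'G1'), (2, 11, 'u2', 'G1');
    INSERT INTO table2 VALUES ('HDR', 'G1', 'TRL');
""")
rows = build_file_rows(conn)
# One header line, then the detail lines, then the trailer line.
```

The principle is the same as the #temp approach above: a synthetic sort column, not the data itself, carries the presentation order.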
I have an Oracle table T with multiple records that have different start dates. I would like to delete all but the one with the greatest date among each combination of col1, col2, col3. In this example, I want to keep the row with the date 31-May-17 and delete the other two. What would be the best way to achieve this in a single query, without creating another staging table?
Test scripts -
create table t
(col1 number(10)
,col2 number(10)
,col3 number(10)
,col4 number(10)
,col5 date
);
insert into t values (15731,467,4087,14427,'09-Apr-17');
insert into t values (15731,467,4087,17828,'31-May-17');
insert into t values (15731,467,4087,15499,'16-Apr-17');
commit;
select * from t;
Based on the data above, I would like to keep only the record where the date is 31-May-17, since that is the greatest of the dates sharing the same combination of col1, col2, col3, and delete the remaining two from the table. Note that there are millions of other records like this on the table.
Apologies if this is too naive a question for Oracle experts, but I am very new to Oracle at this place.
You can number the rows within each (col1, col2, col3) group, ordered by the date descending, and then use the rowid pseudocolumn to correlate between the query and the delete statement:
DELETE FROM t
WHERE rowid NOT IN (SELECT r
                    FROM (SELECT rowid AS r,
                                 ROW_NUMBER() OVER
                                     (PARTITION BY col1, col2, col3
                                      ORDER BY col5 DESC) AS rn
                          FROM t)
                    WHERE rn = 1)
Since this is tagged oracle12c, you might as well take advantage of its features. For example, using MATCH_RECOGNIZE:
delete from t
where rowid not in (
    select rowid
    from t
    match_recognize (
        partition by col1, col2, col3
        order by col5 desc
        all rows per match
        pattern ( ^ a x* )
        define x as x.col5 = a.col5
    )
);
This assumes you want to keep all the rows "tied" for latest start-date for a given combination of COL1, COL2, COL3. The solution can be adapted for variations of the requirement.
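MATCH_RECOGNIZE is Oracle-specific, but the underlying idea — keep the latest row per (col1, col2, col3) group, delete the rest — can be sketched portably. Here is a small demonstration using Python's built-in SQLite, with dates stored as ISO strings so they sort correctly; this is an illustration of the technique, not the Oracle syntax:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (col1, col2, col3, col4, col5);
    INSERT INTO t VALUES
        (15731, 467, 4087, 14427, '2017-04-09'),
        (15731, 467, 4087, 17828, '2017-05-31'),
        (15731, 467, 4087, 15499, '2017-04-16');
""")
# Number the rows per group, newest first, and keep only rn = 1.
conn.execute("""
    DELETE FROM t WHERE rowid NOT IN (
        SELECT r FROM (
            SELECT rowid AS r,
                   ROW_NUMBER() OVER (PARTITION BY col1, col2, col3
                                      ORDER BY col5 DESC) AS rn
            FROM t)
        WHERE rn = 1)
""")
kept = conn.execute("SELECT col4, col5 FROM t").fetchall()
# Only the 31-May row survives.
```

Note that ROW_NUMBER keeps exactly one row per group even when dates tie, whereas the MATCH_RECOGNIZE version above keeps all tied rows.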
I have a dataset in Excel with a few thousand ids. In a database I need a few columns to match these ids, but some ids are listed twice in the Excel list (and they need to be there twice). I'm trying to write a query with an IN clause, but it automatically filters out the duplicates. I want the duplicates as well; otherwise I have to manually rearrange the data merge between the Excel and SQL results.
Is there any way to do something like
SELECT *
FROM table
WHERE id IN (
.. list of thousands ids
)
To also get the duplicates, without resorting to UNION ALL (which would mean firing thousands of separate queries at the database)?
You need to use a left join if you want to keep the duplicates. If the ordering is important, then you should include that information as well.
Here is one method:
select t.*
from (values (1, id1), (2, id2), . . .
) ids(ordering, id) left join
table t
on t.id = ids.id
order by ids.ordering;
An alternative is to load the ids into a temporary table with an identity column to capture the ordering:
-- Create the table
create table #ids (
    ordering int identity(1, 1) primary key,
    id int
);

-- Insert the ids
insert into #ids (id)
    values (id1), (id2), . . .;

-- Use them in the query
select t.*
from #ids ids left join
     table t
     on t.id = ids.id
order by ids.ordering;
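Here is a runnable sketch of the ordered-list join, using Python's built-in SQLite in place of SQL Server. The id list is written as a VALUES CTE with an explicit ordering column, so duplicates in the list come back as duplicate rows, in list order; the table and ids are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl (id INTEGER, name TEXT);
    INSERT INTO tbl VALUES (5, 'five'), (7, 'seven'), (9, 'nine');
""")
# id 5 appears twice in the list, so it appears twice in the result.
rows = conn.execute("""
    WITH ids(ordering, id) AS (VALUES (1, 5), (2, 7), (3, 5))
    SELECT t.id, t.name
    FROM ids LEFT JOIN tbl t ON t.id = ids.id
    ORDER BY ids.ordering
""").fetchall()
```

The LEFT JOIN also means ids missing from the table show up as NULL rows rather than silently disappearing, which is handy for spotting mismatches against the Excel list.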
If I understand this correctly, this is exactly the way IN is supposed to work...
DECLARE #tbl TABLE(value INT, content VARCHAR(100));
WITH RunningNumber AS
(
    SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS Nmbr
    FROM sys.objects
)
INSERT INTO #tbl
SELECT Nmbr, 'Content for ' + CAST(Nmbr AS VARCHAR(100))
FROM RunningNumber;
--This ...
SELECT * FROM #tbl WHERE value IN(1,3,5);
-- ... is the same as this:
SELECT * FROM #tbl WHERE value IN(1,1,1,1,3,3,5,1,3,5);
If you want to combine two result-sets you have to join them...
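A quick, self-contained check of that point — IN treats its list as a set — using Python's built-in SQLite instead of T-SQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (value INTEGER)")
conn.executemany("INSERT INTO tbl VALUES (?)", [(1,), (2,), (3,), (4,), (5,)])

a = conn.execute(
    "SELECT value FROM tbl WHERE value IN (1, 3, 5) ORDER BY value").fetchall()
b = conn.execute(
    "SELECT value FROM tbl WHERE value IN (1, 1, 1, 1, 3, 3, 5, 1, 3, 5) "
    "ORDER BY value").fetchall()
assert a == b  # the duplicates in the second list change nothing
```

IN is a membership test per table row; it never multiplies rows. To get one output row per list entry, you have to join against the list, as the answer above does.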
In my opinion, it is better to import the list of thousands of ids into a table and use a subquery (or a join) to get the information you need.
Once you have all the ids in the target table, you can also delete duplicate values with T-SQL and avoid any future problems.
So for each distinct value in a column of one table I want to insert that unique value into a row of another table.
list = select distinct(id) from table0
for distinct_id in list
insert into table1 (id) values (distinct_id)
end
Any ideas as to how to go about this?
Whenever you think about doing something in a loop, step back, and think again. SQL is optimized to work with sets. You can do this using a set-based query without the need to loop:
INSERT dbo.table1(id) SELECT DISTINCT id FROM dbo.table0;
There are some edge cases where looping can make more sense, but as SQL Server matures and more functionality is added, those edge cases get narrower and narrower...
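As a concrete illustration, here is the same set-based statement run against SQLite from Python; table names follow the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table0 (id INTEGER);
    CREATE TABLE table1 (id INTEGER);
    INSERT INTO table0 VALUES (1), (1), (2), (3), (3), (3);
""")
# One set-based statement instead of a row-by-row loop.
conn.execute("INSERT INTO table1 (id) SELECT DISTINCT id FROM table0")
ids = [r[0] for r in conn.execute("SELECT id FROM table1 ORDER BY id")]
```

Six source rows collapse to three distinct ids in a single statement — no cursor, no loop.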
insert into table1 (id)
select distinct id from table0
The following statement works for me (note that DISTINCT ON is PostgreSQL-specific):
insert into table1 (col1, col2) select distinct on (col1) col1, col2 from table0
The query below also checks the existing data in Table2:
INSERT INTO Table2(Id) SELECT DISTINCT Id FROM Table1 WHERE Id NOT IN(SELECT Id FROM Table2);
Another simple way to copy distinct data with multiple columns from one table to another:
Insert into TBL2
Select COL1, COL2 from (Select COL1, COL2, ROW_NUMBER() over (PARTITION BY COL1, COL2 Order By COL1) AS RN From TBL1) T
where T.RN = 1
Strange question, I know. I don't want to delete all the rows and start again, but we have a development database table where some of the rows have duplicate IDs, but different values.
I want to delete all records with duplicate IDs, so I can force data integrity on the table for the new version and build relationships. At the moment it's an ID that is inserted and generated by code (legacy).
From another question I got this:
delete t1
from tTable t1, tTable t2
where t1.locationName = t2.locationName
  and t1.id > t2.id
But this won't work as the IDs are the same!
How can I delete all but one record where IDs are the same? That is, delete where the count of records with the same ID > 1? If that's not possible, then deleting all records with duplicate IDs would be fine.
In SQL Server 2005 and above:
WITH q AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY locationName ORDER BY id) rn
FROM tTable
)
DELETE
FROM q
WHERE rn > 1
Depends on your DB server, but you can combine DELETE with LIMIT (MySQL) or TOP (SQL Server).
You could also move the first (not duplicate) of each record to a temp table, delete the original table and copy the temp table back to the original one.
Not sure about MySQL, but for a SQL Server database you could use the following (this assumes [tablename] has an identity column named IDcol):
SELECT DISTINCT col1, col2, col3 INTO temp_[tablename] FROM [tablename]
ALTER TABLE temp_[tablename] ADD IDcol INT IDENTITY
TRUNCATE TABLE [tablename]
SET IDENTITY_INSERT [tablename] ON
INSERT INTO [tablename] (IDcol, col1, col2, col3) SELECT IDcol, col1, col2, col3 FROM temp_[tablename]
SET IDENTITY_INSERT [tablename] OFF
DROP TABLE temp_[tablename]
Hope this helps.
I have a table with, say, 3 columns. There's no primary key, so there can be duplicate rows. I need to keep just one and delete the others. Any idea how to do this in SQL Server?
I'd SELECT DISTINCT the rows and throw them into a temporary table, then drop the source table and copy back the data from the temp.
EDIT: now with code snippet!
INSERT INTO TABLE_2
SELECT DISTINCT * FROM TABLE_1
GO
DELETE FROM TABLE_1
GO
INSERT INTO TABLE_1
SELECT * FROM TABLE_2
GO
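The same round trip can be sketched with Python's built-in SQLite; an ordinary second table stands in for the temp table here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (a, b, c);
    INSERT INTO table_1 VALUES (1, 1, 1), (1, 1, 1), (2, 2, 2), (2, 2, 2);

    -- Copy the distinct rows aside, empty the original, copy them back.
    CREATE TABLE table_2 AS SELECT DISTINCT * FROM table_1;
    DELETE FROM table_1;
    INSERT INTO table_1 SELECT * FROM table_2;
    DROP TABLE table_2;
""")
rows = conn.execute("SELECT * FROM table_1 ORDER BY a").fetchall()
```

Four rows come back as two. Bear in mind this rewrites the whole table, so on a large table the in-place ROW_NUMBER delete shown elsewhere on this page is usually cheaper.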
Add an identity column to act as a surrogate primary key, and use this to identify two of the three rows to be deleted.
I would consider leaving the identity column in place afterwards, or if this is some kind of link table, create a compound primary key on the other columns.
The following example also works when your PK is just a subset of all table columns.
(Note: I prefer the approach of adding another surrogate id column, but maybe this solution comes in handy as well.)
First find the duplicate rows:
SELECT col1, col2, count(*)
FROM t1
GROUP BY col1, col2
HAVING count(*) > 1
If there are only few, you can delete them manually:
set rowcount 1
delete from t1
where col1=1 and col2=1
The value of "rowcount" should be n-1, where n is the number of duplicates. In this example there are 2 duplicates, therefore rowcount is 1. If you have several duplicated primary key values, you have to repeat this for each of them.
If you have many duplicates, then copy every key once into another table:
SELECT col1, col2, col3=count(*)
INTO holdkey
FROM t1
GROUP BY col1, col2
HAVING count(*) > 1
Then copy one instance of each duplicated row into another table:
SELECT DISTINCT t1.*
INTO holddups
FROM t1, holdkey
WHERE t1.col1 = holdkey.col1
AND t1.col2 = holdkey.col2
holddups now contains unique rows. Verify that each key combination occurs exactly once (every count should be 1):
SELECT col1, col2, count(*)
FROM holddups
GROUP BY col1, col2
Delete the duplicates from the original table:
DELETE t1
FROM t1, holdkey
WHERE t1.col1 = holdkey.col1
AND t1.col2 = holdkey.col2
Insert the original rows:
INSERT t1 SELECT * FROM holddups
By the way, and for completeness: in Oracle there is a pseudocolumn you could use (rowid):
DELETE FROM our_table
WHERE rowid not in
      (SELECT MIN(rowid)
       FROM our_table
       GROUP BY column1, column2, column3...);
see: Microsoft Knowledge Site
Here's the method I used when I asked this question -
DELETE MyTable
FROM MyTable
LEFT OUTER JOIN (
SELECT MIN(RowId) as RowId, Col1, Col2, Col3
FROM MyTable
GROUP BY Col1, Col2, Col3
) as KeepRows ON
MyTable.RowId = KeepRows.RowId
WHERE
KeepRows.RowId IS NULL
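For the curious, the same keep-MIN(RowId) idea can be demonstrated with SQLite's built-in rowid standing in for the RowId column (using a NOT IN formulation rather than the LEFT JOIN; the table and values are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (col1, col2, col3);
    INSERT INTO my_table VALUES
        (1, 2, 3), (1, 2, 3), (4, 5, 6), (1, 2, 3), (7, 8, 9);
""")
# Keep the lowest rowid per distinct (col1, col2, col3); delete the rest.
conn.execute("""
    DELETE FROM my_table WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM my_table GROUP BY col1, col2, col3)
""")
rows = conn.execute("SELECT * FROM my_table ORDER BY col1").fetchall()
```

Three copies of (1, 2, 3) collapse to one; the unique rows are untouched.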
This is a way to do it with Common Table Expressions, CTE. It involves no loops, no new columns or anything and won't cause any unwanted triggers to fire (due to deletes+inserts).
Inspired by this article.
CREATE TABLE #temp (i INT)
INSERT INTO #temp VALUES (1)
INSERT INTO #temp VALUES (1)
INSERT INTO #temp VALUES (2)
INSERT INTO #temp VALUES (3)
INSERT INTO #temp VALUES (3)
INSERT INTO #temp VALUES (4)
SELECT * FROM #temp
;
WITH [#temp+rowid] AS
     (SELECT ROW_NUMBER() OVER (PARTITION BY i ORDER BY i) AS rowid, * FROM #temp)
DELETE FROM [#temp+rowid] WHERE rowid > 1
SELECT * FROM #temp
DROP TABLE #temp
This is a tough situation to be in. Without knowing your particular situation (table size, etc.), I think your best shot is to add an identity column, populate it, and then delete according to it. You may remove the column later, but I would suggest keeping it, as it is genuinely a good thing to have in the table.
After you clean up the current mess, you could add a primary key that includes all the fields in the table. That will keep you from getting into this mess again.
Of course this solution could very well break existing code. That will have to be handled as well.
Can you add a primary key identity field to the table?
Manrico Corazzi - I specialize in Oracle, not MS SQL, so you'll have to tell me if this is possible as a performance boost:
Leave the same as your first step - insert distinct values into TABLE2 from TABLE1.
Drop TABLE1. (Drop should be faster than delete I assume, much as truncate is faster than delete).
Rename TABLE2 as TABLE1 (saves you time, as you're renaming an object rather than copying data from one table to another).
Here's another way, with test data
create table #table1 (colWithDupes1 int, colWithDupes2 int)
insert into #table1
(colWithDupes1, colWithDupes2)
Select 1, 2 union all
Select 1, 2 union all
Select 2, 2 union all
Select 3, 4 union all
Select 3, 4 union all
Select 3, 4 union all
Select 4, 2 union all
Select 4, 2
select * from #table1
set rowcount 1
select 1
while @@rowcount > 0
delete #table1 where 1 < (select count(*) from #table1 a2
where #table1.colWithDupes1 = a2.colWithDupes1
and #table1.colWithDupes2 = a2.colWithDupes2
)
set rowcount 0
select * from #table1
What about this solution :
First you execute the following query :
select 'set rowcount ' + convert(varchar,COUNT(*)-1) + ' delete from MyTable where field=''' + field +'''' + ' set rowcount 0' from mytable group by field having COUNT(*)>1
And then you just have to execute the returned result set
set rowcount 3 delete from Mytable where field='foo' set rowcount 0
....
....
set rowcount 5 delete from Mytable where field='bar' set rowcount 0
I've handled the case where you've got only one column, but it's pretty easy to adapt the same approach to more than one column. Let me know if you want me to post the code.
How about:
select distinct * into #t from duplicates_tbl
truncate table duplicates_tbl
insert duplicates_tbl select * from #t
drop table #t
I'm not sure if this works with DELETE statements, but this is a way to find duplicate rows:
SELECT *
FROM myTable t1, myTable t2
WHERE t1.field = t2.field AND t1.id > t2.id
I'm not sure if you can just change the "SELECT" to a "DELETE" (someone wanna let me know?), but even if you can't, you could just make it into a subquery.