duplicate data in opposite column - sql

I have a table with two fields, say col1 and col2.
The data looks like this:
col1,col2
10,age
20,30
30,param
age,10
30,20
param,30
Each row has a duplicate with the columns in reverse order, for example:
10,age
age,10
In my final output I want only a single row from each duplicate pair, so the final
output will look like this:
col1,col2
10,age
20,30
30,param
Only three rows remain; the remaining rows are dropped as described above.
I have tried many different approaches but can't find a solution.
If anyone can help or just suggest an approach, it would be a great help.
Thanks

select distinct col1,col2 from t t1
where col1<=col2
or not exists (select 1 from t where t.col1=t1.col2
and
t.col2=t1.col1)
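A minimal sketch of the query above, run against an in-memory SQLite database through Python's sqlite3 module. The table name t comes from the answer; the choice of SQLite, the TEXT column types, and the Python variable names are assumptions made only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: both columns are stored as text, since the sample mixes numbers and words.
conn.execute("CREATE TABLE t (col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("10", "age"), ("20", "30"), ("30", "param"),
     ("age", "10"), ("30", "20"), ("param", "30")],
)

# Keep a row if it is the "smaller first" member of its pair,
# or if it has no mirrored partner at all.
rows = conn.execute(
    """
    SELECT DISTINCT col1, col2
    FROM t AS t1
    WHERE col1 <= col2
       OR NOT EXISTS (SELECT 1 FROM t
                      WHERE t.col1 = t1.col2 AND t.col2 = t1.col1)
    ORDER BY col1
    """
).fetchall()
print(rows)
```

With text columns the comparison col1 <= col2 is a string comparison, which is why '10' sorts before 'age' here.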

You can do the same using GREATEST and LEAST:
select distinct
least("col1","col2") AS "col1"
,greatest("col1","col2") as "col2"
from Table1
order by "col1"
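SQLite has no LEAST or GREATEST, but its scalar min() and max() functions with two arguments behave the same way for this purpose. The sketch below swaps those in as an assumption so the idea can be run with Python's sqlite3 module; the aliases lo and hi are hypothetical names chosen to avoid clashing with the real column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?)",
    [("10", "age"), ("20", "30"), ("30", "param"),
     ("age", "10"), ("30", "20"), ("param", "30")],
)

# Normalise every row to (smaller, larger); mirrored rows then become
# identical and DISTINCT collapses them.
pairs = conn.execute(
    """
    SELECT DISTINCT min(col1, col2) AS lo, max(col1, col2) AS hi
    FROM Table1
    ORDER BY lo
    """
).fetchall()
print(pairs)
```

One design difference from the NOT EXISTS version: a row with no mirrored partner is still returned, but with its columns swapped whenever col1 is greater than col2.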

Shouldn't this be enough?
SELECT *
FROM Data
WHERE Col1 < Col2
(if you are sure every row has a duplicate pair)

Related

How to use where clause to match the combination of multiple columns in one go without using AND or OR clauses

I want to match a combination of multiple columns against the same combination in another table, or in the same table itself.
For example, see table1 and table2.
So, I want the output like below.
So far, I have been using AND clause to achieve this like below:
select * from table1
where col1 in (select col4 from table2)
and col2 in (select col5 from table2)
and col3 in (select col6 from table2)
But it is not giving me the exact output. So, I am looking for such a query which can work for my scenario. Any help will be appreciated.
I found the answer to my own question and SQL query for the same is given below.
select * from table1
where (col1,col2,col3) in (select col4,col5,col6 from table2);
But still, if anybody has any better solution then it will be appreciated.
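To see why the separate IN clauses give the wrong output, here is a small sketch using Python's sqlite3 module. The sample rows are made up for illustration (the original tables were not preserved), and the row-value syntax assumes SQLite 3.15 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (col1 INTEGER, col2 INTEGER, col3 INTEGER);
    CREATE TABLE table2 (col4 INTEGER, col5 INTEGER, col6 INTEGER);
    -- Hypothetical data: (1, 5, 6) mixes values from two different table2 rows.
    INSERT INTO table1 VALUES (1, 2, 3), (1, 5, 6), (4, 5, 6);
    INSERT INTO table2 VALUES (1, 2, 3), (4, 5, 6);
    """
)

# Each column is checked independently, so the mixed row slips through.
per_column = conn.execute(
    """
    SELECT * FROM table1
    WHERE col1 IN (SELECT col4 FROM table2)
      AND col2 IN (SELECT col5 FROM table2)
      AND col3 IN (SELECT col6 FROM table2)
    ORDER BY col1, col2
    """
).fetchall()

# The three columns are compared as one tuple against whole table2 rows.
row_value = conn.execute(
    """
    SELECT * FROM table1
    WHERE (col1, col2, col3) IN (SELECT col4, col5, col6 FROM table2)
    ORDER BY col1, col2
    """
).fetchall()

print(per_column)
print(row_value)
```

The per-column version returns all three rows, while the row-value version returns only the two rows that exist as complete combinations in table2.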
What you need is the INTERSECT set operator, which returns only the rows that appear in the results of both (or all) of the SELECT statements, comparing column values position by position.
So, consider using it as below:
SELECT * FROM table1
INTERSECT
SELECT * FROM table2

SQL query for comparing two rows and getting the differences from same table

I want to compare two rows that have the same ID and get only the differences as the result.
e.g.
NOW
|---ID---||--Col_1--||--Col_2--||--Col_3--||--Col_4--|
|----1---||----2----||----4----||----5----||----6----|
|----1---||----3----||----4----||----4----||----6----|
|----2---||----2----||----3----||----3----||----2----|
RESULT
|---ID---||--Col_1--||--Col_2--||--Col_3--||--Col_4--|
|----1---||----3----||---NULL--||----4----||---NULL--|
P.S : I'm using SQL Server 2012
If I am interpreting your question correctly, you want to combine the rows for the same id and apply the following rules:
If the values are the same, then put the value in the row.
If the values are different, then put in NULL.
If there is only one row, then don't include the id.
This is an aggregation query with some filtering and comparison logic:
select id,
(case when min(col1) = max(col1) then min(col1) end) as col1,
. . .
from t
group by id
having count(*) > 1;
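A runnable sketch of that aggregation, using Python's sqlite3 module and the sample rows from the question. The column names col1 to col4 and the table name t follow the answer; spelling out all four CASE expressions is an assumption about what the elided part stands for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, col1 INTEGER, col2 INTEGER, "
             "col3 INTEGER, col4 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [(1, 2, 4, 5, 6), (1, 3, 4, 4, 6), (2, 2, 3, 3, 2)],
)

# min = max means every row in the group agrees on that column;
# CASE without ELSE yields NULL when they disagree.
combined = conn.execute(
    """
    SELECT id,
           CASE WHEN MIN(col1) = MAX(col1) THEN MIN(col1) END AS col1,
           CASE WHEN MIN(col2) = MAX(col2) THEN MIN(col2) END AS col2,
           CASE WHEN MIN(col3) = MAX(col3) THEN MIN(col3) END AS col3,
           CASE WHEN MIN(col4) = MAX(col4) THEN MIN(col4) END AS col4
    FROM t
    GROUP BY id
    HAVING COUNT(*) > 1
    """
).fetchall()
print(combined)
```

This follows the answer's reading (equal values kept, differing values set to NULL), which is not the same as the RESULT table in the question, where the differing values are the ones shown.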

SQL select distinct by 2 or more columns

I have a table with a lot of columns, and I need to write a select that returns only unique rows. The main problem is that I need to check three columns at the same time: rows count as duplicates only if all three columns match (each column compared with the same column in the other row, not the columns with each other). The idea is something like distinct(column1 and column2 and column3).
Any ideas? Or do you need more information? I'm not sure everybody gets what I have in mind.
This is an example. The select should return two rows from this: one where the last column is Yes and another where it is No.
This is exactly what the distinct keyword is for:
SELECT distinct col1, col2, col3
FROM mytable

Fast way to eyeball possible duplicate rows in a table?

Similar: How can I delete duplicate rows in a table
I have a feeling this is impossible and I'm going to have to do it the tedious way, but I'll see what you guys have to say.
I have a pretty big table, about 4 million rows, and 50-odd columns. It has a column that is supposed to be unique, Episode. Unfortunately, Episode is not unique - the logic behind this was that occasionally other fields in the row change, despite Episode being repeated. However, there is an actually unique column, Sequence.
I want to try and identify rows that have the same episode number, but something different between them (aside from sequence), so I can pick out how often this occurs, and whether it's worth allowing for or I should just nuke the rows and ignore possible mild discrepancies.
My hope is to create a table that shows the Episode number, and a column for each table column, identifying the value on both sides, where they are different:
SELECT a.Episode,
CASE WHEN a.Value1<>b.Value1
THEN a.Value1 + ',' + b.Value1
ELSE '' END AS Value1,
CASE WHEN a.Value2<>b.Value2
THEN a.Value2 + ',' + b.Value2
ELSE '' END AS Value2
FROM Table1 a INNER JOIN Table1 b ON a.Episode = b.Episode
WHERE a.Value1<>b.Value1
OR a.Value2<>b.Value2
(That is probably full of holes, but the idea of highlighting changed values comes through, I hope.)
Unfortunately, making a query like that for fifty columns is pretty painful. Obviously, it doesn't exactly have to be rock-solid if it will only be used the once, but at the same time, the more copy-pasta the code, the more likely something will be missed. As far as I know, I can't just do a search for DISTINCT, since Sequence is distinct and the same row will pop up as different.
Does anyone have a query or function that might help? Either something that will output a query result similar to the above, or a different solution? As I said, right now I'm not really looking to remove the duplicates, just identify them.
Use:
SELECT DISTINCT t.*
FROM TABLE t
ORDER BY t.episode --, and whatever other columns
DISTINCT is just shorthand for writing a GROUP BY with all the columns involved. Grouping by all the columns will show you all the unique groups of records associated with the episode column in this case. So there's a risk of not having an accurate count of duplicates, but you will have the values so you can decide what to remove when you get to that point.
50 columns is a lot, but setting the ORDER BY will allow you to eyeball the list. Another alternative would be to export the data to Excel if you don't want to construct the ORDER BY, and use Excel's sorting.
UPDATE
I didn't catch that the sequence column would be a unique value, but in that case you'd have to provide a list of all the columns you want to see. IE:
SELECT DISTINCT t.episode, t.column1, t.column2 --etc.
FROM TABLE t
ORDER BY t.episode --, and whatever other columns
There's no notation that will let you use t.* but not this one column. Once the sequence column is omitted from the output, the duplicates will become apparent.
Instead of typing out all 50 columns, you could do this:
select column_name from information_schema.columns where table_name = 'your table name'
then paste them into a query that groups by all of the columns EXCEPT sequence, and filters by count > 1:
select
count(episode)
, col1
, col2
, col3
, ...
from YourTable
group by
col1
, col2
, col3
, ...
having count(episode) > 1
This should give you every combination of column values that occurs more than once (but neither the sequence nor the episode numbers themselves). Here's the rub: to get those, you will need to join this result set back to YourTable on ALL the columns except sequence and episode, since you don't have those columns here.
Here's where I like to use SQL to generate more SQL. This should get you started:
select 't1.' + column_name + ' = t2.' + column_name
from information_schema.columns where table_name = 'YourTable'
You'll plug in those join parameters to this query:
select * from YourTable t1
inner join (
select
count(episode) 'epcount'
, col1
, col2
, col3
, ...
from YourTable
group by
col1
, col2
, col3
, ...
having count(episode) > 1
) t2 on
...plug in all those join parameters here...
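The "generate the column list instead of typing it" idea can be sketched in Python with sqlite3. This is an illustration under assumptions: SQLite has no information_schema, so PRAGMA table_info is swapped in to read the column names, the sample rows are invented, and unlike the query above the sketch keeps episode in the GROUP BY so the episode number shows up in the output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE YourTable (sequence INTEGER PRIMARY KEY, episode INTEGER,
                            col1 TEXT, col2 TEXT);
    -- Hypothetical rows: episode 100 is an exact repeat, 102 differs in col2.
    INSERT INTO YourTable (episode, col1, col2) VALUES
        (100, 'a', 'x'),
        (100, 'a', 'x'),
        (101, 'b', 'y'),
        (102, 'c', 'z'),
        (102, 'c', 'CHANGED');
    """
)

# PRAGMA table_info returns one row per column; index 1 is the column name.
# The unique sequence column is left out so repeats can collapse.
columns = [row[1] for row in conn.execute("PRAGMA table_info(YourTable)")
           if row[1] != "sequence"]
column_list = ", ".join(columns)

# The names come from the schema itself, so plain string building is
# acceptable for a one-off check like this.
query = (
    f"SELECT COUNT(*) AS copies, {column_list} "
    f"FROM YourTable GROUP BY {column_list} HAVING COUNT(*) > 1"
)
exact_duplicates = conn.execute(query).fetchall()
print(exact_duplicates)
```

Only episode 100 is reported, because its two rows match on every non-sequence column; episode 102 is repeated but differs in col2, so it needs the pairwise comparison instead.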
select count distinct ....
Should show you without having to guess. You can get your columns by viewing your table definition so you can copy/paste your non-sequence columns.
I think something like this is what you want:
select *
from t
where t.episode in (select episode from t group by episode having count(episode) > 1)
order by episode
This will give all rows that have episodes that are duplicated. Non-duplicate rows should stick out fairly obviously.
Of course, if you have access to some sort of scripting, you could just write a script to generate your query for you. It seems pretty straight-forward. (i.e. describe t and iterate over all the fields).
Also, your query should have some sort of ordering, like FROM Table1 a INNER JOIN Table1 b ON a.Episode = b.Episode AND a.Sequence < b.Sequence, otherwise you'll get duplicate non-duplicates.
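Here is a small sketch of the question's comparison query with that a.Sequence < b.Sequence condition added, run through Python's sqlite3 module. The sample rows are invented, and || replaces + for string concatenation because SQLite is assumed here rather than SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Table1 (Sequence INTEGER PRIMARY KEY, Episode INTEGER,
                         Value1 TEXT, Value2 TEXT);
    -- Hypothetical rows: episode 100 appears twice and differs in Value2.
    INSERT INTO Table1 (Episode, Value1, Value2) VALUES
        (100, 'a', 'x'),
        (100, 'a', 'y'),
        (101, 'b', 'z');
    """
)

# a.Sequence < b.Sequence keeps each pair once (a, b) instead of also
# returning the mirrored pair (b, a).
diffs = conn.execute(
    """
    SELECT a.Episode,
           CASE WHEN a.Value1 <> b.Value1
                THEN a.Value1 || ',' || b.Value1 ELSE '' END AS Value1,
           CASE WHEN a.Value2 <> b.Value2
                THEN a.Value2 || ',' || b.Value2 ELSE '' END AS Value2
    FROM Table1 a
    INNER JOIN Table1 b
            ON a.Episode = b.Episode AND a.Sequence < b.Sequence
    WHERE a.Value1 <> b.Value1 OR a.Value2 <> b.Value2
    """
).fetchall()
print(diffs)
```

Note that <> never matches when either side is NULL, so columns that are NULL in one row and filled in the other would need an extra check.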
A relatively simple solution that Ponies sparked:
SELECT t.*
FROM Table t
INNER JOIN ( SELECT episode
FROM Table
GROUP BY Episode
HAVING COUNT(*) > 1
) AS x ON t.episode = x.episode
And then, copy-paste into Excel, and use this as conditional highlighting for the entire result set:
=AND($C2=$C1,A2<>A1)
Column C is Episode. This way, you get a visual highlight when the data's different from the row above (as long as both rows have the same value for episode).
Generate and store a hash key for each row, designed so the hash values mirror your
definition of sameness. Depending on the complexity of your rows, updating the
hash might be a simple trigger on modifying the row.
Query for duplicates of the hash key, which are your "very probably" identical rows.
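A rough sketch of the hash-key idea in Python, using hashlib and sqlite3. The table name episodes, the row_hash column, and the helper function are all hypothetical; here the hash is filled in by a one-off loop rather than a trigger, and "sameness" is assumed to mean every column except Sequence.

```python
import hashlib
import sqlite3


def row_hash(values):
    """Hash the values that define sameness for a row."""
    # A separator that is unlikely to occur in the data keeps ('ab', 'c')
    # distinct from ('a', 'bc'); NULL gets its own marker.
    joined = "\x1f".join("\x00" if v is None else str(v) for v in values)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE episodes (Sequence INTEGER PRIMARY KEY, Episode INTEGER,
                           Value1 TEXT, Value2 TEXT, row_hash TEXT);
    INSERT INTO episodes (Episode, Value1, Value2) VALUES
        (100, 'a', 'x'),
        (100, 'a', 'x'),
        (100, 'a', 'y');
    """
)

# Store a hash of everything except Sequence on each row.
for seq, episode, v1, v2 in conn.execute(
        "SELECT Sequence, Episode, Value1, Value2 FROM episodes").fetchall():
    conn.execute("UPDATE episodes SET row_hash = ? WHERE Sequence = ?",
                 (row_hash((episode, v1, v2)), seq))

# Rows sharing a hash are very probably identical apart from Sequence.
dupes = conn.execute(
    """
    SELECT row_hash, COUNT(*) AS copies
    FROM episodes
    GROUP BY row_hash
    HAVING COUNT(*) > 1
    """
).fetchall()
print(len(dupes), dupes[0][1])
```

An index on the hash column would keep the duplicate lookup cheap on a table with millions of rows.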

Most efficient way to select 1st and last element, SQLite?

What is the most efficient way to select the first and last element only, from a column in SQLite?
The first and last element from a row?
SELECT column1, columnN
FROM mytable;
I think you must mean the first and last element from a column:
SELECT MIN(column1) AS First,
MAX(column1) AS Last
FROM mytable;
See http://www.sqlite.org/lang_aggfunc.html for MIN() and MAX().
I'm using First and Last as column aliases.
if it's just one column:
SELECT min(column) as first, max(column) as last FROM table
if you want to select whole row:
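-- SQLite does not allow ORDER BY/LIMIT directly on each side of a UNION,
-- so each side is wrapped in a subquery.
SELECT * FROM (SELECT 'first',* FROM table ORDER BY column ASC LIMIT 1)
UNION ALL
SELECT * FROM (SELECT 'last',* FROM table ORDER BY column DESC LIMIT 1)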
The most efficient way would be to know what those fields were called and simply select them.
SELECT `first_field`, `last_field` FROM `table`;
Probably like this:
SELECT dbo.Table.FirstCol, dbo.Table.LastCol FROM Table
You get minor efficiency enhancements from specifying the table name and schema.
First: MIN() and MAX() on a text column gives AAAA and TTTT results which are not the first and last entries in my test table. They are the minimum and maximum values as mentioned.
I tried this (with .stats on) on my table which has over 94 million records:
select * from
(select col1 from mitable limit 1)
union
select * from
(select col1 from mitable limit 1 offset
(select count(0) from mitable) -1);
But it uses up a lot of virtual machine steps (281,624,718).
Then this which is much more straightforward (which works if the table was created without WITHOUT ROWID) [sql keywords are in capitals]:
SELECT col1 FROM mitable
WHERE ROWID = (SELECT MIN(ROWID) FROM mitable)
OR ROWID = (SELECT MAX(ROWID) FROM mitable);
That ran with 55 virtual machine steps on the same table and produced the same answer.
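The difference between the two approaches can be reproduced on a tiny table with Python's sqlite3 module. The four sample values are made up; the table and column names follow the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mitable (col1 TEXT)")
# Inserted in this order, so 'MMMM' is the first entry and 'CCCC' the last.
conn.executemany("INSERT INTO mitable VALUES (?)",
                 [("MMMM",), ("AAAA",), ("TTTT",), ("CCCC",)])

# MIN/MAX return the smallest and largest values, not the first and last rows.
by_value = conn.execute("SELECT MIN(col1), MAX(col1) FROM mitable").fetchone()

# The lowest and highest ROWID identify the first and last inserted rows.
by_rowid = [row[0] for row in conn.execute(
    """
    SELECT col1 FROM mitable
    WHERE ROWID = (SELECT MIN(ROWID) FROM mitable)
       OR ROWID = (SELECT MAX(ROWID) FROM mitable)
    ORDER BY ROWID
    """
)]

print(by_value)
print(by_rowid)
```

ROWID order matches insertion order here because the rows were inserted one after another with automatically assigned rowids; a table with explicitly set rowids would not carry that guarantee.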
The min()/max() approach is wrong. It is only correct if the values are only ever ascending. I needed something like this for currency rates, which rise and fall randomly.
This is my solution:
select st.*
from stats_ticker st,
(
select min(rowid) as first, max(rowid) as last --here is magic part 1
from stats_ticker
-- next line is just a filter I need in my case.
-- if you want first/last of the whole table leave it out.
where timeutc between datetime('now', '-1 days') and datetime('now')
) firstlast
WHERE
st.rowid = firstlast.first --and these two rows do magic part 2
OR st.rowid = firstlast.last
ORDER BY st.rowid;
Magic part 1: the subselect returns a single row with the columns first and last, containing rowids.
Magic part 2: it is then easy to filter on those two rowids.
This is the best solution I've come up with so far. Hope you like it.
We can do that with the help of SQL aggregate functions such as MAX and MIN. These two aggregate functions help you get the last and first element from a table.
SELECT MAX(column_name), MIN(column_name) FROM table_name
MAX gives you the maximum value, which is treated here as the last value, and MIN gives you the minimum value, which is treated here as the first value, from the specified table.