Oracle SQL statement without duplicates

I have a requirement to write a SQL statement that returns 2 columns; however, neither column may contain duplicates. For example:
| COL1 | COL2 |
|------|------|
| 10   | A    |
| 11   | B    |
| 12   | C    |
| 13   | A    | <--- Don't return
Using DISTINCT doesn't work, since the marked row (13, A) is distinct as a whole. It also doesn't matter which of the duplicates is returned.
Does anyone know of a way to do this? It feels as though I'm missing something obvious.
Thanks.

You can number the rows within each col2 value with ROW_NUMBER() and keep only the rows where rn = 1.
CREATE TABLE T(
col1 int,
col2 varchar(5)
);
insert into t values (10,'A');
insert into t values (11,'B');
insert into t values (12,'C');
insert into t values (13,'A');
Query 1:
SELECT t1.col1,t1.col2
FROM (
SELECT t1.*,ROW_NUMBER() OVER(PARTITION BY col2 ORDER BY col1) rn
FROM T t1
)t1
WHERE t1.rn = 1
Results:
| COL1 | COL2 |
|------|------|
| 10 | A |
| 11 | B |
| 12 | C |
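To try the ROW_NUMBER() approach without an Oracle instance, here is a minimal sketch that assumes SQLite (3.25 or later, which added window functions) through Python's built-in sqlite3 module as a stand-in. The helper name `first_row_per_col2` is hypothetical, and the final ORDER BY is added only to make the output order predictable.

```python
import sqlite3


def first_row_per_col2(rows):
    """Keep one row per col2 value: the one with the smallest col1.

    SQLite stands in for Oracle here; the query is the one from the answer.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (col1 INTEGER, col2 TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    result = con.execute(
        """
        SELECT col1, col2
        FROM (
            SELECT t.*,
                   ROW_NUMBER() OVER (PARTITION BY col2 ORDER BY col1) AS rn
            FROM t
        )
        WHERE rn = 1
        ORDER BY col1
        """
    ).fetchall()
    con.close()
    return result


print(first_row_per_col2([(10, "A"), (11, "B"), (12, "C"), (13, "A")]))
# [(10, 'A'), (11, 'B'), (12, 'C')]
```

Changing the ORDER BY inside OVER (for example to `col1 DESC`) changes which duplicate survives, which matters only if you care which one is returned.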

If you just want the lowest value from the first column, do:
SELECT MIN(column1), column2
FROM YourTable
GROUP BY column2
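A quick way to check the GROUP BY version on the sample data, again assuming SQLite via Python's sqlite3 module in place of Oracle; `YourTable`, `column1` and `column2` are the placeholder names from the answer. Note that this form can only return the grouped column and aggregates; if you need other columns from the chosen row, the ROW_NUMBER() version is the better fit.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE YourTable (column1 INTEGER, column2 TEXT)")
con.executemany(
    "INSERT INTO YourTable VALUES (?, ?)",
    [(10, "A"), (11, "B"), (12, "C"), (13, "A")],
)

# One row per column2 value, paired with the lowest column1 seen for it.
rows = con.execute(
    "SELECT MIN(column1), column2 FROM YourTable GROUP BY column2 ORDER BY column2"
).fetchall()
print(rows)  # [(10, 'A'), (11, 'B'), (12, 'C')]
```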

This is not possible in one query, because each column has a different number of unique values.

Related

SQL Server select column names from multiple tables

I have three tables in SQL Server with the following structure:
col1 col2 a1 a2 ... an,
col1 col2 b1 b2 ... bn,
col1 col2 c1 c2 ... cn
The first two columns, col1 and col2, are the same in every table; however, the tables have different lengths.
I need to select the column names of the tables, and the result I'm trying to achieve is the following:
col1, col2, a1, b1, c1, a2, b2, c2 ...
Is there a way to do it?
It's possible, but the shared columns of the three tables are combined into single columns.
For example
SELECT A.col1 +'/' +B.col1 +'/' + C.col1 As Col1 ,
A.col2 +'/' +B.col2 +'/' + C.col2 As col2 ,a1, b1, c1, a2, b2, c2 ,
* FROM A
INNER JOIN B
ON A.ID =B.ID
INNER JOIN C
ON C.ID = B.ID
SQL Server is not the right tool for creating a generic result set: the engine needs to know the shape of the output in advance. You might find a solution with dynamic SQL, though...
I want to suggest two different approaches.
Both would work with any number of tables, as long as all of them have the columns col1 and col2 with appropriate types.
Let's create a simple mockup scenario first:
CREATE TABLE #mockup1(col1 INT,col2 INT,SomeMore1 VARCHAR(100),SomeMore2 VARCHAR(100));
INSERT INTO #mockup1 VALUES(1,1,'blah 1.1','blub 1.1')
,(1,2,'blah 1.2','blub 1.2')
,(1,100,'not in t2','not in t2');
CREATE TABLE #mockup2(col1 INT,col2 INT,OtherType1 INT,OtherType2 DATETIME);
INSERT INTO #mockup2 VALUES(1,1,101,GETDATE())
,(1,2,102,GETDATE()+1)
,(1,200,200,GETDATE()+200);
--You can add as many tables as you need
A very pragmatic approach:
Try this simple FULL OUTER JOIN:
SELECT *
FROM #mockup1 m1
FULL OUTER JOIN #mockup2 m2 ON m1.col1=m2.col1 AND m1.col2=m2.col2
--add more tables here
The result
+------+------+-----------+-----------+------+------+------------+-------------------------+
| col1 | col2 | SomeMore1 | SomeMore2 | col1 | col2 | OtherType1 | OtherType2 |
+------+------+-----------+-----------+------+------+------------+-------------------------+
| 1 | 1 | blah 1.1 | blub 1.1 | 1 | 1 | 101 | 2019-03-08 10:53:20.257 |
+------+------+-----------+-----------+------+------+------------+-------------------------+
| 1 | 2 | blah 1.2 | blub 1.2 | 1 | 2 | 102 | 2019-03-09 10:53:20.257 |
+------+------+-----------+-----------+------+------+------------+-------------------------+
| 1 | 100 | not in t2 | not in t2 | NULL | NULL | NULL | NULL |
+------+------+-----------+-----------+------+------+------------+-------------------------+
| NULL | NULL | NULL | NULL | 1 | 200 | 200 | 2019-09-24 10:53:20.257 |
+------+------+-----------+-----------+------+------+------------+-------------------------+
But you will have to deal with non-unique column names (this is where a dynamically created statement can help).
A generic approach using container type XML
Whenever you do not know the shape of the result in advance, you can pack it into a container. This keeps the structure clear on the RDBMS side and shifts the trouble of handling the set to the consumer.
The CTE reads all existing pairs of col1 and col2.
Each table's row(s) for a pair of values are inserted as XML.
Pairs missing from a given table show up as NULL in that table's column.
Try this out:
WITH AllDistinctCol1Col2Values AS
(
SELECT col1,col2 FROM #mockup1
UNION ALL
SELECT col1,col2 FROM #mockup2
--add all your tables here
)
SELECT col1,col2
,(SELECT * FROM #mockup1 x WHERE c1c2.col1=x.col1 AND c1c2.col2=x.col2 FOR XML PATH('row'),TYPE) AS Content1
,(SELECT * FROM #mockup2 x WHERE c1c2.col1=x.col1 AND c1c2.col2=x.col2 FOR XML PATH('row'),TYPE) AS Content2
FROM AllDistinctCol1Col2Values c1c2
GROUP BY col1,col2;
The result
+------+------+-----------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------+
| col1 | col2 | Content1 | Content2 |
+------+------+-----------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------+
| 1 | 1 | <row><col1>1</col1><col2>1</col2><SomeMore1>blah 1.1</SomeMore1><SomeMore2>blub 1.1</SomeMore2></row> | <row><col1>1</col1><col2>1</col2><OtherType1>101</OtherType1><OtherType2>2019-03-08T11:03:49.877</OtherType2></row> |
+------+------+-----------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------+
| 1 | 2 | <row><col1>1</col1><col2>2</col2><SomeMore1>blah 1.2</SomeMore1><SomeMore2>blub 1.2</SomeMore2></row> | <row><col1>1</col1><col2>2</col2><OtherType1>102</OtherType1><OtherType2>2019-03-09T11:03:49.877</OtherType2></row> |
+------+------+-----------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------+
| 1 | 100 | <row><col1>1</col1><col2>100</col2><SomeMore1>not in t2</SomeMore1><SomeMore2>not in t2</SomeMore2></row> | NULL |
+------+------+-----------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------+
| 1 | 200 | NULL | <row><col1>1</col1><col2>200</col2><OtherType1>200</OtherType1><OtherType2>2019-09-24T11:03:49.877</OtherType2></row> |
+------+------+-----------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------+
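The container idea does not depend on XML itself. As a rough illustration outside the database, here is the same grouping done in plain Python: every (col1, col2) pair found in any table gets one entry holding that table's matching rows, or None where the table has no row for the pair (the NULL cells above). The function name `pack_by_key` is hypothetical, and the sample rows copy the mockup data with the DATETIME column left out.

```python
from collections import defaultdict


def pack_by_key(tables):
    """Group rows from several tables by their (col1, col2) pair.

    `tables` maps a result-column name to a list of row dicts, each of
    which must contain 'col1' and 'col2'. For every pair, the result holds
    one entry per table: the list of matching rows, or None if the table
    has no row for that pair.
    """
    keys = set()
    by_table = {}
    for name, rows in tables.items():
        grouped = defaultdict(list)
        for row in rows:
            key = (row["col1"], row["col2"])
            grouped[key].append(row)
            keys.add(key)
        by_table[name] = grouped
    # .get() on a defaultdict does not create entries, so missing pairs give None.
    return {key: {name: by_table[name].get(key) for name in tables} for key in sorted(keys)}


mockup1 = [
    {"col1": 1, "col2": 1, "SomeMore1": "blah 1.1", "SomeMore2": "blub 1.1"},
    {"col1": 1, "col2": 2, "SomeMore1": "blah 1.2", "SomeMore2": "blub 1.2"},
    {"col1": 1, "col2": 100, "SomeMore1": "not in t2", "SomeMore2": "not in t2"},
]
mockup2 = [
    {"col1": 1, "col2": 1, "OtherType1": 101},
    {"col1": 1, "col2": 2, "OtherType1": 102},
    {"col1": 1, "col2": 200, "OtherType1": 200},
]

packed = pack_by_key({"Content1": mockup1, "Content2": mockup2})
print(packed[(1, 100)]["Content2"])  # None
print(packed[(1, 200)]["Content2"])  # [{'col1': 1, 'col2': 200, 'OtherType1': 200}]
```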

SQL Update all entries from a table but insert different values on the column

I have a table, and I want to insert a different value into the same column for every entry.
The column has no value inserted yet and I like to insert a different string in it for each entry.
Is it possible? If it is, can you help me with some code?
I know how to update all entries with the same value on a column:
UPDATE table_name SET column_name = 'your_string';
You can populate a column with the different values, and use it to populate the target column.
Consider the following example using a MySQL database.
Table creation:
CREATE TABLE a (
column1 varchar(20),
column2 varchar(20),
column3 varchar(50)
);
Table population:
insert into a(column1, column2) values('a','100');
insert into a(column1, column2) values('b','200');
insert into a(column1, column2) values('c','300');
Check table:
select * from a;
+---------+---------+---------+
| column1 | column2 | column3 |
+---------+---------+---------+
| a | 100 | NULL |
| b | 200 | NULL |
| c | 300 | NULL |
+---------+---------+---------+
Populate column3 using column2:
update a set column3=concat('value-',column2);
Check table again:
select * from a;
+---------+---------+-----------+
| column1 | column2 | column3 |
+---------+---------+-----------+
| a | 100 | value-100 |
| b | 200 | value-200 |
| c | 300 | value-300 |
+---------+---------+-----------+
(Optional) Drop column2 if not needed:
alter table a drop column column2;
select * from a;
+---------+-----------+
| column1 | column3 |
+---------+-----------+
| a | value-100 |
| b | value-200 |
| c | value-300 |
+---------+-----------+
SQL Server supports updatable CTEs and subqueries. This turns out to be pretty easy:
with toupdate as (
select t.*, row_number() over (order by (select null)) as seqnum
from t
)
update toupdate
set col = cast(seqnum as varchar(255));
The inserted value is a number converted to a string.
A simpler alternative is to use newid():
update t
set col = newid();
This works because a uniqueidentifier value converts implicitly to a string column, as long as the column can hold at least 36 characters.
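SQLite has no updatable CTEs, so this sketch of the ROW_NUMBER() idea from the SQL Server answer assigns each row's number through a correlated subquery keyed on SQLite's `rowid` instead. It assumes SQLite 3.25+ via Python's sqlite3 module; the `name` column and its values are hypothetical. Rows are numbered in rowid order, whereas SQL Server's `ORDER BY (SELECT NULL)` promises no particular order.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (name TEXT, col TEXT)")  # col starts out empty (NULL)
con.executemany("INSERT INTO t (name) VALUES (?)", [("a",), ("b",), ("c",)])

# Number the rows, then write each row's number, as text, into col.
con.execute(
    """
    UPDATE t
    SET col = (
        SELECT CAST(s.seqnum AS TEXT)
        FROM (SELECT rowid AS rid,
                     ROW_NUMBER() OVER (ORDER BY rowid) AS seqnum
              FROM t) AS s
        WHERE s.rid = t.rowid
    )
    """
)
print(con.execute("SELECT name, col FROM t ORDER BY rowid").fetchall())
# [('a', '1'), ('b', '2'), ('c', '3')]
```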

SQL: Use distinct on groups of similar data

Hello mates, I have the following problem in a Vertica database. I have a large table:
+------+------+------+
| Date | Col1 | Col2 |
+------+------+------+
| 1 | A | B |
| 2 | A | B |
| 3 | D | E |
| 2 | C | D |
| 1 | C | D |
+------+------+------+
As you can see I have redundant data, just taken on different dates (row 1 & 2 and row 4 & 5). So I would like a table that removes that redundant data by deleting the rows with the lower date, giving me a result like that:
+------+------+------+
| Date | Col1 | Col2 |
+------+------+------+
| 2 | A | B |
| 2 | C | D |
| 3 | D | E |
+------+------+------+
Using DISTINCT would not work, since it would drop rows arbitrarily without considering the date, so I might end up with a table like this:
SELECT DISTINCT Col2, Col3 from Table
+------+------+------+
| Date | Col1 | Col2 |
+------+------+------+
| 2 | A | B |
| 1 | C | D |
| 3 | D | E |
+------+------+------+
which is not desired.
Is there anyway to accomplish that?
Thanks mates
Do a GROUP BY on your 2 columns and aggregate on the highest date:
SELECT MAX(Date), col1, col2
FROM table
GROUP BY Col1, Col2
I'm just generalizing the patterns here and adding one. For the exact question asked, any of these methods would probably work; the devil is in the details.
The aggregate method proposed by @Thomas_G works because you only have one column outside the grouping. If you had two, it could mix and match (some data from one row, some from another), which is probably not what you want as a duplicate-handling strategy.
The analytic method proposed by @Gordon_Linoff is good, but be aware that if the date is duplicated in the source data, you'll get multiple rows for a group whenever several rows share the max date. This might be what you want, but maybe not.
Another method is to just peel off the top row in the window. It chooses the first row in the partition based on your window ordering. If several rows share the max date, you can't guarantee which one will be chosen unless you add something more to the window order. But at least you know you'll get only one row, for what it's worth.
select t.*
from (select t.*, row_number() over (partition by col1, col2 order by date desc) as rn
from t
) t
where rn = 1;
If there are other columns that you care about, you can use window functions:
select t.*
from (select t.*, max(date) over (partition by col1, col2) as maxd
from t
) t
where date = maxd;
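The tie behaviour described above is easy to see on a small table. This sketch assumes SQLite 3.25+ via Python's sqlite3 module in place of Vertica, and adds a hypothetical `note` column so that two rows can share the max date for (A, B) while still being distinguishable.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# "note" is a hypothetical extra column, added so two rows can tie on the date.
con.execute("CREATE TABLE t (date INTEGER, col1 TEXT, col2 TEXT, note TEXT)")
con.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, "A", "B", "old"),
        (2, "A", "B", "x"),
        (2, "A", "B", "y"),  # ties with the row above on the max date
        (3, "D", "E", "only"),
        (2, "C", "D", "new"),
        (1, "C", "D", "old"),
    ],
)

# ROW_NUMBER(): exactly one row per (col1, col2), even when dates tie.
row_number_rows = con.execute(
    """
    SELECT date, col1, col2 FROM (
        SELECT t.*, ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY date DESC) AS rn
        FROM t
    ) WHERE rn = 1
    ORDER BY col1, col2
    """
).fetchall()

# MAX() OVER: every row sitting on the max date, so the tie yields two rows.
max_rows = con.execute(
    """
    SELECT date, col1, col2, note FROM (
        SELECT t.*, MAX(date) OVER (PARTITION BY col1, col2) AS maxd
        FROM t
    ) WHERE date = maxd
    ORDER BY col1, col2, note
    """
).fetchall()

print(row_number_rows)  # [(2, 'A', 'B'), (2, 'C', 'D'), (3, 'D', 'E')]
print(max_rows)
# [(2, 'A', 'B', 'x'), (2, 'A', 'B', 'y'), (2, 'C', 'D', 'new'), (3, 'D', 'E', 'only')]
```

Which of the tied (A, B) rows ROW_NUMBER() keeps is not defined here; adding `note` to the window's ORDER BY would make it deterministic.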

Remove duplicates from query, while repeating

I have an SQL table with some data like this, sorted by date:
+----------+------+
| Date | Col2 |
+----------+------+
| 12:00:01 | a |
| 12:00:02 | a |
| 12:00:03 | b |
| 12:00:04 | b |
| 12:00:05 | c |
| 12:00:06 | c |
| 12:00:07 | a |
| 12:00:08 | a |
+----------+------+
So, I want my select result to be the following:
+----------+------+
| Date | Col2 |
+----------+------+
| 12:00:01 | a |
| 12:00:03 | b |
| 12:00:05 | c |
| 12:00:07 | a |
+----------+------+
I have used the DISTINCT clause, but it removes the last two rows with Col2 = 'a'.
You can use LAG (SQL Server 2012+) to get the value in the previous row and compare it with the current row's value. If they are equal, flag the row with 1; otherwise, flag it with 0. Finally, keep only the rows flagged 0.
select dt,col2
from (
select dt,col2,
case when lag(col2,1,0) over(order by dt) = col2 then 1 else 0 end as somecol
from t) x
where somecol=0
If you are using Microsoft SQL Server 2012 or later, you can do this:
select date, col2
from (
select date, col2,
case when isnull(lag(col2) over (order by date, col2), '') = col2 then 1 else 0 end as ignore
from yourtable
) x
where ignore = 0
This should work as long as col2 cannot contain nulls and if the empty string ('') is not a valid value for col2. The query will need some work if either assumption is not valid.
Same as the accepted answer (+1), just with the conditions moved.
Assumes col2 is not null:
select dt, col2
from ( select dt, col2,
lag(col2, 1) over(order by dt) as lagCol2
from t
) x
where x.lagCol2 is null or x.lagCol2 <> x.col2
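Here is the last query run against the question's data, assuming SQLite 3.25+ via Python's sqlite3 module as a stand-in for SQL Server; the column is named `dt` as in the answers. A row is kept only when it starts a new run of equal col2 values.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (dt TEXT, col2 TEXT NOT NULL)")
con.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        ("12:00:01", "a"), ("12:00:02", "a"),
        ("12:00:03", "b"), ("12:00:04", "b"),
        ("12:00:05", "c"), ("12:00:06", "c"),
        ("12:00:07", "a"), ("12:00:08", "a"),
    ],
)

# Keep a row if there is no previous row (lagCol2 IS NULL)
# or the previous row's col2 differs from this one.
rows = con.execute(
    """
    SELECT dt, col2
    FROM (SELECT dt, col2, LAG(col2, 1) OVER (ORDER BY dt) AS lagCol2 FROM t) AS x
    WHERE x.lagCol2 IS NULL OR x.lagCol2 <> x.col2
    ORDER BY dt
    """
).fetchall()
print(rows)
# [('12:00:01', 'a'), ('12:00:03', 'b'), ('12:00:05', 'c'), ('12:00:07', 'a')]
```

Unlike DISTINCT, this keeps the second run of 'a' because it is compared only with its immediate predecessor.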

Copying a SQL Server table and adding and rearranging columns

I know that if I want to make a copy of a SQL Server table, I can write a query akin to this:
SELECT *
INTO NewTable
FROM OldTable
But what if I wanted to take the contents of OldTable that may look like this:
| Column1 | Column2 | Column3 |
|---------|---------|---------|
| 1 | 2 | 3 |
| 4 | 5 | 6 |
| 7 | 8 | 9 |
and make a copy of that table but have the new table look like this:
| Column1 | Column3 | Column2 | Column4 | Column5 |
|---------|---------|---------|---------|---------|
| 1 | 3 | 2 | 10 | 11 |
| 4 | 6 | 5 | 12 | 13 |
| 7 | 9 | 8 | 14 | 15 |
So now I've swapped Columns 2 and 3 and added Column 4 and Column 5. I don't need to have a query that will add that data to the columns, just the bare columns.
It's a matter of modifying your select statement. SELECT * takes only the columns from the source table, in their order. You want something different - so SELECT it.
SELECT * INTO NewTable
FROM OldTable
->
SELECT Column1, Column3, Column2, ' ' AS Column4, ' ' AS Column5
INTO NewTable
FROM OldTable
This gives you very little control over how the new table's columns are specced, and no indexes are created, so it's probably a bad idea; it's better to do this another way (a proper CREATE TABLE). But if you need quick and dirty, I suppose...
You can just name the columns:
Select
[Column1], [Column3], [Column2], Cast(null as bigint) as [Column4], 0 as [Column5]
Into CopyTable
From YourTable
As with any query, it is preferable to name the columns explicitly and avoid *.
You can then add any value as [ColumnX] in the select.
You can use a cast to get the type you want in the new table.
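SQLite has no SELECT ... INTO, but CREATE TABLE ... AS SELECT plays roughly the same role, so the column-reordering idea can be checked with Python's sqlite3 module. This is a sketch under that substitution; the table and column names come from the question, and the two new columns are left empty with typed NULLs, as in the CAST answer.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE OldTable (Column1 INTEGER, Column2 INTEGER, Column3 INTEGER)")
con.executemany("INSERT INTO OldTable VALUES (?, ?, ?)", [(1, 2, 3), (4, 5, 6), (7, 8, 9)])

# The SELECT list decides both the order and the set of columns in the copy.
con.execute(
    """
    CREATE TABLE NewTable AS
    SELECT Column1, Column3, Column2,
           CAST(NULL AS INTEGER) AS Column4,  -- new, empty column
           CAST(NULL AS INTEGER) AS Column5   -- new, empty column
    FROM OldTable
    """
)

columns = [info[1] for info in con.execute("PRAGMA table_info(NewTable)")]
print(columns)  # ['Column1', 'Column3', 'Column2', 'Column4', 'Column5']
print(con.execute("SELECT * FROM NewTable ORDER BY Column1").fetchall())
# [(1, 3, 2, None, None), (4, 6, 5, None, None), (7, 9, 8, None, None)]
```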