Add key to unique values in the SQL database - sql

My SQL data looks like this:
Col1
A
A
A
B
B
C
D
I want to add a key to only unique values. So the end result will look like this:
Col1 Col2
A 1
A 1
A 1
B 2
B 2
C 3
D 4
How can I do this?

You can do this with the dense_rank() window function:
select col1, dense_rank() over (order by col1) as col2
from t;
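For reference, here is a sketch (not part of the original answer) of what the common ranking functions would produce on the sample rows, which shows why dense_rank() is the right choice here:
-- Illustration only, based on the seven sample rows:
-- col1   dense_rank()   rank()   row_number()
-- A          1            1          1
-- A          1            1          2
-- A          1            1          3
-- B          2            4          4
-- B          2            4          5
-- C          3            6          6
-- D          4            7          7
-- dense_rank() assigns the same consecutive key to every copy of a value,
-- which is exactly the requested col2.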
This solves the problem as a query. If you want to actually change the table, then the code is more like:
alter table t add col2 int;
with toupdate as (
select t.*, dense_rank() over (order by col1) as newcol2
from t
)
update toupdate
set col2 = newcol2;
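Updating a CTE directly like this is SQL Server syntax. As a hedged sketch for a dialect such as PostgreSQL, where a CTE is not an updatable target, the same change could be written with UPDATE ... FROM (same table and column names as above):
-- Compute one rank per distinct col1 value, then copy it onto every matching row.
update t
set col2 = ranked.newcol2
from (
    select col1, dense_rank() over (order by col1) as newcol2
    from t
    group by col1
) as ranked
where t.col1 = ranked.col1;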

Related

SQL query to remove duplicates from a table with 139 columns and load all columns to another table

I need to remove the duplicates from a table with 139 columns based on 2 columns and load the unique rows with 139 columns into another table.
e.g.:
col1 col2 col3 .....col139
a b .............
b c .............
a b .............
Output:
col1 col2 col3 .....col139
a b .............
b c .............
I need a SQL query for DB2.
If the "other table" does not exist yet you can create it like this
CREATE TABLE othertable LIKE originaltable
And then insert the requested rows with this statement:
INSERT INTO othertable
SELECT col1,...,coln
FROM (SELECT
        t.*,
        ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY col1) AS num
      FROM originaltable t) t
WHERE num = 1
There are numerous tools out there that generate queries and column lists, so if you do not want to write the 139-column list by hand you can generate it with such a tool, or use another SQL statement to select it from the Db2 catalog table (syscat.columns).
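A hedged sketch of that catalog query, assuming the source table is ORIGINALTABLE in schema MYSCHEMA (adjust both names):
-- Build the comma-separated column list from the Db2 catalog instead of typing 139 names.
SELECT LISTAGG(colname, ', ') WITHIN GROUP (ORDER BY colno) AS column_list
FROM syscat.columns
WHERE tabschema = 'MYSCHEMA'
  AND tabname = 'ORIGINALTABLE';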
You might be better off just deleting the duplicates in place. This can be done without specifying a column list.
DELETE FROM
( SELECT
    ROW_NUMBER() OVER (PARTITION BY col1, col2) AS DUP
  FROM t
)
WHERE DUP > 1
You can use row_number():
select t.*
from (select t.*,
             row_number() over (partition by col1, col2 order by col1) as seqnum
      from t
     ) t
where seqnum = 1;
If you don't want seqnum in the result set, though, you need to list out all the columns.
To find duplicate values in col1 or any column, you can run the following query:
SELECT col1 FROM your_table GROUP BY col1 HAVING COUNT(*) > 1;
And if you want to delete those duplicate rows using the value of col1, you can run the following query:
DELETE FROM your_table WHERE col1 IN (SELECT col1 FROM your_table GROUP BY col1 HAVING COUNT(*) > 1);
Note that this deletes every row whose col1 value is duplicated, not just the extra copies. You can use the same approach to delete duplicate rows from the table using col2 values.
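Since the question dedupes on two columns, a hedged variant of the same approach finds the duplicated (col1, col2) pairs:
-- Rows whose (col1, col2) combination appears more than once.
SELECT col1, col2
FROM your_table
GROUP BY col1, col2
HAVING COUNT(*) > 1;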

SQL - how to select only those rows up to a certain row by rownumber

I'm having difficulty constructing a SQL query for loading a file into a database.
Let's say my file looks like this:
col1 col2 col3 col4
a b c d
e f g h
end null null null
no interesting data here
not interested in this
bla bla bla bla
I want to select ONLY the rows up to the 'end' row. So:
col1 col2 col3 col4
a b c d
e f g h
The 'end' row and everything after it are not interesting for my application. The 'end' row is at a different row number for each file we need to load, so I can't use a counter or a TOP statement for this.
How would I go about this?
I suspect I'd need to take care of this in the WHERE clause as follows:
SELECT *
FROM data
WHERE rownumber < endrownumber
Where I'm stuck is the following:
how to find the rownumber where col1 = 'end' (as the endrownumber)
how to only select those rows where the rownumber < endrownumber
Is this possible to achieve with a sub-query within the where clause?
Thanks!
T
The following will get the result you're looking for:
WITH view_1 AS (
    SELECT
        ROW_NUMBER() OVER (ORDER BY col_1) AS row_num,
        col_1,
        col_2,
        col_3,
        col_4
    FROM your_table
)
SELECT *
FROM view_1
WHERE row_num < (
    SELECT row_num
    FROM view_1
    WHERE col_1 = 'end'
);
The CTE gives each row a row number, which appears as a new column.
It is then queried in the WHERE clause to find the row number of the row where col_1 = 'end'.
Only the rows with a smaller row number are brought through.
I was using Oracle for all of the above and it works as expected.
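One caveat, with a hedged sketch: ORDER BY col_1 numbers the rows alphabetically rather than by their position in the file. If the staging table has a column that preserves load order (assumed here to be called load_seq; it is not in the original answer), ordering by it is more reliable:
WITH view_1 AS (
    SELECT
        ROW_NUMBER() OVER (ORDER BY load_seq) AS row_num,
        col_1, col_2, col_3, col_4
    FROM your_table
)
SELECT *
FROM view_1
WHERE row_num < (
    SELECT row_num
    FROM view_1
    WHERE col_1 = 'end'
);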

SQL DISTINCT based on a single column, but keep all columns as output

--mytable
col1 col2 col3
1 A red
2 A green
3 B purple
4 C blue
Let's call the table above mytable. I want to select only distinct values from col2:
SELECT DISTINCT
col2
FROM
mytable
When I do this the output looks like this, which is expected:
col2
A
B
C
but how do I perform the same type of query, yet keep all columns? The output would look like below. In essence I'm going through mytable looking at col2, and when there are multiple occurrences of a col2 value I'm only keeping the first row.
col1 col2 col3
1 A red
3 B purple
4 C blue
Do SQL functions (e.g. DISTINCT) have arguments I could set? I could imagine it being something like KeepAllColumns = TRUE for this DISTINCT function. Or do I need to perform JOINs to get what I want?
You can use window functions, particularly row_number():
select t.*
from (select t.*, row_number() over (partition by col2 order by col1) as seqnum
      from mytable t
     ) t
where seqnum = 1;
row_number() enumerates the rows within each partition, starting with 1. The ORDER BY controls which row comes first, so you can choose to keep the oldest, newest, biggest, smallest, and so on.
You can use the QUALIFY clause in Teradata:
SELECT col1, col2, col3
FROM mytable
QUALIFY ROW_NUMBER() OVER(PARTITION BY col2 ORDER BY col1) = 1 -- Get 1st row per group
If you want to change which row is kept from each col2 group, just change the expression in the ORDER BY.
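For example, a hedged variant that keeps the row with the largest col1 per col2 instead of the smallest:
SELECT col1, col2, col3
FROM mytable
QUALIFY ROW_NUMBER() OVER(PARTITION BY col2 ORDER BY col1 DESC) = 1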
With NOT EXISTS:
select m.* from mytable m
where not exists (
select 1 from mytable
where col2 = m.col2 and col1 < m.col1
)
This code will return the rows for which there is not another row with the same col2 and a smaller value in col1.
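Another common pattern, sketched here under the assumption (true for the sample data) that col1 is unique per row, is to join back to the smallest col1 per col2 group:
-- Keep the row whose col1 is the minimum within its col2 group.
SELECT m.*
FROM mytable m
JOIN (
    SELECT col2, MIN(col1) AS min_col1
    FROM mytable
    GROUP BY col2
) g
  ON m.col2 = g.col2
 AND m.col1 = g.min_col1;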

row_number or rank similar data

I have a table like this:
col1 col2
A A
A A
A F
B B
B B
B H
C L
A A
A A
A A
A E
C C
C C
C C
C C
C C
C J
And I want result like this:
col1 count
A 3
B 3
C 1
A 4
C 6
If col1 <> col2, reset the count... But I only want SQL code, not PL/SQL etc.
Maybe something like row_number() over (RESET WHEN col1 <> col2).
Please help me.
OK friends, thank you, and sorry for my bad English.
In fact my table looks like this:
id col1 col2
1000 A A
2000 A A
3000 A F
4000 B B
5000 B B
6000 B H
7000 C L
8000 A A
9000 A A
10000 A A
11000 A E
12000 C C
13000 C C
14000 C C
15000 C C
16000 C C
17000 C J
The id column is unique and its values are always ordered. Maybe this will help us solve the problem; sorry for leaving that information out. And I want a solution like the one above.
I only want col1 and the count. col1 does not need to be unique; the count must run 1, 2, 3, and so on until col1 <> col2...
After that row, the count must be reset.
First, I'd like to note that without an ORDER BY clause you cannot guarantee the order of the results. To do this sort of calculation, it would be useful to have an identity (auto-incrementing) field to establish an order.
That said, you can attempt to use ROW_NUMBER() to create a field to order on.
with yourtablewithrn as (
    select col1, col2, row_number() over (order by (select null)) rn
    from yourtable
),
yourtablegrouped as (
    select *,
           rn - row_number() over (partition by col1 order by rn) as grp
    from yourtablewithrn
)
select col1,
       count(col2) AS cnt
from yourtablegrouped
group by col1, grp
order by min(rn)
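To see why grouping on rn minus the per-col1 row number identifies each run, here is an illustrative trace (not from the original answer), assuming rn happens to follow the listed order of the sample rows:
-- col1   rn   row_number() per col1   grp = difference
-- A       1            1                    0
-- A       2            2                    0
-- A       3            3                    0
-- B       4            1                    3
-- B       5            2                    3
-- B       6            3                    3
-- C       7            1                    6
-- A       8            4                    4
-- A       9            5                    4
-- A      10            6                    4
-- A      11            7                    4
-- C      12            2                   10
-- ...
-- Every row in the same consecutive run of a col1 value gets the same grp,
-- so GROUP BY col1, grp counts each run separately: A 3, B 3, C 1, A 4, C 6.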
As mentioned above, I agree with sgeddes that we need some kind of order we can rely on for this kind of problem. row_number() over () won't do, since it is more or less a random number:
create table yourtable
( n int
, col1 varchar(1)
, col2 varchar(1));
insert into yourtable values
(1,'A','A'),
(2,'A','A'),
(3,'A','F'),
(4,'B','B'),
(5,'B','B'),
(6,'B','H'),
(7,'C','L'),
(8,'A','A'),
(9,'A','A'),
(10,'A','A'),
(11,'A','E'),
(12,'C','C'),
(13,'C','C'),
(14,'C','C'),
(15,'C','C'),
(16,'C','C'),
(17,'C','J');
For this sample data col2 has no impact. We could do (a slight variation of sgeddes's solution):
select col1, count(1)
from (
    select n, col1, col2,
           row_number() over (order by n)
           - row_number() over (partition by col1 order by n) as grp
    from yourtable
) t
group by col1, grp
order by min(n)
But, what should the result be with a sample like below?
delete from yourtable;
insert into yourtable values
(1,'A','A'),
(2,'A','A'),
(3,'A','F'),
(4,'A','A'),
(5,'A','G');

DISTINCT for only one Column and other column random?

I have one table named Demodata which has two columns, col1 and col2. The data of the table is:
col1 col2
1 5
1 6
2 7
3 8
3 9
4 10
and after a SELECT command we need this data:
col1 Col2
1 5
6
2 7
3 8
9
4 10
Is this possible? If so, what is the query? Please guide me.
Try this
SELECT CASE WHEN RN > 1 THEN NULL ELSE Col1 END, Col2
FROM
(
    SELECT *, ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col1) AS RN
    FROM yourTable
) AS T
No, it is not possible.
SQL Server result sets are row-based, not tree-based; you must have a value for each column (or a NULL value).
What you can do is group by col1 and run an aggregate function over the values of col2 (for example with the STUFF function).
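A hedged sketch of that grouping alternative, using the common SQL Server STUFF ... FOR XML PATH idiom (col2 is assumed numeric, so it is cast for concatenation):
-- One row per col1, with all of its col2 values collapsed into a single string.
SELECT DISTINCT col1,
       STUFF((SELECT ',' + CAST(d2.col2 AS varchar(20))
              FROM Demodata d2
              WHERE d2.col1 = d.col1
              FOR XML PATH('')), 1, 1, '') AS col2_list
FROM Demodata d;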
You can do this in SQL, using row_number():
select (case when row_number() over (partition by col1 order by col2) = 1
             then col1
        end), col2
from Demodata t
order by col1, col2;
Notice that the ordering is important. The way you have written the result set, the data is ordered by col1 and then col2. Result sets do not have an inherent ordering, unless you include an order by clause.
Also, I have used NULL for the missing values.
And, finally, although this can be done in SQL, it is often preferable to do these types of manipulations on the client side.
What do you want to select on the duplicates, an empty string, NULL, 0, ... ?
I presume NULL; you can use a CTE with ROW_NUMBER and CASE on col1:
WITH CTE AS (
    SELECT RN = ROW_NUMBER() OVER (PARTITION BY col1
                                   ORDER BY (SELECT 1)),
           col1, col2
    FROM Demodata
)
SELECT col1 = CASE WHEN RN = 1 THEN col1 ELSE NULL END, col2
FROM CTE