how to select min value from table if table has two unique values with rest of columns are identical - sql

ex:Input
ID Col1 Col2 Col3
-- ---- ---- ----
1 a a sql
2 a a hive
Out put
ID Col1 Col2 Col3
-- ---- ---- ----
1 a a sql
Here my id value and Col3 values are unique but i need to filter on min id and populate all records.
I know below approach will work, but any best approach other than this please suggest
select Col1,Col2,min(ID) from table group by Col1,Col2;
and join this on ID,Col1,Col2

I think you want row_number():
select t.*
from (select t.*, row_number() over (partition by col1, col2 order by id) as seqnum
from t
) t
where seqnum = 1

It appears that Hive supports ROW_NUMBER. Though I’ve never used hive, other rdbms would use it like this to get the entire contents of the min row without needing to join (doesn’t suffer problems if there are repeated minimum values)
SELECT a.* FROM
(
SELECT *, ROW_NUMBER() OVER(ORDER BY id) rn FROM yourtable
) a
WHERE a.rn = 1
The inner query selects all the table data and establishes an incrementing counter in order of ID. It could be based on any column, the min ID (in this case) being row number 1. If you wanted the max, order by ID desc
If you want the number to restart for different values of another column (eg of ten of your Col3 were “sql” and twenty rows had “hive”) you an say PARTITION BY col3 ORDER BY id, and the row number will be a counter that increments for identical values of col3, restarting from 1 for each distinct value of col3

Related

SQL filtering for table based on 3rd column's value

How should my SQL be so that the table mentioned above should have the expected output using SQL. The filtering happens using the col 3 value. Row with the highest col 3 value for a particular col 1 value is selected in the output.
You can use ROW_NUMBER() window function
SELECT col1, col2, col3 from (
SELECT row_number() over (partition by col1 order by col3 desc) sn, * from your_table_name
) a WHERE sn=1;
I assume you require the usage of the WHERE clause in SQL which acts as a method to filter results based on a conditional.
Considering you want all entries rather than any given entry one that matches, you can use the following (written in MySQL) which uses a nested SELECT statement:
SELECT *
FROM Table
WHERE col3=(SELECT MAX(col3) FROM Table);

SQL query to remove duplicates from a table with 139 columns and load all columns to another table

I need to remove the duplicates from a table with 139 columns based on 2 columns and load the unique rows with 139 columns into another table.
eg :
col1 col2 col3 .....col139
a b .............
b c .............
a b .............
o/p:
col1 col2 col3 .....col139
a b .............
b c .............
need a SQL query for DB2?
If the "other table" does not exist yet you can create it like this
CREATE TABLE othertable LIKE originaltable
And the insert the requested row with this statement:
INSERT INTO othertable
SELECT col1,...,coln
FROM (SELECT
t.*,
ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY col1) AS num
FROM t) t
WHERE num = 1
There are numerous tools out there that generate queries and column lists - so if you do not want to write it by hand you could generate it with these tools or use another SQL statement to select it from the Db2 catalog table (syscat.columns).
You might be better just deleting the duplicates in place. This can be done without specifying a column list.
DELETE FROM
( SELECT
ROW_NUMBER() OVER (PARTITION BY col1, col2) AS DUP
FROM t
)
WHERE
DUP > 1
You can use row_number():
select t.*
from (select t.*,
row_number() over (partition by a, b order by a) as seqnum
from t
) t;
If you don't want seqnum in the result set, though, you need to list out all the columns.
To find duplicate values in col1 or any column, you can run the following query:
SELECT col1 FROM your_table GROUP BY col1 HAVING COUNT(*) > 1;
And if you want to delete those duplicate rows using the value of col1, you can run the following query:
DELETE FROM your_table WHERE col1 IN (SELECT col1 FROM your_table GROUP BY col1 HAVING COUNT(*) > 1);
You can use the same approach to delete duplicate rows from the table using col2 values.

SQL DISTINCT based on a single column, but keep all columns as output

--mytable
col1 col2 col3
1 A red
2 A green
3 B purple
4 C blue
Let's call the table above mytable. I want to select only distinct values from col2:
SELECT DISTINCT
col2
FROM
mytable
When I do this the output looks like this, which is expected:
col2
A
B
C
but how do I perform the same type of query, yet keep all columns? The output would look like below. In essence I'm going through mytable looking at col2, and when there's multiple occurrences of col2 I'm only keeping the first row.
col1 col2 col3
1 A red
3 B purple
4 C blue
Do SQL functions (eg DISTINCT) have arguments I could set? I could imagine it to be something like KeepAllColumns = TRUE for this DISTINCT function? Or do I need to perform JOINs to get what I want?
You can use window functions, particularly row_number():
select t.*
from (select t.*, row_number() over (partition by col2 order by col2) as seqnum
from mytable t
) t
where seqnum = 1;
row_number() enumerates the rows, starting with "1". You can control whether you get the oldest, earliest, biggest, smallest . . .
You can use the QUALIFY clause in Teradata:
SELECT col1, col2, col3
FROM mytable
QUALIFY ROW_NUMBER() OVER(PARTITION BY col2 ORDER BY col2) = 1 -- Get 1st row per group
If you want to change the ordering for how to determine which col2 row to get, just change the expression in the ORDER BY.
With NOT EXISTS:
select m.* from mytable m
where not exists (
select 1 from mytable
where col2 = m.col2 and col1 < m.col1
)
This code will return the rows for which there is not another row with the same col2 and a smaller value in col1.

update value in row based on value in another row

I have a table with timestamped data.
t_stamp | col1 | col2 | col3
I want to update the value in say col3 based on values in col1 and col2 from the previous row in the table. By previous row I mean the row with the next lowest timestamp value. I also want to do this for every row in the table.
For example:
col3 = col1.prev + col2
Note: The operation here is only provided as an example. I want to calculate a value for col3 given a function of col1, col2 and/or previous values of either.
I was able to use a window function to create a SELECT query to give me the desired values for col3
SELECT lag(col1) OVER (ORDER BY t_stamp ASC) + col2 AS col3
FROM table1
but this does not update the values in the table. Can I somehow apply this to the original table? Or is there a way to format an update query in the same way?
You just need to use the FROM clause along with the query you already have:
UPDATE test set col3 = prev_col1 + prev_col2
FROM (
SELECT t_stamp,
lag(col1) OVER (ORDER BY t_stamp ASC) prev_col1,
lag(col2) OVER (ORDER BY t_stamp ASC) prev_col2
FROM test) prev
WHERE prev.t_stamp = test.t_stamp;
I think you want a cumulative sum:
SELECT (sum(col2 + co1) OVER (ORDER BY t_stamp ASC) - col1) AS col3
FROM table1
Create the first 3 columns and than use Lag to create the last col.
How to get previous row data in sql server
Use Sub queries.. will solve.. here is a sample
select col2+p_col1 from
(
SELECT col1, col2, lag(col1) OVER (ORDER BY t_stamp ASC) as p_col1
FROM table1
) t

Multiple rows match, but I only want one?

Sometimes I wish to perform a join whereby I take the largest value of one column. Doing this I have to perform a max() and a groupby- which prevents me from retrieving the other columns from the row which was the max (beause they were not contained in a GROUP BY or aggregate function).
To fix this, I join the max value back on the original data source, to get the other columns. However, my problem is that this sometimes returns more than one row.
So, so far I have something like:
SELECT * FROM
(SELECT Col1, Max(Col2) FROM Table GROUP BY Col1) tab1
JOIN
(SELECT Col1, Col2 FROM Table) tab2
ON tab1.Col2 = tab2.Col2
If the above query now returns three rows (which match the largest value for column2) I have a bit of a headache.
If there was an extra column- col3 and for the rows returned by the above query, I only wanted to return the one which was, say the minimum Col3 value- how would I do this?
If you are using SQL Server 2005+. Then you can do it like this:
CTE way
;WITH CTE
AS
(
SELECT
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) AS RowNbr,
table.*
FROM
table
)
SELECT
*
FROM
CTE
WHERE
CTE.RowNbr=1
Subquery way
SELECT
*
FROM
(
SELECT
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) AS RowNbr,
table.*
FROM
table
) AS T
WHERE
T.RowNbr=1
As I got it can be something like this
SELECT * FROM
(SELECT Col1, Max(Col2) FROM Table GROUP BY Col1) tab1
JOIN
(SELECT Col1, Col2 FROM Table) tab2
ON tab1.Col2 = tab2.Col2 and Col3 = (select min(Col3) from table )
Assuming you are using SQL-Server 2005 or later You can make use of Window functions here. I have chosen ROW_NUMBER() but it is not hte only option.
;WITH T AS
( SELECT *,
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) [RowNumber]
FROM Table
)
SELECT *
FROM T
WHERE RowNumber = 1
The PARTITION BY within the OVER clause is equivalent to your group by in your subquery, then your ORDER BY determines the order in which to start numbering the rows. In this case Col2 DESC to start with the highest value of col2 (Equivalent to your MAX statement).