Distinct Values but get all the columns - sql

In SSMS we have a view that returns the distinct [Col1], [Col2] pairs with their latest TimeDate. It is defined as follows:
Create View [VIEWNAME] As
Select distinct [Col1],[Col2], Max(TimeDate) as TimeDate
from [Table]
Group By [Col1],[Col2]
I want the column [Col3] as well from the table in view.
So far I have tried the following, but it did not work:
Select Distinct on [Col1],[Col2] * from [Table]
Error: Incorrect syntax near 'on'.
Also,
select [Col1],[Col2],[Col3],Max[TimeDate] from [Table]
Group by [Col1],[Col2],[Col3],[TimeDate]
Error: Column TABLE.Col3 is invalid in the select list because it is not contained in either an aggregated function or GROUP BY clause.
Below is the original table sample.
[image: original table sample, not available]
Desired table:
[image: desired table, not available]
Thanks.

In PostgreSQL you can use DISTINCT ON:
select distinct on (col1, col2) *
from the_table
order by col1, col2, timedate desc;
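I guess that your RDBMS is SQL Server, which has no DISTINCT ON. The query below is the SQL Server equivalent: it numbers the rows within each (col1, col2) group, newest first, and keeps only the first row of each group.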
select col1, col2, col3, timedate
from
(
select *, row_number() over (partition by col1,col2 order by timedate desc) as rn
from the_table
) as t
where rn = 1;
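To see what the row_number() query returns, here is a minimal sketch that runs it against SQLite through Python's sqlite3 module. SQLite (3.25 or later, for window functions) is only a stand-in for SQL Server here, and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE the_table (col1 TEXT, col2 TEXT, col3 TEXT, timedate TEXT);
    -- Hypothetical sample data: two rows share (a, x), one row has (b, y).
    INSERT INTO the_table VALUES
        ('a', 'x', 'old',  '2021-01-01'),
        ('a', 'x', 'new',  '2021-03-01'),
        ('b', 'y', 'only', '2021-02-01');
""")

# Number the rows inside each (col1, col2) group, newest timedate first,
# then keep only row 1 of each group. col3 comes along with that row.
latest_rows = conn.execute("""
    SELECT col1, col2, col3, timedate
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY col1, col2
                                  ORDER BY timedate DESC) AS rn
        FROM the_table
    ) AS t
    WHERE rn = 1
    ORDER BY col1, col2
""").fetchall()

print(latest_rows)
```

Each (col1, col2) pair appears once, and col3 is taken from the row with the latest timedate, which is what the GROUP BY version could not do.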

Related

Cannot recognize input near 'DELETE' 'FROM' 'CTE' in statement

I want to drop duplicate rows in mytable when they have identical values in col1.
WITH CTE AS
(
SELECT
*, ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col1) AS RN
FROM
mytable
)
DELETE FROM CTE
WHERE RN <> 1
I got this error:
Cannot recognize input near 'DELETE' 'FROM' 'CTE' in statement
I don't think Hive supports that syntax for DELETE. Try this:
DELETE FROM mytable t
WHERE t.id > (SELECT MIN(t2.id) -- some sort of unique id
FROM mytable t2
WHERE t2.col1 = t.col1
);
If you have complete duplicates, then the above won't work. In the most recent versions of Hive you can use MERGE. In older versions:
create table temp_t as
select distinct t.*
from mytable t;
truncate table mytable;
insert into mytable
select * from temp_t;
Of course, back up the table before trying this!
An alternative, assuming the table has a unique ID column:
Delete from MyTable where ID in
(SELECT ID FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col1) AS RN
FROM mytable) a where RN <> 1)
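As a rough check of the ID-based delete, here is a minimal sketch run against SQLite through Python's sqlite3 module. SQLite is only a stand-in, so this says nothing about whether a given Hive version accepts the statement. The sample rows are made up, and the window is ordered by id instead of col1 (an assumption) so that the surviving row is predictable.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (id INTEGER PRIMARY KEY, col1 TEXT);
    -- Hypothetical sample data: 'a' appears three times, 'b' twice.
    INSERT INTO mytable (id, col1) VALUES
        (1, 'a'), (2, 'a'), (3, 'b'), (4, 'a'), (5, 'b');
""")

# Number the rows within each col1 group and delete every row except
# the first one (lowest id) of its group.
conn.execute("""
    DELETE FROM mytable
    WHERE id IN (
        SELECT id
        FROM (
            SELECT id,
                   ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY id) AS rn
            FROM mytable
        ) AS a
        WHERE rn <> 1
    )
""")

remaining = conn.execute("SELECT id, col1 FROM mytable ORDER BY id").fetchall()
print(remaining)
```

One row per col1 value is left. With ORDER BY col1 inside the window, as in the statement above, all rows in a partition tie, so which duplicate survives is not defined.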

BigQuery: Use COUNT as LIMIT

I want to select everything from mytable1 and combine that with just as many rows from mytable2. In my case mytable1 always has fewer rows than mytable2, and I want the final table to be a 50-50 mix of data from each table. The following code expresses what I want logically, but it is not valid syntax:
(SELECT * FROM `mytable1`)
UNION ALL (
SELECT * FROM `mytable2`
LIMIT (SELECT COUNT(*) FROM `mytable1`)
)
It fails with:
Syntax error: Expected "#" or integer literal or keyword CAST but got "(" at [3:1]
I am using standard SQL in BigQuery.
The docs state that the LIMIT clause accepts only literal or parameter values, not a subquery. Instead, you can number the rows of the second table with ROW_NUMBER() and filter on that number:
SELECT col1, col2, col3
FROM mytable1
UNION ALL
SELECT col1, col2, col3
FROM (
SELECT col1, col2, col3, ROW_NUMBER() OVER () AS rn
FROM mytable2
) AS x
WHERE x.rn <= (SELECT COUNT(*) FROM mytable1)
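The ROW_NUMBER() approach can be tried out with a minimal sketch against SQLite through Python's sqlite3 module. SQLite is a stand-in for BigQuery here, and the table contents (including the 't1'/'t2' marker in col3) are made up for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for BigQuery (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable1 (col1 INTEGER, col2 TEXT, col3 TEXT);
    CREATE TABLE mytable2 (col1 INTEGER, col2 TEXT, col3 TEXT);
    -- Hypothetical data: col3 records which table a row came from.
    INSERT INTO mytable1 VALUES (1, 'a', 't1'), (2, 'b', 't1');
    INSERT INTO mytable2 VALUES
        (10, 'p', 't2'), (11, 'q', 't2'), (12, 'r', 't2'),
        (13, 's', 't2'), (14, 'u', 't2');
""")

# Take all of mytable1, plus only as many rows of mytable2 as mytable1 has.
rows = conn.execute("""
    SELECT col1, col2, col3
    FROM mytable1
    UNION ALL
    SELECT col1, col2, col3
    FROM (
        SELECT col1, col2, col3, ROW_NUMBER() OVER () AS rn
        FROM mytable2
    ) AS x
    WHERE x.rn <= (SELECT COUNT(*) FROM mytable1)
""").fetchall()

from_t1 = [r for r in rows if r[2] == "t1"]
from_t2 = [r for r in rows if r[2] == "t2"]
print(len(from_t1), len(from_t2))
```

Because OVER () has no ORDER BY, which rows of mytable2 are kept is arbitrary. Adding an ORDER BY inside OVER (for example on a random value) is one way to control the sample.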
Each SELECT statement within UNION must have the same number of columns.
The columns must also have similar data types.
The columns in each SELECT statement must also be in the same order.
So if mytable1 has fewer columns than mytable2, you have to pad its select list so that both sides return the same number of columns:
select col1,col2,col3,'' as col4 from mytable1 --in case less column you can use alias
union all
select col1,col2,col3,col4 from mytable2

How to replace a DISTINCT ON with GROUP BY in PostgreSQL 9?

I have been using the DISTINCT ON predicate and have decided to replace it with GROUP BY, mainly because it "is not part of the SQL standard and is sometimes considered bad style because of the potentially indeterminate nature of its results".
I am using DISTINCT ON in conjunction with ORDER BY in order to select the latest records in a history table, but it's not clear to me how to do the same with the GROUP BY.
What could be a general approach in order to move from one construct to the other one?
An example could be
SELECT
DISTINCT ON (f1, f2 ) *
FROM table
ORDER BY f1, f2, datefield DESC;
where I get the "latest" pairs of (f1,f2).
If you have a query like this:
select distinct on (col1) t.*
from table t
order by col1, col2
Then you would replace this with window functions, not a group by:
select t.*
from (select t.*,
row_number() over (partition by col1 order by col2) as seqnum
from table t
) t
where seqnum = 1;

SQL: windowed aggregate functions for DB2 iSeries

Is count not a valid aggregation function for row partitions for SQL DB2 on the iSeries?
This query works:
select ROW_NUMBER() over (partition by COL1, COL2 order by COL3 asc)
from MyTable
And this query gives a syntax error:
select COUNT(1) over (partition by COL1, COL2)
from MyTable
The error message is pointing at the parenthesis before the word partition:
[Message SQL0401] Token ( is not a valid token. A partial list of valid tokens is , FROM INTO.
I'm aware I can rewrite the query to avoid the row partition, but I'd like to know why this isn't working.
No, COUNT() is not the same type of function as ROW_NUMBER().
If you want the number of rows per (col1,col2) then you could simply use
select COL1, COL2, count(*)
from MyTable
group by col1, col2
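If the count is needed on every row, as COUNT(1) OVER (PARTITION BY ...) would give, the GROUP BY result can be joined back to the table. Below is a minimal sketch of that rewrite, run against SQLite through Python's sqlite3 module as a stand-in for DB2; the sample rows and the cnt alias are made up for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for DB2 on iSeries (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable (COL1 TEXT, COL2 TEXT, COL3 INTEGER);
    -- Hypothetical sample data.
    INSERT INTO MyTable VALUES ('a', 'x', 1), ('a', 'x', 2), ('b', 'y', 3);
""")

# Count rows per (COL1, COL2) in a derived table, then join the counts
# back so every original row carries the size of its group.
per_row_counts = conn.execute("""
    SELECT m.COL1, m.COL2, m.COL3, c.cnt
    FROM MyTable m
    JOIN (
        SELECT COL1, COL2, COUNT(*) AS cnt
        FROM MyTable
        GROUP BY COL1, COL2
    ) c ON c.COL1 = m.COL1 AND c.COL2 = m.COL2
    ORDER BY m.COL3
""").fetchall()

print(per_row_counts)
```

Note that rows with NULL in COL1 or COL2 would drop out of this join, which a windowed count would not do.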

Multiple rows match, but I only want one?

Sometimes I wish to perform a join in which I take the largest value of one column. To do this I have to use max() and a GROUP BY, which prevents me from retrieving the other columns of the row that held the max (because they are not contained in the GROUP BY or an aggregate function).
To fix this, I join the max value back onto the original data source to get the other columns. However, my problem is that this sometimes returns more than one row.
So, so far I have something like:
SELECT * FROM
(SELECT Col1, Max(Col2) FROM Table GROUP BY Col1) tab1
JOIN
(SELECT Col1, Col2 FROM Table) tab2
ON tab1.Col2 = tab2.Col2
If the above query returns three rows (all matching the largest value of Col2), I have a bit of a headache.
Suppose there is an extra column, Col3, and of the rows returned by the above query I only want the one with, say, the minimum Col3 value. How would I do this?
If you are using SQL Server 2005 or later, you can do it like this:
CTE way
;WITH CTE
AS
(
SELECT
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) AS RowNbr,
table.*
FROM
table
)
SELECT
*
FROM
CTE
WHERE
CTE.RowNbr=1
Subquery way
SELECT
*
FROM
(
SELECT
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) AS RowNbr,
table.*
FROM
table
) AS T
WHERE
T.RowNbr=1
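As I understand it, it could be something like this:
SELECT tab2.* FROM
(SELECT Col1, Max(Col2) AS Col2 FROM Table GROUP BY Col1) tab1
JOIN
(SELECT Col1, Col2, Col3 FROM Table) tab2
ON tab1.Col1 = tab2.Col1 AND tab1.Col2 = tab2.Col2
AND tab2.Col3 = (SELECT MIN(t3.Col3) FROM Table t3 WHERE t3.Col1 = tab2.Col1 AND t3.Col2 = tab2.Col2)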
Assuming you are using SQL Server 2005 or later, you can make use of window functions here. I have chosen ROW_NUMBER(), but it is not the only option.
;WITH T AS
( SELECT *,
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) [RowNumber]
FROM Table
)
SELECT *
FROM T
WHERE RowNumber = 1
The PARTITION BY within the OVER clause is equivalent to your group by in your subquery, then your ORDER BY determines the order in which to start numbering the rows. In this case Col2 DESC to start with the highest value of col2 (Equivalent to your MAX statement).
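For the tie case in the question (several rows share the largest Col2 and only the one with the smallest Col3 is wanted), one option is to add Col3 as a second sort key in the OVER clause. A minimal sketch, run against SQLite through Python's sqlite3 module as a stand-in for SQL Server; the table name t and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (Col1 TEXT, Col2 INTEGER, Col3 INTEGER);
    -- Hypothetical data: three rows for 'a' tie on the largest Col2 (5).
    INSERT INTO t VALUES
        ('a', 5, 30), ('a', 5, 10), ('a', 5, 20), ('a', 1, 1), ('b', 7, 4);
""")

# Col2 DESC puts the largest Col2 first; Col3 ASC breaks ties among those
# rows in favour of the smallest Col3, so exactly one row gets number 1.
winners = conn.execute("""
    WITH ranked AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY Col1
                                  ORDER BY Col2 DESC, Col3 ASC) AS RowNumber
        FROM t
    )
    SELECT Col1, Col2, Col3
    FROM ranked
    WHERE RowNumber = 1
    ORDER BY Col1
""").fetchall()

print(winners)
```

ROW_NUMBER() always assigns distinct numbers within a partition, so this returns one row per Col1 even when Col2 ties. RANK() would instead give all tied rows the same number.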