I have a table (Id, Name, Type) in SQL:
Id, Name, Type:
1, AA, 1
2, BB, 2
3, CC, 4
4, DD, 2
5, EE, 3
6, FF, 3
I want to select the first row for each Type, skipping the duplicates. Expected result:
Id, Name, Type:
1, AA, 1
2, BB, 2
3, CC, 4
6, FF, 3
I tried DISTINCT and GROUP BY, but it doesn't work: I need to select the whole row, and with DISTINCT or GROUP BY I can only select the Type column:
select DISTINCT Type
from tbltest
I like CTEs and ROW_NUMBER, since the query can easily be changed to delete the duplicates instead.
Presuming that you want to remove duplicate Types, and that "first" means the lowest Id:
WITH CTE AS(
SELECT Id, Name, Type,
RN = ROW_NUMBER() OVER ( PARTITION BY Type ORDER BY ID )
FROM dbo.Table1
)
SELECT Id, Name, Type FROM CTE WHERE RN = 1
You can do this in several ways. My preference is row_number():
select id, name, type
from (select t.*, row_number() over (partition by type order by id) as seqnum
from tbltest t
) t
where seqnum = 1;
EDIT:
Performance of the above should be reasonable. However, the following might be faster with an index on (type, id):
select id, name, type
from tbltest t
where not exists (select 1 from tbltest t2 where t2.type = t.type and t2.id < t.id);
That is, select the rows that have no lower id for the same type.
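To check that the ROW_NUMBER and NOT EXISTS versions agree, here is a minimal sketch that runs both on the question's sample data. It assumes SQLite (through Python's sqlite3 module, SQLite 3.25+ for window functions) as a stand-in for SQL Server, and the index name is made up. Note that ordering by Id picks 5, EE for Type 3 rather than the 6, FF shown in the question's expected result, because "first" is taken to mean the lowest Id.

```python
import sqlite3

# Sample rows from the question: (Id, Name, Type).
# SQLite stands in for SQL Server here; window functions need SQLite 3.25+.
ROWS = [(1, "AA", 1), (2, "BB", 2), (3, "CC", 4),
        (4, "DD", 2), (5, "EE", 3), (6, "FF", 3)]

# ROW_NUMBER variant: number rows within each type by id, keep number 1.
ROW_NUMBER_SQL = """
    SELECT id, name, type
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY type ORDER BY id) AS seqnum
          FROM tbltest t) AS ranked
    WHERE seqnum = 1
    ORDER BY id
"""

# NOT EXISTS variant: keep rows that have no lower id for the same type.
NOT_EXISTS_SQL = """
    SELECT id, name, type
    FROM tbltest t
    WHERE NOT EXISTS (SELECT 1 FROM tbltest t2
                      WHERE t2.type = t.type AND t2.id < t.id)
    ORDER BY id
"""


def first_per_type(sql):
    """Load the sample rows into an in-memory table and run `sql`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbltest (id INTEGER, name TEXT, type INTEGER)")
    conn.executemany("INSERT INTO tbltest VALUES (?, ?, ?)", ROWS)
    # Composite index suggested for the NOT EXISTS variant (name is hypothetical).
    conn.execute("CREATE INDEX ix_tbltest_type_id ON tbltest (type, id)")
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(first_per_type(ROW_NUMBER_SQL))
    print(first_per_type(NOT_EXISTS_SQL))
```

Both queries should return the same four rows; the ORDER BY inside OVER (or the `<` comparison in NOT EXISTS) is what decides which duplicate survives.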
Is there a way, using SQL (BigQuery more specifically), to get one row per unique value in a given column?
I know this is possible with a sequence of UNION queries, one per distinct value in the column of interest, but I'm wondering if there is a better way to do it.
You can use row_number():
select t.* except (seqnum)
from (select t.*, row_number() over (partition by col order by col) as seqnum
from t
) t
where seqnum = 1;
This returns an arbitrary row. You can control which row by adjusting the order by.
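As a sketch of how the ORDER BY controls which row is kept, the snippet below runs the same ROW_NUMBER pattern with an explicit order. It assumes SQLite via Python's sqlite3 as a stand-in for BigQuery; SQLite has no `SELECT * EXCEPT`, so the columns are listed explicitly. The (id, col) rows are the dummy data from the ANY_VALUE example below.

```python
import sqlite3

# Dummy (id, col) rows, the same ones used in the ANY_VALUE example.
DATA = [(1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3)]


def one_row_per_col(order_by):
    """Return one (id, col) row per distinct col.

    `order_by` is spliced into the window's ORDER BY, so it must be a
    trusted string such as "id" or "id DESC", never user input.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, col INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", DATA)
    sql = f"""
        SELECT id, col
        FROM (SELECT t.*,
                     ROW_NUMBER() OVER (PARTITION BY col ORDER BY {order_by}) AS seqnum
              FROM t) AS x
        WHERE seqnum = 1
        ORDER BY col
    """
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(one_row_per_col("id"))       # lowest id for each col
    print(one_row_per_col("id DESC"))  # highest id for each col
```

Ordering by the partition column itself (as in `order by col`) makes every row in a partition tie, which is why that version returns an arbitrary row.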
Another fun solution in BigQuery uses structs:
select array_agg(t limit 1)[ordinal(1)].*
from t
group by col;
You can add an ORDER BY inside the aggregate (array_agg(t order by X limit 1)) if you want a particular row.
Here is the same query, just formatted more readably:
select tab.* except(seqnum)
from (
select *, row_number() over (partition by column_x order by column_x) as seqnum
from `project.dataset.table`
) as tab
where seqnum = 1
Below is for BigQuery Standard SQL
#standardSQL
SELECT AS VALUE ANY_VALUE(t)
FROM `project.dataset.table` t
GROUP BY col
You can test and play with the above using dummy data, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, 1 col UNION ALL
SELECT 2, 1 UNION ALL
SELECT 3, 1 UNION ALL
SELECT 4, 2 UNION ALL
SELECT 5, 2 UNION ALL
SELECT 6, 3
)
SELECT AS VALUE ANY_VALUE(t)
FROM `project.dataset.table` t
GROUP BY col
with result:
Row id col
1 1 1
2 4 2
3 6 3
I have two tables: table1 contains old values and table2 contains the latest values. I want to show the latest values in table1, but nothing in table2 tells me which row is the latest.
For example:
Table1
CID-----PID-----RID
CT1-----C-------R1
CT2-----C-------R2
CT3-----C-------R3
CT4-----C-------R4
Table2
CID-----PID----RID
CT1-----A-------R1
CT1-----C-------R11
CT2-----C-------R2
CT3-----A-------R3
CT4-----A-------R4
The condition is that I have to give priority to value C when both values (A and C) exist for the same CID. Its RID may also change, so I need to get that in the output table too. For a CID with a single value in table2, I simply replace the table1 row with it. So the output should look like this:
Table3
CID-----PID----RID
CT1-----C-------R11
CT2-----C-------R2
CT3-----A-------R3
CT4-----A-------R4
I may be missing something, but isn't this simply:
select cid, max(pid)
from table2
group by cid;
If you want whole records, use a ranking with ROW_NUMBER instead:
select cid, pid, rid
from
(
select cid, pid, rid, row_number() over (partition by cid order by pid desc) as rn
from table2
)
where rn = 1;
You can also use a CASE expression for ranking, e.g.:
row_number() over (partition by cid order by case pid when 'C' then 1 when 'A' then 2 else 3 end) as rn
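A minimal sketch of the CASE-based ranking on the question's table2, assuming SQLite via Python's sqlite3 as a stand-in for the original database. The CASE expression maps C to 1 and A to 2, so C wins whenever a CID has both.

```python
import sqlite3

# table2 from the question: (CID, PID, RID)
TABLE2 = [("CT1", "A", "R1"), ("CT1", "C", "R11"), ("CT2", "C", "R2"),
          ("CT3", "A", "R3"), ("CT4", "A", "R4")]

# Rank rows within each CID by an explicit priority instead of pid order.
PRIORITY_SQL = """
    SELECT cid, pid, rid
    FROM (SELECT cid, pid, rid,
                 ROW_NUMBER() OVER (
                     PARTITION BY cid
                     ORDER BY CASE pid WHEN 'C' THEN 1 WHEN 'A' THEN 2 ELSE 3 END
                 ) AS rn
          FROM table2) AS ranked
    WHERE rn = 1
    ORDER BY cid
"""


def best_per_cid():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table2 (cid TEXT, pid TEXT, rid TEXT)")
    conn.executemany("INSERT INTO table2 VALUES (?, ?, ?)", TABLE2)
    rows = conn.execute(PRIORITY_SQL).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in best_per_cid():
        print(row)
```

Unlike `order by pid desc`, the CASE version does not rely on 'C' happening to sort after 'A', so it keeps working if other PID values appear.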
UPDATE: Now that you've finally explained what you are after ...
You want more or less the second query I gave you above, except that you want data from both tables, which you can get with UNION ALL. You can easily give each row a rank on the way:
table2, PID C => rank #1
table2, PID A => rank #2
table1 => rank #3
Then again take the row with the best rank:
select cid, pid, rid
from
(
select cid, pid, rid, row_number() over (partition by cid order by rnk) as rn
from
(
select cid, pid, rid, case when pid = 'C' then 1 else 2 end as rnk from table2
union all
select cid, pid, rid, 3 as rnk from table1
)
)
where rn = 1;
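A sketch of the combined query, assuming SQLite via Python's sqlite3 as a stand-in for the original database. The row ("CT5", "C", "R5") in table1 is hypothetical and not in the question; it is there to show that a CID missing from table2 falls back to its table1 row.

```python
import sqlite3

TABLE1 = [("CT1", "C", "R1"), ("CT2", "C", "R2"), ("CT3", "C", "R3"),
          ("CT4", "C", "R4"),
          ("CT5", "C", "R5")]  # hypothetical row: a CID that exists only in table1
TABLE2 = [("CT1", "A", "R1"), ("CT1", "C", "R11"), ("CT2", "C", "R2"),
          ("CT3", "A", "R3"), ("CT4", "A", "R4")]

# table2 C -> rank 1, other table2 rows -> rank 2, table1 -> rank 3;
# keep the best-ranked row per CID.
MERGE_SQL = """
    SELECT cid, pid, rid
    FROM (SELECT cid, pid, rid,
                 ROW_NUMBER() OVER (PARTITION BY cid ORDER BY rnk) AS rn
          FROM (SELECT cid, pid, rid,
                       CASE WHEN pid = 'C' THEN 1 ELSE 2 END AS rnk
                FROM table2
                UNION ALL
                SELECT cid, pid, rid, 3 AS rnk
                FROM table1) AS combined) AS ranked
    WHERE rn = 1
    ORDER BY cid
"""


def merged():
    conn = sqlite3.connect(":memory:")
    for name, rows in (("table1", TABLE1), ("table2", TABLE2)):
        conn.execute(f"CREATE TABLE {name} (cid TEXT, pid TEXT, rid TEXT)")
        conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", rows)
    rows = conn.execute(MERGE_SQL).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in merged():
        print(row)
```

UNION ALL is enough here because the rank column already distinguishes the sources, and it avoids the duplicate-removal sort that plain UNION performs.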
My table:
ID NUM VAL
1 1 Hello
1 2 Goodbye
2 2 Hey
2 4 What's up?
3 5 See you
If I want to return the max number for each ID, it's really nice and clean:
SELECT MAX(NUM) FROM table GROUP BY (ID)
But what if I want to grab the VAL associated with the max NUM for each ID?
Why can't I do:
SELECT MAX(NUM) OVER (ORDER BY NUM) FROM table GROUP BY (ID)
Why is that an error? I'd like to have this select grouped by ID, rather than partitioning separately for each window...
EDIT: The error is "not a GROUP BY expression".
You could probably use the MAX() KEEP(DENSE_RANK LAST...) function:
with sample_data as (
select 1 id, 1 num, 'Hello' val from dual union all
select 1 id, 2 num, 'Goodbye' val from dual union all
select 2 id, 2 num, 'Hey' val from dual union all
select 2 id, 4 num, 'What''s up?' val from dual union all
select 3 id, 5 num, 'See you' val from dual)
select id, max(num), max(val) keep (dense_rank last order by num)
from sample_data
group by id;
When you use a window function, you don't need GROUP BY anymore; this would suffice:
select id,
max(num) over(partition by id)
from x
Actually, you can get the result without using a window function:
select *
from x
where (id,num) in
(
select id, max(num)
from x
group by id
)
Output:
ID NUM VAL
1 2 Goodbye
2 4 What's up?
3 5 See you
http://www.sqlfiddle.com/#!4/a9a07/7
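The row-value IN query can be reproduced with a short sketch, assuming SQLite via Python's sqlite3 (row values need SQLite 3.15+) in place of Oracle, with the question's sample rows.

```python
import sqlite3

# Sample rows from the question: (ID, NUM, VAL)
ROWS = [(1, 1, "Hello"), (1, 2, "Goodbye"), (2, 2, "Hey"),
        (2, 4, "What's up?"), (3, 5, "See you")]

# Match each row's (id, num) pair against the per-id maximum.
MAX_ROW_SQL = """
    SELECT id, num, val
    FROM x
    WHERE (id, num) IN (SELECT id, MAX(num) FROM x GROUP BY id)
    ORDER BY id
"""


def rows_with_max_num():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE x (id INTEGER, num INTEGER, val TEXT)")
    conn.executemany("INSERT INTO x VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(MAX_ROW_SQL).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in rows_with_max_num():
        print(row)
```

If two rows for the same id share the maximum num, both match the pair and both are returned.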
If you want to use a window function, you might try this:
select id, val,
case when num = max(num) over(partition by id) then
1
else
0
end as to_select
from x
where to_select = 1
Or this:
select id, val
from x
where num = max(num) over(partition by id)
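But since neither is allowed (a window function can't be used in a WHERE clause, and a column alias such as to_select can't be referenced in the WHERE clause of the same SELECT), you have to do this: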
with list as
(
select id, val,
case when num = max(num) over(partition by id) then
1
else
0
end as to_select
from x
)
select *
from list
where to_select = 1
http://www.sqlfiddle.com/#!4/a9a07/19
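A minimal sketch showing both halves of that answer, assuming SQLite via Python's sqlite3 as a stand-in for Oracle: the engine rejects the window function in WHERE, while the CTE version computes the flag first and filters on it afterwards.

```python
import sqlite3

ROWS = [(1, 1, "Hello"), (1, 2, "Goodbye"), (2, 2, "Hey"),
        (2, 4, "What's up?"), (3, 5, "See you")]

# Window function in WHERE: expected to be rejected by the engine.
DIRECT_SQL = """
    SELECT id, val FROM x
    WHERE num = MAX(num) OVER (PARTITION BY id)
"""

# Compute the flag in a CTE first, then filter on it in the outer query.
CTE_SQL = """
    WITH list AS (
        SELECT id, val,
               CASE WHEN num = MAX(num) OVER (PARTITION BY id) THEN 1 ELSE 0 END
                   AS to_select
        FROM x
    )
    SELECT id, val FROM list WHERE to_select = 1 ORDER BY id
"""


def make_conn():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE x (id INTEGER, num INTEGER, val TEXT)")
    conn.executemany("INSERT INTO x VALUES (?, ?, ?)", ROWS)
    return conn


def direct_is_rejected():
    """Return True if the engine refuses the window function in WHERE."""
    conn = make_conn()
    try:
        conn.execute(DIRECT_SQL).fetchall()
        return False
    except sqlite3.OperationalError:
        return True
    finally:
        conn.close()


def via_cte():
    conn = make_conn()
    rows = conn.execute(CTE_SQL).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(direct_is_rejected())
    print(via_cte())
```

The reason is evaluation order: WHERE is applied before window functions are computed, so the flag has to be materialized in an inner query first.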
If you're looking to get the rows which contain the values from MAX(num) GROUP BY id, this tends to be a common pattern...
WITH
sequenced_data
AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY id ORDER BY num DESC) AS sequence_id,
*
FROM
yourTable
)
SELECT
*
FROM
sequenced_data
WHERE
sequence_id = 1
EDIT
I don't know if Teradata will allow this, but the logic seems to make sense...
SELECT
*
FROM
yourTable
WHERE
num = MAX(num) OVER (PARTITION BY id)
Or maybe...
SELECT
*
FROM
(
SELECT
*,
MAX(num) OVER (PARTITION BY id) AS max_num_by_id
FROM
yourTable
)
AS sub_query
WHERE
num = max_num_by_id
This is slightly different from my previous answer: if multiple records are tied on MAX(num), this returns all of them, whereas the ROW_NUMBER() version will only ever return one.
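To see the tie behaviour, here is a sketch that runs both patterns, assuming SQLite via Python's sqlite3 as a stand-in for Teradata. The row (2, 4, 'Yo') is hypothetical and not in the question; it ties with (2, 4, "What's up?") on the maximum num for id 2. Columns are listed explicitly instead of `*`.

```python
import sqlite3

# The question's rows plus one HYPOTHETICAL row, (2, 4, 'Yo'), which ties
# with (2, 4, "What's up?") on the maximum num for id 2.
ROWS = [(1, 1, "Hello"), (1, 2, "Goodbye"), (2, 2, "Hey"),
        (2, 4, "What's up?"), (2, 4, "Yo"), (3, 5, "See you")]

# ROW_NUMBER: exactly one row per id; which tied row wins is arbitrary.
ROW_NUMBER_SQL = """
    SELECT id, num, val
    FROM (SELECT id, num, val,
                 ROW_NUMBER() OVER (PARTITION BY id ORDER BY num DESC) AS sequence_id
          FROM yourTable) AS sequenced_data
    WHERE sequence_id = 1
"""

# MAX() OVER: every row whose num equals its id's maximum, so ties survive.
MAX_OVER_SQL = """
    SELECT id, num, val
    FROM (SELECT id, num, val,
                 MAX(num) OVER (PARTITION BY id) AS max_num_by_id
          FROM yourTable) AS sub_query
    WHERE num = max_num_by_id
"""


def run(sql):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yourTable (id INTEGER, num INTEGER, val TEXT)")
    conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", ROWS)
    rows = sorted(conn.execute(sql).fetchall())
    conn.close()
    return rows


if __name__ == "__main__":
    print(run(ROW_NUMBER_SQL))  # three rows; id 2 shows one of the tied rows
    print(run(MAX_OVER_SQL))    # four rows; id 2 shows both tied rows
```

Adding a tie-breaker column to the ROW_NUMBER ORDER BY makes its choice deterministic without changing the one-row-per-id guarantee.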
EDIT
In your proposed SQL the error relates to the fact that the OVER() clause contains a field not in your GROUP BY. It's like trying to do this...
SELECT id, num FROM yourTable GROUP BY id
num is invalid, because there can be multiple values in that field for each row returned (with the rows returned being defined by GROUP BY id).
In the same way, you can't put num inside the OVER() clause.
SELECT
id,
MAX(num), -- Valid, as it is an aggregate
MAX(MAX(num)) OVER(PARTITION BY id), -- Also valid: the window works on the grouped rows, and id is in the GROUP BY
MAX(MAX(num)) OVER(PARTITION BY num) -- Not valid, as num is not in the GROUP BY
FROM
yourTable
GROUP BY
id
See this question for when you can't specify something in the OVER() clause, and an answer showing when (I think) you can: over-partition-by-question
Someone please change my title to better reflect what I am trying to ask.
I have a table like:
Table (id, value, value_type, data)
ID is NOT unique. There is no unique key.
value_type has two possible values, let's say A and B.
Type B is better than A, but often not available.
For each id if any records with value_type B exists, I want all the records with that id and value_type B.
If no record with value_type B exists for that id, I want all records with that id and value_type A.
Notice that if B exists for that id I don't want records with type A.
I currently do this with a series of temp tables. Is there a single select statement (sub queries OK) that can do the job?
Thanks so much!
Additional details:
SQL Server 2005
Use RANK rather than ROW_NUMBER, because you want ties (several rows with value_type B for the same id) to get the same rank value:
WITH summary AS (
SELECT t.*,
RANK() OVER (PARTITION BY t.id
ORDER BY t.value_type DESC) AS rank
FROM TABLE t
WHERE t.value_type IN ('A', 'B'))
SELECT s.id,
s.value,
s.value_type,
s.data
FROM summary s
WHERE s.rank = 1
Non CTE version:
SELECT s.id,
s.value,
s.value_type,
s.data
FROM (SELECT t.*,
RANK() OVER (PARTITION BY t.id
ORDER BY t.value_type DESC) AS rank
FROM TABLE t
WHERE t.value_type IN ('A', 'B')) s
WHERE s.rank = 1
WITH test AS (
SELECT 1 AS id, 'B' AS value_type
UNION ALL
SELECT 1, 'B'
UNION ALL
SELECT 1, 'A'
UNION ALL
SELECT 2, 'A'
UNION ALL
SELECT 2, 'A'),
summary AS (
SELECT t.*,
RANK() OVER (PARTITION BY t.id
ORDER BY t.value_type DESC) AS rank
FROM test t)
SELECT *
FROM summary
WHERE rank = 1
I get:
id value_type rank
----------------------
1 B 1
1 B 1
2 A 1
2 A 1
SELECT *
FROM table
WHERE value_type = 'B'
UNION ALL
SELECT *
FROM table
WHERE ID not in (SELECT distinct id
FROM table
WHERE value_type = 'B')
The shortest query to do the job I can think of:
SELECT TOP 1 WITH TIES *
FROM #test
ORDER BY Rank() OVER (PARTITION BY id ORDER BY value_type DESC)
This is about 50% worse on CPU than OMG Ponies' and Christoperous 5000's solutions, but has the same number of reads. The extra sort is what makes it take more CPU.
The best-performing original query I've come up with so far is:
SELECT *
FROM #test
WHERE value_type = 'B'
UNION ALL
SELECT *
FROM #test T1
WHERE NOT EXISTS (
SELECT *
FROM #test T2
WHERE
T1.id = T2.id
AND T2.value_type = 'B'
)
This consistently beats all the others presented on CPU by about 1/3rd (the others are about 50% more) but has 3x the number of reads. The duration on this query is often 2/3rds the time of all the others. I consider it a good contender.
Indexes and data types could change everything.
CREATE TABLE #test (
id int , value [nvarchar](255),value_type [nvarchar](255),data int)
INSERT INTO #test
SELECT 1, 'X', 'A',1 UNION
SELECT 1, 'X', 'A',2 UNION
SELECT 1, 'X', 'A',3 UNION
SELECT 1, 'X', 'A',4 UNION
SELECT 2, 'X', 'A',5 UNION
SELECT 2, 'X', 'B',6 UNION
SELECT 2, 'X', 'B',7 UNION
SELECT 2, 'X', 'A',8 UNION
SELECT 2, 'X', 'A',9
SELECT * FROM #test x
INNER JOIN
(SELECT id, MAX(value_type) as value_type FROM
#test GROUP BY id) as y
ON x.id = y.id AND x.value_type = y.value_type
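As a quick correctness check (not a performance test), the sketch below runs the UNION ALL / NOT EXISTS, RANK and MAX-join approaches on the #test rows above. It assumes SQLite via Python's sqlite3 as a stand-in for SQL Server; SQLite has no #temp tables, so the table is simply named test, and the RANK alias is renamed rnk.

```python
import sqlite3

# Rows from the #test table above: (id, value, value_type, data).
ROWS = [(1, "X", "A", 1), (1, "X", "A", 2), (1, "X", "A", 3), (1, "X", "A", 4),
        (2, "X", "A", 5), (2, "X", "B", 6), (2, "X", "B", 7),
        (2, "X", "A", 8), (2, "X", "A", 9)]

QUERIES = {
    # All B rows, plus every row of an id that has no B row at all.
    "union_not_exists": """
        SELECT * FROM test WHERE value_type = 'B'
        UNION ALL
        SELECT * FROM test t1
        WHERE NOT EXISTS (SELECT * FROM test t2
                          WHERE t1.id = t2.id AND t2.value_type = 'B')
    """,
    # RANK keeps ties, so every row with the best value_type gets rank 1.
    "rank": """
        SELECT id, value, value_type, data
        FROM (SELECT t.*,
                     RANK() OVER (PARTITION BY t.id ORDER BY t.value_type DESC) AS rnk
              FROM test t
              WHERE t.value_type IN ('A', 'B')) AS s
        WHERE s.rnk = 1
    """,
    # Join back to the per-id maximum value_type ('B' sorts after 'A').
    "max_join": """
        SELECT x.* FROM test x
        INNER JOIN (SELECT id, MAX(value_type) AS value_type
                    FROM test GROUP BY id) AS y
            ON x.id = y.id AND x.value_type = y.value_type
    """,
}


def selected_data(sql):
    """Run `sql` against the sample rows and return the sorted data column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (id INTEGER, value TEXT, "
                 "value_type TEXT, data INTEGER)")
    conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?)", ROWS)
    data = sorted(row[3] for row in conn.execute(sql))
    conn.close()
    return data


if __name__ == "__main__":
    for name, sql in QUERIES.items():
        print(name, selected_data(sql))
```

All three should keep the four A rows of id 1 (it has no B) and only the two B rows of id 2.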
Try this (MSSQL).
Select id, value_typeB, null
from myTable
where value_typeB is not null
Union All
Select id, null, value_typeA
from myTable
where value_typeB is null and value_typeA is not null
Perhaps something like this:
select * from mytable
where value_type = 'B'
union all
select * from mytable
where id in (select id from mytable where value_type = 'A'
and id not in (select id from mytable where value_type = 'B'))
This uses a union, combining all records of value B with all records that have only A values:
SELECT *
FROM mainTable
WHERE value_type = 'B'
UNION ALL
SELECT *
FROM mainTable
WHERE value_type = 'A'
AND id NOT IN (SELECT id
FROM mainTable
WHERE value_type = 'B');
This should be a simple question, but I can't get it to work :(
How do I select the rows that have the maximum value of one column, grouped by another column?
For example, I have the following table definition:
ID
Del_Index
docgroupviewid
The issue is that I want to group the results by docgroupviewid first, and then choose one row from each docgroupviewid group: the one with the highest del_index.
I tried:
SELECT docgroupviewid, max(del_index),id FROM table
group by docgroupviewid
But instead of returning the correct id, it returns the earliest id from the group with the same docgroupviewid.
Any ideas?
I've struggled with this many times myself and the solution is to think about your query differently.
I want each DocGroupViewID row where the Del_Index is the highest (max) of all rows with that DocGroupViewID:
SELECT
T.DocGroupViewID,
T.Del_Index,
T.ID
FROM MyTable T
WHERE T.Del_Index = (
SELECT MAX( T1.Del_Index ) FROM MyTable T1
WHERE T1.DocGroupViewID = T.DocGroupViewID
)
It gets more complex when more than one row can have the same Del_Index, since then you need some way to choose which one to show.
EDIT: I wanted to follow up with another option.
You can use the RANK() or ROW_NUMBER() functions with a CTE to get more control over the results, as follows:
-- fake a source table
CREATE TABLE #t (
ID int IDENTITY(1,1) PRIMARY KEY,
Del_Index int,
DocGroupViewID int
)
INSERT INTO #t
SELECT 1, 1 UNION ALL
SELECT 2, 1 UNION ALL
SELECT 3, 1 UNION ALL
SELECT 1, 2 UNION ALL
SELECT 2, 2 UNION ALL
SELECT 2, 2 UNION ALL
SELECT 1, 3 UNION ALL
SELECT 2, 3 UNION ALL
SELECT 3, 3 UNION ALL
SELECT 4, 3
-- show our source
SELECT * FROM #t
-- select using RANK (can have duplicates)
;WITH cteRank AS
(
SELECT
DocGroupViewID,
Del_Index,
ID,
RANK() OVER
(PARTITION BY DocGroupViewID ORDER BY Del_Index DESC)
AS RowRank,
ROW_NUMBER() OVER
(PARTITION BY DocGroupViewID ORDER BY Del_Index DESC)
AS RowNumber
FROM #t
)
SELECT *
FROM cteRank
WHERE RowRank = 1
-- select using ROW_NUMBER
;WITH cteRowNumber AS
(
SELECT
DocGroupViewID,
Del_Index,
ID,
RANK() OVER
(PARTITION BY DocGroupViewID ORDER BY Del_Index DESC)
AS RowRank,
ROW_NUMBER() OVER
(PARTITION BY DocGroupViewID ORDER BY Del_Index DESC)
AS RowNumber
FROM #t
)
SELECT *
FROM cteRowNumber
WHERE RowNumber = 1
If you have a way to break ties, just add it to the ORDER BY.
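A sketch comparing the correlated subquery, RANK and a tie-broken ROW_NUMBER on the #t sample data, assuming SQLite via Python's sqlite3 as a stand-in for SQL Server. An INTEGER PRIMARY KEY makes SQLite number the rows 1..10 in insertion order, mimicking IDENTITY(1,1); the table name MyTable is borrowed from the first query. Using ID as the tie-breaker is an assumption; any column that orders ties the way you want would do.

```python
import sqlite3

# (Del_Index, DocGroupViewID) pairs in the same order as the #t example.
PAIRS = [(1, 1), (2, 1), (3, 1), (1, 2), (2, 2), (2, 2),
         (1, 3), (2, 3), (3, 3), (4, 3)]

# Correlated subquery: all rows matching their group's maximum (ties included).
CORRELATED_SQL = """
    SELECT T.ID FROM MyTable T
    WHERE T.Del_Index = (SELECT MAX(T1.Del_Index) FROM MyTable T1
                         WHERE T1.DocGroupViewID = T.DocGroupViewID)
    ORDER BY T.ID
"""

# RANK: same behaviour as the correlated subquery, ties share rank 1.
RANK_SQL = """
    SELECT ID FROM (
        SELECT ID, RANK() OVER (PARTITION BY DocGroupViewID
                                ORDER BY Del_Index DESC) AS RowRank
        FROM MyTable) AS r
    WHERE RowRank = 1
    ORDER BY ID
"""

# ROW_NUMBER with ID as a tie-breaker: exactly one, deterministic row per group.
ROW_NUMBER_SQL = """
    SELECT ID FROM (
        SELECT ID, ROW_NUMBER() OVER (PARTITION BY DocGroupViewID
                                      ORDER BY Del_Index DESC, ID) AS RowNumber
        FROM MyTable) AS r
    WHERE RowNumber = 1
    ORDER BY ID
"""


def ids(sql):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (ID INTEGER PRIMARY KEY, "
                 "Del_Index INTEGER, DocGroupViewID INTEGER)")
    conn.executemany("INSERT INTO MyTable (Del_Index, DocGroupViewID) VALUES (?, ?)",
                     PAIRS)
    result = [row[0] for row in conn.execute(sql)]
    conn.close()
    return result


if __name__ == "__main__":
    print(ids(CORRELATED_SQL))
    print(ids(RANK_SQL))
    print(ids(ROW_NUMBER_SQL))
```

Group 2 has two rows with Del_Index 2 (IDs 5 and 6): the correlated subquery and RANK return both, while ROW_NUMBER with the ID tie-breaker keeps only ID 5.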
You will have to complicate your query a little bit:
select a.docgroupviewid, a.del_index, a.id from table a
where a.del_index = (select max(b.del_index) from table b
where b.docgroupviewid = a.docgroupviewid)