I have a table that has millions of records.
So I might have these columns
a, b, c, d
I need to select all the distinct records based on columns a and b.
But I need to select columns a, b, c and d not just a and b.
Can I do this?
edit
Data might be
1,1,frog,green
1,1,frog,brown
2,1,cat,black
2,4,dog,white
So I need:
1,1,frog,green
2,1,cat,black
2,4,dog,white
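SQL Server supports common table expressions and window functions. The query below uses ROW_NUMBER(), which numbers the rows within each (a, b) group. Within a group the rows are sorted by c ASC, d ASC, so rn = 1 keeps the first row in that order. Change the ORDER BY to pick a different row (as written, the sample data yields frog, brown rather than frog, green).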
WITH records
AS
(
SELECT a, b, c, d,
ROW_NUMBER() OVER(PARTITION BY a, b ORDER BY c, d) rn
FROM TableName
)
SELECT a, b, c, d
FROM records
WHERE rn = 1
SQLFiddle Demo
TSQL Ranking Functions
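To try the query above without a SQL Server instance, here is a minimal sketch that runs the same CTE against the question's sample rows. It assumes Python's built-in sqlite3 module with SQLite 3.25 or newer (needed for window functions); the table name TableName is taken from the answer, everything else is illustrative.

```python
import sqlite3

# In-memory database holding the four sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableName (a INTEGER, b INTEGER, c TEXT, d TEXT)")
conn.executemany(
    "INSERT INTO TableName VALUES (?, ?, ?, ?)",
    [
        (1, 1, "frog", "green"),
        (1, 1, "frog", "brown"),
        (2, 1, "cat", "black"),
        (2, 4, "dog", "white"),
    ],
)

# Number the rows inside each (a, b) group and keep only the first one.
query = """
WITH records AS (
    SELECT a, b, c, d,
           ROW_NUMBER() OVER (PARTITION BY a, b ORDER BY c, d) AS rn
    FROM TableName
)
SELECT a, b, c, d
FROM records
WHERE rn = 1
ORDER BY a, b
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Because the window is ordered by c, d, the (1, 1) group keeps the brown frog. Ordering by d DESC instead would keep the green one.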
partition by is your man
SELECT a, b, c, d FROM (
-- ORDER BY a, b is constant within each partition, so which row gets rn = 1 is arbitrary
SELECT a, b, c, d, ROW_NUMBER() OVER (PARTITION BY a, b ORDER BY a, b) rn
FROM TableName
) sq
WHERE rn = 1
Please try:
select *
From(
select
row_number() over (partition by a, b order by a, b) RNum,
*
from
YourTable
)x
where RNum=1
Sample
select * From(
select row_number() over (partition by a, b order by a, b) RNum, *
from(
select 1 a, 1 b, 'frog' c, 'green' d union all
select 1 a, 1 b, 'frog' c, 'brown' d union all
select 2 a, 1 b, 'cat' c, 'black' d union all
select 2 a, 4 b, 'dog' c, 'white' d)x
)y
where RNum=1
Related
count(*) OVER (PARTITION BY a, b ORDER BY a, b, c) * 10
This produces the same result as:
dense_rank() OVER (PARTITION BY a, b ORDER BY a, b, c) * 10
Used in a query like this:
SELECT
dense_rank() OVER (ORDER BY a, b) ,
a || b,
count(*) OVER (
PARTITION BY a, b
ORDER BY a, b, c
) * 10 ,
a2,
b1,
c1,
cc1,
c2
FROM
join ....
ORDER BY 1, 6;
I'm happy with my query result.
But should I prefer one approach over the other, and why?
After PARTITION BY a, b there is no point in adding a or b to ORDER BY, like David commented.
So we simplify to:
count(*) OVER (PARTITION BY a, b ORDER BY c) * 10
dense_rank() OVER (PARTITION BY a, b ORDER BY c) * 10
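These two only happen to be equivalent while c is unique within each (a, b) partition. Otherwise they are not: with the default window frame, count(*) includes all rows that tie on c, while dense_rank() gives tied rows the same rank and only advances on a new value of c.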
You'd need to define exactly what the number is supposed to signify, and show your table definition, and the exact query because joins can introduce duplicates and NULL values.
row_number() or rank() are similar window functions ...
Performance is practically the same for all of them.
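A small sketch of where the two expressions diverge, assuming SQLite 3.25+ via Python's sqlite3 (its default window frame, RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, matches the behaviour described above). The table t and its values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER)")
# One (a, b) partition in which c = 10 appears twice (a tie).
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, 10), (1, 1, 10), (1, 1, 20)],
)

query = """
SELECT c,
       count(*)     OVER (PARTITION BY a, b ORDER BY c) AS running_count,
       dense_rank() OVER (PARTITION BY a, b ORDER BY c) AS dense_rnk
FROM t
ORDER BY c
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The tied rows get a running count of 2 but a dense rank of 1, so the two columns only agree when no value of c repeats inside a partition.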
I have a table where each row contains a product id (A), a price (P) and a modification date (D) in YYYYMMDD format.
Here is the table :
WITH temp_table AS (
select 744583 as a, 9.21 as p, 20210706 as d from sysibm.sysdummy1
union all
select 744583 as a, 9.21 as p, 20210630 as d from sysibm.sysdummy1
union all
select 744583 as a, 9.21 as p, 20210628 as d from sysibm.sysdummy1
union all
select 744583 as a, 9.04 as p, 20210604 as d from sysibm.sysdummy1
union all
select 744583 as a, 9.04 as p, 20210201 as d from sysibm.sysdummy1
union all
select 744583 as a, 9.21 as p, 20200407 as d from sysibm.sysdummy1
)
select *
from temp_table
What I would like to have is the date on which the price changed for the last time. In this example, that is the third row (20210628).
How would you do that ?
Thanks,
One method uses lag() and then ordering:
select t.*
from (select t.*,
lag(p) over (order by d) as prev_p
from temp_table t
) t
where prev_p is null or prev_p <> p
order by d desc
fetch first 1 row only;
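Here is a minimal sketch of the lag() approach on the question's data, with the window ordered by d ascending so that prev_p holds the price that was in effect just before each row. It assumes SQLite 3.25+ through Python's sqlite3 rather than Db2, so LIMIT 1 stands in for fetch first 1 row only and temp_table is created as a real table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temp_table (a INTEGER, p REAL, d INTEGER)")
conn.executemany(
    "INSERT INTO temp_table VALUES (?, ?, ?)",
    [
        (744583, 9.21, 20210706),
        (744583, 9.21, 20210630),
        (744583, 9.21, 20210628),
        (744583, 9.04, 20210604),
        (744583, 9.04, 20210201),
        (744583, 9.21, 20200407),
    ],
)

# Keep rows where the price differs from the previous (earlier) row,
# then take the most recent of those change points.
query = """
SELECT a, p, d
FROM (
    SELECT t.*, lag(p) OVER (ORDER BY d) AS prev_p
    FROM temp_table t
) s
WHERE prev_p IS NULL OR prev_p <> p
ORDER BY d DESC
LIMIT 1
"""
row = conn.execute(query).fetchone()
print(row)
```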
If you wanted to do this for multiple values of a at the same time, then there are different approaches. An interesting one uses a difference of row numbers:
select a, p, min(d)
from (select t.*,
row_number() over (partition by a order by d desc) as seqnum,
row_number() over (partition by a, p order by d desc) as seqnum_2
from temp_table t
) t
where seqnum = seqnum_2
group by a, p;
You can investigate why this works. The two row numbers are the same only for the last price for each a.
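To see the row-number trick at work on more than one product, here is a hedged sketch using SQLite 3.25+ via Python's sqlite3. The second product (a = 1) and its prices are invented for the demonstration; the column names a, p and d follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temp_table (a INTEGER, p REAL, d INTEGER)")
conn.executemany(
    "INSERT INTO temp_table VALUES (?, ?, ?)",
    [
        # Product from the question.
        (744583, 9.21, 20210706),
        (744583, 9.21, 20210630),
        (744583, 9.21, 20210628),
        (744583, 9.04, 20210604),
        (744583, 9.04, 20210201),
        (744583, 9.21, 20200407),
        # Hypothetical second product: price went from 5.0 to 6.0 on 20210301.
        (1, 5.0, 20210101),
        (1, 6.0, 20210301),
        (1, 6.0, 20210401),
    ],
)

# seqnum counts back from the newest row per product; seqnum_2 counts back
# per product and price. They match only while every newer row has the same
# price, i.e. inside the most recent run of prices.
query = """
SELECT a, p, min(d) AS last_change
FROM (
    SELECT t.*,
           row_number() OVER (PARTITION BY a ORDER BY d DESC) AS seqnum,
           row_number() OVER (PARTITION BY a, p ORDER BY d DESC) AS seqnum_2
    FROM temp_table t
) s
WHERE seqnum = seqnum_2
GROUP BY a, p
ORDER BY a
"""
rows = conn.execute(query).fetchall()
print(rows)
```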
I want to select the same columns from different tables that look the same
(daily tables).
I saw the answer to "SELECT from multiple tables with the same structure", but if I follow it I end up with a huge query.
This code is similar to what I have. According to the answer above, I need to do the following:
select a, b, c
from (
select a, b, c, d, e from hourly.16
union all
select a, b, c, d, e from hourly.15
)
Isn't there an option to do something like:
select a, b, c
from (
select a, b, c, d, e from (hourly.16 union all hourly.15)
)
so I won't end up with huge queries?
#standardSQL
SELECT a, b, c
FROM (
SELECT a, b, c, d, e
FROM `project.hourly.*`
WHERE _TABLE_SUFFIX BETWEEN '15' AND '16'
)
The above assumes that hourly is your dataset.
I'm trying to select the last MAX(DateTime) status from the table "Zee" but if the DateTime is the same it returns two lines, and I would like to get only the last one (maybe last inserted?).
here is the query:
SELECT Z."ID" AS ID,Z."A" AS A,Z."B" AS B,Z."C" AS C,Z."D" AS D
FROM ZEE Z
INNER JOIN
(SELECT ID, A, B, MAX(C) AS C
FROM ZEE
GROUP BY A, B) groupedtt
ON Z.A = groupedtt.A
AND Z.B = groupedtt.B
AND Z.C = groupedtt.C
WHERE (
Z.B = 103
OR Z.B = 104
);
Thanks,
Regards.
I usually use rank() for such things:
select Z."ID" AS ID,Z."A" AS A,Z."B" AS B,Z."C" AS C,Z."D" AS D
from (select Z.*, rank()over(partition by A,B order by C desc, rownum) r from ZEE Z
)Z where Z.r=1
Use the ROW_NUMBER() analytic function (you will also eliminate the self-join):
SELECT ID, A, B, C, D
FROM (
SELECT ID,
A,
B,
C,
D,
ROW_NUMBER() OVER ( PARTITION BY A, B ORDER BY C DESC ) As rn
FROM ZEE
)
WHERE rn = 1;
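With ORDER BY C DESC alone, rows that share the same C are still picked arbitrarily. A minimal sketch of adding a tie-breaker follows, run on SQLite 3.25+ via Python's sqlite3 instead of Oracle. It assumes that a higher ID means a later insert, which the question does not confirm, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ZEE (ID INTEGER, A TEXT, B INTEGER, C TEXT, D TEXT)")
conn.executemany(
    "INSERT INTO ZEE VALUES (?, ?, ?, ?, ?)",
    [
        (1, "x", 103, "2020-01-01 10:00", "old"),
        (2, "x", 103, "2020-01-02 10:00", "first"),
        (3, "x", 103, "2020-01-02 10:00", "second"),  # same C as ID 2
        (4, "y", 104, "2020-01-03 09:00", "only"),
    ],
)

# ID DESC breaks ties on C (assumption: larger ID = inserted later).
query = """
SELECT ID, A, B, C, D
FROM (
    SELECT ID, A, B, C, D,
           ROW_NUMBER() OVER (PARTITION BY A, B ORDER BY C DESC, ID DESC) AS rn
    FROM ZEE
) s
WHERE rn = 1
ORDER BY ID
"""
rows = conn.execute(query).fetchall()
print(rows)
```

If the table has no column that reflects insertion order, there is no reliable way to recover "last inserted" from the data itself.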
Is there a way to select the sum of a column and other columns at the same time in SQL?
Example:
SELECT sum(a) as car,b,c FROM toys
How about:
select sum(a) over() as car, b, c from toys;
or, if a per-group total is required:
select sum(a) over(partition by b) as car, b, c from toys;
Try adding GROUP BY:
SELECT sum(a) as car,b,c FROM toys
GROUP BY b, c
Since you do not give much context, I assume you mean one of the following:
SELECT (SELECT SUM(a) FROM Toys) as 'car', b, c FROM Toys;
or
SELECT SUM(a) as Car, b, c FROM Toys GROUP BY b, c;
SELECT b,
c,
(SELECT sum(a) FROM toys) as 'car'
FROM toys
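The answers above return different shapes of result, which a quick sketch makes visible. It assumes SQLite 3.25+ via Python's sqlite3, and the three rows in toys are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toys (a INTEGER, b TEXT, c TEXT)")
conn.executemany(
    "INSERT INTO toys VALUES (?, ?, ?)",
    [(1, "x", "red"), (2, "x", "red"), (3, "y", "blue")],
)

# Window function: every row is kept and carries the grand total.
window_rows = conn.execute(
    "SELECT sum(a) OVER () AS car, b, c FROM toys ORDER BY b, c"
).fetchall()

# GROUP BY: one row per (b, c) combination with that group's total.
grouped_rows = conn.execute(
    "SELECT sum(a) AS car, b, c FROM toys GROUP BY b, c ORDER BY b, c"
).fetchall()

print(window_rows)
print(grouped_rows)
```

The scalar subquery variant, (SELECT sum(a) FROM toys), gives the same values as the sum(a) OVER () version here.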