SQL Select highest value where duplicate ID - sql

I have a SQL table with the columns:
ID, DayNumber, Mfm, value
432080971, 1, 15, 57
432080971, 1, 15, 59
432080978, 3, 15, 54
432080978, 4, 45, 54
Unfortunately, there are some duplicated entries. I'd like a SELECT statement that returns the table without duplicate combinations of ID, DayNumber and Mfm; where a combination appears more than once, it should return the row with the higher value.
So, as an example the above entries would be returned as:
ID, DayNumber, Mfm, value
432080971, 1, 15, 59
432080978, 3, 15, 54
432080978, 4, 45, 54
I'm using SQL Server Management Studio with SQL Server 2012.

select top (1)
with ties ID, DayNumber, Mfm, value
from
table
order by row_number() over (partition by
ID, DayNumber, Mfm
order by value desc)

Use a GROUP BY clause with the aggregate function MAX to get the highest value in each group. Something like this:
Select ID, DayNumber, Mfm, Max(value) As value
From your_table
Group By ID, DayNumber, Mfm

select
ID, DayNumber, Mfm, max(value)
from table
group by ID, DayNumber, Mfm
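To check the GROUP BY / MAX approach against the sample rows from the question, here is a minimal sketch. It runs the query in an in-memory SQLite database through Python's sqlite3 module purely as a test harness; the table name readings is made up for the example, and on SQL Server the same SELECT should behave the same way.

```python
import sqlite3

# In-memory SQLite database, used only as a convenient harness.
# "readings" is a hypothetical table name; the question does not give one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (ID INTEGER, DayNumber INTEGER, Mfm INTEGER, value INTEGER)"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?)",
    [
        (432080971, 1, 15, 57),
        (432080971, 1, 15, 59),  # duplicate key, higher value
        (432080978, 3, 15, 54),
        (432080978, 4, 45, 54),
    ],
)

# One row per (ID, DayNumber, Mfm), keeping the highest value of each group.
deduped = conn.execute(
    """
    SELECT ID, DayNumber, Mfm, MAX(value) AS value
    FROM readings
    GROUP BY ID, DayNumber, Mfm
    ORDER BY ID, DayNumber, Mfm
    """
).fetchall()

for row in deduped:
    print(row)
```

This works here because every selected column is either part of the key or the value being maximised. If the table had further columns that must come from the same row as the maximum, the row_number() approach would be the safer choice.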

Related

Given an id, can you get the "index" of that items id in a sorted query

EDIT:
I am using Android versions whose bundled SQLite is older than 3.25, so I cannot use ROW_NUMBER.
Given the following table:
id, date(long)
1, 100
2, 25
3, 5
4, 50
If I query for items sorted:
select * from items order by date
id, date
3, 5
2, 25
4, 50
1, 100
If I have id 4, can I query to get its index in the sorted list, in this case index 3?
With the ROW_NUMBER() window function (ordering by date ascending, as in the question):
SELECT rn
FROM (
SELECT *, ROW_NUMBER() OVER (ORDER BY date) rn
FROM items
)
WHERE id = 4;
An alternative for versions of SQLite prior to 3.25.0, which don't support window functions, is to count the rows that sort before the target row:
SELECT COUNT(*) + 1 rn
FROM items
WHERE date < (SELECT date FROM items WHERE id = 4);
See the demo.
You can use ROW_NUMBER() to get the location of the row according to a custom ordering. The query can look like:
select rn
from (
select t.*, row_number() over(order by date) as rn from t
) x
where id = 4
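For the case where window functions are not available, the counting idea can be tried out as follows. This is only a sketch: it uses Python's sqlite3 with an in-memory database as a harness, orders by date ascending to match the question, and the helper name sorted_index is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, date INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(1, 100), (2, 25), (3, 5), (4, 50)],
)


def sorted_index(conn, item_id):
    """Return the 1-based position of item_id when items are ordered by date ascending.

    Counts the rows whose date is strictly smaller and adds one, so no
    window function is needed. Hypothetical helper, not part of any library.
    """
    row = conn.execute(
        """
        SELECT COUNT(*) + 1
        FROM items
        WHERE date < (SELECT date FROM items WHERE id = ?)
        """,
        (item_id,),
    ).fetchone()
    return row[0]


index_of_4 = sorted_index(conn, 4)
print(index_of_4)
```

Two caveats: rows that share the same date all get the same index (the behaviour is that of RANK rather than ROW_NUMBER), and an id that does not exist yields 1 because the subquery returns NULL and no rows are counted.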

Difference in the output of query, using rank() and CTE

My first query looks like:
select trans.* from
( select
acc_num,
acc_type,
trans_amount,
load_date,
rank() over(partition by acc_num order by load_date) as rk
from monetary
where rat_code = 123
) trans
where trans.rk =1;
My second query looks like:
with a as (
select *,
row_number() over(partition by acc_num order by load_date) as rn
from monetary
where rat_code = 123 )
select
acc_num,
acc_type,
trans_amount,
load_date
from a
where rn =1;
Can anyone please help me? I am getting a different number of records in the two cases,
even though the queries look the same.
It's because rank() and row_number() treat ties differently.
The example below shows both, computed with partition by Accno order by dt:
Accno, dt, rank_col, rownum_col
100, 2-jun-2022, 1, 1
100, 2-jun-2022, 1, 2
100, 1-jul-2022, 3, 3
54, 2-jun-2022, 1, 1
54, 1-jul-2022, 2, 2
row_number() gives every row in a partition a distinct number, so rownum_col = 1 matches exactly one row per account. rank() gives rows that tie on the ordering column the same rank and then skips the following ranks, so rank_col = 1 can match several rows per account. In the example above, rank_col = 1 returns 3 rows but rownum_col = 1 returns only 2.
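The difference is easy to reproduce on a small table. The sketch below assumes a cut-down monetary table with only acc_num and load_date, where account 100 has two rows sharing its earliest load_date; it runs in an in-memory SQLite database (3.25 or later, for window functions) through Python's sqlite3, used here only as a harness.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical version of the "monetary" table from the question.
conn.execute("CREATE TABLE monetary (acc_num INTEGER, load_date TEXT)")
conn.executemany(
    "INSERT INTO monetary VALUES (?, ?)",
    [
        (100, "2022-06-02"),
        (100, "2022-06-02"),  # ties with the row above on load_date
        (100, "2022-07-01"),
        (54, "2022-06-02"),
        (54, "2022-07-01"),
    ],
)

# Both numbering schemes over the same partition and ordering.
numbered = """
    SELECT acc_num, load_date,
           RANK()       OVER (PARTITION BY acc_num ORDER BY load_date) AS rk,
           ROW_NUMBER() OVER (PARTITION BY acc_num ORDER BY load_date) AS rn
    FROM monetary
"""

rank_rows = conn.execute(
    f"SELECT acc_num, load_date FROM ({numbered}) WHERE rk = 1"
).fetchall()
rownum_rows = conn.execute(
    f"SELECT acc_num, load_date FROM ({numbered}) WHERE rn = 1"
).fetchall()

# rank_rows keeps both tied rows of account 100; rownum_rows keeps one per account.
print(len(rank_rows), len(rownum_rows))
```

Which of the tied rows row_number() labels as 1 is not defined unless the ORDER BY includes a tie-breaker column, so adding one is worth considering if the choice matters.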

Nth result in BigQuery Group By

I have a derived table like:
id, desc, total, account
1, one, 10, a
1, one, 9, b
1, one, 3, c
2, two, 27, c
I can do a simple
select id, desc, sum(total) as total from mytable group by id
but I want to add the equivalent first(account), first(total), second(account), second(total) to the output so it'd be:
id, desc, total, first_account, first_account_total, second_account, second_account_total
1, one, 22, a, 10, b, 9
2, two, 27, c, 27, null, 0
Any pointers?
Thanks in advance!
Below is for BigQuery Standard SQL
#standardSQL
SELECT id, `desc`, total,
arr[OFFSET(0)].account AS first_account,
arr[OFFSET(0)].total AS first_account_total,
arr[SAFE_OFFSET(1)].account AS second_account,
arr[SAFE_OFFSET(1)].total AS second_account_total
FROM (
SELECT id, `desc`, SUM(total) total,
ARRAY_AGG(STRUCT(account, total) ORDER BY total DESC LIMIT 2) arr
FROM `project.dataset.table`
GROUP BY id, `desc`
)
When more than the first two bins are required, I would use the pattern below, which avoids repeating lines like arr[SAFE_OFFSET(1)].total AS second_account_total for every bin
#standardSQL
SELECT * FROM (SELECT NULL id, '' `desc`, NULL total, '' first_account, NULL first_account_total, '' second_account, NULL second_account_total) WHERE FALSE
UNION ALL
SELECT id, `desc`, total, arr[OFFSET(0)].*, arr[SAFE_OFFSET(1)].*
FROM (
SELECT id, `desc`, SUM(total) total,
ARRAY_AGG(STRUCT(account, total) ORDER BY total DESC LIMIT 2) arr
FROM `project.dataset.table`
GROUP BY id, `desc`
)
In the above, the first SELECT sets the layout of the output while returning no rows at all because of WHERE FALSE, so I don't need to explicitly unpack the struct's elements and provide aliases
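The BigQuery answers above cannot be run outside BigQuery, so as a rough cross-check of the expected output here is a different, portable technique: ROW_NUMBER() plus conditional aggregation instead of ARRAY_AGG with OFFSET. It is a sketch only, run in an in-memory SQLite database (3.25 or later) via Python's sqlite3; the alias grand_total is made up to avoid clashing with the total column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "desc" is a reserved word, so it is quoted as an identifier.
conn.execute(
    'CREATE TABLE mytable (id INTEGER, "desc" TEXT, total INTEGER, account TEXT)'
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [(1, "one", 10, "a"), (1, "one", 9, "b"), (1, "one", 3, "c"), (2, "two", 27, "c")],
)

# Number the accounts of each id by total (largest first), then pivot
# positions 1 and 2 into columns with conditional aggregation.
top_two = conn.execute(
    """
    SELECT id, "desc", SUM(total) AS grand_total,
           MAX(CASE WHEN rn = 1 THEN account END) AS first_account,
           MAX(CASE WHEN rn = 1 THEN total END)   AS first_account_total,
           MAX(CASE WHEN rn = 2 THEN account END) AS second_account,
           MAX(CASE WHEN rn = 2 THEN total END)   AS second_account_total
    FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY total DESC) AS rn
        FROM mytable
    )
    GROUP BY id, "desc"
    ORDER BY id
    """
).fetchall()

for row in top_two:
    print(row)
```

For id 1 the sum is 10 + 9 + 3 = 22. A missing second account comes back as NULL for both columns; wrapping the total in COALESCE(..., 0) would give the 0 shown in the question's desired output.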

Differences between row in google big query

I'm currently attempting to calculate differences between rows in Google BigQuery. I actually have a working query.
SELECT
id, record_time, level, lag,
(level - lag) as diff
FROM (
SELECT
id, record_time, level,
LAG(level) OVER (ORDER BY id, record_time) as lag
FROM (
SELECT
*
FROM
TABLE_QUERY(MY_TABLES))
ORDER BY
1, 2 ASC
)
GROUP BY 1, 2, 3, 4
ORDER BY 1, 2 ASC
But I'm working with big data, and sometimes I get a memory limit warning that prevents the query from executing. So I would like to understand why I can't use an optimized query like the one below. I think it would let me work with more records without hitting the memory limit warning.
SELECT
id, record_time, level,
level - LAG(level, 1) OVER (ORDER BY id, record_time) as diff
FROM (
SELECT
*
FROM
TABLE_QUERY(MY_TABLES))
ORDER BY
1, 2 ASC
The expression level - LAG(level, 1) OVER (ORDER BY id, record_time) as diff makes the query fail with the error
Missing function in Analytic Expression
in BigQuery.
I also tried wrapping the function in parentheses, but that does not work either.
Thanks for helping me!
It works fine for me. Maybe you forgot to enable standard SQL? Here is an example:
WITH Input AS (
SELECT 1 AS id, TIMESTAMP '2017-10-17 00:00:00' AS record_time, 2 AS level UNION ALL
SELECT 2, TIMESTAMP '2017-10-16 00:00:00', 3 UNION ALL
SELECT 1, TIMESTAMP '2017-10-16 00:00:00', 4
)
SELECT
id, record_time, level, lag,
(level - lag) as diff
FROM (
SELECT
id, record_time, level,
LAG(level) OVER (ORDER BY id, record_time) as lag
FROM Input
)
GROUP BY 1, 2, 3, 4
ORDER BY 1, 2 ASC;
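The single-expression form from the question, level - LAG(level) OVER (...), can be checked on the same three sample rows. The sketch below uses an in-memory SQLite database (3.25 or later) through Python's sqlite3 rather than BigQuery, so it only illustrates the window expression itself; the table name levels and the extra diff_within_id column are additions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "levels" is a hypothetical table name standing in for the Input CTE.
conn.execute("CREATE TABLE levels (id INTEGER, record_time TEXT, level INTEGER)")
conn.executemany(
    "INSERT INTO levels VALUES (?, ?, ?)",
    [
        (1, "2017-10-17 00:00:00", 2),
        (2, "2017-10-16 00:00:00", 3),
        (1, "2017-10-16 00:00:00", 4),
    ],
)

rows = conn.execute(
    """
    SELECT id, record_time, level,
           -- difference to the previous row over the whole result set
           level - LAG(level) OVER (ORDER BY id, record_time) AS diff,
           -- difference to the previous row of the same id only
           level - LAG(level) OVER (PARTITION BY id ORDER BY record_time) AS diff_within_id
    FROM levels
    ORDER BY id, record_time
    """
).fetchall()

diffs = [r[3] for r in rows]
diffs_within_id = [r[4] for r in rows]
print(diffs, diffs_within_id)
```

Without PARTITION BY id, the first row of id 2 is compared with the last row of id 1. If differences should never cross ids, the partitioned form is probably what is wanted, and it also lets the engine process each id independently.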

creating a pseudo linked list in sql

I have a table that has the following columns
table: route
columns: id, location, order_id
and it has values such as
id, location, order_id
1, London, 12
2, Amsterdam, 102
3, Berlin, 90
5, Paris, 19
Is it possible to write a SQL SELECT statement in Postgres that returns each row along with the id of the row with the next highest order_id? So I want something like:
id, location, order_id, next_id
1, London, 12, 5
2, Amsterdam, 102, NULL
3, Berlin, 90, 2
5, Paris, 19, 3
Thanks
select
id,
location,
order_id,
lag(id) over (order by order_id desc) as next_id
from your_table
Creating testbed first:
CREATE TABLE route (id int4, location varchar(20), order_id int4);
INSERT INTO route VALUES
(1,'London',12),(2,'Amsterdam',102),
(3,'Berlin',90),(5,'Paris',19);
The query:
WITH ranked AS (
SELECT id,location,order_id,rank() OVER (ORDER BY order_id)
FROM route)
SELECT b.id, b.location, b.order_id, n.id
FROM ranked b
LEFT JOIN ranked n ON b.rank+1=n.rank
ORDER BY b.id;
You can read more on the window functions in the documentation.
Yes, with a correlated subquery:
select *,
(select n.id from routes_table n where n.order_id > main.order_id order by n.order_id limit 1) as next_id
from routes_table main
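To compare the window-function and subquery approaches on the sample data, here is a small sketch. It uses LEAD over ascending order_id (equivalent to the LAG over descending order shown earlier) and a correlated subquery with LIMIT 1. It runs in an in-memory SQLite database (3.25 or later) through Python's sqlite3 as a stand-in for Postgres, which accepts the same two statements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE route (id INTEGER, location TEXT, order_id INTEGER)")
conn.executemany(
    "INSERT INTO route VALUES (?, ?, ?)",
    [(1, "London", 12), (2, "Amsterdam", 102), (3, "Berlin", 90), (5, "Paris", 19)],
)

# Window-function version: the id of the following row in order_id order.
with_lead = conn.execute(
    """
    SELECT id, location, order_id,
           LEAD(id) OVER (ORDER BY order_id) AS next_id
    FROM route
    ORDER BY id
    """
).fetchall()

# Correlated-subquery version: the id with the smallest larger order_id.
with_subquery = conn.execute(
    """
    SELECT id, location, order_id,
           (SELECT n.id
            FROM route AS n
            WHERE n.order_id > main.order_id
            ORDER BY n.order_id
            LIMIT 1) AS next_id
    FROM route AS main
    ORDER BY id
    """
).fetchall()

for row in with_lead:
    print(row)
```

Both return NULL for the row with the highest order_id. They can differ when order_id values repeat: LEAD moves to the next row even if it has the same order_id, while the subquery skips to a strictly larger one.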