How to calculate average omitting duplicates per partition in SQL? - sql

I want to calculate the average item count accounting for sub-partitions in each partition.
Sample Data:
id session item_count random_field_1
1 weoifn2 3 A
1 weoifn2 3 B
1 iuboiwe 2 K
2 oeino33 5 R
2 vergeeg 8 C
2 feooinn 9 P
2 feooinn 9 M
Logic:
id = 1: (3 + 2) / 2 = 2.5
id = 2: (5 + 8 + 9) / 3 = 7.33
Expected Output:
id avg
1 2.5
2 7.33
My Query:
SELECT
id
, AVG(item_count) OVER (PARTITION BY id) AS avg
FROM my_table
However, I believe this will factor in duplicates twice, which is unintended. How can I fix my query to only consider one item_count value per session?

Consider below approach
select id, avg(item_count) as avg
from (
select distinct id, session, item_count
from your_table
)
group by id
if applied to sample data in your question - output is

SELECT id, AVG(item_count) OVER (PARTITION BY id) AS avg
FROM (
SELECT
id
, CASE
WHEN ROW_NUMBER OVER (PARTITION BY id) = 1
THEN item_count
ELSE NULL
END
AS item_count
FROM my_table
)

Related

How do i select all columns, plus the result of the sum

I have this select:
"Select * from table" that return:
Id
Value
1
1
1
1
2
10
2
10
My goal is create a sum from each Value group by id like this:
Id
Value
Sum
1
1
2
1
1
2
2
10
20
2
10
20
I Have tried ways like:
SELECT Id,Value, (SELECT SUM(Value) FROM Table V2 WHERE V2.Id= V.Id GROUP BY IDRNC ) FROM Table v;
But the is not grouping by id.
Id
Value
Sum
1
1
1
1
1
1
2
10
10
2
10
10
Aggregation aggregates rows, reducing the number of records in the output. In this case you want to apply the result of a computation to each of your records, task carried out by the corresponding window function.
SELECT table.*, SUM(Value) OVER(PARTITION BY Id) AS sum_
FROM table
Check the demo here.
Your attempt looks correct.
Can you try the below query :
It works for me :
SELECT Id, Value,
(SELECT SUM(Value) FROM Table V2 WHERE V2.Id= V.Id GROUP BY ID) as sum
FROM Table v;
You can do it using inner join to join with selection grouped by id :
select t.*, sum
from _table t
inner join (
select id, sum(Value) as sum
from _table
group by id
) as s on s.id = t.id
You can check it here
Your select is ok if you adjust it just a little:
SELECT Id,Value, (SELECT SUM(Value) FROM Table V2 WHERE V2.Id= V.Id GROUP BY IDRNC ) FROM Table v;
GROUP BY IDRNC is a mistake and should be GROUP BY ID
you should give an alias to a sum column ...
subquery selecting the sum does not have to have self table alias to be compared with outer query that has one (this is not a mistake - works either way)
Test:
WITH
a_table (ID, VALUE) AS
(
Select 1, 1 From Dual Union All
Select 1, 1 From Dual Union All
Select 2, 10 From Dual Union All
Select 2, 10 From Dual
)
SELECT ID, VALUE, (SELECT SUM(VALUE) FROM a_table WHERE ID = v.ID GROUP BY ID) "ID_SUM" FROM a_table v;
ID VALUE ID_SUM
---------- ---------- ----------
1 1 2
1 1 2
2 10 20
2 10 20

How to use LIMIT to sample rows dynamically

I have a table as follows:
SampleReq
Group
ID
2
1
_001
2
1
_002
2
1
_003
1
2
_004
1
2
_005
1
2
_006
I want my query to IDs based on the column SampleReq, resulting in the following output:
Group
ID
1
_001
1
_003
2
_006
The query should pick any 2 IDs from group 1, any 1 IDs from group 2 and so on (depending on the column SampleReq).
I tried the query using LIMIT, but this gives me an error saying column names cannot be parsed to a limit.
SELECT Group, ID
FROM Table
LIMIT SampleReq
ORDER BY RAND()
One method is row_number():
select t.*
from (select t.*,
row_number() over (partition by samplereq order by random()) as seqnum
from t
) t
where seqnum <= 2 and id = 1 or
seqnum <= 1 and id = 2;

How to sample from different values in a column but only return records that are unique from another column?

I am struggling with a sampling issue using Teradata
Below is the format of the data
ID Group Rank
1 dog 1
1 cat 1
1 lion 1
1 elephant 2
2 dog 1
2 cat 1
2 lion 1
2 elephant 1
3 dog 1
3 cat 2
3 lion 1
3 elephant 1
4 dog 2
4 cat 1
4 lion 1
4 elephant 1
...
I would ideally like to return a sample number for each entry in Group but with only unique values from ID.
Below is the current query I produced but this returns duplicates for ID
SELECT ID, Group FROM Table
WHERE rank = 1
SAMPLE
WHEN group = 'dog' then 10
WHEN group = 'cat' then 10
WHEN group = 'elephant' then 5
WHEN group = 'lion' then 5
END
with cte as
(
SELECT ID, Group,
random(1,10000) as rnd -- RANDOM can't be directly used in OLAP-functions
FROM Table
WHERE rank = 1
)
SELECT ID, Group
FROM cte
QUALIFY
ROW_NUMBER() -- get one random row per ID
OVER (PARTITION BY ID
ORDER BY rnd) = 1
SAMPLE
WHEN group = 'dog' then 10
WHEN group = 'cat' then 10
WHEN group = 'elephant' then 5
WHEN group = 'lion' then 5
END
Assuming you have enough records, choose a random row for each id and then choose the appropriate numbers from that:
select t.*
from (select t.*,
row_number() over (partition by group order by seqnum) as sequm_g
from (select t.*,
row_number() over (partition by id order by random(1, 1000000))
from t
) t
where seqnum = 1
) t
where (group in ('dog', 'cat') and seqnum_g <= 10) or
(group in ('elephant', 'lion') and seqnum_g <= 5) ;
This doesn't guarantee that the groups will be big enough in the result set. But if you have enough data relative to the size of the groups, then it should work.

How to select top 2 values for each id

I have a table with values
id sales date
1 5 "2015-01-04"
1 3 "2015-01-03"
1 1 "2015-01-01"
1 1 "2015-01-01"
2 7 "2015-01-05"
2 6 "2015-01-04"
2 4 "2015-01-03"
3 11 "2015-01-08"
3 10 "2015-01-07"
3 9 "2015-01-06"
3 8 "2015-01-05"
I want to select top two values of each id as shown in desired output.
Desired output:
id sales date
1 5 "2015-01-04"
1 3 "2015-01-03"
2 7 "2015-01-05"
2 6 "2015-01-04"
3 11 "2015-01-08"
3 10 "2015-01-07"
My attempt:
can someone help me with this. Thank you in advance!
select transactions.salesperson_id, transactions.id, transactions.date
from transactions
ORDER BY transactions.salesperson_id ASC, transactions.date DESC;
This can be done using window functions:
select id, sales, "date"
from (
select id, sales, "date",
dense_rank() over (partition by id order by "date" desc) as rnk
from transactions
) t
where rnk <= 2;
If there are multiple rows on the same date this might return more than two rows for the same ID. If you don't want that, use row_number() instead of dense_rank()
row_number() will get what you want.
select * from
(select row_number() over (partition by id order by date) as rn, sales, date from transactions) t1
where t1.rn <= 2

random row from diapason (1: n) in groups sql

I need select random row from Table using groups and order, but random's row number in group should not be more then constant (for example const = 3).
What I mean:
id time x
1 10:20 1
1 11:21 9
1 16:14 4
1 08:13 8
2 01:20 2
2 21:13 0
For id=1 rows could be:
id time x
1 10:20 1
1 11:21 9
1 08:13 8
BUT not
1 16:14 4 because in order by time it's local number more than 3
for
Id= 2 - any row
WITH cte as (
SELECT *, ROW_NUMBER() OVER (partition by id ORDER BY RANNDOM()) as rn
FROM myTable
)
SELECT *
FROM cte
WHERE rn <= 3
Something like this:
SELECT distinct on (id) *
FROM (select
row_number() over (partition by id order by time ) as up_lim
from tab1) as a
WHERE row_number <= 3
ORDER by id, random() ;