Unnest only one duplicate value per row - sql

I have the following table -
ID A1 A2 A3 A4 A5 A6
1 324 243 3432 23423 342 342
2 342 242 4345 23423 324 342
I can unnest this table to give me counts of all numbers like so -
324 2
243 1
3432 1
23423 1
342 3
242 1
4345 1
23423 1
But how do I get it to count a number only once per row? For example, this is the output I am expecting:
324 2
243 1
3432 1
23423 1
342 2
242 1
4345 1
23423 1
342 is 2 because -
1) It is in the first row.
2) It appears 2 times in the second row, but I only want to count it once.

Simply use count(distinct):
select v.a, count(distinct t.id)
from t cross join lateral
(values (a1), (a2), (a3), (a4), (a5), (a6)
) v(a)
group by v.a;
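The same idea can be checked outside Postgres. A minimal sketch using Python's sqlite3 (SQLite has no LATERAL, so a UNION ALL unpivot stands in for the VALUES clause; table and column names follow the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INT, a1 INT, a2 INT, a3 INT, a4 INT, a5 INT, a6 INT);
    INSERT INTO t VALUES (1, 324, 243, 3432, 23423, 342, 342),
                         (2, 342, 242, 4345, 23423, 324, 342);
""")
# Unpivot each row into (id, value) pairs, then count DISTINCT row ids,
# so a value repeated inside one row is counted only once for that row.
counts = dict(conn.execute("""
    SELECT a, COUNT(DISTINCT id)
    FROM (
        SELECT id, a1 AS a FROM t
        UNION ALL SELECT id, a2 FROM t
        UNION ALL SELECT id, a3 FROM t
        UNION ALL SELECT id, a4 FROM t
        UNION ALL SELECT id, a5 FROM t
        UNION ALL SELECT id, a6 FROM t
    )
    GROUP BY a
""").fetchall())
print(counts)
```

Note that GROUP BY merges the two 23423 lines of the question's listing into a single 23423 with count 2, which is the same information.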

Related

What is the best way to aggregate data for the last 7, 30, 60... days in SQL

Hi, I have a table with dates and the number of views our channel had on each day:
date views
03/06/2020 5
08/06/2020 49
09/06/2020 50
10/06/2020 1
13/06/2020 1
16/06/2020 1
17/06/2020 102
23/06/2020 97
29/06/2020 98
07/07/2020 2
08/07/2020 198
12/07/2020 1
14/07/2020 168
23/07/2020 292
Now we want to see, for each calendar date, the sum over the past 7 and the past 30 days,
so the result will be
date sum_of_7d sum_of_30d
01/06/2020 0 0
02/06/2020 0 0
03/06/2020 5 5
04/06/2020 5 5
05/06/2020 5 5
06/06/2020 5 5
07/06/2020 5 5
08/06/2020 54 54
09/06/2020 104 104
10/06/2020 100 105
11/06/2020 100 105
12/06/2020 100 105
13/06/2020 101 106
14/06/2020 101 106
15/06/2020 52 106
16/06/2020 53 107
17/06/2020 105 209
18/06/2020 105 209
So I was wondering what the best SQL is to get this.
I'm working on Redshift and the actual table (not this example) includes over 40B rows.
I used to do something like this:
select dates_helper.date
     , tbl1.cnt
     , sum(tbl1.cnt) over (order by date rows between 6 preceding and current row) as sum_7d
     , sum(tbl1.cnt) over (order by date rows between 29 preceding and current row) as sum_30d
from bi_db.dates_helper
left join tbl1
  on tbl1.invite_date = dates_helper.date
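No answer is included above, but the shape sketched in the question (a calendar spine joined to the facts, then a trailing window) can be sanity-checked in pandas on a subset of the sample data. Dates and figures below come from the question; a window of 7 rows means the current day plus the 6 preceding ones:

```python
import pandas as pd

views = pd.DataFrame({
    "date": pd.to_datetime(["2020-06-03", "2020-06-08", "2020-06-09", "2020-06-10"]),
    "views": [5, 49, 50, 1],
})
# Reindex onto a full daily calendar so missing dates contribute 0,
# then take rolling sums over the trailing 7 and 30 days (current day included).
daily = (views.set_index("date")["views"]
              .reindex(pd.date_range("2020-06-01", "2020-06-12"), fill_value=0))
out = pd.DataFrame({
    "sum_of_7d": daily.rolling(7, min_periods=1).sum(),
    "sum_of_30d": daily.rolling(30, min_periods=1).sum(),
})
print(out)
```

The 7-day figures reproduce the expected output above: 54 on 2020-06-08, 104 on 2020-06-09, 100 on 2020-06-10.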

Group repeating pattern in pandas Dataframe

So I have a DataFrame with a repeating number series that I want to group like this:
Number Pattern  Value  Desired Group  Value.1
1               723    1              Max of Group
2               400    1              Max of Group
8               235    1              Max of Group
5               387    2              Max of Group
7               911    2              Max of Group
3               365    3              Max of Group
4               270    3              Max of Group
5               194    3              Max of Group
7               452    3              Max of Group
100             716    4              Max of Group
104             69     4              Max of Group
2               846    5              Max of Group
3               474    5              Max of Group
4               524    5              Max of Group
So essentially the number pattern is always monotonically increasing within each group.
Any ideas?
You can compare Number Pattern to 1 with Series.eq, take the cumulative sum with Series.cumsum, and then use GroupBy.transform with 'max':
df['Desired Group'] = df['Number Pattern'].eq(1).cumsum()
df['Value.1'] = df.groupby('Desired Group')['Value'].transform('max')
print (df)
Number Pattern Value Desired Group Value.1
0 1 723 1 723
1 2 400 1 723
2 3 235 1 723
3 1 387 2 911
4 2 911 2 911
5 1 365 3 452
6 2 270 3 452
7 3 194 3 452
8 4 452 3 452
9 1 716 4 716
10 2 69 4 716
11 1 846 5 846
12 2 474 5 846
13 3 524 5 846
For a pattern that is merely monotonically increasing (a new group starts whenever the value fails to increase), use:
df['Desired Group'] = (~df['Number Pattern'].diff().gt(0)).cumsum()
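For reference, the first variant can be run end to end on the data from the printed output above (where each group restarts at 1):

```python
import pandas as pd

df = pd.DataFrame({
    "Number Pattern": [1, 2, 3, 1, 2, 1, 2, 3, 4, 1, 2, 1, 2, 3],
    "Value": [723, 400, 235, 387, 911, 365, 270, 194, 452, 716, 69, 846, 474, 524],
})
# Each 1 marks the start of a new group; cumsum turns those starts into group ids
df["Desired Group"] = df["Number Pattern"].eq(1).cumsum()
# Broadcast each group's maximum back onto its rows
df["Value.1"] = df.groupby("Desired Group")["Value"].transform("max")
print(df)
```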

Count only original seconds with Oracle SQL

I have a table with this structure and data, with start and stop positions of an audio/video. I have to count the original seconds and discard the not original ones.
E.g.
     CUSTOMER_ID  ITEM_ID  CHAPTER_ID  START_POSITION  END_POSITION
A    123456       1        6           0               97
B    123456       1        6           97              498
C    123456       1        6           498             678
D    123456       1        6           678             1332
E    123456       1        6           1180            1190
F    123456       1        6           1190            1206
G    123456       1        6           1364            1529
H    123456       1        6           1530            1531
Original Data
Lines "E" and "F" do not represent original seconds because line "D" starts at 678 and finishes at 1332, so I need to create a new set of lines like this:
     CUSTOMER_ID  ITEM_ID  CHAPTER_ID  START_POSITION  END_POSITION
A    123456       1        6           0               97
B    123456       1        6           97              498
C    123456       1        6           498             678
D    123456       1        6           678             1332
E    123456       1        6           1364            1529
F    123456       1        6           1530            1531
New Result Set
Can you help me with this?
If I am following you correctly, you can use not exists to filter out rows whose range is contained in the range of another row:
select t.*
from mytable t
where not exists (
select 1
from mytable t1
where
t1.customer_id = t.customer_id
and t1.start_position < t.start_position
and t1.end_position > t.end_position
)
You can use a self join as follows:
select distinct t.*
from your_table t
left join your_table tt
  on t.customer_id = tt.customer_id
 and t.item_id = tt.item_id
 and t.chapter_id = tt.chapter_id
 and t.rowid <> tt.rowid
 and t.start_position between tt.start_position and tt.end_position - 1
where tt.rowid is null
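The NOT EXISTS version can be smoke-tested with Python's sqlite3 (table and column names below are stand-ins for the Oracle ones):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE plays
    (customer_id TEXT, item_id INT, chapter_id INT, start_pos INT, end_pos INT)""")
conn.executemany("INSERT INTO plays VALUES (?, ?, ?, ?, ?)", [
    ("123456", 1, 6, 0, 97),      ("123456", 1, 6, 97, 498),
    ("123456", 1, 6, 498, 678),   ("123456", 1, 6, 678, 1332),
    ("123456", 1, 6, 1180, 1190), ("123456", 1, 6, 1190, 1206),
    ("123456", 1, 6, 1364, 1529), ("123456", 1, 6, 1530, 1531),
])
# Drop any row whose interval lies strictly inside another row's interval
kept = conn.execute("""
    SELECT start_pos, end_pos
    FROM plays t
    WHERE NOT EXISTS (
        SELECT 1 FROM plays t1
        WHERE t1.customer_id = t.customer_id
          AND t1.start_pos < t.start_pos
          AND t1.end_pos > t.end_pos
    )
    ORDER BY start_pos
""").fetchall()
print(kept)
```

Note the filter only removes intervals fully contained in another row; partially overlapping ranges (say 600 to 700 against 678 to 1332) would need interval merging instead.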

Running Sum between dates on group by clause

I have the following query which shows the first 3 columns:
select
'Position Date' = todaypositiondate,
'Realized LtD SEK' = round(sum(realizedccy * spotsek), 0),
'Delta Realized SEK' = round(sum(realizedccy * spotsek) -
(SELECT sum(realizedccy*spotsek)
FROM t1
WHERE todaypositiondate = a.todaypositiondate - 1
GROUP BY todaypositiondate), 0)
FROM
t1 AS a
GROUP BY
todaypositiondate
ORDER BY
todaypositiondate DESC
Table:
Date | Realized | Delta | 5 day avg delta
-------------------------------------------------------------------
2016-09-08 | 696 981 323 | 90 526 | 336 611
2016-09-07 | 696 890 797 | 833 731 | 335 232
2016-09-06 | 696 057 066 | 85 576 | 84 467
2016-09-05 | 695 971 490 | 86 390 | 83 086
2016-09-04 | 695 885 100 | 81 434 | 80 849
2016-09-03 | 695 803 666 | 81 434 | 78 806
2016-09-02 | 695 722 231 | 79 679 | 74 500
2016-09-01 | 695 642 553 | 75 305 |
2016-08-31 | 695 567 248 | 68 515 |
How do I create the 5d average of delta realized?
Based on delta I tried the following but it did not work:
select
todaypositiondate,
'30d avg delta' = (select sum(realizedccy * spotsek)
from T1
where todaypositiondate between a.todaypositiondate and a.todaypositiondate -5
group by todaypositiondate)
from
T1 as a
group by
todaypositiondate
order by
todaypositiondate desc
Do not use single quotes for column names. Only use single quotes for string and date literals.
I would write this as:
with t as (
      select todaypositiondate as PositionDate,
             round(sum(realizedccy * spotsek), 0) as RealizedSEK
      from t1
      group by todaypositiondate
     )
select a.*,
       (a.RealizedSEK - a_prev.RealizedSEK) as diff_1,
       (a.RealizedSEK - a_prev5.RealizedSEK) / 5 as avg_diff_5
from t a outer apply
     (select top 1 p.*
      from t p
      where p.PositionDate = a.PositionDate - 1
     ) a_prev outer apply
     (select top 1 p.*
      from t p
      where p.PositionDate = a.PositionDate - 5
     ) a_prev5;
Note that the 5 day average difference is the most recent value minus the value from 5 days earlier, divided by 5.
I already have that kind of formula when I calculate the delta between 2 dates.
It's like this:
Select todaypositiondate,
'D_RealizedSEK' = round(sum(realizedccy*spotsek) -
(SELECT sum(realizedccy*spotsek)
FROM T1
WHERE todaypositiondate = a.todaypositiondate - 1
GROUP BY todaypositiondate),0)
FROM T1 AS a
group by todaypositiondate
Instead of adding 5 formulas and just replacing -1 with -2, -3..., I would like to find a way to select the sums of realizedccy from each of the previous 5 days, add them together, and divide by 5.
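On platforms with window functions (SQL Server 2012+, Redshift), LAG avoids both the correlated subqueries and the five copied formulas. A sketch, runnable via Python's sqlite3 (window functions need SQLite 3.25+) against the realized figures from the table above; the question's own 5-day-average column seems to use a slightly different window, so only the one-day delta is checked against it:

```python
import sqlite3  # window functions require SQLite >= 3.25

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE daily (d TEXT, realized INTEGER);
    INSERT INTO daily VALUES
     ('2016-08-31', 695567248), ('2016-09-01', 695642553),
     ('2016-09-02', 695722231), ('2016-09-03', 695803666),
     ('2016-09-04', 695885100), ('2016-09-05', 695971490),
     ('2016-09-06', 696057066), ('2016-09-07', 696890797),
     ('2016-09-08', 696981323);
""")
# Day-over-day delta, and the 5-day average delta as a telescoping sum:
# the average of 5 daily deltas equals (value - value 5 days earlier) / 5.
rows = conn.execute("""
    SELECT d,
           realized - LAG(realized, 1) OVER (ORDER BY d) AS delta_1d,
           (realized - LAG(realized, 5) OVER (ORDER BY d)) / 5.0 AS avg_delta_5d
    FROM daily
    ORDER BY d
""").fetchall()
print(rows)
```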

How to update duplicate rows in a column to new values

I will explain my problem briefly.
I have duplicate rino values like below (this rino is actually the serial number in the front end):
chqid rino branchid
----- ---- --------
876 6 2
14 6 2
18 10 2
828 10 2
829 11 2
19 11 2
830 12 2
20 12 2
78 40 2
1092 40 2
1094 41 2
79 41 2
413 43 2
1103 43 2
82 44 2
1104 44 2
1105 45 2
83 45 2
91 46 2
1106 46 2
Here in my case I don't want to delete these duplicate rino values; instead I planned to update each rino having the max date (the date column is not shown in the sample above) to the next rino number.
What I mean exactly is: if I sort the above result according to max(date), I get
chqid rino branchid
----- ---- --------
876 6 2
828 10 2
19 11 2
830 12 2
1092 40 2
79 41 2
413 43 2
82 44 2
83 45 2
1106 46 2
(NOTE : total number of duplicate rows are 10 in branchid=2)
The last entered rino in the table for branchid = 2 is 245.
So I just want to update those 10 rows (column rino) with numbers from 246 to 255 (245 + 10, as in: select lastno + generate_series(1,10) nos from tab where cola=4 and branchid = 2 and vrid=20;)
Expected Output:
chqid rino branchid
----- ---- --------
876 246 2
828 247 2
19 248 2
830 249 2
1092 250 2
79 251 2
413 252 2
82 253 2
83 254 2
1106 255 2
using postgresql
Finally I found a solution; I am using dynamic SQL to solve my issue:
do
$$
declare
    arow record;
begin
    for arow in
        select chqid, rino, branchid
        from (
            select chqid, rino::int, vrid, branchid,
                   row_number() over (partition by rino::int) as rn
            from tab
            where vrid = 20
              and branchid = 2
        ) t
        where rn > 1
    loop
        execute format('
            update tab
            set rino = (select max(rino::int) + 1
                        from gtab19
                        where acyrid = 4 and branchid = 2 and vrid = 20)
            where chqid = %s
        ', arow.chqid);
    end loop;
end;
$$;
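The row-by-row loop works, but the same renumbering can also be expressed as a single set-based UPDATE. A hedged sketch with Python's sqlite3 (needs 3.25+ for window functions), loading only a small sample of the data; chqid stands in here for the unspecified date column as the ordering key:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab (chqid INTEGER PRIMARY KEY, rino INTEGER, branchid INTEGER);
    INSERT INTO tab VALUES (876, 6, 2), (14, 6, 2), (18, 10, 2), (828, 10, 2);
""")
last_no = 245  # last entered rino for the branch
# One row per rino keeps its number; every extra duplicate gets the next
# fresh number after last_no, in a single statement instead of a loop.
conn.execute(f"""
    WITH renum AS (
        SELECT chqid,
               {last_no} + ROW_NUMBER() OVER (ORDER BY chqid) AS new_rino
        FROM (
            SELECT chqid,
                   ROW_NUMBER() OVER (PARTITION BY rino ORDER BY chqid) AS rn
            FROM tab
            WHERE branchid = 2
        )
        WHERE rn > 1
    )
    UPDATE tab
    SET rino = (SELECT new_rino FROM renum WHERE renum.chqid = tab.chqid)
    WHERE chqid IN (SELECT chqid FROM renum)
""")
rows = dict(conn.execute("SELECT chqid, rino FROM tab").fetchall())
print(rows)
```

In Postgres itself the same shape works as an UPDATE ... FROM over the row_number subquery, so no DO block or per-row EXECUTE is needed.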