Preparing SQL Data for a CDF Plot - sql

I have the following SQL that I would like to use to plot a cumulative distribution (CDF), but I can't seem to get the data right.
Sample Data:

token_Length  Frequency
------------  ---------
1             6436
2             7489
3             3724
4             2440
5             667
6             396
7             264
8             215
9             117
10            90
11            61
12            29
13            69
15            40
18            45
How do I prepare this data to create a CDF plot in Looker, so that it looks like this:
token_Length  Frequency  cume_dist
------------  ---------  -----------
1             6436       0.291459107
2             7489       0.630604112
3             3724       0.799248256
4             2440       0.909745494
5             667        0.939951091
6             396        0.95788425
7             264        0.969839688
8             215        0.979576125
9             117        0.984874558
10            90         0.988950276
11            61         0.991712707
12            29         0.993025994
13            69         0.996150711
15            40         0.997962141
18            45         1
I have tried a measure as follows:
measure: cume_dist {
  type: number
  sql: cume_dist() over (order by ${token_length} ASC);;
}
This generates SQL as:
SELECT
  token_length,
  COUNT(*) AS "count",
  cume_dist() over (order by (token_length) ASC) AS "cume_dist"
FROM string_facts
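The catch is that cume_dist() here ranks the 15 aggregated rows rather than weighting them by their counts, so it returns i/15 for row i instead of the frequency-weighted values in the desired output. A minimal sketch of the SQL that produces those values (assuming the same string_facts table and token_length column; exact syntax varies by dialect):

SELECT
  token_length,
  COUNT(*) AS frequency,
  -- running total of counts divided by the grand total;
  -- the CAST avoids integer division in dialects that truncate
  CAST(SUM(COUNT(*)) OVER (ORDER BY token_length ASC) AS float)
    / SUM(COUNT(*)) OVER () AS cume_dist
FROM string_facts
GROUP BY token_length
ORDER BY token_length

As a sanity check, the first row works out to 6436 / 22082 ≈ 0.2915, matching the desired output above.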

Related

Group repeating pattern in pandas Dataframe

So I have a DataFrame with a repeating number series that I want to group like this:
Number Pattern  Value  Desired Group  Value.1
--------------  -----  -------------  ------------
1               723    1              Max of Group
2               400    1              Max of Group
8               235    1              Max of Group
5               387    2              Max of Group
7               911    2              Max of Group
3               365    3              Max of Group
4               270    3              Max of Group
5               194    3              Max of Group
7               452    3              Max of Group
100             716    4              Max of Group
104             69     4              Max of Group
2               846    5              Max of Group
3               474    5              Max of Group
4               524    5              Max of Group
So essentially the number pattern is always monotonically increasing within each group.
Any ideas?
You can compare Number Pattern to 1, take the cumulative sum with Series.cumsum, and then use GroupBy.transform with 'max':
# a new group starts whenever Number Pattern equals 1
df['Desired Group'] = df['Number Pattern'].eq(1).cumsum()
# broadcast each group's maximum Value back onto its rows
df['Value.1'] = df.groupby('Desired Group')['Value'].transform('max')
print (df)
Number Pattern Value Desired Group Value.1
0 1 723 1 723
1 2 400 1 723
2 3 235 1 723
3 1 387 2 911
4 2 911 2 911
5 1 365 3 452
6 2 270 3 452
7 3 194 3 452
8 4 452 3 452
9 1 716 4 716
10 2 69 4 716
11 1 846 5 846
12 2 474 5 846
13 3 524 5 846
If the pattern only increases within each group without restarting at 1 (as in your sample), use:
df['Desired Group'] = (~df['Number Pattern'].diff().gt(0)).cumsum()

Oracle - Group By Creating Duplicate Rows

I have a query that looks like this:
select nvl(trim(a.code), 'Blanks') as Ward,
       count(b.apcasekey) as UNSP,
       count(c.apcasekey) as GRAPH,
       count(d.apcasekey) as "ANI/PIG",
       (count(b.apcasekey) + count(c.apcasekey) + count(d.apcasekey)) as "TOTAL ACTIVE",
       count(a.apcasekey) as "TOTAL OPEN"
from (etc...)
group by a.code
order by Ward
The reason I have nvl(trim(a.code), 'Blanks') as Ward is that sometimes a.code is a blank string and sometimes it is null.
The problem is that I can't use Ward in the GROUP BY clause or I get the error:
Ward: Invalid Identifier
I can only use a.code, so I get two rows for 'Blanks', as below:
1 Blanks 7 0 0 7 7
2 Blanks 23 1 1 25 30
3 W01 75 4 0 79 91
4 W02 62 1 0 63 72
5 W03 140 2 0 142 162
6 W04 6 1 0 7 7
7 W05 46 0 1 47 48
8 W06 322 46 1 369 425
9 W07 91 0 1 92 108
10 W08 93 2 0 95 104
11 W09 28 1 0 29 30
12 W10 25 0 0 25 28
What I need is for the rows with 'Blanks' to be combined into one row. Little help?
Thanks.
You cannot use the alias in the GROUP BY, but you can use the expression that builds the value:
GROUP BY nvl(trim(a.code), 'Blanks')
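Applied to your query, that becomes (a sketch; the FROM clause stays elided as in the question):

select nvl(trim(a.code), 'Blanks') as Ward,
       count(b.apcasekey) as UNSP,
       count(c.apcasekey) as GRAPH,
       count(d.apcasekey) as "ANI/PIG",
       (count(b.apcasekey) + count(c.apcasekey) + count(d.apcasekey)) as "TOTAL ACTIVE",
       count(a.apcasekey) as "TOTAL OPEN"
from (etc...)
group by nvl(trim(a.code), 'Blanks')
order by Ward

Since the GROUP BY expression now matches the SELECT expression exactly, the blank strings and the nulls land in the same 'Blanks' group and collapse into one row.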

SQL access Group By and Sum

I'm having an issue with MS Access with my SQL statement.
I have companies and teams, and they each have a balance of money.
(Company1 can have teams 1, 2, 3, 4 and Company2 can have teams 1, 2, 3, 4, 5, though Company1's team1 is not the same as Company2's team1!)
I also have a ton of entries, each corresponding to a seller.
I want to sum the balances for each company and each team, no matter which seller it is.
I currently have:
SELECT Company, Team, Sum(Balance) AS tot_balance
FROM Retro2014
GROUP BY Company, Team
But the amounts are 5 to 10 times bigger than they should be when I sum them manually. (And with around 1200 sellers, I can't check them all by hand.)
EDIT: What I want is something like this:
Company Team tot_balance
------- ---- -----------
Company1 Team1 1000
Company1 Team2 1530
Company1 Team3 120
Company1 Team4 500
Company2 Team1 800
Company2 Team2 750
Company2 Team3 420
Company2 Team4 820
Company2 Team5 120
... ... ...
EDIT2:
I have these values now:

Company Team tot_balance REAL_Balance
------- ---- ----------- ------------
10      90       2534.60       269.06
10      92        813.30       120.89
10      95       1384.75       210.89
10      96        950.72       142.43
10      97       3957.03       789.92
10      98       4822.34      1128.71
EDIT3: And the source values are these:

COMPANY TEAM SELLER BALANCE
------- ---- ------ -------
10      50          123.65
10      90   L07630 245.06
10      90   L07630      4
10      90   L07630      8
10      90   L07630      4
10      90   L07630      8
10      92   L96420  32.93
10      92   L96420  87.96
10      95           35.74
10      95              16
10      95               4
10      95              12
10      95              12
10      95          131.15
10      96   L04771   65.5
10      96   L04771     12
10      96   L04771      8
10      96   L04771      8
10      96   L04771  48.93
10      97   L94605  61.93
10      97   L94605      4
10      97   L94605      8
10      97   L94605 233.76
10      97   L94605 344.97
10      97   L94605  90.33
10      97   L94605  38.93
10      97   L94605      4
10      97   L94605      4
10      98   L95652  42.51
10      98   L95652  34.75
10      98   L95652 549.26
10      98   L95652 320.36
10      98   L95652     20
10      98   L95652 112.58
10      98   L95652  41.25
10      98   L95652      8
Thanks,
Phil
As long as the table doesn't contain multiple entries with the same Company, Team, and Balance, your SQL should work just fine.
But given the issue you describe, I presume there are more columns than shown, so the same information can appear on more than one row, which results in an inflated sum. Here is what I would suggest:
SELECT Company, Team, Sum(Balance) AS tot_balance
FROM (
    SELECT Company, Team, Balance
    FROM Retro2014
    GROUP BY Company, Team, Balance
) AS b
GROUP BY Company, Team
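To confirm that duplicate rows really are the cause, a quick diagnostic (a sketch, assuming the Retro2014 table from the question) lists every row that occurs more than once:

SELECT Company, Team, Seller, Balance, COUNT(*) AS occurrences
FROM Retro2014
GROUP BY Company, Team, Seller, Balance
HAVING COUNT(*) > 1

One caution: the source data in EDIT3 contains legitimate repeats (team 90 has balances of 4 and 8 twice each, and its REAL_Balance of 269.06 includes all of them), so collapsing on Balance would under-count those teams; the dedupe subquery is only safe if repeated rows are truly erroneous copies.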

How to update duplicate rows in a column to new values

I will explain my problem briefly.
I have duplicate rino values like below (rino is actually the serial number in the front end):
chqid rino branchid
----- ---- --------
876 6 2
14 6 2
18 10 2
828 10 2
829 11 2
19 11 2
830 12 2
20 12 2
78 40 2
1092 40 2
1094 41 2
79 41 2
413 43 2
1103 43 2
82 44 2
1104 44 2
1105 45 2
83 45 2
91 46 2
1106 46 2
In my case I don't want to delete these duplicate rino rows; instead I plan to update each rino having the max date (the date column is not shown in the sample above, but it exists) to the next available rino number.
What I mean exactly is: if I sort the above result according to max(date), I get:
chqid rino branchid
----- ---- --------
876 6 2
828 10 2
19 11 2
830 12 2
1092 40 2
79 41 2
413 43 2
82 44 2
83 45 2
1106 46 2
(NOTE: the total number of duplicate rows is 10 for branchid = 2.)
The last entered rino in the table for branchid = 2 is 245.
So I just want to update those 10 rows (the rino column) with numbers from 246 to 255 (245 + 10, generated like this: select lastno + generate_series(1,10) as nos from tab where cola = 4 and branchid = 2 and vrid = 20;).
Expected Output:
chqid rino branchid
----- ---- --------
876 246 2
828 247 2
19 248 2
830 249 2
1092 250 2
79 251 2
413 252 2
82 253 2
83 254 2
1106 255 2
Using PostgreSQL.
Finally I found a solution; I am using dynamic SQL to solve the issue:
do
$$
declare
    arow record;
begin
    -- loop over every duplicate rino (every row after the first per rino value)
    for arow in
        select chqid, rino, branchid
        from (
            select chqid, rino::int, vrid, branchid,
                   row_number() over (partition by rino::int) as rn
            from tab
            where vrid = 20
              and branchid = 2
        ) t
        where rn > 1
    loop
        -- assign the next free serial number: current max(rino) + 1
        execute format('
            update tab
            set rino = (select max(rino::int) + 1 from gtab19 where acyrid = 4 and branchid = 2 and vrid = 20)
            where chqid = %s
        ', arow.chqid);
    end loop;
end;
$$;
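For reference, the loop can also be replaced by a single set-based UPDATE (a sketch, assuming the same tab columns; row_number() stands in for the generate_series idea, and the ORDER BY should really use the date column, which isn't shown in the sample):

with dups as (
    select chqid,
           row_number() over (order by chqid) as n  -- order by your date column here
    from (
        select chqid,
               row_number() over (partition by rino::int order by chqid) as rn
        from tab
        where vrid = 20
          and branchid = 2
    ) t
    where rn > 1
)
update tab
set rino = (245 + dups.n)::text  -- 245 is the last used rino for branchid = 2
from dups
where tab.chqid = dups.chqid;

This numbers the 10 duplicate rows 246 through 255 in one statement instead of recomputing max(rino) once per row.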

Getting a larger total when converting int to float, doing division, and multiplying with int?

I have three columns, as shown below in TableA:

Student Day Shifts
------- --- ------
129 11 4
91 9 6
166 19 8
164 26 12
146 11 6
147 16 8
201 8 3
164 4 2
186 8 6
165 7 4
171 10 4
104 5 4
1834 134 67
I am writing a tvf to calculate Value of Points generated for Students as below
ALTER function Statagic(
#StartDate date
)
RETURNS TABLE
AS
RETURN
(
with src as
( select
Division=case when Shifts=0 then 0 else cast(Day as float)/cast(Shifts as float) end,*
from TableA
)
,tgt as
(select *,Points=Student*Division from src
)
select * from tgt)
When I execute the above TVF (select * from Statagic('3/16/2014')), my output is below:
129 11 4 2.75 354.75
91 9 6 1.5 136.5
166 19 8 2.375 394.25
164 26 12 2.16666666666667 355.333333333333
146 11 6 1.83333333333333 267.666666666667
147 16 8 2 294
201 8 3 2.66666666666667 536
164 4 2 2 328
186 8 6 1.33333333333333 248
165 7 4 1.75 288.75
171 10 4 2.5 427.5
104 5 4 1.25 130
1834 134 67 2 3668
Note: the last row of the table is the totals of the other rows. But when I add up the last two columns of the TVF output, the totals row does not match the sum of the rows above it; I get more.
Please help me; I am struggling to fix this bug and have tried everything.
select 354.75+136.5+394.25+355.333333333333+267.666666666667+294+536+328+248+288.75+427.5+130 = 3760.750000000000
3668 is not equal to 3760.75 (I am getting roughly 100 more).
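A sketch of why the totals disagree (an observation about the arithmetic, assuming the totals row 1834 / 134 / 67 is stored as an ordinary row in TableA): the TVF runs that row through the same formula as every other row, and the per-row Points value is not additive.

-- The totals row fed through the formula:
select 1834 * (cast(134 as float) / cast(67 as float))  -- = 1834 * 2 = 3668

-- Summing the per-row Points instead:
-- 354.75 + 136.5 + ... + 130 = 3760.75

-- sum(Student * Day / Shifts) is not the same as
-- sum(Student) * sum(Day) / sum(Shifts),
-- so the precomputed totals row can never be expected to match.

Excluding the totals row from TableA, and computing totals with SUM(Points) over the detail rows instead, would make the numbers agree.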