How the number of MFCC frames depends on the length of the file - voice-recognition

I have voice data with a length of 1.85 seconds, and I extract its features using MFCC (with the library from James Lyons). It returns 184 x 13 features. I am using a 10 millisecond frame step, a 25 millisecond frame size, and 13 coefficients from the DCT. How can it return 184 frames? I still cannot understand it, because the last frame's length is not 25 milliseconds. Is there a formula that explains how it can return 184? Thank you in advance.

Picture the signal covered by overlapping windows: each one starts 10 ms after the previous, and the last window extends past the end of the signal. If you have 184 windows, the region you cover is 183 * 10 + 25 = 1855 ms, which is approximately your 1850 ms of audio.
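If I remember the library's framing code correctly (python_speech_features pads the final frame with zeros instead of dropping it), the frame count is num_frames = 1 + ceil((signal_len - frame_len) / frame_step). A quick check in Python, assuming a hypothetical 16 kHz sample rate:

import math

sample_rate = 16000                      # assumed; not given in the question
signal_len = int(1.85 * sample_rate)     # 29600 samples
frame_len = int(0.025 * sample_rate)     # 25 ms window -> 400 samples
frame_step = int(0.010 * sample_rate)    # 10 ms step   -> 160 samples

# One frame always fits; each further step adds a frame, and the last,
# partial frame is zero-padded rather than dropped.
num_frames = 1 + math.ceil((signal_len - frame_len) / frame_step)
print(num_frames)  # 184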

Related

Creating a Nested/Loop Calculation in Vertica (?)

So maybe I'm just way over-thinking things, but is there any way to replicate a nested/loop calculation in Vertica with just SQL syntax?
Explanation -
In column AP I have remaining values per month by an attribute key; in column CHANGE_1M I have an attribution value to apply.
The goal is to fill in future AP values recursively: each new period's AP is the preceding period's AP plus that AP multiplied by the current period's CHANGE_1M (see the sample calculation below).
For reference I have 15,000 keys per period and 60 periods per year in the full data set.
Sample Calculation
Period 5 = (Period4_AP * Period5_CHANGE_1M) + Period4_AP
Period 6 = (((Period4_AP * Period5_CHANGE_1M) + Period4_AP) * Period6_CHANGE_1M) + ((Period4_AP * Period5_CHANGE_1M) + Period4_AP)
etc.
(Screenshots in the original post show the sample data and the expected results.)
Vertica does not (yet?) have the RECURSIVE WITH clause, which you would need for the recursive calculation you seem to need here.
The only possible workaround would be tedious: write (or generate, using Perl or Python, for example) as many nested queries as you need iterations.
I'll only detail this if you want to go down that path.
Long time no see - I should have returned to answer this question earlier.
I got so stuck thinking of a programmatic way to solve this issue that I completely forgot it is a math equation, and where you have math functions you have solutions.
Basically this question revolves around computing a running product down a table.
The solution is simply to use LN to turn the multiplication into a sum, and to convert back using EXP.
Snippet of the simple solve:
EXP(SUM(LN(DEGREDATION)) OVER (ORDER BY PERIOD_NUMBER ASC ROWS UNBOUNDED PRECEDING)) AS DEGREDATION_RATE
** Add a PARTITION BY clause for whatever factors/attributes you need the data stratified by.
Basically, instead of starting from the retention PX/P0, I back into it from the per-period degradation P1/P0, P2/P1, etc.
Hope this helps other lost souls - don't forget your math background, or you'll spiral into a whirlpool of self-defeat.
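As a quick sanity check of the LN/SUM/EXP identity outside the database, here it is reproduced in Python with toy factors (my own illustrative numbers, not the real data): a running sum of logs, exponentiated, equals a running product.

import math

# Toy per-period degradation factors (P1/P0, P2/P1, ...); LN requires
# strictly positive values, so zero-degradation periods would need care.
factors = [1.0000, 0.5772, 0.6071, 0.7084]

running_log_sum = 0.0
for period, f in enumerate(factors):
    running_log_sum += math.log(f)    # SUM(LN(f)) ... ROWS UNBOUNDED PRECEDING
    rate = math.exp(running_log_sum)  # EXP(...) recovers the running product
    print(period, round(rate, 4))     # period 3 prints 0.2482, matching the table below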
PERIOD_NUMBER | DEGRADATION | DEGREDATION_RATE | DEGREDATION_RATE x 100000
0             | 100.00%     | 100.00%          | 100000.00
1             | 57.72%      | 57.72%           | 57715.18
2             | 60.71%      | 35.04%           | 35036.59
3             | 70.84%      | 24.82%           | 24820.66
4             | 76.59%      | 19.01%           | 19009.17
5             | 79.29%      | 15.07%           | 15071.79
6             | 83.27%      | 12.55%           | 12550.59
7             | 82.08%      | 10.30%           | 10301.94
8             | 86.49%      | 8.91%            | 8910.59
9             | 89.60%      | 7.98%            | 7984.24
10            | 86.03%      | 6.87%            | 6868.79
11            | 86.00%      | 5.91%            | 5907.16
12            | 90.52%      | 5.35%            | 5347.00
13            | 91.89%      | 4.91%            | 4913.46
14            | 89.86%      | 4.41%            | 4414.99
15            | 91.96%      | 4.06%            | 4060.22
16            | 89.36%      | 3.63%            | 3628.28
17            | 90.63%      | 3.29%            | 3288.13
18            | 92.45%      | 3.04%            | 3039.97
19            | 94.95%      | 2.89%            | 2886.43
20            | 92.31%      | 2.66%            | 2664.40
21            | 92.11%      | 2.45%            | 2454.05
22            | 93.94%      | 2.31%            | 2305.32
23            | 89.66%      | 2.07%            | 2066.84
24            | 94.12%      | 1.95%            | 1945.26
25            | 95.83%      | 1.86%            | 1864.21
26            | 92.31%      | 1.72%            | 1720.81
27            | 96.97%      | 1.67%            | 1668.66
28            | 90.32%      | 1.51%            | 1507.18
29            | 90.00%      | 1.36%            | 1356.46
30            | 94.44%      | 1.28%            | 1281.10
31            | 94.12%      | 1.21%            | 1205.74
32            | 100.00%     | 1.21%            | 1205.74
33            | 90.91%      | 1.10%            | 1096.13
34            | 90.00%      | 0.99%            | 986.52
35            | 94.44%      | 0.93%            | 931.71
36            | 100.00%     | 0.93%            | 931.71

Plotting data from two sets with different shapes in the same plot

I am using data collected from two different instruments, which have different resolutions because of their sampling rates. For a given recording, one of the sets has >10k entries while the other has ~2.5k. They capture data over the same time interval, however, and I want to plot them on top of each other even though they differ in resolution. The minimum and maximum x of both sets are the same; one of them simply has more entries.
Simplified it could look like this:
1st set from instrument with higher sampling rate:
time(s) value
0.0 10
0.2 11
0.4 12
0.6 13
0.8 14
... ..
100 50
2nd set from instrument with lower sampling rate:
time(s) value
0 100
1 120
2 125
3 128
4 130
. ...
100 430
They are measuring different things, but I would like to display them in the same plot. How can I accomplish this?
I found the mistake. I was trying to plot both datasets using the time data from the first instrument. Of course they need to be plotted with their respective time data; I had put the first instrument's time data in the second plot by mistake.
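For anyone who lands here with the same goal, here's a minimal matplotlib sketch (all numbers made up) that plots each dataset against its own time axis, with a twin y-axis since the two instruments measure different quantities:

import numpy as np
import matplotlib.pyplot as plt

# Stand-ins for the two instruments' recordings over the same interval.
t1 = np.linspace(0, 100, 10000)   # higher sampling rate
v1 = 10 + 0.4 * t1
t2 = np.linspace(0, 100, 2500)    # lower sampling rate
v2 = 100 + 3.3 * t2

fig, ax1 = plt.subplots()
ax1.plot(t1, v1, color='tab:blue')
ax1.set_xlabel('time (s)')
ax1.set_ylabel('instrument 1 value', color='tab:blue')

ax2 = ax1.twinx()                 # second y-axis for the second quantity
ax2.plot(t2, v2, color='tab:orange')
ax2.set_ylabel('instrument 2 value', color='tab:orange')

plt.show()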

How to handle decimal numbers in solidity?

How do you handle decimal numbers in Solidity?
If you want to find a percentage of some amount and do further calculations on that number, how do you do that?
Suppose I compute 15% of 45 and then need to divide that value by 7 - how do I get the answer?
I have done research, but the answers I'm finding say that this calculation is not possible. Please help.
You have a few options. To just multiply by a percentage (but truncate to an integer result), 45 * 15 / 100 = 6 works well (i.e. 45 * 15%, truncated).
If you want to keep some more digits around, you can scale everything up by some power of 10, e.g. 4500 * 15 / 100 = 675 (i.e. 6.75 * 100).
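Since Solidity's integer division truncates, the same idea can be sketched in Python, with // standing in for Solidity's / on uints; the scale factor of 100 is just an illustrative choice:

SCALE = 100                # work in hundredths to survive truncation

amount = 45 * SCALE        # 4500, i.e. 45.00
pct = amount * 15 // 100   # 675,  i.e. 6.75 (15% of 45)
result = pct // 7          # 96,   i.e. 0.96 (true value ~0.9643)

print(result / SCALE)      # divide by SCALE only when displaying the result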

Pandas shifting uneven timeseries data

I have some irregularly stamped time series data in pandas, with timestamps and an observation at every timestamp. Irregular basically means that the timestamps are uneven: the gap between two successive timestamps is not constant.
For instance the data may look like
Timestamp Property
0 100
1 200
4 300
6 400
6 401
7 500
14 506
24 550
.....
59 700
61 750
64 800
Here the timestamp is, say, seconds elapsed since a chosen origin time. As you can see, we can have data at the same timestamp - 6 secs in this case. The underlying times are actually strictly different; it's just that one-second resolution cannot capture the difference.
Now I need to shift the timeseries data ahead, say I want to shift the entire data by 60 secs, or a minute. So the target output is
Timestamp Property
0 750
1 800
So the 0 point got matched to the 61 point and the 1 point got matched to the 64 point.
Now I can do this by writing something dirty, but I am looking to use inbuilt pandas features as much as possible. If the timeseries were regular, or evenly gapped, I could've just used the shift() function. But the fact that the series is uneven makes it a bit tricky. Any ideas from pandas experts would be welcome. I feel this must be a commonly encountered problem. Many thanks!
Edit: added a second, more elegant way to do it. I don't know what will happen if you have a timestamp at 1 and two timestamps of 61; I think it will choose the first 61 timestamp, but I'm not sure.
new_stamps = pd.Series(range(df['Timestamp'].max() + 1))    # dense grid of timestamps 0..max
shifted = pd.DataFrame(new_stamps)
shifted.columns = ['Timestamp']
merged = pd.merge(df, shifted, on='Timestamp', how='outer')  # fill in the missing timestamps
merged['Timestamp'] = merged['Timestamp'] - 60               # shift the whole grid back
merged = merged.sort_values(by='Timestamp').bfill()          # back-fill from the next observation
results = pd.merge(df, merged, on='Timestamp')               # keep only the original timestamps
[Original Post]
I can't think of an inbuilt or elegant way to do this. Posting this in case it's more elegant than your "something dirty", which I guess is unlikely. How about:
lookup_dict = {}
def assigner(row):
    lookup_dict[row['Timestamp']] = row['Property']
df.apply(assigner, axis=1)
sorted_keys = sorted(lookup_dict.keys())
df['Property_Shifted'] = None
def get_shifted_property(row, shift_amt):
    # Find the first timestamp at or after the shifted target.
    for i in sorted_keys:
        if i >= row['Timestamp'] + shift_amt:
            row['Property_Shifted'] = lookup_dict[i]
            break
    return row
df = df.apply(get_shifted_property, shift_amt=60, axis=1)
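For what it's worth, newer pandas versions (0.19+) ship pd.merge_asof, which does this kind of ordered lookup directly; here's a sketch with the question's sample data, assuming the same >= matching semantics as the loop above (pass allow_exact_matches=False for a strict >):

import pandas as pd

df = pd.DataFrame({'Timestamp': [0, 1, 4, 6, 6, 7, 14, 24, 59, 61, 64],
                   'Property':  [100, 200, 300, 400, 401, 500, 506, 550, 700, 750, 800]})

# Target times: every original timestamp shifted forward by 60 s.
targets = pd.DataFrame({'Target': df['Timestamp'] + 60})

# direction='forward' matches each target to the first observation at or
# after it; targets beyond the last observation come back as NaN.
shifted = pd.merge_asof(targets, df,
                        left_on='Target', right_on='Timestamp',
                        direction='forward')

result = pd.DataFrame({'Timestamp': df['Timestamp'].values,
                       'Property_Shifted': shifted['Property'].values})
print(result.dropna())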

Is there a way to represent a number in binary where bits have approximately uniform significance?

I'm wondering if it is possible to represent a number as a sequence of bits, each having approximately the same significance, such that if we flip one of the bits, the overall value does not change by much.
For example, we can split the sequence into groups of 4 bits, where each group represents a value from 0 to 15 and the overall value is the sum of all these values.
0110 0101 1101 1010 1011 → 6 + 5 + 13 + 10 + 11 = 45
and now flipping any bit can only incur a maximum difference of 8 in the final value.
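For concreteness, here's a small Python sketch of the decoding side of this scheme (my own illustration), confirming that no single bit flip moves the value by more than 8:

def decode(bits):
    # Value = sum of the 4-bit groups, each read as an integer 0..15.
    assert len(bits) % 4 == 0
    return sum(int(bits[i:i + 4], 2) for i in range(0, len(bits), 4))

bits = '01100101110110101011'
print(decode(bits))  # 6 + 5 + 13 + 10 + 11 = 45

# Flip each bit in turn and record how far the decoded value moves.
deltas = [abs(decode(bits[:i] + ('1' if bits[i] == '0' else '0') + bits[i + 1:])
              - decode(bits))
          for i in range(len(bits))]
print(max(deltas))   # 8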
Some drawbacks obviously exist with this approach:
values have multiple representations, with some values having more representations than others (for example, there are 39280 distinct representations for the number 38, and only 1 for the number 0);
the number of values that can be represented is greatly reduced (this representation allows for integers from 0 to 75, while 20 bits could normally represent 2^20 ≈ 1 million different integers).
Are there any resources I can find concerning this problem? I can't seem to find anything online, but maybe I'm not searching with the right keywords. What other alternatives exist to my approach? Do they improve on its disadvantages?