I have the following equation:
eq1 := 2*diff(phi(r),r)/r+diff(phi(r),$(r,2)) + psi(r)^2*phi(r);
I want to change the independent variable r to 1/z, that is, the change of variable r = 1/z. How do I write this equation using the new variable?
You want the dchange command of the PDEtools package.
eq1 := 2*diff(phi(r),r)/r+diff(phi(r),$(r,2)) + psi(r)^2*phi(r);
/ d \
2 |--- phi(r)| / 2 \
\ dr / | d | 2
eq1 := -------------- + |---- phi(r)| + psi(r) phi(r)
r | 2 |
\ dr /
PDEtools[dchange](r=1/z, eq1);
/ / 2 \\
3 / d \ 2 | / d \ 2 | d || 2
-2 z |--- phi(z)| - z |-2 z |--- phi(z)| - z |---- phi(z)|| + psi(z) phi(z)
\ dz / | \ dz / | 2 ||
\ \ dz //
simplify(%);
/ 2 \
| d | 4 2
|---- phi(z)| z + psi(z) phi(z)
| 2 |
\ dz /
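As a cross-check, the same result follows by hand from the chain rule for r = 1/z:

```latex
\frac{d}{dr} = -z^{2}\frac{d}{dz},\qquad
\frac{d\phi}{dr} = -z^{2}\,\phi'(z),\qquad
\frac{d^{2}\phi}{dr^{2}} = 2z^{3}\,\phi'(z) + z^{4}\,\phi''(z),
```

so

```latex
\frac{2}{r}\frac{d\phi}{dr} + \frac{d^{2}\phi}{dr^{2}} + \psi^{2}\phi
= -2z^{3}\phi' + 2z^{3}\phi' + z^{4}\phi'' + \psi^{2}\phi
= z^{4}\,\phi''(z) + \psi(z)^{2}\phi(z),
```

in agreement with the simplified output.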
I'm contemplating making decisions about outliers in a dataset with over 300 features. I'd like to analyse the frame without removing the data hastily. I have a frame:
| | A | B | C | D | E |
|---:|----:|----:|-----:|----:|----:|
| 0 | 100 | 99 | 1000 | 300 | 250 |
| 1 | 665 | 6 | 9 | 1 | 9 |
| 2 | 7 | 665 | 4 | 9 | 1 |
| 3 | 1 | 3 | 4 | 3 | 6 |
| 4 | 1 | 9 | 1 | 665 | 5 |
| 5 | 3 | 4 | 6 | 1 | 9 |
| 6 | 5 | 9 | 1 | 3 | 2 |
| 7 | 1 | 665 | 3 | 2 | 3 |
| 8 | 2 | 665 | 9 | 1 | 0 |
| 9 | 5 | 0 | 7 | 6 | 5 |
| 10 | 0 | 3 | 3 | 7 | 3 |
| 11 | 6 | 3 | 0 | 3 | 6 |
| 12 | 6 | 6 | 5 | 1 | 5 |
I have coded some introspection to be saved in another frame called _outliers:
Q1 = df.quantile(0.25)
Q3 = df.quantile(0.75)
IQR = (Q3 - Q1)
min_ = (Q1 - (1.5 * IQR))
max_ = (Q3 + (1.5 * IQR))
# Counts outliers in each column
_outliers = (df.le(min_) | df.ge(max_)).sum().to_frame(name="outliers")
# Gives each column's share of the total outlier count
_outliers["percent"] = (_outliers['outliers'] / _outliers['outliers'].sum()) * 100
# Shows max value in the column
_outliers["max_val"] = df[_outliers.index].max()
# Shows min value in the column
_outliers["min_val"] = df[_outliers.index].min()
# Shows median value in the column
_outliers["median"] = df[_outliers.index].median()
# Shows mean value in the column
_outliers["mean"] = df[_outliers.index].mean()
That yields:
| | outliers | percent | max_val | min_val | median | mean |
|:---|-----------:|----------:|----------:|----------:|---------:|---------:|
| A | 2 | 22.2222 | 665 | 0 | 5 | 61.6923 |
| B | 3 | 33.3333 | 665 | 0 | 6 | 164.385 |
| C | 1 | 11.1111 | 1000 | 0 | 4 | 80.9231 |
| D | 2 | 22.2222 | 665 | 1 | 3 | 77.0769 |
| E | 1 | 11.1111 | 250 | 0 | 5 | 23.3846 |
I would like to calculate the impact of the outliers on each column by calculating the mean and the median without them. I don't want to remove them to do this calculation. I suppose the best way is to add "~" to the outlier filter, but I get lost in the code... This will benefit a lot of people, as a search on removing outliers yields a lot of results. Setting aside why they sneaked into the data in the first place, I just don't think the removal decision should be made without considering the potential impact. Feel free to add other considerations (skewness, sigma, n, etc.)
As always, I'm grateful to this community!
EDIT: I added the variance and its square root, the standard deviation, with and without outliers. In some fields you might want to keep outliers and go straight into ML. At least, by inspecting your data beforehand, you'll know how much they are contributing to your results. Used with nlargest() on the outliers column, you get a quick view of which features contain the most. You could use this as a basis for filtering features by setting thresholds on variance or mean. Thanks to the contributors, I have a powerful analytics tool now. Hope it can be useful to others.
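For completeness, the "~" idea is enough on its own: invert the boolean outlier mask and aggregate what's left, without dropping any rows. A minimal sketch, rebuilding the example frame with the values transcribed from the table above:

```python
import pandas as pd

# The example frame from the question
df = pd.DataFrame({
    "A": [100, 665, 7, 1, 1, 3, 5, 1, 2, 5, 0, 6, 6],
    "B": [99, 6, 665, 3, 9, 4, 9, 665, 665, 0, 3, 3, 6],
    "C": [1000, 9, 4, 4, 1, 6, 1, 3, 9, 7, 3, 0, 5],
    "D": [300, 1, 9, 3, 665, 1, 3, 2, 1, 6, 7, 3, 1],
    "E": [250, 9, 1, 6, 5, 9, 2, 3, 3, 5, 3, 6, 5],
})

# Same IQR fences as in the question
Q1, Q3 = df.quantile(0.25), df.quantile(0.75)
IQR = Q3 - Q1
is_outlier = (df < Q1 - 1.5 * IQR) | (df > Q3 + 1.5 * IQR)

# "~" keeps the non-outliers; masked cells become NaN and are
# skipped by mean()/median(), so no rows are removed
mean_no_outliers = df[~is_outlier].mean()
median_no_outliers = df[~is_outlier].median()
```

This reproduces, for example, A's mean dropping from about 61.7 to about 3.36 once the two outliers are masked.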
Take advantage of the apply method of DataFrame.
Series generator
Just define the robust mean you want by creating a function that consumes a Series and returns a scalar, then apply it to your DataFrame.
For the IQR-based mean, here is a simple snippet:
import pandas as pd

def irq_agg(x, factor=1.5, aggregate=pd.Series.mean):
    q1, q3 = x.quantile(0.25), x.quantile(0.75)
    return aggregate(x[(q1 - factor*(q3 - q1) < x) & (x < q3 + factor*(q3 - q1))])
data.apply(irq_agg)
# A 3.363636
# B 14.200000
# C 4.333333
# D 3.363636
# E 4.500000
# dtype: float64
The same can be done to filter out based on percentiles (both side version):
def quantile_agg(x, alpha=0.05, aggregate=pd.Series.mean):
    return aggregate(x[(x.quantile(alpha/2) < x) & (x < x.quantile(1 - alpha/2))])
data.apply(quantile_agg, alpha=0.01)
# A 12.454545
# B 15.777778
# C 4.727273
# D 41.625000
# E 4.909091
# dtype: float64
Frame generator
Even better, create a function that returns a Series: apply will then build a DataFrame. This way we can compute a bunch of different means and medians at once in order to compare them. We can also reuse the Series generator functions defined above:
def analyze(x, alpha=0.05, factor=1.5):
    return pd.Series({
        "p_mean": quantile_agg(x, alpha=alpha),
        "p_median": quantile_agg(x, alpha=alpha, aggregate=pd.Series.median),
        "irq_mean": irq_agg(x, factor=factor),
        "irq_median": irq_agg(x, factor=factor, aggregate=pd.Series.median),
        "standard": x[((x - x.mean())/x.std()).abs() < 1].mean(),
        "mean": x.mean(),
        "median": x.median(),
    })
data.apply(analyze).T
# p_mean p_median irq_mean irq_median standard mean median
# A 12.454545 5.0 3.363636 3.0 11.416667 61.692308 5.0
# B 15.777778 6.0 14.200000 5.0 14.200000 164.384615 6.0
# C 4.727273 4.0 4.333333 4.0 4.333333 80.923077 4.0
# D 41.625000 4.5 3.363636 3.0 3.363636 77.076923 3.0
# E 4.909091 5.0 4.500000 5.0 4.500000 23.384615 5.0
Now you can filter out outliers in several ways and compute relevant aggregates, such as the mean or median, on the remaining data.
No comment on whether this is an appropriate method to filter out your outliers. The code below should do what you asked:
import numpy as np
import pandas as pd

q1, q3 = df.quantile([0.25, 0.75]).to_numpy()
delta = (q3 - q1) * 1.5
min_val, max_val = q1 - delta, q3 + delta
outliers = (df < min_val) | (max_val < df)
result = pd.concat(
    [
        pd.DataFrame(
            {
                "outliers": outliers.sum(),
                "percent": outliers.sum() / outliers.sum().sum() * 100,
                "max_val": max_val,
                "min_val": min_val,
            }
        ),
        df.agg(["median", "mean"]).T,
        df.mask(outliers, np.nan).agg(["median", "mean"]).T.add_suffix("_no_outliers"),
    ],
    axis=1,
)
Result:
outliers percent max_val min_val median mean median_no_outliers mean_no_outliers
A 2 15.384615 13.5 -6.5 5.0 61.692308 3.0 3.363636
B 3 23.076923 243.0 -141.0 6.0 164.384615 5.0 14.200000
C 1 7.692308 13.0 -3.0 4.0 80.923077 4.0 4.333333
D 2 15.384615 16.0 -8.0 3.0 77.076923 3.0 3.363636
E 1 7.692308 10.5 -1.5 5.0 23.384615 5.0 4.500000
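The EDIT in the question mentions adding variance and standard deviation with and without outliers; the mask(...) trick used in the answer extends to those directly. A sketch, rebuilding the example frame with the same IQR fences:

```python
import pandas as pd

df = pd.DataFrame({
    "A": [100, 665, 7, 1, 1, 3, 5, 1, 2, 5, 0, 6, 6],
    "B": [99, 6, 665, 3, 9, 4, 9, 665, 665, 0, 3, 3, 6],
    "C": [1000, 9, 4, 4, 1, 6, 1, 3, 9, 7, 3, 0, 5],
    "D": [300, 1, 9, 3, 665, 1, 3, 2, 1, 6, 7, 3, 1],
    "E": [250, 9, 1, 6, 5, 9, 2, 3, 3, 5, 3, 6, 5],
})

q1, q3 = df.quantile(0.25), df.quantile(0.75)
delta = 1.5 * (q3 - q1)
outliers = (df < q1 - delta) | (df > q3 + delta)

spread = pd.DataFrame({
    "var": df.var(),
    "std": df.std(),
    # Outlier cells become NaN and are skipped by var()/std()
    "var_no_outliers": df.mask(outliers).var(),
    "std_no_outliers": df.mask(outliers).std(),
})
```

Comparing the two columns per feature shows how much of each variance is driven by the outliers alone.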
Given the following DataFrame:
| a | b | c | d |
|---|---|---|---|
| 1 | 0 | 1 | 0 |
| 1 | 0 | 1 | 1 |
| 1 | 0 | 0 | 1 |
| 0 | 1 | 0 | 1 |
How does one efficiently construct a weighted graph, such that:
The nodes correspond to the column names;
Two vertices are connected if they both have 1's in the same line of the DataFrame
(e.g. 'a' is connected to 'c' in the first row);
The weight is equal to the number of times two vertices are connected (e.g. edge 'a'-'c' has weight 2, while 'c'-'d' has weight 1).
Here is how to manually construct this graph using SimpleWeightedGraphs.jl and GraphPlot.jl:
g = SimpleWeightedGraph(4)
add_edge!(g,1,3,2)
add_edge!(g,1,4,2)
add_edge!(g,2,4,1)
add_edge!(g,3,4,1)
nodes = ["a","b","c","d"]
gplot(g,nodelabel=nodes,edgelinewidth=[2,2,1,1])
Something like this should work assuming df is your data frame:
using LinearAlgebra, DataFrames, SimpleWeightedGraphs, GraphPlot
function gengraph(df)
g = SimpleWeightedGraph(ncol(df))
ew = Int[]
for i in 1:ncol(df), j in i+1:ncol(df)
w = dot(df[!, i], df[!, j])
if w > 0
push!(ew, w)
add_edge!(g, i, j, w)
end
end
gplot(g,nodelabel=names(df),edgelinewidth=ew)
end
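The dot products computed in the loop above are exactly the entries of AᵀA for the 0/1 data matrix, so the edge weights can be sanity-checked in one line. A quick NumPy sketch (a cross-check, not part of the Julia answer):

```python
import numpy as np

# Rows of the example DataFrame; columns a, b, c, d
A = np.array([
    [1, 0, 1, 0],
    [1, 0, 1, 1],
    [1, 0, 0, 1],
    [0, 1, 0, 1],
])

W = A.T @ A              # W[i, j] = number of rows where columns i and j are both 1
np.fill_diagonal(W, 0)   # drop self-loops
```

The off-diagonal entries of W reproduce the weights used in the manual construction: a-c and a-d have weight 2, b-d and c-d have weight 1.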
I am using the following grammar:
#JSGF V1.0;
grammar tag;
public <tag> = <tagPart> +;
<tagPart> = <digit> | <letter>;
<digit> = oh | zero | one | two | three | four | five | six | seven | eight | nine ;
<letter> = a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z ;
Everything works well unless I add weights. Running with weights:
<tagPart> = /0.8/ <digit> | /0.1/ <letter>;
I am getting the following error:
Exception in thread "main" java.lang.NullPointerException
at edu.cmu.sphinx.jsgf.JSGFGrammar.getNormalizedWeights(JSGFGrammar.java:49)
The way I am using the grammar is:
Configuration configuration = new Configuration();
configuration.setAcousticModelPath("file:/E/sphinx4-5prealpha-src/sphinx4-data/src/main/resources/edu/cmu/sphinx/models/en-us/en-us");
configuration.setDictionaryPath("file:/E/sphinx4-5prealpha-src/sphinx4-data/src/main/resources/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
configuration.setGrammarPath("file:/E/sT/src/main/resources/");
configuration.setGrammarName("tag");
configuration.setUseGrammar(true);
StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);
Sorry for the delay; this issue has just been fixed in trunk in revision 13217. Please update and try again, it should work.
I have an existing macro that populates expense data into three adjacent cells in columns K-M (for ~300 rows). I want to copy the data in these cells (same order) and paste-special the values based on the month of a short date located in column AA. The copied data needs to be pasted into the same row as the source, but into columns N-Y (headers = Jan, Feb, Mar, Apr ... Dec). Is there code to do this?
example below
Column| K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z  | AA |
Header| $ | $ | $ | J | F | M | A | M | J | J | A | S | O | N | D |FY  |Date|
Row 3 |978|540|395|-->|-->|978|540|395|   |   |   |   |   |   |   |1913|5/11|
Row 4 |841|779|120|-->|-->|-->|-->|841|779|120|   |   |   |   |   |1740|7/24|
Row 5 |682|557| 55|-->|-->|-->|-->|-->|-->|682|557| 55|   |   |   |1294|9/18|
Row 6 |985|883|578|-->|-->|-->|-->|-->|-->|-->|-->|-->|985|883|578|2446|12/2|
Try this:
Sub MoveData()
    Dim vals As Range, val As Range, colOffset As Integer
    Set vals = Range("K2:K" & Range("K2").End(xlDown).Row)
    For Each val In vals
        If val > 0 Then
            ' Month of the date in column AA (16 columns right of K)
            colOffset = VBA.Month(val.Offset(0, 16))
            val.Offset(0, colOffset) = val
            val.Offset(0, colOffset + 1) = val.Offset(0, 1)
            val.Offset(0, colOffset + 2) = val.Offset(0, 2)
        End If
    Next val
End Sub
I have a relation D:
grunt> DESCRIBE D;
D: {i: int,l: chararray}
on which a GROUP is applied:
grunt> G = group D by i;
grunt> illustrate G;
-------------------------------------
| D | i:int | l:chararray |
-------------------------------------
| | 1 | B |
| | 1 | A |
| | 2 | A |
-------------------------------------
-----------------------------------------------------------------------
| G | group:int | D:bag{:tuple(i:int,l:chararray)} |
-----------------------------------------------------------------------
| | 1 | {(1, B), (1, A)} |
| | 2 | {(2, A)} |
-----------------------------------------------------------------------
How can I store each nested bag G.D in a file named after the corresponding group, i.e. /output/1, /output/2?
I understand I can't use a store operation inside a foreach block. In fact, the following doesn't work:
grunt> foreach G { store D into '/output/' + ((chararray) group) }
The MultiStorage() option will work for you. It is available in the piggybank jar; you need to download it from this link http://www.java2s.com/Code/Jar/p/Downloadpiggybankjar.htm and add it to your classpath.
Example:
input
1,A
1,B
2,A
PigScript:
REGISTER '/tmp/piggybank.jar';
A = LOAD 'input' USING PigStorage(',') AS (i:int,l:chararray);
B = GROUP A BY i;
STORE B INTO 'output' USING org.apache.pig.piggybank.storage.MultiStorage('output', '0');
The output folder now contains two directories named 1 and 2, each holding the rows of the corresponding group.
Output:
output$ ls
1 2 _SUCCESS
Reference:
https://pig.apache.org/docs/r0.10.0/api/org/apache/pig/piggybank/storage/MultiStorage.html