GAMS - matrix row operations - gams-math

I have the following matrix in GAMS (call it a matrix x with rows
i and columns j):
I wish to transform this by summing elements in the same column from
row 2 and row 1, and then setting row 2 to 0:
If an element of the original matrix x can be called $x(i,j),$ how
can I tranform it to look like this second matrix? Any help is much
appreciated.

Actually, through trial and error, I managed it. If row 1 is entitled "1", and column 1 is entitled "1" and so on, I would do:
x("1",j)=x("1",j)+x("2",j)
x("2",j)=0

Related

How to label a whole dataset?

I have a question.I have a pandas dataframe that contains 5000 columns and 12 rows. Each row represents the signal received from an electrocardiogram lead. I want to assign 3 labels to this dataset. These 3 tags belong to the entire dataset and are not related to a specific row. How can I do this?
I have attached the picture of my dataframepandas dataframe.
and my labels are: Atrial Fibrillation:0,
right bundle branch block:1,
T Wave Change:2
I tried to assign 3 labels to a large dataset
(Not for a specific row or column)
but I didn't find a solution.
As you see, it has 12 rows and 5000 columns. each row represents 5000 data from one specific lead and overall we have 12 leads which refers to this 12 rows (I, II, III, aVR,.... V6) in my data frame. professional experts are recognised 3 label for this data frame which helps us to train a ML Model to detect different heart disease. I have 10000 data frame just like this and each one has 3 or 4 specific labels. Here is my question: How can I assign these 3 labels to this dataset that I mentioned.as I told before these labels don't refers to specific rows, in fact each data frame has 3 or 4 label for its whole. I mean, How can I assign 3 label to a whole data frame?

Pandas run function only on subset of whole Dataframe

Lets say i have Dataframe, which has 200 values, prices for products. I want to run some operation on this dataframe, like calculate average price for last 10 prices.
The way i understand it, right now pandas will go through every single row and calculate average for each row. Ie first 9 rows will be Nan, then from 10-200, it would calculate average for each row.
My issue is that i need to do a lot of these calculations and performance is an issue. For that reason, i would want to run the average only on say on last 10 values (dont need more) from all values, while i want to keep those values in the dataframe. Ie i dont want to get rid of those values or create new Dataframe.
I just essentially want to do calculation on less data, so it is faster.
Is something like that possible? Hopefully the question is clear.
Building off Chicodelarose's answer, you can achieve this in a more "pandas-like" syntax.
Defining your df as follows, we get 200 prices up to within [0, 1000).
df = pd.DataFrame((np.random.rand(200) * 1000.).round(decimals=2), columns=["price"])
The bit you're looking for, though, would the following:
def add10(n: float) -> float:
"""An exceptionally simple function to demonstrate you can set
values, too.
"""
return n + 10
df["price"].iloc[-12:] = df["price"].iloc[-12:].apply(add10)
Of course, you can also use these selections to return something else without setting values, too.
>>> df["price"].iloc[-12:].mean().round(decimals=2)
309.63 # this will, of course, be different as we're using random numbers
The primary justification for this approach lies in the use of pandas tooling. Say you want to operate over a subset of your data with multiple columns, you simply need to adjust your .apply(...) to contain an axis parameter, as follows: .apply(fn, axis=1).
This becomes much more readable the longer you spend in pandas. 🙂
Given a dataframe like the following:
Price
0 197.45
1 59.30
2 131.63
3 127.22
4 35.22
.. ...
195 73.05
196 47.73
197 107.58
198 162.31
199 195.02
[200 rows x 1 columns]
Call the following to obtain the mean over the last n rows of the dataframe:
def mean_over_n_last_rows(df, n, colname):
return df.iloc[-n:][colname].mean().round(decimals=2)
print(mean_over_n_last_rows(df, 2, "Price"))
Output:
178.67

Pandas identifying if any element is in a row

I have a data frame that is a single row of numerical values and I want to know if any of those values is greater than 2 and if so create a new column with the word 'Diff'
Col_,F_1,F_2
1,5,0
My dataframe is diff_df. Here is one thing I tried
c = diff_df >2
if c.any():
diff_df['difference']='Difference'
If I were to print c. it would be
Col_,F_1,F_2
False,True,False
I have tried c.all() and many iterations of other things. Clearly my inexperience is holding me back and google is not helping in this regards. Everything I try is either "The truth value of a Series (or Data Frame) is ambiguous use a.any(), a.all()...." Any help would be appreciated.
Since it is only one row, take the .max().max() of the dataframe. With one .max() you are going to get the .max() of each column. The second .max() takes the max of all the columns.
if diff_df.max().max() > 2: diff_df['difference']='Difference'
output:
Col_ F_1 F_2 difference
0 1 5 0 Difference
Use .loc accessor and .gt() to query and at the same time create new column and populate it
df.loc[df.gt(2).any(1), "difference"] = 'Difference'
Col_ F_1 F_2 difference
0 1 5 0 Difference
In addition to David's reponse you may also try this:
if ((df > 2).astype(int)).sum(axis=1).values[0] == 1:
df['difference']='Difference'

Need explanation on how pandas.drop is working here

I have a data frame, lets say xyz. I have written code to find out the % of null values each column possess in the dataframe. my code below:
round(100*(xyz.isnull().sum()/len(xyz.index)), 2)
let say i got following results:
abc 26.63
def 36.58
ghi 78.46
I want to drop column ghi because it has more than 70% of null values.
I achieved it using the following code:
xyz = xyz.drop(xyz.loc[:,round(100*(xyz.isnull().sum()/len(xyz.index)), 2)>70].columns, 1)
but , i did not understand how does this code works, can anyone please explain it?
the code is doing the following:
xyz.drop( [...], 1)
removes the specified elements for a given axis, either by row or by column. In this particular case, df.drop( ..., 1) means you're dropping by axis 1, i.e, column
xyz.loc[:, ... ].columns
will return a list with the column names resulting from your slicing condition
round(100*(xyz.isnull().sum()/len(xyz.index)), 2)>70
this instruction is counting the number of nulls, adding them up and normalizing by the number of rows, effectively computing the percentage of nan in each column. Then, the amount is rounded to have only 2 decimal positions and finally you return True is the number of nan is more than 70%. Hence, you get a mapping between columns and a True/False array.
Putting everything together: you're first producing a Boolean array that marks which columns have more than 70% nan, then, using .loc you use Boolean indexing to look only at the columns you want to drop ( nan % > 70%), then using .columns you recover the name of such columns, which then are used by the .drop instruction.
Hopefully this clear things up!
If you code is hard to understand , you can just check dropna with thresh, since pandas already cover this case.
df=df.dropna(axis=1,thresh=round(len(df)*0.3))

Dendrograms with SciPy

I have a dataset that I shaped according to my needs, the dataframe is as follows:
Index A B C D ..... Z
Date/Time 1 0 0 0,35 ... 1
Date/Time 0,75 1 1 1 1
The total number of rows is 8878
What I try to do is create a time-series dendrogram (Example: Whole A column will be compared to whole B column in whole time).
I am expecting an output like this:
(source: rsc.org)
I tried to construct the linkage matrix with Z = hierarchy.linkage(X, 'ward')
However, when I print the dendrogram, it just shows an empty picture.
There is no problem if a compare every time point with each other and plot, but in that way, the dendrogram becomes way too complicated to observe even in truncated form.
Is there a way to handle the data as a whole time series and compare within columns in SciPy?