Change cell color if the two values of a cell have opposite signs in PrettyTables (Julia)

How can I change the color of a cell in PrettyTables when the two values in that cell have opposite signs? My code is below; the table is attached.
names = string.(-1/1:1/4:1/1)
pretty_table(AStrings , header = ([-1,-3/4, -1/2, -1/4, 0, 1/4, 1/2, 3/4, 1]), row_names= names)

After digging through the docs:
using PrettyTables
# making some demo data
data = collect(zip(rand([-1.0,1.0],5,5),rand([-1.0,1.0],5,5)))
names = [-1, -1/2, 0, 1/2, 1]
# this is the Highlighter which makes text red when signs differ.
# signs differ if their product is negative.
hl = Highlighter((d,i,j)->d[i,j][1]*d[i,j][2] < 0, crayon"red")
Then the Highlighter is used as follows:
pretty_table(data ; header = names, row_names= names, highlighters=hl)
Colors don't come through in plain text, so the result was posted as an image.
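The product test works for the ±1.0 demo data, but it has one floating-point caveat. For pairs of very small magnitude, the product can underflow to -0.0, and `-0.0 < 0` is false, so such a pair would not be highlighted. The sketch below is written in Python rather than Julia, purely to illustrate the comparison. Both function names are hypothetical. Comparing the signs directly avoids the multiplication. In the Highlighter predicate this could be written as something like `(d[i,j][1] < 0 < d[i,j][2]) || (d[i,j][2] < 0 < d[i,j][1])`.

```python
def signs_differ_by_product(a, b):
    """The Highlighter's test: opposite signs iff the product is negative."""
    return a * b < 0


def signs_differ(a, b):
    """Compare signs directly; no multiplication, so nothing can underflow.
    Zero counts as having no sign, which matches the product test."""
    return a < 0 < b or b < 0 < a


# Ordinary values: both tests agree.
print(signs_differ_by_product(1.0, -1.0), signs_differ(1.0, -1.0))
# Tiny magnitudes: 1e-200 * -1e-200 underflows to -0.0, which is not < 0.
print(signs_differ_by_product(1e-200, -1e-200), signs_differ(1e-200, -1e-200))
```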

This goes beyond what the OP asked, but it may be useful. In addition to Dan Getz's answer, you can apply more than one highlighting rule. For example, to color pairs in which both values are positive green, in addition to the first rule, pass a tuple of Highlighters to the highlighters keyword argument.
I will use Dan's example to show you the results:
julia> hl = (
Highlighter((d,i,j)->d[i,j][1]*d[i,j][2]<0, crayon"red"),
Highlighter((d,i,j)->d[i,j][1]>0 && d[i,j][2]>0, crayon"green")
)
The result of pretty_table(data; header=names, row_names=names, highlighters=hl) would be:

Related

geom_nodelabel_repel() position for circular ggraph plot

I have a network diagram that looks like this:
I made it using ggraph and added the labels using geom_nodelabel_repel() from ggnetwork:
( ggraph_plot <- ggraph(layout) +
geom_edge_fan(aes(color = as.factor(responses), edge_width = as.factor(responses))) +
geom_node_point(aes(color = as.factor(group)), size = 10) +
geom_nodelabel_repel(aes(label = name, x=x, y=y), segment.size = 1, segment.color = "black", size = 5) +
scale_color_manual("Group", values = c("#2b83ba", "#d7191c", "#fdae61")) +
scale_edge_color_manual("Frequency of Communication", values = c("Once a week or more" = "#444444","Monthly" = "#777777",
"Once every 3 months" = "#888888", "Once a year" = "#999999"),
limits = c("Once a week or more", "Monthly", "Once every 3 months", "Once a year")) +
scale_edge_width_manual("Frequency of Communication", values = c("Once a week or more" = 3,"Monthly" = 2,
"Once every 3 months" = 1, "Once a year" = 0.25),
limits = c("Once a week or more", "Monthly", "Once every 3 months", "Once a year")) +
theme_void() +
theme(legend.text = element_text(size=16, face="bold"),
legend.title = element_text(size=16, face="bold")) )
I want to have the labels on the left side of the plot be off to the left, and the labels on the right side of the plot to be off to the right. I want to do this because the actual labels are quite long (organization names) and they get in the way of the lines in the actual plot.
How can I do this using geom_nodelabel_repel()? I've tried different combinations of box.padding and point.padding, as well as hjust and vjust. However, these apply to all labels, and there doesn't seem to be a way to subset or position specific points.
Apologies for not providing a reproducible example but I wasn't sure how to do this without compromising the identities of respondents from my survey.
There is always the manual, but effective, approach of adding separate geom_nodelabel_repel() layers for the nodes on the "left" and on the "right" of the plot. It's not elegant and probably not good coding practice, but I've done similar things myself when I couldn't find an elegant solution. It works well when the dataset is small and you aren't planning to make the same plot over and over again. Basically, it entails:
Identifying whether some property in your dataset places points on the "left" or the "right". In this case there doesn't seem to be one, so you would have to create lists of the entries on the "left" and on the "right" of your plot manually.
Using separate calls to geom_nodelabel_repel with different nudge_x values. Use any reasonable method to subset the "left" and "right" data points. You can create a new column in the dataset, or subset inline, e.g. data = subset(your.data.frame, property %in% left.list)
For example, suppose you created a column called subset.side, containing either "left" or "right", in your data frame (here: your.data.frame). Your calls to geom_nodelabel_repel might then look something like:
geom_nodelabel_repel(
  data=subset(your.data.frame, subset.side=='left'),
  aes(label=name, x=x, y=y), segment.size=1, segment.color='black', size=5,
  nudge_x=-10
) +
geom_nodelabel_repel(
  data=subset(your.data.frame, subset.side=='right'),
  aes(label=name, x=x, y=y), segment.size=1, segment.color='black', size=5,
  nudge_x=10
) +
Alternatively, you can subset on the label name itself. Say you stored those names in vectors called names.left and names.right. You can then subset as in the pseudocode below:
geom_nodelabel_repel(
  data=subset(your.data.frame, name %in% names.left),...
  nudge_x = -10, ...
) +
geom_nodelabel_repel(
  data=subset(your.data.frame, name %in% names.right),...
  nudge_x = 10, ...
)
To be fair, I have not worked with the node geoms before, so I am assuming here that the positioning of the labels will not affect the mapping (as it would not with other geoms).

Pandas manipulation: matching data from other columns to one column, applied uniquely to all rows

I have a model that predicts 10 words for a particular course, in order of likelihood. I'd like the first 5 of those words that appear in the course's description.
This is the format of the data:
course_name | course_title | course_description | predicted_word_10 | predicted_word_9 | predicted_word_8 | predicted_word_7 | predicted_word_6 | predicted_word_5 | predicted_word_4 | predicted_word_3 | predicted_word_2 | predicted_word_1
Xmath 32 | Precalculus | Polynomial and rational functions, exponential... | directed | scholars | approach | build | african | different | visual | cultures | placed | global
Xphilos 2 | Morality | Introduction to ethical and political philosop... | make | presentation | weekly | european | ways | general | range | questions | liberal | speakers
My idea is, for each row, to start iterating from predicted_word_1 until I find the first 5 words that are in the description. I'd like to save those words, in the order they appear, into additional columns description_word_1 ... description_word_5. (If fewer than 5 predicted words appear in the description, I plan to return NaN in the corresponding columns.)
To clarify with an example: if the course_description of a course is 'Polynomial and rational functions, exponential and logarithmic functions, trigonometry and trigonometric functions. Complex numbers, fundamental theorem of algebra, mathematical induction, binomial theorem, series, and sequences. ' and its first few predicted words are irrelevantword1, induction, exponential, logarithmic, irrelevantword2, polynomial, algebra...
I would want to return induction, exponential, logarithmic, polynomial, algebra for that course, in that order, and do the same for the rest of the courses.
My attempt was to define an apply function that will take in a row and iterate from the first predicted word until it finds the first 5 that are in the description, but the part I am unable to figure out is how to create these additional columns that have the correct words for each course. This code will currently only keep the words for one course for all the rows.
def find_top_description_words(row):
    print(row['course_title'])
    description_words_index=1
    for i in range(num_words_per_course):
        description = row.loc['course_description']
        word_i = row.loc['predicted_word_' + str(i+1)]
        if (word_i in description) & (description_words_index <=5) :
            print(description_words_index)
            row['description_word_' + str(description_words_index)] = word_i
            description_words_index += 1
df.apply(find_top_description_words,axis=1)
The end goal of this data manipulation is to keep the top 10 predicted words from the model and the top 5 predicted words in the description so the dataframe would look like:
course_name course_title course_description top_description_word_1 ... top_description_word_5 predicted_word_1 ... predicted_word_10
Any pointers would be appreciated. Thank you!
If I understand correctly:
First, collect the predicted words of each row into a list (10 per course):
pred_words_lists = df.apply(lambda x: list(x.iloc[3:].dropna())[::-1], axis=1)
Note that each row now holds a list of its predicted words, in order: the first non-empty predicted word is in the first position, the second in the second position, and so on.
Now let's create a new DataFrame:
pred_words_df = pd.DataFrame(pred_words_lists.tolist())
pred_words_df.columns = df.columns[:2:-1]
And the final DataFrame:
final_df = df[['course_name', 'course_title', 'course_description']].join(pred_words_df.iloc[:,0:11])
Hope this works.
EDIT
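def common_elements(xx, yy):
    # Map each description word to its position in the description.
    temp = pd.Series(range(0, len(xx)), index=xx)
    # Keep only the first occurrence of repeated words: reindex fails on duplicate labels.
    temp = temp[~temp.index.duplicated()]
    return list(temp.reindex(yy).sort_values().iloc[0:10].dropna().index)
pred_words_lists = df.apply(lambda x: common_elements(x.iloc[2].replace(',', '').split(), list(x.iloc[3:].dropna())), axis=1)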
Does it satisfy your requirements?
Adapted solution (OP):
import pandas as pd

def get_sorted_descriptions_words(course_description, predicted_words, k):
    description_words = course_description.replace(',', '').split()
    predicted_words_list = list(predicted_words)
    # Rank of each predicted word (0 = first column passed in); this assumes the
    # predicted_word_* columns are ordered from most to least likely.
    predicted_words = pd.Series(range(0, len(predicted_words_list)), index=predicted_words_list)
    predicted_words = predicted_words[~predicted_words.index.duplicated()]
    # Keep only the predicted words that occur in the description, ordered by rank.
    ordered_description = predicted_words.reindex(description_words).dropna().sort_values()
    ordered_description_list = pd.Series(ordered_description.index).unique()[:k]
    return ordered_description_list

k = 5
df.apply(lambda x: get_sorted_descriptions_words(x['course_description'], x.filter(regex=r'predicted_word_.*'), k), axis=1)
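Here is a minimal alternative sketch. It assumes the column names from the question (course_description, predicted_word_1 ... predicted_word_N) and treats a match as a whole-word, case-insensitive match, so "Polynomial" matches "polynomial". The helper names first_k_in_description and add_description_word_columns are hypothetical. The question's version writes into `row` inside `apply`, but the row passed to the function is a separate Series, so new labels added to it never reach `df`. This sketch instead returns one list per course and builds all the new columns from those lists in a single step.

```python
import re

import pandas as pd


def first_k_in_description(description, predicted_words, k=5):
    """Return the first k predicted words (in predicted order) that occur as
    whole words in `description`, padded with None up to length k."""
    # Lowercase and split on non-word characters so "Polynomial" and "algebra," match.
    vocab = set(re.findall(r"\w+", description.lower()))
    found = []
    for word in predicted_words:
        if not isinstance(word, str):  # skip NaN / missing predictions
            continue
        w = word.lower()
        if w in vocab and w not in found:
            found.append(w)
            if len(found) == k:
                break
    return found + [None] * (k - len(found))


def add_description_word_columns(df, k=5):
    """Return df joined with description_word_1..k columns."""
    # Order predicted_word_N columns by N so predicted_word_1 is checked first.
    pred_cols = sorted(df.filter(regex=r"^predicted_word_\d+$").columns,
                       key=lambda c: int(c.rsplit("_", 1)[1]))
    rows = [first_k_in_description(desc, preds, k)
            for desc, preds in zip(df["course_description"],
                                   df[pred_cols].itertuples(index=False))]
    new_cols = [f"description_word_{i}" for i in range(1, k + 1)]
    return df.join(pd.DataFrame(rows, columns=new_cols, index=df.index))


# Demo row built from the example in the question.
words = ["irrelevantword1", "induction", "exponential", "logarithmic",
         "irrelevantword2", "polynomial", "algebra"]
demo = pd.DataFrame({
    "course_name": ["Xmath 32"],
    "course_title": ["Precalculus"],
    "course_description": [
        "Polynomial and rational functions, exponential and logarithmic functions, "
        "trigonometry and trigonometric functions. Complex numbers, fundamental theorem "
        "of algebra, mathematical induction, binomial theorem, series, and sequences."],
    **{f"predicted_word_{i}": [w] for i, w in enumerate(words, start=1)},
})
print(add_description_word_columns(demo).filter(like="description_word_"))
```

Matching whole words, rather than using a substring test like `word_i in description`, is a deliberate choice here. It keeps a short predicted word such as "form" from matching inside "information".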

Data Selection - Finding relations between dataframe attributes

Let's say I have a DataFrame with 80 feature columns and 1 target column,
for example, a bank account table with 80 attributes per record (account) and one target column that says whether the client stays or leaves.
What steps and algorithms should I follow to select the columns that have the most impact on the target column?
There are a number of steps you can take, I'll give some examples to get you started:
A correlation coefficient, such as Pearson's r (for parametric data) or Spearman's rho (for ordinal data).
Feature importances. I like XGBoost for this, as it includes the handy xgb.ggplot.importance / xgb.plot_importance methods.
One of the many dedicated feature selection methods, such as those in scikit-learn's sklearn.feature_selection module.
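Here is a minimal sketch of the first option in Python with pandas. It assumes numeric feature columns and a 0/1 target. The data and the column names (balance, age, noise, leaves) are synthetic and purely for illustration. Pearson correlation with a 0/1 target is the point-biserial correlation. `method="spearman"` can also be passed, though pandas may need SciPy for that. Correlation only captures one-feature-at-a-time linear (or, for Spearman, monotonic) relationships, so treat this as a first filter rather than a final selection.

```python
import numpy as np
import pandas as pd


def rank_by_correlation(df, target, method="pearson"):
    """Rank feature columns by |correlation| with the target column."""
    features = df.drop(columns=[target])
    corr = features.corrwith(df[target], method=method)
    # Sort by absolute strength but keep the sign so the direction is visible.
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


# Synthetic demo: "leaves" depends strongly on balance, less on age, not at all on noise.
rng = np.random.default_rng(0)
n = 500
demo = pd.DataFrame({
    "balance": rng.normal(size=n),
    "age": rng.normal(size=n),
    "noise": rng.normal(size=n),
})
latent = demo["balance"] + 0.6 * demo["age"] + rng.normal(scale=0.5, size=n)
demo["leaves"] = (latent > 0).astype(int)

ranking = rank_by_correlation(demo, "leaves")
print(ranking)
```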
Here is one way to do it using the Pearson correlation coefficient in R (RStudio). I used it once when exploring the red_wine dataset: my target variable was quality, and I wanted to know how the rest of the columns relate to it.
The figure below shows the output of the code. Blue represents a positive relationship and red a negative one, and the closer the value is to 1 or -1, the darker the color.
library(dplyr)
library(corrplot)
c <- cor(
red_wine %>%
# first we remove unwanted columns
dplyr::select(-X) %>%
dplyr::select(-rating) %>%
mutate(
# now we translate quality to a number
quality = as.numeric(quality)
)
)
corrplot(c, method = "color", type = "lower", addCoef.col = "gray", title = "Red Wine Variables Correlations", mar=c(0,0,1,0), tl.cex = 0.7, tl.col = "black", number.cex = 0.9)

Matplotlib table: individual column width

Is there a way to specify the width of individual columns in a matplotlib table?
The first column in my table contains just 2-3 digit IDs, and I'd like this column to be smaller than the others, but I can't seem to get it to work.
Let's say I have a table like this:
import matplotlib.pyplot as plt
fig = plt.figure()
table_ax = fig.add_subplot(1,1,1)
table_content = [["1", "Daisy", "ill"],
["2", "Topsy", "healthy"]]
table_header = ('ID', 'Name','Status')
the_table = table_ax.table(cellText=table_content, loc='center', colLabels=table_header, cellLoc='left')
fig.show()
(Never mind the weird cropping, it doesn't happen in my real table.)
What I've tried is this:
prop = the_table.properties()
cells = prop['child_artists']
for cell in cells:
    text = cell.get_text()
    if text == "ID":
        cell.set_width(0.1)
    else:
        try:
            int(text)
            cell.set_width(0.1)
        except TypeError:
            pass
The above code seems to have no effect: the columns are still all equally wide. (cell.get_width() returns 0.3333333333, so I would think that width is indeed the cell width... so what am I doing wrong?)
Any help would be appreciated!
I've searched the web over and over looking for solutions to similar problems. I found some answers and used them, but I didn't find them very straightforward. By chance, I found the table method get_celld while trying out different table methods.
By using it you get a dictionary where the keys are tuples corresponding to table coordinates in terms of cell position. So by writing
cellDict=the_table.get_celld()
cellDict[(0,0)].set_width(0.1)
you will simply address the upper-left cell. Looping over rows or columns is then easy.
A bit late answer, but hopefully others may be helped.
For completeness: the column header cells are (0,0) ... (0,n-1), and the row header cells are (1,-1) ... (n,-1).
---------------------------------------------
| ColumnHeader (0,0) | ColumnHeader (0,1) |
---------------------------------------------
rowHeader (1,-1) | Value (1,0) | Value (1,1) |
--------------------------------------------
rowHeader (2,-1) | Value (2,0) | Value (2,1) |
--------------------------------------------
The code:
for key, cell in the_table.get_celld().items():
    print(str(key[0]) + ", " + str(key[1]) + "\t" + str(cell.get_text()))
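Here is a minimal sketch of the get_celld approach applied to the question's table. It assumes a headless Agg backend, and set_column_width is a hypothetical helper name. All cells in a column share the same column index in their key, header included, so narrowing a column means setting the width on each of those cells. The other columns keep their default width of 1/3, so the table as a whole becomes narrower.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def set_column_width(table, col, width):
    """Set the width of every cell whose get_celld() key has column index `col`."""
    for (row, c), cell in table.get_celld().items():
        if c == col:
            cell.set_width(width)


fig, ax = plt.subplots()
ax.axis("off")
the_table = ax.table(cellText=[["1", "Daisy", "ill"], ["2", "Topsy", "healthy"]],
                     colLabels=("ID", "Name", "Status"), loc="center", cellLoc="left")
set_column_width(the_table, 0, 0.1)  # narrow the ID column, header row 0 included
fig.canvas.draw()  # lay out the table with the new widths
```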
Condition text=="ID" is always False, since cell.get_text() returns a Text object rather than a string:
for cell in cells:
    text = cell.get_text()
    print(text, text == "ID")  # <==== here
    if text == "ID":
        cell.set_width(0.1)
    else:
        try:
            int(text)
            cell.set_width(0.1)
        except TypeError:
            pass
On the other hand, addressing the cells directly works: try cells[0].set_width(0.5).
EDIT: Text objects have a get_text() method themselves, so getting down to the string of a cell can be done like this:
text = cell.get_text().get_text()  # yup, looks weird
if text == "ID":
    cell.set_width(0.1)
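Here is a minimal sketch that uses the string comparison to find a column by its header text instead of hard-coding its index. It assumes the question's table, a headless Agg backend, and a hypothetical helper named find_header_column. With colLabels set, the header cells are in row 0 of get_celld().

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def find_header_column(table, label):
    """Return the column index whose header cell (row 0) shows `label`, else None."""
    for (row, col), cell in table.get_celld().items():
        # cell.get_text() is a matplotlib.text.Text; its own get_text() is the str.
        if row == 0 and cell.get_text().get_text() == label:
            return col
    return None


fig, ax = plt.subplots()
ax.axis("off")
the_table = ax.table(cellText=[["1", "Daisy", "ill"], ["2", "Topsy", "healthy"]],
                     colLabels=("ID", "Name", "Status"), loc="center", cellLoc="left")

# Narrow whichever column is headed "ID".
id_col = find_header_column(the_table, "ID")
for (row, col), cell in the_table.get_celld().items():
    if col == id_col:
        cell.set_width(0.1)
```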

Convert a Dynamic[] construct to a numerical list

I have been trying to put together something that allows me to extract points from a ListPlot in order to use them in further computations. My current approach is to select points with a Locator[]. This works fine for displaying points, but I cannot figure out how to extract numerical values from a construct with head Dynamic[]. Below is a self-contained example. By dragging the gray locator, you should be able to select points (indicated by the pink locator and stored in q, a list of two elements). This is the second line below the plot. Now I would like to pass q[[2]] to a function, or perhaps simply display it. However, Mathematica treats q as a single entity with head Dynamic, and thus taking the second part is impossible (hence the error message). Can anyone shed light on how to convert q into a regular list?
EuclideanDistanceMod[p1_List, p2_List, fac_: {1, 1}] /;
  Length[p1] == Length[p2] :=
 Plus @@ (fac.MapThread[Abs[#1 - #2]^2 &, {p1, p2}]) // Sqrt;
test1 = {{1.`, 6.340196001221532`}, {1.`,
13.78779876355869`}, {1.045`, 6.2634018978377295`}, {1.045`,
13.754947081416544`}, {1.09`, 6.178367702583522`}, {1.09`,
13.72055251752498`}, {1.135`, 1.8183153704413153`}, {1.135`,
6.082497198000075`}, {1.135`, 13.684582525399742`}, {1.18`,
1.6809452373465104`}, {1.18`, 5.971583107298081`}, {1.18`,
13.646996905469383`}, {1.225`, 1.9480537697339537`}, {1.225`,
5.838386922625636`}, {1.225`, 13.607746407088161`}, {1.27`,
2.1183174369679234`}, {1.27`, 5.669799095595362`}, {1.27`,
13.566771130126131`}, {1.315`, 2.2572975468163463`}, {1.315`,
5.444014254828522`}, {1.315`, 13.523998701347882`}, {1.36`,
2.380307009155079`}, {1.36`, 5.153024664297602`}, {1.36`,
13.479342200528283`}, {1.405`, 2.4941312539733285`}, {1.405`,
4.861423833512566`}, {1.405`, 13.432697814928654`}, {1.45`,
2.6028066447609426`}, {1.45`, 4.619367407525507`}, {1.45`,
13.383942212133244`}};
DynamicModule[{p = {1.2, 10}, q = {1.3, 11}},
 q := Dynamic@
   First@test1[[
     Ordering[{#, EuclideanDistanceMod[p, #, {1, .1}]} & /@ test1,
      1, #1[[2]] < #2[[2]] &]]];
 Grid[{{Show[{ListPlot[test1, Frame -> True, ImageSize -> 300],
      Graphics@Locator[Dynamic[p]],
      Graphics@
       Locator[q, Appearance -> {Small},
        Background -> Pink]}]}, {Dynamic@p}, {q}, {q[[2]]}}]]
There are several ways to extract values from a dynamic expression. What you probably want is Setting, which resolves all dynamic values into their values at the time Setting is evaluated.
In[75]:= Slider[Dynamic[x]] (* evaluate then move the slider *)
In[76]:= FullForm[Dynamic[x]]
Out[76]//FullForm= Dynamic[x]
In[77]:= FullForm[Setting[Dynamic[x]]]
Out[77]//FullForm= 0.384`
Here's a slightly more complicated example:
DynamicModule[{x},
{Dynamic[x], Slider[Dynamic[x]],
Button["Set y to the current value of x", y = Setting[Dynamic[x]]]}
]
If you evaluate the above expression, move the slider and then click the button, the current value of x as set by the slider is assigned to y. If you then move the slider again, the value of y doesn't change until you update it again by clicking the button.
Instead of assigning to a variable, you can of course paste values into the notebook, call a function, export a file, etc.
After a little more research, it appears that the answer hinges on the fact that Dynamic[] is a wrapper for updating and displaying an expression. Any computation that you want updated dynamically must be placed inside the wrapper. For instance, instead of something like q = Dynamic[p] + 1, use something like Dynamic[q = p + 1; q]. For my example, where I wanted to split q into two parts, here's the updated code:
DynamicModule[{p = {1.2, 10}, q = {1.3, 11}, qq, q1, q2},
 q := Dynamic[
   qq = First@
     test1[[Ordering[{#, EuclideanDistanceMod[p, #, {1, .1}]} & /@
         test1, 1, #1[[2]] < #2[[2]] &]]];
   {q1, q2} = qq;
   qq
   ];
 Grid[{{Show[{ListPlot[test1, Frame -> True, ImageSize -> 300],
      Graphics@Locator[Dynamic[p]],
      Graphics@
       Locator[q, Appearance -> {Small},
        Background -> Pink]}]}, {Dynamic@p}, {Dynamic@q}, {Dynamic@q1}}]]
If I am still missing something, or if there's a cleaner way to do this, I welcome any suggestions...