Scrapy - Grab all product details - scrapy

I need to grab all Product Details (with green tickmarks) from this page: https://sourceforge.net/software/product/Budget-Maestro/
divs = response.xpath("//section[@class='row psp-section m-section-comm-details m-section-emphasized grey']/div[@class='list-outer column']/div")
product_details = ""
for div in divs:
    detail = div.xpath("./h3/text()").extract_first().strip() + ":"
    if detail != "Company Information:":
        divs2 = div.xpath(".//div[@class='list']/div")
        for div2 in divs2:
            dd = [val for val in div2.xpath("./text()").extract() if val.strip('\n').strip().strip('\n')]
            for d in dd:
                detail = detail + d + ","
        detail = detail.strip(",")
        product_details = product_details + detail + "|"
product_details = product_details.strip("|")
But some of the features it returns still contain \n. I'm also sure there must be a smarter and shorter way to do this.

If you need data only from "Product Details", check this:
In [6]: response.css("section.m-section-comm-details div.list svg").xpath('.//following-sibling::text()').extract()
Out[6]:
[u' SaaS\n ',
u' Windows\n ',
u' Live Online ',
u' In Person ',
u' Online ',
u' Business Hours ']

Use this:
divs = [div.strip() for div in response.xpath('//*[contains(@class, "has-feature")]/text()').extract() if div.strip()]
Now divs is:
[u'Accounts Payable', u'Accounts Receivable', u'Cash Management', u'General Ledger', u'Payroll', u'Project Accounting', u'"What If" Scenarios', u'Balance Sheet', u'Capital Asset Planning', u'Cash Management', u'Consolidation / Roll-Up', u'Forecasting', u'General Ledger', u'Income Statements', u'Multi-Company', u'Multi-Department / Project', u'Profit / Loss Statement', u'Project Budgeting', u'Run Rate Tracking', u'Version Control',u'"What If" Scenarios', u'Balance Sheet', u'Cash Management', u'Consolidation / Roll-Up', u'Forecasting', u'General Ledger', u'Income Statements', u'Profit / Loss Statement']
I hope this is all you want. Now iterate over this list and apply your logic :)
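To get back to the "Heading:item,item|Heading:item" string that the question builds, the cleaned text nodes can be grouped per heading before joining. Here is a minimal sketch in plain Python, assuming the headings and raw text nodes have already been extracted with one of the selectors above. The `raw_sections` data, the section names, and the `format_details` helper are made up for illustration and are not taken from the page.

```python
def clean(values):
    """Strip surrounding whitespace/newlines from each text node and drop empty ones."""
    return [v.strip() for v in values if v.strip()]


def format_details(sections, skip=("Company Information",)):
    """Join sections as 'Heading:item,item|Heading:item', skipping unwanted headings."""
    parts = []
    for heading, values in sections.items():
        heading = heading.strip()
        if heading in skip:
            continue
        items = clean(values)
        if items:
            parts.append(heading + ":" + ",".join(items))
    return "|".join(parts)


# Hypothetical input, shaped like raw extract() output that still carries whitespace.
raw_sections = {
    "Platforms Supported": [" SaaS\n ", "\n", " Windows\n "],
    "Training": [" Live Online ", " In Person "],
    "Company Information": [" Example Corp\n "],
}
product_details = format_details(raw_sections)
print(product_details)
```

Stripping each node once and filtering out the empty ones is what removes the stray \n values; joining with str.join also avoids the trailing "," and "|" that the original loop has to strip afterwards.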

Related

Gseapy: how to get gene list used for each pathway

I am running an enrichment analysis with gseapy enrichr on a list of genes.
I am using the following code:
enr_res = gseapy.enrichr(gene_list = glist[:5000],
                         organism = 'Mouse',
                         gene_sets = ['GO_Biological_Process_2021'],
                         description = 'pathway',
                         #cutoff = 0.5
                         )
The result looks like this:
enr_res.results.head(10)
My question is: how do I get the full set of genes (the rightmost column, Genes, in the result) used for the individual pathways?
If I try the following code, it only gives me the genes that are already displayed. I added some cleanup to get a list that I can then use for further analysis.
x = 'fatty acid beta-oxidation (GO:0006635)'
g_list = enr_res.results[enr_res.results.Term == x]['Genes'].to_string()
deliminator = ';'
g_list = [section + deliminator for section in g_list.split(deliminator) if section]
g_list = [s.replace(';', '') for s in g_list]
g_list = [s.replace(' ', '') for s in g_list]
g_list = [s.replace('.', '') for s in g_list]
first_gene = g_list[0:1]
first_gene = [sub[1 : ] for sub in first_gene]
g_list[0:1] = first_gene
for i in range(len(g_list)):
    g_list[i] = g_list[i].lower()
for i in range(len(g_list)):
    g_list[i] = g_list[i].capitalize()
g_list
I think my approach might be wrong, since I only get the displayed genes rather than all of them. Does somebody have an idea how to get all the genes?
pd.set_option('display.max_colwidth', 3000)
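This increases the number of characters displayed per column. Since to_string() renders the value using the display settings, the gene list is presumably no longer truncated, and this solves the problem for me. :)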
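An alternative that does not depend on the display width is to read the cell value itself instead of its printed form, for example with `enr_res.results.loc[enr_res.results.Term == x, 'Genes'].iloc[0]`, and to split that string on `;`. Below is a minimal sketch of the parsing step only, assuming the cell is a semicolon-separated string of gene symbols. The `parse_genes` helper and the sample string are illustrative, not part of gseapy.

```python
def parse_genes(cell, sep=";"):
    """Split a 'GENE1;GENE2;...' cell into a list of gene symbols.

    Each symbol is capitalized (first character upper case, rest lower case),
    which matches the result of the lower()/capitalize() loops in the question.
    """
    return [g.strip().capitalize() for g in cell.split(sep) if g.strip()]


# Illustrative value; in practice this would be the raw cell, not to_string() output.
cell = "ACOX1;HADHA;ECI1;ACADM"
g_list = parse_genes(cell)
print(g_list)
```

Working on the raw string also removes the need to strip the index prefix, spaces, and dots that to_string() adds.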

Percentage labels in pie chart with ggplot

I'm currently working on a statistics project and recently started with R. I have some problems with visualization. I found many tutorials on adding percentage labels to pie charts, but after an hour of trying I still can't get it to work. Maybe something about my data frame is different, so that these approaches don't work?
It's a data frame with collected survey answers, so I'm not allowed to publish them here. The column in question (geschäftliche_lage) is a factor with three levels ("Gut", "Befriedigend", "Schlecht"). I want to add percentage labels for each level.
I used the following code in order to create the pie chart:
dataset %>%
  ggplot(aes(x= "", fill = geschäftliche_lage)) +
  geom_bar(stat= "count", width = 1, color = "white") +
  coord_polar("y", start = 0, direction = -1) +
  scale_fill_manual(values = c("#00BA38", "#619CFF", "#F8766D")) +
  theme_void()
This code gives me the desired pie chart, but without percentage labels. As soon as I try to add percentage labels, everything gets messed up. Do you know a clean way to add percentage labels?
If you need more information or data, just let me know!
Greetings
Using mtcars as example data. Maybe this is what you are looking for:
library(ggplot2)
ggplot(mtcars, aes(x = "", fill = factor(cyl))) +
  geom_bar(stat= "count", width = 1, color = "white") +
  geom_text(aes(label = scales::percent(..count.. / sum(..count..))), stat = "count", position = position_stack(vjust = .5)) +
  coord_polar("y", start = 0, direction = -1) +
  scale_fill_manual(values = c("#00BA38", "#619CFF", "#F8766D")) +
  theme_void()
Created on 2020-05-25 by the reprex package (v0.3.0)

Connect observations (dots and lines) without using ggpaired

I created a bar chart using geom_bar with "Group" on the x-axis (Female, Male), and "Values" on the y-axis. Group is further subdivided into "Session" such that there is "Session 1" and "Session 2" for both Male and Female (i.e. four bars in total).
Since all participants took part in Session 1 and 2, I overlaid a dot plot (geom_dotplot) on each of the four bars to represent the individual data.
I am now trying to connect the observations for each participant ("PID") between Session 1 and 2. In other words, there should be lines connecting several pairs of points (one pair per participant) on the "Male" portion of the x-axis, and likewise on the "Female" portion.
I tried this with "geom_line" (below) but to no avail (instead, it created a single vertical line in the middle of "Male" and another in the middle of "Female"). I'm not too sure how to fix this.
See code below:
ggplot(data_foo, aes(x=factor(Group),y=Values, colour = factor(Session), fill = factor(Session))) +
  geom_bar(stat = "summary", fun.y = "mean", position = "dodge") +
  geom_dotplot(binaxis = "y", stackdir = "center", dotsize = 1.0, position = "dodge", fill = "black") +
  geom_line(aes(group = PID), colour="dark grey") +
  labs(title='My Data',x='Group',y='Values') +
  theme_light()
Sample data (.txt)
data_foo <- readr::read_csv("PID,Group,Session,Values
P1,F,1,14
P2,F,1,13
P3,F,1,16
P4,M,1,18
P5,F,1,20
P6,M,1,27
P7,M,1,19
P8,M,1,11
P9,F,1,28
P10,F,1,20
P11,F,1,24
P12,M,1,10
P1,F,2,26
P2,F,2,21
P3,F,2,19
P4,M,2,13
P5,F,2,26
P6,M,2,15
P7,M,2,23
P8,M,2,23
P9,F,2,30
P10,F,2,21
P11,F,2,11
P12,M,2,19")
The trouble is that you want to dodge by several groups. Your geom_line does not know how to split the Group variable by Session. Here are two ways to address this problem. Method 1 is probably the most "ggplot-y" way, and a neat way of adding another grouping without making the visualisation too crowded. For method 2 you need to change your x variable.
1) Use facets.
2) Use interaction to split Session for each Group, and define the factor levels to get the right bar order.
I have also used geom_point instead, because geom_dotplot is more of a specific type of histogram.
I would generally recommend box plots for values like these, because bars are more appropriate for specific measures such as counts.
Method 1: Facets
library(ggplot2)
ggplot(data_foo, aes(x = Session, y = Values, fill = as.character(Session))) +
  geom_bar(stat = "summary", fun.y = "mean", position = "dodge") +
  geom_line(aes(group = PID)) +
  geom_point(aes(group = PID), shape = 21, color = 'black') +
  facet_wrap(~Group)
Created on 2020-01-20 by the reprex package (v0.3.0)
Method 2: Create an interaction term in your x variable. Note that you need to order the factor levels manually.
library(dplyr)  # needed for %>% and mutate()
data_foo <- data_foo %>% mutate(new_x = factor(interaction(Group,Session), levels = c('F.1','F.2','M.1','M.2')))
ggplot(data_foo, aes(x = new_x, y = Values, fill = as.character(Session))) +
  geom_bar(stat = "summary", fun.y = "mean", position = "dodge") +
  geom_line(aes(group = PID)) +
  geom_point(aes(group = PID), shape = 21, color = 'black')
Created on 2020-01-20 by the reprex package (v0.3.0)
But the result is not very compelling visually.
I suggest applying a few visualization tips to get a more informative chart. For example, I feel that giving each PID its own color helps to track how each participant changes across the levels of the other variables. Something like:
library(ggplot2)
ggplot(data_foo, aes(x = factor(Session), y = Values, fill = factor(Session))) +
  geom_bar(stat = "summary", fun.y = "mean", position = "dodge") +
  geom_line(aes(group = factor(PID), colour=factor(PID)), size=2, alpha=0.7) +
  geom_point(aes(group = factor(PID), colour=factor(PID)), shape = 21, size=2,show.legend = F) +
  theme_bw() +
  labs(x='Session',fill='Session',colour='PID')+
  theme(legend.position="right") +
  facet_wrap(~Group)+
  scale_colour_discrete(breaks=paste0('P',1:12))
And we have the following plot:
Hope it helps.

ggplot second and third x axis for different parameters

I am trying to plot three different types of data in one ggplot.
The types are activity, temperature and chlorophyll. So I need two more x-axes,
one for temperature and one for chlorophyll.
The code I use at the moment is
ggplot(S22,
       aes(x=activity,
           y= depth_m,
           color= Temperature)) +
  labs(title="alkaline phosphatase_S22", y = "depth [m]") +
  xlab(expression(paste("activity [nmol", " ",l^-1, " ", h^-1, "]", sep=""))) +
  scale_color_manual(values = c("blue","red"),
                     breaks=c("c", "w"),
                     labels=c("1.8°C", "5.8°C"))+
  scale_y_continuous(trans=reverselog_trans(10),
                     breaks = c(0,5,10,25,50,100,250,500,1000)) +
  geom_point() +
  #scale_x_continuous(position = "top") +
  geom_path(data=Chl22d, aes(x=Fluorometer_V, y=depth_m), color='green') +
  geom_path(data=Chl22d, aes(x=T_C, y=depth_m), color= 'darkgrey')
I read that I could use the scale_x_continuous function, but that doesn't work.
Does anyone have an idea? Is this even possible with ggplot?
Thanks so much in advance

Return multiple input (Python)

In Python 3 I have a line asking for input; the program then looks up the entered words in an imported dictionary and lists all of those that appear in it. My problem is that when I run the code and enter the input, it only returns the last word I entered.
For example, the dictionary contains (AIR, AMA),
and if I input (AIR, AMA) it only returns AMA.
Any information to resolve this would be very helpful!
The dictionary:
EXCHANGE_DATA = [('AIA', 'Auckair', 1.50),
('AIR', 'Airnz', 5.60),
('AMP', 'Amp',3.22),
The Code:
import shares
a=input("Please input")
s1 = a.replace(' ' , "")
print ('Please list portfolio: ' + a)
print (" ")
n=["Code", "Name", "Price"]
print ('{0: <6}'.format(n[0]) + '{0:<20}'.format(n[1]) + '{0:>8}'.format(n[2]))
z = shares.EXCHANGE_DATA[0:][0]
b=s1.upper()
c=b.split()
f=shares.EXCHANGE_DATA
def find(f, a):
    return [s for s in f if a.upper() in s]
x= (find(f, str(a)))
toDisplay = []
a = a.split()
for i in a:
    temp = find(f, i)
    if(temp):
        toDisplay.append(temp)
for i in toDisplay:
    print ('{0: <6}'.format(i[0][0]) + '{0:<20}'.format(i[0][1]) + ("{0:>8.2f}".format(i[0][2])))
Ok, the code seems somewhat confused. Here's a simpler version that seems to do what you want:
#!/usr/bin/env python3
EXCHANGE_DATA = [('AIA', 'Auckair', 1.50),
('AIR', 'Airnz', 5.60),
('AMP', 'Amp',3.22)]
user_input = input("Please Specify Shares: ")
names = set(user_input.upper().split())
print ('Listing the following shares: ' + str(names))
print (" ")
# Print header
n=["Code", "Name", "Price"]
print ('{0: <6}{1:<20}{2:>8}'.format(n[0],n[1],n[2]))
#print data
for i in [data for data in EXCHANGE_DATA if data[0] in names]:
    print ('{0: <6}{1:<20}{2:>8}'.format(i[0],i[1],i[2]))
And here's an example of use:
➤ python3 program.py
Please Specify Shares: air amp
Listing the following shares: {'AMP', 'AIR'}
Code  Name                   Price
AIR   Airnz                    5.6
AMP   Amp                     3.22
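The code sample you provided actually does what is expected if you give it space-separated quote names. With comma-separated input such as "AIR, AMA", split() yields "AIR," (comma included), which matches nothing, so only the last name is found.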
Hope this helps.
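If the rows should appear in the order the codes were typed (a set does not preserve it), accept commas as separators, and always show two decimals, a lookup dictionary keyed by share code is one option. This is a minimal sketch, assuming the same three sample rows as above. `lookup_shares` and `format_row` are hypothetical helper names.

```python
EXCHANGE_DATA = [('AIA', 'Auckair', 1.50),
                 ('AIR', 'Airnz', 5.60),
                 ('AMP', 'Amp', 3.22)]


def lookup_shares(user_input, data=EXCHANGE_DATA):
    """Return the rows whose code appears in the input, in input order.

    Commas are treated like spaces; unknown and repeated codes are skipped.
    """
    by_code = {code: (code, name, price) for code, name, price in data}
    seen = set()
    rows = []
    for code in user_input.upper().replace(",", " ").split():
        if code in by_code and code not in seen:
            seen.add(code)
            rows.append(by_code[code])
    return rows


def format_row(code, name, price):
    """Code left-aligned in 6, name left-aligned in 20, price right-aligned in 8 (2 decimals)."""
    return '{0:<6}{1:<20}{2:>8.2f}'.format(code, name, price)


for row in lookup_shares("amp, air xyz amp"):
    print(format_row(*row))
```

Replacing commas with spaces before splitting means both "AIR AMP" and "AIR, AMP" resolve to the same codes.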