Plot secondary x_axis in ggplot - ggplot2

Dear all seniors and members,
Hope you are doing great. I have a dataset for which I would like to plot a secondary x-axis in ggplot. I have not been able to make it work for the last 4 hours. Below is my dataset.
Pathway ES NES p_value q_value Group
1 HALLMARK_HYPOXIA 0.49 2.25 0.000 0.000 Top
2 HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION 0.44 2.00 0.000 0.000 Top
3 HALLMARK_UV_RESPONSE_DN 0.45 1.98 0.000 0.000 Top
4 HALLMARK_TGF_BETA_SIGNALING 0.48 1.77 0.003 0.004 Top
5 HALLMARK_HEDGEHOG_SIGNALING 0.52 1.76 0.003 0.003 Top
6 HALLMARK_ESTROGEN_RESPONSE_EARLY 0.38 1.73 0.000 0.004 Top
7 HALLMARK_KRAS_SIGNALING_DN 0.37 1.69 0.000 0.005 Top
8 HALLMARK_INTERFERON_ALPHA_RESPONSE 0.37 1.54 0.009 0.021 Top
9 HALLMARK_TNFA_SIGNALING_VIA_NFKB 0.32 1.45 0.005 0.048 Top
10 HALLMARK_NOTCH_SIGNALING 0.42 1.42 0.070 0.059 Top
11 HALLMARK_COAGULATION 0.32 1.39 0.031 0.067 Top
12 HALLMARK_MITOTIC_SPINDLE 0.30 1.37 0.025 0.078 Top
13 HALLMARK_ANGIOGENESIS 0.40 1.37 0.088 0.074 Top
14 HALLMARK_WNT_BETA_CATENIN_SIGNALING 0.35 1.23 0.173 0.216 Top
15 HALLMARK_OXIDATIVE_PHOSPHORYLATION -0.65 -3.43 0.000 0.000 Bottom
16 HALLMARK_MYC_TARGETS_V1 -0.49 -2.56 0.000 0.000 Bottom
17 HALLMARK_E2F_TARGETS -0.45 -2.37 0.000 0.000 Bottom
18 HALLMARK_DNA_REPAIR -0.46 -2.33 0.000 0.000 Bottom
19 HALLMARK_ADIPOGENESIS -0.42 -2.26 0.000 0.000 Bottom
20 HALLMARK_FATTY_ACID_METABOLISM -0.41 -2.06 0.000 0.000 Bottom
21 HALLMARK_PEROXISOME -0.43 -2.01 0.000 0.000 Bottom
22 HALLMARK_MYC_TARGETS_V2 -0.43 -1.84 0.003 0.001 Bottom
23 HALLMARK_CHOLESTEROL_HOMEOSTASIS -0.42 -1.83 0.003 0.001 Bottom
24 HALLMARK_ALLOGRAFT_REJECTION -0.34 -1.78 0.000 0.003 Bottom
25 HALLMARK_MTORC1_SIGNALING -0.32 -1.67 0.000 0.004 Bottom
26 HALLMARK_P53_PATHWAY -0.29 -1.52 0.000 0.015 Bottom
27 HALLMARK_UV_RESPONSE_UP -0.28 -1.41 0.013 0.036 Bottom
28 HALLMARK_REACTIVE_OXYGEN_SPECIES_PATHWAY -0.35 -1.39 0.057 0.040 Bottom
29 HALLMARK_HEME_METABOLISM -0.26 -1.34 0.014 0.061 Bottom
30 HALLMARK_G2M_CHECKPOINT -0.23 -1.20 0.080 0.172 Bottom
I would like to produce a plot like the following (plot #1).
Here is my current code chunk.
ggplot(data, aes(reorder(Pathway, NES), NES, fill = Group)) +
  theme_classic() +
  geom_col() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1, size = 8),
        axis.title = element_text(face = "bold", size = 12),
        axis.text = element_text(face = "bold", size = 8),
        plot.title = element_text(hjust = 0.5)) +
  labs(x = "Pathway", y = "Normalized Enrichment Score",
       title = "2Gy_5f vs. 0Gy") +
  coord_flip()
This code produces the following plot (plot #2).
So I would like to generate the plot with a secondary x-axis showing q_value (just like the first bar plot I attached). Any help is greatly appreciated. Note: I used coord_flip, so it turns the x-axis on its side.
Kind Regards,
synat
[1]: https://i.stack.imgur.com/dBFIS.jpg
[2]: https://i.stack.imgur.com/yDbC5.jpg

Maybe you don't need a secondary axis per se to get the plot style you seek.
library(tidyverse)
ggplot(data, aes(x = NES, y = reorder(Pathway, NES), fill = Group)) +
  theme_classic() +
  geom_col() +
  geom_text(aes(x = 2.5, y = reorder(Pathway, NES), label = q_value), hjust = 0) +
  annotate("text", x = 2.5, y = length(data$Pathway) + 1, hjust = 0,
           fontface = "bold", label = "q_value") +
  coord_cartesian(xlim = c(NA, 3),
                  ylim = c(NA, length(data$Pathway) + 1),
                  clip = "off") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1, size = 8),
        axis.title = element_text(face = "bold", size = 12),
        axis.text = element_text(face = "bold", size = 8),
        plot.title = element_text(hjust = 0.5)) +
  labs(x = "Normalized Enrichment Score", y = "Pathway",
       title = "2Gy_5f vs. 0Gy")
And for future reference you can read in data in the format you pasted like so:
data <- read_table(
"
Pathway ES NES p_value q_value Group
HALLMARK_HYPOXIA 0.49 2.25 0.000 0.000 Top
HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION 0.44 2.00 0.000 0.000 Top
HALLMARK_UV_RESPONSE_DN 0.45 1.98 0.000 0.000 Top
HALLMARK_TGF_BETA_SIGNALING 0.48 1.77 0.003 0.004 Top
HALLMARK_HEDGEHOG_SIGNALING 0.52 1.76 0.003 0.003 Top
HALLMARK_ESTROGEN_RESPONSE_EARLY 0.38 1.73 0.000 0.004 Top
HALLMARK_KRAS_SIGNALING_DN 0.37 1.69 0.000 0.005 Top
HALLMARK_INTERFERON_ALPHA_RESPONSE 0.37 1.54 0.009 0.021 Top
HALLMARK_TNFA_SIGNALING_VIA_NFKB 0.32 1.45 0.005 0.048 Top
HALLMARK_NOTCH_SIGNALING 0.42 1.42 0.070 0.059 Top
HALLMARK_COAGULATION 0.32 1.39 0.031 0.067 Top
HALLMARK_MITOTIC_SPINDLE 0.30 1.37 0.025 0.078 Top
HALLMARK_ANGIOGENESIS 0.40 1.37 0.088 0.074 Top
HALLMARK_WNT_BETA_CATENIN_SIGNALING 0.35 1.23 0.173 0.216 Top
HALLMARK_OXIDATIVE_PHOSPHORYLATION -0.65 -3.43 0.000 0.000 Bottom
HALLMARK_MYC_TARGETS_V1 -0.49 -2.56 0.000 0.000 Bottom
HALLMARK_E2F_TARGETS -0.45 -2.37 0.000 0.000 Bottom
HALLMARK_DNA_REPAIR -0.46 -2.33 0.000 0.000 Bottom
HALLMARK_ADIPOGENESIS -0.42 -2.26 0.000 0.000 Bottom
HALLMARK_FATTY_ACID_METABOLISM -0.41 -2.06 0.000 0.000 Bottom
HALLMARK_PEROXISOME -0.43 -2.01 0.000 0.000 Bottom
HALLMARK_MYC_TARGETS_V2 -0.43 -1.84 0.003 0.001 Bottom
HALLMARK_CHOLESTEROL_HOMEOSTASIS -0.42 -1.83 0.003 0.001 Bottom
HALLMARK_ALLOGRAFT_REJECTION -0.34 -1.78 0.000 0.003 Bottom
HALLMARK_MTORC1_SIGNALING -0.32 -1.67 0.000 0.004 Bottom
HALLMARK_P53_PATHWAY -0.29 -1.52 0.000 0.015 Bottom
HALLMARK_UV_RESPONSE_UP -0.28 -1.41 0.013 0.036 Bottom
HALLMARK_REACTIVE_OXYGEN_SPECIES_PATHWAY -0.35 -1.39 0.057 0.040 Bottom
HALLMARK_HEME_METABOLISM -0.26 -1.34 0.014 0.061 Bottom
HALLMARK_G2M_CHECKPOINT -0.23 -1.20 0.080 0.172 Bottom")
Created on 2021-11-23 by the reprex package (v2.0.1)

Related

Apply transformation to masked dataframe

I have this matrix df.head():
0 1 2 3 4 5 6 7 8 9 ... 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857
0 0.0 0.0 0.0 0.0 0.0 0.00000 0.0 0.0 0.0 0.00000 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 0.0 0.0 0.0 0.0 0.0 0.00000 0.0 0.0 0.0 0.00000 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2 0.0 0.0 0.0 0.0 0.0 0.00000 0.0 0.0 0.0 30.88689 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
3 0.0 0.0 0.0 0.0 0.0 0.00000 0.0 0.0 0.0 0.00000 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
4 0.0 0.0 0.0 0.0 0.0 42.43819 0.0 0.0 0.0 0.00000 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
5 rows × 1858 columns
And I need to apply a transformation to it wherever a value other than 0.0 is found, dividing that value by 0.32.
So far I have the mask, like so:
normalize = 0.32
mask = (df>=0.0)
df = df.where(mask)
How do I apply such a transformation on a very large dataframe, after masking it?
You don't need a mask; just divide your dataframe by 0.32.
df / 0.32
>>> df
A B
0 0 3
1 5 0
>>> df / 0.32
A B
0 0.000 9.375
1 15.625 0.000
If you need to use a mask, try:
mask = (df.eq(0))
df.where(mask, df/0.32)
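Putting both options together as a runnable sketch (the tiny frame here is just illustrative data):
import pandas as pd

df = pd.DataFrame({"A": [0, 5], "B": [3, 0]})

# Plain division: zeros stay zero, so no mask is needed.
print(df / 0.32)

# Mask-based equivalent: keep exact zeros, divide everything else by 0.32.
print(df.where(df.eq(0), df / 0.32))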

create multi-indexed dataframe

I do not know how to create a multi-indexed df (one that has an unequal number of second-level indices). Here is a sample:
data = [{'caterpillar': [('Сatérpillar',
{'fuzz': 0.82,
'levenshtein': 0.98,
'jaro_winkler': 0.9192,
'hamming': 0.98}),
('caterpiⅼⅼaʀ',
{'fuzz': 0.73,
'levenshtein': 0.97,
'jaro_winkler': 0.9114,
'hamming': 0.97}),
('cÂteԻpillÂr',
{'fuzz': 0.73,
'levenshtein': 0.97,
'jaro_winkler': 0.881,
'hamming': 0.97})]},
{'elementis': [('elEmENtis',
{'fuzz': 1.0, 'levenshtein': 1.0, 'jaro_winkler': 1.0, 'hamming': 1.0}),
('ÊlemĚntis',
{'fuzz': 0.78,
'levenshtein': 0.98,
'jaro_winkler': 0.863,
'hamming': 0.98}),
('еlÈmÈntis',
{'fuzz': 0.67,
'levenshtein': 0.97,
'jaro_winkler': 0.8333,
'hamming': 0.97})]},
{'gibson': [('giBᏚon',
{'fuzz': 0.83,
'levenshtein': 0.99,
'jaro_winkler': 0.9319,
'hamming': 0.99}),
('ɡibsoN',
{'fuzz': 0.83,
'levenshtein': 0.99,
'jaro_winkler': 0.9206,
'hamming': 0.99}),
('giЬႽon',
{'fuzz': 0.67,
'levenshtein': 0.98,
'jaro_winkler': 0.84,
'hamming': 0.98}),
('glbsՕn',
{'fuzz': 0.67,
'levenshtein': 0.98,
'jaro_winkler': 0.8333,
'hamming': 0.98})]}]
I want a df like this (note: 'Other Name' has a differing number of values for each 'Orig Name'):
Orig Name   | Other Name  | fuzz | levenshtein | jaro_winkler | hamming
--------------------------------------------------------------------------
caterpillar   Сatérpillar   0.82   0.98          0.9192         0.98
              caterpiⅼⅼaʀ   0.73   0.97          0.9114         0.97
              cÂteԻpillÂr   0.73   0.97          0.881          0.97
gibson        giBᏚon        0.83   0.99          0.9319         0.99
              ɡibsoN        0.83   0.99          0.9206         0.99
              giЬႽon        0.67   0.98          0.84           0.98
              glbsՕn        0.67   0.98          0.8333         0.98
elementis     .........
--------------------------------------------------------------------------
I tried :
orig_name_list = [x for d in data for x, v in d.items()]
value_list = [v for d in data for x, v in d.items()]
other_names = [tup[0] for tup_list in value_list for tup in tup_list]
algos = ['fuzz', 'levenshtein', 'jaro_winkler', 'hamming']
Not sure how to proceed from there. Suggestions are appreciated.
Let's try concat:
pd.concat([pd.DataFrame([x[1]]).assign(OrigName=k, OtherName=x[0])
           for df in data for k, d in df.items() for x in d])
Output:
fuzz levenshtein jaro_winkler hamming OrigName OtherName
0 0.82 0.98 0.9192 0.98 caterpillar Сatérpillar
0 0.73 0.97 0.9114 0.97 caterpillar caterpiⅼⅼaʀ
0 0.73 0.97 0.8810 0.97 caterpillar cÂteԻpillÂr
0 1.00 1.00 1.0000 1.00 elementis elEmENtis
0 0.78 0.98 0.8630 0.98 elementis ÊlemĚntis
0 0.67 0.97 0.8333 0.97 elementis еlÈmÈntis
0 0.83 0.99 0.9319 0.99 gibson giBᏚon
0 0.83 0.99 0.9206 0.99 gibson ɡibsoN
0 0.67 0.98 0.8400 0.98 gibson giЬႽon
0 0.67 0.98 0.8333 0.98 gibson glbsՕn
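If you also want the two-level row index from the desired layout, chain set_index onto that result (same concat as above, with renamed loop variables for clarity):
out = pd.concat([pd.DataFrame([t[1]]).assign(OrigName=name, OtherName=t[0])
                 for record in data
                 for name, tuples in record.items()
                 for t in tuples])
out = out.set_index(['OrigName', 'OtherName'])   # two-level index: Orig Name / Other Name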
One way to do this is to reformat your data for JSON-record consumption via the pd.json_normalize function. Your JSON is currently not formatted in a way that can easily be stored in a dataframe:
new_data = []
for entry in data:
    new_entry = {}
    for name, matches in entry.items():
        new_entry["name"] = name
        new_entry["matches"] = []
        for match in matches:
            match[1]["match"] = match[0]
            new_entry["matches"].append(match[1])
    new_data.append(new_entry)

df = pd.json_normalize(new_data, "matches", ["name"]).set_index(["name", "match"])
print(df)
fuzz levenshtein jaro_winkler hamming
name match
caterpillar Сatérpillar 0.82 0.98 0.9192 0.98
caterpiⅼⅼaʀ 0.73 0.97 0.9114 0.97
cÂteԻpillÂr 0.73 0.97 0.8810 0.97
elementis elEmENtis 1.00 1.00 1.0000 1.00
ÊlemĚntis 0.78 0.98 0.8630 0.98
еlÈmÈntis 0.67 0.97 0.8333 0.97
gibson giBᏚon 0.83 0.99 0.9319 0.99
ɡibsoN 0.83 0.99 0.9206 0.99
giЬႽon 0.67 0.98 0.8400 0.98
glbsՕn 0.67 0.98 0.8333 0.98

How to calculate the aggregate variance in pivot table

When I use aggfunc = np.var in a pivot table, I find that the values become NaN, but with aggfunc = np.sum they don't.
Why are the original values changed with aggfunc = np.var or aggfunc = np.std? I could not find the answer in the docs: docs of pivot table
import pandas as pd
import numpy as np

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two"],
                   "C": ["small", "large", "large", "small",
                         "small", "large", "small", "small",
                         "large"],
                   "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
                   "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
print(df.pivot_table(
    index=['A', 'B'],
    values=['D', 'E'],
    columns=['C'],
    aggfunc=np.sum,
    margins=True,
    margins_name='sum',
    dropna=False
))
print('-' * 100)
df = df.pivot_table(
    index=['A', 'B'],
    values=['D', 'E'],
    columns=['C'],
    aggfunc=np.var,
    margins=True,
    margins_name='var',
    dropna=False
)
print(df)
D E
C large small sum large small sum
A B
bar one 4.0 5.0 9 6.0 8.0 14
two 7.0 6.0 13 9.0 9.0 18
foo one 4.0 1.0 5 9.0 2.0 11
two NaN 6.0 6 NaN 11.0 11
sum 15.0 18.0 33 24.0 30.0 54
-----------------------------------------------------------------------
D E
C large small var large small var
A B
bar one NaN NaN 0.500000 NaN NaN 2.000000
two NaN NaN 0.500000 NaN NaN 0.000000
foo one 0.000000 NaN 0.333333 0.500000 NaN 2.333333
two NaN 0.0 0.000000 NaN 0.5 0.500000
var 5.583333 3.8 3.555556 4.666667 7.5 4.888889
What's more, I found the var of D = large is np.var([4.0, 7.0, 4.0]) = 2.0 instead of 5.583333.
What I expected is:
D E
C large small var large small var
A B
bar one 4.0 5.0 0.25 6.0 8.0 1.0
two 7.0 6.0 0.25 9.0 9.0 0
foo one 4.0 1.0 2.25 9.0 2.0 12.25
two NaN 6.0 0 NaN 11.0 0.0
var 2.0 4.25 3.6 2.0 11.25 7.34
What is the meaning of aggfunc = np.var in pivot table?
Pandas uses ddof = 1 by default; see here for details on np.var.
When you have just one value, the variance with ddof = 1 will be NaN, since you divide by zero (n - 1 = 0).
The var of D = large is np.var([2, 2, 4, 7], ddof=1) = 5.583333333333333, so everything is correct (you have to use the individual values, not the sums).
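You can check both claims directly in numpy:
import numpy as np

np.var([4.0], ddof=1)          # nan -- divides by n - ddof = 0
np.var([2, 2, 4, 7], ddof=1)   # 5.583333333333333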
If you need var with ddof = 0 then you can provide your own function:
def var0(x):
    return np.var(x, ddof=0)

print(df.pivot_table(
    index=['A', 'B'],
    values=['D', 'E'],
    columns=['C'],
    aggfunc=var0,
    margins=True,
    margins_name='var',
    dropna=False
))
Result:
D E
C large small var large small var
A B
bar one 0.0000 0.00 0.250000 0.00 0.00 1.000000
two 0.0000 0.00 0.250000 0.00 0.00 0.000000
foo one 0.0000 0.00 0.222222 0.25 0.00 1.555556
two NaN 0.00 0.000000 NaN 0.25 0.250000
var 4.1875 3.04 3.555556 3.50 6.00 4.888889
UPDATE based on the edited question.
Pivot table with the sums of C and additionally the var of the sums as margin columns/row.
We first create the sum pivot table with margin columns/row named var. Then we update these margin columns/row with the var of the sum table:
dfs = df.pivot_table(
    index=['A', 'B'],
    values=['D', 'E'],
    columns=['C'],
    aggfunc=np.sum,
    margins=True,
    margins_name='var',
    dropna=False)

dfs[[('D', 'var'), ('E', 'var')]] = df.pivot_table(
    index=['A', 'B'],
    values=['D', 'E'],
    columns=['C'],
    aggfunc=np.sum,
    dropna=False).stack().groupby(level=(0, 1)).apply(var0)

dfs.iloc[-1] = dfs.iloc[:-1].apply(var0)
Result:
D E
C large small var large small var
A B
bar one 4.0 5.00 0.250000 6.0 8.00 1.000000
two 7.0 6.00 0.250000 9.0 9.00 0.000000
foo one 4.0 1.00 2.250000 9.0 2.00 12.250000
two NaN 6.00 0.000000 NaN 11.00 0.000000
var 2.0 4.25 0.824219 2.0 11.25 26.792969
In the margin row (last row) the var columns are calculated as the var of the row vars. I don't understand how the OP calculated the values for these two cells; in any case, they don't seem to make much sense.
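For what it's worth, the D/var corner cell can be reproduced as the ddof=0 variance of the four row vars above it:
np.var([0.25, 0.25, 2.25, 0.0], ddof=0)   # 0.82421875, matching the margin row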

pandas error when reading file

I am new to Jupyter and have only been plotting with it for the last week. I have huge Excel sheets in .csv format that I want to read and plot. I am doing it by converting the .csv to a .dat format.
My code looks like this:
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator
from matplotlib.colors import ListedColormap, LinearSegmentedColormap
file1 = 'king2.dat'
file2 = 'iso.dat'
data1 = pd.read_csv(file1, delimiter='\s+', header=None, engine='python')
data1.columns = ['no_plt', 'Op_RA_plt', 'Op_DE_plt', 'Vmag_plt', 'B-V_plt',
'(B-V)o_plt', 'no_2_plt', 'FUV_mag_plt', 'FUV_magerr_plt',
'(FUV-V)_plt', '(V-I)_plt', '(FUV-I)_plt',
'no_op_2M','Op_RA_2M','Op_DE_2M','Vmag_2M','(B-V)o_2M',
'FUV_mag_2M','FUV-V_2M','no_2M','j_m_2M','h_m_2M','k_m_2M','j-h_2M',
'h-k_2M','j-k_2M','(V-I)_2M','(FUV-I)_2M',
'no_MS','Op_RA_MS','Op_DE_MS','Vmag_MS','(B-V)o_MS',
'FUV_mag_MS','FUV-V_MS','(V-I)_MS','(FUV-I)_MS',
'no_508','Op_RA_508','Op_DE_508','Vmag_508','(B-V)o_508',
'FUV_mag_508','FUV-V_508', '(V-I)_508','(FUV-I)_508',
'no_RG','Op_RA_RG','Op_DE_RG','Vmag_RG','(B-V)o_RG','FUV_mag_RG',
'FUV-V_RG','(V-I)_RG','(FUV-I)_RG','no_RG609','Op_RA_RG609',
'Op_DE_RG609','Vmag_RG609','(B-V)o_RG609','FUV_mag_RG609',
'FUV-V_RG609', '(V-I)_RG609', '(FUV-I)_RG609',
'no_TF621','Op_RA_TF621','Op_DE_TF621','Vmag_TF621','(B-V)o_TF621',
'FUV_mag_TF621','FUV-V_TF621','(V-I)_TF621','(FUV-I)_TF621',
'no_onBSS','Op_RA_onBSS','Op_DE_onBSS','Vmag_onBSS','(B-V)o_onBSS',
'FUV_mag_onBSS','FUV-V_onBSS','(V-I)_onBSs','(FUV-I)_onBSS',
'no_BSSreg','Op_RA_BSSreg','Op_DE_BSSreg','Vmag_BSSreg',
'(B-V)o_BSSreg','FUV_mag_BSSreg','FUV-V_BSSreg','(V-I)_BSSreg',
'(FUV-I)_BSSreg','no_BSSreg558', 'Op_RA_BSSreg558' , 'Op_DE_BSSreg558','Vmag_BSSreg558','(B-V)o_BSSreg558',
'FUV_mag_BSSreg558','FUV-V_BSSreg558','(V-I)_BSSreg558','(FUV-I)_BSSreg558','no_aMS','Op_RA_aMS',
'Op_DE_aMS','Vmag_aMS','(B-V)o_aMS','FUV_mag_aMS','FUV-V_aMS','(V-I)_aMS','(FUV-I)_aMS','no_bMS',
'Op_RA_bMS','Op_DE_bMS','Vmag_bMS','(B-V)o_bMS','FUV_mag_bMS','FUV-V_bMS','(V-I)_bMS',
'(FUV-I)_bMS','no _SED','Op_RA _SED','Op_DE _SED','Vmag _SED','(B-V)o _SED','FUV_mag _SED',
'FUV-V _SED','(V-I)_SED','(FUV-I)_SED']
data2 = pd.read_csv(file2, delimiter='\s+', header=None, engine='python')
data2.columns =['age','log(Z)','mass','logl','logt','logg',
'FUVCa_14.76','NUVB15_14.76', '(FUV-NUV)14.76','V14.76','B14.76',
'B-V14.76','(FUV-V)14.76','(NUV-V)14.76','GA NUV14.76',
'(Uf-Gn)14.76','I14.76','(V-I)14.76','(FUV-I)14.76']
def fit_data():
    fig = plt.figure(1, figsize=(8, 8))
    plt.subplot(111)
    plt.scatter(data1['(B-V)o_plt'], data1['Vmag_plt'], marker='.', color='r', s=5)
    plt.scatter(data2['B-V14.76'], data2['V14.76'], marker='o', color='g', s=6)
    plt.xlabel('RA_F', size=20)
    plt.ylabel('DEC_F', size=20)
    plt.gca().invert_xaxis()
    plt.gca().invert_yaxis()
    plt.show()
    plt.close()

fit_data()
When I had sheets of 4 columns, it would read and plot without any errors. But when I increase the number of columns, it gives me this error:
ParserError Traceback (most recent call last)
<ipython-input-5-fe66f2a2aac9> in <module>()
13 data1.columns = ['age','FUVCa','NUVB15','(FUV-NUV)','V','B','B-V','(FUV-V)','(NUV-V)','I','(V-I)','(FUV-I)']
14
---> 15 data2 = pd.read_csv(file2, delimiter='\s+', header=None, engine='python')
16 data2.columns = ['no','Op_RA','Op_DE','Vmag','(B-V)o','FUV_mag','(FUV-V)','(V-I)','(FUV-I)', 'no_MS','Op_RA_MS','Op_DE_MS','Vmag_MS','(B-V)o_MS',
17 'FUV_mag_MS','FUV-V_MS','(V-I)_MS','(FUV-I)_MS','no_508','Op_RA_508','Op_DE_508','Vmag_508',
~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
707 skip_blank_lines=skip_blank_lines)
708
--> 709 return _read(filepath_or_buffer, kwds)
710
711 parser_f.__name__ = name
~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
453
454 try:
--> 455 data = parser.read(nrows)
456 finally:
457 parser.close()
~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in read(self, nrows)
1067 raise ValueError('skipfooter not supported for iteration')
1068
-> 1069 ret = self._engine.read(nrows)
1070
1071 if self.options.get('as_recarray'):
~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in read(self, rows)
2261 content = content[1:]
2262
-> 2263 alldata = self._rows_to_cols(content)
2264 data = self._exclude_implicit_index(alldata)
2265
~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in _rows_to_cols(self, content)
2916 msg += '. ' + reason
2917
-> 2918 self._alert_malformed(msg, row_num + 1)
2919
2920 # see gh-13320
~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in _alert_malformed(self, msg, row_num)
2683
2684 if self.error_bad_lines:
-> 2685 raise ParserError(msg)
2686 elif self.warn_bad_lines:
2687 base = 'Skipping line {row_num}: '.format(row_num=row_num)
ParserError: Expected 30 fields in line 21, saw 45. Error could possibly be due to quotes being ignored when a multi-char delimiter is used.
I wasn't able to understand what it means. I don't know where I am going wrong or if I am giving too many columns for it to handle.
Line 20
508 12.76968 58.18559 18.97 0.96 0.65 1371 22.925 0.343 3.955 508 12.76968 58.18559 18.97 0.65 22.925 3.955 32 16.111 15.777 15.253 0.334 0.524 0.858 508 12.76968 58.18559 18.97 0.65 22.925 3.955 508 12.76968 58.18559 18.97 0.65 22.925 3.955
Line 21
508 12.76968 58.18559 18.97 0.96 0.65 1371 22.925 0.343 3.955 508 12.76968 58.18559 18.97 0.6522.925 3.955 32 16.111 15.777 15.253 0.334 0.524 0.858 508 12.76968 58.18559 18.97 0.65 22.925 3.955 508 12.76968 58.18559 18.97 0.65 22.925 3.955 508 12.76968 58.18559 18.97 0.65 22.925 3.955
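A quick way to locate such malformed rows before handing the file to pandas is to count the whitespace-separated fields per line; a minimal sketch (using file1 from the question, and assuming the first line has the correct width):
with open('king2.dat') as fh:
    widths = [len(line.split()) for line in fh]
for i, w in enumerate(widths, start=1):
    if w != widths[0]:
        print('line', i, 'has', w, 'fields, expected', widths[0])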

How to round times in Xcode

I have been struggling for days trying to solve this puzzle.
I have this code that calculates the time between IN & OUT as decimal hours: (6 min = 0.1 hr) ~ (60 min = 1.0 hr)
NSUInteger unitFlag = NSCalendarUnitHour | NSCalendarUnitMinute;
NSDateComponents *components = [calendar components:unitFlag
                                           fromDate:self.outT
                                             toDate:self.inT
                                            options:0];
NSInteger hours = [components hour];
NSInteger minutes = [components minute];
if (minutes < 0) (minutes -= 60 * -1) && (hours -= 1);
if (hours < 0 && minutes < 0) (hours += 24) && (minutes -= 60 * -1);
if (hours < 0 && minutes > 0) (hours += 24) && (minutes = minutes);
if (hours < 0 && minutes == 00) (hours += 24) && (minutes = minutes);
if (minutes > 0) (minutes = (minutes / 6));
self.blockDecimalLabel.text = [NSString stringWithFormat:@"%d.%d", (int)hours, (int)minutes];
The green lines show what the code currently does; what I am looking for is to round the minutes like the blue lines: 1 or 2 leftover minutes round down to the nearest decimal hour, 3, 4 or 5 round up to the next one.
What I am trying to achieve: currently, if the result is 11 minutes the code returns 0.1, and only after 12 minutes does it return 0.2. What I want instead is that if the result is 8 minutes the code returns 0.1, but if it is 9 it rounds up to the next decimal, i.e. 0.2, and so on. The objective is to avoid losing up to 5 minutes within each multiple of 6 in the worst case; this way the maximum loss will be about 3 minutes on average.
Any input is more than welcome :)
Cheers
Your goals seem incoherent to me. However, I tried this:
let beh = NSDecimalNumberHandler(
    roundingMode: .RoundPlain, scale: 1, raiseOnExactness: false,
    raiseOnOverflow: false, raiseOnUnderflow: false, raiseOnDivideByZero: false
)
for t in 0...60 {
    let div = Double(t) / 60.0
    let deci = NSDecimalNumber(double: div)
    let deci2 = deci.decimalNumberByRoundingAccordingToBehavior(beh)
    let result = deci2.doubleValue
    println("min: \(t) deci: \(result)")
}
The output seems pretty much what you are asking for:
min: 0 deci: 0.0
min: 1 deci: 0.0
min: 2 deci: 0.0
min: 3 deci: 0.1
min: 4 deci: 0.1
min: 5 deci: 0.1
min: 6 deci: 0.1
min: 7 deci: 0.1
min: 8 deci: 0.1
min: 9 deci: 0.2
min: 10 deci: 0.2
min: 11 deci: 0.2
min: 12 deci: 0.2
min: 13 deci: 0.2
min: 14 deci: 0.2
min: 15 deci: 0.3
min: 16 deci: 0.3
min: 17 deci: 0.3
min: 18 deci: 0.3
min: 19 deci: 0.3
min: 20 deci: 0.3
min: 21 deci: 0.4
min: 22 deci: 0.4
min: 23 deci: 0.4
min: 24 deci: 0.4
min: 25 deci: 0.4
min: 26 deci: 0.4
min: 27 deci: 0.5
min: 28 deci: 0.5
min: 29 deci: 0.5
min: 30 deci: 0.5
min: 31 deci: 0.5
min: 32 deci: 0.5
min: 33 deci: 0.6
min: 34 deci: 0.6
min: 35 deci: 0.6
min: 36 deci: 0.6
min: 37 deci: 0.6
min: 38 deci: 0.6
min: 39 deci: 0.7
min: 40 deci: 0.7
min: 41 deci: 0.7
min: 42 deci: 0.7
min: 43 deci: 0.7
min: 44 deci: 0.7
min: 45 deci: 0.8
min: 46 deci: 0.8
min: 47 deci: 0.8
min: 48 deci: 0.8
min: 49 deci: 0.8
min: 50 deci: 0.8
min: 51 deci: 0.9
min: 52 deci: 0.9
min: 53 deci: 0.9
min: 54 deci: 0.9
min: 55 deci: 0.9
min: 56 deci: 0.9
min: 57 deci: 1.0
min: 58 deci: 1.0
min: 59 deci: 1.0
min: 60 deci: 1.0
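If you prefer to stay in Objective-C rather than Swift, here is a minimal sketch of the same rounding using C's round(), which, like NSRoundPlain above, rounds halves away from zero (hours and minutes are the variables from the question):
// Convert to total minutes, then to decimal hours rounded to one place:
// 8 min -> 0.1, 9 min -> 0.2, 11 min -> 0.2, 12 min -> 0.2
NSInteger totalMinutes = hours * 60 + minutes;
double deciHours = round((double)totalMinutes / 60.0 * 10.0) / 10.0;
self.blockDecimalLabel.text = [NSString stringWithFormat:@"%.1f", deciHours];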