Split nested column in random order? - sql

I am trying to split a column in Redshift that contains comma-separated strings of varying length, in no fixed order.
Say I want to pull x, y and z out into their own columns, and the values are listed like below.
x,a,b,y,e,h,z
z,g,a,n,x
e,b,q,b,z,x,f,y
How can I do this?

Related

string representation of DataFrame with max width and ellipsized columns without terminal

I have a pandas.DataFrame which I would like to represent as a string (not in Jupyter, not in IPython) with limited width (for later terminal output), without line wrapping (one value per output line) and with ellipses for excess columns in the middle. This is similar to what pandas does when printing to a terminal. Is there a function for that? DataFrame.to_string only lets me wrap excess lines (with line_width), but I don't see a way to insert the ellipsis automatically.
If I understand you correctly, you could just do:
print(str(df))
But if you would like to specify n rows and n columns, pd.DataFrame.to_string has arguments for that:
print(df.to_string(max_rows=10, max_cols=10))
This would only display 10 columns (5 columns and ellipsis then another 5 columns), and 10 rows (5 rows and ellipsis then another 5 rows).
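A small sketch of the max_cols behaviour described above. The 12-column frame is made up for illustration; only to_string and its max_cols argument come from the answer.

```python
import pandas as pd

# Hypothetical frame: 3 rows, 12 integer columns.
df = pd.DataFrame([[r * 12 + c for c in range(12)] for r in range(3)])

# Keep 4 columns (2 leading, 2 trailing); the middle is replaced by "...".
text = df.to_string(max_cols=4)
print(text)
```

Note that max_cols limits the number of columns, not the character width, so very wide values can still exceed a given terminal width.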

Groupby Get Group For Loop

I have a dataframe that I need to subset by the values in the Measure column.
I created Measure_Group = measures.groupby('Measure'). I can call get_group() like this: CDC = Measure_Group.get_group('CDC'), but I have over 20 measures to subset. Is there a for loop or lambda function that I can use with the groupby to subset all 20 measures in one pass instead of calling get_group multiple times?
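One possible approach: iterating over a GroupBy object yields (group name, sub-dataframe) pairs, so a dict comprehension can collect every subset in a single pass. The sample data below is a made-up stand-in for the measures frame in the question.

```python
import pandas as pd

# Hypothetical stand-in for the `measures` dataframe.
measures = pd.DataFrame({
    "Measure": ["CDC", "BCS", "CDC", "COL"],
    "Value": [1, 2, 3, 4],
})

# Each iteration gives the group key and the rows belonging to it.
subsets = {name: group for name, group in measures.groupby("Measure")}

# Look up any measure by name instead of calling get_group repeatedly.
cdc = subsets["CDC"]
print(cdc)
```

Keeping the subsets in a dict avoids creating 20 separately named variables.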

Can I use dataframes as Input for functions?

I am currently trying to find optimal portfolio weights by optimizing a utility function that depends on those weights. I have a dataframe containing the time series of returns, named rets_optns. rets_optns has 100 groups of 8 assets (800 columns: the 1st group is columns 1 to 8, the 2nd group is columns 9 to 16, and so on). I also have a dataframe named rf_optns with 100 columns that hold the corresponding risk-free rate for each group of returns. I want to create a new dataframe of portfolio returns using this formula: portfolio returns = rf_optns + sum(weights * rets_optns). It should have 100 columns, and each column should represent the returns of a portfolio composed of the 8 assets belonging to the same group. I currently have:
import numpy as np

def pret(rf, weights, rets):
    return rf + np.sum(weights * (rets - rf))
It does not work
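One likely cause, offered as a guess since no error message is given: arithmetic between two dataframes aligns on column labels, so subtracting a 100-column frame from an 800-column frame does not pair each asset with its group's risk-free rate. A minimal sketch that works on the underlying arrays instead is shown below. It follows the posted function (weights applied to excess returns), assumes the same 8 weights for every group, and uses a hypothetical helper name portfolio_returns with small random frames in place of the real data.

```python
import numpy as np
import pandas as pd

def portfolio_returns(rf, weights, rets, group_size=8):
    """rf: (T, G) frame, rets: (T, G * group_size) frame, weights: (group_size,) array."""
    n_periods, n_groups = rf.shape
    rf_values = rf.to_numpy()
    # View the asset returns as (time, group, asset within group).
    grouped = rets.to_numpy().reshape(n_periods, n_groups, group_size)
    # Excess return of each asset over its own group's risk-free rate.
    excess = grouped - rf_values[:, :, None]
    # Weighted sum over the asset axis, then add the risk-free rate back.
    values = rf_values + excess @ weights
    return pd.DataFrame(values, index=rf.index, columns=rf.columns)

# Made-up data: 5 periods, 3 groups of 8 assets.
rng = np.random.default_rng(0)
rf_optns = pd.DataFrame(rng.normal(0.001, 0.0001, size=(5, 3)))
rets_optns = pd.DataFrame(rng.normal(0.005, 0.02, size=(5, 24)))
weights = np.full(8, 1 / 8)

port = portfolio_returns(rf_optns, weights, rets_optns)
print(port.shape)
```

The reshape relies on the columns being ordered group by group, as described in the question.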

Fastest way to find two minimum values in each column of NumPy array

If I want to find the minimum value in each column of a NumPy array, I can use the numpy.amin() function. However, is there a way to find the two minimum values in each column, that is faster than sorting each column?
You can simply use np.partition along the columns to get the smallest N numbers:
N = 2
np.partition(a,kth=N-1,axis=0)[:N]
This doesn't actually sort the entire data; it simply partitions it into two sections such that the smallest N numbers are in the first section, which is also called a partial sort.
Bonus (getting the top N elements): similarly, to get the top N numbers per column, simply use a negative kth value:
np.partition(a,kth=-N,axis=0)[-N:]
Along other axes and higher dim arrays
To use it along other axes, change the axis value. So, along rows, it would be axis=1 for a 2D array and extend the same way for higher dimension ndarrays.
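The calls above can be tried on a small made-up array. One caveat worth stating: for general N the rows inside the returned slice are not guaranteed to be in sorted order, so the sketch sorts just that small slice when order matters.

```python
import numpy as np

# Hypothetical 4x3 array for illustration.
a = np.array([[5, 1, 9],
              [2, 8, 3],
              [7, 4, 6],
              [0, 6, 2]])
N = 2

# Smallest N values per column (partial sort, no full sort of each column).
smallest = np.partition(a, kth=N - 1, axis=0)[:N]

# Largest N values per column, using a negative kth.
largest = np.partition(a, kth=-N, axis=0)[-N:]

# Sorting only the N-row slice is cheap and fixes the order within it.
smallest_sorted = np.sort(smallest, axis=0)
largest_sorted = np.sort(largest, axis=0)
print(smallest_sorted)
print(largest_sorted)
```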
Use the min() method, and specify the axis you want to take the minimum over:
a = np.random.rand(10,3)
a.min(axis=0)
gives:
array([0.04435587, 0.00163139, 0.06327353])
a.min(axis=1)
gives
array([0.01354386, 0.08996586, 0.19332211, 0.00163139, 0.55650945,
0.08409907, 0.23015718, 0.31463493, 0.49117553, 0.53646868])

How can I compare two sets of data having two columns in excel? Picture below will elaborate

Below are two sets of data. Each has two columns. I want matching rows from the two sets to line up next to each other.
This is a manual solution with formulas and sorting.
Imagine the following data in columns A to E:
Enter the following formulas into columns G to K
Column G: =IFERROR(IF(VLOOKUP(D:D,A:B,2,FALSE)=E:E,1,2),3)
Column H: =IF(G:G<3,D:D,"")
Column I: =IFERROR(VLOOKUP(H:H,A:B,2,FALSE),"")
Column J: =D:D
Column K: =IFERROR(VLOOKUP(J:J,D:E,2,FALSE),"")
Column G (the sort-by column) now shows:
1 if part and quantity matched
2 if only part matched
3 if nothing matched
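For readers who want to check the 1/2/3 logic outside Excel, here is a rough Python sketch of what the column G formula computes. The part names and quantities are made up, and the dict lookup assumes each part appears only once in the first list (VLOOKUP would return the first match if there were duplicates).

```python
def match_code(part, qty, reference):
    """Mirror column G: 1 = part and quantity match, 2 = only part matches, 3 = no match."""
    if part not in reference:
        return 3
    return 1 if reference[part] == qty else 2

# Hypothetical first list (columns A:B) as part -> quantity.
list_one = {"bolt": 10, "nut": 5, "washer": 7}
# Hypothetical second list (columns D:E) as (part, quantity) rows.
list_two = [("screw", 4), ("nut", 6), ("bolt", 10)]

codes = [match_code(part, qty, list_one) for part, qty in list_two]
# Sorting by the code puts full matches first, like sorting by column G.
ordered = sorted(zip(codes, list_two))
print(ordered)
```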
So if you now select data from A3:K10 and sort by column G (sort by) then it will result in this: