How to access an element in an array vector using metaprogramming?

Here is a table t whose column arr1 is an array vector.
arr1=array(DOUBLE[], 0, 10).append!([2 3 4, 4 5 7, 7 9 10])
t = table(1..3 as id, arr1, rand(100, 3) as value)
I can use a SQL statement to query for the first element in column arr1, i.e., arr1[0].
select arr1[0] from t
Output:
arr1_at
2
4
7
Now I want to query using metaprogramming.
sql(select = sqlCol('arr1[0]') ,from =t).eval()
But an error was raised as follows:
Server response: 'Unrecognized column name arr1[0]

Either of the following two lines works:
sql(select=sqlColAlias(<arr1[0]>,"arr1_0"), from=t).eval()
sql(select=sqlColAlias(makeCall(at, sqlCol("arr1"), 0), "arr1_0"), from=t).eval()
Output:
arr1_0
2
4
7
The first line uses the metacode <arr1[0]>.
The second line uses function makeCall to call the at function to get the value at the 0-th position in column arr1 and thus obtain the new column arr1_0.


Conditional replacement of values in a matrix in DolphinDB

For example, matrix a = matrix(1 2 3, 4 5 6, 7 8 9).
How can I replace all the values smaller than 5 with a specified value? The desired output is (5 5 5, 5 5 6, 7 8 9).
You can obtain the result in three different ways in DolphinDB. See the following code:
Method 1:
Use function iif
iif(a<5, 5, a)
Method 2:
Use conditional expressions
(a<5) *5 + (a>=5) * a
Method 3:
Use user-defined functions
m=each(def(mutable x){x[x<5]=5;return x},a)

Select column with the most unique values from csv, python

I'm trying to come up with a way to select from a csv file the one numeric column that has the most unique values. If several columns tie for the most unique values, it should be the left-most one. The output should be either the name of the column or its index.
Position,Experience in Years,Salary,Starting Date,Floor,Room
Middle Management,5,5584.10,2019-02-03,12,100
Lower Management,2,3925.52,2016-04-18,12,100
Upper Management,1,7174.46,2019-01-02,10,200
Middle Management,5,5461.25,2018-02-02,14,300
Middle Management,7,7471.43,2017-09-09,17,400
Upper Management,10,12021.31,2020-01-01,11,500
Lower Management,2,2921.92,2019-08-17,11,500
Middle Management,5,5932.94,2017-11-21,15,600
Upper Management,7,10192.14,2018-08-18,18,700
So here I would want 'Floor' or 4 as my output, given that Floor and Room have the same number of unique values but Floor is the left-most one. (I need it in pure Python; I can't use pandas.)
I have this nested in a whole bunch of other code for what I need to do as a whole. I will spare you the details, but these are the elements used in the code:
new_types_list = [str, int, str, datetime.datetime, int, int] #all the datatypes of the columns
l1_listed = ['Position', 'Experience in Years', 'Salary', 'Starting Date', 'Floor', 'Room'] #the header for each column
difference = [3, 5, 9, 9, 7, 7] #the number of unique values each column has
And here I try to do exactly what I mentioned before:
another_list = []  # now I create another list
for i in new_types_list:  # this is where the error occurs: it only fills the list with the index of the first integer 3 times instead of with the individual indices
    if i == int:
        another_list.append(new_types_list.index(i))
integer_listi = [difference[i] for i in another_list]  # the corresponding unique-value counts of the integer columns
for i in difference:  # now we want to find the one that is the highest
    if i == max(integer_listi):
        chosen_one_i = difference.index(i)  # the index of the column with the most unique values is the chosen one
MUV_LMNC = l1_listed[chosen_one_i]
You can use .nunique() to get the number of unique values in each column:
import pandas as pd
df = pd.read_csv("your_file.csv")
print(df.nunique())
Prints:
Position 3
Experience in Years 5
Salary 9
Starting Date 9
Floor 7
Room 7
dtype: int64
Then to find max, use .idxmax():
print(df.nunique().idxmax())
Prints:
Salary
EDIT: To select only integer columns:
print(df.select_dtypes(include="integer").nunique().idxmax())
Prints:
Floor
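Since the question asks for pure Python, here is a minimal sketch that uses only the standard csv module. It assumes a column counts as an integer column when every cell parses with int(); the helper names is_int and most_unique_int_column are illustrative, and the sample rows are copied from the question.

```python
import csv
import io

# Sample data copied from the question; a real script would open the CSV file instead.
CSV_TEXT = """Position,Experience in Years,Salary,Starting Date,Floor,Room
Middle Management,5,5584.10,2019-02-03,12,100
Lower Management,2,3925.52,2016-04-18,12,100
Upper Management,1,7174.46,2019-01-02,10,200
Middle Management,5,5461.25,2018-02-02,14,300
Middle Management,7,7471.43,2017-09-09,17,400
Upper Management,10,12021.31,2020-01-01,11,500
Lower Management,2,2921.92,2019-08-17,11,500
Middle Management,5,5932.94,2017-11-21,15,600
Upper Management,7,10192.14,2018-08-18,18,700
"""


def is_int(text):
    """Return True if the cell text parses as an integer."""
    try:
        int(text)
        return True
    except ValueError:
        return False


def most_unique_int_column(lines):
    """Return (index, name) of the left-most integer column with the most unique values."""
    rows = [row for row in csv.reader(lines) if row]  # drop empty lines
    header, data = rows[0], rows[1:]
    best_index, best_count = None, -1
    for index, column in enumerate(zip(*data)):  # zip(*data) transposes rows into columns
        if not all(is_int(cell) for cell in column):
            continue  # skip columns that are not purely integer
        count = len(set(column))
        if count > best_count:  # strict '>' keeps the left-most column on ties
            best_index, best_count = index, count
    if best_index is None:
        return None
    return best_index, header[best_index]


result = most_unique_int_column(io.StringIO(CSV_TEXT))
print(result)
```

Using enumerate instead of list.index avoids the problem described in the question, because list.index always returns the position of the first matching element.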

How to combine certain column values together in Python and make values in the other column be the means of the values combined?

I have a pandas DataFrame where one column ('sequence') is a sequence of numbers, many of them repeating, and the values in the other column ('binary variable') are either 1 or 0.
I have grouped the rows by equal values in the 'sequence' column and made the value of 'binary variable' the percentage of entries in that group which are non-zero.
I now want to combine certain values in the 'sequence' column and make the value of 'binary variable' the mean of the values of the groups that were combined.
So my data frame looks like this:
df = pd.DataFrame({'sequence': [1, 1, 4, 4, 4, 6], 'binary variable': [1, 0, 0, 1, 0, 1]})
I have then grouped together the equal values in 'sequence' using this code:
df.groupby("sequence")["binary variable"].apply(lambda s: (s != 0).sum() / s.count() * 100)
I am left with a 'sequence' column of non-repeating values, and the 'binary variable' column is now the percentage of non-zero entries.
But now I want to group some of the 'sequence' values together (in this toy example, the values 1 and 4) and have the 'binary variable' column hold the mean of the percentages for those values.
This isn't terribly well worded, as I find it awkward to describe. I've looked online and made many failed attempts with code of my own, but it just is not working.
Any help would be greatly appreciated.
It seems like you want to group the table twice and take the mean each time. For the second grouping, you need to create a new column to indicate the group.
Try this code:
import pandas as pd
# sequence groups for the final average
grps = {(1, 4): [1, 4],
        (5, 6): [5, 6]}
# initial data
df = pd.DataFrame({'sequence': [1, 1, 4, 4, 4, 5, 5, 6], 'binvar': [1, 0, 0, 1, 0, 1, 0, 1]})
# first grouping: mean of binvar for each sequence value
gb = df.groupby("sequence")['binvar'].mean().reset_index()
def getgrp(x):  # find the group that contains sequence value x
    for k in grps:
        if x in grps[k]:
            return k
print(df.to_string(index=False))
# label each sequence value with its group
gb['group'] = gb['sequence'].map(getgrp)
gb = gb.reset_index()
print(gb.to_string(index=False))
# second grouping: mean of the per-sequence means within each group
gb = gb[['group', 'binvar']].groupby("group")['binvar'].mean().reset_index()
print(gb.to_string(index=False))
Output
sequence binvar
1 1
1 0
4 0
4 1
4 0
5 1
5 0
6 1
index sequence binvar group
0 1 0.500000 (1, 4)
1 4 0.333333 (1, 4)
2 5 0.500000 (5, 6)
3 6 1.000000 (5, 6)
group binvar
(1, 4) 0.416667
(5, 6) 0.750000
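To check the arithmetic behind the two groupings, here is a small sketch in plain Python using the same toy data. It illustrates the two-stage mean only and is not a replacement for the pandas code; the variable names by_sequence, sequence_means and group_means are illustrative.

```python
from collections import defaultdict
from statistics import mean

# Same toy data as in the answer above.
sequence = [1, 1, 4, 4, 4, 5, 5, 6]
binvar = [1, 0, 0, 1, 0, 1, 0, 1]
groups = {(1, 4): [1, 4], (5, 6): [5, 6]}

# Stage 1: mean of binvar for each distinct sequence value.
by_sequence = defaultdict(list)
for s, b in zip(sequence, binvar):
    by_sequence[s].append(b)
sequence_means = {s: mean(values) for s, values in by_sequence.items()}

# Stage 2: mean of the stage-1 means within each combined group.
group_means = {
    name: mean(sequence_means[s] for s in members)
    for name, members in groups.items()
}

for name, value in group_means.items():
    print(name, round(value, 6))
```

Note that this is a mean of means: each sequence value carries equal weight within its group, regardless of how many rows it had originally.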

AttributeError: 'int' object has no attribute 'count' while using itertuples() method with dataframes

I am trying to iterate over the rows of a pandas DataFrame using the itertuples() method, which works quite well for my case. Now I want to check whether a specific value ('x') is in a specific field of the tuple. I used the count() method for that, as I need the number of occurrences of x later.
The weird part is that for some fields this works just fine (in my case, (namedtuple[7].count('x')) + (namedtuple[8].count('x'))), but for others (e.g. namedtuple[9].count('x')) I get an AttributeError: 'int' object has no attribute 'count'.
I would appreciate your help very much!
Apparently, some columns of your DataFrame are of object type (actually a string)
and some of them are of int type (more generally - numbers).
To count occurrences of x in each row, you should:
Apply a function to each row which:
checks whether the type of the current element is str,
if it is, return count('x'),
if not, return 0 (don't attempt to look for x in a number).
So far this function returns a Series, with a number of x in each column
(separately), so to compute the total for the whole row, this Series should
be summed.
Example of working code:
Test DataFrame:
C1 C2 C3
0 axxv bxy 10
1 vx cy 20
2 vv vx 30
Code:
for ind, row in df.iterrows():
    # count 'x' only in string cells; numeric cells contribute 0
    print(ind, row.apply(lambda it:
        it.count('x') if isinstance(it, str) else 0).sum())
(in my opinion, iterrows is more convenient here).
The result is:
0 3
1 1
2 1
So as you can see, it is possible to count occurrences of x,
even when some columns are not strings.
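The same type check can be applied when iterating with itertuples(), as in the question. The following is a minimal sketch that uses a hand-built namedtuple as a stand-in for the rows itertuples() would yield for the test DataFrame above; Row and count_char are hypothetical names chosen for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for the rows that df.itertuples() would yield
# for the test DataFrame above (Index, C1, C2, C3).
Row = namedtuple("Row", ["Index", "C1", "C2", "C3"])
rows = [
    Row(0, "axxv", "bxy", 10),
    Row(1, "vx", "cy", 20),
    Row(2, "vv", "vx", 30),
]


def count_char(row, char="x"):
    """Count occurrences of char across the string fields of one row."""
    # Non-string fields (ints, floats) are skipped, which avoids the
    # AttributeError: 'int' object has no attribute 'count'.
    return sum(field.count(char) for field in row if isinstance(field, str))


counts = [count_char(row) for row in rows]
print(counts)
```

The isinstance check matters because str.count exists only on strings; the integer field C3 has no count method.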

Multiple for loop variable in vba?

I have the following code in VBA:
For n = 1 To 15
Cells(n, 8) = Application.Combin(2 * n, n)
next n
I want the n in Cells(n, 8) to increase in increments of 2, so that the code skips a line after each entry.
Is it possible to have another increment variable in the same loop that jumps by 2 each time?
Thanks in advance!
EDIT: after reading the comment, I think what is needed is one counter that counts 1, 2, 3, 4, 5, 6, ..., 15 and another that counts the odd numbers 1, 3, 5, 7, ...
For that, here is what needs to be done:
Basically, you want the first iterator to be a normal counter
and the second iterator to take odd values only.
So here is a simple input/output table:
input output
----- -----
1 1
2 3
3 5
4 7
5 9
6 11
From the above, we can deduce the formula needed to convert the input into the desired output: output = (input x 2) -1
And so, we can rewrite our For loop like so, using 2*n-1 as the row index so that every other row is skipped:
For n = 1 To 15
    Cells(2 * n - 1, 8) = Application.Combin(2 * n, n)
Next n
============= End of Edit =========================
Simply use the keyword Step in the For loop:
For n = 1 To 15 Step 2 'Step can also be negative,
'but then you have to swap the starting and ending
'values of the iterator
The values for n will be: 1, 3, 5, 7, 9, 11, 13, 15
Alternatively, use a local variable inside the For loop for that purpose (in case you want the loop to execute 15 times):
For n = 1 To 15
    i = 2 * n - 1
    Cells(i, 8) = Application.Combin(2 * n, n)
Next n
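As a quick check of the mapping output = (input x 2) - 1 from the table above, the following Python sketch lists which row each value would land in if the row index skips every other line. It only illustrates the arithmetic and is not VBA; row_value_pairs is a hypothetical helper, and math.comb (Python 3.8+) is used here in place of the worksheet function Combin.

```python
from math import comb


def row_value_pairs(count=15):
    """Return (row, value) pairs: row 2n-1 receives C(2n, n) for n = 1..count."""
    return [(2 * n - 1, comb(2 * n, n)) for n in range(1, count + 1)]


pairs = row_value_pairs()
for row, value in pairs[:3]:
    print(row, value)
```

With 15 iterations the row index runs over the odd numbers from 1 to 29, so the loop still produces 15 values while leaving every second row empty.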