Convert cell text to a progressive number - sql

I have written this SQL query in a PostgreSQL environment:
SELECT
    ST_X("Position4326") AS lon,
    ST_Y("Position4326") AS lat,
    "Values"[4] AS ppe,
    "Values"[5] AS speed,
    "Date" AS "timestamp",
    "SourceId" AS smartphone,
    "Track" AS session
FROM
    "SingleData"
WHERE
    "OsmLineId" = 44792088
    AND array_length("Values", 1) > 4
    AND "Values"[5] > 0
ORDER BY smartphone, session;
I have now imported the result into Matlab, where I have six vectors and one cell array (the UUID text was converted to a cell array), all of size 5710x1.
I would like to convert the text in the cell array into a progressive number, like 1, 2, 3..., one for each distinct session code.
In Excel this is easy with FIND.VERT(obj, matrix, col), but I do not know how to do it in Matlab.
At the moment I have a big cell array with a lot of codes like:
ff95465f-0593-43cb-b400-7d32942023e1
I would like to convert this cell array into an array of numbers, where the first occurrence of
ff95465f-0593-43cb-b400-7d32942023e1 -> 1
and so on: 2 is assigned when a different code appears, and so forth.

OK, I have solved it.
I put the distinct session codes in a second cell array C.
Then, with a for loop, I obtain:
%% Converting of the UUIDs into integer
C = unique(session);
N = length(session);
session2 = zeros(N, 1);
for i = 1:N
    session2(i) = find(strcmp(C, session(i)));
end
Thanks to all!
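For comparison, here is a minimal Python sketch of the same mapping. Matlab's unique returns the codes sorted, so the loop above numbers sessions by sorted order rather than by first appearance; the sketch below numbers them strictly in order of first occurrence, as the question describes. The function name progressive_ids and the second UUID are made up for illustration.

```python
def progressive_ids(codes):
    """Map each distinct code to 1, 2, 3, ... in order of first appearance."""
    first_seen = {}  # code -> progressive number
    ids = []
    for code in codes:
        if code not in first_seen:
            first_seen[code] = len(first_seen) + 1
        ids.append(first_seen[code])
    return ids


session = [
    "ff95465f-0593-43cb-b400-7d32942023e1",
    "ff95465f-0593-43cb-b400-7d32942023e1",
    "0a1b2c3d-0000-4000-8000-000000000001",  # invented UUID for the example
    "ff95465f-0593-43cb-b400-7d32942023e1",
]
session2 = progressive_ids(session)
print(session2)
```

A dictionary lookup per element also avoids scanning the whole list of distinct codes on every iteration.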

Pinescript index range

Is there a way to code an index range in Pinescript? For example, if I want to include all close values between 10 bars and 5 bars ago: everything between close[10] and close[5].
In Python this would be close[5:10], but I cannot find any literature discussing a range of indexes.
Thanks!
You could code a function to do that, but to be clear, [] has a specific meaning in Pinescript as the history-referencing operator. I think what you are asking for is a way to construct an array of values from a series based on indices.
This would work if you are using float values like OHLC:
//@version=4
study("My Script")
// collect _src[_a] .. _src[_b - 1] into a float array
range(_src, _a, _b) =>
    _arr = array.new_float(0)
    for i = _a to _b - 1
        array.push(_arr, _src[i])
    _arr
someCloses = range(close, 5, 10)
plot(array.size(someCloses))
But with this you are converting your data to a different type. So make sure to look at the available array functions.
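To make the indexing explicit, here is a rough Python sketch of what the range function above collects. It assumes the series is stored as a plain list ordered oldest to newest, so that offset i means "i bars ago", like close[i]. The name history_range and the sample prices are invented for illustration.

```python
def history_range(series, a, b):
    """Return the values from a bars ago up to (but excluding) b bars ago.

    series is ordered oldest to newest; offsets beyond the available
    history yield None.
    """
    last = len(series) - 1
    return [series[last - i] if i <= last else None for i in range(a, b)]


closes = list(range(100, 112))  # 12 bars: 100 is the oldest, 111 the newest
some_closes = history_range(closes, 5, 10)
print(some_closes)
```

As in the Pinescript version, the upper bound is exclusive, which matches the Python slice close[5:10] from the question.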

Find a column that contains a specific string from another column

I have 2 data frames: one called cuartos (rooms in English) and another called paredes (walls in English). They hold room temperatures and wall temperatures. I want to create a new data frame with the temperature difference between each wall and its room. For example:
Room name = 2_APTO_1
Walls of the room = 2_APTO_1.FACE2, 2_APTO_1.FACE3 and 2_APTO_1.FACE4
The new data frame should be something like
2_APTO_1.FACE2 = 2_APTO_1.FACE2 - 2_APTO_1
2_APTO_1.FACE3 = 2_APTO_1.FACE3 - 2_APTO_1
2_APTO_1.FACE4 = 2_APTO_1.FACE4 - 2_APTO_1 ....
I tried this:
# get a list of paredes and cuartos columns
col_names_paredes = paredes.columns.tolist()
col_names_cuartos = cuartos.columns.tolist()
# check if col_names_paredes has col_names_cuartos names in it
for i in col_names_cuartos:
    for k in col_names_paredes:
        if col_names_paredes[k] in col_names_cuartos[i]:
            print(k)
I got this error:
TypeError: list indices must be integers or slices, not str
Any help would be appreciated.
When you write for i in col_names_cuartos, i takes the column names themselves as values, not the index values you would get with for i in range(len(col_names_cuartos)).
So you can use the following code instead:
for col_cuartos in col_names_cuartos:
    for col_paredes in col_names_paredes:
        # the room name is contained in the wall name, e.g. "2_APTO_1" in "2_APTO_1.FACE2"
        if col_cuartos in col_paredes:
            print(col_paredes)
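Going one step further, the difference data frame from the question could be built roughly as follows. This is only a sketch: it assumes every wall column is named as the room name, a dot, and a face label, and that room names contain no dot. The temperatures and the name diferencias are invented for the example.

```python
import pandas as pd

# Invented sample data following the naming pattern in the question
cuartos = pd.DataFrame({"2_APTO_1": [20.0, 21.0]})
paredes = pd.DataFrame({
    "2_APTO_1.FACE2": [22.0, 22.5],
    "2_APTO_1.FACE3": [19.0, 20.0],
    "2_APTO_1.FACE4": [20.5, 21.0],
})

diffs = {}
for wall in paredes.columns:
    room = wall.split(".", 1)[0]  # text before the first dot is the room name
    if room in cuartos.columns:
        diffs[wall] = paredes[wall] - cuartos[room]

diferencias = pd.DataFrame(diffs)
print(diferencias)
```

Splitting on the dot avoids the nested loop and also avoids false matches that a plain substring test could produce, for instance a room "2_APTO_1" matching a wall of "2_APTO_10".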

Access graph get series on 2 database fields

I have a technical issue with Access graphs. I have a table in an Access database with 4 fields: xValue, yValue, round, partOfRound.
What I want: there are always 2 rounds, and each round has 2 parts. I need one series per round per part (so round 1 part 1, round 1 part 2, round 2 part 1, round 2 part 2), with all xValues and yValues in a chart.
But then I have another problem: the xValue isn't a good number to show as is. It needs to be divided by a number from another table (call it number in table3), taken from the row of table 3 whose identifier equals the identifier I use for my chart (IDtable2=IDtable3).
The final result will be 4 lines with the data in my graph, so 4 series.
But when I use the wizard for making graphs, I can only set 1 field as the series value, so it treats a round as just 1 series instead of 2.
How do I solve this problem?
Kind regards,
Kristof
What type of graph - just a column?
Concatenate the round and partOfRound fields.
Try changing the graph RowSource to:
TRANSFORM Sum(Table2.yValue) AS SumOfyValue SELECT Table2.xValue FROM Table2 GROUP BY Table2.xValue PIVOT [round] & "_" & [partOfRound];
Possible SQL to include table join to calculate the division:
TRANSFORM Sum(Table2.yValue) AS SumOfyValue
SELECT Round([xValue]/[Factor],0) AS x
FROM Table3 INNER JOIN Table2 ON Table3.PK_Table3 = Table2.FK_Table3
GROUP BY Round([xValue]/[Factor],0)
PIVOT [round] & "_" & [partOfRound];
For both queries, I had to open the graph editor (double-click the graph) and click the "By Column" button in the menu to get the x values on the x axis.
I do hope round is not an actual field name: it is a reserved word, and reserved words should not be used as names for anything.
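To illustrate what the crosstab produces, here is a small pandas sketch of the same idea: concatenate round and partOfRound into one series key, then pivot so that each key becomes its own column. It is only an analogy to the Access query, and the sample values are invented.

```python
import pandas as pd

# Invented rows: 2 rounds x 2 parts, two x values each
table2 = pd.DataFrame({
    "xValue": [10, 20, 10, 20, 10, 20, 10, 20],
    "yValue": [1, 2, 3, 4, 5, 6, 7, 8],
    "round": [1, 1, 1, 1, 2, 2, 2, 2],
    "partOfRound": [1, 1, 2, 2, 1, 1, 2, 2],
})

# Same key as [round] & "_" & [partOfRound] in the PIVOT clause
table2["series"] = table2["round"].astype(str) + "_" + table2["partOfRound"].astype(str)

# One row per xValue, one column per series, summed yValue in each cell
chart = table2.pivot_table(index="xValue", columns="series", values="yValue", aggfunc="sum")
print(chart)
```

Each of the four resulting columns corresponds to one line in the chart.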

Apply function with pandas dataframe - POS tagger computation time

I'm very confused about the apply function in pandas. I have a big dataframe where one column is a column of strings. I'm then using a function to count part-of-speech occurrences. I'm just not sure how to set up my apply statement or my function.
def noun_count(row):
    x = tagger(df['string'][row].split())
    # array flattening and filtering out all but nouns, then summing them
    return num
So basically I have a function similar to the above where I use a POS tagger on a column that outputs a single number (number of nouns). I may possibly rewrite it to output multiple numbers for different parts of speech, but I can't wrap my head around apply.
I'm pretty sure I don't really have either part arranged correctly. For instance, I can run noun_count(row) and get the correct value for any index, but I can't figure out how to make it work with apply the way I have it set up. Basically, I don't know how to pass the row value to the function within the apply statement.
df['num_nouns'] = df.apply(noun_count(??),1)
Sorry this question is all over the place. So what can I do to get a simple result like
       string  num_nouns
0       'cat'          1
1  'two cats'          1
EDIT:
So I've managed to get something working by using list comprehension (someone posted an answer, but they've deleted it).
df['string'].apply(lambda row: noun_count(row),1)
which required an adjustment to my function:
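from collections import Counter
def tagger_nouns(x):
    # st is the Stanford tagger instance, created elsewhere
    list_of_lists = st.tag(x.split())
    flat = [y for z in list_of_lists for y in z]
    # keep only the tag from each entry
    Parts_of_speech = [row[1] for row in flat]
    c = Counter(Parts_of_speech)
    nouns = c['NN']+c['NNS']+c['NNP']+c['NNPS']
    return nouns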
I'm using the Stanford tagger, but I have a big problem with computation time, and I'm using the left 3 words model. I'm noticing that it's calling the .jar file again and again (java keeps opening and closing in the task manager) and maybe that's unavoidable, but it's really taking far too long to run. Any way I can speed it up?
I don't know what 'tagger' is but here's a simple example with a word count that ought to work more or less the same way:
f = lambda x: len(x.split())
df['num_words'] = df['string'].apply(f)
       string  num_words
0       'cat'          1
1  'two cats'          2
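On the computation time, one thing that may help is tagging all rows in a single batch call instead of once per row, so that the external process is started only once. The sketch below shows the pattern with a toy stand-in tagger: toy_tag_sents and TOY_LEXICON are hypothetical and only mimic a batch tagger that returns (word, tag) pairs per sentence. If st is NLTK's Stanford wrapper, its tag_sents method would be the natural candidate for the batch call, but that is an assumption to check against your setup.

```python
from collections import Counter

import pandas as pd

NOUN_TAGS = ("NN", "NNS", "NNP", "NNPS")
TOY_LEXICON = {"cat": "NN", "cats": "NNS", "two": "CD"}  # hypothetical mini-lexicon


def toy_tag_sents(sentences):
    """Hypothetical batch tagger: one list of (word, tag) pairs per sentence."""
    return [[(tok, TOY_LEXICON.get(tok, "X")) for tok in sent] for sent in sentences]


def count_nouns(tagged):
    """Count noun tags in one tagged sentence."""
    counts = Counter(tag for _, tag in tagged)
    return sum(counts[tag] for tag in NOUN_TAGS)


df = pd.DataFrame({"string": ["cat", "two cats"]})

# Per-row version: apply passes each cell value (a string) to the function
per_row = df["string"].apply(lambda s: count_nouns(toy_tag_sents([s.split()])[0]))

# Batch version: tokenize everything, tag once, then count per row
tokenized = [s.split() for s in df["string"]]
tagged_rows = toy_tag_sents(tokenized)
df["num_nouns"] = [count_nouns(tagged) for tagged in tagged_rows]
print(df)
```

Both versions give the same counts; the batch version simply makes one tagger call for the whole column.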

Dataframe non-null values differ from value_counts() values

There is an inconsistency with dataframes that I can't explain. In the following, I'm not looking for a workaround (I already found one) but for an explanation of what is going on under the hood and how it explains the output.
One of my colleagues, whom I talked into using Python and pandas, has a dataframe "data" with 12,000 rows.
"data" has a column "length" that contains numbers from 0 to 20. She wants to divide the dataframe into groups by length range: 0 to 9 in group 1, 10 to 14 in group 2, 15 and more in group 3. Her solution was to add another column, "group", and fill it with the appropriate values. She wrote the following code:
data['group'] = np.nan
mask = data['length'] < 10;
data['group'][mask] = 1;
mask2 = (data['length'] > 9) & (data['length'] < 15);
data['group'][mask2] = 2;
mask3 = data['length'] > 14;
data['group'][mask3] = 3;
This code is not good, of course. The reason is that you don't know at run time whether data['group'][mask3], for example, will be a view, and thus actually change the dataframe, or a copy, leaving the dataframe unchanged. It took me quite some time to explain this to her, since she argued, correctly, that she is doing an assignment, not a selection, so the operation should always return a view.
But that was not the strange part. The part that even I couldn't understand is this:
After performing this set of operations, we verified in two different ways that the assignment took place:
1. By typing data in the console and examining the dataframe summary. It told us we had a few thousand null values. The number of null values was the same as the size of mask3, so we assumed the last assignment was made on a copy and not on a view.
2. By typing data.group.value_counts(). That returned 3 values: 1, 2 and 3 (surprise). We then typed data.group.value_counts().sum() and it summed up to 12,000!
So by method 2, the group column contained no null values and had all the values we wanted. But by method 1, it didn't!
Can anyone explain this?
See the docs here.
You don't want to set values this way, for exactly the reason you pointed out: since you don't know whether it's a view, you don't know that you are actually changing the data. pandas 0.13 will raise/warn that you are attempting to do this, but it is easiest/best to just access it like:
data.loc[mask3,'group'] = 3
which will guarantee you an in-place setitem.
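Applied to all three groups, the .loc form might look like the sketch below. The sample lengths are invented, and the column is assumed to be called length throughout.

```python
import numpy as np
import pandas as pd

# Invented sample covering all three ranges
data = pd.DataFrame({"length": [0, 5, 9, 10, 14, 15, 20]})
data["group"] = np.nan

# Row mask and column label go in a single .loc call, so the
# assignment is made on data itself rather than on a possible copy
data.loc[data["length"] < 10, "group"] = 1
data.loc[(data["length"] > 9) & (data["length"] < 15), "group"] = 2
data.loc[data["length"] > 14, "group"] = 3

print(data)
print(data["group"].isna().sum())          # null count
print(data["group"].value_counts().sum())  # non-null count
```

With this form, the null count and the value_counts() total can be compared directly against the number of rows.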