How to create a set to whose elements were added the first 4 letters of another set and the first 2 letters of the elements of another set? - gams-math

How do I create a set whose elements combine the 4-character elements of one set with the 2-character elements of another set?
For example, I have 2 sets: one is composed of years and the other of quarters:
SET TY /2019*2040/;
SET TQ /Q1*Q4/;
And I need another set that is the Cartesian product of both, but in a single value for each element:
SET T /2019Q1*2019Q4,2020Q1*2020Q4… 2040Q1*2040Q4/;
In Stata for example, I would do this:
global year "2019 2020 2021 … 2040"
global quarter "Q1 Q2 Q3 Q4"
foreach y of global year {
    foreach q of global quarter {
        global T = "`y'`q'"
        …
    }
}
How can I do this in GAMS?

If you really need this new one dimensional set, that has no logical relation to TY and TQ, you could do it like this:
SET TY /2019*2040/;
SET TQ /Q1*Q4/;
Set T(*);
$onEmbeddedCode Python:
t = []
for ty in gams.get('TY'):
    for tq in gams.get('TQ'):
        t.append(ty + tq)
gams.set('T',t)
$offEmbeddedCode T
display T;
I do not know your use case, of course, but something like this might make more sense for you:
SET TY /2019*2040/;
SET TQ /Q1*Q4/;
Set T(TY,TQ) /#TY.#TQ/;
display T;
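
As a rough check outside GAMS, the labels that the embedded Python loop above builds can be reproduced in plain Python. This is only a sketch: the lists `years` and `quarters` are stand-ins for the GAMS sets TY and TQ, and nothing here touches the `gams` object.

```python
from itertools import product

# Stand-ins for the GAMS sets TY (/2019*2040/) and TQ (/Q1*Q4/).
years = [str(y) for y in range(2019, 2041)]
quarters = [f"Q{q}" for q in range(1, 5)]

# Join every (year, quarter) pair into a single label, year by year.
labels = [ty + tq for ty, tq in product(years, quarters)]

print(len(labels))
print(labels[:5])
```

The result has one label per year and quarter combination, in the same order as the nested loop in the embedded code.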

get prefix out a size range with different size formats

I have a column in a df that holds a size range in different size formats.
artikelkleurnummer size
6725 0161810ZWA B080
6726 0161810ZWA B085
6727 0161810ZWA B090
6728 0161810ZWA B095
6729 0161810ZWA B100
The size range also contains other size formats, such as XS - XXL, 36-50, 36/38 - 52/54, ONE, XS/S - XL/XXL, 363-545.
I have tried to strip the leading '0' from all sizes that start with a letter in the range A-K. For example, I want to change B080 into B80, while B100 stays B100.
Steps:
1 look for items in column ['size'] whose first character is a letter in the range A-K,
2 if so, replace the character in the second position with ''
For the range I use:
from string import ascii_letters
def range_alpha(start_letter, end_letter):
    return ascii_letters[ascii_letters.index(start_letter):ascii_letters.index(end_letter) + 1]
Then I tried a for loop:
for items in df['size']:
    if df.loc[df['size'].str[0] in range_alpha('A','K'):
        df.loc[df['size'].str[1] == ''
Error message:
SyntaxError: unexpected EOF while parsing
What's wrong?
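You can do it with a regex and pd.Series.str.replace:
import pandas as pd
df = pd.DataFrame([['0161810ZWA']*5, ['B080', 'B085', 'B090', 'B095', 'B100']]).T
df.columns = "artikelkleurnummer size".split()
# Rebuild the match from every group except the second one (index 1, the leading zeros)
replacement = lambda mpat: ''.join(g for i, g in enumerate(mpat.groups()) if i != 1)
# regex=True is required for a callable replacement in recent pandas versions
df['size_cleaned'] = df['size'].str.replace(r'([a-kA-K])(0*)(\d+)', replacement, regex=True)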
Output
  artikelkleurnummer  size size_cleaned
0         0161810ZWA  B080          B80
1         0161810ZWA  B085          B85
2         0161810ZWA  B090          B90
3         0161810ZWA  B095          B95
4         0161810ZWA  B100         B100
TL;DR
Find a pattern "LetterZeroDigits" and change it to "LetterDigits" using a regular expression.
Slightly longer explanation
Regexes are very handy but also hard. In the solution above, we find the pattern of interest and then replace it. In our case, the pattern of interest is made of 3 parts:
A letter from A to K
Zero or more 0's
One or more further digits
In regex terms, this can be written as r'([a-kA-K])(0*)(\d+)'. Note that the three pairs of parentheses make up the three parts; they are called groups. How much sense this makes depends on how much exposure you have had to regexes, but any online introduction to regexes covers it.
Once we have the parts, we want to keep everything except part 2, which is the 0s.
The pd.Series.str.replace documentation has the details on the replacement portion. In essence, replacement is a function that takes the match object as input and produces the replacement string as output.
In the first step we identified three groups, or parts. These groups are accessed with the mpat.groups() function, which returns a tuple containing the match for each group. We want to reconstruct a string with the middle part excluded, which is what the replacement function does.
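
The three groups described above can also be tried without pandas, using only the standard `re` module. This is a minimal sketch; the helper name `strip_leading_zeros` and the sample list are made up for illustration.

```python
import re

# Group 1: a letter A-K (either case); group 2: leading zeros; group 3: the remaining digits.
PATTERN = re.compile(r'([a-kA-K])(0*)(\d+)')

def strip_leading_zeros(size):
    """Keep groups 1 and 3 of each match and drop group 2 (the zeros)."""
    return PATTERN.sub(lambda m: m.group(1) + m.group(3), size)

# Sizes in other formats contain no "letter followed by digits" match and pass through unchanged.
for s in ['B080', 'B100', 'XS', '36/38', '363-545', 'ONE']:
    print(s, '->', strip_leading_zeros(s))
```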
sizes = [{"size": "B080"},{"size": "B085"},{"size": "B090"},{"size": "B095"},{"size": "B100"}]
def range_char(start, stop):
    return (chr(n) for n in range(ord(start), ord(stop) + 1))
for s in sizes:
    if s['size'][0].upper() in range_char("A", "K"):
        s['size'] = s['size'][0]+s['size'][1:].lstrip('0')
print(sizes)
This example uses a list of dicts instead of a DataFrame.

how to sum rows in my dataframe Pandas with specific condition?

Could anyone help me?
I want to sum the values in the format:
print (...+....+)
For example:
a b
France 2
Italie 15
Croatie 7
I want to sum the values for France and Croatie.
Thank you for your help!
One possible solution:
set column a as the index,
use loc to select the rows for the "wanted" values,
take column b,
sum the values found.
So the code can be:
result = df.set_index('a').loc[['France', 'Croatie']].b.sum()
Note the double square brackets. The outer pair is the indexing operator of loc.
The inner pair, with what is inside it, is the list of index values to select.
To subtract two sums (one for some set of countries and the second for another set),
you can run e.g.:
wrk = df.set_index('a').b
result = wrk.loc[['Italie', 'USA']].sum() - wrk.loc[['France', 'Croatie']].sum()
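
An alternative sketch that skips the index step uses a boolean mask built with `isin`. The sample frame below mirrors the one in the question; the variable names are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({'a': ['France', 'Italie', 'Croatie'], 'b': [2, 15, 7]})

wanted = ['France', 'Croatie']
# isin() marks the rows whose 'a' value is in the list; loc then picks column 'b' for those rows.
total = df.loc[df['a'].isin(wanted), 'b'].sum()
print(total)
```

With `isin`, a country that does not occur in the data simply contributes nothing to the sum.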

assign value at specific list index in robot framework

I have a list:
@{IFUP}    10    20
I want to modify only one of those values, e.g.:
${IFUP}[${idx}]=    Set Variable    30
where ${idx} is 0
This produces No keyword with name ''${IFUP}[${idx}]='. The same happens with a direct ${IFUP}[0] assignment.
The Robot Framework version is 3.1 (the list syntax is a bit different).
I would be happy with a variable-in-variable solution like ${IFUP_${idx}}= but this produces the same error.
Suggestions?
You can use the 'Set List Value' keyword from the Collections library.
In your case, it should be
Set List Value    ${IFUP}    0    30
http://robotframework.org/robotframework/latest/libraries/Collections.html#Set%20List%20Value
Please check the code below:
List_at_place_change
    @{IFUP}=    Create List    10    20
    Log    ${IFUP[0]}
    Set List Value    ${IFUP}    0    30
    Log    ${IFUP}

How to make a new variable based on 30 other variables

I have 30 variables on family history of cancer, e.g. breast cancer father, breast cancer mother, breast cancer sister, etc. I would like to make a new variable and give it a value of "1" if any of these columns contains a 1.
In other words:
I have 30 variables with answers 1 to 3, where 1 is yes, 2 is no, and 3 is unknown. If any of the 30 variables has the value 1, I would like my new variable to take the value 1.
Does someone know how I can do this?
You can create a list instead of 30 separate variables and then search it to create the new variable. This will make it more dynamic.
// This will be the cancer history for a single family
var cancerHistory = [];
// Add dummy data
cancerHistory.push('yes');
cancerHistory.push('no');
cancerHistory.push('unknown');
cancerHistory.push('no');
// Check if at least one of them is "yes"
var hasHistoryOfCancer = cancerHistory.indexOf('yes') > -1;
alert(hasHistoryOfCancer); // true
You can use a for loop. You did not mention the language, so I am writing the code in Python, which is easy to understand. If you want it in another language, you can apply a similar approach.
import pandas as pd
new_var = []
df = pd.read_csv("DataFile.csv")  # Convert the data file to CSV and put its name here.
for i in range(len(df)):
    # The .... stands for the columns in between
    x = [df['column1'][i], df['column2'][i] ...., df['column30'][i]]
    if 1 in x:
        new_var.append(1)
    else:
        new_var.append(0)
df['new_var'] = new_var
df.to_csv('NewDataFile.csv', sep=',', encoding='utf-8')
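
The same idea can be sketched without an explicit loop by letting pandas compare whole columns at once. The data and the column names below are hypothetical (only three of the 30 columns are shown), so the list `history_cols` would need to hold the real column names.

```python
import pandas as pd

# Hypothetical data: three history columns, coded 1 = yes, 2 = no, 3 = unknown.
df = pd.DataFrame({
    'column1': [2, 1, 3],
    'column2': [2, 2, 3],
    'column3': [1, 2, 2],
})
history_cols = ['column1', 'column2', 'column3']  # assumption: list all 30 names here

# eq(1) gives True where a cell equals 1; any(axis=1) checks each row; astype(int) turns True/False into 1/0.
df['new_var'] = df[history_cols].eq(1).any(axis=1).astype(int)
print(df)
```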

Apply function with pandas dataframe - POS tagger computation time

I'm very confused about the apply function in pandas. I have a big dataframe where one column is a column of strings. I'm then using a function to count part-of-speech occurrences. I'm just not sure how to set up my apply statement or my function.
def noun_count(row):
    x = tagger(df['string'][row].split())
    # array flattening and filtering out all but nouns, then summing them
    return num
So basically I have a function similar to the above where I use a POS tagger on a column that outputs a single number (number of nouns). I may possibly rewrite it to output multiple numbers for different parts of speech, but I can't wrap my head around apply.
I'm pretty sure I don't have either part arranged correctly. For instance, I can run noun_count(row) and get the correct value for any index, but I can't figure out how to make it work with apply the way I have it set up. Basically, I don't know how to pass the row value to the function within the apply statement.
df['num_nouns'] = df.apply(noun_count(??),1)
Sorry this question is all over the place. So what can I do to get a simple result like
   string      num_nouns
0  'cat'       1
1  'two cats'  1
EDIT:
So I've managed to get something working by using list comprehension (someone posted an answer, but they've deleted it).
df['string'].apply(lambda row: noun_count(row),1)
which required an adjustment to my function:
def tagger_nouns(x):
    list_of_lists = st.tag(x.split())
    flattened = [y for z in list_of_lists for y in z]
    Parts_of_speech = [row[1] for row in flattened]
    c = Counter(Parts_of_speech)
    nouns = c['NN']+c['NNS']+c['NNP']+c['NNPS']
    return nouns
I'm using the Stanford tagger, but I have a big problem with computation time, and I'm using the left 3 words model. I'm noticing that it's calling the .jar file again and again (java keeps opening and closing in the task manager) and maybe that's unavoidable, but it's really taking far too long to run. Any way I can speed it up?
I don't know what 'tagger' is but here's a simple example with a word count that ought to work more or less the same way:
f = lambda x: len(x.split())
df['num_words'] = df['string'].apply(f)
   string      num_words
0  'cat'       1
1  'two cats'  2
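
On the computation time: if every call to the tagger starts Java again, one thing that might help is tagging all strings in a single batch call and only counting afterwards. The sketch below shows that shape with a hypothetical stand-in tagger (`fake_tag_sents` and its tiny lexicon are invented for illustration). I believe NLTK's Stanford tagger wrapper has a `tag_sents` method that could take its place, but treat that as an assumption to check against your version.

```python
NOUN_TAGS = {'NN', 'NNS', 'NNP', 'NNPS'}

# Hypothetical stand-in for a real batch tagger: looks tokens up in a tiny lexicon.
TOY_LEXICON = {'cat': 'NN', 'cats': 'NNS', 'two': 'CD'}

def fake_tag_sents(sentences):
    """Return one list of (token, tag) pairs per tokenized sentence."""
    return [[(tok, TOY_LEXICON.get(tok, 'X')) for tok in sent] for sent in sentences]

def count_nouns(tagged_sentence):
    """Count the tokens whose tag is one of the noun tags."""
    return sum(1 for _, tag in tagged_sentence if tag in NOUN_TAGS)

strings = ['cat', 'two cats']
# One tagger call for all rows instead of one call per row.
tagged = fake_tag_sents([s.split() for s in strings])
num_nouns = [count_nouns(t) for t in tagged]
print(num_nouns)
```

With a dataframe, the list passed in would be built from df['string'], and the resulting list could be assigned back as a new column such as df['num_nouns'].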