Getting a set of a list from a list of strings - pandas

I have this dataframe:
df = pd.DataFrame({"c1":["[\"text\",\"text2\"]","[\"bla\",\"bla\",\"bla\"]"]})
and I'm removind [] and "" :
df["c2"] = df["c1"].apply(lambda x:re.sub('[\["\]]', "", x))
then I want to add df['c2'] to a list:
list = df['c2'].to_list()
Then I get this: ['text,text2', 'bla,bla,bla']
So far so good. But then I want a list with only unique values, what I could to using set(list).
The proble is that Instead of ['text,text2', 'bla,bla,bla'] I needed to get ['text','text2', 'bla','bla','bla'] so when I apply `set(list) I would get what I am expecting:
['text','text2','bla']

First, don't use list as a variable. Second, once you get ['text,text2',...] you can use str.split. So your set would be
{y for x in df['c2'].str.split(',') for y in x}
Output:
{'bla', 'text', 'text2'}
Note: You can use regex directly to extract all patterns between the \":
set(df['c1'].str.extractall('\"([^"]+)\"')[0])

Try this:
new = []
for l in list:
new.extend(l.split(',') )
new = list(set(new))
which results in new to be
['text2', 'text', 'bla']

Related

Swapping characters in Strings Python

I have been trying to make something like an encoder:
here is my idea
dict = {
1: "!",
2: "#"
}
in = 21 # Input number in
out = ?
print(out) # Returns "#!"
Is there any way I could perform this?
What you want is exactly the translate function of str:
x="12"
y="!#"
in=12
txt=str(in)
mapping = txt.maketrans(x, y)
out=txt.translate(mapping)
You can check the complete reference here.

Compact way to save JuMP optimization results in DataFrames

I would like to save all my variables and dual variables of my finished lp-optimization in an efficient manner. My current solution works, but is neither elegant nor suited for larger optimization programs with many variables and constraints because I define and push! every single variable into DataFrames separately. Is there a way to iterate through the variables using all_variables() and all_constraints() for the duals? While iterating, I would like to push the results into DataFrames with the variable index name as columns and save the DataFrame in a Dict().
A conceptual example would be for variables:
Result_vars = Dict()
for vari in all_variables(Model)
Resul_vars["vari"] = DataFrame(data=[indexval(vari),value(vari)],columns=[index(vari),"Value"])
end
An example of the appearance of the declared variable in JuMP and DataFrame:
#variable(Model, p[t=s_time,n=s_n,m=s_m], lower_bound=0,base_name="Expected production")
And Result_vars[p] shall approximately look like:
t,n,m,Value
1,1,1,50
2,1,1,60
3,1,1,145
Presumably, you could go something like:
x = all_variables(model)
DataFrame(
name = variable_name.(x),
Value = value.(x),
)
If you want some structure more complicated, you need to write custom code.
T, N, M, primal_solution = [], [], [], []
for t in s_time, n in s_n, m in s_m
push!(T, t)
push!(N, n)
push!(M, m)
push!(primal_solution, value(p[t, n, m]))
end
DataFrame(t = T, n = N, m = M, Value = primal_solution)
See here for constraints: https://jump.dev/JuMP.jl/stable/constraints/#Accessing-constraints-from-a-model-1. You want something like:
for (F, S) in list_of_constraint_types(model)
for con in all_constraints(model, F, S)
#show dual(con)
end
end
Thanks to Oscar, I have built a solution that could help to automatize the extraction of results.
The solution is build around a naming convention using base_name in the variable definition. One can copy paste the variable definition into base_name followed by :. E.g.:
#variable(Model, p[t=s_time,n=s_n,m=s_m], lower_bound=0,base_name="p[t=s_time,n=s_n,m=s_m]:")
The naming convention and syntax can be changed, comments can e.g. be added, or one can just not define a base_name. The following function divides the base_name into variable name, sets (if needed) and index:
function var_info(vars::VariableRef)
split_conv = [":","]","[",","]
x_str = name(vars)
if occursin(":",x_str)
x_str = replace(x_str, " " => "") #Deletes all spaces
x_name,x_index = split(x_str,split_conv[1]) #splits raw variable name+ sets and index
x_name = replace(x_name, split_conv[2] => "")
x_name,s_set = split(x_name,split_conv[3])#splits raw variable name and sets
x_set = split(s_set,split_conv[4])
x_index = replace(x_index, split_conv[2] => "")
x_index = replace(x_index, split_conv[3] => "")
x_index = split(x_index,split_conv[4])
return (x_name,x_set,x_index)
else
println("Var base_name not properly defined. Special Syntax required in form var[s=set]: ")
end
end
The next functions create the columns and the index values plus columns for the primal solution ("Value").
function create_columns(x)
col_ind=[String(var_info(x)[2][col]) for col in 1:size(var_info(x)[2])[1]]
cols = append!(["Value"],col_ind)
return cols
end
function create_index(x)
col_ind=[String(var_info(x)[3][ind]) for ind in 1:size(var_info(x)[3])[1]]
index = append!([string(value(x))],col_ind)
return index
end
function create_sol_matrix(varss,model)
nested_sol_array=[create_index(xx) for xx in all_variables(model) if varss[1]==var_info(xx)[1]]
sol_array=hcat(nested_sol_array...)
return sol_array
end
Finally, the last function creates the Dict which holds all results of the variables in DataFrames in the previously mentioned style:
function create_var_dict(model)
Variable_dict=Dict(vars[1]
=>DataFrame(Dict(vars[2][1][cols]
=>create_sol_matrix(vars,model)[cols,:] for cols in 1:size(vars[2][1])[1]))
for vars in unique([[String(var_info(x)[1]),[create_columns(x)]] for x in all_variables(model)]))
return Variable_dict
end
When those functions are added to your script, you can simply retrieve all the solutions of the variables after the optimization by calling create_var_dict():
var_dict = create_var_dict(model)
Be aware: they are nested functions. When you change the naming convention, you might have to update the other functions as well. If you add more comments you have to avoid using [, ], and ,.
This solution is obviously far from optimal. I believe there could be a more efficient solution falling back to MOI.

difference between two lists that include duplicates

I have a problem with two lists which contain duplicates
a = [1,1,2,3,4,4]
b = [1,2,3,4]
I would like to be able to extract the differences between the two lists ie.
c = [1,4]
but if I do c = a-b I get c =[]
It should be trivial but I can't find out :(
I tried also to parse the biggest list and remove items from it when I find them in the smallest list but I can't update lists on the fly, it does not work either
has anyone got an idea ?
thanks
You see an empty c as a result, because removing e.g. 1 removes all elements that are equal 1.
groovy:000> [1,1,1,1,1,2] - 1
===> [2]
What you need instead is to remove each occurrence of specific value separately. For that, you can use Groovy's Collection.removeElement(n) that removes a single element that matches the value. You can do it in a regular for-loop manner, or you can use another Groovy's collection method, e.g. inject to reduce a copy of a by removing each occurrence separately.
def c = b.inject([*a]) { acc, val -> acc.removeElement(val); acc }
assert c == [1,4]
Keep in mind, that inject method receives a copy of the a list (expression [*a] creates a new list from the a list elements.) Otherwise, acc.removeElement() would modify an existing a list. The inject method is an equivalent of a popular reduce or fold operation. Each iteration from this example could be visualized as:
--inject starts--
acc = [1,1,2,3,4,4]; val = 1; acc.removeElement(1) -> return [1,2,3,4,4]
acc = [1,2,3,4,4]; val = 2; acc.removeElement(2) -> return [1,3,4,4]
acc = [1,3,4,4]; val = 3; acc.removeElement(3) -> return [1,4,4]
acc = [1,4,4]; val = 4; acc.removeElement(4) -> return [1,4]
-- inject ends -->
PS: Kudos to almighty tim_yates who recommended improvements to that answer. Thanks, Tim!
the most readable that comes to my mind is:
a = [1,1,2,3,4,4]
b = [1,2,3,4]
c = a.clone()
b.each {c.removeElement(it)}
if you use this frequently you could add a method to the List metaClass:
List.metaClass.removeElements = { values -> values.each { delegate.removeElement(it) } }
a = [1,1,2,3,4,4]
b = [1,2,3,4]
c = a.clone()
c.removeElements(b)

Invalid parameter number when using whereNotIn in laravel

I want to print the list of option from database on my blade, which I want to get all list except the option that already selected. So what I've already do is getting the selected option (which is saved on another table) as an array:
$ctg_id[] = [];
foreach($vendor->category as $item){
$ctg_id[] = [$item->category_id];
}
Then finally print it:
$category = ItemCategory::whereNotIn('id',$ctg_id)->get();
But it return an error that says:
"SQLSTATE[HY093]: Invalid parameter number (SQL: select * from category_item where id not in (4, 3, ?))" Is there anything wrong with my code?
You are using double dimensional array, you need to change it to one dimensional:
$ctg_id = [];
foreach($vendor->category as $item){
$ctg_id[] = $item->category_id;
}
Recommend to use method-pluck like this:
$category = ItemCategory::whereNotIn('id', $vendor->category->pluck('category_id'))->get();

Lua get index name of table as table

Is there any way to get every index value of a table?
Example:
local mytbl = {
["Hello"] = 123,
["world"] = 321
}
I want to get this:
{"Hello", "world"}
local t = {}
for k, v in pairs(mytbl) do
table.insert(t, k) -- or t[#t + 1] = k
end
Note that the order of how pairs iterates a table is not specified. If you want to make sure the elements in the result are in a certain order, use:
table.sort(t)