Stata: for loop for storing values of the Gini coefficient - syntax error

I have 133 variables on income (each variable represents a group). I want the Gini coefficients of all these groups, so I use ineqdeco in Stata. I can't compute all these coefficients by hand, so I created a for loop:
gen sgini = .

foreach var of varlist C07-V14 {
    forvalues i = 1/133 {
        ineqdeco `var'
        replace sgini[i] = $S_gini
    }
}
I also tried changing the order:
foreach var of varlist C07-V14 {
    ineqdeco `var'
    forvalues i = 1/133 {
        replace sgini[i] = $S_gini
    }
}
And specifying i beforehand:
gen i = 1

foreach var of varlist C07-V14 {
    ineqdeco `var'
    replace sgini[i] = $S_gini
    replace i = i + 1
}
I don't know if this last method works anyway.
In all cases I get the error: weight not allowed r(101). I don't know what this means, or what to do. Basically, I want to compute the Gini coefficient of all 133 variables, and store these values in a vector of length 133, so a single variable with all the coefficients stored in it.
Edit: I found that the error has to do with the replace command. I replaced this line with:
replace sgini = $S_gini in `i'
But now it does not "loop", so I get the first value in all entries of sgini.

There is no obvious reason for your inner loop. If you have no more variables than observations, then this might work:
gen sgini = .
gen varname = ""

local i = 1
foreach var of varlist C07-V14 {
    ineqdeco `var'
    replace sgini = $S_gini in `i'
    replace varname = "`var'" in `i'
    local i = `i' + 1
}
The problems evident in your code (seem to) include:
Confusion between variables and local macros. If you have much experience with other languages, it is hard to break old mental habits. (Mata is more like other languages here.)
Not being aware that a loop over observations is automatic. Or perhaps not seeing that there is just a single loop needed here, the twist being that the loop over variables is easy but your concomitant loop over observations needs to be arranged with your own code.
Putting a subscript on the LHS of a replace. The [] notation is reserved for weights but is illegal there in any case. To find out about weights, search weights or help weight.
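For contrast, a minimal sketch of legal versus illegal uses of [], assuming hypothetical variables income and pop:

* [] supplies weights to commands that accept them, e.g. a frequency weight
summarize income [fweight = pop]
* it is never legal on the left-hand side of replace; use in to target one observation
replace sgini = $S_gini in 5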
Note that with this way of recording results, the Gini coefficients are not aligned with anything else. A token fix for that is to record the associated variable names alongside, as done above.
A more advanced version of this solution would be to use postfile to save to a new dataset.
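A minimal sketch of that approach, assuming the same varlist; postfile streams one row per variable into a new dataset:

tempname memhold
tempfile results
postfile `memhold' str32 varname double gini using `results'

foreach var of varlist C07-V14 {
    ineqdeco `var'
    post `memhold' ("`var'") ($S_gini)
}

postclose `memhold'
use `results', clear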

Related

Compact way to save JuMP optimization results in DataFrames

I would like to save all my variables and dual variables of my finished LP optimization in an efficient manner. My current solution works, but is neither elegant nor suited for larger optimization programs with many variables and constraints, because I define and push! every single variable into DataFrames separately. Is there a way to iterate through the variables using all_variables() and all_constraints() for the duals? While iterating, I would like to push the results into DataFrames with the variable index names as columns and save each DataFrame in a Dict().
A conceptual example for the variables would be:
Result_vars = Dict()
for vari in all_variables(Model)
    Result_vars["vari"] = DataFrame(data = [indexval(vari), value(vari)], columns = [index(vari), "Value"])
end
An example of the declared variable in JuMP and of the desired DataFrame:
@variable(Model, p[t=s_time,n=s_n,m=s_m], lower_bound=0, base_name="Expected production")
And Result_vars[p] shall approximately look like:
t,n,m,Value
1,1,1,50
2,1,1,60
3,1,1,145
Presumably, you could go something like:
x = all_variables(model)
DataFrame(
    name = variable_name.(x),
    Value = value.(x),
)
If you want a more complicated structure, you need to write custom code.
T, N, M, primal_solution = [], [], [], []
for t in s_time, n in s_n, m in s_m
    push!(T, t)
    push!(N, n)
    push!(M, m)
    push!(primal_solution, value(p[t, n, m]))
end
DataFrame(t = T, n = N, m = M, Value = primal_solution)
See here for constraints: https://jump.dev/JuMP.jl/stable/constraints/#Accessing-constraints-from-a-model-1. You want something like:
for (F, S) in list_of_constraint_types(model)
    for con in all_constraints(model, F, S)
        @show dual(con)
    end
end
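If you would rather collect those duals into a table than print them, a minimal sketch along the same lines (dual_df is a made-up name; this assumes the model has been solved and duals are available):

using DataFrames
dual_df = DataFrame(constraint = String[], dual = Float64[])
for (F, S) in list_of_constraint_types(model)
    for con in all_constraints(model, F, S)
        # name(con) is the constraint's string name; unnamed constraints give ""
        push!(dual_df, (name(con), dual(con)))
    end
end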
Thanks to Oscar, I have built a solution that could help automate the extraction of results.
The solution is built around a naming convention using base_name in the variable definition. One can copy-paste the variable definition into base_name, followed by :. E.g.:
@variable(Model, p[t=s_time,n=s_n,m=s_m], lower_bound=0, base_name="p[t=s_time,n=s_n,m=s_m]:")
The naming convention and syntax can be changed, comments can be added, or one can simply not define a base_name. The following function divides the base_name into the variable name, the sets (if needed) and the index:
function var_info(vars::VariableRef)
    split_conv = [":", "]", "[", ","]
    x_str = name(vars)
    if occursin(":", x_str)
        x_str = replace(x_str, " " => "")              # delete all spaces
        x_name, x_index = split(x_str, split_conv[1])  # split raw variable name + sets from index
        x_name = replace(x_name, split_conv[2] => "")
        x_name, s_set = split(x_name, split_conv[3])   # split raw variable name from sets
        x_set = split(s_set, split_conv[4])
        x_index = replace(x_index, split_conv[2] => "")
        x_index = replace(x_index, split_conv[3] => "")
        x_index = split(x_index, split_conv[4])
        return (x_name, x_set, x_index)
    else
        println("Var base_name not properly defined. Special syntax required in form var[s=set]: ")
    end
end
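For orientation, a hypothetical trace (not from the original post), assuming a variable whose full name is "p[t=s_time,n=s_n,m=s_m]:[1,1,1]":

# var_info returns (name, sets, index), roughly:
# ("p", ["t=s_time", "n=s_n", "m=s_m"], ["1", "1", "1"])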
The next functions create the columns and the index values plus columns for the primal solution ("Value").
function create_columns(x)
    col_ind = [String(var_info(x)[2][col]) for col in 1:size(var_info(x)[2])[1]]
    cols = append!(["Value"], col_ind)
    return cols
end

function create_index(x)
    col_ind = [String(var_info(x)[3][ind]) for ind in 1:size(var_info(x)[3])[1]]
    index = append!([string(value(x))], col_ind)
    return index
end
function create_sol_matrix(varss, model)
    nested_sol_array = [create_index(xx) for xx in all_variables(model) if varss[1] == var_info(xx)[1]]
    sol_array = hcat(nested_sol_array...)
    return sol_array
end
Finally, the last function creates the Dict which holds all results of the variables in DataFrames in the previously mentioned style:
function create_var_dict(model)
    Variable_dict = Dict(vars[1] =>
        DataFrame(Dict(vars[2][1][cols] =>
            create_sol_matrix(vars, model)[cols, :] for cols in 1:size(vars[2][1])[1]))
        for vars in unique([[String(var_info(x)[1]), [create_columns(x)]] for x in all_variables(model)]))
    return Variable_dict
end
When those functions are added to your script, you can simply retrieve all the solutions of the variables after the optimization by calling create_var_dict():
var_dict = create_var_dict(model)
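For the variable p from the earlier example, the lookup would then be, hypothetically:

var_dict["p"]  # a DataFrame with columns "Value", "t=s_time", "n=s_n", "m=s_m"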
Be aware that these functions are interdependent: when you change the naming convention, you might have to update the other functions as well. If you add more comments, you have to avoid using [, ], and ,.
This solution is obviously far from optimal. I believe there could be a more efficient solution falling back to MOI.

Getting the name of a variable as a string in GDScript

I have been looking for a solution everywhere on the internet, but nowhere can I find a single script that lets me read the name of a variable as a string in Godot 3.1.
What I want to do:
Save path names as variables.
Compare the name of the path variable as a string to the value of another string and print the path value.
E.g.:
var Apple = "mypath/folder/apple.png"
var myArray = ["Apple", "Pear"]
A function that compares the variable name as a string to the other string:
if myArray[myposition] == **the required function that outputs the variable name as a String**(Apple):
    print(Apple)  # this prints out the path
Thanks in advance!
I think your approach here might be a little oversimplified for what you're trying to accomplish. It basically seems to work out to if (array[apple]) == apple then apple, which doesn't really solve a programmatic problem. More complexity seems required.
First, you might have a function to return all of your icon names, something like this.
func get_avatar_names():
    var avatar_names = []
    var folder_path = "res://my/path"
    var avatar_dir = Directory.new()
    avatar_dir.open(folder_path)
    avatar_dir.list_dir_begin(true, true)
    while true:
        var avatar_file = avatar_dir.get_next()
        if avatar_file == "":
            break
        else:
            var avatar_name = avatar_file.trim_suffix(".png")
            avatar_names.append(avatar_name)
    return avatar_names
Then something like this back in the main function, where you have your list of names you care about at the moment, and for each name, check the list of avatar names, and if you have a match, reconstruct the path and do other work:
var some_names = ["Jim", "Apple", "Sally"]
var avatar_names = get_avatar_names()
for name in some_names:
    if avatar_names.has(name):
        var img_path = "res://my/path/" + name + ".png"
        # load images, additional work, etc...
That's the approach I would take here, hope this makes sense and helps.
I think the current answer is best for the approach you desire, but the performance is pretty bad with string comparisons.
I would suggest adding an enumeration for efficient comparisons. Unfortunately, Godot does enums differently than this. It seems like your position is an int, so we can define a dictionary like this to look up the index and print the value:
var fruits = {0: "Apple", 1: "Pear"}

func myfunc():
    var myposition = 0
    if fruits.has(myposition):
        print(fruits[myposition])

output: Apple
If your position were string-based, then an enum could be used, with slightly less typing and different considerations; see the sketch after the reference below.
reference: https://docs.godotengine.org/en/latest/tutorials/scripting/gdscript/gdscript_basics.html#enums
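A minimal hypothetical sketch of that enum variant (names invented here):

enum Fruit { APPLE, PEAR }

func myfunc():
    var myposition = Fruit.APPLE
    if myposition == Fruit.APPLE:
        print("Apple")  # enum members are plain ints, so this comparison is cheap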
Can't you just use the str() function to convert any data type to a string?
my_var = str(my_var)

Generating DataFrames in a for loop in Scala Spark causes out of memory

I'm generating small DataFrames in a for loop. At each round of the loop, I pass the generated DataFrame to a function which returns a double. This simple process (which I thought could easily be taken care of by the garbage collector) blows up my memory. When I look at the Spark UI, each round of the loop adds a new "SQL{1-500}" entry (my loop runs 500 times). My question is: how can I drop this SQL object before generating a new one?
My code is something like this:
Seq.fill(500) {
  val data = (1 to 1000).map(_ => Random.nextInt(1000))
  val dataframe = createDataFrame(data)
  myFunction(dataframe)
  dataframe.unpersist()
}

def myFunction(df: DataFrame) = {
  df.count()
}
I tried to solve this problem with dataframe.unpersist() and sqlContext.clearCache(), but neither of them worked.
You have two places where I suspect something fishy is happening:
In the definition of myFunction: you really need to put the = before the body of the definition. I have had typos like that compile but produce really weird errors (note that I changed your myFunction for debugging purposes).
It is better to fill your Seq with something you know and then apply foreach or some such.
(You also need to replace random.nexInt with Random.nextInt. Also, you can only create a DataFrame from a Seq of a type that is a subtype of Product, such as a tuple, and you need to use sqlContext to call createDataFrame.)
This code works with no memory issues:
Seq.fill(500)(0).foreach { i =>
  val data = (1 to 1000).map(_.toDouble).toList.zipWithIndex
  val dataframe = sqlContext.createDataFrame(data)
  myFunction(dataframe)
}

def myFunction(df: DataFrame) = {
  println(df.count())
}
Edit: parallelizing the computation (across 10 cores) and returning the RDD of counts:
sc.parallelize(Seq.fill(500)(0), 10).map { i =>
  val data = (1 to 1000).map(_.toDouble).toList.zipWithIndex
  val dataframe = sqlContext.createDataFrame(data)
  myFunction(dataframe)
}

def myFunction(df: DataFrame) = {
  df.count()
}
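If you want the 500 counts back on the driver, a hypothetical follow-up (countsRDD is a made-up name; this assumes the snippet above):

val countsRDD = sc.parallelize(Seq.fill(500)(0), 10).map { i =>
  val data = (1 to 1000).map(_.toDouble).toList.zipWithIndex
  myFunction(sqlContext.createDataFrame(data))
}
val counts: Array[Long] = countsRDD.collect()  // runs the job and gathers the results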
Edit 2: the difference between declaring the function myFunction with = and without = is that the first is a (usual) function definition, while the other is a procedure definition, used only for methods that return Unit. See this explanation. Here is the point illustrated in the Spark shell:
scala> def myf(df:DataFrame) = df.count()
myf: (df: org.apache.spark.sql.DataFrame)Long
scala> def myf2(df:DataFrame) { df.count() }
myf2: (df: org.apache.spark.sql.DataFrame)Unit

Make line chart with values and dates

In my app I use the ios-charts library (the Swift alternative of MPAndroidChart).
All I need is to display a line chart with dates and values.
Right now I use this function to display the chart:
func setChart(dataPoints: [String], values: [Double]) {
    var dataEntries: [ChartDataEntry] = []
    for i in 0..<dataPoints.count {
        let dataEntry = ChartDataEntry(value: values[i], xIndex: i)
        dataEntries.append(dataEntry)
    }
    let lineChartDataSet = LineChartDataSet(yVals: dataEntries, label: "Items count")
    let lineChartData = LineChartData(xVals: dataPoints, dataSet: lineChartDataSet)
    dateChartView.data = lineChartData
}
And this is my data:
xItems = ["27.05", "03.06", "17.07", "19.09", "20.09"] //String
let unitsSold = [25.0, 30.0, 45.0, 60.0, 20.0] //Double
But as you can see, xItems are dates in "dd.mm" format. Since they are strings, they are all spaced evenly. I want the spacing to reflect the real dates: for example, 19.09 and 20.09 should be very close together. I know that I should map each day to some number to accomplish this, but I don't know what to do next. How can I adjust the x-label margins?
UPDATE
After a little research, I found that many developers had asked for this feature but nothing happened. For my case I found a very interesting alternative to this library in Swift, PNChart. It is easy to use and it solves my problem.
The easiest solution will be to loop through your data and add a ChartDataEntry with a value of 0 and a corresponding label for each missing date.
In response to the question in the comments, here is a screenshot from one of my applications where I am filling in date gaps with 0 values:
In my case I wanted the 0 values rather than an averaged line from data point to data point, as it clearly indicates there is no data on the days skipped (8/11 for instance).
From @Philipp Jahoda's comments it sounds like you could skip the 0 value entries and just index the data you have to the correct labels.
I modified the MPAndroidChart example program to skip a few data points, and this is the result:
As @Philipp Jahoda mentioned in the comments, the chart handles a missing Entry by just connecting to the next data point. From the code below you can see that I am generating x values (labels) for the entire data set but skipping y values (data points) for indexes 11 - 29, which is what you want. The only thing remaining would be to handle the x labels, as it sounds like you don't want 15, 20, and 25 in my example to show up.
ArrayList<String> xVals = new ArrayList<String>();
for (int i = 0; i < count; i++) {
xVals.add((i) + "");
}
ArrayList<Entry> yVals = new ArrayList<Entry>();
for (int i = 0; i < count; i++) {
if (i > 10 && i < 30) {
continue;
}
float mult = (range + 1);
float val = (float) (Math.random() * mult) + 3;// + (float)
// ((mult *
// 0.1) / 10);
yVals.add(new Entry(val, i));
}
What I did is fully feed the dates for the x data even when there is no y data, and simply not add a data entry for that specific xIndex; then no y value is drawn at that xIndex, which achieves what you want. This is the easiest way, since you just write a for loop and continue whenever there is no y value.
I don't suggest using 0 or NaN: with a line chart it will connect the 0 values, and bad things happen with NaN. You might want to break the lines, but ios-charts does not support that yet (I also asked for this feature), so you would need to write your own code to break the line, or live with connecting the 0 values or connecting straight to the next valid data point.
The downside is a possible performance drop when there are many xIndex values, but I tried ~1000 and it is acceptable. I asked for such a feature a long time ago, but it has taken a lot of time to think through.
Here's a function I wrote based on Wingzero's answer (I pass NaN for the entries in the values array that are empty):
func populateLineChartView(lineChartView: LineChartView, labels: [String], values: [Float]) {
    var dataEntries: [ChartDataEntry] = []
    for i in 0..<labels.count {
        if !values[i].isNaN {
            let dataEntry = ChartDataEntry(value: Double(values[i]), xIndex: i)
            dataEntries.append(dataEntry)
        }
    }
    let lineChartDataSet = LineChartDataSet(yVals: dataEntries, label: "Label")
    let lineChartData = LineChartData(xVals: labels, dataSet: lineChartDataSet)
    lineChartView.data = lineChartData
}
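A hypothetical call with the data from the question, assuming the dateChartView outlet from earlier and using Float.nan to mark a day with no data:

let labels = ["27.05", "03.06", "17.07", "19.09", "20.09"]
let values: [Float] = [25.0, Float.nan, 45.0, 60.0, 20.0]  // no y value for 03.06
populateLineChartView(lineChartView: dateChartView, labels: labels, values: values)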
The solution which worked for me is splitting the LineDataSet into two LineDataSets: the first holds the yVals up to the empty space, and the second those after the empty space.
// create 2 LineDataSets: set1 - up to the empty space, set2 - after the empty space
set1 = new LineDataSet(yVals1, "DataSet 1");
set2 = new LineDataSet(yVals2, "DataSet 1");

// load the datasets into a datasets array
ArrayList<ILineDataSet> dataSets = new ArrayList<ILineDataSet>();
dataSets.add(set1);
dataSets.add(set2);

// create a data object with the datasets
LineData data = new LineData(xVals, dataSets);

// set data
mChart.setData(data);

How to convert a list of attribute-value pairs into a flat table whose columns are attributes

I'm trying to convert a csv file containing 3 columns (ATTRIBUTE_NAME, ATTRIBUTE_VALUE, ID) into a flat table where each row is (ID, Attribute1, Attribute2, Attribute3, ...). Samples of such tables are provided at the end.
Either Python, Perl or SQL is fine. Thank you very much and I really appreciate your time and efforts!
In fact, my question is very similar to this post, except that in my case the number of attributes is pretty big (~300) and not consistent across each ID, so hard coding each attribute might not be a practical solution.
For me, the challenging/difficult parts are:
There are approximately 270 million lines of input; the total size of the input table is about 60 GB.
Some string values contain a comma (,) within, in which case the whole string is enclosed in double quotes (") to make the reader aware of that. For example "JPMORGAN CHASE BANK, NA, TX" in ID=53.
The set of attributes is not the same across IDs. For example, the number of overall attributes is 8, but ID=53, 17 and 23 have only 7, 6 and 5 respectively. ID=17 does not have the attributes string_country and string_address, so output blank/nothing after the comma.
The input attribute-value table looks like this. In this sample input and output, we have 3 IDs, whose number of attributes can differ depending on whether we can obtain such attributes from the server or not.
ATTRIBUTE_NAME,ATTRIBUTE_VALUE,ID
num_integer,100,53
string_country,US (United States),53
string_address,FORT WORTH,53
num_double2,546.0,53
string_acc,My BankAcc,53
string_award,SILVER,53
string_bankname,"JPMORGAN CHASE BANK, NA, TX",53
num_integer,61,17
num_double,34.32,17
num_double2,200.541,17
string_acc,Your BankAcc,17
string_award,GOLD,17
string_bankname,CHASE BANK,17
num_integer,36,23
num_double,78.0,23
string_country,CA (Canada),23
string_address,VAN COUVER,23
string_acc,Her BankAcc,23
The output table should look like this. (The order of attributes in the columns is not fixed. It can be sorted alphabetically or by order-of-appearance.)
ID,num_integer,num_double,string_country,string_address,num_double2,string_acc,string_award,string_bankname
53,100,,US (United States),FORT WORTH,546.0,My BankAcc,SILVER,"JPMORGAN CHASE BANK, NA, TX"
17,61,34.32,,,200.541,Your BankAcc,GOLD,CHASE BANK
23,36,78.0,CA (Canada),VAN COUVER,,Her BankAcc,,
This program will do as you ask. It expects the name of the input file as a parameter on the command line.
Update: Looking more carefully at the data, I see that not all of the data fields are available for every ID. That makes things more complex if the fields are to be kept in the same order as they appear in the file.
This program works by scanning the file and accumulating all the data for output into hash %data. At the same time it builds a hash %headers that keeps the position at which each header appears in the data for each ID value.
Once the file has been scanned, the collected headers are sorted by finding the first ID for each pair that includes information for both headers. The sort order for that pair within the complete set must be the same as the order they appeared in the data for that ID, so it's just a matter of comparing the two position values using <=>.
Once a sorted set of headers has been created, the %data hash is dumped, accessing the complete list of values for each ID using a hash slice.
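As a minimal illustration of that hash slice (made-up values, not part of the program below; missing keys come back as undef, which Text::CSV writes as an empty field):

my %data    = (num_integer => 100, string_acc => 'My BankAcc');
my @headers = ('num_integer', 'num_double', 'string_acc');
my @row     = @data{@headers};    # (100, undef, 'My BankAcc')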
Update 2: Now that I realise the sheer size of your data, I can see that my second attempt was also flawed, as it tried to read all of the information into memory before outputting it. That isn't going to work unless you have a monster machine with about 1TB of memory!
You may get some mileage from this version. It scans twice through the file, the first time to read the data so that the full set of header names can be created and ordered, then again to read the data for each ID and output it.
Let me know if it's not working for you, as there's still things I can do to make it more memory-efficient.
use strict;
use warnings;
use 5.010;

use Text::CSV;
use Fcntl 'SEEK_SET';

my $csv = Text::CSV->new;

open my $fh, '<', $ARGV[0] or die qq{Unable to open "$ARGV[0]" for input: $!};

my %headers = ();
my $last_id;
my $header_num;
my $num_ids;

# First pass: record the position of each header within each ID's block
while (my $row = $csv->getline($fh)) {
    next if $. == 1;
    my ($key, $val, $id) = @$row;
    unless (defined $last_id and $id eq $last_id) {
        ++$num_ids;
        $header_num = 0;
        $last_id = $id;
        print STDERR "Processing ID $id\n";
    }
    $headers{$key}[$num_ids - 1] = ++$header_num;
}

sub by_position {
    for my $id (0 .. $num_ids - 1) {
        my ($posa, $posb) = map $headers{$_}[$id], our $a, our $b;
        return $posa <=> $posb if $posa and $posb;
    }
    0;
}

my @headers = sort by_position keys %headers;
%headers = ();

print STDERR "List of headers complete\n";

# Second pass: output one row per ID, using a hash slice over the full header list
seek $fh, 0, SEEK_SET;
$. = 0;

$csv->combine('ID', @headers);
print $csv->string, "\n";

my %data = ();
$last_id = undef;

while () {    # loops until the "last" below fires at end of file
    my $row = $csv->getline($fh);
    next if $. == 1;
    if (not defined $row or defined $last_id and $last_id ne $row->[2]) {
        $csv->combine($last_id, @data{@headers});
        print $csv->string, "\n";
        %data = ();
    }
    last unless defined $row;
    my ($key, $val, $id) = @$row;
    $data{$key} = $val;
    $last_id = $id;
}
output
ID,num_integer,num_double,string_country,string_address,num_double2,string_acc,string_award,string_bankname
53,100,,"US (United States)","FORT WORTH",546.0,"My BankAcc",SILVER,"JPMORGAN CHASE BANK, NA, TX"
17,61,34.32,,,200.541,"Your BankAcc",GOLD,"CHASE BANK"
23,36,78.0,"CA (Canada)","VAN COUVER",,"Her BankAcc",,
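A hypothetical invocation, assuming the program above is saved as flatten.pl:

perl flatten.pl input.csv > output.csv

The progress messages go to STDERR, so the redirected output.csv contains only the CSV rows.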
Use Text::CSV from CPAN:
#!/usr/bin/env perl

use strict;
use warnings;

# --------------------------------------
use charnames qw( :full :short );
use English qw( -no_match_vars );  # Avoids regex performance penalty

use Text::CSV;

my $col_csv     = Text::CSV->new();
my $id_attr_csv = Text::CSV->new({ eol => "\n" });

$col_csv->column_names( $col_csv->getline( *DATA ));

while ( my $row = $col_csv->getline_hr( *DATA )) {
    # do all the keys, but skip ID
    for my $attribute ( keys %$row ) {
        next if $attribute eq 'ID';
        $id_attr_csv->print( *STDOUT, [ $attribute, $row->{$attribute}, $row->{ID} ] );
    }
}
__DATA__
ID,num_integer,num_double,string_country,string_address,num_double2,string_acc,string_award,string_bankname
53,100,,US (United States),FORT WORTH,546.0,My BankAcc,SILVER,"JPMORGAN CHASE BANK, NA, TX"
17,61,34.32,,,200.541,Your BankAcc,GOLD,CHASE BANK
23,36,78.0,CA (Canada),VAN COUVER,,Her BankAcc,,