I'm looking at the print interface in the gota dataframe here:
https://github.com/kniren/gota/blob/master/dataframe/dataframe.go#L99
I see the default value shortCols = true is set there.
When I print the DataFrame, how can I override this value so that it prints with shortCols = false when I call the following?
fmt.Println(fil)
E.g., I'd like to print all columns rather than just the first five; the call above currently produces:
[31x16] DataFrame
valA valB valC valD valE ...
0: 578 8.30 491 7959 1.040000 ...
1: 577 8.30 291 7975 2.050000 ...
2: 466 16.7 179 6470 3.210000 ...
3: 592 9.03 194 8212 4.040000 ...
Without modifying the library there is nothing you can do.
If modifying the library is an option you have a few possibilities:
1. Change the name of the internal formatting function so it is exported, and call that. This is a bit more work, since you need to explicitly call a function every time you want to print a DataFrame, but it is a reasonable option if you want to make minimal changes to the way the library works.
Basically, change print to Print on lines 101 and 104 (I think those are the only occurrences of that function; if not, the compiler will be happy to point out the others :P).
2. Change the arguments to df.print in the definition of df.String (sketched after this list). This is positively trivial, but it has the effect of changing the default behaviour, which may or may not be a good thing.
For this option, just change line 101 to return df.print(true, false, true, true, 10, 70, "DataFrame") or whatever combination fits your needs.
3. Add a new method for each printing format you want, and explicitly call these new methods. This is more work than #1 or #2, but some people may prefer it.
Personally, I would go with #1, but your question makes #2 sound more like what you want.
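For concreteness, here is option #2 as a sketch, not the library's actual code: it mirrors the call quoted above, and the meaning of each positional flag should be double-checked against the definition of print in dataframe.go.

// In dataframe/dataframe.go, replace the body of String so that the
// default fmt.Println output no longer truncates columns. The second
// argument is assumed to be shortCols, per the call quoted above.
func (df DataFrame) String() string {
    return df.print(true, false, true, true, 10, 70, "DataFrame")
}

After rebuilding against the modified library, fmt.Println(fil) prints every column.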
Since I could not find the implementations of the individual methods behind DataFrame.interpolate()'s "method" parameter, I am asking here:
How does pandas' DataFrame.interpolate() work with respect to the number of rows it considers: is it just the row before the NaNs and the row right after?
Or is it the whole DataFrame (and how would that work at 1 million rows)?
If you already know where to look, feel free to share a link to the source code, since https://github.com/pandas-dev/pandas/blob/06d230151e6f18fdb8139d09abf539867a8cd481/pandas/core/frame.py#L10916 doesn't include the declarations of the "method" options (for example "polynomial").
I found the following in core/missing.py.
My interpretation is that the interpolation is done either with np.interp or, if the method is only available in scipy, with _interpolate_scipy_wrapper, a function I could not locate; a reasonable guess is that it is a wrapper around scipy.
if method in NP_METHODS:
    # np.interp requires sorted X values, #21037
    indexer = np.argsort(indices[valid])
    yvalues[invalid] = np.interp(
        indices[invalid], indices[valid][indexer], yvalues[valid][indexer]
    )
else:
    yvalues[invalid] = _interpolate_scipy_wrapper(
        indices[valid],
        yvalues[valid],
        indices[invalid],
        method=method,
        fill_value=fill_value,
        bounds_error=bounds_error,
        order=order,
        **kwargs,
    )
yvalues[preserve_nans] = np.nan
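To see the row question answered in practice, here is a small self-contained check (the numbers are made up): with the default linear method the NP_METHODS branch runs, and each NaN is determined by the valid points around it, effectively just the one before and the one after; a method like "polynomial" goes through _interpolate_scipy_wrapper and requires scipy to be installed.

import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan, 9.0])

# Default method='linear' takes the np.interp branch: the NaNs at positions
# 1 and 2 are filled from the surrounding valid points 1.0 and 4.0 only,
# no matter how long the Series is.
print(s.interpolate())

# Non-numpy methods are dispatched to the scipy wrapper; 'order' is the
# polynomial degree here (requires scipy).
print(s.interpolate(method='polynomial', order=2))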
I am currently exploring coroutines in Python, but I have difficulty determining when to use them instead of normal subroutines. I will try to explain my problem with the help of an example.
The task is to iterate over tabular data and format the content of the row according to its entry in the type field. When formatted, the result is written to the same output file.
index | type | time | content
-----------------------------
0 | A | 4 | ...
1 | B | 6 | ...
2 | C | 9 | ...
3 | B | 11 | ...
...
Normally, I would check the type and write some sort of switch (an if/elif chain in Python) that delegates the data to a specific subroutine (== function), like so:
outfile = open('test.txt', 'w')
for row in infile:
    if row.type == 'A':
        format_a(row.content, outfile)   # subroutine that formats and writes data of type A
    elif row.type == 'B':
        format_b(row.content, outfile)   # same for type B...
    elif row.type == 'C':
        format_c(row.content, outfile)   # ... and type C
    else:
        raise ValueError(f'unknown type: {row.type}')  # handle unknown type
outfile.close()
The question is: would I get any benefit from realizing this with coroutines? I don't think so, and let me explain why: if I have determined the type of a row, I would pass the content to the respective coroutine. As long as I do this, the other coroutines and the calling function are paused. The data is formatted and written to the file and passes control back to the calling function which gets the next row, etc. This procedure repeats until I run out of rows. Therefore, this is exactly what the workflow using subroutines would look like.
One pro for using coroutines here would be if I had to keep track of some state. Maybe I am interested in the time difference to the last row per type. In this case, the coroutine function for B would save the time of its first call (6). When it is called the second time, it retrieves the value (11) and can calculate the difference (11 - 6 = 5). This would be way harder to do with subroutines.
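For illustration, here is a sketch of that state-keeping variant using generator-based coroutines (the Row shape and the sample rows are invented for the example):

from collections import namedtuple
import sys

Row = namedtuple('Row', ['index', 'type', 'time', 'content'])

def delta_formatter(outfile):
    # Coroutine: formats rows of one type and remembers the previous time.
    last_time = None
    while True:
        row = yield                    # suspend until the caller sends a row
        delta = None if last_time is None else row.time - last_time
        outfile.write(f'{row.type}: {row.content} (dt = {delta})\n')
        last_time = row.time

rows = [Row(0, 'A', 4, '...'), Row(1, 'B', 6, '...'),
        Row(2, 'C', 9, '...'), Row(3, 'B', 11, '...')]

formatters = {t: delta_formatter(sys.stdout) for t in 'ABC'}
for f in formatters.values():
    next(f)                            # prime each coroutine up to its first yield
for row in rows:
    formatters[row.type].send(row)     # B's second call sees dt = 11 - 6 = 5

Each formatter keeps its own last_time across calls without any global bookkeeping, which is exactly the state-tracking advantage described above.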
But is the argument of keeping track of some state the only reason for using coroutines? I am looking for a rule-of-thumb, not a rule that covers every case possible.
I have a question regarding Answer Set Programming: how can I make an existing fact invalid when a default statement is also present in the knowledge base?
For example, there are two persons, seby and andy, and only one of them can drive at a time. The scenario: seby can drive, as seen in Line 3, but let's say that after his license is cancelled he cannot drive anymore, hence we now have Lines 4 to 7; meanwhile, andy has learnt driving, as seen in Line 7. Line 6 states that only one person can drive at a time, besides ensuring that seby and andy are not the same.
1 person(seby).
2 person(andy).
3 drives(seby).
4 drives(seby) :- person(seby), not ab(d(drives(seby))), not -drives(seby).
5 ab(d(drives(seby))).
6 -drives(P) :- drives(P0), person(P), P0 != P.
7 drives(andy).
In the above program, Lines 3 and 7 contradict Line 6, and the Clingo solver (which I use) naturally outputs UNSATISFIABLE.
Having said all this, please don't just say to delete Line 3 and the problem is solved. The intention behind asking this question is to find out whether it is possible to somehow invalidate Line 3 and let Line 4 do its duty.
However, Line 4 can also be written as:
4 drives(P) :- person(P), not ab(d(drives(P))), not -drives(P).
Thanks a lot in advance.
I do not fully understand the problem. Line 3 and line 4 are separate rules; even if line 4's body is false, line 3 would still be true. In other words, line 4 seems redundant.
It seems like you want a choice. I assume ab(d(drives(seby))) denotes that seby has lost their license. In the program below, the fourth line is your constraint that only people with a license may drive, and the fifth line is the choice: by default, andy or seby can drive, but not both. Notice how, in the ground program, the fifth line is equivalent to drives(seby) :- not drives(andy). together with drives(andy) :- not drives(seby). You can also make seby the preferred driver using optimization statements (the choice rule below acts much like an optimization statement).
person(seby).
person(andy).
ab(d(drives(seby))).
:- person(P), ab(d(drives(P))), drives(P).
1{drives(P) : person(P)}1.
In ASP, a fact is unconditionally true: if something is stated as a fact, it must always be true. Therefore the line:
drives(seby).
will always be true.
However, we can get around this by putting the fact into a choice rule.
0{drives(seby)}1.
This line says that an answer set will contain between 0 and 1 instances of drives(seby). This means we can have rules that contradict drives(seby). and the program will still be satisfiable, but we can also have drives(seby). be true.
Both this program:
0{drives(seby)}1.
drives(seby).
And this program:
0{drives(seby)}1.
:- drives(seby).
Are satisfiable.
Most likely you will want drives(seby). to be true if it can be, and false if it can't be. To accomplish this, we need to force Clingo into making drives(seby). true if it can. We can do this by using an optimization.
A naive way to do this is to count how many drives(seby). exist (either 0 or 1) and maximize the count.
We can count the number of drives(seby). with this line:
sebyCount(N) :- N = #count {drives(seby) : drives(seby)}.
N is equal to the number of drives(seby). atoms that hold, which will be either 0 or 1.
And then we can maximize the value of N with this statement:
#maximize {N@1 : sebyCount(N)}.
This maximizes the value of N at priority 1 (the lower the number, the lower the priority), with N drawn from sebyCount(N).
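Putting the pieces together, here is a complete sketch that combines the license constraint from the earlier answer with the choice rule and the optimization above; with the ab fact present, the answer set contains drives(andy), and removing that fact makes drives(seby) optimal:

person(seby). person(andy).
ab(d(drives(seby))).                        % seby's license is cancelled
1{drives(P) : person(P)}1.                  % exactly one person drives
:- person(P), ab(d(drives(P))), drives(P).  % nobody drives without a license
sebyCount(N) :- N = #count{drives(seby) : drives(seby)}.
#maximize{N@1 : sebyCount(N)}.              % prefer seby whenever possible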
As many scripting languages have caller(), I'd like to get the caller's information in Objective-C methods. In particular, I need it in the dealloc method, which is invoked automatically by the runtime, so I cannot pass any arguments to it.
Because Objective-C exceptions carry a stack trace, the caller information must exist somewhere, I guess. How can I get at that information without throwing exceptions?
-(void)dealloc {
// get caller's information and NSLog() it here!
}
You can get the information you want from the backtrace(3) and backtrace_symbols(3) C functions. You might need some jiggery-pokery to make the output look good for the Objective-C case.
Edit: I take it back - backtrace_symbols gave beautiful output here for an Objective-C test program:
0 example 0x0000000109274c77 +[TestClass classMethod] + 55
1 example 0x0000000109274cee -[TestClass instanceMethod] + 46
2 example 0x0000000109274dec main + 140
3 libdyld.dylib 0x00007fff914c37e1 start + 0
0 example 0x0000000109274c77 +[TestClass classMethod] + 55
1 example 0x0000000109274d36 -[TestClass dealloc] + 54
2 example 0x0000000109274e19 main + 185
3 libdyld.dylib 0x00007fff914c37e1 start + 0
I put the backtrace* calls in classMethod and called it from instanceMethod and from dealloc. Seems to work in both cases, no problem.
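For reference, here is a minimal sketch of what that looks like directly inside dealloc (the class name is made up; manual reference counting assumed, so under ARC you would drop the [super dealloc] line):

#import <Foundation/Foundation.h>
#include <execinfo.h>

@interface TestClass : NSObject
@end

@implementation TestClass

- (void)dealloc {
    // Grab up to 32 return addresses from the current call stack...
    void *callstack[32];
    int frames = backtrace(callstack, 32);

    // ...and symbolicate them; this prints lines like the output above.
    char **symbols = backtrace_symbols(callstack, frames);
    for (int i = 0; i < frames; i++) {
        NSLog(@"%s", symbols[i]);
    }
    free(symbols);

    [super dealloc]; // manual reference counting only; omit under ARC
}

@end

On OS X 10.6+ / iOS 4+ you can also get the same symbolicated stack from Foundation directly via [NSThread callStackSymbols], without the C calls.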
Have you considered using dtrace (http://www.mactech.com/articles/mactech/Vol.23/23.11/ExploringLeopardwithDTrace/index.html has some info; googling for "mac dtrace" turns up much more) to introspect your app from the outside, rather than adding things inside it? You can get a ton of information that way, and if it's not enough, you can even add custom static probes inside your app to gather more.
Using self and _cmd?
NSLog(@"%@ %@", self, NSStringFromSelector(_cmd));
I am working on fairly large Mathematica projects, and the problem arises that I have to intermittently check numerical results but want to be able to easily revert to having all my constructs in analytical form.
The code is fairly fluid, so I don't want to use scoping constructs everywhere, as they add overhead. Is there an easy way to identify and clear all assignments that are numerical?
EDIT: I really do know that scoping is the way to do this correctly ;-). However, for my workflow I am really just looking for a dirty trick to nix all numerical assignments after the fact, instead of having the foresight to put down a Block.
If your assignments are on the top level, you can use something like this:
a = 1;
b = c;
d = 3;
e = d + b;
Cases[DownValues[In],
HoldPattern[lhs_ = rhs_?NumericQ] |
HoldPattern[(lhs_ = rhs_?NumericQ;)] :> Unset[lhs],
3]
This will work if you have a sufficient history length $HistoryLength (it defaults to Infinity). Note, however, that in the above example e was assigned 3 + c, and the 3 here was not undone. So the problem is really ambiguous in formulation, because some numbers can make it into other definitions. One way to avoid this is to use SetDelayed for assignments, rather than Set.
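To see both the effect and the caveat just mentioned, evaluate the symbols again after running the Cases expression:

{a, b, d, e}

(* -> {a, c, d, 3 + c} : a and d are symbolic again, but the 3 baked into e remains *)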
Another alternative would be to analyze the names in, say, the Global` context (if that is the context where your symbols live), and then the OwnValues and DownValues of those symbols, in a fashion similar to the above, and remove definitions with purely numerical right-hand sides.
But IMO neither of these approaches is robust. I'd still use scoping constructs and try to isolate numerics. One possibility is to wrap your final code in Block, and assign numerical values inside this Block. This seems a much cleaner approach. The work overhead is minimal: you just have to remember which symbols you want to assign the values to. Block will automatically ensure that outside it, the symbols have no definitions.
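For instance, a minimal sketch of the Block approach (the symbols a and d are invented for the example):

expr = a Sin[d x];  (* stays fully analytical *)

Block[{a = 1, d = 3},
 Plot[expr, {x, 0, 2 Pi}]
]

(* outside the Block, a and d are undefined again, so expr is symbolic *)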
EDIT
Yet another possibility is to use local rules. For example, one could define rule[a] = a -> 1; rule[d] = d -> 3 instead of the assignments above. You could then apply these rules, extracting them as, say, DownValues[rule][[All, 2]], whenever you want to test with some numerical arguments.
Building on Andrew Moylan's solution, one can construct a Block-like function that takes rules:
SetAttributes[BlockRules, HoldRest]
BlockRules[rules_, expr_] :=
  Block @@ Append[Apply[Set, Hold @ rules, {2}], Unevaluated[expr]]
You can then save your numeric rules in a variable, and use BlockRules[ savedrules, code ], or even define a function that would apply a fixed set of rules, kind of like so:
In[76]:= NumericCheck =
Function[body, BlockRules[{a -> 3, b -> 2`}, body], HoldAll];
In[78]:= a + b // NumericCheck
Out[78]= 5.
EDIT: In response to Timo's comment, it might be possible to use NotebookEvaluate (new in Mathematica 8) to achieve the requested effect.
SetAttributes[BlockRules, HoldRest]
BlockRules[rules_, expr_] :=
  Block @@ Append[Apply[Set, Hold @ rules, {2}], Unevaluated[expr]]
nb = CreateDocument[{ExpressionCell[
Defer[Plot[Sin[a x], {x, 0, 2 Pi}]], "Input"],
ExpressionCell[Defer[Integrate[Sin[a x^2], {x, 0, 2 Pi}]],
"Input"]}];
BlockRules[{a -> 4}, NotebookEvaluate[nb, InsertResults -> True];]
As the result of this evaluation, you get a notebook with your commands evaluated while a was locally set to 4. To take it further, you would have to take the notebook with your code, open a new notebook, evaluate Notebooks[] to identify the notebook of interest, and then do:
BlockRules[variablerules,
  NotebookEvaluate[NotebookPut[NotebookGet[nbobj]],
    InsertResults -> True]]
I hope you can make this idea work.