Printing out a binary search tree with slashes - binary-search-tree

http://pastebin.com/dN9a9xfs
That's my code to print out the elements of a binary search tree. The goal is to display it in level order, with slashes connecting the parent to each child. So for instance, the sequence 15 3 16 2 1 4 19 17 28 31 12 14 11 0 would display after execution as:
15
/ \
3 16
/ \ \
2 4 19
/ \ / \
1 12 17 28
/ / \ \
0 11 14 31
I've been working on it for a long time now, but I just can't seem to get the spacing/indentation right. I know I wrote the proper algorithm for displaying the nodes in the proper order, but the slashes are just off. This is the result of my code as is: http://imgur.com/sz8l1
I know I'm so close to the answer, since my display is not that far off from what I need, and I have a feeling it's a really simple solution, but for some reason I just seem to get it right.

I'm out of time for now, but here's a quick version. I did not read your code (don't know C++), so I don't know how close our solutions are.
I changed the output format slightly. Instead of / for the left node, I used | so I didn't have to worry about left spacing at all.
15
| \
3 16
|\ \
2 4 19
| \ | \
1 | 17 28
| | \
0 12 31
| \
11 14
Here's the code. I hope you're able to take what you need from it. There are definitely some Pythonisms which I hope map to what you're using. The main idea is to treat each row of numbers as a map of position to node object, and at each level, sort the map by key and print them to the console iteratively based on their assigned position. Then generate a new map with positions relative to their parents in the previous level. If there's a collision, generate a fake node to bump the real node down a line.
from collections import namedtuple
# simple node representation. sorry for the mess, but it does represent the
# tree example you gave.
Node = namedtuple('Node', ('label', 'left', 'right'))
def makenode(n, left=None, right=None):
return Node(str(n), left, right)
root = makenode(
15,
makenode(
3,
makenode(2, makenode(1, makenode(0))),
makenode(4, None, makenode(12, makenode(11), makenode(14)))),
makenode(16, None, makenode(19, makenode(17),
makenode(28, None, makenode(31)))))
# takes a dict of {line position: node} and returns a list of lines to print
def print_levels(print_items, lines=None):
if lines is None:
lines = []
if not print_items:
return lines
# working position - where we are in the line
pos = 0
# line of text containing node labels
new_nodes_line = []
# line of text containing slashes
new_slashes_line = []
# args for recursive call
next_items = {}
# sort dictionary by key and put them in a list of pairs of (position,
# node)
sorted_pos_and_node = [
(k, print_items[k]) for k in sorted(print_items.keys())]
for position, node in sorted_pos_and_node:
# add leading whitespace
while len(new_nodes_line) < position:
new_nodes_line.append(' ')
while len(new_slashes_line) < position:
new_slashes_line.append(' ')
# update working position
pos = position
# add node label to string, as separate characters so list length
# matches string length
new_nodes_line.extend(list(node.label))
# add left child if any
if node.left is not None:
# if we're close to overlapping another node, push that node down
# by adding a parent with label '|' which will make it look like a
# line dropping down
for collision in [pos - i for i in range(3)]:
if collision in next_items:
next_items[collision] = makenode(
'|', next_items[collision])
# add the slash and the node to the appropriate places
new_slashes_line.append('|')
next_items[position] = node.left
else:
new_slashes_line.append(' ')
# update working position
len_num = len(node.label)
pos += len_num
# add some more whitespace
while len(new_slashes_line) < position + len_num:
new_slashes_line.append(' ')
# and take care of the right child
if node.right is not None:
new_slashes_line.append('\\')
next_items[position + len_num + 1] = node.right
else:
new_slashes_line.append(' ')
# concatenate each line's components and append them to the list
lines.append(''.join(new_nodes_line))
lines.append(''.join(new_slashes_line))
# do it again!
return print_levels(next_items, lines)
lines = print_levels({0: root})
print '\n'.join(lines)

Related

Sorting/Optimization problem with rows in a pandas dataframe [duplicate]

So if I was given a sorted list/array i.e. [1,6,8,15,40], the size of the array, and the requested number..
How would you find the minimum number of values required from that list to sum to the requested number?
For example given the array [1,6,8,15,40], I requested the number 23, it would take 2 values from the list (8 and 15) to equal 23. The function would then return 2 (# of values). Furthermore, there are an unlimited number of 1s in the array (so you the function will always return a value)
Any help is appreciated
The NP-complete subset-sum problem trivially reduces to your problem: given a set S of integers and a target value s, we construct set S' having values (n+1) xk for each xk in S and set the target equal to (n+1) s. If there's a subset of the original set S summing to s, then there will be a subset of size at most n in the new set summing to (n+1) s, and such a set cannot involve extra 1s. If there is no such subset, then the subset produced as an answer must contain at least n+1 elements since it needs enough 1s to get to a multiple of n+1.
So, the problem will not admit any polynomial-time solution without a revolution in computing. With that disclaimer out of the way, you can consider some pseudopolynomial-time solutions to the problem which work well in practice if the maximum size of the set is small.
Here's a Python algorithm that will do this:
import functools
S = [1, 6, 8, 15, 40] # must contain only positive integers
#functools.lru_cache(maxsize=None) # memoizing decorator
def min_subset(k, s):
# returns the minimum size of a subset of S[:k] summing to s, including any extra 1s needed to get there
best = s # use all ones
for i, j in enumerate(S[:k]):
if j <= s:
sz = min_subset(i, s-j)+1
if sz < best: best = sz
return best
print min_subset(len(S), 23) # prints 2
This is tractable even for fairly large lists (I tested a random list of n=50 elements), provided their values are bounded. With S = [random.randint(1, 500) for _ in xrange(50)], min_subset(len(S), 8489) takes less than 10 seconds to run.
There may be a simpler solution, but if your lists are sufficiently short, you can just try every set of values, i.e.:
1 --> Not 23
6 --> Not 23
...
1 + 6 = 7 --> Not 23
1 + 8 = 9 --> Not 23
...
1 + 40 = 41 --> Not 23
6 + 8 = 14 --> Not 23
...
8 + 15 = 23 --> Oh look, it's 23, and we added 2 values
If you know your list is sorted, you can skip some tests, since if 6 + 20 > 23, then there's no need to test 6 + 40.

Pandas not consistently skipping input number of rows for skiprows argument?

I am using Pandas to organize CSV files to later plot with matplotlib. First I create a Pandas dataframe to find the line containing 'Pt'. This is what I search for to use as my header line. header
Then I save the index of this line and apply it to the skiprow argument when creating the new dataframe which I will use.
Oddly, depending on the file format, even though the correct index is found, the wrong line shows up as the header. For example, note how in Pandas line 54 has 'Pt" right after the tab:
correct index on first file
The dataframe comes out correctly here.
correct dataframe on first file
For another file, line 44 is correctly recognized with having 'Pt'.
correct index on second file
But the dataframe includes line 43 as the header!
incorrect dataframe on second file
I have tried setting header=0, header=none. Am I missing something?
Here is the code
entire_df = pd.read_csv(file_path, header=None)
print(entire_df.head(60))
header_idx = -1
for index, row in entire_df.iterrows(): # find line with desired header
if any(row.str.contains('Pt')):
print("Yes! I have pt!")
print("Header index is: " + str(index))
print("row contains:")
print(entire_df.loc[[index]])
header_idx = index # correct index obtained!
break
df = pd.read_csv(file_path, delimiter='\t', skiprows=header_idx, header=0) # use line index to exclude extra information above
print(df.head())
Here are sections of the two files that give different results. They are saved as .dta files. I cannot share the entire files.
file1 (properly made dataframe)
FRAMEWORKVERSION QUANT 7.07 Framework Version
INSTRUMENTVERSION LABEL 4.32 Instrument Version
CURVE TABLE 16875
Pt T Vf Im Vu Pwr Sig Ach Temp IERange Over
# s V A V W V V deg C # bits
0 0.1 3.49916E+000 -1.40364E-002 0.00000E+000 -4.91157E-002 -4.22328E-001 0.00000E+000 1.41995E+003 11 ...........
1 0.2 3.49439E+000 -1.40305E-002 0.00000E+000 -4.90282E-002 -4.22322E-001 0.00000E+000 1.41995E+003 11 ...........
2 0.3 3.49147E+000 -1.40258E-002 0.00000E+000 -4.89705E-002 -4.22322E-001
file2 (dataframe with wrong header)
FRAMEWORKVERSION QUANT 7.07 Framework Version
INSTRUMENTVERSION LABEL 4.32 Instrument Version
CURVE TABLE 18
Pt T Vf Vm Ach Over Temp
# s V vs. Ref. V V bits deg C
0 2.00833 3.69429E+000 3.69429E+000 0.00000E+000 ........... 1419.95
1 4.01667 3.69428E+000 3.69352E+000 0.00000E+000 ........... 1419.95
2 6.025 3.69419E+000 3.69284E+000 0.00000E+000 ........... 1419.95
3 8.03333 3.69394E+000 3.69211E+000 0.00000E+000 ........... 1419.95
Help would be much appreciated.
You should pay attention to your indentation levels. Your code block in which you want to set the header_idx depending on your if any(row.str.contains('Pt')) condition has the same intendation level as the if statement, which means it is executed at each iteration of the for loop, and not just when the condition is met.
for index, row in entire_df.iterrows():
if any(row.str.contains('Pt')):
[...]
header_idx = index
Adapt the indentation like that to put the assignment under the control of the if statement:
for index, row in entire_df.iterrows():
if any(row.str.contains('Pt')):
[...]
header_idx = index

Binding a scalar to a sigilless variable (Perl 6)

Let me start by saying that I understand that what I'm asking about in the title is dubious practice (as explained here), but my lack of understanding concerns the syntax involved.
When I first tried to bind a scalar to a sigilless symbol, I did this:
my \a = $(3);
thinking that $(...) would package the Int 3 in a Scalar (as seemingly suggested in the documentation), which would then be bound to symbol a. This doesn't seem to work though: the Scalar is nowhere to be found (a.VAR.WHAT returns (Int), not (Scalar)).
In the above-referenced post, raiph mentions that the desired binding can be performed using a different syntax:
my \a = $ = 3;
which works. Given the result, I suspect that the statement can be phrased equivalently, though less concisely, as: my \a = (my $ = 3), which I could then understand.
That leaves the question: why does the attempt with $(...) not work, and what does it do instead?
What $(…) does is turn a value into an item.
(A value in a scalar variable ($a) also gets marked as being an item)
say flat (1,2, (3,4) );
# (1 2 3 4)
say flat (1,2, $((3,4)) );
# (1 2 (3 4))
say flat (1,2, item((3,4)) );
# (1 2 (3 4))
Basically it is there to prevent a value from flattening. The reason for its existence is that Perl 6 does not flatten lists as much as most other languages, and sometimes you need a little more control over flattening.
The following only sort-of does what you want it to do
my \a = $ = 3;
A bare $ is an anonymous state variable.
my \a = (state $) = 3;
The problem shows up when you run that same bit of code more than once.
sub foo ( $init ) {
my \a = $ = $init; # my \a = (state $) = $init;
(^10).map: {
sleep 0.1;
++a
}
}
.say for await (start foo(0)), (start foo(42));
# (43 44 45 46 47 48 49 50 51 52)
# (53 54 55 56 57 58 59 60 61 62)
# If foo(42) beat out foo(0) instead it would result in:
# (1 2 3 4 5 6 7 8 9 10)
# (11 12 13 14 15 16 17 18 19 20)
Note that variable is shared between calls.
The first Promise halts at the sleep call, and then the second sets the state variable before the first runs ++a.
If you use my $ instead, it now works properly.
sub foo ( $init ) {
my \a = my $ = $init;
(^10).map: {
sleep 0.1;
++a
}
}
.say for await (start foo(0)), (start foo(42));
# (1 2 3 4 5 6 7 8 9 10)
# (43 44 45 46 47 48 49 50 51 52)
The thing is that sigiless “variables” aren't really variables (they don't vary), they are more akin to lexically scoped (non)constants.
constant \foo = (1..10).pick; # only pick one value and never change it
say foo;
for ^5 {
my \foo = (1..10).pick; # pick a new one each time through
say foo;
}
Basically the whole point of them is to be as close as possible to referring to the value you assign to it. (Static Single Assignment)
# these work basically the same
-> \a {…}
-> \a is raw {…}
-> $a is raw {…}
# as do these
my \a = $i;
my \a := $i;
my $a := $i;
Note that above I wrote the following:
my \a = (state $) = 3;
Normally in the declaration of a state var, the assignment only happens the first time the code gets run. Bare $ doesn't have a declaration as such, so I had to prevent that behaviour by putting the declaration in parens.
# bare $
for (5 ... 1) {
my \a = $ = $_; # set each time through the loop
say a *= 2; # 15 12 9 6 3
}
# state in parens
for (5 ... 1) {
my \a = (state $) = $_; # set each time through the loop
say a *= 2; # 15 12 9 6 3
}
# normal state declaration
for (5 ... 1) {
my \a = state $ = $_; # set it only on the first time through the loop
say a *= 2; # 15 45 135 405 1215
}
Sigilless variables are not actually variables, they are more of an alias, that is, they are not containers but bind to the values they get on the right hand side.
my \a = $(3);
say a.WHAT; # OUTPUT: «(Int)␤»
say a.VAR.WHAT; # OUTPUT: «(Int)␤»
Here, by doing $(3) you are actually putting in scalar context what is already in scalar context:
my \a = 3; say a.WHAT; say a.VAR.WHAT; # OUTPUT: «(Int)␤(Int)␤»
However, the second form in your question does something different. You're binding to an anonymous variable, which is a container:
my \a = $ = 3;
say a.WHAT; # OUTPUT: «(Int)␤»
say a.VAR.WHAT;# OUTPUT: «(Scalar)␤»
In the first case, a was an alias for 3 (or $(3), which is the same); in the second, a is an alias for $, which is a container, whose value is 3. This last case is equivalent to:
my $anon = 3; say $anon.WHAT; say $anon.VAR.WHAT; # OUTPUT: «(Int)␤(Scalar)␤»
(If you have some suggestion on how to improve the documentation, I'd be happy to follow up on it)

Code folding in RStudio: Creating hierarchy in the code

I'm writing R scripts in RStudio and I use the code folding a lot. I found that you can see the hierarchy of the folding by pressing cmd + shift + O. This is super helpful.
# to my dear love ---------------------------------------------------------
2+2
# yo man ====
x.2 = function (x) {x+2}
### I do love potatoes ####
See the result by pressing cmd + shift + O.
I don't understand how this is working because when I write the code below, I can create a subsection without text but not when there is text in it (using # ==== but not # yo man ====).
# to my dear love ---------------------------------------------------------
2+2
# ====
# yo man ====
### I do love potatoes ####
x.2 = function (x) {x+2}
data = "here is some data"
See the result by pressing cmd + shift + O.
You can see that under # to my dear love --------------------------------------------------------- everything under is shifted to the right! This is cool!
The question is thus, how could it be possible to create a hierarchy of sections that include text in it?
Is it a peculiar package or Emac that is doing this? How can I create subsections, with text, and see the hierarchy in the cmd + shift + O box?
How can I down shift a section (going to a higher section (say section 2) to a lower section (section 1), by decreasing the visual hierarchy in the right box?
As per Chris's answer subheaders within functions
RStudio Code Folding hierarchy only works within function definitions and if-else structures. For example:
# Section 1 ----
a <- 1
testfunct1 <- function () {
# sect in function=====
b <- 2
c <- 3
}
# Section 2 #####
d <- 4
# Section 3 =======
e <- 5
testfunct2 <- function () {
# sect in function 2 =====
f <- 6
testsubfunct2_1 <- function () {
# sect in subfunction 2_1 -----
if (a == 1) {
# section in if ----
g < 7
} else {
# section in else ----
h = 8
}
}
}
# Section 4 ####
j <- 9
Produces this outline:
I don't know why the if-else section labels do not line up.
Just discovered I could use various special characters across the top of my keyboard in combination with hyphens to produce a hierarchical look to the code section ToC. I chose the asterisk for this example, but you can use anything from the special character keys across the top to produce this look.

gnuplot store one number from data file into variable

OSX v10.6.8 and Gnuplot v4.4
I have a data file with 8 columns. I would like to take the first value from the 6th column and make it the title. Here's what I have so far:
#m1 m2 q taua taue K avgPeriodRatio time
#1 2 3 4 5 6 7 8
K = #read in data here
graph(n) = sprintf("K=%.2e",n)
set term aqua enhanced font "Times-Roman,18"
plot file using 1:3 title graph(K)
And here is what the first few rows of my data file looks like:
1.00e-07 1.00e-07 1.00e+00 1.00e+05 1.00e+04 1.00e+01 1.310 12070.00
1.11e-06 1.00e-07 9.02e-02 1.00e+05 1.00e+04 1.00e+01 1.310 12070.00
2.12e-06 1.00e-07 4.72e-02 1.00e+05 1.00e+04 1.00e+01 1.310 12070.00
3.13e-06 1.00e-07 3.20e-02 1.00e+05 1.00e+04 1.00e+01 1.310 12090.00
I don't know how to correctly read in the data or if this is even the right way to go about this.
EDIT #1
Ok, thanks to mgilson I now have
#m1 m2 q taua taue K avgPeriodRatio time
#1 2 3 4 5 6 7 8
set term aqua enhanced font "Times-Roman,18"
K = "`head -1 datafile | awk '{print $6}'`"
print K+0
graph(n) = sprintf("K=%.2e",n)
plot file using 1:3 title graph(K)
but I get the error: Non-numeric string found where a numeric expression was expected
EDIT #2
file = "testPlot.txt"
K = "`head -1 file | awk '{print $6}'`"
K=K+0 #Cast K to a floating point number #this is line 9
graph(n) = sprintf("K=%.2e",n)
plot file using 1:3 title graph(K)
This gives the error--> head: file: No such file or directory
"testPlot.gnu", line 9: Non-numeric string found where a numeric expression was expected
You have a few options...
FIRST OPTION:
use columnheader
plot file using 1:3 title columnheader(6)
I haven't tested it, but this may prevent the first row from actually being plotted.
SECOND OPTION:
use an external utility to get the title:
TITLE="`head -1 datafile | awk '{print $6}'`"
plot 'datafile' using 1:3 title TITLE
If the variable is numeric, and you want to reformat it, in gnuplot, you can cast strings to a numeric type (integer/float) by adding 0 to them (e.g).
print "36.5"+0
Then you can format it with sprintf or gprintf as you're already doing.
It's weird that there is no float function. (int will work if you want to cast to an integer).
EDIT
The script below worked for me (when I pasted your example data into a file called "datafile"):
K = "`head -1 datafile | awk '{print $6}'`"
K=K+0 #Cast K to a floating point number
graph(n) = sprintf("K=%.2e",n)
plot "datafile" using 1:3 title graph(K)
EDIT 2 (addresses comments below)
To expand a variable in backtics, you'll need macros:
set macro
file="mydatafile.txt"
#THE ORDER OF QUOTES (' and ") IS CRUCIAL HERE.
cmd='"`head -1 ' . file . ' | awk ''{print $6}''`"'
# . is string concatenation. (this string has 3 pieces)
# to get a single quote inside a single quoted string
# you need to double. e.g. 'a''b' yields the string a'b
data=#cmd
To address your question 2, it is a good idea to familiarize yourself with shell utilities -- sed and awk can both do it. I'll show a combination of head/tail:
cmd='"`head -2 ' . file . ' | tail -1 | awk ''{print $6}''`"'
should work.
EDIT 3
I recently learned that in gnuplot, system is a function as well as a command. To do the above without all the backtic gymnastics,
data=system("head -1 " . file . " | awk '{print $6}'")
Wow, much better.
This is a very old question, but here's a nice way to get access to a single value anywhere in your data file and save it as a gnuplot-accessible variable:
set term unknown #This terminal will not attempt to plot anything
plot 'myfile.dat' index 0 every 1:1:0:0:0:0 u (var=$1):1
The index number allows you to address a particular dataset (separated by two carriage returns), while every allows you to specify a particular line.
The colon-separated numbers after every should be of the form 1:1:<line_number>:<block_number>:<line_number>:<block_number>, where the line number is the line with the the block (starting from 0), and the block number is the number of the block (separated by a single carriage return, again starting from 0). The first and second numbers say plot every 1 lines and every one data block, and the third and fourth say start from line <line_number> and block <block_number>. The fifth and sixth say where to stop. This allows you to select a single line anywhere in your data file.
The last part of the plot command assigns the value in a particular column (in this case, column 1) to your variable (var). There needs to be two values to a plot command, so I chose column 1 to plot against my variable assignment statement.
Here is a less 'awk'-ward solution which assigns the value from the first row and 6th column of the file 'Data.txt' to the variable x16.
set table
# Syntax: u 0:($0==RowIndex?(VariableName=$ColumnIndex):$ColumnIndex)
# RowIndex starts with 0, ColumnIndex starts with 1
# 'u' is an abbreviation for the 'using' modifier
plot 'Data.txt' u 0:($0==0?(x16=$6):$6)
unset table
A more general example for storing several values is given below:
# Load data from file to variable
# Gnuplot can only access the data via the "plot" command
set table
# Syntax: u 0:($0==RowIndex?(VariableName=$ColumnIndex):$ColumnIndex)
# RowIndex starts with 0, ColumnIndex starts with 1
# 'u' is an abbreviation for the 'using' modifier
# Example: Assign all values according to: xij = Data33[i,j]; i,j = 1,2,3
plot 'Data33.txt' u 0:($0==0?(x11=$1):$1),\
'' u 0:($0==0?(x12=$2):$2),\
'' u 0:($0==0?(x13=$3):$3),\
'' u 0:($0==1?(x21=$1):$1),\
'' u 0:($0==1?(x22=$2):$2),\
'' u 0:($0==1?(x23=$3):$3),\
'' u 0:($0==2?(x31=$1):$1),\
'' u 0:($0==2?(x32=$2):$2),\
'' u 0:($0==2?(x33=$3):$3)
unset table
print x11, x12, x13 # Data from first row
print x21, x22, x23 # Data from second row
print x31, x32, x33 # Data from third row