Python User defined exception - input

Write a program that receives 5 integers and stores them and their cubes in a dictionary. If the number entered is less than 3, raise a user-defined exception NumberTooSmall, and if the number entered is more than 30, then raise a user-defined exception NumberTooBig. Regardless of whether an exception occurs, print the dictionary’s contents at the end.
Sample Input:
4
5
6
7
77
Sample Output:
{'Number too big': 77}
4 : 64
5 : 125
6 : 216
7 : 343
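One possible way to structure this is sketched below. It is a minimal sketch, not a reference solution: the function name `store_cubes`, its `low`/`high` parameters, and the fixed list standing in for five `input()` calls are all assumptions made for illustration, and the output format is only inferred from the sample above.

```python
class NumberTooSmall(Exception):
    """Raised when an entered number is below the lower bound."""


class NumberTooBig(Exception):
    """Raised when an entered number is above the upper bound."""


def store_cubes(numbers, low=3, high=30):
    """Store each number and its cube; stop at the first out-of-range value."""
    cubes = {}
    error = None
    try:
        for n in numbers:
            if n < low:
                raise NumberTooSmall(n)
            if n > high:
                raise NumberTooBig(n)
            cubes[n] = n ** 3
    except NumberTooSmall as exc:
        error = exc
        print({"Number too small": exc.args[0]})
    except NumberTooBig as exc:
        error = exc
        print({"Number too big": exc.args[0]})
    finally:
        # Runs whether or not an exception was raised.
        for number, cube in cubes.items():
            print(f"{number} : {cube}")
    return cubes, error


# Hypothetical fixed list standing in for five input() calls.
cubes, error = store_cubes([4, 5, 6, 7, 77])
```

To read from the keyboard instead, the list could be built with something like `[int(input()) for _ in range(5)]`.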

Related

Deflate: code lengths of > 7 bits for top-level HCLEN?

RFC 1951 specifies that the first level of encoding in a block contains (HCLEN + 4) 3-bit values, which encode the code lengths for the next level of Huffman codes. Since these are 3-bit values, it follows that no code for the next level can be longer than 7 bits (111 in binary).
However, there seem to be corner cases which (at least with the "classical" algorithm to build Huffman codes, using a priority queue) apparently generate codes of 8 bits, which of course cannot be encoded.
An example I came up with is the following (this represents the 19 possible symbols resulting from the RLE encoding, 0-15 plus 16, 17 and 18):
symbol | frequency
-------+----------
0 | 15
1 | 14
2 | 6
3 | 2
4 | 18
5 | 5
6 | 12
7 | 26
8 | 3
9 | 20
10 | 79
11 | 94
12 | 17
13 | 7
14 | 8
15 | 4
16 | 16
17 | 1
18 | 13
According to various online calculators (e.g. https://people.ok.ubc.ca/ylucet/DS/Huffman.html), and also when building the tree by hand, some symbols in the above table (namely 3 and 17) get 8-bit Huffman codes. The resulting tree looks OK to me, with 19 leaf nodes and 18 internal nodes.
So, is there a special way to calculate Huffman codes for use in DEFLATE?
Yes. Deflate uses length-limited Huffman codes. You need either a modified Huffman algorithm that limits the length, or an algorithm that shortens a Huffman code that has exceeded the length. (zlib does the latter.)
In addition to the code lengths code being limited to seven bits, the literal/length and distance codes are limited to 15 bits. It is not at all uncommon to exceed those limits when applying Huffman's algorithm to sets of frequencies encountered during compression.
Note, though, that your example is not a valid or possible set of frequencies for that code. Here is a valid example that results in a 9-bit Huffman code, which would then need to be squashed down to seven bits:
3 0 0 5 5 1 9 31 58 73 59 28 9 1 2 0 6 0 0
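The two frequency sets above can be checked with a small sketch of the classical, unlimited Huffman construction using a priority queue. This only measures code lengths; it is not zlib's implementation and does not perform any length limiting. The function name `huffman_code_lengths` is made up for illustration, and it assumes that the frequencies are listed in symbol order 0 to 18 and that zero-frequency symbols get no code.

```python
import heapq


def huffman_code_lengths(freqs):
    """Return {symbol: code length} for plain (unlimited) Huffman coding."""
    # Heap entries: (weight, unique tie-breaker, symbols in this subtree).
    # Assumption: symbols with frequency 0 are unused and get no code.
    heap = [(f, sym, [sym]) for sym, f in enumerate(freqs) if f > 0]
    heapq.heapify(heap)
    lengths = {entry[2][0]: 0 for entry in heap}
    counter = len(freqs)  # tie-breakers for merged nodes
    while len(heap) > 1:
        w1, _, syms1 = heapq.heappop(heap)
        w2, _, syms2 = heapq.heappop(heap)
        # Every symbol under the new parent moves one level deeper.
        for sym in syms1 + syms2:
            lengths[sym] += 1
        heapq.heappush(heap, (w1 + w2, counter, syms1 + syms2))
        counter += 1
    return lengths


question_freqs = [15, 14, 6, 2, 18, 5, 12, 26, 3, 20,
                  79, 94, 17, 7, 8, 4, 16, 1, 13]
answer_freqs = [3, 0, 0, 5, 5, 1, 9, 31, 58, 73, 59, 28, 9, 1, 2, 0, 6, 0, 0]

question_lengths = huffman_code_lengths(question_freqs)
answer_lengths = huffman_code_lengths(answer_freqs)
print(max(question_lengths.values()))  # longest code for the question's table
print(max(answer_lengths.values()))    # longest code for the answer's example
```

Any maximum above 7 here means the lengths would have to be limited before they could be stored in the 3-bit fields.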

Why does Pandas df.mode() return a zero before the actual modal value?

When I run df.mode() on the below dataframe I get a leading zero before the expected output. Why is that?
df
sample 1 2 3 4 5 6 7 8 9 10
zone run
2 5 14 12 22 23 24 22 23 22 23 23
print(df.iloc[:,3:10].mode(axis=1))
gives
0
zone run
2 5 23
expecting
zone run
2 5 23
pd.Series.mode
Return the mode(s) of the dataset. Always returns Series even if only one value is returned.
So that's how it is by design. A Series must have an index and it will start counting from 0. This ensures that the return type is stable regardless of whether there is only a single mode or multiple values tied for the mode.
So if you take a slice where N values are tied for the mode, the result has the labels 0, ..., N-1, one for each of the N tied values (modal values in sorted order).
df.iloc[:, 4:7]
#sample 5 6 7
#zone run
#2 5 24 22 23
df.iloc[:,4:7].mode(axis=1)
# 0 1 2 # <- 3 values tied for mode so 3 labels
#zone run
#2 5 22 23 24
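The behaviour described above can be reproduced with a small sketch. The frame is rebuilt by hand from the question's printout, so the exact index and column construction is an assumption; selecting column `0` at the end is one way to get a Series of the first mode per row.

```python
import pandas as pd

# Rebuild the one-row frame from the question (construction is assumed).
index = pd.MultiIndex.from_tuples([(2, 5)], names=["zone", "run"])
columns = pd.Index(range(1, 11), name="sample")
df = pd.DataFrame(
    [[14, 12, 22, 23, 24, 22, 23, 22, 23, 23]], index=index, columns=columns
)

single = df.iloc[:, 3:10].mode(axis=1)  # one mode: a single column labelled 0
tied = df.iloc[:, 4:7].mode(axis=1)     # three tied modes: columns 0, 1, 2

print(single)
print(tied)

# Selecting column 0 keeps only the first (smallest) mode for each row.
first_mode = single[0]
print(first_mode)
```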
My thinking is that df.mode returns a DataFrame. When no column labels are given, a DataFrame uses integer positions as column names. Here 0 is assigned because that is where pandas/Python starts counting.
Because it is a DataFrame, one way to change that column name is the .rename(columns=...) method. Hence, to get what you need you can use:
df.iloc[:,3:10].agg('mode', axis=1).reset_index().rename(columns={0:''})
zone run
0 2 5 23

How to get the mode of a column in pandas when several values are tied for the mode

I have a data frame and I'd like to get the mode of a specific column.
I'm using:
freq_mode = df.mode()['my_col'][0]
However I get the error:
ValueError: ('The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()', 'occurred at index my_col')
I'm guessing it's because several values are tied for the mode.
Any one of the modes will do; it doesn't matter which. How can I use any() to pick one of them?
For me, your code works fine with sample data.
If you need to select the first value of the Series returned by mode, use:
freq_mode = df['my_col'].mode().iat[0]
We can see this for a single column:
import pandas as pd
df=pd.DataFrame({"A":[14,4,5,4,1,5],
"B":[5,2,54,3,2,7],
"C":[20,20,7,3,8,7],
"train_label":[7,7,6,6,6,7]})
X=df['train_label'].mode()
print(X)
DataFrame
A B C train_label
0 14 5 20 7
1 4 2 20 7
2 5 54 7 6
3 4 3 3 6
4 1 2 8 6
5 5 7 7 7
Output
0 6
1 7
dtype: int64
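Since the output above contains two tied modes, here is a minimal sketch of picking just one of them with `.iat[0]`. It reuses only the `train_label` column from the example; because `Series.mode()` returns the modes in sorted order, the first entry is the smallest tied value.

```python
import pandas as pd

df = pd.DataFrame({"train_label": [7, 7, 6, 6, 6, 7]})

modes = df["train_label"].mode()  # Series holding every tied mode
freq_mode = modes.iat[0]          # first entry, i.e. the smallest tied mode

print(modes.tolist())
print(freq_mode)
```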

How to Create a CDF out of a PDF in SQL

I have a data table that looks something like the following. id represents an object, bin represents how I am segmenting the data, and percent is how much of the data falls into that bin.
id bin percent
2 8 0.20030698388
2 16 0.14504988488
2 24 0.12356101304
2 32 0.09976976208
2 40 0.09056024558
2 48 0.07137375287
2 56 0.04067536454
2 64 0.03914044512
2 72 0.02916346891
2 80 0.16039907904
3 8 0.36316695352
3 16 0.03958691910
3 24 0.11876075731
3 32 0.13253012048
3 40 0.03098106712
3 48 0.07228915662
3 56 0.07745266781
3 64 0.02581755593
3 72 0.02065404475
3 80 0.11876075731
I am looking for a function to turn this dataset into a CDF, partitioned by id. I have tried cume_dist and percent_rank, but they do not appear to work.
I am facing a similar problem and found this great tutorial for doing exactly that:
https://dwaincsql.com/2015/05/14/excel-in-t-sql-part-2-the-normal-distribution-norm-dist-density-functions/
It tries to rebuild the Excel function NORM.DIST, which gives you the PDF if you set the cumulative flag to FALSE and the CDF if you set it to TRUE. I assumed that CUME_DIST would do the exact same thing in SQL. However, it turns out that CUME_DIST distributes by counting the elements, whereas Excel uses the relative differences in the values.
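For the table in the question, another option is a running sum of percent within each id, which treats the per-bin percentages as the PDF and accumulates them in bin order. The sketch below runs that query through Python's built-in `sqlite3` module so it can be tried without a database server. The table name `pdf` is made up, only the first three bins of each id are loaded, and it assumes an SQLite build with window-function support (3.25 or later); other SQL dialects with window functions accept the same `SUM(...) OVER (...)` expression.

```python
import sqlite3

# First three bins per id from the question's table.
rows = [
    (2, 8, 0.20030698388), (2, 16, 0.14504988488), (2, 24, 0.12356101304),
    (3, 8, 0.36316695352), (3, 16, 0.03958691910), (3, 24, 0.11876075731),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pdf (id INTEGER, bin INTEGER, percent REAL)")
conn.executemany("INSERT INTO pdf VALUES (?, ?, ?)", rows)

# Running total of percent, restarted for every id and ordered by bin.
query = """
    SELECT id, bin,
           SUM(percent) OVER (PARTITION BY id ORDER BY bin) AS cdf
    FROM pdf
    ORDER BY id, bin
"""
cdf_rows = conn.execute(query).fetchall()
for row in cdf_rows:
    print(row)
conn.close()
```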

Why are some of my ranges insane?

I tried parsing a common string depiction of ranges (e.g. 1-9) into actual ranges (e.g. 1 .. 9), but often got weird results when including two digit numbers. For example, 1-10 results in the single value 1 instead of a list of ten values and 11-20 gave me four values (11 10 21 20), half of which aren't even in the expected numerical range:
put get_range_for('1-9');
put get_range_for('1-10');
put get_range_for('11-20');
sub get_range_for ( $string ) {
my ($start, $stop) = $string.split('-');
my @values = ($start .. $stop).flat;
return @values;
}
This prints:
1 2 3 4 5 6 7 8 9
1
11 10 21 20
Instead of the expected:
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20
(I figured this out before posting this question, so I have answered below. Feel free to add your own answer if you'd like to elaborate).
The problem is indeed that .split returns Str rather than Int, which the original answer solves. However, I would rather implement my "get_range_for" like this:
sub get_range_for($string) {
Range.new( |$string.split("-")>>.Int )
}
This would return a Range object rather than an Array. But for iteration (which is what you most likely would use this for), this wouldn't make any difference. Also, for larger ranges the other implementation of "get_range_for" could potentially eat a lot of memory because it vivifies the Range into an Array. This doesn't matter much for "3-10", but it would for "1-10000000".
Note that this implementation uses >>.Int to call the Int method on all values returned from the .split, and then slips them as separate parameters with | to Range.new. This will then also bomb should the .split return 1 value (if it couldn't split) or more than 2 values (if multiple hyphens occurred in the string).
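For comparison, the same idea (convert the pieces to integers, then return a lazy range rather than a materialized list) can be sketched in Python. This is an analogue, not a translation of the Raku code: Python's `range` excludes its end point, so 1 is added to the stop value, and the explicit check for exactly two parts is an assumption about how malformed input should be handled.

```python
def get_range_for(text):
    """Parse a string such as '1-10' into a lazy, inclusive integer range."""
    parts = text.split("-")
    if len(parts) != 2:
        # Zero or multiple hyphens: refuse rather than guess.
        raise ValueError(f"expected 'start-stop', got {text!r}")
    start, stop = (int(part) for part in parts)
    # range() is lazy, so '1-10000000' does not allocate ten million items.
    return range(start, stop + 1)


print(list(get_range_for("1-9")))
print(list(get_range_for("1-10")))
print(list(get_range_for("11-20")))
```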
The result of split is a Str, so you are accidentally creating a range of strings instead of a range of integers. Try converting $start and $stop to Int before creating the range:
put get_range_for('1-9');
put get_range_for('1-10');
put get_range_for('11-20');
sub get_range_for ( $string ) {
my ($start, $stop) = $string.split('-');
my @values = ($start.Int .. $stop.Int).flat; # Simply added .Int here
return @values;
}
Giving you what you expect:
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20