Checking errors in my program - input

I'm trying to make some changes to my dictionary counter in Python, but I'm not making any progress so far. I want my code to show the number of different words.
This is what I have so far:
# import sys module in order to access command line arguments later
import sys
# create an empty dictionary
dicWordCount = {}
# read all words from the file and put them into
#'dicWordCount' one by one,
# then count the occurrence of each word

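You can use the Counter class from the collections module:
from collections import Counter
q = Counter(fileSource.read().split())  # maps each word to its number of occurrences
total = sum(q.values())  # total number of words
different = len(q)  # number of different words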
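As a minimal sketch of the Counter approach, assuming the text is already in memory (the `text` string below is made up for illustration; in the original program it would come from `fileSource.read()`), the number of different words is simply the number of keys:

```python
from collections import Counter

# Sample text standing in for fileSource.read() (an assumption for this sketch).
text = "the quick brown fox jumps over the lazy dog the end"

counts = Counter(text.split())       # maps each word to its number of occurrences
total_words = sum(counts.values())   # every occurrence is counted
different_words = len(counts)        # one key per distinct word

print("Total words:", total_words)
print("Different words:", different_words)
```

Because a Counter is a dict subclass, the sorted printing loop from the original program should work on it unchanged.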

For your first problem, add one variable for the total word count and one for the number of different words: wordCount = 0 and differentWords = 0. Inside the loop that reads the file, put wordCount += 1 at the top, and put differentWords += 1 in the branch of the if statement that handles a word seen for the first time. Print both variables at the end of the program.
For your second problem, add the condition if len(strKey) > 4: around the print in your output loop.
Here is a full example:
import sys
fileSource = open(sys.argv[1], "rt")
dicWordCount = {}
wordCount = 0
differentWords = 0
for strWord in fileSource.read().split():
    wordCount += 1
    if strWord not in dicWordCount:
        dicWordCount[strWord] = 1
        differentWords += 1
    else:
        dicWordCount[strWord] += 1
for strKey in sorted(dicWordCount, key=dicWordCount.get, reverse=True):
    if len(strKey) > 4:  # only print words longer than four characters
        print(strKey, dicWordCount[strKey])
print("Total words: %s\nDifferent Words: %s" % (wordCount, differentWords))

For your first question, you can use a set to count the number of different words (assuming the words are separated by single spaces):
text = 'apple boy cat dog elephant fox'
different_word_count = len(set(text.split(' ')))
For your second question, using a dictionary to record the count of each word is fine.
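The single-space assumption matters in practice. The sketch below uses a made-up sample string with a double space and a newline to show how `split(' ')` and `split()` differ when the result is fed to a set:

```python
# Made-up sample with a repeated word, a double space and a newline.
text = "apple boy  cat apple\ndog"

# split(' ') keeps the empty string between the two spaces
# and does not break on the newline.
naive = set(text.split(' '))

# split() with no argument splits on any run of whitespace.
robust = set(text.split())

print(len(naive))   # counts '' and 'apple\ndog' as "words"
print(len(robust))  # counts only apple, boy, cat, dog
```

For text read from a file, `split()` without arguments is usually the safer choice.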

How about this?
# number of unique words
unique_words = len(dicWordCount)
# total number of words
total_words = 0
for k, v in dicWordCount.items():
    total_words += v
print(total_words)
You don't need separate counter variables, since you're already using a dictionary: the number of keys is the number of unique words, and the total word count is the sum of the values (which are the per-word counts).

Related

How do you detect blank lines in Fortran?

Given an input that looks like the following:
123
456
789

42
23
1337

3117
I want to iterate over this file in chunks separated by blank lines in Fortran (any version is fine). For example, let's say I wanted to take the average of each chunk (e.g. mean(123, 456, 789), then mean(42, 23, 1337), then mean(3117)).
I've tried iterating through the file normally (e.g. READ), reading in each line as a string and then converting to an int and doing whatever math I want to do on each chunk. The trouble here is that Fortran "helpfully" ignores blank lines in my text file, so when I try to compare against the empty string to check for a blank line, I never actually get a .True. on that comparison.
I feel like I'm missing something basic here. Since this is typical functionality in every other modern language, I'd be surprised if Fortran didn't somehow have it.
If you're using so-called "list-directed" input (format = '*'), Fortran handles spaces, commas, and blank lines specially.
To your point, the read statement accepts a BLANK specifier:
read(iunit,'(i10)',blank="ZERO",err=1,end=2) array
You can set:
blank="ZERO", which returns a valid zero value if a blank is found;
blank="NULL", the default behavior, which skips blanks or returns an error depending on the input format.
If all your input values are positive, you could use blank="ZERO" and then use the location of zero values to process your data.
EDIT: As #vladimir-f has correctly pointed out, you have blanks not only between lines but also after the end of the numbers in most lines, so this strategy will not work.
You can instead load everything into an array, and process it afterwards:
program array_with_blanks
    integer :: ierr,num,iunit
    integer, allocatable :: array(:)
    open(newunit=iunit,file='stackoverflow',form='formatted',iostat=ierr)
    allocate(array(0))
    do
        ! an all-blank line read with the i10 edit descriptor yields 0
        read(iunit,'(i10)',iostat=ierr) num
        if (is_iostat_end(ierr)) then
            exit
        else
            array = [array,num]
        endif
    end do
    close(iunit)
    print *, array
end program
Just read each line as a character string (but note Francescalus's comment on the format), then read from that string as an internal file.
program stuff
    implicit none
    integer io, n, value, sum
    character (len=1000) line
    n = 0
    sum = 0
    io = 0
    open( 42, file="stuff.txt" )
    do while( io == 0 )
        read( 42, "( a )", iostat = io ) line
        if ( io /= 0 .or. line == "" ) then
            if ( n > 0 ) print *, ( sum + 0.0 ) / n
            n = 0
            sum = 0
        else
            read( line, * ) value
            n = n + 1
            sum = sum + value
        end if
    end do
    close( 42 )
end program stuff
456.000000
467.333344
3117.00000
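For readers more comfortable with Python, the same "read a line, then check whether it is blank" logic can be sketched as follows. This is an illustrative translation, not part of the original answer: `chunk_means` and `sample` are names made up here, and the sample assumes blank lines separate the three chunks described in the question.

```python
def chunk_means(lines):
    """Return the mean of each run of consecutive non-blank lines."""
    means = []
    chunk = []
    for line in lines:
        if line.strip():                 # non-blank line: collect its value
            chunk.append(int(line))
        elif chunk:                      # blank line: close the current chunk
            means.append(sum(chunk) / len(chunk))
            chunk = []
    if chunk:                            # flush the last chunk at end of input
        means.append(sum(chunk) / len(chunk))
    return means


# Hypothetical in-memory copy of the input file, blank lines included.
sample = ["123", "456", "789", "", "42", "23", "1337", "", "3117"]
print(chunk_means(sample))
```

As in the Fortran version, a chunk is closed either by a blank line or by the end of the input, so the final chunk is not lost when the file does not end with a blank line.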

Confused Beginner learning Python

I am working on a problem in Python and don't understand the answer.
for number in range(1, 10):
    if number % 2 == 0:
        print(number)
The answer to this problem is 2, 4, 6, 8.
Can anyone explain this answer?
range is a built-in in Python that generates a sequence of integers. For example:
r = range(3)
returns an iterable object, range(0, 3), which generates the integers from 0 to 3-1 (that is, 2). To see the elements in it, you can loop over it:
for i in r:
    print(i)
# prints the numbers from 0 to 3-1
Or wrap it in a list:
list(range(3))  # returns [0, 1, 2]
range takes up to three parameters: start, end and, optionally, step. start and end are the lower and upper bounds of the sequence. In the example above only one integer was given, so range uses 0 as start and 3 as end. In general, range(start, end[, step]) generates integers as start, start+1, ..., end-1; for the example above that is 0, 0+1, ..., 3-1.
If you give both start and end, range generates the integers from start up to but not including end. For example:
for i in range(3, 8):
    print(i)  # prints the numbers from 3 to 8-1
If you also give the third parameter, step (1 by default), range adds that number to get each next element of the sequence:
list(range(3, 8))     # same as list(range(3, 8, 1)); returns [3, 4, 5, 6, 7], generated as 3, 3+1, (3+1)+1, ...
list(range(3, 8, 2))  # returns [3, 5, 7], generated as 3, 3+2, (3+2)+2, ...
So, coming to your question:
for number in range(1, 10):
    if number % 2 == 0:
        print(number)
This code tells Python to loop over the integers from 1 to 9 and print the ones that are divisible by 2 (number % 2 is the remainder after dividing by 2), which prints 2, 4, 6, 8.
Hope this helped you :)
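To see each step separately, here is a small sketch that first materializes the range and then applies the same even-number test; the variable names `numbers` and `evens` are just for illustration:

```python
# range(1, 10) yields 1 through 9; the stop value 10 is excluded.
numbers = list(range(1, 10))
print(numbers)

# number % 2 is the remainder after dividing by 2,
# so it equals 0 only for even numbers.
evens = [number for number in numbers if number % 2 == 0]
print(evens)

# Starting at 2 and stepping by 2 produces the same values without the test.
print(list(range(2, 10, 2)))
```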

Iterate on OrientRecord object

I am trying to increment twice in a loop and print the OrientRecord Objects using Python.
Following is my code -
for items in iteritems:
    x = items.oRecordData
    print (x['attribute1'])
    y=(next(items)).oRecordData #Here is the error
    print (y['attribute2'])
Here, iteritems is a list of OrientRecord objects. I have to print attributes of two consecutive objects in one loop.
I am getting the following error -
TypeError: 'OrientRecord' object is not an iterator
Try a different approach, indexing the list directly:
for i in range(0,len(iteritems),2):
    x = iteritems[i].oRecordData
    print (x['attribute1'])
    y = iteritems[i+1].oRecordData
    print (y['attribute2'])
Here range(0, len(iteritems), 2) starts at 0 and advances in steps of 2.
However, this works properly only if the number of records is even; otherwise the last iteration raises:
IndexError: list index out of range
I hope this helps.
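If an odd number of records is possible, one way to avoid the IndexError is to pull pairs from a single iterator with zip. The sketch below is only an illustration: `FakeRecord` is a hypothetical stand-in that mimics nothing but the `oRecordData` attribute, and the attribute values are made up.

```python
class FakeRecord:
    """Hypothetical stand-in for an OrientRecord; only oRecordData is mimicked."""

    def __init__(self, data):
        self.oRecordData = data


# Five records on purpose, to exercise the odd-length case.
iteritems = [FakeRecord({"attribute1": i, "attribute2": i * 10}) for i in range(5)]

pairs = []
it = iter(iteritems)
# zip(it, it) takes two consecutive items from the same iterator on each step
# and stops as soon as a full pair can no longer be formed.
for first, second in zip(it, it):
    pairs.append((first.oRecordData["attribute1"], second.oRecordData["attribute2"]))

print(pairs)
```

With this pattern a trailing unpaired record is silently skipped rather than raising; if it must be processed too, `itertools.zip_longest` can pad the last pair with `None`.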

Python while loop index error

I've made a reverse function. It reverses the sentence, but it generates an IndexError.
The program appends the last word of s to rev,
then deletes that word (s[-1]).
s = "This is awesome"
def Reverse1(s):
    s = s.split(" ") #reverses the word instead of letters
    rev = []
    while True:
        rev.append (s[-1])
        del s[-1]
    print (rev)
    return
Reverse1(s)
It raises the IndexError because it tries to continue when s is empty,
so I think the problem is the while loop statement.
Any ideas?
You need to stop the while loop once s is empty. You can use something like this:
while len(s) > 0:
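A complete version of the function with a terminating loop might look like the sketch below. The name `reverse_words` and the use of `pop()` are choices made for this example, not the asker's code:

```python
def reverse_words(sentence):
    """Return the words of the sentence in reverse order."""
    words = sentence.split(" ")
    rev = []
    while words:                  # an empty list is falsy, so the loop stops by itself
        rev.append(words.pop())   # pop() removes and returns the last word
    return rev


print(reverse_words("This is awesome"))
```

Returning the list instead of printing it inside the function makes the result reusable, for example with `" ".join(...)` to rebuild a sentence.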

How to load 2D array from a text(csv) file into Octave?

Consider the following text(csv) file:
1, Some text
2, More text
3, Text with comma, more text
How to load the data into a 2D array in Octave? The number can go into the first column, and all text to the right of the first comma (including other commas) goes into the second text column.
If necessary, I can replace the first comma with a different delimiter character.
AFAIK you cannot put strings of different sizes into an array. You need to create a so-called cell array.
A possible way to read the data from your question stored in a file Test.txt into a cell array is
t1 = textread("Test.txt", "%s", "delimiter", "\n");
for i = 1:length(t1)
    j = findstr(t1{i}, ",")(1);
    T{i,1} = t1{i}(1:j - 1);
    T{i,2} = strtrim(t1{i}(j + 1:end));
end
Now
T{3,1} gives you 3 and
T{3,2} gives you Text with comma, more text.
After many long hours of searching and debugging, here's how I got it to work on Octave 3.2.4, using | as the delimiter (instead of a comma).
The data file now looks like:
1|Some text
2|More text
3|Text with comma, more text
Here's how to call it: data = load_data('data/data_file.csv', NUMBER_OF_LINES);
Limitation: You need to know how many lines you want to get. If you want to get all, then you will need to write a function to count the number of lines in the file in order to initialize the cell_array. It's all very clunky and primitive. So much for "high level languages like Octave".
Note: After the unpleasant exercise of getting this to work, it seems that Octave is not very useful unless you enjoy wasting your time writing code to do the simplest things. Better choices seem to be R, Python, or C#/Java with a Machine Learning or Matrix library.
function all_messages = load_data(filename, NUMBER_OF_LINES)
    fid = fopen(filename, "r");
    all_messages = cell (NUMBER_OF_LINES, 2 );
    counter = 1;
    line = fgetl(fid);
    while line != -1
        separator_index = index(line, '|');
        all_messages {counter, 1} = substr(line, 1, separator_index - 1); % Up to the separator
        all_messages {counter, 2} = substr(line, separator_index + 1, length(line) - separator_index); % After the separator
        counter++;
        line = fgetl(fid);
    endwhile
    fprintf("Processed %i lines.\n", counter -1);
    fclose(fid);
end
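Since Python is mentioned above as an alternative, here is a rough sketch of the same "split on the first delimiter only" idea in Python. It is an illustration rather than a drop-in replacement: `parse_line` and the in-memory `lines` list are made up for the example, and the original comma-delimited format is assumed.

```python
def parse_line(line):
    """Split a line at its first comma into (number, text)."""
    # partition() splits at the first occurrence only,
    # so later commas stay inside the text column.
    number, _, text = line.partition(",")
    return int(number), text.strip()


# Hypothetical in-memory copy of the file from the question.
lines = ["1, Some text", "2, More text", "3, Text with comma, more text"]
rows = [parse_line(line) for line in lines]

for number, text in rows:
    print(number, "|", text)
```

Because the rows are built in a list, the number of lines does not need to be known in advance.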