overlapping and enumerate in python - numpy

Given two files, we need to find all the numbers that appear in both files (the overlapping data) and are prime.
To check whether a number is prime, we need to write a function called check_prime and use it.
My code:
import math
def is_prime(num):
    if num == 1:
        return False
    if num == 2:
        return True
    for i in range(2,int(math.sqrt(num))+1):
        if num % i == 0:
            return False
    return True
one = []
theFile = open("One.txt", "r")
array = []
for val in theFile:
    array.append(val)
print(array)
theFile = open("Two.txt", "r")
array1 = []
for val in theFile:
    array1.append(val)
print(array1)
for i in array:
    one.append(i)
print(one)

You are almost there but here are the missing bits in your code:
1) Reading from the files
To avoid writing the same code twice to open both files, and to handle more than two files, we can loop through the file names instead of opening each file separately.
So instead of:
theFile = open("One.txt", "r")
#[...]
theFile = open("Two.txt", "r")
We could use:
file_names = ['One.txt', 'Two.txt']
for i in file_names:
    theFile = open(i, "r")
2) Extracting the numbers from the files
Next, you extract the values from the text files. With your code, the numbers in each file end up as a list containing a single string of comma-separated numbers.
So there are two things we need to do:
a) extract the string from the list
b) split that string on the commas to get each number-string.
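If you do:
for val in theFile:
    array.append(val)
you append each whole line of the file to array as a single string, so for a one-line file array ends up as a list containing one string and the numbers are never separated.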
In your code, you create two lists, array and array1, but then only loop through array, so your one list only receives the data from the first file and array1 is never used. Nothing to worry about: I also sometimes confuse array[1] and array1 when I name several lists ending in 1, 2, 3.
So instead we could do:
for val in theFile:
    array = array + val.split(",")
We use + because we want all the number-strings in one single list, not one list containing several lists. If you replace it with array.append(val.split(",")), you'll see that you get a list of lists (and array = array.append(...) would even set array to None, because append modifies the list in place and returns None). What we want is all the number-strings from all files in one flat list, so it is better to concatenate the lists; array.extend(val.split(",")) does the same thing in place.
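To see the difference, here is a small comparison of append and extend on two lines of sample text. The strings are made up and simply stand in for lines read from a file:

```python
# hypothetical file contents: each line is read as one string, newline included
lines = ["2,3,4\n", "5,6\n"]

nested = []
for val in lines:
    nested.append(val.split(","))  # append adds the whole list as ONE element

flat = []
for val in lines:
    flat.extend(val.split(","))  # extend adds each element separately, like flat = flat + ...

print(nested)  # a list of two lists
print(flat)    # one flat list of number-strings
# int() ignores surrounding whitespace, so the trailing "\n" is harmless
print([int(s) for s in flat])  # [2, 3, 4, 5, 6]
```

extend and array = array + ... produce the same flat list; extend just avoids building a new list on every line.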
Now that you have your array list that contains all string-numbers from your text files, you need to transform them into integers so you can run your excellent is_prime function.
So we create a second list that I've called array2 where we will store the string-numbers as integers and not as strings.
The final output that you want is a list of the unique prime numbers in both text files, so we check that the number is not already in array2 before appending it.
array2 = []
for nbrs in array:
    if int(nbrs) not in array2:
        array2.append(int(nbrs))
Almost there! You've already done the rest of the work from there on:
You need to pass all the unique numbers in array2 to your is_prime function to check whether they are prime or not.
We store the result of the is_prime function (True or False) into the list is_nbr_prime.
is_nbr_prime = []
for i in array2:
    is_nbr_prime.append(is_prime(i))
Now, because you want to return the numbers themselves, we need to find the indexes of the prime numbers to extract them from array2, which are the indexes of the True values in is_nbr_prime:
idx = [i for i, val in enumerate(is_nbr_prime) if val] #we get the index of the values that are True in is_nbr_prime list
unique_prime_nbrs = [array2[i] for i in idx] # we pass the index to array2 containing the list of unique numbers to take out only prime numbers.
That's it: you have your unique prime numbers in the list unique_prime_nbrs.
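Because the True/False flags in is_nbr_prime line up index by index with array2, the enumerate step gives the same result as pairing the two lists with zip or filtering array2 directly. A small self-contained check, using made-up sample numbers and the is_prime function from the question with a num < 2 guard added (an assumption, so that 0 and negative numbers are rejected):

```python
import math

def is_prime(num):
    # 0, 1 and negative numbers are not prime
    if num < 2:
        return False
    for i in range(2, int(math.sqrt(num)) + 1):
        if num % i == 0:
            return False
    return True

array2 = [4, 5, 7, 9, 13, 1]  # hypothetical unique numbers
is_nbr_prime = [is_prime(n) for n in array2]

# enumerate: collect the indexes of the True flags, then look the numbers up
idx = [i for i, val in enumerate(is_nbr_prime) if val]
via_enumerate = [array2[i] for i in idx]

# zip: walk the numbers and their flags side by side
via_zip = [n for n, flag in zip(array2, is_nbr_prime) if flag]

# direct filter: no separate flag list needed
via_filter = [n for n in array2 if is_prime(n)]

print(via_enumerate)  # [5, 7, 13]
```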
If we put all the steps together into two functions, the final code is:
import math
def is_prime(num):
    # 0, 1 and negative numbers are not prime
    if num < 2:
        return False
    if num == 2:
        return True
    for i in range(2,int(math.sqrt(num))+1):
        if num % i == 0:
            return False
    return True
def check_prime(file_names):
    array = []
    array2 = []
    for i in file_names:
        # "with" closes each file once it has been read
        with open(i, "r") as theFile:
            for val in theFile:
                array = array + val.split(",")
    for nbrs in array:
        if int(nbrs) not in array2:
            array2.append(int(nbrs))
    is_nbr_prime = []
    for i in array2:
        is_nbr_prime.append(is_prime(i))
    idx = [i for i, val in enumerate(is_nbr_prime) if val]
    unique_prime_nbrs = [array2[i] for i in idx]
    return unique_prime_nbrs
To call the function, we need to pass a list of file names, for instance:
file_names = ['One.txt', 'Two.txt']
unique_prime_nbrs = check_prime(file_names)
print(unique_prime_nbrs)
[5, 7, 13, 17, 19, 23]

There are a bunch of things you need to do:
When reading the text from the input files, convert it to integers before storing it anywhere.
Instead of making array a list, make it a set. This makes membership tests much faster (constant time on average, instead of scanning the whole list).
Before storing values from the first file in array, check whether each one is prime, using the is_prime function you wrote.
When reading the integers from the second file, test whether each value is already in array before adding it to array1. There is no need to check for primality, because array already contains only primes.
Finally, before outputting the values from array1, convert them back to strings and use the join string method to join them with commas.
So, get to it.
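The steps above can be sketched as follows. This is a minimal sketch under a few assumptions: each file holds comma-separated integers (possibly over several lines), the helper names read_ints and overlapping_primes are hypothetical, and the function accepts any iterables of lines (an open file works, and so does a list of strings), so it can be tried without creating files. is_prime is the question's function with a num < 2 guard added.

```python
import math

def is_prime(num):
    if num < 2:
        return False
    for i in range(2, int(math.sqrt(num)) + 1):
        if num % i == 0:
            return False
    return True

def read_ints(lines):
    # split every line on commas and convert each non-empty field to int
    return [int(field) for line in lines for field in line.split(",") if field.strip()]

def overlapping_primes(lines_one, lines_two):
    # primes from the first file, kept in a set for fast membership tests
    array = {n for n in read_ints(lines_one) if is_prime(n)}
    # values from the second file that are also in the first set (so already prime)
    array1 = [n for n in read_ints(lines_two) if n in array]
    # convert back to strings and join them with commas
    return ",".join(str(n) for n in array1)

print(overlapping_primes(["3,4,5,7\n", "11,13\n"], ["5,6,7,8,13\n"]))  # 5,7,13
```

With real files you would pass the open file objects, e.g. inside with open("One.txt") as f1, open("Two.txt") as f2: call overlapping_primes(f1, f2). As in the steps above, duplicates in the second file are not removed.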

Related

How do you store and load a single element numpy array?

When I store and load a numpy array I have no issues unless the array only has a single element. When this happens, I am able to store and retrieve it, but the resulting type is not an array.
I was expecting to be able to retrieve the single element the same way I do with multi-element arrays.
import numpy as np
# set up the lists
listW = ["The Dog","The Cat"]
list = ["The Pig"]
# convert lists to arrays
arrayW = np.array(listW)
array = np.array(list)
# Displaying the original array
print('ArrayW:', arrayW)
print('Array:', array)
print('ArrayW[0]:', arrayW[0])
print('Array[0]:', array[0])
# storage files
fileW = "C:\\Test\\testW"
file = "C:\\Test\\test"
# Saving the original array
np.savetxt(fileW, arrayW, fmt='%s', delimiter=',')
np.savetxt(file, array, fmt='%s', delimiter=',')
# Loading the array from the saved files
testW = np.loadtxt(fileW, dtype=object, delimiter=',')
test = np.loadtxt(file, dtype=object, delimiter=',')
# print out results
print("testW:", testW)
print("test:", test)
print("testW[0]:", testW[0])
print("test[0]:", test[0])
When you run this, you get the following output:
ArrayW: ['The Dog' 'The Cat']
Array: ['The Pig']
ArrayW[0]: The Dog
Array[0]: The Pig
testW: ['The Dog' 'The Cat']
test: The Pig
testW[0]: The Dog
IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed
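The error occurs because the loaded value test is a 0-dimensional array rather than a 1-D array: by default np.loadtxt squeezes away dimensions of length one, so a file containing a single value comes back with no dimensions, and test[0] fails. This works if the stored array has more than one value.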
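np.loadtxt has an ndmin parameter that sets the minimum number of dimensions of the returned array; passing ndmin=1 keeps a single-value file as a 1-D array. A minimal sketch, assuming a temporary file in place of the C:\Test paths and dtype=str in place of dtype=object (both substitutions are only for portability). Calling np.atleast_1d on the already-loaded value is an alternative.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.txt")  # temporary file instead of C:\Test\test
    np.savetxt(path, np.array(["The Pig"]), fmt="%s", delimiter=",")

    # default ndmin=0: the single value is squeezed to a 0-d array
    squeezed = np.loadtxt(path, dtype=str, delimiter=",")
    # ndmin=1: the result keeps at least one dimension
    kept = np.loadtxt(path, dtype=str, delimiter=",", ndmin=1)

print(squeezed.ndim, kept.ndim)  # 0 1
print(kept[0])                   # The Pig
print(np.atleast_1d(squeezed)[0])  # The Pig
```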

Change number format in Excel using names of headers - openpyxl [duplicate]

I have an Excel (.xlsx) file that I'm trying to parse, row by row. I have a header (first row) that has a bunch of column titles like School, First Name, Last Name, Email, etc.
When I loop through each row, I want to be able to say something like:
row['School']
and get back the value of the cell in the current row and the column with 'School' as its title.
I've looked through the OpenPyXL docs but can't seem to find anything terribly helpful.
Any suggestions?
I'm not incredibly familiar with OpenPyXL, but as far as I can tell it doesn't have any kind of dict reader/iterator helper. However, it's fairly easy to iterate over the worksheet rows, as well as to create a dict from two lists of values.
def iter_worksheet(worksheet):
    # It's necessary to get a reference to the generator, as
    # `worksheet.rows` returns a new iterator on each access.
    rows = worksheet.rows
    # Get the header values as keys and move the iterator to the next item
    keys = [c.value for c in next(rows)]
    for row in rows:
        values = [c.value for c in row]
        yield dict(zip(keys, values))
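A quick way to try iter_worksheet without an existing file is to build a small workbook in memory. The sheet contents below are made up, and this assumes a recent openpyxl in which worksheet.rows returns a generator:

```python
from openpyxl import Workbook

def iter_worksheet(worksheet):
    rows = worksheet.rows  # one generator, reused below
    keys = [c.value for c in next(rows)]  # first row holds the column titles
    for row in rows:
        yield dict(zip(keys, [c.value for c in row]))

# hypothetical sheet with a header row and two data rows
wb = Workbook()
ws = wb.active
ws.append(["School", "First Name", "Last Name", "Email"])
ws.append(["Central High", "Ada", "Lovelace", "ada@example.com"])
ws.append(["North High", "Alan", "Turing", "alan@example.com"])

for row in iter_worksheet(ws):
    print(row["School"], row["First Name"])
```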
Excel sheets are far more flexible than CSV files so it makes little sense to have something like DictReader.
Just create an auxiliary dictionary from the relevant column titles.
If you have columns like "School", "First Name", "Last Name", "Email", you can create the dictionary like this:
values = [cell.value for cell in ws[1]]  # the header row
keys = dict((value, idx) for (idx, value) in enumerate(values))
for row in ws.iter_rows(min_row=2):  # skip the header row
    school = row[keys['School']].value
I wrote DictReader based on openpyxl. Save the second listing to file 'excel.py' and use it as csv.DictReader. See usage example in the first listing.
with open('example01.xlsx', 'rb') as source_data:
    from excel import DictReader
    for row in DictReader(source_data, sheet_index=0):
        print(row)
excel.py:
__all__ = ['DictReader']
from openpyxl import load_workbook
from openpyxl.cell import Cell
Cell.__init__.__defaults__ = (None, None, '', None)  # Change the default value for the Cell from None to '' the same way as in csv.DictReader
class DictReader(object):
    def __init__(self, f, sheet_index,
                 fieldnames=None, restkey=None, restval=None):
        self._fieldnames = fieldnames  # list of keys for the dict
        self.restkey = restkey  # key to catch long rows
        self.restval = restval  # default value for short rows
        self.reader = load_workbook(f, data_only=True).worksheets[sheet_index].iter_rows(values_only=True)
        self.line_num = 0
    def __iter__(self):
        return self
    @property
    def fieldnames(self):
        if self._fieldnames is None:
            try:
                self._fieldnames = next(self.reader)
                self.line_num += 1
            except StopIteration:
                pass
        return self._fieldnames
    @fieldnames.setter
    def fieldnames(self, value):
        self._fieldnames = value
    def __next__(self):
        if self.line_num == 0:
            # Used only for its side effect.
            self.fieldnames
        row = next(self.reader)
        self.line_num += 1
        # unlike the basic reader, we prefer not to return blanks,
        # because we will typically wind up with a dict full of None
        # values
        while row == ():
            row = next(self.reader)
        d = dict(zip(self.fieldnames, row))
        lf = len(self.fieldnames)
        lr = len(row)
        if lf < lr:
            d[self.restkey] = row[lf:]
        elif lf > lr:
            for key in self.fieldnames[lr:]:
                d[key] = self.restval
        return d
The following seems to work for me.
header = True
headings = []
for row in ws.rows:
    if header:
        for cell in row:
            headings.append(cell.value)
        header = False
        continue
    rowData = dict(zip(headings, row))
    wantedValue = rowData['myHeading'].value
I was running into the same issue as described above, so I created a simple extension called openpyxl-dictreader that can be installed through pip. It is very similar to the suggestion made by @viktor earlier in this thread.
The package is largely based on source code of Python's native csv.DictReader class. It allows you to select items based on column names using openpyxl. For example:
import openpyxl_dictreader
reader = openpyxl_dictreader.DictReader("names.xlsx", "Sheet1")
for row in reader:
    print(row["First Name"], row["Last Name"])
Putting this here for reference.

Loop over Pandas dataframe to populate list (Python)

I have the following dataframe:
import pandas as pd
action = ['include','exclude','ignore','include', 'exclude', 'exclude','ignore']
names = ['john','michael','joshua','peter','jackson','john', 'erick']
df = pd.DataFrame(list(zip(action,names)), columns = ['action','names'])
I also have a list of starting participants like this:
participants = [['michael','jackson','jeremiah','martin','luis']]
I want to iterate over df['action']. If df['action'] == 'include', add another list to the participants list that includes all previous names and the one in df['names']. So, after the first iteration, participants list should look like this:
participants = [['michael','jackson','jeremiah','martin','luis'],['michael','jackson','jeremiah','martin','luis','john']]
I have managed to achieve this with the following code (I don't know if this part could be improved, although it is not my question):
for i, row in df.iterrows():
    if df.at[i,'action'] == 'include':
        person = [df.at[i,'names']]
        old_list = participants[-1]
        new_list = old_list + person
        participants.append(new_list)
    else:
        pass
The main problem (and my question) is: how do I accomplish the same but remove the name when df['action'] == 'exclude'? So, after the second iteration, I should have this list in participants:
participants = [['michael','jackson','jeremiah','martin','luis'],['michael','jackson','jeremiah','martin','luis','john'],['jackson','jeremiah','martin','luis','john']]
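You can just add an elif to your code. With the remove method you can remove an item by value. Just be careful: your person is a list and not a string, so I access the name by index with [0]. Also note that remove changes the list in place and returns None, so copy the previous list first and append the copy:
elif df.at[i, 'action'] == 'exclude':
    person = [df.at[i, 'names']]
    new_list = participants[-1].copy()
    new_list.remove(person[0])
    participants.append(new_list)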
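The whole include/exclude loop can also be written as a small function over the two columns. A minimal sketch: build_participants is a hypothetical name, and it iterates with zip instead of iterrows (a swapped-in choice), so it works on plain lists as well as on DataFrame columns:

```python
def build_participants(actions, names, start):
    participants = [list(start)]
    for action, name in zip(actions, names):
        if action == "include":
            # + builds a new list, so earlier snapshots stay unchanged
            participants.append(participants[-1] + [name])
        elif action == "exclude":
            # copy first: list.remove() works in place and returns None
            new_list = participants[-1].copy()
            new_list.remove(name)  # raises ValueError if name is not present
            participants.append(new_list)
        # any other action ("ignore") leaves participants unchanged
    return participants

action = ['include', 'exclude', 'ignore', 'include', 'exclude', 'exclude', 'ignore']
names = ['john', 'michael', 'joshua', 'peter', 'jackson', 'john', 'erick']
result = build_participants(action, names, ['michael', 'jackson', 'jeremiah', 'martin', 'luis'])
print(result[2])  # ['jackson', 'jeremiah', 'martin', 'luis', 'john']
```

With the DataFrame from the question, the call would be build_participants(df['action'], df['names'], participants[0]).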

Extract array elements from another array indices

I have a numpy array, a:
a = np.array([[-21.78878256, 97.37484004, -11.54228119],
[ -5.72592375, 99.04189958, 3.22814204],
[-19.80795922, 95.99377136, -10.64537733]])
I have another array, b:
b = np.array([[ 54.64642121, 64.5172014, 44.39991983],
[ 9.62420892, 95.14361441, 0.67014312],
[ 49.55036427, 66.25136632, 40.38778238]])
I want to extract minimum value indices from the array, b.
ixs = [[2],
[2],
[2]]
Then I want to extract elements from the array a using the indices ixs.
The expected answer is:
result = [[-11.54228119]
[3.22814204]
[-10.64537733]]
I tried:
ixs = np.argmin(b, axis=1)
print(ixs)
[2,2,2]
result = np.take(a, ixs)
print(result)
Nope!
Any ideas are welcomed
You can use
result = a[np.arange(a.shape[0]), ixs]
np.arange(a.shape[0]) generates one row index for each row, and ixs holds the column index for each row, so each (row, column) pair picks exactly the element you want. Note that the result is 1-D, with shape (3,).
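If you want the (3, 1) column shape shown in the expected answer, np.take_along_axis (available in NumPy 1.15 and later) does the same row-by-row lookup while keeping the indexed axis. A minimal sketch comparing it with the fancy-indexing answer, using the arrays from the question:

```python
import numpy as np

a = np.array([[-21.78878256, 97.37484004, -11.54228119],
              [-5.72592375, 99.04189958, 3.22814204],
              [-19.80795922, 95.99377136, -10.64537733]])
b = np.array([[54.64642121, 64.5172014, 44.39991983],
              [9.62420892, 95.14361441, 0.67014312],
              [49.55036427, 66.25136632, 40.38778238]])

ixs = np.argmin(b, axis=1)  # column index of the minimum in each row of b

# fancy indexing: one (row, column) pair per row, result shape (3,)
flat = a[np.arange(a.shape[0]), ixs]

# take_along_axis needs an index array with the same number of dimensions as a,
# so ixs is turned into a column; the result keeps the (3, 1) shape
col = np.take_along_axis(a, ixs[:, np.newaxis], axis=1)

print(col.shape)  # (3, 1)
```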
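You can try using the code below:
np.take(a, ixs, axis=1)[:,0]
np.take(a, ixs, axis=1) on its own creates a 3 by 3 array (element [i, j] is a[i, ixs[j]]), and [:,0] slices out its first column. This gives the right answer here because every entry of ixs is 2; if the indices differed between rows, you would need the diagonal of that array (.diagonal()) rather than its first column.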
>>> np.take(a, ixs, axis = 1)
array([[-11.54228119, -11.54228119, -11.54228119],
[ 3.22814204, 3.22814204, 3.22814204],
[-10.64537733, -10.64537733, -10.64537733]])

structure of while loop

I am trying to use a while loop in Python to show an error message while there is a slash character ('/') in the user input. The code shows the error message when a fraction is entered and requests a second input. The problem occurs when the second input differs in length from the original input, and I do not know how to fix this since my knowledge of Python is limited. Any help is appreciated!
size = getInput('Size(in): ')
charcount = len(size)
for i in range(0,charcount):
    if size[i] == '/':
        while size[i] == '/':
            getWarningReply('Please enter size as a decimal', 'OKAY')
            size = getInput('Size(in): ')
    elif size[i] == '.':
        #Convert size input from string to list, then back to string because strings are immutable whereas lists are not
        sizechars = list(size)
        sizechars[i] = 'P'
        size = "".join(sizechars)
That is not a good way to go about doing what you want, because if the new size is shorter than the original length, charcount, you can easily go out of range.
I'm by no means a Python master, but a better way to do this is to wrap the entire thing in a while loop instead of nesting a while loop within the for loop:
not_decimal = True
while not_decimal:
    found_slash = False
    size = getInput('Size(in): ')
    charcount = len(size)
    for i in range(0, charcount):
        if size[i] == '/':
            print('Please enter size as a decimal.')
            found_slash = True
            break
        elif size[i] == '.':
            #Convert size input from string to list, then back to string because strings are immutable whereas lists are not
            sizechars = list(size)
            sizechars[i] = 'P'
            size = "".join(sizechars)
    if not found_slash:
        not_decimal = False
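The same idea can be written more compactly by checking the whole string with the in operator, which avoids indexing by position, so the length of each new input no longer matters. A minimal sketch: ask_size is a hypothetical name, and get_input/warn stand in for the getInput/getWarningReply functions from the question, whose exact behavior is not shown (getWarningReply also takes an 'OKAY' argument there, so it would need a small wrapper such as lambda msg: getWarningReply(msg, 'OKAY')). str.replace is swapped in for the list-based replacement; like the loop, it replaces every '.' with 'P'.

```python
def ask_size(get_input, warn):
    # keep asking until the answer contains no '/'
    while True:
        size = get_input('Size(in): ')
        if '/' in size:
            warn('Please enter size as a decimal')
            continue
        # replace every '.' with 'P', as the original loop does character by character
        return size.replace('.', 'P')

# demo with canned answers instead of real user input
answers = iter(['1/2', '0.5'])
print(ask_size(lambda prompt: next(answers), print))  # prints the warning once, then 0P5
```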