Jython CSV connection

Hi
I have some code, as given below:
def value():
    file = open('C:/Documents and Settings/242481/My Documents/file.csv', 'r')
    for line in file:
        return "<TOTAL>" + line + "</TOTAL>"
When I execute the script, only the first row of the CSV file is returned.
How do I get all the rows of the CSV file in the for loop?
Thanks in advance: Aadith

That's because return exits the function with the first line on the first iteration through the loop.
You could extract the values from line on each iteration of the for loop using regular expressions, but it's a much better idea to just use the csv module instead of writing your own ad-hoc CSV parser. That means you don't have to worry about getting the rules about quoting right, for instance. By way of an example, supposing you want to get the total of all the numbers in the second column, you could do:
import csv

def total_of_second_column():
    total = 0.0
    fp = open('C:/Documents and Settings/242481/My Documents/file.csv')
    try:
        cr = csv.reader(fp)
        for row in cr:
            total += float(row[1])
        return total
    finally:
        fp.close()
... although of course you can do anything arbitrarily complex with the values you find in row[0], row[1], etc. on each iteration of the for loop.
Update: In the comment below you ask "is there any way i can execute the return statement as many times as the number of rows in csv file.?"
It sounds as if you might be looking for the yield keyword here. There's a great explanation of generators and yield in the question The Python yield keyword explained but to give a brief example of how you might use it in your example, you could do:
def two_column_products():
    fp = open('C:/Documents and Settings/242481/My Documents/file.csv')
    try:
        cr = csv.reader(fp)
        for row in cr:
            yield float(row[1]) * float(row[2])
    finally:
        fp.close()

for result in two_column_products():
    print "got result:", result
This will print the product of the second and third column of each row in turn.
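Applied back to the original question, a minimal sketch along the same lines (same file path as above; the generator yields each wrapped line in turn) might look like:
def values():
    # Yield one wrapped line per row instead of returning
    # after the first iteration.
    fp = open('C:/Documents and Settings/242481/My Documents/file.csv')
    try:
        for line in fp:
            yield "<TOTAL>" + line.rstrip('\n') + "</TOTAL>"
    finally:
        fp.close()

for v in values():
    print v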

Related

Getting specific rows in a Powershell variable/array

I hope I'm able to ask my question as simply as possible. I am very new to working with PowerShell.
Now to my question:
I use Invoke-Sqlcmd to run a query, which puts Data in a variable, let's say $Data.
In this case I query for triggers in an SQL Database.
Then I kind of split the array to get more specific information:
$Data2 = $Data | Where {$_.table -like 'dbo.sportswear'}
$Data3 = $Data2 | Where {$_.event -match "Delete"}
So in the end I have a variable with these indexes(?); I'm not sure if they are called indexes:
table
trigger_name
activation
event
type
status
definition
Now all I want is to check something in the definition.
So I create a $Data4 = $Data3.definition, so far so good.
But now I have a big text and I want only the content of 2-3 specific rows.
When I used something like $Data4[1] or $Data4[1..100], I realized that PowerShell treats every character as a line/row.
But when I just write $Data4, it shows me the content nicely formatted, with paragraphs, new lines and so on.
Has anyone an idea how I can get specific rows or lines of my variable?
Thank you all :)
It appears $Data4 is a formatted string. Since it is a single string, any indexed element lookups return single characters (of type System.Char). If you want indexes to return longer substrings, you will need to split your string into multiple strings somehow or come up with a more sophisticated search mechanism.
If we assume the rows you are after are actual lines separated by line feed and/or carriage return, you can just split on those newline characters and use indexes to access your lines:
# Array indexing starts at 0 for line 1. So [1] is line 2.
# Outputs lines 2,3,4
($Data4 -split '\r?\n')[1..3]
# Outputs lines 2,7,20
($Data4 -split '\r?\n')[1,6,19]
-split uses regex to match characters and perform a string split on all matches, resulting in an array of substrings. \r matches a carriage return. \n matches a line feed. ? matches zero or one occurrence of the preceding token (the \r), which is needed in case your line feeds are not preceded by carriage returns.

How to split a CSV file into groups using Pentaho?

I am new to Pentaho and am trying to read a CSV file (which I already did) and create blocks of data based on an identifier.
E.g.:
1|A|B|C
2|D|E|F
8|G|H|I|J|K
4|L|M
1|N|O|P
4|Q|R|S|T
5|U|V|W
I need to split and group this as such:
(each block starts when the first column is equal to '1')
Block a)
1|A|B|C
2|D|E|F
8|G|H|I|J|K
4|L|M
Block b)
1|N|O|P
4|Q|R|S|T
5|U|V|W
E.g. (with a group label added as the first column):
a |1|A|B|C
a |2|D|E|F
a |8|G|H|I|J|K
a |4|L|M
b |1|N|O|P
b |4|Q|R|S|T
b |5|U|V|W
How can this be achieved using Pentaho? Thanks.
I found a similar question but answers don't really help my case
Pentaho Kettle split CSV into multiple records
I think I got the answer.
I created the transformation in this zip; it can transform your "csv" file into rows almost like you described, but I don't know what you intend to do next, so maybe you can give us more details. =)
I'll explain what I did:
1) First, we grab each row's full text with a Text input step.
When you look at the configuration of the Text input step, you'll see I used ';' as the separator, while your input file uses '|', so I'm not splitting columns on the '|' but loading the whole line into one column. Grabbing the row's full text, nothing else.
2) Next we apply a regex eval to separate the ID from the rest of our string.
^(\d+)\|(.*)
Which means: at the beginning of the text I expect one or more digits followed by a pipe, then anything after that. Capture the digits at the beginning of the string into one column and everything after the pipe into another column.
That gives you an output with the digits in one column (the first capture group) and the rest of the line in another (the second capture group).
3) Now what you need is to add a 'sequence' that only goes up when row_id = 1, which I did in a Modified Java Script Value step with the following code:
var sequence;
// if it's the first row, set sequence to 1
if (sequence == null) {
    sequence = 1;
} else {
    // if it's not the first row, check if the row_id is equal to 1 (string)
    if (row_id == '1') {
        // increment the sequence
        sequence++;
    } else {
        // nothing
    }
}
And that will give you output that seems to be what you expected, with the group/sequence added as a new column.
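For reference, the same grouping logic can be sketched outside Pentaho in plain Python (the file name 'input.txt' and the letter labels are illustrative, and the input is assumed to start with a row whose first column is '1'):
import string

def group_blocks(path):
    # Start a new group whenever the first column equals '1',
    # then prefix each row with the current group's letter.
    group = -1
    for line in open(path):
        row_id = line.split('|', 1)[0]
        if row_id == '1':
            group += 1
        yield string.ascii_lowercase[group] + ' |' + line.rstrip('\n')

for labelled in group_blocks('input.txt'):
    print labelled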
Hope it helps =)

Read text file line by line but only specific columns

How do we read a specific file line by line while skipping some columns in it?
For example, I have a text file which has data sorted out in 5 columns, but I need to read only two columns out of it; they can be the first two or any other combination (I mean, I need a solution which would work with any combination of columns, like the first and third only).
Code something like this:
open(1, file=data_file)
read(1, *) ! to skip first line, with metadata
lmax = 0
do while (.true.)
    ! read column 1 and 3 here, either write
    ! that to an array or just loop through each row
end do
99 continue
close(1)
Any explanation or example would help a lot.
High Performance Mark's answer gives the essentials of simple selective column reading: one still reads the column but transfers it to a then-ignored variable.
To extend that answer, then, consider that we want to read the second and fourth columns of a five-column line:
read(*,*) junk, x, junk, y
The first value is transferred into junk, then the second into x, then the third (replacing the one just acquired a moment ago) into junk and finally the fourth into y. The fifth is ignored because we've run out of input items and the transfer statement terminates (and the next read in a loop will go to the next record).
Of course, this is fine when we know it's those columns we want. Let's generalize to when we don't know in advance:
integer col1, col2   ! The columns we require, defined somehow (assume col1<col2)
<type>, dimension(nrows) :: x, y, junk(3)   ! For the number of rows
integer i

do i = 1, nrows
    read(*,*) junk(:col1-1), x(i), junk(:col2-col1-1), y(i)
end do
Here, we transfer a number of values (which may be zero) up to just before the first column of interest, then the value of interest. After that, more to-be-ignored values (possibly zero), then the final value of interest. The rest of the row is skipped.
This is still very basic and avoids many potential complications in requirements. To some extent, it's such a basic approach one may as well just consider:
do i = 1, nrows
    read(*,*) allofthem(:5)
    x(i) = allofthem(col1)
    y(i) = allofthem(col2)
end do
(where that variable is a row-by-row temporary) but variety and options are good.
This is very easy. You simply read 5 variables from each line and ignore the ones you have no further use for. Something like
do i = 1, 100
    read(*,*) a(i), b, c(i), d, e
end do
This will overwrite the values in b, d, and e at every iteration.
Incidentally, your line
99 continue
is redundant; it's not used as the closing line for the do loop and you're not branching to it from anywhere else. If you are branching to it from unseen code you could just attach the label 99 to the next line and delete the continue statement. Generally, continue is redundant in modern Fortran; specifically it seems redundant in your code.

Write data to file in columns (Fortran)

I need to write some data to a file in Fortran 90. How should I use the WRITE (*,*) statement to have the values grouped in columns? WRITE always puts a new line after each call; that's the problem.
code example:
open(unit=4, file='generated_trajectories1.dat', form='formatted')
do time_nr = 0, N
    write(4,*) dble(time_nr)*dt, initial_traj(time_nr)
end do
And now the point is to have it written in separate columns.
You can use implied DO loops to write values as single records. Compare the following two examples:
integer :: i
do i = 1, 10
    write(*, '(2I4)') i, 2*i
end do
It produces:
   1   2
   2   4
   3   6
...
Using implied DO loops it can be rewritten as:
integer :: i
write(*, '(10(2I4))') (i, 2*i, i=1,10)
This one produces:
   1   2   2   4   3   6 ...
If the number of elements is not fixed at compile time, you can either use the <n> extension (not supported by gfortran):
write(*, '(<n>(2I4))') (i, 2*i, i=1,n)
It takes the number of repetitions of the (2I4) edit descriptor from the value of the variable n. In GNU Fortran you can first create the appropriate edit descriptor using internal files:
character(len=20) :: myfmt
write(myfmt, '("(",I0,"(2I4))")') n
write(*, fmt=myfmt) (i, 2*i, i=1,n)
Of course, it also works with list-directed output (that is, output with a format of *):
write(*, *) (i, 2*i, i=1,10)
This really depends on what data you are trying to write to file (i.e. whether you have a scalar within a loop or an array...). Can you include a description of this in your question?
If you are trying to write a scalar multiple times to the same row, then try using non-advancing I/O, passing the keyword argument advance="no" to the write statement, e.g.
integer :: x
do x = 1, 10
    write(*, fmt='(i4,1x)', advance="no") x
end do
However, be aware of a surprise with non-advancing I/O.
The answer depends on your answer to Chris's question. If you want a single line, then you will have to use non-advancing IO as described by Chris. Without this, with multiple formatted write statement you will always get multiple lines.
Also, you will likely need to use formatted IO instead of list-directed (*) IO. The rules are loose for list-directed IO. Different compilers may produce different output. With many output items, line breaks are likely to keep the output lines from being too long.
Here is a format that should work if all of your variables are reals:
write (4, '( *(2X, ES14.6) )', advance="no" )
How about the good old (nonstandard) $ edit descriptor:
write(*, fmt='(i4,$)') x
Remember to do a write(*,*) after your loop...

How to find a word in an ASCII file using Python

I want to find a word and its index, but the problem is I am only getting its first position while the word appears more than once in the file. The file's content is:
[MAKE DATA:STUDENT1=AENIE:AGE14,STUDENT2=JOHN:AGE15,STUDENT3=KELLY:AGE14,STUDENT4=JACK:AGE16,STUDENT5=SNOW:AGE16;SET RECORD:STUDENT1=GOOD,STUDENT2=,STUDENT3=BAD,STTUDENT4=,STUDENT5=GOOD]
Following is my code:
import sys,os,csv
x = str(raw_input("Enter file name :")) + '.ASCII'
fp = open(x,'r')
data = fp.read()
fp.close()
found = data.find("STUDENT1")
print found
Here the word "STUDENT1" appears two times, while my code gives only its first index position. I want its second index position too. Similarly, a word may appear several times in the file, so how can I find all its index positions?
Use the optional start parameter to str.find() to search the string again starting after the previous match:
found = data.find("STUDENT1")
while found != -1:
    print found
    found = data.find("STUDENT1", found + 1)
It would be slightly more efficient (but less concise) to use found+len("STUDENT1") instead of found+1.
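If you would rather collect all the positions in one list, a small helper along the same lines (the name find_all is illustrative):
def find_all(haystack, needle):
    # Resume the search just past each previous hit and
    # collect every match position.
    positions = []
    found = haystack.find(needle)
    while found != -1:
        positions.append(found)
        found = haystack.find(needle, found + len(needle))
    return positions

print find_all(data, "STUDENT1")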
Alternatively, you could use re.finditer():
import re

for match in re.finditer("STUDENT1", data):
    print match.start()
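If you only need the offsets as a list, the finditer approach condenses to a comprehension:
positions = [match.start() for match in re.finditer("STUDENT1", data)]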