Remove duplicate rows in a text file

How do I remove duplicate rows of strings while reading a .txt file using Fortran?
This is my current code, and I think I'm headed down the wrong path. I'm trying to hold one line constant (the first line, for example) and then compare it to the lines after it.
PROGRAM REM_DUP
  IMPLICIT NONE
  CHARACTER(632) :: ROW3, ROW4
  INTEGER :: I
  OPEN(UNIT=23, FILE="APM_FORMATTED.TXT", ACTION="READ", STATUS="OLD")
  OPEN(UNIT=25, FILE="APM_DUPLICATES.TXT", ACTION="WRITE", STATUS="NEW")
  DO
    READ(23,'(A632)', END=199) ROW3
    I=1
    OPEN(UNIT=24, FILE="APM_FORMATTED1.TXT", ACTION="READWRITE", ACCESS="APPEND", STATUS="OLD")
    DO
      READ(24,'(A632)', END=299) ROW4
      IF(ROW3(33:52).EQ.ROW4(33:52)) THEN
        I=I+1
        IF (I.GE.3) THEN
          WRITE(25,'(A632)') ROW3
        ENDIF
      ELSE
        WRITE(24, '(A632)') ROW3
      ENDIF
    ENDDO
    CLOSE(24)
  ENDDO
199 CLOSE(23)
299 CLOSE(24)
  CLOSE(25)
END PROGRAM REM_DUP

The following might be horrendously slow, because it rereads file 24 from the top for every row of the input, but it should work.
i=1
READ(23,'(A632)') row3
WRITE(24,'(A632)') row3 ! assume the first row read is unique (nothing precedes it)
DO
  READ(23,'(A632)',IOSTAT=ierr) row3
  ! a successful read returns ierr=0; end-of-file returns a negative value
  IF(ierr/=0) EXIT
  ! make sure we are reading from the top of the file
  REWIND(24)
  flag=.false.
  ! loop through the i rows already in file 24 and compare each with row3
  DO k=1,i
    READ(24,'(A632)') row4
    ! if the line is repeated, write row3 to the duplicates file and set the flag
    IF(row3(33:52)==row4(33:52)) THEN
      WRITE(25,'(A632)') row3
      flag = .true.
    ENDIF
  ENDDO
  ! if row3 is not repeated, add it to file 24 and increment i
  IF(.not.flag) THEN
    WRITE(24,'(A632)') row3
    i=i+1
  ENDIF
ENDDO
CLOSE(24); CLOSE(23); CLOSE(25)
Hopefully the comments are enough to understand.
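For comparison, here is a minimal sketch of the same idea in Python rather than Fortran, assuming the whole file fits in memory. It swaps the reread-the-output-file loop for a set of keys already seen, so each row is checked once. The key range (columns 33 to 52) comes from the question's code; the function name and sample rows are made up for illustration.

```python
def split_duplicates(rows, start=33, end=52):
    """Split rows into (unique, duplicates) by the key in columns start..end.

    Columns are 1-based and inclusive, matching the Fortran substring
    row(33:52) used in the question.
    """
    seen = set()      # keys encountered so far
    unique = []       # first occurrence of each key, in input order
    duplicates = []   # every later row whose key was already seen
    width = end - start + 1
    for row in rows:
        # Pad short rows with blanks, as a Fortran A-format read would.
        key = row[start - 1:end].ljust(width)
        if key in seen:
            duplicates.append(row)
        else:
            seen.add(key)
            unique.append(row)
    return unique, duplicates


# Hypothetical sample rows: 32 filler characters, a 20-character key, a tail.
sample = [
    "x" * 32 + "KEY-A".ljust(20) + " first",
    "y" * 32 + "KEY-B".ljust(20) + " second",
    "z" * 32 + "KEY-A".ljust(20) + " third",
]
unique, duplicates = split_duplicates(sample)
print(len(unique), len(duplicates))
```

As in the Fortran answer, the first row with a given key is kept and every later row with the same key goes to the duplicates list.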

Related

Trying to figure out the structure of an input file based on a code

I'm troubleshooting some code written in Fortran, but I'm not yet familiar with the language. I have the following code (a part of it), and basically I have to 'reverse-engineer' the structure of the input file.
OPEN(10, FILE=TRIM(10)//'.txt', STATUS='OLD')
READ(10, *) ! header
READ(10, *) NB_TEMP
ALLOCATE(TEMP(abs(NB_TEMP)))
IF (NB_TEMP <0) THEN
  NB_TEMP = ABS(NB_TEMP)
  READ(10, *) TEMP_0, TEMP_D
  TEMP(1) = TEMP_0
  DO I=2, NB_TEMP
    TEMP(I) = TEMP(I-1) + TEMP_D
  ENDDO
ELSEIF (NB_TEMP>0) THEN
  READ(10,*) TEMP(:)
ENDIF
READ(10, *) NB_PRS
ALLOCATE(PRS(ABS(NB_PRS)))
IF (NB_PRS<0) THEN
  NB_PRS = ABS(NB_PRS)
  READ(10, *) PRS_0, PRS_D
  PRS(1) = PRS_0*PI/180.
  DO I=2, NB_PRS
    PRS(I) = PRS(I-1) + PRS_D*PI/180.
  ENDDO
ELSEIF(NB_PRS>0) THEN
  READ(10,*) PRS(:)
  DO I=1, NB_PRS
    PRS(I) = PRS(I)*PI/180.
  ENDDO
ENDIF
So, I know I'm opening the .txt file first. Then, the first value I read is NB_TEMP. I don't understand what is happening at the second READ command. Does the program read the same values again but store them differently, as NB_PRS? I don't really have an error or anything; I'm simply trying to understand what this code does line by line and what the structure of the input text file would be.
Thanks in advance!

Each READ statement starts on a new line of the file. Let us examine them one by one.
READ(10, *) ! header
Skips the first line (the header).
READ(10, *) NB_TEMP
Reads an integer value into NB_TEMP.
READ(10, *) TEMP_0, TEMP_D
Reads two real values. This happens only if NB_TEMP is negative; they are the starting value and the step of an arithmetic sequence stored in TEMP(:), as can be seen from TEMP(I) = TEMP(I-1) + TEMP_D.
READ(10,*) TEMP(:)
Reads multiple real values and stores them in the TEMP(:) array. This happens only if NB_TEMP is positive, and there should be NB_TEMP values on this line.
The same happens for PRS, where the program branches depending on whether the integer NB_PRS is positive or negative. The PRS values are additionally multiplied by PI/180., i.e. converted from degrees to radians.
Here are some valid inputs, the way I interpret this code:
! TEST INPUT #1, DEFINED TEMP and PRS
6
32.0000 40.0000 60.0000 90.0000 120.0000 130.0000
9
0.00000 5.0000 15.0000 20.0000 25.0000 30.0000 45.0000 55.0000 60.0000
! TEST INPUT #2, SEQUENCE TEMP and DEFINED PRS
-6
30.0000 10.0000
9
0.00000 5.0000 15.0000 20.0000 25.0000 30.0000 45.0000 55.0000 60.0000
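To make that reading of the file layout concrete, here is a rough Python sketch (not Fortran) that parses the same structure. It is only an illustration of the interpretation above; the function names are invented, and it assumes plain whitespace-separated numbers rather than every form that Fortran list-directed input accepts.

```python
import math


def read_block(lines, scale=1.0):
    """Read one count line followed by either (start, step) or explicit values.

    `lines` is an iterator over the remaining input lines. A negative count
    means "generate an arithmetic sequence"; a positive count means "the
    values are listed explicitly". `scale` is applied to every value.
    """
    count = int(next(lines).split()[0])
    n = abs(count)
    if count < 0:
        start, step = (float(tok) for tok in next(lines).split()[:2])
        values = [start + i * step for i in range(n)]
    else:
        values = []
        while len(values) < n:          # values may continue on later lines
            values.extend(float(tok) for tok in next(lines).split())
        values = values[:n]
    return [v * scale for v in values]


def parse_input(text):
    """Return (temp, prs), with prs converted from degrees to radians."""
    lines = iter(text.splitlines())
    next(lines)                          # skip the header line
    temp = read_block(lines)
    prs = read_block(lines, scale=math.pi / 180.0)
    return temp, prs


test_input_2 = """! TEST INPUT #2, SEQUENCE TEMP and DEFINED PRS
-6
30.0000 10.0000
9
0.00000 5.0000 15.0000 20.0000 25.0000 30.0000 45.0000 55.0000 60.0000
"""
temp, prs = parse_input(test_input_2)
print(temp)
print(len(prs))
```

With test input #2, the negative count -6 produces six temperatures starting at 30 in steps of 10, and the nine PRS values are read as listed.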

Is there a way to separate values like "csv" strings on multiple columns on Redshift SQL?

For example: if I do a select preferences from stores I get this outcome:
|preferences |
|----------------------------------------------------------------------|
|"debit_rate"=>"0.00", "credit_rate_1"=>"0.01", "credit_rate_2"=>"0.02"|
|"debit_rate"=>"0.03", "credit_rate_1"=>"0.04", "credit_rate_2"=>"0.05"|
|"debit_rate"=>"0.06", "credit_rate_1"=>"0.07", "credit_rate_2"=>"0.08"|
|"debit_rate"=>"0.09", "credit_rate_1"=>"0.10", "credit_rate_2"=>"0.11"|
Is there a way for me to get this outcome?
|debit_rate|credit_rate_1|credit_rate_2|
|----------|-------------|-------------|
|0.00      |0.01         |0.02         |
|0.03      |0.04         |0.05         |
|0.06      |0.07         |0.08         |
|0.09      |0.10         |0.11         |

It looks like you are a small change away from making these strings into valid JSON. Redshift has JSON functions that allow for more intelligent parsing of these strings. See https://docs.aws.amazon.com/redshift/latest/dg/json-functions.html
If you just change each '=>' to ':' and wrap the whole thing in curly braces '{}', you should be there.
Then you can cast these strings to type SUPER and access the data by key. See: https://docs.aws.amazon.com/redshift/latest/dg/query-super.html
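The string rewrite itself is easy to check outside Redshift. Here is a small Python sketch of just that transformation (replace '=>' with ':' and wrap in braces); it says nothing about Redshift syntax, and the function name is made up. It assumes '=>' never appears inside a key or a value.

```python
import json


def preferences_to_dict(raw):
    """Turn '"k"=>"v", "k2"=>"v2"' into a dict by rewriting it as JSON."""
    # Naive textual replace: assumes '=>' never occurs inside a key or value.
    as_json = "{" + raw.replace("=>", ":") + "}"
    return json.loads(as_json)


row = '"debit_rate"=>"0.00", "credit_rate_1"=>"0.01", "credit_rate_2"=>"0.02"'
prefs = preferences_to_dict(row)
print(prefs["debit_rate"], prefs["credit_rate_1"], prefs["credit_rate_2"])
```

Note that the values stay strings ("0.00", not 0.0), so they would still need a numeric cast if you want to do arithmetic on them.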

Exclude rows where two columns BOTH have a value of zero, not one or the other

In SQL, I have two integer columns and I want to exclude rows where BOTH columns have a value of zero.
So in the example below, I would want only rows 2 and 3 excluded.
   Col 1  Col 2
1  0.00   1.53
2  0.00   0.00
3  0.00   0.00
4  6.84   0.00
I tried the following to test whether it returns just the rows where one column holds 0.00 but the other does not, and it returns no rows, which means my first condition is not working.
WHERE
(COL1 > 0.00 AND COL2 > 0.00)
AND (COL1 = 0.00
OR COL2 = 0.00)

You would use AND with a NOT to filter results so that if both columns contain 0, they are filtered:
SELECT * FROM mytable WHERE NOT(col1 = 0 AND col2 = 0)
If you also want to keep rows that contain no zero at all, such as 12, 12, then an exclusive OR (XOR, or <> between the two conditions) will not work, because it is true only when exactly one of the columns is zero.
Your best option is to use the WHERE NOT example above.

Typically it is best to include example code showing what you have tried, along with what is happening versus what you expect to happen. Based on the wording of this question, it looks like all you need in your WHERE clause is something like the following. If this does not answer your question, please be more specific about why and about what you have tried.
Proposed WHERE clause:
WHERE
-- This is going to only include rows
-- where one or both column values are non-zero
Col1 <> 0
OR Col2 <> 0
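Both proposed filters are the same condition written two ways (De Morgan's law), and the question's own WHERE clause can never be true because it asks for both columns to be positive and at least one to be zero. A quick way to convince yourself is to run all three against the sample rows. The sketch below uses Python's built-in sqlite3 as a stand-in for whatever database the question uses; the table name mytable comes from the first answer, while the id column is added here only to label the rows.

```python
import sqlite3


def surviving_rows(where_clause):
    """Return the ids kept by `where_clause` on the question's four sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER, col1 REAL, col2 REAL)")
    conn.executemany(
        "INSERT INTO mytable VALUES (?, ?, ?)",
        [(1, 0.00, 1.53), (2, 0.00, 0.00), (3, 0.00, 0.00), (4, 6.84, 0.00)],
    )
    # where_clause is trusted demo text here; never build real queries this way.
    rows = conn.execute(
        "SELECT id FROM mytable WHERE " + where_clause + " ORDER BY id"
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]


print(surviving_rows("NOT (col1 = 0 AND col2 = 0)"))   # first answer
print(surviving_rows("col1 <> 0 OR col2 <> 0"))        # second answer
print(surviving_rows(
    "(col1 > 0.00 AND col2 > 0.00) AND (col1 = 0.00 OR col2 = 0.00)"
))                                                     # the question's attempt
```

The first two calls keep rows 1 and 4; the last one keeps nothing. Neither form keeps a row in which a column is NULL, so handle NULLs explicitly if the columns are nullable.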

iterate over line gnuplot

All,
I have a file that contains "time" in the first column and then a bunch of data points in the following columns. I want to plot all of them in the same figure to show how each object moves differently in time, but I am not sure how to iterate over such a file. I have searched for a long time, but with no luck.
Here is an example of some data:
0 0.001 0.006
1 0.001 0.090
2 0.005 0.099
3 0.008 0.999
4 0.009 0.100
5 0.010 0.100
Except that in my file I have 100+ columns after the time column. This is what I have so far in my gnuplot loop:
do for [i=2:99] {
    plot 'data.out' using 1:i w l lt 7 lw 1
}
Any help is appreciated, thanks all.

In case you want to have everything in one plot, you could swap the order of the for loop and the plot command:
plot for [i=2:99] 'data.out' using 1:i w l lt 7 lw 1
To determine the number of columns automatically, you can use the stats command, as in:
fName = 'data.out'
stat fName nooutput
N = STATS_columns #number of columns found in file
plot for [i=2:N] fName u 1:i w l lt 7 lw 1
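If it helps to see what that loop iterates over, here is a small Python sketch (not gnuplot) that builds the same pairs: column 1 against each later column. It only illustrates the `using 1:i` idea; the function name is made up and it assumes whitespace-separated numeric data with the same number of columns on every line.

```python
def column_series(text):
    """Pair column 1 (time) with every later column of whitespace-separated data.

    Returns {i: [(t, y), ...]} keyed by the 1-based column number i >= 2,
    i.e. one entry per curve drawn by `plot for [i=2:N] ... using 1:i`.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    n_columns = len(rows[0])            # plays the role of N in the answer
    series = {}
    for i in range(2, n_columns + 1):
        series[i] = [(float(r[0]), float(r[i - 1])) for r in rows]
    return series


data = """0 0.001 0.006
1 0.001 0.090
2 0.005 0.099
3 0.008 0.999
4 0.009 0.100
5 0.010 0.100
"""
for i, points in column_series(data).items():
    print(i, len(points), points[0])
```

Each dictionary entry corresponds to one line in the combined plot, which is why the single `plot for` command draws everything at once while the `do for` loop redraws the plot for each column.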

Best way to store this data in SQL Table

I am trying to figure out the best way to store this particular kind of data in a static database table.
On the front end, the user will be presented with a simple table (say 5x5 by default), but the user can add or delete rows and columns at will.
The idea is that the first column will hold labels that are the same as the headers, similar to a graph, and that any change to either is reflected in the other, so the graph (not really a graph, but I feel it is a valid comparison) remains square.
      col1  col2  col3  col4  col5
row1  1     5     6     8     9
row2  2     3     4     5     3
row3  0     0     0     0     0     all values can be different
row4  0     0     0     0     0     the row and column names will always be the same
row5  0     0     0     0     0
I.e., it starts as 5x5; if the user removes row5 or col5, it becomes 4x4.
Because of this dynamic number of columns, I am unsure how to represent it in a database table.
Each cell in the "map" will just contain an int, and I am having a hard time finding an elegant way to map this in my database for use with LINQ to SQL.
Any ideas?
EDIT: I could simply wrap everything from the front end in a JSON object and have a bunch of code for that, but I'm not sure whether that would be easiest. I'm mainly looking for some good input for now.

Here's a start, using a Graph table with separate tables for columns, rows and cells. It doesn't enforce the "squareness" of the graph, but you could enforce that in your application logic.
Graph
- GraphID
GraphColumn
- GraphID
- ColumnID
- ColumnName
GraphRow
- GraphID
- RowID
- RowName
GraphCell
- ColumnID
- RowID
- CellValue
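As a rough illustration of how that layout could work, here is a sketch using Python's built-in sqlite3 instead of LINQ to SQL. The table and column names follow the outline above; the key choices, data types and helper functions are assumptions made for the example, and squareness is enforced only by the helper, as suggested.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Graph       (GraphID INTEGER PRIMARY KEY);
CREATE TABLE GraphColumn (ColumnID INTEGER PRIMARY KEY,
                          GraphID INTEGER REFERENCES Graph(GraphID),
                          ColumnName TEXT);
CREATE TABLE GraphRow    (RowID INTEGER PRIMARY KEY,
                          GraphID INTEGER REFERENCES Graph(GraphID),
                          RowName TEXT);
CREATE TABLE GraphCell   (ColumnID INTEGER REFERENCES GraphColumn(ColumnID),
                          RowID INTEGER REFERENCES GraphRow(RowID),
                          CellValue INTEGER,
                          PRIMARY KEY (ColumnID, RowID));
"""


def new_database():
    """Create an in-memory database holding the four tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def build_graph(conn, labels, values):
    """Store a square grid: labels name both rows and columns; values[r][c] are ints."""
    graph_id = conn.execute("INSERT INTO Graph DEFAULT VALUES").lastrowid
    col_ids = [conn.execute(
        "INSERT INTO GraphColumn (GraphID, ColumnName) VALUES (?, ?)",
        (graph_id, name)).lastrowid for name in labels]
    row_ids = [conn.execute(
        "INSERT INTO GraphRow (GraphID, RowName) VALUES (?, ?)",
        (graph_id, name)).lastrowid for name in labels]
    for r, row_id in enumerate(row_ids):
        for c, col_id in enumerate(col_ids):
            conn.execute("INSERT INTO GraphCell VALUES (?, ?, ?)",
                         (col_id, row_id, values[r][c]))
    return graph_id


def cell(conn, graph_id, row_name, col_name):
    """Look up one cell value by its row and column labels."""
    return conn.execute(
        """SELECT CellValue FROM GraphCell
           JOIN GraphRow USING (RowID)
           JOIN GraphColumn USING (ColumnID)
           WHERE GraphRow.GraphID = ? AND RowName = ? AND ColumnName = ?""",
        (graph_id, row_name, col_name)).fetchone()[0]


conn = new_database()
gid = build_graph(conn, ["a", "b"], [[1, 5], [2, 3]])
print(cell(conn, gid, "b", "a"))
```

Because every cell is its own row, adding or removing a label only inserts or deletes rows in these tables; the table definitions never change when the grid grows or shrinks.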
