Come up with the regular expression that matches all the digits in the string "Arizona: 479, 501, 870. California: 209, 213, 650." - python-re

I was trying to get the digits from the following code:
import re
fetch_code = re.findall([0-5][0-9], "Arizona: 479, 501, 870. California: 209, 213, 650.")
print(fetch_code)
but got this error: IndexError: list index out of range

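The problem is that your call to findall is malformed: the pattern has to be passed as a string. Without quotes, [0-5][0-9] is evaluated as Python code (the list [-5] indexed with [-9]), which is what raises the IndexError.
findall takes the parameters (pattern, string, flags=0), where flags is optional.
It returns all non-overlapping matches of pattern in the given string.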
If there are no groups, return a list of strings matching the whole pattern. If there is exactly one group, return a list of strings matching that group. If multiple groups are present, return a list of tuples of strings matching the groups.
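To make the group rules above concrete, here is a small sketch. The shortened input string and the three patterns are chosen only for illustration and are not part of the original question.

```python
import re

text = "Arizona: 479, 501"

# No groups: each result is the whole match.
no_group = re.findall(r"\d\d\d", text)

# Exactly one group: each result is only what that group captured.
one_group = re.findall(r"(\d)\d\d", text)

# Several groups: each result is a tuple with one entry per group.
two_groups = re.findall(r"(\d)(\d\d)", text)

print(no_group)    # ['479', '501']
print(one_group)   # ['4', '5']
print(two_groups)  # [('4', '79'), ('5', '01')]
```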
In your specific case, a correctly formed call would be:
fetch_code = re.findall(r'[0-5][0-9]', "Arizona: 479, 501, 870. California: 209, 213, 650.")
This returns a list of strings and, given your regular expression, only two-digit matches between 00 and 59:
['47', '50', '20', '21', '50']
I don't know if that is the output you expected, but it raises no errors and uses the right syntax.
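If the goal was the complete three-digit codes, or every single digit, a different pattern is needed. The following is only a sketch under that assumption about the intended output; the variable names are made up for the example.

```python
import re

text = "Arizona: 479, 501, 870. California: 209, 213, 650."

# \d{3} matches runs of exactly three digits, i.e. the full codes.
codes = re.findall(r"\d{3}", text)

# \d alone matches every digit individually.
digits = re.findall(r"\d", text)

print(codes)   # ['479', '501', '870', '209', '213', '650']
print(len(digits))  # 18
```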
I also recommend taking a look at the re documentation.
Have a nice day!


Pandas str split. Can I skip line which gives troubles?

I have a dataframe (all5) including one column with dates ('CREATIE_DATUM'). Sometimes the notation is 01/JAN/2015, sometimes it's written as 01-JAN-15.
I only need the year, so I wrote the following code line:
all5[['Day','Month','Year']]=all5['CREATIE_DATUM'].str.split('-/',expand=True)
but I get the following error:
columns must be same length as key
so I assume somewhere in my dataframe (>100.000 lines) a value has more than two '/' signs.
How can I make my code skip this line?
You can try pd.to_datetime and then use the .dt accessor to get the day, month and year:
x = pd.to_datetime(all5["CREATIE_DATUM"])
all5["Day"] = x.dt.day
all5["Month"] = x.dt.month
all5["Year"] = x.dt.year
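If only the year is needed and malformed rows should be skipped rather than raise, a plain-Python alternative is to split on either separator with a character class. This is a sketch, not the pandas approach above: extract_year is a hypothetical helper, and it assumes two-digit years belong to the 2000s.

```python
import re

def extract_year(value):
    """Return the year from '01/JAN/2015' or '01-JAN-15', or None if malformed."""
    # [-/] splits on either '-' or '/'; the string '-/' would only match
    # the two characters appearing together.
    parts = re.split(r"[-/]", value.strip())
    if len(parts) != 3 or not parts[2].isdigit():
        return None  # skip rows that do not have exactly day, month, year
    year = int(parts[2])
    # Assumption: two-digit years belong to the 2000s.
    return year + 2000 if year < 100 else year

samples = ["01/JAN/2015", "01-JAN-15", "01/JAN/20/15", "n/a"]
years = [extract_year(s) for s in samples]
print(years)  # [2015, 2015, None, None]
```

With pandas, such a function could be applied to the column with all5['CREATIE_DATUM'].map(extract_year).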

finding if a line starts with a specified string in a clob and then extract

I have a CLOB column that I want to search for a line that starts with '305' and then extract something from that line. Some of my rows will have multiple lines that start with '305', or '305' somewhere else in the cell, so I only want to find the first line that starts with '305'. The entire cell content is split into lines like this:
301|10500000908|
302|20171021|20171104|
303|00001|8306.7|
302|20171008|20171020|
303|00001|13174.5|
302|20170704|20171007|
303|00001|2508.7|
302|20170419|20170703|
303|00001|6962.9|
302|20170330|20170418|
303|00001|7628.2|
302|20170305|20170329|
--- my instr(dbms_lob.substr(flow_data, 4000, 1 ),'305|', 1, 1) keeps finding this line
303|00001|8489.1|
302|20170120|20170304|
303|00001|1997.9|
302|20161021|20170119|
303|00001|12359.8|
302|20160722|20161020|
303|00001|7354.0|
302|20160516|20160721|
303|00001|26.4|
304|20171105|
305|00001|5936.1|
--- i want to find this line and then extract the '5936.1' from it
304|20171021|
305|00001|5710.4|
304|20171008|
305|00001|5163.1|
304|20170704|
304|20170419|
305|00001|7390.8|
304|20170330|
305|00001|7363.2|
304|20170305|
305|00001|7181.4|
304|20170120|
305|00001|9200.2|
304|20161021|
305|00001|4791.3|
305|00001|2877.5|
304|20160516|
305|00001|4116.9|
306|0393|20160511|
307|SUPP|20160511|
310|A|20160511|
311|E|20160516|
When I use instr(dbms_lob.substr(flow_data, 4000, 1 ),'305|', 1, 1) it keeps finding the wrong line. By the way, there are no gaps between the lines; I inserted them to keep the text separated.
Thanks all
Mac
If I follow you correctly, you can use regexp_substr():
select regexp_substr(flow_data, '^305\|[^|]*\|([^|]*)', 1, 1, 'm', 1) as val
from t
Argument breakdown:
flow_data: the value to search (CLOBs are allowed)
'^305\|[^|]*\|([^|]*)': the regex. We search for 305 at the beginning of a line and capture the third pipe-delimited field
1: start the search at the beginning of the source string
1: return the first match
'm': multiline mode, so ^ matches at the beginning of each line
1: return the first captured part of the match
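The difference between the plain substring search and the anchored regex can be checked outside the database. The sketch below uses Python's re module rather than Oracle's regexp_substr, on a shortened excerpt of the data from the question; the same pattern happens to be valid in both.

```python
import re

# Shortened excerpt of the CLOB content from the question.
flow_data = "\n".join([
    "302|20170305|20170329|",
    "303|00001|8489.1|",
    "304|20171105|",
    "305|00001|5936.1|",
    "304|20171021|",
    "305|00001|5710.4|",
])

# A plain substring search hits the '305|' inside the date 20170305.
naive_pos = flow_data.find("305|")

# With MULTILINE, ^ anchors at the start of every line, so only lines
# that really begin with 305 can match; group 1 is the third field.
match = re.search(r"^305\|[^|]*\|([^|]*)", flow_data, re.MULTILINE)
value = match.group(1) if match else None

print(naive_pos)  # 9, inside the first line
print(value)      # 5936.1
```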

Multiple Criteria and Multiple Returns Excel

It's been a while since I've been here. I've been struggling with a formula in Excel that uses multiple lookup criteria and should give multiple results.
In this sheet, the inputs are:
Location, Subject, Level.
I used the following formula to return the teacher's name in H4:
=INDEX(D2:D26, MATCH(1, (H1=A2:A26)*(H2=B2:B26)*(H3=C2:C26), 0))
I'm trying to have it return the multiple student IDs.
With the following inputs:
Location Lookup: U
Subject Lookup: QC
Level Lookup: 2
I'm expecting the following student IDs being returned, but I'm not sure of how to solve this.
1012, 1013, 1014, 1015, 1016, 1017, 1018
Can you please help??
Thank you so much!
Use this array formula in cell H5 to get the student IDs, and fill down as far as you need.
=IFERROR(INDEX($E$1:$E$26,SMALL(IF(($A$1:$A$26=$H$1)*($B$1:$B$26=$H$2)*($C$1:$C$26=$H$3),ROW($D$1:$D$26)),ROWS($A$1:$A1))),"")
As it is an array formula, press CTRL+SHIFT+ENTER to evaluate it.
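The logic of the array formula (keep the rows where all three criteria match, then pick the k-th one as the formula is filled down) can be sketched in Python. The rows below are invented placeholders, since the original sheet is not shown, and both function names are hypothetical.

```python
# Hypothetical rows: (location, subject, level, teacher, student_id).
rows = [
    ("U", "QC", 2, "Teacher A", 1012),
    ("U", "QC", 2, "Teacher A", 1013),
    ("U", "QC", 1, "Teacher B", 1001),
    ("D", "QC", 2, "Teacher C", 1020),
]

def lookup_ids(rows, location, subject, level):
    """Return every student ID whose row matches all three criteria."""
    return [sid for loc, sub, lvl, _teacher, sid in rows
            if (loc, sub, lvl) == (location, subject, level)]

def nth_match(rows, location, subject, level, k):
    """Mimic SMALL(IF(...), k) wrapped in IFERROR: k-th match (1-based) or ''."""
    matches = lookup_ids(rows, location, subject, level)
    return matches[k - 1] if 1 <= k <= len(matches) else ""

print(lookup_ids(rows, "U", "QC", 2))    # [1012, 1013]
print(nth_match(rows, "U", "QC", 2, 3))  # empty string, like IFERROR(..., "")
```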

Changing the order of columns in a CSV file in VB.NET

I have CSV files output from a piece of software, without headers.
I need to change the order of the columns based on a config file:
initial-column Final-Column
1 5
2 3
3 1
Any ideas how to go about this?
There is very little to go on, such as how the config file works and what the data looks like.
Note that with the layout {1, 5, 2, 3, 3, 1} you aren't just reordering the columns: it drops one (4) and duplicates columns 1 and 3.
Using some fake random data left over from this answer, this reads it in, then writes it back out in a different order. You will have to modify it to take the config file into consideration.
Sample data:
Ndxn fegy n, 105, Imaypfrtzghkh, -1, red, 1501
Mfyze, 1301, Kob dlfqcqtkoccxwbd, 0, blue, 704
Xe fnzeifvpha, 328, Mnarhrlselxhcyby hq, -1, red, 1903
' File and StreamWriter require Imports System.IO at the top of the file
Dim csvFile As String = "C:\Temp\mysqlbatch.csv"
Dim lines = File.ReadAllLines(csvFile)
Dim outFile As String = "C:\Temp\mysqlbatch2.csv"
Dim data As String()
' zero-based placeholders control the output order
Dim format As String = "{0}, {4}, {1}, {2}, {2}, {0}"
Using fs As New StreamWriter(outFile, False)
    For Each s As String In lines
        ' not the best way to split a CSV,
        ' no OP data to know if it will work
        data = s.Split(","c)
        ' specify the columns to write in
        ' the order desired
        fs.WriteLine(String.Format(format,
                                   data(0),
                                   data(1),
                                   data(2),
                                   data(3),
                                   data(4),
                                   data(5)
                                   )
                    )
    Next
End Using
This approach uses the format string and placeholder ({N}) to control the order. The placeholders and array elements are all zero based, so {1, 5, 2, 3, 3, 1} becomes {0, 4, 1, 2, 2, 0}. Your config file contents could simply be a collection of these format strings. Note that you can have more args to String.Format() than there are placeholders but not fewer.
Output:
Ndxn fegy n, red, 105, Imaypfrtzghkh, Imaypfrtzghkh, Ndxn fegy n
Mfyze, blue, 1301, Kob dlfqcqtkoccxwbd, Kob dlfqcqtkoccxwbd, Mfyze
Xe fnzeifvpha, red, 328, Mnarhrlselxhcyby hq, Mnarhrlselxhcyby hq, Xe fnzeifvpha
Splitting the incoming data on the comma (s.Split(","c)) will work in many cases, but not all. If the data contains commas (as in some currencies, "1,23") it will fail. In that case the separator char is usually ";" instead, but the data can have commas for other reasons ("Jan 22, 2016" or "garden hose, green"). The data may have to be split differently.
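For comparison, here is how the same reordering could look with a real CSV parser that copes with quoted commas. It is a Python sketch rather than VB.NET, reorder_columns is a hypothetical helper, and the sample row is made up to include a quoted field.

```python
import csv
import io

def reorder_columns(csv_text, order):
    """Return csv_text with each row's fields rearranged per zero-based order."""
    # skipinitialspace lets a quoted field follow the ", " separator.
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # Indices may repeat (duplicate a column) or be absent (drop one).
        writer.writerow([row[i] for i in order])
    return out.getvalue()

sample = 'Mfyze, 1301, "garden hose, green", 0, blue, 704\n'
result = reorder_columns(sample, [0, 4, 1, 2, 2, 0])
print(result)
# Mfyze,blue,1301,"garden hose, green","garden hose, green",Mfyze
```

The order list plays the role of the format string: it could be read from the config file after converting the one-based column numbers to zero-based indices.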
Note: All the OP's previous posts are VBA-related. The title includes VB.NET and the question is tagged vb.net, so this is a VB.NET answer.

Fortran runtime error while reading a file: "Bad repeat count"

I'm trying to read an input file with Fortran but I get the following error at runtime:
At line 118 of file prog.f90 (unit = 53, file = 'data.dat')
Fortran runtime error: Bad repeat count in item 1 of list input
The data file is the following
3, 5, 3 %comment
%%%%%%%%%%%%%%
1d0, 0d0, 0d0 % comment
0d0, 0d0, 1d0
%%%%%%%%%%%%%%
1, 1, identity, 1, 1 %comment
1, 2, sigmax, 2, 2
2, 3, sigmax, 2, 2
1, 3, sigmaz, 1, 3
3, 3, identity, 1, 1
%%%%%%%%%%%%%%
0, 0 %comment
and the interesting part of prog.f90 is
COMPLEX(KIND(1D0)), DIMENSION(:), ALLOCATABLE:: H1, H2
INTEGER :: i,A,B,C
CHARACTER(50) :: GHOST
OPEN(UNIT=53,file='data.dat',status='old')
READ(53,*) A,B,C
READ(53,*) GHOST
ALLOCATE (H1(A),H2(A))
READ(53,*) (H1(i), i=1,A)
READ(53,*) (H2(i), i=1,A)
where the 118th line is READ(53,*) (H1(i), i=1,A). I also tried an explicit do loop, with the same result.
I haven't tested this, but I'd expect
READ(53,*) (H1(i), i=1,A)
to try to read 3 complex numbers. It gets fed the line
1d0, 0d0, 0d0 % comment
from which it gets 1½ complex numbers and then barfs on the % sign, misinterpreting it as a syntactically invalid repeat count.
I'd suggest providing 3 complex numbers in the file when that read statement is executed.
The variables are declared complex, and in Fortran complex numbers should appear in the file in parentheses, as:
( realpart , imaginarypart ) ( realpart , imaginarypart )
I really don't know what the standard says about the input form you have presented, but after some testing, gfortran throws that Bad repeat count error regardless of the % comment. It throws that error even with four or more comma-separated reals on the line.
ifort, on the other hand, reads the line just the way you have it, but watch out: it reads each of the comma-separated values as the real part of your complex variable, setting the imaginary part to zero (that is, it only uses the first two values on each line and discards the third).
You will really need to study the code to make sure you understand what was intended before deciding how to fix this. If the latter (ifort) behavior is the intention, one simple fix would be to declare a couple of reals, read into those, and then assign them to your complex variables.