how to iterate the required part of string in python - iteration

As a part of my project in python I have to iterate the roll number which has 10 digits
our roll numbers be like:
188w1a0501,188w1a0502--------188w1a0599,188w1a05a1--188w1a05a9,188w1a05b1--188w1a05b9----
upto the last member .
getting these numbers one by one into a variable so that i can send them to my function.
How to do this?

You can make use of regular expression to get the roll numbers.
import re
data='188w1a0501, 188w1a0502--------188w1a0599, 188w1a05a1--188w1a05a9, 188w1a05b1--188w1a05b9----'
tokens = re.split(r",|-|\( | \)", data)
formatted = [token.strip() for token in tokens if len(token.strip())>0]
print(formatted)
Output would be
['188w1a0501', '188w1a0502', '188w1a0599', '188w1a05a1', '188w1a05a9', '188w1a05b1', '188w1a05b9']

Related

Error with dateutil.parser when iterating through list

Converted large text file to list of strings (each row = one element in list) ['...','...','...']
sample_data = ['2017-May-15 13:56:49.578 Event Dispense Sc 06mm Beschichtungsbreite ist: 5.99 mm', '2017-May-15 14:12:11.062 Event Runtime SC 09mm neuer Druck: 27.560PSI']
Trying to extract dates from each list element (each row contains one date with standardized format)
my code:
dparser.parse(sample_data[0],fuzzy=True))
returns the desired date.
However, when trying to iterate through the list as shown below
for elements in sample_data:
dparser.parse(elements,fuzzy=True)
I get an error message: ValueError: Unknown string format
Though I can not see the actual data, from the documentation http://dateutil.readthedocs.io/en/stable/parser.html . It means the tzinfo is not a valid string format
Example: if the date is 15thMarch 2018 and not 15th March 2018. it would raise a ValueError, try inspecting the list to know if that's the case.
Solved with regex functions and some wrangling.
Still can't tell why iteration with dparser.parse doesn't work.

How to use Bioproject ID, for example, PRJNA12997, in biopython?

I have an Excel file in which are given more then 2000 organisms, where each one of them has a Bioproject ID associated (like PRJNA12997). The idea is to use these IDs to get the sequence for a later multiple alignment with other five sequences that I have in a text file.
Can anyone help me understand how I can do this using biopython? At least the part with the bioproject ID.
You can first get the info using Bio.Entrez:
from Bio import Entrez
Entrez.email = "Your.Name.Here#example.org"
# This call to efetch fails sometimes with a 400 error.
handle = Entrez.efetch(db="bioproject", id="PRJNA12997")
I've been trying, and Entrez.read(handle) doesn't seems to work. But if you do record_xml = handle.read() you'll get the XML entry for this record. In this XML you can get the ID for the organism, in this case 12997.
handle = Entrez.esearch(db="nuccore", term="12997[BioProject]")
search_results = Entrez.read(handle)
Now you can efecth from your search results. At this point you should use Biopython to parse whatever you will get in the efetch step, playing with the rettype http://www.ncbi.nlm.nih.gov/books/NBK25499/table/chapter4.T._valid_values_of__retmode_and/
for result in search_results["IdList"]:
entry = Entrez.efetch(db="nuccore", id=result, rettype="fasta")
this_seq_in_fasta = entry.read()

REGEX_EXTRACT error in PIG

I have a CSV file with 3 columns: tweetid , tweet, and Userid. However within the tweet column there are comma separated values.
i.e. of 1 row of data:
`396124437168537600`,"I really wish I didn't give up everything I did for you, I'm so mad at my self for even letting it get as far as it did.",savava143
I want to extract all 3 fields individually, but REGEX_EXTRACT is giving me an error with this code:
a = LOAD tweets USING PigStorage(',') AS (f1,f2,f3);
b = FILTER a BY REGEX_EXTRACT(f1,'(.*)\\"(.*)',1);
The error is:
error: Filter's condition must evaluate to boolean.
In the use case shared, reading the data using PigStrorage(',') will result in missing savava143 (last field value)
A = LOAD '/Users/muralirao/learning/pig/a.csv' USING PigStorage(',') AS (f1,f2,f3);
DUMP A;
Output : A : Observe that the last field value is missing.
(396124437168537600,"I really wish I didn't give up everything I did for you, I'm so mad at my self for even letting it get as far as it did.")
For the use case shared, to extract all the values from CSV file with field values having ',' we can use either CSVExcelStorage or CSVLoader.
Approach 1 : Using CSVExcelStorage
Ref : http://pig.apache.org/docs/r0.12.0/api/org/apache/pig/piggybank/storage/CSVExcelStorage.html
Input : a.csv
396124437168537600,"I really wish I didn't give up everything I did for you, I'm so mad at my self for even letting it get as far as it did.",savava143
Pig Script :
REGISTER piggybank.jar;
A = LOAD 'a.csv' USING org.apache.pig.piggybank.storage.CSVExcelStorage() AS (f1,f2,f3);
DUMP A;
Output : A
(396124437168537600,I really wish I didn't give up everything I did for you, I'm so mad at my self for even letting it get as far as it did.,savava143)
Approach 2 : Using CSVLoader
Ref : http://pig.apache.org/docs/r0.9.1/api/org/apache/pig/piggybank/storage/CSVLoader.html
Below script makes use of CSVLoader(), DUMP A will result in the same output seen earlier.
A = LOAD 'a.csv' USING org.apache.pig.piggybank.storage.CSVLoader() AS (f1,f2,f3);
The error is that you do not want to FILTER based on a regex but GENERATE new fields based on a regex. To filter, you need to know if the line have to be filtered, hence the boolean requirement.
Therefore, you have to use :
b = FOREACH a GENERATE REGEX_EXTRACT(FIELD, REGEX, HOW_MANY_GROUPS_TO_RETURN);
However, as #Murali Rao said, your values are not just coma separated but CSV (think how you will handle a coma in tweet : it is not a field separator, just some content).

Zoho Creator making a custom function for a report

Trying to wrap my head around zoho creator, its not as simple as they make it out to be for building apps… I have an inventory database, and i have four fields that I call to fill a field called Inventory Number (Inv_Num1) –
First Name (First_Name)
Last Name (Last_Name)
Year (Year)
Number (Number)
I have a Custom Function script that I call through a Custom Action in the form report. What I am trying to do is upload a CSV file with 900 entries. Of course, not all of those have those values (first/last/number) so I need to bulk edit all of them. However when I do the bulk edit, the Inv_Num1 field is not updated with the new values. I use the custom action to populate the Inv_Num1 field with the values of the other 4 fields.
Heres is my script:
void onetime.UpdateInv()
{
for each Inventory_Record in Management
{
FN = Inventory_Record.First_Name.subString(0,1);
LN = Inventory_Record.Last_Name.subString(0,1);
YR = Inventory_Record.Year.subString(2,4);
NO = Inventory_Record.Number;
outputstr = FN + LN + YR + NO;
Inventory_Record.Inv_Num1 = outputstr;
}
}
I get this error back when I try to run this function
Error.
Error in executing UpdateInv workflow.
Error in executing For Each Record task.
Error in executing Set Variable task. Unable to update template variable FN.
Error evaluating STRING expression :
Even though there is a First Name for example, it still thinks there is none. This only happens on the fields I changed with Bulk Edit. If I do each one by hand, then the custom action works—but of course then the Inv_Num1 is already updated through my edit on success functions and makes the whole thing moot.
this may be one year late, you might have found the solution but just to highlight, the error u were facing was just due to the null value in first name.
you just have put a null check on each field and u r good to go.
you can generate the inv_number on the time of bulk uploading also by adding null check in the same code on and placing the code on Add> On Submt.( just the part inside the loop )
the Better option would be using a formula field, you just have to put this formula in that formula field and you'll get your inventory_number autogenerated , you can rename the formula_field to Inv Number or whaterver u want.
Since you are using substring directly in year Field, I am assuming the
year field as string.else you would have to user Year.tostring().substring(2,4) & instead of if(Year=="","",...) you have to put if(Year==null , null,...);
so here's the formula
if(First_Name=="","",First_Name.subString(0,1))+if(Last_Name =="","",Last_Name.subString(0,1)) + if(Year=="","",Year.subString(2,4)+Number
Let me know ur response if u implement this.
Without knowing the datatype its difficult to fix, but making the assumption that your Inventory_Record.number is a numeric data item you are adding a string to a number:
The "+" is used for string Concatenation - Joiner but it also adds two numbers together so think "a" + "b" = "ab" for strings but for numbers 1 + 2 = 3.
All good, but when you do "a" + 2 the system doesn't know whether to add or concatenate so gives an error.

How to use Substring in SSIS

I want to export data from SharePoint list to SQL using SSIS.
In SharePoint list, i have a column as multi select, So i am getting below value in my column
1;#control 1;#3;#control 3
I want to use substring in derived column in such a way that i should get below result
1,3
I want only ID from the given column.
I have tried below code
SUBSTRING(ColumnName,1,FINDSTRING(ColumnName,";#",1) - 1)
But it only gives me answer as
1
Can anyone please help me out.?
Because there is an unknown number of controls selected in your SharePoint Multi-Select, a Derived Column transformation is not going to work for you. You'll have to use a script.
One way to parse your string is with regular expressions. You'll have to add an output to the script transformation and assign your parsed string to that output.
Regex controlExpression = new Regex(#"control ([0-9]+)");
MatchCollection controlMatches = controlExpression.Matches(--YOUR INPUT HERE--);
String output = string.Join(",",
(controlMatches.Cast<Match>().Select(n => n.Groups[1].ToString())).ToArray());