I want to delete the rest of a loaded CSV file, starting from the row where a given string occurs.
Remove(Row, RowCnd(Interval, Pos(Top, findMeThePositionOfaGivenString('TeddyBear')), Pos(Bottom, 1), Select(1, 0)))
Or any other approach that deletes a range of rows dynamically!
If you're doing this during the data import stage, I would recommend filtering in the LOAD statement:
load yourstuff
from yourfile
where index(givenstring,'TeddyBear')=0;
index() returns the position of a substring within the larger string, e.g. index('ABC','BC')=2, so index()=0 means the substring does not occur in the searched text. Be careful: index() is case-sensitive, so wrap both arguments in upper() or lower() to avoid capitalisation mismatches.
I hope I understood your request.
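For readers outside Qlik, here is a minimal Python sketch of the same idea. The helper names (qlik_index, load_without, load_until) and the sample data are hypothetical. qlik_index mimics the 1-based index() semantics (0 when absent), load_without corresponds to the WHERE clause, and load_until shows the literal "delete the rest of the file" reading of the question by stopping at the first matching row. Both lower-case the text first, following the capitalisation advice.

```python
import csv
import io

def qlik_index(text, sub):
    """Like Qlik's index(): 1-based position of sub in text, 0 if absent."""
    return text.find(sub) + 1

def load_without(rows, field, needle):
    """Keep rows whose field does NOT contain needle (case-insensitive),
    i.e. WHERE index(lower(field), lower(needle)) = 0."""
    return [r for r in rows if qlik_index(r[field].lower(), needle.lower()) == 0]

def load_until(rows, field, needle):
    """Keep rows up to, but not including, the first row containing needle,
    dropping the rest of the file from that point on."""
    kept = []
    for r in rows:
        if qlik_index(r[field].lower(), needle.lower()) > 0:
            break
        kept.append(r)
    return kept

# Hypothetical sample file with a "givenstring" column
sample = "id,givenstring\n1,Apple\n2,Big TeddyBear\n3,Pear\n"
rows = list(csv.DictReader(io.StringIO(sample)))
print(qlik_index("ABC", "BC"))                                            # 2
print([r["id"] for r in load_without(rows, "givenstring", "TeddyBear")])  # ['1', '3']
print([r["id"] for r in load_until(rows, "givenstring", "TeddyBear")])    # ['1']
```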
I have a column containing a specific piece of text that I need to keep, with the rest removed or moved to another column. Unfortunately, I can't use a normal text-to-columns split because the arrangement of the text varies.
For example, I need to separate out the word Issue and the ID associated with it. I'm struggling to find a way to do this given how much the arrangement of the text varies.
A solution using Alteryx would be much appreciated; if not, pandas would also work.
Thanks all.
[Pandas] Use str.extract with a case-insensitive regular expression to pull the specific text out of the DataFrame column:
df['After'] = df['Before'].str.extract(r'(?i)(issue \d+)', expand=False)
For an Alteryx-only solution, the easiest way would be an Alteryx Formula using REGEX_Replace:
REGEX_Replace([Before],".*(issue \d+).*","$1",1)
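A Python sketch of what that REGEX_Replace call does, using Python's re module as a stand-in for Alteryx's regex engine (regex_keep is a hypothetical name). The pattern matches the whole string and the replacement keeps only capture group 1: the greedy leading .* backtracks until "issue \d+" can match. re.IGNORECASE is assumed to correspond to the final 1 (case-insensitive) argument. In Python, if nothing matches, re.sub returns the input unchanged.

```python
import re

def regex_keep(before):
    # Same pattern as the Alteryx formula: the whole string is matched and
    # replaced by capture group 1, written \1 in Python's replacement syntax.
    # re.IGNORECASE stands in for the case-insensitive flag (assumption).
    return re.sub(r".*(issue \d+).*", r"\1", before, flags=re.IGNORECASE)

print(regex_keep("Printer jam, Issue 1234, floor 2"))  # Issue 1234
print(regex_keep("No ticket here"))                    # No ticket here
```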
If you'd rather avoid RegEx, basic string functions can do it too; at heart it's a Substring:
Substring([Before], *starting index*, *length*)
The starting index is easy: it's just FindString([Before],"ISSUE")
The length isn't too hard either: it's the index (using FindString again) of the first comma in the substring that starts with "ISSUE": SubString([Before],FindString([Before],"ISSUE"))
Combining all that and spreading it out a bit:
Substring(
[Before],
FindString([Before],"ISSUE"),
FindString(
SubString(
[Before],
FindString([Before],"ISSUE")
),","
)
)
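The same arithmetic as a Python sketch, assuming FindString and Substring use 0-based positions with -1 for "not found", as Python's str.find does. The function name substring_keep is hypothetical, and what to return when the marker or the comma is missing is this sketch's own choice, not something the formula specifies.

```python
def substring_keep(before, marker="ISSUE"):
    # FindString([Before], "ISSUE"): 0-based start position, -1 if absent
    start = before.find(marker)
    if start == -1:
        return ""  # sketch's own choice when the marker is missing
    # SubString([Before], start): everything from the marker onwards
    tail = before[start:]
    # FindString(tail, ","): number of characters before the first comma
    length = tail.find(",")
    if length == -1:
        return tail  # sketch's own choice: no comma, keep the whole tail
    # Substring([Before], start, length)
    return before[start:start + length]

print(substring_keep("Printer jam, ISSUE 1234, floor 2"))  # ISSUE 1234
```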
The list starts empty. I then want to append a value to it on each iteration of a loop if a certain condition is met, but I don't see an append option in Variable Operation.
You can use string split for this, assuming you know of a delimiter that won't ever be in your list of values. I've used a semi-colon, and $local_joinedList$ starts off empty.
If (certain condition is met)
Variable Operation: $local_joinedList$;$local_newValue$ To $local_joinedList$
End If
String Operation: Split "$local_joinedList$" with delimiter ";" and assign output to $my-list-variable$
This overwrites $my-list-variable$.
If you need to append to an existing list, you can use the same technique: String Join the list first, append your values to the string, then split it again afterward.
String Operation: Join elements of "$my-list-variable$" by delimiter ";" and assign output to $local_joinedList$
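Outside Automation Anywhere, the join/append/split pattern looks like this in Python (a sketch; DELIM, append_value and split_list are hypothetical names). One detail worth checking in your bot: concatenating $local_joinedList$;$local_newValue$ while the joined string is still empty yields a leading semicolon, which a split may turn into an empty first element. The sketch avoids that by only adding the delimiter between values.

```python
DELIM = ";"  # assumed never to appear inside a value

def append_value(joined, new_value):
    """Append step: add the delimiter only between values."""
    return joined + DELIM + new_value if joined else new_value

def split_list(joined):
    """Split step: an empty string gives an empty list."""
    return joined.split(DELIM) if joined else []

# Appending to an existing list: join, append, then split again.
existing = ["a", "b"]
joined = DELIM.join(existing)
for candidate in ["c", "skip", "d"]:
    if candidate != "skip":  # stand-in for "certain condition is met"
        joined = append_value(joined, candidate)
result = split_list(joined)
print(result)  # ['a', 'b', 'c', 'd']
```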
Lists are buggy in Automation Anywhere and have been for several versions. I suggest avoiding them and using XML instead.
It is a much more versatile approach and lets you do much more than with lists: you can search, filter, insert, delete, etc.
For the example you mention, you would use the "Insert Node" command.
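To illustrate the XML-as-list idea, here is a sketch using Python's xml.etree.ElementTree rather than Automation Anywhere's own XML commands; the list/item element names and the condition are made up. SubElement plays the role of "Insert Node", findall handles search, and remove handles deletion.

```python
import xml.etree.ElementTree as ET

root = ET.Element("list")  # hypothetical root element acting as the "list"

def insert_node(parent, value):
    """Append a value as a new item node (the list 'append')."""
    ET.SubElement(parent, "item").text = value

for v in ["apple", "pear", "plum"]:
    if v != "pear":  # stand-in for "certain condition is met"
        insert_node(root, v)

# Search: collect the text of every item node.
values = [item.text for item in root.findall("item")]

# Delete: findall returns a list, so removing while iterating it is safe.
for item in root.findall("item"):
    if item.text == "plum":
        root.remove(item)

print(values)                                  # ['apple', 'plum']
print(ET.tostring(root, encoding="unicode"))   # <list><item>apple</item></list>
```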
Throwing in my 2 cents as well: my-list-variable appears to be the only list whose size can change. In my experience with version 10.7, though, it only grows.
So if you filled it with 60 values and then want to reuse my-list-variable for 55, you'll need to clear out the remaining 5 values (set them to some placeholder) and add an If condition when looping over the list so that values equal to that placeholder are skipped.
I used lime's answer as a reference (thanks lime!) to populate a list variable from some data in an Excel spreadsheet.
I have a dataset with columns containing numbers. However, some rows in one of those columns have missing data: instead of a number, the cell contains a dash (-).
I want to separate the rows with a dash and output them to a separate Excel file; the rows without a dash should be output to a CSV file.
I tried the "Filter rows" step, but it gives me an error:
Unexpected conversion error while converting value [constant String] to a Number
constant String : couldn't convert String to number
constant String : couldn't convert String to number : non-numeric character found at position 1 for value [-]
My condition is:
Column1 CONTAINS - (String)
You can try converting to Number in the Select values step and handling the error: if a value cannot be converted to a number, it must be the dash (-).
You can convert missing-value indicators (like a dash or any other string) to null in the Text file input step; see the field option "Null if". That way you can still use the metadata detection feature and won't trip over a dash arriving in a Number field.
With the CSV file input step, stick to the String data type until a Null if step has cleansed the values; then you can change the data type to Number in a Select values step.
If you must preserve the dash character, either don't use metadata detection (it suggests the Number data type), sample more rows (so that a field with a dash is encountered), or simply switch the data type back to String before saving and running the transformation.
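A Python sketch of the "Null if" idea, assuming a CSV column named Column1 as in the question (the sample rows are made up, and writing the actual Excel and CSV files is left out). The dash becomes None before any numeric conversion, so only genuine numbers ever reach float(); the dash rows keep their original text.

```python
import csv
import io

def null_if(value, marker="-"):
    """Like the 'Null if' option: the missing-value marker becomes None."""
    return None if value.strip() == marker else value

def split_rows(rows, column):
    """Route rows with the marker to one output, numeric rows to the other."""
    dash_rows, number_rows = [], []
    for row in rows:
        cleaned = null_if(row[column])
        if cleaned is None:
            dash_rows.append(row)  # destined for the Excel output
        else:
            # Safe to treat as a Number now; destined for the CSV output
            number_rows.append({**row, column: float(cleaned)})
    return dash_rows, number_rows

sample = "id,Column1\n1,10\n2,-\n3,7.5\n"
dash_rows, number_rows = split_rows(list(csv.DictReader(io.StringIO(sample))), "Column1")
print([r["id"] for r in dash_rows])         # ['2']
print([r["Column1"] for r in number_rows])  # [10.0, 7.5]
```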
My solution lies in the first 'Replace in string' step. I replaced the dash with a number that is easy to distinguish from the real values (I used 9999) and carried on with the rest of my process.
In Filter rows, I no longer had problems with the data type, because both the field and the condition were numeric, so nothing had to be converted.
After Filter rows, I added a 'Null if' step to remove the placeholder 9999 that I had used in place of the dash.
After that, the rows were separated just as I had hoped.
Thanks to #marabu for the Null-if idea.
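The placeholder approach can be sketched in Python as well (split_with_sentinel and the sample values are hypothetical). Each comment maps a step to the Pentaho step it stands in for. The catch is that a real value equal to the placeholder would be misrouted, so the placeholder has to lie outside the range of genuine data.

```python
SENTINEL = 9999.0  # placeholder; must never occur as a real value

def split_with_sentinel(values):
    # "Replace in string": swap the dash for a numeric placeholder,
    # so every value converts to a number.
    numbers = [SENTINEL if v.strip() == "-" else float(v) for v in values]
    # "Filter rows": the comparison is now number against number.
    dash_rows = [i for i, n in enumerate(numbers) if n == SENTINEL]
    other_rows = [i for i, n in enumerate(numbers) if n != SENTINEL]
    # "Null if": turn the placeholder back into a missing value.
    cleaned = [None if n == SENTINEL else n for n in numbers]
    return dash_rows, other_rows, cleaned

print(split_with_sentinel(["10", "-", "7.5"]))  # ([1], [0, 2], [10.0, None, 7.5])
```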
I have a column with duplicated values, e.g.:
VMS5796,VMS5650,VMS5650,CSL,VMA5216,CSL,VMA5113
I'm applying a Jython transform that removes the duplicates (On error is set to keep original). Here's the code:
return list(set(value.split(",")))
Which works in the preview, but isn't getting applied to the column. What am I doing wrong?
The map function is powerful and underused in Python/Jython. What it does here may not be obvious: it converts every element of the deduplicated list to a string, and join then concatenates those strings with a separator such as ', ', giving a single string that can be stored in the cell. It is also quick, even on long lists of values.
deduped_list = list(set(value.split(",")))
return ', '.join(map(str, deduped_list))
There are probably other, even slightly faster variations than this, but this should get you going in the right direction.
Interestingly, you can also use repr(object) to get the 'printable representation' of each value, a form that an eval such as OpenRefine's can accept. It's also handy for seeing exactly how your values are represented; I only found this out while researching this answer for you.
deduped_list = list(set(value.split(",")))
return ', '.join(map(repr, deduped_list))
Preview implicitly formats things for display. Your expression returns an array, which can't be stored in a cell, so to get it in string form, join the elements, e.g. return ','.join(set(value.split(","))).
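Note that set() does not keep the original order of the values. If order matters, a sketch like the following keeps the first occurrence of each entry and returns a string, which a cell can hold. It sticks to plain syntax that should also run under Jython; dedupe_cell is a hypothetical name, and in OpenRefine's expression box you would use its body with value as the input and end with return.

```python
def dedupe_cell(value, sep=","):
    """Drop repeated entries from a delimited cell, keeping first-seen order,
    and return a string rather than a list."""
    seen = set()
    kept = []
    for item in value.split(sep):
        if item not in seen:
            seen.add(item)
            kept.append(item)
    return sep.join(kept)

print(dedupe_cell("VMS5796,VMS5650,VMS5650,CSL,VMA5216,CSL,VMA5113"))
# VMS5796,VMS5650,CSL,VMA5216,VMA5113
```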
I'm extracting terms from a query by calling ExtractTerms() on the Query object returned by QueryParser.Parse(). I get a Hashtable, but each item appears as:
Key - term:term
Value - term:term
Why are the key and the value the same? And why is the term value duplicated and separated by a colon?
Do highlighters only insert tags, or do they do anything else? I don't just want text fragments; I want to highlight the whole source text (which is fairly large). I'm trying to get the terms and insert tags by hand using their offsets, but I'm not sure this is the right approach.
I think the answer to this question may help.
This is because .NET 2.0 doesn't have an equivalent of Java's HashSet, so the .NET port uses Hashtables with the same object as both key and value. The colon you see is just the output of Term.ToString(): a Term is a field name plus the term text, and your field name is probably "term".
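A Python analogy of the behaviour described in this answer (Term here is a hypothetical stand-in class, not Lucene.NET's): a dictionary used as a set stores each term as both key and value, and a field:text string form produces the same key/value pattern seen in the question. With a field named "term", every line would start with term:.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen makes instances hashable, so they can be dict keys
class Term:
    """Hypothetical stand-in for a Lucene term: field name plus term text."""
    field: str
    text: str

    def __str__(self):
        # Same field:text shape as the Term.ToString() output described above
        return f"{self.field}:{self.text}"

# A dict used as a set: each term is stored as both key and value,
# so duplicate terms collapse into one entry.
terms = {}
for t in [Term("body", "lucene"), Term("body", "net"), Term("body", "lucene")]:
    terms[t] = t

for key, value in terms.items():
    print(f"Key - {key}  Value - {value}")
# Key - body:lucene  Value - body:lucene
# Key - body:net  Value - body:net
```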
To highlight an entire document with the Highlighter contrib package, use the NullFragmenter.
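For comparison, the by-hand approach from the question (inserting tags at term offsets) can be sketched like this, assuming you already have non-overlapping (start, end) character offsets for each hit; how to obtain them from Lucene is not shown. insert_tags is a hypothetical helper, and the spans are applied from the end of the text backwards so that each insertion leaves the remaining offsets valid.

```python
import re

def insert_tags(text, spans, pre="<b>", post="</b>"):
    """Wrap each (start, end) span of text in pre/post tags.

    Assumes the spans do not overlap. They are processed from the last one
    to the first, so inserting tags never shifts offsets still to be used.
    """
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + pre + text[start:end] + post + text[end:]
    return text

doc = "Lucene highlights terms in Lucene documents"
# Hypothetical offsets: here simply every occurrence of "Lucene".
spans = [m.span() for m in re.finditer("Lucene", doc)]
print(insert_tags(doc, spans))
# <b>Lucene</b> highlights terms in <b>Lucene</b> documents
```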