Parsing multiple values with Google Refine - openrefine

I've a CSV column with content like this (just an example):
[{"qual"=>"05-Admmin "name"=>"CLARK C COHO"}, {"qual"=>"20-Soc Con", "name"=>"ALPHA S A"}, {"qual"=>"20-Soc Con", "name"=>"JACK SA"}
I would like to automatically extract the values of the "name" field and separate them with commas, resulting in something like this: CLARK C COHO, ALPHA S A, JACK SA, and so on.
I know that I can get a specific value with this code:
value.parseJson()[0].name
I've been reading the documentation, but I can't figure out how to loop over all the entries.
Any tips?
EDIT:
Here is another example of the column. The content really looks like this:
[{"qual"=>"49-SocAdm", "name"=>"ALVARO R L"}, {"qual"=>"49-SocAdm", "name"=>"GABRIEL G L"}]

The data in your CSV is not valid JSON. It looks like some kind of key-value format, but I do not know which one. In addition, it sometimes lacks a comma or a bracket. We could try to transform it into valid JSON, but it will be easier to extract the information using regular expressions. Here is an example with Python / Jython.
import re
pattern = re.compile(r'"name"=>"(.+?)"', re.M)
return ", ".join(pattern.findall(value))
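To try the same pattern outside OpenRefine, here is a minimal standalone sketch. The function name `extract_names` is hypothetical, and the two sample strings are copied from the question (the first one includes the missing comma and closing bracket).

```python
import re

# Same pattern as in the Jython snippet: capture whatever sits between
# the quotes that follow "name"=>.
NAME_PATTERN = re.compile(r'"name"=>"(.+?)"')


def extract_names(cell):
    """Return all "name" values from one cell, joined by ', '."""
    return ", ".join(NAME_PATTERN.findall(cell))


# Malformed sample from the question (missing comma and closing bracket).
broken = ('[{"qual"=>"05-Admmin "name"=>"CLARK C COHO"}, '
          '{"qual"=>"20-Soc Con", "name"=>"ALPHA S A"}, '
          '{"qual"=>"20-Soc Con", "name"=>"JACK SA"}')

# Well-formed sample from the question's edit.
clean = ('[{"qual"=>"49-SocAdm", "name"=>"ALVARO R L"}, '
         '{"qual"=>"49-SocAdm", "name"=>"GABRIEL G L"}]')

print(extract_names(broken))
print(extract_names(clean))
```

Because the pattern only looks at the text around "name", it does not depend on the rest of the cell being well formed.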

Related

Does SQL have a number separator (like an underscore) to split up large number literals

I'm trying to make it easier to read some SQL code where we need to hardcode in some large numbers
I'd like to do something like this:
SELECT 3_800_000
Which, according to this post, is really treated like this:
SELECT 3 _800_000
SELECT 3 AS [_800_000]
JS allows numeric separators
let x = 1000000000000
let y = 1_000_000_000_000
console.log(x==y) // true
Also, C# added digit separators in 7.0 as well
// These are equivalent.
var bigNumber = 123456789012345678;
var bigNumberSplit = 123_456_789_012_345_678;
Is something similar possible in T-SQL?
Note: I'm not looking for a way to format the output, I'm looking for a way to make the source code easier to read for big numbers
The answer is, unfortunately (and as commenters pointed out, thanks folks), that this is not currently a feature of T-SQL.
I was not aware of those JS and C# features, so thanks for that.

Need to extract specific text from a column on excel using either Alteryx or Pandas

I have a column that contains a specific set of text that I need to be retained and the rest removed or moved to another column. Unfortunately, I am not able to use normal text-to-column due to the variation of the text arrangement.
For example, I need the word Issue and the id associated with it to be separated. I am struggling to figure out a way to do this with the variation of the arrangement of the text I need.
A solution using Alteryx would be much appreciated; if not, Pandas would also work.
Thanks all.
Use str.extract with Pattern to extract specific text from the data frame [Pandas]
df['After'] = df['Before'].str.extract(pat=r'(ISSUE \d+|issue \d+)', expand=False)
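To check what the pattern captures without building a data frame, here is a minimal sketch using only Python's `re` module. The helper name `extract_issue` and the sample strings are hypothetical, since the question does not show its data. The sketch also assumes a case-insensitive match is acceptable, which is slightly broader than the pattern above (that one matches only "ISSUE" or "issue").

```python
import re

# Assumption: matching "issue" in any letter case is acceptable.
ISSUE_PATTERN = re.compile(r'(issue \d+)', re.IGNORECASE)


def extract_issue(text):
    """Return the first 'issue <number>' fragment, or None if absent."""
    match = ISSUE_PATTERN.search(text)
    return match.group(1) if match else None


# Hypothetical rows with the issue id in different positions.
samples = [
    "Team A, ISSUE 123, open",
    "issue 45, Team B",
    "Team C, closed",
]
for s in samples:
    print(extract_issue(s))
```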
For an Alteryx-only solution, the easiest way would be an Alteryx Formula using REGEX_Replace:
REGEX_Replace([Before],".*(issue \d+).*","$1",1)
If you don't like RegEx, basic string manipulations can do it also: basically it's a Substring...
Substring([Before], *starting index*, *length*)
The starting index is easy: it's just FindString([Before],"ISSUE")
The length isn't too hard either: it's the index (using FindString again) of the first comma in the substring that starts with "ISSUE": SubString([Before],FindString([Before],"ISSUE"))
Combining all that and spreading it out a bit:
Substring(
    [Before],
    FindString([Before],"ISSUE"),
    FindString(
        SubString(
            [Before],
            FindString([Before],"ISSUE")
        ),","
    )
)
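For readers who prefer Pandas or plain Python, the same find-the-start, find-the-first-comma logic can be sketched as follows. The function name `extract_issue_by_index` and the sample strings are hypothetical. The two fallbacks (marker missing, no comma after the marker) are assumptions that the Alteryx formula above does not handle.

```python
def extract_issue_by_index(text, marker="ISSUE"):
    """Slice from the marker up to (not including) the next comma."""
    start = text.find(marker)   # plays the role of FindString([Before], "ISSUE")
    if start == -1:
        return None             # assumption: no marker means no result
    tail = text[start:]         # substring that starts with the marker
    length = tail.find(",")     # index of the first comma inside that substring
    if length == -1:
        return tail             # assumption: no comma, so keep the rest
    return tail[:length]        # plays the role of Substring(start, length)


print(extract_issue_by_index("Team A, ISSUE 123, open"))
print(extract_issue_by_index("Team B, ISSUE 7"))
```

Unlike the regular expression, this version is case sensitive and relies on a comma following the id.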

SAP Smartforms layout trouble with packed numbers

I'm currently trying to fill a smartform with some information. I have a simple text element and read the data fields via &name&. The data itself gets read perfectly fine; however, the layout is incorrect. Some fields are just plain text, and others are defined as packed numbers with 2 decimals. For some reason these packed number fields are out of alignment, always showing one line below everything else that is supposed to be on the same line. It looks like this:
How can I get the 121,08 in this example on the same height as the rest? The text element looks like the following:
test &field1& &field2& &field3&
Only field 2 is a packed number, therefore I think it might have something to do with that.
Use the C formatting option as below. C will remove the extra spaces.
&field1(C)& &field2(C)& &field3(C)&

Parsing a SQL spatial column in Python

I am struggling a bit as I am new to programming. I am currently writing a Python script and I am a bit stuck. The goal is to parse some spatial information that gets pulled from SQL into a format that is usable for my script down the line.
I was able to CAST through a SQL query and fetchall using the odbc module. However, once I fetch the data, that is where it gets tricky for me. Here is an example of a print from the fetchall:
[(u'POLYGON ((7014.186279296875 6602.99658203125 1612.5, 7015.984375 6600.416015625 1612.5))',), (u'POLYGON ((6730.962646484375 6715.2490234375 1522.5, 6730.0869140625 6714.13916015625 1522.5))',)]
I am not exactly sure what I am getting here; it looks like a list of tuples. I have tried converting it to a list of lists, but there must be something I am missing.
Here is the usable format I am looking for:
[[7014.186279296875, 6602.99658203125, 1612.5], [7015.984375, 6600.416015625, 1612.5]]
[[6730.962646484375, 6715.2490234375, 1522.5], [6730.0869140625, 6714.13916015625, 1522.5]]
Any ideas of how I can accomplish this? Maybe there is a better way to CAST in SQL, or a module in Python that would be easier to use instead of just doing a cursor.fetchall() and parsing? Any parsing help would be useful. Thanks.
If you want to do the parsing yourself, that should be straightforward. For the example you've provided, the following code would do it:
result = []
for element in data:
    single_elements = element[0][10:-2].split(', ')
    for se in single_elements:
        row = str(se).split(' ')
        result.append([float(a) for a in row])
Result will contain what you need. If parsing is not an option, then paste some of your code so I can see how you're fetching data.
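The loop above collects the points of every polygon into one flat list. If one list per polygon is wanted, as in the format shown in the question, a variant could look like the sketch below. The function name `parse_polygon` is hypothetical, `rows` stands in for the result of `cursor.fetchall()`, and the sketch assumes each value is a single-ring `POLYGON ((...))` string like the ones in the question.

```python
def parse_polygon(wkt):
    """Turn 'POLYGON ((x y z, x y z))' into [[x, y, z], [x, y, z]]."""
    # Keep only the text between the double parentheses.
    inner = wkt[wkt.index("((") + 2:wkt.rindex("))")]
    # Points are separated by commas, coordinates by whitespace.
    return [[float(n) for n in point.split()] for point in inner.split(",")]


# Stand-in for cursor.fetchall(): a list of one-element tuples.
rows = [
    ('POLYGON ((7014.186279296875 6602.99658203125 1612.5, '
     '7015.984375 6600.416015625 1612.5))',),
    ('POLYGON ((6730.962646484375 6715.2490234375 1522.5, '
     '6730.0869140625 6714.13916015625 1522.5))',),
]

# One list of [x, y, z] points per polygon.
polygons = [parse_polygon(row[0]) for row in rows]
for polygon in polygons:
    print(polygon)
```

Locating the parentheses instead of slicing at fixed offsets keeps the function working if the prefix length changes, though polygons with interior rings would need extra handling.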

Extract terms from query for highlighting

I'm extracting terms from the query by calling ExtractTerms() on the Query object that I get as the result of QueryParser.Parse(). I get a Hashtable, but each item looks like this:
Key - term:term
Value - term:term
Why are the key and the value the same? And why is the term value duplicated and separated by a colon?
Do highlighters only insert tags, or do they do anything else? I want not only to get text fragments but also to highlight the source text (it's fairly big). I am trying to get the terms and insert tags by hand using their offsets, but I am not sure this is the right solution.
I think the answer to this question may help.
It is because .NET 2.0 doesn't have an equivalent to Java's HashSet. The conversion to .NET uses Hashtables with the same value as both key and value. The colon you see is just the result of Term.ToString(): a Term is a field name plus the term text, and your field name is probably "term".
To highlight an entire document using the Highlighter contrib, use the NullFragmenter.