How do I Process This String?

How do I Process This String? - sql

I have some results in one of my tables and the results vary, each; represents multiple entries in one column which I need to split out.
Here is my SQL and the results:
select REGEXP_COUNT(value,';') as cnt,
description
from mytable;
1 {Managed By|xBoss}{xBoss xBoss Number|X0910505569}{Time
Requested|2009-04-15 20:47:11.0}{Time Arrived|2009-04-15 21:46:11.0};
1 {Managed By|Modern Management}{xBoss Number|}{Time Requested|2009-04-
16 14:01:29.0}{Time Arrived|2009-04-16 14:44:11.0};
2 {Managed By|xBoss}{xBoss Number|X091480092}{Time Requested|2009-05-28
08:58:41.0}{Time Arrived|};{Managed By|Jims Allocation}{xBoss xBoss
Number|}{Time Requested|}{Time Arrived|};
Desired output:
R1:
Managed By: xBoss
Time Requested:2009-10-19 07:53:45.0
Time Arrived: 2009-10-19 07:54:46.0
R2:
Managed By:Own Arrangements
Number: x5876523
Time Requested: 2009-10-19 07:57:46.0
Time Arrived:
R3:
Managed By: xBoss
Time Requested:2009-10-19 08:07:27.0
select
SPLIT_PART(description, '}', 1),
SPLIT_PART(description, '}', 2),
SPLIT_PART(description, '}', 3),
SPLIT_PART(description, '}', 4),
SPLIT_PART(description, '}', 5)
as description_with_tag from mytable;
This is ok when the count is 1, but when there are multiple ; in the description it doesn't give me the results.
Is it possible to put this into an array based on the count?

First, it's worth pointing out that data in this type of format cannot take advantage of all the benefits that Redshift can offer. Amazon Redshift is a columnar database that can provide amazing performance when data is stored in appropriate columns. However, selecting specific text from a text field will always perform poorly.
Therefore, my main advice would be to pre-process the data into normal rows and columns so that Redshift can provide you the best capabilities.
However, to answer your question, I would recommend making a Scalar User-Defined Function:
CREATE FUNCTION f_extract_curly (s TEXT, key TEXT)
RETURNS TEXT
STABLE
AS $$
# List of items in {brackets}
items = s[1:-1].split('}{')
# Dictionary of Key|Value from items
entries = {i.split('|')[0]: i.split('|')[1] for i in items}
# Return desired value
return entries.get(key, None)
$$ LANGUAGE plpythonu;
I loaded sample data with:
CREATE TABLE foo (
description TEXT
);
INSERT INTO foo values('{Managed By|xBoss}{xBoss xBoss Number|X0910505569}{Time Requested|2009-04-15 20:47:11.0}{Time Arrived|2009-04-15 21:46:11.0};');
INSERT INTO foo values('{Managed By|Modern Management}{xBoss Number|}{Time Requested|2009-04-16 14:01:29.0}{Time Arrived|2009-04-16 14:44:11.0};');
INSERT INTO foo values('{Managed By|xBoss}{xBoss Number|X091480092}{Time Requested|2009-05-28 08:58:41.0}{Time Arrived|};{Managed By|Jims Allocation}{xBoss xBoss Number|}{Time Requested|}{Time Arrived|};');
Then I tested it with:
SELECT
f_extract_curly(description, 'Managed By'),
f_extract_curly(description, 'Time Requested')
FROM foo
and got the result:
xBoss 2009-04-15 20:47:11.0
Modern Management 2009-04-16 14:01:29.0
xBoss
It doesn't know how to handle lines that have the same field specified twice (with semi-colons between). You did not provide enough sample input and output lines for me to figure out what you wanted in such situations, but feel free to tweak the code for your requirements.

There is no array data type in Redshift. There are 2 options:
1) First split_part by ';', then union results separately for every index of the first split_part output, then split_part results by '}' and finally get what you need.
2) Create a Python UDF and process these strings with Python. I guess this is the best solution for your use case.
3) Transform your data outside Redshift. From your data structure it seems like it's much better to process it before copying to Redshift, unnesting the arrays into rows and extracting keys from your objects into columns.

Related

extract values inside an array column in amazon athena

I have a table in athena aws where the column 'metadata_stopinfo' has the structure that you can see in the image.
I am trying to extract values that are inside that array, however when I try
SELECT
"json_extract_scalar"(metadata_stopinfo, '$.city')
FROM "table"
I have the following problem
SYNTAX_ERROR: line 2:5: Unexpected parameters (array(row("address" row("addressline" varchar,"city" varchar,"countrycode" varchar,"countrycodeoriginal" varchar,"state" varchar,"zipcode" varchar),"carrierreference" varchar,"contacts" array(row("contacttype" varchar,"email" varchar,"fax" varchar,"mobilephone" varchar,"name" varchar,"officephone" varchar,"userid" varchar)),"containerinfo" array(row("containerid" varchar,"containeridtype" varchar,"equipmentcode" varchar,"equipmenttype" varchar)),"conveyancelinenumber" varchar,"conveyancetype" varchar,"conveyancetypeoriginal" varchar,"dateinfo" row("arrivalestimateddate" varchar,"arrivalestimateddateend" varchar,"arrivalestimatedendoffset" varchar,"arrivalestimatedoffset" varchar,"arrivalrequesteddate" varchar,"deliveryestimateddate" varchar,"deliveryestimateddateend" varchar,"deliveryestimatedendoffset" varchar,"deliveryestimatedoffset" varchar,"deliveryrequesteddate" varchar,"deliveryrequesteddateend" varchar,"deliveryrequestedendoffset" varchar,"deliveryrequestedoffset" varchar,"departureestimateddate" varchar,"departureestimateddateend" varchar,"departureestimatedendoffset" varchar,"departureestimatedoffset" varchar,"departurerequesteddate" varchar,"pickuprequesteddate" varchar,"pickuprequesteddateend" varchar,"pickuprequestedendoffset" varchar,"pickuprequestedoffset" varchar,"pickupestimateddate" varchar,"pickupestimateddateend" varchar,"pickupestimatedendoffset" varchar,"pickupestimatedoffset" varchar),"deliverynotenumber" varchar,"instructions" array(row("customerspecificsubtype" varchar,"header" boolean,"instructionsubtype" varchar,"instructiontype" varchar,"text" varchar)),"locationid" varchar,"partnercarrieraddress" row("addressline" varchar,"city" varchar,"countrycode" varchar,"countrycodeoriginal" varchar,"state" varchar,"zipcode" varchar),"partnercarriercontacts" array(row("contacttype" varchar,"email" varchar,"fax" varchar,"name" varchar,"officephone" varchar)),"partnercarrierid" varchar,"partnercarriername" varchar,"partnerid" varchar,"partnername" varchar,"partnertimezone" varchar,"partnertype" varchar,"productquantity" row("number" double,"originalunitofmeasure" varchar,"quantitytype" varchar,"unitofmeasure" varchar),"sequencenumber" bigint,"shipmentidentifier" varchar,"stoptype" varchar,"transportinfo" row("description" varchar,"transportcode" varchar,"transportoriginalcode" varchar),"vesselinfo" row("lloydsnumber" varchar,"shipsradiocallnumber" varchar,"vesselname" varchar,"vesselnumber" varchar,"voyagetripnumber" varchar))), varchar(6)) for function json_extract_scalar. Expected: json_extract_scalar(varchar(x), JsonPath) , json_extract_scalar(json, JsonPath)
My question is, how can i extract values inside de column ?

json_extract_scalar unsurprisingly works with json (note that even if yur data was in json format, json_extract_scalar(metadata_stopinfo, '$.city') still would not have worked cause your data is an array), while your column contains array's of row's, so you need to work with it correspondingly. For example you can use indexes to access elements in array (in presto array indexes start from 1):
SELECT
metadata_stopinfo[1] r
FROM "table"
And then access the fields:
The fields may be of any SQL type, and are accessed with field reference operator .
SELECT
metadata_stopinfo[1].city city
FROM "table"
Also you can flatten the array with unnest:
SELECT r.city
FROM "table",
unnest(metadata_stopinfo) as t(r)

Presto Produce JSON results

I have a json table which was created by
CREATE TABLE `normaldata_source`(
`column1` int,
`column2` string,
`column3` struct<column4:string>)
A sample data is:
{
"column1": 9,
"column2": "Z",
"column3": {
"column4": "Y"
}
}
If I do
SELECT column3
FROM normaldata_source
it will produce a result {column4=y}. However, I want it to be in json form {"column4": "y"}
Is this possible?
*Edit This query gives me the following result:
SELECT CAST(column3 AS JSON) as column3_json
FROM normaldata_source

As of Trino 357 (formerly known as Presto SQL), you can now cast a row to JSON and it will preserve the column names:
WITH normaldata_source(column1, column2, column3) AS (
VALUES (9, 'Z', cast(row('Y') as row(column4 varchar)))
)
SELECT cast(column3 as json)
FROM normaldata_source
=>
_col0
-----------------
{"column4":"Y"}
(1 row)

I encountered this same problem and was thoroughly stumped on how to proceed in light of deep compositional nesting/structs. I'm using Athena (managed Presto w/ Hive Connector from AWS). In the end, I worked around it by doing a CTAS (create table as select) where I selected the complex column I wanted, under the conditions I wanted) and wrote it to an external table with an underlying SerDe format of JSON. Then, via the HiveConnector's $path magic column (or by listing the files under the external table location), I obtain the resulting files and streamed out of those.
I know this isn't a direct answer to the question at hand - I believe we have to wait for https://github.com/trinodb/trino/pull/3613 in order to support arbitrary struct/array compositions -> json. But maybe this will help someone else who kind of assumed they'd be able to do this.
Although I originally saw this as an annoying workaround, I'm now starting to think it was the right call for my application anyway

Regex comparison in Oracle between 2 varchar columns (from different tables)

I am trying to find a way to capture relevant errors from oracle alertlog. I have one table (ORA_BLACKLIST) with column values as below (these are the values which I want to ignore from
V$DIAG_ALERT_EXT)
Below are sample data in ORA_BLACKLIST table. This table can grow based on additional error to ignore from alertlog.
ORA-07445%[kkqctdrvJPPD
ORA-07445%[kxsPurgeCursor
ORA-01013%
ORA-27037%
ORA-01110
ORA-2154
V$DIAG_ALERT_EXT contains a MESSAGE_TEXT column which contains sample text like below.
ORA-01013: user requested cancel of current operation
ORA-07445: exception encountered: core dump [kxtogboh()+22] [SIGSEGV] [ADDR:0x87] [PC:0x12292A56]
ORA-07445: exception encountered: core dump [java_util_HashMap__get()] [SIGSEGV]
ORA-00600: internal error code arguments: [qercoRopRowsets:anumrows]
I want to write a query something like below to ignore the black listed errors and only capture relevant info like below.
select
dae.instance_id,
dae.container_name,
err_count,
dae.message_level
from
ORA_BLACKLIST ob,
V$DIAG_ALERT_EXT dae
where
group by .....;
Can someone suggest a way or sample code to achieve it?
I should have provided the exact contents of blacklist table. It currently contains some regex (perl) and I want to convert it to oracle like regex and compare with v$diag_alert_ext message_text column. Below are sample perl regex in my blacklist table.
ORA-0(,|$| )
ORA-48913
ORA-00060
ORA-609(,|$| )
ORA-65011
ORA-65020 ORA-31(,|$| )
ORA-7452 ORA-959(,|$| )
ORA-3136(,|)|$| )
ORA-07445.[kkqctdrvJPPD
ORA-07445.[kxsPurgeCursor –

Your blacklist table looks like like patterns, not regular expressions.
You can write a query like this:
select dae.* -- or whatever columns you want
from V$DIAG_ALERT_EXT dae
where not exists (select 1
from ORA_BLACKLIST ob
where dae.message_text like ob.<column name>
);
This will not have particularly good performance if the tables are large.

APEX 5: How can I store the results of a MultiSelection from a List in an APEX-Item?

My example list:
How can I store the results in the Input Field "Your Choose" as comma-separate values (in this example: ALABAMA, ARIZONA).
Would APEX_UTIL.TABLE_TO_STRING be suitable?

You need to be more specific while asking questions.
I assume that you have page with
item PX_MULTISELECT.
process Before Header that loads data from DB to the input PX_MULTISELECT
And basing on this I assume you are asking how to get data separated with : from DB.
In process that loads data you should fetch data from db using listagg function
begin
select
listagg(COLUMN, ':') within group (order by 1)
into
:PX_MULTISELECT
from
...
where
...
end;
LISTAGG documentation

What is the best way to run N independent column updates in PostgreSQL? What is the best way to do it in the SQL spec?

I'm looking for a more efficient way to run many columns updates on the same table like this:
UPDATE TABLE table
SET col = regexp_replace( col, 'foo', 'bar' )
WHERE regexp_match( col, 'foo' );
Such that foo, and bar, will be a combination of 40 different regex-replaces. I doubt even 25% of the dataset needs to be updated at all, but what I'm wanting to know is it is possible to cleanly achieve the following in SQL.
A single pass update
A single match of the regex, triggers a single replace
Not running all possible regexp_replaces if only one matches
Not updating all columns if only one needs the update
Not updating a row if no column has changed
I'm also curious, I know in MySQL (bear with me)
UPDATE foo SET bar = 'baz'
Has an implicit WHERE bar != 'baz' clause
However, in PostgreSQL I know this doesn't exist: I think I could at least answer one of my questions if I knew how to skip a single row's update if the target columns weren't updated.
Something like
UPDATE TABLE table
SET col = *temp_var* = regexp_replace( col, 'foo', 'bar' )
WHERE col != *temp_var*

Do it in code. Open up a cursor, then: grab a row, run it through the 40 regular expressions, and if it changed, save it back. Repeat until the cursor doesn't give you any more rows.
Whether you do it that way or come up with the magical SQL expression, it's still going to be a row scan of the entire table, but the code will be much simpler.
Experimental Results
In response to criticism, I ran an experiment. I inserted 10,000 lines from a documentation file into a table with a serial primary key and a varchar column. Then I tested two ways to do the update. Method 1:
in a transaction:
opened up a cursor (select for update)
while reading 100 rows from the cursor returns any rows:
for each row:
for each regular expression:
do the gsub on the text column
update the row
This takes 1.16 seconds with a locally connected database.
Then the "big replace," a single mega-regex update:
update foo set t =
regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(t,
E'\bcommit\b', E'COMMIT'),
E'\b9acf10762b5f3d3b1b33ea07792a936a25e45010\b',
E'9ACF10762B5F3D3B1B33EA07792A936A25E45010'),
E'\bAuthor:\b', E'AUTHOR:'),
E'\bCarl\b', E'CARL'), E'\bWorth\b',
E'WORTH'), E'\b\b',
E''), E'\bDate:\b',
E'DATE:'), E'\bMon\b', E'MON'),
E'\bOct\b', E'OCT'), E'\b26\b',
E'26'), E'\b04:53:13\b', E'04:53:13'),
E'\b2009\b', E'2009'), E'\b-0700\b',
E'-0700'), E'\bUpdate\b', E'UPDATE'),
E'\bversion\b', E'VERSION'),
E'\bto\b', E'TO'), E'\b2.9.1\b',
E'2.9.1'), E'\bcommit\b', E'COMMIT'),
E'\b61c89e56f361fa860f18985137d6bf53f48c16ac\b',
E'61C89E56F361FA860F18985137D6BF53F48C16AC'),
E'\bAuthor:\b', E'AUTHOR:'),
E'\bCarl\b', E'CARL'), E'\bWorth\b',
E'WORTH'), E'\b\b',
E''), E'\bDate:\b',
E'DATE:'), E'\bMon\b', E'MON'),
E'\bOct\b', E'OCT'), E'\b26\b',
E'26'), E'\b04:51:58\b', E'04:51:58'),
E'\b2009\b', E'2009'), E'\b-0700\b',
E'-0700'), E'\bNEWS:\b', E'NEWS:'),
E'\bAdd\b', E'ADD'), E'\bnotes\b',
E'NOTES'), E'\bfor\b', E'FOR'),
E'\bthe\b', E'THE'), E'\b2.9.1\b',
E'2.9.1'), E'\brelease.\b',
E'RELEASE.'), E'\bThanks\b',
E'THANKS'), E'\bto\b', E'TO'),
E'\beveryone\b', E'EVERYONE'),
E'\bfor\b', E'FOR')
The mega-regex update takes 0.94 seconds to update.
At 0.94 seconds compared to 1.16, it's true that the mega-regex update is faster, running in 81% of the time of doing it in code. It is not, however a lot faster. And ye Gods, look at that update statement. Do you want to write that, or try to figure out what went wrong when Postgres complains that you dropped a parenthesis somewhere?
Code
The code used was:
def stupid_regex_replace
sql = Select.new
sql.select('id')
sql.select('t')
sql.for_update
sql.from(TABLE_NAME)
Cursor.new('foo', sql, {}, #db) do |cursor|
until (rows = cursor.fetch(100)).empty?
for row in rows
for regex, replacement in regexes
row['t'] = row['t'].gsub(regex, replacement)
end
end
sql = Update.new(TABLE_NAME, #db)
sql.set('t', row['t'])
sql.where(['id = %s', row['id']])
sql.exec
end
end
end
I generated the regular expressions dynamically by taking words from the file; for each word "foo", its regular expression was "\bfoo\b" and its replacement string was "FOO" (the word uppercased). I used words from the file to make sure that replacements did happen. I made the test program spit out the regex's so you can see them. Each pair is a regex and the corresponding replacement string:
[[/\bcommit\b/, "COMMIT"],
[/\b9acf10762b5f3d3b1b33ea07792a936a25e45010\b/,
"9ACF10762B5F3D3B1B33EA07792A936A25E45010"],
[/\bAuthor:\b/, "AUTHOR:"],
[/\bCarl\b/, "CARL"],
[/\bWorth\b/, "WORTH"],
[/\b<cworth#cworth.org>\b/, "<CWORTH#CWORTH.ORG>"],
[/\bDate:\b/, "DATE:"],
[/\bMon\b/, "MON"],
[/\bOct\b/, "OCT"],
[/\b26\b/, "26"],
[/\b04:53:13\b/, "04:53:13"],
[/\b2009\b/, "2009"],
[/\b-0700\b/, "-0700"],
[/\bUpdate\b/, "UPDATE"],
[/\bversion\b/, "VERSION"],
[/\bto\b/, "TO"],
[/\b2.9.1\b/, "2.9.1"],
[/\bcommit\b/, "COMMIT"],
[/\b61c89e56f361fa860f18985137d6bf53f48c16ac\b/,
"61C89E56F361FA860F18985137D6BF53F48C16AC"],
[/\bAuthor:\b/, "AUTHOR:"],
[/\bCarl\b/, "CARL"],
[/\bWorth\b/, "WORTH"],
[/\b<cworth#cworth.org>\b/, "<CWORTH#CWORTH.ORG>"],
[/\bDate:\b/, "DATE:"],
[/\bMon\b/, "MON"],
[/\bOct\b/, "OCT"],
[/\b26\b/, "26"],
[/\b04:51:58\b/, "04:51:58"],
[/\b2009\b/, "2009"],
[/\b-0700\b/, "-0700"],
[/\bNEWS:\b/, "NEWS:"],
[/\bAdd\b/, "ADD"],
[/\bnotes\b/, "NOTES"],
[/\bfor\b/, "FOR"],
[/\bthe\b/, "THE"],
[/\b2.9.1\b/, "2.9.1"],
[/\brelease.\b/, "RELEASE."],
[/\bThanks\b/, "THANKS"],
[/\bto\b/, "TO"],
[/\beveryone\b/, "EVERYONE"],
[/\bfor\b/, "FOR"]]
If this were a hand-generated list of regex's, and not automatically generated, my question is still appropriate: Which would you rather have to create or maintain?

For the skip update, look at suppress_redundant_updates - see http://www.postgresql.org/docs/8.4/static/functions-trigger.html.
This is not necessarily a win - but it might well be in your case.
Or perhaps you can just add that implicit check as an explicit one?

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

How do I Process This String? - sql

Related

extract values inside an array column in amazon athena

Presto Produce JSON results

Regex comparison in Oracle between 2 varchar columns (from different tables)

APEX 5: How can I store the results of a MultiSelection from a List in an APEX-Item?

What is the best way to run N independent column updates in PostgreSQL? What is the best way to do it in the SQL spec?

Categories

Resources