Split SQL string with specific string instead of separator?

I have a table that looks like:
| ID  | String             |
| 546 | 1,2,1,5,7,8        |
| 486 | 2,4,8,1,5,1        |
| 465 | 18,11,20,1,4,18,11 |
| 484 | 11,10,11,12,50,11  |
I want to split the strings into this:
| ID  | String |
| 546 | 1,2    |
| 546 | 1,5    |
| 486 | 1,5,1  |
| 486 | 1      |
| 465 | 1,4    |
My goal is to show the ID and all the substrings starting with 1, with just the next number after them. I filtered out all the rows without '%1,%', but I don't know how to continue.

If you use SQL Server 2016+, you may try a JSON-based approach: transform the data into a valid JSON array and parse it with OPENJSON(). Note that STRING_SPLIT() is not an option here because, as mentioned in the documentation, the output rows might be in any order and the order is not guaranteed to match the order of the substrings in the input string.
Table:
CREATE TABLE Data (
    ID int,
    [String] varchar(100)
)
INSERT INTO Data (ID, [String])
VALUES
    (546, '1,2,1,5,7,8'),
    (486, '2,4,8,1,5,1'),
    (465, '18,11,20,1,4,18,11'),
    (484, '11,10,11,12,50,11')
Statement:
SELECT
    ID,
    CONCAT(FirstValue, ',', SecondValue) AS [String]
FROM (
    SELECT
        d.ID,
        j.[value] AS FirstValue,
        LEAD(j.[value]) OVER (PARTITION BY d.ID ORDER BY CONVERT(int, j.[key])) AS SecondValue
    FROM Data d
    CROSS APPLY OPENJSON(CONCAT('[', d.[String], ']')) j
) t
WHERE t.FirstValue = '1'
Result (for the trailing 1 in ID 486 there is no following value, so LEAD() returns NULL and CONCAT() produces '1,'):
----------
ID String
----------
465 1,4
486 1,5
486 1,
546 1,2
546 1,5

Something like:
SELECT ID, S.value
FROM Data
CROSS APPLY STRING_SPLIT(REPLACE(',' + String, ',1,', '#1,'), '#') AS S
WHERE value LIKE '1,%'
?
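To see what the REPLACE does, here is a trace of the first row (a sketch against the Data table above; the intermediate values are shown in comments):

SELECT REPLACE(',' + '1,2,1,5,7,8', ',1,', '#1,');
-- prefixing a comma lets a leading 1 match as well:
--   ',1,2,1,5,7,8'  ->  '#1,2#1,5,7,8'
-- STRING_SPLIT on '#' then yields: '', '1,2', '1,5,7,8'
-- and WHERE value LIKE '1,%' keeps '1,2' and '1,5,7,8'

Note each substring runs until the next ',1,' marker, so it can contain more than one trailing number.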

Oracle 12c Json split

This is the result I am getting in Oracle 12c:
Id | Start Date Range                          | End Date Range
-- | ----------------------------------------- | -----------------------------------------
1  | [ "2019-01-07","2019-02-17","2019-03-17"] | [ "2019-01-14","2019-02-21","2019-03-21"]
And I want it like this:
Id | Start Date Range | End Date Range
-- | ---------------- | --------------
1  | 2019-01-07       | 2019-01-14
1  | 2019-02-17       | 2019-02-21
1  | 2019-03-17       | 2019-03-21
Earlier I asked this question for a single-column split; below is the link:
How to replace special characters and then break line in oracle
But when I add another column there is a Cartesian product.
You can use json_table to extract the strings from the JSON arrays, presumably as actual dates:
select t.id, s.n, s.start_date, e.end_date
from your_table t
cross apply json_table (
       t.start_range, '$[*]'
       columns
         n for ordinality,
         start_date date path '$'
     ) s
join json_table (
       t.end_range, '$[*]'
       columns
         n for ordinality,
         end_date date path '$'
     ) e
  on e.n = s.n
The for ordinality clauses provide an index into each array, and the join then matches up the 'related' array entries.
ID | N | START_DATE | END_DATE
-: | -: | :--------- | :--------
1 | 1 | 07-JAN-19 | 14-JAN-19
1 | 2 | 17-FEB-19 | 21-FEB-19
1 | 3 | 17-MAR-19 | 21-MAR-19
If you want strings rather than dates for some reason, you can just change the data type in the columns clauses.
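For example, to get the raw strings instead, each column definition would change along these lines (a sketch of that one-line edit):

columns
  n for ordinality,
  start_date varchar2(10) path '$'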
db<>fiddle
You can also do it with regexp_substr and connect by, after replacing the [, ] and " characters with empty strings.
Schema and insert statements:
create table testtable(Id int, Start_Date_Range varchar(500), End_Date_Range varchar(500));
insert into testtable values(1 ,'[ "2019-01-07","2019-02-17","2019-03-17"]', '[ "2019-01-14","2019-02-21","2019-03-21"]');
Query:
select distinct id,
       trim(regexp_substr(replace(replace(replace(Start_Date_Range, '"', ''), '[', ''), ']', ''), '[^,]+', 1, level)) Start_Date_Range,
       trim(regexp_substr(replace(replace(replace(End_Date_Range, '"', ''), '[', ''), ']', ''), '[^,]+', 1, level)) End_Date_Range,
       level
from testtable
connect by regexp_substr(Start_Date_Range, '[^,]+', 1, level) is not null
order by id, level;
Output:
ID | START_DATE_RANGE | END_DATE_RANGE | LEVEL
-- | ---------------- | -------------- | -----
1  | 2019-01-07       | 2019-01-14     | 1
1  | 2019-02-17       | 2019-02-21     | 2
1  | 2019-03-17       | 2019-03-21     | 3
db<>fiddle here
In a comment to the OP, I pointed out that the data model is not quite right. The values in the two JSON arrays are related; such data should be encoded in a single object, not two. There should be a single array of objects, each object having two members: start date and end date.
To illustrate the data model I am suggesting, I start with a sample input table (with an additional id). I use Alex Poole's answer exactly as is to generate the table the OP asked about, but then I follow that with JSON generating functions to put the data back into JSON format, to illustrate how I think the input data should look. (The provider of the JSON string should create a JSON in this format, rather than sending two separate JSON arrays of strings representing dates.)
What I do not show here is how to use a single call to JSON_TABLE to split the data from the single array of objects created at the end of my query. That is a lot simpler than the query to get data out of two separate JSON arrays.
NOTE - this is not really an answer; I wrote it as an answer because it obviously wouldn't fit in a comment.
with
  t (id, start_date_range, end_date_range) as (
    select 1, '["2019-01-07","2019-02-17","2019-03-17"]',
              '["2019-01-14","2019-02-21","2019-03-21"]' from dual union all
    select 5, '["2020-04-23","2020-06-15"]',
              '["2020-04-30","2020-06-19"]' from dual
  ),
  shown_as_table (id, n, start_date, end_date) as (
    select t.id, s.n, to_char(s.start_date, 'yyyy-mm-dd'),
           to_char(e.end_date, 'yyyy-mm-dd')
    from t
    cross apply json_table (
           t.start_date_range, '$[*]'
           columns
             n for ordinality,
             start_date date path '$'
         ) s
    join json_table (
           t.end_date_range, '$[*]'
           columns
             n for ordinality,
             end_date date path '$'
         ) e
      on e.n = s.n
  )
select id, json_arrayagg(
         json_object('start' value start_date, 'end' value end_date)
         format json
         order by n
       ) as date_range_array
from shown_as_table
group by id
;
Output:
ID DATE_RANGE_ARRAY
-- -------------------------------------------------------------------------------------------------------------------------------
1 [{"start":"2019-01-07","end":"2019-01-14"},{"start":"2019-02-17","end":"2019-02-21"},{"start":"2019-03-17","end":"2019-03-21"}]
5 [{"start":"2020-04-23","end":"2020-04-30"},{"start":"2020-06-15","end":"2020-06-19"}]

How to use REGEXP_SUBSTR properly?

Currently in my select statement I have id and value. The value is JSON which looks like this:
{"layerId":"nameOfLayer","layerParams":{some unnecessary data}
I would like to have in my select id and nameOfLayer so the output would be for example:
1, layerName
2, layerName2
etc.
The JSON always looks the same, so the layerId is always first.
Could you tell me how I can use REGEXP_SUBSTR properly in my select query, which currently looks like this?
select
    id,
    value
from
    ...
where
    table1.id = table2.bookmark_id
    and ...;
In Oracle 11g, you can extract the layerId using the following regular expression, where js is the name of your JSON column:
regexp_replace(js, '^.*"layerId":"([^"]+).*$', '\1')
This basically extracts the string between double quotes after "layerId":.
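For example, applied to a query shaped like the OP's (a sketch; mytable stands in for the real table, and value is the JSON column from the question):

select id,
       regexp_replace(value, '^.*"layerId":"([^"]+).*$', '\1') as layer_id
from mytable;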
In more recent versions, you would add a check constraint on the table to ensure that the document is valid JSON, and then use the dot notation to access the object attribute as follows:
create table mytable (
    id int primary key,
    js varchar2(200),
    constraint ensure_js_is_json check (js is json)
);
insert into mytable values (1, '{"layerId":"nameOfLayer","layerParams":{} }');
select id, t.js.layerId from mytable t;
Demo on DB Fiddle:
ID | LAYERID
-: | :----------
1 | nameOfLayer
Don't use regular expressions; use a JSON_TABLE or JSON_VALUE to parse JSON:
Oracle 18c Setup:
CREATE TABLE test_data (
id INTEGER,
value VARCHAR2(4000)
);
INSERT INTO test_data ( id, value )
SELECT 1, '{"layerId":"nameOfLayer","layerParams":{"some":"unnecessary data"}}' FROM DUAL UNION ALL
SELECT 2, '{"layerParams":{"layerId":"NOT THIS ONE!"},"layerId":"nameOfLayer"}' FROM DUAL UNION ALL
SELECT 3, '{"layerId":"Name with \"Quotes\"","layerParams":{"layerId":"NOT THIS ONE!"}}' FROM DUAL;
Query 1:
SELECT t.id,
       j.layerId
FROM   test_data t
       CROSS JOIN JSON_TABLE(
         t.value,
         '$'
         COLUMNS (
           layerId VARCHAR2(50) PATH '$.layerId'
         )
       ) j
Query 2:
If you only want a single value you could, alternatively, use JSON_VALUE:
SELECT id,
JSON_VALUE( value, '$.layerId' ) AS layerId
FROM test_data
Both queries output:
ID | LAYERID
-: | :-----------------
1 | nameOfLayer
2 | nameOfLayer
3 | Name with "Quotes"
Query 3:
You can try regular expressions but they do not always work as expected:
SELECT id,
       REPLACE(
         REGEXP_SUBSTR( value, '[{,]"layerId":"((\\"|[^"])*)"', 1, 1, NULL, 1 ),
         '\"',
         '"'
       ) AS layerID
FROM test_data
Output:
ID | LAYERID
-: | :-----------------
1 | nameOfLayer
2 | NOT THIS ONE!
3 | Name with "Quotes"
So if you can guarantee that no one is going to put data into the database where the JSON keys are in a different order, then this may work; however, the JSON specification allows key-value pairs to be in any order, so regular expressions are not a general solution that will parse every JSON string. You should use a proper JSON parser: there are 3rd-party solutions available for Oracle 11g, or you can upgrade to Oracle 12c, where there is a native solution.
db<>fiddle here
I think you can use regexp_substr like this:
regexp_substr(str, '[^"]+',1,2) as layer_id,
regexp_substr(str, '[^"]+',1,4) as layername
Db<>fiddle demo
Cheers!!

Convert an array into a Map

I have a table with a column like
[{"key":"e","value":["253","203","204"]},{"key":"st","value":["mi"]},{"key":"k2","value":["1","2"]}]
Which is of the format array<struct<key:string,value:array<string>>>
I want to convert the column into below format :
{"e":["253","203","204"],"st":["mi"],"k2":["1","2"]}
which is of the type map<string,array<string>>
I have tried exploding the array but that does not work. Any ideas how I can do this in Hive?
Without the use of external libraries it's impossible. Please refer to Brickhouse or create your own UDAF.
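For reference, with the Brickhouse jar on the classpath, its collect UDAF can aggregate key/value pairs straight into a map (a sketch; the jar path is a placeholder, and this assumes the test_table defined below):

-- placeholder path; point this at your actual Brickhouse build
ADD JAR /path/to/brickhouse.jar;
CREATE TEMPORARY FUNCTION collect AS 'brickhouse.udf.collect.CollectUDAF';

SELECT id, collect(e.key, e.value) AS mapped_output
FROM test_table LATERAL VIEW explode(input) d AS e
GROUP BY id;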
Note: the code below provides snippets to reproduce the problem and to solve what Hive's built-in functions can solve, i.e. map<string,string>, not map<string,array<string>>.
-- reproducing the problem
CREATE TABLE test_table(id INT, input ARRAY<STRUCT<key:STRING,value:ARRAY<STRING>>>);
INSERT INTO TABLE test_table
SELECT
    1 AS id,
    ARRAY(
        named_struct("key", "e", "value", ARRAY("253","203","204")),
        named_struct("key", "st", "value", ARRAY("mi")),
        named_struct("key", "k2", "value", ARRAY("1", "2"))
    ) AS input;
SELECT id, input FROM test_table;
+-----+-------------------------------------------------------------------------------------------------------+--+
| id | input |
+-----+-------------------------------------------------------------------------------------------------------+--+
| 1 | [{"key":"e","value":["253","203","204"]},{"key":"st","value":["mi"]},{"key":"k2","value":["1","2"]}] |
+-----+-------------------------------------------------------------------------------------------------------+--+
With exploding and using STRUCT features, we can split the keys and values.
SELECT id, exploded_input.key, exploded_input.value
FROM (
    SELECT id, exploded_input
    FROM test_table LATERAL VIEW explode(input) d AS exploded_input
) x;
+-----+------+----------------------+--+
| id | key | value |
+-----+------+----------------------+--+
| 1 | e | ["253","203","204"] |
| 1 | st | ["mi"] |
| 1 | k2 | ["1","2"] |
+-----+------+----------------------+--+
The idea is to use your UDAF to "collect" a map while aggregating on id.
What Hive can solve with built-in functions is generating map<string,string>: convert each row to a string with one special delimiter, aggregate the rows with another special delimiter, and then use the str_to_map built-in function on those delimiters to generate the map<string,string>.
SELECT
    id,
    str_to_map(
        -- outputs: e:253,203,204#st:mi#k2:1,2 with delimiters between aggregated rows
        concat_ws('#', collect_list(list_to_string)),
        '#', -- first delimiter
        ':'  -- second delimiter
    ) mapped_output
FROM (
    SELECT
        id,
        -- outputs 3 rows: (e:253,203,204), (st:mi), (k2:1,2)
        CONCAT(exploded_input.key, ':', CONCAT_WS(',', exploded_input.value)) AS list_to_string
    FROM (
        SELECT id, exploded_input
        FROM test_table LATERAL VIEW explode(input) d AS exploded_input
    ) x
) y
GROUP BY id;
Which outputs a string to string map like:
+-----+-------------------------------------------+--+
| id | mapped_output |
+-----+-------------------------------------------+--+
| 1 | {"e":"253,203,204","st":"mi","k2":"1,2"} |
+-----+-------------------------------------------+--+
with input_set as (
    select array(
        named_struct('key', 'e', 'value', array('253','203','204')),
        named_struct('key', 'st', 'value', array('mi')),
        named_struct('key', 'k2', 'value', array('1','2'))
    ) as input_array
), break_input_set as (
    select y.col_num as y_col_num, y.col_value as y_col_value
    from input_set lateral view posexplode(input_set.input_array) y as col_num, col_value
), create_map as (
    select map(y_col_value.key, y_col_value.value) as final_map from break_input_set
)
select * from create_map;
var arr = [{"key":"e","value":["253","203","204"]},{"key":"st","value":["mi"]},{"key":"k2","value":["1","2"]}];
var obj = {};
// build the map keyed by each element's "key" field
for (var i = 0; i < arr.length; i++) {
    obj[arr[i].key] = arr[i].value;
}
obj will be in the required format

How to transpose a table in sql?

+------+-------+
| a    | b     |
+------+-------+
| 1    | a,b,c |
| 1    | d,e,f |
| 1    | g,h   |
+------+-------+
I would want output like below, with a SQL script:
1 , a,b,c,d,e,f,g,h
I'm assuming you're using SQL Server for this, but you can modify the query to work for MySQL.
This one is a little tough, but you can use the STUFF function with FOR XML PATH to concatenate the strings. It'll come out as a query similar to this:
SELECT
    [a],
    STUFF((
        SELECT ', ' + [b]
        FROM #YourTable
        WHERE (a = 1)
        FOR XML PATH(''), TYPE).value('(./text())[1]', 'VARCHAR(MAX)')
    , 1, 2, '') AS NameValues
FROM #YourTable Results
GROUP BY [a]
Basically, you use STUFF with a FOR XML PATH subquery to concatenate the rows where a=1, with a comma and space in between each value of the [b] column. Then you group by the [a] column to avoid returning one row for each original [a] row in the table.
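If [a] can hold more than one distinct value, a correlated version (a sketch, not part of the original answer) generalizes the same idea:

SELECT
    [a],
    STUFF((
        SELECT ', ' + [b]
        FROM #YourTable t
        WHERE t.a = Results.a -- correlate on the outer row's [a]
        FOR XML PATH(''), TYPE).value('(./text())[1]', 'VARCHAR(MAX)')
    , 1, 2, '') AS NameValues
FROM #YourTable Results
GROUP BY [a]

On SQL Server 2017+, SELECT [a], STRING_AGG([b], ', ') FROM #YourTable GROUP BY [a] gives the same result far more simply.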
I encourage you to check out this post for credits to the query and other possible solutions to your question.
For Postgres you can use:
select a, string_agg(b::text, ',' order by b)
from the_table
group by a;
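A quick end-to-end check of that query (a sketch; the_table matches the name used above):

create table the_table (a int, b text);
insert into the_table values (1, 'a,b,c'), (1, 'd,e,f'), (1, 'g,h');

select a, string_agg(b::text, ',' order by b)
from the_table
group by a;
-- 1 | a,b,c,d,e,f,g,h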

PostgreSQL query on text array value

I have a table where one column has an array - but stored in a text format:
mytable
id ids
-- -------
1 '[3,4]'
2 '[3,5]'
3 '[3]'
etc ...
I want to find all records that have the value 5 as an array element in the ids column.
I was trying to achieve this by using the "string to array" function and removing the [ symbols with the translate function, but couldn't find a way.
You can do this: http://www.sqlfiddle.com/#!1/5c148/12
select *
from tbl
where translate(ids, '[]','{}')::int[] && array[5];
Output:
| ID | IDS |
--------------
| 2 | [3,5] |
You can also use bool_or: http://www.sqlfiddle.com/#!1/5c148/11
with a as
(
    select id, unnest(translate(ids, '[]', '{}')::int[]) as elem
    from tbl
)
select id
from a
group by id
having bool_or(elem = 5);
To see the original elements:
with a as
(
    select id, unnest(translate(ids, '[]', '{}')::int[]) as elem
    from tbl
)
select id, '[' || array_to_string(array_agg(elem), ',') || ']' as ids
from a
group by id
having bool_or(elem = 5);
Output:
| ID | IDS |
--------------
| 2 | [3,5] |
PostgreSQL DDL is atomic; if it's not too late in your project, just restructure your stringly-typed array into a real array: http://www.sqlfiddle.com/#!1/6e18c/2
alter table tbl
add column id_array int[];
update tbl set id_array = translate(ids,'[]','{}')::int[];
alter table tbl drop column ids;
Query:
select *
from tbl
where id_array && array[5]
Output:
| ID | ID_ARRAY |
-----------------
| 2 | 3,5 |
You can also use contains operator: http://www.sqlfiddle.com/#!1/6e18c/6
select *
from tbl
where id_array #> array[5];
I prefer the && syntax though; it directly connotes intersection. It reflects that you are detecting whether there's an intersection between two sets (an array being treated as a set here).
http://www.postgresql.org/docs/8.2/static/functions-array.html
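A quick illustration of the two operators (a runnable sketch):

select array[3,5] && array[5] as has_overlap,  -- true: the arrays share an element
       array[3,5] @> array[5] as does_contain; -- true: the left array contains the right one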
If you store the string representation of your arrays slightly differently, you can cast to array of integer directly:
INSERT INTO mytable
VALUES
(1, '{3,4}')
,(2, '{3,5}')
,(3, '{3}');
SELECT id, ids::int[]
FROM mytable;
Else, you have to put in one more step:
SELECT (translate(ids, '[]','{}'))::int[]
FROM mytable
I would consider making the column an array type to begin with.
Either way, you can find your row like this:
SELECT id, ids
FROM (
    SELECT id, ids, unnest(ids::int[]) AS elem
    FROM mytable
) x
WHERE elem = 5