(SQLite) Selecting # separated data as multiple rows - sql

I have a table wihch contains foreign keys concactenated (separator #) in a field. I want to transform the data to one row per FK so that I can do a join on the data.
My table looks like this:
ARCHE
id_a | str_ids
str_ids field contains concactenate FK as follows: #id1#id2#id4#id7#
(There is a different number of agregated id's for each row)
I am not really familiar with SQLite, and I have trouble finding the equivalent. I understood I have to do this "with recursive" but it seems I can't get the hang of this.
The oracle equivalent of what I am looking for is as follows:
select
id_a
,trim(regexp_substr(str_ids, '[^#]+', 1, LEVEL)) as id_b
from arche
connect by trim(regexp_substr(str_ids, '[^#]+', 1, LEVEL)) IS NOT NULL

In SQLite you can use a recursive common table expression to solve this. The recursive CTE selects from the original table and splits the strings into subparts, which the main query selects from.
WITH RECURSIVE cte(id, val, etc) AS(
SELECT id_a, '', str_ids FROM arche
UNION ALL
SELECT
id
, SUBSTR(etc, 0, INSTR(etc, '#'))
, SUBSTR(etc, INSTR(etc, '#')+1)
FROM cte
WHERE etc <> ''
)
SELECT id AS id_a, REPLACE(val, 'id', '') AS id_b
FROM cte
WHERE val <> ''
ORDER BY id, val
Here is an example :
Schema (SQLite v3.26)
Query #1
WITH RECURSIVE cte(id, val, etc) AS(
SELECT 1, '', '#id1#id2#id4#id7#'
UNION ALL
SELECT
id
, SUBSTR(etc, 0, INSTR(etc, '#'))
, SUBSTR(etc, INSTR(etc, '#')+1)
FROM cte
WHERE etc <> ''
)
SELECT id AS id_a, val AS id_b
FROM cte
WHERE val <> ''
ORDER BY id, val;
| id_a | id_b |
| ---- | ---- |
| 1 | id1 |
| 1 | id2 |
| 1 | id4 |
| 1 | id7 |
View on DB Fiddle
NB2 :
REGEXP_REPLACE does not exists in SQLite, I replaced it with REPLACE
you need a # ad the end of string for this to work (having two #` is ok, too)
this is not a very performant approach ; if you have lots of rows to process, this might not scale well.

Related

How to use REGEXP_SUBSTR properly?

Currently in my select statement I have id and value. The value is json which looks like this:
{"layerId":"nameOfLayer","layerParams":{some unnecessary data}
I would like to have in my select id and nameOfLayer so the output would be for example:
1, layerName
2, layerName2
etc.
The json looks always the same so the layerID is the first.
Could you tell me how can I use REGEXP_SUBSTR properly in my select query which looks like this now?
select
id,
value
from
...
where
table1.id = table2.bookmark_id
and ...;
In Oracle 11g, you can extract the layerId using the following regular expression, where js is the name of your JSON column:
regexp_replace(js, '^.*"layerId":"([^"]+).*$', '\1')
This basically extracts the string between double quotes after "layerId":.
In more recent versions, you would add a check constraint on the table to ensure that the document is valid JSON, and then use the dot notation to access the object attribute as follows:
create table mytable (
id int primary key,
js varchar2(200),
constraint ensure_js_is_json check (js is json)
);
insert into mytable values (1, '{"layerId":"nameOfLayer","layerParams":{} }');
select id, t.js.layerId from mytable t;
Demo on DB Fiddle:
ID | LAYERID
-: | :----------
1 | nameOfLayer
Don't use regular expressions; use a JSON_TABLE or JSON_VALUE to parse JSON:
Oracle 18c Setup:
CREATE TABLE test_data (
id INTEGER,
value VARCHAR2(4000)
);
INSERT INTO test_data ( id, value )
SELECT 1, '{"layerId":"nameOfLayer","layerParams":{"some":"unnecessary data"}}' FROM DUAL UNION ALL
SELECT 2, '{"layerParams":{"layerId":"NOT THIS ONE!"},"layerId":"nameOfLayer"}' FROM DUAL UNION ALL
SELECT 3, '{"layerId":"Name with \"Quotes\"","layerParams":{"layerId":"NOT THIS ONE!"}}' FROM DUAL;
Query 1:
SELECT t.id,
j.layerId
FROM test_data t
CROSS JOIN
JSON_TABLE(
t.value,
'$'
COLUMNS (
layerId VARCHAR2(50) PATH '$.layerId'
)
) j
Query 2:
If you only want a single value you could, alternatively, use JSON_VALUE:
SELECT id,
JSON_VALUE( value, '$.layerId' ) AS layerId
FROM test_data
Output:
Both output:
ID | LAYERID
-: | :-----------------
1 | nameOfLayer
2 | nameOfLayer
3 | Name with "Quotes"
Query 3:
You can try regular expressions but they do not always work as expected:
SELECT id,
REPLACE(
REGEXP_SUBSTR( value, '[{,]"layerId":"((\\"|[^"])*)"', 1, 1, NULL, 1 ),
'\"',
'"'
) AS layerID
FROM test_data
Output:
ID | LAYERID
-: | :-----------------
1 | nameOfLayer
2 | NOT THIS ONE!
3 | Name with "Quotes"
So if you can guarantee that no-one is going to put data into the database where the JSON is in a different order then this may work; however the JSON specification allows key-value pairs to be in any order so regular expressions are not a general solution that will parse every JSON string. You should be using a proper JSON parser and there are 3rd party solutions available for Oracle 11g or you can upgrade to Oracle 12c where there is a native solution.
db<>fiddle here
I think you can use regexp_substr like this:
regexp_substr(str, '[^"]+',1,2) as layer_id,
regexp_substr(str, '[^"]+',1,4) as layername
Db<>fiddle demo
Cheers!!

SQL grouping by distinct values in a multi-value string column

(I want to perform a group-by based on the distinct values in a string column that has multiple values
The said column has a list of strings in a standard format separated by commas. The potential values are only a,b,c,d.
For example the column collection (type: String) contains:
Row 1: ["a","b"]
Row 2: ["b","c"]
Row 3: ["b","c","a"]
Row 4: ["d"]`
The expected output is a count of unique values:
collection | count
a | 2
b | 3
c | 2
d | 1
For all the below i used this table:
create table tmp (
id INT auto_increment,
test VARCHAR(255),
PRIMARY KEY (id)
);
insert into tmp (test) values
("a,b"),
("b,c"),
("b,c,a"),
("d")
;
If the possible values are only a,b,c,d you can try one of this:
Tke note that this will only works if you have not so similar values like test and test_new, because then the test would be joined also with all test_new rows and the count would not match
select collection, COUNT(*) as count from tmp JOIN (
select CONCAT("%", tb.collection, "%") as like_collection, collection from (
select "a" COLLATE utf8_general_ci as collection
union select "b" COLLATE utf8_general_ci as collection
union select "c" COLLATE utf8_general_ci as collection
union select "d" COLLATE utf8_general_ci as collection
) tb
) tb1
ON tmp.test LIKE tb1.like_collection
GROUP BY tb1.collection;
Which will give you the result you want
collection | count
a | 2
b | 3
c | 2
d | 1
or you can try this one
SELECT
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%a%') as a_count,
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%b%') as b_count,
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%c%') as c_count,
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%d%') as d_count
;
The result would be like this
a_count | b_count | c_count | d_count
2 | 3 | 2 | 1
What you need to do is to first explode the collection column into separate rows (like a flatMap operation). In redshift the only way to generate new rows is to JOIN - so let's CROSS JOIN your input table with a static table having consecutive numbers, and take only ones having id less or equal to number of elements in the collection. Then we'll use split_part function to read the item at correct index. Once we have the exploaded table, we'll do a simple GROUP BY.
If your items are stored as JSON array strings ('["a", "b", "c"]') then you can use JSON_ARRAY_LENGTH and JSON_EXTRACT_ARRAY_ELEMENT_TEXT instead of REGEXP_COUNT and SPLIT_PART respectively.
with
index as (
select 1 as i
union all select 2
union all select 3
union all select 4 -- could be substituted with 'select row_number() over () as i from arbitrary_table limit 4'
),
agg as (
select 'a,b' as collection
union all select 'b,c'
union all select 'b,c,a'
union all select 'd'
)
select
split_part(collection, ',', i) as item,
count(*)
from index,agg
where regexp_count(agg.collection, ',') + 1 >= index.i -- only get rows where number of items matches
group by 1

Oracle Sql. Recursive select with join

I have a table with applications and table with messages. Applications has hierarchical structure, e.g each application has a parent. And I have a table with messages. Each message has a key and an application id. I want to be able to select message by it's key. If it's found for current application then return it, if no then try to find it with parent id.
Apps table:
id | name | parentId
--------------------
1 |parent| NULL
2 |child | 1
Msg table:
key | text | app
-------------------------------------------------
overriden.system.msg | some text | 1
parent.msg | parent txt | 1
overriden.system.msg | overriden text | 2
So if I'm in the child app(2) on key overriden.system.msg I want to get overriden text. On key parent.msg I want to get parent txt. I know that it must be done with cte, but I have very little expirience with sql and cte is mind-blowing for me right now. Could you please provide working query for this situation? Or maybe you have a better vision how to achieve such functionality without recursion?
The above should work fine:
with app_tree (id, app, lvl) as
( select id , parentID , 0
from Apps
where id = 2
union all
select t.app, a.parentID, t.lvl + 1 from
Apps a
join app_tree t
on t.app = a.id ),
all_msg as (
select key, text,
row_number() over ( partition by key order by lvl) overrideLevel
from app_tree a
join msg m
on m.app = a.id )
select * from all_msg where
overrideLevel =1
It returns:
KEY TEXT
-------------------------------------
overriden.system.msg overriden text
parent.msg parent txt
First it gets the list of all applications for specified id using recursive query with parentid. After that it gets a list of all functions for all applications and generates increasing numbers for the same keys basing on the level. At the end it just takes the first possible level and ignore all parent's levels for the same key.
Mb it will help you:
with app (id, name, parent_id) as
(select 1, 'parent', null from dual union all
select 2, 'child', 1 from dual)
,msg (key, text, app) as
(select 'overriden.system.msg', 'some text', 1 from dual union all
select 'parent.msg', 'parent txt', 1 from dual union all
select 'overriden.system.msg', 'overriden text', 2 from dual)
select key
,max(text) keep (dense_rank last order by nvl2(text,level,0)) msg
from
(select *
from app a
join msg m on (a.id = m.app)) v
start with v.parent_id is null
connect by prior v.id = v.parent_id and prior v.key = v.key
group by key

PostgreSQL query on text array value

I have a table where one column has an array - but stored in a text format:
mytable
id ids
-- -------
1 '[3,4]'
2 '[3,5]'
3 '[3]'
etc ...
I want to find all records that have the value 5 as an array element in the ids column.
I was trying to achieve this by using the "string to array" function and removing the [ symbols with the translate function, but couldn't find a way.
You can do this: http://www.sqlfiddle.com/#!1/5c148/12
select *
from tbl
where translate(ids, '[]','{}')::int[] && array[5];
Output:
| ID | IDS |
--------------
| 2 | [3,5] |
You can also use bool_or: http://www.sqlfiddle.com/#!1/5c148/11
with a as
(
select id, unnest(translate(ids, '[]','{}')::int[]) as elem
from tbl
)
select id
from a
group by id
having bool_or(elem = 5);
To see the original elements:
with a as
(
select id, unnest(translate(ids, '[]','{}')::int[]) as elem
from tbl
)
select id, '[' || array_to_string(array_agg(elem), ',') || ']' as ids
from a
group by id
having bool_or(elem = 5);
Output:
| ID | IDS |
--------------
| 2 | [3,5] |
Postgresql DDL is atomic, if it's not late yet in your project, just structure your stringly-typed array to a real array: http://www.sqlfiddle.com/#!1/6e18c/2
alter table tbl
add column id_array int[];
update tbl set id_array = translate(ids,'[]','{}')::int[];
alter table tbl drop column ids;
Query:
select *
from tbl
where id_array && array[5]
Output:
| ID | ID_ARRAY |
-----------------
| 2 | 3,5 |
You can also use contains operator: http://www.sqlfiddle.com/#!1/6e18c/6
select *
from tbl
where id_array #> array[5];
I prefer the && syntax though, it directly connotes intersection. It reflects that you are detecting if there's an intersection between two sets(array is a set)
http://www.postgresql.org/docs/8.2/static/functions-array.html
If you store the string representation of your arrays slightly differently, you can cast to array of integer directly:
INSERT INTO mytable
VALUES
(1, '{3,4}')
,(2, '{3,5}')
,(3, '{3}');
SELECT id, ids::int[]
FROM mytable;
Else, you have to put in one more step:
SELECT (translate(ids, '[]','{}'))::int[]
FROM mytable
I would consider making the column an array type to begin with.
Either way, you can find your row like this:
SELECT id, ids
FROM (
SELECT id, ids, unnest(ids::int[]) AS elem
FROM mytable
) x
WHERE elem = 5

How to select parent ids

I have table with such structure.
ElementId | ParentId
-------------------
1 | NULL
2 | 1
3 | 2
4 | 3
Let say current element has Id 4. I want to select all parent ids.
Result should be: 3, 2, 1
How I can do it? DB is MSSQL
You can use recursive queries for this: http://msdn.microsoft.com/en-us/library/aa175801(SQL.80).aspx
You can use it like this:
with Hierachy(ElementID, ParentID, Level) as (
select ElementID, ParentID, 0 as Level
from table t
where t.ElementID = X -- insert parameter here
union all
select t.ElementID, t.ParentID, th.Level + 1
from table t
inner join Hierachy th
on t.ParentId = th.ElementID
)
select ElementID, ParentID
from Hierachy
where Level > 0
I think it might be easiest to do the following:
while parent != NULL
get parent of current element
I can't think of any way of doing this in plain SQL that wouldn't cause issues on larger databases.
if you want pure sql try:
select ParentId from myTable Desc
that would work in mysql... you might need to modify the Desc (sort in descending order) part