select linked rows in the same table - sql

I'm creating a branching dialog game and used a dialog tool that outputs JSON with a link and a link_path to connect dialogs together. I've parsed this structure and inserted it into PostgreSQL.
I want to query a subset of rows, let's say starting with row 1, and follow the link_path until the link_path is null. Successive rows may be out of order.
For example, in the table below:
starting with row 1 (link_path = b), I find the row whose link = b;
this gives me row 3 (link_path = c), so I find the row whose link = c;
this gives me row 4, whose link_path is null, so we return this set: [row 1, row 3, row 4]
link | link_path | info
:--- | :-------- | :----
a | b | asdjh
w | y | akhaq
b | c | uiqwd
c | null | isado
y | z | qwiuu
z | null | nzabo
In PostgreSQL, how can I select rows like this without creating a loop of queries? My goal is performance.

You can use a recursive query:
with recursive cte as (
select t.* from mytable t where link = 'a'
union all
select t.*
from cte c
inner join mytable t on t.link = c.link_path
)
select * from cte
Result (DB Fiddle demo):
link | link_path | info
:--- | :-------- | :----
a | b | asdjh
b | c | uiqwd
c | null | isado
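The recursive query above can be sketched end to end under one assumption: it runs against an in-memory SQLite database from Python instead of PostgreSQL. SQLite's WITH RECURSIVE handles this query the same way. The helper name `follow_links` and the extra `depth` column are hypothetical additions. The `depth` column keeps the output in visiting order, because without an ORDER BY the row order of a CTE result is not guaranteed.

```python
import sqlite3

# Rows from the question's example; None stands for a NULL link_path.
ROWS = [
    ("a", "b", "asdjh"),
    ("w", "y", "akhaq"),
    ("b", "c", "uiqwd"),
    ("c", None, "isado"),
    ("y", "z", "qwiuu"),
    ("z", None, "nzabo"),
]

def follow_links(start_link):
    """Hypothetical helper: return the rows reached from start_link, in visiting order."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (link TEXT, link_path TEXT, info TEXT)")
    con.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)
    rows = con.execute(
        """
        WITH RECURSIVE cte(link, link_path, info, depth) AS (
            SELECT link, link_path, info, 1 FROM mytable WHERE link = ?
            UNION ALL
            -- Follow the previous row's link_path; a NULL link_path matches
            -- nothing, so the recursion stops there.
            SELECT t.link, t.link_path, t.info, c.depth + 1
            FROM cte c
            JOIN mytable t ON t.link = c.link_path
        )
        SELECT link, link_path, info FROM cte ORDER BY depth
        """,
        (start_link,),
    ).fetchall()
    con.close()
    return rows

print(follow_links("a"))
```

If the links could ever form a cycle, this recursion would not terminate. PostgreSQL 14 and later provide a CYCLE clause for recursive queries that guards against this.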

Related

Is it possible to map values onto a table given corresponding row and column indices in SQL?

I have a SQL table in the form of:
| value | row_loc | column_loc |
|-------|---------|------------|
| a | 0 | 1 |
| b | 1 | 1 |
| c | 1 | 0 |
| d | 0 | 0 |
I would like to find a way to map it onto a table/grid, given the indices, using SQL. Something like:
| d | a |
| c | b |
(The context is that I would like to create a colour map with colours corresponding to the values a, b, c, d in the specified locations.)
I would be able to do this iteratively in Python, but I cannot figure out how to do it in SQL, or whether it is even possible! Any help or guidance on this problem would be greatly appreciated!
EDIT: a, b, c, d are examples of numeric values. In practice they can't be selected by name, so I'm relying on selecting them by location. Also worth noting: the number of rows and columns will always be the same. The value column is not the primary key of this table, so it is not necessarily unique; it is just a continuous value.
Yes, it is possible, provided the number of columns is fixed, because a SQL query must declare its output columns in advance. The number of rows in the result set equals the number of distinct row_loc values, so we group by row_loc. Then we pick each cell's value with a simple CASE expression.
with t (value, row_loc, column_loc) as (
select 'a', 0, 1 from dual union all
select 'b', 1, 1 from dual union all
select 'c', 1, 0 from dual union all
select 'd', 0, 0 from dual
)
select max(case column_loc when 0 then value else null end) as column0
, max(case column_loc when 1 then value else null end) as column1
from t
group by row_loc
order by row_loc
I tested it on Oracle. I wasn't sure what to do if multiple values fall on the same coordinate, so I chose MAX. Other vendors offer alternatives such as the aggregate FILTER (WHERE ...) clause, and Oracle also has a PIVOT clause.
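Here is a minimal sketch of the same MAX(CASE ...) pivot, run against SQLite from Python instead of Oracle (no `dual` table is needed there). The helper `to_grid` and its `n_cols` parameter are hypothetical. The helper builds one CASE expression per column, which shows how the fixed column count ends up inside the SQL text.

```python
import sqlite3

# (value, row_loc, column_loc) rows from the question.
CELLS = [("a", 0, 1), ("b", 1, 1), ("c", 1, 0), ("d", 0, 0)]

def to_grid(cells, n_cols):
    """Hypothetical helper: pivot (value, row_loc, column_loc) rows into a grid."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (value TEXT, row_loc INTEGER, column_loc INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", cells)
    # One max(CASE ...) per output column. The column list must be known
    # before the query runs, so it is generated from n_cols here
    # (i comes from range(), so only integers are interpolated).
    cols = ", ".join(
        f"max(CASE column_loc WHEN {i} THEN value END) AS column{i}"
        for i in range(n_cols)
    )
    grid = con.execute(
        f"SELECT {cols} FROM t GROUP BY row_loc ORDER BY row_loc"
    ).fetchall()
    con.close()
    return grid

for row in to_grid(CELLS, 2):
    print(row)
```

With two columns this prints ('d', 'a') and then ('c', 'b'), matching the grid in the question. A missing cell comes back as None (NULL).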

Hive: merge or tag multiple rows based on neighboring rows

I have the following table and want to merge multiple rows based on neighboring rows.
INPUT and EXPECTED OUTPUT: (tables were images, not reproduced)
The logic is that "abc" is connected to "abcd" in the first row, "abcd" is connected to "abcde" in the second row, and so on. So "abc", "abcd", "abcde", "abcdef" are all connected and are put in one array. The same applies to the remaining rows. The number of connected neighboring rows is arbitrary.
The question is how to do that using Hive script without any UDF. Do I have to use Spark for this type of operation? Thanks very much.
One idea I had is to tag rows first as
How to do that using Hive script only?
This is an example of a CONNECT BY query, which is not supported in Hive or Spark, unlike DB2 or Oracle, et al.
You can simulate such a query in Spark with Scala, but it is far from handy. If you already have the tag, the question becomes less relevant, imo.
Here is a workaround using Hive script to get the intermediate table.
drop table if exists step1;
create table step1 STORED as orc as
with src as
(
select split(u.tmp,",")[0] as node_1, split(u.tmp,",")[1] as node_2
from
(select stack (7,
"abc,abcd",
"abcd,abcde",
"abcde,abcdef",
"bcd,bcde",
"bcde,bcdef",
"cdef,cdefg",
"def,defg"
) as tmp
) u
)
select node_1, node_2, if(node_2 = lead(node_1, 1) over (order by node_1), 1, 0) as tag, row_number() OVER (order by node_1) as row_num
from src;
drop table if exists step2;
create table step2 STORED as orc as
SELECT tag, row_number() over (ORDER BY tag) as row_num
FROM (
SELECT cast(v.tag as int) as tag
FROM (
SELECT
split(regexp_replace(repeat(concat(cast(key as string), ","), end_idx-start_idx), ",$",""), ",") as tags -- repeat each group key once per row in that group
FROM (
SELECT COALESCE(lag(row_num, 1) over(ORDER BY row_num), 0) as start_idx, row_num as end_idx, row_number() over (ORDER BY row_num) as key
FROM step1 where tag=0
) a
) b
LATERAL VIEW explode(tags) v as tag
) c ;
drop table if exists step3;
create table step3 STORED as orc as
SELECT
a.node_1, a.node_2, b.tag
FROM step1 a
JOIN step2 b
ON a.row_num=b.row_num;
The final table looks like this:
select * from step3;
+---------------+---------------+------------+
| step3.node_1 | step3.node_2 | step3.tag |
+---------------+---------------+------------+
| abc | abcd | 1 |
| abcd | abcde | 1 |
| abcde | abcdef | 1 |
| bcd | bcde | 2 |
| bcde | bcdef | 2 |
| cdef | cdefg | 3 |
| def | defg | 4 |
+---------------+---------------+------------+
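The third column can be used to collect the node pairs into groups. Note that the tagging relies on linked pairs being adjacent when sorted by node_1, which holds for this sample data but not in general.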
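In a database that does support recursive queries, the grouping can follow the links directly instead of relying on sort order. The sketch below makes one substitution and states it up front: it uses a recursive CTE in SQLite, driven from Python, and not Hive, so it illustrates the CONNECT BY-style approach rather than a Hive solution. The helper `chain_groups` is hypothetical. It assumes the pairs form simple chains with no branches and no cycles.

```python
import sqlite3

# Node pairs from the stack() call in the Hive answer above.
PAIRS = [
    ("abc", "abcd"), ("abcd", "abcde"), ("abcde", "abcdef"),
    ("bcd", "bcde"), ("bcde", "bcdef"),
    ("cdef", "cdefg"),
    ("def", "defg"),
]

def chain_groups(pairs):
    """Hypothetical helper: map each chain's first node to the nodes in that chain."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE edges (node_1 TEXT, node_2 TEXT)")
    con.executemany("INSERT INTO edges VALUES (?, ?)", pairs)
    rows = con.execute(
        """
        WITH RECURSIVE chain(root, node_1, node_2, depth) AS (
            -- Start at edges whose node_1 is not the node_2 of any edge.
            SELECT e.node_1, e.node_1, e.node_2, 1
            FROM edges e
            WHERE NOT EXISTS (SELECT 1 FROM edges p WHERE p.node_2 = e.node_1)
            UNION ALL
            -- Continue with the edge whose node_1 is the current node_2.
            SELECT c.root, e.node_1, e.node_2, c.depth + 1
            FROM chain c
            JOIN edges e ON e.node_1 = c.node_2
        )
        SELECT root, node_1, node_2 FROM chain ORDER BY root, depth
        """
    ).fetchall()
    con.close()
    groups = {}
    for root, node_1, node_2 in rows:
        # The first edge of a chain contributes both nodes, later edges only node_2.
        groups.setdefault(root, [node_1]).append(node_2)
    return groups

print(chain_groups(PAIRS))
```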

SQL Order random rows based on 2 columns

How to sort this table in Oracle9:
START | END | VALUE
A | F | 1
D | H | 9
F | C | 8
C | D | 12
To make it look like this:
START | END | VALUE
A | F | 1
F | C | 8
C | D | 12
D | H | 9
The goal is for each row to start with the END of the previous row.
This cannot be done with the ORDER BY clause alone: it would first have to find the record without a predecessor, then find the next record by comparing the end and start columns of the two records, and so on. This is an iterative process, for which you need a recursive query.
That recursive query would find the first record, then the next and so on, giving them sequence numbers. Then you'd use the result and order by those generated numbers.
Here is how to do it in standard SQL with a recursive WITH clause. However, Oracle supports this only from 11g Release 2 onwards. In Oracle 9 you'll have to use CONNECT BY, which I am not familiar with. Hopefully you or someone else can convert the query:
with chain(startkey, endkey, value, pos) as
(
select startkey, endkey, value, 1 as pos
from mytable
where not exists (select * from mytable prev where prev.endkey = mytable.startkey)
union all
select mytable.startkey, mytable.endkey, mytable.value, chain.pos + 1 as pos
from chain
join mytable on mytable.startkey = chain.endkey
)
select startkey, endkey, value
from chain
order by pos;
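The chain query above can be tried as a minimal sketch against SQLite from Python instead of Oracle, using the question's rows. The helper name `order_chain` is hypothetical. The SQL is the answer's query, with SELECT * changed to SELECT 1 in the NOT EXISTS subquery.

```python
import sqlite3

# Rows from the question (START/END renamed startkey/endkey, as in the answer).
ROWS = [("A", "F", 1), ("D", "H", 9), ("F", "C", 8), ("C", "D", 12)]

def order_chain(rows):
    """Hypothetical helper: return rows ordered so each starts where the previous ended."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (startkey TEXT, endkey TEXT, value INTEGER)")
    con.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    result = con.execute(
        """
        WITH RECURSIVE chain(startkey, endkey, value, pos) AS (
            -- Anchor: the row that no other row points to.
            SELECT startkey, endkey, value, 1
            FROM mytable
            WHERE NOT EXISTS (SELECT 1 FROM mytable prev
                              WHERE prev.endkey = mytable.startkey)
            UNION ALL
            -- Step: the row whose startkey equals the previous endkey.
            SELECT mytable.startkey, mytable.endkey, mytable.value, chain.pos + 1
            FROM chain
            JOIN mytable ON mytable.startkey = chain.endkey
        )
        SELECT startkey, endkey, value FROM chain ORDER BY pos
        """
    ).fetchall()
    con.close()
    return result

for row in order_chain(ROWS):
    print(row)
```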
UPDATE: As you say the data is cyclic, you'd have to change the above query to start with an arbitrarily chosen row and stop once every row has been visited:
with chain(startkey, endkey, value, pos) as
(
select startkey, endkey, value, 1 as pos
from mytable
where rownum = 1
union all
select mytable.startkey, mytable.endkey, mytable.value, chain.pos + 1 as pos
from chain
join mytable on mytable.startkey = chain.endkey
)
cycle startkey set is_cycle to '1' default '0'
select startkey, endkey, value
from chain
where is_cycle = '0'
order by pos;
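SQLite has no CYCLE clause, so the following sketch swaps in a different technique: it stops the recursion after visiting as many rows as the table contains. It assumes the data forms a single cycle that covers every row. A shorter cycle would make rows repeat until the count is reached. The cyclic data below is hypothetical (D now points back to A), and so is the helper name `order_cycle`. As with rownum = 1 in the Oracle version, the starting row is simply the first one stored.

```python
import sqlite3

# Hypothetical cyclic variant of the question's data: D now points back to A.
CYCLIC_ROWS = [("A", "F", 1), ("D", "A", 9), ("F", "C", 8), ("C", "D", 12)]

def order_cycle(rows):
    """Hypothetical helper: walk a cycle once, starting from the first stored row."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (startkey TEXT, endkey TEXT, value INTEGER)")
    con.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    result = con.execute(
        """
        WITH RECURSIVE chain(startkey, endkey, value, pos) AS (
            -- Arbitrary starting row: the one with the smallest rowid.
            SELECT startkey, endkey, value, 1
            FROM mytable
            WHERE rowid = (SELECT min(rowid) FROM mytable)
            UNION ALL
            -- Stop once as many rows as the table holds have been produced.
            SELECT mytable.startkey, mytable.endkey, mytable.value, chain.pos + 1
            FROM chain
            JOIN mytable ON mytable.startkey = chain.endkey
            WHERE chain.pos < (SELECT count(*) FROM mytable)
        )
        SELECT startkey, endkey, value FROM chain ORDER BY pos
        """
    ).fetchall()
    con.close()
    return result

for row in order_cycle(CYCLIC_ROWS):
    print(row)
```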

How do I select only the keys that don't have '1' in column c?

The title is confusing, I know; I'm just not sure how to word this. Let me describe it with a table:
| key | column b | column c |
|-----|----------|----------|
| a | 13 | 2 |
| a | 14 | 2 |
| a | 15 | 1 |
| b | 16 | 2 |
| b | 17 | 2 |
I'd like to select all keys where column c doesn't equal 1, so the select will return only key 'b'.
To clarify, my result set should not contain keys that have a row where column c is set to 1. Therefore I'd like an SQL query that returns the keys satisfying that condition.
To make my question as clear as possible: from the table above, I want an SQL statement to return a result set containing [{b}], because key 'a' has at least one row where column c equals 1, whereas key 'b' has no rows with 1 in column c.
SELECT t.[Key]
FROM TableName t
WHERE NOT EXISTS (SELECT 1
FROM TableName
WHERE t.[key] = [key]
AND ColumnC = 1)
GROUP BY t.[Key]
SELECT KEY
FROM WhateverYourTableNameIs
WHERE c <> '1'
I would do this using group by and aggregation:
select [key]
from TableName t
group by [key]
having sum(case when c = 1 then 1 else 0 end) = 0;
The having clause counts the number of rows that have c = 1. The = 0 says that there are no such rows for a given key.
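The following sketch compares the approaches from this thread against SQLite from Python, which is an assumption: the answers use SQL Server-style [key] brackets, and here the identifier is double-quoted instead. The helper name `keys_without_one` is hypothetical. Besides the NOT EXISTS and HAVING queries, it runs a plain row-level `c <> 1` filter. That filter still returns 'a', because key 'a' also has rows where c is 2.

```python
import sqlite3

# Rows from the question: ("key", column b, column c).
ROWS = [("a", 13, 2), ("a", 14, 2), ("a", 15, 1), ("b", 16, 2), ("b", 17, 2)]

def keys_without_one(rows):
    """Hypothetical helper: return keys found by three different queries."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE TableName ("key" TEXT, b INTEGER, c INTEGER)')
    con.executemany("INSERT INTO TableName VALUES (?, ?, ?)", rows)
    # Aggregate approach: count rows with c = 1 per key and keep keys with none.
    having = con.execute(
        """SELECT "key" FROM TableName
           GROUP BY "key"
           HAVING sum(CASE WHEN c = 1 THEN 1 ELSE 0 END) = 0
           ORDER BY "key" """
    ).fetchall()
    # Anti-join approach: keep keys for which no row with c = 1 exists.
    not_exists = con.execute(
        """SELECT DISTINCT t."key" FROM TableName t
           WHERE NOT EXISTS (SELECT 1 FROM TableName u
                             WHERE u."key" = t."key" AND u.c = 1)
           ORDER BY t."key" """
    ).fetchall()
    # Row-level filter: any key with at least one row where c <> 1 survives.
    row_filter = con.execute(
        'SELECT DISTINCT "key" FROM TableName WHERE c <> 1 ORDER BY "key"'
    ).fetchall()
    con.close()
    return ([r[0] for r in having], [r[0] for r in not_exists],
            [r[0] for r in row_filter])

print(keys_without_one(ROWS))
```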
Elaboration on the WHERE c <> '1' answer, based on other comments:
You asked for ALL keys where column c doesn't equal 1. That is exactly what the query I suggested will give you. The other part of your question, "so the select will return only key 'b'", is ambiguous. The question as asked will give you results for both keys a and b. There is nothing in your question to limit the result set further. You either need an additional condition in your WHERE clause, or your question is inherently unanswerable.

SQLite - select the newest row with a certain field value

I have an SQLite question which essentially boils down to the following problem.
id | key | data
1 | A | x
2 | A | x
3 | B | x
4 | B | x
5 | A | x
6 | A | x
New data is appended to the end of the table with an auto-incremented id.
Now, I want to create a query which returns the latest row for each key, like this:
id | key | data
4 | B | x
6 | A | x
I've tried some different queries but I have been unsuccessful. How do you select only the latest rows for each "key" value in the table?
Use this SQL query:
select * from tbl where id in (select max(id) from tbl group by key);
You could split the task into two steps.
First, retrieve the id of the latest row for each key (A and B).
Since you then have the ids for both the A and B keys, you can easily select the latest rows:
SELECT *
FROM mytable
JOIN
( SELECT MAX(id) AS maxid
FROM mytable
GROUP BY "key"
) AS grp
ON grp.maxid = mytable.id
Side note: it's best not to use reserved words like key as identifiers (for tables, fields, etc.).
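Without nested SELECTs or JOINs, you can rely on SQLite's bare-column behavior: when a query contains a single max() aggregate, the other selected columns are taken from the row holding that maximum (SQLite 3.7.11 and later). This works when the column that determines "newest" is an ever-increasing id, e.g. an autoincrement primary key:
SELECT max(id) AS id, "key", data FROM tbl GROUP BY "key";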
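Both SQLite approaches from this thread can be checked with a minimal sketch in Python's built-in sqlite3 module: the max(id) subquery and the bare-column max() form. The table name `tbl` follows the first answer. The helper name `latest_per_key` and the ORDER BY clauses, added to make the output order deterministic, are hypothetical additions.

```python
import sqlite3

# Rows from the question: (id, key, data).
ROWS = [(1, "A", "x"), (2, "A", "x"), (3, "B", "x"),
        (4, "B", "x"), (5, "A", "x"), (6, "A", "x")]

def latest_per_key(rows):
    """Hypothetical helper: return the latest row per key using two queries."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE tbl (id INTEGER PRIMARY KEY AUTOINCREMENT, "key" TEXT, data TEXT)')
    con.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)
    # Subquery approach: find the max id per key, then fetch those rows.
    via_subquery = con.execute(
        'SELECT * FROM tbl WHERE id IN (SELECT max(id) FROM tbl GROUP BY "key") ORDER BY id'
    ).fetchall()
    # Bare-column approach: key and data come from the row holding max(id).
    via_bare_columns = con.execute(
        'SELECT max(id) AS id, "key", data FROM tbl GROUP BY "key" ORDER BY 1'
    ).fetchall()
    con.close()
    return via_subquery, via_bare_columns

print(latest_per_key(ROWS))
```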