Convert a string of ids into a string of equivalent names - sql

I have this table (mock data) :
| ID | Name | Location |
| -- | -- | -- |
| 1 | Main | / |
| 2 | Photos | /1/3 |
| 3 | Media | /1 |
| 4 | Charts | / |
| 5 | Expenses | /4 |
The column Location is a string with ids that refer to that very table.
I'm looking for a query to convert ids into names, something like this :
| ID | Name | Location | FullName |
| -- | -- | -- | -- |
| 1 | Main | / | / |
| 2 | Photos | /1/3 | /Main/Media |
| 3 | Media | /1 | /Main |
| 4 | Charts | / | / |
| 5 | Expenses | /4 | /Charts |
This is some mock data, in my real table I have more complex locations.
I'm not the owner of the table so I can't modify the schema. I can only read it.
Does anyone have an idea?
Thank you very much.
I've been exploring the regexp_split_to_table function:
WITH flat_data AS (
    SELECT DISTINCT
        col.id col_id,
        col.name col_name,
        col.location col_full_loc,
        regexp_split_to_table(col.location, '/') as loc_item
    FROM collection col),
clean_data AS (
    SELECT
        col_id,
        col_name,
        col_full_loc,
        CASE WHEN loc_item = '' THEN null ELSE loc_item::integer END loc_item,
        ROW_NUMBER() over (partition by col_id, loc_item)
    FROM flat_data
) select * from clean_data
So I've managed to have something like this :
| ID | Name | Location | AfterFunction |
| -- | -- | -- | -- |
| 1 | Main | / | |
| 2 | Photos | /1/3 | |
| 2 | Photos | /1/3 | 3 |
| 2 | Photos | /1/3 | |
| 2 | Photos | /1/3 | 1 |
| 3 | Media | /1 | |
| 3 | Media | /1 | 1 |
| 4 | Charts | / | |
| 5 | Expenses | /4 | |
| 5 | Expenses | /4 | 4 |
But at some point I lose the order of the sublocation items.

Outline of the solution
ignore the first slash in the location to simplify the split and mapping (add it back at the end)
use regexp_split_to_table along with WITH ORDINALITY to preserve the order
outer join each location part to the original table (cast the id to text if it is an int)
string_agg the location names into one string, ordered by the ordinality column, and add the fixed slash prefix.
Query
with t2 as (
    select * from t,
    regexp_split_to_table(substr(t.location,2), '/') WITH ORDINALITY x(part, rn)
),
t3 as (
    select t2.*, t.name part_name from t2
    left outer join t on t2.part = t.id::text)
select
    t3.id, t3.name, t3.location,
    '/'||coalesce(string_agg(t3.part_name,'/' order by t3.rn),'') loc_name
from t3
group by 1,2,3
order by 1
gives result
id|name    |location|loc_name   |
--+--------+--------+-----------+
 1|Main    |/       |/          |
 2|Photos  |/1/3    |/Main/Media|
 3|Media   |/1      |/Main      |
 4|Charts  |/       |/          |
 5|Expenses|/4      |/Charts    |
Below are the results of the subqueries, to illustrate the steps. In T3, part_name is the name of the row that part refers to, and it is NULL where the part is empty.
-- T2
id|name    |location|part|rn|
--+--------+--------+----+--+
 1|Main    |/       |    | 1|
 2|Photos  |/1/3    |1   | 1|
 2|Photos  |/1/3    |3   | 2|
 3|Media   |/1      |1   | 1|
 4|Charts  |/       |    | 1|
 5|Expenses|/4      |4   | 1|
-- T3
id|name    |location|part|rn|part_name|
--+--------+--------+----+--+---------+
 1|Main    |/       |    | 1|         |
 2|Photos  |/1/3    |1   | 1|Main     |
 2|Photos  |/1/3    |3   | 2|Media    |
 3|Media   |/1      |1   | 1|Main     |
 4|Charts  |/       |    | 1|         |
 5|Expenses|/4      |4   | 1|Charts   |
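The four steps of the outline (strip the leading slash, split while keeping the order, look up each id, join the names again) can also be sketched outside the database. The snippet below is only an illustration in Python using the mock rows from the question; the names ROWS, NAME_BY_ID and full_name are made up for this sketch and are not part of the query above.

```python
# Mock rows copied from the question: (id, name, location).
ROWS = [
    (1, "Main", "/"),
    (2, "Photos", "/1/3"),
    (3, "Media", "/1"),
    (4, "Charts", "/"),
    (5, "Expenses", "/4"),
]

# id (as text) -> name, the counterpart of joining on t.id::text.
NAME_BY_ID = {str(row_id): name for row_id, name, _ in ROWS}


def full_name(location: str) -> str:
    """Replace every id in a location such as '/1/3' with its name."""
    # Drop the leading slash, then split; list order plays the role
    # of the ordinality column.
    parts = location[1:].split("/")
    # Parts without a match (including the empty part of '/') are skipped,
    # much as string_agg ignores the NULLs produced by the outer join.
    resolved = [NAME_BY_ID[part] for part in parts if part in NAME_BY_ID]
    return "/" + "/".join(resolved)


if __name__ == "__main__":
    for row_id, name, location in ROWS:
        print(row_id, name, location, full_name(location))
```

Like the SQL version, this silently drops ids that do not exist in the table; whether that is the desired behaviour for the real data is an open assumption.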

Related

PostgreSQL query: subtract counts within one table

I have a table in PostgreSQL and cannot work out how to build a query.
The table contains the columns nr_serii and deleting_time. I am trying to count the rows per nr_serii and subtract from that the rows that have a deleting_time.
My query:
select nr_serii , count(nr_serii ) as ilosc,count(deleting_time) as ilosc_delete
from MyTable
group by nr_serii, deleting_time
output is:
+--------------------+
| "666666";1;1 |
| "456456";1;0 |
| "333333";3;0 |
| "333333";1;1 |
| "111111";1;1 |
| "111111";3;0 |
+--------------------+
The part of table with raw data:
+--------------------------------+
| "666666";"2020-11-20 14:08:13" |
| "456456";"" |
| "333333";"" |
| "333333";"" |
| "333333";"" |
| "333333";"2020-11-20 14:02:23" |
| "111111";"" |
| "111111";"" |
| "111111";"2020-11-20 14:08:04" |
| "111111";"" |
+--------------------------------+
I need to subtract column ilosc_delete from column ilosc.
Example:
nr_serii: 333333, ilosc: 3 - 1 = 2
Expected output:
+-------------+
| "666666";-1 |
| "456456";1 |
| "333333";2 |
| "111111";2 |
| ... |
+-------------+
I think there is a very simple solution for this, but my mind is blank.
I see what you want now. You want to subtract the number of rows where deleting_time is not null from the number of rows where it is null:
select nr_serii,
count(*) filter (where deleting_time is null) - count(deleting_time) as ilosc_delete
from MyTable
group by nr_serii;
Here is a db<>fiddle.
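To check the arithmetic against the sample data, here is a small sketch that runs an equivalent query in an in-memory SQLite database from Python. Several things are assumptions of this sketch, not of the answer: SQLite stands in for PostgreSQL, the empty strings in the raw data are treated as NULL, and the FILTER clause is written as a portable SUM(CASE ...) expression. The column alias difference and the function name remaining_per_series are made up here.

```python
import sqlite3

# Raw rows from the question; None stands for the empty deleting_time
# values (assumed to be NULL in the real table).
RAW_ROWS = [
    ("666666", "2020-11-20 14:08:13"),
    ("456456", None),
    ("333333", None),
    ("333333", None),
    ("333333", None),
    ("333333", "2020-11-20 14:02:23"),
    ("111111", None),
    ("111111", None),
    ("111111", "2020-11-20 14:08:04"),
    ("111111", None),
]

# Rows without a deleting_time minus rows with one, per nr_serii.
QUERY = """
SELECT nr_serii,
       SUM(CASE WHEN deleting_time IS NULL THEN 1 ELSE 0 END)
         - COUNT(deleting_time) AS difference
FROM MyTable
GROUP BY nr_serii
"""


def remaining_per_series(rows):
    """Load the rows into a throwaway table and return {nr_serii: difference}."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (nr_serii TEXT, deleting_time TEXT)")
    conn.executemany("INSERT INTO MyTable VALUES (?, ?)", rows)
    result = dict(conn.execute(QUERY).fetchall())
    conn.close()
    return result


if __name__ == "__main__":
    print(remaining_per_series(RAW_ROWS))
```

The key point is that the query groups by nr_serii only; grouping by deleting_time as well, as in the original attempt, splits each series into several rows.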

In Hive, what is the difference between explode() and lateral view explode()

Assume there is a table employee:
+-----------+------------------+
| col_name | data_type |
+-----------+------------------+
| id | string |
| perf | map<string,int> |
+-----------+------------------+
and the data inside this table:
+-----+------------------------------------+--+
| id | perf |
+-----+------------------------------------+--+
| 1 | {"job":80,"person":70,"team":60} |
| 2 | {"job":60,"team":80} |
| 3 | {"job":90,"person":100,"team":70} |
+-----+------------------------------------+--+
I tried the following two queries, but they both return the same result:
1. select explode(perf) from employee;
2. select key,value from employee lateral view explode(perf) as key,value;
The result:
+---------+--------+--+
| key | value |
+---------+--------+--+
| job | 80 |
| team | 60 |
| person | 70 |
| job | 60 |
| team | 80 |
| job | 90 |
| team | 70 |
| person | 100 |
+---------+--------+--+
So, what is the difference between them? I did not find suitable examples. Any help is appreciated.
For your particular case both queries are OK. But you can't use multiple explode() functions without lateral view. So, the query below will fail:
select explode(array(1,2)), explode(array(3, 4))
You'll need to write something like:
select
a_exp.a,
b_exp.b
from (select array(1, 2) as a, array(3, 4) as b) t
lateral view explode(t.a) a_exp as a
lateral view explode(t.b) b_exp as b
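As a rough mental model of what the two chained lateral views produce, the sketch below mimics them in Python: each lateral view joins every row produced so far with the elements of the exploded array, so two of them yield every combination. The function names explode and two_lateral_views are made up for this illustration, and the row order shown is not something Hive guarantees.

```python
def explode(values):
    """Yield one output row per element, like explode() on an array."""
    yield from values


def two_lateral_views(a, b):
    """Model of: lateral view explode(t.a) ... lateral view explode(t.b) ..."""
    # The second lateral view is applied to every row produced by the first,
    # so the result contains each (a, b) combination.
    return [(x, y) for x in explode(a) for y in explode(b)]


if __name__ == "__main__":
    for row in two_lateral_views([1, 2], [3, 4]):
        print(row)
```

A single explode() in the select list only covers the one-generator case, which is why both queries in the question return the same rows.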

SELECTing Related Rows Based on a Single Row Match

I have the following table running on PostgreSQL 9.5:
+---+------------+-------------+
|ID | trans_id | message |
+---+------------+-------------+
| 1 | 1234567 | abc123-ef |
| 2 | 1234567 | def234-gh |
| 3 | 1234567 | ghi567-ij |
| 4 | 8902345 | ced123-ef |
| 5 | 8902345 | def234-bz |
| 6 | 8902345 | ghi567-ij |
| 7 | 6789012 | abc123-ab |
| 8 | 6789012 | def234-cd |
| 9 | 6789012 | ghi567-ef |
|10 | 4567890 | abc123-ab |
|11 | 4567890 | gex890-aj |
|12 | 4567890 | ghi567-ef |
+---+------------+-------------+
I am looking for the rows for each trans_id based on a LIKE query, like this:
SELECT * FROM table
WHERE message LIKE '%def234%'
This, of course, returns just three rows, the three that match my pattern in the message column. What I am looking for, instead, is all the rows matching that trans_id in groups of messages that match. That is, if a single row matches the pattern, get all the rows with the trans_id of that matching row.
That is, the results would be:
+---+------------+-------------+
|ID | trans_id | message |
+---+------------+-------------+
| 1 | 1234567 | abc123-ef |
| 2 | 1234567 | def234-gh |
| 3 | 1234567 | ghi567-ij |
| 4 | 8902345 | ced123-ef |
| 5 | 8902345 | def234-bz |
| 6 | 8902345 | ghi567-ij |
| 7 | 6789012 | abc123-ab |
| 8 | 6789012 | def234-cd |
| 9 | 6789012 | ghi567-ef |
+---+------------+-------------+
Notice that rows 10, 11, and 12 were not SELECTed because none of them matched the %def234% pattern.
I have tried (and failed) to write a sub-query to get all the related rows when a single message matches a pattern:
SELECT sub.*
FROM (
    SELECT DISTINCT trans_id FROM table WHERE message LIKE '%def234%'
) sub
WHERE table.trans_id = sub.trans_id
I could easily do this with two queries, but the list of matching trans_ids from the first query, to be included in a WHERE trans_id IN (<huge list of trans_ids>) clause, would be very large, so this would not be a very efficient way of doing it. I believe there is a way to do it with a single query.
Thank you!
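I think this will do the job:
WITH sub AS (
    -- DISTINCT keeps a trans_id with several matching messages
    -- from duplicating rows in the join
    SELECT DISTINCT trans_id
    FROM mytable  -- mytable stands in for the real table name (TABLE is a reserved word)
    WHERE message LIKE '%def234%'
)
SELECT *
FROM mytable JOIN sub USING (trans_id);
Hope this helps.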
Try this:
SELECT ID, trans_id, message
FROM (
SELECT ID, trans_id, message,
COUNT(*) FILTER (WHERE message LIKE '%def234%')
OVER (PARTITION BY trans_id) AS pattern_cnt
FROM mytable) AS t
WHERE pattern_cnt >= 1
Using a FILTER clause in the windowed version of COUNT function we can get the number of records matching the predefined pattern within each trans_id slice. The outer query uses this count to filter out irrelevant slices.
Demo here
You can do this.
WITH trans AS (
    SELECT DISTINCT trans_id
    FROM t1
    WHERE message LIKE '%def234%'
)
SELECT t1.*
FROM t1,
     trans
WHERE t1.trans_id = trans.trans_id;
I think this will perform better. If you have enough data, you can run EXPLAIN on both the subquery and the CTE versions and compare the plans.
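All three answers share one idea: first find the trans_ids that have at least one matching message, then return every row carrying one of those trans_ids. The sketch below checks that idea against the sample data with an IN (subquery) variant, run in an in-memory SQLite database from Python. SQLite standing in for PostgreSQL, the table name messages and the function name related_rows are assumptions of this sketch.

```python
import sqlite3

# Sample rows from the question: (id, trans_id, message).
MESSAGES = [
    (1, 1234567, "abc123-ef"),
    (2, 1234567, "def234-gh"),
    (3, 1234567, "ghi567-ij"),
    (4, 8902345, "ced123-ef"),
    (5, 8902345, "def234-bz"),
    (6, 8902345, "ghi567-ij"),
    (7, 6789012, "abc123-ab"),
    (8, 6789012, "def234-cd"),
    (9, 6789012, "ghi567-ef"),
    (10, 4567890, "abc123-ab"),
    (11, 4567890, "gex890-aj"),
    (12, 4567890, "ghi567-ef"),
]

# The inner query collects matching trans_ids; the outer query returns
# every row that shares one of them.
QUERY = """
SELECT id, trans_id, message
FROM messages
WHERE trans_id IN (
    SELECT trans_id FROM messages WHERE message LIKE '%def234%'
)
ORDER BY id
"""


def related_rows(rows):
    """Return all rows whose trans_id has at least one matching message."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE messages (id INTEGER, trans_id INTEGER, message TEXT)")
    conn.executemany("INSERT INTO messages VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in related_rows(MESSAGES):
        print(row)
```

Because IN only tests membership, a trans_id with several matching messages does not duplicate any rows, so no DISTINCT is needed in this form.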

Correlated subquery issue in Hive / Spark SQL to compare aggregates

I have a table of about a billion game interactions where each record describes attributes of a gamer on a particular day (gamer ID, date on which the game was played, attribute 1, [attribute 2, ...])
+---------+
| TABLE_1 |
+-------+-----------+-------+-------+-------+
|gmr_id | played_dt | attr1 | attr2 | attr3 |
+-------+-----------+-------+-------+-------+
|1 | 2017-01-01| 1 | 2 | txt |
|1 | 2017-01-02| 3 | 2 | txt |
|2 | 2017-01-02| 1 | 2 | txt |
+-------+-----------+-------+-------+-------+
I have another table with millions of records where the gamer's moves are recorded for each game played:
+---------+
| TABLE_2 |
+-------+-----------+---------+---------+---------+
|gmr_id | played_dt | finish | attacks | deaths |
+-------+-----------+---------+---------+---------+
|1 | 2017-01-01| 10 | 1 | 9 |
|1 | 2017-01-03| 12 | 10 | 2 |
|2 | 2017-01-02| 1 | 0 | 0 |
|4 | 2017-01-03| 1 | 0 | 1 |
|1 | 2017-01-04| 3 | 1 | 2 |
+-------+-----------+---------+---------+---------+
For every record in TABLE_1 -- specifically for every gmr_id and played_dt, I am trying to compare sums of moves in next two days of played_dt with previous five days (1 if true, else 0) and join to TABLE_1 based on gmr_id and played_dt:
i.e.
Sum of finishes, attacks and deaths for the gmr_id from played_dt to two days after, i.e. SUM(finish), SUM(attacks) etc. BETWEEN played_dt AND DATE_ADD(played_dt, 2)
Sum of finishes, attacks and deaths for the gmr_id from five days before to one day before played_dt, i.e. SUM(finish), SUM(attacks) etc. BETWEEN DATE_SUB(played_dt, 5) AND DATE_SUB(played_dt, 1)
Compare and set flags i.e. get a row that looks like: gmr_id, played_dt, future_finish_gt_past_finish (1 if finish in days after played_dt greater than days before else 0), future_attacks_gt_past_attacks (1 if attacks in days after played_dt are greater than days before else 0), etc.
Join the row: gmr_id, played_dt, finish_f, attack_f
with TABLE_1 row: gmr_id, played_dt, attr1, attr2, attr3
on gmr_id, played_dt
I have tried writing a correlated subquery, but to no avail:
SELECT
    t1.gmr_id,
    t1.played_dt,
    (SELECT
        t2.gmr_id,
        SUM(t2.finish) `future_finish`,
        SUM(t2.attacks) `future_attacks`
     FROM TABLE_2 t2 WHERE t2.played_dt BETWEEN played_dt AND DATE_ADD(played_dt, 2)
     GROUP BY t2.gmr_id),
    (SELECT
        t2.gmr_id,
        SUM(t2.finish) `past_finish`,
        SUM(t2.attacks) `past_attacks`
     FROM TABLE_2 t2 WHERE t2.played_dt BETWEEN DATE_SUB(played_dt, 5) AND DATE_SUB(played_dt, 1)
     GROUP BY t2.gmr_id),
    CASE WHEN future_finish > past_finish THEN 1 ELSE 0 END `finish_f`,
    CASE WHEN future_attacks > past_attacks THEN 1 ELSE 0 END `attack_f`
FROM
    TABLE_1 t1;
The expected output looks like this:
+---------+
| TABLE_1 |
+-------+-----------+-------+-------+-------+-----------+-----------+
|gmr_id | played_dt | attr1 | attr2 | attr3 | finish_f | attack_f |
+-------+-----------+-------+-------+-------+-----------+-----------+
|1 | 2017-01-01| 1 | 2 | txt | 1 | 0 |
|1 | 2017-01-02| 3 | 2 | txt | 1 | 1 |
|2 | 2017-01-02| 1 | 2 | txt | 0 | 1 |
+-------+-----------+-------+-------+-------+-----------+-----------+
I am using Hive 1.2 (or could use Spark 1.5) to do this, but so far I have been unable to do so. What would be the best way to accomplish this? I would greatly appreciate your help.
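To pin down the comparison being asked for, here is a plain Python sketch of the two windows and the resulting flags, using the sample TABLE_2 rows. It is only meant to make the logic concrete, not to suggest how to run it on a billion rows. The function names window_sums and flags are made up, and treating a window with no rows as a sum of 0 is an assumption of this sketch.

```python
from datetime import date, timedelta

# Sample TABLE_2 rows from the question: (gmr_id, played_dt, finish, attacks).
TABLE_2 = [
    (1, date(2017, 1, 1), 10, 1),
    (1, date(2017, 1, 3), 12, 10),
    (2, date(2017, 1, 2), 1, 0),
    (4, date(2017, 1, 3), 1, 0),
    (1, date(2017, 1, 4), 3, 1),
]


def window_sums(gmr_id, start, end):
    """Sum finish and attacks for one gamer over the inclusive range [start, end]."""
    finish = attacks = 0  # assumption: an empty window counts as 0
    for row_gmr, played_dt, row_finish, row_attacks in TABLE_2:
        if row_gmr == gmr_id and start <= played_dt <= end:
            finish += row_finish
            attacks += row_attacks
    return finish, attacks


def flags(gmr_id, played_dt):
    """Return (finish_f, attack_f) for one (gmr_id, played_dt) key of TABLE_1."""
    # Future window: played_dt through two days after.
    future = window_sums(gmr_id, played_dt, played_dt + timedelta(days=2))
    # Past window: five days before through one day before played_dt.
    past = window_sums(
        gmr_id, played_dt - timedelta(days=5), played_dt - timedelta(days=1)
    )
    return int(future[0] > past[0]), int(future[1] > past[1])


if __name__ == "__main__":
    print(flags(1, date(2017, 1, 2)))
```

For gamer 1 on 2017-01-02 the future window holds the rows of 2017-01-03 and 2017-01-04 and the past window holds the row of 2017-01-01, so both flags come out as 1.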

Pivot table using flat table structure in SQL Server without aggregation

I have a flat table structure which I've turned into a column based table. I'm struggling with getting the rowId from my raw data to appear in my column based table. Any help greatly appreciated.
Raw data in table derived from three different tables:
| rowId |columnName |ColumnValue |
| ---------------- |:---------------:| -----------:|
| 1 |itemNo |1 |
| 1 |itemName |Polo Shirt |
| 1 |itemDescription |Green |
| 1 |price1 |4.2 |
| 1 |price2 |5.3 |
| 1 |price3 |7.5 |
| 1 |displayOrder |1 |
| 1 |rowId |[NULL] |
| 2 |itemNo |12 |
| 2 |itemName |Digital Watch|
| 2 |itemDescription |Red Watch |
| 2 |price1 |4.0 |
| 2 |price2 |2.0 |
| 2 |price3 |1.5 |
| 2 |displayOrder |3 |
| 2 |rowId |[NULL] |
SQL using pivot to give me the column structure:
select [displayOrder],[itemDescription],[itemName],[itemNo],[price1],[price2],[price3],[rowId]
from
(
    SELECT [columnName], [columnValue] , row_number() over(partition by c.columnName order by cv.rowId) as rn
    FROM tblFlatTable AS t
    JOIN tblFlatColumns c
        ON t.flatTableId = c.flatTableId
    JOIN tblFlatColumnValues cv
        ON cv.flatColumnId = c.flatColumnId
    WHERE (t.flatTableId = 1) AND (t.isActive = 1)
        AND (c.isActive = 1) AND (cv.isActive = 1)
) as S
Pivot
(
    MIN([columnValue])
    FOR columnName IN ([displayOrder],[itemDescription],[itemName],[itemNo],[price1],[price2],[price3],[rowId])
) as P
Result:
|displayOrder|itemDescription|itemName |price1|price2|price3|rowId |
| ---------- |:-------------:|:------------:|:----:|:----:|:----:|-----:|
|1 |Green |Polo Shirt |4.2 |5.3 |7.5 |[NULL]|
|3 |Red watch |Digital Watch |4.0 |2.0 |1.5 |[NULL]|
I understand why I'm getting the NULL value for rowId. What I'm stuck on is how to pull the value for rowId from the raw data and add it to my structure. I'm not sure it's possible, as I've looked at many examples and none seem to do this.
It looks obvious now!
I'm now not including rowId as part of my flat structure.
| rowId |columnName |ColumnValue |
| ---------------- |:---------------:| -----------:|
| 1 |itemNo |1 |
| 1 |itemName |Polo Shirt |
| 1 |itemDescription |Green |
| 1 |price1 |4.2 |
| 1 |price2 |5.3 |
| 1 |price3 |7.5 |
| 1 |displayOrder |1 |
| 2 |itemNo |12 |
| 2 |itemName |Digital Watch|
| 2 |itemDescription |Red Watch |
| 2 |price1 |4.0 |
| 2 |price2 |2.0 |
| 2 |price3 |1.5 |
| 2 |displayOrder |3 |
I've updated the SQL; you can see I'm pulling in the rowId from tblFlatColumnValues:
select [rowId],[displayOrder],[itemDescription],[itemName],[itemNo],[price1],[price2],[price3]
from
(
    SELECT cv.rowId, [columnName], [columnValue] , row_number() over(partition by c.columnName order by cv.rowId) as rn
    FROM tblFlatTable AS t
    JOIN tblFlatColumns c
        ON t.flatTableId = c.flatTableId
    JOIN tblFlatColumnValues cv
        ON cv.flatColumnId = c.flatColumnId
    WHERE (t.flatTableId = 1) AND (t.isActive = 1)
        AND (c.isActive = 1) AND (cv.isActive = 1)
) as S
Pivot
(
    MIN([columnValue])
    FOR columnName IN ([displayOrder],[itemDescription],[itemName],[itemNo],[price1],[price2],[price3])
) as P
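The reason this works is that rowId now travels alongside each name/value pair instead of being one of the pivoted names, so it can act as the key of each output row. A minimal Python sketch of the same reshaping, using the raw data above, may make that clearer; the names LONG_ROWS and pivot are made up for this illustration.

```python
# Raw data from the flat structure: (rowId, columnName, columnValue).
LONG_ROWS = [
    (1, "itemNo", "1"),
    (1, "itemName", "Polo Shirt"),
    (1, "itemDescription", "Green"),
    (1, "price1", "4.2"),
    (1, "price2", "5.3"),
    (1, "price3", "7.5"),
    (1, "displayOrder", "1"),
    (2, "itemNo", "12"),
    (2, "itemName", "Digital Watch"),
    (2, "itemDescription", "Red Watch"),
    (2, "price1", "4.0"),
    (2, "price2", "2.0"),
    (2, "price3", "1.5"),
    (2, "displayOrder", "3"),
]


def pivot(rows):
    """Turn (rowId, columnName, columnValue) triples into one dict per rowId."""
    wide = {}
    for row_id, column_name, column_value in rows:
        # rowId is the grouping key, so it is stored once per output row
        # rather than looked up as a pivoted column name.
        wide.setdefault(row_id, {"rowId": row_id})[column_name] = column_value
    return wide


if __name__ == "__main__":
    for row in pivot(LONG_ROWS).values():
        print(row)
```

Each rowId carries exactly one value per column name here, which is also why taking MIN in the PIVOT does not actually aggregate anything away.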