Merging multiple rows of STRUCT into a single row - sql

I am using CASE WHEN to return a struct of data:
SELECT name,
CASE WHEN year=1 THEN STRUCT (variables) END as Period_1,
CASE WHEN year=2 THEN STRUCT (variables) END as Period_2,
CASE WHEN year=3 THEN STRUCT (variables) END as Period_3
FROM table
The result should ideally look like this (one example):
|name|Period_1|Period_2|Period_3|
|1st | Struct | Struct | Struct |
Currently, it looks like:
|name|Period_1|Period_2|Period_3|
|1st | struct | Null | Null |
|1st | Null | Struct | null |
|1st | Null | null | struct |
How would I go about consolidating these rows and thus removing the nulls?
If the columns were not structs, I would have used an aggregate such as min() or max() with GROUP BY name, but that is not possible here.
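One possible approach, sketched under the assumption that this is BigQuery Standard SQL (a guess based on the STRUCT syntax): aggregate each CASE expression with ARRAY_AGG(... IGNORE NULLS), group by name, and take the first array element. The names my_table, v1 and v2 are placeholders, not taken from the question.

```sql
-- Sketch only; assumes BigQuery Standard SQL.
-- my_table, v1 and v2 are placeholder names standing in for the real table and struct fields.
SELECT
  name,
  -- IGNORE NULLS drops the rows where the CASE did not match;
  -- SAFE_OFFSET(0) returns NULL instead of failing when no row matched at all.
  ARRAY_AGG(CASE WHEN year = 1 THEN STRUCT(v1, v2) END IGNORE NULLS)[SAFE_OFFSET(0)] AS Period_1,
  ARRAY_AGG(CASE WHEN year = 2 THEN STRUCT(v1, v2) END IGNORE NULLS)[SAFE_OFFSET(0)] AS Period_2,
  ARRAY_AGG(CASE WHEN year = 3 THEN STRUCT(v1, v2) END IGNORE NULLS)[SAFE_OFFSET(0)] AS Period_3
FROM my_table
GROUP BY name
```

If a name can have several rows for the same year, the element that is kept is arbitrary unless an ORDER BY is added inside ARRAY_AGG.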

Related

Cast VARCHAR columns to int, bigint, time, etc (PL/pgSQL)

Problem
(This is for an open source, analytics library.)
Here's our query results from events_view:
id | visit_id | name | prop0 | prop1 | url
------+----------+--------+----------------------------+-------+------------
2004 | 4 | Magnus | 2021-10-26 02:25:55.790999 | 142 | cnn.com
2007 | 4 | Hartis | 2021-10-26 02:26:37.773999 | 25 | fox.com
Currently all columns except id are VARCHAR.
Column | Type | Collation | Nullable | Default
----------+-------------------+-----------+----------+---------
id | bigint | | |
visit_id | character varying | | |
name | character varying | | |
prop0 | character varying | | |
prop1 | character varying | | |
url | character varying | | |
They should be something like
Column | Type | Collation | Nullable | Default
----------+------------------------+-----------+----------+---------
id | bigint | | |
visit_id | bigint | | |
name | character varying | | |
prop0 | time without time zone | | |
prop1 | bigint | | |
url | character varying | | |
Desired result
Hardcoding these casts, as in SELECT visit_id::bigint, name::varchar, prop0::time, prop1::integer, url::varchar FROM tbl, won't do, because column names are known only at run time.
To simplify things we could cast each column into only three types: boolean, numeric, or varchar. Use regexps below for matching types:
boolean: ^(true|false|t|f)$
numeric: ^-?[0-9]+(\.[0-9]+)?$
varchar: every result that does not match boolean and numeric above
What SQL would discover the type of each column and cast it dynamically?
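For a single column, the regex rules above could be applied roughly like this. This is a sketch assuming PostgreSQL and the events_view and prop1 names from the question; the numeric pattern is written here as an optional minus sign followed by digits and an optional fractional part, which is an interpretation of the rule rather than the original text.

```sql
-- Sketch only; assumes PostgreSQL.
-- A column counts as a given type only if every non-NULL value matches
-- (bool_and ignores NULLs and yields NULL when there are no values,
-- in which case the CASE falls through to 'varchar').
SELECT
  CASE
    WHEN bool_and(prop1 ~ '^(true|false|t|f)$')    THEN 'boolean'
    WHEN bool_and(prop1 ~ '^-?[0-9]+(\.[0-9]+)?$') THEN 'numeric'
    ELSE 'varchar'
  END AS prop1_type
FROM events_view;
```

Because column names are known only at run time, a statement like this would itself have to be generated dynamically, once per column.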
These are a few ideas rather than a true solution for this tricky job. A slow but very reliable function can be used instead of regular expressions.
create or replace function can_cast(s text, vtype text)
returns boolean language plpgsql immutable as
$body$
begin
execute format('select %L::%s', s, vtype);
return true;
exception when others then
return false;
end;
$body$;
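A quick check of how the function behaves, as a small sketch using values from the question:

```sql
-- '142' can be cast to bigint, 'Magnus' cannot.
SELECT can_cast('142', 'bigint')    AS prop1_is_bigint,  -- true
       can_cast('Magnus', 'bigint') AS name_is_bigint;   -- false
```

Note that vtype is interpolated into the statement with %s, without quoting, so it should only ever receive trusted type names.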
Data may be presented like this (partial list of columns from your example)
create or replace temporary view tv(id, visit_id, prop0, prop1) as
values
(
2004::bigint,
4::bigint,
case when can_cast('2021-10-26 02:25:55.790999', 'time') then '2021-10-26 02:25:55.790999'::time end,
case when can_cast('142', 'bigint') then '142'::bigint end
), -- determine the types
(2007, 4, '2021-10-26 02:26:37.773999', 25)
-- the rest of the data here
;
I believe that it is possible to generate the temporary view DDL dynamically as a select from events_view too.

Casting string to int i.e. the string "res"

I have a column in a table which is of type array<string>. The table has been partitioned daily since 2018-01-01. At some stage, the values in the array go from strings to integers. The data looks like this:
| yyyy_mm_dd | h_id | p_id | con |
|------------|-------|------|---------------|
| 2018-10-01 | 52988 | 1 | ["res", "av"] |
| 2018-10-02 | 52988 | 1 | ["1","2"] |
| 2018-10-03 | 52988 | 1 | ["1","2"] |
There is a mapping between the strings and integers. "res" maps to 1 and "av" maps to 2 etc. However, I've written a query to perform some logic. Here is a snippet (subquery) of it:
SELECT
t.yyyy_mm_dd,
t.h_id,
t.p_id,
CAST(e.con AS INT) AS api
FROM
my_table t
LATERAL VIEW EXPLODE(con) e AS con
My problem is that this doesn't work for the earlier dates, when strings were used instead of integers. Is there any way to select con and remap the strings to integers so that the data is consistent across all partitions?
Expected output:
| yyyy_mm_dd | h_id | p_id | con |
|------------|-------|------|---------------|
| 2018-10-01 | 52988 | 1 | ["1","2"] |
| 2018-10-02 | 52988 | 1 | ["1","2"] |
| 2018-10-03 | 52988 | 1 | ["1","2"] |
Once the values selected are all integers (within a string array), then the CAST(e.con AS INT) will work
Edit: To clarify, I will put the solution as a subquery before I use lateral view explode. This way I am exploding on a table where all partitions have integers in con. I hope this makes sense.
CAST(e.api AS INT) returns NULL when the value cannot be cast. collect_list builds an array that keeps duplicates and skips NULLs. If you need an array without duplicate elements, use collect_set().
SELECT
t.yyyy_mm_dd,
t.h_id,
t.p_id,
collect_list(--array of integers
--cast case as string if you need array of strings
CASE WHEN e.api = 'res' THEN 1
WHEN e.api = 'av' THEN 2
--add more cases
ELSE CAST(e.api as INT)
END
) as con
FROM
my_table t
LATERAL VIEW EXPLODE(con) e AS api
GROUP BY t.yyyy_mm_dd, t.h_id, t.p_id

Store mapping with multiple conditions in database

I have a SELECT statement in which I have to map keys. I need to store this mapping in a database because the mapping can change. As long as the mapping condition is based on only one key, it is relatively simple.
SELECT
table.field1 AS COL1
, (SELECT value from TransformationTable WHERE key = table.field2) AS COL2
[...]
Now I have a case where the mapping condition is more complicated. In SQL it is like:
CASE
WHEN table.field1 = 'ORG' AND table.field2 IN (1,2,3) THEN 01
WHEN table.field1 = 'ORG' AND table.field2 NOT IN (5,76,88) OR table.field2 IN (9) THEN 02
WHEN table.field1 != 'ORG' AND table.field2 IN (1,2,3) THEN 03
END
How can I store such a condition in a database so that I can select the value as in the first example?
Does anyone have an idea?
Essentially to perform this task you need something that can evaluate an expression for you on the fly. The only ways I have solved this kind of problem in the past are:
Easy to implement but less flexible: embed the expression in a view then query from the view; change view as mapping expression changes. Remember a view is just a stored SQL statement and a SQL statement can contain and evaluate complex expressions.
Harder to implement but more flexible: store a lexical construct for the expression in a table somewhere; then read in that "expression" and use it to generate the SQL dynamically; then run the dynamically generated SQL.
Have fun!!!
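The view option could look roughly like the sketch below. The names source_table and v_source_mapped are hypothetical, and only the first and third rules from the question are included, because the intended grouping of the second rule is not clear from the question.

```sql
-- Sketch only; source_table and v_source_mapped are hypothetical names.
-- The mapping logic lives in the view, so changing the mapping means
-- redefining the view rather than touching every query that uses it.
CREATE VIEW v_source_mapped AS
SELECT
    s.field1,
    s.field2,
    CASE
        WHEN s.field1 =  'ORG' AND s.field2 IN (1, 2, 3) THEN '01'
        WHEN s.field1 <> 'ORG' AND s.field2 IN (1, 2, 3) THEN '03'
    END AS mapped_value
FROM source_table s;

-- Consumers then select the mapped value like any other column.
SELECT field1 AS col1, mapped_value AS col2
FROM v_source_mapped;
```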
If you want a table solution, then I am thinking of something like this:
PARENT | CHILD | LOGICAL | COLUMN | OPERATOR | VALUES | OUTPUT |
OPERATOR ID | OPERATOR ID | OPERATOR | (expression) | OPERATOR | VALUES | ATTRIBUTE |
--------------------------- ---------------------------------------------------------------
10001 | 10003 | AND | field1 | IN | 1,2,3 | 01 |
10001 | 10004 | AND | field2 | NOT IN | 5,76,88 | 01 |
10002 | 10005 | | field1 | IN | ... | 02 |
10002 | 10006 | | field2 | NOT IN | ... | 02 |
The columns PARENT OPERATOR ID and CHILD OPERATOR ID are extra, in case you need nested AND-OR operators ( ... AND ( ... OR ... AND ( ... OR ... ) ) )

How do I replace a column value, conditionally depending on another column value (with LEFT JOIN)?

I have this query:
SELECT * from #b as t
LEFT outer JOIN WR_16h_vs_MVA_16h_csv as csv
on t.PROBE_ID = csv.PROBE_ID;
which returns results that look like this:
|id|...|...|...|functionCC_A|...|functionCC_B|...|
------------------------------------------
|1 | | | | lalala | | NULL | |
|2 | | | | asdad | | bababa | |
|3 | | | | NULL | | NULL | |
|n | | | | werwer | | NULL | |
There are two functionCC columns because of a JOIN. I want a single functionCC column but here are the cases:
if functionCC_A is NULL, use value from functionCC_B
if functionCC_A has a value and so does functionCC_B, use functionCC_B
if functionCC_A has a value but functionCC_B is NULL, use functionCC_A
if both NULL leave as NULL
How can I craft my query so that I can replace the first functionCC (functionCC_A) column value conditionally, depending on the value in the second functionCC column (functionCC_B)?
The COALESCE() function returns the first non-null value from a list so something like:
SELECT COALESCE(csv.functionCC,t.functionCC) AS functionCC
from #b as t
LEFT outer JOIN WR_16h_vs_MVA_16h_csv as csv
on t.PROBE_ID = csv.PROBE_ID;
This satisfies all the criteria listed, since you want the 'B' value whenever it is populated, and COALESCE() returns NULL if none of the listed fields is populated.
I wasn't sure whether the _A and _B suffixes were for illustration only, so I assumed one column named functionCC coming from each of the tables; you might have to adjust the names above.

insert values only if subselect is not null

I am trying to write sql that will create a new table using data from an existing table.
The new table will have, among other columns, two columns like so:
reserved boolean
reserved_for character varying(10)
In the original table, I have data that looks like this:
id | identification | department | description | lastchange | available | type
------------+----------------+------------------------------------+-------------------------------------------+---------------------+-----------+--------
9145090050 | | | Reserved for llb | 2011-05-20 11:46:21 | f |
9145090096 | | | Reserved for ppa | 2013-01-26 12:31:56 | f |
9145090046 | | | | 2011-05-06 10:34:21 | f |
If the original table has the text "Reserved for ..." then, I want the reserved field in the new table to be set to "true" and reserved_for to contain the 3 or 4 characters that follow the "Reserved for" text in the original table.
So using the above table as an example, I want my new table to look like this;
id | reserved | reserved_for | lastchange |
------------+----------+----------------+---------------------+
9145090050 | true | llb | 2011-05-20 11:46:21 |
9145090096 | true | ppa | 2013-01-26 12:31:56 |
9145090046 | false | | 2011-05-06 10:34:21 |
The query I have to extract the 4 characters after the "Reserved for " looks like this:
select
substring(description from 13 for 4)
from
definition
where
description like 'Reserved for%';
It works in that it extracts the characters I need. How do I write the conditional statement in my create table command?
I think this is just some string manipulation on the original table:
select id,
(description like 'Reserved for%') as Reserved,
(case when description like 'Reserved for%'
then substring(description from 14)
end) as Reserved_For,
lastchange
from original;
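To build the new table directly, the same expressions can be placed in a CREATE TABLE ... AS statement. This is a sketch assuming PostgreSQL and the table name definition from the question; definition_new is a hypothetical name for the target table.

```sql
-- Sketch only; assumes PostgreSQL. definition_new is a hypothetical name.
CREATE TABLE definition_new AS
SELECT
    id,
    -- LIKE yields NULL for a NULL description, so fall back to false.
    COALESCE(description LIKE 'Reserved for %', false) AS reserved,
    -- 'Reserved for ' is 13 characters long, so the code starts at position 14.
    CASE WHEN description LIKE 'Reserved for %'
         THEN CAST(substring(description FROM 14) AS varchar(10))
    END AS reserved_for,
    lastchange
FROM definition;
```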
Perhaps this will help..
insert into yourtable(id,reserved,reserved_for,lastchange)
select t1.id,
case when t1.description like 'Reserved for%' then true else false end as reserved,
case when t1.description like 'Reserved for%' then substr(t1.description, strpos(t1.description, 'for') + 4) else null end as reserved_for,
t1.lastchange
from definition t1;