Reducing a Postgres table to JSON

I have the following table in Postgres:
+----+---------+-------------+---------+
| id | user_fk | language_fk | details |
+----+---------+-------------+---------+
|  1 |       2 | en-us       | 123     |
|  2 |       3 | en-us       | 456     |
|  3 |       4 | en-us       | 789     |
|  4 |       4 | es-la       | 012     |
+----+---------+-------------+---------+
And I want to reduce this to the following SQL statements:
UPDATE users SET details = '{"en-us": "789", "es-la": "012"}' WHERE id = 4;
UPDATE users SET details = '{"en-us": "123"}' WHERE id = 2;
UPDATE users SET details = '{"en-us": "456"}' WHERE id = 3;
So I want to reduce the languages per user and put them in a different table. Is there a way to do this in Postgres?

Use the function jsonb_object_agg() to get the expected output:
select
    min(id) as id,
    user_fk,
    jsonb_object_agg(language_fk, details) as details
from users
group by user_fk;
 id | user_fk | details
----+---------+----------------------------------
  1 |       2 | {"en-us": "123"}
  2 |       3 | {"en-us": "456"}
  3 |       4 | {"en-us": "789", "es-la": "012"}
(3 rows)
You cannot update the table in this way because the old and new details columns have different types. Instead, create a new table with the reduced rows using CREATE TABLE ... AS SELECT:
create table new_users as
select
    min(id) as id,
    user_fk,
    jsonb_object_agg(language_fk, details) as details
from users
group by user_fk;
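If a separate target table with a jsonb details column already exists, the same aggregate can also drive an UPDATE ... FROM instead of creating a new table. This is only a sketch: the table name user_details and its columns user_id and details are assumptions, not names from the question.
-- hypothetical target table: user_details(user_id int, details jsonb)
update user_details as d
set details = agg.details
from (
    select user_fk, jsonb_object_agg(language_fk, details) as details
    from users
    group by user_fk
) as agg
where d.user_id = agg.user_fk;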

Related

PostgreSQL row-level security involving a foreign key to another table

I wonder if the following is possible in PostgreSQL using RLS (or any other mechanism). I want a user to be able to get certain rows of a table if its id matches a column in another table.
For example, we have the following tables:
"user" table:
columns: id, name
| id | name |
| --- | --- |
| 1 | one |
| 2 | two |
| 3 | three|
| 4 | four |
"tenant" table:
columns: id, name
| id | name |
| --- | --- |
| 1 | t1 |
| 2 | t2 |
"user_tenant" table:
columns: user_id, tenant_id
| user_id | tenant_id|
| --- | --- |
| 1 | t1 |
| 2 | t2 |
| 3 | t1 |
| 4 | t2 |
Now I want only the users who belong to the same tenant (here, t1).
Expected output:
| id  | name  |
| --- | ----- |
| 1   | one   |
| 3   | three |
To achieve this, I need to create a policy something like this:
CREATE POLICY tenant_policy ON "user" USING (tenant_id = current_setting('my_user.current_tenant')::uuid);
but the above policy is not working, as I am getting all users.
Note: the user & tenant tables have a many-to-many relationship.
P.S. I know we can do this with a join or some other condition, but I want to achieve the above output in PostgreSQL using RLS (row-level security).
Thanks in advance!!
If row-level security is not working, that may be because one of the following applies (a few quick checks are sketched below the list):
- you didn't enable row-level security:
  ALTER TABLE "user" ENABLE ROW LEVEL SECURITY;
- the user owns the table; you can enable row-level security for the owner with
  ALTER TABLE "user" FORCE ROW LEVEL SECURITY;
- you are a superuser, who is always exempt from RLS
- you are a user defined with BYPASSRLS
- the parameter row_security is set to off
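A few quick queries can help confirm which of these applies (just a sketch, using the "user" table from the question):
SHOW row_security;  -- should be on
SELECT relrowsecurity, relforcerowsecurity FROM pg_class WHERE relname = 'user';  -- is RLS enabled / forced on the table?
SELECT * FROM pg_policies WHERE tablename = 'user';  -- which policies exist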
Other than that, you will probably have to join with user_tenant in your policy:
CREATE POLICY tenant_policy ON "user"
USING (
    EXISTS (SELECT 1
            FROM user_tenant AS ut
            WHERE ut.user_id = "user".id
              AND ut.tenant_id = current_setting('my_user.current_tenant')::uuid
           )
);
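A minimal end-to-end sketch, assuming the tenant ids are uuids as the ::uuid cast in the policy implies (the uuid below is just a placeholder):
-- enable RLS on the table (FORCE also applies it to the table owner)
ALTER TABLE "user" ENABLE ROW LEVEL SECURITY;
ALTER TABLE "user" FORCE ROW LEVEL SECURITY;
-- the application sets the current tenant for the session, then queries normally
SET my_user.current_tenant = '00000000-0000-0000-0000-000000000001';
SELECT id, name FROM "user";  -- only rows belonging to the current tenant are returned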

Snowflake Create View with JSON (VARIANT) field as columns with dynamic keys

I am having a problem creating a VIEW in Snowflake over a VARIANT field that stores JSON data whose keys are dynamic; the key definitions are stored in another table. So I want to create a VIEW that has dynamic columns based on the foreign key.
Here is what my tables look like:
companies:
| id | name      |
| -- | --------- |
| 1  | Company 1 |
| 2  | Company 2 |
invoices:
| id | invoice_number | custom_fields                              | company_id |
| -- | -------------- | ------------------------------------------ | ---------- |
| 1  | INV-01         | {"1": "Joe", "3": true, "5": "2020-12-12"} | 1          |
| 2  | INV-01         | {"2": "Hello", "4": 1000}                  | 2          |
customization_fields:
| id | label     | data_type | company_id |
| -- | --------- | --------- | ---------- |
| 1  | manager   | text      | 1          |
| 2  | reference | text      | 2          |
| 3  | emailed   | boolean   | 1          |
| 4  | account   | integer   | 2          |
| 5  | due_date  | date      | 1          |
So I want to create a view for getting each company's invoices, something like:
CREATE OR REPLACE VIEW companies_invoices AS SELECT * FROM invoices WHERE company_id = 1
which should get a result like below:
| id | invoice_number | company_id | manager | emailed | due_date |
| -- | -------------- | ---------- | ------- | ------- | -------- |
| 1 | INV-01 | 1 | Joe | true | 2020-12-12 |
My challenge here is that I cannot know the keys when I write the query. If I knew them, I could write:
SELECT
    id,
    invoice_number,
    company_id,
    custom_fields:"1" AS manager,
    custom_fields:"3" AS emailed,
    custom_fields:"5" AS due_date
FROM invoices
WHERE company_id = 1
These keys and labels are stored in the customization_fields table; I have tried different approaches and have not been able to make this work.
Could anyone tell me whether this is possible? If it is, an example would really help.
You cannot do what you want to do with a view. A view has a fixed set of columns and they have specific types. Retrieving a dynamic set of columns requires some other mechanism.
If you're trying to change the number of columns or the names of the columns based on the rows in the customization_fields table, you can't do it in a view.
If you have a defined schema and just need to grab dynamic JSON properties, you may want to consider looking into Snowflake's GET function. It allows you to get any part of a JSON using a string for the path rather than using a literal path in the SQL statement. For example:
create temp table foo(v variant);
insert into foo select parse_json('{ "name":"John", "age":30, "car":null }');
-- This uses a literal path in the SQL to get to a JSON property
select v:name::string as first_name from foo;
-- This uses the GET function to get the value from a path in a string
select get(v, 'name')::string as first_name from foo;
You can replace the 'name' in the second parameter of the GET function with the value stored in the customization_fields table.
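For example, instead of dynamic columns, the custom fields can be returned as rows by joining on customization_fields and passing its id to GET. This is only a sketch using the table and column names from the question:
-- one row per invoice per custom field, with the label and the extracted value
SELECT i.id,
       i.invoice_number,
       i.company_id,
       cf.label,
       GET(i.custom_fields, TO_VARCHAR(cf.id)) AS value
FROM invoices AS i
JOIN customization_fields AS cf
  ON cf.company_id = i.company_id
WHERE i.company_id = 1;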
In Snowflake, you will have to use a stored procedure to retrieve a dynamic set of columns.

Is there a way to create phantom tables in a postgres database?

I have a table called groups in my Postgres production server; it has a column called manager_id. What I'm trying to do is create simulated lookup tables based off the ids of the managers. For example, if I have the following rows in the table groups:
| id | manager_id | ... |
| -- | ---------- | --- |
| 1  | 1          | ... |
| 2  | 3          | ... |
| 4  | 1          | ... |
| 5  | 2          | ... |
| 7  | 2          | ... |
I would like to access make-believe tables like group-1:
| id | manager_id | ... |
| -- | ---------- | --- |
| 1  | 1          | ... |
| 4  | 1          | ... |
Or group-2:
| id | manager_id | ... |
| -- | ---------- | --- |
| 5  | 2          | ... |
| 7  | 2          | ... |
I am not sure if this is possible, and yes, I'm aware I could just query for it, but for the purpose of the question (and my very specific needs, for which I am having trouble finding a workaround), can I do something like that? If yes, can I do it without duplicating data, just referencing the original table?
Generally: just use a WHERE clause to filter on the manager you want the results for:
select *
from groups
where manager_id = 1
You could take it one step further and create views:
create view v_groups_1 as
select *
from groups
where manager_id = 1;

create view v_groups_2 as
select *
from groups
where manager_id = 2;
You can then run queries against the views just like you would do with a regular table.
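Because a view stores no data of its own, it always reflects the current contents of groups, which covers the "without duplicating data, just picking up references" requirement:
-- querying the view simply re-runs the saved query against groups
select * from v_groups_1;  -- same rows as: select * from groups where manager_id = 1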

Returning a singular row/value from a joined table based on the closest date

I have a Production table and a Standing Data table. The relationship of Production to Standing Data is actually many-to-many, which is different from how this relationship is usually represented (many-to-one).
The standing data table holds a list of tasks and the score each task is worth. Tasks can appear multiple times with different "ValidFrom" dates, to change the score at different points in time. What I am trying to do is query the Production table so that the TaskID is looked up in the standing data table and the date it was logged is used to determine which score should be returned.
Here's an example of how I want the data to look:
Production Table:
+----------+------------+-------+-----------+--------+-------+
| RecordID | Date       | EmpID | Reference | TaskID | Score |
+----------+------------+-------+-----------+--------+-------+
| 1        | 27/02/2020 | 1     | 123       | 1      | 1.5   |
| 2        | 27/02/2020 | 1     | 123       | 1      | 1.5   |
| 3        | 30/02/2020 | 1     | 123       | 1      | 2     |
| 4        | 31/02/2020 | 1     | 123       | 1      | 2     |
+----------+------------+-------+-----------+--------+-------+
Standing Data
+----------+--------+----------------+-------+
| RecordID | TaskID | DateActiveFrom | Score |
+----------+--------+----------------+-------+
| 1        | 1      | 01/02/2020     | 1.5   |
| 2        | 1      | 28/02/2020     | 2     |
+----------+--------+----------------+-------+
I have tried the code below, but unfortunately, because multiple records meet the criteria, the production data is duplicated, with two different scores per record:
SELECT p.[RecordID],
       p.[Date],
       p.[EmpID],
       p.[Reference],
       p.[TaskID],
       s.[Score]
FROM ProductionTable AS p
LEFT JOIN StandingDataTable AS s
       ON s.[TaskID] = p.[TaskID]
      AND s.[DateActiveFrom] <= p.[Date];
What is the correct way to return a single/scalar Score value for each record based on the date?
You can use APPLY:
SELECT p.[RecordID], p.[Date], p.[EmpID], p.[Reference], p.[TaskID], s.[Score]
FROM ProductionTable AS p OUTER APPLY
     (SELECT TOP (1) s.[Score]
      FROM StandingDataTable AS s
      WHERE s.[TaskID] = p.[TaskID]
        AND s.[DateActiveFrom] <= p.[Date]
      ORDER BY s.[DateActiveFrom] DESC
     ) s;
You might want the score on a per-record basis; if so, adjust the WHERE clause inside the APPLY.
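An alternative sketch using ROW_NUMBER() to keep only the latest standing-data row per production record (same table and column names as above):
SELECT [RecordID], [Date], [EmpID], [Reference], [TaskID], [Score]
FROM (SELECT p.[RecordID], p.[Date], p.[EmpID], p.[Reference], p.[TaskID], s.[Score],
             ROW_NUMBER() OVER (PARTITION BY p.[RecordID]
                                ORDER BY s.[DateActiveFrom] DESC) AS rn
      FROM ProductionTable AS p
      LEFT JOIN StandingDataTable AS s
             ON s.[TaskID] = p.[TaskID]
            AND s.[DateActiveFrom] <= p.[Date]) AS x
WHERE rn = 1;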

Is there a single query that can update a "sequence number" across multiple groups?

Given a table like below, is there a single-query way to update the table from this:
| id | type_id | created_at | sequence |
|----|---------|------------|----------|
| 1 | 1 | 2010-04-26 | NULL |
| 2 | 1 | 2010-04-27 | NULL |
| 3 | 2 | 2010-04-28 | NULL |
| 4 | 3 | 2010-04-28 | NULL |
To this (note that created_at is used for ordering, and sequence is "grouped" by type_id):
| id | type_id | created_at | sequence |
|----|---------|------------|----------|
| 1 | 1 | 2010-04-26 | 1 |
| 2 | 1 | 2010-04-27 | 2 |
| 3 | 2 | 2010-04-28 | 1 |
| 4 | 3 | 2010-04-28 | 1 |
I've seen some code before that used a @ variable like the following, which I thought might work:
SET @seq = 0;
UPDATE `log` SET `sequence` = @seq := @seq + 1
ORDER BY `created_at`;
But that obviously doesn't reset the sequence to 1 for each type_id.
If there's no single-query way to do this, what's the most efficient way?
Data in this table may be deleted, so I'm planning to run a stored procedure after the user is done editing to re-sequence the table.
You can use another variable storing the previous type_id (@type_id). The query is ordered by type_id, so whenever there is a change in type_id, the sequence is reset to 1 again.
Set @seq = 0;
Set @type_id = -1;
Update `log`
Set `sequence` = If(@type_id=(@type_id:=`type_id`), (@seq:=@seq+1), (@seq:=1))
Order By `type_id`, `created_at`;
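If you are on MySQL 8.0 or later, a window function avoids user variables entirely; this is a sketch using the same table and columns as above:
UPDATE `log` AS l
JOIN (SELECT `id`,
             ROW_NUMBER() OVER (PARTITION BY `type_id` ORDER BY `created_at`) AS rn
      FROM `log`) AS numbered
  ON numbered.`id` = l.`id`
SET l.`sequence` = numbered.rn;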
I don't know MySQL very well, but you could use a subquery, though it may be very slow.
UPDATE `log` SET `sequence` = (
    SELECT count(*) FROM `log` AS log2
    WHERE log2.type_id = log.type_id
      AND log2.created_at < log.created_at) + 1;
You'll get duplicate sequence numbers, though, if two rows with the same type_id have the same created_at date.