Batch insert one-to-many - SQL

Let's say we have two tables:
flight

| name        | type |
|-------------|------|
| origin      | text |
| destination | text |

segment

| name        | type |
|-------------|------|
| origin      | text |
| destination | text |
| flight_id   | int  |
The relationship is one-to-many (one flight can have multiple segments).
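For reference, here is a minimal DDL sketch of the two tables (the generated id column on flight is an assumption, implied by the ids returned below):

CREATE TABLE flight (
    id          serial PRIMARY KEY, -- assumed generated key, since ids are returned below
    origin      text,
    destination text
);

CREATE TABLE segment (
    origin      text,
    destination text,
    flight_id   integer REFERENCES flight (id)
);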
I would like to know if there is a way to batch insert many rows at once while keeping the relationship.
Here's my current method, though it may not be optimal.
Following this documentation, I'm doing it this way:
INSERT INTO flight (origin, destination) VALUES
('PAR', 'LON'), -- id: 1
('PAR', 'BKK'), -- id: 2
('PAR', 'DPS') -- id: 3
RETURNING id;
I insert multiple flights with the request above and return the ids of all of them. Then, programmatically, I update my segments with the flight ids before inserting them.
Lastly, I insert every segment using the same method as for the flights.
I end up with a second request that looks like:
INSERT INTO segment (origin, destination, flight_id) VALUES
('PAR', 'LON', 1),
('PAR', 'AMS', 2),
('AMS', 'BKK', 2),
('PAR', 'IST', 3),
('IST', 'DPS', 3);
The second problem is that the relationship is one-to-many, so some approaches (Multiple INSERTS into one table and many to many table, Insert data in 3 tables at a time using Postgres, or How to use RETURNING with ON CONFLICT in PostgreSQL?) don't seem to fit my use case: I want to batch insert multiple flights AND multiple segments with a one-to-many relationship.
Is there a way to do this with only one query?
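For what it's worth, PostgreSQL's data-modifying CTEs (WITH ... RETURNING) can collapse the two requests into a single statement. Here is a sketch, assuming each flight in the batch is uniquely identified by its (origin, destination) pair so that segments can be matched back to their parent flight:

WITH new_flights AS (
    -- insert the flights and capture the generated ids
    INSERT INTO flight (origin, destination)
    VALUES ('PAR', 'LON'), ('PAR', 'BKK'), ('PAR', 'DPS')
    RETURNING id, origin, destination
),
raw_segments (f_origin, f_destination, origin, destination) AS (
    -- each segment carries its parent flight's (origin, destination)
    VALUES ('PAR', 'LON', 'PAR', 'LON'),
           ('PAR', 'BKK', 'PAR', 'AMS'),
           ('PAR', 'BKK', 'AMS', 'BKK'),
           ('PAR', 'DPS', 'PAR', 'IST'),
           ('PAR', 'DPS', 'IST', 'DPS')
)
INSERT INTO segment (origin, destination, flight_id)
SELECT s.origin, s.destination, f.id
FROM raw_segments s
JOIN new_flights f
  ON f.origin = s.f_origin AND f.destination = s.f_destination;

If two flights in the same batch could share the same (origin, destination) pair, a surrogate batch key would be needed instead.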

Related

Issue displaying empty value of repeated columns in Google Data Studio

I've got an issue when trying to visualize information from a denormalized table in Google Data Studio.
Context: I want to gather all the contacts of a company and their related orders in a table in BigQuery. Contacts can have no orders or multiple orders. Following BigQuery best practices, this table is denormalized and all the orders for a client are stored in an array of structs. It looks like this:
Example fields:
+-------+------------+-------------+-----------+
| Row # | Contact_Id | Orders.date | Orders.id |
+-------+------------+-------------+-----------+
| 1     | 23         | 2019-02-05  | CB1       |
|       |            | 2020-03-02  | CB293     |
| 2     | 2321       | -           | -         |
| 3     | 77         | 2010-09-03  | AX3       |
+-------+------------+-------------+-----------+
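For concreteness, a sketch of this schema in BigQuery DDL (the project, dataset, table, and field names are assumptions matching the queries further down):

CREATE TABLE `myproject.mydataset.mytable` (
  Contact_Id INT64,
  Orders ARRAY<STRUCT<date DATE, id STRING>>
);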
The issue is when I want to use this table as a data source in Data Studio.
For instance, if I build a table with Contact_Id as a dimension, everything is fine and I can see all my contacts. However, if I add any dimension from the Orders struct, all info from contacts with no orders is no longer displayed. For instance, all info from Contact_Id 2321 is removed from the table.
Have you found any workaround to visualize these empty arrays (for instance as null values)?
The only solution I've found is to build an intermediary table with the orders unnested.
The workaround I've just discovered is to add an extra field in my DS -> BQ connector:
ARRAY_LENGTH(fields.orders) AS numberoforders
This will return zero if the array is empty - you can then create calculated fields within Data Studio, using the "numberoforders" field to force values to NULL or zero.
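In query form, the connector's custom query might then look like this (a sketch; table and field names borrowed from the answer below):

SELECT
  Contact_id,
  Orders,
  ARRAY_LENGTH(Orders) AS numberoforders
FROM myproject.mydataset.mytable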
You can fix this behaviour by changing your query on the BigQuery connector a little.
Instead of doing this:
SELECT
Contact_id,
Orders
FROM myproject.mydataset.mytable
try this:
SELECT
Contact_id,
IF(ARRAY_LENGTH(Orders) > 0, Orders, [STRUCT(CAST(NULL AS DATE) AS date, CAST(NULL AS STRING) AS id)]) AS Orders
FROM myproject.mydataset.mytable
This way you force your repeated field to contain, at the least, an array with NULL values, and hence Data Studio will represent those missing values.
Also, if you want to create new calculated fields using one of the nested fields, you should first check whether the value is NULL, to avoid filling in all NULL values. For example, if you have a repeated and nested field which can be 1 or 0, and you want to create a calculated field swapping the value, you should do:
IF(myfield.key IS NOT NULL, IF(myfield.key = 1, 0, 1), NULL)
Here you can see what happens if you check before swapping and if you don't:

+----------------+----------+-------+
| Original value | No check | Check |
+----------------+----------+-------+
| 1              | 0        | 0     |
| 0              | 1        | 1     |
| NULL           | 1        | NULL  |
| 1              | 0        | 0     |
| NULL           | 1        | NULL  |
+----------------+----------+-------+

Spatial clustering - associating a cluster attribute (id) with the geometries that are part of the cluster

I'm having some issues associating a clustered set of geometries with their own properties.
Data
I have a table with a set of geometries:
buildings {
    gid  integer,
    geom geometry(MultiPolygon, 4326)
}
I've run the function ST_ClusterWithin with a certain threshold over the "buildings" table.
From that cluster analysis, I got a table named "clusters":
clusters {
    cid  integer,
    geom geometry(GeometryCollection, 4326)
}
Question
I would love to extract every geometry, each associated with its own cluster information, into a table:
clustered_building {
    gid  integer,
    cid  integer,
    geom geometry(MultiPolygon, 4326)
}
 gid | cid | geom
-----+-----+-------------------
   1 |   1 | multipolygon(...)
   2 |   1 | multipolygon(...)
   3 |   1 | multipolygon(...)
   4 |   2 | multipolygon(...)
   5 |   3 | multipolygon(...)
   6 |   3 | multipolygon(...)
What I Did (but does not work)
I've been trying to use the two functions ST_GeometryN / ST_NumGeometries to parse each multi-geometry and extract the cluster information, with this query derived from one of the standard examples on the ST_GeometryN manual page.
INSERT INTO clustered_building (cid, c_item, geom)
SELECT sel.cid, n, ST_GeometryN(sel.geom, n) AS singlegeom
FROM (SELECT cid, geom, ST_NumGeometries(geom) AS num
      FROM clusters) AS sel
CROSS JOIN generate_series(1, sel.num) n
WHERE n <= ST_NumGeometries(sel.geom);
The query takes a few seconds if I force it to use a series of 10:
CROSS JOIN generate_series(1,10)
But it gets stuck when I ask it to generate a series according to the number of items in each GeometryCollection.
Also, this query does not allow me to link each single geometry back to its own features in the buildings table, because I'm losing the "gid".
Could someone please help me?
Thanks,
Stefano
I don't have your data, but using some dummy values, where ids 1, 2 and 3 intersect, as do 4 and 5, you can do something like the following:
WITH
temp (id, geom) AS
    (VALUES (1, ST_Buffer(ST_MakePoint(0, 0), 2)),
            (2, ST_Buffer(ST_MakePoint(1, 1), 2)),
            (3, ST_Buffer(ST_MakePoint(2, 2), 2)),
            (4, ST_Buffer(ST_MakePoint(9, 9), 2)),
            (5, ST_Buffer(ST_MakePoint(10, 10), 2))),
clusters (geom) AS
    (SELECT
         ST_MakeValid(
             ST_CollectionExtract(
                 unnest(ST_ClusterIntersecting(geom)), 3))
     FROM temp)
SELECT array_agg(temp.id), cl.geom
FROM clusters cl, temp
WHERE ST_Intersects(cl.geom, temp.geom)
GROUP BY cl.geom;
If you wrap the final cl.geom in ST_AsText, you will see something like:
{1,2,3} | MULTIPOLYGON(((2.81905966523328 0.180940334766718,2.66293922460509 -0.111140466039203,2.4142135623731 -0.414213562373094,2.11114046603921 -0.662939224605089,1.81905966523328 -0.819059665233282,1.84775906502257 -0.765366864730179,1.96157056080646 -0.390180644032256,2 0,2 3.08780778723872e-16,2 0,2.39018064403226 0.0384294391935396,2.76536686473018 0.152240934977427,2.81905966523328 0.180940334766718))......
{4,5} | MULTIPOLYGON(((10.8190596652333 8.18094033476672,10.6629392246051 7.8888595339608,10.4142135623731 7.58578643762691,10.1111404660392 7.33706077539491,9.76536686473018 7.15224093497743,9.39018064403226 7.03842943919354,9 7,8.60981935596775 7.03842943919354,8.23463313526982 7.15224093497743,7.8888595339608 7.33706077539491,7.58578643762691 7.5857864376269,7.33706077539491 7.88885953396079,7.15224093497743 8.23463313526982
where you can see that ids 1, 2, 3 belong to the first multipolygon, and 4, 5 to the other.
The general idea is that you cluster the data, and then you intersect the returned clusters with the original data, using array_agg to group the ids together, so that the returned MultiPolygons now contain the original ids. The use of ST_CollectionExtract with 3 as the second parameter, in conjunction with unnest, which splits the geometry collection returned by ST_ClusterIntersecting back into rows, returns each contiguous cluster as a (Multi)Polygon. The ST_MakeValid is there because sometimes when you intersect geometries with other related geometries, such as the original polygons with your clustered polygons, you get strange rounding effects and GEOS errors about non-noded intersections and the like.
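Applied to the buildings table from the question, a sketch of the same idea that numbers each cluster and joins back to the original rows, so every building keeps its gid (ST_ClusterIntersecting is used as above; ST_ClusterWithin with your threshold would slot in the same way):

WITH clusters AS (
    SELECT row_number() OVER () AS cid,
           ST_MakeValid(ST_CollectionExtract(geom, 3)) AS geom
    FROM (SELECT unnest(ST_ClusterIntersecting(geom)) AS geom
          FROM buildings) c
)
INSERT INTO clustered_building (gid, cid, geom)
SELECT b.gid, cl.cid, b.geom
FROM buildings b
JOIN clusters cl ON ST_Intersects(cl.geom, b.geom);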
I answered a similar question on gis.stackexchange recently that you might find useful.

Single record buffering in SAP ABAP

My table is stud.
+-----+------+-------+
| no  | name | grade |
+-----+------+-------+
| 101 | naga | A     |
| 102 | raj  | A     |
| 103 | john | A     |
+-----+------+-------+
The query I'm using is:
SELECT * FROM stud WHERE no = 101 AND grade = 'A'.
If I am using single record buffering, how much data is stored in the buffer area?
This query doesn't do anything: there is no INTO clause, meaning it won't store anything it selects.
You are probably looking to do something like this....
SELECT * FROM stud INTO wa_stud WHERE no = 101 AND grade = 'A'.
  " processing of each single row is performed here
ENDSELECT.
or perhaps something like this, where only one row (the first row ordered by primary key) is selected...
SELECT SINGLE * FROM stud INTO wa_stud WHERE no = 101 AND grade = 'A'.
or perhaps you want everything brought into an internal table, in case no and grade do not make up the full primary key.
SELECT * FROM stud INTO TABLE it_stud WHERE no = 101 AND grade = 'A'.
This is from the ABAP keyword documentation in SE38:
SAP Buffer - Single Record Buffering
Only those rows in the table are buffered that are actually accessed.
This requires less space in the buffer than when using generic or full
buffering. On the other hand, more administration work is required and
significantly more direct database accesses.
So since your query returns a single record (based on the data you displayed), it should just get one row and hold it in the buffer.
I'd suggest looking at the SAP help and Google. Also have a look at SELECT SINGLE and incompletely specified keys; there used to be a problem with the buffer being bypassed in some situations. Have a read for reference.

Storing a COUNT of values in a table

I have a table with data along the (massively simplified) lines of:
User | Value
-----|------
UsrA | 100
UsrA | 102
UsrB | 100
UsrA | 100
UsrB | 101
and, for reasons far too obscure to go into, I need to store the COUNT of each value in a table for future retrieval - ending up with something like:
User | Value100Count | Value101Count | Value102Count
-----|---------------|---------------|--------------
UsrA | 2             | 0             | 1
UsrB | 1             | 1             | 0
However, there could be up to 255 different Values - meaning potentially 255 different ValueXCount columns. I know this is a horrible way to do things, but is there an easy way to get the data into a format that can be easily INSERTed into the destination table? Is there a better way to store the COUNT of values per user (unfortunately I do need to store this information; grabbing it from the source table each time isn't an option)?
The whole thing isn't very pretty - but you know that. Rather than your table with 255 columns, I'd consider setting up another table with:
User | Value | CountOfValue
and setting a primary key over User and Value.
You could then insert the counts for given user/value combos into the CountOfValue field.
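A minimal sketch of that design (table and column names are assumptions; SQL Server syntax, since indexed views come up below):

CREATE TABLE UserValueCount (
    [User]       varchar(10) NOT NULL,
    Value        int         NOT NULL,
    CountOfValue int         NOT NULL,
    PRIMARY KEY ([User], Value)
);

-- populate it once from the source data
INSERT INTO UserValueCount ([User], Value, CountOfValue)
SELECT [User], Value, COUNT(*)
FROM SourceTable
GROUP BY [User], Value;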
As I said, the design is horrible and it feels like you would be better off starting from scratch, normalizing and doing counts live.
Check out indexed views. You can have the table maintained automatically, with integrity, and as a bonus it can get used in queries that already do COUNT(*) on that data.
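A sketch of such an indexed view (names are assumptions; SQL Server requires SCHEMABINDING and COUNT_BIG(*) here):

CREATE VIEW dbo.vUserValueCount
WITH SCHEMABINDING
AS
SELECT [User], Value, COUNT_BIG(*) AS CountOfValue
FROM dbo.SourceTable
GROUP BY [User], Value;
GO

-- the unique clustered index is what makes the view "indexed";
-- from then on SQL Server maintains the aggregated rows automatically
CREATE UNIQUE CLUSTERED INDEX IX_vUserValueCount
    ON dbo.vUserValueCount ([User], Value);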

SQL adjustable structure redundancy

I have to build a database structure that allows a totally modular design. Let's take an example; it will be easier to understand.
We have a website record, looking like this :
WEBSITE A
| ----- SECTION A
| |-- SUBSECTION 1
| | | -- Data 1 : Value 1
| | | -- Data 2 : Value 2
| | | ...
| | | -- Data N : Value N
| |
| |-- SUBSECTION 2
| | | -- Data 52 : Value 1
| | | -- Data 53 : Value 2
| | | ...
| | | -- Data M : Value M
| |
| ...
|
| ----- SECTION B
| |
| ...
...
Model 1 :
And so on. The trouble is that I have to implement a permission system. For instance, User A has access to sections A, B, D, Z of website 1, whereas User B has access to sections C, V, W, X of website 2.
At first, I thought that building this as a tree would be the most efficient approach.
Here is my first database representation:
TABLE website (id, id_client, name, address)
TABLE section (id, id_website, name)
TABLE sub_section (id, id_section, name)
TABLE data (id, id_sub_section, key, value)
With this representation, it would be easy to give restricted access to the employees.
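A sketch of how those permissions could hang off the tree (the employee_section table and the ids are assumptions):

TABLE employee_section (id_employee, id_section)

-- sections of website 1 that employee 42 may access
SELECT s.*
FROM section s
JOIN employee_section es ON es.id_section = s.id
WHERE es.id_employee = 42
  AND s.id_website = 1;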
However, websites will share common data. For instance, all websites will have sections A, B, C, D with the same structure. That implies a lot of redundancy: for each website we'll store a lot of common structure, where the only difference is the value attribute in the data table.
The second problem is that this structure has to be totally modular. For instance, the admin should be able to add a section, a subsection, or a data entry to a website record. That's the reason why I thought this model would be easier to manage.
Model 2 :
I have a second model, easier to store but harder to exploit :
TABLE website (id, id_client, Value 1, Value 2, Value 3 ... Value N)
TABLE section (id, name, Data 1, Data 2, Data 3 .. Data N, ..., Data 52, Data 53, Data M) (the Data entries represent column names)
TABLE subsection (id, id_section, name, Data 1, Data 2, Data N)
By doing this, I have a table where the data is stored and "structural tables" with the sections and subsections common to all websites. If the admin wants to add a section or subsection, we go back to the tree structure to store the additional data, looking like this:
TABLE additional_section (id, id_website, name)
TABLE additional_subsection (id, id_section, id_additional_section, name)
TABLE additional_data (id, id_subsection, id_additional_subsection, key, value)
This avoids a lot of redundancy and facilitates permission management.
Here's my question:
What's the best model for this kind of application? Model 1? Model 2? Another one?
Thanks for reading and for your answers!
I would suggest that you modify Model 1.
You can eliminate the redundancy in the section table by removing the id_website FK from it and creating a new table between the website table and it.
This new table, WebsiteSection, has a PK consisting of an FK to website AND an FK to section, allowing each section to be part of multiple websites.
Section data that is common to all websites would then be stored in the section table, while section data that is site-specific would be stored in the WebsiteSection table.
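A sketch of that suggested structure in DDL (details assumed; WebsiteSection written as website_section):

CREATE TABLE website (
    id        serial PRIMARY KEY,
    id_client integer NOT NULL,
    name      text,
    address   text
);

CREATE TABLE section (
    id   serial PRIMARY KEY,
    name text NOT NULL -- data common to all websites lives here
);

-- junction table: each section can now be part of many websites
CREATE TABLE website_section (
    id_website integer NOT NULL REFERENCES website (id),
    id_section integer NOT NULL REFERENCES section (id),
    PRIMARY KEY (id_website, id_section)
    -- site-specific section data would be added as columns here
);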