Substitute Parts/Rows PostgreSQL - sql

I want to create a relationship in PostgreSQL that allows me to have 'substitute' parts. For example, parts with ids 1, 2 and 4 are substitutes.
One way to do this would be to set up a sub_id field and fill it with the substitute part's id. But as you can see from the dataset below, a simple query:
SELECT *
FROM part
WHERE sub_id = 1
would not return all the substitute parts for part AA (part A2 would be missed). How can I ensure all the substitute parts are catered for?
I know that if the sub_id were 1 for part A2, all would be good. However, it's possible that in real-world usage the users end up making a mistake, and that would return the wrong result.
id part sub_id
1 AA 1
2 A 1
3 B NULL
4 A2 2

You can try a different approach.
I don't know what your exact needs are, but this is the general idea:
Set up a table that defines a part "group":
partgid  parttype  partgname
1        A         Engine for A319
2        B         Wheel for A319
code:
CREATE TABLE partgroup
(
partgid int PRIMARY KEY,
parttype text,
partgname text
);
then define parts in the group:
partid  partgid  partname       manufacturerid
12      1        Engine X319    800
13      1        Engine XL319   800
14      1        Engine XFR319  784
15      2        Wheel F1111    341
code:
CREATE TABLE parts
(
partid int PRIMARY KEY,
partgid int REFERENCES partgroup (partgid),
partname text,
manufacturerid int
);
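For completeness, the example rows above could be loaded like this:
INSERT INTO partgroup (partgid, parttype, partgname) VALUES
(1, 'A', 'Engine for A319'),
(2, 'B', 'Wheel for A319');
INSERT INTO parts (partid, partgid, partname, manufacturerid) VALUES
(12, 1, 'Engine X319', 800),
(13, 1, 'Engine XL319', 800),
(14, 1, 'Engine XFR319', 784),
(15, 2, 'Wheel F1111', 341);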
Then you can access all the parts in a specific group with a simple query.
By ID:
SELECT *
FROM parts
WHERE partgid = 1;  -- an integer, not a quoted string
By name:
SELECT parts.*
FROM parts
JOIN partgroup ON partgroup.partgid = parts.partgid
WHERE partgroup.parttype = 'A';
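In terms of the original question, "all substitutes for a given part" then becomes a self-join through the group. A minimal sketch, using example part 12 from the data above:
SELECT p2.*
FROM parts p1
JOIN parts p2 ON p2.partgid = p1.partgid
WHERE p1.partid = 12
  AND p2.partid <> p1.partid;  -- all other parts in part 12's group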

SQL Group by joining with time difference

I have a college project with a strong focus on the frontend, but I'm struggling with a SQL query (PostgreSQL) that needs to be executed by one of the backend endpoints.
The table I'm speaking of is the following:
id | todo_id | column_id | time_in_status
0  | 259190  | 3         | 0
1  | 259190  | 10300     | 30
2  | 259190  | 10001     | 60
3  | 259190  | 10600     | 90
4  | 259190  | 6         | 30
A good way to simplify it is to say it's a to-do organizer with vertical columns, where each column is represented by its column_id and each row is a task column-change event.
With all that said, what I need is to generate a view (or another, better way you might suggest) from this table that shows how long each task spent in each column_id. Also, for a given todo_id, column_id is not unique, so there could be multiple events for column 10300; the result should group them and sum their times.
For example, the table above would output a view like this:
id | todo_id | time_in_column_3 | time_in_column_10300 | time_in_column_10001 | ...
0  | 259190  | 0                | 30                   | 60                   | ...
-- crosstab() comes from the tablefunc extension:
-- CREATE EXTENSION IF NOT EXISTS tablefunc;
-- The 2-argument form is used so that repeated column_ids are summed
-- and each value lands in its matching column:
select *
from crosstab(
  $$select todo_id, column_id, sum(time_in_status)::int
    from t
    group by todo_id, column_id
    order by todo_id$$,
  $$values (3),(10300),(10001),(10600),(6)$$
)
as t(todo_id int, time_in_column_3 int, time_in_column_10300 int,
     time_in_column_10001 int, time_in_column_10600 int, time_in_column_6 int);
todo_id | time_in_column_3 | time_in_column_10300 | time_in_column_10001 | time_in_column_10600 | time_in_column_6
259190  | 0                | 30                   | 60                   | 90                   | 30
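If you'd rather not install the tablefunc extension, plain conditional aggregation gives the same result (PostgreSQL 9.4+ for FILTER); the columns still have to be listed by hand, just as with crosstab:
select todo_id,
       coalesce(sum(time_in_status) filter (where column_id = 3), 0)     as time_in_column_3,
       coalesce(sum(time_in_status) filter (where column_id = 10300), 0) as time_in_column_10300,
       coalesce(sum(time_in_status) filter (where column_id = 10001), 0) as time_in_column_10001,
       coalesce(sum(time_in_status) filter (where column_id = 10600), 0) as time_in_column_10600,
       coalesce(sum(time_in_status) filter (where column_id = 6), 0)     as time_in_column_6
from t
group by todo_id;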

How to extract characters from a string stored as json data and place them in a dynamic number of columns in SQL Server

I have a column in SQL Server that stores JSON data as a string, with all the braces and colons included.
My problem is to extract all the key/value pairs and store them in separate columns, with the key as the column header. What makes this challenging is that every record has a different number of key/value pairs.
For example, in the image below showing 3 records, the first record has 5 key/value pairs: EndUseCommunityMarket of 2, EndUseProvincialMarket of 0, and so on. The second record has 1 key/value pair, and the third record has two.
If I had to show how I want this in Excel, it would look like the sheet in the image.
I have seen some SQL code examples that do something similar, but for a fixed number of columns, whereas here it varies for every record.
I need a SQL statement that can achieve this, as I am working with thousands of records.
Below is the data copied from SQL Server:
catch_ext
{"NfdsFadMonitoring":{"EndUseEaten":1}}
{"NfdsFadMonitoring":{"EndUseCommunityMarket":3}}
{"NfdsFadMonitoring":{"SpeciesComment":"","EndUseCommunityMarket":2}}
{"NfdsFadMonitoring":{"SpeciesComment":"mix reef fis","EndUseEaten":31}}
{"NfdsFadMonitoring":{"SpeciesComment":"10 fish with a total of 18kg","EndUseCommunityMarket":0,"EndUseProvincialMarket":0,"EndUseUrbanMarket":8,"EndUseEaten":1,"EndUseGivenAway":1}}
{"NfdsFadMonitoring":{"SpeciesComment":"mix reef fis","EndUseEaten":18}}
I expect you don't want to dynamically create a table; instead, you probably want to create a property mapping table. Here is a quick overview of the design.
Object table -- this stores the base information about your object
============
ID -- unique id field for every object.
Name
Property types table -- this stores all the property types
====================
Property_Type_ID -- unique type id
Description -- describes property
Object2Property -- stores the values for each property
===============
ObjectID -- the object
Property_Type_ID -- the property type
Value -- the value.
Using a model like this lets your properties be as dynamic as you wish, but you don't have to create columns dynamically -- something that is hard and error-prone.
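A minimal DDL sketch of this three-table design (all names here are illustrative, not taken from the question):
CREATE TABLE Objects (
    ID   int PRIMARY KEY,              -- unique id field for every object
    Name varchar(100)
);
CREATE TABLE PropertyTypes (
    Property_Type_ID int PRIMARY KEY,  -- unique type id
    Description      varchar(100)      -- describes the property
);
CREATE TABLE Object2Property (
    ObjectID         int REFERENCES Objects (ID),
    Property_Type_ID int REFERENCES PropertyTypes (Property_Type_ID),
    Value            varchar(100)      -- the value, stored as text
);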
Using your specific example, the tables would look like this:
OBJECT
ID NAME
1 WAHOO
2 RED SNAPPER
3 KAWAKAWA
Property Types
ID DESC
1 EndUseCommunityMarket
2 EndUseProvincialMarket
3 EndUseUrbanMarket
4 EndUseEaten
5 EndUseGivenAway
6 Comment
Map
ObjID TypeID Value
1 1 2
1 2 0
1 3 0
1 4 0
1 5 0
2 2 50
3 3 8
3 5 1
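Reading the data back out of this model is then a plain three-table join (again using the illustrative names from the sketch above):
SELECT o.Name, pt.Description, m.Value
FROM Object2Property AS m
JOIN Objects AS o ON o.ID = m.ObjectID
JOIN PropertyTypes AS pt ON pt.Property_Type_ID = m.Property_Type_ID
ORDER BY o.Name, pt.Description;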
A. ROWS
Dynamic columns are a lot like rows.
You could use OPENJSON (Transact-SQL)
DECLARE @json2 NVARCHAR(4000) = N'{"NfdsFadMonitoring":{"SpeciesComment":"10 fish with a total of 18kg","EndUseCommunityMarket":0,"EndUseProvincialMarket":0,"EndUseUrbanMarket":8,"EndUseEaten":1,"EndUseGivenAway":1}}';
SELECT [key], value
FROM OPENJSON(@json2, 'lax $.NfdsFadMonitoring');
Output
key value
SpeciesComment 10 fish with a total of 18kg
EndUseCommunityMarket 0
EndUseProvincialMarket 0
EndUseUrbanMarket 8
EndUseEaten 1
EndUseGivenAway 1
Your inputs
CREATE TABLE ForEloga (Id int,Json nvarchar(max));
Insert into ForEloga Values
(1,'{"NfdsFadMonitoring":{"EndUseEaten":1}}'),
(2,'{"NfdsFadMonitoring":{"EndUseCommunityMarket":3}}'),
(3,'{"NfdsFadMonitoring":{"SpeciesComment":"","EndUseCommunityMarket":2}}'),
(4,'{"NfdsFadMonitoring":{"SpeciesComment":"mix reef fis","EndUseEaten":31}}'),
(5,'{"NfdsFadMonitoring":{"SpeciesComment":"10 fish with a total of 18kg","EndUseCommunityMarket":0,"EndUseProvincialMarket":0,"EndUseUrbanMarket":8,"EndUseEaten":1,"EndUseGivenAway":1}}'),
(6,'{"NfdsFadMonitoring":{"SpeciesComment":"mix reef fis","EndUseEaten":18}}');
SELECT Id, [key], value
FROM ForEloga CROSS APPLY OPENJSON(Json,'lax $.NfdsFadMonitoring')
Output
Id key value
1 EndUseEaten 1
2 EndUseCommunityMarket 3
3 SpeciesComment
3 EndUseCommunityMarket 2
4 SpeciesComment mix reef fis
4 EndUseEaten 31
5 SpeciesComment 10 fish with a total of 18kg
5 EndUseCommunityMarket 0
5 EndUseProvincialMarket 0
5 EndUseUrbanMarket 8
5 EndUseEaten 1
5 EndUseGivenAway 1
6 SpeciesComment mix reef fis
6 EndUseEaten 18
B. COLUMNS: CROSS APPLY WITH WITH
If you know all possible properties, then I recommend CROSS APPLY with WITH, as shown in Example 3 ("Join rows with JSON data stored in table cells using CROSS APPLY") of the OPENJSON (Transact-SQL) documentation.
SELECT store.title, location.street, location.lat, location.long
FROM store
CROSS APPLY OPENJSON(store.jsonCol, 'lax $.location')
WITH (
street varchar(500),
postcode varchar(500) '$.postcode',
lon int '$.geo.longitude',
lat int '$.geo.latitude'
) AS location
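Applied to the ForEloga table above, a sketch that types out just three of the known keys might look like this (keys missing from a given row simply come back as NULL):
SELECT f.Id, j.SpeciesComment, j.EndUseCommunityMarket, j.EndUseEaten
FROM ForEloga AS f
CROSS APPLY OPENJSON(f.Json, 'lax $.NfdsFadMonitoring')
WITH (
    SpeciesComment        nvarchar(100),
    EndUseCommunityMarket int,
    EndUseEaten           int
) AS j;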
Try this:
Table Schema:
CREATE TABLE #JsonValue(sp_name VARCHAR(100),catch_ext VARCHAR(1000))
INSERT INTO #JsonValue VALUES ('WAHOO','{"NfdsFadMonitoring":{"EndUseEaten":1}}')
INSERT INTO #JsonValue VALUES ('RUBY SNAPPER','{"NfdsFadMonitoring":{"EndUseCommunityMarket":3}}')
INSERT INTO #JsonValue VALUES ('KAWAKAWA','{"NfdsFadMonitoring":{"SpeciesComment":"","EndUseCommunityMarket":2}}')
INSERT INTO #JsonValue VALUES ('XXXXXXXX','{"NfdsFadMonitoring":{"SpeciesComment":"mix reef fis","EndUseEaten":31}}')
INSERT INTO #JsonValue VALUES ('YYYYYYYY','{"NfdsFadMonitoring":{"SpeciesComment":"10 fish with a total of 18kg","EndUseCommunityMarket":0,"EndUseProvincialMarket":0,"EndUseUrbanMarket":8,"EndUseEaten":1,"EndUseGivenAway":1}}')
INSERT INTO #JsonValue VALUES ('ZZZZZZZZZZ','{"NfdsFadMonitoring":{"SpeciesComment":"mix reef fis","EndUseEaten":18}}')
Query:
SELECT sp_name
,ISNULL(MAX(CASE WHEN [Key]='EndUseCommunityMarket' THEN Value END),'')EndUseCommunityMarket
,ISNULL(MAX(CASE WHEN [Key]='EndUseProvincialMarket' THEN Value END),'')EndUseProvincialMarket
,ISNULL(MAX(CASE WHEN [Key]='EndUseUrbanMarket' THEN Value END),'')EndUseUrbanMarket
,ISNULL(MAX(CASE WHEN [Key]='EndUseEaten' THEN Value END),'')EndUseEaten
,ISNULL(MAX(CASE WHEN [Key]='EndUseGivenAway' THEN Value END),'')EndUseGivenAway
FROM(
SELECT sp_name, [key], value
FROM #JsonValue CROSS APPLY OPENJSON(catch_ext,'$.NfdsFadMonitoring')
)D
GROUP BY sp_name
Output:
sp_name EndUseCommunityMarket EndUseProvincialMarket EndUseUrbanMarket EndUseEaten EndUseGivenAway
------------- --------------------- ---------------------- ----------------- ----------- ---------------
KAWAKAWA 2
RUBY SNAPPER 3
WAHOO 1
XXXXXXXX 31
YYYYYYYY 0 0 8 1 1
ZZZZZZZZZZ 18
Hope this will help you.

T-SQL: identify difference of x in integer sequence

I have a simple table; here is an example of what the values look like (after taking out confidential values).
What I need to do is count the number of instances where the difference between a Date_Nbr value and the one following it is greater than 2. I could imagine creating a second table where, for each Group and Person, an integer count of the number of instances is placed.
Is this even possible in T-SQL? I haven't shown any code because I'm at a total loss.
For those who need table definitions, it would be this:
Date_Nbr int not null,
Group varchar(5) not null,
Person varchar(5) not null
The values shown would exist within that table (let's call it Date_Seq).
I would be grateful for any ideas. Thank you.
Date_Nbr | Group | Person
1        | C     | A
4        | C     | A
5        | C     | A
8        | C     | A
10       | C     | A
11       | C     | A
13       | C     | A
14       | C     | A
15       | C     | A
P.S. --
I described it verbally above, but this is a visual description of what I hope to achieve, where "Count_Gaps" is the number of times a difference greater than 2 was found in the sequence of "Date_Nbr".
Group | Person | Count_Gaps
C     | A      | [integer value]
This finds all records that do not have a corresponding person+group record 1 or 2 higher:
SELECT
*
FROM <table> t1
LEFT JOIN <table> t2 ON t1.[Group] = t2.[Group] AND t1.Person = t2.Person AND (t1.Date_Nbr + 1) = t2.Date_Nbr
LEFT JOIN <table> t3 ON t1.[Group] = t3.[Group] AND t1.Person = t3.Person AND (t1.Date_Nbr + 2) = t3.Date_Nbr
WHERE t2.Date_Nbr IS NULL
AND t3.Date_Nbr IS NULL
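If window functions are available (SQL Server 2012+), LEAD can produce the Count_Gaps rollup directly. A sketch, assuming the table is named Date_Seq as in the question (note that, unlike the join above, the last Date_Nbr of each group/person is not counted, since it has no following value):
SELECT [Group], Person, COUNT(*) AS Count_Gaps
FROM (
    SELECT [Group], Person, Date_Nbr,
           LEAD(Date_Nbr) OVER (PARTITION BY [Group], Person
                                ORDER BY Date_Nbr) AS Next_Nbr
    FROM Date_Seq
) AS d
WHERE Next_Nbr - Date_Nbr > 2
GROUP BY [Group], Person;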

How to label a big set of “transitive groups” with a constraint?

EDIT after @NealB's solution: @NealB's solution is very, very fast compared with any other one, and makes this new question about "adding a constraint to improve performance" unnecessary. @NealB's solution needs no improvement: it runs in O(n) time and is very simple.
The problem of "labeling transitive groups with SQL" has an elegant solution using recursion and a CTE... but that solution consumes exponential time (!). I need to work with 10000 items: with 1000 items it needs 1 second; with 2000, 1 day...
Constraint: in my case it is possible to break the problem into pieces of ~100 items or less, but only to select one group of ~10 items and discard all the other ~90 labeled items...
Is there a generic algorithm to add and use this kind of "pre-selection" to reduce the quadratic O(N^2) time? Perhaps, as suggested by the comments and @wildplasser, to O(N log N) time; but with "pre-selection" I expect to reduce it to O(N).
(EDIT)
I tried an alternative algorithm, but it needs some improvement to be used as the solution here; or, to really increase performance (to O(N) time), it needs to use the "pre-selection".
The "pre-selection" (constraint) is based on a "super-set grouping"... Starting from the t1 table of the original "How to label 'transitive groups' with SQL?" question,
table T1
(the original T1 augmented by a "super-set grouping label" ssg, plus one more row)
ID1 | ID2 | ssg
1   | 2   | 1
1   | 5   | 1
4   | 7   | 1
7   | 8   | 1
9   | 1   | 1
10  | 11  | 2
So there are three groups,
g1: {1,2,5,9} because "1 t 2", "1 t 5" and "9 t 1"
g2: {4,7,8} because "4 t 7" and "7 t 8"
g3: {10,11} because "10 t 11"
The super-group is only an auxiliary grouping,
ssg1: {g1,g2}
ssg2: {g3}
If we have M super-group items and N total T1 items, the average group length will be less than N/M. We can also suppose (for my typical problem) that the maximum ssg length is ~N/M.
So the "label algorithm" needs to run only M times with ~N/M items each if it uses the ssg constraint.
An SQL-only solution appears to be a bit of a problem here. With the help of some procedural programming on top of SQL, the solution appears to be fairly simple and efficient. Here is a brief outline of a solution as it could be implemented using any procedural language invoking SQL.
Declare a table R with primary key ID, where ID corresponds to the same domain as ID1 and ID2 of table T1. Table R contains one other, non-key column: a Label number.
Populate table R with the range of values found in T1. Set Label to zero (no label).
Using your example data, the initial setup for R would look like:
Table R
ID Label
== =====
1 0
2 0
4 0
5 0
7 0
8 0
9 0
Using a host-language cursor plus an auxiliary counter, read each row from T1. Look up ID1 and ID2 in R. You will find one of four cases:
Case 1: ID1.Label == 0 and ID2.Label == 0
In this case neither of these IDs has been "seen" before: add 1 to the counter and then update both rows of R to the value of the counter: update R set R.Label = :counter where R.ID in (:ID1, :ID2)
Case 2: ID1.Label == 0 and ID2.Label <> 0
In this case ID1 is new, but ID2 has already been assigned a label. ID1 needs to be assigned the same label as ID2: update R set R.Label = :ID2.Label where R.ID = :ID1
Case 3: ID1.Label <> 0 and ID2.Label == 0
In this case ID2 is new, but ID1 has already been assigned a label. ID2 needs to be assigned the same label as ID1: update R set R.Label = :ID1.Label where R.ID = :ID2
Case 4: ID1.Label <> 0 and ID2.Label <> 0
In this case the row contains redundant information; both rows of R should contain the same Label value. If not, there is some sort of data integrity problem. Ahhh... not quite, see the edit...
EDIT I just realized that there are situations where both Label values here could be non-zero and different. If both are non-zero and different, then two Label groups need to be merged at this point. All you need to do is choose one Label and update the others to match, with something like: update R set R.Label = :ID1.Label where R.Label = :ID2.Label. Now both groups have been merged under the same Label value.
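If the procedural language is PL/pgSQL (as in the PostgreSQL code further below), the cursor logic above could be sketched roughly like this; the table names t1 and r, with columns as described, are assumptions, and this is an outline rather than a tuned implementation:
CREATE FUNCTION label_groups() RETURNS void AS $$
DECLARE
  rec record;
  l1 int;
  l2 int;
  counter int := 0;
BEGIN
  -- populate R with every distinct ID, initially unlabeled (Label = 0)
  INSERT INTO r (id, label)
  SELECT s.x, 0
  FROM (SELECT id1 AS x FROM t1 UNION SELECT id2 FROM t1) AS s;
  FOR rec IN SELECT id1, id2 FROM t1 LOOP
    SELECT label INTO l1 FROM r WHERE id = rec.id1;
    SELECT label INTO l2 FROM r WHERE id = rec.id2;
    IF l1 = 0 AND l2 = 0 THEN        -- case 1: neither ID seen before
      counter := counter + 1;
      UPDATE r SET label = counter WHERE id IN (rec.id1, rec.id2);
    ELSIF l1 = 0 THEN                -- case 2: only ID2 already labeled
      UPDATE r SET label = l2 WHERE id = rec.id1;
    ELSIF l2 = 0 THEN                -- case 3: only ID1 already labeled
      UPDATE r SET label = l1 WHERE id = rec.id2;
    ELSIF l1 <> l2 THEN              -- case 4: two labels collide, merge
      UPDATE r SET label = l1 WHERE label = l2;
    END IF;
  END LOOP;
END;
$$ LANGUAGE plpgsql;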
Upon completion of the cursor, table R will contain Label values needed to update T2.
Table R
ID Label
== =====
1 1
2 1
4 2
5 1
7 2
8 2
9 1
Process table T2 using something along the lines of: set T2.Label to R.Label where T2.ID1 = R.ID. The end result should be:
table T2
ID1 | ID2 | LABEL
1 | 2 | 1
1 | 5 | 1
4 | 7 | 2
7 | 8 | 2
9 | 1 | 1
This process is purely iterative and should scale to fairly large tables without difficulty.
I suggest you check out the disjoint-set data structure and use some general-purpose language for solving it:
http://en.wikipedia.org/wiki/Disjoint-set_data_structure
Traverse the graph, maybe run DFS or BFS from each node, then use the disjoint-set hint. I think this should work.
The @NealB solution is the fastest(!). See an example of a PostgreSQL implementation here.
Below is an example of another "brute force" algorithm, only for curiosity!
As @peter.petrov and @RBarryYoung suggested, some performance problems can be avoided by abandoning the CTE recursion... I fixed some issues in the basic labeler and, on top of it, added the constraint for grouping by a super-set label. This new transgroup1_loop() function is working!
PS: this solution still has performance limitations; please post your answer with a better one, or with some adaptation of this one.
-- DROP table transgroup1;
CREATE TABLE transgroup1 (
id serial NOT NULL PRIMARY KEY,
items integer[], -- two or more items in the transitive relationship
ssg_label varchar(12), -- the super-set grouping label
dels integer[] DEFAULT array[]::integer[]
);
INSERT INTO transgroup1(items,ssg_label) values
(array[1, 2],'1'),
(array[1, 5],'1'),
(array[4, 7],'1'),
(array[7, 8],'1'),
(array[9, 1],'1'),
(array[10, 11],'2');
-- or SELECT array[id1, id2], ssg_label FROM t1, with 10000 items
Then, with these two functions, we can solve the problem.
CREATE FUNCTION transgroup1_loop(p_ssg varchar, p_max_i integer DEFAULT 100)
RETURNS integer AS $funcBody$
DECLARE
cp_dels integer[];
i integer;
BEGIN
i:=1;
LOOP
UPDATE transgroup1
SET items = array_uunion(transgroup1.items,t2.items),
dels = transgroup1.dels || t2.id
FROM transgroup1 AS t1, transgroup1 AS t2
WHERE transgroup1.id=t1.id AND t1.ssg_label=$1 AND
t1.id>t2.id AND t1.items && t2.items;
cp_dels := array(
SELECT DISTINCT unnest(dels) FROM transgroup1
); -- ensures all items to delete
RAISE NOTICE '-- bug, repeating dels, item-%; % dels! %', i, array_length(cp_dels,1), array_to_string(cp_dels,';','*');
EXIT WHEN i>p_max_i OR array_length(cp_dels,1)=0;
DELETE FROM transgroup1
WHERE ssg_label=$1 AND id IN (SELECT unnest(cp_dels));
UPDATE transgroup1 SET dels=array[]::integer[];
i:=i+1;
END LOOP;
UPDATE transgroup1 -- only to beautify
SET items = ARRAY(SELECT unnest(items) ORDER BY 1 desc);
RETURN i;
END;
$funcBody$ LANGUAGE plpgsql VOLATILE;
To run it and see the results, you can use:
SELECT transgroup1_loop('1'); -- run with ssg-1 items only
SELECT transgroup1_loop('2'); -- run with ssg-2 items only
-- show all with a sequential group label:
SELECT *, dense_rank() over (ORDER BY id) AS group_label from transgroup1;
results:
id | items | ssg_label | dels | group_label
----+-----------+-----------+------+-------------
4 | {8,7,4} | 1 | {} | 1
5 | {9,5,2,1} | 1 | {} | 2
6 | {11,10} | 2 | {} | 3
PS: the function array_uunion() is the same as in the original,
CREATE FUNCTION array_uunion(anyarray,anyarray) RETURNS anyarray AS $$
-- ensures distinct items of a concatenation
SELECT ARRAY(SELECT unnest($1) UNION SELECT unnest($2))
$$ LANGUAGE sql IMMUTABLE;
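For example:
SELECT array_uunion(array[1,2,5], array[2,9]); -- returns {1,2,5,9} (element order is not guaranteed)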

List category/subcategory tree and display its sub-categories in the same row

I have a hierarchical table of regions and sub-regions, and I need to list a tree of regions and sub-regions (which is easy), but I also need a column that displays, for each region, all the ids of its sub-regions.
For example:
id  name     superiorId
-------------------------------
1   RJ       NULL
2   Tijuca   1
3   Leblon   1
4   Gavea    2
5   Humaita  2
6   Barra    4
I need the result to be something like:
id  name     superiorId  sub-regions
-----------------------------------------
1   RJ       NULL        2,3,4,5,6
2   Tijuca   1           4,5,6
3   Leblon   1           null
4   Gavea    2           6
5   Humaita  2           null
6   Barra    4           null
I have done that by creating a function that builds the list with STUFF() for a region row, but when I'm selecting all the regions of a country, for example, the query becomes really, really slow, since I execute the function once per region to get its children.
Does anybody know how to get this in an optimized way?
To be clear: the function returns all the sub-region ids as a string, separated by commas.
The function is:
CREATE FUNCTION getSubRegions (@RegionId int)
RETURNS TABLE
AS
RETURN
(
    SELECT STUFF((SELECT CAST(wine_reg.wine_reg_id AS varchar) + ','
                  FROM (SELECT wine_reg_id
                             , wine_reg_name
                             , wine_region_superior
                        FROM wine_region AS t1
                        WHERE wine_region_superior = @RegionId
                           OR EXISTS
                              (SELECT *
                               FROM wine_region AS t2
                               WHERE wine_reg_id = t1.wine_region_superior
                                 AND wine_region_superior = @RegionId)
                       ) wine_reg
                  ORDER BY wine_reg.wine_reg_name ASC
                  FOR XML PATH('')), 1, 0, '') AS Sons
)
GO
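For reference, a set-based alternative that avoids calling the function once per region is a single recursive CTE that enumerates every (ancestor, descendant) pair and concatenates once. A sketch against the simplified id/name/superiorId layout from the question (the table name regions is an assumption):
WITH descendants AS (
    -- every region paired with itself, then extended one level per step
    SELECT r.id AS root_id, r.id
    FROM regions AS r
    UNION ALL
    SELECT d.root_id, c.id
    FROM descendants AS d
    JOIN regions AS c ON c.superiorId = d.id
)
SELECT r.id, r.name, r.superiorId,
       STUFF((SELECT ',' + CAST(d.id AS varchar(10))
              FROM descendants AS d
              WHERE d.root_id = r.id
                AND d.id <> r.id
              ORDER BY d.id
              FOR XML PATH('')), 1, 1, '') AS [sub-regions]
FROM regions AS r;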
When we used to make these concatenated lists in the database, we took a similar approach to what you are doing at first.
Then, when we looked for speed, we made them into CLR functions:
http://msdn.microsoft.com/en-US/library/a8s4s5dz(v=VS.90).aspx
Now our database is only responsible for storing and retrieving data; this sort of thing lives in the data layer of the application.