Merging Complicated Tables

Merging Complicated Tables - sql

I'm trying to merge tables where rows correspond to a many:1 relationship with "real" things.
I'm writing a blackjack simulator that stores game history in a database with a new set of tables generated each run. The tables are really more like templates, since each game gets its own set of the 3 mutable tables (players, hands, and matches). Here's the layout, where suff is a user-specified suffix to use for the current run:
- cards
- id INTEGER PRIMARY KEY
- cardValue INTEGER NOT NULL
- suit INTEGER NOT NULL
- players_suff
- whichPlayer INTEGER PRIMARY KEY
- aiType TEXT NOT NULL
- hands_suff
- id BIGSERIAL PRIMARY KEY
- whichPlayer INTEGER REFERENCES players_suff(whichPlayer) *
- whichHand BIGINT NOT NULL
- thisCard INTEGER REFERENCES cards(id)
- matches_suff
- id BIGSERIAL PRIMARY KEY
- whichGame INTEGER NOT NULL
- dealersHand BIGINT NOT NULL
- whichPlayer INTEGER REFERENCES players_suff(whichPlayer)
- thisPlayersHand BIGINT NOT NULL **
- playerResult INTEGER NOT NULL --AKA who won
Only one cards table is created because its values are constant.
So after running the simulator twice you might have:
hands_firstrun
players_firstrun
matches_firstrun
hands_secondrun
players_secondrun
matches_secondrun
I want to be able to combine these tables if you used the same AI parameters for both of those runs (i.e. players_firstrun and players_secondrun are exactly the same). The problem is that the way I'm inserting hands makes this really messy: whichHand can't be a BIGSERIAL because the relationship of hands_suff rows to "actual hands" is many:1. matches_suff is handled the same way because a blackjack "game" actually consists of a set of games: the set of pairs of each player vs. the dealer. So for 3 players, you actually have 3 rows for each round.
Currently I select the largest whichHand in the table, add 1 to it, then insert all of the rows for one hand. I'm worried this "query-and-insert" will be really slow if I'm merging 2 tables that might both be arbitrarily huge.
When I'm merging tables, I feel like I should be able to (entirely in SQL) query the largest values in whichHand and whichGame once then use them combine the tables, incrementing them for each unique whichHand and whichGame in the table being merged.
(I saw this question, but it doesn't handle using a generated ID in 2 different places). I'm using Postgres and it's OK if the answer is specific to it.
* sadly postgres doesn't allow parameterized table names so this had to be done by manual string substitution. Not the end of the world since the program isn't web-facing and no one except me is likely to ever bother with it, but the SQL injection vulnerability does not make me happy.
** matches_suff(whichPlayersHand) was originally going to reference hands_suff(whichHand) but foreign keys must reference unique values. whichHand isn't unique because a hand is made up of multiple rows, with each row "holding" one card. To query for a hand you select all of those rows with the same value in whichHand. I couldn't think of a more elegant way to do this without resorting to arrays.
EDIT:
This is what I have now:
thomas=# \dt
List of relations
Schema | Name | Type | Owner
--------+----------------+-------+--------
public | cards | table | thomas
public | hands_first | table | thomas
public | hands_second | table | thomas
public | matches_first | table | thomas
public | matches_second | table | thomas
public | players_first | table | thomas
public | players_second | table | thomas
(7 rows)
thomas=# SELECT * FROM hands_first
thomas-# \g
id | whichplayer | whichhand | thiscard
----+-------------+-----------+----------
1 | 0 | 0 | 6
2 | 0 | 0 | 63
3 | 0 | 0 | 41
4 | 1 | 1 | 76
5 | 1 | 1 | 23
6 | 0 | 2 | 51
7 | 0 | 2 | 29
8 | 0 | 2 | 2
9 | 0 | 2 | 92
10 | 0 | 2 | 6
11 | 1 | 3 | 101
12 | 1 | 3 | 8
(12 rows)
thomas=# SELECT * FROM hands_second
thomas-# \g
id | whichplayer | whichhand | thiscard
----+-------------+-----------+----------
1 | 0 | 0 | 78
2 | 0 | 0 | 38
3 | 1 | 1 | 24
4 | 1 | 1 | 18
5 | 1 | 1 | 95
6 | 1 | 1 | 40
7 | 0 | 2 | 13
8 | 0 | 2 | 84
9 | 0 | 2 | 41
10 | 1 | 3 | 29
11 | 1 | 3 | 34
12 | 1 | 3 | 56
13 | 1 | 3 | 52
thomas=# SELECT * FROM matches_first
thomas-# \g
id | whichgame | dealershand | whichplayer | thisplayershand | playerresult
----+-----------+-------------+-------------+-----------------+--------------
1 | 0 | 0 | 1 | 1 | 1
2 | 1 | 2 | 1 | 3 | 2
(2 rows)
thomas=# SELECT * FROM matches_second
thomas-# \g
id | whichgame | dealershand | whichplayer | thisplayershand | playerresult
----+-----------+-------------+-------------+-----------------+--------------
1 | 0 | 0 | 1 | 1 | 0
2 | 1 | 2 | 1 | 3 | 2
(2 rows)
I'd like to combine them to have:
hands_combined table:
id | whichplayer | whichhand | thiscard
----+-------------+-----------+----------
1 | 0 | 0 | 6 --Seven of Spades
2 | 0 | 0 | 63 --Queen of Spades
3 | 0 | 0 | 41 --Three of Clubs
4 | 1 | 1 | 76
5 | 1 | 1 | 23
6 | 0 | 2 | 51
7 | 0 | 2 | 29
8 | 0 | 2 | 2
9 | 0 | 2 | 92
10 | 0 | 2 | 6
11 | 1 | 3 | 101
12 | 1 | 3 | 8
13 | 0 | 4 | 78
14 | 0 | 4 | 38
15 | 1 | 5 | 24
16 | 1 | 5 | 18
17 | 1 | 5 | 95
18 | 1 | 5 | 40
19 | 0 | 6 | 13
20 | 0 | 6 | 84
21 | 0 | 6 | 41
22 | 1 | 7 | 29
23 | 1 | 7 | 34
24 | 1 | 7 | 56
25 | 1 | 7 | 52
matches_combined table:
id | whichgame | dealershand | whichplayer | thisplayershand | playerresult
----+-----------+-------------+-------------+-----------------+--------------
1 | 0 | 0 | 1 | 1 | 1
2 | 1 | 2 | 1 | 3 | 2
3 | 2 | 4 | 1 | 5 | 0
4 | 3 | 6 | 1 | 7 | 2
Each value of "thiscard" represents a playing card in the range [1..104]--52 playing cards with an extra bit representing if it's face up or face down. I didn't post the actual table for space reasons.
So player 0 (aka the dealer) had a hand of (Seven of Spades, Queen of Spaces, 3 of Clubs) in the first game.

I think you're not using PostgreSQL the way it's intended to be used, plus your table design may not be suitable for what you want to achieve. Whilst it was difficult to understand what you want your solution to achieve, I wrote this, which seems to solve everything you want using a handful of tables only, and functions that return recordsets for simulating your requirement for individual runs. I used Enums and complex types to illustrate some of the features that you may wish to harness from the power of PostgreSQL.
Also, I'm not sure what parameterized table names are (I have never seen anything like it in any RDBMS), but PostgreSQL does allow something perfectly suitable: recordset returning functions.
CREATE TYPE card_value AS ENUM ('1', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K');
CREATE TYPE card_suit AS ENUM ('Clubs', 'Diamonds', 'Hearts', 'Spades');
CREATE TYPE card AS (value card_value, suit card_suit, face_up bool);
CREATE TABLE runs (
run_id bigserial NOT NULL PRIMARY KEY,
run_date timestamptz NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE players (
run_id bigint NOT NULL REFERENCES runs,
player_no int NOT NULL, -- 0 can be assumed as always the dealer
ai_type text NOT NULL,
PRIMARY KEY (run_id, player_no)
);
CREATE TABLE matches (
run_id bigint NOT NULL REFERENCES runs,
match_no int NOT NULL,
PRIMARY KEY (run_id, match_no)
);
CREATE TABLE hands (
hand_id bigserial NOT NULL PRIMARY KEY,
run_id bigint NOT NULL REFERENCES runs,
match_no int NOT NULL,
hand_no int NOT NULL,
player_no int NOT NULL,
UNIQUE (run_id, match_no, hand_no),
FOREIGN KEY (run_id, match_no) REFERENCES matches,
FOREIGN KEY (run_id, player_no) REFERENCES players
);
CREATE TABLE deals (
deal_id bigserial NOT NULL PRIMARY KEY,
hand_id bigint NOT NULL REFERENCES hands,
card card NOT NULL
);
CREATE OR REPLACE FUNCTION players(int) RETURNS SETOF players AS $$
SELECT * FROM players WHERE run_id = $1 ORDER BY player_no;
$$ LANGUAGE SQL;
CREATE OR REPLACE FUNCTION matches(int) RETURNS SETOF matches AS $$
SELECT * FROM matches WHERE run_id = $1 ORDER BY match_no;
$$ LANGUAGE SQL;
CREATE OR REPLACE FUNCTION hands(int) RETURNS SETOF hands AS $$
SELECT * FROM hands WHERE run_id = $1 ORDER BY match_no, hand_no;
$$ LANGUAGE SQL;
CREATE OR REPLACE FUNCTION hands(int, int) RETURNS SETOF hands AS $$
SELECT * FROM hands WHERE run_id = $1 AND match_no = $2 ORDER BY hand_no;
$$ LANGUAGE SQL;
CREATE OR REPLACE FUNCTION winner_player (int, int) RETURNS int AS $$
SELECT player_no
FROM hands
WHERE run_id = $1 AND match_no = $2
ORDER BY hand_no DESC
LIMIT 1
$$ LANGUAGE SQL;
CREATE OR REPLACE FUNCTION next_player_no (int) RETURNS int AS $$
SELECT CASE WHEN EXISTS (SELECT 1 FROM runs WHERE run_id = $1) THEN
COALESCE((SELECT MAX(player_no) FROM players WHERE run_id = $1), 0) + 1 END
$$ LANGUAGE SQL;
CREATE OR REPLACE FUNCTION next_match_no (int) RETURNS int AS $$
SELECT CASE WHEN EXISTS (SELECT 1 FROM runs WHERE run_id = $1) THEN
COALESCE((SELECT MAX(match_no) FROM matches WHERE run_id = $1), 0) + 1 END
$$ LANGUAGE SQL;
CREATE OR REPLACE FUNCTION next_hand_no (int) RETURNS int AS $$
SELECT CASE WHEN EXISTS (SELECT 1 FROM runs WHERE run_id = $1) THEN
COALESCE((SELECT MAX(hand_no) + 1 FROM hands WHERE run_id = $1), 0) END
$$ LANGUAGE SQL;
CREATE OR REPLACE FUNCTION card_to_int (card) RETURNS int AS $$
SELECT ((SELECT enumsortorder::int-1 FROM pg_enum WHERE enumtypid = 'card_suit'::regtype AND enumlabel = ($1).suit::name) * 13 +
(SELECT enumsortorder::int-1 FROM pg_enum WHERE enumtypid = 'card_value'::regtype AND enumlabel = ($1).value::name) + 1) *
CASE WHEN ($1).face_up THEN 2 ELSE 1 END
$$ LANGUAGE SQL; -- SELECT card_to_int(('3', 'Spades', false))
CREATE OR REPLACE FUNCTION int_to_card (int) RETURNS card AS $$
SELECT ((SELECT enumlabel::card_value FROM pg_enum WHERE enumtypid = 'card_value'::regtype AND enumsortorder = ((($1-1)%13)+1)::real),
(SELECT enumlabel::card_suit FROM pg_enum WHERE enumtypid = 'card_suit'::regtype AND enumsortorder = (((($1-1)/13)::int%4)+1)::real),
$1 > (13*4))::card
$$ LANGUAGE SQL; -- SELECT i, int_to_card(i) FROM generate_series(1, 13*4*2) i
CREATE OR REPLACE FUNCTION deal_cards(int, int, int, int[]) RETURNS TABLE (player_no int, hand_no int, card card) AS $$
WITH
hand AS (
INSERT INTO hands (run_id, match_no, player_no, hand_no)
VALUES ($1, $2, $3, next_hand_no($1))
RETURNING hand_id, player_no, hand_no),
mydeals AS (
INSERT INTO deals (hand_id, card)
SELECT hand_id, int_to_card(card_id)::card AS card
FROM hand, UNNEST($4) card_id
RETURNING hand_id, deal_id, card
)
SELECT h.player_no, h.hand_no, d.card
FROM hand h, mydeals d
$$ LANGUAGE SQL;
CREATE OR REPLACE FUNCTION deals(int) RETURNS TABLE (deal_id bigint, hand_no int, player_no int, card int) AS $$
SELECT d.deal_id, h.hand_no, h.player_no, card_to_int(d.card)
FROM hands h
JOIN deals d ON (d.hand_id = h.hand_id)
WHERE h.run_id = $1
ORDER BY d.deal_id;
$$ LANGUAGE SQL;
INSERT INTO runs DEFAULT VALUES; -- Add first run
INSERT INTO players VALUES (1, 0, 'Dealer'); -- dealer always zero
INSERT INTO players VALUES (1, next_player_no(1), 'Player 1');
INSERT INTO matches VALUES (1, next_match_no(1)); -- First match
SELECT * FROM deal_cards(1, 1, 0, ARRAY[6, 63, 41]);
SELECT * FROM deal_cards(1, 1, 1, ARRAY[76, 23]);
SELECT * FROM deal_cards(1, 1, 0, ARRAY[51, 29, 2, 92, 6]);
SELECT * FROM deal_cards(1, 1, 1, ARRAY[101, 8]);
INSERT INTO matches VALUES (1, next_match_no(1)); -- Second match
SELECT * FROM deal_cards(1, 2, 0, ARRAY[78, 38]);
SELECT * FROM deal_cards(1, 2, 1, ARRAY[24, 18, 95, 40]);
SELECT * FROM deal_cards(1, 2, 0, ARRAY[13, 84, 41]);
SELECT * FROM deal_cards(1, 2, 1, ARRAY[29, 34, 56, 52]);
SELECT * FROM deals(1); -- This is the output you need (hands_combined table)
-- This view can be used to retrieve the list of all winning hands
CREATE OR REPLACE VIEW winning_hands AS
SELECT DISTINCT ON (run_id, match_no) *
FROM hands
ORDER BY run_id, match_no, hand_no DESC;
SELECT * FROM winning_hands;

Wouldn't using the UNION operator work?
For the hands relation:
SELECT * FROM hands_first
UNION ALL
SELECT * FROM hands_second
For the matches relation:
SELECT * FROM matches_first
UNION ALL
SELECT * FROM matches_second
As a more long term solution I'd consider restructuring the DB because it will quickly become unmanageable with this schema. Why not improve normalization by introducing a games table?
In other words Games have many Matches, matches have many players for each game and players have many hands for each match.
I'd recommend drawing the UML for the entity relationships on paper (http://dawgsquad.googlecode.com/hg/docs/database_images/Database_Model_Diagram(Title).png), then improving the schema so it can be queried using normal SQL operators.
Hope this helps.
EDIT:
In that case you can use a subquery on the union of both tables with the rownumber() PG function to represent the row number:
SELECT
row_number() AS id,
whichplayer,
whichhand,
thiscard
FROM
(
SELECT * FROM hands_first
UNION ALL
SELECT * FROM hands_second
);
The same principle would apply to the matches table. Obviously this doesn't scale well to even a small number of tables, so would prioritize normalizing your schema.
Docs on some PG functions: http://www.postgresql.org/docs/current/interactive/functions-window.html

to build new table with all rows of two tables, do:
CREATE TABLE hands AS
select 1 as hand, id, whichplayer, whichhand, thiscard
from hands_first
union all
select 2 as hand, id, whichplayer, whichhand, thiscard
from hands_second
after that, to insert data of new matche, create sequence with start on current last + 1
CREATE SEQUENCE matche START 3;
before insert read sequence value, and use it in inserts:
SELECT nextval('matche');

Your database structure is not great, and I know for sure it is not scalable approach creating tables on fly. There are performance drawbacks creating physical tables instead of using an existing structure. I suggest you refactor your db structure if can.
You can however use the UNION operator to merge your data.

Related

Find best match in tree given a combination of multiple keys

I have a structure / tree that looks similar to this.
CostType is mandatory and can exist by itself, but it can have a parent ProfitType or Unit and other CostTypes as children.
There can only be duplicate Units. Other cannot appear multiple times in the structure.
| ID | name | parent_id | ProfitType | CostType | Unit |
| -: | ------------- | --------: |
| 1 | Root | (NULL) |
| 2 | 1 | 1 | 300 | | |
| 3 | 1-1 | 2 | | 111 | |
| 4 | 1-1-1 | 3 | | | 8 |
| 5 | 1-2 | 2 | | 222 | |
| 6 | 1-2-1 | 5 | | 333 | |
| 7 | 1-2-1-1 | 6 | | | 8 |
| 8 | 1-2-1-2 | 6 | | | 9 |
Parameters | should RETURN |
(300,111,8) | 4 |
(null,111,8) | 4 |
(null,null,8) | first match, 4 |
(null,222,8) | best match, 5 |
(null,333,null) | 6 |
I am at a loss on how I could create a function that receives (ProfitType, CostType, Unit) and return the best matching ID from the structure.

This isn't giving exactly the answers you provided as example, but see my comment above - if (null,222,8) should be 7 to match how (null,333,8) returns 4 then this is correct.
Also note that I formatted this using temp tables instead of as a function, I don't want to trip a schema change audit so I posted what I have as temp tables, I can rewrite it as a function Monday when my DBA is available, but I thought you might need it before the weekend. Just edit the "DECLARE #ProfitType int = ..." lines to the values you want to test
I also put in quite a few comments because the logic is tricky, but if they aren't enough leave a comment and I can expand my explanation
/*
ASSUMPTIONS:
A tree can be of arbitrary depth, but will not exceed the recursion limit (defaults to 100)
All trees will include at least 1 CostType
All trees will have at most 1 ProfitType
CostType can appear multiple times in a traversal from root to leaf (can units?)
*/
SELECT *
INTO #Temp
FROM (VALUES (1,'Root',NULL, NULL, NULL, NULL)
, (2,'1', 1, 300, NULL, NULL)
, (3,'1-1', 2, NULL, 111, NULL)
, (4,'1-1-1', 3, NULL, NULL, 8)
, (5,'1-2', 2, NULL, 222, NULL)
, (6,'1-2-1', 5, NULL, 333, NULL)
, (7,'1-2-1-1', 6, NULL, NULL, 8)
, (8,'1-2-1-2', 6, NULL, NULL, 9)
) as TempTable(ID, RName, Parent_ID, ProfitType, CostType, UnitID)
--SELECT * FROM #Temp
DECLARE #ProfitType int = NULL--300
DECLARE #CostType INT = 333 --NULL --111
DECLARE #UnitID INT = NULL--8
--SELECT * FROM #Temp
;WITH cteMatches as (
--Start with all nodes that match one criteria, default a score of 100
SELECT N.ID as ReportID, *, 100 as Score, 1 as Depth
FROM #Temp AS N
WHERE N.CostType= #CostType OR N.ProfitType=#ProfitType OR N.UnitID = #UnitID
), cteEval as (
--This is a recursive CTE, it has a (default) limit of 100 recursions
--, but that can be raised if your trees are deeper than 100 nodes
--Start with the base case
SELECT M.ReportID, M.RName, M.ID ,M.Parent_ID, M.Score
, M.Depth, M.ProfitType , M.CostType , M.UnitID
FROM cteMatches as M
UNION ALL
--This is the recursive part, add to the list of matches the match when
--its immediate parent is also considered. For that match increase the score
--if the parent contributes another match. Also update the ID of the match
--to the parent's IDs so recursion can keep adding if more matches are found
SELECT M.ReportID, M.RName, N.ID ,N.Parent_ID
, M.Score + CASE WHEN N.CostType= #CostType
OR N.ProfitType=#ProfitType
OR N.UnitID = #UnitID THEN 100 ELSE 0 END as Score
, M.Depth + 1, N.ProfitType , N.CostType , N.UnitID
FROM cteEval as M INNER JOIN #Temp AS N on M.Parent_ID = N.ID
)SELECT TOP 1 * --Drop the "TOP 1 *" to see debugging info (runners up)
FROM cteEval
ORDER BY SCORE DESC, DEPTH
DROP TABLE #Temp

I'm sorry I don't have enough rep to comment.
You'll have to define "best answer" (for example, why isn't the answer to null,222,8 7 or null instead of 5?), but here's the approach I'd use:
Derive a new table where ProfitType and CostType are listed explicitly instead of only by inheritance. I would approach that by using a cursor (how awful, I know) and following the parent_id until a ProfitType and CostType is found -- or the root is reached. This presumes an unlimited amount of child/grandchild levels for parent_id. If there is a limit, then you can instead use N self joins where N is the number of parent_id levels allowed.
Then you run multiple queries against the derived table. The first query would be for an exact match (and then exit if found). Then next query would be for the "best" partial match (then exit if found), followed by queries for 2nd best, 3rd best, etc. until you've exhausted your "best" match criteria.
If you need nested parent CostTypes to be part of the "best match" criteria, then I would make duplicate entries in the derived table for each row that has multiple CostTypes with a CostType "level". level 1 is the actual CostType. level 2 is that CostType's parent, level 3 etc. Then your best match queries would return multiple rows and you'd need to pick the row with the lowest level (which is the closest parent/grandparent).

SQLite - select rows by existance of date in given intervals

I have an sqlite3 database with two tables that looks like this:
Table: Position
| pk | name | ...
------------------
| 1 | pos1 | ...
| 2 | pos2 | ...
Table: Status
| pk_position | datetime | ...
----------------------
| 1 | 20170201 | ...
| 1 | 20170204 | ...
| 1 | 20170205 | ...
| 1 | 20170207 | ...
| 2 | 20170204 | ...
| 2 | 20170201 | ...
| 2 | 20170208 | ...
Where datetime is "YYYYMMDD" (i.e. %Y%m%d) and pk_position is a ForeginKey of the table Position.
I need the following: given two intervals of time int1 = [day1:day2] and int2 = [day3:day4], I want a unique selection of pk_position for which there exists at least 1 row with datetime contained in each interval.
Examples (using example tables):
int1 = ["20170201" : "20170202"] and int2 = ["20170202" : "20170203"] => (null)
int1 = ["20170204" : "20170205"] and int2 = ["20170205" : "20170206"] => 1
int1 = ["20170203" : "20170204"] and int2 = ["20170204" : "20170205"] => 1, 2
I tried to use the EXISTS but I can't find any smart way to achieve this.
Thanks!
OBS: I tried to keep the question as broad as possible. In reality in all my use cases the intervals will have the form [day1 : day2], [day2 : day3] (i.e. they share a common boundary), just like all examples. If doing this for a common boundary is easier, I'll be happy with a solution to this simpler problem.

I think this will get you there (I don't have a way to test SQLite, so it wouldn't surprise me if there wasn't a missing comma, or the semicolons are wrong, or something like that, but this is the logic in SQL, adapted to SQLite as best I can without final testing):
CREATE TEMP TABLE
_time_intervals
(
int1 TEXT,
int2 TEXT,
int1Start TEXT,
int1End TEXT,
int2Start TEXT,
int2End TEXT
);
INSERT INTO
_time_intervals(int1, int2)
VALUES ( '["20170201" : "20170202"]', '["20170202" : "20170203"]' );
UPDATE _time_intervals
SET int1Start = SUBSTR(int1, 3, 8),
SET int1End = SUBSTR(int1, 16, 8),
SET int2Start = SUBSTR(int2, 3, 8),
SET int2End = SUBSTR(int2, 16, 8)
SELECT
pk_position
FROM
(
SELECT
s.pk_position,
1 AS int1_counter,
0 AS int2_counter
FROM
Status AS s
WHERE
s.datetime BETWEEN (SELECT int1Start FROM _time_intervals) AND (SELECT int1End FROM _time_intervals)
UNION ALL
SELECT
s.pk_position,
0 AS int1_counter,
1 AS int2_counter
FROM
Status AS s
WHERE
s.datetime BETWEEN (SELECT int2Start FROM _time_intervals) AND (SELECT int2End FROM _time_intervals)
) AS sq
GROUP BY
pk_position
HAVING
SUM(int1_counter) > 0
AND
SUM(int2_counter) > 0;
DROP TABLE _time_intervals;

Find sql connected component between many to many entities

I have two basic entities: financial plan and purchase request. Theese two entities are in many-to-many relationship:
CREATE TABLE FinancialPlan
(
ID int NOT NULL,
PRIMARY KEY (ID)
);
CREATE TABLE PurchaseRequest
(
ID int NOT NULL,
PRIMARY KEY (ID)
);
CREATE TABLE FP_PR
(
FP_ID FOREIGN KEY REFERENCES FinancialPlan(ID),
PR_ID FOREIGN KEY REFERENCES PurchaseRequest(ID)
);
Problem: find all requests, related to specified plan, and all plans, related to requests, related to specified plan, ...
Model could be represented as a graph, where each node represents a plan, or a request, and each edge represents a relationship, then the problem could be rephrased as find connected component, which specified node belongs to.
Example:
Plan Request FP_PR
ID | ID | FP_ID|PR_ID|
----| ----| -----|-----|
1 | 1 | 1 |1 |
2 | 2 | 2 |1 |
3 | 3 | 2 |2 |
4 | 3 |2 |
5 | 4 |2 |
5 |3 |
Find connected component of finplan ID=1
Desired output:
FP_ID | PR_ID|
------+------+
1 | 1 |
2 | 1 |
2 | 2 |
3 | 2 |
4 | 2 |
I am currently doing it recursively on app side, which may generate to many requests and hang the DB server, could this be done with some recursive DB approach?
Visualization:
Starting entity is marked by arrow.
Desired output is circled.

SQL Server solution
I guess the main problem is you need to compare by PR_ID then FP_ID. So in recursive part there must be a CASE statement. On 1 run we take data by FP_ID on second by PR_ID and etc with the help of modulo.
DECLARE #fp int = 1
;WITH cte AS (
SELECT f.FP_ID,
f.PR_ID,
1 as lev
FROM #FP_PR f
WHERE f.FP_id = #fp
UNION ALL
SELECT f.FP_ID,
f.PR_ID,
lev+1
FROM cte c
CROSS JOIN #FP_PR f -- You can use INNER JOIN instead
WHERE CASE (lev+1)%2 WHEN 0 THEN f.PR_ID WHEN 1 THEN f.FP_ID END = CASE (lev+1)%2 WHEN 0 THEN c.PR_ID WHEN 1 THEN c.FP_ID END
AND NOT (f.PR_ID = c.PR_ID AND f.FP_ID = c.FP_ID)
)
SELECT *
FROM cte
Output:
FP_ID PR_ID lev
1 1 1
2 1 2
2 2 3
3 2 4
4 2 4

Insert into multiple tables

A brief explanation on the relevant domain part:
A Category is composed of four data:
Gender (Male/Female)
Age Division (Mighty Mite to Master)
Belt Color (White to Black)
Weight Division (Rooster to Heavy)
So, Male Adult Black Rooster forms one category. Some combinations may not exist, such as mighty mite black belt.
An Athlete fights Athletes of the same Category, and if he classifies, he fights Athletes of different Weight Divisions (but of the same Gender, Age and Belt).
To the modeling. I have a Category table, already populated with all combinations that exists in the domain.
CREATE TABLE Category (
[Id] [int] IDENTITY(1,1) NOT NULL,
[AgeDivision_Id] [int] NULL,
[Gender] [int] NULL,
[BeltColor] [int] NULL,
[WeightDivision] [int] NULL
)
A CategorySet and a CategorySet_Category, which forms a many to many relationship with Category.
CREATE TABLE CategorySet (
[Id] [int] IDENTITY(1,1) NOT NULL,
[Championship_Id] [int] NOT NULL,
)
CREATE TABLE CategorySet_Category (
[CategorySet_Id] [int] NOT NULL,
[Category_Id] [int] NOT NULL
)
Given the following result set:
| Options_Id | Championship_Id | AgeDivision_Id | BeltColor | Gender | WeightDivision |
|------------|-----------------|----------------|-----------|--------|----------------|
1. | 2963 | 422 | 15 | 7 | 0 | 0 |
2. | 2963 | 422 | 15 | 7 | 0 | 1 |
3. | 2963 | 422 | 15 | 7 | 0 | 2 |
4. | 2963 | 422 | 15 | 7 | 0 | 3 |
5. | 2964 | 422 | 15 | 8 | 0 | 0 |
6. | 2964 | 422 | 15 | 8 | 0 | 1 |
7. | 2964 | 422 | 15 | 8 | 0 | 2 |
8. | 2964 | 422 | 15 | 8 | 0 | 3 |
Because athletes may fight two CategorySets, I need CategorySet and CategorySet_Category to be populated in two different ways (it can be two queries):
One Category_Set for each row, with one CategorySet_Category pointing to the corresponding Category.
One Category_Set that groups all WeightDivisions in one CategorySet in the same AgeDivision_Id, BeltColor, Gender. In this example, only BeltColor varies.
So the final result would have a total of 10 CategorySet rows:
| Id | Championship_Id |
|----|-----------------|
| 1 | 422 |
| 2 | 422 |
| 3 | 422 |
| 4 | 422 |
| 5 | 422 |
| 6 | 422 |
| 7 | 422 |
| 8 | 422 |
| 9 | 422 | /* groups different Weight Division for BeltColor 7 */
| 10 | 422 | /* groups different Weight Division for BeltColor 8 */
And CategorySet_Category would have 16 rows:
| CategorySet_Id | Category_Id |
|----------------|-------------|
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 5 | 5 |
| 6 | 6 |
| 7 | 7 |
| 8 | 8 |
| 9 | 1 | /* groups different Weight Division for BeltColor 7 */
| 9 | 2 | /* groups different Weight Division for BeltColor 7 */
| 9 | 3 | /* groups different Weight Division for BeltColor 7 */
| 9 | 4 | /* groups different Weight Division for BeltColor 7 */
| 10 | 5 | /* groups different Weight Division for BeltColor 8 */
| 10 | 6 | /* groups different Weight Division for BeltColor 8 */
| 10 | 7 | /* groups different Weight Division for BeltColor 8 */
| 10 | 8 | /* groups different Weight Division for BeltColor 8 */
I have no idea how to insert into CategorySet, grab it's generated Id, then use it to insert into CategorySet_Category
I hope I've made my intentions clear.
I've also created a SQLFiddle.
Edit 1: I commented in Jacek's answer that this would run only once, but this is false. It will run a couple of times a week. I have the option to run as SQL Command from C# or a stored procedure. Performance is not crucial.
Edit 2: Jacek suggested using SCOPE_IDENTITY to return the Id. Problem is, SCOPE_IDENTITY returns only the last inserted Id, and I insert more than one row in CategorySet.
Edit 3: Answer to #FutbolFan who asked how the FakeResultSet is retrieved.
It is a table CategoriesOption (Id, Price_Id, MaxAthletesByTeam)
And tables CategoriesOptionBeltColor, CategoriesOptionAgeDivision, CategoriesOptionWeightDivison, CategoriesOptionGender. Those four tables are basically the same (Id, CategoriesOption_Id, Value).
The query look like this:
SELECT * FROM CategoriesOption co
LEFT JOIN CategoriesOptionAgeDivision ON
CategoriesOptionAgeDivision.CategoriesOption_Id = co.Id
LEFT JOIN CategoriesOptionBeltColor ON
CategoriesOptionBeltColor.CategoriesOption_Id = co.Id
LEFT JOIN CategoriesOptionGender ON
CategoriesOptionGender.CategoriesOption_Id = co.Id
LEFT JOIN CategoriesOptionWeightDivision ON
CategoriesOptionWeightDivision.CategoriesOption_Id = co.Id

The solution described here will work correctly in multi-user environment and when destination tables CategorySet and CategorySet_Category are not empty.
I used schema and sample data from your SQL Fiddle.
First part is straight-forward
(ab)use MERGE with OUTPUT clause.
MERGE can INSERT, UPDATE and DELETE rows. In our case we need only to INSERT. 1=0 is always false, so the NOT MATCHED BY TARGET part is always executed. In general, there could be other branches, see docs. WHEN MATCHED is usually used to UPDATE; WHEN NOT MATCHED BY SOURCE is usually used to DELETE, but we don't need them here.
This convoluted form of MERGE is equivalent to simple INSERT, but unlike simple INSERT its OUTPUT clause allows to refer to the columns that we need.
MERGE INTO CategorySet
USING
(
SELECT
FakeResultSet.Championship_Id
,FakeResultSet.Price_Id
,FakeResultSet.MaxAthletesByTeam
,Category.Id AS Category_Id
FROM
FakeResultSet
INNER JOIN Category ON
Category.AgeDivision_Id = FakeResultSet.AgeDivision_Id AND
Category.Gender = FakeResultSet.Gender AND
Category.BeltColor = FakeResultSet.BeltColor AND
Category.WeightDivision = FakeResultSet.WeightDivision
) AS Src
ON 1 = 0
WHEN NOT MATCHED BY TARGET THEN
INSERT
(Championship_Id
,Price_Id
,MaxAthletesByTeam)
VALUES
(Src.Championship_Id
,Src.Price_Id
,Src.MaxAthletesByTeam)
OUTPUT inserted.id AS CategorySet_Id, Src.Category_Id
INTO CategorySet_Category (CategorySet_Id, Category_Id)
;
FakeResultSet is joined with Category to get Category.id for each row of FakeResultSet. It is assumed that Category has unique combinations of AgeDivision_Id, Gender, BeltColor, WeightDivision.
In OUTPUT clause we need columns from both source and destination tables. The OUTPUT clause in simple INSERT statement doesn't provide them, so we use MERGE here that does.
The MERGE query above would insert 8 rows into CategorySet and insert 8 rows into CategorySet_Category using generated IDs.
Second part
needs temporary table. I'll use a table variable to store generated IDs.
DECLARE #T TABLE (
CategorySet_Id int
,AgeDivision_Id int
,Gender int
,BeltColor int);
We need to remember the generated CategorySet_Id together with the combination of AgeDivision_Id, Gender, BeltColor that caused it.
MERGE INTO CategorySet
USING
(
SELECT
FakeResultSet.Championship_Id
,FakeResultSet.Price_Id
,FakeResultSet.MaxAthletesByTeam
,FakeResultSet.AgeDivision_Id
,FakeResultSet.Gender
,FakeResultSet.BeltColor
FROM
FakeResultSet
GROUP BY
FakeResultSet.Championship_Id
,FakeResultSet.Price_Id
,FakeResultSet.MaxAthletesByTeam
,FakeResultSet.AgeDivision_Id
,FakeResultSet.Gender
,FakeResultSet.BeltColor
) AS Src
ON 1 = 0
WHEN NOT MATCHED BY TARGET THEN
INSERT
(Championship_Id
,Price_Id
,MaxAthletesByTeam)
VALUES
(Src.Championship_Id
,Src.Price_Id
,Src.MaxAthletesByTeam)
OUTPUT
inserted.id AS CategorySet_Id
,Src.AgeDivision_Id
,Src.Gender
,Src.BeltColor
INTO #T(CategorySet_Id, AgeDivision_Id, Gender, BeltColor)
;
The MERGE above would group FakeResultSet as needed and insert 2 rows into CategorySet and 2 rows into #T.
Then join #T with Category to get Category.IDs:
INSERT INTO CategorySet_Category (CategorySet_Id, Category_Id)
SELECT
TT.CategorySet_Id
,Category.Id AS Category_Id
FROM
#T AS TT
INNER JOIN Category ON
Category.AgeDivision_Id = TT.AgeDivision_Id AND
Category.Gender = TT.Gender AND
Category.BeltColor = TT.BeltColor
;
This will insert 8 rows into CategorySet_Category.

Here is not the full answer, but direction which you can use to solve this:
1st query:
select row_number() over(order by t, Id) as n, Championship_Id
from (
select distinct 0 as t, b.Id, a.Championship_Id
from FakeResultSet as a
inner join
Category as b
on
a.AgeDivision_Id=b.AgeDivision_Id and
a.Gender=b.Gender and
a.BeltColor=b.BeltColor and
a.WeightDivision=b.WeightDivision
union all
select distinct 1, BeltColor, Championship_Id
from FakeResultSet
) as q
2nd query:
select q2.CategorySet_Id, c.Id as Category_Id from (
select row_number() over(order by t, Id) as CategorySet_Id, Id, BeltColor
from (
select distinct 0 as t, b.Id, null as BeltColor
from FakeResultSet as a
inner join
Category as b
on
a.AgeDivision_Id=b.AgeDivision_Id and
a.Gender=b.Gender and
a.BeltColor=b.BeltColor and
a.WeightDivision=b.WeightDivision
union all
select distinct 1, BeltColor, BeltColor
from FakeResultSet
) as q
) as q2
inner join
Category as c
on
(q2.BeltColor is null and q2.Id=c.Id)
OR
(q2.BeltColor = c.BeltColor)
of course this will work only for empty CategorySet and CategorySet_Category tables, but you can use select coalese(max(Id), 0) from CategorySet to get current number and add it to row_number, thus you will get real ID which will be inserted into CategorySet row for second query

What I do when I run into these situations is to create one or many temporary tables with row_number() over clauses giving me identities on the temporary tables. Then I check for the existence of each record in the actual tables, and if they exist update the temporary table with the actual record ids. Finally I run a while exists loop on the temporary table records missing the actual id and insert them one at a time, after the insert I update the temporary table record with the actual ids. This lets you work through all the data in a controlled manner.

##IDENTITY is your friend to the 2nd part of question
https://msdn.microsoft.com/en-us/library/ms187342.aspx
and
Best way to get identity of inserted row?
Some API (drivers) returns int from update() function, i.e. ID if it is "insert". What API/environment do You use?
I don't understand 1st problem. You should not insert identity column.

Below query will give final result For CategorySet rows:
SELECT
ROW_NUMBER () OVER (PARTITION BY Championship_Id ORDER BY Championship_Id) RNK,
Championship_Id
FROM
(
SELECT
Championship_Id
,BeltColor
FROM #FakeResultSet
UNION ALL
SELECT
Championship_Id,BeltColor
FROM #FakeResultSet
GROUP BY Championship_Id,BeltColor
)BASE

Selecting a record based on integer being in an array field

I have a database of houses. Within the houses mssql database record is a field called areaID. A house could be in multiple areas so an entry could be as follows in the database:
+---------+----------------------+-----------+-------------+-------+
| HouseID | AreaID | HouseType | Description | Title |
+---------+----------------------+-----------+-------------+-------+
| 21 | 17, 32, 53 | B | data | data |
+---------+----------------------+-----------+-------------+-------+
| 23 | 23, 73 | B | data | data |
+---------+----------------------+-----------+-------------+-------+
| 24 | 53, 12, 153, 72, 153 | B | data | data |
+---------+----------------------+-----------+-------------+-------+
| 23 | 23, 53 | B | data | data |
+---------+----------------------+-----------+-------------+-------+
If I open a page that called for houses only in area 53 how would I search for it. I know in MySQL you can use find_in_SET but I am using Microsoft SQL Server 2005.

If your formatting is EXACTLY
N1, N2 (e.g.) one comma and space between each N
Then use this WHERE clause
WHERE ', ' + AreaID + ',' LIKE '%, 53,%'
The addition of the prefix and suffix makes every number, anywhere in the list, consistently wrapped by comma-space and suffixed by comma. Otherwise, you may get false positives with 53 appearing in part of another number.
Note
A LIKE expression will be anything but fast, since it will always scan the entire table.
You should consider normalizing the data into two tables:
Tables become
House
+---------+----------------------+----------+
| HouseID | HouseType | Description | Title |
+---------+----------------------+----------+
| 21 | B | data | data |
| 23 | B | data | data |
| 24 | B | data | data |
| 23 | B | data | data |
+---------+----------------------+----------+
HouseArea
+---------+-------
| HouseID | AreaID
+---------+-------
| 21 | 17
| 21 | 32
| 21 | 53
| 23 | 23
| 23 | 73
..etc
Then you can use
select * from house h
where exists (
select *
from housearea a
where h.houseid=a.houseid and a.areaid=53)

2 options, change the id's of AreaId so that you can use the & operator OR create a table that links the House and Area's....

What datatype is AreaID?
If it's a text field you could something like
WHERE (
AreaID LIKE '53,%' -- Covers: multi number seq w/ 53 at beginning
OR AreaID LIKE '% 53,%' -- Covers: multi number seq w/ 53 in middle
OR AreaID LIKE '% 53' -- Covers: multi number seq w/ 53 at end
OR AreaID = '53' -- Covers: single number seq w/ only 53
)
Note: I haven't used SQL-Server in some time, so I'm not sure about the operators. PostgreSQL has a regex function, which would be better at condensing that WHERE statement. Also, I'm not sure if the above example would include numbers like 253 or 531; it shouldn't but you still need to verify.
Furthermore, there are a bunch of functions that iterate through arrays, so storing it as an array vs text might be better. Finally, this might be a good example to use a stored procedure, so you can call your homebrewed function instead of cluttering your SQL.

Use a Split function to convert comma-separated values into rows.
CREATE TABLE Areas (AreaID int PRIMARY KEY);
CREATE TABLE Houses (HouseID int PRIMARY KEY, AreaIDList varchar(max));
GO
INSERT INTO Areas VALUES (84);
INSERT INTO Areas VALUES (24);
INSERT INTO Areas VALUES (66);
INSERT INTO Houses VALUES (1, '84,24,66');
INSERT INTO Houses VALUES (2, '24');
GO
CREATE FUNCTION dbo.Split (#values varchar(512)) RETURNS table
AS
RETURN
WITH Items (Num, Start, [Stop]) AS (
SELECT 1, 1, CHARINDEX(',', #values)
UNION ALL
SELECT Num + 1, [Stop] + 1, CHARINDEX(',', #values, [Stop] + 1)
FROM Items
WHERE [Stop] > 0
)
SELECT Num, SUBSTRING(#values, Start,
CASE WHEN [Stop] > 0 THEN [Stop] - Start ELSE LEN(#values) END) Value
FROM Items;
GO
CREATE VIEW dbo.HouseAreas
AS
SELECT h.HouseID, s.Num HouseAreaNum,
CASE WHEN s.Value NOT LIKE '%[^0-9]%'
THEN CAST(s.Value AS int)
END AreaID
FROM Houses h
CROSS APPLY dbo.Split(h.AreaIDList) s
GO
SELECT DISTINCT h.HouseID, ha.AreaID
FROM Houses h
INNER JOIN HouseAreas ha ON ha.HouseID = h.HouseID
WHERE ha.AreaID = 24

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas