database model structure - sql

I have a table groups. Each group has a type stored in group_types (buyers, sellers, referee). Only when a group is of type buyer does it have another, more specialized type, such as electrical or mechanical.
I'm a bit puzzled about how to store this in a database.
Can someone suggest a database structure?
thanks

Store your group_types as a hierarchical table (with nested sets or parent-child model):
Parent-child:
typeid parent name
1 0 Buyers
2 0 Sellers
3 0 Referee
4 1 Electrical
5 1 Mechanic
SELECT *
FROM mytable
WHERE group IN
(
SELECT typeid
FROM group_types
START WITH
typeid = 1
CONNECT BY
parent = PRIOR typeid
)
will select all buyers in Oracle.
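For databases without CONNECT BY, the same parent-child lookup can be sketched with a recursive common table expression. This is only a minimal sketch and not part of the original answer: it runs the SQL through Python's built-in sqlite3 module on an in-memory database, and the table mytable with its columns id and group_id is a hypothetical stand-in for whatever table references group_types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE group_types (typeid INTEGER PRIMARY KEY, parent INTEGER, name TEXT);
    INSERT INTO group_types VALUES
        (1, 0, 'Buyers'), (2, 0, 'Sellers'), (3, 0, 'Referee'),
        (4, 1, 'Electrical'), (5, 1, 'Mechanic');
    -- Hypothetical referencing table; group_id points at group_types.typeid.
    CREATE TABLE mytable (id INTEGER PRIMARY KEY, group_id INTEGER);
    INSERT INTO mytable VALUES (10, 1), (11, 2), (12, 4), (13, 5), (14, 3);
""")

def rows_in_subtree(root_typeid):
    """Return ids of mytable rows whose type is root_typeid or any descendant."""
    sql = """
        WITH RECURSIVE subtree(typeid) AS (
            SELECT typeid FROM group_types WHERE typeid = ?
            UNION ALL
            SELECT g.typeid
            FROM group_types g
            JOIN subtree s ON g.parent = s.typeid
        )
        SELECT id FROM mytable
        WHERE group_id IN (SELECT typeid FROM subtree)
        ORDER BY id
    """
    return [row[0] for row in conn.execute(sql, (root_typeid,))]

buyer_rows = rows_in_subtree(1)  # rows typed Buyers, Electrical or Mechanic
```

The anchor member of the CTE plays the role of START WITH, and the join in the recursive member plays the role of CONNECT BY parent = PRIOR typeid.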
Nested sets:
typeid lower upper Name
1 1 2 Buyers
2 3 3 Sellers
3 4 4 Referee
4 1 1 Electrical
5 2 2 Mechanic
SELECT *
FROM group_types
JOIN mytable
ON group BETWEEN lower AND upper
WHERE typeid = 1
will select all buyers in any database.
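Note that in this scheme the referencing table has to store a position from the lower/upper numbering rather than a typeid. A minimal sketch of the query, assuming Python's built-in sqlite3 module, columns renamed to lower_bound and upper_bound for readability, and a hypothetical mytable with columns id and group_pos:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE group_types (
        typeid INTEGER PRIMARY KEY,
        lower_bound INTEGER, upper_bound INTEGER, name TEXT
    );
    INSERT INTO group_types VALUES
        (1, 1, 2, 'Buyers'), (2, 3, 3, 'Sellers'), (3, 4, 4, 'Referee'),
        (4, 1, 1, 'Electrical'), (5, 2, 2, 'Mechanic');
    -- Hypothetical referencing table; group_pos holds a position in the
    -- lower/upper numbering, not a typeid.
    CREATE TABLE mytable (id INTEGER PRIMARY KEY, group_pos INTEGER);
    INSERT INTO mytable VALUES (10, 1), (11, 2), (12, 3), (13, 4);
""")

def rows_for_type(typeid):
    """Return ids of mytable rows whose position falls inside the type's range."""
    sql = """
        SELECT m.id
        FROM group_types g
        JOIN mytable m ON m.group_pos BETWEEN g.lower_bound AND g.upper_bound
        WHERE g.typeid = ?
        ORDER BY m.id
    """
    return [row[0] for row in conn.execute(sql, (typeid,))]

buyer_rows = rows_for_type(1)  # range 1..2 covers Electrical and Mechanic
```

Because Buyers spans the range 1 to 2, a single BETWEEN picks up both subtypes without any recursion.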
Nested sets can be implemented anywhere and perform better, provided you don't need hierarchical ordering or frequent updates on group_types.
Parent-child is easy to implement in Oracle and SQL Server and takes a little more effort in MySQL. It allows easy structure changes and hierarchical ordering.
See this article in my blog on how to implement it in MySQL:
Hierarchical queries in MySQL

You could possibly store additional types like, buyer_mechanical or buyer_electrical.

You could try:
Group
group_id
group_name
group_parent_id
with entries (1, buyers, 0), (2, sellers, 0), (3, referee, 0), (4, electrical, 1), (5, mechanical, 1)
This has the advantage of being infinitely scalable, so each subgroup can have as many subgroups as you want.

Typically, you have extension tables. These are simply additional tables in your schema that hold additional information, linked to the main table by some type of key.
For example let's say your main table is:
People
PersonId int, PK
GroupTypeId int, FK to GroupTypes
Name varchar(100)
GroupTypes
GroupTypeId int, PK
GroupTypeName varchar(20)
BuyerTypes
BuyerTypeId int, PK
BuyerTypeName varchar(20)
BuyerData
PersonId int, FK
BuyerTypeId int FK
====
Additionally, the BuyerData would have a composite primary key (PK) on PersonId and BuyerTypeId
When pulling Buyer data out, you could use a query like
SELECT *
FROM People P
INNER JOIN BuyerData BD on (P.PersonId = BD.PersonId)
INNER JOIN BuyerTypes BT on (BD.BuyerTypeId = BT.BuyerTypeId)

grouptype: ID, Name ('buyers', 'sellers', 'referee')
group: GroupTypeID, ID, Name ('electrical' and 'mechanical' if grouptypeid == 'buyers')
contact: GroupTypeID (NOT NULL), GroupID (NULL), other attributes
Table Group is populated with records for GroupTypes as required.
Contact.GroupID can be NULL since a GroupType need not have any Groups.
UI has to take care of Group selection. You can have a trigger check the group/type logic.
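The trigger check mentioned above could look roughly like this. It is a sketch under assumptions, not a definitive implementation: it uses Python's built-in sqlite3 module and SQLite trigger syntax, the table name grp is a hypothetical replacement for Group (a reserved word), and only inserts are covered, so updates would need a similar trigger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE grouptype (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- "group" is a reserved word, so this sketch calls the table grp.
    CREATE TABLE grp (
        id INTEGER PRIMARY KEY,
        grouptypeid INTEGER NOT NULL REFERENCES grouptype(id),
        name TEXT NOT NULL
    );
    CREATE TABLE contact (
        id INTEGER PRIMARY KEY,
        grouptypeid INTEGER NOT NULL REFERENCES grouptype(id),
        groupid INTEGER REFERENCES grp(id)   -- NULL when the type has no groups
    );
    INSERT INTO grouptype VALUES (1, 'buyers'), (2, 'sellers'), (3, 'referee');
    INSERT INTO grp VALUES (1, 1, 'electrical'), (2, 1, 'mechanical');

    -- Reject a contact whose group belongs to a different group type.
    CREATE TRIGGER contact_group_check BEFORE INSERT ON contact
    BEGIN
        SELECT RAISE(ABORT, 'group does not belong to group type')
        WHERE NEW.groupid IS NOT NULL
          AND NOT EXISTS (SELECT 1 FROM grp
                          WHERE id = NEW.groupid
                            AND grouptypeid = NEW.grouptypeid);
    END;
""")

def try_insert(grouptypeid, groupid):
    """Insert a contact; return False if the trigger (or SQLite) rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO contact (grouptypeid, groupid) VALUES (?, ?)",
                (grouptypeid, groupid),
            )
        return True
    except sqlite3.Error:
        return False
```

With this in place, a buyer contact in the electrical group is accepted, a seller contact without a group is accepted, and a seller contact pointing at the electrical group is rejected.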

Related

DB: How to set up a many to many table(s) to handle multiple selectable conditions

I am working on a search filter for a website that will help users find a venue(for get-togethers and ceremonies) that meets their needs. Filters would include such things as: style, amenities, event type, etc. Multiple options in a category can apply to a venue, so a user can select multiple options from style, amenities and event type categories when searching.
My issue is in how I should approach the table design in the database. Currently I have a Venue table with a unique id and basic information, and a number of tables representing each category (style, amenities, etc) where they contain an id and name field.
I know that I need an intermediary table to hold foreign keys, so each option applicable to a category is associated to the venue.
Option 1: Create for each category table a many to many intermediary table with foreign keys to that category and the venue.
Option 2: Create one large intermediary table with foreign keys for every category, as well as the Venue
i.e.
fk_venue
fk_style
fk_amenities
...
I am trying to decide which is more efficient and less of a problem to code for. Option 1 would require a query against each table, which may become complicated to work with, whereas option 2 seems easier to query but might need a much larger number of records to handle, for example, a venue with many amenities AND event types.
This doesn't seem like a new problem but I have had trouble finding resources that detail how best to approach this. We are currently using MSSQL for the DB and are building the site using .net core.
Go with option one. Create a join table to record the many-to-many relationships of each available feature of a venue. Option 2 is very wasteful in terms of storage. Consider a case where you have a venue with only one amenity, when 50 amenities types are available. Also, as I understand what you are suggesting for option 2, you would have to update your database design each time you add an amenity, event_type, or style. That would be a very difficult thing support wise.
In the case of Option 1, some of the tables would be:
Table Name: venue_amenities
Columns: venue_id, amenity_id
Table Name: venue_event_types
Columns: venue_id, event_type_id
Table Name: venue_styles
Columns: venue_id, style_id
When you query everything with a filter, you could query it like:
select distinct
v.venue_id
from venues v
inner join venue_amenities va on v.venue_id = va.venue_id
inner join venue_event_types vet on v.venue_id = vet.venue_id
inner join venue_styles vs on v.venue_id = vs.venue_id
where va.amenity_id in ([selected amenities])
and vet.event_type_id in ([selected event types])
and vs.style_id in ([selected styles])
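To make the [selected ...] placeholders concrete, here is a small sketch of how the IN lists might be filled with bound parameters. It is an illustration only: it uses Python's built-in sqlite3 module instead of MSSQL and .NET, leaves out the event type table for brevity, and the sample venues, ids and the function name find_venues are all made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE venues (venue_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE venue_amenities (venue_id INTEGER, amenity_id INTEGER);
    CREATE TABLE venue_styles (venue_id INTEGER, style_id INTEGER);
    INSERT INTO venues VALUES (1, 'Barn'), (2, 'Loft'), (3, 'Garden');
    INSERT INTO venue_amenities VALUES (1, 10), (1, 11), (2, 11), (3, 12);
    INSERT INTO venue_styles VALUES (1, 20), (2, 21), (3, 20);
""")

def find_venues(amenity_ids, style_ids):
    """Venues with at least one selected amenity AND at least one selected style.

    Assumes both lists are non-empty.
    """
    # One "?" per selected id, so values are bound rather than concatenated.
    a_marks = ",".join("?" * len(amenity_ids))
    s_marks = ",".join("?" * len(style_ids))
    sql = f"""
        SELECT DISTINCT v.venue_id
        FROM venues v
        JOIN venue_amenities va ON va.venue_id = v.venue_id
        JOIN venue_styles vs ON vs.venue_id = v.venue_id
        WHERE va.amenity_id IN ({a_marks})
          AND vs.style_id IN ({s_marks})
        ORDER BY v.venue_id
    """
    return [row[0] for row in conn.execute(sql, [*amenity_ids, *style_ids])]
```

Note that IN matches a venue that has any one of the selected options in a category. If a venue must have every selected amenity, the query would need a GROUP BY with a HAVING count instead.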
Option 3: You could start out with a meta data design. This would allow you to have multiple records per item or entity.
Often these things evolve with the development of tasks, or the evolution of the process and learning the data or the customer understanding some of the finer details that are drawn out as time goes on.
I've seen similar things where people design for hashtags or white lists, searching for that might get you closer to what you are looking for. Here is a working example to get you started.
declare @venue as table(
VenueID int identity(1,1) not null primary key clustered
, Name_ nvarchar(255) not null
, Address_ nvarchar(255) null
);
declare @venueType as table (
VenueTypeID int identity(1,1) not null primary key clustered
, VenueType nvarchar(255) not null
);
declare @venueStuff as table (
VenueStuffID int identity(1,1) not null primary key clustered
, VenueID int not null -- constraint back to venueid
, VenueTypeID int not null -- constraint to dim or lookup table for ... attribute types
, AttributeValue nvarchar(255) not null
);
insert into @venue (Name_)
select 'Bob''s Funhouse';
insert into @venueStuff (VenueID, VenueTypeID, AttributeValue)
select 1, 1, 'Scarrrrry' union all
select 1, 2, 'Food Available' union all
select 1, 3, 'Game tables provided' union all
select 1, 4, 'Creepy';
-- identity values 1-4 are assigned in this order and must line up with the VenueTypeID values used above
insert into @venueType (VenueType)
select 'Haunted House Theme' union all
select 'Concessions' union all
select 'Gaming' union all
select 'post apocalyptic';
select a.Name_
, b.AttributeValue
, c.VenueType
from @venue a
join @venueStuff b
on a.VenueID = b.VenueID
join @venueType c
on c.VenueTypeID = b.VenueTypeID

Merge and order rows

I have a table in the following structure. I am writing a query to get all item_ids where key_name='topic' and key_string_value='investing', which is the simple part.
select item_id from table where key_name='topic' and key_string_value='investing'
But then, for all the item_ids returned above, I want to order them by the values set for each item_id in key_name='importance' and key_name='product'. The table structure is making this very difficult, as I am not an SQL expert. Any help would be appreciated.
item_id key_name key_string_value Key_float_value
1 topic investing null
1 importance null 500
1 product A null
1 product B null
2 topic Starting null
2 product B null
2 importance null 300
2 topic retail null
3 importance null 400
3 topic investing null
3 product C null
4 topic Starting null
4 topic investing null
4 importance null 400
4 product D null
@Schwern is right: your structure should be normalized, and the names should be better too. All this makes me think: homework.
The answer to the homework question is a self join, and looks like this:
select t1.item_id , imp.key_float_value, prd.key_string_value
from [table] t1
LEFT OUTER JOIN [table] imp on imp.item_id = t1.item_id and imp.key_name='importance'
LEFT OUTER JOIN [table] prd on prd.item_id = t1.item_id and prd.key_name='product'
where t1.key_name='topic' and t1.key_string_value='investing'
ORDER BY imp.key_float_value, prd.key_string_value
The square brackets around [table] are there because using the keyword table as a table name requires the name to be delimited. T-SQL uses square brackets; other databases use double quotes (").
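To see what the self join returns on the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. The table name item_attributes is an assumption, chosen so that the name needs no delimiting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE item_attributes (
    item_id INTEGER, key_name TEXT, key_string_value TEXT, key_float_value REAL)""")
conn.executemany(
    "INSERT INTO item_attributes VALUES (?, ?, ?, ?)",
    [
        (1, "topic", "investing", None), (1, "importance", None, 500),
        (1, "product", "A", None), (1, "product", "B", None),
        (2, "topic", "Starting", None), (2, "product", "B", None),
        (2, "importance", None, 300), (2, "topic", "retail", None),
        (3, "importance", None, 400), (3, "topic", "investing", None),
        (3, "product", "C", None),
        (4, "topic", "Starting", None), (4, "topic", "investing", None),
        (4, "importance", None, 400), (4, "product", "D", None),
    ],
)

# Same shape as the answer's query: one alias per attribute being pulled in.
rows = conn.execute("""
    SELECT t1.item_id, imp.key_float_value, prd.key_string_value
    FROM item_attributes t1
    LEFT JOIN item_attributes imp
           ON imp.item_id = t1.item_id AND imp.key_name = 'importance'
    LEFT JOIN item_attributes prd
           ON prd.item_id = t1.item_id AND prd.key_name = 'product'
    WHERE t1.key_name = 'topic' AND t1.key_string_value = 'investing'
    ORDER BY imp.key_float_value, prd.key_string_value
""").fetchall()
```

Items 3 and 4 come first (importance 400, ordered by product C then D), followed by item 1, which appears twice because it has two products.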
You have a very poorly designed table that will be slow and difficult to work with. SQL is not a key/value store; it works on rows, columns and relationships. Rather than fight it, I would suggest either using a NoSQL database, which is easier to use and works more like normal programming data structures, or redesigning the table.
Here's the redesign I would suggest.
CREATE TABLE item (
id INTEGER PRIMARY KEY,
importance INTEGER DEFAULT 0
);
CREATE TABLE item_topics (
item_id INTEGER REFERENCES item(id),
topic TEXT NOT NULL
);
CREATE TABLE item_products (
item_id INTEGER REFERENCES item(id),
product TEXT NOT NULL
);
The item itself and any scalar (i.e. single-value) attributes go into one table. Anything that can be a list (products and topics) needs its own table relating each item to its elements. If this seems clunky, that's because it is, but that's how SQL works.
To find all items whose topic is investing, you have to join on the item_topics table.
SELECT item.id
FROM item
JOIN item_topics ON item.id = item_topics.item_id
WHERE topic = 'investing'
Then to order them, add ORDER BY item.importance.

User to location mapping with country state and city in the same table

I have a user table that has among others the fields CityId, StateId, CountryId. I was wondering if it was a good idea to store them[City, State, Country] in separate tables and put their respective ids in the User table or put all the three entities in one table.
While the former is conventional, I am concerned about the extra tables to join and so would want to store all these three different location types in one table like so
RowId - unique row id
LocationType - 1 for City, 2 for state, etc
ActualLocation - Can be a city name if the locationType is 1 and so on..
RowId LocationType ActualLocation
1 1 Waltham
2 1 Yokohama
3 2 Delaware
4 2 Wyoming
5 3 US
6 3 Japan
the problem is I am only able to get the city name for all three fields using a join like this
select L.ActualLocation as CityName,
L.ActualLocation as StateName,
L.ActualLocation as CountryName
from UserTable U,
AllLocations L
WHERE
(L.ID = U.City and L.LocationType= 1)
AND
(L.ID = U.State and L.LocationType = 2)
What worked best for us was to have a country table (a totally separate table, which can store other country-related information), a state table (ditto), and then a city table with IDs pointing to the other tables.
CREATE TABLE Country (CountryID int, Name varchar(50))
CREATE TABLE State (StateID int, CountryID int, Name varchar(50))
CREATE TABLE City (CityID int, StateID int, Name varchar(50))
This way you can enforce referential integrity using standard database functions and add additional information about each entity without having a bunch of blank columns or 'special' values.
You actually need to select from your location table three times - so you will still have the joins:
select L1.ActualLocation as CityName,
L2.ActualLocation as StateName,
L3.ActualLocation as CountryName
from UserTable U,
AllLocations L1,
AllLocations L2,
AllLocations L3
WHERE
(L1.ID = U.City and L1.LocationType= 1)
AND
(L2.ID = U.State and L2.LocationType = 2)
AND
(L3.ID = U.Country and L3.LocationType = 3)
HOWEVER
Depending on what you want to do with this, you might want to think about the model. You probably want a separate table that would contain the locations "Springfield Missouri" and "Springfield Illinois". Depending on how "well" you want to manage this data, you would need to manage the states and countries as separate inter-related reference data (see, for example, ISO 3166 part 2). That is most likely overkill for you, though, and it might be easiest just to store the text of the location with the user. That is not "pure" modeling, but it is much simpler for simple needs; just pulling the "word" out into a separate table doesn't really give you much other than complex queries.

What is the simplest way to sub-query a variable number of rows into fields of the parent query?

PeopleTBL
NameID int - unique
Name varchar
Data: 1,joe
2,frank
3,sam
HobbyTBL
HobbyID int - unique
HobbyName varchar
Data: 1,skiing
2,swimming
HobbiesTBL
NameID int
HobbyID int
Data: 1,1
2,1
2,2
The app defines 0-2 Hobbies per NameID.
What is the simplest way to query the Hobbies into fields retrieved with "Select * from PeopleTBL"?
Result desired based on above data:
NameID Name Hobby1 Hobby2
1 joe skiing
2 frank skiing swimming
3 sam
I'm not sure if I understand correctly, but if you want to fetch all the hobbies for a person in one row, the following query might be useful (MySQL):
SELECT NameID, Name, GROUP_CONCAT(HobbyName) AS Hobbies
FROM PeopleTBL
LEFT JOIN HobbiesTBL USING (NameID) -- LEFT JOIN keeps people without hobbies
LEFT JOIN HobbyTBL USING (HobbyID)
GROUP BY NameID, Name
Hobbies column will contain all hobbies of a person separated by ,.
See documentation for GROUP_CONCAT for details.
I don't know which engine you are using, so I've provided an example for MySQL (I don't know which other SQL engines support this).
Select P.NameId, P.Name
, Min( Case When H2.HobbyId = 1 Then H.HobbyName End ) As Hobby1
, Min( Case When H2.HobbyId = 2 Then H.HobbyName End ) As Hobby2
From HobbyTbl As H
Join HobbiesTbl As H2
On H2.HobbyId = H.HobbyId
Join PeopleTbl As P
On P.NameId = H2.NameId
Group By P.NameId, P.Name
What you are seeking is called a crosstab query. As long as the columns are static, you can use the above solution. However, if you want to build the columns dynamically, you need to build the SQL statement in middle-tier code or use a reporting tool.
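Since the app allows at most two hobbies per person, a static crosstab can also fill Hobby1 and Hobby2 without tying each column to a specific HobbyID. The following is a sketch of that idea, not one of the answers above: it uses Python's built-in sqlite3 module and relies on MIN and MAX, so the two slots are filled in alphabetical order and the approach does not extend beyond two hobbies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PeopleTBL (NameID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE HobbyTBL (HobbyID INTEGER PRIMARY KEY, HobbyName TEXT);
    CREATE TABLE HobbiesTBL (NameID INTEGER, HobbyID INTEGER);
    INSERT INTO PeopleTBL VALUES (1, 'joe'), (2, 'frank'), (3, 'sam');
    INSERT INTO HobbyTBL VALUES (1, 'skiing'), (2, 'swimming');
    INSERT INTO HobbiesTBL VALUES (1, 1), (2, 1), (2, 2);
""")

PIVOT_SQL = """
    SELECT p.NameID, p.Name,
           MIN(h.HobbyName) AS Hobby1,
           -- Only fill the second slot when there really is a second hobby.
           CASE WHEN COUNT(h.HobbyID) > 1 THEN MAX(h.HobbyName) END AS Hobby2
    FROM PeopleTBL p
    LEFT JOIN HobbiesTBL ph ON ph.NameID = p.NameID
    LEFT JOIN HobbyTBL h ON h.HobbyID = ph.HobbyID
    GROUP BY p.NameID, p.Name
    ORDER BY p.NameID
"""
rows = conn.execute(PIVOT_SQL).fetchall()
```

The LEFT JOINs keep sam in the result with both hobby columns empty, which matches the desired output in the question.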

SQL - How to store and navigate hierarchies?

What are the ways that you use to model and retrieve hierarchical info in a database?
I like the Modified Preorder Tree Traversal Algorithm. This technique makes it very easy to query the tree.
But here is a list of links about the topic which I copied from the Zend Framework (PHP) contributors webpage (posted there by Laurent Melmoux on Jun 05, 2007 15:52).
Many of the links are language agnostic:
There are two main representations and algorithms for hierarchical structures in databases:
nested set also known as modified preorder tree traversal algorithm
adjacency list model
It's well explained here:
http://www.sitepoint.com/article/hierarchical-data-database
Managing Hierarchical Data in MySQL
http://www.evolt.org/article/Four_ways_to_work_with_hierarchical_data/17/4047/index.html
Here are some more links that I've collected:
http://en.wikipedia.org/wiki/Tree_%28data_structure%29
http://en.wikipedia.org/wiki/Category:Trees_%28structure%29
adjacency list model
http://www.sqlteam.com/item.asp?ItemID=8866
nested set
http://www.sqlsummit.com/AdjacencyList.htm
http://www.edutech.ch/contribution/nstrees/index.php
http://www.phpriot.com/d/articles/php/application-design/nested-trees-1/
http://www.dbmsmag.com/9604d06.html
http://en.wikipedia.org/wiki/Tree_traversal
http://www.cosc.canterbury.ac.nz/mukundan/dsal/BTree.html (Java applet showing how it works)
Graphs
http://www.artfulsoftware.com/mysqlbook/sampler/mysqled1ch20.html
Classes:
Nested Sets DB Tree Adodb
http://www.phpclasses.org/browse/package/2547.html
Visitation Model ADOdb
http://www.phpclasses.org/browse/package/2919.html
PEAR::DB_NestedSet
http://pear.php.net/package/DB_NestedSet
usage: https://www.entwickler.com/itr/kolumnen/psecom,id,26,nodeid,207.html
PEAR::Tree
http://pear.php.net/package/Tree/download/0.3.0/
http://www.phpkitchen.com/index.php?/archives/337-PEARTree-Tutorial.html
nstrees
http://www.edutech.ch/contribution/nstrees/index.php
The definitive pieces on this subject have been written by Joe Celko, and he has worked a number of them into a book called Joe Celko's Trees and Hierarchies in SQL for Smarties.
He favours a technique called directed graphs. An introduction to his work on this subject can be found here
What's the best way to represent a hierarchy in a SQL database? A generic, portable technique?
Let's assume the hierarchy is mostly read, but isn't completely static. Let's say it's a family tree.
Here's how not to do it:
create table person (
person_id integer autoincrement primary key,
name varchar(255) not null,
dob date,
mother integer,
father integer
);
And inserting data like this:
person_id name dob mother father
1 Pops 1900/1/1 null null
2 Grandma 1903/2/4 null null
3 Dad 1925/4/2 2 1
4 Uncle Kev 1927/3/3 2 1
5 Cuz Dave 1953/7/8 null 4
6 Billy 1954/8/1 null 3
Instead, split your nodes and your relationships into two tables.
create table person (
person_id integer autoincrement primary key,
name varchar(255) not null,
dob date
);
create table ancestor (
ancestor_id integer,
descendant_id integer,
distance integer
);
Data is created like this:
person_id name dob
1 Pops 1900/1/1
2 Grandma 1903/2/4
3 Dad 1925/4/2
4 Uncle Kev 1927/3/3
5 Cuz Dave 1953/7/8
6 Billy 1954/8/1
ancestor_id descendant_id distance
1 1 0
2 2 0
3 3 0
4 4 0
5 5 0
6 6 0
1 3 1
2 3 1
1 4 1
2 4 1
1 5 2
2 5 2
4 5 1
1 6 2
2 6 2
3 6 1
You can now run arbitrary queries that don't involve joining the table back on itself, which would happen if you had the hierarchy relationship in the same row as the node.
Who has grandparents?
select * from person where person_id in
(select descendant_id from ancestor where distance=2);
All your descendants:
select * from person where person_id in
(select descendant_id from ancestor
where ancestor_id=1 and distance>0);
Who are uncles?
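-- an uncle is a child of someone's grandparent who is not that person's parent
select distinct u.descendant_id as uncle
from ancestor u
join ancestor g
on g.ancestor_id = u.ancestor_id and g.distance = 2
where u.distance = 1
and not exists
(select 1 from ancestor p
where p.distance = 1
and p.ancestor_id = u.descendant_id
and p.descendant_id = g.descendant_id);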
You avoid all the problems of joining a table to itself via subqueries; a common limitation is 16 subqueries.
Trouble is, maintaining the ancestor table is kind of hard - best done with a stored procedure.
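As a rough idea of what that maintenance involves when a person is added, here is a sketch in Python with the built-in sqlite3 module rather than a stored procedure. The helper name add_person is hypothetical. It copies every ancestor row of each parent, one step further away, and it does not cover moving or deleting people.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE ancestor (
        ancestor_id INTEGER, descendant_id INTEGER, distance INTEGER
    );
""")

def add_person(person_id, name, parent_ids=()):
    """Insert a person plus every ancestor row implied by the given parents."""
    with conn:  # one transaction: both tables change together or not at all
        conn.execute("INSERT INTO person VALUES (?, ?)", (person_id, name))
        # Every node is its own ancestor at distance 0.
        conn.execute("INSERT INTO ancestor VALUES (?, ?, 0)", (person_id, person_id))
        for parent_id in parent_ids:
            # Each ancestor of the parent (including the parent itself)
            # becomes an ancestor of the new person, one step further away.
            conn.execute(
                """INSERT INTO ancestor (ancestor_id, descendant_id, distance)
                   SELECT ancestor_id, ?, distance + 1
                   FROM ancestor WHERE descendant_id = ?""",
                (person_id, parent_id),
            )

add_person(1, "Pops")
add_person(2, "Grandma")
add_person(3, "Dad", (2, 1))
add_person(4, "Uncle Kev", (2, 1))
add_person(5, "Cuz Dave", (4,))
add_person(6, "Billy", (3,))

grandchildren = [r[0] for r in conn.execute(
    "SELECT DISTINCT descendant_id FROM ancestor WHERE distance = 2 ORDER BY 1")]
```

Run against the family above, this reproduces the same 16 ancestor rows listed earlier. Parents must be inserted before their children for the copy step to find their rows.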
I've got to disagree with Josh. What happens if you're using a huge hierarchical structure like a company organization? People can join or leave the company, change reporting lines, etc. Maintaining the "distance" would be a big problem, and you would have to maintain two tables of data.
This query (SQL Server 2005 and above) would let you see the complete line of any person AND calculates their place in the hierarchy and it only requires a single table of user information. It can be modified to find any child relationship.
--Create table of dummy data
create table #person (
personID integer IDENTITY(1,1) NOT NULL,
name varchar(255) not null,
dob date,
father integer
);
INSERT INTO #person(name,dob,father)Values('Pops','1900/1/1',NULL);
INSERT INTO #person(name,dob,father)Values('Grandma','1903/2/4',null);
INSERT INTO #person(name,dob,father)Values('Dad','1925/4/2',1);
INSERT INTO #person(name,dob,father)Values('Uncle Kev','1927/3/3',1);
INSERT INTO #person(name,dob,father)Values('Cuz Dave','1953/7/8',4);
INSERT INTO #person(name,dob,father)Values('Billy','1954/8/1',3);
DECLARE @OldestPerson INT;
SET @OldestPerson = 1; -- Set this value to the ID of the oldest person in the family
WITH PersonHierarchy (personID,Name,dob,father, HierarchyLevel) AS
(
SELECT
personID
,Name
,dob
,father,
1 as HierarchyLevel
FROM #person
WHERE personID = @OldestPerson
UNION ALL
SELECT
e.personID,
e.Name,
e.dob,
e.father,
eh.HierarchyLevel + 1 AS HierarchyLevel
FROM #person e
INNER JOIN PersonHierarchy eh ON
e.father = eh.personID
)
SELECT *
FROM PersonHierarchy
ORDER BY HierarchyLevel, father;
DROP TABLE #person;
FYI: SQL Server 2008 introduces a new HierarchyID data type for this sort of situation. Gives you control over where in the "tree" your row sits, horizontally as well as vertically.
Oracle: SELECT ... START WITH ... CONNECT BY
Oracle has an extension to SELECT that allows easy tree-based retrieval. Perhaps SQL Server has some similar extension?
This query will traverse a table where the nesting relationship is stored in parent and child columns.
select * from my_table
start with parent = :TOP
connect by prior child = parent;
http://www.adp-gmbh.ch/ora/sql/connect_by.html
I prefer a mix of the techniques used by Josh and Mark Harrison:
Two tables: one with the data of the person and another with the hierarchical info (person_id, parent_id [, mother_id]). If the PK of this table is person_id, you have a simple tree with only one parent per node (which makes sense in this case, but not in other cases such as accounting accounts).
This hierarchy table can be traversed by recursive procedures or, if your DB supports it, by statements like SELECT ... CONNECT BY PRIOR (Oracle).
Another possibility, if you know the maximum depth of the hierarchy you want to maintain, is to use a single table with a set of columns per level of the hierarchy.
We had the same issue when we implemented a tree component for [fleXive] and used the nested set tree model approach mentioned by tharkun from the MySQL docs.
In addition, to speed things up (dramatically), we used a spread approach, which simply means we used the maximum Long value for the top-level right bound. This allows us to insert and move nodes without recalculating all left and right values. Values for left and right are calculated by dividing the range for a node by 3 and using the inner third as the bounds for the new node.
A Java code example can be seen here.
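The arithmetic for the inner third can be sketched as follows. This is only an illustration of the idea as described here, not the [fleXive] code: the function name inner_third is made up, the root is assumed to start at 0, and only the placement of a first child is covered.

```python
MAX_LONG = 2**63 - 1  # largest signed 64-bit value, used as the root's right bound

def inner_third(left, right):
    """Return (left, right) bounds for a new child placed in the middle third."""
    third = (right - left) // 3
    if third < 1:
        raise ValueError("no room left; the subtree needs renumbering")
    return left + third, right - third

root = (0, MAX_LONG)
child = inner_third(*root)        # sits strictly inside the root's range
grandchild = inner_third(*child)  # sits strictly inside the child's range
```

Because each new node leaves a gap on both sides, later inserts can use the free ranges without touching existing left and right values, until a range becomes too narrow to split.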
If you're using SQL Server 2005 then this link explains how to retrieve hierarchical data.
Common Table Expressions (CTEs) can be your friends once you get comfortable using them.