SQL: multi valued attributes - sql

I created a table that contains information about a company. One attribute is their telephone number. A company can have many telephone numbers.
How do I create multi-valued attributes in SQL?

In a separate table like:
CREATE TABLE Company
(
Id int identity primary key,
Name nvarchar(100) not null UNIQUE --UNIQUE is optional
)
GO
CREATE TABLE CompanyPhones
(
Id int identity primary key,
Phone nvarchar(100) not null,
CompanyId int NOT NULL REFERENCES Company(Id) ON DELETE CASCADE
)
How to use these structures:
SELECT CompanyPhones.Phone
FROM Company
JOIN CompanyPhones
ON Company.Id = CompanyPhones.CompanyId
WHERE Company.Name=N'Horns and Hoogs Ltd.'

There is generally no such thing as multi-valued attribute in relational databases.
Possible solutions for your problem:
Create a separate table for storing phone numbers which references your company table by primary key and contains undefinite number of rows per company.
For example, if you have table company with fields id, name, address, ... then you can create a table companyphones with fields companyid, phone.
(NOT recommended in general, but if you only need to show a list of phones on website this might be an option) Storing telephones in a single field using varchar(...) or text and adding separators between numbers.

In addition to Oleg and Sergey's answers, a third option might be to create multiple phone fields on the company table - for example, as SwitchboardPhone and FaxNumber for the main switchboard and the fax line, respectively.
This type of solution is generally regarded as a form of denormalisation, and is generally only suitable where there is a small number of multiple options, each with a clearly defined role.
So, for example, this is quite a common way to represent landline and mobile/cellphone numbers for a contact list table, but would be thoroughly unsuitable for a list of all phone extensions within a company.

There're some possibilities in different implementations of RDBMS.
For example, in PostgreSQL you can use array or hstore or even JSON (in 9.3 version):
create table Company1 (name text, phones text[]);
insert into Company1
select 'Financial Company', array['111-222-3333', '555-444-7777'] union all
select 'School', array['444-999-2222', '555-222-1111'];
select name, unnest(phones) from Company1;
create table Company2 (name text, phones hstore);
insert into Company2
select 'Financial Company', 'mobile=>555-444-7777, fax=>111-222-3333'::hstore union all
select 'School', 'mobile=>444-999-2222, fax=>555-222-1111'::hstore;
select name, skeys(phones), svals(phones) from Company2
sql fiddle demo
You can also create indexes on these fields - https://dba.stackexchange.com/questions/45820/how-to-properly-index-hstore-tags-column-to-faster-search-for-keys, Can PostgreSQL index array columns?
In SQL Server, you can use xml datatype to store multivalues:
create table Company (name nvarchar(128), phones xml);
insert into Company
select 'Financial Company', '<phone type="mobile">555-444-7777</phone><phone>111-222-3333</phone>' union all
select 'School', '<phone>444-999-2222</phone><phone type="fax">555-222-1111</phone>'
select
c.name,
p.p.value('#type', 'nvarchar(max)') as type,
p.p.value('.', 'nvarchar(max)') as phone
from Company as c
outer apply c.phones.nodes('phone') as p(p)
sql fiddle demo
You can also create xml indexes on xml type column.

Related

PostgreSQL Insert into table with subquery selecting from multiple other tables

I am learning SQL (postgres) and am trying to insert a record into a table that references records from two other tables, as foreign keys.
Below is the syntax I am using for creating the tables and records:
-- Create a person table + insert single row
CREATE TABLE person (
pname VARCHAR(255) NOT NULL,
PRIMARY KEY (pname)
);
INSERT INTO person VALUES ('personOne');
-- Create a city table + insert single row
CREATE TABLE city (
cname VARCHAR(255) NOT NULL,
PRIMARY KEY (cname)
);
INSERT INTO city VALUES ('cityOne');
-- Create a employee table w/ForeignKey reference
CREATE TABLE employee (
ename VARCHAR(255) REFERENCES person(pname) NOT NULL,
ecity VARCHAR(255) REFERENCES city(cname) NOT NULL,
PRIMARY KEY(ename, ecity)
);
-- create employee entry referencing existing records
INSERT INTO employee VALUES(
SELECT pname FROM person
WHERE pname='personOne' AND <-- ISSUE
SELECT cname FROM city
WHERE cname='cityOne
);
Notice in the last block of code, where I'm doing an INSERT into the employee table, I don't know how to string together multiple SELECT sub-queries to get both the existing records from the person and city table such that I can create a new employee entry with attributes as such:
ename='personOne'
ecity='cityOne'
The textbook I have for class doesn't dive into sub-queries like this and I can't find any examples similar enough to mine such that I can understand how to adapt them for this use case.
Insight will be much appreciated.
There doesn’t appear to be any obvious relationship between city and person which will make your life hard
The general pattern for turning a select that has two base tables giving info, into an insert is:
INSERT INTO table(column,list,here)
SELECT column,list,here
FROM
a
JOIN b ON a.x = b.y
In your case there isn’t really anything to join on because your one-column tables have no column in common. Provide eg a cityname in Person (because it seems more likely that one city has many person) then you can do
INSERT INTO employee(personname,cityname)
SELECT p.pname, c.cname
FROM
person p
JOIN city c ON p.cityname = c.cname
But even then, the tables are related between themselves and don’t need the third table so it’s perhaps something of an academic exercise only, not something you’d do in the real world
If you just want to mix every person with every city you can do:
INSERT INTO employee(personname,cityname)
SELECT pname, cname
FROM
person p
CROSS JOIN city c
But be warned, two people and two cities will cause 4 rows to be inserted, and so on (20 people and 40 cities, 800 rows. Fairly useless imho)
However, I trust that the general pattern shown first will suffice for your learning; write a SELECT that shows the data you want to insert, then simply write INSERT INTO table(columns) above it. The number of columns inserted to must match the number of columns selected. Don’t forget that you can select fixed values if no column from the query has the info (INSERT INTO X(p,c,age) SELECT personname, cityname, 23 FROM ...)
The following will work for you:
INSERT INTO employee
SELECT pname, cname FROM person, city
WHERE pname='personOne' AND cname='cityOne';
This is a cross join producing a cartesian product of the two tables (since there is nothing to link the two). It reads slightly oddly, given that you could just as easily have inserted the values directly. But I assume this is because it is a learning exercise.
Please note that there is a typo in your create employee. You are missing a comma before the primary key.

find all stores customers have made purchase

I have a table that contains all the transaction information
(transaction date, customer name, transaction amount, unit amount, store ID).
How to create a table which could properly show the stores that customer has visited? The customer could make purchase from multiple stores, so the relationship between customer and store should be one to many. I am trying to check the stores each customer had visited.
Example:
Transaction table
store_ID
transaction_date
customer_name
transaction_amount
transaction_unit
Expected Output
customer_name
store_list
This is just a Hypothesis problem. Maybe list all shopped store seprately could be better? (But I guess it might create chaos if we want to check customer who made purchase in those stores). Thanks in advance :)
Since you already have a table with all information needed, maybe you are asking for a view, to avoid denormalized/duplicated data.
Example is shown below. Here I will use sqlite syntax (text instead of varchar, etc). We are normalizing data to avoid problems - for example you have shown a table with customer names, but different customers can have the same name, so we use customer_id instead.
First the tables:
create table if not exists store (id integer primary key, name text, address text);
create table if not exists customer (id integer, name text, document integer);
create table if not exists unit (symbol text primary key);
insert into unit values ('USD'),('EUR'),('GBP'); -- static data
create table if not exists transact (id integer primary key, store_id integer references store(id), customer_id integer references customer(id), transaction_date date, amount integer, unit text references unit(symbol));
-- transaction is a reserved word, so transact was used instead
Now the view:
-- store names concatenated with ',' using sqlite's group_concat
create view if not exists stores_visited as select cust.name, group_concat(distinct store.name order by store.id) from transact trx join customer cust on trx.customer_id=cust.id join store on trx.store_id=store.id group by cust.name order by cust.id;
-- second version, regular select with multiple lines
-- create view if not exists stores_visited as select cust.id, cust.name, store.id, store.name from transact trx join customer cust on trx.customer_id=cust.id join store on trx.store_id=store.id;
Sample data:
insert into store values
(1,'store one','address one'),
(2,'store two','address two')
(3,'store three','address three');
insert into customer values
(1,'customer one','custdoc one'),
(2,'customer two','custdoc two'),
(3,'customer three','custdoc three');
insert into transact values
(1,1,1,date('2021-01-01'),1,'USD'),
(2,1,1,date('2021-01-02'),1,'USD'),
(3,1,2,date('2021-01-03'),1,'USD'),
(5,1,3,date('2021-01-04'),1,'USD'),
(6,2,1,date('2021-01-05'),1,'USD'),
(7,2,3,date('2021-01-06'),1,'USD'),
(8,3,2,date('2021-01-07'),1,'USD');
Testing:
select * from stores_visited;
-- customer one|store one,store one,store two
-- customer three|store one,store two
-- customer two|store one

SQL: combine two tables for a query

I want to query two tables at a time to find the key for an artist given their name. The issue is that my data is coming from disparate sources and there is no definitive standard for the presentation of their names (e.g. Forename Surname vs. Surname, Forename) and so to this end I have a table containing definitive names used throughout the rest of my system along with a separate table of aliases to match the varying styles up to each artist.
This is PostgreSQL but apart from the text type it's pretty standard. Substitute character varying if you prefer:
create table Artists (
id serial primary key,
name text,
-- other stuff not relevant
);
create table Aliases (
artist integer references Artists(id) not null,
name text not null
);
Now I'd like to be able to query both sets of names in a single query to obtain the appropriate id. Any way to do this? e.g.
select id from ??? where name = 'Bloggs, Joe';
I'm not interested in revising my schema's idea of what a "name" is to something more structured, e.g. separate forename and surname, since it's inappropriate for the application. Most of my sources don't structure the data, sometimes one or the other name isn't known, it may be a pseudonym, or sometimes the "artist" may be an entity such as a studio.
I think you want:
select a.id
from artists a
where a.name = 'Bloggs, Joe' or
exists (select 1
from aliases aa
where aa.artist = a.id and
aa.name = 'Bloggs, Joe'
);
Actually, if you just want the id (and not other columns), then you can use:
select a.id
from artists a
where a.name = 'Bloggs, Joe'
union all -- union if there could be duplicates
select aa.artist
from aliases aa
where aa.name = 'Bloggs, Joe';

SQL Server - matching attributes query

SQL Server Gurus ...
Currently using MS SQL Server 2016
I know Joe Celko and all SQL purists are squirming at the thought of using bitmasks, but I have a use case in which I need to query for all widgets that contain a set of given attributes.
Each widget may contain several hundred attributes.
The attributes of a widget are either present or not (1 = present, 0 = not
present)
One way I thought to do this is via bitmasks – the attributes to be found (a bitmask) could be ANDed with the attributes of each widget to find matches in a single operation. For example, the widgets table might be:
widets table:
widget_uid Uniqueidentifier
attributes BigInt
SELECT widget_uid
FROM widgets
WHERE ( attributes & bitmask ) = bitmask;
Problem is, using a BigInt for the attributes limits the number of attributes to 64 (a widget can have several hundred attributes), I could group the attributes in chunks of 64 bits, ie:
widets table:
widget_uid Uniqueidentifier
attributes0 BigInt -- Attributes 0-63
attributes1 BigInt -- Attributes 64-127
attributes2 BigInt -- Attributes 128-191
SELECT widget_uid
FROM widgets
WHERE ( attributes0 & bitmask0 ) = bitmask0
AND ( attributes1 & bitmask1 ) = bitmask1
AND ( attributes2 & bitmask2 ) = bitmask2
... but was wondering if anyone has come up with a solution for bit operations using bitmasks with greater than 64 bits – or if other (more efficient?) solutions would exist?
In the use case, the widgets table does contain other columns, but I am only concerned with the attributes matching portion of the query at the moment.
Any and all ideas are welcome - would be interested in knowing how others tackle this particular problem.
Thanks in advance.
We had a similar use case, on a significantly large data set. This was for an e-commerce site with products and attributes. Our case was a bit more complex than here, where we had any possible number of attributes and then values assigned to those attributes. e.g. Color - Red/Green/Blue, Size - S/M/L etc.
We found that associated tables with good indexing was the key in our case. While this may not be an option for you we found this to be the optimal solution for a dynamic data set.
I can code you up an example if you feel it will be helpful.
Edited to add example:
DROP TABLE IF EXISTS #Widgets
DROP TABLE IF EXISTS #Attributes
DROP TABLE IF EXISTS #WidgetAttributes
CREATE TABLE #Widgets (widget_UID UNIQUEIDENTIFIER PRIMARY KEY CLUSTERED, Name NVARCHAR(255))
CREATE TABLE #Attributes (Attribute_UID UNIQUEIDENTIFIER PRIMARY KEY CLUSTERED, Name NVARCHAR(255))
CREATE TABLE #WidgetAttributes (widget_UID UNIQUEIDENTIFIER,Attribute_UID UNIQUEIDENTIFIER)
CREATE NONCLUSTERED INDEX ix_WidgetAttribute ON #WidgetAttributes (Attribute_UID) INCLUDE (widget_UID)
INSERT INTO #Widgets (widget_UID, Name) values
( '{c63bea73-2331-4698-82c9-f71845ab8601}', N'Widget 1' ),
( '{a0865b8f-606b-4273-9207-39a8a26016c4}', N'Widget 2' ),
( '{211fe27e-ab98-4b61-83a3-3d006d66db5a}', N'Widget 3' )
INSERT INTO #Attributes (Attribute_UID, Name)
VALUES
( '{99354dc0-d0b2-4919-a887-edf115eeb1bd}', N'Height' ),
( '{136bbe4c-497d-472f-a905-670e4a7805d0}', N'Width' ),
( '{f006f950-30d1-453e-8e09-4f7d140fa3cb}', N'Depth' ),
( '{0d190639-677f-4b75-8d36-1bdac00de132}', N'Colour' )
-- Set links
-- Widget 1 All attributes
-- Widget 2 Height Width
-- Widget 3 Colour
INSERT INTO #WidgetAttributes (widget_UID, Attribute_UID)
SELECT '{c63bea73-2331-4698-82c9-f71845ab8601}',Attribute_UID FROM #Attributes
UNION ALL
SELECT TOP (2) '{a0865b8f-606b-4273-9207-39a8a26016c4}',Attribute_UID FROM #Attributes WHERE Name<> 'Colour'
UNION ALL
SELECT '{211fe27e-ab98-4b61-83a3-3d006d66db5a}',Attribute_UID FROM #Attributes WHERE Name = 'Colour'
-- #SearchAttributes to hold list of attributes you are trying to find
DECLARE #SearchAttributes TABLE (Attribute_UID UNIQUEIDENTIFIER)
INSERT INTO #SearchAttributes
SELECT Attribute_UID FROM #Attributes WHERE Name<> 'Colour'
;WITH cte AS (
SELECT WA.widget_UID, COUNT(1) AttributesPresent FROM #WidgetAttributes WA
JOIN #SearchAttributes SA ON SA.Attribute_UID = WA.Attribute_UID
GROUP BY WA.widget_UID
)
SELECT cte.AttributesPresent
, W.widget_UID
, W.Name
FROM cte
JOIN #Widgets W ON W.widget_UID = cte.widget_UID
ORDER BY cte.AttributesPresent DESC
Gives an output of:
AttributesPresent widget_UID Name
----------------- ------------------------------------ ----------
3 C63BEA73-2331-4698-82C9-F71845AB8601 Widget 1
2 A0865B8F-606B-4273-9207-39A8A26016C4 Widget 2
We used an approach of counting how many attributes were present for each so we not only had the option of "exact match" but also "closest fit".
Using bitmask in databases is wrong approach. Even if you somewhow manage it to work, you will not be able to use indexes to speed up execution.
Use standard solution, this is standard situation. There is standard M:N relationship between Widgets and Attributes (both should be tables, of course). You will add another table that will assign Attributes to Widgets - you can call it WidgetAttributes.
It will have 3 columns: Id, WidgetId, AttributeId
Then you can simply for example get list of Widgets that have Attribute:
select w.*
from Widgets w
inner join WidgetAttributes wa on wa.WidgetId = w.Id
inner join Attributes a on a.Id = wa.AttributeId
where a.AttributeName='xxx'

DB: How to set up a many to many table(s) to handle multiple selectable conditions

I am working on a search filter for a website that will help users find a venue(for get-togethers and ceremonies) that meets their needs. Filters would include such things as: style, amenities, event type, etc. Multiple options in a category can apply to a venue, so a user can select multiple options from style, amenities and event type categories when searching.
My issue is in how I should approach the table design in the database. Currently I have a Venue table with a unique id and basic information, and a number of tables representing each category (style, amenities, etc) where they contain an id and name field.
I know that I need an intermediary table to hold foreign keys, so each option applicable to a category is associated to the venue.
Option 1: Create for each category table a many to many intermediary table with foreign keys to that category and the venue.
Option 2: Create one large intermediary table with foreign keys for every category, as well as the Venue
i.e.
fk_venue
fk_style
fk_amenities
...
I am trying to decide what is more efficient and less of a problem in coding for. Option 1 would require a query to each table which may become complicated to work with, where as option 2 seems easier to query but might have a much larger number of records to handle a venue with many amenities AND event types for example.
This doesn't seem like a new problem but I have had trouble finding resources that detail how best to approach this. We are currently using MSSQL for the DB and are building the site using .net core.
Go with option one. Create a join table to record the many-to-many relationships of each available feature of a venue. Option 2 is very wasteful in terms of storage. Consider a case where you have a venue with only one amenity, when 50 amenities types are available. Also, as I understand what you are suggesting for option 2, you would have to update your database design each time you add an amenity, event_type, or style. That would be a very difficult thing support wise.
In the case of Option 1, some of the tables would be:
Table Name: venue_amenities
Columns: venue_id, amenity_id
Table Name: venue_event_types
Columns: venue_id, event_type_id
Table Name: venue_styles
Columns: venue_id, style_id
When you query everything with a filter, you could query it like:
select distinct
v.venue_id
from venues v
inner join venue_amenities va on v.venue_id = va.venue_id
inner join venue_event_types vet on v.venue_id = vet.venue_id
inner join venue_styles vs on v.venue_id = vs.venue_id
where va.amenity_id in ([selected amenities])
and vet.event_type_id in ([selected event types])
and vs.venue_style in ([selected styles])
Option 3: You could start out with a meta data design. This would allow you to have multiple records per item or entity.
Often these things evolve with the development of tasks, or the evolution of the process and learning the data or the customer understanding some of the finer details that are drawn out as time goes on.
I've seen similar things where people design for hashtags or white lists, searching for that might get you closer to what you are looking for. Here is a working example to get you started.
declare #venue as table(
VenueID int identity(1,1) not null primary key clustered
, Name_ nvarchar(255) not null
, Address_ nvarchar(255) null
);
declare #venueType as table (
VenueTypeID int identity(1,1) not null primary key clustered
, VenueType nvarchar(255) not null
);
declare #venueStuff as table (
VenueStuffID int identity(1,1) not null primary key clustered
, VenueID int not null -- constraint back to venueid
, VenueTypeID int not null -- constraint to dim or lookup table for ... attribute types
, AttributeValue nvarchar(255) not null
);
insert into #venue (Name_)
select 'Bob''s Funhouse'
insert into #venueStuff (VenueID, VenueTypeID, AttributeValue)
select 1, 1, 'Scarrrrry' union all
select 1, 2, 'Food Avaliable' union all
select 1, 3, 'Game tables provided' union all
select 1, 4, 'Creepy';
insert into #venueType (VenueType)
select 'Haunted House Theme' union all
select 'Gaming' union all
select 'Concessions' union all
select 'post apocalyptic';
select a.Name_
, b.AttributeValue
, c.VenueType
from #venue a
join #venueStuff b
on a.VenueID = b.VenueID
join #venueType c
on c.VenueTypeID = b.VenueTypeID