I'd like to know if there's an efficient way to count the number of occurrences of a permutation of entities from one side of an m:n relationship. Hopefully, the next example will illustrate what I mean:
Let's imagine a database with people and events of some sort. People can organize multiple events and events can be organized by more than one person. What I'd like to determine is whether a certain tuple of people has already organized an event together or if it's their first time. My first idea is to add an attribute to the m:n relationship:
PeopleID | EventID | TimesOrganized
100      | 1       | 1
200      | 1       | 1
300      | 2       | 1
400      | 3       | 1
Now, there's an event no. 4 that's again organized by persons 200 and 100 (let's say they should be added in that order). The new table should look like:
PeopleID | EventID | TimesOrganized
100      | 1       | 2
200      | 1       | 2
300      | 2       | 1
400      | 3       | 1
200      | 4       | 2
100      | 4       | 2
Now, if I added an event organized by persons 200 and 300 it would look like this:
PeopleID | EventID | TimesOrganized
100      | 1       | 2
200      | 1       | 2
300      | 2       | 1
400      | 3       | 1
200      | 4       | 2
100      | 4       | 2
200      | 5       | 1
300      | 5       | 1
How would I go about keeping the third column updated properly and what are my options?
I should also add that this is part of a larger project we have for one of our classes, and we'll be implementing an application that uses the database in some way, so I might as well move this to application logic if there's no easy way.
I wouldn't recommend tracking a TimesOrganized column as you suggest.
You can simply query it as needed using a COUNT(EventID) ... GROUP BY PeopleID.
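A minimal sketch of that query, assuming the m:n junction table is named PeopleEvents:
-- How many events each person has organized so far
-- (PeopleEvents is an assumed name for the junction table)
SELECT PeopleID, COUNT(EventID) AS TimesOrganized
FROM PeopleEvents
GROUP BY PeopleID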
If you do feel you need to maintain the value somewhere, it is probably better normalized into the (presumed) People table, as something like People.TimesOrganized. But then you have to increment it as you go instead of just recalculating it as needed.
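If you do go the incremental route, the bookkeeping per new organizer row would be something like this (a sketch; People.TimesOrganized is the column presumed above, and @PeopleID is just a placeholder variable):
-- Run once for each person added as an organizer of a new event
UPDATE People
SET TimesOrganized = TimesOrganized + 1
WHERE PeopleID = @PeopleID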
If you want to count how many times someone has organized an event, the problem is not m:n but 1:m. Just count the events grouped by person; that's it. You don't really need to keep that column in the table unless it's queried very often.
That said, I find your tables a little confusing: detail and aggregation are mixed, and the third one is downright wrong: PeopleID 200 has organized 3 events and 300 has organized 2.
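To illustrate the counting idea against the original question, here is a sketch that checks whether a given pair of people has already organized an event together (PeopleEvents is an assumed name for the junction table):
-- Number of events persons 100 and 200 have organized together
SELECT COUNT(*) AS TimesTogether
FROM PeopleEvents pe1
INNER JOIN PeopleEvents pe2 ON pe1.EventID = pe2.EventID
WHERE pe1.PeopleID = 100
  AND pe2.PeopleID = 200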
Related
I have a table User, which consists of two columns: id_user and name_user.
id_user | name_user
1       | Vova
2       | John
3       | Ivan
4       | Kate
I need to make a relation between two users. If I understand correctly, I should make a table relation_ids:
id_user_1 | id_user_2
1         | 2
1         | 3
1         | 4
2         | 3
Please tell me, do I need to duplicate the relationship between two users (for example, 1-2 and 2-1)? Is there another way to make a relation within one table?
I'm creating a simple directory listing page where you can specify what kind of thing you want to list in the directory, e.g. a person or a company.
Each user has a UserTypeID and there is a dbo.UserType lookup table. The dbo.UserType lookup table looks like this:
UserTypeID | UserTypeParentID | Name
1          | NULL             | Person
2          | NULL             | Company
3          | 2                | IT
4          | 3                | Accounting Software
In the dbo.Users table we have records like this:
UserID | UserTypeID | Name
1      | 1          | Jenny Smith
2      | 1          | Malcolm Brown
3      | 2          | Wall Mart
4      | 3          | Microsoft
5      | 4          | Sage
My SQL (so far) is very simple: (excuse the pseudo-code style)
DECLARE @UserTypeID int

SELECT
    *
FROM
    dbo.Users u
    INNER JOIN dbo.UserType ut ON ut.UserTypeID = u.UserTypeID
WHERE
    ut.UserTypeID = @UserTypeID
The problem here is that when people want to search for companies they will enter 2 as the UserTypeID. But neither Microsoft nor Sage will show up, because their UserTypeIDs are 3 and 4 respectively; it is the final UserTypeParentID that tells me they're both companies.
How could I rewrite the SQL to return records where the UserTypeID = @UserTypeID or where the final UserTypeParentID is also equal to @UserTypeID? Or am I going about this the wrong way?
Schema Change
I would suggest you break this schema down a little bit more to make your queries and your life simpler. With the current schema you will end up writing a recursive query every time you want to get even the simplest data from your Users table, and trust me, you don't want to do that to yourself.
I would break the schema down into the following tables (a sample query against them follows below):
dbo.Users
UserID | UserName
1      | Jenny
2      | Microsoft
3      | Sage
dbo.UserTypes_Type
TypeID | TypeName
1      | Person
2      | IT
3      | Company
4      | Accounting Software
dbo.UserTypes
UserID | TypeID
1      | 1
2      | 2
2      | 3
3      | 2
3      | 3
3      | 4
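With this layout, a query like the following (a sketch using the table and column names above) returns all companies without any recursion:
-- All users tagged with the 'Company' type
SELECT u.UserID, u.UserName
FROM dbo.Users u
INNER JOIN dbo.UserTypes ut ON ut.UserID = u.UserID
INNER JOIN dbo.UserTypes_Type t ON t.TypeID = ut.TypeID
WHERE t.TypeName = 'Company'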
You say that you are "creating" this - excellent because you have the opportunity to reconsider your whole approach.
Dealing with hierarchical data in a relational database is problematic because it is not designed for it - the model you choose to represent it will have a huge impact on the performance and ease of construction of your queries.
You have opted for an Adjacency List model, which is great for inserts (and deletes) but a bugger for selects, because the query has to effectively reconstruct the hierarchy path. By the way, an Adjacency List is the model almost everyone goes for on their first attempt.
Everything is a trade-off, so you should decide which queries will be most common: selects (and updates) or inserts (and deletes). See this question for starters. Also, since SQL Server 2008, there is a native hierarchyid datatype (see this) which may be of assistance.
Of course, you could store your data in an XML file (in SQL Server or not) which is designed for hierarchical data.
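For completeness, with the adjacency list from the question as it stands, the lookup could be done with a recursive CTE along these lines (a sketch only, using the question's table and column names):
DECLARE @UserTypeID int = 2  -- e.g. 'Company'

-- Walk down the UserType hierarchy from @UserTypeID,
-- then return every user whose type falls in that subtree
;WITH TypeTree AS (
    SELECT UserTypeID
    FROM dbo.UserType
    WHERE UserTypeID = @UserTypeID
    UNION ALL
    SELECT ut.UserTypeID
    FROM dbo.UserType ut
    INNER JOIN TypeTree tt ON ut.UserTypeParentID = tt.UserTypeID
)
SELECT u.*
FROM dbo.Users u
INNER JOIN TypeTree tt ON u.UserTypeID = tt.UserTypeID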
When we store a one-to-many association in a database, which is the better approach: a one-to-many mapping in a table, or storing the many part as an array? I'm specifically constrained to a Postgres database.
For example, if we define the relationship as follows:
a | b
1 | 2
1 | 3
1 | 6
2 | 3
2 | 4
3 | 5
3 | 6
Here, the one part is a and the many part is b (the primary key being (a, b)).
The same thing can be stored as an array (similar to an adjacency list):
1 - {2,3,6}
2 - {3,4}
3 - {5,6}
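For reference, the array variant in Postgres could be declared roughly like this (the table name is just for illustration):
-- Array form: one row per "a", its "b" values packed into an int[]
CREATE TABLE relation_arr (
    a int PRIMARY KEY,
    b int[] NOT NULL
)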
Which of these is more efficient? I may have to do some operations on this, such as transitive closure, and the graph may be really huge.
A practical example of the above may be something like the connections of a particular profile (LinkedIn connections), or any social graph scenario.
In your example the relationship is many to many, not one to many. Multiple a records can be associated with one b and multiple b records can be associated with one a. As such, the correct normalized form is a join table.
Hypothetically, imagine this DB relationship represents one profile "liking" another profile in a social media context. In that case you may want to store additional information: a timestamp of when the "like" was initiated, the degree to which the profile shrugged/liked/loved the other profile, etc. It then becomes apparent that in the array implementation there is nowhere to store this additional data. You need a join table so that each "like" can have its own metadata.
Here is the structure I would recommend:
PK  | A | B
100 | 1 | 2
200 | 1 | 3
300 | 1 | 6
400 | 2 | 3
500 | 2 | 4
600 | 3 | 5
700 | 3 | 6
Where PK is an auto-generated PK, hopefully from a sequence, and A, B are constrained by a unique index. This structure is also future-proof in case you eventually need to drop the unique index on (A, B), a headache I've had to deal with occasionally.
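A minimal Postgres sketch of that structure (table and column names are illustrative only):
-- Join table for profile "likes", with room for per-like metadata
CREATE TABLE profile_likes (
    pk       bigint GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    a        int NOT NULL,                        -- liking profile
    b        int NOT NULL,                        -- liked profile
    liked_at timestamptz NOT NULL DEFAULT now(),  -- example metadata
    CONSTRAINT profile_likes_a_b_uniq UNIQUE (a, b)
)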
In a single table, I have multiple rows with the same reference information (ID). For the same day, customers had drinks, and the Appreciation is either 1 (yes) or 0 (no).
Table
ID | DAY | Drink    | Appreciation
1  | 1   | Coffee   | 1
1  | 1   | Tea      | 0
1  | 1   | Soda     | 1
2  | 1   | Coffee   | 1
2  | 1   | Tea      | 1
3  | 1   | Coffee   | 0
3  | 1   | Tea      | 0
3  | 1   | Iced Tea | 1
I first tried to see who appreciated a certain drink, which is obviously very simple:
SELECT ID, MAX(appreciation)
FROM table
WHERE (day = 1 AND drink = 'Coffee' AND appreciation = 1)
   OR (day = 1 AND drink = 'Tea' AND appreciation = 1)
GROUP BY ID
Since I am not even interested in the drink, I used MAX to remove duplicates and keep only the row with the highest appreciation.
But what I want to do now is to see who in fact appreciated every drink they had. Again, I am not interested in every row in the end, only the ID and the appreciation. How can I modify my WHERE to have it apply to every single ID? Adding the ID to the condition is also not an option. I tried switching OR for AND, but it doesn't return any rows. How could I do this?
This should do the trick:
SELECT ID
FROM table
WHERE drink IN ('Coffee', 'Tea') -- or whatever other filter you want
GROUP BY ID
HAVING MIN(appreciation) > 0
What it does is:
It looks for the minimum appreciation and makes sure that it is bigger than 0 for all rows in the group. The group is the ID, as defined in the GROUP BY clause.
As you can see, I'm using the HAVING clause because you can't use aggregate functions in the WHERE clause.
Of course you can join other tables into the query as you like. Just be careful not to add an unwanted filter by joining, which might reduce your dataset in this query.
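If you literally want "appreciated every drink they had", regardless of which drinks those were, the same pattern works without the drink filter (a sketch against the table from the question):
SELECT ID
FROM table
WHERE day = 1
GROUP BY ID
HAVING MIN(appreciation) > 0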
For ten years we've been using the same custom sorting on our tables. I'm wondering if there is another solution which involves fewer updates, especially since today we'd like to have a replication/publication date and wouldn't like our replication to replicate unnecessary entries. I had a look into nested sets, but they don't seem to do the job for us.
Base table:
id | a_sort
---+-------
1  | 10
2  | 20
3  | 30
After inserting an entry meant for the second position:
insert into table (a_sort) values (15)
the table looks like this:
id | a_sort
---+-------
1  | 10
2  | 20
3  | 30
4  | 15
Ordering the table with:
select * from table order by a_sort
and resorting all the a_sort entries, updating at least id=(2,3,4)
will of course produce the desired output:
id | a_sort
---+-------
1  | 10
4  | 20
2  | 30
3  | 40
The column names, the column count, datatypes, a possible join, possible triggers or the way the resorting is done are irrelevant to the problem. Also, we've found some pretty neat ways to do this task fast.
The only question is: how the heck can we reduce the updates in the DB to 1 or 2 max?
Seems like an awfully common problem.
The captain obvious in me once thought: "use an a_sort float(53), insert using a fixed value of ordervaluefirstentry+abs(ordervaluefirstentry-ordervaluenextentry)/2".
But this would only allow around 1040 "in between" entries - so never resorting seems a bit problematic ;)
You really didn't describe what you're doing with this data, so forgive me if this is a crazy idea for your situation:
You could make a sort of 'linked list' where instead of a column of values, you have a column for the 'next highest valued' id. This would decrease the number of updates to a maximum of 2.
You can make it doubly linked and also have a column for next lowest, which would bring the maximum number of updates to 3.
See:
http://en.wikipedia.org/wiki/Linked_list
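A rough sketch of the singly linked variant (the items table and next_id column are placeholder names):
-- Each row points at the row that follows it in the custom order;
-- a NULL next_id marks the last entry
CREATE TABLE items (
    id      int PRIMARY KEY,
    next_id int NULL REFERENCES items (id)
);

-- Insert a new row (id 4) between id 1 and id 2:
-- one INSERT plus a single UPDATE of the predecessor
INSERT INTO items (id, next_id) VALUES (4, 2);
UPDATE items SET next_id = 4 WHERE id = 1;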