A) Suppose I have a table that I want to perform a DELETE on.
This is done in an MS Access 2003 SQL query. NOTE: there are many, many entries, in the few-hundred-thousand to million range, so ideally the code should deal well with a large dataset. There are only 3 types of mood.
DayNumber Mood
1 Mad
2 Sad
2 Happy
2 Sad
3 Sad
3 Happy
When there are several moods on one day, we only want to keep the most important one.
So let's have a DELETE that removes the duplicate days, deleting the less important moods first. The importance of moods is Happy > Mad > Sad. So I want:
DayNumber Mood
1 Mad
2 Happy
3 Happy
B) I first started with an easier version with only two moods instead of three, where Happy > Sad:
DayNumber Mood
1 Sad
2 Sad
2 Happy
3 Sad
3 Happy
What I would ideally get:
DayNumber Mood
1 Sad
2 Happy
3 Happy
It doesn't matter whether you solve the first example or the second; I'm stuck either way!
This is what I have for the second question so far, but it doesn't work because I have an aggregate function in the WHERE clause:
DELETE FROM Table
WHERE (Mood='Sad') and (COUNT(DayNumber)=2);
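The aggregate can be moved out of WHERE and into a grouped subquery, where HAVING is allowed. A minimal sketch, run against an in-memory SQLite database through Python's sqlite3 module rather than Access 2003 (an assumption; the table name moods and the helper delete_sad_on_busy_days are hypothetical). Caveat: it deletes every Sad row on any day that has more than one row, so a day holding only two Sad entries would disappear entirely.

```python
import sqlite3

# Example B from the question (Happy > Sad); "moods" is a hypothetical table name.
ROWS = [(1, "Sad"), (2, "Sad"), (2, "Happy"), (3, "Sad"), (3, "Happy")]

def delete_sad_on_busy_days(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE moods (DayNumber INTEGER, Mood TEXT)")
    con.executemany("INSERT INTO moods VALUES (?, ?)", rows)
    # The COUNT lives in a subquery, where GROUP BY/HAVING may use aggregates.
    con.execute("""
        DELETE FROM moods
        WHERE Mood = 'Sad'
          AND DayNumber IN (SELECT DayNumber
                            FROM moods
                            GROUP BY DayNumber
                            HAVING COUNT(*) > 1)
    """)
    result = con.execute(
        "SELECT DayNumber, Mood FROM moods ORDER BY DayNumber, Mood").fetchall()
    con.close()
    return result

print(delete_sad_on_busy_days(ROWS))  # [(1, 'Sad'), (2, 'Happy'), (3, 'Happy')]
```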
If you have a small & fixed number of moods, you can hardwire the hierarchy like so:
DELETE FROM Table a
WHERE
    (a.Mood = 'Sad'
     AND EXISTS
        (SELECT 1
         FROM Table b
         WHERE b.DayNumber = a.DayNumber
           AND b.Mood IN ('Happy', 'Mad')))
  OR
    (a.Mood = 'Mad'
     AND EXISTS
        (SELECT 1
         FROM Table c
         WHERE c.DayNumber = a.DayNumber
           AND c.Mood = 'Happy'));
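To sanity-check the hardwired hierarchy, here is a minimal sketch that runs the same logic on example A in an in-memory SQLite database via Python's sqlite3 (an assumption: this is not Access 2003, and the table name moods and function keep_most_important are hypothetical). The DELETE refers to the outer row as moods.DayNumber instead of through an alias, to stay within plain SQLite DELETE syntax. The index on DayNumber is an addition; it should let each EXISTS probe seek rather than scan, which matters at the row counts mentioned in the question.

```python
import sqlite3

# Example A from the question (Happy > Mad > Sad); "moods" is hypothetical.
ROWS = [(1, "Mad"), (2, "Sad"), (2, "Happy"), (2, "Sad"), (3, "Sad"), (3, "Happy")]

HIERARCHY_DELETE = """
DELETE FROM moods
WHERE (Mood = 'Sad'
       AND EXISTS (SELECT 1 FROM moods b
                   WHERE b.DayNumber = moods.DayNumber
                     AND b.Mood IN ('Happy', 'Mad')))
   OR (Mood = 'Mad'
       AND EXISTS (SELECT 1 FROM moods c
                   WHERE c.DayNumber = moods.DayNumber
                     AND c.Mood = 'Happy'))
"""

def keep_most_important(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE moods (DayNumber INTEGER, Mood TEXT)")
    con.executemany("INSERT INTO moods VALUES (?, ?)", rows)
    # Index so each correlated EXISTS can look up a day directly.
    con.execute("CREATE INDEX idx_moods_day ON moods (DayNumber)")
    con.execute(HIERARCHY_DELETE)
    result = con.execute(
        "SELECT DayNumber, Mood FROM moods ORDER BY DayNumber, Mood").fetchall()
    con.close()
    return result

print(keep_most_important(ROWS))  # [(1, 'Mad'), (2, 'Happy'), (3, 'Happy')]
```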
DELETE FROM Table WHERE Mood = 'Sad' AND DayNumber IN (SELECT DayNumber FROM Table WHERE Mood = 'Happy');
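A quick check of this two-mood version on example B, under the same kind of assumptions (in-memory SQLite via Python's sqlite3 instead of Access; moods and only_happy_wins are hypothetical names). Unlike a count-based delete, it only removes a Sad row when a Happy row exists on the same day, so a day with nothing but Sad entries is left intact.

```python
import sqlite3

# Example B from the question (Happy > Sad); "moods" is a hypothetical table name.
ROWS = [(1, "Sad"), (2, "Sad"), (2, "Happy"), (3, "Sad"), (3, "Happy")]

def only_happy_wins(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE moods (DayNumber INTEGER, Mood TEXT)")
    con.executemany("INSERT INTO moods VALUES (?, ?)", rows)
    # Remove a Sad row only if the same day also has a Happy row.
    con.execute("""
        DELETE FROM moods
        WHERE Mood = 'Sad'
          AND DayNumber IN (SELECT DayNumber FROM moods WHERE Mood = 'Happy')
    """)
    result = con.execute(
        "SELECT DayNumber, Mood FROM moods ORDER BY DayNumber, Mood").fetchall()
    con.close()
    return result

print(only_happy_wins(ROWS))  # [(1, 'Sad'), (2, 'Happy'), (3, 'Happy')]
```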
I am new to SAS (and PROC SQL), and I am working this out as an exercise to improve my familiarity with SAS, but I can't seem to get the correct solution.
I have resort data for two neighboring resorts that contains a guest identifier, a resort identifier, the date the guest was admitted to the resort, and the date they left. I have already sorted the data by guest identifier, admission date, and leave date. The data looks something like this:
ID Resort Admission_Date Leave_Date
1 B 15SEP2020 24SEP2020
1 A 24SEP2020 01OCT2020
1 B 25SEP2020 27SEP2020
1 B 28DEC2020 29DEC2020
2 B 07FEB2020 09FEB2020
2 A 09FEB2020 22FEB2020
3 B 26DEC2019 29DEC2019
3 B 30JAN2021 23FEB2021
3 A 23FEB2021 12MAR2021
3 B 13APR2021 16APR2021
3 B 05MAY2021 07MAY2021
My goal is to identify the guests who went from resort A to resort B (and vice versa). Some guests visited both resorts multiple times, so to avoid counting repeat visits I would like to summarize the data so that only the first "switch" between resorts remains. In other words, once a guest switches from resort A to B (or from B to A), we do not care if they go back to the first resort.
Thus, the end dataset should look something like this:
ID Resort Admission_Date Leave_Date
1 B 15SEP2020 24SEP2020
1 A 24SEP2020 01OCT2020
2 B 07FEB2020 09FEB2020
2 A 09FEB2020 22FEB2020
3 B 26DEC2019 29DEC2019
3 A 23FEB2021 12MAR2021
I realize this may have a simple solution, but I am not able to come up with it on my own at this time, so any help is greatly appreciated!
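One way to express the "first switch" rule in SQL: keep each guest's earliest stay, plus their earliest stay at the other resort. A minimal sketch run in SQLite through Python's sqlite3, not SAS (an assumption: the SAS DATE9. values are rewritten as ISO strings so text comparison orders them; in PROC SQL you would compare the SAS date values directly). PROC SQL does not, as far as I know, accept WITH clauses, so the two CTEs would need to become inline views or separate CREATE TABLE steps there. The names stays and first_switches are hypothetical, and guests who never switch are dropped, which is my reading of the stated goal.

```python
import sqlite3

# Sample data from the question, dates rewritten as ISO strings (assumption).
STAYS = [
    (1, "B", "2020-09-15", "2020-09-24"), (1, "A", "2020-09-24", "2020-10-01"),
    (1, "B", "2020-09-25", "2020-09-27"), (1, "B", "2020-12-28", "2020-12-29"),
    (2, "B", "2020-02-07", "2020-02-09"), (2, "A", "2020-02-09", "2020-02-22"),
    (3, "B", "2019-12-26", "2019-12-29"), (3, "B", "2021-01-30", "2021-02-23"),
    (3, "A", "2021-02-23", "2021-03-12"), (3, "B", "2021-04-13", "2021-04-16"),
    (3, "B", "2021-05-05", "2021-05-07"),
]

# Assumes at most one stay per guest per admission date.
FIRST_SWITCH_SQL = """
WITH first_stay AS (        -- each guest's earliest stay
    SELECT s.* FROM stays s
    WHERE s.Admission_Date = (SELECT MIN(t.Admission_Date)
                              FROM stays t WHERE t.ID = s.ID)
),
first_switch AS (           -- earliest stay at the other resort
    SELECT s.* FROM stays s
    JOIN first_stay f ON f.ID = s.ID
    WHERE s.Resort <> f.Resort
      AND s.Admission_Date = (SELECT MIN(t.Admission_Date) FROM stays t
                              WHERE t.ID = s.ID AND t.Resort <> f.Resort)
)
SELECT ID, Resort, Admission_Date, Leave_Date FROM first_stay
WHERE ID IN (SELECT ID FROM first_switch)   -- drop guests who never switched
UNION ALL
SELECT ID, Resort, Admission_Date, Leave_Date FROM first_switch
ORDER BY ID, Admission_Date
"""

def first_switches(stays):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE stays (ID INTEGER, Resort TEXT, "
                "Admission_Date TEXT, Leave_Date TEXT)")
    con.executemany("INSERT INTO stays VALUES (?, ?, ?, ?)", stays)
    result = con.execute(FIRST_SWITCH_SQL).fetchall()
    con.close()
    return result

for row in first_switches(STAYS):
    print(row)
```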
I'd like to know if there's an efficient way to count the number of occurrences of a given group (tuple) of entities from one side of an m:n relationship. Hopefully the following example illustrates what I mean:
Let's imagine a database with people and events of some sort. People can organize multiple events, and events can be organized by more than one person. What I'd like to count is whether a certain tuple of people has already organized an event together or whether it's their first time. My first idea is to add an attribute to the m:n relationship:
PeopleID | EventID | TimesOrganized
100 1 1
200 1 1
300 2 1
400 3 1
Now, there's an event no. 4 that's again organized by persons 200 and 100 (let's say they should be added in that order). The new table should look like:
PeopleID | EventID | TimesOrganized
100 1 2
200 1 2
300 2 1
400 3 1
200 4 2
100 4 2
Now, if I added an event organized by persons 200 and 300 it would look like this:
PeopleID | EventID | TimesOrganized
100 1 2
200 1 2
300 2 1
400 3 1
200 4 2
100 4 2
200 5 1
300 5 1
How would I go about keeping the third column updated properly, and what are my options?
I should also add that this is part of a larger project for one of our classes, and we'll be implementing an application that uses the database in some way, so I might as well move this into application logic if there's no easy way.
I wouldn't recommend tracking a TimesOrganized column as you suggest.
You can simply query it as needed, using COUNT(EventID) with GROUP BY PeopleID.
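A minimal sketch of that on-demand count, using the (PeopleID, EventID) pairs from the question's last table in an in-memory SQLite database via Python's sqlite3 (the table name organizers and the function times_organized are hypothetical). It also reproduces the per-person totals: 200 has organized 3 events and 300 has organized 2.

```python
import sqlite3

# (PeopleID, EventID) pairs from the question; "organizers" is hypothetical.
PAIRS = [(100, 1), (200, 1), (300, 2), (400, 3),
         (200, 4), (100, 4), (200, 5), (300, 5)]

def times_organized(pairs):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE organizers (PeopleID INTEGER, EventID INTEGER, "
                "PRIMARY KEY (PeopleID, EventID))")
    con.executemany("INSERT INTO organizers VALUES (?, ?)", pairs)
    # Computed from the link table each time; nothing to keep in sync.
    rows = con.execute("""
        SELECT PeopleID, COUNT(EventID) AS TimesOrganized
        FROM organizers
        GROUP BY PeopleID
        ORDER BY PeopleID
    """).fetchall()
    con.close()
    return rows

print(times_organized(PAIRS))  # [(100, 2), (200, 3), (300, 2), (400, 1)]
```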
If you do feel you need to maintain the value somewhere, it is probably better normalized into the (presumed) People table, as something like People.TimesOrganized. But then you have to increment it as you go instead of just recalculating it as needed.
If you want to count how many times someone has organized an event, the problem is not m:n but 1:m. Just count the events grouped by person; that's it. You don't really need that column in the table unless you need the value very often.
That said, I find your table a little confusing: detail and aggregation are mixed, and the third one is downright wrong: PeopleID 200 has organized 3 events and PeopleID 300 has organized 2.
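If the number needed is per group of co-organizers (the question's TimesOrganized) rather than per person, the same compute-on-demand idea still applies: count the events whose organizer set is exactly that group. A minimal sketch in SQLite via Python's sqlite3, assuming "exactly that group" is the intended meaning and that (PeopleID, EventID) pairs are unique; organizers and times_group_organized are hypothetical names.

```python
import sqlite3

PAIRS = [(100, 1), (200, 1), (300, 2), (400, 3),
         (200, 4), (100, 4), (200, 5), (300, 5)]

def times_group_organized(pairs, people):
    """Count events whose organizers are exactly the given set of people."""
    group = sorted(set(people))
    placeholders = ", ".join("?" for _ in group)
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE organizers (PeopleID INTEGER, EventID INTEGER, "
                "PRIMARY KEY (PeopleID, EventID))")
    con.executemany("INSERT INTO organizers VALUES (?, ?)", pairs)
    query = f"""
        SELECT COUNT(*)
        FROM (SELECT EventID
              FROM organizers
              GROUP BY EventID
              HAVING COUNT(*) = ?   -- same number of organizers as the group
                 AND SUM(CASE WHEN PeopleID IN ({placeholders})
                              THEN 1 ELSE 0 END) = ?) AS grp  -- all in the group
    """
    (count,) = con.execute(query, [len(group), *group, len(group)]).fetchone()
    con.close()
    return count

print(times_group_organized(PAIRS, [200, 100]))  # 2 (events 1 and 4)
print(times_group_organized(PAIRS, [200, 300]))  # 1 (event 5)
```

Because the count is derived from the link table, adding event 5 needs no updates to existing rows.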
I have a couple of tables, which I have simplified to the extreme for the purpose of this example.
Table 1, 'Units'
ID UnitName PartNumber
1 UnitX 1
2 UnitX 1
3 UnitX 2
4 UnitX 3
5 UnitX 3
6 UnitY 1
7 UnitY 2
8 UnitY 3
Table 2, 'Parts'
ID PartName Quantity
1 Screw 2
2 Wire 1
3 Ducttape 1
I would like to query these tables to find which of these units could be built and, if any, which one should ideally be built first to make efficient use of the parts.
Now the question is: can this be done using SQL, or is a background application required/more efficient?
In this example it is easy, because only one unit (UnitY) can be built, but I guess you get the idea. (I'm not looking for a shake-and-bake answer, just your thoughts on this.)
Thanks
As you present it, SQL is an efficient tool for this. As you describe it, the PartNumber column of the Units table is a foreign key to the ID column of the Parts table, so a simple outer join, or selecting the units whose PartNumber is NOT IN the Parts table, would give you the units that cannot be built.
However, if your database schema consists of many non-normalised tables, is very complex and lacks indexes, or has other "bad" properties, it could be worth examining whether dedicated application code is faster. But I really doubt it for this particular case; the query seems trivial.
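A NOT IN check only catches parts that are missing entirely; with the sample data every part exists, and UnitX fails because it needs 2 Ducttape while only 1 is in stock. A minimal quantity-aware sketch in SQLite via Python's sqlite3, assuming each Units row means one copy of that part is needed. It checks each unit on its own and does not decide which unit to build first; buildable_units is a hypothetical name.

```python
import sqlite3

UNITS = [(1, "UnitX", 1), (2, "UnitX", 1), (3, "UnitX", 2), (4, "UnitX", 3),
         (5, "UnitX", 3), (6, "UnitY", 1), (7, "UnitY", 2), (8, "UnitY", 3)]
PARTS = [(1, "Screw", 2), (2, "Wire", 1), (3, "Ducttape", 1)]

def buildable_units(units, parts):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Units (ID INTEGER PRIMARY KEY, UnitName TEXT, "
                "PartNumber INTEGER)")
    con.execute("CREATE TABLE Parts (ID INTEGER PRIMARY KEY, PartName TEXT, "
                "Quantity INTEGER)")
    con.executemany("INSERT INTO Units VALUES (?, ?, ?)", units)
    con.executemany("INSERT INTO Parts VALUES (?, ?, ?)", parts)
    rows = con.execute("""
        SELECT UnitName
        FROM (SELECT u.UnitName,
                     COUNT(*) AS needed,                      -- copies required
                     COALESCE(MAX(p.Quantity), 0) AS on_hand  -- 0 if part missing
              FROM Units u
              LEFT JOIN Parts p ON p.ID = u.PartNumber
              GROUP BY u.UnitName, u.PartNumber) AS req
        GROUP BY UnitName
        HAVING SUM(CASE WHEN needed > on_hand THEN 1 ELSE 0 END) = 0
        ORDER BY UnitName
    """).fetchall()
    con.close()
    return [name for (name,) in rows]

print(buildable_units(UNITS, PARTS))  # ['UnitY']
```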
In a single table, I have multiple rows with the same reference information (ID). On the same day, customers had drinks, and Appreciation is either 1 (yes) or 0 (no).
Table
ID DAY Drink Appreciation
1 1 Coffee 1
1 1 Tea 0
1 1 Soda 1
2 1 Coffee 1
2 1 Tea 1
3 1 Coffee 0
3 1 Tea 0
3 1 Iced Tea 1
I first tried to see who appreciated a certain drink, which is obviously very simple:
SELECT ID, MAX(Appreciation)
FROM table
WHERE (Day = 1 AND Drink = 'coffee' AND Appreciation = 1)
   OR (Day = 1 AND Drink = 'tea' AND Appreciation = 1)
GROUP BY ID;
Since I am not even interested in the drink, I used MAX to remove duplicates and keep only the row with the highest appreciation.
But what I want to do now is see who in fact appreciated every drink they had. Again, in the end I am not interested in every row, only the ID and the appreciation. How can I modify my WHERE clause to apply this to every single ID? Adding the ID to the condition is not an option. I tried switching OR for AND, but then no rows are returned. How could I do this?
This should do the trick:
SELECT ID
FROM table
WHERE Drink IN ('coffee', 'tea') -- or whatever other filter you want
GROUP BY ID
HAVING MIN(Appreciation) > 0
What it does:
It finds the minimum appreciation in each group and checks that it is greater than 0, which means every row in the group has appreciation 1. The group is the ID, as defined in the GROUP BY clause.
As you can see, I'm using the HAVING clause, because you can't use aggregate functions in the WHERE clause.
Of course, you can join other tables into the query as you like. Just be careful not to introduce an unwanted filter through a join, which might reduce the dataset this query sees.
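A minimal run of the HAVING MIN(...) idea on the sample data, in SQLite via Python's sqlite3 (drinks and liked_everything are hypothetical names, since "table" is a reserved word). The drink filter is left out because the goal is "every drink they had". Note that SQLite compares text case-sensitively by default, so a filter like Drink IN ('coffee', 'tea') would not match 'Coffee' there; whether it matches elsewhere depends on the database's collation.

```python
import sqlite3

ROWS = [(1, 1, "Coffee", 1), (1, 1, "Tea", 0), (1, 1, "Soda", 1),
        (2, 1, "Coffee", 1), (2, 1, "Tea", 1),
        (3, 1, "Coffee", 0), (3, 1, "Tea", 0), (3, 1, "Iced Tea", 1)]

def liked_everything(rows, day):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE drinks (ID INTEGER, Day INTEGER, "
                "Drink TEXT, Appreciation INTEGER)")
    con.executemany("INSERT INTO drinks VALUES (?, ?, ?, ?)", rows)
    # Every drink the customer had that day must have Appreciation = 1.
    ids = con.execute("""
        SELECT ID
        FROM drinks
        WHERE Day = ?
        GROUP BY ID
        HAVING MIN(Appreciation) > 0
        ORDER BY ID
    """, (day,)).fetchall()
    con.close()
    return [i for (i,) in ids]

print(liked_everything(ROWS, 1))  # [2]
```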
I am using SQL Server 2005.
I have a site where people can vote on awesome motorcycles. Each time a user votes, one vote is recorded for the first bike and one vote against the second bike, so two votes are stored in the database. The vote table looks like this:
VoteID VoteDate BikeID Vote
1 2012-01-12 123 1
2 2012-01-12 125 0
3 2012-01-12 126 0
4 2012-01-12 129 1
I want to tally the votes for each bike fairly frequently, say every hour. My idea is to store the tally, the percentage of all its contests that a bike won, as an attribute on the bike table. So if a bike won 10 contests and lost 20, it would have a score (tally) of 33. I would tally daily, weekly, and monthly scores.
BikeID BikeName DailyTally WeeklyTally MonthlyTally
1 Big Dog 5 10 50
2 Big Cat 3 15 40
3 Small Dog 9 8 0
4 Fish Face 19 21 0
Right now about 500 votes are cast per day. We anticipate 2,500 to 5,000 per day in the next month or so.
What is the best way to tally the data and what is the best way to store it? Should the tallies be on their own table? Should a trigger be used to run a new tally each time a bike is voted on? Should a stored procedure be run hourly to get all tallies?
Any ideas would be very helpful!
Store your VoteDate as a datetime value instead of just a date.
For your tallies, you can just make a view and calculate them on the fly. This should be very simple to do using GROUP BY and the DATEPART function. If you need exact code for how to do this, please open a new question.
With such a low volume of rows, it doesn't make sense to store aggregations in a table when you can calculate them whenever you want to see them and get accurate, up-to-date results.
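An on-the-fly tally along those lines might look like the sketch below, shown as Python driving an in-memory SQLite database. Assumptions: SQLite has no DATEPART, so the time window is passed in as a cutoff string (in SQL Server 2005 it could come from DATEADD applied to GETDATE()), and since a view cannot take a parameter this is a parameterized query, which could live in a stored procedure instead. The vote rows and the helpers make_votes and tally_since are hypothetical; bike 1 reproduces the question's example, 100 * 10 / 30 = 33.

```python
import sqlite3

def make_votes():
    """Hypothetical votes: bike 1 wins 10, loses 20 on 2012-01-12;
    bike 2 wins 3, loses 1 that day, plus 5 older losses."""
    outcomes = [(1, 1)] * 10 + [(1, 0)] * 20 + [(2, 1)] * 3 + [(2, 0)] * 1
    votes = [("2012-01-12 10:00:00", bike, vote) for bike, vote in outcomes]
    votes += [("2012-01-01 09:00:00", 2, 0)] * 5
    return [(i, d, b, v) for i, (d, b, v) in enumerate(votes, start=1)]

def tally_since(votes, cutoff):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE votes (VoteID INTEGER PRIMARY KEY, "
                "VoteDate TEXT, BikeID INTEGER, Vote INTEGER)")
    con.executemany("INSERT INTO votes VALUES (?, ?, ?, ?)", votes)
    rows = con.execute("""
        SELECT BikeID,
               100 * SUM(Vote) / COUNT(*) AS Tally  -- integer division truncates
        FROM votes
        WHERE VoteDate >= ?
        GROUP BY BikeID
        ORDER BY BikeID
    """, (cutoff,)).fetchall()
    con.close()
    return rows

votes = make_votes()
print(tally_since(votes, "2012-01-12 00:00:00"))  # [(1, 33), (2, 75)]
print(tally_since(votes, "2012-01-01 00:00:00"))  # [(1, 33), (2, 33)]
```

If rounding rather than truncation is wanted, multiply by 100.0 and round the result.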
I agree with @JNK: try a view or just a normal stored procedure to calculate the outputs on the fly. If you find it becomes too slow as your data grows, investigate other routes then (like caching the data in another table). It's probably worth keeping it simple to start with; you can always reuse the logic from the SP/view later if you do want to set up a scheduled task.
Edit:
Removed the indexed view, as per @Damien_The_Unbeliever's comments: it's not deterministic and I'm stupid :)