I am new to SAS (and Proc SQL) and I am working this out as an exercise to improve my familiarity with SAS, but can't seem to get the correct solution.
I have resort data for two neighboring resorts that contains a guest identifier, resort identifier, when the person was admitted into the resort, and when they left. I have already sorted the data by guest identifier, admission date, and leave date. The data looks something like this:
ID Resort Admission_Date Leave_Date
1 B 15SEP2020 24SEP2020
1 A 24SEP2020 01OCT2020
1 B 25SEP2020 27SEP2020
1 B 28DEC2020 29DEC2020
2 B 07FEB2020 09FEB2020
2 A 09FEB2020 22FEB2020
3 B 26DEC2019 29DEC2019
3 B 30JAN2021 23FEB2021
3 A 23FEB2021 12MAR2021
3 B 13APR2021 16APR2021
3 B 05MAY2021 07MAY2021
My goal here is to identify those guests that went from resort A to resort B (and vice versa). I realize that some guests visited both resorts multiple times. To avoid this issue of multiple resort visits, I would like to summarize the data so that we only have the first "switch" between resorts. In other words, once a guest switches from resort A to B (or from B to A), we do not care if they go back to the first resort.
Thus, the end dataset should look something like this:
ID Resort Admission_Date Leave_Date
1 B 15SEP2020 24SEP2020
1 A 24SEP2020 01OCT2020
2 B 07FEB2020 09FEB2020
2 A 09FEB2020 22FEB2020
3 B 26DEC2019 29DEC2019
3 A 23FEB2021 12MAR2021
I realize that this may have a simple solution, but I am not able to come up with it on my own at this time so any help on this is greatly appreciated!
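One possible way to express the "first switch" logic in plain SQL is sketched below. It is run through Python's built-in sqlite3 module rather than SAS, so it only illustrates the query shape. The table name stays, the lowercase column names, and the ISO date strings are assumptions made for the sketch, and it also assumes a guest never has two stays with the same admission date. Guests who never change resort are left out.

```python
import sqlite3

# Hypothetical table and column names; dates are stored as ISO strings
# (not SAS DATE9 values) so that plain string comparison orders them.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stays (id INTEGER, resort TEXT, admission_date TEXT, leave_date TEXT)"
)
conn.executemany(
    "INSERT INTO stays VALUES (?, ?, ?, ?)",
    [
        (1, "B", "2020-09-15", "2020-09-24"),
        (1, "A", "2020-09-24", "2020-10-01"),
        (1, "B", "2020-09-25", "2020-09-27"),
        (1, "B", "2020-12-28", "2020-12-29"),
        (2, "B", "2020-02-07", "2020-02-09"),
        (2, "A", "2020-02-09", "2020-02-22"),
        (3, "B", "2019-12-26", "2019-12-29"),
        (3, "B", "2021-01-30", "2021-02-23"),
        (3, "A", "2021-02-23", "2021-03-12"),
        (3, "B", "2021-04-13", "2021-04-16"),
        (3, "B", "2021-05-05", "2021-05-07"),
    ],
)

FIRST_SWITCH_SQL = """
WITH first_stay AS (
    -- earliest stay of each guest
    SELECT s.id, s.resort, s.admission_date
    FROM stays s
    WHERE s.admission_date = (SELECT MIN(t.admission_date)
                              FROM stays t
                              WHERE t.id = s.id)
),
first_other AS (
    -- earliest stay at a resort different from the first one
    SELECT s.id, MIN(s.admission_date) AS admission_date
    FROM stays s
    JOIN first_stay f ON f.id = s.id
    WHERE s.resort <> f.resort
    GROUP BY s.id
)
SELECT s.id, s.resort, s.admission_date, s.leave_date
FROM stays s
JOIN first_stay f ON f.id = s.id
JOIN first_other o ON o.id = s.id   -- inner join drops guests who never switch
WHERE (s.admission_date = f.admission_date AND s.resort = f.resort)
   OR (s.admission_date = o.admission_date AND s.resort <> f.resort)
ORDER BY s.id, s.admission_date
"""


def first_switch(connection):
    """Return each switching guest's first stay and first stay at the other resort."""
    return connection.execute(FIRST_SWITCH_SQL).fetchall()


for row in first_switch(conn):
    print(row)
```

The same two-step idea (find the first stay, then the first stay at a different resort) should carry over to Proc SQL, though the date handling there would use SAS date values instead of strings.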
This has been driving me and my team up the wall. I cannot compose a query that will strictly match a single record that has a specific permutation of lookups.
We have a single lookup table
room_member_lookup:
room | member
---------------
A | Michael
A | Josh
A | Kyle
B | Kyle
B | Monica
C | Michael
I need to match a room with an exact list of members, but everything else I've tried from Stack Overflow will still match room A even if I ask for a room with ONLY Josh and Kyle.
I've tried queries like
SELECT room FROM room_member_lookup
WHERE member IN ('Josh', 'Michael')
GROUP BY room
HAVING COUNT(1) = 2
However, this will still return room A even though it has 3 members. I need an exact match on the member set: a room should match only if its members are exactly the ones listed, with no partial matches.
SELECT room
FROM room_member_lookup a
WHERE member IN ('Monica', 'Kyle')
  -- Make sure that the room 'a' has exactly two members
  AND (SELECT COUNT(*)
       FROM room_member_lookup b
       WHERE a.room = b.room) = 2
GROUP BY room
-- and both members are in that room
HAVING COUNT(1) = 2
Depending on the SQL dialect, one can build a dynamic table (a CTE or select .. union all) to hold the member set (Monica and Kyle, for example), and then look for set equivalence using the MINUS/EXCEPT SQL operators.
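A minimal sketch of that set-equivalence idea follows, using Python's built-in sqlite3 module so it can be run as-is. The helper name rooms_with_exactly and the CTE name wanted are made up for the illustration. A room qualifies only when both differences (wanted EXCEPT room members, and room members EXCEPT wanted) are empty; other dialects may spell EXCEPT as MINUS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE room_member_lookup (room TEXT, member TEXT)")
conn.executemany(
    "INSERT INTO room_member_lookup VALUES (?, ?)",
    [
        ("A", "Michael"),
        ("A", "Josh"),
        ("A", "Kyle"),
        ("B", "Kyle"),
        ("B", "Monica"),
        ("C", "Michael"),
    ],
)


def rooms_with_exactly(connection, members):
    """Return rooms whose member set equals `members` (hypothetical helper)."""
    members = list(members)
    if not members:
        raise ValueError("members must not be empty")
    # One "(?)" row per wanted member, bound as parameters.
    placeholders = ", ".join("(?)" for _ in members)
    sql = f"""
    WITH wanted(member) AS (VALUES {placeholders})
    SELECT DISTINCT r.room
    FROM room_member_lookup r
    WHERE NOT EXISTS (            -- no wanted member is missing from the room
              SELECT member FROM wanted
              EXCEPT
              SELECT member FROM room_member_lookup WHERE room = r.room
          )
      AND NOT EXISTS (            -- the room has no member outside the wanted set
              SELECT member FROM room_member_lookup WHERE room = r.room
              EXCEPT
              SELECT member FROM wanted
          )
    ORDER BY r.room
    """
    return [row[0] for row in connection.execute(sql, members)]


print(rooms_with_exactly(conn, ["Monica", "Kyle"]))
print(rooms_with_exactly(conn, ["Josh", "Kyle"]))
```

With the sample data, asking for Josh and Kyle should return nothing, because room A also contains Michael.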
The situation: I have a database storing biological specimen data. One table contains data about each specimen. Each specimen has between 1 and 8 parts, which are ordered.
I would like to enumerate each subpart in a query, using the specimen id and the number of parts. So if I have 2 specimens, A and B, and A has 2 parts and B has 3 parts, I want the result:
Parts:
A - 1
A - 2
B - 1
B - 2
B - 3
I realize that this is probably a trivial task, but I don't know the correct terminology to talk about it in a way that help pages and Google will understand. Thank you.
Edit to add thoughts: If I were dealing with something like this in a non-SQL context, I'd use a for loop to iterate the enumeration process over each specimen, but I don't understand how to implement anything remotely similar in SQL.
You mentioned "main table" which implies there's some other table for the sub parts. What you're after is likely a simple JOIN:
SELECT
*
FROM
maintable
INNER JOIN
subtable
ON
subtable.mainid = maintable.id
If you want an exact query, post a screenshot of your database tables, their column names, and any relationships.
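If there is no separate table for the parts and only a part count is stored per specimen, another option is to join against a small generated list of numbers. The sketch below assumes a hypothetical specimen table with id and part_count columns (neither name comes from the question) and uses a recursive CTE in SQLite via Python's sqlite3 module; other databases offer similar constructs.

```python
import sqlite3

# Hypothetical schema: one row per specimen with the number of parts it has.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE specimen (id TEXT PRIMARY KEY, part_count INTEGER)")
conn.executemany("INSERT INTO specimen VALUES (?, ?)", [("A", 2), ("B", 3)])

ENUMERATE_PARTS_SQL = """
WITH RECURSIVE numbers(n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM numbers WHERE n < 8   -- a specimen has at most 8 parts
)
SELECT s.id, numbers.n
FROM specimen s
JOIN numbers ON numbers.n <= s.part_count   -- keep 1..part_count per specimen
ORDER BY s.id, numbers.n
"""

parts = conn.execute(ENUMERATE_PARTS_SQL).fetchall()
for specimen_id, part_number in parts:
    print(f"{specimen_id} - {part_number}")
```

The join condition plays the role of the for loop: each specimen is paired with every number up to its own part count.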
When we store a one-to-many association in a database, which is the better approach: a one-to-many mapping in a table, or storing the "many" part as an array? I am constrained to the Postgres database.
For example: If we define the relationship as follows
a b
1 - 2
1 - 3
1 - 6
2 - 3
2 - 4
3 - 5
3 - 6
Here, the one part is a and the many part is b (Primary key being a, b)
The same thing can be stored as an array (similar to an adjacency list):
1 - {2,3,6}
2 - {3,4}
3 - {5,6}
Which of these is more efficient? I may have to do some operations on this, such as transitive closure, and the graph may be really huge.
A practical example of the above may be something like connections of a particular profile (LinkedIn connections), or any social graph scenario
In your example the relationship is many to many, not one to many. Multiple a records can be associated with one b and multiple b records can be associated with one a. As such, the correct normalized form is a join table.
Hypothetically, imagine this DB relationship represents one profile "liking" another profile in a social media context. In that case you may want to store additional information: a timestamp of when the "like" was initiated, the degree to which the profile shrugged/liked/loved the other profile, etc. It then becomes apparent that in the array implementation there is nowhere to store this additional data. You need a join table so that each "like" can have its own metadata.
Here is the structure I would recommend:
PK A B
100 1 - 2
200 1 - 3
300 1 - 6
400 2 - 3
500 2 - 4
600 3 - 5
700 3 - 6
Where PK is an auto generated PK, hopefully from a sequence, and A, B are constrained by a unique index. This structure is future proof for eventually dropping the unique index on A, B, a headache I've had to deal with occasionally.
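As a rough illustration of that layout, the sketch below builds the join table with a surrogate key and a unique constraint on (a, b), then walks it with a recursive CTE to get everything reachable from one node (the transitive closure mentioned in the question). It runs on SQLite through Python's sqlite3 module purely so it is self-contained; the table name edge and the helper reachable_from are made up for the example, and in Postgres the key would typically come from a sequence or identity column instead of AUTOINCREMENT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE edge (
        pk INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        a  INTEGER NOT NULL,
        b  INTEGER NOT NULL,
        UNIQUE (a, b)                          -- at most one row per pair
    )
    """
)
conn.executemany(
    "INSERT INTO edge (a, b) VALUES (?, ?)",
    [(1, 2), (1, 3), (1, 6), (2, 3), (2, 4), (3, 5), (3, 6)],
)

REACHABLE_SQL = """
WITH RECURSIVE reachable(node) AS (
    SELECT b FROM edge WHERE a = ?
    UNION                          -- UNION (not UNION ALL) drops repeats, so cycles terminate
    SELECT e.b
    FROM edge e
    JOIN reachable r ON e.a = r.node
)
SELECT node FROM reachable ORDER BY node
"""


def reachable_from(connection, start):
    """Return every node reachable from `start` by following a -> b rows."""
    return [row[0] for row in connection.execute(REACHABLE_SQL, (start,))]


print(reachable_from(conn, 1))
```

Whether this outperforms an array column on a very large graph is something to measure on real data; the sketch only shows that the join-table form supports the traversal directly.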
I have a couple of tables, which I have simplified to the extreme for the purpose of this example.
Table 1, 'Units'
ID UnitName PartNumber
1 UnitX 1
2 UnitX 1
3 UnitX 2
4 UnitX 3
5 UnitX 3
6 UnitY 1
7 UnitY 2
8 UnitY 3
Table 2, 'Parts'
ID PartName Quantity
1 Screw 2
2 Wire 1
3 Ducttape 1
I would like to query on these tables which of these units would be Possible to build, AND if so, which one could be built first ideally to make efficient use of these parts.
Now the question is: can this be done using SQL, or is a background application required/more efficient?
So in this example it is easy, because only one unit (unit Y) can be built. But I guess you get the idea. (I'm not looking for a shake-and-bake answer, just your thoughts on this.)
Thanks
As you present it, SQL is an efficient choice. As you described, the PartNumber column of the Units table is a foreign key to the ID column of the Parts table, so a simple outer join, or selecting the units whose PartNumber is "NOT IN" the Parts table, would give you the units that cannot be built.
However, if your DB schema consists of many non-normalised tables, or is very complex, lacks indexes, or has other "bad" things, it could be worth examining whether specific application code is faster. But I really doubt it for this particular case; the query seems trivial.
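To make the buildability check concrete, here is a small sketch run on SQLite through Python's sqlite3 module. It rests on one reading of the example data that the question does not state outright: each row in Units is taken to mean "this unit needs one piece of that part", so UnitX needs two screws, one wire, and two rolls of duct tape. Under that assumption a unit is buildable when no required part exceeds the quantity in stock.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Units (ID INTEGER, UnitName TEXT, PartNumber INTEGER)")
conn.execute("CREATE TABLE Parts (ID INTEGER, PartName TEXT, Quantity INTEGER)")
conn.executemany(
    "INSERT INTO Units VALUES (?, ?, ?)",
    [
        (1, "UnitX", 1), (2, "UnitX", 1), (3, "UnitX", 2),
        (4, "UnitX", 3), (5, "UnitX", 3),
        (6, "UnitY", 1), (7, "UnitY", 2), (8, "UnitY", 3),
    ],
)
conn.executemany(
    "INSERT INTO Parts VALUES (?, ?, ?)",
    [(1, "Screw", 2), (2, "Wire", 1), (3, "Ducttape", 1)],
)

BUILDABLE_SQL = """
WITH needed AS (
    -- assumption: one Units row = one required piece of that part
    SELECT UnitName, PartNumber, COUNT(*) AS qty_needed
    FROM Units
    GROUP BY UnitName, PartNumber
)
SELECT n.UnitName
FROM needed n
LEFT JOIN Parts p ON p.ID = n.PartNumber
GROUP BY n.UnitName
-- buildable when no required part is short (a missing part counts as 0 in stock)
HAVING SUM(CASE WHEN COALESCE(p.Quantity, 0) < n.qty_needed THEN 1 ELSE 0 END) = 0
ORDER BY n.UnitName
"""

buildable = [row[0] for row in conn.execute(BUILDABLE_SQL)]
print(buildable)
```

This only answers which units can be built on their own; choosing a build order that makes the best use of shared parts is a separate allocation problem that this query does not attempt.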
A) Suppose I have a table that I want to perform a DELETE on.
This is done in an MS Access 2003 SQL query. NOTE: there are a great many entries, in the range of a few hundred thousand to millions, so hopefully the code can deal well with a large dataset. There are only 3 types of mood.
DayNumber Mood
1 Mad
2 Sad
2 Happy
2 Sad
3 Sad
3 Happy
When there are several moods in one day, we only want to keep the most important one.
So I want a DELETE that removes the duplicates for each day, deleting the less important moods first. The importance of moods is Happy > Mad > Sad. So I want:
DayNumber Mood
1 Mad
2 Happy
3 Happy
B) I first started with an easier case with just two options for mood instead of three, where Happy > Sad:
DayNumber Mood
1 Sad
2 Sad
2 Happy
3 Sad
3 Happy
Where I will ideally get
DayNumber Mood
1 Sad
2 Happy
3 Happy
It doesn't matter to me whether you do the first example or the second; I'm stuck either way!
This is what I have for the second question so far, but it doesn't work because I have an aggregate function in the WHERE clause.
DELETE FROM Table
WHERE (Mood='Sad') and (COUNT(DayNumber)=2);
If you have a small & fixed number of moods, you can hardwire the hierarchy like so:
DELETE FROM Table a
WHERE
  (a.Mood = 'Sad'
   AND EXISTS
     (SELECT 1
      FROM Table b
      WHERE b.DayNumber = a.DayNumber
        AND b.Mood IN ('Happy', 'Mad')))
  OR
  (a.Mood = 'Mad'
   AND EXISTS
     (SELECT 1
      FROM Table c
      WHERE c.DayNumber = a.DayNumber
        AND c.Mood = 'Happy'))
DELETE FROM Table where Mood='Sad' AND DayNumber IN (SELECT DayNumber FROM Table WHERE Mood = 'Happy')
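If the list of moods might grow, the hardwired conditions could be replaced by a small ranking table. The sketch below shows that variant on SQLite through Python's sqlite3 module, not on Access 2003, so the exact DELETE syntax may need adjusting there. The table names moods and mood_rank and the Importance column are invented for the illustration. A row is deleted whenever the same day has another row with a higher-ranked mood.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE moods (DayNumber INTEGER, Mood TEXT)")
# Hypothetical ranking table: a larger Importance wins (Happy > Mad > Sad).
conn.execute("CREATE TABLE mood_rank (Mood TEXT PRIMARY KEY, Importance INTEGER)")
conn.executemany(
    "INSERT INTO mood_rank VALUES (?, ?)",
    [("Happy", 3), ("Mad", 2), ("Sad", 1)],
)
conn.executemany(
    "INSERT INTO moods VALUES (?, ?)",
    [(1, "Mad"), (2, "Sad"), (2, "Happy"), (2, "Sad"), (3, "Sad"), (3, "Happy")],
)

DELETE_LESS_IMPORTANT_SQL = """
DELETE FROM moods
WHERE EXISTS (
    SELECT 1
    FROM moods other
    JOIN mood_rank r_other ON r_other.Mood = other.Mood
    JOIN mood_rank r_self  ON r_self.Mood  = moods.Mood   -- rank of the row being tested
    WHERE other.DayNumber = moods.DayNumber
      AND r_other.Importance > r_self.Importance
)
"""

conn.execute(DELETE_LESS_IMPORTANT_SQL)
remaining = conn.execute(
    "SELECT DayNumber, Mood FROM moods ORDER BY DayNumber"
).fetchall()
print(remaining)
```

Note that this removes only the less important moods. If the most important mood itself appears twice on one day, both copies stay, and an index on DayNumber would likely matter for tables with millions of rows.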