I am new to SAS (and Proc SQL) and I am working this out as an exercise to improve my familiarity with SAS, but can't seem to get the correct solution.
I have resort data for two neighboring resorts that contains a guest identifier, resort identifier, when the person was admitted into the resort, and when they left. I have already sorted the data by guest identifier, admission date, and leave date. The data looks something like this:
ID Resort Admission_Date Leave_Date
1 B 15SEP2020 24SEP2020
1 A 24SEP2020 01OCT2020
1 B 25SEP2020 27SEP2020
1 B 28DEC2020 29DEC2020
2 B 07FEB2020 09FEB2020
2 A 09FEB2020 22FEB2020
3 B 26DEC2019 29DEC2019
3 B 30JAN2021 23FEB2021
3 A 23FEB2021 12MAR2021
3 B 13APR2021 16APR2021
3 B 05MAY2021 07MAY2021
My goal here is to identify those guests that went from resort A to resort B (and vice versa). I realize that some guests visited both resorts multiple times. To avoid this issue of multiple resort visits I would like to summarize the data so that we only have the first "switch" between hotels. In other words, once a guest switches from resort A to B (or from B to A) we do not care if they go back to the first resort.
Thus, the end dataset should look something like this:
ID Resort Admission_Date Leave_Date
1 B 15SEP2020 24SEP2020
1 A 24SEP2020 01OCT2020
2 B 07FEB2020 09FEB2020
2 A 09FEB2020 22FEB2020
3 B 26DEC2019 29DEC2019
3 A 23FEB2021 12MAR2021
I realize that this may have a simple solution, but I am not able to come up with it on my own at this time so any help on this is greatly appreciated!
I have 2 tables: Customers and Actions, where each customer has uniqe ID (which can be found in each table).
Part of the customers became club members at a specific date (change between the customers). I'm trying to summarize their purchases until that date, and to get those who purchase more than (for example) 200 until they become club members.
For example, I can have the following customer:
custID purchDate purchAmount
1 2015-05-12 100
1 2015-07-12 150
1 2015-12-29 320
Now, assume that custID=1 became a club member at 2015-12-25; in that case, I'd like to get SUM(purchAmount)=250 (pay attention that I'd like to get this customer because 250>200).
I tried the following:
SELECT cust.custID, SUM(purchAmount)totAmount
FROM customers cust
JOIN actions act
ON cust.custID=act.custID
WHERE act.clubMember=1
AND cust.purchDate<act.clubMemberDate
GROUP BY cust.custID
HAVING totAmount>200;
Is it the right way to "attack" this question, or should I use something like while loop over the clubMemberDate (which telling the truth-I don't know how to do)?
I'm working with Teradata.
Your help will be appreciated.
I am new to SSAS and have a situation where I need help.
I have a many to many relationship table which contains info about the competitors of a property and the type of competitor it is at any given date.
So, something like this:
PID Type CompID Date
1 A 1 1/1/2001
1 A 2 1/1/2001
1 B 1 2/1/2001
1 B 1 3/1/2001
2 A 1 1/1/2001
2 B 1 1/1/2001
Now I need to include this in the cube and relate it to the main fact table. I have defined the relationship as a many to many but while writing the query to retrieve the information using MDX, I am stuck.
What I need is all the measures for a given property and all the aggregated measures of all its competitors of a given type at a given date.
So, given a proeprty ID, I need to identify the list of its competitors of a given type and on a given date and then I have to aggregate the measures for all these competitor properties.
I am stuck at this place where I have to identify all the competitors of a given property.
e.g. if I fire this query:
Select
{
TYNBC,YOYNBC_Improvement,LYNBC,TYADR,YOYADR_Improvement,LYADR
}
on 1,
Stay_DATE.Month.Month on 0
FROM Cube1
where {Hotel.Hotel_Key.&[480]}*{Stay_DATE.Hierarchy.Year.&[2015]}
The result would be the measures for a given property.
What I want in the result is all the above measures and the same measures for the competitors of the property 480 for a given date and a given competitor type. The issue I am facing is in identifying the competitors of the property in mdx because competitor table is added as a factless fact with many to many relationship in the cube. So, how do I retrieve the list of competitor properties when there is no hierarchy defined as it is not a dimension.
Thanks for your help in advance.
Using Access 2010 and its version of SQL, I am trying to find a way to relate two tables in a query where I do not have strict, unique values in each table, using concatenated fields that are mostly unique, then matching each unmatched next record (measured by a date field or the record id) in each table.
My business receives checks that we do not cash ourselves, but rather forward to a client for processing. I am trying to build a query that will match the checks that we forward to the client with a static report that we receive from the client indicating when checks were cashed. I have no control over what the client reports back to us.
When we receive a check, we record the name of the payor, the date that we received the check, the client's account number, the amount of the check, and some other details in a table called "Checks". We add a matching field which comes as close as we can get to a unique identifier to match against the client reports (more on that in a minute).
Checks:
ID Name Acct Amt Our_Date Match
__ ____ ____ ____ _____ ______
1 Dave 1001 10.51 2/14/14 1001*10.51
2 Joe 1002 12.14 2/28/14 1002*12.14
3 Sam 1003 50.00 3/01/14 1003*50.00
4 Sam 1003 50.00 4/01/14 1003*50.00
5 Sam 1003 50.00 5/01/14 1003*50.00
The client does not report back to us the date that WE received the check, the check number, or anything else useful for making unique matches. They report the name, account number, amount, and the date of deposit. The client's report comes weekly. We take that weekly report and append the records to make a second table out of it.
Return:
ID Name Acct Amt Their_Date Unique1
__ ____ ____ ____ _____ ______
355 Dave 1001 10.51 3/25/14 1001*10.51
378 Joe 1002 12.14 4/04/14 1002*12.14
433 Sam 1003 50.00 3/08/14 1003*50.00
599 Sam 1003 50.00 5/11/14 1003*50.00
Instead of giving us back the date we received the check, we get back the date that they processed it. There is no way to make a rule to compare the two dates, because the deposit dates vary wildly. So the closest thing I can get for a unique identifier is a concatenated field of the account number and the amount.
I am trying to match the records on these two tables so that I know when the checks we forward get deposited. If I do a simple join using the two concatenated fields, it works most of the time, but we run into a problem with payors like Sam, above, who is making regular monthly payments of the same amount. In a simple join, if one of Sam's payments appears in the Return table, it matches to all of the records in the Checks table.
To limit that behavior and match the first Sam entry on the Return table to the first Sam entry on the Checks table, I wrote the following query:
SELECT return.*, checks.*
FROM return, checks
WHERE (( ( checks.id ) = (SELECT TOP 1 id
FROM checks
WHERE match = return.unique1
ORDER BY [our_date]) ));
This works when there is only one of Sam's records in the Return table. The problem comes when the second entry for Sam hits the Return table (Return.ID 599) as the client's weekly reports are added to the table. When that happens, the query appropriately (for my purposes) only lists that two of Sam's checks have been processed, but uses the "Top 1 ID" record to supply the row's details from the Return table:
Checks_Return_query:
Checks.ID Name Acct Amt Our_Date Their_Date Return.ID
__ ____ ____ ____ _____ ______ ________
1 Dave 1001 10.51 2/14/14 3/25/14 355
2 Joe 1002 12.14 2/28/14 4/04/14 378
3 Sam 1003 50.00 3/01/14 3/08/14 433
4 Sam 1003 50.00 4/01/14 3/08/14 433
In other words, the query repeats the Return table info for record Return.ID 433 instead of matching Return.ID 599, which is I guess what I should expect from the TOP 1 operator.
So I am trying to figure out how I can get the query to take the two concatenated fields in Checks and Return, compare them to find matching sets, then select the next unmatched record in Checks (with "next" being measured either by the ID or Our_Date) with the next unmatched record in Return (again, with "next" being measured either by the ID or Their_Date).
I spent many hours in a dark room turning the query into various joins, and back again, looking at functions like WHERE NOT IN, WHERE NOT EXISTS, FIRST() NEXT() MIN() MAX(). I am afraid I am way over my head.
I am beginning to think that I may have a structural problem, and may need to write the "matched" records in this query to another table of completed transactions, so that I can differentiate between "matched" and "unmatched" records better. But that still wouldn't help me if two of Sam's transactions are on the same weekly report I get from my client.
Are there any suggestions as to query functions I should look into for further research, or confirmation that I am barking up the wrong tree?
Thanks in advance.
I'd say that you really need another table of completed transactions, it could be temporary table.
Regarding your fears "... if two of Sam's transactions are on the same weekly report ", you can use cursor in order to write records "one-by-one" instead of set based transaction.
Example:
(using a comma, to show columns & "A_", "B_", "C_" to show tables)
I currently have...
A_John, A_Doe, B_MemStartDate, C_Date,
A_John, A_Doe, B_MemStartDate, C_Date,
A_John, A_Doe, B_MemStartDate, C_Date,
I would like Table "C" columns to be pulled by Course but shown as the "Date" column... in other words, example, where Course = 1, use Date... etc...
The table would have each member's course history.
MemberID, CourseID, Date
JohnDoe, Course1, 10-10-13
JohnDoe, Course2, 10-11-13
JohnDoe, Course3, 10-12-13
The table C is the one to many of course, with the goal to show the date that the course was taken as I would title the columns with the 3 different courses I want to show... (I want to pull only 3 different courses)
I would like to have them in a row...
A_John, A_Doe, B_Start Date, C_Course1Date, C_Course2Date, C_Course3Date
Sorry for the lack of experience in my question, but I usually get by with "copy/paste"...LOL
Keeping in mind I am using Access... can I do this?
Clarifying... sorry don't know how to do tables in basic html so have commas
Have this. Pulling from 3 different tables where the member# is unique and joined.
MEMBER, STARTDATE, STATUS, COURSE, COURSEDATE
JohnSmith, 08-01-2013, Active, Workshop1, 10-20-2013
JohnSmith, 08-01-2013, Active, Workshop2, 10-13-2013
JohnSmith, 08-01-2013, Active, Workshop3, 10-28-2013
LaraBentt, 12-01-2012, Inactive, Workshop1, 02-20-2012
LaraBentt, 12-01-2012, Inactive, Workshop2, 02-13-2012
LaraBentt, 12-01-2012, Inactive, Workshop3, 02-28-2012
Want this...
MEMBER, STARTDATE, STATUS, WORKSHOP1, WORKSHOP2, WORKSHOP3
JohnSmith, 08-01-2013, Active, 10-20-2013, 10-13-2013, 10-28-2013
LaraBentt, 12-01-2012, Inactive, 02-20-2012, 02-13-2012, 02-28-2012
Tables columns are basically like this...
Table 1 - tblMember (one - one)
MEMBER, STARTDATE
Table 2 - tblRegStatus (one - one)
MEMBER, STATUS
Table 3 - tblCourses (one to many)
MEMBER, COURSE, COURSEDATE
Hope this explains it better!
Your amendment to your question made it much more clear.
One of the problems you are going to encounter is if you have someone that has registered for the same course, twice. There has to be the understanding that you are only showing the most recent registration per person. Now that we understand that caveat - lets look at how you can build this.
The simplest way of building this is to change the query type from Select to Crosstab. This is done through the Access interface.
For each of the fields in the query, set the Total and Crosstab values to:
MemberName: Total = *Group By* Crosstab = *Row Heading*
StartDate: Total = *Group By* Crosstab = *Row Heading*
Status: Total = *Group By* Crosstab = *Row Heading*
CourseName: Total = *Group By* Crosstab = *Column Heading*
CourseDate: Total = *Max* Crosstab = *Value*
This will result in query results such as:
MemberName StartDate Status Biology English Math
---------- --------- ------ ------- ------- ----
John 10/1/2013 Active 11/14/2013 12/1/2013
Matthew 9/1/2013 Inactive 1/1/2013
Peter 8/7/2013 Active 1/1/2013 4/1/2013
Sam 1/14/2013 Inactive 11/14/2013
William 5/19/2013 Active 1/1/2013 4/1/2013
The advantage to the crosstab is that it will automatically add columns as there is data.
You have to construct your query (the joins) to give you the data that you want however. For example - do you want to show members that have no courses on the list? Do you want to show courses that have no enrollment?
Microsoft has a page on using crosstab queries:
http://office.microsoft.com/en-us/access-help/make-summary-data-easier-to-read-by-using-a-crosstab-query-HA010229577.aspx
If you want to hard code your three columns, you can also manually pivot the data. Again - keep in mind that we are using the Max aggregate, so only the latest courses will show. With a manual pivot you also need to remember that it will be limited to the data you are requesting.
Take your select query, right click and click "Totals" to show the Totals row. You can also click on the Totals icon in the ribbon under Query Tools / Design.
Set up your three columns:
MemberName: Total = *Group By*
StartDate: Total = *Group By*
Status: Total = *Group By*
Now add in a separate field for each course you wish to include. In the "Field" definition enter the following.
Science: IIf([CourseName]="Science",[CourseDate],Null)
Biology: IIf([CourseName]="Biology",[CourseDate],Null)
Math: IIf([CourseName]="Math",[CourseDate],Null)
English: IIf([CourseName]="English",[CourseDate],Null)
Set the "Total" definition value to "Max".
That should get you the results you are looking for.