Create a sequence for dates with repeats

I have a list of days (numbered 195-720) and each day has multiple observations. I would ultimately like to determine which of these days are weekdays and which are weekend days. I would be able to do this if I could just assign the digits 1-7 to each of the days. Currently, the data looks like this:
Day Household ID Hour of Day
195 1 1
195 1 2
195 1 3
195 1 4
196 1 1
196 1 2
196 1 3
197 1 1
197 1 2
It is perhaps important to note that there is not a consistent number of observations for each day (e.g. 4 observations for day 195, 3 observations for day 196, 2 observations for day 197).
I know that Day 195 is a Tuesday, which for simplicity's sake I would like to code as equal to "2" (Wednesday=3, Thursday=4, etc).
Thus, I would like to get the following output:
Day Household ID Hour of Day DAY OF WEEK
195 1 1 2
195 1 2 2
195 1 3 2
195 1 4 2
196 1 1 3
196 1 2 3
196 1 3 3
197 1 1 4
197 1 2 4
After looking through Stata documentation, I considered using DYM/DMY. However, this does not work because I do not have an original "date" variable to work from. Instead, I just have a number "195" which corresponds to Tuesday, July 12.
I wanted to use something like:
bysort day: egen Hour_of_Day = seq(2, 3, 4, 5, 6, 7, 1)
However, Stata tells me that this has a syntax error. Note: I start with "2" because my first day (195) is a Tuesday. I also considered commands like carryforward or mod(x,y) or fill.
Does anyone know how I can set the sequence to fill the same for each day? How can I fix this code to achieve the desired output?

If you know that 195 was Tuesday then the reverse engineering is straightforward. 193 must have been Sunday and 199 Saturday.
Let's look at a sandbox with that week, 193 to 199. Our first guess at a day-of-week function of our own will use the mod() function (not a command); there is a short paper riffing on its applications in Stata.
. clear
. set obs 7
number of observations (_N) was 0, now 7
. gen day = 192 + _n
. gen dow = mod(day, 7)
. list, sep(0)
+-----------+
| day dow |
|-----------|
1. | 193 4 |
2. | 194 5 |
3. | 195 6 |
4. | 196 0 |
5. | 197 1 |
6. | 198 2 |
7. | 199 3 |
+-----------+
Stata's convention for day of week is that 0 is Sunday and 6 is Saturday. That is just a rotation away.
. gen DOW = mod(day + 3, 7)
. list, sep(0)
+-----------------+
| day dow DOW |
|-----------------|
1. | 193 4 0 |
2. | 194 5 1 |
3. | 195 6 2 |
4. | 196 0 3 |
5. | 197 1 4 |
6. | 198 2 5 |
7. | 199 3 6 |
+-----------------+
You can check with Stata's own dow() function that another way to get DOW above is
gen StataDOW = dow(day - 2)
So an indicator for weekday is (for example)
gen weekday = !inlist(DOW, 0, 6)
or
gen weekday = inrange(DOW, 1, 5)
or
gen weekday = !inlist(dow, 4, 3)
using the first variable created (in which Saturday is 3 and Sunday is 4).
As it happens, I originally wrote egen's seq(). Your syntax is indeed not legal: seq() is the syntax, and nothing is ever placed within the parentheses. I wouldn't use egen here, if only because the right answers are essentially impossible with multiple occurrences (as you do have) and also unlikely if you have gaps in the data. The reasoning here is, or should be, robust to repetitions and gaps.
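If you want to sanity-check the arithmetic outside Stata, here is a minimal Python sketch of the same rotation, using the question's coding (Tuesday = 2):

# day 195 is a Tuesday; (day + 3) % 7 rotates so that Sunday = 0, ..., Saturday = 6
for day in range(193, 200):
    dow = (day + 3) % 7        # same arithmetic as Stata's mod(day + 3, 7)
    weekday = 1 <= dow <= 5    # Monday..Friday, i.e. not Sunday (0) or Saturday (6)
    print(day, dow, weekday)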


Pandas: to get mean for each data category daily [duplicate]

I am somewhat of a beginner programmer learning Python (+pandas) and hope I can explain this well enough. I have a large time-series pandas dataframe of over 3 million rows and initially 12 columns, spanning a number of years. This covers people taking a ticket from different locations denoted by Id numbers (350 of them). Each row is one instance (one ticket taken).
I have searched many questions, like counting records per hour per day and getting the average per hour over several years. However, I run into trouble when including the 'Id' variable.
I'm looking to get the mean value of people taking a ticket for each hour, for each day of the week (mon-fri) and per station.
I have the following, setting datetime to index:
Id Start_date Count Day_name_no
149 2011-12-31 21:30:00 1 5
150 2011-12-31 20:51:00 1 0
259 2011-12-31 20:48:00 1 1
3015 2011-12-31 19:38:00 1 4
28 2011-12-31 19:37:00 1 4
Using groupby and Start_date.index.hour, I can't seem to include the 'Id'.
My alternative approach is to split the hour out of the date and have the following:
Id Count Day_name_no Trip_hour
149 1 2 5
150 1 4 10
153 1 2 15
1867 1 4 11
2387 1 2 7
I then get the count first with:
Count_Item = TestFreq.groupby([TestFreq['Id'], TestFreq['Day_name_no'], TestFreq['Hour']]).count().reset_index()
Id Day_name_no Trip_hour Count
1 0 7 24
1 0 8 48
1 0 9 31
1 0 10 28
1 0 11 26
1 0 12 25
Then use groupby and mean:
Mean_Count = Count_Item.groupby(Count_Item['Id'], Count_Item['Day_name_no'], Count_Item['Hour']).mean().reset_index()
However, this does not give the desired result as the mean values are incorrect.
I hope I have explained this issue in a clear way. I'm looking for the mean per hour, per day, per Id, as I plan to do clustering to separate my dataset into groups before applying a predictive model on these groups.
Any help would be appreciated, and if possible an explanation of what I am doing wrong, either code-wise or in my approach.
Thanks in advance.
I have edited this to try to make it a little clearer. Writing a question with a lack of sleep is probably not advisable.
A toy dataset that I start with:
Date Id Dow Hour Count
12/12/2014 1234 0 9 1
12/12/2014 1234 0 9 1
12/12/2014 1234 0 9 1
12/12/2014 1234 0 9 1
12/12/2014 1234 0 9 1
19/12/2014 1234 0 9 1
19/12/2014 1234 0 9 1
19/12/2014 1234 0 9 1
26/12/2014 1234 0 10 1
27/12/2014 1234 1 11 1
27/12/2014 1234 1 11 1
27/12/2014 1234 1 11 1
27/12/2014 1234 1 11 1
04/01/2015 1234 1 11 1
I now realise I would have to use the date first and get something like:
Date Id Dow Hour Count
12/12/2014 1234 0 9 5
19/12/2014 1234 0 9 3
26/12/2014 1234 0 10 1
27/12/2014 1234 1 11 4
04/01/2015 1234 1 11 1
And then calculate the mean per Id, per Dow, per hour. And want to get this:
Id Dow Hour Mean
1234 0 9 4
1234 0 10 1
1234 1 11 2.5
I hope this makes it a bit clearer. My real dataset spans 3 years, has 3 million rows, and contains 350 Id numbers.
Your question is not very clear, but I hope this helps:
df.reset_index(inplace=True)
# helper columns with date, hour and dow
df['date'] = df['Start_date'].dt.date
df['hour'] = df['Start_date'].dt.hour
df['dow'] = df['Start_date'].dt.dayofweek
# sum of counts for all combinations
df = df.groupby(['Id', 'date', 'dow', 'hour']).sum()
# take the mean over all dates
df = df.reset_index().groupby(['Id', 'dow', 'hour']).mean()
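For completeness, here is a self-contained sketch of that two-step aggregation, run on the toy data from the question (column names as given there); the expected means are 4, 1 and 2.5:

import pandas as pd

# the toy data from the question (dd/mm/yyyy dates)
df = pd.DataFrame({
    'Date': ['12/12/2014'] * 5 + ['19/12/2014'] * 3 + ['26/12/2014']
            + ['27/12/2014'] * 4 + ['04/01/2015'],
    'Id': 1234,
    'Dow': [0] * 9 + [1] * 5,
    'Hour': [9] * 8 + [10] + [11] * 5,
    'Count': 1,
})

# step 1: total tickets per Id, per calendar day, per hour
daily = df.groupby(['Date', 'Id', 'Dow', 'Hour'], as_index=False)['Count'].sum()

# step 2: mean of those daily totals per Id, day of week and hour
means = daily.groupby(['Id', 'Dow', 'Hour'], as_index=False)['Count'].mean()
print(means)   # Count column: 4.0, 1.0, 2.5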
You can use the groupby function using the 'Id' column and then use the resample function with how='sum'.

How to write the query to make report by month in sql

I have the receiving and sending data for a whole year, and I want to build a monthly report based on that data with a first-in-first-out rule: the first receiving is sent out first ...
DECLARE @ReceivingTbl AS TABLE(Id INT, ProId INT, RecQty INT, ReceivingDate DATETIME)
INSERT INTO @ReceivingTbl
VALUES (1,1001,210,'2019-03-12'),
(2,1001,315,'2019-06-15'),
(3,2001,500,'2019-04-01'),
(4,2001,10,'2019-06-15'),
(5,1001,105,'2019-07-10')
DECLARE @SendTbl AS TABLE(Id INT, ProId INT, SentQty INT, SendMonth INT)
INSERT INTO @SendTbl
VALUES (1,1001,50,3),
(2,1001,100,4),
(3,1001,80,5),
(4,1001,80,6),
(5,2001,200,6)
SELECT * FROM @ReceivingTbl ORDER BY ProId, ReceivingDate
SELECT * FROM @SendTbl ORDER BY ProId, SendMonth
Id ProId RecQty ReceivingDate
1 1001 210 2019-03-12
2 1001 315 2019-06-15
5 1001 105 2019-07-10
3 2001 500 2019-04-01
4 2001 10 2019-06-15
Id ProId SentQty SendMonth
1 1001 50 3
2 1001 100 4
3 1001 80 5
4 1001 80 6
5 2001 200 6
--- And below is what I want:
Id ProId RecQty ReceivingDate ... Mar Apr May Jun
1 1001 210 2019-03-12 ... 50 100 60 0
2 1001 315 2019-06-15 ... 0 0 20 80
5 1001 105 2019-07-10 ... 0 0 0 0
3 2001 500 2019-04-01 ... 0 0 0 200
4 2001 10 2019-06-15 ... 0 0 0 0
Thanks!
Your question is not clear to me.
If you want to use a purely FIFO approach, and therefore ignore the dates the table contains, you necessarily need to order by Id, which in your example is provided and looks to be in insert order.
The first line inserted should then also be the first line appearing in the select (FIFO); to do that you have to use:
ORDER BY Id ASC
which will place the lower values of Id first (1, 2, 3, ...).
To me, though, this doesn't make much sense, so pay attention to the meaning of the data you actually have and leverage dates like ReceivingDate, and order by that, maybe even filtering by month of the date; below is an example for January data:
WHERE MONTH(ReceivingDate) = 1
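As a sketch of the allocation logic the desired output implies (not the T-SQL itself), here is the FIFO breakdown in Python, using the sample data above; mapping this back to a pivoted SQL query is the remaining work:

from collections import defaultdict

# receivings: (id, product, qty, date); sends: (product, qty, month)
# -- copied from the question's sample data
receivings = [(1, 1001, 210, '2019-03-12'), (2, 1001, 315, '2019-06-15'),
              (3, 2001, 500, '2019-04-01'), (4, 2001, 10, '2019-06-15'),
              (5, 1001, 105, '2019-07-10')]
sends = [(1001, 50, 3), (1001, 100, 4), (1001, 80, 5),
         (1001, 80, 6), (2001, 200, 6)]

remaining = {rid: qty for rid, _, qty, _ in receivings}
alloc = defaultdict(int)   # (receiving id, month) -> quantity sent from it

for prod, qty, month in sorted(sends, key=lambda s: (s[0], s[2])):
    # drain the oldest receivings of this product first (FIFO)
    for rid, _, _, _ in sorted((r for r in receivings if r[1] == prod),
                               key=lambda r: r[3]):
        take = min(qty, remaining[rid])
        if take:
            remaining[rid] -= take
            alloc[(rid, month)] += take
            qty -= take
        if qty == 0:
            break

print(dict(alloc))
# {(1, 3): 50, (1, 4): 100, (1, 5): 60, (2, 5): 20, (2, 6): 80, (3, 6): 200}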

How to calculate tiered pricing using PostgreSQL

I'm trying to calculate tiered rates for a stay at some lodging. Let's say we have a weekly, half-week, and daily rate for a property.
period_name | nights | rate
-------------------------------------
WEEK | 7 | 100
HALFWEEK | 3 | 50
DAY | 1 | 25
How would I query this with a total number of nights and get a breakdown of which periods qualify, going from longest to shortest? Some example results:
10 nights
We break 10 into (7 days) + (3 days). The 7 days will be at the WEEK rate (100). The 3 days will be at the HALFWEEK rate (50). Here it qualifies for (1 WEEK @ 100) + (1 HALFWEEK @ 50)
period_name | nights | rate | num | subtotal
----------------------------------------------
WEEK | 7 | 100 | 1 | 100
HALFWEEK | 3 | 50 | 1 | 50
4 nights
We break 4 into (3 days) + (1 day). The 3 days will be at the HALFWEEK rate (50). The 1 day will be at the DAY rate (25). Here it qualifies for (1 HALFWEEK @ 50) + (1 DAY @ 25)
period_name | nights | rate | num | subtotal
----------------------------------------------
HALFWEEK | 3 | 50 | 1 | 50
DAY | 1 | 25 | 1 | 25
16 nights
We break 16 into (14 days) + (2 days). The 14 days will be at the WEEK rate, multiplied by 2 (100 * 2). The 2 days will be at the DAY rate (2 * 25). Here it qualifies for (2 WEEK @ 100) + (2 DAY @ 25)
period_name | nights | rate | num | subtotal
----------------------------------------------
WEEK | 7 | 100 | 2 | 200
DAY | 1 | 25 | 2 | 50
I thought about using the lag window function, but I'm not sure how I'd keep track of the days already applied by the previous period.
You can do this with a recursive CTE query.
http://sqlfiddle.com/#!17/0ac709/1
Tier table (which can be dynamically expanded):
id name days rate
-- --------- ---- ----
1 WEEK 7 100
2 DAYS 1 25
3 HALF_WEEK 3 50
4 MONTH 30 200
Days data:
id num
-- ---
1 10
2 31
3 30
4 19
5 14
6 108
7 3
8 5
9 1
10 2
11 7
Result:
num_id num days total_price
------ --- ----------------------------------------------- -----------
1 10 {"MONTH: 0","WEEK: 1","HALF_WEEK: 1","DAYS: 0"} 150
2 31 {"MONTH: 1","WEEK: 0","HALF_WEEK: 0","DAYS: 1"} 225
3 30 {"MONTH: 1","WEEK: 0","HALF_WEEK: 0","DAYS: 0"} 200
4 19 {"MONTH: 0","WEEK: 2","HALF_WEEK: 1","DAYS: 2"} 300
5 14 {"MONTH: 0","WEEK: 2","HALF_WEEK: 0","DAYS: 0"} 200
6 108 {"MONTH: 3","WEEK: 2","HALF_WEEK: 1","DAYS: 1"} 875
7 3 {"MONTH: 0","WEEK: 0","HALF_WEEK: 1","DAYS: 0"} 50
8 5 {"MONTH: 0","WEEK: 0","HALF_WEEK: 1","DAYS: 2"} 100
9 1 {"MONTH: 0","WEEK: 0","HALF_WEEK: 0","DAYS: 1"} 25
10 2 {"MONTH: 0","WEEK: 0","HALF_WEEK: 0","DAYS: 2"} 50
11 7 {"MONTH: 0","WEEK: 1","HALF_WEEK: 0","DAYS: 0"} 100
The idea:
First I took this query to calculate your result for one value (19):
SELECT
    days / 7 as WEEKS,
    days % 7 / 3 as HALF_WEEKS,
    days % 7 % 3 / 1 as DAYS
FROM
    (SELECT 19 as days) s
Here you can see the recursive structure: a modulo operation terminated by an integer division. Because a more generic version is necessary, I thought about a recursive approach. With PostgreSQL's WITH RECURSIVE clause this is possible:
https://www.postgresql.org/docs/current/static/queries-with.html
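In Python terms, that cascade for 19 days is just repeated divmod, as a quick check of the idea:

days = 19
weeks, rem = divmod(days, 7)         # 2 weeks, remainder 5
half_weeks, rem = divmod(rem, 3)     # 1 half week, remainder 2
day_units = rem                      # 2 single days
print(weeks, half_weeks, day_units)  # 2 1 2 -- matches the result row for num = 19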
So that's the final query:
WITH RECURSIVE days_per_tier(row_no, name, days, rate, counts, mods, num_id, num) AS (
    SELECT
        row_no,
        name,
        days,
        rate,
        num.num / days,
        num.num % days,
        num.id,
        num.num
    FROM
        (SELECT
             *,
             row_number() over (order by days DESC) as row_no  -- C
         FROM
             testdata.tiers) tiers,                            -- A
        (SELECT id, num FROM testdata.numbers) num             -- B
    WHERE row_no = 1
    UNION
    SELECT
        days_per_tier.row_no + 1,
        tiers.name,
        tiers.days,
        tiers.rate,
        mods / tiers.days,                                     -- D
        mods % tiers.days,                                     -- E
        days_per_tier.num_id,
        days_per_tier.num
    FROM
        days_per_tier,
        (SELECT
             *,
             row_number() over (order by days DESC) as row_no
         FROM testdata.tiers) tiers
    WHERE days_per_tier.row_no + 1 = tiers.row_no
)
SELECT
    num_id,
    num,
    array_agg(name || ': ' || counts ORDER BY days DESC) as days,
    sum(total_rate_per_tier) as total_price                    -- G
FROM
    (SELECT
         *,
         rate * counts as total_rate_per_tier                  -- F
     FROM days_per_tier) s
GROUP BY num_id, num
ORDER BY num_id
The WITH RECURSIVE query contains the starting point of the recursion, UNION the recursive part. The starting point simply gets the tiers (A) and numbers (B). To order the tiers by their days I add a row number (C; only necessary if the corresponding ids are not in the right order, as in my example, which could happen if you add another tier).
The recursive part takes the previous SELECT result (which is stored in days_per_tier) and calculates the next remainder and integer division (D, E). All other columns only carry along the original values (except the increasing row counter, which drives the recursion itself).
After the recursion the counts and rates are multiplied (F) and then grouped by the original number id, which generates the total sum (G).
Edit:
Added the rate function and the sqlfiddle link.
Here what you need to do is first fire an SQL query to retrieve all the conditions, and then write a function for your business logic.
For example, I will fire the below query against the database:
Select * from table_name order by nights desc
In the result, I will get the data sorted by nights in descending order, which means the first will be 7, then 3, then 1.
I will then write a function containing my business logic. For example, let's suppose I need to price 11 days.
I will fetch the first record, which will be 7, and check it against 11.
if (11 > 7) { // execute this in a loop while the remainder is greater than 7; same for 3 & 1
    days = 11 - 7;
    price += price_from_db;
    package += package_from_db;
} else {
    // go to the next record and check the above condition against it
}
Note: I have written down an algorithm instead of language-specific code.
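A minimal Python sketch of that same greedy, longest-first idea (names are illustrative, and edge cases are ignored):

def tier_breakdown(nights, tiers):
    # tiers: (name, length_in_nights, rate); try the longest period first
    lines = []
    for name, length, rate in sorted(tiers, key=lambda t: -t[1]):
        num, nights = divmod(nights, length)
        if num:
            lines.append((name, length, rate, num, num * rate))
    return lines

tiers = [('WEEK', 7, 100), ('HALFWEEK', 3, 50), ('DAY', 1, 25)]
print(tier_breakdown(10, tiers))  # WEEK x1, HALFWEEK x1
print(tier_breakdown(16, tiers))  # WEEK x2, DAY x2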

Query a table so that data in one column could be shown as different fields

I have a table that stores customer care data. The table/view has the following structure:
userid calls_received calls_answered calls_rejected call_date
-----------------------------------------------------------------------
1030 134 100 34 28-05-2018
1012 140 120 20 28-05-2018
1045 120 80 40 28-05-2018
1030 99 39 50 28-04-2018
1045 50 30 20 28-04-2018
1045 200 100 100 28-05-2017
1030 160 90 70 28-04-2017
1045 50 30 20 28-04-2017
This is the sample data. The data is stored on a per-day basis.
I have to create a report in a report-designer application that takes a date as input. When the user selects a date, e.g. 28/05/2018, this date is sent as the parameter ${call_date}. I have to query the view in such a way that the result looks like the output below: if the user selects 28/05/2018, then the data of 28/04/2018 and 28/05/2017 should be displayed side by side, in the column order shown.
userid | cl_cur | ans_cur | rej_cur |success_percentage |diff_percent|position_last_month| cl_last_mon | ans_las_mon | rej_last_mon |percentage_lm|cl_last_year | ans_last_year | rej_last_year
1030 | 134 | 100 | 34 | 74.6 % | 14% | 2 | 99 | 39 | 50 | 39.3% | 160 | 90 | 70
1045 | 120 | 80 | 40 | 66.6% | 26.7% | 1 | 50 | 30 | 20 | 60% | 50 | 30 | 20
The objective of this query is to show the data of the selected day, the data of the same day the previous month, and the same day the previous years in columns, so that the user can have a look and compare. Here the result is ordered by the percentage (ans_cur/cl_cur) of the selected day, in descending order of the calculated percentage, shown under success_percentage.
The column position_last_month is the position of that particular employee in the previous month when ordered in descending order of percentage. In this example userid 1030 was in 2nd position last month and userid 1045 in 1st position last month. Similarly, I have to calculate this for the year as well.
There is also a field called diff_percent, which calculates the difference in percentage from the person who was in the same position last month. The same has to be done for last year. How can I achieve this result? Please help.
THIS ANSWERS THE ORIGINAL VERSION OF THE QUESTION.
One method is a join:
select t.userid,
       t.calls_received as cl_cur, t.calls_answered as ans_cur, t.calls_rejected as rej_cur,
       tm.calls_received as cl_last_mon, tm.calls_answered as ans_last_mon, tm.calls_rejected as rej_last_mon,
       ty.calls_received as cl_last_year, ty.calls_answered as ans_last_year, ty.calls_rejected as rej_last_year
from t left join
     t tm
     on tm.userid = t.userid and
        tm.call_date = dateadd(month, -1, t.call_date) left join
     t ty
     on ty.userid = t.userid and
        ty.call_date = dateadd(year, -1, t.call_date)
where t.call_date = ${call_date};
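The same previous-month / previous-year self-join can be sketched in pandas to check the expected shape (sample data and column suffixes taken from the question; the ranking and diff_percent columns are not covered here, just as the answer above does not cover them):

import pandas as pd

# sample rows from the question (dd-mm-yyyy dates)
t = pd.DataFrame({
    'userid':         [1030, 1012, 1045, 1030, 1045, 1045, 1030, 1045],
    'calls_received': [ 134,  140,  120,   99,   50,  200,  160,   50],
    'calls_answered': [ 100,  120,   80,   39,   30,  100,   90,   30],
    'calls_rejected': [  34,   20,   40,   50,   20,  100,   70,   20],
    'call_date': pd.to_datetime(
        ['28-05-2018', '28-05-2018', '28-05-2018', '28-04-2018',
         '28-04-2018', '28-05-2017', '28-04-2017', '28-04-2017'],
        dayfirst=True),
})

def snap(date, suffix):
    # one snapshot of the table for a given date, metric columns suffixed
    s = t[t.call_date == date].drop(columns='call_date')
    return s.rename(columns={c: c + suffix for c in s.columns if c != 'userid'})

ref = pd.Timestamp('2018-05-28')
report = (snap(ref, '_cur')
          .merge(snap(ref - pd.DateOffset(months=1), '_lm'), on='userid', how='left')
          .merge(snap(ref - pd.DateOffset(years=1), '_ly'), on='userid', how='left'))
print(report)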

Complex grouping - design / performance problem

WARNING : This is one BIG Question
I have a design problem that started simple, but in one step of growth has stumped me completely.
The simple version of reality has a nice flat fact table...
All names have been changed to protect the innocent
CREATE TABLE raw_data (
tier0_id INT, tier1_id INT, tier2_id INT, tier3_id INT,
metric0 INT, metric1 INT, metric2 INT, metric3 INT
)
The tierIDs relate to entities in a fixed-depth tree, such as a business hierarchy.
The metrics are just performance figures, such as number of frogs captured, or pigeons released.
In the reporting the kindly user would make selections to mean something like the following:
tier0_id's 34 and 55 - shown separately
all of tier1_id's - grouped together
all of tier2_id's - grouped together
all of tier3_id's - shown separately
metrics 2 and 3
This gives me the following type of query:
SELECT
    CASE WHEN @t0_grouping = 1 THEN NULL ELSE tier0_id END AS tier0_id,
    CASE WHEN @t1_grouping = 1 THEN NULL ELSE tier1_id END AS tier1_id,
    CASE WHEN @t2_grouping = 1 THEN NULL ELSE tier2_id END AS tier2_id,
    CASE WHEN @t3_grouping = 1 THEN NULL ELSE tier3_id END AS tier3_id,
    SUM(metric2) AS metric2, SUM(metric3) AS metric3
FROM
    raw_data
    INNER JOIN tier0_values ON tier0_values.id = raw_data.tier0_id OR tier0_values.id IS NULL
    INNER JOIN tier1_values ON tier1_values.id = raw_data.tier1_id OR tier1_values.id IS NULL
    INNER JOIN tier2_values ON tier2_values.id = raw_data.tier2_id OR tier2_values.id IS NULL
    INNER JOIN tier3_values ON tier3_values.id = raw_data.tier3_id OR tier3_values.id IS NULL
GROUP BY
    CASE WHEN @t0_grouping = 1 THEN NULL ELSE tier0_id END,
    CASE WHEN @t1_grouping = 1 THEN NULL ELSE tier1_id END,
    CASE WHEN @t2_grouping = 1 THEN NULL ELSE tier2_id END,
    CASE WHEN @t3_grouping = 1 THEN NULL ELSE tier3_id END
It's a nice hybrid of dynamic SQL and parametrised queries. And yes, I know, but SQL-CE makes people do strange things. Besides, that can be tidied up as and when the following change gets incorporated...
From now on, we need to be able to include NULLs in the different tiers. This will mean "applies to ALL entities in that tier".
For example, with the following very simplified data:
Activity WorkingTime ActiveTime BusyTime
1 0m 10m 0m
2 0m 15m 0m
3 0m 20m 0m
NULL 60m 0m 45m
WorkingTime never applies to an activity, so all the values go in with a NULL ID. But ActiveTime is about a specific activity, so it goes in with a legitimate ID. BusyTime is also recorded against a NULL activity because it's the accumulation of all the ActiveTime.
If one were to report on this data, the NULL values always get included in every row, because the NULL means "applies to everything". The data would look like...
Activity WorkingTime ActiveTime BusyTime (BusyOnOtherActivities)
1 60m 10m 45m (45-10 = 35m)
2 60m 15m 45m (45-15 = 30m)
3 60m 20m 45m (45-20 = 25m)
1&2 60m 25m 45m (45-25 = 20m)
1&3 60m 30m 45m (45-30 = 15m)
2&3 60m 35m 45m (45-35 = 10m)
ALL 60m 45m 45m (45-45 = 0m)
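To make that inclusion rule concrete, here is a small pandas sketch of the single-tier example above (column names are illustrative):

import pandas as pd

# the single-tier example: a NULL activity means "applies to everything"
df = pd.DataFrame({
    'activity': [1, 2, 3, None],
    'working':  [0, 0, 0, 60],
    'active':   [10, 15, 20, 0],
    'busy':     [0, 0, 0, 45],
})

def report(activities):
    # selected activities, plus every NULL row, which belongs to all of them
    sel = df[df['activity'].isin(activities) | df['activity'].isna()]
    return sel[['working', 'active', 'busy']].sum()

print(report([1]).tolist())        # [60, 10, 45]
print(report([1, 2]).tolist())     # [60, 25, 45]
print(report([1, 2, 3]).tolist())  # [60, 45, 45]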
Hopefully this example makes sense, because it's actually a multi-tiered hierarchy (as per the original example), and in every tier NULLs are allowed. So I'll try an example with 3 tiers...
t0_id | t1_id | t2_id | m1 | m2 | m3 | m4 | m5
1 3 10 | 0 10 0 0 0
1 4 10 | 0 15 0 0 0
1 5 10 | 0 20 0 0 0
1 NULL 10 | 60 0 45 0 0
2 3 10 | 0 5 0 0 0
2 5 10 | 0 10 0 0 0
2 6 10 | 0 15 0 0 0
2 NULL 10 | 50 0 30 0 0
1 3 11 | 0 7 0 0 0
1 4 11 | 0 8 0 0 0
1 5 11 | 0 9 0 0 0
1 NULL 11 | 30 0 24 0 0
2 3 11 | 0 8 0 0 0
2 5 11 | 0 10 0 0 0
2 6 11 | 0 12 0 0 0
2 NULL 11 | 40 0 30 0 0
NULL NULL 10 | 0 0 0 60 0
NULL NULL 11 | 0 0 0 60 0
NULL NULL NULL | 0 0 0 0 2
This would give many, many possible different output records in the reporting, but here are a few examples...
t0_id | t1_id | t2_id | m1 | m2 | m3 | m4 | m5
1 3 10 | 60 10 45 60 2
1 4 10 | 60 15 45 60 2
1 5 10 | 60 20 45 60 2
2 3 10 | 50 5 30 60 2
2 5 10 | 50 10 30 60 2
2 6 10 | 50 15 30 60 2
1 ALL 10 | 60 45 45 60 2
2 ALL 10 | 50 30 30 60 2
ALL 3 10 | 110 15 75 60 2
ALL 4 10 | 60 15 45 60 2
ALL 5 10 | 110 30 75 60 2
ALL 6 10 | 50 15 30 60 2
ALL 3 ALL | 180 30 129 120 2
ALL 4 ALL | 90 23 69 120 2
ALL 5 ALL | 180 49 129 120 2
ALL 6 ALL | 90 27 60 120 2
ALL ALL 10 | 110 129 129 60 2
ALL ALL 11 | 70 129 129 60 2
ALL ALL ALL | 180 129 129 120 2
1 3&4 ALL | 90 40 69 120 2
ALL 3&4 ALL | 180 53 129 120 2
As messy as this is to explain, it makes complete and logical sense in my head. I understand what is being asked, but for the life of me I can not seem to write a query for this that doesn't take excruciating amounts of time to execute.
So, how would you write such a query, and/or refactor the schema?
I appreciate that people will ask for examples of what I've done so far, but I'm eager to hear other people's uncorrupted ideas and advice first ;)
The problem looks more like a normalization activity. I would start with normalizing the table
to something like: (You may need some more identity fields depending on your usage)
CREATE TABLE raw_data (
rawData_ID INT,
Activity_id INT,
metric0 INT)
I'd create a tiering table that looks something like this (tierplan allows for multiple groupings; if a tier_id has no parent to roll up under, then tierparent_id is NULL, which allows for recursion in the query):
CREATE TABLE tiers (
tierplan_id INT,
tier_id INT,
tierparent_id INT)
Finally, I'd create a table that relates tiers and activities, something like:
CREATE TABLE ActivTiers (
Activplan_id INT, --id on the table
tierplan_id INT, --tells what tierplan the raw_data falls under
rawdata_id INT) --this allows the ActivityId to be payload instead of identifier.
Queries off of this ought to be "not too difficult."
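As a sketch of how tierparent_id supports that recursion (a Python stand-in for a recursive CTE, with illustrative table contents), each metric row rolls up under every ancestor of its tier:

# tier_id -> tierparent_id; None marks the top of the tier plan
parents = {1: None, 3: 1, 4: 1, 10: 3, 11: 4}

def rollup_path(tier_id):
    # every tier a row for tier_id aggregates under, bottom to top
    path = []
    while tier_id is not None:
        path.append(tier_id)
        tier_id = parents[tier_id]
    return path

print(rollup_path(10))  # [10, 3, 1]
print(rollup_path(11))  # [11, 4, 1]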