Teradata DB
I am having a rough go at it. I have a dataset and I want to create a customer journey. The rules are that the first transaction is a journey. The next transaction that is at least 30 days out is a journey. The next transaction at least 30 days past that is also a journey. I do not have access to programming, only regular queries.
There are a few scenarios.
Customer has just 1 transaction in the dataset. Since it is the only one, it is flagged as a journey.
Customer has 2 transactions within 5 days. The first one is a journey and the second one is not since it is within 30 days.
Customer has 2 transactions. 1/1 and 2/5. They are > 30 days apart so each is flagged as a journey.
Customer has 3 transactions. 1/1, 1/8, 2/5. The first and third are journeys and the second one is not (since it is within the 30 day window of a previously flagged journey).
I have tried everything, but there always seems to be some scenario that doesn't work.
I have the logic that I can write down, but I can't figure out how to do it in teradata.
If trans_idx=1 then journey flag = y
If date - previous trans_idx date > 30 then journey_flag = Y
This much I can get right. What I can't get right is the SQL for the following logic: if date - previous trans_idx date < 30, I need to accumulate the difference and add the next row's difference to it. If the total is still < 30, I keep accumulating with the next row. Once the total gets past 30, I need to set that row's journey flag to Y.
The following works, but it only compares against the previous row. If I change it to unbounded, it looks at all the rows for the given sequence. I just need it to go back to the end of the previous 30-day window.
CASE
    WHEN RUNNING_SUM_FLOAT = 0 THEN 'Y'
    WHEN RUNNING_SUM_FLOAT - MIN(RUNNING_SUM_FLOAT)
             OVER (PARTITION BY sequence_id ORDER BY trans_idx
                   ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) >= 30
        THEN 'Y'
    ELSE 'N'
END AS journey_flag
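To pin down the rule before translating it to SQL, here is a minimal Python sketch of the logic as described in the scenarios: each transaction is compared against the most recently flagged journey, not against the previous row. The function name flag_journeys and the window_days parameter are made up for illustration, and "at least 30 days" is assumed to mean >= 30. This is not a Teradata solution, only a reference for what the query has to reproduce.

```python
from datetime import date

def flag_journeys(dates, window_days=30):
    """Flag each transaction date of one customer as a journey ('Y') or not ('N').

    A transaction is a journey if it is the first one, or if it falls at least
    window_days after the most recently flagged journey (assumed reading of the rule).
    """
    flags = []
    last_journey = None  # date of the most recent row flagged 'Y'
    for d in sorted(dates):
        if last_journey is None or (d - last_journey).days >= window_days:
            flags.append((d, "Y"))
            last_journey = d  # the window restarts from this journey
        else:
            flags.append((d, "N"))
    return flags

# Scenario from the question: 1/1, 1/8, 2/5
scenario = flag_journeys([date(2021, 1, 1), date(2021, 1, 8), date(2021, 2, 5)])

# A case where comparing only against the previous row goes wrong:
# 2/10 is 21 days after 1/20, but 40 days after the journey on 1/1.
tricky = flag_journeys([date(2021, 1, 1), date(2021, 1, 20), date(2021, 2, 10)])
```

Because each flag depends on the previous flag, a fixed window frame such as ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING cannot express this on its own; the second example shows the kind of scenario that breaks it.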
Related
Background
I've been working on some reporting views that get a multi-day work shift and are supposed to do some calculations based on data, but I'm a bit stuck here.
A typical shift is either 3 calendar days (usually 1 half-day and 2 full days) or a whole week consisting of 2 half-days (start and end) and 5 full days.
Specifications
I have the following specifications for what is a full day and half-day. These rules are based on regulation and can't be changed.
2 half-days != 1 full day; the 2 halves are more "valuable"
Given a started_at iso datetime and end_at iso datetime
I want to get two numbers, full_days, and half_days
A half day is
A day at the start of the range starting at or after 12.00
A day at the end of the range which ends before 19.00
A full day is
A day within the range (traditional 24hours)
A day at the start of the range starting before 12.00
A day at the end of the range which ends at or after 19.00
I'm thinking either a row per full-day and half-day or an aggregated row with half_days and full_days as two separate columns would be ideal in the view to connect it with my other views.
Simplified model
I simplified the data model to leave out unnecessary columns.
create table if not exists [trip]
(
trip_id integer
constraint trip_pk
primary key,
started_at text default (datetime('now')),
end_at text default (datetime('now'))
);
And I'm a bit stuck with how I should design this query. A simple time delta doesn't work.
SQLFiddle with sample data and answers: http://sqlfiddle.com/#!5/de7551/2
You can solve this with a CTE which calculates the day span (number of days the shift spans). Since half days are always 1, 2 or 0 (only occur on end and start) we don't actually need to consider each day by itself.
You can use julianday to get the day as a number; however, Julian days start at noon, so you'll need to subtract 0.5 to get the "actual" day for your calculation. Floor the starting day so the span is measured from midnight of the first day (this avoids a too long span if the end time is later than the start time on each respective day), and round up the result to include partial days as a spanned day.
At this point we can calculate number of half days by checking the end and start. To get the number of full days we simply subtract the half days from the result.
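with trip_spans as (
    select
        -- calendar days touched by the trip (julian days start at noon, hence -0.5)
        ceil(julianday(end_at)-0.5 - floor(julianday(started_at)-0.5)) day_span
        , t.*
        , (
            -- half day at the start: starts at or after 12:00
            iif(time(started_at) >= time('12:00'), 1, 0)
            +
            -- half day at the end: ends before 19:00
            iif(time(end_at) < time('19:00'), 1, 0)
        ) half_days
    from trip t
)
select
    trip_spans.*
    , day_span-half_days full_days
from trip_spans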
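As a cross-check of the rules outside the database, here is a small Python sketch of the same idea: count the calendar days the trip touches, count the half days from the start and end times, and subtract. The function name count_days is made up for illustration, and the sketch assumes a multi-day trip (a same-day trip that both starts after 12:00 and ends before 19:00 is not handled).

```python
from datetime import datetime, time

def count_days(started_at, end_at):
    """Return (full_days, half_days) for a trip given as ISO datetime strings."""
    start = datetime.fromisoformat(started_at)
    end = datetime.fromisoformat(end_at)

    # Number of calendar days the trip touches, first and last day included.
    day_span = (end.date() - start.date()).days + 1

    half_days = 0
    if start.time() >= time(12, 0):  # first day starts at or after 12:00
        half_days += 1
    if end.time() < time(19, 0):     # last day ends before 19:00
        half_days += 1

    return day_span - half_days, half_days

# Typical 3-day shift: late start, early finish
three_day = count_days("2021-03-01 13:00:00", "2021-03-03 10:00:00")

# Whole week with an early start and a late finish
week = count_days("2021-03-01 08:00:00", "2021-03-07 20:00:00")
```

With these inputs the three-day shift comes out as 1 full day and 2 half days, and the week as 7 full days and no half days.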
I am trying to produce a query in SQLite where I can determine the average sales made each weekday in the year.
As an example, I'd like to say
"The average sales for Monday are $400.50 in 2017"
I have a sales table - each row represents a sale you made. You can have multiple sales for the same day. Columns that would be of interest here:
Id, SalesTotal, DayCreated, MonthCreated, YearCreated, CreationDate, PeriodOfTheDay
DayCreated, MonthCreated and YearCreated are integers that represent the day, month and year the sale was created. CreationDate is a Unix timestamp that represents the date/time it was created too (and obviously matches the day/month/year).
PeriodOfTheDay is 0, or 1 (day, or night). You can have multiple records for a given day (typically you can have at most 2 but some people like to add all of their sales in individually, so you could have 5 or more for a day).
Where I am stuck
Because you can have two records on the same day (i.e. a day sales, and a night sales, or multiple of each) I can't just group by day of the week (i.e. group all records by Saturday).
This is because the number of sales you made does not equal the number of days you worked. For example, I could have worked 10 Saturdays but had 30 sales, so grouping by 'Saturday' would produce 30 sales, since 30 records exist for Saturday (some just happen to share the same day).
Furthermore, if I group by daycreated,monthcreated,yearcreated it works in the sense it produces x rows (where x is the number of days you worked) however that now means I need to return this resultset to the back end and do a row count. I'd rather do this in the query so I can take the sales and divide it by the number of days you worked.
Would anyone be able to assist?
Thanks!
UPDATE
I think I got it - I would love someone to tell me if I'm right:
SELECT COUNT(DISTINCT CAST(( julianday((datetime(CreationDate / 1000, 'unixepoch', 'localtime'))) ) / 7 AS INT))
FROM Sales
WHERE strftime('%w', datetime(CreationDate / 1000, 'unixepoch'), 'localtime') = '6'
AND YearCreated = 2017
This would produce the number for saturday, and then I'd just put this in as an inner query, dividing the sale total by this number of days.
Buddy,
You can group your query by getting the day of week and week number of day created or creation date.
In MSSQL
DATEPART(WEEK,'2017-08-14') -- Will give you week 33
DATEPART(WEEKDAY,'2017-08-14') -- Will give you day 2
In MYSQL
WEEK('2017-08-14') -- Will give you week 33
DAYOFWEEK('2017-08-14') -- Will give you day 2
See these figures:
Day of Week
1-Sunday, 2-Monday, 3-Tuesday, 4-Wednesday, 5-Thursday, 6-Friday, 7-Saturday
Week Number
1 - 53 Weeks in a year
Together these will be the key, so that each Saturday in the year is counted separately.
Hope this can help in building your query.
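For SQLite specifically, one way to get the per-weekday average in a single query is to divide the summed sales by the number of distinct calendar dates in each weekday group. The sketch below runs that idea through Python's sqlite3 module against a cut-down, hypothetical version of the Sales table (only SalesTotal, YearCreated and CreationDate), and assumes CreationDate is stored in milliseconds, as the division by 1000 in the question suggests. It works in UTC for simplicity; add the 'localtime' modifier if local dates are needed.

```python
import sqlite3
from datetime import datetime, timezone

def ms(year, month, day, hour):
    """Unix timestamp in milliseconds for a UTC date and hour."""
    return int(datetime(year, month, day, hour, tzinfo=timezone.utc).timestamp() * 1000)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales (Id INTEGER PRIMARY KEY, SalesTotal REAL, "
    "YearCreated INTEGER, CreationDate INTEGER)"
)
rows = [
    (100.0, 2017, ms(2017, 8, 12, 10)),  # Saturday, day sales
    (200.0, 2017, ms(2017, 8, 12, 20)),  # same Saturday, night sales
    (300.0, 2017, ms(2017, 8, 19, 10)),  # the following Saturday
    (50.0, 2017, ms(2017, 8, 14, 10)),   # a Monday
]
conn.executemany(
    "INSERT INTO Sales (SalesTotal, YearCreated, CreationDate) VALUES (?, ?, ?)", rows
)

# %w gives the weekday as '0' (Sunday) .. '6' (Saturday).
# COUNT(DISTINCT date(...)) counts days worked, not sales records.
query = """
SELECT strftime('%w', CreationDate / 1000, 'unixepoch') AS weekday,
       SUM(SalesTotal) / COUNT(DISTINCT date(CreationDate / 1000, 'unixepoch')) AS avg_per_day
FROM Sales
WHERE YearCreated = 2017
GROUP BY weekday
"""
averages = dict(conn.execute(query).fetchall())
```

Here the two Saturdays carry 600 in sales across three records, so the Saturday average is 300 per day worked rather than 200 per record.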
I've got a number of rows and I want to calculate the difference per date.
So say I have the following:
[Date] [Transaction Number] [Value]
1 Jan 16 1 1000
2 Jan 16 1 980
I then want a fact that for every row will compare the value with the measure from the previous date.
So If I have a measure on SUM(Value) for the current date, I basically want SUM(CurrentDate) - SUM(PreviousDate) to see the movement.
A couple of things to note:
There will actually be a couple of comparisons: previous date, previous month end, previous year end.
I want this as a calculated measure not column so that I do not need to filter on the transaction number in the previous period.
What I've tried but it just comes up empty:
Previous Value :=CALCULATE(SUM(Table[Value])) - CALCULATE(SUM(Table[Value]), FILTER(Table, Table[Date] = PreviousDay(Table[Date])))
Unfortunately I cannot tell why your measure didn't work, but the following should:
Previous Value := CALCULATE(SUM(Table[Value]) - CALCULATE(SUM(Table[Value]), PREVIOUSDAY(Table[date])))
USERS
ID TIMEMODIFIED
1 1400481271
2 1400481489
3 1400486453
4 1400486525
5 1401777484
I have a timemodified field. Based on timemodified, I need to get the rows from the last 4 weeks, counting back from today's date.
SELECT id FROM USERS
WHERE FROM_UNIXTIME(timemodified,'%d-%m-%Y') >= curdate()
AND FROM_UNIXTIME(timemodified,'%d-%m-%Y') < curdate()-1
Your times are already in Unix timestamp format. Bear in mind that it'll be far more efficient to compare [TIMEMODIFIED] against the current date converted to a Unix timestamp. In addition, you don't need to check any upper bound unless [TIMEMODIFIED] can be in the future.
Try:
-- 60x60x24x7x4 = 2419200 seconds in four weeks
SET @unix_four_weeks_ago = UNIX_TIMESTAMP(curdate()) - 2419200;
SELECT id FROM USERS
WHERE timemodified >= @unix_four_weeks_ago;
NB. Four weeks ago (i.e. today – 28 days) was 1,437,696,000 (24th July) at the time of this answer. The latest record in the sample you provided has a timestamp going back to the 3rd June 2014, and so none of these records will be returned by the query.
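The arithmetic in that note can be reproduced with a short Python sketch. It assumes the answer was written on 21 August 2015 and that the session time zone is UTC (both are assumptions made here to arrive at the quoted cutoff), and it applies the same comparison to the sample rows from the question.

```python
import calendar
from datetime import date

FOUR_WEEKS_SECONDS = 60 * 60 * 24 * 7 * 4  # 2419200

def four_weeks_ago(today):
    """Unix timestamp for midnight UTC, 28 days before the given date."""
    midnight_today = calendar.timegm(today.timetuple())
    return midnight_today - FOUR_WEEKS_SECONDS

# Sample rows from the question: (id, timemodified)
users = [
    (1, 1400481271),
    (2, 1400481489),
    (3, 1400486453),
    (4, 1400486525),
    (5, 1401777484),
]

cutoff = four_weeks_ago(date(2015, 8, 21))  # assumed date of the answer
recent_ids = [uid for uid, modified in users if modified >= cutoff]
```

The cutoff is computed once and compared against the raw integer column, which is the same reason the SQL version avoids calling FROM_UNIXTIME on every row.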
I've been working on this for a while and trying different styles... I need to develop a tracking spreadsheet (for 50 employees) that will calculate accruing vacation time and vacation time used based on the following parameters:
*Employees who have worked less than 3 years but are past their 90-day anniversary with the company (at which point they start accruing). There is one more caveat on this: the accrual day is the 1st of the month after the 90 days have passed [I figured this 90-day item out on my Excel sheet]. They accrue at:
- 4 hours a month, at the end of the month, with a MAX cap of 72 hours that they can have stored (but if they use vacation time and fall below the 72 hours they can continue to accrue again up to 72 hours...)
*Employees who have worked more than 3 years but less than 6 years with the company carry over their prior vacation and begin to accrue from here onward at:
- 6.8 hours a month, at the end of the month, with a MAX cap of 122.4 hours that they can have stored (but if they use vacation time and fall below the 122.4 hours they can continue to accrue again up to 122.4 hours...)
*Employees 6 years and over with the company carry over their prior vacation and begin to accrue from here onward at:
- 10 hours a month, at the end of the month, with a MAX cap of 180 hours that they can have stored (but if they use vacation time and fall below the 180 hours they can continue to accrue again up to 180 hours....)
& yes, I need to be able to deduct vacation time used.
Does anyone have any suggestions for layout or for a formula that can do part of these functions? I appreciate any advice or suggestion on what else I can use!
I have created a test sheet, and was accruing based on these conditions for the first and started on the second set of rules (for years 3+).
However when I accrue to the max of 72 hours on the first policy, it no longer accrues correctly if they used vacation and fall under the 72 hours cap again.
I know this is an overcomplicated policy, but that is what the company wants and they will not budge. Any help or advice is appreciated. I know my sheet isn't great, but I'm trying options.
Below is the equation I used for the 90 day:
=IF(F2<TODAY()-90, F2+90, "90 Day Period")
Then to get the first day of the month after I used:
=IF(G2="90 Day Period","N/A",DATE(YEAR(G2),MONTH(G2)+1,1))
I tried using for the accrual of the first rule (but it has issues...):
=IF(N2="N/A","N/A",IF(N2<=36,MIN(72,((N2*4)-P2),72),(72-P2)))
For second rule:
=IF(AND(N2>36,N2<=72),MIN(122.4,((N2-36)*6.8)+Q2-S2),0)
Let
column A EmpName
column B EntryDate
Column C DaysSpent for the current month
Column D capped Accrual buffer at end of month
repeat Columns C and D month after month
example
                        01-Sep-2013      01-Oct-2013
EmpName  EntryDate      spent  buffer    spent  buffer    ... etc ...
-------  -----------    -----  ------    -----  ------
me       01-Jan-2010    0      12        0      18.8
you      01-Jun-2013    0      4         0      4
In order not to get spaghetti formulas I recommend to create some user defined functions in VBA, like
Function GetCap(EntryDate, ThisDate) As Single
Function GetMonthly(EntryDate, ThisDate) As Single
By experience it's easier to debug/maintain 2-3 nested If's or Select Case's in VBA than 92 character long formulas with no blanks, no comments etc. in the sheet. Should the business logic change, there's one code block to review - instead of dozens/hundreds of formula in a sheet that has grown for 3x12 months x 50 users.
The above functions may want the help of e.g.
Function EndOfMonth(MyDate) as Date
Function BeginOfNextMonth(MyDate) as Date
so that in the sheet you just
manually enter hours spent month after month
calculate new buffer as = MIN([oldbuffer] - [Spent] + GetMonthly(...), GetCap(...))
carefully use relative/absolute addressing to make the formula "copyable" across columns/rows, e.g.
row-absolute on ThisDate got from the header when copying downwards
column-absolute on EntryDate for each Emp when copying rightwards
You can of course use =GetCap(...) and =GetMonthly(...) directly in cells of your sheet to display intermediate results and for debugging purposes.
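To make the shape of those helper functions concrete, here is a sketch in Python rather than VBA; the same structure carries over to GetCap and GetMonthly. The tier table is taken from the question. Everything else is an assumption made for illustration: the function names, treating "3 years" and "6 years" as full years of service reached on the anniversary date, and starting accrual on the first of the month after the 90-day mark.

```python
from datetime import date, timedelta

# From the question: (minimum full years of service, hours per month, cap in hours)
TIERS = [(6, 10.0, 180.0), (3, 6.8, 122.4), (0, 4.0, 72.0)]

def years_of_service(entry_date, this_date):
    """Full years between the two dates, compared by calendar date, not by 365-day blocks."""
    years = this_date.year - entry_date.year
    if (this_date.month, this_date.day) < (entry_date.month, entry_date.day):
        years -= 1
    return years

def first_accrual_date(entry_date):
    """First of the month following the 90-day mark (assumed reading of the policy)."""
    mark = entry_date + timedelta(days=90)
    if mark.month == 12:
        return date(mark.year + 1, 1, 1)
    return date(mark.year, mark.month + 1, 1)

def _tier(entry_date, this_date):
    years = years_of_service(entry_date, this_date)
    for min_years, monthly, cap in TIERS:
        if years >= min_years:
            return monthly, cap
    return TIERS[-1][1], TIERS[-1][2]

def get_cap(entry_date, this_date):
    return _tier(entry_date, this_date)[1]

def get_monthly(entry_date, this_date):
    """Hours accrued for the month ending on this_date; 0 before accrual starts."""
    if this_date < first_accrual_date(entry_date):
        return 0.0
    return _tier(entry_date, this_date)[0]

def new_buffer(old_buffer, spent, entry_date, this_date):
    """Same shape as the sheet formula: MIN(old - spent + monthly, cap)."""
    accrued = old_buffer - spent + get_monthly(entry_date, this_date)
    return min(accrued, get_cap(entry_date, this_date))

# "me" from the example sheet: buffer 12 at end of September, nothing spent in October
me_october = new_buffer(12.0, 0.0, date(2010, 1, 1), date(2013, 10, 31))
```

Because the cap is applied after subtracting the hours spent, an employee who sits at the cap and then takes vacation starts accruing again the following month, which is the case the question's formula got wrong.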
Be careful when you compare dates.
Tips:
3 years later is not always 365x3 days later
check what the VBA DateSerial() function does for months > 12 and months < 1
the end of next month is always the first day of the month two months ahead, minus 1 ... even in February of a leap year
and post more questions if you get stuck on these functions.