Most recent instance between dates - sql

What I currently have...
What my goal dataset should look like...
As you can see, I need a new column that, for rows where Type = "Repair", holds the date of the most recent row where Type = "PM". In the example above, the repairs on 11/19 and 10/26 would need the 9/29 date, since it's the most recent PM date. The repairs dated 9/8, 8/21 and 8/5 would need the 7/26 PM date, since it's the most recent PM date before those repairs. This pattern would hold across many months of data. Thanks!
After adding the recommended window function, this is what I get

You can use a window function:
select t.*,
       (case when type <> 'PM'
             then max(case when type = 'PM' then date end) over (partition by id order by date)
        end) as most_recent_pm_date
from t
This assumes that you want the most recent date per id.
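To see the running maximum at work, here is a minimal sketch that runs the same query against an in-memory SQLite database from Python. The year (2020), the single id value, and the table layout are assumptions made for illustration; only the month/day values come from the question. It needs SQLite 3.25 or later for window function support.

```python
import sqlite3

# Hypothetical sample rows: the year and the id are assumptions,
# the month/day values follow the example in the question.
rows = [
    (1, "PM", "2020-07-26"),
    (1, "Repair", "2020-08-05"),
    (1, "Repair", "2020-08-21"),
    (1, "Repair", "2020-09-08"),
    (1, "PM", "2020-09-29"),
    (1, "Repair", "2020-10-26"),
    (1, "Repair", "2020-11-19"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, type TEXT, date TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# ISO-formatted date strings sort chronologically, so MAX() picks the
# latest PM date seen so far within each id.
query = """
SELECT t.*,
       CASE WHEN type <> 'PM'
            THEN MAX(CASE WHEN type = 'PM' THEN date END)
                 OVER (PARTITION BY id ORDER BY date)
       END AS most_recent_pm_date
FROM t
ORDER BY date
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

PM rows get NULL in the new column, and each repair row carries the latest PM date on or before it.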

Related

Repeat a record for every day between two dates BigQuery?

I am attempting to produce a table of historical unfulfilled units. Currently, the database captures fulfillment date and order date for a record.
CREATE TABLE `input_table`
(order_name STRING,
line_item_id STRING,
order_date DATE,
fulfillment_date DATE)
Sample Record:
order_name: ABC
line_item_id: 123456
order_date: 2017-04-19
fulfillment_date: 2017-04-25
I want to produce a table that shows the fulfillment status by day, starting with the order date and ending with the date prior to the fulfillment date of each line item, e.g. in the above sample record the output_table would be:
Ultimately, this would allow me to query the count of unfulfilled line items each day:
SELECT
date,
count(line_item_id) AS unfulfilled_line_items
FROM
`output_table`
GROUP BY 1
Indicating the fulfillment status is not strictly necessary, considering it would only include dates in which the status was unfulfilled.
While I could do something like this:
with days as (SELECT
*
FROM
UNNEST(GENERATE_DATE_ARRAY('2017-01-01', CURRENT_DATE(), INTERVAL 1 day)) AS day)
SELECT
*
FROM
`input_table`
JOIN days
ON 1=1
AND order_date <= day
AND fulfillment_date > day
..the operation is fairly expensive.
Is there a better way of going about this?
I want to produce a table that shows the fulfillment status by day, starting with the order date and ending with the date prior to the fulfillment date of each line item
Consider below
select date, order_name, line_item_id, 'unfulfilled' fulfillment_status
from `project.dataset.table`,
unnest(generate_date_array(order_date, fulfillment_date - 1)) date
If applied to the sample entry in your question, the output is
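The rows this expansion should yield for the sample record can be worked out in plain Python. This is only a sketch of the same logic (one row per day from the order date up to the day before fulfillment), not BigQuery code; the helper name `unfulfilled_days` is made up for illustration.

```python
from datetime import date, timedelta

def unfulfilled_days(order_date, fulfillment_date):
    """Every date from order_date up to the day before fulfillment_date."""
    n_days = (fulfillment_date - order_date).days
    # range() is empty when n_days <= 0, so same-day fulfillment yields no rows.
    return [order_date + timedelta(days=i) for i in range(n_days)]

# Sample record from the question.
record = {
    "order_name": "ABC",
    "line_item_id": "123456",
    "order_date": date(2017, 4, 19),
    "fulfillment_date": date(2017, 4, 25),
}

output_rows = [
    (day.isoformat(), record["order_name"], record["line_item_id"], "unfulfilled")
    for day in unfulfilled_days(record["order_date"], record["fulfillment_date"])
]
for row in output_rows:
    print(row)
```

For the sample record this gives six rows, 2017-04-19 through 2017-04-24.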

Pull data from table based on the date of the first record

I am querying some data from a SQL table based on dates entered by the user, as below:
dt = as.Date(some_date)
# Manipulate dates
end_date = as.Date(dt)
begin_date = as.character(as.Date(end_date) - 364)
What happens after this is that all records in the table where the date field falls between the begin and end date are pulled.
qry <- paste0("select * from table
where date >= '", begin_date, "' and date <= '", end_date, "'")
But sometimes I do not have 1 year of data, only 10, 9 or 8 months.
So I want to be able to change the 364 value according to the first date in the table.
Is there any way in R to pull the records starting with the begin date as end_date - 364 and, if that date does not exist in the table, change the begin date to the first available date and run the query again?
I understand that this will require two passes of the dates and the query but I want to be able to do it iteratively without manually checking for the dates.
Your query will give you one year of data or all the data available in the table, which seems to be your requirement. However, if you need to know whether there is more than one year of data before selecting it, then you can use
SELECT MIN(date) FROM table
to get the earliest date available.
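The two-pass idea can be sketched as follows, using Python and an in-memory SQLite table rather than R. The table name `records`, its columns, and the sample rows are hypothetical; the point is only that the begin date is clamped to the earliest date returned by MIN(date).

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
# Hypothetical table holding roughly 8 months of data.
conn.execute("CREATE TABLE records (date TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?)",
    [("2023-05-01", 1), ("2023-09-15", 2), ("2023-12-31", 3)],
)

end_date = date(2023, 12, 31)
requested_begin = end_date - timedelta(days=364)

# First pass: find the earliest date actually present in the table.
(earliest,) = conn.execute("SELECT MIN(date) FROM records").fetchone()

# Clamp the begin date; ISO date strings compare chronologically.
begin_date = requested_begin.isoformat()
if earliest is not None and earliest > begin_date:
    begin_date = earliest

# Second pass: bound parameters avoid pasting dates into the SQL string.
rows = conn.execute(
    "SELECT * FROM records WHERE date >= ? AND date <= ?",
    (begin_date, end_date.isoformat()),
).fetchall()
print(begin_date, len(rows))
```

The clamped `begin_date` is mainly useful for reporting how much history was actually available, since the unclamped range query returns the same rows.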

how to compare date parts in SQL Oracle

I have a somewhat tricky question. Say I have a start_date of 15/01/2015 and an end date of 17/03/2015 for a given record, and I would like to know, in general, whether the record with those exact start and end dates belongs to January (by my definition, even if the date is 31/01/2015, it belongs to January).
I did the following:
sum(case when to_date('2015/01/01','yyyy/mm/dd') between ROUND(dtime_method_start,'MONTH') and ROUND(dtime_method_end,'MONTH') then 1 else 0 end) as flag_jan
But the problem with the ROUND function is that it treats every day from the 16th to the 31st as belonging to the next month, which is no good for me. How can I fix it or rewrite it to comply with my definition?
I want to know if a record with certain dtime_method_start and dtime_method_end belongs to January. I have many records with many different start and end dates and want to know how many of them belong to January.
SELECT expected,
CASE
WHEN to_date('01/01/2015','DD/MM/YYYY') = ALL (trunc(start_date,'MONTH'), trunc(end_date,'MONTH'))
THEN 1
ELSE 0
END flag_jan
FROM
(SELECT 'notmatch 0' expected
, to_date('15/01/2015','DD/MM/YYYY') start_date
, to_date('17/03/2015','DD/MM/YYYY') end_date
FROM dual
UNION ALL
SELECT 'match 1'
, to_date('12/01/2015','DD/MM/YYYY')
, to_date('23/01/2015','DD/MM/YYYY')
FROM dual
) dates;
This query compares the truncated start_date and end_date to the first day of the month.
To check a different month, just change the date in the first CASE expression.
Just use trunc instead of round. Trunc with parameter 'MONTH' truncates the date to the first day of the month, so a BETWEEN test against the first day of the month works as intended.
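The difference between the two functions can be illustrated with a small Python sketch. The helpers `trunc_month` and `round_month` are made-up names that mimic Oracle's TRUNC(d, 'MONTH') and ROUND(d, 'MONTH') for dates, where ROUND moves to the next month from the 16th onward; they are not Oracle code.

```python
from datetime import date

def trunc_month(d):
    """Mimics TRUNC(d, 'MONTH'): always the first day of d's month."""
    return d.replace(day=1)

def round_month(d):
    """Mimics ROUND(d, 'MONTH'): day 16 or later rounds up to next month."""
    first = d.replace(day=1)
    if d.day < 16:
        return first
    if first.month == 12:
        return date(first.year + 1, 1, 1)
    return first.replace(month=first.month + 1)

# Dates taken from the question.
for d in (date(2015, 1, 15), date(2015, 1, 31), date(2015, 3, 17)):
    print(d, "trunc:", trunc_month(d), "round:", round_month(d))
```

With rounding, 31/01/2015 lands on 1 February, which is why the original flag misses late-January dates; truncation keeps it on 1 January.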
I would compare the stored dates directly to a range based on the input date, rather than applying a function to every date:
count(case when dtime_method_start >= trunc(to_date('2015/01/01','yyyy/mm/dd'),'mm')
and dtime_method_start < add_months(trunc(to_date('2015/01/01','yyyy/mm/dd'),'mm'),1)
then 1
end) as flag_jan
Or you could truncate the stored date and compare it to the truncated input date:
count(case when trunc(dtime_method_start,'mm') = trunc(to_date('2015/01/01','yyyy/mm/dd'),'mm')
then 1
end) as flag_jan

Consolidating dates

Could you please help me, I have two date columns:
Name Member_type start_date end_date
---- ----------- ---------- ----------
a 1 03-01-2007 25-12-2008
a 2 01-01-2010 07-07-2010
a 1 15-08-2010 31-12-2013
For person a, I want to return his first start date.
If he was gone for more than one year since his end date, then I want to return the start date following this end date.
If it's less than 1 year, I want to return his previous start date.
In the above example, his start date should be 01-01-2010.
He first started in 2007, but he left in 2008 and came back in 2010, which is more than a year. So here, his start date would be the date he started after the 1 year gap, which is 01-01-2010.
He again left in 07-07-2010 but came back 15-08-2010, which is less than a year. So, the start date will still be 01-01-2010.
Hope this is clear.
What you are looking for is the most recent record for which no end date falls within the year before that record's start date. That record contains the start date you want.
Here is a way to do it by finding those records, and then getting the most recent date:
select t.name, max(t.start_date) as mostRecentRealStart
from t
where not exists (select 1
                  from t t2
                  where t2.name = t.name
                    and t2.end_date between t.start_date - 365 and t.start_date - 1
                 )
group by t.name
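As a quick check of this approach against the sample data, here is a sketch that runs the query on an in-memory SQLite database from Python. The column names follow the question's table, the dates are stored as ISO strings, and the date arithmetic uses SQLite's date() function instead of `- 365`, so this is an adaptation for illustration rather than the exact query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (name TEXT, member_type INTEGER, start_date TEXT, end_date TEXT)"
)
# Sample rows from the question, converted from DD-MM-YYYY to ISO format.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("a", 1, "2007-01-03", "2008-12-25"),
        ("a", 2, "2010-01-01", "2010-07-07"),
        ("a", 1, "2010-08-15", "2013-12-31"),
    ],
)

# Keep only records with no end date in the 365 days before their start,
# then take the latest such start per name.
query = """
SELECT t.name, MAX(t.start_date) AS mostRecentRealStart
FROM t
WHERE NOT EXISTS (
    SELECT 1
    FROM t AS t2
    WHERE t2.name = t.name
      AND t2.end_date BETWEEN date(t.start_date, '-365 days')
                          AND date(t.start_date, '-1 day')
)
GROUP BY t.name
"""
result = conn.execute(query).fetchall()
print(result)
```

The 15-08-2010 record is excluded because the 07-07-2010 end date falls within the preceding year, which leaves 01-01-2010 as the latest qualifying start.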

use of week of year & subsequent in bigquery

I need to show distinct users per week. I have a date-visit column and a user id; it is a big table with 1 billion rows.
I can change the date column from the CSVs to year, month and day columns, but how do I deduce the week from that in the query?
I can calculate the week from the CSV, but this is a big processing step.
I also need to show how many distinct users visit day after day; I'm looking for a workaround, as there is no date type.
Any ideas?
To get the week of year number:
SELECT STRFTIME_UTC_USEC(TIMESTAMP('2015-5-19'), '%W')
20
If you have your date as a timestamp (i.e., microseconds since the epoch) you can use the UTC_USEC_TO_DAY/UTC_USEC_TO_WEEK functions. Alternatively, if you have an ISO-formatted date string (e.g. "2012/03/13 19:00:06 -0700") you can call PARSE_UTC_USEC to turn the string into a timestamp and then use that to get the week or day.
To see an example, try:
SELECT LEFT((format_utc_usec(day)),10) as day, cnt
FROM (
SELECT day, count(*) as cnt
FROM (
SELECT UTC_USEC_TO_DAY(PARSE_UTC_USEC(created_at)) as day
FROM [publicdata:samples.github_timeline])
GROUP BY day
ORDER BY cnt DESC)
To show week, just change UTC_USEC_TO_DAY(...) to UTC_USEC_TO_WEEK(..., 0) (the 0 at the end is to indicate the week starts on Sunday). See the documentation for the above functions at https://developers.google.com/bigquery/docs/query-reference for more information.
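For a small-scale sanity check of the week numbering and the distinct-users-per-week logic, the same idea can be sketched in plain Python. The visit rows below are hypothetical, and this is only meant to show the grouping, not something to run over a billion rows; `%W` here is the standard strftime directive (week of year, Monday as the first day of the week).

```python
from collections import defaultdict
from datetime import date

# Week of year for the date used in the answer above.
print(date(2015, 5, 19).strftime("%W"))

# Hypothetical (user_id, visit_date) rows.
visits = [
    ("u1", date(2015, 5, 18)),
    ("u1", date(2015, 5, 19)),
    ("u2", date(2015, 5, 19)),
    ("u3", date(2015, 5, 25)),
]

# Collect the set of users seen in each (year, week) bucket.
users_per_week = defaultdict(set)
for user_id, day in visits:
    users_per_week[(day.year, int(day.strftime("%W")))].add(user_id)

distinct_users_per_week = {
    week: len(users) for week, users in sorted(users_per_week.items())
}
print(distinct_users_per_week)
```

Keying on (year, week) rather than week alone keeps the same week number in different years from being merged.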