Calculate one hour difference between two dates - apache-pig

dataset = LOAD '/user/cloudera/project/answers.txt' USING PigStorage('\t') AS ( qid:chararray , i:chararray , qs:int, qt:long, tags:chararray, qvc:chararray , qac:int , aid:chararray, j:chararray, as:int, at:long);
onedate = FOREACH dataset GENERATE ToDate(qt*1000) as qstntime , ToDate(at*1000) as anstime,tags;
difftime = FILTER onedate by GetHour(qstntime)-GetHour(anstime)==1;
dump difftime;
Output
(2009-02-18T17:37:11.000-08:00,2009-04-17T16:22:01.000-07:00,"ctags")
(2009-02-18T20:31:17.000-08:00,2009-02-19T19:29:40.000-08:00,"iphone")
(2009-02-18T22:11:11.000-08:00,2009-03-20T21:58:21.000-07:00,"php")
(2009-02-18T23:36:58.000-08:00,2009-02-19T22:18:10.000-08:00,"sqlserver")
(2009-02-19T01:05:39.000-08:00,2009-02-20T00:44:53.000-08:00,"python")
This output is wrong: the filter subtracts only the hour-of-day fields. The subtraction has to take the day, month and year into account as well.

Since you are using GetHour, you are only comparing the hour-of-day fields, hence the incorrect result. Instead, use HoursBetween, which compares the entire datetime objects and returns the number of hours between two DateTime values.
difftime = FILTER onedate by (HoursBetween(qstntime,anstime) == 1);
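To see why the hour-field subtraction misleads, here is a minimal Python sketch (not Pig) that applies both calculations to the first row of the output above. The function names are hypothetical, and the sketch does not model the sign convention or argument order of Pig's HoursBetween; it only contrasts "difference of hour fields" with "elapsed hours".

```python
from datetime import datetime


def hour_field_difference(question: datetime, answer: datetime) -> int:
    """Subtract only the hour-of-day fields, like GetHour(q) - GetHour(a)."""
    return question.hour - answer.hour


def elapsed_hours(question: datetime, answer: datetime) -> int:
    """Whole hours that actually elapsed between the two instants."""
    return int((answer - question).total_seconds() // 3600)


# First row of the output shown in the question (milliseconds dropped).
question_time = datetime.fromisoformat("2009-02-18T17:37:11-08:00")
answer_time = datetime.fromisoformat("2009-04-17T16:22:01-07:00")

# The hour fields differ by exactly 1 (17 - 16) ...
print(hour_field_difference(question_time, answer_time))
# ... even though the answer was posted weeks after the question.
print(elapsed_hours(question_time, answer_time))
```

The row passes the original filter only because 17 - 16 == 1; the elapsed time between the two timestamps is far more than one hour.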

Related

How can I have the date just before the max/last Date and repeat the process on the whole table?

I'm trying to calculate the delta between two values on different dates. The first is for the date Max(COD):
( Sum({<COD={"$(=Max(COD))"}>*<Contrat={'PRESTA'}>} if([FMonth.autoCalendar.Num]<>[COD.autoCalendar.Num], CapaPresta)) * 130/Hours +(sum( {<COD={"$(=Max(COD))"} >} if(Contrat = 'PRESTA' , Actual)) + sum( {<COD={"$(=Max(COD))"}>} if(Contrat = 'PRESTA' , Actual_hors_site)))* 130/Hours
Now I'm trying to calculate the same value for the date before Max(COD), so I replaced Max(COD) with:
{<COD={"$(=monthend(addmonths(Max(COD), -1)))
But it doesn't work.

How to Query Data Associated With Minimum/Maximum in Pig

I'm looking for the coldest hour for each day. My data looks like this:
(2015/12/27,12AM,32.0)
(2015/12/27,12PM,34.0)
(2015/12/28,10AM,26.1)
(2015/12/28,10PM,28.0)
(2015/12/28,11AM,27.0)
(2015/12/28,11PM,28.9)
(2015/12/28,12AM,25.0)
(2015/12/28,12PM,26.100000000000005)
(2015/12/29,10AM,22.45)
(2015/12/29,10PM,26.1)
(2015/12/29,11AM,24.1)
(2015/12/29,11PM,25.0)
(2015/12/29,12AM,28.9)
I grouped on each day to find the Min Temp with this code:
minTemps = FOREACH gdate2 GENERATE group as day,MIN(removeDash.temp) as minTemp;
which gives this output:
(2015/12/18,17.1)
(2015/12/19,12.9)
(2015/12/20,23.0)
(2015/12/21,32.0)
(2015/12/22,30.899999999999995)
(2015/12/23,36.05)
(2015/12/24,30.45)
(2015/12/25,26.55)
(2015/12/26,28.899999999999995)
(2015/12/27,26.1)
(2015/12/28,23.55)
(2015/12/29,21.0)
My problem: I also need the hour at which the minimum temperature occurred. How can I get the hour as well?
If I'm understanding your question correctly, grouping by (day, hour) won't work because this finds the coldest temperature for each hour, not the coldest hour and temperature for each day.
Instead, use a nested foreach:
B = GROUP A BY day;
C = FOREACH B {
    orderd = ORDER A BY temp ASC;
    limitd = LIMIT orderd 1;
    GENERATE FLATTEN(limitd) AS (day, hour, temp);
};
Group by day as you did before, then order all the hours within the same day by temperature and select only the top record. Just be aware that if there is a tie between two or more hours, only one of these hours will be selected.
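The same "group by day, keep the coldest record" logic can be sketched in Python to check the expected result on the sample rows from the question. This is only an illustration of the idea, not Pig; `coldest_per_day` is a hypothetical helper name.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the question: (day, hour, temp).
records = [
    ("2015/12/27", "12AM", 32.0),
    ("2015/12/27", "12PM", 34.0),
    ("2015/12/28", "10AM", 26.1),
    ("2015/12/28", "10PM", 28.0),
    ("2015/12/28", "11AM", 27.0),
    ("2015/12/28", "11PM", 28.9),
    ("2015/12/28", "12AM", 25.0),
    ("2015/12/28", "12PM", 26.100000000000005),
    ("2015/12/29", "10AM", 22.45),
    ("2015/12/29", "10PM", 26.1),
    ("2015/12/29", "11AM", 24.1),
    ("2015/12/29", "11PM", 25.0),
    ("2015/12/29", "12AM", 28.9),
]


def coldest_per_day(rows):
    """Return one (day, hour, temp) row per day: the one with the lowest temp."""
    by_day = sorted(rows, key=itemgetter(0))  # groupby needs sorted input
    return [
        min(group, key=itemgetter(2))  # on a tie, min keeps the first row seen
        for _, group in groupby(by_day, key=itemgetter(0))
    ]


for row in coldest_per_day(records):
    print(row)
```

As with LIMIT 1 in the nested foreach, a tie within a day yields a single row rather than all tied hours.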
Yes, you are on the right track. Modify your GROUP statement to group by day and hour. Finally, use FLATTEN on the group to decouple the keys.
gdate2 = GROUP removeDash by (day,hour);
minTemps = FOREACH gdate2 GENERATE FLATTEN(group) as (day,hour),MIN(removeDash.temp) as minTemp;

Teradata filter query to pull all data for a month

The query below fetches the data for the last day of a month. In this query, ME_DT is defined as a date-time type, so taking the max of ME_DT gives me the data for the last day of the month. I think I need to convert the date-time value to an integer YYYYMM in the Teradata filter condition, so that it returns the data for the entire month, not just the last day. How should I modify my existing query to get the desired result?
PADW.PL_CURR_DEFN_LOSS_FRCST_ME.ME_DT =
(select max (PADW.PL_CURR_DEFN_LOSS_FRCST_ME.ME_DT) from PADW.PL_CURR_DEFN_LOSS_FRCST_ME)
You should try to avoid calculations on a column in the WHERE-condition to get better estimates and possible index/partition-access:
with cte (dt) as
(
select max (PADW.PL_CURR_DEFN_LOSS_FRCST_ME.ME_DT)
from PADW.PL_CURR_DEFN_LOSS_FRCST_ME
)
select ....
where PADW.PL_CURR_DEFN_LOSS_FRCST_ME.ME_DT
between TRUNC(dt, 'mon')
and last_day(dt)
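The range that the BETWEEN clause is meant to cover can be illustrated with a small Python sketch: given the maximum date, compute the first and last day of its month and compare the column against that range, without transforming the column itself. `month_bounds` is a hypothetical helper and the dates are made-up examples.

```python
import calendar
from datetime import date


def month_bounds(d: date):
    """Return (first_day, last_day) of the month that contains d."""
    first_day = d.replace(day=1)
    # monthrange returns (weekday of day 1, number of days in the month).
    days_in_month = calendar.monthrange(d.year, d.month)[1]
    last_day = d.replace(day=days_in_month)
    return first_day, last_day


# Hypothetical maximum ME_DT value.
max_me_dt = date(2016, 2, 29)
start, end = month_bounds(max_me_dt)

# A row belongs to the month if its date lies inside the range.
print(start <= date(2016, 2, 10) <= end)
```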
I have to use a filter on the table because it has millions of records.
I did this, but I am still verifying whether I got what I want:
(PADW.PL_CURR_DEFN_LOSS_FRCST_ME.ME_DT =
(select max (cast(PADW.PL_CURR_DEFN_LOSS_FRCST_ME.ME_DT as date format 'YYYYMM')) from PADW.PL_CURR_DEFN_LOSS_FRCST_ME))

Get data between record in table

I have data like this:
For example, today is in April 2012. Referring to the data above, I want to get the row with M_PER = 03-2012, because this month falls in the range 03-2012 to 06-2012.
Edit: In this case, I want to get the rate for a given currency code. Because today is still in April and I want the rate from US Dollar (USD) to Indonesian Rupiah (IDR), I must get the row with M_PER = 03-2012 and CRR_CURRENCY_CODE = USD.
The question is what query can retrieve data like that?
Since you seem to be using quarterly values, I would use the TRUNC function with the 'Q' format model. This truncates a date to 1/1/YYYY, 1/4/YYYY, 1/7/YYYY and 1/10/YYYY, i.e. the first day of the quarter.
To fit your model which is the month at the end of the quarter, you would then have to add two months. This assumes that the MONTH_PERIOD column is a SQL date and not some other data type.
Included below is an example, using SYSDATE as the input date.
select *
from your_table
where add_months(trunc(sysdate, 'Q'),2) = month_period;
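The arithmetic in that query can be checked with a short Python sketch that truncates a date to the first day of its quarter and then adds two months. The helper names are hypothetical; the sketch only mirrors the date calculation, not the Oracle functions themselves.

```python
from datetime import date


def quarter_start(d: date) -> date:
    """First day of the quarter containing d (months 1, 4, 7 or 10)."""
    first_month = 3 * ((d.month - 1) // 3) + 1
    return date(d.year, first_month, 1)


def quarter_end_month(d: date) -> date:
    """First day of the last month in d's quarter (quarter start + 2 months)."""
    start = quarter_start(d)
    # start.month is at most 10, so adding 2 never rolls over into the next year.
    return date(start.year, start.month + 2, 1)


print(quarter_start(date(2012, 4, 15)))      # 2012-04-01
print(quarter_end_month(date(2012, 4, 15)))  # 2012-06-01
```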
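I use ROWNUM and ORDER BY to get the value. The ordering goes in a subquery because Oracle assigns ROWNUM before it applies ORDER BY.
SELECT * FROM (SELECT * FROM tables WHERE m_per > '04-2012' ORDER BY month_period ASC) WHERE ROWNUM = 1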

Datediff function of DQL is not returning results as expected

I am trying to query a Documentum server using DQL. I use the DATEDIFF function to select data created on the current date. Here is the query:
SELECT title FROM content_table WHERE DATEDIFF(day, "r_creation_date", DATE(TODAY)) < '1' AND content_type IN ('story','news')
The problem is that, along with today's data, it also selects yesterday's. Why does the less-than-1 condition fetch yesterday's data as well?
I have tried DATEDIFF(day, "r_creation_date", DATE(TODAY)) = '0', but that does not fetch any results. I understand that the time of day comes into the picture, but since I am using 'day' as the date pattern, shouldn't it calculate the difference in days alone?
You can try this query:
SELECT title FROM content_table WHERE r_creation_date > DATE(TODAY) AND content_type IN ('story','news')
Use this if you need the objects created today (after 00:00), not in the last 24 hours.
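The difference between "created today" and "created in the last 24 hours" can be shown with a small Python sketch. Whether DQL's DATEDIFF actually behaves like the elapsed-time check is an assumption made here for illustration only; the function names and timestamps are hypothetical.

```python
from datetime import datetime, time, timedelta


def created_within_last_24h(created: datetime, now: datetime) -> bool:
    """Elapsed-time check: less than one full day has passed since creation."""
    return now - created < timedelta(days=1)


def created_today(created: datetime, now: datetime) -> bool:
    """Calendar check: created strictly after today's midnight, like the > in the query."""
    midnight = datetime.combine(now.date(), time.min)
    return created > midnight


# Hypothetical timestamps: an object created yesterday evening.
now = datetime(2024, 5, 2, 10, 0)
yesterday_evening = datetime(2024, 5, 1, 18, 0)

print(created_within_last_24h(yesterday_evening, now))  # True: only 16 hours elapsed
print(created_today(yesterday_evening, now))            # False: created before midnight
```

An object from yesterday evening passes the elapsed-time check but not the midnight comparison, which matches the symptom described in the question.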
I should use the GETDATE() function to get the current day
Regards.