DAX First Occurrence in SUMMARIZE, FIRST_VALUE equivalent - powerpivot

I'm trying to create a DAX query to combine several records within the same table and extract some values from these combined records.
The result should display not only the min and the max of start and stop time, but also the corresponding first and last locations.
FROM
TravelID | TripID | StartTime | StopTime | StartLocation | StopLocation
1001 | 99 | 08:00 | 08:10 | 50AB | 99DE
1001 | 100 | 08:12 | 08:20 | 59DB | 989FE
TO
TravelID | StartTime | StopTime | StartLocation | StopLocation
1001 | 08:00 | 08:20 | 50AB | 989FE
My efforts so far are:
EVALUATE(
SUMMARIZE(
Source,
Source[BusinessDay]
,Source[TravelID]
,"no of trips in travels", count(Source[TripID])
,"min of starttime", min(Source[StartTime])
,"max of stoptime", max(Source[StopTime])
,"first startlocation", ???
,"last stoplocation", ???
))
I have experimented with FIRSTNONBLANK and RANKX without success.
The SQL equivalent would be something like: FIRST_VALUE(StartLocation) OVER (PARTITION BY BusinessDay, travelId ORDER BY StartTime ASC) "SiteIn".

To create a DAX query in the pattern of your original post, use the following. Note that a query (some DAX expression that results in a table) cannot be used as a measure, and the vast majority of Power Pivot usage is in pivot tables that require scalar measures.
First some measures to make life easier:
TripCount:=
COUNT( Source[TripID] )
MinStart:=
MIN( Source[StartTime] )
MaxStop:=
MAX( Source[StopTime] )
FirstStartLocation:=
CALCULATE(
VALUES( Source[StartLocation] )
,SAMPLE(
1
,Source
,Source[StartTime] // earliest trip of the travel
,ASC
)
)
LastStopLocation:=
CALCULATE(
VALUES( Source[StopLocation] )
,SAMPLE(
1
,Source
,Source[StopTime] // latest trip of the travel
,DESC
)
)
And now your query:
EVALUATE
ADDCOLUMNS(
SUMMARIZE(
Source
,Source[BusinessDay]
,Source[TravelID]
)
,"No of trips in travels", [TripCount]
,"Min of starttime", [MinStart]
,"Max of stoptime", [MaxStop]
,"First startlocation", [FirstStartLocation]
,"Last stoplocation", [LastStopLocation]
)
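For comparison with the FIRST_VALUE formulation in the question, here is a minimal sketch of the same collapse in plain SQL, run through Python's built-in sqlite3 module (assuming the bundled SQLite is 3.25 or newer, which added window functions). The table name Source mirrors the question; the BusinessDay value is hypothetical because the sample data does not show it, and times are stored as zero-padded 'HH:MM' text so that text order equals time order.

```python
import sqlite3

# Rows from the question's FROM table. BusinessDay is not shown in the
# sample, so one hypothetical day is filled in for every row.
ROWS = [
    ("2017-01-01", 1001, 99, "08:00", "08:10", "50AB", "99DE"),
    ("2017-01-01", 1001, 100, "08:12", "08:20", "59DB", "989FE"),
]

QUERY = """
SELECT DISTINCT
    BusinessDay,
    TravelID,
    COUNT(TripID)  OVER (PARTITION BY BusinessDay, TravelID) AS TripCount,
    MIN(StartTime) OVER (PARTITION BY BusinessDay, TravelID) AS StartTime,
    MAX(StopTime)  OVER (PARTITION BY BusinessDay, TravelID) AS StopTime,
    -- location of the earliest trip in the travel
    FIRST_VALUE(StartLocation) OVER (
        PARTITION BY BusinessDay, TravelID ORDER BY StartTime ASC) AS StartLocation,
    -- location of the latest trip: FIRST_VALUE over a descending order avoids
    -- the default frame that makes LAST_VALUE return the current row
    FIRST_VALUE(StopLocation) OVER (
        PARTITION BY BusinessDay, TravelID ORDER BY StopTime DESC) AS StopLocation
FROM Source
ORDER BY BusinessDay, TravelID
"""


def summarize_travels(rows):
    """Collapse trips into one row per (BusinessDay, TravelID)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Source (BusinessDay TEXT, TravelID INTEGER, TripID INTEGER,"
        " StartTime TEXT, StopTime TEXT, StartLocation TEXT, StopLocation TEXT)"
    )
    con.executemany("INSERT INTO Source VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


print(summarize_travels(ROWS))
# [('2017-01-01', 1001, 2, '08:00', '08:20', '50AB', '989FE')]
```

The DISTINCT is needed because window functions keep one output row per input row; every row of a travel carries the same values, so they collapse to one.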

Related

Calculate time difference in minutes in SQL Server 2008

I have a table in SQL Server 2008 with data.
The table contains the amount of time each organization has worked on a request:
CREATE TABLE support
( ID varchar(50),
IN_ORGANIZATION varchar(MAX),
FROM_ORGANIZATION varchar(MAX),
TIMEDIF datetime );
INSERT INTO support
(ID, IN_ORGANIZATION,FROM_ORGANIZATION,TIMEDIF )
VALUES
('22907','ORGANIZATION_NAME_1','RODLAY LLP','2017-04-15 14:58:00.000'),
('22907','MARY LOAN','ORGANIZATION_NAME_1','2017-04-15 15:00:00.000'),
('23289','VENIXTON Ltd','ORGANIZATION_NAME_1','2017-04-21 11:00:00.000'),
('23289','ORGANIZATION_NAME_1','Ocean Loan','2017-04-21 12:00:00.000'),
('23289','Ocean Loan','ORGANIZATION_NAME_1','2017-04-21 13:00:00.000')
;
I want to find the working time for the requests involving ORGANIZATION_NAME_1.
Could you help me write a CURSOR to calculate the time?
Result:
ID, TIMEDIF(minutes)
22907, 2
23289, 120
The DATEDIFF function will do the trick:
select id,datediff(minute,min(timedif),max(timedif) ) AS time from support
where in_organization = 'ORGANIZATION_NAME_1' or from_organization = 'ORGANIZATION_NAME_1'
group by id ;
My output:
id | time
22907 | 2
23289 | 120
Let me know if you have any questions.
Maybe this query will help you:
select
id,
DATEDIFF(MINUTE,MIN(TIMEDIF),MAX(TIMEDIF)) as [TIMEDIF(minutes)]
from support
where IN_ORGANIZATION ='ORGANIZATION_NAME_1'
or FROM_ORGANIZATION ='ORGANIZATION_NAME_1'
group by id
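SQLite has no DATEDIFF, so this sketch (Python's sqlite3 module, with the question's sample rows) converts the MIN/MAX difference to minutes via julianday(), which returns fractional days. The semantics differ slightly: T-SQL DATEDIFF(MINUTE, ...) counts minute boundaries crossed, while this computes elapsed time rounded to the nearest minute; for the whole-minute sample data they agree. The function name is hypothetical.

```python
import sqlite3

# Sample rows from the question's INSERT statement.
ROWS = [
    ("22907", "ORGANIZATION_NAME_1", "RODLAY LLP", "2017-04-15 14:58:00.000"),
    ("22907", "MARY LOAN", "ORGANIZATION_NAME_1", "2017-04-15 15:00:00.000"),
    ("23289", "VENIXTON Ltd", "ORGANIZATION_NAME_1", "2017-04-21 11:00:00.000"),
    ("23289", "ORGANIZATION_NAME_1", "Ocean Loan", "2017-04-21 12:00:00.000"),
    ("23289", "Ocean Loan", "ORGANIZATION_NAME_1", "2017-04-21 13:00:00.000"),
]

QUERY = """
SELECT ID,
       -- julianday() gives fractional days; a day has 1440 minutes
       CAST(ROUND((julianday(MAX(TIMEDIF)) - julianday(MIN(TIMEDIF))) * 1440)
            AS INTEGER) AS minutes
FROM support
WHERE IN_ORGANIZATION = ? OR FROM_ORGANIZATION = ?
GROUP BY ID
ORDER BY ID
"""


def elapsed_minutes(org):
    """Minutes between the first and last row of each request involving org."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE support (ID TEXT, IN_ORGANIZATION TEXT,"
        " FROM_ORGANIZATION TEXT, TIMEDIF TEXT)"
    )
    con.executemany("INSERT INTO support VALUES (?, ?, ?, ?)", ROWS)
    result = con.execute(QUERY, (org, org)).fetchall()
    con.close()
    return result


print(elapsed_minutes("ORGANIZATION_NAME_1"))  # [('22907', 2), ('23289', 120)]
```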
If you just want the time differences between consecutive rows, you could try something like this:
; WITH x AS
(
SELECT *, ROW_NUMBER() OVER(ORDER BY ID, TIMEDIF) AS rwn
From dbo.support
)
SELECT
x.ID
, y.ID AS NextID
, x.IN_ORGANIZATION
, y.IN_ORGANIZATION NextInOrg
, x.FROM_ORGANIZATION
, y.FROM_ORGANIZATION NextFromOrg
, x.TIMEDIF
, y.TIMEDIF AS NextTimeDiff
, x.rwn
, DATEDIFF(MINUTE, x.TIMEDIF, y.TIMEDIF) AS DifferenceFromOneToTheNext
FROM x
INNER JOIN x y ON x.rwn = y.rwn - 1
If you add an IDENTITY column that seeds itself, you already have a row pointer to reference. This approach is really arbitrary, though.
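As an alternative to the ROW_NUMBER self-join, LAG reads the previous row directly. This sketch (Python's sqlite3 module, SQLite 3.25+ assumed, the question's sample rows) also partitions by ID, a choice the answer above does not make, so differences never span two requests; the first row of each request gets NULL (None in Python). The function name is hypothetical.

```python
import sqlite3

# Sample rows from the question's INSERT statement.
ROWS = [
    ("22907", "ORGANIZATION_NAME_1", "RODLAY LLP", "2017-04-15 14:58:00.000"),
    ("22907", "MARY LOAN", "ORGANIZATION_NAME_1", "2017-04-15 15:00:00.000"),
    ("23289", "VENIXTON Ltd", "ORGANIZATION_NAME_1", "2017-04-21 11:00:00.000"),
    ("23289", "ORGANIZATION_NAME_1", "Ocean Loan", "2017-04-21 12:00:00.000"),
    ("23289", "Ocean Loan", "ORGANIZATION_NAME_1", "2017-04-21 13:00:00.000"),
]

QUERY = """
SELECT ID,
       -- minutes since the previous row of the same request (NULL for the first)
       CAST(ROUND((julianday(TIMEDIF)
                   - julianday(LAG(TIMEDIF) OVER (PARTITION BY ID ORDER BY TIMEDIF)))
                  * 1440) AS INTEGER) AS minutes_from_previous
FROM support
ORDER BY ID, TIMEDIF
"""


def step_minutes():
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE support (ID TEXT, IN_ORGANIZATION TEXT,"
        " FROM_ORGANIZATION TEXT, TIMEDIF TEXT)"
    )
    con.executemany("INSERT INTO support VALUES (?, ?, ?, ?)", ROWS)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in step_minutes():
    print(row)
```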

Merge Values of Multiple Records Based on Similar Criteria in Access/SQL/Excel

Currently I have a table with rows like this:
I would like to merge all rows with the same FlNo into a single row, where the merged row follows these criteria:
'FlNo' remains the same
'Start' would be the earliest date
'End' would be the latest date
'Pattern' represents the days of the week, so it would be the combination of all days of the week that appear in every row (e.g. if Row 1 has Pattern = "12347" and Row 2 = "34567", the combined Pattern would be "1234567"; if Row 1 = "357" and Row 2 = "357", the combined Pattern would remain "357"). This part has bothered me most, as I haven't found an algorithm to solve it.
'AC_Name' would be the value that appears most often for a FlNo (in this case 32)
So the final row would be:
FlNo | Start | End | Pattern | AC_Name |
660 | 26/Mar/2017 | 28/Oct/2017 | 1234567 | 32 |
As the original data is an Excel spreadsheet, the solution should be based on the Excel (VBA) or Access (VBA/SQL) environment. The data could be processed in Excel first and then imported into Access, imported into Access and processed there, or half and half. Personally I would prefer to process it in Access with SQL, as there are about 13,000 rows of data.
Please help me find a solution to process this data. Thank you all a lot.
Once you have properly fixed the data structure for your Pattern column (e.g. one 0/1 column per weekday, D1 to D7),
you could use min(), max() and GROUP BY, joined to a derived table that picks the AC_Name with the maximum count:
select
t1.FlNo
, min(t1.Start)
, max(t1.[End])
, max(D1)
, max(D2)
, max(D3)
, max(D4)
, max(D5)
, max(D6)
, max(D7)
, t2.AC_Name
from my_table t1
INNER JOIN (
select c.FlNo, c.AC_Name
from (
select FlNo, AC_Name, count(*) AS my_count
from my_table
group by FlNo, AC_Name ) c
where c.my_count = (
select max(c2.my_count)
from (
select FlNo, AC_Name, count(*) AS my_count
from my_table
group by FlNo, AC_Name ) c2
where c2.FlNo = c.FlNo )
) t2 on t1.FlNo = t2.FlNo
group by t1.FlNo, t2.AC_Name
Once you have fixed the data, the query for all but Ac_Name would simply be:
select FlNo, min(start), max(end),
max(IsMonday), max(IsTuesday), . . .
from t
group by FlNo;
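A minimal Python sketch of that "fixed" structure, assuming digit 1 to 7 in Pattern maps to D1 to D7 (the question does not say which weekday is 1): each pattern becomes seven 0/1 flags, the per-day max() is the union of weekdays, and the flags are turned back into a pattern string. The function names are hypothetical.

```python
def pattern_to_flags(pattern):
    """'12347' -> [1, 1, 1, 1, 0, 0, 1]; index 0 is day 1, index 6 is day 7."""
    return [1 if str(day) in pattern else 0 for day in range(1, 8)]


def flags_to_pattern(flags):
    """[1, 1, 1, 1, 0, 0, 1] -> '12347'."""
    return "".join(str(day) for day, flag in enumerate(flags, start=1) if flag)


def merge_patterns(patterns):
    """Union of weekdays: the per-day max(), like max(D1) ... max(D7)."""
    flag_rows = [pattern_to_flags(p) for p in patterns]
    merged = [max(column) for column in zip(*flag_rows)]
    return flags_to_pattern(merged)


print(merge_patterns(["12347", "34567"]))  # 1234567
print(merge_patterns(["357", "357"]))      # 357
```

Storing the seven flags as separate columns is what lets plain max() and GROUP BY do the merge in SQL; the string form is only needed for display.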
Getting Ac_Name is tricky. This should work:
select FlNo, min(start), max(end),
max(IsMonday), max(IsTuesday), . . .,
(select top 1 ac_name
from t as t2
where t2.FlNo = t.FlNo
group by ac_name
order by count(*) desc, ac_name
) as ac_name
from t
group by FlNo;
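A runnable sketch of the same correlated subquery using Python's sqlite3 module. SQLite has no TOP, so LIMIT 1 takes its place, and End is quoted because it is a keyword. The table name flights and all row values are hypothetical, invented only to exercise the query, since the question's table was an image; dates are ISO text so MIN/MAX order them correctly.

```python
import sqlite3

# Hypothetical rows (the question's table was an image).
ROWS = [
    (660, "2017-03-26", "2017-05-31", "32"),
    (660, "2017-06-01", "2017-08-31", "32"),
    (660, "2017-09-01", "2017-10-28", "33"),
]

QUERY = """
SELECT t.FlNo,
       MIN(t.Start),
       MAX(t."End"),
       -- most frequent AC_Name for this FlNo; ties broken alphabetically
       (SELECT t2.AC_Name
          FROM flights AS t2
         WHERE t2.FlNo = t.FlNo
         GROUP BY t2.AC_Name
         ORDER BY COUNT(*) DESC, t2.AC_Name
         LIMIT 1)
FROM flights AS t
GROUP BY t.FlNo
"""


def merge_flights(rows):
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE flights (FlNo INTEGER, Start TEXT, "End" TEXT, AC_Name TEXT)')
    con.executemany("INSERT INTO flights VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


print(merge_flights(ROWS))  # [(660, '2017-03-26', '2017-10-28', '32')]
```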

Running Total Using LAG Function

I wonder if someone could help me calculate a running total.
I am converting this from an existing Excel solution, so I know what I am aiming for.
I am trying to use LAG to get the values from the previous row, but the calculation does not match my target. I think I need to use the result from the previous row in the LAG column, but that doesn't look possible.
Any help appreciated.
use tempdb;
--Create Temp Table
IF OBJECT_ID('tempdb..#WareHouseData') IS NOT NULL DROP TABLE #WareHouseData
CREATE TABLE #WareHouseData
(
ItemId INT,
DateID INT,
OpenningWareHouseUnits INT,
FcastSales INT,
GoodsIncoming INT,
TargetRunningStock INT
);
--Fill It With example Data
--OpenningWareHouseUnits only exists in the first week
--Fcast sales can be in any week though normally all weeks
--Goods Incoming can be in any weeks
INSERT INTO #WareHouseData
([ItemId],[DateID],[OpenningWareHouseUnits],[FcastSales],[GoodsIncoming],[TargetRunningStock])
VALUES
(987654,201450,200,10,NULL,190),
(987654,201451,NULL,20,NULL,170),
(987654,201452,NULL,30,NULL,140),
(987654,201501,NULL,20,NULL,120),
(987654,201502,NULL,10,NULL,110),
(987654,201503,NULL,50,NULL,60),
(987654,201504,NULL,60,NULL,0),
(987654,201505,NULL,70,100,30),
(987654,201506,NULL,70,80,40),
(987654,201507,NULL,80,100,60),
(987654,201508,NULL,30,NULL,30),
(987654,201509,NULL,20,NULL,10),
(987654,201510,NULL,20,NULL,0),
(123456,201450,300,50,NULL,250),
(123456,201451,NULL,60,NULL,190),
(123456,201452,NULL,70,100,220),
(123456,201501,NULL,80,NULL,140),
(123456,201502,NULL,100,100,140),
(123456,201503,NULL,105,NULL,35),
(123456,201504,NULL,100,100,35),
(123456,201505,NULL,95,NULL,0),
(123456,201506,NULL,30,100,70),
(123456,201507,NULL,20,NULL,50),
(123456,201508,NULL,5,NULL,45),
(123456,201509,NULL,5,NULL,40),
(123456,201510,NULL,5,NULL,35),
(369258,201450,1000,100,NULL,900),
(369258,201451,NULL,100,NULL,800),
(369258,201452,NULL,100,NULL,700),
(369258,201501,NULL,100,NULL,600),
(369258,201502,NULL,100,NULL,500),
(369258,201503,NULL,100,NULL,400),
(369258,201504,NULL,100,NULL,300),
(369258,201505,NULL,100,NULL,200),
(369258,201506,NULL,100,NULL,100),
(369258,201507,NULL,100,500,500),
(369258,201508,NULL,100,NULL,400),
(369258,201509,NULL,100,NULL,300),
(369258,201510,NULL,100,NULL,200);
;
--Match The Target Running Stock Total
--I need to match the TargetRunningStock Totals
--This can be recreated in excel by pasting the columns
--{ItemId DateID OpenningWareHouseUnits FcastSales GoodsIncoming}
--Into cell A1 with headers, and pasting this formula
-- =IF(C2="",IF((F1-D2+E2)<0,0,(F1-D2+E2)),(C2-D2+E2)) into cell F2
SELECT w.ItemId
, w.DateID
, w.OpenningWareHouseUnits
, w.FcastSales
, w.GoodsIncoming
, w.TargetRunningStock
, CASE WHEN w.OpenningWareHouseUnits IS NOT NULL
THEN (ISNULL(w.OpenningWareHouseUnits,0) - ISNULL(w.FcastSales,0) + ISNULL(w.GoodsIncoming,0))
ELSE CASE WHEN ((((LAG(ISNULL(w.OpenningWareHouseUnits,0),1) OVER (PARTITION BY w.ItemId ORDER BY w.ItemId,w.DateID))-
(LAG(ISNULL(w.FcastSales,0),1) OVER (PARTITION BY w.ItemId ORDER BY w.ItemId,w.DateID)) +
(LAG(ISNULL(w.GoodsIncoming,0),1) OVER (PARTITION BY w.ItemId ORDER BY w.ItemId,w.DateID)))) -
ISNULL(w.FcastSales,0) + ISNULL(w.GoodsIncoming,0)) < 0
THEN 0
ELSE ((((LAG(ISNULL(w.OpenningWareHouseUnits,0),1) OVER (PARTITION BY w.ItemId ORDER BY w.ItemId,w.DateID))-
(LAG(ISNULL(w.FcastSales,0),1) OVER (PARTITION BY w.ItemId ORDER BY w.ItemId,w.DateID)) +
(LAG(ISNULL(w.GoodsIncoming,0),1) OVER (PARTITION BY w.ItemId ORDER BY w.ItemId,w.DateID)))) -
ISNULL(w.FcastSales,0) + ISNULL(w.GoodsIncoming,0))
END
END CalculatedRunningStock
FROM #WareHouseData w
ORDER BY w.ItemId
, w.DateID
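The Excel formula makes each week depend on the previous *calculated* stock, not on the previous row's raw columns, which is why LAG over the raw columns cannot reproduce it. A minimal Python sketch of that row-by-row recurrence (function name hypothetical; it assumes the first row of each item carries the opening units) reproduces TargetRunningStock for item 987654:

```python
def running_stock(rows):
    """rows: (opening, fcast_sales, goods_incoming) per week, oldest first.

    Mirrors =IF(C2="",IF((F1-D2+E2)<0,0,(F1-D2+E2)),(C2-D2+E2)):
    the opening week is not clamped, later weeks are floored at zero.
    """
    result = []
    previous = None
    for opening, sales, incoming in rows:
        sales = sales or 0          # like ISNULL(FcastSales, 0)
        incoming = incoming or 0    # like ISNULL(GoodsIncoming, 0)
        if opening is not None:
            stock = opening - sales + incoming
        else:
            stock = max(0, previous - sales + incoming)
        result.append(stock)
        previous = stock            # the calculated value feeds the next week
    return result


# Item 987654 from the question's sample data.
WEEKS_987654 = [
    (200, 10, None), (None, 20, None), (None, 30, None), (None, 20, None),
    (None, 10, None), (None, 50, None), (None, 60, None), (None, 70, 100),
    (None, 70, 80), (None, 80, 100), (None, 30, None), (None, 20, None),
    (None, 20, None),
]
print(running_stock(WEEKS_987654))
# [190, 170, 140, 120, 110, 60, 0, 30, 40, 60, 30, 10, 0]
```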
Ignoring most of the calculation logic for simplicity (and time), you almost certainly need to sum() over (partition by ... order by...).
select ItemId, DateId, TargetRunningStock,
sum(TargetRunningStock) over (partition by itemid order by dateid)
from #WareHouseData
order by ItemId, DateId;
ItemId DateId TargetRunningStock Sum
--
123456 201450 250 250
123456 201451 190 440
123456 201452 220 660
...
987654 201507 60 920
987654 201508 30 950
987654 201509 10 960
987654 201510 0 960
Since you're trying to reproduce the results from a spreadsheet, you might need to wrap something like this around some calculated columns that use lag(). I didn't look that deeply into your spreadsheet logic.
The basic syntax for a running sum is to add an ORDER BY to the OVER clause of the sum() window function:
SELECT w.ItemId, w.DateID, w.OpenningWareHouseUnits, w.FcastSales,
w.GoodsIncoming, w.TargetRunningStock,
SUM(w.OpenningWareHouseUnits) OVER (PARTITION BY w.ItemId ORDER BY w.DateID) AS RunningOpeningUnits
FROM #WareHouseData w
ORDER BY w.ItemId, w.DateID ;
I am a little unclear how to apply this to your formula. Sample data and desired results would be a big help.
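One set-based way around the recurrence, not taken from either answer above: if each week's change is opening - sales + incoming (NULLs as 0) and RawTotal is its plain running sum, then the stock floored at zero equals RawTotal minus the lowest running sum seen so far, whenever that low is negative. Each time the floor kicks in, RawTotal reaches a new low, and afterwards the stock is simply how far RawTotal has climbed above that low. This assumes the opening week's stock is non-negative (true for the sample data). A sketch with Python's sqlite3 module (SQLite 3.25+ assumed; the table is named WareHouseData without the # because SQLite has no temp-table prefix) on item 123456:

```python
import sqlite3

# Item 123456 from the question's sample data: (ItemId, DateID,
# OpenningWareHouseUnits, FcastSales, GoodsIncoming, TargetRunningStock)
ROWS = [
    (123456, 201450, 300, 50, None, 250), (123456, 201451, None, 60, None, 190),
    (123456, 201452, None, 70, 100, 220), (123456, 201501, None, 80, None, 140),
    (123456, 201502, None, 100, 100, 140), (123456, 201503, None, 105, None, 35),
    (123456, 201504, None, 100, 100, 35), (123456, 201505, None, 95, None, 0),
    (123456, 201506, None, 30, 100, 70), (123456, 201507, None, 20, None, 50),
    (123456, 201508, None, 5, None, 45), (123456, 201509, None, 5, None, 40),
    (123456, 201510, None, 5, None, 35),
]

QUERY = """
WITH totals AS (
    SELECT ItemId, DateID, TargetRunningStock,
           -- plain running sum of the weekly changes, NULLs treated as 0
           SUM(COALESCE(OpenningWareHouseUnits, 0) - COALESCE(FcastSales, 0)
               + COALESCE(GoodsIncoming, 0))
               OVER (PARTITION BY ItemId ORDER BY DateID) AS RawTotal
    FROM WareHouseData
), lows AS (
    SELECT *,
           -- lowest running sum seen so far for the item
           MIN(RawTotal) OVER (PARTITION BY ItemId ORDER BY DateID) AS LowestTotal
    FROM totals
)
SELECT ItemId, DateID, TargetRunningStock,
       RawTotal - CASE WHEN LowestTotal < 0 THEN LowestTotal ELSE 0 END
           AS CalculatedRunningStock
FROM lows
ORDER BY ItemId, DateID
"""


def running_stock_sql(rows):
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE WareHouseData (ItemId INTEGER, DateID INTEGER,"
        " OpenningWareHouseUnits INTEGER, FcastSales INTEGER,"
        " GoodsIncoming INTEGER, TargetRunningStock INTEGER)"
    )
    con.executemany("INSERT INTO WareHouseData VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for item, date_id, target, calculated in running_stock_sql(ROWS):
    print(item, date_id, target, calculated)
```

The same SUM() OVER and MIN() OVER syntax exists in SQL Server 2012 and later, so the query should carry over to #WareHouseData with only the table name changed.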

Merge continuous rows with Postgresql

I have a slots table like this :
Column | Type |
------------+-----------------------------+
id | integer |
begin_at | timestamp without time zone |
end_at | timestamp without time zone |
user_id | integer |
and I would like to select merged rows for continuous time. Let's say I have (simplified) data like:
(1, 5:15, 5:30, 1)
(2, 5:15, 5:30, 2)
(3, 5:30, 5:45, 2)
(4, 5:45, 6:00, 2)
(5, 8:15, 8:30, 2)
(6, 8:30, 8:45, 2)
I would like to know if it's possible to select rows formatted like :
(5:15, 5:30, 1)
(5:15, 6:00, 2) // <======= rows id 2,3 and 4 merged
(8:15, 8:45, 2) // <======= rows id 5 and 6 merged
EDIT:
Here's the SQLfiddle
I'm using Postgresql, version 9.3!
Thank you!
Here is one method for solving this problem. Create a flag that determines whether a record does not overlap with the previous one; such a record is the start of a group. Then take the cumulative sum of this flag and use that for grouping:
select user_id, min(begin_at) as begin_at, max(end_at) as end_at
from (select s.*, sum(startflag) over (partition by user_id order by begin_at) as grp
from (select s.*,
(case when lag(end_at) over (partition by user_id order by begin_at) >= begin_at
then 0 else 1
end) as startflag
from slots s
) s
) s
group by user_id, grp;
Here is a SQL Fiddle.
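The same gaps-and-islands query can be tried without a PostgreSQL server; this sketch runs it through Python's sqlite3 module (SQLite 3.25+ assumed) on the question's simplified rows, storing times as zero-padded 'HH:MM' text so that text comparison matches time order. The function name is hypothetical.

```python
import sqlite3

# Simplified rows from the question: (id, begin_at, end_at, user_id).
ROWS = [
    (1, "05:15", "05:30", 1),
    (2, "05:15", "05:30", 2),
    (3, "05:30", "05:45", 2),
    (4, "05:45", "06:00", 2),
    (5, "08:15", "08:30", 2),
    (6, "08:30", "08:45", 2),
]

QUERY = """
SELECT user_id, MIN(begin_at) AS begin_at, MAX(end_at) AS end_at
FROM (SELECT s.*,
             -- running count of group starts = group number per user
             SUM(startflag) OVER (PARTITION BY user_id ORDER BY begin_at) AS grp
      FROM (SELECT s.*,
                   -- 1 when the slot does not touch or overlap the previous one
                   CASE WHEN LAG(end_at) OVER (PARTITION BY user_id ORDER BY begin_at) >= begin_at
                        THEN 0 ELSE 1
                   END AS startflag
            FROM slots s) s) s
GROUP BY user_id, grp
ORDER BY user_id, begin_at
"""


def merge_slots(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE slots (id INTEGER, begin_at TEXT, end_at TEXT, user_id INTEGER)")
    con.executemany("INSERT INTO slots VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


print(merge_slots(ROWS))
# [(1, '05:15', '05:30'), (2, '05:15', '06:00'), (2, '08:15', '08:45')]
```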
Gordon Linoff already provided the answer (I upvoted).
I've used the same approach, but wanted to deal with tsrange type.
So I came up with this construct:
SELECT min(id) b_id, min(begin_at) b_at, max(end_at) e_at, grp, user_id
FROM (
SELECT t.*, sum(g) OVER (ORDER BY id) grp
FROM (
SELECT s.*, (NOT r -|- lag(r,1,r)
OVER (PARTITION BY user_id ORDER BY id))::int g
FROM (SELECT id,begin_at,end_at,user_id,
tsrange(begin_at,end_at,'[)') r FROM slots) s
) t
) u
GROUP BY grp, user_id
ORDER BY grp;
Unfortunately, on the top level one has to use min(begin_at) and max(end_at), as there are no aggregate functions for the range-based union operator +.
I create ranges with exclusive upper bounds, this allows me to use “is adjacent to” (-|-) operator. I compare current tsrange with the one on the previous row, defaulting to the current one in case there's no previous. Then I negate the comparison and cast to integer, which gives me 1 in cases when new group starts.
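A plain-Python sketch of the adjacency rule, to show how it differs from the >= test in the first answer: with half-open ranges, [a, b) -|- [c, d) holds exactly when b = c or d = a, so touching slots merge but overlapping ones start a new group. The function is hypothetical and partitions by user_id (the query above takes its running sum over all rows ordered by id).

```python
from itertools import groupby


def merge_adjacent(slots):
    """slots: (id, begin, end, user_id) tuples describing half-open [begin, end).

    A new group starts unless the range is adjacent (-|-) to the previous
    range of the same user, ordered by id as in the tsrange query.
    """
    merged = []
    ordered = sorted(slots, key=lambda s: (s[3], s[0]))  # by user_id, then id
    for user_id, rows in groupby(ordered, key=lambda s: s[3]):
        prev = None
        for _id, begin, end, _user in rows:
            # [a, b) -|- [c, d) holds when b == c or d == a
            adjacent = prev is not None and (prev[1] == begin or end == prev[0])
            if adjacent:
                first, last, _ = merged[-1]
                merged[-1] = (min(first, begin), max(last, end), user_id)
            else:
                merged.append((begin, end, user_id))
            prev = (begin, end)
    return merged


SLOTS = [
    (1, "05:15", "05:30", 1), (2, "05:15", "05:30", 2), (3, "05:30", "05:45", 2),
    (4, "05:45", "06:00", 2), (5, "08:15", "08:30", 2), (6, "08:30", "08:45", 2),
]
print(merge_adjacent(SLOTS))
# [('05:15', '05:30', 1), ('05:15', '06:00', 2), ('08:15', '08:45', 2)]
```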

SQL query group by nearby timestamp

I have a table with a timestamp column. I would like to be able to group by an identifier column (e.g. cusip), sum over another column (e.g. quantity), but only for rows that are within 30 seconds of each other, i.e. not in fixed 30 second bucket intervals. Given the data:
cusip| quantity| timestamp
============|=========|=============
BE0000310194| 100| 16:20:49.000
BE0000314238| 50| 16:38:38.110
BE0000314238| 50| 16:46:21.323
BE0000314238| 50| 16:46:35.323
I would like to write a query that returns:
cusip| quantity
============|=========
BE0000310194| 100
BE0000314238| 50
BE0000314238| 100
Edit:
In addition, it would greatly simplify things if I could also get the MIN(timestamp) out of the query.
Starting from Sean G's solution, I removed the GROUP BY over the complete table and readjusted a few parts for Oracle SQL.
First, after finding the previous timestamp, assign a self parent id: a row whose previous timestamp is missing or more than 30 seconds earlier gets its own id, and every other row gets NULL.
Then take the nearest preceding non-null self parent id (LAG ... IGNORE NULLS), so that all rows of a CUSIP that are within 30 seconds of the previous row fall under one group.
Because there is a CUSIP column, I assumed the dataset would be large market transaction data. Instead of using GROUP BY on the complete table, use PARTITION BY on the CUSIP and the final parent id for better performance.
SELECT
id,
sub.parent_id,
sub.cusip,
timestamp,
quantity,
sum(sub.quantity) OVER(
PARTITION BY cusip, parent_id
) sum_quantity,
MIN(sub.timestamp) OVER(
PARTITION BY cusip, parent_id
) min_timestamp
FROM
(
SELECT
base_sub.*,
CASE
WHEN base_sub.self_parent_id IS NOT NULL THEN
base_sub.self_parent_id
ELSE
LAG(base_sub.self_parent_id) IGNORE NULLS OVER(
PARTITION BY cusip
ORDER BY
timestamp, id
)
END parent_id
FROM
(
SELECT
c.*,
CASE
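WHEN nvl(
EXTRACT(DAY FROM (timestamp - previous_timestamp)) * 86400
+ EXTRACT(HOUR FROM (timestamp - previous_timestamp)) * 3600
+ EXTRACT(MINUTE FROM (timestamp - previous_timestamp)) * 60
+ EXTRACT(SECOND FROM (timestamp - previous_timestamp)), 31) > 30 THEN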
id
ELSE
NULL
END self_parent_id
FROM
(
SELECT
my_table.id,
my_table.cusip,
my_table.timestamp,
my_table.quantity,
LAG(my_table.timestamp) OVER(
PARTITION BY my_table.cusip
ORDER BY
my_table.timestamp, my_table.id
) previous_timestamp
FROM
my_table
) c
) base_sub
) sub
The following may be helpful to you.
This groups rows into 30-second periods starting from a given time, here '2012-01-01 00:00:00'. DATEDIFF counts the number of seconds between the starting time and the timestamp value; that count is then divided by 30 to get the grouping column.
SELECT MIN(TimeColumn) AS TimeGroup, SUM(Quantity) AS TotalQuantity FROM YourTable
GROUP BY (DATEDIFF(ss, '2012-01-01', TimeColumn) / 30)
Here the minimum timestamp of each group is output as TimeGroup, but you could use the maximum instead, or convert the grouping value back to a time for display.
Looking at the above comments, I'm assuming Chris's first scenario is the one you want (all 3 get grouped even though values 1 and 3 are not within 30 seconds of each other, because each is within 30 seconds of value 2). I'm also going to assume that each row in your table has some unique ID called 'id'. You can do the following:
Create a new grouping, determining if the preceding row in your partition is more than 30 seconds behind the current row (e.g. determine if you need a new 30 second grouping, or to continue the previous). We'll call that parent_id.
Sum quantity over parent_id (plus any other aggregations)
The code could look like this:
select
sub.parent_id,
sub.cusip,
min(sub.timestamp) min_timestamp,
sum(sub.quantity) quantity
from
(
select
base_sub.*,
case
when base_sub.self_parent_id is not null
then base_sub.self_parent_id
else lag(base_sub.self_parent_id) ignore nulls over (
partition by
base_sub.cusip
order by
base_sub.timestamp,
base_sub.id
)
end parent_id
from
(
select
lagged.*,
case
when datediff(
second,
nvl(lagged.previous_timestamp, to_date('1900/01/01', 'yyyy/mm/dd')),
lagged.timestamp) > 30
then lagged.id
else null
end self_parent_id
from
(
select
my_table.id,
my_table.cusip,
my_table.timestamp,
my_table.quantity,
lag(my_table.timestamp) over (
partition by
my_table.cusip
order by
my_table.timestamp,
my_table.id
) previous_timestamp
from
my_table
) lagged
) base_sub
) sub
group by
sub.parent_id,
sub.cusip
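A runnable sketch of the same "chained" grouping with Python's sqlite3 module (SQLite 3.25+ assumed). SQLite does not support LAG ... IGNORE NULLS, so this swaps in the running-sum-of-start-flags technique from the PostgreSQL answer further up: a row starts a new group when it is more than 30 seconds after the previous row of the same cusip. The table name trades, the column name ts, the ids and the date part of each timestamp are hypothetical; the times and quantities are from the question.

```python
import sqlite3

# Rows from the question, with hypothetical ids and a hypothetical date.
ROWS = [
    (1, "BE0000310194", 100, "2017-01-01 16:20:49.000"),
    (2, "BE0000314238", 50, "2017-01-01 16:38:38.110"),
    (3, "BE0000314238", 50, "2017-01-01 16:46:21.323"),
    (4, "BE0000314238", 50, "2017-01-01 16:46:35.323"),
]

QUERY = """
SELECT cusip, SUM(quantity) AS quantity, MIN(ts) AS min_timestamp
FROM (SELECT f.*,
             -- running count of group starts = group number within the cusip
             SUM(new_group) OVER (PARTITION BY cusip ORDER BY ts, id) AS grp
      FROM (SELECT t.*,
                   -- seconds since the previous row; NULL for the first row,
                   -- which falls through to ELSE and starts a group
                   CASE WHEN (julianday(ts)
                              - julianday(LAG(ts) OVER (PARTITION BY cusip ORDER BY ts, id)))
                             * 86400 <= 30
                        THEN 0 ELSE 1
                   END AS new_group
            FROM trades t) f) g
GROUP BY cusip, grp
ORDER BY cusip, min_timestamp
"""


def group_nearby(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE trades (id INTEGER, cusip TEXT, quantity INTEGER, ts TEXT)")
    con.executemany("INSERT INTO trades VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in group_nearby(ROWS):
    print(row)
# ('BE0000310194', 100, '2017-01-01 16:20:49.000')
# ('BE0000314238', 50, '2017-01-01 16:38:38.110')
# ('BE0000314238', 100, '2017-01-01 16:46:21.323')
```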