Extracting specific timestamp ranges from a pandas DataFrame

I extracted some timestamp ranges (events) from a large data frame based on the information in another data frame. The image below shows the start and end of each event during an experiment.
(image: the event data frame)
I also have another data frame (main) that stores the participants' information at each timestamp in a Unity environment.
(image: the main data frame)
Now I want to extract a specific time range from the main data frame based on the event data frame.
When I extract this time range, I get something like this:
(image: the extracted range for the first event, and so on)
As you can see, it shows the event name (event_1) and the extracted time range.
My problem is that in our experiment each participant usually has 12 events. This means that for 5 participants the extracted events run from event_1 to event_60. For the analysis, however, we should only have event_1 to event_12, because each participant did only 12 events. The main difficulty is that some participants have 10 events, some have 11, and most have 12, so the renaming differs from participant to participant.
I renamed the events manually after cutting, using the 'replace' function in 'pandas'. I am looking for a better solution, because later, when I work with the data of 100 participants, I cannot rename the events manually with 'replace'.
I hope this is clear.
Thanks in advance.
I expect to have a function that, for different participants, extracts and renames the events automatically based on the information in the event data frame. Since we have the participant ID (the 'uid' column), I thought it could help, because the uid changes every time the event numbering should start at 1 again. For example, if the first participant has 11 events, then event number 12 is event 1 of the next participant.
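One possible way to do this, sketched under assumptions: the event frame has one row per event in chronological order, with the 'uid' column plus hypothetical 'start' and 'end' columns, and the main frame has 'uid' and a hypothetical 'timestamp' column. Numbering the events within each uid with groupby and cumcount restarts the counter at 1 for every participant, whether that participant has 10, 11 or 12 events. The function name extract_events and the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical event frame: one row per event, in chronological order.
# Only 'uid' is named in the question; 'start' and 'end' are assumed names.
events = pd.DataFrame({
    "uid":   ["p1", "p1", "p1", "p2", "p2"],
    "start": [0.0, 10.0, 20.0, 30.0, 40.0],
    "end":   [5.0, 15.0, 25.0, 35.0, 45.0],
})

# Hypothetical main frame: one row per recorded timestamp.
main = pd.DataFrame({
    "uid":       ["p1"] * 6 + ["p2"] * 4,
    "timestamp": [1.0, 4.0, 7.0, 12.0, 22.0, 27.0, 31.0, 33.0, 38.0, 44.0],
})

# Number the events within each participant: the counter restarts at 1
# whenever 'uid' changes, whether a participant has 10, 11 or 12 events.
events["event"] = "event_" + (events.groupby("uid").cumcount() + 1).astype(str)


def extract_events(main: pd.DataFrame, events: pd.DataFrame) -> pd.DataFrame:
    """Return the rows of `main` that fall inside any event window, labelled."""
    pieces = []
    for ev in events.itertuples(index=False):
        in_window = (
            (main["uid"] == ev.uid)
            & (main["timestamp"] >= ev.start)
            & (main["timestamp"] <= ev.end)
        )
        pieces.append(main.loc[in_window].assign(event=ev.event))
    return pd.concat(pieces, ignore_index=True)


labelled = extract_events(main, events)
print(labelled)
```

With this layout no manual 'replace' step is needed: adding more participants only adds rows to the event frame.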

Related

Splunk query using time an event occurs in one index and using it as a starting point to filter events in another index

What's the most efficient way to perform the following search?
Event occurs on index A at X time
Take X time and use it as a start point in index B
Search all occurrences of a field within index B, with additional filters, 5 minutes after that initial event time that occurred from index A
Example using Windows logs: after every successful login via event ID 4624 (index="security") for a particular user on a host, search all Sysmon event ID 1 (index="sysmon") process creation events on that specific host that occurred in a 5 minute window after the login event. My vision is to examine user logins on a particular host and correlate subsequent process creation events over a short period of time.
I've been trying to play with join, stats min(_time), and eval starttimeu, but haven't had any success. Any help/pointers would be greatly appreciated!
Have you tried map? The map command runs a search for each result of another search. For example:
index=security sourcetype=wineventlog EventCode=4624
```Set the latest time for the map to event time + 5 minutes (300 seconds)```
| eval latest=_time+300
| map search="search index=sysmon host=$host$ earliest=$_time$ latest=$latest$"
Names within $ are field names from the main search.

Stata Create panel dataset with two dataframes, no common variable

I am creating a city-by-day panel from scratch, but I'm having trouble balancing and filling in the data. Every city needs to have an observation every day between 01jan2000 and 31dec2019. My variable of interest is a dummy variable recording whether or not an event took place on that day in that city.
My original dataset only recorded observations if event == 1, and I managed to fill in time gaps using tsfill, but I can't figure out how to balance the data or extend it to start on 01jan2000 and end on 31dec2019. I need every date and city because eventually it will be merged with data that uses that sample period.
My current approach is to create a balanced and filled-in panel and then merge the event data using the date it took place. I have a Stata dataset containing the 7,305 dates, and another containing the 273 cityid's I'm observing. Is it possible to generate a new dataset that combines these two so that all 273 cities are observed every day? Essentially there will be 273 x 7,305 observations and no variables of interest.
Any help figuring out how to solve the unbalanced issue using either of these approaches is hugely appreciated.
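The size and shape of the target panel can be checked with a short sketch. It is written in Python purely for illustration, not in Stata, and the city identifiers 1 to 273 are placeholders for the real cityid values.

```python
from datetime import date, timedelta
from itertools import product

start, end = date(2000, 1, 1), date(2019, 12, 31)

# Every calendar day in the sample period, both endpoints included.
days = [start + timedelta(days=n) for n in range((end - start).days + 1)]

# Placeholder city identifiers; the real ones would come from the cityid file.
city_ids = range(1, 274)

# Cross product: one (cityid, date) row for every city on every day.
panel = list(product(city_ids, days))

print(len(days))   # 7305
print(len(panel))  # 1994265
```

In Stata itself, the cross command is documented as forming every pairwise combination of two datasets, which appears to be the same operation.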

Qlikview line chart with multiple expressions over time period dimension

I am new to QlikView and, after several failed attempts, I have to ask for some guidance regarding charts in QlikView. I want to create a line chart which will have:
One dimension – time period of one month broken down by days
One expression – Number of created tasks per day
Second expression – Number of closed tasks per day
Third expression – Number of open tasks per day
This is a very basic example and I couldn’t find a solution for it. To be honest, I think I don’t understand how I should set up my time period dimension and expressions. Each time I try to introduce more than one expression, things go south. Maybe it's because I have multiple dates or my dimension is wrong.
Here is my simple data:
http://pastebin.com/Lv0CFQPm
I have been reading about helper tables like Master Calendar or “Date Island” but I couldn’t grasp them. I have tried to follow the guide from here: https://community.qlik.com/docs/DOC-8642 but that only worked for one date (for me at least).
How should I set up the dimension and expressions on my chart, so I can count the ID field if Created Date matches one from the dimension and Status is appropriate?
I have the personal edition, so I am unable to open qvw files from other authors.
Thank you in advance, kind regards!
My solution to this would be to change from a single line per Call with associated dates to a concatenated list of Call Events with a single date each. i.e. each Call will have a creation event and a resolution event. This is how I achieve that. (I turned your data into a spreadsheet but the concept is the same for any data source.)
Calls:
LOAD Type,
Id,
Priority,
'New' as Status,
date(floor(Created)) as [Date],
time(Created) as [Time]
FROM
[Calls.xlsx]
(ooxml, embedded labels, table is Sheet1) where Created>0;
LOAD Type,
Id,
Priority,
Status,
date(floor(Resolved)) as [Date],
time(Resolved) as [Time]
FROM
[Calls.xlsx]
(ooxml, embedded labels, table is Sheet1) where Resolved>0;
There are three key concepts here. The first is allowing QlikView's auto-concatenate to do its job by making the field names of both load statements exactly the same, including capitalisation. The second is splitting the timestamp into a Date and a Time. This allows you to have a dimension of Date only and group the events for the day. (In big data sets the resource saving is also significant.) The third is creating the dummy 'New' status for each call on the day of its creation.
With just this data and these expressions
Created = count(if(Status='New',Id))
Resolved = count(if(Status='Resolved',Id))
and then
Created-Resolved
as the Open expression, with full accumulation ticked for Open (to give you a running total rather than a daily total, which might go negative and look odd), you could draw this graph.
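The difference between the daily figure and the accumulated one can be illustrated with a small sketch. It is written in Python rather than QlikView script, and the dates and events are invented sample data, not taken from the pastebin.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical event list: one (date, status) pair per call event, mirroring
# the concatenated Calls table ('New' on creation, 'Resolved' on resolution).
events = [
    ("2015-06-01", "New"), ("2015-06-01", "New"),
    ("2015-06-02", "New"), ("2015-06-02", "Resolved"),
    ("2015-06-03", "Resolved"), ("2015-06-03", "Resolved"),
]

created = Counter(d for d, s in events if s == "New")
resolved = Counter(d for d, s in events if s == "Resolved")
dates = sorted(set(created) | set(resolved))

# Daily net change, then the running total ("full accumulation").
daily_net = [created[d] - resolved[d] for d in dates]
open_calls = list(accumulate(daily_net))

for d, net, total in zip(dates, daily_net, open_calls):
    print(d, created[d], resolved[d], net, total)
```

Here the daily net change is 2, 0, -2 while the running total of open calls is 2, 2, 0, which is why the accumulated version is the one worth plotting.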
For extra completeness you could add this to the code section to fill in your dates and create the Master Calendar you spoke of. There are many other ways of achieving this.
MINMAX:
load floor(num(min([Date]))) as MINTRANS,
floor(num(max([Date]))) as MAXTRANS
Resident Calls;
let zDateMin=FieldValue('MINTRANS',1);
let zDateMax=FieldValue('MAXTRANS',1);
//complete calendar
Dates:
LOAD
Date($(zDateMin) + IterNo() - 1, '$(DateFormat)') as [Date]
AUTOGENERATE 1
WHILE $(zDateMin)+IterNo()-1<= $(zDateMax);
Then you could draw this chart. Don't forget to turn Suppress Zero Values on the Presentation tab off.
But my suggestion would be to use a combo chart rather than a line chart, so that the calls per day are shown as discrete buckets (bars) and the running total of open calls is a line.

SQL Adding unmatched Join results to output

Maybe I am approaching the entire problem wrong - or inefficiently.
Essentially, I am trying to combine two views of data, one of them a log table, based upon 2 criteria:
RoomName field match
Timestamp matches
vw_FusionRVDB_Schedule (RoomName, StartTime, EndTime, Subject, etc)
This contains the schedule of events for all indexed rooms - times in UTC.
vw_FusionRVDB_DisplayUsageHistory (RoomName, OnTime, OffTime, etc)
This is a log of activity that has been pared down to just show when the room display has been turned on and off - times in UTC.
I am wanting to match display on/off activities with the events scheduled in the room when the logged activities occurred.
The query is really long, and includes a lot of derived fields. Hopefully just focusing on the join section will make it more clear.
SELECT <foo>
FROM dbo.vw_FusionRVDB_Schedule
INNER JOIN dbo.vw_FusionRVDB_DisplayUsageHistory
ON dbo.vw_FusionRVDB_Schedule.RoomName =
dbo.vw_FusionRVDB_DisplayUsageHistory.RoomName
AND dbo.vw_FusionRVDB_Schedule.EndTime >=
dbo.vw_FusionRVDB_DisplayUsageHistory.OnTime
AND dbo.vw_FusionRVDB_Schedule.StartTime <=
dbo.vw_FusionRVDB_DisplayUsageHistory.OffTime
This query is working great. By design, some events are listed more than once. This happens when there are multiple on/off display cycles that occur within the window of the same event. Similarly, if a room display is turned on before or during one event and stays on through a following event, data from that single log entry is used on both the first and second event record. So this query is doing exactly what is needed in this aspect.
However, I also want to add back into the output, scheduled events (from the vw_FusionRVDB_Schedule view) that have no corresponding logged activities in the vw_FusionRVDB_DisplayUsageHistory.
I have tried various forms of UNION with another query of the vw_FusionRVDB_Schedule view, using null values in the fields otherwise taken or derived from the vw_FusionRVDB_DisplayUsageHistory view. But it adds all scheduled activities back in, not just the ones with no match from the initial join.
I can provide more details if needed. Thank you in advance.
HepC answered in the comments. I was letting the results confuse me. A left join did the trick.
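A minimal sketch of how the left join behaves, run against an in-memory SQLite database from Python. The table names schedule and usage_log, the integer times, and the sample rows are simplified stand-ins for the two views, not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE schedule (RoomName TEXT, Subject TEXT, StartTime INT, EndTime INT);
    CREATE TABLE usage_log (RoomName TEXT, OnTime INT, OffTime INT);
    INSERT INTO schedule VALUES ('A', 'Standup', 10, 20), ('A', 'Review', 30, 40),
                                ('B', 'Training', 10, 20);
    INSERT INTO usage_log VALUES ('A', 12, 18);
""")

# Same overlap conditions as the original join, but LEFT JOIN keeps every
# scheduled event and fills the log columns with NULL when nothing matches.
rows = conn.execute("""
    SELECT s.RoomName, s.Subject, u.OnTime, u.OffTime
    FROM schedule AS s
    LEFT JOIN usage_log AS u
           ON s.RoomName = u.RoomName
          AND s.EndTime >= u.OnTime
          AND s.StartTime <= u.OffTime
    ORDER BY s.RoomName, s.StartTime
""").fetchall()

for row in rows:
    print(row)
```

Only the first event has a matching log entry; the other two still appear, with None in the log columns. Note that the overlap conditions stay in the ON clause: moving them to WHERE would filter the unmatched rows out again.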

Dealing with gaps in timeline

I'm looking for some assistance to sort out the logic for how I am going to deal with gaps in a feed timeline, pretty much like what you would see in various Twitter clients. I am not creating a Twitter client, however, so it won't be specific to that API. I'm using our own API, so I can possibly make some changes to the API as well to accommodate this.
I'm saving each feed item in Core Data. For persistence, I'd like to keep the feed items around. Let's say I fetch 50 feed items from my server. The next time the user launches the app, I request the latest feed items, get back 50 feed items, and do a fetch to display the feed items in a table view.
Enough time may have passed between the two server requests that a time gap exists between the two sets of feed items.
50 new feed items (request 2)
----- gap ------
50 older feed items (request 1)
* end of items in core data - just load more *
I keep track of whether a gap exists by comparing the oldest timestamp for the feed items in request 2 with the newest timestamp in the set of feed items from request 1. If the oldest timestamp from request 2 is greater than the newest timestamp from request 1, I can assume that a gap exists and I should display a cell with a button to load 50 more. If the oldest timestamp from request 2 is less than or equal to the newest timestamp from request 1, the gap has been filled and there's no need to display the loader.
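The comparison described above can be written as a small function. This is a language-neutral sketch in Python rather than Core Data code; timestamps are plain numbers and has_gap is a hypothetical name.

```python
def has_gap(newer_batch, older_batch):
    """Return True if a gap may exist between two batches of timestamps.

    Follows the rule in the text: a gap is assumed when the oldest item of
    the newer request is strictly newer than the newest item already stored.
    """
    return min(newer_batch) > max(older_batch)


print(has_gap([300, 310], [100, 200]))  # True: nothing covers 200..300
print(has_gap([190, 310], [100, 200]))  # False: the batches overlap
print(has_gap([200, 310], [100, 200]))  # False: equal timestamps close the gap
```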
My first issue is the entire logic surrounding keeping track of whether or not to display the "Load more" cell. How would I know where to display this gap? Do I store it as the same NSManagedObject entity as my feed items with an extra bool + a timestamp that lies in between the two above and then change the UI accordingly? Would there be another, better solution here?
My second issue is related to multiple gaps:
50 new feed items
----- gap ------
174 older feed items
----- gap ------
53 older feed items
* end of items in core data - just load more *
I suppose it would help in this case to go with an NSManagedObject entity so I can just do regular fetches in Core Data, and if gap objects show up amongst the fetched objects, display them as loading cells and remove them accordingly (once gaps no longer exist between the sets of items).
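An alternative to storing gap objects, sketched here as an assumption rather than a recommendation, is to record the timestamp range covered by each server response and derive the gaps from those ranges. The sketch is in Python rather than Core Data code, and merge_segments and gap_positions are hypothetical names.

```python
def merge_segments(segments):
    """Merge overlapping (oldest, newest) timestamp ranges.

    Each range covers one contiguous run of stored feed items. The ranges
    that remain after merging are separated by gaps.
    """
    merged = []
    for oldest, newest in sorted(segments):
        if merged and oldest <= merged[-1][1]:
            # Overlaps or touches the previous run: the gap is filled.
            merged[-1] = (merged[-1][0], max(merged[-1][1], newest))
        else:
            merged.append((oldest, newest))
    return merged


def gap_positions(segments):
    """Return (older_end, newer_start) pairs where a loader cell belongs."""
    merged = merge_segments(segments)
    return [(a[1], b[0]) for a, b in zip(merged, merged[1:])]


# Three runs of items with two gaps, as in the example above.
segments = [(100, 152), (200, 373), (500, 549)]
print(gap_positions(segments))  # [(152, 200), (373, 500)]

# A "load more" response covering 150..210 closes the older gap.
segments.append((150, 210))
print(gap_positions(segments))  # [(373, 500)]
```

With this design a loader cell is shown at each returned position, and it disappears on its own once a later response overlaps both neighbouring runs.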
I'd ultimately want to wipe the objects after a certain time has passed as the user probably wouldn't go back in time that long and if they do I can always fetch them from my server if needed.
Any experience or advice anybody has on this subject is greatly appreciated!