Get the records periodically using SQL query - sql

I have the following situation. I have a table named UserCheckIn.
It contains the following columns:
Id, UserId, CheckInTime, CheckOutTime, CheckInStatus.
I want the number of users checked in per hour, per day, and per week. Take hourly as an example: I will use hour buckets such as 8 am to 9 am, 9 am to 10 am, etc. I want a single SQL/LINQ statement that gives me the result in array or list format, like 11,23,12.
How can I achieve this?

You can use the event scheduler for this.
Create events as per your requirement: hourly, daily, monthly, etc.
Here is the link for your perusal. It has detailed information and sample code as well. Please note that your DB user needs the required privileges to create and use events.
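Whichever scheduling mechanism runs it, the counting itself is a grouped aggregate. The following is a minimal sketch, not the event scheduler setup itself: it assumes SQLite (through Python's sqlite3 module) as a stand-in database, invented sample rows, and that "number of users" means distinct UserId values per bucket. In MySQL, functions such as HOUR(CheckInTime) and DATE(CheckInTime) would take the place of strftime.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE UserCheckIn (
           Id INTEGER PRIMARY KEY,
           UserId INTEGER,
           CheckInTime TEXT,   -- ISO-8601 text, e.g. '2024-01-15 08:05:00'
           CheckOutTime TEXT,
           CheckInStatus TEXT
       )"""
)

# Illustrative rows only: (UserId, CheckInTime)
sample_rows = [
    (1, "2024-01-15 08:05:00"),
    (2, "2024-01-15 08:40:00"),
    (3, "2024-01-15 09:10:00"),
    (1, "2024-01-16 08:15:00"),
]
conn.executemany(
    "INSERT INTO UserCheckIn (UserId, CheckInTime) VALUES (?, ?)", sample_rows
)


def checkin_counts(conn, bucket_format):
    """Count distinct users per time bucket and return the counts as a list."""
    sql = """SELECT strftime(?, CheckInTime) AS bucket,
                    COUNT(DISTINCT UserId) AS users
             FROM UserCheckIn
             GROUP BY bucket
             ORDER BY bucket"""
    return [users for _bucket, users in conn.execute(sql, (bucket_format,))]


hourly_counts = checkin_counts(conn, "%Y-%m-%d %H")  # one bucket per date and hour
daily_counts = checkin_counts(conn, "%Y-%m-%d")      # one bucket per date
weekly_counts = checkin_counts(conn, "%Y-%W")        # one bucket per year and week
```

Buckets with no check-ins produce no row at all, so if a zero is needed for an empty hour, the result would have to be joined against a list of expected buckets.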

Related

Azure Stream Analytics : Select data with the last timestamp only

I'm working on a way to stream the status of some jobs that are running on an HPC resource (sort of like creating a dashboard to look at real-time flight status). I generate and push data every 60 seconds. Unfortunately, this way I end up with a lot of repeated data, because the status of each 'job' changes unpredictably. I need a way to keep only the latest data. I'm not an SQL pro and do this work in my free time, so any help will be appreciated!
Here is my query:
SELECT
Job, Ref, Location, Queue, Description, Status, ElapTime, cast (Time as datetime) as Time
INTO
output_source
FROM
input_source
Here is what my output looks like when I test the query:
Query Test Result
As you can see in the image, there are two sets of data with two different timestamps. I would like the query to return all the columns associated with only the last timestamp. How do I do this? Any ideas? Apologies if this is a repeated question. I have not found an answer that has helped me solve this problem.
Thanks for all your help!

Double or triple timestamp issue

I am using SQL assistant and my data brings in snapshots from a huge database in the form of timestamps. Occasionally the snapshots bring in multiples per hour. The data is correct, multiple snapshots do happen from time to time within an hour, not always but it does happen.
I am bringing this into Spotfire and viewing by an hour and when more than one snapshot happens in the hour, the data shows as doubled.
I only want to display one per hour, preferably the last (max) timestamp for the hour. For example, for the 7 am hour the data has a snapshot for 7:10 am and one for 7:55 am.
These are correct, but I only want to display the last (max) timestamp, 7:55 am in this case. I can't figure the issue out in Spotfire, so I am leaning towards a fix in SQL. How can I display only one for each hour?
You'd do this similarly to how you'd probably do it in SQL -- using a ranking/rownumber function.
The basic way Rank in Spotfire works is Rank(Order columns, order direction, partitioned columns, tie method)
You need to partition by the combination of Date and Hour, and then sort descending by your timestamp column.
So the code to identify the rows that you want to isolate should be something along the lines of:
Rank([TimestampColumn], "desc", Date([TimestampColumn]), Hour([TimestampColumn]), "ties.method=first")
What you do with it from here is going to depend on how you plan to use the data - for example, you can Limit Data Using Expression and set the code above = 1 which will limit your table accordingly (helpful if you don't want your users to accidentally forget to filter), or you can create a calculated column which turns it into a flag of some form like here:
If(Rank([TimestampColumn], "desc", Date([TimestampColumn]), Hour([TimestampColumn]), "ties.method=first") = 1, "Latest", "Duplicate")
Which allows your users to filter by this property. This way, they have the option to look at the extra rows.
Ultimately, though, if you want to only ever see these rows, and have no use for the earlier records, I'd probably do it in SQL, if you have that ability. This reduces the number of rows you have to load into your analytic.
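For the SQL route, the same ranking idea can be written with ROW_NUMBER. This is a minimal sketch under stated assumptions: it uses SQLite 3.25 or later (through Python's sqlite3 module) as a stand-in for the real database, and the table name snapshots and the columns snapshot_ts and value are hypothetical. The function used to truncate a timestamp to its hour differs between databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE snapshots (snapshot_ts TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO snapshots VALUES (?, ?)",
    [
        ("2024-03-01 07:10:00", 10),  # earlier snapshot in the 7 am hour
        ("2024-03-01 07:55:00", 12),  # latest snapshot in the 7 am hour
        ("2024-03-01 08:20:00", 15),  # only snapshot in the 8 am hour
    ],
)

# Number the rows inside each (date, hour) partition, newest first,
# then keep only the row numbered 1 in each partition.
latest_per_hour = conn.execute(
    """
    SELECT snapshot_ts, value
    FROM (
        SELECT snapshot_ts,
               value,
               ROW_NUMBER() OVER (
                   PARTITION BY strftime('%Y-%m-%d %H', snapshot_ts)
                   ORDER BY snapshot_ts DESC
               ) AS rn
        FROM snapshots
    ) AS ranked
    WHERE rn = 1
    ORDER BY snapshot_ts
    """
).fetchall()
```

ROW_NUMBER is used rather than RANK so that two snapshots with an identical timestamp still yield exactly one row per hour, which mirrors the "ties.method=first" setting in the Spotfire expression.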

Adding a new column into Athena (Presto) table calculated by taking the difference between two rows

Over the past few weeks, I've written a pipeline that picks up all the clickstream data that is being broadcast from a website. The pipeline makes use of AWS in the following way: S3 > EC2 (for transforms) > Athena (scanning a clean, partitioned S3). New data comes into the pipeline every 24 hours and this works great: my clickstream data is easily queryable. However, I now need to add some additional columns, e.g. time spent on each page. This can be achieved by sorting by user ID and timestamp, and then taking the difference between the timestamp column of row_n1 and row_n2. So my questions are:
1) How can I do this via an SQL query? I'm struggling to get it to work, but my thinking is that once I do, I can trigger this query every 24 hours to run on the new clickstream data that's coming into Athena.
2) Is this a reasonable way to add additional columns or new aggregate tables? For example, build a query that runs every 24 hours on new data to append to a new table.
Ideally, I don't want to touch any of the source code that's been written to do the "core" ETL pipeline.
For reference, my table looks similar to the following (with the new column, time spent on page):
| userID     | eventNum | Category | Time            | ...... | timeSpentOnPage |
| '103-1023' | '3'      | 'View'   | '12-10-2019...' | ...... | 3s              |
Thanks for any direction/advice that can be provided.
I'm not entirely sure what you are asking, and some example data and expected output would be helpful. For example, I don't quite understand what you mean by row_n1 and row_n2.
I'm going to guess that you mean something like calculating the difference between the timestamps of consecutive rows. That can be achieved by a query like
SELECT
userID,
timestamp - LAG(timestamp, 1) OVER (PARTITION BY userID ORDER BY timestamp) AS timeSpentOnPage
FROM events
The LAG window function returns the value from a previous row (1 in this case means the immediately preceding row) within the window defined by the OVER clause (in this case all rows with the same userID, sorted by timestamp). It's kind of like GROUP BY, but it produces a value for each row instead of collapsing the group into one row, if that makes sense.
It wouldn't quite give you the time spent on each page, some page views would look like they were very long when in fact there was just not any activity between them (say someone browsed some, went to lunch, and browsed some more – the last page view before lunch would look like it spanned the whole lunch).
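The LAG query above can be tried on a few rows. This is a minimal sketch, assuming SQLite 3.25 or later (through Python's sqlite3 module) as a stand-in for Athena, a hypothetical page column, and timestamps stored as integer seconds in a column named ts so that plain subtraction works. Note that the LAG difference on a row is the gap since the previous event, which is the time spent on the previous page. The LEAD column is a variation, not part of the query above, that attaches the duration to the page it belongs to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ts holds Unix-style seconds; page is an illustrative column
conn.execute("CREATE TABLE events (userID TEXT, page TEXT, ts INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("103-1023", "/home", 1000),
        ("103-1023", "/pricing", 1003),
        ("103-1023", "/signup", 1010),
        ("200-0001", "/home", 2000),
    ],
)

rows = conn.execute(
    """
    SELECT userID,
           page,
           -- seconds since this user's previous event (NULL for the first event)
           ts - LAG(ts, 1) OVER (PARTITION BY userID ORDER BY ts) AS sincePrevious,
           -- seconds until this user's next event (NULL for the last event)
           LEAD(ts, 1) OVER (PARTITION BY userID ORDER BY ts) - ts AS timeSpentOnPage
    FROM events
    ORDER BY userID, ts
    """
).fetchall()
```

The first event of each user has no previous row and the last has no next row, so those differences come back as NULL (None in Python) rather than zero.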
There is no way to do the equivalent of UPDATE in Athena. The closest thing is doing a "CTAS" (Create Table AS) to create a new table (which with some automation can be turned into creating new partitions for existing tables).
If you provide some more information about your data I can revise this answer with other suggestions.

MS Access Combo Box Selections and Time Calculation

Hi, I am currently making a table with 21 columns: "Task, Name, Time Taken" for each group, with 7 groups in total.
The Task combo consists of "WIP, HOLD, Quality Check".
The Name combo consists of "Mark, John, Alex".
Time Taken is a number field holding minutes only, like 150, 200, 300, etc.
At the end I have 3 columns for the total time taken for "WIP, HOLD, Quality Check".
My requirements:
a) When I select a Task (e.g. WIP), the Name should automatically be locked to the logged-in user. (I have created an employee table with a login form, and it is working fine.)
b) "WIP" may be selected as the task, and a "Time Taken" entered, multiple times across the groups. Only the WIP time should be summed and shown in the "TOTAL TIME TAKEN for WIP" column.
Please help me. It may be confusing, so let me know if you're unable to understand.
Thanks in advance.
What you're describing doesn't sound normalized at all. I'm not entirely clear on your situation, but I believe you need at least two tables. If you don't know what normalization is, then "What is Normalisation (or Normalization)?" is a good place to start.
As for the last 3 columns with the total time: you shouldn't put that in a table. Once you have a table, you should always have a query total it up for you.
First of all, your table should have 5 columns. ID, Task, Name, TimeTaken, GroupName. Unless you have a need to show 7 groups simultaneously on the same form? You can filter the data at any time based on GroupName.
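The five-column layout and the "let a query total it up" advice can be sketched as follows. This is an illustration only, assuming SQLite (through Python's sqlite3 module) in place of Access, a hypothetical table name TaskLog, and made-up rows. The SELECT statement itself uses only SUM and GROUP BY, which Access totals queries also support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE TaskLog (
           ID INTEGER PRIMARY KEY,
           Task TEXT,
           Name TEXT,
           TimeTaken INTEGER,   -- minutes
           GroupName TEXT
       )"""
)
# One row per entry instead of one set of columns per group
conn.executemany(
    "INSERT INTO TaskLog (Task, Name, TimeTaken, GroupName) VALUES (?, ?, ?, ?)",
    [
        ("WIP", "Mark", 150, "Group1"),
        ("HOLD", "John", 200, "Group1"),
        ("WIP", "Mark", 300, "Group2"),
        ("Quality Check", "Alex", 100, "Group2"),
    ],
)

# Totals are computed on demand rather than stored in extra columns
totals = dict(
    conn.execute("SELECT Task, SUM(TimeTaken) FROM TaskLog GROUP BY Task")
)
```

Adding `WHERE GroupName = ?` to the query would give the same totals for a single group, which is the filtering the paragraph above refers to.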
For part a, it sounds like your login form should be able to feed your Name combo? Just set ComboName.SelectedValue to whatever your user's name is.
For part b, I just don't understand what you're asking. Can you clarify?

Design of Databases for storing the details of the recurrent occurrence of an event

I need to implement a feature similar to the one provided by Microsoft Outlook for making a meeting appointment recurrent. I am trying to figure out an optimized database design for implementing this feature.
The requirement is that each run or task entered by the user can also be scheduled as a recurring event: weekly, monthly, or yearly. Could you please suggest a database model, i.e. a table structure (with constraints), for storing these details in the DB so that the program can access them afterwards to perform the appropriate task? Screenshots of some of the possible scheduler details can be found at the following link.
We have a MySQL DB running at the backend for storing these details. As soon as the user submits a request, a request ID with the details of the request is stored in the table, and then the program takes the corresponding action. To clarify further, the user's intent is to run a SQL script, get the values, and then perform statistical analysis on them. But because the Oracle reference DB is dynamically updated by many users, he wants to run it on a recurring basis and have the analysis done each time. Note that the MySQL DB and the reference DB are different.
Please let me know if you require any other details.
I would suggest storing the details of the first occurrence in one table (scheduled tasks) and then the recurrence details (recurring tasks) in another.
I might also then be tempted to update the scheduled task table with the next occurrence as each task is completed.
As for the table layout, a rough sketch would be as follows:
[ScheduledTasks]
TaskId (Primary Key)
Description and Details etc...
Start Datetime
End Datetime
[RecurringTasks]
TaskId (Foreign Key)
Frequency : Daily, Weekly, Monthly or Yearly.
DayNo : What Day to run on (1-7 for weekly, 1-31 for monthly, 1-365 for yearly)
Interval : Every x weeks, months etc.
WeekOfMonth : first, second, third... etc If populated then DayNo specifies the day of the week.
MonthOfYear : 1-12.
EndDatetime : The last date to perform
Occurrences : The number of times to perform. If this and the previous value are null then perform forever.
Obviously, certain fields would be blank depending on how the task was set up, but I think the above covers all you would need to emulate the tasks in Outlook.
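The idea of updating the scheduled task with its next occurrence after each run could be sketched like this. It is a hypothetical illustration, not a full Outlook emulation: the Recurrence class and next_occurrence function are invented names, only the Daily and Weekly frequencies are handled, and the DayNo, WeekOfMonth, and MonthOfYear fields are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Recurrence:
    """Subset of the RecurringTasks columns sketched above."""
    frequency: str                          # "Daily" or "Weekly" in this sketch
    interval: int = 1                       # every x days or weeks
    end_datetime: Optional[datetime] = None  # last date to perform, None = open-ended
    occurrences: Optional[int] = None        # total runs allowed, None = no limit


def next_occurrence(last_run: datetime, rule: Recurrence, runs_done: int) -> Optional[datetime]:
    """Return the next run time, or None when the recurrence has ended."""
    if rule.occurrences is not None and runs_done >= rule.occurrences:
        return None
    steps = {
        "Daily": timedelta(days=rule.interval),
        "Weekly": timedelta(weeks=rule.interval),
    }
    if rule.frequency not in steps:
        raise ValueError(f"frequency not handled in this sketch: {rule.frequency}")
    candidate = last_run + steps[rule.frequency]
    if rule.end_datetime is not None and candidate > rule.end_datetime:
        return None
    return candidate


# Usage: a task that runs every 2 weeks, 3 times in total
rule = Recurrence(frequency="Weekly", interval=2, occurrences=3)
first = datetime(2024, 1, 1, 9, 0)
second = next_occurrence(first, rule, runs_done=1)
third = next_occurrence(second, rule, runs_done=2)
fourth = next_occurrence(third, rule, runs_done=3)  # limit reached
```

Monthly and yearly rules need calendar-aware arithmetic (month lengths, "second Tuesday" style rules), which is why they are not reduced to a fixed timedelta here.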