SQL difference between two rows with timestamp - sql

I'm working with SQL on PostgreSQL.
I have a table that describes images of cars, with the columns camera, whn and reg.
camera is an int, whn is a timestamp string (e.g. 2007-02-25 07:51:10) and reg is a string.
I am doing this assignment and I'm stuck on:
"Print the register plate(reg) of cars which have been photographed twice by the same camera or different and the difference between the photos is a minute or less"
Does anybody know how I can express that the difference between two different rows of the column whn must be less than or equal to 60 seconds?

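Sort your table by timestamp.
Use the lag() window function to get the value from the previous row (the row just before the current one in that ordering).
Use extract() on the difference between the two timestamps, e.g. extract(epoch from ...), to get the gap in seconds. Since whn is stored as a string, cast it to timestamp first.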
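The three steps above can be sketched as a runnable example. This is only a sketch under stated assumptions: the table name image and the sample rows are made up, and it runs against an in-memory SQLite database (3.25 or newer, for window functions) through Python's sqlite3 module instead of PostgreSQL. In PostgreSQL the seconds test would be written with extract, roughly extract(epoch from (whn::timestamp - prev_whn::timestamp)) <= 60.

```python
import sqlite3

# Hypothetical sample data; the table name "image" is an assumption.
rows = [
    (1, "2007-02-25 07:51:10", "ABC 123"),
    (2, "2007-02-25 07:51:50", "ABC 123"),  # 40 s after the previous photo
    (1, "2007-02-25 08:00:00", "XYZ 789"),
    (1, "2007-02-25 09:30:00", "XYZ 789"),  # 90 min later, should not match
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE image (camera INTEGER, whn TEXT, reg TEXT)")
conn.executemany("INSERT INTO image VALUES (?, ?, ?)", rows)

# LAG() fetches the previous photo of the same car (any camera), ordered by
# time. strftime('%s', ...) is SQLite's way to get epoch seconds.
query = """
SELECT DISTINCT reg
FROM (
    SELECT reg,
           whn,
           LAG(whn) OVER (PARTITION BY reg ORDER BY whn) AS prev_whn
    FROM image
) AS t
WHERE prev_whn IS NOT NULL
  AND CAST(strftime('%s', whn) AS INTEGER)
      - CAST(strftime('%s', prev_whn) AS INTEGER) <= 60
ORDER BY reg
"""
close_pairs = [row[0] for row in conn.execute(query)]
print(close_pairs)
```

Partitioning by reg rather than by camera is what covers the "same camera or different" part of the assignment: each photo is compared with the previous photo of the same car, wherever it was taken.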

Related

Replacing incorrectly entered dates in sql server

I have run a query in SQL Server. There is a name category, and every category has a date that it started. However, sometimes data was incorrectly entered in the front end, so when I do the data pull it returns two start dates per category when in reality just the earliest date should be present. Is there any SQL code I can throw into this join query that replaces all situations where a category has two dates with the earliest one?
From what I understand, you need to use the MIN() function to get only the earliest entered date when querying your table. You can achieve this by using something similar to the following:
SELECT
    categoryName,
    MIN(categoryDate)
FROM Category
GROUP BY categoryName
However, I am not sure this is what you need, since we have no dataset to verify against. Ideally, you can explain more clearly what you need to achieve, and we can help you better.

Double or triple timestamp issue

I am using SQL assistant and my data brings in snapshots from a huge database in the form of timestamps. Occasionally the snapshots bring in multiples per hour. The data is correct, multiple snapshots do happen from time to time within an hour, not always but it does happen.
I am bringing this into Spotfire and viewing by an hour and when more than one snapshot happens in the hour, the data shows as doubled.
I only want to display one per hour, preferably the last (max) timestamp for the hour. Example: for the 7 am hour the data has a snapshot for 7:10 am and one for 7:55 am.
These are correct but I only want to display the last(max) timestamp, 7:55 am in this case. I can't figure the issue out in Spotfire so I am leaning towards a fix in SQL. How can I display only 1 for each hour?
You'd do this similarly to how you'd probably do it in SQL -- using a ranking/rownumber function.
The basic way Rank in Spotfire works is Rank(Order columns, order direction, partitioned columns, tie method)
You need to partition by the combination of Date and Hour, and then sort descending by your timestamp column.
So the code to identify the rows that you want to isolate should be something along the lines of:
Rank([TimestampColumn], "desc", Date([TimestampColumn]), Hour([TimestampColumn]), "ties.method=first")
What you do with it from here is going to depend on how you plan to use the data - for example, you can Limit Data Using Expression and set the code above = 1 which will limit your table accordingly (helpful if you don't want your users to accidentally forget to filter), or you can create a calculated column which turns it into a flag of some form like here:
If(Rank([TimestampColumn], "desc", Date([TimestampColumn]), Hour([TimestampColumn]), "ties.method=first") = 1, "Latest", "Duplicate")
Which allows your users to filter by this property. This way, they have the option to look at the extra rows.
Ultimately, though, if you want to only ever see these rows, and have no use for the earlier records, I'd probably do it in SQL, if you have that ability. This reduces the number of rows you have to load into your analytic.
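For the SQL route, a minimal sketch of the same ranking idea follows. The table name snapshots, its columns and the sample rows are hypothetical, and the query runs on an in-memory SQLite database (3.25 or newer) through Python's sqlite3 module; the original database is not named here, so the date-truncation expression will need adapting to its dialect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and columns, for illustration only.
conn.execute("CREATE TABLE snapshots (snapshot_ts TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO snapshots VALUES (?, ?)",
    [
        ("2020-01-01 07:10:00", 10),
        ("2020-01-01 07:55:00", 12),  # latest snapshot in the 7 am hour
        ("2020-01-01 08:05:00", 20),
    ],
)

# Number the rows within each date+hour bucket, newest first,
# then keep only the newest row of every bucket.
query = """
SELECT snapshot_ts, value
FROM (
    SELECT snapshot_ts,
           value,
           ROW_NUMBER() OVER (
               PARTITION BY strftime('%Y-%m-%d %H', snapshot_ts)
               ORDER BY snapshot_ts DESC
           ) AS rn
    FROM snapshots
) AS ranked
WHERE rn = 1
ORDER BY snapshot_ts
"""
latest_per_hour = conn.execute(query).fetchall()
print(latest_per_hour)
```

This mirrors the Spotfire expression: the partition is the combination of date and hour, the sort is descending on the timestamp, and rank 1 is the row to keep.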

Group timestamp column by year

I want to group the timestamp column by year. I am working with two columns: one with 3 factor levels (YES, NO and N/A) and a timestamp column in the form YYYY-MM-DD H:M:S.
I want to get the frequency of the 3 factor levels per year, so I need to extract the year from the timestamp column, but I can't figure it out.
I am working with a version of RStudio where you can't download packages from online but have to work with the ones already installed. No access to SSIS packages, no datepart function available. I also tried using the year function in the code below, but I keep getting an error.
SELECT Housing, year('datetime')
FROM tablex
GROUP BY year('datetime')
The only tools I have available are the lubridate package and the strptime function.
PS: the timestamp is stored as a factor; when I try changing it to numeric or character I get N/A everywhere. Please help!

BigQuery Google Analytics sessionsWithEvent metric

I'm having trouble creating a BigQuery query that will allow me to fetch the Google Analytics ga:sessionsWithEvent metric.
This is what I tried:
SELECT
EXACT_COUNT_DISTINCT(concat(fullvisitorid, string(visitid))) AS distinctVisitIds
FROM
(TABLE_DATE_RANGE([xxxxxxxx.ga_sessions_], TIMESTAMP('2016-11-30'), TIMESTAMP('2016-12-26')))
WHERE
hits.type='EVENT'
The logic in the query above seems sound: get all the rows that have a hits.type of 'EVENT' and count the distinct fullVisitorId/visitId combinations, i.e. the number of unique sessions with an event.
But the numbers I get from this are close to, but higher than, what I get using the Query Explorer.
Thank you.
EDIT: Addressing comment below to use wider date range with date filter
With date range +-5 days, this makes the query
SELECT
EXACT_COUNT_DISTINCT(concat(fullvisitorid, string(visitid))) AS distinctVisitIds
FROM
(TABLE_DATE_RANGE([xxxxxxxx.ga_sessions_], TIMESTAMP('2016-11-25'), TIMESTAMP('2016-12-31')))
WHERE
hits.type='EVENT'
AND ('20161130'<=date AND date<='20161226')
Unfortunately I still get the same number
Don't rely on the table dates; even the tables for later days can contain hits from previous days. Instead, use a wider date range in the FROM clause and the exact date range as a filter on the date column.
AFAIK the Query Explorer also does approximations.

DAX sum different DateTime

I have a problem here: I would like to sum my employees' work time based on the daily data (time2 - time1). Here is my query:
Effective Minute Work Time = 24. * 60 * (LASTNONBLANK(time2,0) -FIRSTNONBLANK(time1,0))
It works daily, but if I drill up to weekly / monthly data it shows the wrong sum, as shown below:
What I want is the sum, in minutes, of the daily time differences (time2 - time1).
Thanks for your help :)
You have several approaches you can take: the hard way or the easier way :). The harder (at least for me :)) is to use DAX to do this. You would:
1) create a date table,
2) Use the DAX CALCULATE function to evaluate your last non-blank and first non-blank values (you might need to use CALCULATETABLE, but I'm not sure; DAX experts jump in). Then subtract one from the other.
This will give you correct values for a given day for a given person. You can enforce the latter condition by putting a HASONEVALUE guard on the person name so that your measure informs the report author if they're not using it right.
Doing the same for dates is a little trickier. In the example you show you are including the date in the row grouping. But if you change your mind and want instead to have 'total hours worked by person' or 'total hours worked by everyone' you're not done with modelling yet.
Your next step is to use CALCULATETABLE in combination with CALCULATE to create a measure that returns the total. You'll use CALCULATETABLE so you evaluate each date and the hours worked on that date by person. Then you'll use CALCULATE to summarize that all down to a single number. If you're not careful with your DAX (or report authoring) you might mix up which person you're summarizing for, so that your first/last non-blank values are not at the person level. It gets intense quickly.
Your easier solution, though it might be more limited in its application (it really depends on your scenario), is to use the query to transform the data into a summary by day and person using the Group By command. This will give you a row per person per day with their start and end times. Then you can quickly calculate the hours worked on that day, and quite easily build visuals on top of the summary data. Of course you give up some of the flexibility of having a proper data model. However, if you have a date table, a person table, and your summary table, and then set up your relationships correctly, you can answer the most common questions.
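The arithmetic behind the summary-table approach can be sketched outside Power BI. The snippet below is plain Python rather than DAX or the query editor, and the records, names and helper functions are all hypothetical. It builds one row per person per day (first time1, last time2), turns each row into minutes, and only then sums, which is the step the drilled-up measure skips.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical records: (person, time1, time2).
records = [
    ("Ann", datetime(2017, 1, 2, 8, 0), datetime(2017, 1, 2, 12, 0)),
    ("Ann", datetime(2017, 1, 2, 13, 0), datetime(2017, 1, 2, 17, 0)),
    ("Ann", datetime(2017, 1, 3, 9, 0), datetime(2017, 1, 3, 17, 30)),
    ("Bob", datetime(2017, 1, 2, 10, 0), datetime(2017, 1, 2, 15, 0)),
]


def daily_minutes(rows):
    """One entry per (person, day): last time2 minus first time1, in minutes."""
    bounds = {}
    for person, start, end in rows:
        key = (person, start.date())
        first, last = bounds.get(key, (start, end))
        bounds[key] = (min(first, start), max(last, end))
    return {
        key: (last - first).total_seconds() / 60
        for key, (first, last) in bounds.items()
    }


def total_minutes_by_person(daily):
    """Sum the per-day minutes, i.e. what a weekly/monthly total should show."""
    totals = defaultdict(float)
    for (person, _day), minutes in daily.items():
        totals[person] += minutes
    return dict(totals)


daily = daily_minutes(records)
totals = total_minutes_by_person(daily)
print(totals)
```

Taking the last time2 minus the first time1 over the whole period for Ann would instead span from the morning of 2 January to the evening of 3 January, which is the kind of inflated figure the question describes.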