SQL: how to control contiguous periods of time

I want to write a query against this type of table; the result I want is shown on the right side.
The query should return the rows whose NIF has overlapping periods: if a NIF has one (or more) periods that overlap, that NIF has to be included in the result.

You can use the query below for this kind of result. ROWID is Oracle-specific; in other databases, compare the table's primary key instead:
SELECT NIF -- use distinct if you want to get distinct NIF value in your result
FROM T T1 -- T is your tablename
WHERE EXISTS (SELECT 1
FROM T T2
WHERE T1.NIF = T2.NIF
AND T1."START" BETWEEN T2."START" AND T2."END"
AND T1.ROWID <> T2.ROWID);
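Checking only T1's start is enough at the NIF level. Whenever two periods overlap, the one that starts later starts inside the other, so at least one row of the pair is returned.

The sketch below runs the same EXISTS query on SQLite through Python's built-in sqlite3 module, so it can be executed as-is. Several pieces are hypothetical:

- the table name `periods`;
- its `id` primary key, which stands in for Oracle's ROWID;
- the sample rows.

Dates are ISO-8601 strings, so text comparison matches date order. Because BETWEEN is inclusive, a period that starts on the same day another one ends also counts as overlapping.

```python
import sqlite3

def nifs_with_overlaps(rows):
    """Return the sorted NIFs that have at least two overlapping periods.

    rows: (id, nif, start, end) tuples; dates are ISO-8601 strings, so
    comparing them as text matches chronological order.
    """
    con = sqlite3.connect(":memory:")
    # Hypothetical table; "start" and "end" are quoted as in the original query.
    con.execute('CREATE TABLE periods (id INTEGER PRIMARY KEY, nif TEXT, "start" TEXT, "end" TEXT)')
    con.executemany("INSERT INTO periods VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT DISTINCT t1.nif
        FROM periods t1
        WHERE EXISTS (SELECT 1
                      FROM periods t2
                      WHERE t1.nif = t2.nif
                        AND t1."start" BETWEEN t2."start" AND t2."end"
                        AND t1.id <> t2.id)  -- id plays the role of Oracle's ROWID
        ORDER BY t1.nif
    """
    result = [nif for (nif,) in con.execute(query)]
    con.close()
    return result

sample = [
    (1, "A", "2020-01-01", "2020-03-31"),
    (2, "A", "2020-03-15", "2020-06-30"),  # starts inside period 1: overlap
    (3, "B", "2020-01-01", "2020-01-31"),
    (4, "B", "2020-02-01", "2020-02-28"),  # contiguous with period 3, no overlap
    (5, "C", "2020-01-01", "2020-12-31"),  # only one period
]
print(nifs_with_overlaps(sample))  # ['A']
```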

Related

SQL: Using the result of a query in something similar to a FOR EACH loop. MS SQL SMS

I have data in several views that I would like to run a check against for computed data.
The first step is a query that returns several rows with a VehicleID column; each of those IDs should drive the "for each" part of the next query (this example has been simplified).
The next step gets the entries from the view [dbo].[viewDataVehicle] that match the VehicleID and returns rows with the VehicleID, Timestamp and Speed.
From here I need to calculate the average of these Speed values and then select all rows where Speed > AverageSpeed + SpeedVariable (a value that should be set in the query).
The result should output the matching rows with an additional OverAverage column (let's say it's a boolean TRUE or FALSE, which in this example would always be TRUE).
This is repeated for each of the other VehicleIDs and the final result is a table containing all the rows that matched the conditions.
I can group by and format later on so this aspect is not important.
How would I write a query to do this?
Generally, in SQL SELECT statements, wherever you have FROM TableName you can substitute another SELECT statement for TableName. So, start by selecting your vehicle IDs:
select vehicleId
from <table>
where <whatever>
Now match to your view:
select vdv.vehicleId, Timestamp, speed
from dbo.viewDataVehicle vdv
inner join
(
select vehicleId
from <table>
where <whatever>
) v on v.vehicleId = vdv.vehicleId
Now use this as input to your next step, and so on.
As Lamu says in a comment, with SQL never think of individual rows: always think of sets. RAT (Row At a Time) processing is not the way to go.
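The sketch below follows this set-based style through to the end, with no loop:

- the per-vehicle averages come from a grouped derived table;
- that table is joined back to the detail rows.

It runs on SQLite via Python's sqlite3 module. The tables `viewDataVehicle` and `vehicles` are hypothetical stand-ins for the view and for the first query. SpeedVariable is passed as a query parameter (in T-SQL it would typically be a DECLAREd variable), and 'TRUE' is a text literal standing in for the boolean.

```python
import sqlite3

def rows_over_average(readings, vehicle_ids, speed_variable):
    """Return (VehicleID, Timestamp, Speed, OverAverage) rows where
    Speed > that vehicle's average speed + speed_variable."""
    con = sqlite3.connect(":memory:")
    # Hypothetical stand-in for the view dbo.viewDataVehicle.
    con.execute("CREATE TABLE viewDataVehicle (VehicleID INTEGER, Timestamp TEXT, Speed REAL)")
    con.executemany("INSERT INTO viewDataVehicle VALUES (?, ?, ?)", readings)
    # Hypothetical stand-in for the first query that returns the VehicleIDs.
    con.execute("CREATE TABLE vehicles (VehicleID INTEGER)")
    con.executemany("INSERT INTO vehicles VALUES (?)", [(v,) for v in vehicle_ids])
    query = """
        SELECT vdv.VehicleID, vdv.Timestamp, vdv.Speed, 'TRUE' AS OverAverage
        FROM viewDataVehicle vdv
        INNER JOIN (
            -- derived table: one average per selected vehicle
            SELECT d.VehicleID, AVG(d.Speed) AS AverageSpeed
            FROM viewDataVehicle d
            INNER JOIN vehicles v ON v.VehicleID = d.VehicleID
            GROUP BY d.VehicleID
        ) a ON a.VehicleID = vdv.VehicleID
        WHERE vdv.Speed > a.AverageSpeed + ?
        ORDER BY vdv.VehicleID, vdv.Timestamp
    """
    result = con.execute(query, (speed_variable,)).fetchall()
    con.close()
    return result

readings = [
    (1, "2021-01-01 10:00", 50.0),
    (1, "2021-01-01 10:01", 60.0),
    (1, "2021-01-01 10:02", 100.0),  # vehicle 1 average is 70
    (2, "2021-01-01 10:00", 30.0),
    (2, "2021-01-01 10:01", 90.0),   # vehicle 2 average is 60
    (3, "2021-01-01 10:00", 200.0),  # vehicle 3 is not in the ID list
]
print(rows_over_average(readings, vehicle_ids=[1, 2], speed_variable=5.0))
```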

Add a where condition in one subquery referencing a table in another subquery (ANSI SQL)

In subquery 'two' I want to add a where condition which is based on a column in subquery 'one', that is, I want to add WHERE one."Call Type" = 'Demo' within my second subquery but that gives me an alias error. Basically, in 'two' I only want the count of those rows which have had a call type = demo as given in table 'one' (Final objective is to find the ratio of the two counts mentioned in the code). Any suggestions on how this can be achieved?
SELECT
COALESCE(one.Period, two.Period) AS Period,
one.TotalDemos,
two.TotalTrials,
Round(100.0 * one.TotalDemos / two.TotalTrials, 2) AS percentage
FROM ( SELECT
"Call Date" AS Period,
COUNT(*) AS TotalDemos
FROM "customer_calls"
WHERE "Call Type" = 'Demo'
GROUP BY Period
) AS one
FULL OUTER JOIN( SELECT
"modified" AS Period,
COUNT(*) AS TotalTrials
FROM "users"
WHERE "customertype" = 0
GROUP BY Period
) AS two ON one.Period = two.Period
Edit: my Period values are timestamps (date and time), but I want the aggregation and the ratio/percentage by month of the year instead of by day. I'm not sure how that can be done using COALESCE here. Any help on this would also be great.
Based on the table definitions that you've provided, you can achieve what you're after with Common Table Expressions (CTE). Example provided with the query below.
However, wee warning: trying to determine which users have made a demo call by Period alone is not recommended. If my assumption that Period is a date/time value is correct, then it is very likely that your query will not achieve what you want it to (i.e. the query will return records that do not correspond to a demo call). To make the query precise, you'll have to determine the actual join between customer_calls and users.
Also, another wee warning: I generally don't recommend FULL OUTER JOIN, as it returns every row from both sides, including rows in one that don't join to two and vice versa, which are usually rows you don't want. INNER JOIN should suffice, assuming you know the database/table structures well.
WITH one AS (
SELECT
"Call Date" AS Period,
COUNT(*) AS TotalDemos
FROM "customer_calls"
WHERE "Call Type" = 'Demo'
GROUP BY "Call Date"
), two AS (
SELECT
"modified" AS Period,
COUNT(*) AS TotalTrials
FROM "users"
WHERE "customertype" = 0
AND "modified" IN (SELECT Period FROM one) -- a column alias can't be used in WHERE
GROUP BY "modified"
)
SELECT
COALESCE(one.Period, two.Period) AS Period,
one.TotalDemos,
two.TotalTrials,
Round(100.0 * one.TotalDemos / two.TotalTrials, 2) AS percentage
FROM one
FULL OUTER JOIN two
ON one.Period = two.Period
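For the edit about aggregating by month, COALESCE isn't the issue; the grouping key is. Truncate the timestamp to its month in both subqueries and group by that expression.

The sketch below does this on SQLite via Python's sqlite3. There, strftime('%Y-%m', ...) produces the month key. Other databases have equivalents:

- PostgreSQL: date_trunc('month', ...);
- SQL Server: DATEFROMPARTS(YEAR(...), MONTH(...), 1).

SQLite before 3.39 lacks FULL OUTER JOIN, so the sketch emulates it with a UNION of the two period lists and two LEFT JOINs. The table layouts and sample data are hypothetical. Periods are still matched only by month, so the caveat above about relating customer_calls to users still applies.

```python
import sqlite3

def monthly_demo_trial_ratio(calls, users):
    con = sqlite3.connect(":memory:")
    # Hypothetical minimal versions of the two tables from the question.
    con.execute('CREATE TABLE customer_calls ("Call Date" TEXT, "Call Type" TEXT)')
    con.execute('CREATE TABLE users ("modified" TEXT, "customertype" INTEGER)')
    con.executemany("INSERT INTO customer_calls VALUES (?, ?)", calls)
    con.executemany("INSERT INTO users VALUES (?, ?)", users)
    query = """
        WITH one AS (
            SELECT strftime('%Y-%m', "Call Date") AS Period, COUNT(*) AS TotalDemos
            FROM customer_calls
            WHERE "Call Type" = 'Demo'
            GROUP BY strftime('%Y-%m', "Call Date")
        ), two AS (
            SELECT strftime('%Y-%m', "modified") AS Period, COUNT(*) AS TotalTrials
            FROM users
            WHERE "customertype" = 0
            GROUP BY strftime('%Y-%m', "modified")
        ), periods AS (
            -- every month that appears on either side (FULL OUTER JOIN emulation)
            SELECT Period FROM one UNION SELECT Period FROM two
        )
        SELECT p.Period, one.TotalDemos, two.TotalTrials,
               ROUND(100.0 * one.TotalDemos / two.TotalTrials, 2) AS percentage
        FROM periods p
        LEFT JOIN one ON one.Period = p.Period
        LEFT JOIN two ON two.Period = p.Period
        ORDER BY p.Period
    """
    rows = con.execute(query).fetchall()
    con.close()
    return rows

calls = [
    ("2021-01-05 10:30:00", "Demo"),
    ("2021-01-20 14:00:00", "Demo"),
    ("2021-01-21 09:00:00", "Support"),
    ("2021-02-03 11:00:00", "Demo"),
]
users = [
    ("2021-01-02 08:00:00", 0),
    ("2021-01-15 08:00:00", 0),
    ("2021-01-16 08:00:00", 0),
    ("2021-01-17 08:00:00", 0),
    ("2021-01-18 08:00:00", 1),  # not a trial (customertype 1)
    ("2021-03-01 08:00:00", 0),
]
print(monthly_demo_trial_ratio(calls, users))
```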

SQL Server : verify that two columns are in same sort order

I have a table with an ID and a date column. It's possible (likely) that when a new record is created, it gets the next larger ID and the current datetime. So if I were to sort by date or I were to sort by ID, the resulting data set would be in the same order.
How do I write a SQL query to verify this?
It's also possible that an older record is modified and the date is updated. In that case, the records would not be in the same sort order. I don't think this happens.
I'm trying to move the data to another location, and if I know that there are no modified records, that makes it a lot simpler.
I'm pretty sure I only need to query those two columns: ID, RecordDate. Other links indicate I should be able to use LAG, but I'm getting an error that it isn't a built-in function name.
In other words, both https://dba.stackexchange.com/questions/42985/running-total-to-the-previous-row and the question "Is there a way to access the 'previous row' value in a SELECT statement?" should help, but I'm still not able to make that work for what I want.
If you cannot use window functions, you can use a correlated subquery and EXISTS.
SELECT *
FROM elbat t1
WHERE EXISTS (SELECT *
FROM elbat t2
WHERE t2.id < t1.id
AND t2.recorddate > t1.recorddate);
It'll select all records where another record with a lower ID and a greater timestamp exists. If the result is empty you know that no such record exists and the data is like you want it to be.
Maybe you want to restrict it a bit more by using t2.recorddate >= t1.recorddate instead of t2.recorddate > t1.recorddate. I'm not sure how you want it.
Use this:
SELECT ID, RecordDate FROM tablename t
WHERE
(SELECT COUNT(*) FROM tablename WHERE tablename.ID < t.ID)
<>
(SELECT COUNT(*) FROM tablename WHERE tablename.RecordDate < t.RecordDate);
For each row, it counts how many rows have an ID less than the row's ID and how many rows have a RecordDate less than the row's RecordDate. If these counts are not equal, it outputs the row. The result is all the rows that would not be in the same position after sorting by ID and by RecordDate.
One method uses window functions:
select count(*)
from (select t.*,
row_number() over (order by id) as seqnum_id,
row_number() over (order by date, id) as seqnum_date
from t
) t
where seqnum_id <> seqnum_date;
When the count is zero, the two columns have the same ordering. Note that the second ORDER BY includes id as a tie-breaker: two rows could have the same date, and adding id makes the sort stable, so the comparison is valid even when date has duplicates.
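The sketch below runs the EXISTS, COUNT(*) and ROW_NUMBER() checks above side by side on SQLite via Python's sqlite3. It needs SQLite 3.25 or later for window functions. The table `t` and the sample rows are hypothetical; in the second data set, the record with id 1 has been "modified" to a later date.

The checks flag different rows. None of them isolates the modified record on its own; they only tell you that the two orderings disagree. Because the COUNT(*) version compares strict less-than counts, it also flags rows with duplicate dates even when the order is consistent. The ROW_NUMBER() version avoids that with its id tie-breaker.

```python
import sqlite3

# The three checks from the answers above, adapted to a table t(id, recorddate).
CHECKS = {
    # Rows that have some lower id with a later date.
    "exists": """
        SELECT t1.id FROM t t1
        WHERE EXISTS (SELECT * FROM t t2
                      WHERE t2.id < t1.id AND t2.recorddate > t1.recorddate)
        ORDER BY t1.id""",
    # Rows whose rank by id differs from their rank by date (strict counts).
    "count": """
        SELECT id FROM t t1
        WHERE (SELECT COUNT(*) FROM t WHERE t.id < t1.id)
           <> (SELECT COUNT(*) FROM t WHERE t.recorddate < t1.recorddate)
        ORDER BY id""",
    # Rows whose position by id differs from their position by (date, id).
    "row_number": """
        SELECT id FROM (
            SELECT id,
                   ROW_NUMBER() OVER (ORDER BY id) AS seqnum_id,
                   ROW_NUMBER() OVER (ORDER BY recorddate, id) AS seqnum_date
            FROM t) s
        WHERE seqnum_id <> seqnum_date
        ORDER BY id""",
}

def flagged_ids(rows):
    """Run every check against (id, recorddate) rows and return the flagged ids."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, recorddate TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    result = {name: [i for (i,) in con.execute(sql)] for name, sql in CHECKS.items()}
    con.close()
    return result

in_order = [(1, "2021-01-01"), (2, "2021-01-02"), (3, "2021-01-03")]
modified = [(1, "2021-01-05"), (2, "2021-01-02"), (3, "2021-01-03")]  # id 1 updated later
print(flagged_ids(in_order))   # every list is empty
print(flagged_ids(modified))
```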
The solutions above are all good, but if the IDs increase by exactly 1 with no gaps, this simpler self-join, which compares each row with the row whose ID is one lower, should also work:
select modifiedid=t2.id from
yourtable t1 join yourtable t2
on t1.id=t2.id+1 and t1.recordDate<t2.recordDate
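The question mentions LAG, which is available in SQL Server 2012 and later. The sketch below is a swapped-in technique, not one of the answers above. It compares each row with the previous row in id order rather than with id + 1, so gaps in the IDs don't matter. Checking adjacent rows is enough, because the dates are in id order exactly when no adjacent pair goes backwards.

It runs on SQLite 3.25+ via Python's sqlite3. The table `t` and the sample rows (with id 3 missing) are hypothetical. With this data, the id + 1 self-join above would never compare ids 2 and 4, so it would miss the out-of-order pair.

```python
import sqlite3

def descents(rows):
    """Return (id, previous_date, recorddate) wherever a row's date is earlier
    than the date of the row just before it in id order."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, recorddate TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    query = """
        SELECT id, prev_date, recorddate
        FROM (SELECT id, recorddate,
                     LAG(recorddate) OVER (ORDER BY id) AS prev_date
              FROM t) s
        WHERE prev_date > recorddate  -- NULL for the first row, so it is never flagged
        ORDER BY id
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

# id 3 is missing, so an "id = id + 1" join never compares ids 2 and 4.
gapped = [(1, "2021-01-01"), (2, "2021-01-05"), (4, "2021-01-03"), (5, "2021-01-06")]
print(descents(gapped))  # [(4, '2021-01-05', '2021-01-03')]
```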

Cross joining two tables with "using" instead of "on"

I found a SQL query in a book that I am not able to understand. From what I understand there are two tables: date, which has date_id and test_date columns, and transactions, which has date_id and obs_cnt columns.
select t1.test_date
,sum(t2.obs_cnt)
from date t1
cross join
(transactions join date using (date_id)) as t2
where t1.test_date>=t2.test_date
group by t1.test_date
order by t1.test_date
Can someone help me understand what this code does or what the output will look like?
I understand obs_cnt variable is being aggregated at a test_date level.
I understand the use of USING in place of ON. But what I don't get is how the date table is being referenced twice; does it mean it is being joined twice?
But what I don't get is how the date table is being referenced twice; does it mean it is being joined twice?
Yes, it is, although it's probably easier to think of t2 as a whole rather than as a function of the date table: t2 is the transactions table, but with the actual date (test_date) instead of just the date_id.
I assume there's actually some context for all of this in the book, but it looks like this will produce:
one row of output for each distinct test_date in the date table (t1) that has at least one transaction on or before it, in order of test_date (dates earlier than every transaction are removed by the WHERE clause);
for each such row, the total number of observations across all transactions that happened on or before that date, taken from our transactions-with-date table t2.
I understand obs_cnt variable is being aggregated at a test_date level.
It's being aggregated per t1.test_date, which is also the bound we use to select the t2 rows that get summed. In other words, the query computes a running (cumulative) total of obs_cnt.
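The sketch below is a small runnable version of the book's query on SQLite via Python's sqlite3, with made-up sample data, to show the running total. SQLite does not accept an alias on a parenthesized join, so t2 is written as a subquery here; otherwise the logic is unchanged. Note that 2021-01-01 has no transactions on or before it, so it produces no output row.

```python
import sqlite3

RUNNING_TOTAL_QUERY = """
    SELECT t1.test_date, SUM(t2.obs_cnt) AS running_total
    FROM "date" t1
    CROSS JOIN (SELECT test_date, obs_cnt
                FROM transactions JOIN "date" USING (date_id)) AS t2
    WHERE t1.test_date >= t2.test_date  -- every transaction on or before t1's date
    GROUP BY t1.test_date
    ORDER BY t1.test_date"""

def running_totals(dates, transactions):
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE "date" (date_id INTEGER PRIMARY KEY, test_date TEXT)')
    con.execute("CREATE TABLE transactions (date_id INTEGER, obs_cnt INTEGER)")
    con.executemany('INSERT INTO "date" VALUES (?, ?)', dates)
    con.executemany("INSERT INTO transactions VALUES (?, ?)", transactions)
    result = con.execute(RUNNING_TOTAL_QUERY).fetchall()
    con.close()
    return result

dates = [(1, "2021-01-01"), (2, "2021-01-02"), (3, "2021-01-03"), (4, "2021-01-04")]
transactions = [(2, 5), (2, 3), (4, 10)]
print(running_totals(dates, transactions))
# [('2021-01-02', 8), ('2021-01-03', 8), ('2021-01-04', 18)]
```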

Exclude leading NULL values from table

To give some context, I am using time series data (one column) and I want to study gaps in the data, represented by NULL values in the data set. However, I expect some leading NULL values that I am not interested in including in my final data set, and the number of leading NULL values will vary between data sets.
I would like to exclude the top x number of rows of my data set where the value of a particular column is NULL, without excluding NULL values that appear lower in the same column.
Any help would be much appreciated.
Thanks!
EDIT: I also know that my first record in the value column is always 1, if that helps.
Unfortunately, for SQL Server 2008, I can't think of anything cleaner than:
SELECT row_number,value FROM <table> t1
WHERE value is not NULL OR
EXISTS (select * FROM <table> t2
where t2.value is not null and
t2.row_number < t1.row_number)
Just as an aside, in SQL Server 2012 and later you could use MAX() with an OVER() clause whose frame covers all previous rows (ordered by row_number, up to and including the current row). If that MAX() returns NULL, then the current row and all preceding rows are known to be NULL, and that's what I'd recommend if/when you upgrade.
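The sketch below compares the SQL Server 2008-style EXISTS query with the windowed MAX() aside. It runs on SQLite 3.25+ via Python's sqlite3. The table `data`, its columns `row_number` and `value`, and the sample series are hypothetical. The frame ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is one way to make MAX() consider all previous rows. Both queries should keep the same rows.

```python
import sqlite3

EXISTS_QUERY = """
    SELECT row_number, value FROM data t1
    WHERE value IS NOT NULL
       OR EXISTS (SELECT * FROM data t2
                  WHERE t2.value IS NOT NULL
                    AND t2.row_number < t1.row_number)
    ORDER BY row_number"""

WINDOW_QUERY = """
    SELECT row_number, value FROM (
        SELECT row_number, value,
               MAX(value) OVER (ORDER BY row_number
                                ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_max
        FROM data) s
    -- NULL running_max means this row and every earlier row is NULL
    WHERE running_max IS NOT NULL
    ORDER BY row_number"""

def trim_leading_nulls(series):
    """Return (exists_result, window_result) for (row_number, value) rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE data (row_number INTEGER, value REAL)")
    con.executemany("INSERT INTO data VALUES (?, ?)", series)
    results = tuple(con.execute(q).fetchall() for q in (EXISTS_QUERY, WINDOW_QUERY))
    con.close()
    return results

series = [(1, None), (2, None), (3, 1.0), (4, None), (5, 2.0), (6, None)]
print(trim_leading_nulls(series))  # both keep rows 3 to 6, including the later NULLs
```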
You could find the first non-null item for each data set and then query that row and everything after it:
WITH FirstItem AS
(
SELECT
DataSetID,
MIN(row_number) row_number
FROM Data
WHERE value IS NOT NULL
GROUP BY DataSetID
)
SELECT d.* FROM Data d
INNER JOIN FirstItem fi
ON d.DataSetID = fi.DataSetid
AND d.row_number >= fi.row_number
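The sketch below runs the per-data-set CTE above on SQLite via Python's sqlite3. The Data table layout (DataSetID, row_number, value) is taken from the query, and the sample rows are made up. One consequence of the INNER JOIN is worth knowing: a data set whose values are all NULL has no first item, so it disappears from the result entirely.

```python
import sqlite3

FIRST_ITEM_QUERY = """
    WITH FirstItem AS (
        SELECT DataSetID, MIN(row_number) AS row_number
        FROM Data
        WHERE value IS NOT NULL
        GROUP BY DataSetID
    )
    SELECT d.DataSetID, d.row_number, d.value
    FROM Data d
    INNER JOIN FirstItem fi
        ON d.DataSetID = fi.DataSetID
       AND d.row_number >= fi.row_number
    ORDER BY d.DataSetID, d.row_number"""

def trim_per_data_set(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Data (DataSetID INTEGER, row_number INTEGER, value REAL)")
    con.executemany("INSERT INTO Data VALUES (?, ?, ?)", rows)
    result = con.execute(FIRST_ITEM_QUERY).fetchall()
    con.close()
    return result

samples = [
    (1, 1, None), (1, 2, 5.0), (1, 3, None), (1, 4, 6.0),
    (2, 1, None), (2, 2, None), (2, 3, None), (2, 4, 7.0), (2, 5, None),
    (3, 1, None), (3, 2, None),  # all-NULL data set: dropped by the INNER JOIN
]
print(trim_per_data_set(samples))
```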