Help in correcting the SQL - sql

I am trying to solve this query. I have the following data:
Input
Date Id Value
25-May-2011 1 10
26-May-2011 1 10
26-May-2011 2 10
27-May-2011 1 20
27-May-2011 2 20
28-May-2011 1 10
I need to query and output as:
Output
FromDate ToDate Id Value
25-May-2011 26-May-2011 1 10
26-May-2011 26-May-2011 2 10
27-May-2011 27-May-2011 1 20
28-May-2011 28-May-2011 1 10
I tried this sql but I'm not getting the correct result:
SELECT START_DATE, END_DATE, A.KEY, B.VALUE FROM
(
SELECT MIN(DATE) START_DATE, KEY, VALUE
FROM
KEY_VALUE
GROUP
BY KEY,VALUE
) A INNER JOIN
(
SELECT MAX(DATE) END_DATE, KEY, VALUE
FROM
KEY_VALUE
GROUP
BY KEY, VALUE
) B ON A.KEY = B.KEY AND A.VALUE = B.VALUE;

I think that you are trying too hard. Should be more like this:
SELECT MIN(DATE) AS FromDate, MAX(DATE) AS ToDate, KEY, VALUE
FROM KEY_VALUE
GROUP BY KEY, VALUE

This query appears to produce the correct results, though it shows that your example output is missing a line: '27-May-2011 ... 27-May-2011 ... 2 ... 20'.
select id, [value], date as fromdate, (
select top 1 date
from key_value kv2
where id = kv.id
and [value] = kv.[value]
and date >= kv.date
and datediff(d, kv.date, date) = (
select count(*)
from key_value
where id = kv.id
and [value] = kv.[value]
and date > kv.date
and date <= kv2.date
)
order by date desc
) as todate
from key_value kv
where not exists (
select *
from key_value
where id = kv.id
and [value] = kv.[value]
and date = dateadd(d, -1, kv.[date])
)
First, the where clause finds the start of each run: records that have no record with the same id and value on the day before. Then the todate subquery finds the end of that run: it takes the greatest date for which the number of days since the start date equals the number of records between the two, which means no day was skipped. This of course assumes that the records in the table are distinct.
However, if you are processing a massive table, your best option may be to sort the records by key, id, date and then use a cursor to find the min and max dates programmatically as you loop over the records and watch for values to change. You can then push them into a new table (real or temp), along with any other calculations you need on other fields along the way.
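The loop described above can be sketched outside the database. This is a minimal illustration in Python rather than a T-SQL cursor; the function name `collapse_runs` and the tuple layout are made up for the example, and it assumes distinct rows and one row per day within a run.

```python
from datetime import date, timedelta

def collapse_runs(rows):
    """Collapse (day, id, value) rows into (from_date, to_date, id, value) runs.

    A new run starts when the id or value changes, or when a day is skipped.
    """
    runs = []
    # Sort by id, value, date so that each run is contiguous in the loop.
    for day, key, value in sorted(rows, key=lambda r: (r[1], r[2], r[0])):
        last = runs[-1] if runs else None
        same_group = last is not None and last[2] == key and last[3] == value
        if same_group and day - last[1] == timedelta(days=1):
            runs[-1] = (last[0], day, key, value)  # extend the current run
        else:
            runs.append((day, day, key, value))    # start a new run
    return sorted(runs)

rows = [
    (date(2011, 5, 25), 1, 10),
    (date(2011, 5, 26), 1, 10),
    (date(2011, 5, 26), 2, 10),
    (date(2011, 5, 27), 1, 20),
    (date(2011, 5, 27), 2, 20),
    (date(2011, 5, 28), 1, 10),
]
result = collapse_runs(rows)
for from_date, to_date, key, value in result:
    print(from_date, to_date, key, value)
```

On the sample input this yields five runs, including the 27-May row for id 2 with value 20.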

Related

Vertica: Return zero if no records found

I am running this query to get the average number of logins per user for the last 3 months. If the user has logged in during the last 3 months, I want the average; if not, return 0.
I have tried a number of different ways, but it seems that if the user has not logged in during the last 3 months, there are no records and count() does not return 0. It simply returns nothing.
1) select case count(*)
WHEN 0
THEN 0
ELSE count(creationTS) / 3
END as average
from table_name where creationTS >= add_months(now(), -3)
and userId = '110'
group by userId;
2) select COALESCE(count(creationTS)/3,0) as average
from table_name where creationTS >= add_months(now(), -3)
and userId = '110'
group by userId;
It gives the correct result if a record is found for the condition 'creationTS >= add_months(now(), -3)', but if no record exists, it returns nothing. How can I return 0 in that case?
Try it like this:
a) get all distinct userid-s from the base table in a full-select.
b) left join that full-select back with the base table, on equality of the user id and login date not earlier than 3 months ago
c) count the found user id-s in the base table, getting NULL by default if the join fails, and use NVL() to force a 0 in case of NULL, and group by user id
WITH
-- sample input data,not part of real query
indata(userid,login_dt) AS (
SELECT 'arthur', DATE '2021-09-15'
UNION ALL SELECT 'arthur', DATE '2021-08-27'
UNION ALL SELECT 'arthur', DATE '2021-08-01'
UNION ALL SELECT 'trillian', DATE '2021-09-27'
UNION ALL SELECT 'trillian', DATE '2021-08-15'
UNION ALL SELECT 'trillian', DATE '2021-06-27'
UNION ALL SELECT 'ford', DATE '2021-02-27'
UNION ALL SELECT 'ford', DATE '2021-04-27'
)
,
userids AS (
SELECT DISTINCT
userid
FROM indata
)
SELECT
userids.userid
, NVL(COUNT(indata.userid),0) AS login_count
FROM userids
LEFT JOIN indata
ON userids.userid=indata.userid
AND login_dt >= ADD_MONTHS(CURRENT_DATE,-3)
GROUP BY
userids.userid
;
userid | login_count
----------+-------------
arthur | 3
ford | 0
trillian | 2
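The same left-join pattern can be tried locally. The sketch below uses SQLite through Python's sqlite3 module as a stand-in for Vertica, so `date(..., '-3 months')` replaces ADD_MONTHS. The table name `logins` and the fixed reference date '2021-09-30' are assumptions chosen so the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE logins (userid TEXT, login_dt TEXT);
INSERT INTO logins VALUES
  ('arthur',   '2021-09-15'), ('arthur',   '2021-08-27'), ('arthur',   '2021-08-01'),
  ('trillian', '2021-09-27'), ('trillian', '2021-08-15'), ('trillian', '2021-06-27'),
  ('ford',     '2021-02-27'), ('ford',     '2021-04-27');
""")

# The date filter sits in the ON clause, not in WHERE, so users without
# recent logins keep their row and COUNT over the joined column gives 0.
counts = conn.execute("""
    SELECT u.userid, COUNT(l.userid) AS login_count
    FROM (SELECT DISTINCT userid FROM logins) AS u
    LEFT JOIN logins AS l
      ON l.userid = u.userid
     AND l.login_dt >= date(:as_of, '-3 months')
    GROUP BY u.userid
    ORDER BY u.userid
""", {"as_of": "2021-09-30"}).fetchall()

for userid, login_count in counts:
    print(userid, login_count)
```

Because COUNT over a column counts only non-NULL values, it already yields 0 for unmatched users; wrapping it in NVL() is harmless but not strictly needed.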

Select most recent InstanceID base on max end date

I am trying to pull the memberinstance from a table based on the max DateEnd. If it is Null I want to pull that as it would be still ongoing. I am using sql server.
select memberinstanceid
from table
group by memberid
having MAX(ISNULL(date_end, '2099-12-31'))
The query above doesn't work for me. I have tried different ones and have gotten it to return the separate instances, but not just the one with the max date.
Below is what my table looks like.
MemberID MemberInstanceID DateStart DateEnd
2 abc12 2013-01-01 2013-12-31
4 abc21 2010-01-01 2013-12-31
2 abc10 2015-01-01 NULL
4 abc19 2014-01-01 2014-10-31
I would expect my results to look like this
MemberInstanceID
abc10
abc19
I have been trying to figure out how to do this but have not had much luck. Any help would be much appreciated. Thanks
I think you need something like the following:
select MemberID, MemberInstanceID
from table t
where (
-- DateEnd is null...
DateEnd is null
or (
-- ...or pick the latest DateEnd for this member...
DateEnd = (
select max(DateEnd)
from table
where MemberID = t.MemberID
)
-- ... and check there's not a NULL entry for DateEnd for this member
and not exists (
select 1
from table
where MemberID = t.MemberID
and DateEnd is null
)
)
)
The problem with this approach would be if there are multiple rows that match for each member, i.e. multiple NULL rows with the same MemberID, or multiple rows with the same DateEnd for the same MemberID.
SELECT TOP 1 memberinstanceid
from table
ORDER BY (CASE WHEN [DateEnd] IS NULL THEN 1 ELSE 0 END) DESC,
[DateEnd] DESC
The ORDER BY is essentially creating a "column" to sort the NULL values to the top, then doing a secondary sort on the dates that are not null.
You have a good start, but you don't need to perform any explicit grouping. What you want is the row where DateEnd is null or is the largest value (latest date) of all the records with the same MemberID. You also realized that a plain Max is not enough, because the null, if one exists, must be treated as the latest date.
select m.*
from Members m
where m.DateEnd is null
or m.DateEnd =(
select Max( IsNull( DateEnd, '9999-12-31' ))
from Members
where MemberID = m.MemberID );
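As a quick check against the sample table, here is a sketch using SQLite through Python's sqlite3 module in place of SQL Server, so COALESCE stands in for IsNull. It is a variant rather than the exact query above: it applies the sentinel date on both sides of the comparison, which makes the separate "is null" test unnecessary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Members (
    MemberID INTEGER, MemberInstanceID TEXT, DateStart TEXT, DateEnd TEXT);
INSERT INTO Members VALUES
  (2, 'abc12', '2013-01-01', '2013-12-31'),
  (4, 'abc21', '2010-01-01', '2013-12-31'),
  (2, 'abc10', '2015-01-01', NULL),
  (4, 'abc19', '2014-01-01', '2014-10-31');
""")

# A NULL DateEnd is treated as the far-future date '9999-12-31', so an
# ongoing instance always wins over any instance that has ended.
latest = conn.execute("""
    SELECT m.MemberInstanceID
    FROM Members AS m
    WHERE COALESCE(m.DateEnd, '9999-12-31') = (
        SELECT MAX(COALESCE(DateEnd, '9999-12-31'))
        FROM Members
        WHERE MemberID = m.MemberID)
    ORDER BY m.MemberID
""").fetchall()

print(latest)
```

As with the other answers, ties (two rows for a member sharing the latest DateEnd, or two NULL rows) would return more than one row per member.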

Joining next Sequential Row

I am planning an SQL statement right now and would like someone to look over my thoughts.
This is my Table:
id stat period
--- ------- --------
1 10 1/1/2008
2 25 2/1/2008
3 5 3/1/2008
4 15 4/1/2008
5 30 5/1/2008
6 9 6/1/2008
7 22 7/1/2008
8 29 8/1/2008
Create Table
CREATE TABLE tbstats
(
id INT IDENTITY(1, 1) PRIMARY KEY,
stat INT NOT NULL,
period DATETIME NOT NULL
)
go
INSERT INTO tbstats
(stat,period)
SELECT 10,CONVERT(DATETIME, '20080101')
UNION ALL
SELECT 25,CONVERT(DATETIME, '20080102')
UNION ALL
SELECT 5,CONVERT(DATETIME, '20080103')
UNION ALL
SELECT 15,CONVERT(DATETIME, '20080104')
UNION ALL
SELECT 30,CONVERT(DATETIME, '20080105')
UNION ALL
SELECT 9,CONVERT(DATETIME, '20080106')
UNION ALL
SELECT 22,CONVERT(DATETIME, '20080107')
UNION ALL
SELECT 29,CONVERT(DATETIME, '20080108')
go
I want to calculate the difference between each statistic and the next, and then calculate the mean value of the 'gaps.'
Thoughts:
I need to join each record with its subsequent row. I can do that using the ever flexible joining syntax, thanks to the fact that I know the id field is an integer sequence with no gaps.
By aliasing the table I could incorporate it into the SQL query twice, then join them together in a staggered fashion by adding 1 to the id of the first aliased table. The first record in the table has an id of 1. 1 + 1 = 2 so it should join on the row with id of 2 in the second aliased table. And so on.
Now I would simply subtract one from the other.
Then I would use the ABS function to ensure that I always get positive integers as a result of the subtraction regardless of which side of the expression is the higher figure.
Is there an easier way to achieve what I want?
The lead analytic function should do the trick:
SELECT period, stat, stat - LEAD(stat) OVER (ORDER BY period) AS gap
FROM tbstats
The average value of the gaps can be done by calculating the difference between the first value and the last value and dividing by one less than the number of elements:
select sum(case when seqnum = num then stat else - stat end) / (max(num) - 1)
from (select period, stat, row_number() over (order by period) as seqnum,
count(*) over () as num
from tbstats
) t
where seqnum = num or seqnum = 1;
Of course, you can also do the calculation using lead(), but this will also work in SQL Server 2005 and 2008.
You can also achieve this with a join:
SELECT t1.period,
t1.stat,
t1.stat - t2.stat gap
FROM #tbstats t1
LEFT JOIN #tbstats t2
ON t1.id + 1 = t2.id
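The staggered self-join can be checked against the sample data. This sketch uses SQLite through Python's sqlite3 module instead of SQL Server and an inner join, so the last row (which has no successor) drops out. It computes both the mean of the absolute gaps, as described in the question, and the mean of the signed gaps, which is what the first-minus-last shortcut yields.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbstats (
    id INTEGER PRIMARY KEY, stat INTEGER NOT NULL, period TEXT NOT NULL);
INSERT INTO tbstats (stat, period) VALUES
  (10, '2008-01-01'), (25, '2008-01-02'), (5,  '2008-01-03'), (15, '2008-01-04'),
  (30, '2008-01-05'), (9,  '2008-01-06'), (22, '2008-01-07'), (29, '2008-01-08');
""")

# Join each row to the row whose id is one higher; this relies on the ids
# forming a gap-free sequence, as the question states.
mean_abs, mean_signed = conn.execute("""
    SELECT AVG(ABS(t2.stat - t1.stat)), AVG(t2.stat - t1.stat)
    FROM tbstats AS t1
    JOIN tbstats AS t2 ON t2.id = t1.id + 1
""").fetchone()

print(mean_abs, mean_signed)
```

The two averages differ (101/7 versus 19/7 on this data), so the first-minus-last shortcut only applies when the signed differences are what you want.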
To calculate the difference between each statistic and the next, LEAD() and LAG() may be the simplest option. You provide an ORDER BY, and LEAD(something) returns the next something and LAG(something) returns the previous something in the given order.
select
x.id thisStatId,
LAG(x.id) OVER (ORDER BY x.id) lastStatId,
x.stat thisStatValue,
LAG(x.stat) OVER (ORDER BY x.id) lastStatValue,
x.stat - LAG(x.stat) OVER (ORDER BY x.id) diff
from tbStats x

SQL Query to generate an extra field from data in the table

I have a table with 3 fields like this sample table Tbl1
Person Cost FromDate
1 10 2009-1-1
1 20 2010-1-1
2 10 2009-1-1
I want to query it and get back the 3 fields and a generated field called ToDate that defaults to 2099-1-1 unless there is an actual ToDate implied from another entry for the person in the table.
select Person,Cost,FromDate,ToDate From Tbl1
Person Cost FromDate ToDate
1 10 2009-1-1 2010-1-1
1 20 2010-1-1 2099-1-1
2 10 2009-1-1 2099-1-1
You can select the minimum date from all of the same person's dates that are after the record's date. If there is none you get NULL. With COALESCE you change NULL into the default date:
select
Person,
Cost,
FromDate,
coalesce((select min(FromDate) from Tbl1 later where later.Person = Tbl1.Person and later.FromDate > Tbl1.FromDate), '2099-01-01') as ToDate
From Tbl1
order by Person, FromDate;
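A runnable check of this idea against the sample rows, sketched with SQLite through Python's sqlite3 module. The dates are stored as ISO strings (an assumption, so that string comparison orders them correctly), and the subquery is restricted to the same person, which the expected output for person 2 requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tbl1 (Person INTEGER, Cost INTEGER, FromDate TEXT);
INSERT INTO Tbl1 VALUES
  (1, 10, '2009-01-01'),
  (1, 20, '2010-01-01'),
  (2, 10, '2009-01-01');
""")

# ToDate is the next FromDate for the same person, or the default date
# when the person has no later entry.
periods = conn.execute("""
    SELECT t.Person, t.Cost, t.FromDate,
           COALESCE((SELECT MIN(later.FromDate)
                     FROM Tbl1 AS later
                     WHERE later.Person = t.Person
                       AND later.FromDate > t.FromDate),
                    '2099-01-01') AS ToDate
    FROM Tbl1 AS t
    ORDER BY t.Person, t.FromDate
""").fetchall()

for row in periods:
    print(row)
```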
Although Thorsten's answer is perfectly fine, it would be more efficient to use window-functions to match the derived end-dates.
;WITH nbrdTbl
AS ( SELECT Person, Cost, FromDate, row_nr = ROW_NUMBER() OVER (PARTITION BY Person ORDER BY FromDate ASC)
FROM Tbl1)
SELECT t.Person, t.Cost, t.FromDate, derived_end_date = COALESCE(nxt.FromDate, '99991231')
FROM nbrdTbl t
LEFT OUTER JOIN nbrdTbl nxt
ON nxt.Person = t.Person
AND nxt.row_nr = t.row_nr + 1
ORDER BY t.Person, t.FromDate
Doing a test on a 2000-records table it's about 3 times as efficient according to the Execution plan (78% vs 22%).

How to subtract values between different dates in SQL?

Let's say that I'm using the following SQL table called TestTable:
Date Value1 Value2 Value3 ... Name
2013/01/01 1 4 7 Name1
2013/01/14 6 10 8 Name1
2013/02/23 10 32 9 Name1
And I'd like to get the increment of the values between two dates, like:
Value1Inc Value2Inc Value3Inc Name
4 22 1 Name1
between 2013/02/23 and 2013/01/14.
Please note that the values always increment. I'm trying the following approach found in StackOverflow:
select (
(select value1 from TestTable where date < '2013/01/14') -
(select value1 from TestTable where date < '2013/02/23')
) as Value1Inc,
(select value2 from TestTable where date < '2013/01/14') -
(select value2 from TestTable where date < '2013/02/23')
as Value2Inc
...
and so on, but this approach gives me a huge query.
I'd like to use the MAX and MIN SQL functions to simplify the query, but I don't know exactly how, as I'm not a SQL master (at least not yet :-).
Could you please give me a hand here?
Edit: Oops, I think I have found the solution myself by adding a "GROUP BY Name" at the end of the query, like this:
select name, max(value1) - min(value1) from TestTable where date >= '2013-01-14' and date <= '2013-02-23' GROUP BY Name
That was it!
You want to match the next record, using a join. Probably the easiest way is to enumerate and join:
with tt as (
select tt.*, row_number() over (partition by name order by date) as seqnum
from testtable tt
)
select tt.name, tt.date, ttnext.date as nextdate,
(ttnext.value1 - tt.value1) as Diff_Value1,
(ttnext.value2 - tt.value2) as Diff_Value2,
(ttnext.value3 - tt.value3) as Diff_Value3
from tt left outer join
tt ttnext
on tt.name = ttnext.name and tt.seqnum = ttnext.seqnum - 1;
If your database does not support row_number(), you can do something similar with correlated subqueries.
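One way the correlated-subquery variant might look, sketched with SQLite through Python's sqlite3 module. The aliases `cur`, `nxt`, and `t` are made up for the example, and the dates are stored as ISO strings so they compare correctly. Each row is joined to the row with the smallest later date for the same name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TestTable (
    Date TEXT, Value1 INTEGER, Value2 INTEGER, Value3 INTEGER, Name TEXT);
INSERT INTO TestTable VALUES
  ('2013-01-01', 1,  4,  7, 'Name1'),
  ('2013-01-14', 6,  10, 8, 'Name1'),
  ('2013-02-23', 10, 32, 9, 'Name1');
""")

# The subquery picks the next date after the current row for the same name;
# the inner join drops the last row of each name, which has no successor.
diffs = conn.execute("""
    SELECT cur.Name, cur.Date, nxt.Date AS NextDate,
           nxt.Value1 - cur.Value1 AS Diff_Value1,
           nxt.Value2 - cur.Value2 AS Diff_Value2,
           nxt.Value3 - cur.Value3 AS Diff_Value3
    FROM TestTable AS cur
    JOIN TestTable AS nxt
      ON nxt.Name = cur.Name
     AND nxt.Date = (SELECT MIN(t.Date)
                     FROM TestTable AS t
                     WHERE t.Name = cur.Name AND t.Date > cur.Date)
    ORDER BY cur.Name, cur.Date
""").fetchall()

for row in diffs:
    print(row)
```

The second result row covers 2013-01-14 to 2013-02-23 and gives the increments 4, 22, 1 from the question.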