Group by value, but only for contiguous runs of that value - SQL

OK, the title is far from obvious; I could not explain it better.
Consider a table with columns (date, xvalue, some other columns). I need to group the rows by xvalue, but only while the run of equal values is not interrupted in time (column date). For example, for:
Date |xvalue |yvalue|
1 Mar |10 |1 |
2 Mar |10 |2 |
3 Mar |20 |6 |
4 Mar |20 |1 |
5 Mar |10 |4 |
6 Mar |10 |2 |
From the above data, I would like to get three rows: one for the first run of xvalue = 10, one for xvalue = 20, and one for the second run of xvalue = 10, each with an aggregate of the other values. For example, with sum:
1 Mar, 10, 3
3 Mar, 20, 7
5 Mar, 10, 6
It's like the query:
select min(date), xvalue, sum(yvalue) from t group by xvalue
except that the query above merges the 1st, 2nd, 5th and 6th of March into one group, and I want them kept separate.

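This is an example of a gaps-and-islands problem. It needs an ordering column, which date provides here. With such a column, you can use the difference of row numbers: within an uninterrupted run of one xvalue, the overall row number and the per-xvalue row number increase together, so their difference stays constant and identifies the run: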
select min(date), xvalue, sum(yvalue)
from (select t.*,
             row_number() over (partition by xvalue order by date) as seqnum_d,
             row_number() over (order by date) as seqnum
      from t
     ) t
group by xvalue, (seqnum - seqnum_d)
order by min(date)
Here is a db<>fiddle.
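To make the difference-of-row-numbers idea easy to try locally, here is a minimal sketch that runs the same query against the sample data using Python's built-in sqlite3 module. The year 2023 and the aliases start_date, total and numbered are assumptions added for illustration, and window functions need SQLite 3.25 or later.

```python
import sqlite3

# Sample rows from the question. The year 2023 is an assumption, chosen only
# so that the dates sort correctly as ISO strings.
rows = [
    ("2023-03-01", 10, 1),
    ("2023-03-02", 10, 2),
    ("2023-03-03", 20, 6),
    ("2023-03-04", 20, 1),
    ("2023-03-05", 10, 4),
    ("2023-03-06", 10, 2),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (date TEXT, xvalue INTEGER, yvalue INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# seqnum - seqnum_d is constant within each uninterrupted run of one xvalue.
query = """
SELECT MIN(date) AS start_date, xvalue, SUM(yvalue) AS total
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY xvalue ORDER BY date) AS seqnum_d,
             ROW_NUMBER() OVER (ORDER BY date) AS seqnum
      FROM t) AS numbered
GROUP BY xvalue, (seqnum - seqnum_d)
ORDER BY start_date
"""
islands = conn.execute(query).fetchall()
for island in islands:
    print(island)
```

For the sample data the differences are 0, 0, 2, 2, 2, 2, so grouping by the pair (xvalue, difference) keeps the two runs of 10 apart even though the second run of 10 and the run of 20 share the difference 2.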

Data in a database is logically stored in mathematical sets, inside which there is absolutely no order and no way to have a default ordering. They are comparable to bags in which the objects can move around while in use.
So there is no way to answer your query until you add a specific column that gives the sort order you need.

Related

Looking to find duplicates using DIFFERENCE() among 2+ columns

I'm trying to write a SQL Select query that uses the DIFFERENCE() function to find similar names in a database to identify duplicates.
The short version of the code I'm using is:
SELECT *
FROM (SELECT *, DIFFERENCE(FirstName, LEAD(FirstName) OVER (ORDER BY SOUNDEX(FirstName))) AS d
      FROM Clients) AS x
WHERE d >= 3
The problem is my database has additional columns that include middle names and nicknames. So if I have a customer who has multiple names they go by, they might be in the database multiple times, and I need to compare a variety of columns against each other.
Sample Data:
+----+--------+--------+--------+--------+
|ID |First |Middle |AKA1 |AKA2 |
+----+--------+--------+--------+--------+
|1 |Sally |Ann |NULL |NULL |
|2 |Ann |NULL |NULL |NULL |
|3 |Sue |NULL |NULL |NULL |
|4 |Suzy |NULL |NULL |NULL |
|5 |Patricia|NULL |Trish |Patty |
|6 |Patty |NULL |Patricia|Trish |
|7 |Trish |NULL |Patty |Patricia|
+----+--------+--------+--------+--------+
In the above, rows 1+2 are duplicates of each other, as are 3+4, and 5+6+7.
So I'm not sure the best way to get what I want. Here's the longer version of the code I'm actually using:
WITH A AS (SELECT *,
SOUNDEX(FirstName) AS "FirstSoundex",
SOUNDEX(LastName) AS "LastSoundex",
LAG (SOUNDEX(FirstName)) OVER (ORDER BY SOUNDEX(FirstName)) AS "PreviousFirstSoundex",
LAG (SOUNDEX(LastName)) OVER (ORDER BY SOUNDEX(LastName)) AS "PreviousLastSoundex"
FROM Clients),
B AS (
SELECT *,
ISNULL(DIFFERENCE(FirstName, LEAD(FirstName) OVER (ORDER BY FirstSoundex)),0) AS "FirstScore",
ISNULL(DIFFERENCE(LastName, LEAD(LastName) OVER (ORDER BY LastSoundex)),0) AS "LastScore"
FROM A),
C AS (
SELECT *,
ISNULL(LAG (FirstScore) OVER (ORDER BY FirstSoundex),0) AS "PreviousFirstScore",
ISNULL(LAG (LastScore) OVER (ORDER BY LastSoundex),0) AS "PreviousLastScore"
FROM B
),
D AS (
SELECT *,
(CASE WHEN (PreviousFirstScore >=3 AND PreviousLastScore >=3) THEN (PreviousFirstSoundex + PreviousLastSoundex)
WHEN (FirstScore >= 3 AND LastScore >=3) THEN (FirstSoundex + LastSoundex)
END) AS "GroupName"
FROM C
WHERE ((PreviousFirstScore >=3 AND PreviousLastScore >=3) OR (FirstScore >= 3 AND LastScore >=3))
)
SELECT *,
LAG(GroupName) OVER (ORDER BY GroupName) AS "PreviousGroup",
LEAD(GroupName) OVER (ORDER BY GroupName) AS "NextGroup"
FROM D
WHERE (D.GroupName = D.PreviousGroup OR D.GroupName = D.NextGroup)
This lets me group together bundles of potential duplicates and it works well for me. However, I now want to add in a way to check against multiple columns, and I don't know how to do that.
I was thinking about creating a union, something like:
SELECT ClientID,
LastName,
FirstName AS "TempName"
FROM Clients
UNION
SELECT ClientID,
LastName,
MiddleName AS "TempName"
FROM Clients
WHERE MiddleName IS NOT NULL
...etc
But then my LAG() and LEAD() wouldn't work because I'd have multiple rows with the same ClientID. I don't want to identify a single Client as a duplicate of itself.
Anyways, any suggestions? Thanks in advance.
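One possible direction, sketched here under assumptions rather than offered as a definitive answer: unpivot every name variant into one row per (ClientID, name) and compare the variants with a self-join that only pairs different clients, so a client can never match itself. The sketch uses Python's sqlite3 module, and because SQLite has no DIFFERENCE() function it uses plain equality as a placeholder for the fuzzy comparison. The Names CTE and the AKA1/AKA2 column names are taken from the sample table or made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Clients "
    "(ClientID INTEGER, FirstName TEXT, MiddleName TEXT, AKA1 TEXT, AKA2 TEXT)"
)
conn.executemany(
    "INSERT INTO Clients VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Sally", "Ann", None, None),
        (2, "Ann", None, None, None),
        (3, "Sue", None, None, None),
        (4, "Suzy", None, None, None),
        (5, "Patricia", None, "Trish", "Patty"),
        (6, "Patty", None, "Patricia", "Trish"),
        (7, "Trish", None, "Patty", "Patricia"),
    ],
)

query = """
WITH Names AS (                         -- one row per client and name variant
    SELECT ClientID, FirstName AS TempName FROM Clients
    UNION
    SELECT ClientID, MiddleName FROM Clients WHERE MiddleName IS NOT NULL
    UNION
    SELECT ClientID, AKA1 FROM Clients WHERE AKA1 IS NOT NULL
    UNION
    SELECT ClientID, AKA2 FROM Clients WHERE AKA2 IS NOT NULL
)
SELECT DISTINCT a.ClientID, b.ClientID
FROM Names AS a
JOIN Names AS b
  ON a.ClientID < b.ClientID            -- never pair a client with itself
 AND a.TempName = b.TempName            -- placeholder for a fuzzy comparison
ORDER BY a.ClientID, b.ClientID
"""
pairs = conn.execute(query).fetchall()
print(pairs)
```

With exact equality the pair 3+4 (Sue/Suzy) is not found. On SQL Server the placeholder condition could presumably be swapped for the DIFFERENCE(...) >= 3 test the question already uses, at the cost of comparing every name variant against every other.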

How To Increment Date By One Year, Based on Last Result (DateTime Banding)

Hopefully I'll be able to explain this better than the title.
I have an activity table that looks like this:
|ID|LicenseNumber|DateTime               |
|1 |123          |2017-11-17 11:19:04.420|
|2 |123          |2017-11-26 10:16:52.790|
|3 |123          |2018-02-06 11:13:21.480|
|4 |123          |2018-02-19 10:12:32.493|
|5 |123          |2018-05-16 09:33:05.440|
|6 |123          |2019-01-02 10:05:25.193|
What I need is a count of rows per License Number, grouped in essentially 12 month intervals. But, the year needs to start from when the previous entry ended.
For example, I need a count of all records for 12 months from 2017-11-17 11:19:04.420, and then I need a count of all records starting from (2017-11-17 11:19:04.420 + 12 months) for another 12 months, and so on.
I've considered using recursive CTEs, the LAG function etc. but can't quite figure it out. I could probably do something with a CASE statement and static values, but that would require updating the report code every year.
Any help pointing me in the right direction would be much appreciated!
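I think the following code using a recursive CTE can help, but I am not entirely sure what you want to achieve:
WITH CTE AS
(
    -- anchor: start of the first 12-month band
    SELECT MIN([DateTime]) AS BandStart
    FROM YourTable
    UNION ALL
    -- recursive step: each band starts one year after the previous one
    SELECT DATEADD(YEAR, 1, BandStart)
    FROM CTE
    WHERE DATEADD(YEAR, 1, BandStart) <= GETDATE()
)
SELECT YourTable.LicenseNumber, CTE.BandStart, COUNT(*) AS NumRows
FROM CTE
INNER JOIN YourTable
    -- half-open interval, so a row on a boundary is counted only once
    ON YourTable.[DateTime] >= CTE.BandStart
   AND YourTable.[DateTime] < DATEADD(YEAR, 1, CTE.BandStart)
GROUP BY YourTable.LicenseNumber, CTE.BandStart;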
Hmmm. Do you just need the number of records in 12-month intervals after the first record?
If so:
select dateadd(year, yr - 1, min_datetime),
       dateadd(year, yr, min_datetime),
       count(t.id)
from (values (1), (2), (3)) v(yr) left join
     (select t.*,
             min(datetime) over () as min_datetime
      from t
     ) t
     on t.datetime >= dateadd(year, yr - 1, min_datetime) and
        t.datetime < dateadd(year, yr, min_datetime)
group by dateadd(year, yr - 1, min_datetime),
         dateadd(year, yr, min_datetime)
order by min(yr);
This can easily be extended to more years, if it is what you want.
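The banding rule itself (a row belongs to band k when it falls at or after the first timestamp plus k years and before the first timestamp plus k + 1 years) can be checked outside the database. The following is a minimal Python sketch over the sample rows, assuming a single license number. The helpers add_years and band_index are hypothetical names introduced here for illustration.

```python
from collections import Counter
from datetime import datetime

def add_years(start, years):
    """Shift start by whole years; 29 Feb falls back to 28 Feb."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)

def band_index(start, moment):
    """Zero-based index of the 12-month band (anchored at start) holding moment."""
    index = moment.year - start.year
    if add_years(start, index) > moment:
        index -= 1  # the anniversary in that calendar year has not been reached
    return index

# DateTime values of the sample rows for LicenseNumber 123.
stamps = [
    datetime(2017, 11, 17, 11, 19, 4, 420000),
    datetime(2017, 11, 26, 10, 16, 52, 790000),
    datetime(2018, 2, 6, 11, 13, 21, 480000),
    datetime(2018, 2, 19, 10, 12, 32, 493000),
    datetime(2018, 5, 16, 9, 33, 5, 440000),
    datetime(2019, 1, 2, 10, 5, 25, 193000),
]

first = min(stamps)
counts = Counter(band_index(first, stamp) for stamp in stamps)
for index in sorted(counts):
    print(add_years(first, index), add_years(first, index + 1), counts[index])
```

On the sample data the first five rows land in the band starting 2017-11-17 and the sixth in the band starting 2018-11-17. For several licenses, the same calculation would be done per license with that license's own first timestamp.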

How to find month gaps in Oracle table?

I have an Oracle table with the columns EmpName (Char), Month_From and Month_To (Numeric). I need to find the missing months (month gaps). In the sample data below, the missing month I have to find is 6 (Jun).
Thanks in advance.
Sample Data:
|-------|-----------|---------|
|eName  |Month_From |Month_To |
|(Char) |(Int)      |(Int)    |
|-------|-----------|---------|
|John   |1          |2        | (Jan to Feb)
|John   |3          |5        | (Mar to May)
|John   |7          |8        | (Jul to Aug)
|-------|-----------|---------|
Need to Find (Jun to Jun).
Assuming no overlaps, you can find the missing months using lag():
select (prev_month_to + 1) as start_missing,
       (month_from - 1) as end_missing
from (select t.*,
             lag(month_to) over (partition by eName order by month_from) as prev_month_to
      from t
     ) t
where prev_month_to <> month_from - 1;
This provides a range for each gap, because the gap could be more than one month.
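As a quick check of the lag() approach, here is a minimal sketch that runs it on the sample rows with Python's sqlite3 module rather than Oracle. The table name t and the alias with_prev are assumptions, and window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (eName TEXT, month_from INTEGER, month_to INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("John", 1, 2), ("John", 3, 5), ("John", 7, 8)],
)

# A gap exists wherever a range does not start right after the previous one ends.
query = """
SELECT eName,
       prev_month_to + 1 AS start_missing,
       month_from - 1    AS end_missing
FROM (SELECT t.*,
             LAG(month_to) OVER (PARTITION BY eName ORDER BY month_from)
                 AS prev_month_to
      FROM t) AS with_prev
WHERE prev_month_to <> month_from - 1
"""
gaps = conn.execute(query).fetchall()
print(gaps)
```

The first range of each employee has no previous row, so its prev_month_to is NULL and the comparison filters it out without any extra condition.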
To simply convert the sample data into month names, you may consider:
select to_char(to_date(lpad(t.month_from,2,'0'),'mm'),'Mon')||' to '||
to_char(to_date(lpad(t.month_to,2,'0'),'mm'),'Mon')
from my_table t
where upper(t.eName) = upper('&i_eName');
For the question ( Jun to Jun ):
select to_char(to_date(lpad(a1.mon,2,'0'),'mm'),'Mon')
from ( select level mon from dual connect by level <= 12 ) a1
where not exists ( select null
from my_table a2
where a1.mon between a2.month_from and a2.month_to
and upper(a2.eName) = upper('&i_eName') )
order by mon;
However, it also returns Sep, Oct, Nov and Dec besides Jun. On this point, I agree with @mathguy's comment.

Select rows that are duplicates on two columns

I have data in a table. There are 3 columns (ID, Interval, ContactInfo). This table lists all phone contacts. I'm attempting to get a count of phone numbers that called twice on the same day and have no idea how to go about this. I can get duplicate entries for the same number but it does not match on date. The code I have so far is below.
SELECT ContactInfo, COUNT(Interval) AS NumCalls
FROM AllCalls
GROUP BY ContactInfo
HAVING COUNT(AllCalls.ContactInfo) > 1
I'd like to have it return the date, the number of calls on that date if more than 1, and the phone number.
Sample data:
|ID |Interval |ContactInfo|
|--------|------------|-----------|
|1 |3/1/2017 |8009999999 |
|2 |3/1/2017 |8009999999 |
|3 |3/2/2017 |8001234567 |
|4 |3/2/2017 |8009999999 |
|5 |3/3/2017 |8007771111 |
|6 |3/3/2017 |8007771111 |
|--------|------------|-----------|
Expected result:
|Interval |ContactInfo|NumCalls|
|------------|-----------|--------|
|3/1/2017 |8009999999 |2 |
|3/3/2017 |8007771111 |2 |
|------------|-----------|--------|
Just as juergen d suggested, you should try to add Interval in your GROUP BY. Like so:
SELECT AC.ContactInfo
, AC.Interval
, COUNT(*) AS qnty
FROM AllCalls AS AC
GROUP BY AC.ContactInfo
, AC.Interval
HAVING COUNT(*) > 1
The code should look like this:
select Interval , ContactInfo, count(ID) AS NumCalls from AllCalls group by Interval, ContactInfo having count(ID)>1;
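To confirm that grouping on both columns gives the expected result, here is a small sketch that loads the sample data into an in-memory SQLite database from Python. Storing the dates and phone numbers as text is an assumption made for simplicity, and Interval is quoted only as a precaution because it is a keyword in some SQL dialects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE AllCalls (ID INTEGER, "Interval" TEXT, ContactInfo TEXT)')
conn.executemany(
    "INSERT INTO AllCalls VALUES (?, ?, ?)",
    [
        (1, "3/1/2017", "8009999999"),
        (2, "3/1/2017", "8009999999"),
        (3, "3/2/2017", "8001234567"),
        (4, "3/2/2017", "8009999999"),
        (5, "3/3/2017", "8007771111"),
        (6, "3/3/2017", "8007771111"),
    ],
)

# One group per (date, phone number); keep only groups with more than one call.
query = """
SELECT "Interval", ContactInfo, COUNT(*) AS NumCalls
FROM AllCalls
GROUP BY "Interval", ContactInfo
HAVING COUNT(*) > 1
ORDER BY "Interval"
"""
repeat_callers = conn.execute(query).fetchall()
for row in repeat_callers:
    print(row)
```

The number 8009999999 also called on 3/2/2017, but only once that day, so that group is dropped by the HAVING clause.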

Cumulative values minus overs

I have a query that produces the following table with a cumulative column (cumulate)
+--+---+--------+------+
|id|qty|cumulate|value |
+--+---+--------+------+
|1 |5 |5 |419.6 |
+--+---+--------+------+
|2 |2 |7 |167.84|
+--+---+--------+------+
|3 |1 |8 |83.92 |
+--+---+--------+------+
|4 |2 |10 |167.84|
+--+---+--------+------+
|5 |1 |11 |83.92 |
+--+---+--------+------+
|6 |5 |16 |419.6 |
+--+---+--------+------+
However, I need an addition to the query that selects only the rows that cumulate up to 9. In this case the first four rows accumulate to 10 and the first three to 8.
I need the total value for a quantity of exactly 9, no more and no less.
The rows are in date order (date not shown) and therefore cannot be reordered.
How would one achieve this?
EDIT
here is my query (but the results table above is not the same output as what this query would produce):
select branch,
case
when DATEDIFF(MONTH, dateInv, getDate()) between 0 and 6 then '0-6'
when DATEDIFF(MONTH, dateInv, getDate()) between 7 and 12 then '7-12'
when DATEDIFF(MONTH, dateInv, getDate()) between 13 and 18 then '13-18'
when DATEDIFF(MONTH, dateInv, getDate()) between 19 and 24 then '19-24'
when DATEDIFF(MONTH, dateInv, getDate()) between 25 and 36 then '25-36'
when DATEDIFF(MONTH, dateInv, getDate()) > 36 then '>36'
end [period]
,sum(qty*cost) [costs]
from (
select branch,qty, dateInv, max(cost)cost, max(soh)[qoh], SUM(qty*cost)[sumqty]
, sum(qty) over (partition by product order by dateInv desc) [cumulate]
from openquery(linkedserver,
'select branch,product, soh, cost, dateInv, qty
from table
group by branch,product, soh, cost, dateInv, qty
order by dateInv DESC
')
group by branch,product,qty, dateInv
)t
where cumulate <= qoh
group by branch, dateInv
Well, from the looks of it, each unit of quantity has a value of 83.92, so 9 * 83.92 = 755.28.
Thanks, everyone, for your attempts. I managed to solve this problem by using a series of nested queries with "over (partition by)", Row_number and calculations.
Lots of fun!
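The exact queries used are not shown, so the following is only a sketch of one way such a capped running total might be computed, not the asker's actual solution. It assumes that a row's value is proportional to its qty, so the row that crosses the cap is counted pro rata. The table name stock and the column aliases are made up, and it runs on SQLite 3.25 or later through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (id INTEGER, qty INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO stock VALUES (?, ?, ?)",
    [(1, 5, 419.6), (2, 2, 167.84), (3, 1, 83.92),
     (4, 2, 167.84), (5, 1, 83.92), (6, 5, 419.6)],
)

# cumulate - qty is the running total *before* the row. A row contributes
# while that is below the cap, and at most the units still missing.
query = """
SELECT id,
       MIN(qty, :cap - (cumulate - qty))               AS taken,
       value * MIN(qty, :cap - (cumulate - qty)) / qty AS taken_value
FROM (SELECT id, qty, value,
             SUM(qty) OVER (ORDER BY id) AS cumulate
      FROM stock) AS running
WHERE cumulate - qty < :cap
ORDER BY id
"""
parts = conn.execute(query, {"cap": 9}).fetchall()
total_value = sum(part[2] for part in parts)
print(parts)
print(round(total_value, 2))
```

Rows 1 to 3 are taken in full (8 units) and row 4 supplies the one remaining unit, which matches the 9 * 83.92 = 755.28 figure worked out above.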