SQL grouping based on validity

I have the following table, let's call it tbl_costcenters, with the following dummy entries:
ID  PosName  CostcenterCode  ValidFrom  ValidUntil
1   test1    111             1.1.2019   1.6.2019
2   test1    111             1.6.2019   1.9.2019
3   test1    222             1.9.2019   1.6.2020
and I would like the following result:
PosName  ValidFrom  ValidUntil  CostcenterCode
test1    1.1.2019   1.9.2019    111
test1    1.9.2019   1.6.2020    222
This is very simplified; the real table contains many more columns. I need to group the rows by CostcenterCode and get a validity range that spans the first two entries of my example, returning the ValidFrom of record ID 1 and the ValidUntil of record ID 2.
Sorry, I did not really know what to search for. I think the answer is easy for somebody who is strong in SQL.
The answer should work for both SQL Server and Oracle.
Thank you for your help.

This seems like simple aggregation:
select PosName,
min(ValidFrom) as ValidFrom,
(case when max(ValidUntil) > min(ValidFrom) then max(ValidUntil) end) as ValidUntil,
CostcenterCode
from tbl_costcenters t
group by PosName, CostcenterCode;
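As a quick sanity check, here is the same aggregation run against the sample rows in SQLite through Python's built-in sqlite3 module (my demo environment, not the SQL Server/Oracle targets of the question). Dates are stored as ISO strings so that MIN/MAX compare them correctly:

```python
import sqlite3

# In-memory sanity check of the MIN/MAX-per-group aggregation.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE tbl_costcenters
               (ID INTEGER, PosName TEXT, CostcenterCode INTEGER,
                ValidFrom TEXT, ValidUntil TEXT)""")
con.executemany("INSERT INTO tbl_costcenters VALUES (?,?,?,?,?)", [
    (1, 'test1', 111, '2019-01-01', '2019-06-01'),
    (2, 'test1', 111, '2019-06-01', '2019-09-01'),
    (3, 'test1', 222, '2019-09-01', '2020-06-01'),
])
rows = con.execute("""
    SELECT PosName, MIN(ValidFrom), MAX(ValidUntil), CostcenterCode
    FROM tbl_costcenters
    GROUP BY PosName, CostcenterCode
    ORDER BY MIN(ValidFrom)
""").fetchall()
for r in rows:
    print(r)
```

The two rows for cost center 111 collapse into one range spanning 2019-01-01 to 2019-09-01, exactly as the question requests.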

I suspect that you want to group together records whose dates overlap, while keeping those that don't overlap separate (although this does not show in your sample data).
If so, we could use some gaps-and-island techniques here. One option uses window functions to build groups of adjacent records:
select
posName,
min(validFrom) validFrom,
max(validUntil) validUntil,
costCenter
from (
select
t.*,
sum(case when validFrom <= lagValidUntil then 0 else 1 end)
over(partition by posName, costCenter order by validFrom) grp
from (
select
t.*,
lag(validUntil)
over(partition by posName, costCenter order by validFrom) lagValidUntil
from mytable t
) t
) t
group by posName, costCenter, grp
order by posName, validFrom
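Here is a self-contained sketch of that gaps-and-islands query, run in SQLite through Python (my demo environment, not the asker's). A fourth, non-adjacent row is added to the sample data so the island split is actually visible:

```python
import sqlite3

# Gaps-and-islands demo (window functions need SQLite >= 3.25).
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE mytable
               (posName TEXT, costCenter INTEGER, validFrom TEXT, validUntil TEXT)""")
con.executemany("INSERT INTO mytable VALUES (?,?,?,?)", [
    ('test1', 111, '2019-01-01', '2019-06-01'),
    ('test1', 111, '2019-06-01', '2019-09-01'),
    ('test1', 111, '2019-12-01', '2020-02-01'),  # deliberate gap after 2019-09-01
    ('test1', 222, '2019-09-01', '2020-06-01'),
])
rows = con.execute("""
    SELECT posName, MIN(validFrom), MAX(validUntil), costCenter
    FROM (
        SELECT t.*,
               SUM(CASE WHEN validFrom <= lagValidUntil THEN 0 ELSE 1 END)
                 OVER (PARTITION BY posName, costCenter ORDER BY validFrom) AS grp
        FROM (
            SELECT t.*,
                   LAG(validUntil) OVER (PARTITION BY posName, costCenter
                                         ORDER BY validFrom) AS lagValidUntil
            FROM mytable t
        ) t
    ) t
    GROUP BY posName, costCenter, grp
    ORDER BY posName, MIN(validFrom)
""").fetchall()
for r in rows:
    print(r)
```

The two adjacent 111 rows merge into one island, while the row starting 2019-12-01 (a gap) stays separate.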

The definitive solution for me was:
select posname, costcentercode, min(validfrom) as validfrom,
case
when
max(case when validuntil is null then 1 ELSE 0 END) = 0
then max(validuntil)
end as validuntil
from tbl_costcenters pos
group by posname, costcentercode;
Thank you all.

Related

Separate columns for product counts using CTEs

Asking a question again as my post did not follow community rules.
I first tried to write a PIVOT statement to get the desired output. However, I am now trying to approach this using CTEs.
Here's the raw data. Let's call it ProductMaster:
PRODUCT_NUM  CO_CD  PROD_CD     MASTER_ID    Date      ROW_NUM
1854         MAWC   STATIONERY  10003493039  1/1/2021  1
1567         PREF   PRINTER     10003493039  2/1/2021  2
2151         MAWC   STATIONERY  10003497290  3/2/2021  1
I require the count of each product for every household from this data, in separate columns: PRINTER_CT and STATIONERY_CT.
Each Master_ID represents a household. And a household can have multiple products.
So each household represents one row in my final output and I need the Product Counts in separate columns. There can be multiple products in each household, 4 or even more. But I have simplified this example.
I'm writing a query with CTEs to give me the output that I want. In my output, each row is grouped by Master ID
ORGL_CO_CD  ORGL_PROD_CD  STATIONERY_CT  PRINTER_CT
MAWC        STATIONERY    1              1
MAWC        STATIONERY    1              0
Here's my query. I'm not sure where to introduce the STATIONERY_CT column.
WITH CTE AS
(
SELECT
CO_CD, Prod_CD, MASTER_ID,
'' as S1_CT, '' as P1_CT
FROM
ProductMaster
WHERE
ROW_NUM = 1
), CTE_2 AS
(
SELECT Prod_CD, MASTER_ID
FROM ProductMaster
WHERE ROW_NUM = 2
)
SELECT
CO_CD AS ORGL_CO_CD,
c.Prod_CD AS ORGL_PROD_CD,
(CASE WHEN c2.Prod_CD = 'PRINTER' THEN P1_CT = 1 END) PRINTER_CT
FROM
CTE AS c
LEFT OUTER JOIN
CTE_2 AS c2 ON c.MASTER_ID = c2.MASTER_ID
Any pointers would be appreciated.
Thank you!
I guess you can solve that using just GROUP BY and SUM:
-- Test data
DECLARE @ProductMaster AS TABLE (PRODUCT_NUM INT, CO_CD VARCHAR(30), PROD_CD VARCHAR(30), MASTER_ID BIGINT)
INSERT @ProductMaster VALUES (1854, 'MAWC', 'STATIONERY', 10003493039)
INSERT @ProductMaster VALUES (1567, 'PREF', 'PRINTER', 10003493039)
INSERT @ProductMaster VALUES (2151, 'MAWC', 'STATIONERY', 10003497290)
SELECT
MASTER_ID,
SUM(CASE PROD_CD WHEN 'STATIONERY' THEN 1 ELSE 0 END) AS STATIONERY_CT,
SUM(CASE PROD_CD WHEN 'PRINTER' THEN 1 ELSE 0 END) AS PRINTER_CT
FROM @ProductMaster
GROUP BY MASTER_ID
The result is:
MASTER_ID    STATIONERY_CT  PRINTER_CT
10003493039  1              1
10003497290  1              0
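For anyone who wants to experiment, the same conditional-aggregation pivot can be reproduced in SQLite through Python's sqlite3 module (a stand-in for SQL Server here):

```python
import sqlite3

# Conditional aggregation: one SUM(CASE ...) per product column.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE ProductMaster
               (PRODUCT_NUM INTEGER, CO_CD TEXT, PROD_CD TEXT, MASTER_ID INTEGER)""")
con.executemany("INSERT INTO ProductMaster VALUES (?,?,?,?)", [
    (1854, 'MAWC', 'STATIONERY', 10003493039),
    (1567, 'PREF', 'PRINTER', 10003493039),
    (2151, 'MAWC', 'STATIONERY', 10003497290),
])
rows = con.execute("""
    SELECT MASTER_ID,
           SUM(CASE PROD_CD WHEN 'STATIONERY' THEN 1 ELSE 0 END) AS STATIONERY_CT,
           SUM(CASE PROD_CD WHEN 'PRINTER' THEN 1 ELSE 0 END) AS PRINTER_CT
    FROM ProductMaster
    GROUP BY MASTER_ID
    ORDER BY MASTER_ID
""").fetchall()
for r in rows:
    print(r)
```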

SQL query to allow for latest datasets per items

I have this table in a SQL Server database:
and I would like a query that gives me the values of cw1, cw2, cw3 under a restricted date condition.
I would like a query giving me the "latest" values of cw1, cw2, cw3, falling back to the previous values of cw1, cw2, cw3 when they are null for the last plan_date. This would be combined with a date condition.
So if the condition is plan_date between 02.01.2020 and 04.01.2020, then the result should be
1 04.01.2020 null 9 4
2 03.01.2020 30 15 2
where, for example, the 30 is taken from the last previous date for item_nr 2.
You can get the last value using first_value(). Unfortunately, that is a window function, but select distinct solves that:
select distinct item_nr,
first_value(cw1) over (partition by item_nr
order by (case when cw1 is not null then 1 else 2 end), plan_date desc
) as imputed_cw1,
first_value(cw2) over (partition by item_nr
order by (case when cw2 is not null then 1 else 2 end), plan_date desc
) as imputed_cw2,
first_value(cw3) over (partition by item_nr
order by (case when cw3 is not null then 1 else 2 end), plan_date desc
) as imputed_cw3
from t;
You can add a where clause after the from.
The first_value() window function returns the first value from each partition. The partition is ordered to put the non-NULL values first, and then order by time descending. So, the most recent non-NULL value is first.
The only downside is that it is a window function, so the select distinct is needed to get the most recent value for each item_nr.
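A runnable sketch of the first_value() / select distinct trick, using SQLite through Python with made-up sample rows shaped like the question's example (item 1 has no cw1 at all; item 2 has a cw1 only on an earlier date):

```python
import sqlite3

# first_value() ordered "non-NULL first, then latest date" per column.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE t
               (item_nr INTEGER, plan_date TEXT, cw1 INTEGER, cw2 INTEGER, cw3 INTEGER)""")
con.executemany("INSERT INTO t VALUES (?,?,?,?,?)", [
    (1, '2020-01-02', None, 5, 1),
    (1, '2020-01-03', None, 7, 3),
    (1, '2020-01-04', None, 9, 4),
    (2, '2020-01-02', 30, 12, 1),
    (2, '2020-01-03', None, 15, 2),
])
rows = con.execute("""
    SELECT DISTINCT item_nr,
        FIRST_VALUE(cw1) OVER (PARTITION BY item_nr
            ORDER BY (CASE WHEN cw1 IS NOT NULL THEN 1 ELSE 2 END), plan_date DESC) AS imputed_cw1,
        FIRST_VALUE(cw2) OVER (PARTITION BY item_nr
            ORDER BY (CASE WHEN cw2 IS NOT NULL THEN 1 ELSE 2 END), plan_date DESC) AS imputed_cw2,
        FIRST_VALUE(cw3) OVER (PARTITION BY item_nr
            ORDER BY (CASE WHEN cw3 IS NOT NULL THEN 1 ELSE 2 END), plan_date DESC) AS imputed_cw3
    FROM t
    ORDER BY item_nr
""").fetchall()
for r in rows:
    print(r)
```

Item 2's cw1 is imputed as 30 from the earlier date, while item 1's cw1 stays NULL because no non-NULL value exists.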

Two dimensional comparison in sql

DB schema
CREATE TABLE newsletter_status
(
cryptid varchar(255) NOT NULL,
status varchar(25),
regDat timestamp,
confirmDat timestamp,
updateDat timestamp,
deleteDat timestamp
);
There are rows with the same cryptid; I need to squash them into one row, so that the cryptid becomes effectively unique. The complexity comes from the fact that I need to compare dates by rows as well as by columns. How do I implement that?
The rule I need to use is:
status should be taken from the row with the latest timestamp (among all 4 dates)
for every date column I need to select the latest date
Example:
002bc5 | new | 2010.01.15 | 2001.01.15 | NULL | 2020.01.10
002bc5 | confirmed | NULL | 2020.01.30 | 2020.01.15 | 2020.01.15
002bc5 | deactivated | NULL | NULL | NULL | 2020.12.03
needs to be squashed into:
002bc5 | deactivated | 2010.01.15 | 2020.01.30 | 2020.01.15 | 2020.12.03
The status deactivated is taken because the timestamp 2020.12.03 is the latest
What you need in order to get the status is to sort the rowset by the dates in descending order. In Oracle there is agg_func(<arg>) keep (dense_rank first ...); in other databases it can be replaced with row_number() and a filter. Because analytic functions sometimes do not work well in HANA, I suggest using the only aggregate function in HANA I know of that supports ordering inside, STRING_AGG, with a little trick. As long as you don't have thousands of status rows (i.e. the concatenated status string stays below the 4000-character varchar limit), it will work. This is the query:
select
cryptid,
max(regDat) as regDat,
max(confirmDat) as confirmDat,
max(updateDat) as updateDat,
max(deleteDat) as deleteDat,
substr_before(
string_agg(status, '|'
order by greatest(
ifnull(regDat, date '1000-01-01'),
ifnull(confirmDat, date '1000-01-01'),
ifnull(updateDat, date '1000-01-01'),
ifnull(deleteDat, date '1000-01-01')
) desc),
'|'
) as status
from newsletter_status
group by cryptid
You can use aggregation:
select cryptid,
coalesce(max(case when status = 'deactivated' then status end),
max(case when status = 'confirmed' then status end),
max(case when status = 'new' then status end)
) as status,
max(regDat),
max(confirmDat),
max(updateDat),
max(deleteDat)
from newsletter_status
group by cryptid;
The coalesce() is a trick to get the statuses in priority order.
EDIT:
If you just want the row with the latest timestamp:
select cryptid,
max(case when seqnum = 1 then status end) as status_on_max_date,
max(regDat),
max(confirmDat),
max(updateDat),
max(deleteDat)
from (select ns.*,
row_number() over (partition by cryptid
order by greatest(coalesce(regDat, '2000-01-01'),
coalesce(confirmDat, '2000-01-01'),
coalesce(updateDat, '2000-01-01'),
coalesce(deleteDat, '2000-01-01')
) desc
) as seqnum
from newsletter_status ns
) ns
group by cryptid;
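A runnable sketch of this "rank by the greatest date, then aggregate" approach, using SQLite through Python. SQLite has no greatest(), so its multi-argument scalar max() stands in for it:

```python
import sqlite3

# row_number() ranks each cryptid's rows by its greatest date, newest first;
# the outer MAX(...) aggregates pick the latest value per date column.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE newsletter_status
               (cryptid TEXT, status TEXT, regDat TEXT,
                confirmDat TEXT, updateDat TEXT, deleteDat TEXT)""")
con.executemany("INSERT INTO newsletter_status VALUES (?,?,?,?,?,?)", [
    ('002bc5', 'new',         '2010-01-15', '2001-01-15', None,         '2020-01-10'),
    ('002bc5', 'confirmed',   None,         '2020-01-30', '2020-01-15', '2020-01-15'),
    ('002bc5', 'deactivated', None,         None,         None,         '2020-12-03'),
])
rows = con.execute("""
    SELECT cryptid,
           MAX(CASE WHEN seqnum = 1 THEN status END) AS status,
           MAX(regDat), MAX(confirmDat), MAX(updateDat), MAX(deleteDat)
    FROM (SELECT ns.*,
                 ROW_NUMBER() OVER (PARTITION BY cryptid
                     ORDER BY MAX(COALESCE(regDat, '2000-01-01'),
                                  COALESCE(confirmDat, '2000-01-01'),
                                  COALESCE(updateDat, '2000-01-01'),
                                  COALESCE(deleteDat, '2000-01-01')) DESC) AS seqnum
          FROM newsletter_status ns) ns
    GROUP BY cryptid
""").fetchall()
for r in rows:
    print(r)
```

The three rows squash into one: status 'deactivated' (the row with the latest timestamp) plus the per-column latest dates, matching the expected result in the question.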
I would start by ranking each cryptid's rows by the greatest value of the date columns. Then we can use that information to identify the latest status per cryptid, and aggregate:
select cryptid,
max(case when rn = 1 then status end) as status,
max(regDat) as regDat,
max(confirmDat) as confirmDat,
max(updateDat) as updateDat,
max(deleteDat) as deleteDat
from (
select ns.*,
row_number() over(
partition by cryptid
order by greatest(
coalesce(regDat, '0001-01-01'),
coalesce(confirmDat, '0001-01-01'),
coalesce(updateDat, '0001-01-01'),
coalesce(deleteDat, '0001-01-01')
) desc
) rn
from newsletter_status ns
) ns
group by cryptid

Find Range in Sequence

I have a table, #NumberRange. It has a start and an end number. I have to find out which ranges are in sequence.
CREATE TABLE #NumberRange
(
Id int primary key,
ItemId int,
[start] int,
[end] int
)
INSERT INTO #NumberRange
VALUES
(1,1,1,10),
(2,1,11,20),
(3,1,21,30),
(4,1,40,50),
(5,1,51,60),
(6,1,61,70),
(7,1,80,90),
(8,1,100,200)
Expected result:
Note: the Result column is calculated from runs of continuous numbers, i.e. 1-10, 11-20 and 21-30 are continuous, so the Result column is 1. Then 40-50 is not continuous (the previous row ends with 30 and the next row starts with 40), so the Result column becomes 2, and it continues this way.
Row 4 ends with 50 and row 5 starts with 51, which is continuous, so they keep the same Result, differentiated from Result 1.
I have used the LEAD function but did not get the expected result. Please can someone help me get the result?
Workaround:
select
*,
[Diff] = [Lead] - [end],
[Result] = Rank() OVER (PARTITION BY ([Lead] - [end]) ORDER BY Id)
from
(select
id, [start], [end], LEAD([start]) over (order by id) as [Lead]
from
#NumberRange) Z
order by
id
Use lag() to determine where the groups start. Then a cumulative sum to enumerate them:
select nr.*,
sum(case when startr = prev_endr + 1 then 0 else 1 end) over (partition by itemid order by startr) as grp
from (select nr.*, lag(endr) over (partition by itemid order by startr) as prev_endr
from numberrange nr
) nr;
This answer assumes that ids 4 and 5 are continuous, which makes sense based on the rest of the question.
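The lag()-plus-cumulative-sum grouping can be tried out in SQLite through Python (column names kept close to the question's table; [start] and [end] are bracket-quoted since end is a keyword):

```python
import sqlite3

# lag() exposes the previous row's end; a running sum starts a new group
# whenever the current start is not previous end + 1.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE NumberRange
               (Id INTEGER PRIMARY KEY, ItemId INTEGER, [start] INTEGER, [end] INTEGER)""")
con.executemany("INSERT INTO NumberRange VALUES (?,?,?,?)", [
    (1, 1, 1, 10), (2, 1, 11, 20), (3, 1, 21, 30), (4, 1, 40, 50),
    (5, 1, 51, 60), (6, 1, 61, 70), (7, 1, 80, 90), (8, 1, 100, 200),
])
rows = con.execute("""
    SELECT Id,
           SUM(CASE WHEN [start] = prev_end + 1 THEN 0 ELSE 1 END)
             OVER (PARTITION BY ItemId ORDER BY [start]) AS grp
    FROM (SELECT nr.*, LAG([end]) OVER (PARTITION BY ItemId ORDER BY [start]) AS prev_end
          FROM NumberRange nr)
    ORDER BY Id
""").fetchall()
for r in rows:
    print(r)
```

Rows 1-3 get group 1, rows 4-6 (40-50, 51-60, 61-70) get group 2, and the isolated ranges 80-90 and 100-200 get groups 3 and 4.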
Your expected result is not clear, and I have the same questions that were asked in the comments, but I think what you want to do is something similar to
select N1.*,
case when N1.[end] + 1 = N2.[start] then 1 else 2 end as Result
from #NumberRange N1
inner join #NumberRange N2 on N1.Id = N2.Id - 1

Count of group for null is always 0 (zero)

In TSql what is the recommended approach for grouping data containing nulls?
Example of the type of query:
Select [Group], Count([Group])
From [Data]
Group by [Group]
It appears that the count(*) and count(Group) both result in the null group displaying 0.
Example of the expected table data:
Id, Group
---------
1 , Alpha
2 , Null
3 , Beta
4 , Null
Example of the expected result:
Group, Count
---------
Alpha, 1
Beta, 1
Null, 0
This is the desired result, which can be obtained with count(Id). Is this the best way to get this result, and why do count(*) and count([Group]) return an "incorrect" result?
Group, Count
---------
Alpha, 1
Beta, 1
Null, 2
edit: I don't remember why I thought count(*) did this, it may be the answer I'm looking for..
The best approach is to use count(*), which behaves exactly like count(1) or any other constant.
The * ensures that every row is counted.
Select [Group], Count(*)
From [Data]
Group by [Group]
The reason count([Group]) shows 0 instead of 2 is that count over a column or expression skips NULLs, so the NULL group has no non-NULL values to count. count(*) counts rows regardless of NULLs, so it returns 2 for that group.
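A small demonstration of the difference between COUNT(*), COUNT([Group]) and COUNT(Id), run in SQLite through Python:

```python
import sqlite3

# COUNT(*) counts rows; COUNT(col) counts only non-NULL values of col.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Data (Id INTEGER, [Group] TEXT)")
con.executemany("INSERT INTO Data VALUES (?,?)", [
    (1, 'Alpha'), (2, None), (3, 'Beta'), (4, None),
])
rows = con.execute("""
    SELECT [Group], COUNT(*), COUNT([Group]), COUNT(Id)
    FROM Data
    GROUP BY [Group]
    ORDER BY [Group] IS NULL, [Group]
""").fetchall()
for r in rows:
    print(r)
```

For the NULL group, COUNT(*) and COUNT(Id) both return 2, while COUNT([Group]) returns 0 because every [Group] value in that group is NULL.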
Just do
SELECT [group], count([group])
FROM [Data]
GROUP BY [group]
Count(Id) gives a value of 2 for the NULL group, as mentioned in the question.
Try this:
Select [Group], Count(IsNull([Group], 0))
From [Data]
Group by [Group]
COUNT(*) should work:
SELECT Grp,COUNT(*)
FROM tab
GROUP BY Grp
One more solution could be following:
SELECT Grp, COUNT(COALESCE(Grp, ' '))
FROM tab
GROUP BY Grp