Generate multiple records from existing records based on interval columns [from and to] - sql

I have two types of score, M and B, in column 3 (S_TYPE). If the type is M, the score in column 6 (SCORE) is either S (scored) or SB (bonus scored). Every interval [FROM_HRS - TO_HRS] of type B must have a corresponding SB row of type M; in other words, a type M row cannot have a score of S over an interval that is covered by a type B row. Unfortunately, I have several records that were captured incorrectly, as seen in the table below.
CREATE TABLE SCORE_TBL
(
ID int IDENTITY(1,1) PRIMARY KEY,
PERSONID_FK int NOT NULL,
S_TYPE varchar(50) NULL,
FROM_HRS int NULL,
TO_HRS int NULL,
SCORE varchar(50) NULL,
);
INSERT INTO SCORE_TBL(PERSONID_FK,S_TYPE,FROM_HRS,TO_HRS,SCORE)
VALUES
(1, 'M' , 0,20, 'S'),
(1, 'B',6, 8, 'B'),
(2, 'B',0, 2, 'B'),
(2, 'M',0,20, 'S'),
(2, 'B', 10,13, 'B'),
(2, 'B', 18,20, 'B'),
(2, 'M', 13,18, 'S');
| ID | PERSONID_FK |S_TYPE| FROM_HRS | TO_HRS | SCORE |
|----|-------------|------|----------|--------|-------|
| 1 | 1 | M | 0 | 20 | S |
| 2 | 1 | B | 6 | 8 | B |
| 3 | 2 | B | 0 | 2 | B |
| 4 | 2 | M | 0 | 20 | S |
| 5 | 2 | B | 10 | 13 | B |
| 6 | 2 | B | 18 | 20 | B |
| 7 | 2 | M | 13 | 18 | S |
I want the data to look like this
| ID | PERSONID_FK |S_TYPE| FROM_HRS | TO_HRS | SCORE |
|----|-------------|------|----------|--------|-------|
| 1 | 1 | M | 0 | 6 | S |
| 2 | 1 | M | 6 | 8 | SB |
| 3 | 1 | B | 6 | 8 | B |
| 4 | 1 | M | 8 | 20 | S |
| 5 | 2 | B | 0 | 2 | B |
| 6 | 2 | M | 0 | 2 | SB |
| 7 | 2 | M | 2 | 10 | S |
| 8 | 2 | B | 10 | 13 | B |
| 9 | 2 | M | 10 | 13 | SB |
| 10 | 2 | M | 13 | 18 | S |
| 11 | 2 | B | 18 | 20 | B |
| 12 | 2 | M | 18 | 20 | SB |
Any ideas on how to generate this data with a SELECT statement in SQL Server?

The tricky part here is that an interval might need to be split into several pieces, like 0..20 for person 2.
Window functions to the rescue. This query illustrates what you need to do:
WITH
deltas AS (
SELECT personid_fk, hrs, sum(delta_s) as delta_s, sum(delta_b) as delta_b
FROM (SELECT personid_fk, from_hrs as hrs,
case when score = 'S' then 1 else 0 end as delta_s,
case when score = 'B' then 1 else 0 end as delta_b
FROM score_tbl
UNION ALL
SELECT personid_fk, to_hrs as hrs,
case when score = 'S' then -1 else 0 end as delta_s,
case when score = 'B' then -1 else 0 end as delta_b
FROM score_tbl) _
GROUP BY personid_fk, hrs
),
running AS (
SELECT personid_fk, hrs as from_hrs,
lead(hrs) over (partition by personid_fk order by hrs) as to_hrs,
sum(delta_s) over (partition by personid_fk order by hrs) running_s,
sum(delta_b) over (partition by personid_fk order by hrs) running_b
FROM deltas
)
SELECT personid_fk, 'M' as s_type, from_hrs, to_hrs,
case when running_b > 0 then 'SB' else 'S' end as score
FROM running
WHERE running_s > 0
UNION ALL
SELECT personid_fk, s_type, from_hrs, to_hrs, score
FROM score_tbl
WHERE s_type = 'B'
ORDER BY personid_fk, from_hrs;
Step by step:
deltas is a union of two passes over score_tbl, one for the start and one for the end of each score/bonus interval, creating a timeline of +1/-1 events
running calculates a running total of the deltas over time, yielding the split intervals in which score/bonus are active
the final query just converts the score codes and unions in the bonus intervals (which are passed through unchanged)
SQL Fiddle here.
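To sanity-check the logic outside the database, here is a minimal Python sketch of the same +1/-1 sweep. It is only an illustration, not the query itself: the function name `split_intervals` and the tuple layout are assumptions made for this example, and the sample rows are copied from the question.

```python
from collections import defaultdict

# Sample rows from the question: (person, s_type, from_hrs, to_hrs, score)
SCORE_ROWS = [
    (1, "M", 0, 20, "S"), (1, "B", 6, 8, "B"),
    (2, "B", 0, 2, "B"), (2, "M", 0, 20, "S"),
    (2, "B", 10, 13, "B"), (2, "B", 18, 20, "B"),
    (2, "M", 13, 18, "S"),
]


def split_intervals(rows):
    # "deltas": +1 where an interval opens, -1 where it closes, per person and hour
    deltas = defaultdict(lambda: {"S": 0, "B": 0})
    for person, _s_type, start, end, score in rows:
        deltas[(person, start)][score] += 1
        deltas[(person, end)][score] -= 1

    out = []
    events = sorted(deltas)  # ordered by person, then hour
    running = {"S": 0, "B": 0}
    for i, (person, hrs) in enumerate(events):
        if i == 0 or events[i - 1][0] != person:
            running = {"S": 0, "B": 0}  # new partition, restart the running totals
        running["S"] += deltas[(person, hrs)]["S"]
        running["B"] += deltas[(person, hrs)]["B"]
        # Equivalent of lead(hrs): the next event of the same person, if any
        has_next = i + 1 < len(events) and events[i + 1][0] == person
        if has_next and running["S"] > 0:
            score = "SB" if running["B"] > 0 else "S"
            out.append((person, "M", hrs, events[i + 1][1], score))

    # Bonus rows are passed through unchanged
    out += [r for r in rows if r[1] == "B"]
    return sorted(out, key=lambda r: (r[0], r[2], r[1]))


for row in split_intervals(SCORE_ROWS):
    print(row)
```

For person 2 the 0..20 interval comes out as five pieces, which is the splitting behaviour described above.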

Related

SQL - Rows that are repetitive with a particular condition

We have a table like this:
+----+-------+-----------------+----------------+-----------------+----------------+-----------------+
| ID | Name | RecievedService | FirstZoneTeeth | SecondZoneTeeth | ThirdZoneTeeth | FourthZoneTeeth |
+----+-------+-----------------+----------------+-----------------+----------------+-----------------+
| 1 | John | SomeService1 | 13 | | 4 | |
+----+-------+-----------------+----------------+-----------------+----------------+-----------------+
| 2 | John | SomeService1 | 34 | | | |
+----+-------+-----------------+----------------+-----------------+----------------+-----------------+
| 3 | Steve | SomeService3 | | | | 2 |
+----+-------+-----------------+----------------+-----------------+----------------+-----------------+
| 4 | Steve | SomeService4 | | | | 12 |
+----+-------+-----------------+----------------+-----------------+----------------+-----------------+
Every digit in a zone column is a tooth (dental notation), which means "John" has received "SomeService1" twice for tooth #3 (first zone):
+----+------+-----------------+----------------+-----------------+----------------+-----------------+
| ID | Name | RecievedService | FirstZoneTeeth | SecondZoneTeeth | ThirdZoneTeeth | FourthZoneTeeth |
+----+------+-----------------+----------------+-----------------+----------------+-----------------+
| 1 | John | SomeService1 | 13 | | 4 | |
+----+------+-----------------+----------------+-----------------+----------------+-----------------+
| 2 | John | SomeService1 | 34 | | | |
+----+------+-----------------+----------------+-----------------+----------------+-----------------+
Note that Steve has received services twice for tooth #2 (4th zone), but the two services are not the same.
I have written some code that gives me a table of duplicate rows (checking only the patient and the received service, using a "group by" clause), but I need to check the zones too.
I've tried this:
select ROW_NUMBER() over(order by vv.ID_sick) as RowNum,
bb.Radif,
bb.VCount as 'Count',
vv.ID_sick 'ID_Sick',
vv.ID_service 'ID_Service',
sick.FNamesick + ' ' + sick.LNamesick as 'Sick',
serv.NameService as 'Service',
vv.Mab_Service as 'MabService',
vv.Mab_daryafti as 'MabDaryafti',
vv.datevisit as 'DateVisit',
vv.Zone1,
vv.Zone2,
vv.Zone3,
vv.Zone4,
vv.ID_dentist as 'ID_Dentist',
dent.FNamedentist + ' ' + dent.LNamedentist as 'Dentist',
vv.id_do as 'ID_Do',
do.FNamedentist + ' ' + do.LNamedentist as 'Do'
from visiting vv inner join (
select ROW_NUMBER() OVER(ORDER BY a.ID_sick ASC) AS Radif,
count(a.ID_sick) as VCount,
a.ID_sick,
a.ID_service
from visiting a
group by a.ID_sick, a.ID_service, a.Zone1, a.Zone2, a.Zone3, a.Zone4
having count(a.ID_sick)>1)bb
on vv.ID_sick = bb.ID_sick and vv.ID_service = bb.ID_service
left join InfoSick sick on vv.ID_sick = sick.IDsick
left join infoService serv on vv.ID_service = serv.IDService
left join Infodentist dent on vv.ID_dentist = dent.IDdentist
left join infodentist do on vv.id_do = do.IDdentist
order by bb.ID_sick, bb.ID_service,vv.datevisit
But this code only returns rows in which all teeth are repeated. What I want is rows in which even a single tooth repeats.
How can I implement it?
I need to check the individual characters in the zones.
Note: the zone columns' datatype is varchar.
This is a bad data model for what you are trying to do. By storing the teeth as a varchar, you have effectively decided that you are not interested in single teeth, only in groups of teeth. Now, however, you are trying to query single teeth.
You'd want a data model like this:
service
+------------+--------+-----------------+
| service_id | Name | RecievedService |
+------------+--------+-----------------+
| 1 | John | SomeService1 |
+------------+--------+-----------------+
| 2 | John | SomeService1 |
+------------+--------+-----------------+
| 3 | Steve | SomeService3 |
+------------+--------+-----------------+
| 4 | Steve | SomeService4 |
+------------+--------+-----------------+
service_detail
+------------+------+-------+
| service_id | zone | tooth |
+------------+------+-------+
| 1 | 1 | 1 |
| 1 | 1 | 3 |
| 1 | 3 | 4 |
+------------+------+-------+
| 2 | 1 | 3 |
| 2 | 1 | 4 |
+------------+------+-------+
| 3 | 4 | 2 |
+------------+------+-------+
| 4 | 4 | 1 |
| 4 | 4 | 2 |
+------------+------+-------+
What you can do with the given data model is create such a table on the fly, using a recursive query and string manipulation:
with unpivoted(service_id, name, zone, teeth) as
(
select recievedservice, name, 1, firstzoneteeth
from mytable where len(firstzoneteeth) > 0
union all
select recievedservice, name, 2, secondzoneteeth
from mytable where len(secondzoneteeth) > 0
union all
select recievedservice, name, 3, thirdzoneteeth
from mytable where len(thirdzoneteeth) > 0
union all
select recievedservice, name, 4, fourthzoneteeth
from mytable where len(fourthzoneteeth) > 0
)
, service_details(service_id, name, zone, tooth, teeth) as
(
select
service_id, name, zone, substring(teeth, 1, 1), substring(teeth, 2, 10000)
from unpivoted
union all
select
service_id, name, zone, substring(teeth, 1, 1), substring(teeth, 2, 10000)
from service_details
where len(teeth) > 0
)
, duplicates(service_id, name) as
(
select distinct service_id, name
from service_details
group by service_id, name, zone, tooth
having count(*) > 1
)
select m.*
from mytable m
join duplicates d on d.service_id = m.recievedservice and d.name = m.name;
A lot of work and a rather slow query due to a bad data model, but still feasible.
Rextester demo: http://rextester.com/JVWK49901
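The unpivot-then-count idea can also be sketched in a few lines of Python, which may help when checking what the query should return. This is only an illustration: the function name `rows_with_repeated_tooth` and the list-of-zones layout are assumptions made for the example, and the rows are the four from the question.

```python
from collections import Counter

# Rows from the question: (id, name, service, [zone1, zone2, zone3, zone4])
VISITS = [
    (1, "John", "SomeService1", ["13", "", "4", ""]),
    (2, "John", "SomeService1", ["34", "", "", ""]),
    (3, "Steve", "SomeService3", ["", "", "", "2"]),
    (4, "Steve", "SomeService4", ["", "", "", "12"]),
]


def rows_with_repeated_tooth(visits):
    # Unpivot: one (name, service, zone, tooth) entry per digit in each zone string
    counts = Counter(
        (name, service, zone, tooth)
        for _id, name, service, zones in visits
        for zone, teeth in enumerate(zones, start=1)
        for tooth in teeth
    )
    # A patient/service pair is a duplicate if any single tooth occurs more than once
    repeated = {(name, service) for (name, service, _z, _t), n in counts.items() if n > 1}
    return [v[0] for v in visits if (v[1], v[2]) in repeated]


print(rows_with_repeated_tooth(VISITS))  # ids of the rows to report
```

Like the query, this reports every row of a patient/service pair once any one tooth repeats, so John's rows 1 and 2 are returned while Steve's are not, because his two services differ.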

Record batching on the basis of running total values by a specific number (FileSize-wise batching)

We are dealing with a large record set and currently use NTILE() to get ranges of FileIDs, then use the FileID column in a BETWEEN clause to fetch a specific set of records. Using FileID in a BETWEEN clause is a mandatory requirement from the developers, so a batch cannot contain arbitrary FileIDs; they have to be consecutive.
As per a new requirement, we have to build the ranges based on the FileSize column, e.g. 100 GB per batch.
For example:
Batch 1: FileId 1 has size 100, so this batch contains that record only.
Batch 2: FileIds 2, 3, 4, 5 add up to 80, which is < 100 GB, so we have to take FileId 6 as well, even though it is 120 GB (running total 300 GB).
Batch 3: FileId 7 is > 100 GB, so one record only.
And so on…
Below is my sample code, but it is not giving the expected result:
CREATE TABLE zFiles
(
FileId INT
,FileSize INT
)
INSERT INTO dbo.zFiles (
FileId
,FileSize
)
VALUES (1, 100)
,(2, 20)
,(3, 20)
,(4, 30)
,(5, 10)
,(6, 120)
,(7, 400)
,(8, 50)
,(9, 100)
,(10, 60)
,(11, 40)
,(12, 5)
,(13, 20)
,(14, 95)
,(15, 40)
DECLARE @intBatchSize FLOAT = 100;
SELECT y.FileID ,
y.FileSize ,
y.RunningTotal ,
DENSE_RANK() OVER (ORDER BY CEILING(RunningTotal / @intBatchSize)) Batch
FROM ( SELECT i.FileID ,
i.FileSize ,
RunningTotal = SUM(i.FileSize) OVER ( ORDER BY i.FileID ) -- RANGE UNBOUNDED PRECEDING)
FROM dbo.zFiles AS i WITH ( NOLOCK )
) y
ORDER BY y.FileID;
Result:
+--------+----------+--------------+-------+
| FileID | FileSize | RunningTotal | Batch |
+--------+----------+--------------+-------+
| 1 | 100 | 100 | 1 |
| 2 | 20 | 120 | 2 |
| 3 | 20 | 140 | 2 |
| 4 | 30 | 170 | 2 |
| 5 | 10 | 180 | 2 |
| 6 | 120 | 300 | 3 |
| 7 | 400 | 700 | 4 |
| 8 | 50 | 750 | 5 |
| 9 | 100 | 850 | 6 |
| 10 | 60 | 910 | 7 |
| 11 | 40 | 950 | 7 |
| 12 | 5 | 955 | 7 |
| 13 | 20 | 975 | 7 |
| 14 | 95 | 1070 | 8 |
| 15 | 40 | 1110 | 9 |
+--------+----------+--------------+-------+
Expected Result:
+--------+---------------+---------+
| FileID | FileSize (GB) | BatchNo |
+--------+---------------+---------+
| 1 | 100 | 1 |
| 2 | 20 | 2 |
| 3 | 20 | 2 |
| 4 | 30 | 2 |
| 5 | 10 | 2 |
| 6 | 120 | 2 |
| 7 | 400 | 3 |
| 8 | 50 | 4 |
| 9 | 100 | 4 |
| 10 | 60 | 5 |
| 11 | 40 | 5 |
| 12 | 5 | 6 |
| 13 | 20 | 6 |
| 14 | 95 | 6 |
| 15 | 40 | 7 |
+--------+---------------+---------+
We can achieve this if we can somehow reset the running total once it reaches or exceeds 100. We could write a loop to get this result, but that means going record by record, which is time-consuming.
Can somebody please help us with this?
You need to do this with a recursive CTE:
with cte as (
select z.fileid, z.filesize, z.filesize as batch_filesize, 1 as batchnum
from zfiles z
where z.fileid = 1
union all
select z.fileid, z.filesize,
(case when cte.batch_filesize + z.filesize > @intBatchSize
then z.filesize
else cte.batch_filesize + z.filesize
end),
(case when cte.batch_filesize + z.filesize > @intBatchSize
then cte.batchnum + 1
else cte.batchnum
end)
from cte join
zfiles z
on z.fileid = cte.fileid + 1
)
select *
from cte;
Note: I realize that fileid is probably not a gapless sequence. You can create one using row_number() in a CTE to make this work.
There is a technical reason why running sums don't work for this: essentially, any given fileid needs to know where the breaks before it fall.
A small modification of the above answer by Gordon Linoff gave the expected result:
DECLARE @intBatchSize INT = 100
;WITH cte as (
select z.fileid, z.filesize, z.filesize as batch_filesize, 1 as batchnum
from zfiles z
where z.fileid = 1
union all
select z.fileid, z.filesize,
(case when cte.batch_filesize >= @intBatchSize
then z.filesize
else cte.batch_filesize + z.filesize
end),
(case when cte.batch_filesize >= @intBatchSize
then cte.batchnum + 1
else cte.batchnum
end)
from cte join
zfiles z
on z.fileid = cte.fileid + 1
)
select *
from cte;
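The reset rule in this modified version (start a new batch once the current batch has reached the threshold) is easy to verify with a sequential pass. The following Python sketch is only a check of that rule, not a replacement for the CTE; the function name `assign_batches` is an assumption made for the example, and the sizes are the sample data from the question in FileId order.

```python
def assign_batches(sizes, batch_size=100):
    """Return one batch number per file, in FileId order."""
    batches = []
    batch_total = 0
    batch_no = 1
    for size in sizes:
        # Close the current batch once it has reached the threshold,
        # mirroring "cte.batch_filesize >= @intBatchSize" in the query
        if batch_total >= batch_size:
            batch_no += 1
            batch_total = 0
        batch_total += size
        batches.append(batch_no)
    return batches


# FileSize values for FileId 1..15 from the question
FILE_SIZES = [100, 20, 20, 30, 10, 120, 400, 50, 100, 60, 40, 5, 20, 95, 40]

for file_id, batch in enumerate(assign_batches(FILE_SIZES), start=1):
    print(file_id, batch)
```

Each file's batch depends on where the previous batches ended, which is why a plain running sum divided by the batch size cannot produce the same numbering.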

Show missing rows with 0 values to maintain the order

I have a table with a Name column whose values are 'A', 'B', or 'C'. They come in order (A, B, C, A, B, C, ...); however, sometimes a Name is missing (A, B, [missing C], A, B, C, ...). I want a query that gives me all the Names in order without any gaps. The Value for a missing name must be 0.
PS: The table is in a Netezza database, and it is truncated and reloaded with fresh data each time by an SSIS package. What we know is that there is also an ID column with a value between 1 and 27, but the number of rows after each truncate-and-load can differ. The table I want does not need the ID column, but if it had one, it would run from 1 to 27, meaning that the table I want must always have 27 rows.
I would recommend fixing this in the source SSIS package, but I think the following will work in Netezza (for versions that support the WITH clause). Note that recursion is not used, as I believe it isn't supported by Netezza.
If the WITH clause isn't supported, some other source of a numeric sequence could be used (e.g. row_number()).
setup:
CREATE TABLE TableHave
(Name varchar(1), ID int, Value decimal(5,2))
;
INSERT INTO TableHave
(Name, ID)
VALUES
('A', 1),
('A', 4),
('A', 7),
('C', 21),
('B', 23),
('A', 25)
;
update TableHave set Value = id*1.12;
Query:
;WITH
Digits AS (
SELECT 0 AS digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL
SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
),
Tally AS (
SELECT
ones.digit
+ tens.digit * 10
+ hundreds.digit * 100
-- + thousands.digit * 1000
as num
FROM Digits ones
CROSS JOIN Digits tens
CROSS JOIN Digits hundreds
-- CROSS JOIN Digits thousands (keep adding more if needed)
)
select
d.id
, d.name
, coalesce(t.value, 0) as value -- 0 for ids that are missing from TableHave
from (
select
num + 1 as id
, case when num % 3 = 1 then 'B'
when num % 3 = 2 then 'C'
else 'A'
end Name
from Tally
where num <= (select ((max(id)/3)*3)+2 from TableHave)
) d
left join TableHave t on d.id = t.id
order by d.id
result:
+----+------+-------+
| id | name | value |
+----+------+-------+
| 1 | A | 1.12 |
| 2 | B | 0 |
| 3 | C | 0 |
| 4 | A | 4.48 |
| 5 | B | 0 |
| 6 | C | 0 |
| 7 | A | 7.84 |
| 8 | B | 0 |
| 9 | C | 0 |
| 10 | A | 0 |
| 11 | B | 0 |
| 12 | C | 0 |
| 13 | A | 0 |
| 14 | B | 0 |
| 15 | C | 0 |
| 16 | A | 0 |
| 17 | B | 0 |
| 18 | C | 0 |
| 19 | A | 0 |
| 20 | B | 0 |
| 21 | C | 23.52 |
| 22 | A | 0 |
| 23 | B | 25.76 |
| 24 | C | 0 |
| 25 | A | 28.00 |
| 26 | B | 0 |
| 27 | C | 0 |
+----+------+-------+
A running example (on SQL Server) is available here http://rextester.com/VXB89713
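The same gap-filling idea (generate every id, derive the name from the position, and default the value to 0) can be sketched in Python as a quick check. This is an illustration only: the function name `fill_missing` and the fixed `total_rows=27` default are assumptions based on the question's statement that the result always has 27 rows, whereas the query above derives the upper bound from max(id).

```python
def fill_missing(have, total_rows=27):
    """have maps id -> value for the rows that were actually loaded."""
    names = "ABC"
    return [
        # id 1 -> 'A', id 2 -> 'B', id 3 -> 'C', then the cycle repeats
        (i, names[(i - 1) % 3], have.get(i, 0))
        for i in range(1, total_rows + 1)
    ]


# Values from the setup script above (Value = id * 1.12)
HAVE = {1: 1.12, 4: 4.48, 7: 7.84, 21: 23.52, 23: 25.76, 25: 28.00}

for row in fill_missing(HAVE):
    print(row)
```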

Counting on multiple columns

I have a table like this:
+------------+---------------+-------------+
|store_number|entrance_number|camera_number|
+------------+---------------+-------------+
| 1 | 1 | 1 |
| 1 | 1 | 2 |
| 2 | 1 | 1 |
| 2 | 2 | 1 |
| 2 | 2 | 2 |
| 3 | 1 | 1 |
| 4 | 1 | 1 |
| 4 | 1 | 2 |
| 4 | 2 | 1 |
| 4 | 3 | 1 |
+------------+---------------+-------------+
In summary the stores are numbered 1 and up, the entrances are numbered 1 and up for each store, and the cameras are numbered 1 and up for each entrance.
What I want to do is count how many entrances and how many cameras there are in total for each store, producing this result from the above table:
+------------+---------------+-------------+
|store_number|entrances |cameras |
+------------+---------------+-------------+
| 1 | 1 | 2 |
| 2 | 2 | 3 |
| 3 | 1 | 1 |
| 4 | 3 | 4 |
+------------+---------------+-------------+
How can I count on multiple columns to produce this result?
You can do this with a GROUP BY and a COUNT() of each item:
Select Store_Number,
Count(Distinct Entrance_Number) as Entrances,
Count(Camera_Number) As Cameras
From YourTable
Group By Store_Number
From what I can tell from your expected output, you're looking for the number of cameras that appear, whilst also looking for the DISTINCT number of entrances.
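As a cross-check of what the two aggregates compute, here is a small Python sketch of the same grouping: a set per store for the distinct entrances and a plain counter for the camera rows. The function name `count_per_store` is an assumption made for this example; the rows are the sample data from the question.

```python
from collections import defaultdict

# (store_number, entrance_number, camera_number) rows from the question
CAMERA_ROWS = [
    (1, 1, 1), (1, 1, 2), (2, 1, 1), (2, 2, 1), (2, 2, 2),
    (3, 1, 1), (4, 1, 1), (4, 1, 2), (4, 2, 1), (4, 3, 1),
]


def count_per_store(rows):
    entrances = defaultdict(set)  # COUNT(DISTINCT entrance_number)
    cameras = defaultdict(int)    # COUNT(camera_number): one per row
    for store, entrance, _camera in rows:
        entrances[store].add(entrance)
        cameras[store] += 1
    return {store: (len(entrances[store]), cameras[store]) for store in sorted(cameras)}


print(count_per_store(CAMERA_ROWS))
```

Counting camera rows without DISTINCT relies on each (store, entrance, camera) combination appearing only once in the table, as it does in the sample.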
This will work as well:
DECLARE @store TABLE
( store_number INT, entrance_number INT, camera_number INT)
INSERT INTO @store VALUES(1,1,1),(1,1,2),(2,1,1),(2,2,1),
(2,2,2),(3,1,1),(4,1,1),(4,1,2),(4,2,1),(4,3,1)
-- Build a composite key per camera and per entrance, then count distinct keys per store.
-- The '-' separator keeps keys such as (1, 11) and (11, 1) from colliding.
SELECT AA.s store_number, BB.e entrances, AA.c cameras FROM (
SELECT s, COUNT(DISTINCT c) c FROM ( SELECT store_number s,
CONVERT(VARCHAR,store_number) + '-' + CONVERT(VARCHAR,entrance_number) + '-' +
CONVERT(VARCHAR,camera_number) c FROM @store ) A GROUP BY s ) AA
LEFT JOIN
( SELECT s, COUNT(DISTINCT e) e FROM ( SELECT store_number s,
CONVERT(VARCHAR,store_number) + '-' + CONVERT(VARCHAR,entrance_number) e
FROM @store ) B GROUP BY s ) BB ON AA.s = BB.s
Hope it helped. :)

Select the most recent row in each group of rows with the same value in one column

The title isn't very clear, so I'll illustrate what I mean. Suppose my table looks like this:
item_name | date added | val1 | val2
------------------------------------
1 | date+1 | 10 | 20
1 | date | 12 | 21
2 | date+1 | 5 | 6
3 | date+3 | 3 | 1
3 | date+2 | 5 | 2
3 | date | 3 | 1
And I want to select rows 1, 3, and 4, as they are the most recent entries for each item.
Try this:
select *
from tableX t1
where t1.date_added = (select max(t2.date_added)
from tableX t2
where t2.item_name = t1.item_name )
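The correlated subquery keeps a row only when its date equals the maximum date for the same item. A minimal Python sketch of that rule follows; the function name `latest_per_item` is an assumption made for the example, and since the question only gives relative dates (date, date+1, ...), they are represented here as hypothetical integer day offsets.

```python
def latest_per_item(rows):
    """rows: (item_name, date_added, val1, val2); date_added only needs to be comparable."""
    newest = {}
    for item, added, *_ in rows:
        # Equivalent of max(date_added) per item_name
        if item not in newest or added > newest[item]:
            newest[item] = added
    # Keep the rows whose date matches the maximum for their item
    return [r for r in rows if r[1] == newest[r[0]]]


# Sample rows from the question, with "date + n" written as the offset n
ITEMS = [
    (1, 1, 10, 20),
    (1, 0, 12, 21),
    (2, 1, 5, 6),
    (3, 3, 3, 1),
    (3, 2, 5, 2),
    (3, 0, 3, 1),
]

print(latest_per_item(ITEMS))
```

As with the SQL, if two rows of the same item share the latest date, both are returned.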