What I am trying to achieve is to group the rows by id and create a column for the date of each test as well as for its result.
The background of the dataset is that it contains lab results taken by participants, and some tests cannot be taken on the same day due to fasting restrictions etc. The database I am using is SQL Server.
Below are my dataset and the desired output.
Sample dataset:
create table Sample
(
Id int,
LAB_DATE date,
A_CRE_1 varchar(100),
B_GLUH_1 varchar(100),
C_LDL_1 varchar(100),
D_TG_1 varchar(100),
E_CHOL_1 varchar(100),
F_HDL_1 varchar(100),
G_CRPH_1 varchar(100),
H_HBA1C_1 varchar(100),
I_GLU120_1 varchar(100),
J_GLUF_1 varchar(100),
K_HCR_1 varchar(100)
)
insert into Sample(Id, LAB_DATE,A_CRE_1, B_GLUH_1,C_LDL_1,E_CHOL_1,F_HDL_1,H_HBA1C_1,K_HCR_1)
values (01, '2017-11-21', '74', '6.4', '2.04', '4.17', '1.64', '6.1', '2.54')
insert into sample (Id, LAB_DATE, I_GLU120_1)
values (01, '2017-11-22','8.8')
insert into sample (Id, LAB_DATE, D_TG_1)
values (01, '2017-11-23','0.56')
insert into sample (Id,LAB_DATE,A_CRE_1,B_GLUH_1,C_LDL_1,D_TG_1,E_CHOL_1,F_HDL_1,K_HCR_1)
values (2,'2018-10-02','57','8.91','2.43','1.28','3.99','1.25','3.19')
insert into sample (Id,LAB_DATE,H_HBA1C_1)
values (2,'2018-10-03','8.6')
insert into sample (Id,LAB_DATE,J_GLUF_1)
values (2,'2018-10-04','7.8')
insert into sample (Id,LAB_DATE,A_CRE_1,B_GLUH_1,C_LDL_1,D_TG_1,E_CHOL_1,F_HDL_1,G_CRPH_1,H_HBA1C_1,K_HCR_1)
values (3,'2016-10-01','100','6.13','3.28','0.94','5.07','1.19','0.27','5.8','4.26')
Desired output:
ID|LAB_DATE|A_CRE_1|B_GLUH_1|C_LDL_1|Date_TG_1|D_TG_1|E_CHOL_1|F_HDL_1|G_CRPH_1|Date_HBA1C_1|H_HBA1C_1|Date_GLU120_1|I_GLU120_1|Date_GLUF_1|J_GLUF_1|K_HCR_1
1|2017-11-21|74|6.4|2.04|2017-11-23|0.56|4.17|1.64|||6.1|2017-11-22|8.8|||2.54
2|2018-10-02|57|8.91|2.43||1.28|3.99|1.25||2018-10-03|8.6|||2018-10-04|7.8|3.19
3|2016-10-01|100|6.13|3.28||0.94|5.07|1.19|0.27||5.8|||||4.26
Here's a solution (that cannot cope with multiple rows of the same id/sample type - you haven't said what to do with those)
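select * from
-- each subquery keeps only the rows where its own test was actually taken
(select Id, LAB_DATE, A_CRE_1, B_GLUH_1, C_LDL_1, E_CHOL_1, F_HDL_1, H_HBA1C_1, K_HCR_1 from sample where A_CRE_1 is not null) s1
INNER JOIN
(select Id, LAB_DATE as glu120date, I_GLU120_1 from sample where I_GLU120_1 is not null) s2
ON s1.id = s2.id
INNER JOIN
(select Id, LAB_DATE as dtgdate, D_TG_1 from sample where D_TG_1 is not null) s3
ON s1.id = s3.id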
Hopefully you get the idea with this pattern; if you have other sample types with their own dates, break them out of s1 and into their own subquery in a similar way (e.g. make an s4 for e_chol_1, an s5 for k_hcr_1, etc.). Note that if any sample type is missing for an id, the INNER JOIN will cause the whole row for that id to disappear from the results. If this is not desired and you accept NULL for missing samples, use LEFT JOIN instead of INNER JOIN.
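To see the INNER versus LEFT JOIN difference without a SQL Server instance, here is a minimal sketch that runs the same join shape against SQLite through Python's sqlite3 module. The table is cut down to three test columns, the rows are taken from the question's ids 1 and 3, and the IS NOT NULL filters in each subquery are an assumption about how each test's rows should be picked out.

```python
import sqlite3

# SQLite stands in for SQL Server here; only the join behaviour is being shown.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sample (Id INT, LAB_DATE TEXT, A_CRE_1 TEXT, I_GLU120_1 TEXT, D_TG_1 TEXT);
INSERT INTO sample VALUES (1, '2017-11-21', '74',  NULL,  NULL);
INSERT INTO sample VALUES (1, '2017-11-22', NULL,  '8.8', NULL);
INSERT INTO sample VALUES (1, '2017-11-23', NULL,  NULL,  '0.56');
INSERT INTO sample VALUES (3, '2016-10-01', '100', NULL,  '0.94');
""")

# {join} is swapped between INNER JOIN and LEFT JOIN below.
QUERY = """
SELECT s1.Id, s1.LAB_DATE, s1.A_CRE_1,
       s2.glu120date, s2.I_GLU120_1,
       s3.dtgdate, s3.D_TG_1
FROM (SELECT Id, LAB_DATE, A_CRE_1
      FROM sample WHERE A_CRE_1 IS NOT NULL) s1
{join} (SELECT Id, LAB_DATE AS glu120date, I_GLU120_1
        FROM sample WHERE I_GLU120_1 IS NOT NULL) s2
  ON s1.Id = s2.Id
{join} (SELECT Id, LAB_DATE AS dtgdate, D_TG_1
        FROM sample WHERE D_TG_1 IS NOT NULL) s3
  ON s1.Id = s3.Id
ORDER BY s1.Id
"""

inner_rows = conn.execute(QUERY.format(join="INNER JOIN")).fetchall()
left_rows = conn.execute(QUERY.format(join="LEFT JOIN")).fetchall()

print("INNER:", inner_rows)  # id 3 has no GLU120 result, so it drops out
print("LEFT: ", left_rows)   # id 3 is kept, with NULLs for the missing test
```

With the INNER JOIN only id 1 comes back; with the LEFT JOIN id 3 is kept and its GLU120 columns are NULL.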
If there will be multiple samples for patient 01 and you only want the latest, the pattern becomes:
select * from
(select Id, LAB_DATE, A_CRE_1, B_GLUH_1, C_LDL_1, E_CHOL_1, F_HDL_1, H_HBA1C_1, K_HCR_1,
row_number() over(partition by id order by lab_date desc) rn
from sample
where A_CRE_1 is not null) s1 -- number only the rows that hold this test
INNER JOIN
(select Id, LAB_DATE as glu120date, I_GLU120_1,
row_number() over(partition by id order by lab_date desc) rn
from sample
where I_GLU120_1 is not null) s2
ON s1.id = s2.id and s1.rn = s2.rn
WHERE
s1.rn = 1
Note the addition of row_number() over(partition by id order by lab_date desc) rn. This establishes an incrementing counter in descending date order (latest record = 1, older = 2, ...) that restarts from 1 for every different id. We join on it too, then say where rn = 1 to pick only the latest record for each sample type.
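A small sketch of how that counter behaves, again using SQLite via Python's sqlite3 as a stand-in for SQL Server (window functions need SQLite 3.25 or later). The repeat test for id 1 and the row for id 2 are made-up values, added only so there is something to number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sample (Id INT, LAB_DATE TEXT, I_GLU120_1 TEXT);
INSERT INTO sample VALUES (1, '2017-11-22', '8.8');
INSERT INTO sample VALUES (1, '2018-01-15', '7.9');  -- hypothetical repeat test
INSERT INTO sample VALUES (2, '2018-10-05', '9.1');  -- hypothetical
""")

# rn restarts at 1 for every Id; the newest LAB_DATE gets rn = 1.
rows = conn.execute("""
SELECT Id, LAB_DATE, I_GLU120_1, rn
FROM (
    SELECT Id, LAB_DATE, I_GLU120_1,
           ROW_NUMBER() OVER (PARTITION BY Id ORDER BY LAB_DATE DESC) AS rn
    FROM sample
    WHERE I_GLU120_1 IS NOT NULL
) numbered
ORDER BY Id, rn
""").fetchall()

# Keeping rn = 1 leaves the latest result per Id.
latest = [r[:3] for r in rows if r[3] == 1]
print(rows)
print(latest)
```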
As @Ben suggested, you can group by Id and take the MIN of every column, as below.
DECLARE @Sample as table (
Id int,
LAB_DATE date,
A_CRE_1 varchar(100),
B_GLUH_1 varchar(100),
C_LDL_1 varchar(100),
D_TG_1 varchar(100),
E_CHOL_1 varchar(100),
F_HDL_1 varchar(100),
G_CRPH_1 varchar(100),
H_HBA1C_1 varchar(100),
I_GLU120_1 varchar(100),
J_GLUF_1 varchar(100),
K_HCR_1 varchar(100))
insert into @Sample(Id, LAB_DATE,A_CRE_1,
B_GLUH_1,C_LDL_1,E_CHOL_1,F_HDL_1,H_HBA1C_1,K_HCR_1)
values (01,'2017-11-21','74','6.4','2.04','4.17','1.64','6.1','2.54')
insert into @Sample (Id, LAB_DATE, I_GLU120_1)
values (01, '2017-11-22','8.8')
insert into @Sample (Id, LAB_DATE, D_TG_1)
values (01, '2017-11-23','0.56')
SELECT s.Id
, MIN(s.LAB_DATE) AS LAB_DATE
, MIN(s.A_CRE_1) AS A_CRE_1
, MIN(s.B_GLUH_1) AS B_GLUH_1
, MIN(s.C_LDL_1) AS C_LDL_1
, MIN(s.D_TG_1) AS D_TG_1
, MIN(s.E_CHOL_1) AS E_CHOL_1
, MIN(s.F_HDL_1) AS F_HDL_1
, MIN(s.G_CRPH_1) AS G_CRPH_1
, MIN(s.H_HBA1C_1) AS H_HBA1C_1
, MIN(s.I_GLU120_1) AS I_GLU120_1
, MIN(s.J_GLUF_1) AS J_GLUF_1
, MIN(s.K_HCR_1) AS K_HCR_1
FROM @Sample AS s
GROUP BY s.Id
You can also look at the SQL Server STUFF function; the link below may help.
https://www.mssqltips.com/sqlservertip/2914/rolling-up-multiple-rows-into-a-single-row-and-column-for-sql-server-data/
Following on from my comments about presenting the original data, here's what I think you should do (taking the query you commented)
SELECT
ID,
MAX(CASE WHEN TestID='1' THEN Results END) [Test_1],
MAX(CASE WHEN TestID='2' THEN Results END) [Test_2],
MAX(CASE WHEN TestID='1' THEN Result_Date_Time END) Test12Date,
MAX(CASE WHEN TestID='3' THEN Results END) [Test_3],
MAX(CASE WHEN TestID='3' THEN Result_Date_Time END) Test3Date
FROM [tbBloodSample]
GROUP BY ID
ORDER BY ID
Notes: If TestID is an int, don't use strings like '1' in your query; use ints. You don't need an ELSE NULL in a CASE, because NULL is the default when no WHEN branch matches.
Here is a query pattern. Tests 1 and 2 are always done on the same day, which is why I only pivot their date once. Test 3 might be done later or on the same day, so the dates in Test12Date and Test3Date might be the same or might be different.
Convert the strings to dates after you do the pivot, to reduce the number of conversions.
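The same MAX(CASE ...) idea can also be tried directly on the wide Sample table from the first question, by using "this test's column is not NULL" as the CASE condition. The sketch below is only an illustration of that variation, run on SQLite through Python's sqlite3 with a cut-down set of columns; the Date_TG_1 and Date_GLU120_1 names follow the desired output, everything else about the setup is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sample (Id INT, LAB_DATE TEXT, A_CRE_1 TEXT, D_TG_1 TEXT, I_GLU120_1 TEXT);
INSERT INTO sample (Id, LAB_DATE, A_CRE_1)         VALUES (1, '2017-11-21', '74');
INSERT INTO sample (Id, LAB_DATE, I_GLU120_1)      VALUES (1, '2017-11-22', '8.8');
INSERT INTO sample (Id, LAB_DATE, D_TG_1)          VALUES (1, '2017-11-23', '0.56');
INSERT INTO sample (Id, LAB_DATE, A_CRE_1, D_TG_1) VALUES (3, '2016-10-01', '100', '0.94');
""")

# One output row per Id; each test carries the date of the row it came from.
rows = conn.execute("""
SELECT Id,
       MAX(CASE WHEN A_CRE_1    IS NOT NULL THEN LAB_DATE END) AS LAB_DATE,
       MAX(A_CRE_1)                                            AS A_CRE_1,
       MAX(CASE WHEN D_TG_1     IS NOT NULL THEN LAB_DATE END) AS Date_TG_1,
       MAX(D_TG_1)                                             AS D_TG_1,
       MAX(CASE WHEN I_GLU120_1 IS NOT NULL THEN LAB_DATE END) AS Date_GLU120_1,
       MAX(I_GLU120_1)                                         AS I_GLU120_1
FROM sample
GROUP BY Id
ORDER BY Id
""").fetchall()

for row in rows:
    print(row)
```

Unlike the desired output, this fills in Date_TG_1 even when it equals LAB_DATE (id 3 here); blanking it in that case would need an extra comparison.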
I'm loading some quite nasty data through Azure Data Factory.
This is how the data looks after being loaded. It consists of two parts:
1. Metadata of a test
2. Actual measurements of the test -> the measurement is numeric
Imagine I have about 10 such 'packages' of 1. Metadata + 2. Measurements.
What I would like it to be / what I'm looking for is the following:
The number column with 1,2,.... is what I'm looking for!
Imagine my screenshot could go no further but this goes along until id=10
I guess a while loop is necessary here...
Query before:
SELECT Field1 FROM Input
Query after:
SELECT GeneratedId, Field1 FROM Input
Thanks a lot in advance!
EDIT: added a hint:
Here is a solution, this requires SQL-SERVER 2012 or later.
Start by getting an Id column on your data. If you can do this before the script runs, that would be even better, but if not, try something like this...
CREATE TABLE #InputTable (
Id INT IDENTITY(1, 1),
TestData NVARCHAR(MAX) )
INSERT INTO #InputTable (TestData)
SELECT Field1 FROM Input
Now create a query to get the GeneratedId of each package as well as the Id where they start and end. You can do this by getting all the records LIKE 'title%' since that is the first record of each package, then using ROW_NUMBER, Id, and LEAD for the GeneratedId, StartId, and EndId respectively.
SELECT
GeneratedId = ROW_NUMBER() OVER(ORDER BY (Id)),
StartId = Id,
EndId = LEAD(Id) OVER (ORDER BY (Id))
FROM #InputTable
WHERE TestData LIKE 'title%'
Lastly, join this to the input in order to get all the records, with the correct GeneratedId.
SELECT
package.GeneratedId, i.TestData
FROM (
SELECT
GeneratedId = ROW_NUMBER() OVER(ORDER BY (Id)),
StartId = Id,
EndId = LEAD(Id) OVER (ORDER BY (Id))
FROM #InputTable
WHERE TestData LIKE 'title%' ) package
INNER JOIN #InputTable i
ON i.Id >= package.StartId
AND (package.EndId IS NULL OR i.Id < package.EndId)
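For anyone who wants to watch the three steps run end to end, here is a rough sketch on SQLite via Python's sqlite3 (ROW_NUMBER and LEAD need SQLite 3.25 or later). The seven input lines are invented, the temp table becomes an ordinary table, and the T-SQL "alias = expression" form is rewritten with AS; otherwise the query follows the one above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# AUTOINCREMENT plays the role of IDENTITY(1, 1).
conn.execute(
    "CREATE TABLE InputTable (Id INTEGER PRIMARY KEY AUTOINCREMENT, TestData TEXT)"
)

# Hypothetical input: two packages, each starting with a 'title' line.
lines = ["title: test A", "operator: x", "1.5", "2.5",
         "title: test B", "operator: y", "3.5"]
conn.executemany("INSERT INTO InputTable (TestData) VALUES (?)",
                 [(line,) for line in lines])

rows = conn.execute("""
SELECT package.GeneratedId, i.TestData
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY Id) AS GeneratedId,
           Id                              AS StartId,
           LEAD(Id) OVER (ORDER BY Id)     AS EndId
    FROM InputTable
    WHERE TestData LIKE 'title%'
) package
INNER JOIN InputTable i
   ON i.Id >= package.StartId
  AND (package.EndId IS NULL OR i.Id < package.EndId)
ORDER BY i.Id
""").fetchall()

for generated_id, test_data in rows:
    print(generated_id, test_data)
```

The last package has no following title row, so its EndId is NULL and the IS NULL branch of the join picks up everything to the end of the table.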
Afternoon, I have the following SQL command:
SELECT INVOICE_ID,
ITEM_ID,
ORDER_NO,
CLIENT_STATE
FROM CUSTOMER_ORDER_INV_JOIN
WHERE ORDER_NO = '*1007';
This pulls out one row per invoice line for that order.
There is a specific set of criteria that I want to meet:
On Order No: *1007, if the client state on all lines = 'PaidPosted', then I need another column to show 'PaidPosted' on all lines.
However, on Order No: *1007, if the client state on 4 lines = 'PaidPosted' but 1 or more lines = 'PostedAuth', then I need another column where all lines show 'PostedAuth'. And if all of the lines are NULL, I need a column where all lines show 'No Invoice'.
Hopefully this makes more sense.
I think this will get you what you need.
You can create a Temporary Table that has your sort order:
CREATE TABLE #Sort_Order
(myOrder INT,
CLIENT_STATE NVARCHAR(20)
)
INSERT INTO #Sort_Order
VALUES(1, 'Preliminary')
INSERT INTO #Sort_Order
VALUES(2, 'PostedAuth')
INSERT INTO #Sort_Order
VALUES(3, 'PaidPosted')
Then you can just join it to your table and run a RANK function on it like so:
SELECT
A.*,
DENSE_RANK() OVER (PARTITION BY A.ID
ORDER BY B.myOrder ASC) AS OrderRank
FROM #Temp A
INNER JOIN #Sort_Order B On (A.CLIENT_STATE = B.CLIENT_STATE)
WHERE A.ID = 1
This will give you results by rank. Because a window function cannot be referenced in the WHERE clause of the same SELECT, wrap the query in a subquery or CTE and filter on OrderRank = 1 there.
If your data has multiple rows with the same client state, you will need to do some kind of DISTINCT or GROUP BY.
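As a rough sketch of how the sort-order table can drive the extra column, the example below takes the lowest myOrder per order with MIN and GROUP BY (a swap for the DENSE_RANK step, not the same query) and falls back to 'No Invoice' when an order has no matching state at all. It runs on SQLite via Python's sqlite3; the Inv table and its rows are hypothetical stand-ins for CUSTOMER_ORDER_INV_JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sort_Order (myOrder INT, CLIENT_STATE TEXT);
INSERT INTO Sort_Order VALUES (1, 'Preliminary'), (2, 'PostedAuth'), (3, 'PaidPosted');

-- Hypothetical stand-in for the invoice lines
CREATE TABLE Inv (ORDER_NO TEXT, ITEM_ID INT, CLIENT_STATE TEXT);
INSERT INTO Inv VALUES
  ('*1007', 1, 'PaidPosted'), ('*1007', 2, 'PaidPosted'), ('*1007', 3, 'PostedAuth'),
  ('*1008', 1, 'PaidPosted'), ('*1008', 2, 'PaidPosted'),
  ('*1009', 1, NULL),         ('*1009', 2, NULL);
""")

rows = conn.execute("""
SELECT i.ORDER_NO, i.ITEM_ID, i.CLIENT_STATE,
       COALESCE(w.CLIENT_STATE, 'No Invoice') AS ORDER_STATE
FROM Inv i
LEFT JOIN (
    -- lowest-ranked state present on each order; NULL states never match
    SELECT a.ORDER_NO, MIN(b.myOrder) AS minOrder
    FROM Inv a
    INNER JOIN Sort_Order b ON a.CLIENT_STATE = b.CLIENT_STATE
    GROUP BY a.ORDER_NO
) lowest ON lowest.ORDER_NO = i.ORDER_NO
LEFT JOIN Sort_Order w ON w.myOrder = lowest.minOrder
ORDER BY i.ORDER_NO, i.ITEM_ID
""").fetchall()

# Every line of an order carries the same ORDER_STATE.
order_state = {order_no: state for order_no, _, _, state in rows}
print(order_state)
```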
Tablename: EntryTable
ID  CharityName         Title             VoteCount
1   save the childrens  save them         1
2   save the childrens  saving childrens  3
3   cancer research     support them      10
Tablename: ContestantTable
ID  FirstName  LastName  EntryId
1   Neville    Vyland    1
2   Abhishek   Shukla    1
3   Raghu      Nandan    2
Desired output
CharityName         FullName
save the childrens  Neville Vyland
                    Abhishek Shukla
cancer research     Raghu Nandan
I tried
select LOWER(ET.CharityName) AS CharityName,COUNT(CT.FirstName) AS Total_No_Of_Contestant
from EntryTable ET
join ContestantTable CT
on ET.ID = CT.ID
group by LOWER(ET.CharityName)
Please advise.
Please have a look at this sqlfiddle.
Have a try with this query:
SELECT
e.CharityName,
c.FirstName,
c.LastName,
sq.my_count
FROM
EntryTable e
INNER JOIN ContestantTable c ON e.ID = c.EntryId
INNER JOIN (
SELECT EntryId, COUNT(*) AS my_count FROM ContestantTable GROUP BY EntryId
) sq ON e.ID = sq.EntryId
I assumed you actually wanted to join with ContestantTable's EntryId column. It made more sense to me. Either way (joining my way or yours) your sample data is faulty.
Apart from that, you didn't want repeating CharityNames. That's not the job of SQL. The database is just there to store and retrieve the data. Not to format it nicely. You want to work with the data on application layer anyways. Removing repeating data doesn't make this job easier, it makes it worse.
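To illustrate what "do it on the application layer" could look like, here is a minimal Python sketch that takes plain (CharityName, FullName) rows, as a query ordered by charity might return them, and blanks the repeated charity names only when printing. The rows mirror the desired output in the question; the function name and the column width are arbitrary choices.

```python
from itertools import groupby

# Rows as they might come back from the database, already ordered by charity.
rows = [
    ("cancer research", "Raghu Nandan"),
    ("save the childrens", "Neville Vyland"),
    ("save the childrens", "Abhishek Shukla"),
]

def format_report(rows, width=20):
    """Return display lines, showing each charity name only on its first row."""
    lines = []
    for charity, group in groupby(rows, key=lambda r: r[0]):
        for n, (_, full_name) in enumerate(group):
            label = charity if n == 0 else ""
            lines.append(f"{label:<{width}}{full_name}")
    return lines

report = format_report(rows)
print("\n".join(report))
```

The data itself stays complete and repeatable; only the presentation drops the duplicates.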
Most people do not realize that T-SQL has some cool ranking functions that can be used with grouping. Many things like reports can be done in T-SQL.
The first part of the code below creates two local temporary tables and loads them with data for testing.
The second part of the code creates the report. I use two common table expressions (CTE). I could have used two more local temporary tables or table variables. It really does not matter with this toy example.
The cte_RankData has two columns RowNum and RankNum. If RowNum = RankNum, we are on the first instance of charity. Print out the charity name and the total number of votes. Otherwise, print out blanks.
The name of the contestant and the votes for that contestant are shown on the detail lines. This is a typical report with subtotals shown at the top.
I think this matches the report output that you wanted. I ordered the contestants by most votes descending.
Sincerely
John Miner
www.craftydba.com
--
-- Create the tables
--
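-- Remove the tables if they are left over from an earlier run
if object_id('tempdb..#tbl_Entry') is not null drop table #tbl_Entry;
if object_id('tempdb..#tbl_Contestants') is not null drop table #tbl_Contestants;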
-- The entries table
Create table #tbl_Entry
(
ID int,
CharityName varchar(25),
Title varchar(25),
VoteCount int
);
-- Add data
Insert Into #tbl_Entry values
(1, 'save the childrens', 'save them', 1),
(2, 'save the childrens', 'saving childrens', 3),
(3, 'cancer research', 'support them', 10)
-- The contestants table
Create table #tbl_Contestants
(
ID int,
FirstName varchar(25),
LastName varchar(25),
EntryId int
);
-- Add data
Insert Into #tbl_Contestants values
(1, 'Neville', 'Vyland', 1),
(2, 'Abhishek', 'Shukla', 1),
(3, 'Raghu', 'Nandan', 2);
--
-- Create the report
--
;
with cte_RankData
as
(
select
ROW_NUMBER() OVER (ORDER BY E.CharityName ASC, VoteCount Desc) as RowNum,
RANK() OVER (ORDER BY E.CharityName ASC) AS RankNum,
E.CharityName as CharityName,
C.FirstName + ' ' + C.LastName as FullName,
E.VoteCount
from #tbl_Entry E inner join #tbl_Contestants C on E.ID = C.ID
),
cte_SumData
as
(
select
E.CharityName,
sum(E.VoteCount) as TotalCount
from #tbl_Entry E
group by E.CharityName
)
select
case when RowNum = RankNum then
R.CharityName
else
''
end as rpt_CharityName,
case when RowNum = RankNum then
str(S.TotalCount, 5, 0)
else
''
end as rpt_TotalVotes,
FullName as rpt_ContestantName,
VoteCount as rpt_Votes4Contestant
from cte_RankData R join cte_SumData S
on R.CharityName = S.CharityName
order by R.RowNum;
Say I have the following table:
id|myId|Name
-------------
1 | 3 |Bob
2 | 3 |Chet
3 | 3 |Dave
4 | 4 |Jim
5 | 4 |Jose
-------------
Is it possible to use a recursive CTE to generate the following output:
3 | Bob, Chet, Dave
4 | Jim, Jose
I've played around with it a bit but haven't been able to get it working. Would I do better using a different technique?
I do not recommend this, but I managed to work it out.
Table:
CREATE TABLE [dbo].[names](
[id] [int] NULL,
[myId] [int] NULL,
[name] [char](25) NULL
) ON [PRIMARY]
Data:
INSERT INTO names values (1,3,'Bob')
INSERT INTO names values (2,3,'Chet')
INSERT INTO names values (3,3,'Dave')
INSERT INTO names values (4,4,'Jim')
INSERT INTO names values (5,4,'Jose')
INSERT INTO names values (6,5,'Nick')
Query:
WITH CTE (id, myId, Name, NameCount)
AS (SELECT id,
myId,
Cast(Name AS VARCHAR(225)) Name,
1 NameCount
FROM (SELECT Row_number() OVER (PARTITION BY myId ORDER BY myId) AS id,
myId,
Name
FROM names) e
WHERE id = 1
UNION ALL
SELECT e1.id,
e1.myId,
Cast(Rtrim(CTE.Name) + ',' + e1.Name AS VARCHAR(225)) AS Name,
CTE.NameCount + 1 NameCount
FROM CTE
INNER JOIN (SELECT Row_number() OVER (PARTITION BY myId ORDER BY myId) AS id,
myId,
Name
FROM names) e1
ON e1.id = CTE.id + 1
AND e1.myId = CTE.myId)
SELECT myID,
Name
FROM (SELECT myID,
Name,
(Row_number() OVER (PARTITION BY myId ORDER BY namecount DESC)) AS id
FROM CTE) AS p
WHERE id = 1
As requested, here is the XML method:
SELECT myId,
STUFF((SELECT ',' + rtrim(convert(char(50),Name))
FROM namestable b
WHERE a.myId = b.myId
FOR XML PATH('')),1,1,'') Names
FROM namestable a
GROUP BY myId
A CTE is just a glorified derived table with some extra features (like recursion). The question is, can you use recursion to do this? Probably, but it's using a screwdriver to pound in a nail. The nice part about doing the XML path (seen in the first answer) is it will combine grouping the MyId column with string concatenation.
How would you concatenate a list of strings using a CTE? I don't think that's its purpose.
A CTE is just a temporarily-created relation (tables and views are both relations) which only exists for the "life" of the current query.
I've played with the CTE names and the field names. I really don't like reusing field names like id in multiple places; I tend to think those get confusing. And since the only use for names.id is in the ORDER BY of the first ROW_NUMBER() call, I don't reuse it going forward.
WITH namesNumbered as (
select myId, Name,
ROW_NUMBER() OVER (
PARTITION BY myId
ORDER BY id
) as nameNum
FROM names
)
, namesJoined(myId, Name, nameCount) as (
SELECT myId,
Cast(Name AS VARCHAR(225)),
1
FROM namesNumbered nn1
WHERE nameNum = 1
UNION ALL
SELECT nn2.myId,
Cast(
Rtrim(nj.Name) + ',' + nn2.Name
AS VARCHAR(225)
),
nn2.nameNum
FROM namesJoined nj
INNER JOIN namesNumbered nn2 ON nn2.myId = nj.myId
and nn2.nameNum = nj.nameCount + 1
)
SELECT myId, Name
FROM (
SELECT myID, Name,
ROW_NUMBER() OVER (
PARTITION BY myId
ORDER BY nameCount DESC
) AS finalSort
FROM namesJoined
) AS tmp
WHERE finalSort = 1
The first CTE, namesNumbered, returns two fields we care about and a sorting value; we can't just use names.id for this because we need, for each myId value, to have values of 1, 2, .... names.id will have 1, 2, ... for the first myId value, but it will have a higher starting value for subsequent myId values.
The second CTE, namesJoined, has to have the field names specified in the CTE signature because it will be recursive. The base case (part before UNION ALL) gives us records where nameNum = 1. We have to CAST() the Name field because it will grow with subsequent passes; we need to ensure that we CAST() it large enough to handle any of the outputs; we can always TRIM() it later, if needed. We don't have to specify aliases for the fields because the CTE signature provides those. The recursive case (after the UNION ALL) joins the current CTE with the prior one, ensuring that subsequent passes use ever-higher nameNum values. We need to TRIM() the prior iterations of Name, then add the comma and the new Name. The result will be, implicitly, CAST()ed to a larger field.
The final query grabs only the fields we care about (myId, Name) and, within the subquery, pointedly re-sorts the records so that the highest namesJoined.nameCount value will get a 1 as the finalSort value. Then, we tell the WHERE clause to only give us this one record (for each myId value).
Yes, I aliased the subquery as tmp, which is about as generic as you can get. Most SQL engines require that you give a subquery an alias, even if it's the only relation visible at that point.
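If you want to try the recursive approach without a SQL Server instance, the sketch below runs the same three-stage shape on SQLite through Python's sqlite3 (window functions need SQLite 3.25 or later). It is an adaptation, not the query above verbatim: SQLite wants WITH RECURSIVE, uses || for concatenation, and needs no CAST or Rtrim because the column is plain TEXT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE names (id INT, myId INT, Name TEXT);
INSERT INTO names VALUES
  (1, 3, 'Bob'), (2, 3, 'Chet'), (3, 3, 'Dave'), (4, 4, 'Jim'), (5, 4, 'Jose');
""")

rows = conn.execute("""
WITH RECURSIVE
namesNumbered AS (
    -- 1, 2, ... within each myId, in id order
    SELECT myId, Name,
           ROW_NUMBER() OVER (PARTITION BY myId ORDER BY id) AS nameNum
    FROM names
),
namesJoined(myId, Name, nameCount) AS (
    -- base case: the first name of each group
    SELECT myId, Name, 1
    FROM namesNumbered
    WHERE nameNum = 1
    UNION ALL
    -- recursive case: append the next name in the same group
    SELECT nn2.myId, nj.Name || ', ' || nn2.Name, nn2.nameNum
    FROM namesJoined nj
    INNER JOIN namesNumbered nn2
            ON nn2.myId = nj.myId
           AND nn2.nameNum = nj.nameCount + 1
)
SELECT myId, Name
FROM (
    SELECT myId, Name,
           ROW_NUMBER() OVER (PARTITION BY myId ORDER BY nameCount DESC) AS finalSort
    FROM namesJoined
) AS tmp
WHERE finalSort = 1
ORDER BY myId
""").fetchall()

for my_id, joined in rows:
    print(my_id, "|", joined)
```

Each pass of the recursive part adds one name, so the row with the highest nameCount per myId is the complete list, which is what the finalSort = 1 filter keeps.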