Inserting and transforming data from SQL table - sql

I have a question which has been bugging me for a couple of days now. I have a table with:
Date
ID
Status_ID
Start_Time
End_Time
Status_Time(seconds) (How ling they were in a certain status, in seconds)
I want to put this data in another table, that has the Status_ID grouped up as columns. This table has columns like this:
Date
ID
Lunch (in seconds)
Break(in seconds)
Vacation, (in seconds) etc.
So, Status_ID 2 and 3 might be grouped under vacation, Status_ID 1 lunch, etc.
I have thought of doing a Case nested in a while loop, to go through every row to insert into my other table. However, I cannot wrap my head around inserting this data from Status_ID in rows, to columns that they are now grouped by.

There's no need for a WHILE loop.
SELECT
date,
id,
SUM(CASE WHEN status_id = 1 THEN status_time ELSE 0 END) AS lunch,
SUM(CASE WHEN status_id = 2 THEN status_time ELSE 0 END) AS break,
SUM(CASE WHEN status_id = 3 THEN status_time ELSE 0 END) AS vacation
FROM
My_Table
GROUP BY
date,
id
Also, keeping the status_time in the table is a mistake (unless it's a non-persistent, calculated column). You are effectively storing the same data in two places in the database, which is going to end up resulting in inconsistencies. The same goes for pushing this data into another table with times broken out by status type. Don't create a new table to hold the data, use the query to get the data when you need it.

This type of query (that transpose values from rows into columns) is named pivot query (SQL Server) or crosstab (Access).
There is two types of pivot queries (generally speaking):
With a fixed number of columns.
With a dynamic number of columns.
SQL Server support both types but:
Database Engine (query language: T-SQL) support directly only pivot
queries with a fixed number of columns(1) and indirectly (2)
Analysis Services (query language: MDX) support directly both types (1 & 2).
Also, you can query(MDX) Analysis Service data sources from T-SQL using OPENQUERY/OPENROWSET functions or using a linked server with four-part names.
T-SQL (only) solutions:
For the first type (1), starting with SQL Server 2005 you can use the PIVOT operator:
SELECT pvt.*
FROM
(
SELECT Date, Id, Status_ID, Status_Time
FROM Table
) src
PIVOT ( SUM(src.Status_Time) FOR src.Status_ID IN ([1], [2], [3]) ) pvt
or
SELECT pvt.Date, pvt.Id, pvt.[1] AS Lunch, pvt.[2] AS [Break], pvt.[3] Vacation
FROM
(
SELECT Date, Id, Status_ID, Status_Time
FROM Table
) src
PIVOT ( SUM(src.Status_Time) FOR src.Status_ID IN ([1], [2], [3]) ) pvt
For a dynamic number of columns (2), T-SQL offers only an indirect solution: dynamic queries. First, you must find all distinct values from Status_ID and the next move is to build the final query:
DECLARE #SQLStatement NVARCHAR(4000)
,#PivotValues NVARCHAR(4000);
SET #PivotValues = '';
SELECT #PivotValues = #PivotValues + ',' + QUOTENAME(src.Status_ID)
FROM
(
SELECT DISTINCT Status_ID
FROM Table
) src;
SET #PivotValues = SUBSTRING(#PivotValues,2,4000);
SELECT #SQLStatement =
'SELECT pvt.*
FROM
(
SELECT Date, Id, Status_ID, Status_Time
FROM Table
) src
PIVOT ( SUM(src.Status_Time) FOR src.Status_ID IN ('+#PivotValues+') ) pvt';
EXECUTE sp_executesql #SQLStatement;

Related

Datediff between two tables

I have those two tables
1-Add to queue table
TransID , ADD date
10 , 10/10/2012
11 , 14/10/2012
11 , 18/11/2012
11 , 25/12/2012
12 , 1/1/2013
2-Removed from queue table
TransID , Removed Date
10 , 15/1/2013
11 , 12/12/2012
11 , 13/1/2013
11 , 20/1/2013
The TansID is the key between the two tables , and I can't modify those tables, what I want is to query the amount of time each transaction spent in the queue
It's easy when there is one item in each table , but when the item get queued more than once how do I calculate that?
Assuming the order TransIDs are entered into the Add table is the same order they are removed, you can use the following:
WITH OrderedAdds AS
( SELECT TransID,
AddDate,
[RowNumber] = ROW_NUMBER() OVER(PARTITION BY TransID ORDER BY AddDate)
FROM AddTable
), OrderedRemoves AS
( SELECT TransID,
RemovedDate,
[RowNumber] = ROW_NUMBER() OVER(PARTITION BY TransID ORDER BY RemovedDate)
FROM RemoveTable
)
SELECT OrderedAdds.TransID,
OrderedAdds.AddDate,
OrderedRemoves.RemovedDate,
[DaysInQueue] = DATEDIFF(DAY, OrderedAdds.AddDate, ISNULL(OrderedRemoves.RemovedDate, CURRENT_TIMESTAMP))
FROM OrderedAdds
LEFT JOIN OrderedRemoves
ON OrderedAdds.TransID = OrderedRemoves.TransID
AND OrderedAdds.RowNumber = OrderedRemoves.RowNumber;
The key part is that each record gets a rownumber based on the transaction id and the date it was entered, you can then join on both rownumber and transID to stop any cross joining.
Example on SQL Fiddle
DISCLAIMER: There is probably problem with this, but i hope to send you in one possible direction. Make sure to expect problems.
You can try in the following direction (which might work in some way depending on your system, version, etc) :
SELECT transId, (sum(add_date_sum) - sum(remove_date_sum)) / (1000*60*60*24)
FROM
(
SELECT transId, (SUM(UNIX_TIMESTAMP(add_date)) as add_date_sum, 0 as remove_date_sum
FROM add_to_queue
GROUP BY transId
UNION ALL
SELECT transId, 0 as add_date_sum, (SUM(UNIX_TIMESTAMP(remove_date)) as remove_date_sum
FROM remove_from_queue
GROUP BY transId
)
GROUP BY transId;
A bit of explanation: as far as I know, you cannot sum dates, but you can convert them to some sort of timestamps. Check if UNIX_TIMESTAMPS works for you, or figure out something else. Then you can sum in each table, create union by conveniently leaving the other one as zeto and then subtracting the union query.
As for that devision in the end of first SELECT, UNIT_TIMESTAMP throws out miliseconds, you devide to get days - or whatever it is that you want.
This all said - I would probably solve this using a stored procedure or some client script. SQL is not a weapon for every battle. Making two separate queries can be much simpler.
Answer 2: after your comments. (As a side note, some of your dates 15/1/2013,13/1/2013 do not represent proper date formats )
select transId, sum(numberOfDays) totalQueueTime
from (
select a.transId,
datediff(day,a.addDate,isnull(r.removeDate,a.addDate)) numberOfDays
from AddTable a left join RemoveTable r on a.transId = r.transId
order by a.transId, a.addDate, r.removeDate
) X
group by transId
Answer 1: before your comments
Assuming that there won't be a new record added unless it is being removed. Also note following query will bring numberOfDays as zero for unremoved records;
select a.transId, a.addDate, r.removeDate,
datediff(day,a.addDate,isnull(r.removeDate,a.addDate)) numberOfDays
from AddTable a left join RemoveTable r on a.transId = r.transId
order by a.transId, a.addDate, r.removeDate

Finding the number of concurrent days two events happen over the course of time using a calendar table

I have a table with a structure
(rx)
clmID int
patid int
drugclass char(3)
drugName char(25)
fillDate date
scriptEndDate date
strength int
And a query
;with PatientDrugList(patid, filldate,scriptEndDate,drugClass,strength)
as
(
select rx.patid,rx.fillDate,rx.scriptEndDate,rx.drugClass,rx.strength
from rx
)
,
DrugList(drugName)
as
(
select x.drugClass
from (values('h3a'),('h6h'))
as x(drugClass)
where x.drugClass is not null
)
SELECT PD.patid, C.calendarDate AS overlap_date
FROM PatientDrugList AS PD, Calendar AS C
WHERE drugClass IN ('h3a','h6h')
AND calendardate BETWEEN filldate AND scriptenddate
GROUP BY PD.patid, C.CalendarDate
HAVING COUNT(DISTINCT drugClass) = 2
order by pd.patid,c.calendarDate
The Calendar is simple a calendar table with all possible dates throughout the length of the study with no other columns.
My query returns data that looks like
The overlap_date represents every day that a person was prescribed a drug in the two classes listed after the PatientDrugList CTE.
I would like to find the number of consecutive days that each person was prescribed both families of drugs. I can't use a simple max and min aggregate because that wouldn't tell me if someone stopped this regimen and then started again. What is an efficient way to find this out?
EDIT: The row constructor in the DrugList CTE should be a parameter for a stored procedure and was amended for the purposes of this example.
You are looking for consecutive sequences of dates. The key observation is that if you subtract a sequence from the dates, you'll get a constant date. This defines a group of dates all in sequence, which can then be grouped.
select patid
,MIN(overlap_date) as start_overlap
,MAX(overlap_date) as end_overlap
from(select cte.*,(dateadd(day,row_number() over(partition by patid order by overlap_Date),overlap_date)) as groupDate
from cte
)t
group by patid, groupDate
This code is untested, so it might have some typos.
You need to pivot on something and a max and min work that out. Can you state if someone had both drugs on a date pivot? Then you would be limiting by date if I understand your question correctly.
EG Example SQL:
declare #Temp table ( person varchar(8), dt date, drug varchar(8));
insert into #Temp values ('Brett','1-1-2013', 'h3a'),('Brett', '1-1-2013', 'h6h'),('Brett','1-2-2013', 'h3a'),('Brett', '1-2-2013', 'h6h'),('Joe', '1-1-2013', 'H3a'),('Joe', '1-2-2013', 'h6h');
with a as
(
select
person
, dt
, max(case when drug = 'h3a' then 1 else 0 end) as h3a
, max(case when drug = 'h6h' then 1 else 0 end) as h6h
from #Temp
group by person, dt
)
, b as
(
select *, case when h3a = 1 and h6h = 1 then 1 end as Logic
from a
)
select person, count(Logic) as DaysOnBothPresriptions
from b
group by person

SQL Pivot Command

I am looking for some help on designing a simple pivot so that I can link it into other parts of my queries.
My data is like this
Items Table
Below is my table if I run Select * from items
ITEM Weight
12345 10
12345 11
654321 50
654321 20
654321 100
There are hundreds of Items in this table but each item code will only ever have
maximum of 3 weight records each.
I want the desired output
ITEM Weight_1 Weight_2 Weight_3
12345 10 11 null
654321 50 20 100
Would appreciate any suggestions,
I have played around with pivots but each subsequent item puts the weights into weight 4,5,6,7,etc
instead of starting at weight1 for each item.
Thanks
Update
Below is what I have used so far,
SELECT r.*
FROM (SELECT 'weight' + CAST(Row_number() OVER (ORDER BY regtime ASC)AS
VARCHAR(10))
line,
id,
weight
FROM items it) AS o PIVOT(MIN([weight]) FOR line IN (weight1, weight2,
weight3)) AS r
You were almost there! You were only missing the PARTITION BY clause in OVER:
SELECT r.*
FROM (SELECT 'weight' + CAST(Row_number() OVER (PARTITION BY id ORDER BY
regtime ASC)
AS
VARCHAR(10)) line,
id,
weight
FROM items it) AS o PIVOT(MIN([weight]) FOR line IN (weight1, weight2,
weight3)) AS r
When you PARTITION BY by ID, the row numbers are reset for each different ID.
Update
You do not need dynamic pivot, since you will always have 3 weights. But, if you ever need dynamic number of columns, take a look at some of the examples here:
SQL Server PIVOT perhaps?
Pivot data in T-SQL
How do I build a summary by joining to a single table with SQL Server?
You will need a value to form the columns which I do with row_number. The outcome is what you want. The only negative that I have against PIVOT is that you need to know how many columns in advance. I use a similar method, but build up the select as dynamic SQL and can then insert my columns.
EDIT: updated to show columns as weight1, weight2, etc.
create table #temp (Item int, Weight int)
insert into #temp (Item, Weight)
Values (12345, 10),
(12345, 11),
(654321, 50),
(654321, 20),
(654321, 200)
SELECT *
FROM (SELECT Item,
Weight,
'weight' + cast(Row_number()
OVER (partition by Item order by item) as varchar(10)) as seq
FROM #temp) as Src
PIVOT ( MAX(Weight) FOR Seq IN ([Weight1], [Weight2], [Weight3]) ) as PVT
MySQL
Whenever you need a pivot, use group_concat it will output a CSV list of the values you need.
Once you get used to working with it, it's a great tool.
SELECT item, GROUP_CONCAT(weight) as weights FROM table1
GROUP BY item
See: http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_group-concat
TSQL aka SQL-server
Many many questions on this because T-SQL supports a pivot keyword.
See:
Transact SQL Query-Pivot-SQL
Pivot data in T-SQL

SQL PIVOT TABLE

I have the following data:
ID Data
1 tera
1 add
1 alkd
2 adf
2 add
3 wer
4 minus
4 add
4 ten
I am trying to use a pivot table to push the rows into 1 row with multiple columns per ID.
So as follows:
ID Custom1 Custom2 Custom3 Custom4..........
1 tera add alkd
2 adf add
3 wer
4 minus add ten
I have the following query so far:
INSERT INTO #SpeciInfo
(ID, [Custom1], [Custom2], [Custom3], [Custom4], [Custom5],[Custom6],[Custom7],[Custom8],[Custom9],[Custom10],[Custom11],[Custom12],[Custom13],[Custom14],[Custom15],[Custom16])
SELECT
ID,
[Custom1],
[Custom2],
[Custom3],
[Custom4],
[Custom5],
[Custom6],
[Custom7],
[Custom8],
[Custom9],
[Custom10],
[Custom11],
[Custom12],
[Custom13],
[Custom14],
[Custom15],
[Custom16]
FROM SpeciInfo) p
PIVOT
(
(
[Custom1],
[Custom2],
[Custom3],
[Custom4],
[Custom5],
[Custom6],
[Custom7],
[Custom8],
[Custom9],
[Custom10],
[Custom11],
[Custom12],
[Custom13],
[Custom14],
[Custom15],
[Custom16]
)
) AS pvt
ORDER BY ID;
I need the 16 fields, but I am not exactly sure what I do in the From clause or if I'm even doing that correctly?
Thanks
If what you seek is to dynamically build the columns, that is often called a dynamic crosstab and cannot be done in T-SQL without resorting to dynamic SQL (building the string of the query) which is not recommended. Instead, you should build that query in your middle tier or reporting application.
If you simply want a static solution, an alternative to using PIVOT of what you seek might look something like so in SQL Server 2005 or later:
With NumberedItems As
(
Select Id, Data
, Row_Number() Over( Partition By Id Order By Data ) As ColNum
From SpeciInfo
)
Select Id
, Min( Case When Num = 1 Then Data End ) As Custom1
, Min( Case When Num = 2 Then Data End ) As Custom2
, Min( Case When Num = 3 Then Data End ) As Custom3
, Min( Case When Num = 4 Then Data End ) As Custom4
...
From NumberedItems
Group By Id
One serious problem in your original data is that there is no indicator of sequence and thus there is no means for the system to know which item for a given ID should appear in the Custom1 column as opposed to the Custom2 column. In my query above, I arbitrarily ordered by name.

Weighted average in T-SQL (like Excel's SUMPRODUCT)

I am looking for a way to derive a weighted average from two rows of data with the same number of columns, where the average is as follows (borrowing Excel notation):
(A1*B1)+(A2*B2)+...+(An*Bn)/SUM(A1:An)
The first part reflects the same functionality as Excel's SUMPRODUCT() function.
My catch is that I need to dynamically specify which row gets averaged with weights, and which row the weights come from, and a date range.
EDIT: This is easier than I thought, because Excel was making me think I required some kind of pivot. My solution so far is thus:
select sum(baseSeries.Actual * weightSeries.Actual) / sum(weightSeries.Actual)
from (
select RecordDate , Actual
from CalcProductionRecords
where KPI = 'Weighty'
) baseSeries inner join (
select RecordDate , Actual
from CalcProductionRecords
where KPI = 'Tons Milled'
) weightSeries on baseSeries.RecordDate = weightSeries.RecordDate
Quassnoi's answer shows how to do the SumProduct, and using a WHERE clause would allow you to restrict by a Date field...
SELECT
SUM([tbl].data * [tbl].weight) / SUM([tbl].weight)
FROM
[tbl]
WHERE
[tbl].date >= '2009 Jan 01'
AND [tbl].date < '2010 Jan 01'
The more complex part is where you want to "dynamically specify" the what field is [data] and what field is [weight]. The short answer is that realistically you'd have to make use of Dynamic SQL. Something along the lines of:
- Create a string template
- Replace all instances of [tbl].data with the appropriate data field
- Replace all instances of [tbl].weight with the appropriate weight field
- Execute the string
Dynamic SQL, however, carries it's own overhead. Is the queries are relatively infrequent , or the execution time of the query itself is relatively long, this may not matter. If they are common and short, however, you may notice that using dynamic sql introduces a noticable overhead. (Not to mention being careful of SQL injection attacks, etc.)
EDIT:
In your lastest example you highlight three fields:
RecordDate
KPI
Actual
When the [KPI] is "Weight Y", then [Actual] the Weighting Factor to use.
When the [KPI] is "Tons Milled", then [Actual] is the Data you want to aggregate.
Some questions I have are:
Are there any other fields?
Is there only ever ONE actual per date per KPI?
The reason I ask being that you want to ensure the JOIN you do is only ever 1:1. (You don't want 5 Actuals joining with 5 Weights, giving 25 resultsing records)
Regardless, a slight simplification of your query is certainly possible...
SELECT
SUM([baseSeries].Actual * [weightSeries].Actual) / SUM([weightSeries].Actual)
FROM
CalcProductionRecords AS [baseSeries]
INNER JOIN
CalcProductionRecords AS [weightSeries]
ON [weightSeries].RecordDate = [baseSeries].RecordDate
-- AND [weightSeries].someOtherID = [baseSeries].someOtherID
WHERE
[baseSeries].KPI = 'Tons Milled'
AND [weightSeries].KPI = 'Weighty'
The commented out line only needed if you need additional predicates to ensure a 1:1 relationship between your data and the weights.
If you can't guarnatee just One value per date, and don't have any other fields to join on, you can modify your sub_query based version slightly...
SELECT
SUM([baseSeries].Actual * [weightSeries].Actual) / SUM([weightSeries].Actual)
FROM
(
SELECT
RecordDate,
SUM(Actual)
FROM
CalcProductionRecords
WHERE
KPI = 'Tons Milled'
GROUP BY
RecordDate
)
AS [baseSeries]
INNER JOIN
(
SELECT
RecordDate,
AVG(Actual)
FROM
CalcProductionRecords
WHERE
KPI = 'Weighty'
GROUP BY
RecordDate
)
AS [weightSeries]
ON [weightSeries].RecordDate = [baseSeries].RecordDate
This assumes the AVG of the weight is valid if there are multiple weights for the same day.
EDIT : Someone just voted for this so I thought I'd improve the final answer :)
SELECT
SUM(Actual * Weight) / SUM(Weight)
FROM
(
SELECT
RecordDate,
SUM(CASE WHEN KPI = 'Tons Milled' THEN Actual ELSE NULL END) AS Actual,
AVG(CASE WHEN KPI = 'Weighty' THEN Actual ELSE NULL END) AS Weight
FROM
CalcProductionRecords
WHERE
KPI IN ('Tons Milled', 'Weighty')
GROUP BY
RecordDate
)
AS pivotAggregate
This avoids the JOIN and also only scans the table once.
It relies on the fact that NULL values are ignored when calculating the AVG().
SELECT SUM(A * B) / SUM(A)
FROM mytable
If I have understand the problem then try this
SET DATEFORMAT dmy
declare #tbl table(A int, B int,recorddate datetime,KPI varchar(50))
insert into #tbl
select 1,10 ,'21/01/2009', 'Weighty'union all
select 2,20,'10/01/2009', 'Tons Milled' union all
select 3,30 ,'03/02/2009', 'xyz'union all
select 4,40 ,'10/01/2009', 'Weighty'union all
select 5,50 ,'05/01/2009', 'Tons Milled'union all
select 6,60,'04/01/2009', 'abc' union all
select 7,70 ,'05/01/2009', 'Weighty'union all
select 8,80,'09/01/2009', 'xyz' union all
select 9,90 ,'05/01/2009', 'kws' union all
select 10,100,'05/01/2009', 'Tons Milled'
select SUM(t1.A*t2.A)/SUM(t2.A)Result from
(select RecordDate,A,B,KPI from #tbl)t1
inner join(select RecordDate,A,B,KPI from #tbl t)t2
on t1.RecordDate = t2.RecordDate
and t1.KPI = t2.KPI