SQL PIVOT TABLE - sql

I have the following data:
ID Data
1 tera
1 add
1 alkd
2 adf
2 add
3 wer
4 minus
4 add
4 ten
I am trying to use a pivot table to push the rows into 1 row with multiple columns per ID.
So as follows:
ID Custom1 Custom2 Custom3 Custom4..........
1 tera add alkd
2 adf add
3 wer
4 minus add ten
I have the following query so far:
INSERT INTO #SpeciInfo
(ID, [Custom1], [Custom2], [Custom3], [Custom4], [Custom5],[Custom6],[Custom7],[Custom8],[Custom9],[Custom10],[Custom11],[Custom12],[Custom13],[Custom14],[Custom15],[Custom16])
SELECT
ID,
[Custom1],
[Custom2],
[Custom3],
[Custom4],
[Custom5],
[Custom6],
[Custom7],
[Custom8],
[Custom9],
[Custom10],
[Custom11],
[Custom12],
[Custom13],
[Custom14],
[Custom15],
[Custom16]
FROM SpeciInfo) p
PIVOT
(
(
[Custom1],
[Custom2],
[Custom3],
[Custom4],
[Custom5],
[Custom6],
[Custom7],
[Custom8],
[Custom9],
[Custom10],
[Custom11],
[Custom12],
[Custom13],
[Custom14],
[Custom15],
[Custom16]
)
) AS pvt
ORDER BY ID;
I need the 16 fields, but I am not exactly sure what I do in the From clause or if I'm even doing that correctly?
Thanks

If you want the columns to be built dynamically, that is often called a dynamic crosstab. It cannot be done in T-SQL without resorting to dynamic SQL (building the query as a string), which is not recommended. Instead, you should build that query in your middle tier or reporting application.
If you simply want a static solution, an alternative to PIVOT might look something like this in SQL Server 2005 or later:
With NumberedItems As
(
Select Id, Data
, Row_Number() Over( Partition By Id Order By Data ) As ColNum
From SpeciInfo
)
Select Id
, Min( Case When ColNum = 1 Then Data End ) As Custom1
, Min( Case When ColNum = 2 Then Data End ) As Custom2
, Min( Case When ColNum = 3 Then Data End ) As Custom3
, Min( Case When ColNum = 4 Then Data End ) As Custom4
...
From NumberedItems
Group By Id
One serious problem in your original data is that there is no indicator of sequence, so the system has no way to know which item for a given ID should appear in the Custom1 column as opposed to the Custom2 column. In my query above, I arbitrarily ordered by the Data value.
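To make the row-numbering idea concrete outside the database, here is a minimal Python sketch of the same logic: number the items within each ID, then place item n in column n. The function name `pivot_rows` and the `max_cols` parameter are assumptions made for illustration; they do not come from the question or the answer.

```python
from collections import defaultdict

def pivot_rows(rows, max_cols=4):
    """Spread (id, data) pairs into one row per id with up to max_cols columns."""
    grouped = defaultdict(list)
    for row_id, data in rows:
        grouped[row_id].append(data)

    result = []
    for row_id in sorted(grouped):
        # Sorting by value mirrors ORDER BY Data in Row_Number(); the source
        # rows carry no sequence column, so this order is arbitrary.
        values = sorted(grouped[row_id])[:max_cols]
        # Pad with None (NULL in SQL) so every id has the same column count.
        values += [None] * (max_cols - len(values))
        result.append((row_id, *values))
    return result

rows = [(1, "tera"), (1, "add"), (1, "alkd"), (2, "adf"), (2, "add"),
        (3, "wer"), (4, "minus"), (4, "add"), (4, "ten")]
for line in pivot_rows(rows):
    print(line)
```

Because the items are sorted by value, ID 1 comes out as add, alkd, tera rather than in the order shown in the question, which is exactly the sequencing problem described above.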

Related

How to select the best item in each group?

I have table reports:
id file_name
1 jan.xml
2 jan.csv
3 feb.csv
In plain terms: there are reports for each month. Each report can be in XML or CSV format, so there are 1-2 reports per month, each in a distinct format.
I want to select the reports for all months, picking only 1 file for each month. The XML format is preferred.
So, the expected output is:
id file_name
1 jan.xml
3 feb.csv
Explanation: the file jan.csv was excluded since there is a preferred report for that month: jan.xml.
As mentioned in the comments, your data structure has a number of challenges. It really needs a column such as ReportDate that is a date/datetime, so you know which month the report belongs to. That would also give you something to sort by when you get your data back. Aside from those much-needed improvements, you can get the desired results from your sample data with something like this.
create table SomeFileTable
(
id int
, file_name varchar(10)
)
insert SomeFileTable
select 1, 'jan.xml' union all
select 2, 'jan.csv' union all
select 3, 'feb.csv'
select s.id
, s.file_name
from
(
select *
, FileName = parsename(file_name, 2)
, FileExtension = parsename(file_name, 1)
, RowNum = ROW_NUMBER() over(partition by parsename(file_name, 2) order by case parsename(file_name, 1) when 'xml' then 1 else 2 end)
from SomeFileTable
) s
where s.RowNum = 1
--ideally you would want to order the results but you don't have much of anything to work with in your data as a reliable sorting order since the dates are implied by the file name
You may want to use a window function that ranks your rows by partitioning on the month and ordering by the format name, by working on the file_name field.
WITH ranked_reports AS (
SELECT
id,
file_name,
ROW_NUMBER() OVER(
PARTITION BY LEFT(file_name, 3)
ORDER BY RIGHT(file_name, 3) DESC
) AS rank
FROM
reports
)
SELECT
id,
file_name
FROM
ranked_reports
WHERE
rank = 1
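As a cross-check of the "rank within each month, keep rank 1" idea used in both answers, the same selection can be sketched in plain Python. The function name `pick_reports` and the `preferred` parameter are hypothetical, and the sketch assumes the month is everything before the last dot in the file name.

```python
def pick_reports(reports, preferred=("xml", "csv")):
    """Keep one (id, file_name) per month, preferring earlier entries in `preferred`."""
    best = {}
    for report_id, file_name in reports:
        month, _, ext = file_name.rpartition(".")
        # Lower rank is better; unknown formats rank after all preferred ones.
        rank = preferred.index(ext) if ext in preferred else len(preferred)
        # Keep the first row seen for a month, or replace it with a better-ranked one.
        if month not in best or rank < best[month][0]:
            best[month] = (rank, report_id, file_name)
    return sorted((rid, name) for _, rid, name in best.values())

reports = [(1, "jan.xml"), (2, "jan.csv"), (3, "feb.csv")]
print(pick_reports(reports))
```

Using an explicit preference list rather than sorting the extension alphabetically keeps the choice readable if more formats are added later.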

SQL Server iterating through time series data

I am using SQL Server and wondering if it is possible to iterate through time series data until specific condition is met and based on that label my data in other table?
For example, let's say I have a table like this:
Id Date Some_kind_of_event
+--+----------+------------------
1 |2018-01-01|dsdf...
1 |2018-01-06|sdfs...
1 |2018-01-29|fsdfs...
2 |2018-05-10|sdfs...
2 |2018-05-11|fgdf...
2 |2018-05-12|asda...
3 |2018-02-15|sgsd...
3 |2018-02-16|rgw...
3 |2018-02-17|sgs...
3 |2018-02-28|sgs...
What I want is to calculate, for each key, the difference between each pair of adjacent events and find out whether any pair is more than 10 days apart. If so, I want to stop iterating for that key and put the label 'inactive' in my other table, otherwise 'active'. After we finish with one key, we start with another.
So for example id = 1 would get the label 'inactive' because there are two adjacent dates that differ by more than 10 days. The final result would look like this:
Id Label
+--+----------+
1 |inactive
2 |active
3 |inactive
Any ideas how to do that? Is it possible to do it with SQL?
When working with a DBMS you need to get away from the idea of thinking iteratively. Instead you need to try and think in sets. "Instead of thinking about what you want to do to a row, think about what you want to do to a column."
If I understand correctly, is this what you're after?
CREATE TABLE SomeEvent (ID int, EventDate date, EventName varchar(10));
INSERT INTO SomeEvent
VALUES (1,'20180101','dsdf...'),
(1,'20180106','sdfs...'),
(1,'20180129','fsdfs..'),
(2,'20180510','sdfs...'),
(2,'20180511','fgdf...'),
(2,'20180512','asda...'),
(3,'20180215','sgsd...'),
(3,'20180216','rgw....'),
(3,'20180217','sgs....'),
(3,'20180228','sgs....');
GO
WITH Gaps AS(
SELECT *,
DATEDIFF(DAY,LAG(EventDate) OVER (PARTITION BY ID ORDER BY EventDate),EventDate) AS EventGap
FROM SomeEvent)
SELECT ID,
CASE WHEN MAX(EventGap) > 10 THEN 'inactive' ELSE 'active' END AS Label
FROM Gaps
GROUP BY ID
ORDER BY ID;
GO
DROP TABLE SomeEvent;
GO
This assumes you are using SQL Server 2012+, as it uses the LAG function, and SQL Server 2008 has less than 12 months of any kind of support.
Try this. Note: replace #MyTable with your actual table.
WITH Diffs AS (
SELECT
Id
-- PARTITION BY keeps each comparison within one Id; the last row per Id gets NULL
,DATEDIFF(DAY,[Date],LEAD([Date]) OVER (PARTITION BY [Id] ORDER BY [Date])) Diff
FROM #MyTable)
SELECT
Id
,CASE WHEN MAX(Diff) > 10 THEN 'Inactive' ELSE 'Active' END AS Label
FROM Diffs
GROUP BY Id
Just to share another approach (without a CTE).
SELECT
ID
, CASE WHEN SUM(TotalDays) = (MAX(CNT) - 1) THEN 'Active' ELSE 'Inactive' END Label
FROM (
SELECT
ID
, EventDate
, CASE WHEN DATEDIFF(DAY, EventDate, LEAD(EventDate) OVER(PARTITION BY ID ORDER BY EventDate)) <= 10 THEN 1 ELSE 0 END TotalDays
, COUNT(ID) OVER(PARTITION BY ID) CNT
FROM EventsTable
) D
GROUP BY ID
The method counts how many records each ID has and, for each record, computes the difference in days between the current date and the next date. If the difference is 10 days or less, the row gets 1, otherwise 0 (the last row of each ID has no next date, so it also gets 0).
Then it compares: if the sum of those flags equals the number of records for the ID minus one, every gap was within the limit and the label is Active, otherwise Inactive.
This is just another approach that doesn't use CTE.
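For readers who want to verify the expected labels independently of SQL Server, the gap check can be sketched in Python. This is only an illustration of the logic (compare each date with the previous one per ID, flag the ID if any gap exceeds the limit); the function name `label_ids` and the `max_gap_days` parameter are assumptions.

```python
from datetime import date
from itertools import groupby

def label_ids(events, max_gap_days=10):
    """Return {id: 'active' | 'inactive'} from (id, date) pairs."""
    labels = {}
    # Sort by id, then date, so adjacent rows within an id are adjacent events.
    ordered = sorted(events)
    for key, group in groupby(ordered, key=lambda e: e[0]):
        dates = [d for _, d in group]
        # Difference in days between each event and the one before it.
        gaps = [(b - a).days for a, b in zip(dates, dates[1:])]
        labels[key] = "inactive" if any(g > max_gap_days for g in gaps) else "active"
    return labels

events = [
    (1, date(2018, 1, 1)), (1, date(2018, 1, 6)), (1, date(2018, 1, 29)),
    (2, date(2018, 5, 10)), (2, date(2018, 5, 11)), (2, date(2018, 5, 12)),
    (3, date(2018, 2, 15)), (3, date(2018, 2, 16)), (3, date(2018, 2, 17)),
    (3, date(2018, 2, 28)),
]
print(label_ids(events))
```

An ID with a single event has no gaps and is labelled active here, which matches what MAX over an all-NULL gap column yields in the CTE version.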

Merge Value Of Multiple Record based on similar criteria in Access/SQL/Excel

Currently I have a table of rows that looks like this
I would like to merge all rows with the same FlNo into a single row. The merged row should follow these criteria:
'FlNo' remains the same
'Start' would be the earliest date
'End' would be the latest date
'Pattern' represents the days of the week, so it would be the combination of all days of the week that appear in any of the rows (e.g. if row 1 has Pattern = "12347" and row 2 has "34567", the combined Pattern would be "1234567"; if row 1 = "357" and row 2 = "357", the combined Pattern stays "357"). This part has bothered me most, as I haven't found an algorithm to solve it.
'AC_Name' would be the value that appears most often for a FlNo (in this case 32)
So the final row would be
FlNo | Start | End | Pattern | AC_Name |
660 | 26/Mar/2017 | 28/Oct/2017 | 1234567 | 32 |
As the original data is an Excel spreadsheet, the solution should be based on an Excel (VBA) or Access (VBA/SQL) environment. It could be processed in Excel first and then imported into Access, imported into Access and processed there, or half and half. Personally I would prefer to process it in Access with SQL, as there are about 13,000 rows of data.
Please help me find a solution to process this data. Thank you all very much.
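Once you have properly fixed your data structure for the Pattern column (one column per day of the week, D1 to D7 below), you could use min(), max() and group by, joined to a subquery that picks the AC_Name with the highest count. If two AC_Name values tie for the highest count, this returns one row for each of them.
select
t1.FlNo
, min(t1.Start)
, max(t1.End)
, max(D1)
, max(D2)
, max(D3)
, max(D4)
, max(D5)
, max(D6)
, max(D7)
, t2.AC_Name
from my_table t1
INNER JOIN (
select c.FlNo, c.AC_Name
from (
select FlNo, AC_Name, count(*) AS my_count
from my_table
group by FlNo, AC_Name ) c
INNER JOIN (
select FlNo, max(my_count) AS max_count
from (
select FlNo, AC_Name, count(*) AS my_count
from my_table
group by FlNo, AC_Name ) x
group by FlNo ) m on c.FlNo = m.FlNo and c.my_count = m.max_count
) t2 on t1.FlNo = t2.FlNo
group by t1.FlNo, t2.AC_Name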
Once you have fixed the data, the query for all but Ac_Name would simply be:
select FlNo, min(start), max(end),
max(IsMonday), max(IsTuesday), . . .
from t
group by FlNo;
Getting Ac_Name is tricky. This should work:
select FlNo, min(start), max(end),
max(IsMonday), max(IsTuesday), . . .,
(select top 1 ac_name
from t as t2
where t2.FlNo = t.FlNo
group by ac_name
order by count(*) desc, ac_name
) as ac_name
from t
group by FlNo;
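Both answers assume the Pattern column has already been restructured. The combination rule the question asks about (the union of all day digits, in order) and the "most frequent AC_Name" rule can be sketched in Python as follows. The function names are hypothetical, and apart from the value 32 the sample AC_Name values are invented for illustration.

```python
from collections import Counter

def merge_patterns(patterns):
    """Union of all day-of-week digits across the patterns, in ascending order."""
    return "".join(sorted(set("".join(patterns))))

def most_common_name(names):
    """Most frequent value; ties go to the smallest name,
    similar to ORDER BY count(*) DESC, ac_name."""
    counts = Counter(names)
    return min(counts, key=lambda n: (-counts[n], n))

print(merge_patterns(["12347", "34567"]))
print(merge_patterns(["357", "357"]))
print(most_common_name(["32", "32", "73"]))  # "73" is an invented sample value
```

Treating each pattern as a set of digits is what makes repeated days collapse, which is the part that is awkward to express on a single text column in SQL.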

Index number for records within a pipe-delimited field inside a csv

I have a csv that I'm bringing into a SQL table. The csv has a field within it for CrimeType. That field is pipe delimited. So, I'm using cross apply to break up the pipe, like this:
SELECT CrimeRecords.CaseNum, CrimeRecords.Offense, PrimaryCrime.PrimaryCrime
FROM (SELECT CaseNum ,x.i.value('.','varchar(20)') AS Offense
FROM (SELECT CaseNum, CONVERT(XML,'<i>'+REPLACE(CrimeType, '|', '</i><i>') + '</i>') AS d
FROM CrimeView.dbo.tblCrimeData)x1 CROSS APPLY d.nodes('i') AS x(i)) AS CrimeRecords
Can someone help me add a step to create a field for a sequence number? Basically I just want to return the order of the items in the pipe.
For rows like:
1, Burglary|Assault
2, Burglary
3, Assault|Assault-Weapon|Theft
My result table would look like this:
CaseNum CrimeType SeqNum
1 Burglary 1
1 Assault 2
2 Burglary 1
3 Assault 1
3 Assault-Weapon 2
3 Theft 3
Edit to show that the Sequence Number resets for each CaseNum.
Edit tags to clarify that this is Microsoft SQL, not MySQL.
Try including the ROW_NUMBER() function in your SELECT statement (http://technet.microsoft.com/en-us/library/ms186734.aspx).
i.e.
SELECT ROW_NUMBER() OVER (PARTITION BY CrimeRecords.CaseNum ORDER BY CrimeRecords.CaseNum) As Idx, CrimeRecords.CaseNum, CrimeRecords.Offense, PrimaryCrime.PrimaryCrime
FROM (SELECT CaseNum ,x.i.value('.','varchar(20)') AS Offense
FROM (SELECT CaseNum, CONVERT(XML,'<i>'+REPLACE(CrimeType, '|', '</i><i>') + '</i>') AS d
FROM CrimeView.dbo.tblCrimeData)x1 CROSS APPLY d.nodes('i') AS x(i)) AS CrimeRecords
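Edit: Included Partition By to reset the sequence for each case. Note that ordering by CaseNum inside a partition on CaseNum leaves the order of rows within a case undefined, so the numbers are not guaranteed to follow the order of the items in the pipe.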
If you have a simple table CrimeRecords with columns CaseNum | CrimeType, you can do something like this (MySQL user-variable syntax):
SELECT CaseNum, CrimeType, @row := @row + 1 AS SeqNum
FROM CrimeRecords a JOIN (SELECT @row := 0) b;
I can't follow your query clearly and I can't test it against a database, but try the query I shared.
It is just an example of how you can number rows 1, 2, 3...x. Work it into your query and reset @row each time the group (CaseNum) changes to get the result you want.
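Since the data starts out as a CSV, one option is to assign the sequence number while splitting, where the position in the pipe is still known. The following Python sketch shows that idea on the sample rows from the question; the function name `split_with_sequence` is an assumption, not part of any answer above.

```python
def split_with_sequence(rows):
    """Expand (case_num, 'a|b|c') rows into (case_num, offense, seq_num) rows."""
    out = []
    for case_num, crime_type in rows:
        # enumerate() numbers the items in their original pipe order and
        # starts again at 1 for every case.
        for seq, offense in enumerate(crime_type.split("|"), start=1):
            out.append((case_num, offense, seq))
    return out

rows = [(1, "Burglary|Assault"), (2, "Burglary"), (3, "Assault|Assault-Weapon|Theft")]
for line in split_with_sequence(rows):
    print(line)
```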

Inserting and transforming data from SQL table

I have a question which has been bugging me for a couple of days now. I have a table with:
Date
ID
Status_ID
Start_Time
End_Time
Status_Time (seconds) (how long they were in a certain status, in seconds)
I want to put this data in another table, that has the Status_ID grouped up as columns. This table has columns like this:
Date
ID
Lunch (in seconds)
Break(in seconds)
Vacation, (in seconds) etc.
So, Status_ID 2 and 3 might be grouped under vacation, Status_ID 1 lunch, etc.
I have thought of doing a Case nested in a while loop, to go through every row to insert into my other table. However, I cannot wrap my head around inserting this data from Status_ID in rows, to columns that they are now grouped by.
There's no need for a WHILE loop.
SELECT
date,
id,
SUM(CASE WHEN status_id = 1 THEN status_time ELSE 0 END) AS lunch,
SUM(CASE WHEN status_id = 2 THEN status_time ELSE 0 END) AS [break],
SUM(CASE WHEN status_id = 3 THEN status_time ELSE 0 END) AS vacation
FROM
My_Table
GROUP BY
date,
id
Also, keeping the status_time in the table is a mistake (unless it's a non-persistent, calculated column). You are effectively storing the same data in two places in the database, which is going to end up resulting in inconsistencies. The same goes for pushing this data into another table with times broken out by status type. Don't create a new table to hold the data, use the query to get the data when you need it.
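The conditional-aggregation query above can be mirrored in a few lines of Python, which may help when checking the totals by hand. This is a sketch only: the `STATUS_GROUPS` mapping follows the status IDs used in the query (several IDs could map to the same column, as the question suggests), and the sample rows and the name `pivot_status_times` are invented for illustration.

```python
from collections import defaultdict

# Assumed mapping from Status_ID to output column, following the query above.
STATUS_GROUPS = {1: "lunch", 2: "break", 3: "vacation"}
COLUMNS = ("lunch", "break", "vacation")

def pivot_status_times(rows):
    """Sum status_time per (date, id), one total per status column."""
    totals = defaultdict(lambda: dict.fromkeys(COLUMNS, 0))
    for day, person_id, status_id, seconds in rows:
        # Touch the bucket first so every (date, id) group appears,
        # as it would with GROUP BY, even if its status is unmapped.
        bucket = totals[(day, person_id)]
        column = STATUS_GROUPS.get(status_id)
        if column is not None:
            bucket[column] += seconds
    return dict(totals)

# Invented sample rows: (date, id, status_id, status_time in seconds)
rows = [
    ("2024-01-01", 7, 1, 1800),
    ("2024-01-01", 7, 2, 600),
    ("2024-01-01", 7, 2, 300),
    ("2024-01-02", 7, 3, 28800),
]
print(pivot_status_times(rows))
```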
This type of query (one that transposes values from rows into columns) is called a pivot query (SQL Server) or a crosstab (Access).
Generally speaking, there are two types of pivot queries:
1. With a fixed number of columns.
2. With a dynamic number of columns.
SQL Server supports both types, but:
The Database Engine (query language: T-SQL) directly supports only pivot queries with a fixed number of columns (1) and supports a dynamic number of columns (2) only indirectly.
Analysis Services (query language: MDX) directly supports both types (1 & 2).
Also, you can query (with MDX) Analysis Services data sources from T-SQL using the OPENQUERY/OPENROWSET functions or using a linked server with four-part names.
T-SQL (only) solutions:
For the first type (1), starting with SQL Server 2005 you can use the PIVOT operator:
SELECT pvt.*
FROM
(
SELECT Date, Id, Status_ID, Status_Time
FROM My_Table -- placeholder name; TABLE itself is a reserved word
) src
PIVOT ( SUM(Status_Time) FOR Status_ID IN ([1], [2], [3]) ) pvt
or
SELECT pvt.Date, pvt.Id, pvt.[1] AS Lunch, pvt.[2] AS [Break], pvt.[3] AS Vacation
FROM
(
SELECT Date, Id, Status_ID, Status_Time
FROM My_Table -- placeholder name; TABLE itself is a reserved word
) src
PIVOT ( SUM(Status_Time) FOR Status_ID IN ([1], [2], [3]) ) pvt
For a dynamic number of columns (2), T-SQL offers only an indirect solution: dynamic queries. First, you must find all distinct values from Status_ID and the next move is to build the final query:
DECLARE @SQLStatement NVARCHAR(4000)
,@PivotValues NVARCHAR(4000);
SET @PivotValues = '';
-- Build the comma-separated, bracket-quoted list of pivot columns
SELECT @PivotValues = @PivotValues + ',' + QUOTENAME(src.Status_ID)
FROM
(
SELECT DISTINCT Status_ID
FROM My_Table -- placeholder name; TABLE itself is a reserved word
) src;
-- Drop the leading comma
SET @PivotValues = SUBSTRING(@PivotValues,2,4000);
SELECT @SQLStatement =
'SELECT pvt.*
FROM
(
SELECT Date, Id, Status_ID, Status_Time
FROM My_Table
) src
PIVOT ( SUM(Status_Time) FOR Status_ID IN ('+@PivotValues+') ) pvt';
EXECUTE sp_executesql @SQLStatement;