Converting data file to flatfile?

I'm looking to convert a data file to a flat file format, with multiple hierarchical dimensions. I have included an example, but ideally, I will have an unknown number of columns that I wish to transform, while the hierarchical dimensions will be fixed.

If the columns are unknown or variable, you can unpivot the data dynamically without using dynamic SQL: serialize each row to JSON and read it back with OPENJSON (SQL Server 2016+). Only the two key columns need to be excluded, which is what Where [key] not in ('Country','City') does.
Example
Select Country
,City
,Metric = B.[key]
,Value = B.Value
From YourTable A
Cross Apply ( Select *
From OpenJson( (Select A.* For JSON Path,Without_Array_Wrapper ) )
Where [key] not in ('Country','City')
) B
Returns
Country City Metric Value
US NY Snowfall 13
US NY Temp 94
US NY Snowfall 5
US NY Temp 84
UK London Snowfall 6
UK London Temp 85

You need to unpivot your data. UNPIVOT needs the column list spelled out, and the listed columns must share one data type:
SELECT unpvt.country
, unpvt.city
, unpvt.metrics
, unpvt.Valuess
FROM
( SELECT * FROM tablename ) p
UNPIVOT ( Valuess FOR metrics IN ( snowfall , temp) ) unpvt
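If the metric columns are not known in advance, one common workaround is to build the UNPIVOT column list from the catalog and run it as dynamic SQL. The following is a minimal sketch, not a tested solution: it assumes the table is dbo.tablename, that country and city are the fixed dimensions, that all remaining columns share one data type, and that STRING_AGG is available (SQL Server 2017+).

```sql
-- Sketch only: dbo.tablename, country and city are assumed names.
DECLARE @cols nvarchar(max), @sql nvarchar(max);

-- Collect every column except the fixed dimensions, safely quoted.
SELECT @cols = STRING_AGG(CONVERT(nvarchar(max), QUOTENAME(c.name)), N', ')
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'dbo.tablename')
  AND c.name NOT IN (N'country', N'city');

-- Splice the column list into the same UNPIVOT query shown above.
SET @sql = N'
SELECT unpvt.country, unpvt.city, unpvt.metrics, unpvt.Valuess
FROM (SELECT * FROM dbo.tablename) AS p
UNPIVOT (Valuess FOR metrics IN (' + @cols + N')) AS unpvt;';

EXEC sys.sp_executesql @sql;
```

QUOTENAME keeps unusual column names from breaking the generated statement. If the metric columns have different types, they would need to be converted to a common type inside the derived table first.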

Related

SQL - Separating a merge field into separate fields based on delimiters

I have a table with an nvarchar(max) column including a merged text like below:
ID MyString
61 Team:Finance,Accounting,HR,Country:Global,
62 Country:Germany,
63 Team:Legal,
64 Team:Finance,Accounting,Country:Global,External:Tenants,Partners,
65 External:Vendors,
What I need is to create another table for each item having the Team, Country and External values separated into 3 different columns.
Id Team Country External
61 Finance,Accounting,HR Global NULL
62 NULL Germany NULL
63 Legal NULL NULL
64 Finance,Accounting Global Tenants,Partners
65 NULL NULL Vendors
What is the most efficient way to do it? I'm trying to use STRING_SPLIT but couldn't manage it.
Any help would be appreciated.

Please try the following solution.
The data resembles JSON, so we can compose proper JSON with a few REPLACE() calls and parse it with OPENJSON.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT PRIMARY KEY, tokens NVARCHAR(MAX));
INSERT INTO @tbl (ID, tokens) VALUES
(61, 'Team:Finance,Accounting,HR,Country:Global,'),
(62, 'Country:Germany,'),
(63, 'Team:Legal,'),
(64, 'Team:Finance,Accounting,Country:Global,External:Tenants,Partners,'),
(65, 'External:Vendors,');
-- DDL and sample data population, end
SELECT *
FROM @tbl
CROSS APPLY OPENJSON('{"' + REPLACE(REPLACE(REPLACE(TRIM(',' FROM tokens), ':', '": "')
,',Country', '", "Country')
,',External', '", "External') + '"}')
WITH
(
Team VARCHAR(100) '$.Team',
Country VARCHAR(100) '$.Country',
[External] VARCHAR(100) '$.External'
) AS u;
Output
+----+-------------------------------------------------------------------+-----------------------+---------+------------------+
| ID | tokens | Team | Country | External |
+----+-------------------------------------------------------------------+-----------------------+---------+------------------+
| 61 | Team:Finance,Accounting,HR,Country:Global, | Finance,Accounting,HR | Global | NULL |
| 62 | Country:Germany, | NULL | Germany | NULL |
| 63 | Team:Legal, | Legal | NULL | NULL |
| 64 | Team:Finance,Accounting,Country:Global,External:Tenants,Partners, | Finance,Accounting | Global | Tenants,Partners |
| 65 | External:Vendors, | NULL | NULL | Vendors |
+----+-------------------------------------------------------------------+-----------------------+---------+------------------+
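To see what the nested REPLACE() calls produce, it can help to run them on a single value. A small sketch using the sample row with ID 61 (the variable name @tokens is only for illustration):

```sql
-- Illustration only: @tokens holds one sample value from the question.
DECLARE @tokens nvarchar(max) = N'Team:Finance,Accounting,HR,Country:Global,';

SELECT '{"' + REPLACE(REPLACE(REPLACE(TRIM(',' FROM @tokens), ':', '": "')
                ,',Country', '", "Country')
                ,',External', '", "External') + '"}' AS json_text;
```

This should return {"Team": "Finance,Accounting,HR", "Country": "Global"}, which OPENJSON ... WITH can then map to columns. The approach depends on the key names being known up front, since Country and External are hard-coded in the replacements.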

Firstly, let me repeat my comments here. SQL Server is the last place you should be doing this; its string manipulation is poor and you have a severely denormalised design, with denormalised data containing denormalised data. Fixing your design to a normalised approach must be a priority, as leaving your data in this state is only going to make things harder the further you go down this rabbit hole.
One method you could use to achieve this, however, would be with a JSON splitter and some string aggregation, but this is real ugly. The choice of having the "column" and "row" delimiter both be a comma (,) makes this a complete mess, and I am not going to explain what it's doing because you just should not be doing this.
WITH YourTable AS(
SELECT *
FROM (VALUES(61,'Team:Finance,Accounting,HR,Country:Global,'),
(62,'Country:Germany,'),
(63,'Team:Legal,'),
(64,'Team:Finance,Accounting,Country:Global,External:Tenants,Partners,'),
(65,'External:Vendors,'))V(ID,MyString)),
PartiallyNormal AS(
SELECT YT.ID,
CONVERT(int,LEAD(OJC.[Key],1,OJC.[Key]) OVER (PARTITION BY ID ORDER BY OJC.[Key], OJV.[Key])) AS ColumnNo,
OJV.[value],
CONVERT(int,OJC.[key]) AS [key]
FROM YourTable YT
CROSS APPLY OPENJSON(CONCAT('["', REPLACE(YT.MyString,':','","'),'"]')) OJC
CROSS APPLY OPENJSON(CONCAT('["', REPLACE(OJC.[value],',','","'),'"]')) OJV),
WithNames AS(
SELECT ID,
ColumnNo,
[value],
[key],
FIRST_VALUE(PN.[Value]) OVER (PARTITION BY ID, ColumnNo ORDER BY [Key]) AS ColumnName
FROM PartiallyNormal PN)
SELECT ID,
TRIM(',' FROM STRING_AGG(CASE ColumnName WHEN 'Team' THEN NULLIF([value],'''') END,',') WITHIN GROUP (ORDER BY [key])) AS Team, --TRIM because I've not taken the time to work out why there is sometimes a trailing comma
TRIM(',' FROM STRING_AGG(CASE ColumnName WHEN 'Country' THEN NULLIF([value],'''') END,',') WITHIN GROUP (ORDER BY [key])) AS Country,
TRIM(',' FROM STRING_AGG(CASE ColumnName WHEN 'External' THEN NULLIF([value],'''') END,',') WITHIN GROUP (ORDER BY [key])) AS [External]
FROM WithNames WN
WHERE [value] <> [ColumnName]
GROUP BY ID
ORDER BY ID;

STRING_SPLIT in SQL Server 2017 doesn't return the position of each item in the list, and the order of its output rows isn't guaranteed, so it can't be used reliably here.
SQL Server 2022 adds an optional third argument to STRING_SPLIT, enable_ordinal, that returns each item's position.
Before that version, the most efficient method would likely be CLR: write your parser in C# and call it from SQL Server as a CLR function.
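A minimal sketch of the SQL Server 2022 form of STRING_SPLIT, assuming that version (or Azure SQL Database) is available; passing 1 as the third argument adds an ordinal column:

```sql
-- Requires SQL Server 2022 or Azure SQL Database (third argument = enable_ordinal).
SELECT s.ordinal, s.value
FROM STRING_SPLIT('Team:Finance,Accounting,HR,Country:Global,', ':', 1) AS s
ORDER BY s.ordinal;
```

For this sample string the pieces come back numbered 1 to 3: Team, then Finance,Accounting,HR,Country, then Global, (with its trailing comma). With the position available, each piece can be paired with the one before it without relying on an unspecified row order.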

Another option is:
splitting the string using the STRING_SPLIT function on the colon
extracting consecutive strings using the LAG function
removing the string identifiers (Team, Country and External)
aggregating on the ID to remove NULL values
Here's the query:
WITH cte AS (
SELECT ID,
LAG(value) OVER(PARTITION BY ID ORDER BY (SELECT 1)) AS prev_value,
value
FROM tab
CROSS APPLY STRING_SPLIT(MyString, ':')
)
SELECT ID,
MAX(CASE WHEN prev_value LIKE 'Team'
THEN REPLACE(value, ',Country', '') END) AS [Team],
MAX(CASE WHEN prev_value LIKE '%Country'
THEN LEFT(value, LEN(value)-1) END) AS [Country],
MAX(CASE WHEN prev_value LIKE '%External'
THEN LEFT(value, LEN(value)-1) END) AS [External]
FROM cte
GROUP BY ID

Is there a method to simply transpose a table in SQL. This table contains Numeric and Varchar values

I would like to know how to simply transpose a table in SQL. There are no sums or calculations to do.
The table contains numeric and varchar values.
That is, I have a table of 2 rows x 195 columns, and I would like the same data as 195 rows x 2 columns (maybe 3 columns).
time_index  legal_entity_code  cohort  ...  ...
0           AAA                50      ...  ...
1           BBB                55      ...  ...
TO
Element            time_index_0  time_index_1
legal_entity_code  AAA           BBB
cohort             50            55
...                ...           ...
...                ...           ...
I have created this piece of code for testing
SELECT time_index, ValueT, FieldName
FROM (select legal_entity_code, cohort, time_index from ifrs17.output_bba where id in (1349392,1349034)) as T
UNPIVOT
(
ValueT
FOR FieldName in ([legal_entity_code],[cohort])
) as P
but I receive this error message :
The type of column "cohort" conflicts with the type of other columns specified in the UNPIVOT list.

I would recommend using apply for this. I don't fully follow the specified results because the query and the sample data are inconsistent in their naming.
I'm pretty sure you want:
select o.time_index, v.*
from ifrs17.output_bba o cross apply
(values ('Name1', o.name1),
('Value1', convert(varchar(max), o.value1)),
('Name2', o.name2)
) v(name, value)
where o.id in (1349392,1349034);
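If you prefer to keep UNPIVOT, the type conflict from the question can usually be avoided by converting every unpivoted column to the same type and length in the inner query first. A sketch using the question's column names; varchar(100) is an assumed width, so widen it if the data needs it:

```sql
-- Sketch: UNPIVOT needs all listed columns to have the same type and length.
SELECT time_index, FieldName, ValueT
FROM (
    SELECT time_index,
           CONVERT(varchar(100), legal_entity_code) AS legal_entity_code,
           CONVERT(varchar(100), cohort)            AS cohort
    FROM ifrs17.output_bba
    WHERE id IN (1349392, 1349034)
) AS T
UNPIVOT
(
    ValueT FOR FieldName IN ([legal_entity_code], [cohort])
) AS P;
```

Unlike the apply version, UNPIVOT silently drops rows where the value is NULL, which may or may not be what you want.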

Gordon's approach is correct and certainly more performant. +1
However, if you want to dynamically unpivot 195 columns without having to list them all, consider the following:
Note: if not 2016+ ... there is a similar XML approach.
Example
Select Element = [Key]
,Time_Index_0 = max(case when time_index=0 then value end)
,Time_Index_1 = max(case when time_index=1 then value end)
From (
Select [time_index]
,B.*
From YourTable A
Cross Apply (
Select [Key]
,Value
From OpenJson( (Select A.* For JSON Path,Without_Array_Wrapper ) )
Where [Key] not in ('time_index')
) B
) A
Group By [Key]
Returns
Element Time_Index_0 Time_Index_1
cohort 50 55
legal_entity_code AAA BBB

Each Column in Separate Row

How can I display each column in a separate row and add an additional field at the end?
For example I have this result:
ID ArticleName Brand1 Brand2 Brand3
== =========== ======== ======== ========
1 TestArticle 10001 20002 30003
I want to achieve this:
ID ArticleName BrandNo BrandName
== =========== ======= =========
1 TestArticle 10001 if column name = Brand1 Then Nike
1 TestArticle 20002 if column name = Brand2 Then Adidas
1 TestArticle 30003 if column name = Brand3 Then Mercedes
I can show each column in a separate row, but how can I add an additional BrandName column at the end of the result?
Here is what I've done:
DECLARE @temTable TABLE
(
Id INT,
ArticleName VARCHAR(20),
Brand1 VARCHAR(20),
Brand2 VARCHAR(20),
Brand3 VARCHAR(20)
);
INSERT INTO @temTable
(
Id,
ArticleName,
Brand1,
Brand2,
Brand3
)
VALUES
(1, 'TestArticle', '10001', '20002', '30003');
SELECT Id,
ArticleName,
b.*
FROM @temTable a
CROSS APPLY
(
VALUES
(Brand1),
(Brand2),
(Brand3)
) b (Brand)
WHERE b.Brand IS NOT NULL;

You could use CROSS APPLY as
SELECT Id, ArticleName, Br BrandNo, Val BrandName
FROM @temTable TT
CROSS APPLY(
VALUES
(Brand1, 'Nike'),
(Brand2, 'Adidas'),
(Brand3, 'Mercedes')
) T(Br, Val)

I assume the brand is stored in another table, so you just need to add another column in your VALUES operator, and then join to the Brand Table:
SELECT Id,
ArticleName,
V.Brand
FROM @temTable a
CROSS APPLY (VALUES (1,Brand1),
(2,Brand2),
(3,Brand3)) V (BrandID,Brand)
JOIN dbo.Brand B ON V.BrandID = B.BrandID
WHERE V.Brand IS NOT NULL;

You can use UNPIVOT to achieve this. You can use either a CASE expression or another table variable to map column names to brand names. I would prefer a table variable with a join, as it makes adding a new column a bit easier.
DECLARE @d TABLE (ColNames VARCHAR(128) , BrandName VARCHAR(100))
INSERT INTO @d VALUES ('Brand1', 'Nike'),('Brand2', 'Adidas'),('Brand3', 'Mercedes')
SELECT up.Id
, up.ArticleName
, up.BrandNo
, d.BrandName
FROM @temTable
UNPIVOT (BrandNo FOR ColNames IN (Brand1,Brand2,Brand3)) up
INNER JOIN @d d ON d.ColNames = up.ColNames

Get column names and data to rows in SQL

I have a table with basic employee details as below:
Table: tblEmployees
EmpID Name Contact Sex
100 John 55555 M
200 Kate 44444 F
300 Sam 88888 M
I would like to get my query result as follows of a particular employee where EmpID = 200
Col1 Col2
EmpID 200
Name Kate
Sex F

You can use cross apply:
select t.*
from employees e
cross apply (values
('empid', cast(empid as varchar(100))),
('name', name),
('sex', sex)
) t(attr, value)
where e.empid = 200
Presumably, empid is a number, so explicit casting is needed (otherwise SQL Server will try to convert name and sex to numbers, which will fail).
Result:
attr | value
:---- | :----
empid | 200
name | Kate
sex | F

Or a less sophisticated solution using three SELECTs combined with UNION ALL, assuming the field names are known in advance. This might perform better on large tables.
If you have performance issues, analyze the execution plan and make sure indexes are used effectively.
Since you are only looking for one particular employee at a time:
SELECT 'empid', convert(varchar(12), EmpID)
FROM tblEmployees
WHERE EmpID = 200
UNION ALL
SELECT 'name', name
FROM tblEmployees
WHERE EmpID = 200
UNION ALL
SELECT 'sex', sex
FROM tblEmployees
WHERE EmpID = 200
The first SELECT uses convert(varchar(12), EmpID) under the assumption that EmpID is an int field.

Another option is with a little XML
Full Disclosure: Not as performant as GMB's CROSS APPLY (+1) or UNPIVOT. BUT it will dynamically unpivot virtually any row, table, view or ad-hoc query without actually using dynamic SQL.
Example
Declare @YourTable Table ([EmpID] varchar(50),[Name] varchar(50),[Contact] varchar(50),[Sex] varchar(50))
Insert Into @YourTable Values
(100,'John',55555,'M')
,(200,'Kate',44444,'F')
,(300,'Sam',88888,'M')
Select A.EmpID
,C.*
From @YourTable A
Cross Apply ( values (convert(xml,(select a.* for XML Raw ))) ) B(XMLData)
Cross Apply (
Select Item = xAttr.value('local-name(.)', 'varchar(100)')
,Value = xAttr.value('.','varchar(max)')
From XMLData.nodes('//@*') xNode(xAttr)
Where xAttr.value('local-name(.)', 'varchar(100)') not in ('EmpID','Other','Columns2Exclude')
) C
Returns
EmpID Item Value
100 Name John
100 Contact 55555
100 Sex M
200 Name Kate
200 Contact 44444
200 Sex F
300 Name Sam
300 Contact 88888
300 Sex M
EDIT - If interested, here is a TVF approach
Select A.EmpID
,B.*
From @YourTable A
Cross Apply [dbo].[tvf-XML-UnPivot-Row]((Select A.* for XML RAW)) B
The TVF
CREATE FUNCTION [dbo].[tvf-XML-UnPivot-Row](@XML xml)
Returns Table
As
Return (
Select Item = xAttr.value('local-name(.)', 'varchar(100)')
,Value = xAttr.value('.','varchar(max)')
From @XML.nodes('//@*') xNode(xAttr)
)

SQL Pivot Column that has multiple values for same column

Trying to pivot table results that may have multiple rows with the same value
I have data that looks like this so far.
Nbr Person Test
33 Barry Prim
33 Brian Sup
33 Burke RT 1st
33 Ray Add
33 Jake Add
33 Smith Add
I'm trying to pivot it so that it looks like this:
Nbr Prim Sup 1st Add Add2 Add3
33 Barry Brian Burke Ray Jake Smith
This is what I have so far with a normal pivot but it doesn't work to grab all the ones with the same value in the Test Column
CREATE TABLE #testTbl(nbr int,name varchar(20),test VARCHAR(10))
INSERT INTO #testTbl
SELECT '33','Barry','Prim'
UNION
SELECT '33','Brian','Sup'
UNION
SELECT '33','Burke','1st'
UNION
SELECT '33','Ray','Add'
UNION
SELECT '33','jake','Add'
UNION
SELECT '33','Smith','Add'
select * from (
Select *
from #testTbl
) as x
pivot(
max(name) for test in ([prim],[sup],[1st],[add])
)
as pivot1
Any help is greatly appreciated. If it's not possible to have the columns output as Add, Add2 and Add3, that's fine. Whatever works.

You can do so by modifying the test value using window functions:
select *
from (Select tt.name,
(test + (case when count(*) over (partition by test) = 1
then ''
else cast(row_number() over (partition by test order by (select null)) as varchar(255))
end)) as test
from testTbl tt
) as x
pivot(
max(name) for test in ([prim], [sup], [1st], [Add1], [Add2], [Add3])
) as pivot1
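The query above drops nbr and reads from testTbl rather than the temp table in the question. A hedged variant that keeps nbr and numbers the duplicates within each nbr might look like this; ordering the duplicates by name is an assumption, since the sample data has no column that records the original order:

```sql
-- Sketch against the question's #testTbl; the ORDER BY tt.name tie-break is assumed.
SELECT nbr, [Prim], [Sup], [1st], [Add1], [Add2], [Add3]
FROM (
    SELECT tt.nbr,
           tt.name,
           tt.test + CASE WHEN COUNT(*) OVER (PARTITION BY tt.nbr, tt.test) = 1
                          THEN ''   -- unique test values keep their name
                          ELSE CAST(ROW_NUMBER() OVER (PARTITION BY tt.nbr, tt.test
                                                       ORDER BY tt.name) AS varchar(10))
                     END AS test
    FROM #testTbl AS tt
) AS x
PIVOT (
    MAX(name) FOR test IN ([Prim], [Sup], [1st], [Add1], [Add2], [Add3])
) AS pivot1;
```

The number of Add columns is still fixed in the IN list, so more than three duplicates per nbr would need either a longer list or dynamic SQL.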
