Convert CSV stored in a string variable to table - sql

I've got CSV data stored in a string variable in SQL:
@csvContent =
'date;id;name;position;street;city
19.03.2019 10:06:00;1;Max;President;Langestr. 35;Berlin
19.04.2019 12:36:00;2;Bernd;Vice President;Haupstr. 40;Münster
21.06.2019 14:30:00;3;Franziska;financial;Hofstr. 19;Frankfurt'
What I want to do is to convert it to a #table, so it would look like
SELECT * FROM #table
date id name position street city
---------------------------------------------------------------------
19.03.2019 10:06:00 1 Max President Langestr. 35 Berlin
19.04.2019 12:36:00 2 Bernd Vice President Haupstr. 40 Münster
21.06.2019 14:30:00 3 Franzi financial Hofstr. 19 Frankfurt
The headers aren't fixed, so the CSV could have more or fewer columns with different header names.
I've tried it with STRING_SPLIT() and PIVOT but didn't find a solution for this.

If you are using SQL Server, this might be a solution for your request:
How to split a comma-separated value to columns

Hope it will help you
CREATE TABLE #temp(
[date] datetime,
id int,
name varchar(100),
-- ... add the remaining columns that you need
)
-- @CSVFILE is assumed to hold the path to the CSV file
DECLARE @sql NVARCHAR(4000) = 'BULK INSERT #temp
FROM ''' + @CSVFILE + ''' WITH
(
FIELDTERMINATOR ='';'',
ROWTERMINATOR =''\n'',
FIRSTROW = 2
)';
EXEC(@sql);
SELECT * FROM #temp
DROP TABLE #temp
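Since the data in the question sits in a string variable rather than a file, here is a minimal sketch of an alternative that shreds the string itself. It assumes SQL Server 2022 (for the ordinal argument of STRING_SPLIT) and only produces (row, column, value) triples; turning the header row into named columns would still need dynamic SQL or PIVOT.
DECLARE @csvContent varchar(MAX) =
'date;id;name;position;street;city
19.03.2019 10:06:00;1;Max;President;Langestr. 35;Berlin
19.04.2019 12:36:00;2;Bernd;Vice President;Haupstr. 40;Münster
21.06.2019 14:30:00;3;Franziska;financial;Hofstr. 19;Frankfurt';

SELECT r.ordinal AS RowNumber,   -- 1 = header row
       c.ordinal AS ColumnNumber,
       c.value   AS CellValue
FROM STRING_SPLIT(REPLACE(@csvContent, CHAR(13), ''), CHAR(10), 1) AS r
CROSS APPLY STRING_SPLIT(r.value, ';', 1) AS c
ORDER BY r.ordinal, c.ordinal;
From that (RowNumber, ColumnNumber, CellValue) shape, the header row (RowNumber = 1) can drive a dynamic pivot into the final #table.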


Use table of column metadata as column header and type

Tables
dbo.Metadata:
Column    Type
--------  -----------
ID        int
Name      varchar(50)
Location  varchar(50)

dbo.Data:
Col1  Col2               Col3
----  -----------------  -------
1     Awesomenauts Inc.  Germany
2     DataMunchers       France
3     WeBuyStuff         France

Wanted Output:
ID  Name               Location
--  -----------------  --------
1   Awesomenauts Inc.  Germany
2   DataMunchers       France
3   WeBuyStuff         France
Is there any simple way to achieve this?
Perhaps with Dynamic SQL?
Oh, and the schema may vary from day to day; everything will be batch reloaded into the DWH daily.
You will need to have some sort of order defined in your metadata for this to work. For my script, I added ColumnOrder for reference.
/*Setup Metadata table*/
DROP TABLE IF EXISTS #Metadata
CREATE TABLE #Metadata (
ColumnOrder INT IDENTITY(1,1) PRIMARY KEY /*Need to have some sort of defined column order, I created one for illustration purposes*/
,[Column] SYSNAME
,[Type] VARCHAR(255)
)
/*Load data*/
INSERT INTO #Metadata
VALUES
('ID','int')
,('Name','varchar(50)')
,('Location','varchar(50)')
/*Create dynamic SQL*/
DECLARE @DynamicSQL NVARCHAR(MAX);
/*Create column list*/
;WITH cte_Column AS (
SELECT ColumnOrder,
[Column]
,[Type]
,DataColName = CONCAT('Col',Row_Number () OVER (ORDER BY A.ColumnOrder))
FROM #Metadata AS A
)
SELECT @DynamicSQL
= STRING_AGG(
Concat(QUOTENAME([Column])
,' = CAST ('
,DataColName
,' AS '
,A.[Type]
,')')
,CONCAT(CHAR(13),CHAR(10),',') /*Line break + comma separators*/
)
WITHIN GROUP (ORDER BY A.ColumnOrder) /*Ensures columns concatenated in order*/
FROM cte_Column AS A
SET @DynamicSQL = CONCAT('SELECT ', @DynamicSQL, CHAR(13), CHAR(10), ' FROM dbo.Data')
PRINT @DynamicSQL
/*Uncomment to execute*/
--EXEC (@DynamicSQL)
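For the sample metadata above, the statement printed by PRINT @DynamicSQL should come out roughly like this, which is handy for sanity-checking before uncommenting the EXEC:
SELECT [ID] = CAST (Col1 AS int)
,[Name] = CAST (Col2 AS varchar(50))
,[Location] = CAST (Col3 AS varchar(50))
 FROM dbo.Data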

SQL: Deleting Identical Columns With Different Names

My original table ("original_table") looks like this (contains both numeric and character variables):
age height height2 gender gender2
1 18 76.1 76.1 M M
2 19 77.0 77.0 F F
3 20 78.1 78.1 M M
4 21 78.2 78.2 M M
5 22 78.8 78.8 F F
6 23 79.7 79.7 F F
I would like to remove columns from this table that have identical entries, but are named differently. In the end, this should look like this ("new_table"):
age height gender
1 18 76.1 M
2 19 77.0 F
3 20 78.1 M
4 21 78.2 M
5 22 78.8 F
6 23 79.7 F
My Question: Is there a standard way to do this in SQL? I tried to do some research and came across the following link : How do I compare two columns for equality in SQL Server?
What I Tried So Far: It seems that something like this might work:
CREATE TABLE new_table AS SELECT * FROM original_table;
ALTER TABLE new_table
ADD does_age_equal_height varchar(255);
UPDATE new_table
SET does_age_equal_height = CASE
WHEN age = height THEN '1' ELSE '0' END AS does_age_equal_height;
From here, if the "sum" of all values in the "does_age_equal_height" column equals to the number of rows from "new_table" (i.e. select count(rownum) from new_table) - this must mean that both columns are equal, and that one of the columns can be dropped.
However, this is a very inefficient method, even for tables with a small number of columns. In my example, I have 5 columns, which means I would have to repeat the above process "5 choose 2" times, i.e. 5!/(2!*3!) = 10 times. For example:
ALTER TABLE employees
ADD does_age_equal_height varchar(255),
does_age_equal_height2 varchar(255),
does_age_equal_gender varchar(255),
does_age_equal_gender2 varchar(255),
does_height_equal_height2 varchar(255),
does_height_equal_gender varchar(255),
does_height_equal_gender2 varchar(255),
does_height2_equal_gender varchar(255),
does_height2_equal_gender2 varchar(255),
does_gender_equal_gender2 varchar(255);
This would then be followed by multiple CASE statements - further complicating the process.
Can someone please show me a more efficient way of doing this?
Thanks!
I hope I've understood your problem correctly. This is my code in SQL Server to handle it; you should adapt it for Netezza SQL.
My idea is:
Calculate an MD5 hash for each column and then compare the hashes; if two columns have the same hash, only one of them is kept.
I'm going to create the table below for this problem:
CREATE TABLE Students
(
Id INT PRIMARY KEY IDENTITY,
StudentName VARCHAR (50),
Course VARCHAR (50),
Score INT,
lastName VARCHAR (50), -- another alias for StudentName
metric INT, -- another alias for score
className VARCHAR(50) -- another alias for Course
)
GO
INSERT INTO Students VALUES ('Sally', 'English', 95, 'Sally', 95, 'English');
INSERT INTO Students VALUES ('Sally', 'History', 82, 'Sally', 82, 'History');
INSERT INTO Students VALUES ('Edward', 'English', 45, 'Edward', 45, 'English');
INSERT INTO Students VALUES ('Edward', 'History', 78, 'Edward', 78, 'History');
After creating the table and inserting the sample records, it's time to find the similar columns.
Step 1. Declare variables.
DECLARE @cols_q VARCHAR(max),
@cols VARCHAR(max),
@table_name VARCHAR(max) = N'Students',
@res NVARCHAR(max),
@newCols VARCHAR(max),
@finalResQuery VARCHAR(max);
Step 2. Generate a dynamic query that calculates a hash for every column.
SELECT @cols_q = COALESCE(@cols_q + ', ', '') + 'HASHBYTES(''MD5'', CONVERT(varbinary(max), (select ' + COLUMN_NAME + ' as t from Students FOR XML AUTO))) as ' + COLUMN_NAME,
@cols = COALESCE(@cols + ',', '') + COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = @table_name;
SET @cols_q = 'select ' + @cols_q + ' into ##tmp_' + @table_name + ' from ' + @table_name;
Step 3. Run the generated query.
EXEC(@cols_q)
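For the Students table above, the dynamic query assembled in step 2 expands to something like the following (one HASHBYTES expression per column, materialized into the global temp table):
select HASHBYTES('MD5', CONVERT(varbinary(max), (select Id as t from Students FOR XML AUTO))) as Id,
       HASHBYTES('MD5', CONVERT(varbinary(max), (select StudentName as t from Students FOR XML AUTO))) as StudentName,
       -- ... one expression per remaining column ...
       HASHBYTES('MD5', CONVERT(varbinary(max), (select className as t from Students FOR XML AUTO))) as className
into ##tmp_Students from Students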
Step 4. Get the column list with duplicate columns removed.
SET @res = N'select uniq_colname into ##temp_colnames
from(
select max(colname) as uniq_colname from (
select * from ##tmp_Students
)tt
unpivot (
md5_hash for colname in ( ' + @cols + ')
) as tbl
group by md5_hash
)tr';
EXEC (@res);
Step 5. Get the final result.
SELECT @newCols = COALESCE(@newCols + ', ', '') + uniq_colname FROM ##temp_colnames
SET @finalResQuery = 'select ' + @newCols + ' from ' + @table_name;
EXEC (@finalResQuery)
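If only a couple of column pairs ever need checking, a simpler NULL-safe per-pair test is possible too (sketched here for height vs. height2 from the question's table); a count of 0 means the two columns are identical row by row, while the hash approach above scales better as the column count grows:
SELECT COUNT(*) AS mismatches
FROM original_table
WHERE height <> height2
   OR (height IS NULL AND height2 IS NOT NULL)
   OR (height IS NOT NULL AND height2 IS NULL);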

SQL CSV as Query Results Column

I have the following SQL which queries a single table, single row, and returns the results as a comma-separated string, e.g.
Forms
1, 10, 4
SQL :
DECLARE @tmp varchar(250)
SET @tmp = ''
SELECT @tmp = @tmp + Form_Number + ', '
FROM Facility_EI_Forms_Required
WHERE Facility_ID = 11 AND EI_Year=2012 -- single Facility, single year
SELECT SUBSTRING(@tmp, 1, LEN(@tmp) - 1) AS Forms
The Facility_EI_Forms_Required table has three records for Facility_ID = 11
Facility_ID EI_Year Form_Number
11 2012 1
11 2012 10
11 2012 4
Form_number is a varchar field.
And I have a Facility table with Facility_ID, Facility_Name, etc.
How do I create a query that queries all Facilities for a given year and produces the CSV output field?
I have this so far:
DECLARE @tmp varchar(250)
SET @tmp = ''
SELECT TOP 100 A.Facility_ID, A.Facility_Name,
(
SELECT @tmp = @tmp + B.Form_Number + ', '
FROM B
WHERE B.Facility_ID = A.Facility_ID
AND B.EI_Year=2012
)
FROM Facility A, Facility_EI_Forms_Required B
But it gets syntax errors on using @tmp.
My guess is that this is too complex a task for a query and a stored procedure may be needed, but I have little knowledge of SPs. Can this be done with a nested query?
I tried a Scalar Value Function
ALTER FUNCTION [dbo].[sp_func_EI_Form_List]
(
-- Add the parameters for the function here
@p1 int,
@pYr int
)
RETURNS varchar
AS
BEGIN
-- Declare the return variable here
DECLARE @Result varchar
-- Add the T-SQL statements to compute the return value here
DECLARE @tmp varchar(250)
SET @tmp = ''
SELECT @tmp = @tmp + Form_Number + ', '
FROM OIS..Facility_EI_Forms_Required
WHERE Facility_ID = @p1 AND EI_Year = @pYr -- single Facility, single year
SELECT @Result = @tmp -- SUBSTRING(@tmp, 1, LEN(@tmp) - 1) -- @p1
-- Return the result of the function
RETURN @Result
END
The call
select Facility_ID, Facility.Facility_Name,
dbo.sp_func_EI_Form_List(Facility_ID,2012)
from facility where Facility_ID=11
returns
Facility_ID Facility_Name Form_List
11 Hanson Aggregates 1
so it is only returning the first record instead of all three. What am I doing wrong?
Try the following approach, which is analogous to the SO answer Concatenate many rows into a single text string. I hope it is correct, as I cannot try it out without the schema and some demo data (maybe you can add schema and data to your question):
Select distinct A.Facility_ID, A.Facility_Name,
substring(
(
Select ',' + B.Form_Number AS [text()]
From Facility_EI_Forms_Required B
Where B.Facility_ID = A.Facility_ID
AND B.EI_Year=2012
ORDER BY B.Facility_ID
For XML PATH ('')
), 2, 1000) [Form_List]
From Facility A
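As a side note on why the scalar function only returned 1: RETURNS varchar (and DECLARE @Result varchar) without a length defaults to varchar(1), so the concatenated list gets truncated to its first character. A minimal fix is to give both declarations an explicit length:
ALTER FUNCTION [dbo].[sp_func_EI_Form_List]
(
@p1 int,
@pYr int
)
RETURNS varchar(250) -- plain "varchar" here would mean varchar(1)
AS
BEGIN
DECLARE @Result varchar(250)
DECLARE @tmp varchar(250)
SET @tmp = ''
SELECT @tmp = @tmp + Form_Number + ', '
FROM OIS..Facility_EI_Forms_Required
WHERE Facility_ID = @p1 AND EI_Year = @pYr
SET @Result = SUBSTRING(@tmp, 1, LEN(@tmp) - 1)
RETURN @Result
END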

update comma separated values in a column

Before we begin: this is a bad situation/design, but I am not able to fix the design, as the column is used by an application and database that I (my company) don't own. I do not control the new or old values either.
I have a table similar to this:
Create Table #tblCRigrationTest
(
ReasonString VarChar(1000),
ReasonStringNew VarChar(1000)
)
With data like this:
Insert Into #tblCRigrationTest ( ReasonString, ReasonStringNew )
Values ('5016|5005|5006|5032|5020|5010|5007|5011|5012|5028|5024|5008|5029', '')
What I need to do is "loop" through each ID and based on its value, update it, concatenate it into a new string, and then store it in the ReasonStringNew column. The new ID's appear in the second column below:
Old New
--------------
5005 1
5006 2
5020 3
5032 4
5010 5
5007 6
5011 7
5012 8
5028 9
5024 10
5008 11
5016 12
5009 13
5029 14
Any suggestions on how to do this?
Just take your column values into a temp table, then try to update.
DECLARE @STRSQL NVARCHAR(MAX)
-- Turn the delimited value into a UNION ALL select list so each ID becomes a row
SET @STRSQL = 'SELECT ''' + REPLACE('YourColumnValue', '|', ''' UNION ALL SELECT ''') + ''''
DECLARE @tbl TABLE
(
col1 VARCHAR(100)
)
INSERT INTO @tbl
EXECUTE (@STRSQL)
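A different, set-based sketch of the whole remap, assuming SQL Server 2022 (ordinal argument of STRING_SPLIT) and STRING_AGG; #IdMap is a hypothetical staging table holding the Old/New pairs listed above:
CREATE TABLE #IdMap (OldId varchar(10), NewId varchar(10));
INSERT INTO #IdMap VALUES
('5005','1'),('5006','2'),('5020','3'),('5032','4'),('5010','5'),
('5007','6'),('5011','7'),('5012','8'),('5028','9'),('5024','10'),
('5008','11'),('5016','12'),('5009','13'),('5029','14');

UPDATE t
SET ReasonStringNew = x.NewString
FROM #tblCRigrationTest AS t
CROSS APPLY (
    SELECT STRING_AGG(m.NewId, '|') WITHIN GROUP (ORDER BY s.ordinal) AS NewString
    FROM STRING_SPLIT(t.ReasonString, '|', 1) AS s   -- third argument returns the ordinal (SQL Server 2022)
    JOIN #IdMap AS m ON m.OldId = s.value
) AS x;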

SQL Splitting data in a column into a new table

I am being given data constantly in the following format:
Items Instrument
------- ---------
1|2|3 201400001
2|3 201400002
3 201400003
1|4 201400004
and need to output:
Item Instrument
------- ---------
1 201400001
2 201400001
3 201400001
2 201400002
3 201400002
3 201400003
1 201400004
4 201400004
I know my answer is a stored function or procedure, but which? Further, do I write the function or procedure to accept columns from any database I have, or will I need to write this on a case-by-case basis? Essentially these are reports being turned in often, and I am asked to get them into SQL so the data can be analyzed further.
I hope this makes sense. Thank you in advance for your time.
If you have a list of all possible items, then you can do:
select i.itemId, t.instrument
from table t join
items i
on '|' + t.items + '|' like '%|'+cast(i.itemId as varchar(255))+'|%';
Original Code: http://dotnetbites.com/split-strings-into-table
Modified for your case:
CREATE FUNCTION [dbo].[SplitStringToTable]
(
@InputString VARCHAR(MAX) = ''
, @Delimiter CHAR(1) = ','
, @Instrument CHAR(8)
)
RETURNS @RESULT TABLE(ID INT IDENTITY, Items VARCHAR(1000), Instrument CHAR(8))
AS
BEGIN
DECLARE @XML XML
SELECT @XML = CONVERT(XML, SQL_TEXT)
FROM (
SELECT '<root><item>'
+ REPLACE(@InputString, @Delimiter, '</item><item>')
+ '</item></root>' AS SQL_TEXT
) dt
INSERT INTO @RESULT(Items, Instrument)
SELECT t.col.query('.').value('.', 'VARCHAR(1000)') AS Items, @Instrument AS Instrument
FROM @XML.nodes('root/item') t(col)
RETURN
END
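Assuming the incoming rows land in a staging table (called #Incoming here purely for illustration, with the Items and Instrument columns from the question), the function can then be applied row by row with CROSS APPLY to get the desired output:
SELECT s.Items AS Item, s.Instrument
FROM #Incoming AS i
CROSS APPLY dbo.SplitStringToTable(i.Items, '|', i.Instrument) AS s;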