Export data from a non-normalized database

Export data from a non-normalized database - sql

I need to export data from a non-normalized database where there are multiple columns to a new normalized database.
One example is the Products table, which has 30 boolean columns (ValidSize1, ValidSize2 ecc...) and every record has a foreign key which points to a Sizes table where there are 30 columns with the size codes (XS, S, M etc...). In order to take the valid sizes for a product I have to scan both tables and take the value SizeCodeX from the Sizes table only if ValidSizeX on the product is true. Something like this:
Products Table
--------------
ProductCode <PK>
Description
SizesTableCode <FK>
ValidSize1
ValidSize2
[...]
ValidSize30
Sizes Table
-----------
SizesTableCode <PK>
SizeCode1
SizeCode2
[...]
SizeCode30
For now I am using a "template" query which I repeat for 30 times:
SELECT
Products.Code,
Sizes.SizesTableCode, -- I need this code because different codes can have same size codes
Sizes.Size_1
FROM Products
INNER JOIN Sizes
ON Sizes.SizesTableCode = Products.SizesTableCode
WHERE Sizes.Size_1 IS NOT NULL
AND Products.ValidSize_1 = 1
I am just putting this query inside a loop and I replace the "_1" with the loop index:
SET #counter = 1;
SET #max = 30;
SET #sql = '';
WHILE (#counter <= #max)
BEGIN
SET #sql = #sql + ('[...]'); -- Here goes my query with dynamic indexes
IF #counter < #max
SET #sql = #sql + ' UNION ';
SET #counter = #counter + 1;
END
INSERT INTO DestDb.ProductsSizes EXEC(#sql); -- Insert statement
GO
Is there a better, cleaner or faster method to do this? I am using SQL Server and I can only use SQL/TSQL.

You can prepare a dynamic query using the SYS.Syscolumns table to get all value in row
DECLARE #SqlStmt Varchar(MAX)
SET #SqlStmt=''
SELECT #SqlStmt = #SqlStmt + 'SELECT '''+ name +''' column , UNION ALL '
FROM SYS.Syscolumns WITH (READUNCOMMITTED)
WHERE Object_Id('dbo.Products')=Id AND ([Name] like 'SizeCode%' OR [Name] like 'ProductCode%')
IF REVERSE(#SqlStmt) LIKE REVERSE('UNION ALL ') + '%'
SET #SqlStmt = LEFT(#SqlStmt, LEN(#SqlStmt) - LEN('UNION ALL '))
print ( #SqlStmt )

Well, it seems that a "clean" (and much faster!) solution is the UNPIVOT function.
I found a very good example here:
http://pratchev.blogspot.it/2009/02/unpivoting-multiple-columns.html

Related

SQL return values if row count > X

DECLARE #sql_string varchar(7000)
set #sql_string = (select top 1 statement from queries where name = 'report name')
EXECUTE (#sql_string)
#sql_string is holding another SQL statement. This query works for me. It returns all the values from the query from the statement on the queries table. From this, I need to figure out how to only return the results IF the number of rows returned exceeds a threshold (for my particular case, 25). Else return nothing. I can't quite figure out how to get this conditional statement to work.
Much appreciated for any direction on this.

If all the queries return the same columns, you could simply store the data in a temporary table or table variable and then use logic such as:
select t.*
from #t t
where (select count(*) from #t) > 25;
An alternative is to try constructing a new query from the existing query. I don't recommend trying to parse the existing string, if you can avoid that. Assuming that the query does not use CTEs or have an ORDER BY clause, for instance, something like this should work:
set #sql = '
with q as (
' + #sql + '
)
select q.*
from q
where (select count(*) from q) > 25
';

That did the trick #Gordon. Here was my final:
DECLARE #report_name varchar(100)
DECLARE #sql_string varchar(7000)
DECLARE #sql varchar(7000)
DECLARE #days int
set #report_name = 'Complex Pass Failed within 1 day'
set #days = 5
set #sql_string = (select top 1 statement from queries where name = #report_name )
set #sql = 'with q as (' + #sql_string + ') select q.* from q where (select count(*) from q) > ' + convert(varchar(100), #days)
EXECUTE (#sql)
Worked with 2 nuances.
The SQL returned could not include an end ";" charicter
The statement cannot include an "order by" statement

simplify if statement with stored procedure

i have a following stored procedure where i am repeating similar code. all i am doing is checking the condition based on Sample id1, sampleid2, and sample id3 to follow in similar fashion. The value of 'y' goes on till about it reaches 10, so it's going to be a big 'if' condition based statements. i was trying to see if a better solution could be put in place. thanks.
#select = 'select * from tbl Sample......'
if(x = 1 and y=1)
set #where = 'where Sample.id1 >=1 and <=10'
if(x = 1 and y=2)
set #where = 'where Sample.id1 >=11 and <=20'
if(x=2 and y=1)
set #where = 'where Sample.id2 >=1 and <= 10'
if(x=2 and y=2)
set #where = 'where Sample.id2 >=11 and <=20'
if(x=3 and y=1)
set #where = 'where Sample.id3 >=1 and <=10'
if(x=3 and y=2)
set #where = 'where Sample.id3 >=11 and <=20' //increment goes on
exec(#select+#where)

In general, if there is no easy correlation between the values of x, y and the filtered columns id1, id2 etc, then you could move the where predicates into a table keyed by values of x and y, and then use this as a lookup to apply to your PROC. Assuming the SPROC is used heavily, the lookup table can be made permanent and indexed on your x,y input mapping columns.
CREATE TABLE dbo.WhereMappings
(
x INT,
y INT,
Predicate NVARCHAR(MAX),
CONSTRAINT PK_MyWhereMappings PRIMARY KEY(x, y)
)
INSERT INTO dbo.WhereMappings(x, y, Predicate) VALUES
(1, 1, 'Sample.id1 > 5 and Sample.id2 <= 10'),
(1, 2, 'Sample.id1 > 7 and Sample.id2 <= 15'),
(2, 1, 'Sample.id2 > 2 and Sample.id3 <= 18');
Your proc then simplifies to:
CREATE PROC MyProc(#x INT, #y INT) AS
BEGIN
DECLARE #sql NVARCHAR(MAX);
DECLARE #predicate NVARCHAR(MAX);
SELECT TOP 1 #predicate = Predicate
FROM dbo.WhereMappings WHERE x = #x AND y = #y;
-- TODO THROW if predicate not mapped
SET #sql = CONCAT('SELECT * FROM Sample WHERE ', #predicate);
EXECUTE(#sql);
END;
Re : What does this solve
Although this hasn't necessarily reduced the complexity of the original queries, it does however allow for a data-only maintenance approach to the mappings, e.g. Admin UI screens could be written to maintain (and validate! think Sql Injection) the predicate mappings, without the need for direct modification to the SPROC.
Edit
After your edit, it does appear that there is a correlation between x, y and the filtered column and range used in the idx predicates, viz x sets the column, and y sets the range between.
In that case, simply append the value of x to an id column name stub, and multiply out the value of the BETWEEN clause to y*10 - 9 to y * 10;

You may do something like this:
select
*
from
tbl Sample
where
(#x=1 and #y=1 and Sample.id1>=..and Sample.id1<=..) --(or you could use between)
OR (#x=1 and #y=2 and Sample.id1>=..and Sample.id1<=..)
..

set #select = 'select * from tbl Sample......'
set #where = 'where Sample.id'+convert(nvarchar(10),#x)+' >=....and <=...'
exec(#select+#where)

I would suggest to use another sql table which will have information of all these condition like shown in below screenshot.
Then use join in your sql query like.(Assume above table has name Limit
select * from tbl Sample smpl
inner join Limit lmt
on #x=lmt.x and #y=lmt.y and
(
(#x=1 and smpl.id1 >= lmt.Min_limit and smpl.id1 <=lmt.Max_limit) or
(#x=2 and smpl.id2 >= lmt.Min_limit and smpl.id2 <=lmt.Max_limit) or
(#x=3 and smpl.id3 >= lmt.Min_limit and smpl.id3 <=lmt.Max_limit)
)
In this I have tried to avoid dynamic query.

I usually try to find a relation between inputs and outputs and in this case I found this way:
SET #where = 'WHERE Sample.id{0} >= {1} + 1 and <= {1} + 10'
SET #where = REPLACE(#where, '{0}', CAST(x AS varchar(5)))
SET #where = REPLACE(#where, '{1}', CAST((y - 1) AS varchar(5)))

I think you want something like:
SET #where = 'where Sample.id' + CAST(#x AS VARCHAR(10)) + ' between ' +
CAST((#y - 1) * 10 + 1 AS VARCHAR(10)) + ' and ' +
CAST(#y * 10 AS VARCHAR(10))

How to UPDATE all columns of a record without having to list every column

I'm trying to figure out a way to update a record without having to list every column name that needs to be updated.
For instance, it would be nice if I could use something similar to the following:
// the parts inside braces are what I am trying to figure out
UPDATE Employee
SET {all columns, without listing each of them}
WITH {this record with id of '111' from other table}
WHERE employee_id = '100'
If this can be done, what would be the most straightforward/efficient way of writing such a query?

It's not possible.
What you're trying to do is not part of SQL specification and is not supported by any database vendor. See the specifications of SQL UPDATE statements for MySQL, Postgresql, MSSQL, Oracle, Firebird, Teradata. Every one of those supports only below syntax:
UPDATE table_reference
SET column1 = {expression} [, column2 = {expression}] ...
[WHERE ...]

This is not posible, but..
you can doit:
begin tran
delete from table where CONDITION
insert into table select * from EqualDesingTabletoTable where CONDITION
commit tran
be carefoul with identity fields.

Here's a hardcore way to do it with SQL SERVER. Carefully consider security and integrity before you try it, though.
This uses schema to get the names of all the columns and then puts together a big update statement to update all columns except ID column, which it uses to join the tables.
This only works for a single column key, not composites.
usage: EXEC UPDATE_ALL 'source_table','destination_table','id_column'
CREATE PROCEDURE UPDATE_ALL
#SOURCE VARCHAR(100),
#DEST VARCHAR(100),
#ID VARCHAR(100)
AS
DECLARE #SQL VARCHAR(MAX) =
'UPDATE D SET ' +
-- Google 'for xml path stuff' This gets the rows from query results and
-- turns into comma separated list.
STUFF((SELECT ', D.'+ COLUMN_NAME + ' = S.' + COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = #DEST
AND COLUMN_NAME <> #ID
FOR XML PATH('')),1,1,'')
+ ' FROM ' + #SOURCE + ' S JOIN ' + #DEST + ' D ON S.' + #ID + ' = D.' + #ID
--SELECT #SQL
EXEC (#SQL)

In Oracle PL/SQL, you can use the following syntax:
DECLARE
r my_table%ROWTYPE;
BEGIN
r.a := 1;
r.b := 2;
...
UPDATE my_table
SET ROW = r
WHERE id = r.id;
END;
Of course that just moves the burden from the UPDATE statement to the record construction, but you might already have fetched the record from somewhere.

How about using Merge?
https://technet.microsoft.com/en-us/library/bb522522(v=sql.105).aspx
It gives you the ability to run Insert, Update, and Delete. One other piece of advice is if you're going to be updating a large data set with indexes, and the source subset is smaller than your target but both tables are very large, move the changes to a temporary table first. I tried to merge two tables that were nearly two million rows each and 20 records took 22 minutes. Once I moved the deltas over to a temp table, it took seconds.

If you are using Oracle, you can use rowtype
declare
var_x TABLE_A%ROWTYPE;
Begin
select * into var_x
from TABLE_B where rownum = 1;
update TABLE_A set row = var_x
where ID = var_x.ID;
end;
/
given that TABLE_A and TABLE_B are of same schema

It is possible. Like npe said it's not a standard practice. But if you really have to:
1. First a scalar function
CREATE FUNCTION [dte].[getCleanUpdateQuery] (#pTableName varchar(40), #pQueryFirstPart VARCHAR(200) = '', #pQueryLastPart VARCHAR(200) = '', #pIncludeCurVal BIT = 1)
RETURNS VARCHAR(8000) AS
BEGIN
DECLARE #pQuery VARCHAR(8000);
WITH cte_Temp
AS
(
SELECT
C.name
FROM SYS.COLUMNS AS C
INNER JOIN SYS.TABLES AS T ON T.object_id = C.object_id
WHERE T.name = #pTableName
)
SELECT #pQuery = (
CASE #pIncludeCurVal
WHEN 0 THEN
(
STUFF(
(SELECT ', ' + name + ' = ' + #pQueryFirstPart + #pQueryLastPart FROM cte_Temp FOR XML PATH('')), 1, 2, ''
)
)
ELSE
(
STUFF(
(SELECT ', ' + name + ' = ' + #pQueryFirstPart + name + #pQueryLastPart FROM cte_Temp FOR XML PATH('')), 1, 2, ''
)
) END)
RETURN 'UPDATE ' + #pTableName + ' SET ' + #pQuery
END
2. Use it like this
DECLARE #pQuery VARCHAR(8000) = dte.getCleanUpdateQuery(<your table name>, <query part before current value>, <query part after current value>, <1 if current value is used. 0 if updating everything to a static value>);
EXEC (#pQuery)
Example 1: make all employees columns 'Unknown' (you need to make sure column type matches the intended value:
DECLARE #pQuery VARCHAR(8000) = dte.getCleanUpdateQuery('employee', '', 'Unknown', 0);
EXEC (#pQuery)
Example 2: Remove an undesired text qualifier (e.g. #)
DECLARE #pQuery VARCHAR(8000) = dte.getCleanUpdateQuery('employee', 'REPLACE(', ', ''#'', '''')', 1);
EXEC (#pQuery)
This query can be improved. This is just the one I saved and sometime I use. You get the idea.

Similar to an upsert, you could check if the item exists on the table, if so, delete it and insert it with the new values (technically updating it) but you would lose your rowid if that's something sensitive to keep in your case.
Behold, the updelsert
IF NOT EXISTS (SELECT * FROM Employee WHERE ID = #SomeID)
INSERT INTO Employee VALUES(#SomeID, #Your, #Vals, #Here)
ELSE
DELETE FROM Employee WHERE ID = #SomeID
INSERT INTO Employee VALUES(#SomeID, #Your, #Vals, #Here)

you could do it by deleting the column in the table and adding the column back in and adding a default value of whatever you needed it to be. then saving this will require to rebuild the table

Update multiple columns by loop?

I have a select statement which I want to convert into an update statement for all the columns in the table which have the name Variable[N].
For example, I want to do these things:
I want to be able to convert the SQL below into an update statement.
I have n columns with the name variable[N]. The example below only updates column variable63, but I want to dynamically run the update on all columns with names variable1 through variableN without knowing how many variable[N] columns I have in advance. Also, in the example below I get the updated result into NewCol. I actually want to update the respective variable column with the results if possible, variable63 in my example below.
I want to have a wrapper that loops over column variable1 through variableN and perform the same respective update operation on all those columns:
SELECT
projectid
,documentid
,revisionno
,configurationid
,variable63
,ISNULL(Variable63,
(SELECT TOP 1
variable63
FROM table1
WHERE
documentid = t.documentid
and projectid=t.projectid
and configurationid=t.configurationid
and cast(revisionno as int) < cast(t.revisionno as int)
AND Variable63 is NOT NULL
ORDER BY
projectid desc
,documentid desc
,revisionno desc
,configurationid desc
)) as NewCol
FROM table1 t;

There's no general way to loop through variables in SQL, you're supposed to know exactly what you want to modify. In some databases, it will be possible to query system tables to dynamically build an update statement (I know how to do that in InterBase and it's decessor Firebird), but you haven't told us anything which database engine you're using.
Below is a way you could update several fields that are null, COALESCE and CASE are two way of doing the same thing, as is using LEFT JOIN or NOT EXISTS. Use the ones you and your database engine is most comfortable with. Beware that all records will be updated, so this is not a good solution if your database contains millions of records, each record is large and you want this query to be executed lots of times.
UPDATE table1 t
SET t.VARIABLE63 =
COALESCE(t.VARIABLE63,
(SELECT VARIABLE63
FROM table1 t0
LEFT JOIN table1 tNot
ON tNot.documentid = t.documentid
AND tNot.projectid=t.projectid
AND tNot.configurationid=t.configurationid
AND cast(tNot.revisionno as int) > cast(t0.revisionno as int)
AND cast(tNot.revisionno as int) < cast(t.revisionno as int)
AND tNot.Variable63 is NOT NULL
WHERE t0.documentid = t.documentid
AND t0.projectid=t.projectid
AND t0.configurationid=t.configurationid
AND cast(t0.revisionno as int) < cast(t.revisionno as int)
AND t0.Variable63 is NOT NULL
AND tNot.Variable63 is NULL)),
t.VARIABLE64 = CASE WHEN t.VARIABLE64 IS NOT NULL then t.VARIABLE64
ELSE (SELECT VARIABLE64
FROM table1 t0
WHERE t0.documentid = t.documentid
AND t0.projectid=t.projectid
AND t0.configurationid=t.configurationid
AND cast(t0.revisionno as int) < cast(t.revisionno as int)
AND t0.Variable64 is NOT NULL
AND NOT EXISTS(SELECT 1
FROM table1 tNot
WHERE tNot.documentid = t.documentid
AND tNot.projectid=t.projectid
AND tNot.configurationid=t.configurationid
AND cast(tNot.revisionno as int) > cast(t0.revisionno as int)
AND cast(tNot.revisionno as int) < cast(t.revisionno as int)
AND tNot.Variable64 is NOT NULL)) END

OK I think I got it. Function that loops through columns and runs an update command per column.
DECLARE #sql NVARCHAR(1000),
#cn NVARCHAR(1000)--,
--#r NVARCHAR(1000),
--#start INT
DECLARE col_names CURSOR FOR
SELECT column_name
FROM information_schema.columns
WHERE table_name = 'PIVOT_TABLE'
ORDER BY ordinal_position
--SET #start = 0
DECLARE #op VARCHAR(max)
SET #op=''
OPEN col_names FETCH next FROM col_names INTO #cn
WHILE ##FETCH_STATUS = 0
BEGIN
--print #cn
IF UPPER(#cn)<> 'DOCUMENTID' and UPPER(#cn)<> 'CONFIGURATIONID' and UPPER(#cn)<> 'PROJECTID' and UPPER(#cn)<> 'REVISIONNO'
BEGIN
SET #sql = 'UPdate pt
set pt.' + #cn + ' = ((SELECT TOP 1 t.' + #cn + ' FROM pivot_table t WHERE t.documentid = pt.documentid and t.projectid=pt.projectid
and t.configurationid=pt.configurationid and cast(t.revisionno as int) < cast(pt.revisionno as int) AND t.' + #cn + ' is NOT NULL
ORDER BY revisionno desc)) from PIVOT_TABLE pt where pt.' + #cn + ' is NULL;'
EXEC Sp_executesql
#sql
--print #cn
END
FETCH next FROM col_names INTO #cn
END
CLOSE col_names
DEALLOCATE col_names;

Pass EXEC statement to APPLY as a parameter

I have a need to grab data from multiple databases which has tables with the same schema. For this I created synonyms for this tables in the one of the databases. The number of databases will grow with time. So, the procedure, which will grab the data should be flexible. I wrote the following code snippet to resolve the problem:
WHILE #i < #count
BEGIN
SELECT #synonymName = [Name]
FROM Synonyms
WHERE [ID] = #i
SELECT #sql = 'SELECT TOP (1) *
FROM [dbo].[synonym' + #synonymName + '] as syn
WHERE [syn].[Id] = tr.[Id]
ORDER BY [syn].[System.ChangedDate] DESC'
INSERT INTO #tmp
SELECT col1, col2
FROM
(
SELECT * FROM TableThatHasRelatedDataFromAllTheSynonyms
WHERE [Date] > #dateFrom
) AS tr
OUTER APPLY (EXEC(#sql)) result
SET #i = #i + 1
END
I also appreciate for any ideas on how to simplify the solution.

Actually, it's better to import data from all tables into one table (maybe with additional column for source table name) and use it. Importing can be performed through SP or SSIS package.
Regarding initial question - you can achieve it through TVF wrapper for exec statement (with exec .. into inside it).
UPD: As noticed in the comments exec doesn't work inside TVF. So, if you really don't want to change DB structure and you need to use a lot of tables I suggest to:
OR select all data from synonym*** table into variables (as I see you select only one row) and use them
OR prepare dynamic SQL for complete statement (with insert, etc.) and use temporary table instead of table variable here.

My solution is quite simple. Just to put all the query to the string and exec it. Unfortunately it works 3 times slower than just copy/past the code for all the synonyms.
WHILE #i < #count
BEGIN
SELECT #synonymName = [Name]
FROM Synonyms
WHERE [ID] = #i
SELECT #sql = 'SELECT col1, col2
FROM
(
SELECT * FROM TableThatHasRelatedDataFromAllTheSynonyms
WHERE [Date] > ''' + #dateFrom + '''
) AS tr
OUTER APPLY (SELECT TOP (1) *
FROM [dbo].[synonym' + #synonymName + '] as syn
WHERE [syn].[Id] = tr.[Id]
ORDER BY [syn].[System.ChangedDate] DESC) result'
INSERT INTO #tmp
EXEC(#sql)
SET #i = #i + 1
END

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Export data from a non-normalized database - sql

Well, it seems that a "clean" (and much faster!) solution is the UNPIVOT function. I found a very good example here: http://pratchev.blogspot.it/2009/02/unpivoting-multiple-columns.html

Related

SQL return values if row count > X

simplify if statement with stored procedure

How to UPDATE all columns of a record without having to list every column

Update multiple columns by loop?

Pass EXEC statement to APPLY as a parameter

Categories

Resources