How to migrate data using a .sql script?

I'm struggling to migrate data using a .sql script. I'm quite new to SQL and trying to figure out how to migrate data purely in SQL. I want to add my old data to a new table, as new records with a different structure.
Here's my case: I have two old tables that I want to merge into a new, restructured table with additional columns. I'm stuck because I'm not used to writing conditionals in SQL.
The table prefixes are schemas.
Old tables
old.groups
id   group_name
10   Apex
11   Pred
12   Tor
old.sub_groups
parent_id   sub_group
10          sub-apex
11          sub-pred
11          sub-sub-pred
New table (expected migrated data):
public.new_groups (id is auto-incremented), freshly populated
id   group_name     level   parent_id
0    Apex           1       10
1    Pred           1       11
2    Tor            null    null
3    sub-apex       2       10
4    sub-pred       2       11
5    sub-sub-pred   2       11
I want to merge them with conditions, but I can't work out the SQL queries.
Condition 1: If old.groups.id has no match in old.sub_groups.parent_id, the group is inserted into public.new_groups with public.new_groups.level and public.new_groups.parent_id defaulting to null.
Condition 2: If old.groups.id has a match in old.sub_groups.parent_id, the group is also inserted into public.new_groups with level set to 1 (1 means parent group in my structure). In addition, the matching sub_groups are inserted as new records (see public.new_groups ids 3, 4 and 5) with level set to 2, and their parent_id is the parent_id from old.sub_groups, i.e. the id of the parent in old.groups.
This is my unfinished query. It only selects the data; the conditional logic and the update are missing, and I think it is also wrong:
INSERT INTO public.new_groups(
SELECT *, b.sub_group as group_name, b.parent_id FROM old.groups as a
LEFT JOIN old.sub_groups as b ON a.id = b.parent_id....
)

If you create your table like this:
CREATE TABLE new (
id SERIAL PRIMARY KEY ,
group_name VARCHAR(20),
level INTEGER,
parent_id INTEGER
);
You can copy the tables with this statement:
INSERT INTO new(group_name, level, parent_id)
SELECT DISTINCT
group_name,
CASE WHEN subgroups.parent_id IS NULL THEN NULL ELSE 1 END as level,
subgroups.parent_id
FROM old
LEFT JOIN subgroups ON old.id = subgroups.parent_id
UNION ALL
SELECT
sub_group,
2,
parent_id
FROM subgroups;
See: DBFIDDLE
The only difference is that my id starts at 1, not 0.
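For anyone who wants to try the statement without a PostgreSQL instance, here is a minimal sketch that runs the same INSERT ... SELECT against an in-memory SQLite database from Python. The table names old_groups, old_sub_groups and new_groups are stand-ins for old.groups, old.sub_groups and public.new_groups (SQLite has no schemas here), and AUTOINCREMENT stands in for SERIAL; both are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE old_groups (id INTEGER PRIMARY KEY, group_name TEXT);
CREATE TABLE old_sub_groups (parent_id INTEGER, sub_group TEXT);
-- stand-in for public.new_groups; AUTOINCREMENT plays the role of SERIAL
CREATE TABLE new_groups (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    group_name TEXT,
    level INTEGER,
    parent_id INTEGER
);
INSERT INTO old_groups VALUES (10, 'Apex'), (11, 'Pred'), (12, 'Tor');
INSERT INTO old_sub_groups VALUES
    (10, 'sub-apex'), (11, 'sub-pred'), (11, 'sub-sub-pred');
""")

# Same shape as the answer: parents via LEFT JOIN (DISTINCT collapses the
# duplicate rows a parent with several sub-groups would produce),
# then the sub-groups themselves with level 2.
conn.execute("""
INSERT INTO new_groups (group_name, level, parent_id)
SELECT DISTINCT
    g.group_name,
    CASE WHEN s.parent_id IS NULL THEN NULL ELSE 1 END,
    s.parent_id
FROM old_groups AS g
LEFT JOIN old_sub_groups AS s ON g.id = s.parent_id
UNION ALL
SELECT sub_group, 2, parent_id
FROM old_sub_groups
""")

migrated = conn.execute(
    "SELECT group_name, level, parent_id FROM new_groups"
).fetchall()
for row in migrated:
    print(row)
```

The generated ids depend on the order in which rows happen to be inserted, so add an ORDER BY (or insert parents and sub-groups in two statements) if the exact numbering matters.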

Related

union table, change serial primary key, postgresql

PostgreSQL:
I have two tables, 'abc' and 'xyz', in PostgreSQL. Both tables have an 'id' column of type 'serial primary key'.
The abc id column contains the values 1,2,3 and the xyz id column contains 1,2,3,4.
I want to combine both tables with 'union all', but I want the xyz id values to continue from the last abc id value, giving 1,2,3,4,5,6,7.
select id from abc
union all
select id from xyz
|id|
1
2
3
1
2
3
4
My desired result is:
|id|
1
2
3
4
5
6
7
BETTER - Thanks to @CaiusJard
This should do it for you
select id FROM abc
UNION ALL select x.id + a.maxid FROM xyz x,
(SELECT MAX(id) as maxid from abc) a
ORDER BY id
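A quick way to check the offset idea is to run it on the sample ids. The sketch below uses an in-memory SQLite database from Python rather than PostgreSQL; the explicit CROSS JOIN, the `AS id` alias and the COALESCE guard for an empty abc table are my additions, not part of the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE abc (id INTEGER PRIMARY KEY);
CREATE TABLE xyz (id INTEGER PRIMARY KEY);
INSERT INTO abc VALUES (1), (2), (3);
INSERT INTO xyz VALUES (1), (2), (3), (4);
""")

# Shift every xyz id by the largest abc id so the two ranges cannot overlap.
# COALESCE(..., 0) is an added guard: without it an empty abc would turn
# every shifted id into NULL.
ids = [row[0] for row in conn.execute("""
SELECT id FROM abc
UNION ALL
SELECT x.id + a.maxid AS id
FROM xyz AS x
CROSS JOIN (SELECT COALESCE(MAX(id), 0) AS maxid FROM abc) AS a
ORDER BY id
""")]
print(ids)
```

Note that this only renumbers the rows in the query result; the ids stored in xyz are unchanged.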
For anyone who's doing something like this:
I had a similar problem: table A and table B had two different serials. My solution was to create a new table C which was identical to table B except that it had an "oldid" column, and its id column was set to use the same sequence as table A. I then inserted all the data from table B into table C (putting the id in the oldid field). Once I had fixed the references to point to the (new) id instead of the oldid, I was able to drop the oldid column.
In my case I needed to fix the old relations and needed the ids to remain unique in the future (but I don't care whether the ids from table A all come before those from table C). Depending on what you're trying to accomplish, this approach may be useful.
If anyone is going to use this approach, strictly speaking, there should be a trigger to prevent someone from manually setting an id in one table to match another. You should also alter the sequence to be owned by NONE so that it's not dropped with table A, if table A is ever dropped.

Recursive Delete SQL Oracle

I'm looking for a way to do a recursive delete on a table.
The table has 3 foreign keys, one on itself and 2 on other tables, and I want to delete rows depending on the date of the occurrence.
Table1 --> Id1, dateOCC, ParentID
1, 13-12-26, null
2, 13-07-18, null
3, 14-12-31, 1
4, 13-06-26, 1
5, 14-07-23, null
6, 13-07-22, 2
Table2--> ID, stuff
Table3 --> ID, stuff
The IDs of Table2 and Table3 are linked directly to the ID of Table1.
Table1 contains approximately 20,000,000 rows, and the other tables contain approximately the same amount.
Here is one of the queries I tried (it is inside a cursor that deletes the returned rows):
SELECT EO.ID,
EO.DATEOCC,
EO.PARENTID
FROM TABLE1 EO
WHERE EO.DATEOCC <= TO_DATE ('2013-12-31','YYYY-MM-DD')
AND NOT EXISTS(SELECT 1 FROM TABLE2 WHERE ID = EO.ID)
AND NOT EXISTS( SELECT 1 FROM TABLE3 WHERE ID = EO.ID)
START WITH EO.PARENTID IS NULL
CONNECT BY PRIOR EO.ID = EO.PARENTID;
This query is really slow to output the data that I want.
It also seems that it does not return the data that I need to delete.
Edit #1
OK, so here's an example of what I need to do (in this example I assume that Table2 and Table3 have no matching IDs in Table1):
Table1 --> Id1, dateOCC, ParentID
1, 13-12-26, null
2, 13-07-18, null
3, 14-12-31, 1
4, 13-06-26, 1
5, 14-07-23, null
6, 13-07-22, 2
After the delete sequence the table has to look like this if the cutoff date is 13-12-31:
Table1 --> Id1, dateOCC, ParentID
1, 13-12-26, null
3, 14-12-31, 1
5, 14-07-23, null
So, as you can see, I delete each child that I can delete, together with its parent if possible. If I can't delete the parent because another child exists that I can't delete, I don't delete the parent (I delete only the children that I can).
In a hierarchical query, the WHERE clause is applied after the START WITH and CONNECT BY are used to build the hierarchy. But syntactically it comes first, which makes it intuitively seem that it will be applied first.
If what you really want is to apply the WHERE clause first, then build the hierarchy, you can use a subquery like this:
SELECT EO.ID,
EO.DATEOCC,
EO.PARENTID
FROM (
SELECT * FROM TABLE1 EO
WHERE EO.DATEOCC <= TO_DATE ('2013-12-31','YYYY-MM-DD')
AND NOT EXISTS(SELECT 1 FROM TABLE2 WHERE ID = EO.ID)
AND NOT EXISTS( SELECT 1 FROM TABLE3 WHERE ID = EO.ID)
) EO
START WITH EO.PARENTID IS NULL
CONNECT BY PRIOR EO.ID = EO.PARENTID;
But it is not clear whether that is what you want. This would give you the top-level parents within the desired date range, and without children in the other tables, then build the entire hierarchy for those parents. It's possible that lower nodes in the hierarchy would have children in the other tables, which would cause the delete to fail.
If that's not what you want, I think you need to describe your requirements more clearly.
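One possible reading of Edit #1 is: a row may be deleted only if it is on or before the cutoff date and nothing below it in the hierarchy has to be kept. The sketch below illustrates that reading with a recursive CTE in an in-memory SQLite database driven from Python. It is not Oracle CONNECT BY syntax, it ignores Table2 and Table3 (as Edit #1 does), and the ISO date strings and the lowercase table and column names are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, dateocc TEXT, parentid INTEGER);
INSERT INTO table1 VALUES
    (1, '2013-12-26', NULL),
    (2, '2013-07-18', NULL),
    (3, '2014-12-31', 1),
    (4, '2013-06-26', 1),
    (5, '2014-07-23', NULL),
    (6, '2013-07-22', 2);
""")

cutoff = "2013-12-31"

# "blocked" starts with every row that is too recent to delete and then
# walks upward, adding each ancestor, because a parent cannot go while one
# of its descendants has to stay. Everything not blocked is deleted.
with conn:
    conn.execute("""
    WITH RECURSIVE blocked(id, parentid) AS (
        SELECT id, parentid FROM table1 WHERE dateocc > :cutoff
        UNION
        SELECT p.id, p.parentid
        FROM table1 AS p
        JOIN blocked AS b ON p.id = b.parentid
    )
    DELETE FROM table1
    WHERE id NOT IN (SELECT id FROM blocked)
    """, {"cutoff": cutoff})

remaining = [row[0] for row in conn.execute("SELECT id FROM table1 ORDER BY id")]
print(remaining)
```

To take the other two tables into account under the same reading, the seed query of the CTE would also select rows that still have a match in Table2 or Table3.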

Insert data from one table to other using select statement and avoid duplicate data

Database: Oracle
I want to insert data from table 1 into table 2, but the catch is that the primary key of table 2 is the combination of the first 4 letters and the last 4 digits of the primary key of table 1.
For example:
Table 1 - primary key : abcd12349887/abcd22339887/abcder019987
In this case, even though the primary keys in table 1 are different, when I extract the first 4 and last 4 characters the output is the same: abcd9887
So, when I use a select to insert the data, I get a duplicate primary key error in table 2.
What I want is: if the primary key value is already present, don't add that record.
Here's my complete stored procedure:
INSERT INTO CPIPRODUCTFAMILIE
(productfamilieid, rapport, mesh, mesh_uitbreiding, productlabelid)
(SELECT DISTINCT (CONCAT(SUBSTR(p.productnummer,1,4),SUBSTR(p.productnummer,8,4)))
productnummer,
ps.rapport, ps.mesh, ps.mesh_uitbreiding, ps.productlabelid
FROM productspecificatie ps, productgroep pg,
product p left join cpiproductfamilie cpf
on (CONCAT(SUBSTR(p.productnummer,1,4),SUBSTR(p.productnummer,8,4))) = cpf.productfamilieid
WHERE p.productnummer = ps.productnummer
AND p.productgroepid = pg.productgroepid
AND cpf.productfamilieid IS NULL
AND pg.productietype = 'P'
**AND p.ROWID IN (SELECT MAX(ROWID) FROM product
GROUP BY (CONCAT(SUBSTR(productnummer,1,4),SUBSTR(productnummer,8,4))))**
AND (CONCAT(SUBSTR(p.productnummer,1,2),SUBSTR(p.productnummer,8,4))) not in
(select productfamilieid from cpiproductfamilie));
The highlighted section seems to be wrong, and because of it no data is being picked up.
Please help.
Try using this.
p.productnummer IN (SELECT MAX(productnummer) FROM product
GROUP BY (CONCAT(SUBSTR(productnummer,1,4),SUBSTR(productnummer,8,4))))
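The underlying idea (keep one source row per derived key, and skip keys that already exist in the target) can be tried on a small scale. This is a rough sketch in SQLite via Python, not Oracle; the family table, its source column and the sample row 'wxyz00001111' are hypothetical, and substr(x, -4) is used to take the last four characters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (productnummer TEXT PRIMARY KEY);
-- hypothetical target table: familyid is the derived 8-character key
CREATE TABLE family (familyid TEXT PRIMARY KEY, source TEXT);
INSERT INTO product VALUES
    ('abcd12349887'), ('abcd22339887'), ('wxyz00001111');
INSERT INTO family VALUES ('wxyz1111', 'already there');
""")

with conn:
    conn.execute("""
    INSERT INTO family (familyid, source)
    SELECT k.familyid, MAX(k.productnummer)   -- one representative per key
    FROM (
        SELECT productnummer,
               substr(productnummer, 1, 4) || substr(productnummer, -4) AS familyid
        FROM product
    ) AS k
    WHERE NOT EXISTS (                        -- skip keys already in the target
        SELECT 1 FROM family AS f WHERE f.familyid = k.familyid
    )
    GROUP BY k.familyid
    """)

families = conn.execute(
    "SELECT familyid, source FROM family ORDER BY familyid"
).fetchall()
print(families)
```

Grouping by the derived key guarantees at most one inserted row per key, and the NOT EXISTS check leaves existing rows untouched.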

TSQL Inserting records and track ID

I would like to insert records into the table below (table structure with example data). I have to use T-SQL to achieve this:
MasterCategoryID MasterCategoryDesc SubCategoryDesc SubCategoryID
1 Housing Elderly 4
1 Housing Adult 5
1 Housing Child 6
2 Car Engine 7
2 Car Engine 7
2 Car Window 8
3 Shop owner 9
So, for example, if I insert a new record with MasterCategoryDesc = 'Town', it should insert '4' as the MasterCategoryID, along with the respective SubCategoryDesc and ID.
Can I simplify this question by removing the SubCategoryDesc and SubCategoryID columns? How can I achieve this with just the two columns MasterCategoryID and MasterCategoryDesc?
INSERT into Table1
([MasterCategoryID], [MasterCategoryDesc], [SubCategoryDesc], [SubCategoryID])
select TOP 1
case when 'Town' not in (select [MasterCategoryDesc] from Table1)
then (select max([MasterCategoryID])+1 from Table1)
else (select [MasterCategoryID] from Table1 where [MasterCategoryDesc]='Town')
end as [MasterCategoryID]
,'Town' as [MasterCategoryDesc]
,'owner' as [SubCategoryDesc]
,case when 'owner' not in (select [SubCategoryDesc] from Table1)
then (select max([SubCategoryID])+1 from Table1)
else (select [SubCategoryID] from Table1 where [SubCategoryDesc]='owner')
end as [SubCategoryID]
from Table1
SQL FIDDLE
If you want, I can create a stored procedure too, but you said you want T-SQL.
This will take three steps, preferably in a single Stored Procedure. Make sure it's within a transaction.
a) Check if the MasterCategoryDesc you are trying to insert already exists. If so, take its ID. If not, find the highest MasterCategoryID, increase by one, and save it to a variable.
b) The same with SubCategoryDesc and SubCategoryID.
c) Insert the new record with the two variables you created in steps a and b.
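Steps a to c can be sketched as follows. This is only an illustration in Python with SQLite, not a T-SQL stored procedure; the helper names existing_or_next and insert_category are made up for the example, and the duplicate 'Engine' row from the sample data is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (
    MasterCategoryID INTEGER, MasterCategoryDesc TEXT,
    SubCategoryDesc TEXT, SubCategoryID INTEGER
);
INSERT INTO Table1 VALUES
    (1, 'Housing', 'Elderly', 4), (1, 'Housing', 'Adult', 5),
    (1, 'Housing', 'Child', 6), (2, 'Car', 'Engine', 7),
    (2, 'Car', 'Window', 8), (3, 'Shop', 'owner', 9);
""")

def existing_or_next(conn, id_col, desc_col, desc):
    """Return the id already used for desc, or max(id) + 1 if it is new."""
    # id_col and desc_col are fixed names chosen in this file, never user input.
    row = conn.execute(
        f"SELECT {id_col} FROM Table1 WHERE {desc_col} = ? LIMIT 1", (desc,)
    ).fetchone()
    if row is not None:
        return row[0]
    return conn.execute(
        f"SELECT COALESCE(MAX({id_col}), 0) + 1 FROM Table1"
    ).fetchone()[0]

def insert_category(conn, master_desc, sub_desc):
    # "with conn" commits on success and rolls back if anything raises.
    with conn:
        master_id = existing_or_next(
            conn, "MasterCategoryID", "MasterCategoryDesc", master_desc)  # step a
        sub_id = existing_or_next(
            conn, "SubCategoryID", "SubCategoryDesc", sub_desc)           # step b
        conn.execute(
            "INSERT INTO Table1 VALUES (?, ?, ?, ?)",
            (master_id, master_desc, sub_desc, sub_id),
        )                                                                 # step c
    return master_id, sub_id

new_master = insert_category(conn, "Town", "owner")
new_sub = insert_category(conn, "Car", "Door")
print(new_master, new_sub)
```

With several concurrent writers, the lookup and the insert would additionally need to be protected by locking so that two sessions cannot compute the same new id.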
Create a table for the MasterCategory and a table for the SubCategory. Make an ___ID column for each one that is identity (1,1). When loading, insert new rows for nonexistent values and then look up existing values for the INSERT.
Messing around with finding the Max and looking up data in the existing table is, in my opinion, a recipe for failure.

Selecting most recent and specific version in each group of records, for multiple groups

The problem:
I have a table, foo, that stores data rows. Each time a row is updated, a new row is inserted with a new revision number. The table looks like:
id rev field
1 1 test1
2 1 fsdfs
3 1 jfds
1 2 test2
Note: the last record is a newer version of the first row.
Is there an efficient way to query for the latest version of each record, and for a specific version of each record?
For instance, a query for rev=2 would return the 2nd, 3rd and 4th rows (but not the replaced 1st row), while a query for rev=1 would return the rows with rev <= 1 and, where an id appears more than once, the row with the highest revision number (records 1, 2 and 3).
I would prefer not to build the result iteratively.
To get only latest revisions:
SELECT * from t t1
WHERE t1.rev =
(SELECT max(rev) FROM t t2 WHERE t2.id = t1.id)
To get a specific revision, in this case 1 (if an item doesn't have that revision, its next lower revision is returned):
SELECT * from foo t1
WHERE t1.rev =
(SELECT max(rev)
FROM foo t2
WHERE t2.id = t1.id
AND t2.rev <= 1)
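To check the "revision as of N" query against the sample data from the question, here is a small sketch using an in-memory SQLite database from Python; the function name rows_as_of is just for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE foo (id INTEGER, rev INTEGER, field TEXT);
INSERT INTO foo VALUES
    (1, 1, 'test1'), (2, 1, 'fsdfs'), (3, 1, 'jfds'), (1, 2, 'test2');
""")

def rows_as_of(conn, rev):
    """For each id, return the row with the highest revision <= rev."""
    return conn.execute("""
        SELECT id, rev, field
        FROM foo AS t1
        WHERE t1.rev = (SELECT MAX(t2.rev)
                        FROM foo AS t2
                        WHERE t2.id = t1.id
                          AND t2.rev <= ?)
        ORDER BY id
    """, (rev,)).fetchall()

as_of_1 = rows_as_of(conn, 1)
as_of_2 = rows_as_of(conn, 2)
print(as_of_1)
print(as_of_2)
```

An index on (id, rev) should help the correlated subquery on larger tables.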
This might not be the most efficient approach, but right now I cannot think of a better one.
Here's an alternative solution that incurs an update cost but is much more efficient for reading the latest data rows as it avoids computing MAX(rev). It also works when you're doing bulk updates of subsets of the table. I needed this pattern to ensure I could efficiently switch to a new data set that was updated via a long running batch update without any windows of time where we had partially updated data visible.
Aging
Replace the rev column with an age column
Create a view of the current latest data with filter: age = 0
To create a new version of your data ...
INSERT: new rows with age = -1 - This was my slow long running batch process.
UPDATE: UPDATE table-name SET age = age + 1 for all rows in the subset. This switches the view to the new latest data (age = 0) and also ages older data in a single transaction.
DELETE: rows having age > N in the subset - Optionally purge old data
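The steps above can be sketched end to end on the sample rows from the question. This is a minimal illustration in Python with SQLite; the names foo_data (the base table) and foo (the view), the choice of "rows with id = 1" as the subset, and N = 1 are all assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE foo_data (id INTEGER, age INTEGER, field TEXT);
CREATE INDEX foo_data_age_id ON foo_data (age, id);   -- non-unique on purpose
CREATE VIEW foo AS SELECT id, field FROM foo_data WHERE age = 0;
INSERT INTO foo_data VALUES (1, 0, 'test1'), (2, 0, 'fsdfs'), (3, 0, 'jfds');
""")

def current(conn):
    return conn.execute("SELECT id, field FROM foo ORDER BY id").fetchall()

# INSERT: stage the new version with age = -1; the view does not show it yet.
conn.execute("INSERT INTO foo_data VALUES (1, -1, 'test2')")
while_staged = current(conn)

# UPDATE + DELETE in one transaction: age the subset (here: id = 1) so the
# staged row becomes age 0, then purge versions older than N = 1.
with conn:
    conn.execute("UPDATE foo_data SET age = age + 1 WHERE id = 1")
    conn.execute("DELETE FROM foo_data WHERE id = 1 AND age > 1")

after_switch = current(conn)
print(while_staged)
print(after_switch)
```

After the switch the previous version of id 1 is still in foo_data with age = 1, which is what makes the rollback described further down possible.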
Indexing
Create a composite index on age and then id so that the view will be nice and fast and can also be used to look up rows by id. Although this key is effectively unique, it's temporarily non-unique while you're ageing the rows (during UPDATE SET age=age+1), so you'll need to make it non-unique, and ideally make it the clustered index. If you need to find all versions of a given id ordered by age, you may need an additional non-unique index on id and then age.
Rollback
Finally ... Let's say you're having a bad day and the batch processing breaks. You can quickly revert to a previous data set version by running:
UPDATE table-name SET age = age - 1 -- Roll back a version
DELETE table-name WHERE age < 0 -- Clean up bad stuff
Existing Table
Suppose you have an existing table that now needs to support aging. You can use this pattern by first renaming the existing table, then add the age column and indexing and then create the view that includes the age = 0 condition with the same name as the original table name.
This strategy may or may not work depending on the nature of technology layers that depended on the original table but in many cases swapping a view for a table should drop in just fine.
Notes
I recommend naming the age column RowAge to indicate that this pattern is being used, since that makes it clearer that it's a database-related value and it complements SQL Server's RowVersion naming convention. It also won't conflict with a column or view that needs to return a person's age.
Unlike other solutions, this pattern works for non SQL Server databases.
If the subsets you're updating are very large then this might not be a good solution as your final transaction will update not just the current records but all past version of the records in this subset (which could even be the entire table!) so you may end up locking the table.
This is how I would do it. ROW_NUMBER() requires SQL Server 2005 or later
Sample data:
DECLARE @foo TABLE (
id int,
rev int,
field nvarchar(10)
)
INSERT @foo VALUES
( 1, 1, 'test1' ),
( 2, 1, 'fdsfs' ),
( 3, 1, 'jfds' ),
( 1, 2, 'test2' )
The query:
DECLARE @desiredRev int
SET @desiredRev = 2
SELECT * FROM (
SELECT
id,
rev,
field,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY rev DESC) rn
FROM @foo WHERE rev <= @desiredRev
) numbered
WHERE rn = 1
The inner SELECT returns all relevant records, and within each id group (that's the PARTITION BY), computes the row number when ordered by descending rev.
The outer SELECT just selects the first member (so, the one with highest rev) from each id group.
Output when @desiredRev = 2 :
id rev field rn
----------- ----------- ---------- --------------------
1 2 test2 1
2 1 fdsfs 1
3 1 jfds 1
Output when @desiredRev = 1 :
id rev field rn
----------- ----------- ---------- --------------------
1 1 test1 1
2 1 fdsfs 1
3 1 jfds 1
If you want all the latest revisions of each field, you can use
SELECT C.rev, C.field FROM (
SELECT MAX(A.rev) AS rev, A.id
FROM yourtable A
GROUP BY A.id)
AS B
INNER JOIN yourtable C
ON B.id = C.id AND B.rev = C.rev
In the case of your example, that would return
rev field
1 fsdfs
1 jfds
2 test2
SELECT
MaxRevs.id,
revision.field
FROM
(SELECT
id,
MAX(rev) AS MaxRev
FROM revision
GROUP BY id
) MaxRevs
INNER JOIN revision
ON MaxRevs.id = revision.id AND MaxRevs.MaxRev = revision.rev
SELECT foo.* from foo
left join foo as later
on foo.id=later.id and later.rev>foo.rev
where later.id is null;
How about this?
select id, max(rev), field from foo group by id
For querying specific revision e.g. revision 1,
select id, max(rev), field from foo where rev <= 1 group by id