Data extract and import from CSV with foreign keys - Postgresql - sql

I have a multi-tenant database. I need to extract a single tenant's data from one database and insert it into another.
I have two tables: users and identities.
The users table has a foreign key, identity_id, that references the identities table.
A customer can have many identities and users.
I extract the data to a CSV file and insert it into the new database from that file.
The primary keys auto-increment, so the users and identities tables generate new ids when the CSV data is inserted.
Table data from existing database
Users table
| id | identity_id |
| --- | ------------|
| 86 | 70 |
| 193 | 127 |
| 223 | 131 |
Identities table
|id |name |email |
|---|------------|-----------------|
|70 |Alon muscle |muscle@test.com |
|131|james |james@james.com |
|127|watson |watson@watson.com|
identity_id is the foreign key in the users table that maps to the identities table.
I am trying to insert the users and identities data into the new database.
The primary keys will be auto-incremented for both users and identities, so the new ids will not match the old ones.
That is where the foreign key becomes a problem.
How can I maintain the foreign key relationship when I have multiple users and identities records?

You did not provide the actual table definitions (DDL) or the CSV contents, which I assume your stage table mirrors. However, with the test data provided and a couple of assumptions, the following demonstrates a method to load your data. The method is to build a procedure that uses the stage table to load the identities table, then looks up each generated id by the email provided to populate the users table. Assumptions:
email must be unique in identities (at least in lower case).
the stage table holds name and email for identities.
Procedure to load identities and users:
create or replace procedure generate_user_idents()
language sql
as $$
-- low_email is assumed to be a unique column holding lower(email)
insert into identities(name, email)
select name, email
from stage
on conflict (low_email)
do nothing ;

-- one users row per staged email, linked to the id generated above
insert into users(ident_id)
select ident.ident_id
from identities ident
where ident.low_email in
( select lower(email)
from stage
) ;
$$;
Script to clear and repopulate stage data then load stage to identities and users.
do $$
begin
    execute 'truncate table stage';
    -- replace the following with your \copy to load stage
    insert into stage(name, email)
    values ( 'Alon muscle', 'muscle@test.com' )
         , ( 'watson', 'watson@watson.com')
         , ( 'james', 'james@james.com' );
    call generate_user_idents();
end ;
$$;
See demo here. Since the demo generates the ids, it does not exactly match your provided values, but close. As it stands the procedure would be happy generating duplicates should you fail to clear the stage table or reenter the same values into it. You have to decide how to handle that.
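If the CSV export also carries the old ids, another option is to build an old-id to new-id map while loading identities and use it to rewrite identity_id when loading users. The following is a minimal sketch of that idea, not the method from the answer above. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs anywhere; the CSV layout, the import_tenant helper, and the pre-existing row are assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical CSV exports from the old database, old ids included.
IDENTITIES_CSV = """id,name,email
70,Alon muscle,muscle@test.com
131,james,james@james.com
127,watson,watson@watson.com
"""
USERS_CSV = """id,identity_id
86,70
193,127
223,131
"""


def import_tenant(conn, identities_csv, users_csv):
    """Load both CSVs, letting the target database assign fresh ids."""
    old_to_new = {}  # old identities.id -> newly generated identities.id
    for row in csv.DictReader(io.StringIO(identities_csv)):
        cur = conn.execute(
            "INSERT INTO identities(name, email) VALUES (?, ?)",
            (row["name"], row["email"]),
        )
        old_to_new[int(row["id"])] = cur.lastrowid
    for row in csv.DictReader(io.StringIO(users_csv)):
        # Rewrite the foreign key through the map before inserting.
        conn.execute(
            "INSERT INTO users(identity_id) VALUES (?)",
            (old_to_new[int(row["identity_id"])],),
        )
    conn.commit()
    return old_to_new


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE identities (
        id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, email TEXT UNIQUE);
    CREATE TABLE users (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        identity_id INTEGER NOT NULL REFERENCES identities(id));
    -- A pre-existing row, so the new ids visibly differ from the old ones.
    INSERT INTO identities(name, email) VALUES ('existing', 'existing@example.com');
""")
mapping = import_tenant(conn, IDENTITIES_CSV, USERS_CSV)
```

In PostgreSQL itself the same map can be collected with INSERT ... RETURNING id, or by joining the staged rows to identities on the unique email as the procedure above does.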

Related

Get back the id of each insert in SQL Server

Let's say we want to insert two users and I want to know the userId of each record I inserted.
Example:
Db:
User.lookup database with these columns:
UserId(PK, identity) | Username
Setup, insert two users:
declare @users table (uniqueId INT, name nvarchar(100));
insert into @users values (0, 'TestUser'); -- Two users with the same name; they'll get different userIds in the db.
insert into @users values (1, 'TestUser'); -- uniqueId is just an autonumber I use to tell them apart.
Insert statement:
insert into user.lookup (userName)
output inserted.userid
select name from @users;
This will return two userIds, for example 1 and 2. But how do I know which of the two users got which userId?
I can tell them apart in code by the uniqueId I pass in, but I don't know how to get it returned.
Don't just output the id. You can include other columns:
insert into user.lookup (userName)
output inserted.*
select name from @users;
Here is a db<>fiddle.
You can't correlate the inserted rows with the database-assigned IDs, at least not without inserting an alternate key as well. INSERT ... OUTPUT will not let you output a row that wasn't actually inserted, so the column that correlates the un-keyed rows with the new key values has to be actually inserted.
So the options are:
Use a SEQUENCE instead of IDENTITY and either assign IDs to the table variable before the insert, or assign IDs to the entities on the client, e.g. by calling sp_sequence_get_range.
Use MERGE instead of INSERT. This is what Entity Framework Core does. See eg The Case of Entity Framework Core’s Odd SQL
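The point about inserting an alternate key can be sketched as follows. This is an illustration only, written against Python's built-in sqlite3 module rather than SQL Server; the clientKey column and the insert_users helper are hypothetical names, not part of the original schema.

```python
import sqlite3
import uuid


def insert_users(conn, names):
    """Insert one row per name; return [(name, correlation key, userId)]."""
    # Give every pending row its own client-generated key (an assumption:
    # the table has a spare unique column to hold it).
    pending = [(name, uuid.uuid4().hex) for name in names]
    conn.executemany(
        "INSERT INTO lookup(userName, clientKey) VALUES (?, ?)", pending
    )
    conn.commit()
    result = []
    for name, key in pending:
        # The key was really inserted, so it can be used to read the id back.
        (user_id,) = conn.execute(
            "SELECT userId FROM lookup WHERE clientKey = ?", (key,)
        ).fetchone()
        result.append((name, key, user_id))
    return result


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lookup ("
    " userId INTEGER PRIMARY KEY AUTOINCREMENT,"
    " userName TEXT NOT NULL,"
    " clientKey TEXT NOT NULL UNIQUE)"
)
inserted = insert_users(conn, ["TestUser", "TestUser"])
```

Because each row carries its own key, two users with the same name can still be told apart after the insert.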
As Gordon explained, one can output more than 1 column.
But just to put my 2 cents in, such insert doesn't really need an intermediate table variable.
create table lookup (
lookupId int identity primary key,
userName nvarchar(100),
createdOn datetime2 not null
default sysdatetime()
)
GO
insert into lookup (userName) values
('TestUser1')
,('TestUser2')
;
GO
2 rows affected
insert into lookup (userName)
output inserted.lookupId, inserted.userName
values
('Testuser3'),
('Testuser3')
GO
lookupId | userName
-------: | :--------
3 | Testuser3
4 | Testuser3
select lookupId, userName
--, convert(varchar,createdOn) as createdOn
from lookup
order by lookupId
GO
lookupId | userName
-------: | :--------
1 | TestUser1
2 | TestUser2
3 | Testuser3
4 | Testuser3
db<>fiddle here

how to insert a data into many tables

For example, I have 3 tables.
Table name: Role
Table attributes: roleid (pk), rolename
Role to user: one to many
Table name: user
Table attributes: roleid (fk), userid (pk), trackingid (fk), username, password, email
User to tracking: one to one
Table name: tracking
Table attributes: trackingid (pk), approvalstatus*, status, createdby, createdDate (yyyy-mm-dd).
*Meaning of attributes:
approvalstatus - an admin approves every change, so it can be pending, approved or rejected
status - indicates whether the change request is a new user, an edit, or a user deletion.
How do I write an INSERT INTO statement to insert a new user for approval? After the insert, the data in the database should look like this:
+----------+----------+--------------------+----------+--------+----------------+-----------+-------------+
| username | password | email | rolename | status | approvalstatus | createdby | createdDate |
+----------+----------+--------------------+----------+--------+----------------+-----------+-------------+
| harry | password | harry@yahoo.com.sg | Admin | New | Pending | Barry | 2016-09-20 |
+----------+----------+--------------------+----------+--------+----------------+-----------+-------------+
This really depends on how you're interacting with the Database.
If you're using an ORM like Entity Framework or NHibernate, this comes out of the box depending on how you map your tables.
For information on this please view:
Entity Framework
NHibernate
If you're doing this direct to SQL Server you can use the following:
Stored Procedure - In this case you'll call a single stored procedure on your DB Server, which will do three inserts for you based on your inputs.
You can perform 3 operations to the DB using a single connection.
Regardless of whether you pick a stored procedure or manual statements, your inserts will look like the following. Tracking is inserted before User because User holds a foreign key to it:
INSERT INTO Role (RoleID, RoleName) VALUES (newID(), 'Your Role Name'); -- If your PK auto-increments, you can omit RoleID. Most systems I work with now use GUIDs for IDs, so this is just an example.
INSERT INTO Tracking (TrackingID, ApprovalStatus, Status, CreatedBy, CreatedDate) VALUES (....);
INSERT INTO [User] (RoleID, UserID, TrackingID, UserName, Password, Email) VALUES (....);
Once you have an entry in your DB you can update it using:
UPDATE Tracking SET ApprovalStatus = 'whatever you want here' WHERE TrackingID = X
If you need to maintain a history of tracking, then rather than updating Tracking you need to insert a new row, and always make sure that when you SELECT your data you get the latest row based on the DateTime stamp.
The table in your question is misleading. You're joining three tables to get those results, which is maybe what you want in your output.
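The "three inserts over a single connection" option could look roughly like the sketch below. It uses Python's built-in sqlite3 module instead of SQL Server so that it is runnable as-is; the schema is reconstructed from the question, and the request_new_user helper is a hypothetical name. The inserts run in one transaction, so a failure leaves no half-created user behind.

```python
import sqlite3

# Schema reconstructed from the question; column types are assumptions.
SCHEMA = """
CREATE TABLE role (roleid INTEGER PRIMARY KEY, rolename TEXT UNIQUE);
CREATE TABLE tracking (
    trackingid INTEGER PRIMARY KEY,
    approvalstatus TEXT, status TEXT, createdby TEXT, createddate TEXT);
CREATE TABLE user (
    userid INTEGER PRIMARY KEY,
    roleid INTEGER NOT NULL REFERENCES role(roleid),
    trackingid INTEGER NOT NULL REFERENCES tracking(trackingid),
    username TEXT, password TEXT, email TEXT);
"""


def request_new_user(conn, username, password, email, rolename,
                     created_by, created_date):
    """Insert role (if missing), tracking and user rows; return the userid."""
    with conn:  # commits on success, rolls back if any statement fails
        conn.execute(
            "INSERT OR IGNORE INTO role(rolename) VALUES (?)", (rolename,))
        (role_id,) = conn.execute(
            "SELECT roleid FROM role WHERE rolename = ?", (rolename,)
        ).fetchone()
        # Parent row first: user.trackingid must point at an existing row.
        tracking_id = conn.execute(
            "INSERT INTO tracking(approvalstatus, status, createdby, createddate)"
            " VALUES ('Pending', 'New', ?, ?)",
            (created_by, created_date),
        ).lastrowid
        # The password is stored as given only to mirror the question.
        return conn.execute(
            "INSERT INTO user(roleid, trackingid, username, password, email)"
            " VALUES (?, ?, ?, ?, ?)",
            (role_id, tracking_id, username, password, email),
        ).lastrowid


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)
user_id = request_new_user(conn, "harry", "password", "harry@yahoo.com.sg",
                           "Admin", "Barry", "2016-09-20")
# The flat row shown in the question is a three-table join.
row = conn.execute(
    "SELECT u.username, u.password, u.email, r.rolename, t.status,"
    " t.approvalstatus, t.createdby, t.createddate"
    " FROM user u"
    " JOIN role r ON r.roleid = u.roleid"
    " JOIN tracking t ON t.trackingid = u.trackingid"
    " WHERE u.userid = ?",
    (user_id,),
).fetchone()
```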

error while creating a sqlite trigger

I have these tables:
+-------------------------+
| Movies |
+-----+--------+----------+
| _id | title | language |
+--+--+--------+----------+
|
+-------------------------+
|
+----------------------------------+
| Movie_Cast |
+------------+----------+----------+
| id | actor_id | movie_id |
+------------+-----+----+----------+
|
+-------------+
|
+-------------------+
| Actors |
+----------+--------+
| actor_id | name |
+----------+--------+
What i'm trying to do is to delete a movies row, delete also the related rows from the junction table (movie_cast). And finally delete, from actors table, all the rows that are not referenced in the movie_cast table.
this is the tables schema:
create table movies (_id INTEGER PRIMARY KEY, title TEXT, language TEXT);
create table movie_cast (id INTEGER PRIMARY KEY,
actor_id INTEGER REFERENCES actors(actor_id) ON DELETE RESTRICT,
movie_id INTEGER REFERENCES movies(_id) ON DELETE CASCADE);
create table actors (actor_id INTEGER PRIMARY KEY, actor TEXT UNIQUE);
Right now, when the user deletes a movies entry, the movie_cast rows referencing that movies._id are also deleted. (I had some trouble with that, but then I used "PRAGMA foreign_keys = ON;".) So far so good! To delete the actors rows, I thought I could create a trigger that tries to delete actors entries based on the movie_cast.actor_id we just deleted, but since I'm using "ON DELETE RESTRICT", the delete would be aborted if the actor still has a reference.
But i'm not even being able to prove it, because i'm getting an error when creating the trigger:
CREATE TRIGGER Delete_Actors AFTER DELETE ON movie_cast
FOR EACH ROW
BEGIN
DELETE FROM actors WHERE actors.actor_id = OLD.actor_id;
END;
SQL Error [1]: [SQLITE_ERROR] SQL error or missing database (near "actor_id": syntax error)
[SQLITE_ERROR] SQL error or missing database (near "actor_id": syntax error)
It seems, it doesn't know what OLD is. What am i doing wrong here?
UPDATE:
It looks like a sqlite configuration problem. I'm using DBeaver SQLite 3.8.2, and it seems to be a problem with the temporary file, but to be honest i don't know how to fix it even reading the possible solution:
It's failing to create the temporary file required for a statement journal.
It's likely any statement that uses a temporary file will fail.
http://www.sqlite.org/tempfiles.html
One way around the problem would be to configure SQLite not to use
temp files using "PRAGMA temp_store = memory;".
Or ensure that the environment variable SQLITE_TMPDIR is set to
the path of a writable directory. See also:
http://www.sqlite.org/c3ref/temp_directory.html
So, I am going to assume it works and try it directly on my android app.
It was something really s****p. For DBeaver, trigger creation is a complex statement, and delimiters were not working either, so I had to select the whole statement and then press Ctrl+Enter.
Anyway, the statement works. For better results, though, I removed "ON DELETE RESTRICT" from movie_cast.actor_id and created a conditional trigger that deletes from the actors table only when no movie_cast rows remain with the actor_id that was just deleted (OLD):
CREATE TRIGGER Delete_Actors
AFTER DELETE ON movie_cast
FOR EACH ROW
WHEN (SELECT count(id) FROM movie_cast WHERE actor_id = OLD.actor_id) = 0
BEGIN
DELETE FROM actors WHERE actors.actor_id = OLD.actor_id;
END;
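To check the conditional trigger outside DBeaver, a small script such as the following can be used. It is a sketch using Python's built-in sqlite3 module; the schema follows the question (without ON DELETE RESTRICT), and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for ON DELETE CASCADE
conn.executescript("""
CREATE TABLE movies (_id INTEGER PRIMARY KEY, title TEXT, language TEXT);
CREATE TABLE actors (actor_id INTEGER PRIMARY KEY, actor TEXT UNIQUE);
CREATE TABLE movie_cast (
    id INTEGER PRIMARY KEY,
    actor_id INTEGER REFERENCES actors(actor_id),
    movie_id INTEGER REFERENCES movies(_id) ON DELETE CASCADE);

CREATE TRIGGER Delete_Actors
AFTER DELETE ON movie_cast
FOR EACH ROW
WHEN (SELECT count(id) FROM movie_cast WHERE actor_id = OLD.actor_id) = 0
BEGIN
    DELETE FROM actors WHERE actors.actor_id = OLD.actor_id;
END;

-- Made-up sample data: one actor appears in both movies, one only in movie 1.
INSERT INTO movies VALUES (1, 'Movie A', 'en'), (2, 'Movie B', 'en');
INSERT INTO actors VALUES (10, 'Shared Actor'), (11, 'Only In A');
INSERT INTO movie_cast(actor_id, movie_id) VALUES (10, 1), (10, 2), (11, 1);
""")

# Deleting the movie cascades to movie_cast, which fires the trigger per row.
conn.execute("DELETE FROM movies WHERE _id = 1")
conn.commit()
remaining = [r[0] for r in conn.execute(
    "SELECT actor FROM actors ORDER BY actor_id")]
```

After the delete, only the actor who is still cast in the other movie should remain in the actors table.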

SQL - keep values with UPDATE statement

I have a table "news" with 10 rows and columns (uid, id, registered_users, ...). Users can log in to my website (every registered user has a user id), and a user can subscribe to a news item on my website.
In SQL that means: I need to find the row in "news" with the news item's uid and add the current user's id to the column "registered_users".
INSERT INTO news (registered_users)
VALUES (user_id)
The INSERT statement has NO WHERE clause so i need the UPDATE clause.
UPDATE news
SET registered_users=user_id
WHERE uid=post_news_uid
But if more than one users subscribe to the same news the old user id in "registered_users" is lost....
Is there a way to keep the current values after an sql UPDATE statement?
I use PHP (mysql). The goal is this:
table "news" row 5 (uid) column "registered_users" (22,33,45)
--- 3 users have subscribed to the news with the uid 5
table "news" row 7 (uid) column "registered_users" (21,39)
--- 2 users have subscribed to the news with the uid 7
It sounds like you are asking to insert a new user, to change a row in news from:
5 22,33
and then user 45 signs up, and you get:
5 22,33,45
If I don't understand, let me know. The rest of this solution is an excoriation of this approach.
This is a bad, bad, bad way to store data. Relational databases are designed around tables that have rows and columns. Lists should be represented as multiple rows in a table, and not as string concatenated values. This is all the worse, when you have an integer id and the data structure has to convert the integer to a string.
The right way is to introduce a table, say NewsUsers, such as:
create table NewsUsers (
NewsUserId int identity(1, 1) primary key,
NewsId int not null,
UserId int not null,
CreatedAt datetime default getdate(),
CreatedBy varchar(255) default suser_sname()
);
I showed this syntax using SQL Server. The column NewsUserId is an auto-incrementing primary key for this table. The column NewsId is the news item (5 in your first example). The column UserId is the id of the user who signed up. The columns CreatedAt and CreatedBy are handy columns that I put in almost all my tables.
With this structure, you would handle your problem by doing:
insert into NewsUsers (NewsId, UserId)
select 5, <userid>;
You should create an additional table that maps users to the news items they have registered for, like:
create table user_news (user_id int, news_id int);
which looks like this:
----------------
| News | Users|
----------------
| 5 | 22 |
| 5 | 33 |
| 5 | 45 |
| 7 | 21 |
| ... | ... |
----------------
Then you can use multiple queries to first retrieve the news_id and the user_id and store them inside variables depending on what language you use and then insert them into the user_news.
The advantage is, that finding all users of a news is much faster, because you don't have to parse every single idstring "(22, 33, 45)"
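A minimal sketch of the mapping-table approach, using Python's built-in sqlite3 module rather than PHP and MySQL so that it runs without a server. The composite primary key and the subscribe and subscribers helpers are assumptions added for illustration; they are not part of the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_news ("
    " user_id INTEGER NOT NULL,"
    " news_id INTEGER NOT NULL,"
    # Assumption: one row per (news, user) pair, so repeats are rejected.
    " PRIMARY KEY (news_id, user_id))"
)


def subscribe(conn, user_id, news_id):
    """Add a subscription; subscribing twice is a no-op."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO user_news(user_id, news_id) VALUES (?, ?)",
            (user_id, news_id),
        )


def subscribers(conn, news_id):
    """Return the ids of all users subscribed to one news item."""
    rows = conn.execute(
        "SELECT user_id FROM user_news WHERE news_id = ? ORDER BY user_id",
        (news_id,),
    )
    return [r[0] for r in rows]


# The data from the question; the last pair repeats an existing subscription.
for user_id, news_id in [(22, 5), (33, 5), (45, 5), (21, 7), (39, 7), (45, 5)]:
    subscribe(conn, user_id, news_id)
```

Each new subscription is an INSERT of one row, so earlier subscribers are never overwritten. In MySQL, INSERT IGNORE plays the role that INSERT OR IGNORE plays here.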
It sounds like you want to INSERT with a SELECT statement.
Example:
INSERT INTO tbl_temp2 (fld_id)
SELECT tbl_temp1.fld_order_id
FROM tbl_temp1
WHERE tbl_temp1.fld_order_id > 100;

Postgres: Is there a way to tie a User to a Schema?

In our database we have users: A, B, C.
Each user has its own corresponding schema: A, B, C.
Normally if I wanted to select from a table in one of the schemas I would have to do:
select * from A.table;
My question is:
Is there a way to make:
select * from table
go to the correct schema based on the user that is logged in?
This is the default behavior for PostgreSQL. Make sure your search_path is set correctly.
SHOW search_path;
By default it should be:
search_path
--------------
"$user",public
See PostgreSQL's documentation on schemas for more information. Specifically this part:
You can create a schema for each user with the same name as that user. Recall that the default search path starts with $user, which resolves to the user name. Therefore, if each user has a separate schema, they access their own schemas by default.
If you use this setup then you might also want to revoke access to the public schema (or drop it altogether), so users are truly constrained to their own schemas.
Update re your comment:
Here is what happens on my machine, which I believe is what you want.
skrall=# \d
No relations found.
skrall=# show search_path;
search_path
----------------
"$user",public
(1 row)
skrall=# create schema skrall;
CREATE SCHEMA
skrall=# create table test(id serial);
NOTICE: CREATE TABLE will create implicit sequence "test_id_seq" for serial column "test.id"
CREATE TABLE
skrall=# \d
List of relations
Schema | Name | Type | Owner
--------+-------------+----------+--------
skrall | test | table | skrall
skrall | test_id_seq | sequence | skrall
(2 rows)
skrall=# select * from test;
id
----
(0 rows)
skrall=#