How to verify table update and migrate data from another table - postgresql - sql

I have the following two tables in my Postgres database, shown with their column types.
user
userid | bigint (PK) NOT NULL
username | character varying(255)
businessname | character varying(255)
inbox
messageid | bigint (PK) NOT NULL
username | character varying(255)
businessname | character varying(255)
What I want to achieve is to add a new field called userRefId to the inbox table and fill it with the userid from the user table wherever username and businessname match in both tables.
These are the queries I use to do that.
ALTER TABLE inbox ADD userRefId bigint;
UPDATE inbox
SET userRefId = u.userid
from "user" u
WHERE u.username = inbox.username
AND u.businessname = inbox.businessname;
Now I want to verify that the data has been migrated correctly. What approaches can I take to achieve this? (Note: the username in inbox can be null.)
Would it be good enough to check that the result of
select count(*) from inbox where username is not null;
is equal to the result of
select count(userRefId) from inbox;

Is the data transferred correctly? First, the update looks correct, so you don't really need to worry.
You can get all rows in inbox whose userRefId does not match any user:
select i.* -- or count(*)
from inbox i
where not exists (select 1
                  from "user" u
                  where i.userRefId = u.userid
                 );
This doesn't mean that the update didn't work, just that those rows in inbox have no matching user.
Under the circumstances of your code, this is equivalent to:
select i.*
from inbox i
where i.userRefId is null;
although this would not pick up a userRefId set to a non-matching record (cosmic rays, anyone?).
You can also validate the additional fields used for matching:
select i.* -- or count(*)
from inbox i
where not exists (select 1
                  from "user" u
                  where i.userRefId = u.userid and
                        i.username = u.username and
                        i.businessname = u.businessname
                 );
However, all this checking seems unnecessary, unless you have triggers on the tables or known non-matched records.
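If you still want an explicit check, here is a minimal sketch that looks at the migration from the other direction: it lists inbox rows for which a matching user exists but userRefId is missing or different. It assumes the table and column names from the question (inbox, "user", userRefId) and nothing else. The second query is an extra check that is not in the original answer: if the pair (username, businessname) is not unique in "user", the UPDATE picks one of the matching users arbitrarily, so it is worth knowing whether such duplicates exist.

```sql
-- Rows that should have been filled but were not (or hold another id).
-- An empty result means every matchable inbox row points at its user.
select i.messageid, i.username, i.businessname, i.userRefId, u.userid
from inbox i
join "user" u
  on u.username = i.username
 and u.businessname = i.businessname
where i.userRefId is distinct from u.userid;

-- Duplicate match keys in "user"; any row here makes the UPDATE ambiguous.
select username, businessname, count(*) as matches
from "user"
group by username, businessname
having count(*) > 1;
```

Note that comparing count(*) where username is not null with count(userRefId) only holds if every non-null username in inbox actually has a matching user with the same businessname, so the two counts can differ even when the UPDATE did exactly what it should.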

Related

How to insert rows in a table depending on a different table?

I have two tables.
One user table containing the user_id, user_email, user_name
and other user_status table containing user_email, status.
The issue I am facing is that the user_status table is newly added and empty, while the user table is already in production. I want to add the rows to the status table without cleaning the db.
If the user_name is empty, then the status in the user_status table should be offline, otherwise online.
user_id | user_email    | user_name
1       | xyz@gmail.com | xyz
2       | abc@gmail.com |
If this is my user table and my user_status table is empty, then I want to update the user_status table as:
user_email    | status
xyz@gmail.com | active
abc@gmail.com | inactive
Use insert ... select with a conditional expression:
insert into user_status(user_email, status)
select user_email, case when user_name is null then 'offline' else 'online' end
from users
This assumes that by "empty" you mean null. If you really mean an empty string, then the condition in the case expression should be when user_name = '' instead.
Note that user is a language keyword in almost all databases, hence not a good choice for a table name. I renamed it to users in the query.
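If you are not sure whether "empty" means null or an empty string in your data, a small variant of the same statement can cover both. This is only a sketch under the same assumptions as above (a users table and a user_status table with the columns shown in the question); nullif turns an empty string into null so that one test handles both cases.

```sql
insert into user_status (user_email, status)
select user_email,
       -- nullif(x, '') yields null when x is the empty string
       case when nullif(user_name, '') is null then 'offline' else 'online' end
from users;
```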

Can a SQL database support "insert only" mode?

In recent years the "Insert Only" methodology has become more and more popular.
Those who use a SQL DB probably know that at high volume with a lot of update queries the DB locks rows and you start to hit a bottleneck. The Insert Only mode uses only inserts (no updates) and always retrieves the latest item from the DB.
The issue I'm facing is with the SELECT queries: there is a field that can be shared by multiple records in the DB, and if I query by it I never know whether I got only the latest records for that field (unless I use GROUP BY, and that will not be efficient).
Schema example:
Let's say I have the following schema:
CREATE TABLE users
(
id SERIAL NOT NULL
CONSTRAINT users_pkey
PRIMARY KEY,
first_name VARCHAR(255),
last_name VARCHAR(255),
username VARCHAR(255),
email VARCHAR(255),
password VARCHAR(255),
account_id INTEGER,
created_at TIMESTAMP NOT NULL
);
Now let's say I have the following users related to account number 1 (via account_id):
1. John Doe
2. Jain Doe
If I want to edit John Doe's last name in Insert Only mode, I insert a new record, and when I want to retrieve it I run the following query:
SELECT * from users WHERE email='jhon.doe@test.com' ORDER BY created_at Desc limit 1;
The issue is what to do if I want to retrieve all users of account 1. How can I avoid running a poor query with GROUP BY?
The following query returns 3 records although I have only 2 users:
SELECT * from users WHERE account_id=1;
The answer to your question is distinct on (in Postgres). However, it is unclear how you define a user. I would expect a user_id, but perhaps email is supposed to serve this purpose.
The query looks like:
select distinct on (email) u.*
from users u
where account_id = 1
order by email, created_at desc;
For performance, you want an index on users(account_id, email, created_at desc).
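The index mentioned above could be created roughly like this. The index name is made up for illustration; the column list is the one from the answer, chosen to match the WHERE and ORDER BY of the query.

```sql
-- Hypothetical index name; the columns follow the filter (account_id)
-- and then the sort order of the distinct on query.
create index users_account_email_created_idx
    on users (account_id, email, created_at desc);
```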

Database Schema for Claims Authentication

On a database I have the tables: USERS, USERS_PROFILES, USERS_CLAIMS.
create table dbo.USERS
(
Id int identity not null,
Username nvarchar (120) not null,
Email nvarchar (120) not null
);
create table dbo.USERS_PROFILES
(
Id int not null,
[Name] nvarchar (80) not null
);
create table dbo.USERS_CLAIMS
(
Id int not null,
[Type] nvarchar (200) not null,
Value nvarchar (200) not null,
);
I am using Claims authorization. When a user signs up, an Identity is created.
The identity contains claims and each claim has a type and a value:
UsernameType > Username from USERS
EmailType > Email from USERS
NameType > Name from USERS_PROFILES
RoleType > Directly from USERS_CLAIMS
So I am creating the Identity from many columns in 3 tables.
I ended up with this because I migrated to Claims Authentication.
QUESTION
Should I move the Username, Email and Name to USERS_CLAIMS?
The USERS_PROFILES table would disappear ...
And USERS table would contain only info like "UserId, LastLoginDate, CreatedDate, ..."
If I want to get a user by username I would just get the Claim of type username ...
If I want to sign in the user I just get all claims and create the identity.
So the Identity Model would be much more similar to the SQL tables.
Does this make sense? How would you design the tables?
Thank You,
Miguel
You are creating a key-value store. They are a nightmare to query in SQL. Consider the difficulty of querying user attributes by a value on the USERS_CLAIMS table. Example:
-- User with name and email, looked up by ID, with the current tables
SELECT u.Id, u.Username, p.[Name], u.Email, u.LastLoginDate
FROM dbo.USERS u
INNER JOIN dbo.USERS_PROFILES p ON p.Id = u.Id
WHERE u.Id = @UserID;
-- The same lookup with everything moved to the claims table.
-- Nothing guarantees there is only one email claim, so this could return multiple
-- rows for a single user.
SELECT u.Id, cUName.Value AS Username, cName.Value AS Name, cEmail.Value AS Email, u.LastLoginDate
FROM dbo.USERS u
LEFT OUTER JOIN dbo.USERS_CLAIMS cName ON u.Id = cName.Id AND cName.[Type] = 'http://schemas.xmlsoap.org/ws/2005/05/identity/claims/name'
LEFT OUTER JOIN dbo.USERS_CLAIMS cUName ON u.Id = cUName.Id AND cUName.[Type] = 'http://schemas.xmlsoap.org/ws/2005/05/identity/claims/privatepersonalidentifier'
LEFT OUTER JOIN dbo.USERS_CLAIMS cEmail ON u.Id = cEmail.Id AND cEmail.[Type] = 'http://schemas.xmlsoap.org/ws/2005/05/identity/claims/email'
WHERE u.Id = @UserID;
Can a user have multiple profiles? If not, there is no need for the "USERS_PROFILES" table. Keep the "Username" and "Email" columns on the "USERS" table. If you put them on the "USERS_CLAIMS" table, you would be storing redundant information anytime a user files a claim.
I am not sure what kind of tracking you'd like to have for your users, but I would recommend having a separate table that tracks when a user signs in. Something like this:
CREATE TABLE USERS_LOG (user_id INT, log_in DATETIME);
You can then get rid of the "LastLoginDate" on your "USERS" table and do a join to get the last time the user signed in. It'll give you more ways to track your users and you won't be creating blocks on your "USERS" table by updating it constantly.
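The join for the last sign-in could look something like the sketch below. It assumes the USERS_LOG table defined above and the dbo.USERS table from the question; the LastLogin alias is just an illustrative name. A left join keeps users who have never signed in, with a null LastLogin.

```sql
-- Latest sign-in per user, taken from the log table instead of a
-- LastLoginDate column on USERS.
SELECT u.Id, u.Username, MAX(l.log_in) AS LastLogin
FROM dbo.USERS u
LEFT JOIN USERS_LOG l ON l.user_id = u.Id
GROUP BY u.Id, u.Username;
```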

PostgreSQL - using to_tsvector with SELECT

I have been reading the PostgreSQL documentation and also bothering Google, although I am not sure what to look for, but it doesn't look like it is possible to create a tsvector out of a select.
Let me explain a bit. I have a table users and I added a tsvector column to it called tsv.
Column | Type | Modifiers
-------------------+-----------------------------+---------------------------------------------------
id | integer | not null default nextval('jobs_id_seq'::regclass)
username | character varying(255) | not null
tsv | tsvector |
Now every user has many articles.
What I want now is to store the article titles as a tsvector in the tsv column. Something like this:
UPDATE users SET tsv = to_tsvector(
SELECT string_agg(title)
FROM users INNER JOIN books
ON user.id = articles.user_id
GROUP BY user.id
)
So obviously the query would not work even without trying to make a tsvector out of a SELECT. It has basically 2 "problems"
Thank you very much in advance.
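-- Aggregate the titles per user in a derived table, then join it in the UPDATE.
-- string_agg needs a delimiter; a space keeps the titles as separate words.
UPDATE users
SET tsv = to_tsvector(s.titles)
FROM
(
    SELECT user_id, string_agg(title, ' ') AS titles
    FROM articles
    GROUP BY user_id
) s
WHERE users.id = s.user_id;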

Create view from table with multiple primary key

I've a table like this one:
Column | Type | Modifiers
username | character varying(12) | not null
electioncode | integer | not null
votes | integer | default 0
PRIMARY KEY (username, electioncode)
I need to create a view with username, electioncode, max(votes).
If I use this query it works fine, but without username:
SELECT electioncode, max(votes) from table group by electioncode;
If I add username it asks me to add it to the group by, but if I do that it gives me the entire table instead of just the username-electioncode-maxvotes.
Do you want to get username associated with this number of votes? Or any username in given election code?
If the first:
SELECT
DISTINCT ON ( electioncode )
*
FROM table
ORDER BY electioncode, votes desc;
If the latter:
SELECT
electioncode,
min(username),
max(votes)
FROM
table
GROUP BY electioncode;
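Since the question asks for a view, the first variant could be wrapped as in this sketch. The table name election_votes and the view name election_leaders are placeholders (the question only calls it "table"), and the tie-break on username is an assumption about what you want when two users have the same number of votes.

```sql
-- One row per electioncode: the username with the most votes.
-- nulls last matters because votes is nullable and desc sorts nulls first.
create view election_leaders as
select distinct on (electioncode)
       username, electioncode, votes as max_votes
from election_votes
order by electioncode, votes desc nulls last, username;
```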
Your username field seems to be unique. Every record has a different username (I am assuming), so when you group by username you get all the records back. What you are trying to do has a logic issue, not a syntax issue.
A suggestion: write down on a piece of paper the output you would like to see and then construct the query. If you want username, electioncode and max(votes), imagine how you would display the data when two usernames, user1 and user2, both have electioncode 001 and 1 vote each. How would you display this?