Can a SQL database support "insert only" mode?

In recent years the "insert only" methodology has become more and more popular.
Those who use a SQL DB probably know that under high volume, with a lot of UPDATE queries, the DB locks rows and you start to hit a bottleneck. Insert-only mode means using only INSERTs (no UPDATEs) and always retrieving the latest version of a record.
The issue I'm facing is with SELECT queries. A field can be shared by multiple records, and if I query by it I can never be sure I got only the latest record for each entity (unless I use GROUP BY, which is not efficient).
Schema example:
Let's say I have the following schema:
CREATE TABLE users
(
id SERIAL NOT NULL
CONSTRAINT users_pkey
PRIMARY KEY,
first_name VARCHAR(255),
last_name VARCHAR(255),
username VARCHAR(255),
email VARCHAR(255),
password VARCHAR(255),
account_id INTEGER,
created_at TIMESTAMP NOT NULL
);
Now let's say I have the following users that are related to account number 1 (via account_id):
1. John Doe
2. Jane Doe
If I want to edit John Doe's last name in insert-only mode, I insert a new record, and when I want to retrieve it I run the following query:
SELECT * FROM users WHERE email = 'john.doe@test.com' ORDER BY created_at DESC LIMIT 1;
The issue is this: what do I do if I want to retrieve all users of account 1? How can I avoid running a poor query with GROUP BY?
The following query returns 3 records, although I have only 2 users:
SELECT * from users WHERE account_id=1;

The answer to your question is distinct on (in Postgres). However, it is unclear how you define a user. I would expect a user_id, but perhaps email is supposed to serve this purpose.
The query looks like:
select distinct on (email) u.*
from users u
where account_id = 1
order by email, created_at desc;
For performance, you want an index on users(account_id, email, created_at desc).
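SQLite has no DISTINCT ON, so the following is a minimal sketch of the same "latest row per user" idea using the ROW_NUMBER() window function (available in SQLite 3.25+ and in Postgres). It assumes, as the answer suggests, that email identifies a user. The sample rows and the function name `latest_users` are hypothetical.

```python
import sqlite3

# Hypothetical insert-only history: John Doe's last name was "edited" by inserting a new row.
ROWS = [
    ("John", "Doe", "john.doe@test.com", 1, "2020-01-01 10:00:00"),
    ("Jane", "Doe", "jane.doe@test.com", 1, "2020-01-01 10:05:00"),
    ("John", "Smith", "john.doe@test.com", 1, "2020-02-01 09:00:00"),
]

# Keep only the newest row per email, like Postgres'
# DISTINCT ON (email) ... ORDER BY email, created_at DESC.
LATEST_PER_USER = """
SELECT first_name, last_name, email
FROM (
    SELECT u.*,
           ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at DESC) AS rn
    FROM users u
    WHERE account_id = ?
)
WHERE rn = 1
ORDER BY email
"""


def latest_users(account_id):
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE users (
            id INTEGER PRIMARY KEY,
            first_name TEXT, last_name TEXT, email TEXT,
            account_id INTEGER, created_at TEXT NOT NULL
        )""")
    conn.executemany(
        "INSERT INTO users (first_name, last_name, email, account_id, created_at)"
        " VALUES (?, ?, ?, ?, ?)", ROWS)
    # Mirrors the suggested index on (account_id, email, created_at DESC).
    conn.execute("CREATE INDEX users_acct_email_ts "
                 "ON users (account_id, email, created_at DESC)")
    result = conn.execute(LATEST_PER_USER, (account_id,)).fetchall()
    conn.close()
    return result


print(latest_users(1))
```

ISO-formatted timestamps compare correctly as text, which is why `created_at` can be stored as TEXT in this sketch.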

Related

SQL - Query that returns the Username along with their total count of records

I'm new to relational databases and I'm having a hard time understanding how to write a query that does what I want. I have two tables that have a relationship.
CREATE TABLE DocumentGroups (
id INTEGER PRIMARY KEY AUTOINCREMENT,
comments TEXT,
Username TEXT NOT NULL
)
CREATE TABLE Documents (
id INTEGER PRIMARY KEY,
documentGroupId INT NOT NULL,
documentTypeId INT NOT NULL,
documentTypeName TEXT NOT NULL,
succesfullyUploaded BIT
)
I would like to query the Documents table and get the record count for each username. Here is the query that I came up with:
SELECT Count(*)
FROM DOCUMENTS
JOIN DocumentGroups ON Documents.documentGroupId=DocumentGroups.id
GROUP BY Username
I currently have 2 entries in the Documents table, 1 from each user. This query prints out:
[{Count(*): 1}, {Count(*): 1}]
This looks correct, but is there any way for me to get the username associated with each count? Right now I have no way of knowing which count belongs to which user.
You are almost there. Your query already produces one row per user name (that's your group by clause). All that is left to do is to put that column in the select clause as well:
select dg.username, count(*) cnt
from documents d
join documentgroups dg on d.documentgroupid = dg.id
group by dg.username
Side notes:
table aliases make the queries easier to read and write
in a multi-table query, always qualify all columns with the (alias of) table they belong to
you probably want to alias the result of count(*), so it is easier to consume it from your application
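As a quick check of the corrected query, here is a small sketch that runs it against an in-memory SQLite database. It uses the question's schema (without the trailing comma) and hypothetical sample data: alice owns two documents and bob owns one. The ORDER BY is added only to make the output deterministic.

```python
import sqlite3


def counts_per_user():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE DocumentGroups (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            comments TEXT,
            Username TEXT NOT NULL
        );
        CREATE TABLE Documents (
            id INTEGER PRIMARY KEY,
            documentGroupId INT NOT NULL,
            documentTypeId INT NOT NULL,
            documentTypeName TEXT NOT NULL,
            succesfullyUploaded BIT
        );
        -- Hypothetical sample data.
        INSERT INTO DocumentGroups (comments, Username) VALUES ('a', 'alice'), ('b', 'bob');
        INSERT INTO Documents VALUES (1, 1, 1, 'pdf', 1), (2, 1, 2, 'png', 1), (3, 2, 1, 'pdf', 0);
    """)
    # The grouped column appears in the SELECT list, so each count is labelled.
    rows = conn.execute("""
        SELECT dg.Username, COUNT(*) AS cnt
        FROM Documents d
        JOIN DocumentGroups dg ON d.documentGroupId = dg.id
        GROUP BY dg.Username
        ORDER BY dg.Username
    """).fetchall()
    conn.close()
    return rows


print(counts_per_user())  # [('alice', 2), ('bob', 1)]
```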

How to insert rows in a table depending on a different table?

I have two tables.
One is a user table containing user_id, user_email and user_name,
and the other is a user_status table containing user_email and status.
The issue I am facing is that the user_status table is newly added and empty, while the user table is already in production. I want to add rows to the status table without cleaning the DB.
If user_name is empty, the status in the user_status table should be offline, otherwise online.
user_id user_email user_name
1 xyz@gmail.com xyz
2 abc@gmail.com
If this is my user table and my user_status table is empty, then I want to populate the user_status table as:
user_email status
xyz@gmail.com active
abc@gmail.com inactive
Use INSERT ... SELECT and a conditional expression:
insert into user_status(user_email, status)
select user_email, case when user_name is null then 'offline' else 'online' end
from users
This assumes that by "empty" you mean NULL. If you really mean the empty string, then the condition in the CASE should be when user_name = '' instead.
Note that user is a reserved word in most databases, and hence not a good choice for a table name. I renamed it to users in the query.
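If the production data may contain both NULL and empty-string names, one option is to treat both as "empty" with COALESCE. The following sketch runs that variant in SQLite. The third row (`def@gmail.com` with an empty name) is hypothetical, added to exercise the empty-string case.

```python
import sqlite3


def populate_status():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_email TEXT, user_name TEXT);
        CREATE TABLE user_status (user_email TEXT, status TEXT);
        INSERT INTO users VALUES (1, 'xyz@gmail.com', 'xyz'),
                                 (2, 'abc@gmail.com', NULL),
                                 (3, 'def@gmail.com', '');
    """)
    # COALESCE(user_name, '') = '' is true for both NULL and the empty string.
    conn.execute("""
        INSERT INTO user_status (user_email, status)
        SELECT user_email,
               CASE WHEN COALESCE(user_name, '') = '' THEN 'offline' ELSE 'online' END
        FROM users
    """)
    rows = conn.execute(
        "SELECT user_email, status FROM user_status ORDER BY user_email").fetchall()
    conn.close()
    return rows


print(populate_status())
```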

Create view from table with multiple primary key

I have a table like this one:
Column | Type | Modifiers
username | character varying(12) | not null
electioncode | integer | not null
votes | integer | default 0
PRIMARY KEY (username, electioncode)
I need to create a view with username, electioncode, max(votes).
If I use this query, it works fine, but without username:
SELECT electioncode, max(votes) from table group by electioncode;
If I add username, it asks me to add it to the GROUP BY, but if I do that it gives me the entire table instead of just username, electioncode and max votes.
Do you want the username associated with this number of votes, or any username for the given election code?
If the first:
SELECT
DISTINCT ON ( electioncode )
*
FROM table
ORDER BY electioncode, votes desc;
If the second:
SELECT
electioncode,
min(username),
max(votes)
FROM
table
GROUP BY electioncode;
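Both options can be wrapped in a view. The following sketch does this in SQLite, so ROW_NUMBER() stands in for Postgres' DISTINCT ON. The table name `results`, the view names and the sample rows are hypothetical, since the question names the table only as `table`. Ties in variant 1 are broken by username here, whereas DISTINCT ON with the answer's ORDER BY would pick an arbitrary tied row.

```python
import sqlite3


def election_views():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE results (
            username     VARCHAR(12) NOT NULL,
            electioncode INTEGER NOT NULL,
            votes        INTEGER DEFAULT 0,
            PRIMARY KEY (username, electioncode)
        );
        -- Hypothetical sample data.
        INSERT INTO results VALUES ('ann', 1, 3), ('bob', 1, 7), ('cid', 1, 5),
                                   ('ann', 2, 4), ('bob', 2, 2);

        -- Variant 1: the username holding the max votes in each election.
        CREATE VIEW top_per_election AS
        SELECT username, electioncode, votes
        FROM (
            SELECT r.*,
                   ROW_NUMBER() OVER (PARTITION BY electioncode
                                      ORDER BY votes DESC, username) AS rn
            FROM results r
        )
        WHERE rn = 1;

        -- Variant 2: max votes per election plus an unrelated (smallest) username.
        CREATE VIEW max_per_election AS
        SELECT electioncode, MIN(username) AS username, MAX(votes) AS max_votes
        FROM results
        GROUP BY electioncode;
    """)
    top = conn.execute("SELECT * FROM top_per_election ORDER BY electioncode").fetchall()
    agg = conn.execute("SELECT * FROM max_per_election ORDER BY electioncode").fetchall()
    conn.close()
    return top, agg


print(election_views())
```

In variant 2, election 1 comes back as ('ann', 7), even though 7 is bob's count. That is why the first query is the one to use when the username must belong to the winning row.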
Your username field seems to be unique: every record has a different username (I am assuming), so when you group by username you get all the records. What you are trying to do has a logic issue, not a syntax issue.
A suggestion: write on a piece of paper the output you would like to see, and then construct the query. If you want username, electioncode and max(votes), imagine how you would display the data when two users, user1 and user2, both have electioncode 001 and 1 vote each. How would you display this?

Higher Query result with the DISTINCT Keyword?

Say I have a table with 100,000 User IDs (UserID is an int).
When I run a query like
SELECT COUNT(Distinct User ID) from tableUserID
the result I get is HIGHER than the result from the following statement:
SELECT COUNT(User ID) from tableUserID
I thought DISTINCT implied unique, which would mean a lower result. What would cause this discrepancy, and how would I identify the user IDs that don't show up in the second query?
Thanks
UPDATE (11:14 am EST)
Hi all,
I sincerely apologize; I should have taken the trouble to reproduce this in my local environment, but I just wanted to see if there was a general consensus about this. Here are the full details:
The query is a result of an inner join between 2 tables.
One has this information:
TABLE ACTIVITY (NO PRIMARY KEY)
UserID int (not Nullable)
JoinDate datetime
Status tinyint
LeaveDate datetime
SentAutoMessage tinyint
SectionDetails varchar
And here is the second table:
TABLE USER_INFO (CLUSTERED PRIMARY KEY)
UserID int (not Nullable)
UserName varchar
UserActive int
CreatedOn datetime
DisabledOn datetime
The tables are joined on UserID and the UserID being selected in the original 2 queries is the one from the TABLE ACTIVITY.
Hope this clarifies the question.
This is not technically an answer, but since I took the time to analyze this, I might as well post it (although I risk being downvoted).
There was no way I could reproduce the described behavior.
This is the scenario:
declare @table table ([user id] int)
insert into @table values
(1),(1),(1),(1),(1),(1),(1),(2),(2),(2),(2),(2),(2),(null),(null)
And here are some queries and their results:
SELECT COUNT(User ID) FROM @table --error: this does not run
SELECT COUNT(distinct User ID) FROM @table --error: this does not run
SELECT COUNT([User ID]) FROM @table --result: 13 (nulls not counted)
SELECT COUNT(distinct [User ID]) FROM @table --result: 2 (nulls not counted)
And something interesting:
SELECT user --result: 'dbo' in my sandbox DB
SELECT count(user) from @table --result: 15 (nulls are counted because the user value is not null)
SELECT count(distinct user) from @table --result: 1 (user is always the same value)
I find it very odd that you are able to run the queries exactly how you described. You'd have to let us know the table structure and the data to get further help.
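The same experiment can be reproduced outside SQL Server. The sketch below uses the identical data (seven 1s, six 2s, two NULLs) in SQLite. It quotes the column name with double quotes, the standard SQL equivalent of T-SQL's brackets. COUNT(*) counts rows, COUNT(col) skips NULLs, and COUNT(DISTINCT col) additionally collapses duplicates, so it can never exceed COUNT(col).

```python
import sqlite3


def count_variants():
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE t ("user id" INTEGER)')
    values = [1] * 7 + [2] * 6 + [None] * 2
    conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in values])
    row = conn.execute(
        'SELECT COUNT(*), COUNT("user id"), COUNT(DISTINCT "user id") FROM t'
    ).fetchone()
    conn.close()
    return row


print(count_variants())  # (15, 13, 2)
```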
how would I identify those user IDs that don't show up in the 2nd query
Try this query
SELECT UserID from tableUserID Where UserID not in (SELECT Distinct UserID from tableUserID)
I think there will be no rows.
Edit:
User is a reserved keyword. Do you mean UserID in your queries?
Ray: Yes
I tried to reproduce the problem in my environment, and my conclusion is that, given the conditions you described, the result of the first query cannot be higher than that of the second one. Even if there were NULLs, that just won't happen.
Did you run the query @Jean-Charles suggested?
I'm very intrigued by this; please let us know what turns out to be the problem.

Insert data and set foreign keys with Postgres

I have to migrate a large amount of existing data in a Postgres DB after a schema change.
In the old schema a country attribute would be stored in the users table. Now the country attribute has been moved into a separate address table:
users:
country # OLD
address_id # NEW [1:1 relation]
addresses:
id
country
The schema is actually more complex and the address contains more than just the country. Thus, every user needs to have his own address (1:1 relation).
When migrating the data, I'm having problems setting the foreign keys in the users table after inserting the addresses:
INSERT INTO addresses (country)
SELECT country FROM users WHERE address_id IS NULL
RETURNING id;
How do I propagate the IDs of the inserted rows and set the foreign key references in the users table?
The only solution I could come up with so far is creating a temporary user_id column in the addresses table and then updating the address_id:
UPDATE users SET address_id = a.id FROM addresses AS a
WHERE users.id = a.user_id;
However, this turned out to be extremely slow (despite using indices on both users.id and addresses.user_id).
The users table contains about 3 million rows with 300k missing an associated address.
Is there any other way to insert derived data into one table and set the foreign key reference to the inserted data in the other (without changing the schema itself)?
I'm using Postgres 8.3.14.
Thanks
I have now solved the problem by migrating the data with a Python/sqlalchemy script. It turned out to be much easier (for me) than trying the same with SQL. Still, I'd be interested if anybody knows a way to process the RETURNING result of an INSERT statement in Postgres SQL.
The table users must have some primary key that you did not disclose. For the purpose of this answer I will name it users_id.
You can solve this rather elegantly with data-modifying CTEs introduced with PostgreSQL 9.1:
country is unique
The whole operation is rather trivial in this case:
WITH i AS (
INSERT INTO addresses (country)
SELECT country
FROM users
WHERE address_id IS NULL
RETURNING id, country
)
UPDATE users u
SET address_id = i.id
FROM i
WHERE i.country = u.country;
You mention version 8.3 in your question. Upgrade! Postgres 8.3 has reached end of life.
Be that as it may, this is simple enough with version 8.3. You just need two statements:
INSERT INTO addresses (country)
SELECT country
FROM users
WHERE address_id IS NULL;
UPDATE users u
SET address_id = a.id
FROM addresses a
WHERE address_id IS NULL
AND a.country = u.country;
country is not unique
That's more challenging. You could just create one address and link to it multiple times. But you did mention a 1:1 relationship that rules out such a convenient solution.
WITH s AS (
SELECT users_id, country
, row_number() OVER (PARTITION BY country) AS rn
FROM users
WHERE address_id IS NULL
)
, i AS (
INSERT INTO addresses (country)
SELECT country
FROM s
RETURNING id, country
)
, r AS (
SELECT *
, row_number() OVER (PARTITION BY country) AS rn
FROM i
)
UPDATE users u
SET address_id = r.id
FROM r
JOIN s USING (country, rn) -- select exactly one id for every user
WHERE u.users_id = s.users_id
AND u.address_id IS NULL;
As there is no way to unambiguously assign exactly one id returned from the INSERT to every user in a set with identical country, I use the window function row_number() to make them unique.
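The row_number() trick works because all freshly inserted addresses for one country are interchangeable: the n-th user of a country can take the n-th returned id of that country, whatever order row_number() happens to assign. Here is a plain-Python sketch of that pairing step, the equivalent of `JOIN s USING (country, rn)`. The function name `pair_by_rank` and the sample ids are hypothetical.

```python
from collections import defaultdict


def pair_by_rank(users, inserted):
    """users: (users_id, country) rows still lacking an address (CTE s).
    inserted: (address_id, country) rows as returned by INSERT ... RETURNING (CTE i).
    Returns {users_id: address_id}, pairing the n-th user of a country with
    the n-th returned id of that same country."""
    ids_by_country = defaultdict(list)
    for address_id, country in inserted:
        ids_by_country[country].append(address_id)  # list position acts as rn
    next_rn = defaultdict(int)
    mapping = {}
    for users_id, country in users:
        rn = next_rn[country]
        mapping[users_id] = ids_by_country[country][rn]
        next_rn[country] = rn + 1
    return mapping


users = [(10, "DE"), (11, "FR"), (12, "DE")]
inserted = [(100, "DE"), (101, "FR"), (102, "DE")]
print(pair_by_rank(users, inserted))  # {10: 100, 11: 101, 12: 102}
```

Because the INSERT produces exactly one address per row of s, each country has as many returned ids as users, so every user gets exactly one id and no id is used twice.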
Not as straightforward with Postgres 8.3. One possible way:
INSERT INTO addresses (country)
SELECT DISTINCT country -- pick just one per set of dupes
FROM users
WHERE address_id IS NULL;
UPDATE users u
SET address_id = a.id
FROM addresses a
WHERE a.country = u.country
AND u.address_id IS NULL
AND NOT EXISTS ( -- only addresses not yet linked to any user
SELECT * FROM users x
WHERE x.address_id = a.id
)
AND NOT EXISTS (
SELECT * FROM users b
WHERE b.country = u.country
AND b.address_id IS NULL
AND b.users_id < u.users_id
); -- effectively picking the smallest users_id per set of dupes
Repeat this until the last NULL value is gone from users.address_id.
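The "repeat until done" loop can be driven from a script. The sketch below runs it against an in-memory SQLite database, with `users_id` as the hypothetical primary key named in the answer. For each pass, it computes the (address, user) pairs with a SELECT before applying any UPDATE, so every pair is based on the state at the start of that pass. It also requires the address to be unreferenced, so addresses linked in earlier passes are never reused. It assumes country is never NULL, because a NULL country would never match in the join and the loop would not terminate.

```python
import sqlite3


def migrate(users):
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE addresses (id INTEGER PRIMARY KEY, country TEXT);
        CREATE TABLE users (users_id INTEGER PRIMARY KEY, country TEXT,
                            address_id INTEGER REFERENCES addresses (id));
    """)
    conn.executemany("INSERT INTO users (users_id, country) VALUES (?, ?)", users)
    while conn.execute("SELECT 1 FROM users WHERE address_id IS NULL LIMIT 1").fetchone():
        # One new address per country that still has unlinked users.
        conn.execute("""
            INSERT INTO addresses (country)
            SELECT DISTINCT country FROM users WHERE address_id IS NULL""")
        # Pair each unreferenced address with the smallest unlinked users_id of its country.
        pairs = conn.execute("""
            SELECT a.id, u.users_id
            FROM users u
            JOIN addresses a ON a.country = u.country
            WHERE u.address_id IS NULL
              AND NOT EXISTS (SELECT 1 FROM users x WHERE x.address_id = a.id)
              AND NOT EXISTS (SELECT 1 FROM users b
                              WHERE b.country = u.country
                                AND b.address_id IS NULL
                                AND b.users_id < u.users_id)""").fetchall()
        conn.executemany("UPDATE users SET address_id = ? WHERE users_id = ?", pairs)
    rows = conn.execute("""
        SELECT u.users_id, u.address_id, u.country, a.country
        FROM users u JOIN addresses a ON a.id = u.address_id
        ORDER BY u.users_id""").fetchall()
    n_addresses = conn.execute("SELECT COUNT(*) FROM addresses").fetchone()[0]
    conn.close()
    return rows, n_addresses


rows, n = migrate([(1, "DE"), (2, "FR"), (3, "DE"), (4, "DE")])
print(n)  # 4 addresses, created over 3 passes (DE has three users)
```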