How to store array of tables in Schema - sql

So I am a student currently learning about PostgreSQL. I am trying to figure out how to randomly seed data. I have 10M users and 100 stocks.
Currently my tables will look like:
CREATE TABLE user (
user_id INTEGER NOT NULL,
amount_of_stocks [][] array, -- this is just assumption
PRIMARY KEY (user_id)
);
CREATE TABLE stock (
stock_id INTEGER NOT NULL,
amount_per_stock INT,
quantity INT,
PRIMARY KEY (stock_id)
);
How would I store 100 different stocks for each user?

Sounds like a classical many-to-many relationship. Should not involve arrays at all. Assuming Postgres 10 or later, use something along these lines:
CREATE TABLE users ( -- "user" is a reserved word!
user_id int GENERATED ALWAYS AS IDENTITY PRIMARY KEY
, username text UNIQUE NOT NULL -- or similar
);
CREATE TABLE stock (
stock_id int GENERATED ALWAYS AS IDENTITY PRIMARY KEY
, stock text UNIQUE NOT NULL -- or similar
);
CREATE TABLE user_stock (
user_id int REFERENCES users
, stock_id int REFERENCES stock
, amount int NOT NULL
, PRIMARY KEY (user_id, stock_id)
);
Detailed explanation:
How to implement a many-to-many relationship in PostgreSQL?
Auto increment table column
Seed
Postgres provides generate_series() to conveniently generate rows, and random() to generate random numbers:
INSERT INTO users(username)
SELECT 'user_' || g
FROM generate_series(1, 10000000) g; -- 10M (!) - try with just 10 first
INSERT INTO stock(stock)
SELECT 'stock_' || g
FROM generate_series(1, 100) g;
Experiment with a small number of users first. 10M users * 100 stocks generates a billion rows. Takes some time and occupies some space.
How would I store 100 different stocks for each user?
INSERT INTO user_stock
(user_id, stock_id, amount)
SELECT u.user_id, s.stock_id, ceil(random() * 1000)::int
FROM users u, stock s; -- cross join
Every user gets 100 different stocks - though everyone gets the same set in this basic example, since you did not define the requirements more closely. I added a random amount per stock between 1 and 1000.
About the cross join to produce the Cartesian product:
What does [FROM x, y] mean in Postgres?
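As a quick sanity check on the seed, a query along these lines (using the column names defined above) lists one user's holdings:
SELECT s.stock, us.amount
FROM user_stock us
JOIN stock s USING (stock_id)
WHERE us.user_id = 1;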

CREATE TABLE user (
user_id INTEGER NOT NULL,
stocks text[],
PRIMARY KEY (user_id)
);
Store a list of the primary keys from your stock table, so you can easily look up their values with a SELECT statement.
If you wanted to, you could make the array two-dimensional and store the amount as well... but that violates some normalization principle, I'm sure, as you already have a table for that purpose.
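For completeness, reading such an array back out requires a join against the stock table. A rough sketch, assuming the stocks column holds stock_ids stored as text (note that user is a reserved word and must be double-quoted):
SELECT s.*
FROM "user" u
JOIN stock s ON s.stock_id::text = ANY (u.stocks)
WHERE u.user_id = 1;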

Related

How to create and normalise this SQL relational database?

I am trying to design an SQLite database. I have made a quick design on paper of what I want.
So far, I have one table with the following columns:
user_id (which is unique and the primary key of the table)
number_of_items_allowed (can be any number and does not have to be unique)
list_of_items (a list of any size, as long as it is less than or equal to number_of_items_allowed; this list stores item IDs)
The column I am struggling the most with is list_of_items. I know that a relational database does not have a column which allows lists and you must create a second table with that information to normalise the database. I have looked at a few stack overflow answers including this one, which says that you can't have lists stored in a column, but I was not able to apply the accepted answer to my case.
I have thought about having a secondary table which would have a row for each item ID belonging to a user_id and the primary key in that case would have been the combination of the item ID and the user_id, however, I was not sure if that would be the ideal way of going about it.
Consider the following schema with 3 tables:
CREATE TABLE users (
user_id INTEGER PRIMARY KEY,
user TEXT NOT NULL,
number_of_items_allowed INTEGER NOT NULL CHECK(number_of_items_allowed >= 0)
);
CREATE TABLE items (
item_id INTEGER PRIMARY KEY,
item TEXT NOT NULL
);
CREATE TABLE users_items (
user_id INTEGER NOT NULL REFERENCES users(user_id) ON UPDATE CASCADE ON DELETE CASCADE,
item_id INTEGER NOT NULL REFERENCES items (item_id) ON UPDATE CASCADE ON DELETE CASCADE,
PRIMARY KEY(user_id, item_id)
);
For this schema, you need a BEFORE INSERT trigger on users_items which checks if a new item can be inserted for a user by comparing the user's number_of_items_allowed to the current number of items that the user has:
CREATE TRIGGER check_number_before_insert_users_items
BEFORE INSERT ON users_items
BEGIN
SELECT
CASE
WHEN (SELECT COUNT(*) FROM users_items WHERE user_id = NEW.user_id) >=
(SELECT number_of_items_allowed FROM users WHERE user_id = NEW.user_id)
THEN RAISE (ABORT, 'No more items allowed')
END;
END;
You will need another trigger that checks, when number_of_items_allowed is updated, whether the new value is less than the current number of items for this user:
CREATE TRIGGER check_number_before_update_users
BEFORE UPDATE ON users
BEGIN
SELECT
CASE
WHEN (SELECT COUNT(*) FROM users_items WHERE user_id = NEW.user_id) > NEW.number_of_items_allowed
THEN RAISE (ABORT, 'There are already more items for this user than the value inserted')
END;
END;
See the demo.
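To see the insert trigger at work, a few hypothetical sample rows (names and values are made up) behave like this:
INSERT INTO users (user_id, user, number_of_items_allowed) VALUES (1, 'alice', 1);
INSERT INTO items (item_id, item) VALUES (1, 'book'), (2, 'pen');
INSERT INTO users_items (user_id, item_id) VALUES (1, 1); -- succeeds: first item for this user
INSERT INTO users_items (user_id, item_id) VALUES (1, 2); -- aborts: 'No more items allowed'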

PostgreSQL: Effective way of grouping data by the number of records

Example
CREATE TABLE transactions (
id SERIAL PRIMARY KEY,
value NUMERIC NOT NULL
);
CREATE TABLE batches (
id SERIAL PRIMARY KEY,
total_value NUMERIC NOT NULL
);
CREATE TABLE transaction_batches (
id SERIAL PRIMARY KEY,
batch_id INT NOT NULL REFERENCES batches (id) ON DELETE CASCADE ON UPDATE CASCADE,
transaction_id INT NOT NULL REFERENCES transactions (id) ON DELETE CASCADE ON UPDATE CASCADE
);
In the transaction_batches table, transactions should be grouped in batches of N transactions.
Users can delete or create transactions at any time.
If the user has changed the transactions, then the transaction batches must be rearranged (in order to keep each batch at N transactions, e.g. 20).
The goal
Efficiently group the transactions into batches of N transactions whenever the user changes the transactions
Question
Could you please suggest a solution to achieve the goal?
P.S. You can suggest a different table structure
I would simply do:
CREATE TABLE transactions (
id SERIAL PRIMARY KEY,
value NUMERIC NOT NULL,
batch_num INT NOT NULL
);
Set batch_num equal to:
SELECT floor((row_number() over (order by id) - 1) / N)
You can do this in a trigger. Or you can simply use a view to calculate this when the table is queried.
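A minimal sketch of the view route, assuming a batch size of N = 20 (the view name is made up):
CREATE VIEW transactions_with_batch AS
SELECT id, value,
       floor((row_number() OVER (ORDER BY id) - 1) / 20)::int AS batch_num
FROM transactions;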
CREATE TABLE batches (
id SERIAL PRIMARY KEY,
num_of_transactions INT,
total_value NUMERIC NOT NULL
);
CREATE TABLE transactions (
id SERIAL PRIMARY KEY,
value NUMERIC NOT NULL,
batch_id INT REFERENCES batches (id) -- nullable, so a transaction can be temporarily unbatched
);
When the user changes a transaction, we remove it from its batch as follows (see the sketch after these steps):
Set its batch_id to NULL
Decrease the num_of_transactions in the batch
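A rough sketch of that removal for a single transaction (the id 42 is only an example; concurrent writers would also need locking):
-- decrement the batch counter first, while batch_id is still known
UPDATE batches
SET num_of_transactions = num_of_transactions - 1
WHERE id = (SELECT batch_id FROM transactions WHERE id = 42);
-- then detach the transaction
UPDATE transactions
SET batch_id = NULL
WHERE id = 42;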
Whenever we want to rearrange batches:
Find all batches for which num_of_transactions < N.
For each transaction that has batch_id=NULL, add it to the next batch that has a vacancy, updating the num_of_transactions of the batch, as well as the batch_id in the transaction.

Google 'Big Table' like data in SQL? How to design DB?

I need to create a database that, among other things, lets people choose 1 - N zip codes in the US.
Intuitively it seems best to make each user a row and the zip codes columns.
The problem I am having is that that means something like 42k columns, which I am confident is beyond the column limit of most SQL databases.
I could have a separate table for each state, and then would have something like 500-5K columns per table.
I mean, that is doable, but the whole thing just seems a little ridiculous.
All thoughts, critiques, etc. are appreciated.
Also, does anyone know the best place to get a list of zip codes (maybe broken down by state)? Googling yielded some dated stuff, and so far I have found USPS APIs for live verification, but I just need a static list.
Thanks everyone.
In any database -- including BigQuery -- your description suggests a table UserZips with one row per UserId and ZipCode.
BigQuery does not require such a structure. It supports arrays within a row, so you can have an array of the zip codes that a user chooses.
It also supports records within a row, so you can have an array of records. Each record could have a zip code and other information.
In many databases, including BigQuery, you might find a JSON object to be the appropriate representation.
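A minimal sketch of what that array representation could look like in BigQuery Standard SQL (the dataset and table names here are made up):
CREATE TABLE mydataset.user_zips (
user_id INT64,
zip_codes ARRAY<STRING> -- e.g. ['00501', '00544']
);
-- flatten the array back into one row per (user, zip code)
SELECT user_id, zip
FROM mydataset.user_zips, UNNEST(zip_codes) AS zip;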
Nonetheless, what first comes to mind is a table with a column for the user and zip code.
A structure like this lets you add as many zip codes as needed and link any number of zips to any number of users.
http://dbfiddle.uk/?rdbms=sqlserver_2014&fiddle=6d1c3a41cb439d17f4def51a672766e1
CREATE TABLE zipCodes (
zipID int identity
, zipcode varchar(5) NOT NULL
, zipPlusFour varchar(4) DEFAULT '0000'
, CONSTRAINT PK_zipID PRIMARY KEY (zipID)
) ;
CREATE TABLE users (
userID int identity
, username nvarchar(20) NOT NULL
, CONSTRAINT PK_userID PRIMARY KEY (userID)
) ;
CREATE TABLE xref_users_zips (
userID int NOT NULL
, zipID int NOT NULL
, CONSTRAINT FK_userID FOREIGN KEY (userID) REFERENCES users(userID)
, CONSTRAINT FK_zipID FOREIGN KEY (zipID) REFERENCES zipCodes(zipID)
) ;
INSERT INTO zipCodes (zipcode)
VALUES ('00501'), ('00544'), ('00601')
;
INSERT INTO users (username)
VALUES ('johndoe'),('robertbuilder'),('zaphodbeeblebrox')
;
INSERT INTO xref_users_zips (userID, zipID)
VALUES (1,1), (2,2), (3,3)
;
SELECT *
FROM users u
INNER JOIN xref_users_zips xuz ON u.userID = xuz.userID
INNER JOIN zipcodes z ON xuz.zipID = z.zipID

Serial numbers per group of rows for compound key

I am trying to maintain an address history table:
CREATE TABLE address_history (
person_id int,
sequence int,
timestamp datetime default current_timestamp,
address text,
original_address text,
previous_address text,
PRIMARY KEY(person_id, sequence),
FOREIGN KEY(person_id) REFERENCES people(id)
);
I'm wondering if there's an easy way to autonumber/constrain sequence in address_history to automatically count up from 1 for each person_id.
In other words, the first row with person_id = 1 would get sequence = 1; the second row with person_id = 1 would get sequence = 2. The first row with person_id = 2 would get sequence = 1 again. Etc.
Also, is there a better / built-in way to maintain a history like this?
Don't. It has been tried many times and it's a pain.
Use a plain serial or IDENTITY column:
Auto increment table column
CREATE TABLE address_history (
address_history_id serial PRIMARY KEY
, person_id int NOT NULL REFERENCES people(id)
, created_at timestamp NOT NULL DEFAULT current_timestamp
, previous_address text
);
Use the window function row_number() to get serial numbers without gaps per person_id. You could create a VIEW as a drop-in replacement for your table in queries, so those numbers are readily available:
CREATE VIEW address_history_nr AS
SELECT *, row_number() OVER (PARTITION BY person_id
ORDER BY address_history_id) AS adr_nr
FROM address_history;
See:
Gap-less sequence where multiple transactions with multiple tables are involved
Or you might want to ORDER BY something else. Maybe created_at? Better created_at, address_history_id to break possible ties. Related answer:
Column with alternate serials
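That variant would only change the window's ORDER BY, along these lines (same columns as the table defined above):
CREATE VIEW address_history_nr AS
SELECT *, row_number() OVER (PARTITION BY person_id
                             ORDER BY created_at, address_history_id) AS adr_nr
FROM address_history;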
Also, the data type you are looking for is timestamp or timestamptz, not datetime in Postgres:
Ignoring time zones altogether in Rails and PostgreSQL
And you only need to store previous_address (or more details), not address, nor original_address. Both would be redundant in a sane data model.

Can I use SQL to model my data?

I am trying to develop a bidding system, where an item is listed, and bidders can place a bid, which includes a bid amount and a message. An item may have an arbitrary number of bids on it. Bidders should also be able to see all the bids they have made across different items.
I am unfamiliar with SQL, so am a little unsure how to model this scenario. I was thinking the following:
A User table, which stores information about bidders, such as name, ID number, etc.
A Bid table, which contains all the bids in the system, which stores the bidder's user ID, the bid amount, the bid description.
A Job table, which contains the User ID of the poster, an item description, and then references to the various bids.
The problem I am seeing is how can I store these references to the Bid table entries in the Job table entries?
Is this the right way to go about approaching this problem? Should I be considering a document-oriented database, such as Mongo, instead?
You're describing a many-to-many relationship. In very simplified form, your tables would look something like this:
user:
id int primary key
job:
id int primary key
bids:
user_id int
job_id int
primary key(user_id, job_id)
foreign key (user_id) references user (id)
foreign key (job_id) references job (id)
Basically, the bids table would contain fields to represent both the user and the job, along with whatever other fields you'd need, such as bid amount, date/time stamp, etc...
Now, I've made the user_id/job_id fields a primary key in the bids table, which would limit each user to 1 bid per job. Simply remove the primary key and put in two regular indexes on each field to remove the limit.
SQL will work fine like you have it set up... I would do:
create table usertable (
userID integer unsigned not null auto_increment primary key,
userName varchar(64) );
create table jobtable (
jobID integer unsigned not null auto_increment primary key,
jobDesc text,
posterUserRef integer not null );
create table bidtable (
bidID integer unsigned not null auto_increment primary key,
bidAmount integer,
bidDesc text,
bidTime datetime,
bidderUserRef integer not null references usertable(userID),
biddingOnJobRef integer not null references jobtable(jobID) );
Now you can figure out whatever you want with various joins (maximum bid per user, all bids for job, all bids by user, highest bidder for job, etc).
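For instance, the highest bid per job together with the bidder's name could be pulled with something like this (untested sketch against the tables above):
SELECT j.jobID, j.jobDesc, u.userName, b.bidAmount
FROM jobtable j
JOIN bidtable b ON b.biddingOnJobRef = j.jobID
JOIN usertable u ON u.userID = b.bidderUserRef
WHERE b.bidAmount = (SELECT MAX(b2.bidAmount)
                     FROM bidtable b2
                     WHERE b2.biddingOnJobRef = j.jobID);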