Creating database tables with "either or" type fields - sql

I have a database that tracks players through their attempts at a game. To accomplish this, I keep a table of users and store the attempts in a separate table. The schema for these tables is:
CREATE TABLE users (
id BIGINT PRIMARY KEY, -- the local unique ID for this user
name TEXT UNIQUE, -- a self-chosen username for the user
first_name TEXT, -- the user's first name
last_name TEXT, -- the user's last name
email TEXT, -- the user's email address
phone TEXT -- the user's phone number
);
CREATE TABLE trials (
timestamp TIMESTAMP PRIMARY KEY, -- the time the trial took place
userid BIGINT, -- the ID of the user who completed the trial
score NUMERIC, -- the score for the trial
level NUMERIC, -- the difficulty level the trial ran at
penalties NUMERIC -- the number of penalties accrued in the trial
);
Now I need to be able to store attempts that come from "transient" users. These attempts should not be linked back to an existing user. However, these transient users will still be able to enter a name that displays in the results. This name is not required to be unique in the table, since it does not represent a "real" user.
My first thought was to create a new field in the trials table called name. If userid is null, I would know it is a transient user, but I would still be able to show the name fields in the results. This approach doesn't quite smell right, and it seems like it will make my queries a bit more complicated. Additionally, it seems like I'm duplicating data in a sense.
Another thought was to replace userid with a useref text field that would be some kind of formatted string representing the user. For example, if the value were enclosed in curly braces, I would know it's an ID, i.e. {58199204}. If the value were not enclosed, I would treat it as a transient user. This more accurately represents what I'm trying to do conceptually (i.e. it's either an ID or a transient user string), but it would really complicate my queries.
I'm using SQLite for the backend... It lacks some of the extended features of SQL Server or MySQL.
Any thoughts on these or another approach to the problem?

Without more information about why a transient user can use the system but not exist in it, I concur with your idea to:
Add a NAME column to the TRIALS table
Make the USER_ID column in the TRIALS table nullable/optional in order to indicate transient user status
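A minimal sketch of the nullable-userid approach, run through Python's built-in sqlite3 module. The trimmed-down tables, the CHECK constraint, and the sample rows are illustrative assumptions, not part of the original schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id BIGINT PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE trials (
    timestamp TIMESTAMP PRIMARY KEY,
    userid BIGINT REFERENCES users(id),  -- NULL marks a transient user
    name TEXT,                           -- display name for transient users only
    score NUMERIC,
    -- assumption: exactly one of userid / name is filled in
    CHECK ((userid IS NULL) <> (name IS NULL))
);
INSERT INTO users VALUES (58199204, 'alice');
INSERT INTO trials VALUES ('2020-01-01 10:00', 58199204, NULL, 90);
INSERT INTO trials VALUES ('2020-01-01 11:00', NULL, 'guest', 75);
""")

# One query covers both kinds of user: take the registered name when the
# join finds one, otherwise fall back to the name stored on the trial.
rows = conn.execute("""
SELECT COALESCE(u.name, t.name) AS display_name, t.score
FROM trials t
LEFT JOIN users u ON u.id = t.userid
ORDER BY t.timestamp
""").fetchall()
print(rows)
```

The LEFT JOIN plus COALESCE keeps the results query to a single statement, which addresses the worry about queries getting more complicated.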
If you could allow a transient user to exist in the system, I would recommend:
Creating a USER_TYPE_CODE table
Updating the USERS table to include the USER_TYPE_CODE column (w/ foreign key reference to the USER_TYPE_CODE table)

You can either add a UserType field to the Users table and store "transient" users there as well, which might increase the size of the Users table, or add a UserType field to the Trials table and create an additional TransientUsers table.
The UserType field then tells you which kind of user a given userid refers to.

I'd like to point out that you really shouldn't use the formatted string approach. What happens if a user finds a bugged input port into your database and inputs "{8437101}" (or whatever user ID they want)?

SQLite lets you mix types in a field. I'd suggest you do as you were thinking, but without the braces. Disallow numeric names. If the userid is a number, which is exactly when it matches an id in the users table, it is a user id. If not it's the name of a transient user.
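A rough sketch of that idea using Python's sqlite3 module. The column name userref and the sample data are hypothetical; the column is declared without a type so SQLite stores each value with the type it was given:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id BIGINT PRIMARY KEY, name TEXT UNIQUE);
-- userref has no declared type, so integers stay integers and text stays text
CREATE TABLE trials (ts TEXT PRIMARY KEY, userref, score NUMERIC);
INSERT INTO users VALUES (58199204, 'alice');
""")
conn.execute("INSERT INTO trials VALUES (?, ?, ?)", ("2020-01-01 10:00", 58199204, 90))
conn.execute("INSERT INTO trials VALUES (?, ?, ?)", ("2020-01-01 11:00", "guest", 75))

# typeof() tells the two cases apart: 'integer' is a user id, 'text' a transient name.
rows = conn.execute("""
SELECT typeof(t.userref), COALESCE(u.name, t.userref), t.score
FROM trials t
LEFT JOIN users u ON typeof(t.userref) = 'integer' AND u.id = t.userref
ORDER BY t.ts
""").fetchall()
print(rows)
```

This only works if numeric names really are rejected before they reach the database, as the answer says.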

Related

Best Practices when Choosing SQL Keys Types

I am very new to the SQL database. I need to create a database for my internship as our DA resigned suddenly.
The data is available but it is not inputted into a database yet. I am trying to follow the tutorials online but got stuck on what to choose for the different key types.
I hope to get the feedback of more experienced folks to get your guidance.
Table columns:
entry id (unique)
entry timestamp
username (unique but can appear more than once if the same user input a new meal record)
email address
user first and last name
meals taken date
meal type
meal calories
meal duration
meal cost
meal location
user notes
For primary key = entry id
For the candidate key, I will pick username and entry ID. Are there other columns that I should select as candidate keys? Would email make more sense? But a username can be repeated if the user inputs another meal record. Does that matter?
For compound Key =
email address + user first and last name?
record date + user name?
Are there other keys I need to classify?
Online tutorials say these are the most basic keys I need to identify. But I am not sure if I am making the right choice. I appreciate any feedback.
Your data seems to contain multiple entities. Based on your simple description, I can identify:
users
meals
locations
Then there seems to be this thing called an entry, which is a user eating (buying?) a meal at a location. This is a 3-way junction table among the entities.
This is a guess on what you are trying to represent. But it sounds like multiple tables.
I don't know what database you're using, so I'll use Postgres because it's free, follows the SQL standard, has good documentation, and is very powerful.
As Gordon said, you seem to have three things: users, meals, and locations. Three things means one table for each. This avoids storing redundant data. The whole topic is database normalization.
create table users (
id bigserial primary key,
username text not null unique,
email text not null unique,
first_name text not null,
last_name text not null
);
create table meals (
id bigserial primary key,
type text not null unique,
-- If calories vary by user, move this into user_meals.
calories integer not null
);
create table locations (
id bigserial primary key,
-- The specific information you have about a location will vary,
-- but this is a good start. I've allowed nulls because often
-- people don't have full information about their location.
name text,
address text,
city text,
province text,
country text,
postal_code text
);
You asked about compound keys. Don't bother. There are too many potential problems. Use a simple, unique, auto-incrementing big integer on every table.
Primary keys must be unique and unchanging. Usernames, names, email addresses... these can all change. Even if you think they won't, why bake the risk into your schema?
Foreign keys will be repeated all over the database and indexes many times. You want them to be small and simple and fixed size to not use up any more storage than necessary. Integers are small and simple and fixed size. Text is not.
Compound primary keys potentially leak information. A primary key refers to a row, and primary keys often show up in URLs and the like. If we were to use the user's email address or name, that risks leaking personally identifiable information.
I've chosen bigserial for the primary key. serial types are auto-incrementing so the database will take care of assigning each new row a primary key. bigserial uses a 64 bit integer. A regular integer can only hold about 2 to 4 billion entries. That sounds like a lot, but it isn't, and you do not want to run out of primary keys. bigserial can handle 9 quintillion. An extra 32 bits per row is worth it.
Some more notes.
Everything is not null unless we have a good reason otherwise. This will catch data entry mistakes. It makes the database easier to work with because you know the data will be there.
Similarly, anything which is supposed to be unique is declared unique so it is guaranteed.
There are no arbitrary limits on column size. You might see other examples like email varchar(64) or the like. This doesn't actually save you any space, it just puts limits on what is allowed in the database. There's no fundamental limit on how long a username or email address can be, that's a business rule. Business rules change all the time; they should not be hard coded into the schema. Enforcing business rules is the job of the thing inputting the data, or possibly triggers.
Now that we have these three tables, we can use them to record people's meals. To do that we need a fourth table to record a user having a meal at a location: a join table. A join table is what allows us to keep information about users, meals, and locations each in one canonical place. The users, locations, and meals are referred to by their IDs known as a foreign key.
create table user_meals (
id bigserial primary key,
user_id bigint not null references users(id),
meal_id bigint not null references meals(id),
location_id bigint not null references locations(id),
-- Using a time range is more flexible than start + duration.
taken_at tstzrange not null,
-- This can store up to 99999.99 which seems reasonable.
cost numeric(7, 2) not null,
notes text,
-- When was this entry created?
created_at timestamp not null default current_timestamp,
-- When was it last updated?
updated_at timestamp not null default current_timestamp
);
Because we're using bigserial for everything's primary key, referring to other tables is simple: they're all bigint.
As before, everything is not null unless we have a good reason otherwise. Not every meal will have notes, for example.
Fairly standard created_at and updated_at fields are used to record when the entry was created or updated.
Imprecise numeric types such as float or double are avoided, especially for money. Instead the arbitrary precision type numeric is used to store money precisely.
numeric requires we give it a precision and scale; we must choose a limit, so I picked something unreasonably high in my currency. Your currency may vary.
Rather than storing a start time and meal duration, we take advantage of Postgres's range types to store a time range from when the meal started to when it ended. This allows us to use range functions and operators to make queries about the meal time much simpler. For example, where taken_at @> '2020-02-15 20:00'::timestamptz finds all meals which were being taken at 8pm on Feb 15th.
There's more to do, such as adding indexes for performance, but hopefully this will get you started. If there's one takeaway, it's this: don't try to cram everything into one table.
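To show how the join table gets read back, here is a small sketch using Python's sqlite3 module. SQLite stands in for Postgres purely so the example is self-contained: integer primary key replaces bigserial, the range and timestamp columns are left out, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces references when asked
conn.executescript("""
create table users (id integer primary key, username text not null unique);
create table meals (id integer primary key, type text not null unique, calories integer not null);
create table locations (id integer primary key, name text);
create table user_meals (
    id integer primary key,
    user_id integer not null references users(id),
    meal_id integer not null references meals(id),
    location_id integer not null references locations(id),
    cost numeric not null
);
insert into users values (1, 'ann');
insert into meals values (1, 'lunch', 600);
insert into locations values (1, 'Cafeteria');
insert into user_meals values (1, 1, 1, 1, 8.5);
""")

# Follow each foreign key back to its table to rebuild the original wide row.
rows = conn.execute("""
select u.username, m.type, m.calories, l.name, um.cost
from user_meals um
join users u on u.id = um.user_id
join meals m on m.id = um.meal_id
join locations l on l.id = um.location_id
""").fetchall()
print(rows)
```

Each fact lives in one table, and the joins reassemble the single flat row the original spreadsheet-style layout would have stored.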

Creating related tables in SQLite

I am creating related tables in SQLite and am wondering what the most efficient way to make them relate to each other is.
CREATE TABLE cards_name (id INTEGER PRIMARY KEY, name TEXT, rarity TEXT);
CREATE TABLE card_story (id INTEGER PRIMARY KEY, name_id INTEGER, story TEXT);
I have already entered some data for the first table and I was wondering how to add data to the second table without having to look up what the INTEGER PRIMARY KEY is every time (perhaps by using the cards name??)
26|Armorsmith|Rare
27|Auchenai Soulpriest|Rare
28|Avenging Wrath|Epic
29|Bane of Doom|Epic
For instance, I would like to enter the story of Armorsmith as "She accepts guild funds for repairs!" into story TEXT by using her name(Armorsmith) instead of ID(26).
Thanks
The task you are describing should be taken care of on the application level, not on database level.
You can create a GUI where you can select the name of a card, but the underlying value sent back to the database is the card's id and that gets stored in the story table establishing the relationship between the card and the story.
I would like to enter the story of Armorsmith as "She accepts guild funds for repairs!" into story TEXT by using her name(Armorsmith) instead of ID(26).
You can insert into one table from another table. Instead of hard coding the values, you can get them from a select. So long as the columns returned by the select match the columns needed by the insert, it'll work.
insert into card_story
(name_id, story)
select id, :story
from cards_name
where name = :name
The insert needs an ID and a story. The select returns ids and we've added our own text field for the story.
This statement would be executed with two parameters, one containing the text of the story, and one containing the name of the card. So you might write something like this (the exact details depend on your programming language and SQL interface library).
sql.execute(
name: "Armorsmith",
story: "She accepts guild funds for repairs!"
)
This is the equivalent of:
insert into card_story
(name_id, story)
select id, 'She accepts guild funds for repairs!'
from cards_name
where name = 'Armorsmith'
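As one concrete possibility, here is how the parameterized statement might look with Python's built-in sqlite3 module, using the table names from the question's CREATE TABLE statements. The in-memory database and the single sample card are assumptions made for the sake of a runnable example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cards_name (id INTEGER PRIMARY KEY, name TEXT UNIQUE, rarity TEXT);
CREATE TABLE card_story (
    id INTEGER PRIMARY KEY,
    name_id INTEGER REFERENCES cards_name(id),
    story TEXT
);
INSERT INTO cards_name VALUES (26, 'Armorsmith', 'Rare');
""")

# :name and :story are filled from the dictionary; the select looks up the id.
conn.execute(
    "insert into card_story (name_id, story) "
    "select id, :story from cards_name where name = :name",
    {"name": "Armorsmith", "story": "She accepts guild funds for repairs!"},
)

rows = conn.execute("select name_id, story from card_story").fetchall()
print(rows)
```

If no card has the given name, the select returns no rows and nothing is inserted, so it is worth checking the row count in real code.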
Note that you'll want to make a few changes to your schema...
Declare name unique else you might get multiple cards for one name.
Like name TEXT UNIQUE.
Since you're looking up cards by name, you probably want to prevent there being multiple cards with the same name. That's just complexity you don't need to deal with.
Declare your foreign keys.
Like name_id INTEGER REFERENCES cards_name(id).
This has multiple benefits. One is that the relationship is recorded in the schema itself. Note that SQLite does not automatically index the referencing column, so if you look up stories by name_id, create an index on it to keep those lookups fast.
The other is it enforces "referential integrity" which is a fancy way of saying it makes sure that every story has a name to go with it. If you try to delete a row from cards_name, it will balk unless the matching card_story rows are deleted first. You can also use things like on delete cascade to do the cleanup for you.
However, SQLite does not have foreign keys on by default. You have to turn them on. It's a very good idea to do so.
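A short sketch of turning enforcement on, again via Python's sqlite3 module and with made-up sample rows. The pragma is per connection, so it has to be issued each time you connect:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default; set on every new connection
conn.executescript("""
CREATE TABLE cards_name (id INTEGER PRIMARY KEY, name TEXT UNIQUE, rarity TEXT);
CREATE TABLE card_story (
    id INTEGER PRIMARY KEY,
    name_id INTEGER REFERENCES cards_name(id),
    story TEXT
);
INSERT INTO cards_name VALUES (26, 'Armorsmith', 'Rare');
INSERT INTO card_story (name_id, story) VALUES (26, 'She accepts guild funds for repairs!');
""")

# With enforcement on, deleting a card that still has a story is refused.
try:
    conn.execute("DELETE FROM cards_name WHERE id = 26")
    deleted = True
except sqlite3.IntegrityError:
    deleted = False
print(deleted)
```

Without the pragma the delete would succeed silently and leave an orphaned story behind.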

How to make a row's value unique across two columns?

Suppose I have two columns, Column A and Column B. Column A is required, and Column B is optional. If Column B is specified, I'd like to make sure that its value is not found in Column B (easy) OR Column A (seems much harder) outside its own row. Likewise, if Column A is changed, I'd like to make sure that its new value is not found in Column B or Column A, outside its own row.
So far, the closest I've gotten is the exclusion constraint, but that doesn't seem capable of comparing Column B to Column A. Is this possible outside the application layer, or am I stuck with an application layer solution?
The use case is that I'd like the following:
Users always have usernames, which are always unique amongst all usernames and emails.
Users sometimes have emails, which are always unique amongst all usernames and emails.
This allows me to know that, for example, if a user logs in with their email address I know which username is theirs. But I also want users to be able to use email addresses as usernames. And more confusingly, I'd like users to be able to specify one email address as a username and a separate email address as the email address on file.
Any pointers?
I think you should have one table for users. This should have a unique, auto incremented user id. You should have another table for logins. This would have columns such as:
UserId
LoginType (email or userid)
Name
This table has a unique constraint on name. In addition, it has a unique constraint on UserId and LoginType to ensure at most one value of each for any given user. You can add additional constraints to ensure that emails really look like emails, for instance.
If you want a separate email associated with the UserId for contact purposes, you can put that in the Users table.
The key idea: move the confusing notion of a login to a separate entity (table) in the database.
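The logins table could be sketched roughly like this, using Python's sqlite3 module so it can be run as-is. The column names, the 'username' / 'email' type values, and the sample addresses are assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, contact_email TEXT);
CREATE TABLE logins (
    user_id INTEGER NOT NULL REFERENCES users(id),
    login_type TEXT NOT NULL CHECK (login_type IN ('username', 'email')),
    name TEXT NOT NULL UNIQUE,        -- unique across usernames and emails alike
    UNIQUE (user_id, login_type)      -- at most one login of each type per user
);
INSERT INTO users VALUES (1, NULL), (2, NULL);
-- user 1 uses an email address as a username
INSERT INTO logins VALUES (1, 'username', 'bob@example.com');
""")

# User 2 cannot register the same string as an email: name is unique table-wide.
try:
    conn.execute("INSERT INTO logins VALUES (2, 'email', 'bob@example.com')")
    clash_allowed = True
except sqlite3.IntegrityError:
    clash_allowed = False
print(clash_allowed)
```

Because usernames and emails share one column, a plain UNIQUE constraint does the cross-column check that was awkward with two columns.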

How to store all user settings in one database with a unique id

I'm making an app where I have a server-side SQL database to store the user settings of all users.
I'm not sure how to make each user unique, so that the database knows who is who.
The database stores this user data in each row: id, email, county, age and gender.
So I'm thinking the best way is to identify each user by his or her email, which is unique, so that when the settings are updated or outputted, the SQL knows what row to fetch.
How should I go about with this?
And how would I then output the right data to the right user?
An entity in the database should have a primary key. I understand that in your design the id field is going to be the primary key. Usually this is an auto-generated integer. This is called a surrogate key. In this case you need to tell the table that the email field must be unique as well. You can do that by creating a unique index for this field. The unique index will prevent the creation of two different users with the same email. Going with this approach you can query the table checking either for id or for email.
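A minimal sketch of the surrogate-key approach with a unique index on email, written against SQLite through Python's sqlite3 module. The table name user_settings, the index name, and the sample row are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_settings (
    id INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key generated by the database
    email TEXT NOT NULL,
    county TEXT,
    age INTEGER,
    gender TEXT
);
CREATE UNIQUE INDEX idx_user_settings_email ON user_settings (email);
""")
conn.execute(
    "INSERT INTO user_settings (email, county, age, gender) VALUES (?, ?, ?, ?)",
    ("ann@example.com", "Kent", 30, "f"),
)

# The same row can be fetched by either identifier.
by_email = conn.execute(
    "SELECT id, county FROM user_settings WHERE email = ?", ("ann@example.com",)
).fetchone()
by_id = conn.execute(
    "SELECT email FROM user_settings WHERE id = ?", (by_email[0],)
).fetchone()

# A second user with the same email is rejected by the unique index.
try:
    conn.execute("INSERT INTO user_settings (email) VALUES (?)", ("ann@example.com",))
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
```

The application would typically look the user up by email once at login and then use the id for every later settings query.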
An alternative is to have a natural key. In this case, email would be the primary key of your table, so you wouldn't have the id field. Going with this approach you can query the table checking only for email, which is the unique identifier of each user.

Generate sequential number with a prefix

I need to assign a unique number to each "customer". Basically an account number or so to speak. I need to generate a sequential number with a prefix... say... 10001111 to eventually 19999999. The numbers need to be unique. I know how to create a random number but those numbers must be unique and can't be repeated so I am stuck with how to even begin with the programming logic. I found some C# that would help but I am programming in VB.NET. Any help would be appreciated!
Honestly, I don't see why you need to do anything more complicated than an Identity Column in SQL Server.
CREATE TABLE Persons
(
P_Id int PRIMARY KEY IDENTITY,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255)
)
Source: W3 Schools
You can then pull the customer data back and prepend a prefix to it, either statically or dynamically.
A static example would be
SELECT 'CUST-' + CAST(P_Id AS varchar(10)) AS [CustomerNumber], LastName
FROM Persons
But nothing is stopping you from adding the prefix in as a column and dynamically joining them (if you need to store multiple prefixes).
At the end of the day, you'll need a persistent store to ensure you're getting a unique number. Instead of reinventing the wheel on this you can leverage the DB server, which is written for this purpose.
You can also have a table in the DB whose job is to store the latest ID number and you can manually increment and update that, but I'm not a huge fan of that. It is too messy.
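To illustrate letting the database hand out the numbers, here is a sketch using SQLite through Python's sqlite3 module. SQLite's AUTOINCREMENT stands in for the SQL Server identity column, and starting the sequence by inserting the first row with an explicit id of 10001111 is an assumption made to match the range in the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE persons (
    p_id INTEGER PRIMARY KEY AUTOINCREMENT,
    last_name TEXT NOT NULL
);
-- The first customer is given the starting number explicitly...
INSERT INTO persons (p_id, last_name) VALUES (10001111, 'Smith');
-- ...and every later insert gets the next number from the database.
INSERT INTO persons (last_name) VALUES ('Jones');
""")

# The prefix is added when reading, not stored with the number.
rows = conn.execute(
    "SELECT 'CUST-' || p_id AS customer_number, last_name FROM persons ORDER BY p_id"
).fetchall()
print(rows)
```

The database guarantees uniqueness, so the application never has to track which numbers are already taken.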
One common way to do this kind of thing is to generate all of the possibilities at once, and store them in a list. When you want to create a new customer, choose a random number from the list and then remove it from that list so that it can't be used again. I don't know how practical this would be in your scenario.