I've used SQL for years but have never truly harnessed its potential.
For this example let's say I have two tables:
CREATE TABLE messages (
MessageID INTEGER NOT NULL PRIMARY KEY,
UserID INTEGER,
Timestamp INTEGER,
Msg TEXT);
CREATE TABLE users (
UserID INTEGER NOT NULL PRIMARY KEY,
UserName TEXT,
Age INTEGER,
Gender INTEGER,
WebURL TEXT);
As I understand it, PRIMARY KEY basically indexes the field so that it can be used for rapid searches later on: querying on exact values of the primary key returns results extremely quickly, even in huge tables. (It also enforces that the field must be unique in each record.)
In my current workflow, I'd do something like
SELECT * FROM messages;
and then in code, for each message, do:
SELECT * FROM users WHERE UserID = results['UserID'];
This obviously sounds very inefficient and I know it can be done a lot better.
What I want to end up with is a result set that contains all of the fields from messages, except that instead of the UserID field, it contains all of the fields from the users table that match that given UserID.
Could someone please give me a quick primer on how this sort of thing can be accomplished?
If it matters, I'm using SQLite3 as an SQL engine, but I also would possibly want to do this on MySQL.
Thank you!
Not sure about the requested order, but you can adapt it.
Just JOIN the tables on UserID
SELECT MESSAGES.*,
USERS.USERNAME,
USERS.AGE,
USERS.GENDER,
USERS.WEBURL
FROM MESSAGES
JOIN USERS
ON USERS.USERID = MESSAGES.USERID
ORDER BY MESSAGES.USERID,
MESSAGES.TIMESTAMP
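If it helps, here is the same join as a runnable sketch using Python's sqlite3 module (the tables are the ones from the question; the sample data is made up for illustration):

```python
import sqlite3

# In-memory database with the two tables from the question
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (
    MessageID INTEGER NOT NULL PRIMARY KEY,
    UserID INTEGER,
    Timestamp INTEGER,
    Msg TEXT);
CREATE TABLE users (
    UserID INTEGER NOT NULL PRIMARY KEY,
    UserName TEXT,
    Age INTEGER,
    Gender INTEGER,
    WebURL TEXT);
""")
conn.execute("INSERT INTO users VALUES (1, 'alice', 30, 0, 'http://example.com')")
conn.execute("INSERT INTO messages VALUES (10, 1, 1700000000, 'hello')")

# One round trip instead of one query per message: join messages to users on UserID
rows = conn.execute("""
    SELECT m.MessageID, m.Timestamp, m.Msg,
           u.UserName, u.Age, u.Gender, u.WebURL
    FROM messages m
    JOIN users u ON u.UserID = m.UserID
    ORDER BY m.Timestamp
""").fetchall()
print(rows)  # each row carries the message plus its author's fields
```

The same SELECT works unchanged in MySQL; only the connection code differs.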
I have a table structure as below. For fetching data from the table I have the search criteria mentioned below, and I am writing a single SQL query to satisfy it (sample query below). I need to create an index on the table that covers all of the search criteria. Any advice would be helpful.
Table structure(columns):
applicationid varchar(15),
trans_tms timestamp,
SSN varchar,
firstname varchar,
lastname varchar,
DOB date,
Zipcode smallint,
adddetais json
The search criteria coming from the API fall into 4 categories. All 4 categories are mandatory; I will always receive values for all 4 categories for a single applicant.
Search criteria:
ssn & lastname (the last name comparison needs a function, i.e. soundex(lastname) = soundex('inputvalue'))
ssn & DOB
ssn & zipcode
firstname & lastname & DOB
Query:
The sample query I am trying to write is:
Select *
from table
where (ssn = 'aaa' and soundex(lastname) = soundex('xxx'))
   or (ssn = 'aaa' and dob = 'xxx')
   or (ssn = 'aaa' and zipcode = 'xxx')
   or (firstname = 'xxx' and lastname = 'xxx' and dob = 'xxx');
For performance I need to create an index for the table, possibly a composite one. Any suggestion will be helpful.
Some approaches I would follow:
Yes, you are correct that a composite (multicolumn) index benefits AND conditions across two columns; however, for the given conditions the indexes would overlap on columns.
Documentation : https://www.postgresql.org/docs/10/indexes-multicolumn.html
You can use a UNION instead of OR.
Reference : https://www.cybertec-postgresql.com/en/avoid-or-for-better-performance/
If multiple conditions can be combined (e.g. ssn must be 'aaa' in every combination), then rewriting the WHERE clause with fewer ORs is preferable.
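For illustration, here is a rough sketch of the UNION rewrite using Python's sqlite3 module (the soundex branch is left out because SQLite has no built-in soundex; the table and index names are made up, and on PostgreSQL you would create the analogous indexes there):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE applicants (
    ssn TEXT, lastname TEXT, dob TEXT, zipcode TEXT, firstname TEXT);
-- one composite index per search branch, so each UNION arm can seek
CREATE INDEX idx_ssn_dob  ON applicants (ssn, dob);
CREATE INDEX idx_ssn_zip  ON applicants (ssn, zipcode);
CREATE INDEX idx_name_dob ON applicants (firstname, lastname, dob);
""")
conn.execute("INSERT INTO applicants VALUES ('aaa', 'smith', '1990-01-01', '12345', 'john')")

# Each branch of the original OR becomes its own indexable SELECT;
# UNION (not UNION ALL) removes rows matched by more than one branch.
rows = conn.execute("""
    SELECT * FROM applicants WHERE ssn = ? AND dob = ?
    UNION
    SELECT * FROM applicants WHERE ssn = ? AND zipcode = ?
    UNION
    SELECT * FROM applicants WHERE firstname = ? AND lastname = ? AND dob = ?
""", ('aaa', '1990-01-01', 'aaa', '12345', 'john', 'smith', '1990-01-01')).fetchall()
print(rows)  # the applicant appears once, even though several branches match
```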
Operational databases of identical structure work in several countries.
country A has table Users with column user_id
country B has table Users with column user_id
country C has table Users with column user_id
When data from all three databases is brought to the staging area for the further data warehousing purposes all three operational tables are integrated into a single table Users with dwh_user_id.
The logic looks like following:
if record comes from A then dwh_user_id = 1000000 + user_id
if record comes from B then dwh_user_id = 4000000 + user_id
if record comes from C then dwh_user_id = 8000000 + user_id
I have a strong feeling that it is a very bad approach. What would be a better approach?
(user_id + country_iso_code maybe?)
In general, it's a terrible idea to inject logic into your primary key in this way. It really sets you up for failure: what if country A gets more than 3,000,000 user records and collides with country B's range?
There are a variety of solutions.
Ideally, you include the column "country" in all tables, and use that together with the ID as the primary key. This keeps the logic identical between master and country records.
If you're working with a legacy system, and cannot modify the country tables, but can modify the master table, add the key there, populate it during load, and use the combination of country and ID as the primary key.
The way we handle this scenario in Ajilius is to add metadata columns to the load. Values like SERVER_NAME or DATABASE_NAME might provide enough unique information to make a compound key unique.
An alternative scenario is to generate a GUID for each row at extract or load time, which would then uniquely identify each row.
The data vault guys like to use a hash across the row, but in this case it would only work if no row was ever a complete duplicate.
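As a quick illustration of the composite-key approach (table and column names are made up; SQLite via Python is used only for brevity, the DDL is standard):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Composite primary key: country code plus the per-country user_id.
# No arithmetic offsets, so there is no numeric range to overflow.
conn.execute("""
    CREATE TABLE dwh_users (
        country_iso_code TEXT NOT NULL,
        user_id INTEGER NOT NULL,
        name TEXT,
        PRIMARY KEY (country_iso_code, user_id))
""")
conn.execute("INSERT INTO dwh_users VALUES ('ES', 1, 'ana')")
conn.execute("INSERT INTO dwh_users VALUES ('US', 1, 'bob')")  # same user_id, different country: fine

# Re-loading the same (country, user_id) pair is rejected by the key
try:
    conn.execute("INSERT INTO dwh_users VALUES ('ES', 1, 'duplicate')")
    dup_rejected = False
except sqlite3.IntegrityError:
    dup_rejected = True
print(dup_rejected)
```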
This is why they made the uniqueidentifier data type.
If you can't change to that, I would put each one in a different table and then union them in a view. Something like:
create view vWorld
as
select 1 as CountryId, user_id
from SpainUsers
UNION ALL
select 2 as CountryId, user_id
from USUsers
Most efficient way to do this would be:
If the record comes from country A, then user * 0, hence dwh_user_id = 0.
If the record comes from country B, then (user * 0) - 1, hence dwh_user_id = -1.
If the record comes from country C, then (user * 0) + 1, hence dwh_user_id = 1.
Suggesting this logic assuming dwh_user_id is supposed to be a number field.
Asked this on the database site but it seems to be really slow moving. I'm new to SQL and databases in general; the only SQL database I have worked on used one-to-many relationships. I want to know the easiest way to implement a "favorites" mechanism for users in my DB, similar to what loads of sites like YouTube offer. Users are of course unique, so one user can have many favorites, but one item can also be favorited by many users. Is this considered a many-to-many relationship? What is the typical design pattern for doing this? Many-to-many relationships look like a headache (I'm using SQLAlchemy, so my tables are interacted with like objects), but this seems to be a fairly common feature on sites, so I was wondering what is the most straightforward way to go about it. Thanks
Yes, this is a classic many-to-many relationship. Usually, the way to deal with it is to create a link table, so in say, T-SQL you'd have...
create table user
(
user_id int identity primary key,
-- other user columns
)
create table item
(
item_id int identity primary key,
-- other item columns
)
create table userfavoriteitem
(
user_id int foreign key references user(user_id),
item_id int foreign key references item(item_id),
-- other information about favoriting you want to capture
)
To see who favorited what, all you need to do is run a query on the userfavoriteitem table, which is now a data mine of useful stats about which items are popular and who liked them.
select ufi.item_id
from userfavoriteitem ufi
where ufi.user_id = [id]
Or you can even get the most popular items on your site using the query below, though if you have a lot of users this will get slow, and the results should be saved in a special table updated by a scheduled job on the backend every so often...
select top 10 ufi.item_id, count(ufi.item_id)
from userfavoriteitem ufi
GROUP BY ufi.item_id
ORDER BY count(ufi.item_id) DESC
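If you want to try the link-table idea end to end, here is a small sketch using Python's sqlite3 module (the table names follow the answer above; the sample data is invented, and a composite primary key keeps one favorite per user/item pair):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE user (user_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE item (item_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE userfavoriteitem (
    user_id INTEGER REFERENCES user(user_id),
    item_id INTEGER REFERENCES item(item_id),
    PRIMARY KEY (user_id, item_id));  -- a user can favorite an item only once
""")
conn.executemany("INSERT INTO user VALUES (?, ?)", [(1, 'alice'), (2, 'bob')])
conn.executemany("INSERT INTO item VALUES (?, ?)", [(10, 'video'), (20, 'song')])
conn.executemany("INSERT INTO userfavoriteitem VALUES (?, ?)",
                 [(1, 10), (2, 10), (2, 20)])

# Most-favorited items, popularity first
top = conn.execute("""
    SELECT item_id, COUNT(*) AS favs
    FROM userfavoriteitem
    GROUP BY item_id
    ORDER BY favs DESC
""").fetchall()
print(top)  # item 10 was favorited twice, item 20 once
```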
I've never seen any explicitly-for-database design patterns (except a couple of trivial misuses of the phrase 'design pattern' when it became fashionable some years ago).
M:M relationships are OK: use a link table (aka association table etc etc). Your example of a User and Favourite sounds like M:M indeed.
create table LinkTable
(
Id int IDENTITY(1, 1), -- PK of this table
IdOfTable1 int, -- PK of table 1
IdOfTable2 int -- PK of table 2
)
...and create a UNIQUE index on (IdOfTable1, IdOfTable2). Or do away with the Id column and make the PK (IdOfTable1, IdOfTable2) instead.
I am quite new to PostgreSQL. Could an expert help me solve this problem, please?
Consider the following PostgreSQL tables created for a university system recording which students take which modules:
CREATE TABLE module (id bigserial, name text);
CREATE TABLE student (id bigserial, name text);
CREATE TABLE takes (student_id bigint, module bigint);
Rewrite the SQL to include sensible primary keys.
CREATE TABLE module
(
m_id bigserial,
name text,
CONSTRAINT m_key PRIMARY KEY (m_id)
);
CREATE TABLE student
(
s_id bigserial,
name text,
CONSTRAINT s_key PRIMARY KEY (s_id)
);
CREATE TABLE takes
(
student_id bigint,
module bigint,
CONSTRAINT t_key PRIMARY KEY (student_id)
);
Given this schema I have the following questions:
Write an SQL query to count how many students are taking DATABASE.
SELECT COUNT(name)
FROM student
WHERE module = 'DATABASE' AND student_id=s_id
Write an SQL query to show the student IDs and names (but nothing else) of all student taking DATABASE
SELECT s_id, name
FROM Student, take
WHERE module = 'DATABASE' AND student_id = s_id
Write an SQL query to show the student IDs and names (but nothing else) of all students not taking DATABASE.
SELECT s_id, name
FROM Student, take
WHERE student_id = s_id AND module != 'DATABASE'
Above are my answers. Please correct me if I am wrong and please comment the reason. Thank you for your expertise.
This looks like homework so I'm not going to give a detailed answer. A few hints:
I found one case where you used ´ (acute accent) quotes instead of ' apostrophes. This suggests you're writing SQL in something like Microsoft Word, which produces so-called "smart quotes". Don't do that; use a sensible text editor. If you're on Windows, Notepad++ is a popular choice. (I fixed it when reformatting the question, but wanted to mention it.)
Don't use the legacy non-ANSI join syntax FROM table1, table2, table3 WHERE .... It's horrible to read and much easier to make mistakes with; you should never have been taught it in the first place. Also, qualify your columns: take.module, not just module. Always write ANSI joins, e.g. in your example above:
FROM Student, take
WHERE module = 'DATABASE' AND student_id = s_id
becomes
FROM student
INNER JOIN take
ON take.module = 'DATABASE'
AND take.student_id = student.s_id;
(if the table names are long you can use aliases like FROM student s then s.s_id)
Query 3 is totally wrong. Imagine if take has two rows for a student, one where the student is taking database and one where they're taking cooking. Your query will still return a result for them, even though they're taking database. (It'd also return the same student ID multiple times, which you don't want). Think about subqueries. You will need to query the student table, using a NOT EXISTS (SELECT .... FROM take ...) to filter out students who are not taking database. The rest you get to figure out on your own.
Also, your schemas don't actually enforce the constraint that a student may only take DATABASE once at a time. Either add that, or consider in your queries the possibility that a student might be registered for DATABASE twice.
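Since this is homework, here is the NOT EXISTS shape on a deliberately different, made-up example (customers and purchases instead of students and modules), runnable with Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (c_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE purchase (customer_id INTEGER, product TEXT);
""")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(1, 'ann'), (2, 'ben'), (3, 'cat')])
conn.executemany("INSERT INTO purchase VALUES (?, ?)",
                 [(1, 'book'), (1, 'pen'), (2, 'book')])

# Customers who never bought a 'book': each customer appears at most once,
# and buying something else does not sneak them back into the result.
rows = conn.execute("""
    SELECT c.c_id, c.name
    FROM customer c
    WHERE NOT EXISTS (
        SELECT 1 FROM purchase p
        WHERE p.customer_id = c.c_id AND p.product = 'book')
""").fetchall()
print(rows)  # only cat (id 3) has never bought a book
```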
Presently I'm learning (MS) SQL and trying out various aggregate function samples. My question: is there any scenario (a sample query that uses an aggregate function) where having a unique constraint on a column helps the aggregate?
Please note: I'm not trying to find a solution to a problem, but trying to see if such a scenario exist in real world SQL programming.
One theoretical scenario comes to mind immediately: the unique constraint is backed by a unique index, so if you were aggregating just that field, scanning the index would be narrower than scanning the table. That only holds if the query uses no other fields and the index is thus covering; otherwise it tips out of the nonclustered index.
I think the addition of the index to enforce the unique constraint is automatically going to have the ability to potentially help a query, but it might be a bit contrived.
Put the unique constraint on the field if you need the field to be unique. If you need indexes to help query performance, consider them separately, or add a unique index on that field plus INCLUDE other fields to make it covering (less useful, but more useful than the unique index on the single field alone).
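A small sketch of that idea, using SQLite via Python for convenience (whether the planner actually picks the auto-created index is engine- and version-specific, so the plan is printed for inspection rather than assumed):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (code INTEGER UNIQUE, payload TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(i, 'x' * 100) for i in range(1000)])

# The UNIQUE constraint auto-creates an index on code. Aggregating only
# that column means the narrow index alone can answer the query, without
# reading the wide payload column; the plan shows what was actually chosen.
plan = conn.execute("EXPLAIN QUERY PLAN SELECT COUNT(code) FROM t").fetchone()[3]
print(plan)

(total,) = conn.execute("SELECT COUNT(code) FROM t").fetchone()
print(total)  # 1000
```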
Let's take the following two tables: one has records for subject name and subject id, and the other contains records for students' marks in particular subjects.
Table1(SubjectId int unique, subject_name varchar, MaxMarks int)
Table2(Id int, StudentId int, SubjectId int, Marks int)
So if I need to find the AVG of marks obtained in the Science subject (SubjectId = 2) by all students who have attempted it, I would fire the following query.
SELECT AVG(Table2.Marks), Table1.MaxMarks
FROM Table1
JOIN Table2 ON Table2.SubjectId = Table1.SubjectId
WHERE Table1.SubjectId = 2
GROUP BY Table1.MaxMarks;
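With an explicit join condition and grouping added, the query can be checked end to end with Python's sqlite3 module (the sample marks below are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (SubjectId INTEGER UNIQUE, subject_name TEXT, MaxMarks INTEGER);
CREATE TABLE Table2 (Id INTEGER, StudentId INTEGER, SubjectId INTEGER, Marks INTEGER);
""")
conn.execute("INSERT INTO Table1 VALUES (2, 'Science', 100)")
conn.executemany("INSERT INTO Table2 VALUES (?, ?, ?, ?)",
                 [(1, 1, 2, 80),   # student 1, Science, 80 marks
                  (2, 2, 2, 60),   # student 2, Science, 60 marks
                  (3, 3, 1, 50)])  # student 3, a different subject

# Join on SubjectId so only Science marks are averaged
row = conn.execute("""
    SELECT AVG(t2.Marks), t1.MaxMarks
    FROM Table1 t1
    JOIN Table2 t2 ON t2.SubjectId = t1.SubjectId
    WHERE t1.SubjectId = 2
    GROUP BY t1.MaxMarks
""").fetchone()
print(row)  # average of 80 and 60 is 70.0, out of MaxMarks 100
```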