SQL: combine two tables for a query - sql

I want to query two tables at a time to find the key for an artist given their name. The issue is that my data is coming from disparate sources and there is no definitive standard for the presentation of their names (e.g. Forename Surname vs. Surname, Forename) and so to this end I have a table containing definitive names used throughout the rest of my system along with a separate table of aliases to match the varying styles up to each artist.
This is PostgreSQL but apart from the text type it's pretty standard. Substitute character varying if you prefer:
create table Artists (
id serial primary key,
name text,
-- other stuff not relevant
);
create table Aliases (
artist integer references Artists(id) not null,
name text not null
);
Now I'd like to be able to query both sets of names in a single query to obtain the appropriate id. Any way to do this? e.g.
select id from ??? where name = 'Bloggs, Joe';
I'm not interested in revising my schema's idea of what a "name" is to something more structured, e.g. separate forename and surname, since it's inappropriate for the application. Most of my sources don't structure the data, sometimes one or the other name isn't known, it may be a pseudonym, or sometimes the "artist" may be an entity such as a studio.

I think you want:
select a.id
from artists a
where a.name = 'Bloggs, Joe' or
exists (select 1
from aliases aa
where aa.artist = a.id and
aa.name = 'Bloggs, Joe'
);
Actually, if you just want the id (and not other columns), then you can use:
select a.id
from artists a
where a.name = 'Bloggs, Joe'
union all -- union if there could be duplicates
select aa.artist
from aliases aa
where aa.name = 'Bloggs, Joe';

Related

Query data from one column depending on other values on the table

So, I have a table table with three columns I am interested in : value, entity and field_entity.
There are other columns that are not important for this question. There are many different types of entity, and some of them can have the same field_entity, but those two columns determine what the column value refers to (if it is an id number for a person or the address or some other thing)
If I need the name of a person I would do something like this:
select value from table where entity = 'person' and field_entity = 'person_name';
My problem is I need to get a lot of different values (names, last names, address, documents, etc.), so the way I am doing it now is using a left join like this:
select
doc_type.value as doc_type,
doc.value as doc,
status.value as status
from
table doc
-- Get doc type
left join table doc_type
on doc.entity = doc_type.entity
and doc.transaction_id = doc_type.transaction_id
and doc_type.field_entity = 'document_type'
-- Get Status
left join table status
on doc.entity = status.entity
and doc.transaction_id = status.transaction_id
and status.field_entity = 'status'
where doc.entity = 'users' and doc.field_entity = 'document' and doc.transaction_id = 11111;
There are 16 values or more, and this can get a bit bulky and difficult to maintain, so I was wondering if some of you can point out a better way to do this?
Thanks!
I assume that you are not in position to alter the table structure, but can you add views to the database? If so, you can create views based on the different types of entities in your table.
For example:
CREATE VIEW view_person AS
SELECT value AS name
FROM doc
WHERE doc.entity = 'person'
AND doc.field_entity = 'name';
Then you can write clearer queries:
SELECT view_person.name FROM view_person;

How to show one to many relationship between two tables in SQL?

I have two tables A and B.
A table contain
postid,postname,CategoryURl
and
B table contain
postid,CategoryImageURL
For one postid there are multiple CategoryImageURL assigned.I want to display that CategoryImageURL in Table A but for one postid there should be CategoryImageURL1,CategoryImageURL2 should be like that one.
I want to achieve one to many relationship for one postid then what logic should be return in sql function??
In my eyes it seems that you want to display all related CategoryImageURLs of the second table in one line with a separator in this case the comma?
Then you will need a recursive operation there. Maybe a CTE (Common Table Expression) does the trick. See below. I have added another key to the second table, to be able to check, if all rows of the second table have been processed for the corresponding row in the first table.
Maybe this helps:
with a_cte (post_id, url_id, name, list, rrank) as
(
select
a.post_id
, b.url_id
, a.name
, cast(b.urln + ', ' as nvarchar(100)) as list
, 0 as rrank
from
dbo.a
join dbo.b
on a.post_id = b.post_id
union all
select
c.post_id
, a1.url_id
, c.name
, cast(c.list + case when rrank = 0 then '' else ', ' end + a1.urln as nvarchar(100))
, c.rrank + 1
from a_cte c
join ( select
b.post_id
, b.url_id
, a.name
, b.urln
from dbo.a
join dbo.b
on a.post_id = b.post_id
) a1
on c.post_id = a1.post_id
and c.url_id < a1.url_id -- ==> take care, that there is no endless loop
)
select d.name, d.list
from
(
select name, list, rank() over (partition by post_id order by rrank desc)
from a_cte
) d (name, list, rank)
where rank = 1
You are asking the wrong sort of question. This is about normalization.
As it stands, you have a redundancy? Where each postname and categoryURL is represented by an ID field.
For whatever reason, the tables separated CategoryImageUrl into its own table and linked this to each set of postname and categoryURL.
If the relation is actually one id to each postname, then you can denormalize the table by adding the column CategoryImageUrl to your first table.
Postid, postname, CategoryURL, CategoryImageUrl
Or if you wish to keep the normalization, combine like fields into their own table like so:
--TableA:
Postid, postname, <any other field dependent on postname >
--TableA
Postid, CategoryURL, CategoryImageUrl
Now this groups CategoryURL together but uses a redundancy of having multiple CategoryURL to exist. However, Postid has only one CategoryUrl.
To remove this redundancy in our table, we could use a Star Schema strategy like this:
-- Post Table
Postid, postname
-- Category table
CategoryID, CategoryURL, <any other info dependent only on CategoryURL>
-- Fact Table
Postid, CategoryID, CategoryImageURL
DISCLAIMER: Naturally I assumed aspects of your data and might be off. However, the strategy of normalization is still the same.
Also, remember that SQL is relational and deals with sets of data. Inheritance is incompatible to the relational set theory. Every table can be queried forwards and backwards much the way every page and chapter in a book is treated as part of the book. At no point would we see a chapter independent of a book.

SQLite, Many to many relations, How to aggregate?

I have the classic arrangement for a many to many relation in a small flashcard like application built using SQLite. Every card can have multiple tags, and every tag can have multiple cards. This two entities having each a table with a third table to link records.
This is the table for Cards:
CREATE TABLE Cards (CardId INTEGER PRIMARY KEY AUTOINCREMENT,
Text TEXT NOT NULL,
Answer INTEGER NOT NULL,
Success INTEGER NOT NULL,
Fail INTEGER NOT NULL);
This is the table for Tags:
CREATE TABLE Tags (TagId INTEGER PRIMARY KEY AUTOINCREMENT,
Name TEXT UNIQUE NOT NULL);
This is the cross reference table:
CREATE TABLE CardsRelatedToTags (CardId INTEGER,
TagId INTEGER,
PRIMARY KEY (CardId, TagId));
I need to get a table of cards with their associated tags in a column separated by commas.
I can already get what I need for a single row knowing its Id with the following query:
SELECT Cards.CardId, Cards.Text,
(SELECT group_concat(Tags.Name, ', ') FROM Tags
JOIN CardsRelatedToTags ON CardsRelatedToTags.TagId = Tags.TagId
WHERE CardsRelatedToTags.CardId = 1) AS TagsList
FROM Cards
WHERE Cards.CardId = 1
This will result in something like this:
CardId | Text | TagsList
1 | Some specially formatted text | Tag1, Tag2, TagN...
How to get this type of result (TagsList from group_concat) for every row in Cards using a SQL query? It is advisable to do so from the performance point of view? Or I need to do this sort of "presentation" work in application code using a simpler request to the DB?
Answering your code question:
SELECT
c.CardId,
c.Text,
GROUP_CONCAT(t.Name,', ') AS TagsList
FROM
Cards c
JOIN CardsRelatedToTags crt ON
c.CardId = crt.CardId
JOIN Tags t ON
crt.TagId = t.TagId
WHERE
c.CardId = 1
GROUP BY c.CardId, c.Text
Now, to the matter of performance. Databases are a powerful tool and do not end on simple SELECT statements. You can definitely do what you need inside a DB (even SQLite). It is a bad practice to use a SELECT statement as a feed for one column inside another SELECT. It would require scanning a table to get result for each row in your input.

Underlying rows in Group By

I have a table with a certain number of columns and a primary key column (suppose OriginalKey). I perform a GROUP BY on a certain sub-set of those columns and store them in a temporary table with primary key (suppose GroupKey). At a later stage, I may need to get more details about one or more of those groupings (which can be found in the temporary table) i.e. I need to know which were the rows from the original table that formed that group. Simply put, I need to know the mappings between GroupKey and OriginalKey. What's the best way to do this? Thanks in advance.
Example:
Table Student(
StudentID INT PRIMARY KEY,
Level INT, --Grade/Class/Level depending on which country you are from)
HomeTown TEXT,
Gender CHAR)
INSERT INTO TempTable SELECT HomeTown, Gender, COUNT(*) AS NumStudents FROM Student GROUP BY HomeTown, Gender
On a later date, I would like to find out details about all towns that have more than 50 male students and know details of every one of them.
How about joining the 2 tables using the GroupKey, which, you say, are the same?
Or how about doing:
select * from OriginalTable where
GroupKey in (select GroupKey from my_temp_table)
You'd need to store the fields you grouped on in your temporary table, so you can join back to the original table. e.g. if you grouped on fieldA, fieldB, and fieldC, you'd need something like:
select original.id
from original
inner join temptable on
temptable.fieldA = original.fieldA and
temptable.fieldB = original.fieldB and
temptable.fieldC = original.fieldC

Selecting distinct rows based on values from left table

Using Postgres. Here's my scenario:
I have three different tables. One is a title table. The second is a genre table. The third table is used to join the two. When I designed the database, I expected that each title would have one top level genre. After filling it with data, I discovered that there were titles that had two, sometimes, three top level genres.
I wrote a query that retrieves titles and their top level genres. This obviously requires that I join the two tables. For those that only have one top level genre, there is one record. For those that have more, there are multiple records.
I realize I'll probably have to write a custom function of some kind that will handle this for me, but I thought I'd ask if it's possible to do this without doing so just to make sure I'm not missing anything.
Is it possible to write a query that will allow me to select all of the distinct titles regardless of the number of genres that it has, but also include the genre? Or even better, a query that would give me a comma delimited string of genres when there are multiples?
Thanks in advance!
Sounds like a job for array_agg to me. With tables like this:
create table t (id int not null, title varchar not null);
create table g (id int not null, name varchar not null);
create table tg (t int not null, g int not null);
You could do something like this:
SELECT t.title, array_agg(g.name)
FROM t, tg, g
WHERE t.id = tg.t
AND tg.g = g.id
GROUP BY t.title, t.id
to get:
title | array_agg
-------+-----------------------
one | {g-one,g-two,g-three}
three | {g-three}
two | {g-two}
Then just unpack the arrays as needed. If for some reason you really want a comma delimited string instead of an array, then string_agg is your friend:
SELECT t.title, string_agg(g.name, ',')
FROM t, tg, g
WHERE t.id = tg.t
AND tg.g = g.id
GROUP BY t.title, t.id
and you'll get something like this:
title | string_agg
-------+---------------------
one | g-one,g-two,g-three
three | g-three
two | g-two
I'd go with the array approach so that you wouldn't have to worry about reserving a character for the delimiter or having to escape (and then unescape) the delimiter while aggregating.
Have a look at this thread which might answer your question.