Selecting distinct rows based on values from left table - sql

Using Postgres. Here's my scenario:
I have three different tables. One is a title table. The second is a genre table. The third table is used to join the two. When I designed the database, I expected that each title would have one top level genre. After filling it with data, I discovered that there were titles that had two, sometimes, three top level genres.
I wrote a query that retrieves titles and their top level genres. This obviously requires that I join the two tables. For those that only have one top level genre, there is one record. For those that have more, there are multiple records.
I realize I'll probably have to write a custom function of some kind that will handle this for me, but I thought I'd ask if it's possible to do this without doing so just to make sure I'm not missing anything.
Is it possible to write a query that will allow me to select all of the distinct titles regardless of the number of genres that it has, but also include the genre? Or even better, a query that would give me a comma delimited string of genres when there are multiples?
Thanks in advance!

Sounds like a job for array_agg to me. With tables like this:
create table t (id int not null, title varchar not null);
create table g (id int not null, name varchar not null);
create table tg (t int not null, g int not null);
You could do something like this:
SELECT t.title, array_agg(g.name)
FROM t, tg, g
WHERE t.id = tg.t
AND tg.g = g.id
GROUP BY t.title, t.id
to get:
title | array_agg
-------+-----------------------
one | {g-one,g-two,g-three}
three | {g-three}
two | {g-two}
Then just unpack the arrays as needed. If for some reason you really want a comma delimited string instead of an array, then string_agg is your friend:
SELECT t.title, string_agg(g.name, ',')
FROM t, tg, g
WHERE t.id = tg.t
AND tg.g = g.id
GROUP BY t.title, t.id
and you'll get something like this:
title | string_agg
-------+---------------------
one | g-one,g-two,g-three
three | g-three
two | g-two
I'd go with the array approach so that you wouldn't have to worry about reserving a character for the delimiter or having to escape (and then unescape) the delimiter while aggregating.

Have a look at this thread which might answer your question.

Related

SQL: combine two tables for a query

I want to query two tables at a time to find the key for an artist given their name. The issue is that my data is coming from disparate sources and there is no definitive standard for the presentation of their names (e.g. Forename Surname vs. Surname, Forename) and so to this end I have a table containing definitive names used throughout the rest of my system along with a separate table of aliases to match the varying styles up to each artist.
This is PostgreSQL but apart from the text type it's pretty standard. Substitute character varying if you prefer:
create table Artists (
id serial primary key,
name text,
-- other stuff not relevant
);
create table Aliases (
artist integer references Artists(id) not null,
name text not null
);
Now I'd like to be able to query both sets of names in a single query to obtain the appropriate id. Any way to do this? e.g.
select id from ??? where name = 'Bloggs, Joe';
I'm not interested in revising my schema's idea of what a "name" is to something more structured, e.g. separate forename and surname, since it's inappropriate for the application. Most of my sources don't structure the data, sometimes one or the other name isn't known, it may be a pseudonym, or sometimes the "artist" may be an entity such as a studio.
I think you want:
select a.id
from artists a
where a.name = 'Bloggs, Joe' or
exists (select 1
from aliases aa
where aa.artist = a.id and
aa.name = 'Bloggs, Joe'
);
Actually, if you just want the id (and not other columns), then you can use:
select a.id
from artists a
where a.name = 'Bloggs, Joe'
union all -- union if there could be duplicates
select aa.artist
from aliases aa
where aa.name = 'Bloggs, Joe';

Get data from other table based on column with concatenated values

I have two tables:
category with columns:
id name
1 business
2 sports
...
article with columns:
id title categories
1 abc 1|2|3
2 xyz 1|2
I know there should be a separate table for article categories but I was given this.
Is it possible to write a query that returns:
id title category_names
1 xyz business,sports
I thought of splitting the string in article -> categories column, then use in query to extract name from category table but couldn't figure it out.
You should fix your data model. But, you can do this in SQL Server:
select a.*, s.names
from article a cross apply
(select string_agg(c.name, ',') as names
from string_split(a.categories, '|') ss join
category c
on try_convert(int, ss.value) = c.id
) s;
Here is a db<>fiddle.
Presumably, you already know the shortcomings of this data model:
SQL Server has poor string handling functionality.
Numbers should be stored as numbers not strings.
Foreign key references should be properly declared.
Such queries cannot make use of indexes and partitions.
If you really want to store multiple values in a field, SQL Server offers both JSON and XML. Strings are not the right approach.

SQLite, Many to many relations, How to aggregate?

I have the classic arrangement for a many to many relation in a small flashcard like application built using SQLite. Every card can have multiple tags, and every tag can have multiple cards. This two entities having each a table with a third table to link records.
This is the table for Cards:
CREATE TABLE Cards (CardId INTEGER PRIMARY KEY AUTOINCREMENT,
Text TEXT NOT NULL,
Answer INTEGER NOT NULL,
Success INTEGER NOT NULL,
Fail INTEGER NOT NULL);
This is the table for Tags:
CREATE TABLE Tags (TagId INTEGER PRIMARY KEY AUTOINCREMENT,
Name TEXT UNIQUE NOT NULL);
This is the cross reference table:
CREATE TABLE CardsRelatedToTags (CardId INTEGER,
TagId INTEGER,
PRIMARY KEY (CardId, TagId));
I need to get a table of cards with their associated tags in a column separated by commas.
I can already get what I need for a single row knowing its Id with the following query:
SELECT Cards.CardId, Cards.Text,
(SELECT group_concat(Tags.Name, ', ') FROM Tags
JOIN CardsRelatedToTags ON CardsRelatedToTags.TagId = Tags.TagId
WHERE CardsRelatedToTags.CardId = 1) AS TagsList
FROM Cards
WHERE Cards.CardId = 1
This will result in something like this:
CardId | Text | TagsList
1 | Some specially formatted text | Tag1, Tag2, TagN...
How to get this type of result (TagsList from group_concat) for every row in Cards using a SQL query? It is advisable to do so from the performance point of view? Or I need to do this sort of "presentation" work in application code using a simpler request to the DB?
Answering your code question:
SELECT
c.CardId,
c.Text,
GROUP_CONCAT(t.Name,', ') AS TagsList
FROM
Cards c
JOIN CardsRelatedToTags crt ON
c.CardId = crt.CardId
JOIN Tags t ON
crt.TagId = t.TagId
WHERE
c.CardId = 1
GROUP BY c.CardId, c.Text
Now, to the matter of performance. Databases are a powerful tool and do not end on simple SELECT statements. You can definitely do what you need inside a DB (even SQLite). It is a bad practice to use a SELECT statement as a feed for one column inside another SELECT. It would require scanning a table to get result for each row in your input.

Combine query results from one table with the defaults from another

This is a dumbed down version of the real table data, so may look bit silly.
Table 1 (users):
id INT
username TEXT
favourite_food TEXT
food_pref_id INT
Table 2 (food_preferences):
id INT
food_type TEXT
The logic is as follows:
Let's say I have this in my food preference table:
1, 'VEGETARIAN'
and this in the users table:
1, 'John', NULL, 1
2, 'Pete', 'Curry', 1
In which case John defaults to be a vegetarian, but Pete should show up as a person who enjoys curry.
Question, is there any way to combine the query into one select statement, so that it would get the default from the preferences table if the favourite_food column is NULL?
I can obviously do this in application logic, but would be nice just to offload this to SQL, if possible.
DB is SQLite3...
You could use COALESCE(X,Y,...) to select the first item that isn't NULL.
If you combine this with an inner join, you should be able to do what you want.
It should go something like this:
SELECT u.id AS id,
u.username AS username,
COALESCE(u.favorite_food, p.food_type) AS favorite_food,
u.food_pref_id AS food_pref_id
FROM users AS u INNER JOIN food_preferences AS p
ON u.food_pref_id = p.id
I don't have a SQLite database handy to test on, however, so the syntax might not be 100% correct, but it's the gist of it.

Query mySql with LIKE %...% and not pull false records

I have a database that contains two fields that collect multiple values. For instance, one is colors, where one row might be "red, blue, navyblue, lightblue, orange". The other field uses numbers, we'll call it colorID, where one row might be "1, 10, 23, 110, 239."
Now, let's say I want to SELECT * FROM my_table WHERE 'colors' LIKE %blue%; That query will give me all the rows with "blue," but also rows with "navyblue" or "lightblue" that may or may not contain "blue." Likewise, with colorID, a query for WHERE 'colorID' LIKE %1% will pull up a lot more rows than I want.
What's the correct syntax to properly query the database and only return correct results? FWIW, the fields are both set as TEXT (due to the commas). Is there a better way to store the data that would make searching easier and more accurate?
you really should look at changing your db schema. One option would be to create a table that holds colours with an INT as the primary key. You could then create a pivot table to link my_table to colours
CREATE TABLE `colours` (
`id` INT NOT NULL ,
`colour` VARCHAR( 255 ) NOT NULL ,
PRIMARY KEY ( `id` )
) ENGINE = MYISAM
CREATE TABLE `mytable_to_colours` (
`mytable_id` INT NOT NULL ,
`colour_id` INT NOT NULL ,
) ENGINE = MYISAM
so your query could look like this - where '1' is the value of blue (and more likely how you would be referencing it)
SELECT *
FROM my_table
JOIN mytable_to_colours ON (my_table.id = mytable_to_colours.mytable_id)
WHERE colour_id = '1'
If you want to search in your existing table you can use the following query:
SELECT *
FROM my_table
WHERE colors LIKE 'blue,%'
OR colors LIKE '%,blue'
OR colors LIKE '%,blue,%'
OR colors = 'blue'
However it is much better than when you create table colors and numbers and create many to many relationships.
EDITED: Just like #seengee has written.
MySQL has a REGEXP function that will allow you to match something like "[^a-z]blue|^blue". But you should really consider not doing it this way at all. A single table containing one row for each color (with multiple rows groupable by a common ID) would be far more scalable.
The standard answer would be to normalize the data by putting a colorSelID (or whatever) in this table, then having another table with two columns, mapping from 'colorSelID' to the individual colorIDs, so your data above would turn into something like:
other colums | colorSelId
other data | 1
Then in the colors table, you'd have:
colorSelId | ColorId
1 | 1
1 | 10
1 | 23
1 | 110
1 | 239
Then, when you want to find all the items that match colorID 10, you just search on colorID, and join that ColorSelId back to your main table to get all the items with a colorID of 10:
select *
from
main_table join color_table
on
main_table.ColorSelId=color_table.ColorSelId
where
color_table.colorId = 10
Edit: note that this will also probably speed up your searches a lot, at least assuming you index on ColorId in the color table, and ColorSelId in the main table. A search on '%x%' will (almost?) always do a full table scan, whereas this will use the index.
Perhaps this will help to you:
SELECT * FROM table WHERE column REGEXP "[X]"; // where X is a number. returns all rows containg X in your column
SELECT * FROM table WHERE column REGEXP "^[X]"; // where X is a number. returns all rows containg X as first number in your column
Good luck!
None of the solutions suggested so far seem likely to work, assuming I understand your question. Short of splitting the comma-delimited string into a table and joining, you can do this (using 'blue' as an example):
WHERE ', ' + myTable.ValueList + ',' LIKE '%, blue,%'
If you aren't meticulous about spaces after commas, you would need to replace spaces in ValueList with empty strings as part of this code (and remove the space in ', ').