I have the classic arrangement for a many-to-many relation in a small flashcard-like application built using SQLite. Every card can have multiple tags, and every tag can apply to multiple cards. Each of these two entities has its own table, with a third table to link the records.
This is the table for Cards:
CREATE TABLE Cards (CardId INTEGER PRIMARY KEY AUTOINCREMENT,
Text TEXT NOT NULL,
Answer INTEGER NOT NULL,
Success INTEGER NOT NULL,
Fail INTEGER NOT NULL);
This is the table for Tags:
CREATE TABLE Tags (TagId INTEGER PRIMARY KEY AUTOINCREMENT,
Name TEXT UNIQUE NOT NULL);
This is the cross reference table:
CREATE TABLE CardsRelatedToTags (CardId INTEGER,
TagId INTEGER,
PRIMARY KEY (CardId, TagId));
I need to get a table of cards with their associated tags in a column separated by commas.
I can already get what I need for a single row knowing its Id with the following query:
SELECT Cards.CardId, Cards.Text,
(SELECT group_concat(Tags.Name, ', ') FROM Tags
JOIN CardsRelatedToTags ON CardsRelatedToTags.TagId = Tags.TagId
WHERE CardsRelatedToTags.CardId = 1) AS TagsList
FROM Cards
WHERE Cards.CardId = 1
This will result in something like this:
CardId | Text | TagsList
1 | Some specially formatted text | Tag1, Tag2, TagN...
How can I get this type of result (TagsList from group_concat) for every row in Cards using a single SQL query? Is it advisable to do so from a performance point of view? Or do I need to do this sort of "presentation" work in application code, using a simpler request to the DB?
Answering your code question:
SELECT
c.CardId,
c.Text,
GROUP_CONCAT(t.Name,', ') AS TagsList
FROM
Cards c
JOIN CardsRelatedToTags crt ON
c.CardId = crt.CardId
JOIN Tags t ON
crt.TagId = t.TagId
GROUP BY c.CardId, c.Text
Now, to the matter of performance. Databases are a powerful tool and are not limited to simple SELECT statements; you can definitely do what you need inside the DB (even SQLite). However, it is bad practice to use a SELECT statement as a feed for one column inside another SELECT, as in your single-row query: it requires scanning a table to produce a result for each row of the input, whereas a join lets the engine produce everything in one pass.
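One caveat with the plain join above: cards that have no tags at all drop out. If those rows should still appear with an empty TagsList, a LEFT JOIN variant would work (a sketch; group_concat over zero rows yields NULL):
SELECT
c.CardId,
c.Text,
GROUP_CONCAT(t.Name, ', ') AS TagsList
FROM Cards c
LEFT JOIN CardsRelatedToTags crt ON c.CardId = crt.CardId
LEFT JOIN Tags t ON crt.TagId = t.TagId
GROUP BY c.CardId, c.Text;
-- untagged cards appear with TagsList = NULL; wrap in IFNULL(..., '') for an empty string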
I implemented a standard tagging system on SQLite with two tables.
Table annotation:
CREATE TABLE IF NOT EXISTS annotation (
id INTEGER PRIMARY KEY,
comment TEXT
)
Table label:
CREATE TABLE IF NOT EXISTS label (
id INTEGER PRIMARY KEY,
annot_id INTEGER NOT NULL REFERENCES annotation(id),
tag TEXT NOT NULL
)
I can easily find the annotations that match tags 'tag1' OR 'tag2':
SELECT * FROM annotation
JOIN label ON label.annot_id = annotation.id
WHERE label.tag IN ('tag1', 'tag2') GROUP BY annotation.id
But how do I select the annotations that match tags 'tag1' AND 'tag2'?
How do I select the annotations that match tags 'tag1' AND 'tag2' but NOT 'tag3'?
Should I use INTERSECT? Is it efficient or is there a better way to express these?
I would definitely go with INTERSECT for question 1 and EXCEPT for question 2. After many years of experience with SQL I find it best to go with whatever the platform offers in cases where it directly addresses what you want to do.
The only exception would be if you had a really good reason not to. One caution on portability: INTERSECT and EXCEPT are part of the ANSI SQL standard, but not every engine implements them (MySQL, notably, lacked them for years), so check your target platforms if you ever plan to move off SQLite.
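Against the annotation and label tables above, those set-operation queries could look like this (a sketch; SQLite evaluates compound operators left to right, so the EXCEPT applies to the result of the INTERSECT):
-- annotations tagged 'tag1' AND 'tag2'
SELECT annot_id FROM label WHERE tag = 'tag1'
INTERSECT
SELECT annot_id FROM label WHERE tag = 'tag2';
-- annotations tagged 'tag1' AND 'tag2' but NOT 'tag3'
SELECT annot_id FROM label WHERE tag = 'tag1'
INTERSECT
SELECT annot_id FROM label WHERE tag = 'tag2'
EXCEPT
SELECT annot_id FROM label WHERE tag = 'tag3';
Join the resulting ids back to annotation if you also need the comment column.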
If you want to go old school and use only plain SQL, it is possible with one join per tag: one for tag A, one for tag B, and one for tag C. An outer join with an "is null" condition is the common idiom for the exclusion.
Here is an SQLite example:
create table annotation (id integer, comment varchar);
create table label (id integer, annot_id integer, tag varchar);
insert into annotation values (1,'annot 1'),(2,'annot 2');
insert into label values (1,1,'tag1'),(2,1,'tag2'),(3,1,'tag2');
insert into label values (1,2,'tag1'),(2,2,'tag2'),(3,2,'tag3');
select distinct x.id,x.comment from annotation x
join label a on a.annot_id=x.id and a.tag='tag1'
join label b on b.annot_id=x.id and b.tag='tag2'
left join label c on c.annot_id=x.id and c.tag='tag3'
where
c.id is null;
This is set up so that both annotation 1 and 2 have tag1 and tag2, but annotation 2 also has tag3 and should be excluded; the output is only annotation 1:
id | comment
1 | annot 1
I'm new to the relational database stuff and I'm having a hard time understanding how to write a query to do what I want. I have two tables that have a relationship.
CREATE TABLE DocumentGroups (
id INTEGER PRIMARY KEY AUTOINCREMENT,
comments TEXT,
Username TEXT NOT NULL
)
CREATE TABLE Documents (
id INTEGER PRIMARY KEY,
documentGroupId INT NOT NULL,
documentTypeId INT NOT NULL,
documentTypeName TEXT NOT NULL,
successfullyUploaded BIT
)
I would like to query the Documents table and get the record count for each username. Here is the query that I came up with:
SELECT Count(*)
FROM DOCUMENTS
JOIN DocumentGroups ON Documents.documentGroupId=DocumentGroups.id
GROUP BY Username
I currently have 2 entries in the Documents table, 1 from each user. This query prints out:
[{Count(*): 1}, {Count(*): 1}]
This looks correct, but is there any way for me to get the username associated with each count? Right now there is no way of knowing which count belongs to which user.
You are almost there. Your query already produces one row per user name (that's your group by clause). All that is left to do is to put that column in the select clause as well:
select dg.username, count(*) cnt
from documents d
join documentgroups dg on d.documentgroupid = dg.id
group by dg.username
Side notes:
table aliases make the queries easier to read and write
in a multi-table query, always qualify all columns with the (alias of) table they belong to
you probably want to alias the result of count(*), so it is easier to consume it from your application
I have a table in the following structure. I am writing a query to get all item_ids where key_name='topic' and key_string_value='investing', which is the simple part.
select item_id from table where key_name='topic' and key_string_value='investing'
But then, for all the item_ids returned above, I want to order them by the values set for each item_id under key_name='importance' and key_name='product'. The table structure is making it very difficult as I am not an SQL expert. Any help would be appreciated.
item_id key_name key_string_value Key_float_value
1 topic investing null
1 importance null 500
1 product A null
1 product B null
2 topic Starting null
2 product B null
2 importance null 300
2 topic retail null
3 importance null 400
3 topic investing null
3 product C null
4 topic Starting null
4 topic investing null
4 importance null 400
4 product D null
@Schwern is right - your structure should be normalized, and the names should be better too. All this makes me think: homework.
The answer to the homework question is a self join, and looks like this:
select t1.item_id , imp.key_float_value, prd.key_string_value
from [table] t1
LEFT OUTER JOIN [table] imp on imp.item_id = t1.item_id and imp.key_name='importance'
LEFT OUTER JOIN [table] prd on prd.item_id = t1.item_id and prd.key_name='product'
where t1.key_name='topic' and t1.key_string_value='investing'
ORDER BY imp.key_float_value, prd.key_string_value
The square brackets in [table] are needed because using the keyword table as a table name requires the identifier to be delimited. Square brackets are the T-SQL convention; other platforms use double quotes (").
You have a very poorly designed table that will be slow and difficult to work with. SQL is not a key/value store; it works on rows, columns and relationships. Rather than fight it, I would suggest one of two things: use a NoSQL database, which is easier to use and works more like normal programming data structures, or redesign the schema.
Here's the redesign I would suggest.
CREATE TABLE item (
id INTEGER PRIMARY KEY,
importance INTEGER DEFAULT 0
);
CREATE TABLE item_topics (
item_id INTEGER REFERENCES item(id),
topic TEXT NOT NULL
);
CREATE TABLE item_products (
item_id INTEGER REFERENCES item(id),
product TEXT NOT NULL
);
The item itself, and any scalar (i.e. single-value) attributes, go into one table. Anything which can be a list (products and topics) needs its own table relating each item to its elements. If this seems clunky, that's because it is, but that's how SQL works.
To find all items whose topic is investing, you have to join on the item_topics table.
SELECT item.id
FROM item
JOIN item_topics ON item.id = item_topics.item_id
WHERE topic = 'investing'
Then to order them, add ORDER BY item.importance.
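Putting the pieces together with the product ordering the original question asked for (a sketch; item_products is left-joined because an item may have zero or many products, and items with several products will repeat):
SELECT item.id
FROM item
JOIN item_topics ON item.id = item_topics.item_id
LEFT JOIN item_products ON item.id = item_products.item_id
WHERE item_topics.topic = 'investing'
ORDER BY item.importance, item_products.product;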
I've got four tables in a PostgreSQL 9.3.6 database:
sections
fields (child of sections)
entries (child of sections)
data (child of entries)
CREATE TABLE section (
id serial PRIMARY KEY,
title text,
"group" integer
);
CREATE TABLE fields (
id serial PRIMARY KEY,
title text,
section integer,
type text,
"default" json
);
CREATE TABLE entries (
id serial PRIMARY KEY,
section integer
);
CREATE TABLE data (
id serial PRIMARY KEY,
data json,
field integer,
entry integer
);
I'm trying to generate a page that looks like this:
section title
field 1 title | field 2 title | field 3 title
entry 1 | data 'as' json | data 1 json | data 3 json <-- table
entry 2 | data 'df' json | data 5 json | data 6 json
entry 3 | data 'gh' json | data 8 json | data 9 json
The way I have it set up right now each piece of 'data' has an entry it's linked to, a corresponding field (that field has columns that determine how the data's json field should be interpreted), a json field to store different types of data, and an id (1-9 here in the table).
In this example there are 3 entries, and 3 fields and there is a data piece for each of the cells in between.
It's set up like this because one section can have different field types and quantity than another section and therefore different quantities and types of data.
Challenge 1:
I'm trying to join the table together in a way that it's sortable by any of the columns (contents of the data for that field's json column). For example I want to be able to sort field 3 (the third column) in reverse order, the table would look like this:
section title
field 1 title | field 2 title | field 3 title
entry 3 | data 'gh' json | data 8 json | data 9 json
entry 2 | data 'df' json | data 5 json | data 6 json
entry 1 | data 'as' json | data 1 json | data 3 json <-- table
I'm open to doing it another way too if there's a better one.
Challenge 2:
Each field has a 'default value' column - Ideally I only have to create 'data' entries when they have a value that isn't that default value. So the table might actually look like this if field 2's default value was 'asdf':
section title
field 1 title | field 2 title | field 3 title
entry 3 | data 'gh' json | data 8 json | data 9 json
entry 2 | data 'df' json | 'asdf' | data 6 json
entry 1 | data 'as' json | 'asdf' | data 3 json <-- table
The key to writing this query is understanding that you only need to fetch all the data for a single section and join in the rest. Also, with your schema you can't filter data by section directly, so you'll need to join entries just for that:
SELECT d.* FROM data d JOIN entries e ON (d.entry = e.id)
WHERE e.section = ?
You can then join field to each row to get defaults, types and titles:
SELECT d.*, f.title, f.type, f."default"
FROM data d JOIN entries e ON (d.entry = e.id)
JOIN fields f ON (d.field = f.id)
WHERE e.section = ?
Or you can select fields in a separate query to save some network traffic.
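For example, that separate metadata query could be as simple as this (a sketch against the original schema, with ? standing in for the section id):
SELECT id, title, type, "default"
FROM fields
WHERE section = ?;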
So that was the answer; here come some bonuses:
Use foreign keys instead of bare integers to refer to other tables; the database will then check consistency for you.
Relations (tables) are conventionally named in the singular, so it's section, entry and field.
Referencing columns are conventionally named <name>_id, e.g. field_id or section_id.
The whole point of JSON fields is to store collections whose structure is not statically defined, so it would make much more sense to drop the entries and data tables and use a single table with a JSON column containing all the fields instead.
Like this:
CREATE TABLE row ( -- less generic name would be even better
id int primary key,
section_id int references section (id),
data json
)
With data fields containing something like:
{
"title": "iPhone 6",
"price": 650,
"available": true,
...
}
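With everything in one JSON column, filtering and sorting become plain JSON lookups. A sketch (the title and price keys, and the numeric cast, are assumptions about what the JSON holds):
SELECT id, data->>'title' AS title
FROM row
WHERE section_id = 1
ORDER BY (data->>'price')::numeric;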
@Suor has provided good advice, some of which you already accepted. I am building on the updated schema.
Schema
CREATE TABLE section (
section_id serial PRIMARY KEY,
title text,
grp integer
);
CREATE TABLE field (
field_id serial PRIMARY KEY,
section_id integer REFERENCES section,
title text,
type text,
default_val json
);
CREATE TABLE entry (
entry_id serial PRIMARY KEY,
section_id integer REFERENCES section
);
CREATE TABLE data (
data_id serial PRIMARY KEY,
field_id integer REFERENCES field,
entry_id integer REFERENCES entry,
data json
);
I changed two more details:
section_id instead of id, etc. Using "id" as a column name is an anti-pattern that got popular because a couple of ORMs use it. Don't do it: descriptive names are much better, and identical names for identical content is a helpful guideline. It also allows the USING shortcut in join clauses.
Don't use reserved words as identifiers. Use legal, lower-case, unquoted names exclusively to make your life easier.
Are PostgreSQL column names case-sensitive?
Referential integrity?
There is another inherent weakness in your design. What stops entries in data from referencing a field and an entry that don't go together? Closely related question on dba.SE
Enforcing constraints “two tables away”
Query
Not sure if you need the complex design at all. But to answer the question, this is the base query:
SELECT entry_id, field_id, COALESCE(d.data, f.default_val) AS data
FROM entry e
JOIN field f USING (section_id)
LEFT JOIN data d USING (field_id, entry_id) -- can be missing
WHERE e.section_id = 1
ORDER BY 1, 2;
The LEFT JOIN is crucial to allow for missing data entries and use the default instead.
SQL Fiddle.
crosstab()
The final step is cross tabulation. Cannot show this in SQL Fiddle since the additional module tablefunc is not installed.
Basics for crosstab():
PostgreSQL Crosstab Query
SELECT * FROM crosstab(
$$
SELECT entry_id, field_id, COALESCE(d.data, f.default_val) AS data
FROM entry e
JOIN field f USING (section_id)
LEFT JOIN data d USING (field_id, entry_id) -- can be missing
WHERE e.section_id = 1
ORDER BY 1, 2
$$
,$$SELECT field_id FROM field WHERE section_id = 1 ORDER BY field_id$$
) AS ct (entry int, f1 json, f2 json, f3 json) -- static
ORDER BY f3->>'a'; -- static
The tricky part here is the return type of the function. I provided a static type for 3 fields, but you really want that dynamic. Also, I am referencing a field in the json type that may or may not be there ...
So build that query dynamically and execute it in a second call.
More about that:
Dynamic alternative to pivot with CASE and GROUP BY
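For illustration, a sketch of how the first call might assemble the crosstab statement with format() and string_agg() (every output column is assumed to be json; the generated string is then executed as the second call):
SELECT format(
$f$SELECT * FROM crosstab(
   $q$SELECT entry_id, field_id, COALESCE(d.data, f.default_val) AS data
      FROM entry e
      JOIN field f USING (section_id)
      LEFT JOIN data d USING (field_id, entry_id)
      WHERE e.section_id = 1
      ORDER BY 1, 2$q$
  ,$q$SELECT field_id FROM field WHERE section_id = 1 ORDER BY field_id$q$
   ) AS ct (entry_id int, %s)$f$
 , string_agg(format('f%s json', field_id), ', ' ORDER BY field_id)
) AS sql
FROM field
WHERE section_id = 1;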
Using Postgres. Here's my scenario:
I have three different tables. One is a title table. The second is a genre table. The third is used to join the two. When I designed the database, I expected each title to have one top-level genre. After filling it with data, I discovered that some titles had two, sometimes three, top-level genres.
I wrote a query that retrieves titles and their top level genres. This obviously requires that I join the two tables. For those that only have one top level genre, there is one record. For those that have more, there are multiple records.
I realize I'll probably have to write a custom function of some kind that will handle this for me, but I thought I'd ask if it's possible to do this without doing so just to make sure I'm not missing anything.
Is it possible to write a query that will allow me to select all of the distinct titles regardless of the number of genres that it has, but also include the genre? Or even better, a query that would give me a comma delimited string of genres when there are multiples?
Thanks in advance!
Sounds like a job for array_agg to me. With tables like this:
create table t (id int not null, title varchar not null);
create table g (id int not null, name varchar not null);
create table tg (t int not null, g int not null);
You could do something like this:
SELECT t.title, array_agg(g.name)
FROM t, tg, g
WHERE t.id = tg.t
AND tg.g = g.id
GROUP BY t.title, t.id
to get:
title | array_agg
-------+-----------------------
one | {g-one,g-two,g-three}
three | {g-three}
two | {g-two}
Then just unpack the arrays as needed. If for some reason you really want a comma delimited string instead of an array, then string_agg is your friend:
SELECT t.title, string_agg(g.name, ',')
FROM t, tg, g
WHERE t.id = tg.t
AND tg.g = g.id
GROUP BY t.title, t.id
and you'll get something like this:
title | string_agg
-------+---------------------
one | g-one,g-two,g-three
three | g-three
two | g-two
I'd go with the array approach so that you wouldn't have to worry about reserving a character for the delimiter or having to escape (and then unescape) the delimiter while aggregating.
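And if you later need one row per title/genre pair again, unnest unpacks the aggregated array (a sketch):
SELECT title, unnest(genres) AS genre
FROM (
  SELECT t.title, array_agg(g.name) AS genres
  FROM t, tg, g
  WHERE t.id = tg.t
    AND tg.g = g.id
  GROUP BY t.title, t.id
) sub;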