Join in SQLite with per-row selection of join table based on value in link table

I have tables as follows:
muscles
id primary_synonym_id source
---------- ------------------ ----------
1 1 1
2 2 2
3 3 3
muscle_synonyms
id synonym
---------- ---------------
1 Gluteus maximus
2 Soleus
3 Infraspinatus
sources (As you can probably tell, sources is intended as a link table.)
id type sub_id
---------- ---------- ----------
1 url 1
2 url 2
3 book 1
source_urls
id url
---------- ------------------
1 http://v.gd/3NCOMC
2 http://v.gd/fWdonY
source_books
id isbn
---------- ----------
1 1405006692
From the above, which query would you recommend, to generate the following output?
id synonym ref
---------- --------------- ------------------
1 Gluteus maximus http://v.gd/3NCOMC
2 Soleus http://v.gd/fWdonY
3 Infraspinatus 1405006692
Please mention worthwhile alternatives - if any - to such a query, that you think would promote good database practice. (For example, a different way to structure the data and the use of a simpler query.)

I was unfamiliar with the coalesce() function. The following query was inspired by this, and it works:
select muscles.id, synonym, coalesce(url, isbn) as ref
from muscles
join muscle_synonyms on muscles.primary_synonym_id = muscle_synonyms.id
join sources on muscles.source = sources.id
left join source_urls on
sources.type = 'url' and
sources.sub_id = source_urls.id
left join source_books on
sources.type = 'book' and
sources.sub_id = source_books.id
where ref not null;
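An equivalent way to write this, if you prefer an explicit CASE over coalesce(), is the following sketch against the same schema; which form reads better is largely a matter of taste:
select muscles.id, synonym,
       case sources.type
            when 'url'  then source_urls.url
            when 'book' then source_books.isbn
       end as ref
from muscles
join muscle_synonyms on muscles.primary_synonym_id = muscle_synonyms.id
join sources on muscles.source = sources.id
left join source_urls on sources.sub_id = source_urls.id
left join source_books on sources.sub_id = source_books.id
where ref is not null;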

I don't exactly understand what sub_id in sources is or how it is linked to source_books.
The SQL will look roughly like this (this only covers the URL case; books would need a similar join):
select m.id, ms.synonym, su.url
from muscles m
join muscle_synonyms ms on ms.id = m.primary_synonym_id
join sources s on s.id = m.source
join source_urls su on s.sub_id = su.id

I can't see any advantage in having two different tables for source_*.
I would merge source_urls and source_books into a single table source_ref, like so:
id type ref
---------- ---------- ------------------
1 url http://v.gd/3NCOMC
2 url http://v.gd/fWdonY
3 book 1405006692
And sources becomes a simple two-column link table:
id ref_id
---------- ----------
1 1
2 2
3 3
So your query becomes a chain of simple joins:
SELECT muscles.id, muscle_synonyms.synonym, source_ref.ref FROM muscles
LEFT JOIN muscle_synonyms ON muscles.primary_synonym_id = muscle_synonyms.id
LEFT JOIN sources ON muscles.source = sources.id
LEFT JOIN source_ref ON sources.ref_id = source_ref.id;
This will include all muscles, even those with no synonym, no source, or no ref for their source.
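A minimal DDL sketch of that restructured schema (the names follow the proposal above; the column types are an assumption):
CREATE TABLE source_ref (
    id   INTEGER PRIMARY KEY,
    type TEXT NOT NULL,   -- 'url' or 'book'
    ref  TEXT NOT NULL    -- the URL or the ISBN
);
CREATE TABLE sources (
    id     INTEGER PRIMARY KEY,
    ref_id INTEGER NOT NULL REFERENCES source_ref(id)
);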


Knex.js Getting values from comma-separated

I have two SQLite3 tables, task and tags.
task is my master table and tags stores the tag names.
I store comma-separated tag IDs in task.
Now I want to get the tag names using knex.js.
table task
id task tags
---------------------
1 abc 1,2,3
2 xyz 3,1
3 apple 2
table tags
id tag
------------
1 cold
2 hot
3 normal
Now I want the output as below
OUTPUT:
id task tags
---------------------
1 abc cold,hot,normal
2 xyz normal,cold
3 apple hot
I know I will have to use joins, but I'm not sure how to actually use them in knex.js. Please help me.
Part of the problem is that your database is not properly normalised. Instead of having two tables, task and tags, with the task table holding multiple tag IDs in its tags column, you should have three tables: tasks, tags, and the joining table task_tags. They would store the following data...
Tasks
id task
----------
1 abc
2 xyz
3 apple
Tags
id tag
------------
1 cold
2 hot
3 normal
task_tags
task_id tag_id
1 1
1 2
1 3
2 1
2 3
3 2
Now you can have as many tags as you like (whether or not any tasks use them) and as many tasks as you like (whether or not they use any tags), and you associate a task with its tags via the task_tags table.
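A minimal DDL sketch of that layout (the column types and the composite primary key are assumptions, not part of the original answer):
CREATE TABLE tasks (
    id   INTEGER PRIMARY KEY,
    task TEXT NOT NULL
);
CREATE TABLE tags (
    id  INTEGER PRIMARY KEY,
    tag TEXT NOT NULL
);
CREATE TABLE task_tags (
    task_id INTEGER NOT NULL REFERENCES tasks(id),
    tag_id  INTEGER NOT NULL REFERENCES tags(id),
    PRIMARY KEY (task_id, tag_id)
);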
Then to get the result you want you would use the select
SELECT
    tasks.id,
    tasks.task,
    GROUP_CONCAT(tags.tag) -- this gives you the csv line, e.g. cold,hot,normal
FROM tasks
LEFT JOIN task_tags ON tasks.id = task_tags.task_id
LEFT JOIN tags ON tags.id = task_tags.tag_id
GROUP BY tasks.id, tasks.task
See https://www.sqlite.org/lang_aggfunc.html for an explanation of GROUP_CONCAT.
Your task table should be redesigned to hold one tag per row, not multiple tags in a single row:
id task tag
---------- ---------- ----------
1 abc 1
1 abc 2
1 abc 3
2 xyz 3
2 xyz 1
3 apple 2
Then it's easy:
SELECT task.id, task.task, group_concat(tags.tag, ',') AS tags
FROM task
JOIN tags ON task.tag = tags.id
GROUP BY task.id, task.task
ORDER BY task.id;
which gives
id task tags
---------- ---------- ---------------
1 abc cold,hot,normal
2 xyz normal,cold
3 apple hot
A design that follows the rules of relational databases makes life much easier (and the above can be normalized further; see the other answer). While some databases do support array types, SQLite is not one of them. If you insist on keeping your current design, though, there's an ugly hack involving the JSON1 extension: turn your CSV list of numbers into a JSON array and join against json_each():
SELECT task.id, task.task, group_concat(tags.tag, ',') AS tags
FROM task
JOIN json_each('[' || task.tags || ']') AS j
JOIN tags ON tags.id = j.value
GROUP BY task.id, task.task
ORDER BY task.id;

Find a value based on a table result

First of all, sorry for the title. Couldn't think of any better title.
This is what I got:
SELECT study FROM old_employee;
study
---------
STUDY1
STUDY2
STUDY3
STUDY1
STUDY2
SELECT id,name_string FROM studies;
id | name_string
----+-------------------
1 | STUDY1
2 | STUDY2
3 | STUDY3
Now I would like to find the ids based on the first output. This is what I've attempted, but obviously it's not working.
SELECT id FROM studies WHERE name_string LIKE (SELECT study FROM old_employee);
My desired output:
id
----
1
2
3
1
2
Edit: I'm saving old_employee as a view, and I'm wondering if there's a smarter way of including it in the answers below instead of creating this view first.
CREATE VIEW old_employee AS
SELECT *
FROM dblink('dbname=mydb', 'select study from personnel')
AS t1(study char(10));
This can be accomplished without using the SQL LIKE operator. Here is the query:
SELECT s.id
FROM studies s,
old_employee o
WHERE s.name_string = o.study;
Second query (according to what @a_horse_with_no_name said):
SELECT studies.id
FROM studies
INNER JOIN old_employee
ON studies.name_string = old_employee.study
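Regarding the edit: the view isn't strictly necessary, because dblink() can be used directly in the FROM clause and joined like a table. A sketch, assuming the same connection string and column definition shown in the edit:
SELECT studies.id
FROM studies
INNER JOIN dblink('dbname=mydb', 'select study from personnel')
           AS old_employee(study char(10))
        ON studies.name_string = old_employee.study;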

SQL Two SELECT vs. JOIN best performance?

I wonder which has better performance in this case. First of all, I want to show the user his medical information. I have two tables:
user
-----
id_user | type_blood | number | ...
1 O 123
2 A+ 442
user_allergies
-----------
id_user | name
1 name1
1 name2
I want to return:
JSON {id_user=1, type_blood=0, allergies=(name1,name2)}
So, is it better to do a JOIN between user and user_allergies and iterate, or maybe two SELECTs?
But what if I then have another table like user_allergies, so that the result would be:
user_another_table
-----------
id_user | name
1 namet1
1 namet2
1 namet3
JSON {id_user=1, type_blood=0, allergies=(name1,name2), table=(namet1,namet2,namet3)}
Is it better to do three SELECTs or a JOIN? With a JOIN I have to iterate over the results and I can't imagine an easy way. A JOIN can give me a result like:
id_user | type_blood | allergy_name | another_table_name
1 O name1 namet1
1 O name1 namet2
1 O name1 namet3
1 O name2 namet1
1 O name2 namet2
1 O name2 namet3
Is there any way to extract:
id_user | type_blood | allergy_name | another_table_name
1 O name1 namet1
1 O name2 namet2
1 O namet3
Thanks community, I'm a newbie in SQL.
Given the data, there is no way to get the 2nd set of results you've shown if the 1st set reflects the actual values - the 2nd one is throwing data away, in this case the pairing of allergy 'name2' with another_table_name 'namet3'. This is why you get many rows back with repeated data.
You can use the group by clause to restrict this in some cases, but again - it won't let you throw away data like that.
You could try using the COALESCE function, if your DB supports it.
If not, I think you're going to have to construct your JSON in some business logic, in which case it's fine to read the data in a 3-way join. You order by the user id and either create or append the row data to the JSON document depending on whether a new user record is present or not (if you order by user id, you only need to keep track of when the user id value changes).
Alternatively, you can read the list of users and their single-item data in one query, and then hit the DB again for the repeating data.
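For reference, a minimal sketch of that multi-SELECT approach (table and column names come from the question; the literal id_user = 1 is just for illustration):
-- scalar data, one row per user
SELECT id_user, type_blood, number FROM user WHERE id_user = 1;
-- one extra query per repeating group, assembled into the JSON by the application
SELECT name FROM user_allergies WHERE id_user = 1;
SELECT name FROM user_another_table WHERE id_user = 1;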

sybase - values from one table that aren't on another, on opposite ends of a 3-table join

Hypothetical situation: I work for a custom sign-making company, and some of our clients have submitted more sign designs than they're currently using. I want to know what signs have never been used.
3 tables involved:
table A - signs for a company
sign_pk(unique) | company_pk | sign_description
1 --------------------1 ---------------- small
2 --------------------1 ---------------- large
3 --------------------2 ---------------- medium
4 --------------------2 ---------------- jumbo
5 --------------------3 ---------------- banner
table B - company locations
company_pk | company_location(unique)
1 ------|------ 987
1 ------|------ 876
2 ------|------ 456
2 ------|------ 123
table C - signs at locations (it's a bit of a stretch, but each row can have 2 signs, and it's a one to many relationship from company location to signs at locations)
company_location | front_sign | back_sign
987 ------------ 1 ------------ 2
987 ------------ 2 ------------ 1
876 ------------ 2 ------------ 1
456 ------------ 3 ------------ 4
123 ------------ 4 ------------ 3
So, a.company_pk = b.company_pk and b.company_location = c.company_location. What I want to try and find is how to query and get back that sign_pk 5 isn't at any location. Querying each sign_pk against all of the front_sign and back_sign values is a little impractical, since all the tables have millions of rows. Table a is indexed on sign_pk and company_pk, table b on both fields, and table c only on company locations. The way I'm trying to write it is along the lines of "each sign belongs to a company, so find the signs that are not the front or back sign at any of the locations that belong to the company tied to that sign."
My original plan was:
Select a.sign_pk
from a, b, c
where a.company_pk = b.company_pk
and b.company_location = c.company_location
and a.sign_pk *= c.front_sign
group by a.sign_pk having count(c.front_sign) = 0
just to do the front sign, and then repeat for the back, but that won't run because c is an inner member of an outer join, and also in an inner join.
This whole thing is fairly convoluted, but if anyone can make sense of it, I'll be your best friend.
How about something like this:
SELECT DISTINCT sign_pk
FROM table_a
WHERE sign_pk NOT IN
(
SELECT DISTINCT front_sign sign
FROM table_c
UNION
SELECT DISTINCT back_sign sign
FROM table_c
)
ANSI outer join is your friend here. *= has dodgy semantics and should be avoided
select distinct a.sign_pk, a.company_pk
from a join b on a.company_pk = b.company_pk
left outer join c on b.company_location = c.company_location
and (a.sign_pk = c.front_sign or a.sign_pk = c.back_sign)
where c.company_location is null
Note that the where clause is a filter on the rows returned by the join, so it says "do the joins, but give me only the rows that didn't join to c".
Outer join is almost always faster than NOT EXISTS and NOT IN
I would be tempted to create a temp table for the inner join and then outer join to that (see the sketch below).
But it really depends on the size of your data sets.
Yes, the schema design is flawed, but we can't always fix that!
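A rough sketch of that temp-table idea in Sybase T-SQL (the #sign_usage name is made up, and whether it actually beats the single outer join depends on your data volumes):
/* materialise the signs in use at each company's locations */
SELECT b.company_pk, c.front_sign, c.back_sign
INTO #sign_usage
FROM b
JOIN c ON b.company_location = c.company_location

/* then outer join the signs against it */
SELECT DISTINCT a.sign_pk
FROM a
LEFT OUTER JOIN #sign_usage u
       ON a.company_pk = u.company_pk
      AND (a.sign_pk = u.front_sign OR a.sign_pk = u.back_sign)
WHERE u.company_pk IS NULL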

SQL Alternative to performing an INNER JOIN on a single table

I have a large table (TokenFrequency) which has millions of rows in it. The TokenFrequency table is structured like this:
Table - TokenFrequency
id - int, primary key
source - int, foreign key
token - char
count - int
My goal is to select all of the rows in which two sources have the same token. For example, if my table looked like this:
id --- source --- token --- count
1 ------ 1 --------- dog ------- 1
2 ------ 2 --------- cat -------- 2
3 ------ 3 --------- cat -------- 2
4 ------ 4 --------- pig -------- 5
5 ------ 5 --------- zoo ------- 1
6 ------ 5 --------- cat -------- 1
7 ------ 5 --------- pig -------- 1
I would want a SQL query to give me source 1, source 2, and the sum of the counts. For example:
source1 --- source2 --- token --- count
---- 2 ----------- 3 --------- cat -------- 4
---- 2 ----------- 5 --------- cat -------- 3
---- 3 ----------- 5 --------- cat -------- 3
---- 4 ----------- 5 --------- pig -------- 6
I have a query that looks like this:
SELECT F.source AS source1, S.source AS source2, F.token,
(F.count + S.count) AS sum
FROM TokenFrequency F
INNER JOIN TokenFrequency S ON F.token = S.token
WHERE F.source <> S.source
This query works fine but the problems that I have with it are that:
I have a TokenFrequency table that has millions of rows and therefore need a faster alternative to obtain this result.
The current query that I have is giving duplicates. For example, it's selecting:
source1=2, source2=3, token=cat, count=4
source1=3, source2=2, token=cat, count=4
Which isn't too much of a problem, but if there is a way to eliminate those duplicates and in turn obtain a speed increase, then it would be very useful.
The main issue that I have is the speed of the query; with my current query it takes hours to complete. The INNER JOIN of the table to itself is what I believe to be the problem. I'm sure there has to be a way to eliminate the inner join and get similar results using just one instance of the TokenFrequency table. Fixing the second problem that I mentioned might also give a speed increase.
I need a way to restructure this query to provide the same results in a faster, more efficient manner.
Thanks.
I'd need a little more info to diagnose the speed issue, but to remove the dups, add this to the WHERE:
AND F.source<S.source
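Spelled out, the deduplicated query would look like this (same tables and aliases as in the question; F.source < S.source subsumes the original <> test):
SELECT F.source AS source1, S.source AS source2, F.token,
       (F.count + S.count) AS sum
FROM TokenFrequency F
INNER JOIN TokenFrequency S ON F.token = S.token
WHERE F.source < S.source;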
Try this:
SELECT token, GROUP_CONCAT(source), SUM(count)
FROM TokenFrequency
GROUP BY token;
This should run a lot faster and also eliminate the duplicates. But the sources will be returned in a comma-separated list, so you'll have to explode that in your application.
You might also try creating a compound index over the columns token, source, count (in that order) and analyze with EXPLAIN to see if MySQL is smart enough to use it as a covering index for this query.
Update: I seem to have misunderstood your question. You don't want the sum of counts per token, you want the sum of counts for every pair of sources for a given token.
I believe the inner join is the best solution for this. An important guideline for SQL is that if you need to calculate an expression with respect to two different rows, then you need to do a join.
However, one optimization technique that I mentioned above is to use a covering index so that all the columns you need are included in an index data structure. The benefit is that all your lookups are O(log n), and the query doesn't need to do a second I/O to read the physical row to get other columns.
In this case, you should create the covering index over columns token, source, count as I mentioned above. Also try to allocate enough cache space so that the index can be cached in memory.
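If you want to try that covering index, the DDL is a one-liner (the index name is just an example):
CREATE INDEX idx_tokenfrequency_token_source_count
    ON TokenFrequency (token, source, count);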
If token isn't indexed, it certainly should be.