Can someone verify that I'm interpreting this sql statement properly? - sql

I'm using rails and a plugin to manage tags - The query I'm using involves
- a tasks table (many to many relationship with users)
- a tags table ( stores reference to a tag - id (int), name (string) )
- a taggable table (polymorphic table that references a tag, the taggable item, and the tagger of that item, in my case, tasks)
Here is the sql:
ActsAsTaggableOn::Tag Load (0.1ms)
SELECT "tags".* FROM "tags" WHERE (lower(name) = '#sharedtag')
Task Load (0.4ms)
SELECT "tasks".*
FROM "tasks" INNER JOIN "task_relationships"
ON "tasks"."id" = "task_relationships"."task_id"
JOIN taggings tasks_taggings_f7b47be
ON tasks_taggings_f7b47be.taggable_id = tasks.id
AND tasks_taggings_f7b47be.taggable_type = 'Task'
AND tasks_taggings_f7b47be.tag_id = 23
WHERE "task_relationships"."user_id" = 1
ORDER BY tasks.created_at DESC
What I'm confused about is line 3 of the task load, where tasks_taggings_f7b47be.tag_id shows up out of nowhere. I assume it's some sort of temporary table or reference to a created join table, but have only recently started exploring sql.
Any explanation, links, or general knowledge would be appreciated.

I think tasks_taggings_f7b47be is an alias to the taggings table => http://www.w3schools.com/sql/sql_alias.asp
It is permissible to omit the AS keyword:
"The general syntax of an alias is SELECT * FROM table_name [AS] alias_name. Note that the AS keyword is completely optional and is usually kept for readability purposes." more

Related

How to replace the column value with value from other connected table

The code below is my query code of postgresql schema views.
Please assuming this a library table, which is a book list and you have some defined tags can apply on the book itself, and every book will be devided into one category.
CREATE VIEW tagging_books AS
SELECT tags."TagName", books."BookISBN", books."BookName", books."BookCategoryID"
FROM library
INNER JOIN tags on library."TagName_id" = tags."id"
INNER JOIN books on library."BookISBN_id" = books."id"
ORDER BY tags."id"
The schema views inside db will looks like this:
/tags.TagName /books.BookISBN /books.BookName /books.BookCategoryID
Python ISBN 957-208-570-0 Learn Python 1
And the BookCategoryID from table "books" is actually a foreign key of table "category", the table looks like this:
/category
BookCategoryID CategoryName
1 Toolbook
I wonder that, is there anyway to replace the books."BookCategoryID" field to category."CategoryName" by query code? Like the example below.
/tags.TagName /books.BookISBN /books.BookName /category.CategoryName
Python ISBN 957-208-570-0 Learn Python Toolbook
Since they are connected with each other, I think they can definitely being replaced, but I don't know how to do... Thank you.
To include category.name, simply join with table category on the foreign-key constraint like:
select category.name, books.*
from books
join category on books.BookCategoryID = category.BookCategoryID
You can add it to your view creation as well:
CREATE VIEW tagging_books AS
SELECT tags.TagName, books.BookISBN, books.BookName, category.name as "CategoryName"
FROM library
JOIN tags on library.TagName_id = tags.id
JOIN books on library.BookISBN_id = books.id
JOIN category on books.BookCategoryID = category.BookCategoryID
ORDER BY tags.id

How can I remove the repetitive joins from this SELECT query?

I have a postgres database with two tables: services and meta.
The first table stores the "core" information the application needs, and the app also has a "custom field" feature implemented similar to how Wordpress's wp_post_meta table works.
Users can add on meta rows with arbitrary keys and values, in a one-to-many relationship with the service.
The schema of the meta table is:
id
key (string)
value (string)
service_id (foreign key)
That works great for the app, so I'm not interested in changing the schema, but for some infrequently used admin dashboards I need to get back a list of services with several of the meta rows joined on as columns.
Here's what I have so far:
SELECT
services.*,
meta1.value AS funding,
meta2.value AS ownership
FROM services
JOIN meta meta1
ON services.id = meta.service_id
AND meta.key = 'Funding'
JOIN meta meta2
ON services.id = meta2.service_id
AND meta2.key = 'Ownership'
Now, this works great, but I have to do another join every time I want to add another meta value.
That seems like it will slow down the query and make it less readable.
Is there a good way to refactor this to keep it easy to read and fast to run?
here's an attempted refactor using OR, which doesn't work:
SELECT
*,
meta.value AS funding,
meta.value AS ownership
FROM services
JOIN meta
ON services.id = meta.service_id
AND meta.key = 'Funding' OR meta.key = 'Ownership'
One way would be to aggregate the key/value pairs into a JSON value with a derived table:
select srv.*,
mv.vals ->> 'Funding' as funding,
mv.vals ->> 'Ownership' as ownership
from services srv
cross join lateral (
select jsonb_object_agg(m.key, m.value) as vals
from meta m
where m.key in ('Funding', 'Ownership')
and m.service_id = srv.id
) as mv
If your application can handle the JSON, then maybe the conversion into two separate columns isn't actually necessary which would avoid the repetition of the keys.
You can use conditional aggregation:
SELECT s.*, m.funding, m.ownership
FROM services s JOIN
(SELECT m.service_id,
MAX(value) FILTER (WHERE key = 'Funding') as Funding,
MAX(value) FILTER (WHERE key = 'Ownership') as Ownership
FROM meta m
GROUP BY m.service_id
) m
ON m.service_id = s.id

Matching Many To Many relation

I have the following entities:
#Entity
class User {
#ManyToMany(type => Group)
#JoinTable()
groups: Group[];
}
#Entity
class MediaObject {
#ManyToMany(type => Group)
#JoinTable()
groups: Group[];
}
#Entity
class Group {
// [...]
}
Now I want to select every MediaObject which has at least one group in common with the one specific User.
Example:
User 1 MediaObject 1
-----------------------------
Group 1 |--- Group 2
Group 2 ---| Group 3
User 1 has at least one same group as MediaObject
How can I build a where sql query for this? I use typeorm for building my queries, yet every sql query would help. Also, I want to understand how.
Typeorm joins the tables like this
LEFT JOIN "group" "groups" ON "groups"."id" = "media_groups"."groupId"
Using a simple JOIN you may retrieve MediaObject Id's that share at least one group with the user. Than use IN to retrieve the MediaObject's
select *
from MediaObject mo
where mo.id in
(
select moJoin.mediaObjectId
from media_object_groups_group moJoin
join user_groups_group uJoin on moJoin.groupId = uJoin.groupId
where uJoin.userId = 1
)
If there can be multiple overlapping groups between same MediaObject and the same User, an EXISTS semi-join might be faster than using IN:
SELECT m.*
FROM "MediaObject" m
WHERE EXISTS (
SELECT -- select list can be empty here
FROM user_groups_group gu
JOIN media_object_groups_group gm USING ("groupId")
WHERE gu."userId" = 1
AND gm."mediaObjectId" = m.id
);
Else, Radim's query should serve just fine after adding some double-quotes.
This assumes that referential integrity is enforced with foreign key constraints, so it's safe to rely on user_groups_group."userId" without checking the corresponding user even exists.
It's unwise to use reserved words like "user" or "group" or CaMeL-case strings as identifiers. Either requires double-quoting. ORMs regularly serve poorly in this respect. See:
Are PostgreSQL column names case-sensitive?

ActiveRecord Join Query and select in Rails

In my rails 4 application, a client (clients table) can have many projects (projects table). I have a column called name in each table. I am trying to write a join and then select which uses projects as the base table and clients as the lookup table. client_id is the foreign_key in the projects table:
I am writing my query as follows:
Project.joins(:client).select('projects.id,projects.name,clients.name')
I get the following response:
Project Load (0.6ms) SELECT projects.id,projects.name,clients.name FROM "projects" INNER JOIN "clients" ON "clients"."id" = "projects"."client_id"
=> #<ActiveRecord::Relation [#<Project id: 1, name: "Fantastico Client">]>
If I try to alias it like so:
Project.joins(:client).select('projects.id,projects.name,clients.name as client_name')
Then I get the following response:
Project Load (0.8ms) SELECT projects.id,projects.name,clients.name as client_name FROM "projects" INNER JOIN "clients" ON "clients"."id" = "projects"."client_id"
=> #<ActiveRecord::Relation [#<Project id: 1, name: "The Dream Project">]>
In either case, ActiveRecord looses one of the names as you can see from the above response. How should I be writing this query?
If the column in select is not one of the attributes of the model on which the select is called on then those columns are not displayed. All of these attributes are still contained in the objects within AR::Relation and are accessible as any other public instance attributes.
You could verify this by calling first.client_name:
Project.joins(:client)
.select('projects.id,projects.name,clients.name as client_name')
.first.client_name
You can use :'clients.name' as one of your symbols. For instance:
Project.select(:id, :name, :'clients.name').joins(:client)
I like it better because it seems like Rails understands it, since it quotes all parameters:
SELECT "projects"."id", "projects"."name", "clients"."name"
FROM "projects"
INNER JOIN "clients" ON "clients"."id" = "projects"."client_id"
(I'm not 100% sure that's the exact SQL query, but I'm fairly certain and I promise it will use "clients"."name")
To get both project table name and client name you can do like below query
Project.joins(:client).pluck(:name,:'clients.name')
your query don't looses any thing. Actually you have applied join on models and you have written Project.joins(:client) that why it is looking like.
means It will hold Project related data as it is and associated data hold with alias name that you have given 'client_name' in your query.
if you use
Project.joins(:client)
.select('projects.id project_id, projects.name projects_name,clients.name as client_name')
then it look like
[#, #]
but it hold all the attribute that you selected.
Try This:
sql = Project.joins(:client).select(:id, :name, :"clients.name AS client_name").to_sql
data = ActiveRecord::Base.connection.exec_query(sql)
OUTPUT
[
{"id"=>1, "name"=>"ProjectName1", "client_name"=>"ClientName1"},
{"id"=>2, "name"=>"ProjectName2", "client_name"=>"ClientName2"}
]

Optimizing MySQL Query

We have a query that is currently killing our database and I know there has to be a way to optimize it. We have 3 tables:
items - table of items where each items has an associated object_id, length, difficulty_rating, rating, avg_rating & status
lists - table of lists which are basically lists of items created by our users
list_items - table with 2 columns: list_id, item_id
We've been using the following query to display a simple HTML table that shows each list and a number of attributes related to the list including averages of attributes of the included list items:
select object_id, user_id, slug, title, description, items,
city, state, country, created, updated,
(select AVG(rating) from items
where object_id IN
(select object_id from list_items where list_id=lists.object_id)
AND status="A"
) as 'avg_rating',
(select AVG(avg_rating) from items
where object_id IN
(select object_id from list_items where list_id=lists.object_id)
AND status="A"
) as 'avg_avg_rating',
(select AVG(length) from items
where object_id IN
(select object_id from list_items where list_id=lists.object_id)
AND status="A"
) as 'avg_length',
(select AVG(difficulty_rating) from items
where object_id IN
(select object_id from list_items where list_id=lists.object_id)
AND status="A"
) as 'avg_difficulty'
from lists
where user_id=$user_id AND status="A"
order by $orderby LIMIT $start,$step
The reason why we haven't broken this up in 1 query to get all the lists and subsequent lookups to pull the averages for each list is because we want the user to be able to sort on the averages columns (i.e. 'order by avg_difficulty').
Hopefully my explanation makes sense. There has to be a much more efficient way to do this and I'm hoping that a MySQL guru out there can point me in the right direction. Thanks!
It looks like you can replace all the subqueries with joins:
SELECT l.object_id,
l.user_id,
<other columns from lists>
AVG(i.rating) as avgrating,
AVG(i.avg_rating) as avgavgrating,
<other averages>
FROM lists l
LEFT JOIN list_items li
ON li.list_id = l.object_id
LEFT JOIN items i
ON i.object_id = li.object_id
AND i.status = 'A'
WHERE l.user_id = $user_id AND l.status = 'A'
GROUP BY l.object_id, l.user_id, <other columns from lists>
That would save a lot of work for the DB engine.
Here how to find the bottleneck:
Add the keyword EXPLAIN before the SELECT. This will cause the engine to output how the SELECT was performed.
To learn more about Query Optimization with this method see: http://dev.mysql.com/doc/refman/5.0/en/using-explain.html
A couple of things to consider:
Make sure that all of your joins are indexed on both sides. For example, you join list_items.list_id=lists.object_id in several places. list_id and object_id should both have indexes on them.
Have you done any research as to what the variation in the averages are? You might benefit from having a worker thread (or cronjob) calculate the averages periodically rather than putting the load on your RDBMS every time you run this query. You'd need to store the averages in a separate table of course...
Also, are you using status as an enum or a varchar? The cardinality of an enum would be much lower; consider switching to this type if you have a limited range of values for status column.
-aj
That's one hell of a query... you should probably edit your question and change the query so it's a bit more readable, although due to the complex nature of it, I'm not sure that's possible.
Anyway, the simple answer here is to denormalize your database a bit and cache all of your averages on the list table itself in indexed decimal columns. All those sub queries are killing you.
The hard part, and what you'll have to figure out is how to keep those averages updated. A generally easy way is to store the count of all items and the sum of all those values in two separate fields. Anytime an action is made, increment the count by 1, and the sum by whatever. Then update table avg_field = sum_field/count_field.
Besides indexing, even a cursory analysis shows that your query contains much redundancy that your DBMS' optimizer cannot be able to spot (SQL is a redundant language, it admits too many equivalents, syntactically different expressions; this is a known and documented problem - see for example SQL redundancy and DBMS performance, by Fabian Pascal).
I will rewrite your query, below, to highlight that:
let LI =
select object_id from list_items where list_id=lists.object_id
in
select object_id, user_id, slug, title, description, items, city, state, country, created, updated,
(select AVG(rating) from items where object_id IN LI AND status="A") as 'avg_rating',
(select AVG(avg_rating) from items where object_id IN LI AND status="A") as 'avg_avg_rating',
(select AVG(length) from items where object_id IN LI AND status="A") as 'avg_length',
(select AVG(difficulty_rating) from items where object_id IN LI AND status="A") as 'avg_difficulty'
from lists
where user_id=$user_id AND status="A"
order by $orderby
LIMIT $start, $step
Note: this is only the first step to refactor that beast.
I wonder: why people rarely - if at all - use views, even only to simplify SQL queries? It will help in writing more manageable and refactorable queries.