How to join products and their characteristics - sql

I have two tables.
Product (id, title, price, created_at, updated_at etc)
and
ProductCharacteristic (id, product_id, sold_quantity, date, created_at, updated_at etc).
I need to show a products table (columns: product.id, product.title, product.price, sold_quantity) for a given period of time, sortable by any of those columns.
I can't work out how to write the query.
So far I have the following query:
> current_project.products.includes(:product_characteristics).group('products.id').pluck(:title, 'SUM(product_characteristics.sold_quantity) AS sold_quantity')
(45.4ms) SELECT "products"."title", SUM(product_characteristics.sold_quantity) AS sold_quantity FROM "products" LEFT OUTER JOIN "product_characteristics" ON "product_characteristics"."product_id" = "products"."id" WHERE "products"."project_id" = $1 GROUP BY products.id [["project_id", 20]]
Please help me write this query through the ORM (adding a where on dates and an ordering), or as a raw SQL query.
I used pluck. It returns an array of arrays (not an array of hashes), which is not ideal.
The product_characteristics.date field is unique within the scope of product_id. But please give me two examples (with this condition and without it, to satisfy my curiosity).
I use PostgreSQL and Rails 4.2.x.
P.S. By the way, the ProductCharacteristic table will have a lot of records (more than one million). Should I use PostgreSQL table partitioning? Can it improve performance?
Thank you.

You can use select instead of pluck in that case, and the aggregate will be accessible as product.sold_quantity.
The query becomes:
products = current_project.products.joins(:product_characteristics).group('products.id').select(:title, 'SUM(product_characteristics.sold_quantity) AS sold_quantity')
products.first.sold_quantity # => works
To order, you can just add an order clause
products = products.order(id: :asc)
or
products = products.order(id: :desc)
for instance
And for the where (qualify the column, because both joined tables have a created_at):
products = products.where("products.created_at > ?", 2.days.ago)
for instance.
You can chain SQL clauses after the first line; the order does not matter, because the query is only run when you actually use the retrieved set.
And so you can also do stuff like
if params[:foo]
products = products.order(:id)
end
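Since the question also asked for a raw SQL version with a date range and ordering, here is a minimal sketch of one. It runs against an in-memory SQLite database from Python so it can be tried without a Rails app; the sample rows, the `sales_report` helper, and the `SORTABLE` whitelist are assumptions made for illustration, and on PostgreSQL the `date` column would be a real date type rather than text. Because the sold quantities are summed, the query should give the same result whether or not date is unique per product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be turned into dicts instead of bare tuples
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT, price REAL);
    CREATE TABLE product_characteristics (
        id INTEGER PRIMARY KEY, product_id INTEGER, sold_quantity INTEGER, date TEXT
    );
    INSERT INTO products VALUES (1, 'Lamp', 20.0), (2, 'Desk', 150.0), (3, 'Chair', 45.0);
    INSERT INTO product_characteristics (product_id, sold_quantity, date) VALUES
        (1, 5, '2016-01-10'), (1, 7, '2016-01-20'), (1, 9, '2016-03-01'),
        (2, 2, '2016-01-15');
""")

# Hypothetical whitelist of sortable header columns. ORDER BY targets cannot be
# bound as parameters, so user input is mapped to known SQL fragments.
SORTABLE = {"id": "p.id", "title": "p.title", "price": "p.price",
            "sold_quantity": "sold_quantity"}

def sales_report(date_from, date_to, sort_by="id", descending=False):
    order = SORTABLE[sort_by] + (" DESC" if descending else " ASC")
    sql = f"""
        SELECT p.id, p.title, p.price,
               COALESCE(SUM(pc.sold_quantity), 0) AS sold_quantity
        FROM products p
        LEFT JOIN product_characteristics pc
               ON pc.product_id = p.id
              AND pc.date BETWEEN ? AND ?  -- filtering in ON keeps products with no sales
        GROUP BY p.id, p.title, p.price
        ORDER BY {order}
    """
    return [dict(row) for row in conn.execute(sql, (date_from, date_to))]

report = sales_report("2016-01-01", "2016-01-31", sort_by="sold_quantity", descending=True)
```

Each element of `report` is a hash-like dict with the four header columns. Putting the date condition in the ON clause rather than in WHERE is a design choice: it keeps products that sold nothing in the period, with a quantity of 0.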

Related

Active Record - How to perform a nested select on a second table?

I need to list all customers along with their latest order date (plus pagination).
How can I write the following SQL query using Active Record?
select *,
(
select max(created_at)
from orders
where orders.customer_id = customers.id
) as latest_order_date
from customers
limit 25 offset 0
I tried this, but it complains about a missing FROM-clause entry for table "customers":
Customer
.select('*')
.select(
Order
.where('customer_id = customers.id')
.maximum(:created_at)
).page(params[:page])
# Generates this (clearly only the Order query):
SELECT MAX("orders"."created_at")
FROM "orders"
WHERE (customer_id = customers.id)
EDIT: it would be good to keep AR's parameterization and kaminari's pagination goodness.
You haven't given us any information about the relationship between these two tables, so I will assume Customer has_many Orders.
While ActiveRecord doesn't support what you are trying to do, it is built on top of Arel, which does.
Every Rails model has a method named arel_table that will return its corresponding Arel::Table. You might want a helper library to make this cleaner because the default way is a little cumbersome. I will use the plain Arel syntax to maximize compatibility.
ActiveRecord understands Arel objects and can accept them alongside its own syntax.
orders = Order.arel_table
customers = Customer.arel_table
Customer.joins(:orders).group(:id).select([
customers[Arel.star],
orders[:created_at].maximum.as('latest_order_date')
])
Which produces
SELECT "customers".*, MAX("orders"."created_at") AS "latest_order_date"
FROM "customers"
INNER JOIN "orders" ON "orders"."customer_id" = "customers"."id"
GROUP BY "customers"."id"
This is the customary way of doing this, but if you still want to do it as a subquery, you can do this
Customer.select([
customers[Arel.star],
orders.project(orders[:created_at].maximum)
.where(orders[:customer_id].eq(customers[:id]))
.as('latest_order_date')
])
Which gives us
SELECT "customers".*, (
SELECT MAX("orders"."created_at")
FROM "orders"
WHERE "orders"."customer_id" = "customers"."id" ) "latest_order_date"
FROM "customers"
The most Active Record-ish way I've come up with so far is:
Customer
.page(params[:page])
.select('*')
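.select(<<-SQL.squish)
(
SELECT MAX(created_at)
FROM orders
WHERE orders.customer_id = customers.id
) AS latest_order_date
SQL
I still wish I could make the string part more Active Record-ish.
The <<-SQL is just a heredoc. Note that the alias goes outside the parentheses, so that the outer column is the one named latest_order_date.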
Here is the same answer @adam was giving, but using plain ActiveRecord instead of Arel. I'm not sure it's really much better than @João Marcelo Souza's.
Customer.select("customers.*, max(orders.created_at) AS latest_order_date").joins(:orders).group("customers.id").page(params[:page])
(The group by avoids listing all the customers columns thanks to a feature of Postgres 9.1 and higher: grouping by the primary key is enough.)
The OP doesn't say, but that query drops customers who have no orders. This version keeps them, with a NULL latest_order_date (wrapping the timestamp in coalesce(..., 0) would be a type error in Postgres):
Customer.select("customers.*, max(orders.created_at) AS latest_order_date").joins("left outer join orders on orders.customer_id = customers.id").group("customers.id").page(params[:page])
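To make the difference between these variants concrete, here is a small sketch that runs the correlated subquery, the LEFT OUTER JOIN, and the INNER JOIN forms side by side. It uses an in-memory SQLite database from Python rather than Postgres and Rails, and the customer names and order dates are made-up sample data; the LIMIT/OFFSET pair stands in for what kaminari's page would generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, created_at TEXT);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');  -- Cid has no orders
    INSERT INTO orders (customer_id, created_at) VALUES
        (1, '2016-02-01'), (1, '2016-03-05'), (2, '2016-01-20');
""")

# Correlated subquery, as in the original question.
SUBQUERY = """
    SELECT c.id, c.name,
           (SELECT MAX(o.created_at) FROM orders o
             WHERE o.customer_id = c.id) AS latest_order_date
    FROM customers c
    ORDER BY c.id
    LIMIT ? OFFSET ?
"""

# Join + GROUP BY, as in the answers.
LEFT_JOIN = """
    SELECT c.id, c.name, MAX(o.created_at) AS latest_order_date
    FROM customers c
    LEFT OUTER JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id
    LIMIT ? OFFSET ?
"""
INNER_JOIN = LEFT_JOIN.replace("LEFT OUTER JOIN", "INNER JOIN")

page = (25, 0)  # limit, offset
by_subquery = conn.execute(SUBQUERY, page).fetchall()
by_left_join = conn.execute(LEFT_JOIN, page).fetchall()
by_inner_join = conn.execute(INNER_JOIN, page).fetchall()
```

In this sample the subquery and the left join return the same three rows, with a NULL date for the customer without orders, while the inner join returns only the two customers that have orders.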

How do I write an SQL query whose where clause is contained in another query?

I have two tables, one for Customer and one for Item.
In Customer, I have a column called "preference", which stores a list of hard criteria expressed as a WHERE clause in SQL e.g. "item.price<20 and item.category='household'".
I'd like a query that works like this:
SELECT * FROM item WHERE interpret('SELECT preference FROM customer WHERE id = 1')
Which gets translated to this:
SELECT * FROM item WHERE item.price < 20 and item.category = 'household'
Example data model:
CREATE TABLE customer (
cust_id INT,
preference VARCHAR
);
CREATE TABLE item (
item_id INT,
price DECIMAL(19,4),
category VARCHAR
);
-- Additional columns omitted for brevity
I've looked up casting and dynamic SQL but I haven't been able to figure out how I should do this.
I'm using PostgreSQL 9.5.1
I'm going to assume that preference is the same as my made-up item_id column. You may need to tweak it to fit your case. For future questions like this, it pays to give us the table structures you are working with. It really helps us out!
What you are asking for is a subquery:
select *
from item
where item_id in (select
preference
from
customer
where id = 1)
What I would suggest though is a join:
select item.*
from item
join customer on item.item_id = customer.preference
where item.price<20 and
item.category='household' and
customer.id = 1
I decided to change my schema instead, as it was getting pretty messy to store the criteria in preferences in that manner.
I restricted the kinds of preferences that could be specified, then stored them as columns in Customer.
After that, all the queries I wanted could be expressed as joins.
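The post does not show the new schema, so the following is only a guess at what "preferences as columns" could look like. The column names `max_price` and `preferred_category` are hypothetical, and the sketch uses SQLite from Python instead of PostgreSQL 9.5 so that it runs standalone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical columns replacing the free-form "preference" WHERE clause.
    CREATE TABLE customer (
        cust_id INTEGER PRIMARY KEY,
        max_price REAL,
        preferred_category TEXT
    );
    CREATE TABLE item (item_id INTEGER PRIMARY KEY, price REAL, category TEXT);
    INSERT INTO customer VALUES (1, 20, 'household'), (2, 100, 'garden');
    INSERT INTO item VALUES (1, 15, 'household'), (2, 25, 'household'), (3, 10, 'garden');
""")

def items_for(cust_id):
    # The criteria are now data, so a plain join expresses them and the
    # customer id can be bound as an ordinary parameter.
    sql = """
        SELECT i.item_id
        FROM item i
        JOIN customer c
          ON i.price < c.max_price
         AND i.category = c.preferred_category
        WHERE c.cust_id = ?
        ORDER BY i.item_id
    """
    return [row[0] for row in conn.execute(sql, (cust_id,))]

household_under_20 = items_for(1)
```

Compared with storing SQL text in a column, this avoids interpreting user-controlled strings as SQL, at the cost of supporting only the kinds of criteria that have a column.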

Activerecord query returning doubles while using uniq

I am running the following query with the goal of returning a unique set of customer objects:
Customer.joins(:projects).select('customers.*, projects.finish_date').where("projects.closed = false").uniq
However, this code returns duplicates if a customer has more than one active project (i.e. closed = false). If I remove projects.finish_date from the select clause, the query works as intended. However, I need it there to be able to sort on that column.
How can I make this query return a unique set of customers?
"How can I make this query return a unique set of customers?"
This doesn't completely make sense, and probably isn't what you want.
The problem is that you're joining against the projects table, at which point there may be several rows for the same customer with different project finish_dates. These rows are unique and will be returned as multiple unique Customer objects, each with a different finish_date.
If you only want one of these, how is Rails to determine which one? Wouldn't it be a problem if you only had one customer object with one finish_date returned if there are really 10 projects for that customer, each with a different finish_date?
Instead, you probably want something like this:
customers = Customer.joins(:projects).select('customers.*, projects.finish_date').where("projects.closed = false").uniq
customers.group_by(&:id)
This groups all of your same customers together.
OR, you might want:
projects = Project.where(closed: false).includes(:customer)
customers = projects.map(&:customer).uniq
In either case, you're producing a unique set of customers from the superset of all customer-project joins.
RE Your comments:
If you want to get a list of customers with their most recent associated project, you could use a sub query in your where:
select customers.*, projects.finish_date from customers
inner join projects on projects.customer_id = customers.id
where projects.id = (
select id from projects
where projects.customer_id = customers.id
and closed = false
order by finish_date desc
limit 1
)
You can express this using ActiveRecord by embedding the sub-query in a where:
Customer.joins(:projects)
.select('customers.*, projects.finish_date as finish_date')
.where('projects.id = (select id from projects where projects.customer_id = customers.id and closed = false order by finish_date desc limit 1)')
I have no idea how this will perform for you, but I suspect poorly.
I would always stick to a simple includes and in-Ruby filter before attempting to optimize with SQL.
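To see why the plain join duplicates customers and how the "latest open project" subquery removes the duplicates, here is a sketch on made-up data. It uses SQLite from Python (with 0/1 standing in for the boolean closed column), so treat it as an illustration of the SQL shape rather than of the Rails call.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE projects (
        id INTEGER PRIMARY KEY, customer_id INTEGER, closed INTEGER, finish_date TEXT
    );
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO projects (customer_id, closed, finish_date) VALUES
        (1, 0, '2016-05-01'), (1, 0, '2016-07-01'), (1, 1, '2016-09-01'),
        (2, 0, '2016-06-01');
""")

# Plain join: one row per (customer, open project), so Ann appears twice.
naive = conn.execute("""
    SELECT c.id, p.finish_date
    FROM customers c
    JOIN projects p ON p.customer_id = c.id
    WHERE p.closed = 0
    ORDER BY p.finish_date
""").fetchall()

# Keep only the open project with the latest finish_date for each customer.
latest_open = conn.execute("""
    SELECT c.id, c.name, p.finish_date
    FROM customers c
    JOIN projects p ON p.customer_id = c.id
    WHERE p.id = (
        SELECT p2.id
        FROM projects p2
        WHERE p2.customer_id = c.id AND p2.closed = 0
        ORDER BY p2.finish_date DESC
        LIMIT 1
    )
    ORDER BY p.finish_date
""").fetchall()
```

The second query returns each customer at most once and can still be sorted by finish_date, which is what the question was after.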

How to use sqlite db functions to set values from calculation

I have an app (its ruby on rails but it shouldn't matter) where I am bulk loading items and quantities per branch.
The database is sqlite.
Say I have a two tables: items and quantities.
items columns:
id
quantity
quantities columns:
id
item_id
branch_id
value
I need to loop through all records in the items table and set item.quantity to the sum of all quantity.value where item_id matches item.id.
I would like to know if there is a way to do this entirely within a SQL query (the select and the update), so that I don't have to pull the data out of the database, do the calculation, and then write the updated quantity back to each item. Instead, I would like it all to happen in the database using SQL statements and functions.
Is this possible db-side in a SQL query?
example:
item[1] = {id=>1,quantity=>0}
quantity[1] = {id=>1,item_id=>1,value=>100,branch_id=>1}
quantity[2] = {id=>2,item_id=>1,value=>200,branch_id=>2}
quantity[3] = {id=>3,item_id=>1,value=>300,branch_id=>3}
item[1].quantity = quantity[1].value + quantity[2].value + quantity[3].value
You can do this with an update query. In this case, you can use a correlated subquery to calculate the sum of the quantities for a given item:
update items
set quantity = (select sum(value) from quantities where items.id = quantities.item_id);
EDIT:
If you want to speed it up, put an index on quantities(item_id, value).
You could also summarize quantities in a temporary table and use that instead. The index, however, should be sufficient for performance.
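Here is the correlated-subquery update run against the sample data from the question, as a quick sketch using Python's built-in sqlite3 module. The second item, the index name, and the COALESCE wrapper are additions for illustration: without COALESCE, an item with no quantities rows would end up with a NULL quantity instead of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, quantity INTEGER);
    CREATE TABLE quantities (
        id INTEGER PRIMARY KEY, item_id INTEGER, branch_id INTEGER, value INTEGER
    );
    INSERT INTO items VALUES (1, 0), (2, 0);  -- item 2 has no quantities rows
    INSERT INTO quantities VALUES (1, 1, 1, 100), (2, 1, 2, 200), (3, 1, 3, 300);
    -- Index suggested in the answer; the name is arbitrary.
    CREATE INDEX idx_quantities_item_value ON quantities (item_id, value);
""")

# One statement updates every item; the subquery is evaluated per items row.
conn.execute("""
    UPDATE items
    SET quantity = COALESCE(
        (SELECT SUM(value) FROM quantities WHERE quantities.item_id = items.id),
        0
    )
""")
conn.commit()

totals = dict(conn.execute("SELECT id, quantity FROM items"))
```

After the update, `totals` maps each item id to its summed quantity, all computed inside the database.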
You are duplicating your data, which means you must always keep everything in sync.
I would prefer to create a view and get total quantities from there:
CREATE TABLE items(id, ...);
CREATE TABLE quantities(id, item_id, branch_id, value, ...);
CREATE VIEW items_sum AS SELECT items.*, SUM(quantities.value) AS quantity FROM items LEFT JOIN quantities ON items.id=quantities.item_id GROUP BY items.id;
Because items_sum is a query, you don't have to update it. SELECT * FROM items_sum; will return the desired total quantities.
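A short sketch of the view approach, again with Python's sqlite3 module. The `name` column on items is a hypothetical stand-in for the elided columns, and items deliberately has no stored quantity column here, since the view supplies it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE quantities (
        id INTEGER PRIMARY KEY, item_id INTEGER, branch_id INTEGER, value INTEGER
    );
    CREATE VIEW items_sum AS
        SELECT items.*, SUM(quantities.value) AS quantity
        FROM items
        LEFT JOIN quantities ON items.id = quantities.item_id
        GROUP BY items.id;
    INSERT INTO items VALUES (1, 'bolt'), (2, 'nut');
    INSERT INTO quantities (item_id, branch_id, value) VALUES
        (1, 1, 100), (1, 2, 200), (1, 3, 300);
""")

before = conn.execute("SELECT id, quantity FROM items_sum ORDER BY id").fetchall()

# New stock for item 2 arrives; nothing else needs to be updated.
conn.execute("INSERT INTO quantities (item_id, branch_id, value) VALUES (2, 1, 50)")
after = conn.execute("SELECT id, quantity FROM items_sum ORDER BY id").fetchall()
```

The view recomputes the totals on every read, so it is always in sync, at the cost of doing the aggregation at query time instead of once at load time.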

SQL Join IN Query with AND?

I have the following tables:
Option
-------
id - int
name - varchar
Product
---------
id - int
name -varchar
ProductOptions
------------------
id - int
product_id - int
option_id - int
If I have a list of option ids, how can I retrieve all Products that have all the options in that list? I know that SQL "IN" behaves like an "OR"; I need an "AND". Thank you!
If the ids are not repeated, you can take the ids of the options you need and count how many there are. Then you just run:
SELECT product_id FROM ProductOptions
WHERE option_id IN ( OPTIONS )
GROUP BY product_id
HAVING COUNT(product_id) = NEEDED;
Without the GROUP BY, if you had five option ids, and product 27 had fifteen options among which there were those five, you'd get five rows with the same product_id. The GROUP BY joins those rows. Since you want ALL options, and options have all different IDs, asking "rows with all of them" is equivalent to asking "rows with as many options as the desired option set size".
Plus, you run the big query on ProductOptions only, which should be really fast.
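A runnable sketch of this GROUP BY / HAVING approach, using Python's sqlite3 module and made-up rows. The `products_with_all` helper is hypothetical; it fills in the OPTIONS and NEEDED placeholders from the query above, and uses COUNT(DISTINCT option_id) so that a repeated (product_id, option_id) row would not inflate the count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ProductOptions (
        id INTEGER PRIMARY KEY, product_id INTEGER, option_id INTEGER
    );
    INSERT INTO ProductOptions (product_id, option_id) VALUES
        (27, 1), (27, 2), (27, 3), (27, 4),
        (30, 1), (30, 2),
        (31, 3);
""")

def products_with_all(option_ids):
    """Return ids of products that have every option in option_ids."""
    wanted = sorted(set(option_ids))            # de-duplicate the requested ids
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT product_id
        FROM ProductOptions
        WHERE option_id IN ({placeholders})
        GROUP BY product_id
        HAVING COUNT(DISTINCT option_id) = ?
        ORDER BY product_id
    """
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]

both = products_with_all([1, 2])
all_three = products_with_all([1, 2, 3])
```

Only the placeholders are interpolated into the SQL text; the option ids and the required count are bound as parameters.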
One way to approach queries like this is with a group by and having clause. It is best if you start with your list of required options in a list:
with list as (
select <optionname1> as optionname union all
select <optionname2> union all . . .
)
select po.product_id
from list l left outer join
Option o
on l.optionname = o.name left outer join
ProductOptions po
on po.option_id = o.id
group by po.product_id
having count(distinct o.name) = (select count(distinct optionname) from list)
This guarantees that all are in the list. By the way, I used SQL Server syntax to generate the list.
If you have the list in other formats, such as a delimited string, there are other options. There are other possibilities depending on the database you are using. However, the above idea should work on any database, with two caveats:
- The with statement might just become a subquery in the FROM clause where "list" is.
- The method for creating the list (a table of constants) varies among databases.
If you have a list of ids, you basically have only two options:
- either issue as many selects as you have ids,
- or use IN () or OR.
Using IN is recommended, however, as a single statement is usually more performant (and if you have indexes on all your id columns, no table scan should be required).
I'd use following statement:
select Product.* from Product, Option, ProductOptions where Option.id IN ( 1, 2, ... ) and Option.id = ProductOptions.option_id and ProductOptions.product_id = Product.id
One more remark: why do you have an id column in the ProductOptions table? It's useless from my point of view; you should rather have a composite primary key on product_id and option_id (as this pair is unique).
Will this work?:
select p.id, p.name
from Product as p inner join
ProductOptions as po on p.id=po.product_id
where po.option_id in (1,2,3,4)