How do I write an SQL query whose where clause is contained in another query? - sql

I have two tables, one for Customer and one for Item.
In Customer, I have a column called "preference", which stores a list of hard criteria expressed as a WHERE clause in SQL e.g. "item.price<20 and item.category='household'".
I'd like a query that works like this:
SELECT * FROM item WHERE interpret('SELECT preference FROM customer WHERE id = 1')
Which gets translated to this:
SELECT * FROM item WHERE item.price < 20 and item.category = 'household'
Example data model:
CREATE TABLE customer (
  cust_id INT,
  preference VARCHAR
);
CREATE TABLE item (
  item_id INT,
  price DECIMAL(19,4),
  category VARCHAR
);
-- Additional columns omitted for brevity
I've looked up casting and dynamic SQL but I haven't been able to figure out how I should do this.
I'm using PostgreSQL 9.5.1
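(For reference only: the kind of dynamic SQL the question alludes to would need a PL/pgSQL function that builds the statement from the stored preference string and EXECUTEs it. A hedged sketch follows; the function name item_for_customer is made up, and splicing a stored WHERE clause into a query like this is wide open to SQL injection, so treat it purely as an illustration.)
CREATE OR REPLACE FUNCTION item_for_customer(_cust_id int)
  RETURNS SETOF item
  LANGUAGE plpgsql AS
$$
DECLARE
   _pref text;
BEGIN
   -- fetch the stored criteria for this customer
   SELECT preference INTO _pref FROM customer WHERE cust_id = _cust_id;
   -- splice the criteria into the WHERE clause and run it
   RETURN QUERY EXECUTE 'SELECT * FROM item WHERE ' || _pref;
END;
$$;
-- usage: SELECT * FROM item_for_customer(1);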

I'm going to assume that preference is the same as my made-up item_id column. You may need to tweak it to fit your case. For future questions like this it may pay to give us the table structures you are working with; it really helps us out!
What you are asking for is a subquery:
select *
from item
where item_id in (select preference
                  from customer
                  where id = 1)
What I would suggest though is a join:
select item.*
from item
join customer on item.item_id = customer.preference
where item.price < 20 and
      item.category = 'household' and
      customer.id = 1

I decided to change my schema instead, as it was getting pretty messy to store the criteria in preferences in that manner.
I restricted the kinds of preferences that could be specified, then stored them as columns in Customer.
After that, all the queries I wanted could be expressed as joins.

Related

Repeat a query over the parameter list

I would like to iterate the same query while using different parameter values from a predefined list.
Say I have a table with two columns. The first column contains the customer name, and the second column contains the customer spending.
###CUSTOMER; SPENDING###
customer1; 1000
customer2; 111
customer3; 100
customer1; 323
...
I know the complete list of customers: customerlist = {customer1, customer2, customer3}.
I would like to do something like:
Select sum(spending)
from mytable
where customer = #customerlist
The query should compute the sum of spending for each customer defined in the customer list. I have found some examples of SQL procedures with stored parameters, but not the case of one parameter with multiple values.
Thank you
P.S. This is just a hypothetical example to illustrate my question (I know it would be much more effective to use here a simple group by).
You can use a nested query like this:
SELECT CustomerList.CustomerName AS Cust,
       isnull((SELECT SUM(Spending)
               FROM Customer
               WHERE Customer.CustomerName = CustomerList.CustomerName), 0) AS CustSpending
FROM CustomerList
This would normally be done using GROUP BY:
Select customer, sum(spending)
from mytable
group by customer;
GROUP BY is a very fundamental part of SQL. You should review your knowledge of SQL so you understand how to use it.
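If the sums are only wanted for the customers in the predefined list, the filter and the grouping can be combined; a minimal sketch using the hypothetical names from the question:
SELECT customer, SUM(spending) AS total_spending
FROM mytable
WHERE customer IN ('customer1', 'customer2', 'customer3')
GROUP BY customer;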

How to join products and their characteristics

I have two tables.
Product (id, title, price, created_at, updated_at etc)
and
ProductCharacteristic (id, product_id, sold_quantity, date, created_at, updated_at etc).
I need to show a products table (header: product.id, product.title, product.price, sold_quantity) for some period of time, ordered by any of the header fields.
And I can't write the query.
Right now I have the following query:
> current_project.products.includes(:product_characteristics).group('products.id').pluck(:title, 'SUM(product_characteristics.sold_quantity) AS sold_quantity')
(45.4ms) SELECT "products"."title", SUM(product_characteristics.sold_quantity) AS sold_quantity FROM "products" LEFT OUTER JOIN "product_characteristics" ON "product_characteristics"."product_id" = "products"."id" WHERE "products"."project_id" = $1 GROUP BY products.id [["project_id", 20]]
Please help me to write query through orm(to add where with dates and ordering) or write raw sql query.
I used pluck. It returns an array of arrays (not an array of hashes). That's not so good, of course.
product_characteristics.date field is unique by scope product_id. But please give me two examples (with this condition and without it to satisfy my curiosity).
And I use PostgreSQL and Rails 4.2.x.
P.S. By the way, the ProductCharacteristic table will have a lot of records (more than one million). Should I use PostgreSQL table partitioning? Can it improve performance?
Thank you.
You can use select instead of pluck in that case, and the property will be accessible as product.sold_quantity.
The query becomes
products = current_project.products.joins(:product_characteristics).group('products.id').select(:title, 'SUM(product_characteristics.sold_quantity) AS sold_quantity')
products.first.sold_quantity # => works
To order, you can just add an order clause
products = products.order(id: :asc)
or
products = products.order(id: :desc)
for instance
And for the where
products = products.where("created_at > ?", 2.days.ago)
for instance.
You can chain SQL clauses after the first line; it does not matter, because the query will only be launched when you actually use the retrieved set.
And so you can also do stuff like
if params[:foo]
products = products.order(:id)
end
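For the raw-SQL half of the question, a sketch of the same aggregation with a date filter and ordering might look like the following; the date bounds are placeholders, project_id = 20 is taken from the logged query above, and it assumes a PostgreSQL version that lets you select the other products columns when grouping by the primary key:
SELECT products.id, products.title, products.price,
       SUM(product_characteristics.sold_quantity) AS sold_quantity
FROM products
JOIN product_characteristics
  ON product_characteristics.product_id = products.id
WHERE products.project_id = 20
  AND product_characteristics.date BETWEEN '2016-01-01' AND '2016-01-31'
GROUP BY products.id
ORDER BY sold_quantity DESC;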

How to use sqlite db functions to set values from calculation

I have an app (its ruby on rails but it shouldn't matter) where I am bulk loading items and quantities per branch.
The database is sqlite.
Say I have a two tables: items and quantities.
items columns:
id
quantity
quantities columns:
id
item_id
branch_id
value
I need to loop through all records in the items table and set item.quantity to the sum of all quantity.value rows whose item_id matches item.id.
I would like to know if there is a way to do this all within a SQL query (the select and the update), so that I don't have to pull the data out of the database, do the calculation, and then write the updated quantity back to each item. Instead, I would like it all to happen in the database using SQL statements and functions.
Is this possible in a SQL query/function on the db side?
example:
item[1] = {id=>1,quantity=>0}
quantity[1] = {id=>1,item_id=>1,value=>100,branch_id=>1}
quantity[2] = {id=>2,item_id=>1,value=>200,branch_id=>2}
quantity[3] = {id=>3,item_id=>1,value=>300,branch_id=>3}
item[1].quantity = quantity[1].value + quantity[2].value + quantity[3].value
You can do this with an update query. In this case, you can use a correlated subquery to calculate the sum of the quantities for a given item:
update items
set quantity = (select sum(value) from quantities where items.id = quantities.item_id);
EDIT:
If you want to speed it up, put an index on quantities(item_id, value).
You could also summarize quantities in a temporary table and use that instead. The index, however, should be sufficient for performance.
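Illustratively (the index name is arbitrary):
CREATE INDEX idx_quantities_item_value ON quantities(item_id, value);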
You are duplicating your data! This means you must always keep everything in sync.
I would prefer to create a view and get total quantities from there:
CREATE TABLE items(id, ...);
CREATE TABLE quantities(id, item_id, branch_id, value, ...);
CREATE VIEW items_sum AS SELECT items.*, SUM(quantities.value) AS quantity FROM items LEFT JOIN quantities ON items.id=quantities.item_id GROUP BY items.id;
As items_sum is a query, you don't have to update it. SELECT * FROM items_sum; will return the desired results for total quantities.

SQL Join IN Query with AND?

I have the following tables:
Option
-------
id - int
name - varchar
Product
---------
id - int
name -varchar
ProductOptions
------------------
id - int
product_id - int
option_id - int
If I have a list of option ids, how can I retrieve all Products that have all of the options in that list? I know that SQL "IN" works like an "OR"; I need an "AND". Thank you!
If the ids are not repeated, you can retrieve the ids of the options you need and count how many there are. Then, you just
SELECT product_id FROM ProductOptions
WHERE option_id IN ( OPTIONS )
GROUP BY product_id
HAVING COUNT(product_id) = NEEDED;
Without the GROUP BY, if you had five option ids, and product 27 had fifteen options among which there were those five, you'd get five rows with the same product_id. The GROUP BY joins those rows. Since you want ALL options, and options have all different IDs, asking "rows with all of them" is equivalent to asking "rows with as many options as the desired option set size".
Plus, you run the big query on ProductOptions only, which should be really fast.
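As a concrete sketch, if the required option ids were 1, 2 and 3 (so NEEDED is 3), the query would read:
SELECT product_id FROM ProductOptions
WHERE option_id IN (1, 2, 3)
GROUP BY product_id
HAVING COUNT(product_id) = 3;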
One way to approach queries like this is with a group by and having clause. It is best if you start with your required options in a list:
with list as (
      select <optionname1> as optionname union all
      select <optionname2> as optionname union all . . .
     )
select po.product_id
from list l left outer join
     Option o
     on l.optionname = o.name left outer join
     ProductOptions po
     on po.option_id = o.id
group by po.product_id
having count(distinct o.name) = count(distinct l.optionname)
This guarantees that all are in the list. By the way, I used SQL Server syntax to generate the list.
If you have the list in other formats, such as a delimited string, there are other options. There are other possibilities depending on the database you are using. However, the above idea should work on any database, with two caveats:
The with statement might just become a subquery in the FROM clause where "list" is.
The method for creating the list (a table of constants) varies among databases
If you have a list of ids, you basically have only 2 options:
- either call as many selects as you have ids,
- or use IN () or OR.
The usage of IN would be recommended, however, as calling one statement is usually more performant (moreover, if you have an index on all your id columns, no table scan should be required).
I'd use the following statement:
select Product.*
from Product, Option, ProductOptions
where Option.id IN ( 1, 2, ... )
  and Option.id = ProductOptions.option_id
  and ProductOptions.product_id = Product.id
One more remark: why do you have an id column in the ProductOptions table? It's useless from my point of view; you should rather have a composite primary key on product_id and option_id (as this pair is unique).
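A hedged sketch of that suggestion, using PostgreSQL-style DDL (the constraint name is made up):
ALTER TABLE ProductOptions
    DROP COLUMN id,
    ADD CONSTRAINT product_options_pk PRIMARY KEY (product_id, option_id);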
Will this work?:
select p.id, p.name
from Product as p inner join
ProductOptions as po on p.id=po.product_id
where po.option_id in (1,2,3,4)

SQL query to search an unique ID that can be in three different tables

I have three tables that control products, colors and sizes. Products may or may not have colors and sizes. Colors may or may not have sizes.
product        color                           size
-------        -------                         -------
id             id                              id
unique_id      id_product (FK from product)    id_product (FK from version)
stock          unique_id                       id_version (FK from version)
title          stock                           unique_id
                                               stock
The unique_id column, that is present in all tables, is a serial type (autoincrement) and its counter is shared with the three tables, basically it works as a global unique ID between them.
It works fine, but I am trying to improve query performance when I have to select some fields based on the unique_id.
As I don't know which table contains the unique_id I am looking for, I am using UNION, like below:
select title, stock
from product
where unique_id = 10
UNION
select p.title, c.stock
from color c
join product p on c.id_product = p.id
where c.unique_id = 10
UNION
select p.title, s.stock
from size s
join product p on s.id_product = p.id
where s.unique_id = 10;
Is there a better way to do this? Thanks for any suggestion!
EDIT 1
Based on the #ErwinBrandstetter and #ErikE answers, I decided to use the query below. The main reasons are:
1) As unique_id is indexed in all tables, I will get good performance
2) Using the unique_id I will find the product code, so I can get all the columns I need using another simple join
SELECT
p.title,
ps.stock
FROM (
select id as id_product, stock
from product
where unique_id = 10
UNION
select id_product, stock
from color
where unique_id = 10
UNION
select id_product, stock
from size
where unique_id = 10
) AS ps
JOIN product p ON ps.id_product = p.id;
PL/pgSQL function
To solve the problem at hand, a plpgsql function like the following should be faster:
CREATE OR REPLACE FUNCTION func(int)
  RETURNS TABLE (title text, stock int) LANGUAGE plpgsql AS
$BODY$
BEGIN

RETURN QUERY
SELECT p.title, p.stock
FROM   product p
WHERE  p.unique_id = $1;  -- Put the most likely table first.

IF NOT FOUND THEN
   RETURN QUERY
   SELECT p.title, c.stock
   FROM   color c
   JOIN   product p ON c.id_product = p.id
   WHERE  c.unique_id = $1;
END IF;

IF NOT FOUND THEN
   RETURN QUERY
   SELECT p.title, s.stock
   FROM   size s
   JOIN   product p ON s.id_product = p.id
   WHERE  s.unique_id = $1;
END IF;

END;
$BODY$;
Updated function with table-qualified column names to avoid naming conflicts with OUT parameters.
RETURNS TABLE requires PostgreSQL 8.4, RETURN QUERY requires version 8.3. You can substitute both for older versions.
It goes without saying that you need to index the columns unique_id of every involved table. id should be indexed automatically, being the primary key.
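A call then looks like this, 10 being the example unique_id used throughout:
SELECT * FROM func(10);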
Redesign
Ideally, you can tell which table an ID belongs to from the ID alone. You could keep using one common sequence, but add 100000000 for the first table, 200000000 for the second and 300000000 for the third - or whatever suits your needs. This way, the leading digit of the number tells you which table it came from.
A plain integer spans numbers from -2147483648 to +2147483647, move to bigint if that's not enough for you. I would stick to integer IDs, though, if possible. They are smaller and faster than bigint or text.
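A hedged sketch of that offset idea, assuming the shared sequence is called unique_id_seq (the name and the bands are illustrative):
ALTER TABLE product ALTER COLUMN unique_id SET DEFAULT nextval('unique_id_seq') + 100000000;
ALTER TABLE color   ALTER COLUMN unique_id SET DEFAULT nextval('unique_id_seq') + 200000000;
ALTER TABLE size    ALTER COLUMN unique_id SET DEFAULT nextval('unique_id_seq') + 300000000;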
CTEs (experimental!)
If you cannot create a function for some reason, this pure SQL solution might do a similar trick:
WITH x(uid) AS (SELECT 10) -- provide unique_id here
, a AS (
SELECT title, stock
FROM x, product
WHERE unique_id = x.uid
)
, b AS (
SELECT p.title, c.stock
FROM x, color c
JOIN product p ON c.id_product = p.id
WHERE NOT EXISTS (SELECT 1 FROM a)
AND c.unique_id = x.uid
)
, c AS (
SELECT p.title, s.stock
FROM x, size s
JOIN product p ON s.id_product = p.id
WHERE NOT EXISTS (SELECT 1 FROM b)
AND s.unique_id = x.uid
)
SELECT * FROM a
UNION ALL
SELECT * FROM b
UNION ALL
SELECT * FROM c;
I am not sure whether it avoids additional scans like I hope. Would have to be tested. This query requires at least PostgreSQL 8.4.
Upgrade!
As I just learned, the OP runs on PostgreSQL 8.1.
Upgrading alone would speed up the operation a lot.
Query for PostgreSQL 8.1
As you are limited in your options, and a plpgsql function is not possible, this query should perform better than the one you have. Test with EXPLAIN ANALYZE, which is available in v8.1.
SELECT title, stock
FROM product
WHERE unique_id = 10
UNION ALL
SELECT p.title, ps.stock
FROM product p
JOIN (
SELECT id_product, stock
FROM color
WHERE unique_id = 10
UNION ALL
SELECT id_product, stock
FROM size
WHERE unique_id = 10
) ps ON ps.id_product = p.id;
I think it's time for a redesign.
You have things that you're using as bar codes for items that are basically all the same in one respect (they are SerialNumberItems), but have been split into multiple tables because they are different in other respects.
I have several ideas for you:
Change the Defaults
Just require each product to have one color, "no color", and one size, "no size". Then you can query any table you want to find the info you need.
SuperType/SubType
Without too much modification you could use the supertype/subtype database design pattern.
In it, there is a parent table where all the distinct detail-level identifiers live, and the shared columns of the subtype tables go in the supertype table (the ways that all the items are the same). There is one subtype table for each different way that the items are distinct. If mutual exclusivity of the subtype is required (you can have a Color or a Size but not both), then the parent table is given a TypeID column and the subtype tables have an FK to both the ParentID and the TypeID. Looking at your design, in fact you would not use mutual exclusivity.
If you use the pattern of a supertype table, you do have the issue of having to insert in two parts, first to the supertype, then the subtype. Deleting also requires deleting in reverse order. But you get a great benefit of being able to get basic information such as Title and Stock out of the supertype table with a single query.
You could even create schema-bound views for each subtype, with instead-of triggers that convert inserts, updates, and deletes into operations on the base table + child table.
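A minimal sketch of that supertype/subtype shape, with illustrative table and column names (the supertype carries the shared Title/Stock columns, each subtype only its own extras):
CREATE TABLE SerialNumberItem (    -- supertype: one row per orderable thing
    unique_id int PRIMARY KEY,
    title     text,
    stock     int
);
CREATE TABLE ColorItem (           -- subtype: color-specific details
    unique_id int PRIMARY KEY REFERENCES SerialNumberItem (unique_id),
    color     text
);
CREATE TABLE SizeItem (            -- subtype: size-specific details
    unique_id int PRIMARY KEY REFERENCES SerialNumberItem (unique_id),
    size_name text
);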
A Bigger Redesign
You could completely change how Colors and Sizes are related to products.
First, your patterns of "has-a" are these:
Product (has nothing)
Product->Color
Product->Size
Product->Color->Size
There is a problem here. Clearly Product is the main item that has other things (colors and sizes) but colors don't have sizes! That is an arbitrary assignment. You may as well have said that Sizes have Colors--it doesn't make a difference. This reveals that your table design may not be best, as you're trying to model orthogonal data in a parent-child type of relationship. Really, products have a ColorAndSize.
Furthermore, when a product comes in colors and sizes, what does the uniqueid in the Color table mean? Can such a product be ordered without a size, having only a color? This design is assigning a unique ID to something that (it seems to me) should never be allowed to be ordered--but you can't find this information out from the Color table, you have to compare the Color and Size tables first. It is a problem.
I would design this as: Table Product. Table Size listing all distinct sizes possible for any product ever. Table Color listing all distinct colors possible for any product ever. And table OrderableProduct that has columns ProductId, ColorID, SizeID, and UniqueID (your bar code value). Additionally, each product must have one color and one size or it doesn't exist.
Basically, Color and Size are like X and Y coordinates into a grid; you are filling in the boxes that are allowable combinations. Which one is the row and which the column is irrelevant. Certainly, one is not a child of the other.
If there are any reasonable rules, in general, about what colors or sizes can be applied to various sub-groups of products, there might be utility in a ProductType table and a ProductTypeOrderables table that, when creating a new product, could populate the OrderableProduct table with the standard set—it could still be customized but might be easier to modify than to create anew. Or, it could define the range of colors and sizes that are allowable. You might need separate ProductTypeAllowedColor and ProductTypeAllowedSize tables. For example, if you are selling T-shirts, you'd want to allow XXXS, XXS, XS, S, M, L, XL, XXL, XXXL, and XXXXL, even if most products never use all those sizes. But for soft drinks, the sizes might be 6-pack 8oz, 24-pack 8oz, 2 liter, and so on, even if each soft drink is not offered in that size (and soft drinks don't have colors).
In this new scheme, you only have one table to query to find the correct orderable product. With proper indexes, it should be blazing fast.
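A hedged sketch of that OrderableProduct grid (types and constraints are illustrative):
CREATE TABLE OrderableProduct (
    ProductId int NOT NULL REFERENCES Product (id),
    ColorID   int NOT NULL REFERENCES Color (id),
    SizeID    int NOT NULL REFERENCES Size (id),
    UniqueID  int NOT NULL UNIQUE,   -- the bar-code value
    PRIMARY KEY (ProductId, ColorID, SizeID)
);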
Your Question
You asked:
in PostgreSQL, so do you think that if I use indexes on unique_id I will get satisfactory performance?
Any column or set of columns that you use to repeatedly look up data must have an index! Any other pattern will result in a full table scan each time, which will be awful performance. I am sure that these indexes will make your queries lightning fast as it will take only one leaf-level read per table.
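For reference, those lookup indexes would be along these lines (names arbitrary):
CREATE INDEX product_unique_id_idx ON product (unique_id);
CREATE INDEX color_unique_id_idx   ON color (unique_id);
CREATE INDEX size_unique_id_idx    ON size (unique_id);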
There's an easier way to generate unique IDs using three separate auto_increment columns. Just prepend a letter to the ID to uniquify it:
Colors:
C0000001
C0000002
C0000003
Sizes:
S0000001
S0000002
S0000003
...
Products:
P0000001
P0000002
P0000003
...
A few advantages:
You don't need to serialize creation of ids across tables to ensure uniqueness. This will give better performance.
You don't actually need to store the letter in the table. All IDs in the same table start with the same letter, so you only need to store the number. This means that you can use an ordinary auto_increment column to generate your IDs.
If you have an ID you only need to check the first character to see which table it can be found in. You don't even need to make a query to the database if you just want to know whether it's a product ID or a size ID.
A disadvantage:
It's no longer a number. But you can get around that by using 1,2,3 instead of C,S,P.
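As a small illustration of the "store only the number" point: the letter can be added back when displaying, e.g. with PostgreSQL string functions (the padding width is arbitrary):
SELECT 'C' || lpad(id::text, 7, '0') AS display_id FROM color;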
Your query will be pretty efficient, as long as you have an index on unique_id in every table and indexes on the joining columns.
You could turn those UNIONs into UNION ALL, but there won't be any difference in performance for this query.
This is a bit different. I don't understand the intended behaviour if stock exists in more than one of the {product, color, zsize} tables. (UNION would remove duplicates, but only for the row as a whole, e.g. the {product_id, stock} tuples; that makes no sense to me.) I just take the first one. (Note the funky self-join!)
SELECT p.title
, COALESCE (p2.stock, c.stock, s.stock) AS stock
FROM product p
LEFT JOIN product p2 on p2.id = p.id AND p2.unique_id = 10
LEFT JOIN color c on c.id_product = p.id AND c.unique_id = 10
LEFT JOIN zsize s on s.id_product = p.id AND s.unique_id = 10
WHERE COALESCE (p2.stock, c.stock, s.stock) IS NOT NULL
;