Custom table join - sql

Suppose I have 4 tables in Postgresql DB:
users {
id: int
}
cars {
id: int
}
usage_items {
id: int,
user_id: int,
car_id: int,
start: date,
end: date
}
prices {
id: int,
car_id: int,
price: int
}
When a user is renting a car I am creating a usage_item record to track the rental time. At the end of the month, I am sending him an invoice with calculated costs. The SQL is pretty simple here:
SELECT usage_items.start, usage_items.end, prices.price
FROM usage_items
JOIN prices ON prices.car_id = usage_items.car_id
(I omitted here WHERE clause with dates comparison, the rest of calculations I do in my Ruby code)
The problem I struggle with now is that some of my users have custom contracts with me ensuring lower prices for them. I am looking for a way to express this logic in my DB.
I came up with the idea of adding a user_id column to the prices table, but that way I would need to create prices for every single user. So I decided to implement the following logic: if user_id in a prices row is NULL, it is the default price for all users. Otherwise, it is specific to that user. But I have no idea how to write the SQL for this case, because:
SELECT usage_items.start, usage_items.end, prices.price
FROM usage_items
JOIN prices ON prices.car_id = usage_items.car_id
WHERE prices.user_id IS NULL OR prices.user_id = usage_items.user_id
returns rows for both prices. And I need only the one with an associated user or, if it does not exist, the one with a NULL user_id.
Can you help me fix this SQL? Or maybe my design is bad and I should change it somehow?

Given your existing schema, this is one way to accomplish what you want:
Setup:
CREATE TABLE usage_items (user_id INTEGER, car_id INTEGER);
CREATE TABLE prices (user_id INTEGER, car_id INTEGER, price INTEGER);
INSERT INTO usage_items VALUES (1, 10), (2, 11), (3, 12);
INSERT INTO prices VALUES
(1, 10, 101),
(2, 11, 102),
(4, 12, 104),
(NULL, 10, 201),
(NULL, 11, 202),
(NULL, 12, 304);
Query (I'm not using start/end but it's the same thing):
SELECT DISTINCT ON (u.user_id, u.car_id) u.user_id, u.car_id, p.price
FROM usage_items u
LEFT JOIN prices p
ON u.car_id = p.car_id
AND (u.user_id = p.user_id OR p.user_id IS NULL)
ORDER BY u.user_id, u.car_id, CASE WHEN p.user_id IS NOT NULL THEN 1 ELSE 2 END
Result:
| user_id | car_id | price |
| ------- | ------ | ----- |
| 1 | 10 | 101 |
| 2 | 11 | 102 |
| 3 | 12 | 304 |
As you can see, the records in usage_items with a corresponding record for their car AND user_id in prices get their custom price rather than the NULL version; user 3 who does not have a custom price gets the NULL version (and not the custom price for a different customer).
Test here https://www.db-fiddle.com/f/wPVWEY3r22n22iKpDMrcMC/0
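DISTINCT ON is PostgreSQL-specific, so here is a hedged sketch of the same fallback written with two LEFT JOINs and COALESCE instead. It is only an illustration: it runs against an in-memory SQLite database from Python (SQLite stands in for PostgreSQL), reuses the sample data above, and assumes there is at most one custom price per (user_id, car_id) pair and one default price per car. The aliases cp and dp are made up for this example.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; the query itself is plain SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE usage_items (user_id INTEGER, car_id INTEGER);
    CREATE TABLE prices (user_id INTEGER, car_id INTEGER, price INTEGER);
    INSERT INTO usage_items VALUES (1, 10), (2, 11), (3, 12);
    INSERT INTO prices VALUES
        (1, 10, 101), (2, 11, 102), (4, 12, 104),
        (NULL, 10, 201), (NULL, 11, 202), (NULL, 12, 304);
""")

# cp = custom price for this user and car (if any)
# dp = default price for the car (the row whose user_id is NULL)
rows = conn.execute("""
    SELECT u.user_id, u.car_id, COALESCE(cp.price, dp.price) AS price
    FROM usage_items u
    LEFT JOIN prices cp ON cp.car_id = u.car_id AND cp.user_id = u.user_id
    LEFT JOIN prices dp ON dp.car_id = u.car_id AND dp.user_id IS NULL
    ORDER BY u.user_id, u.car_id
""").fetchall()

for row in rows:
    print(row)
```

If the assumption about uniqueness does not hold (several custom prices for the same user and car), this version returns duplicate rows, whereas the DISTINCT ON query still returns one row per usage item.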

Related

Get Ids from constant list for which there are no rows in corresponding table

Let's say I have a table Vehicles(Id, Name) with the values below:
1 Car
2 Bike
3 Bus
and a constant list of Ids:
1, 2, 3, 4, 5
I want to write a query that returns the Ids from the above list for which there are no rows in the Vehicles table. In the above example it should return:
4, 5
But when I add a new row to the Vehicles table:
4 Plane
It should return only:
5
And similarly, when I remove the third row (3, Bus) from the first version of the Vehicles table, my query should return:
3, 4, 5
I tried the EXISTS operator, but it doesn't give me the correct results:
select top v.Id from Vehicle v where Not Exists ( select v2.Id from Vehicle v2 where v.id = v2.id and v2.id in ( 1, 2, 3, 4, 5 ))
You need to treat your "list" as a dataset, and then use the EXISTS:
SELECT V.I
FROM (VALUES(1),(2),(3),(4),(5))V(I) --Presumably this would be a table (type parameter),
--or a delimited string split into rows
WHERE NOT EXISTS (SELECT 1
FROM dbo.YourTable YT
WHERE YT.YourColumn = V.I);
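To see that this pattern keeps tracking the table as rows are added, here is a small sketch that runs the same NOT EXISTS idea against an in-memory SQLite database from Python. SQLite is only a stand-in for SQL Server here, the list is supplied through a CTE named wanted (a name made up for this example), and missing_ids is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vehicles (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO vehicles VALUES (1, 'Car'), (2, 'Bike'), (3, 'Bus');
""")

def missing_ids(conn):
    """Return the ids from the constant list that have no row in vehicles."""
    query = """
        WITH wanted(id) AS (VALUES (1), (2), (3), (4), (5))
        SELECT w.id
        FROM wanted w
        WHERE NOT EXISTS (SELECT 1 FROM vehicles v WHERE v.id = w.id)
        ORDER BY w.id
    """
    return [row[0] for row in conn.execute(query)]

before = missing_ids(conn)   # ids missing with Car, Bike, Bus only
conn.execute("INSERT INTO vehicles VALUES (4, 'Plane')")
after = missing_ids(conn)    # ids missing once Plane has been added
print(before, after)
```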
Please try the following solution.
It is using EXCEPT set operator.
Set Operators - EXCEPT and INTERSECT (Transact-SQL)
SQL
-- DDL and sample data population, start
DECLARE @Vehicles TABLE (ID INT PRIMARY KEY, vehicleType VARCHAR(30));
INSERT INTO @Vehicles (ID, vehicleType) VALUES
(1, 'Car'),
(2, 'Bike'),
(3, 'Bus');
-- DDL and sample data population, end
DECLARE @vehicleList VARCHAR(20) = '1, 2, 3, 4, 5'
, @separator CHAR(1) = ',';
SELECT TRIM(value) AS missingID
FROM STRING_SPLIT(@vehicleList, @separator)
EXCEPT
SELECT ID FROM @Vehicles;
Output
+-----------+
| missingID |
+-----------+
| 4 |
| 5 |
+-----------+
In SQL we store our values in tables, so we store your list in a table too.
It is then simple to work with, and we can easily find the information we want.
I fully agree that other functions can be used to solve the problem, but it is better to design the database so that basic SQL is enough. It should run faster, be easier to maintain, and scale to a table of a million rows without problems. When we add the 4th mode of transport we don't have to modify anything else.
CREATE TABLE vehicules(
id int, name varchar(25));
INSERT INTO vehicules VALUES
(1 ,'Car'),
(2 ,'Bike'),
(3 ,'Bus');
CREATE TABLE ids (iid int);
INSERT INTO ids VALUES
(1),(2),(3),(4),(5);
CREATE VIEW unknownIds AS
SELECT iid unknown_id FROM ids
LEFT JOIN vehicules
ON iid = id
WHERE id IS NULL;
SELECT * FROM unknownIds;
| unknown_id |
| ---------: |
| 4 |
| 5 |
INSERT INTO vehicules VALUES (4,'Plane');
SELECT * FROM unknownIds;
| unknown_id |
| ---------: |
| 5 |

Choosing between using count with filter and altering the on condition

There are 2 tables, video and category.
create table category (
id integer primary key,
name text
);
create table video (
id integer primary key,
category_id integer references category (id),
quality text
);
insert into category (id, name) values (1, 'Entertainment');
insert into category (id, name) values (2, 'Drawing');
insert into video (id, category_id, quality) values (1, 1, 'sd');
insert into video (id, category_id, quality) values (2, 1, 'hd');
insert into video (id, category_id, quality) values (3, 1, 'hd');
I can get the list of all categories with the number of all videos.
select category.id, category.name, count(video)
from category left outer join video
on (category.id = video.category_id)
group by category.id;
result
id | name | count
----+---------------+-------
2 | Drawing | 0
1 | Entertainment | 3
(2 rows)
To get all categories with the number of HD videos, both of these queries can be used.
count with filter
select
category.id,
category.name,
count(video) filter (where video.quality='hd')
from category left outer join video
on (category.id = video.category_id)
group by category.id;
result
id | name | count
----+---------------+-------
2 | Drawing | 0
1 | Entertainment | 2
(2 rows)
on
select
category.id,
category.name,
count(video)
from category left outer join video
on (category.id = video.category_id and video.quality='hd')
group by category.id;
result
id | name | count
----+---------------+-------
2 | Drawing | 0
1 | Entertainment | 2
(2 rows)
The results are equal. What are the pros and cons of using the first and the second way? Which one is preferred?
The second query is probably somewhat more efficient, because the ON predicate of the join reduces the number of rows earlier, whereas the first query keeps them all and then relies on the filter of the aggregate function. I would recommend the second query.
The first query would be useful if you were, for example, to perform several conditional counts, like:
select
category.id,
category.name,
count(*) filter (where video.quality='hd') no_hd_videos,
count(*) filter (where video.quality='sd') no_sd_videos
from category
left outer join video on category.id = video.category_id
group by category.id;
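A hedged sketch of the comparison, run against an in-memory SQLite database from Python (a stand-in for PostgreSQL). Two things here are my own substitutions: COUNT(CASE ... END) is used as a portable equivalent of count(...) filter (...), and a third variant with the condition in WHERE is added to show why the condition has to live in the aggregate or in the ON clause if categories without HD videos should still be listed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE video (
        id INTEGER PRIMARY KEY,
        category_id INTEGER REFERENCES category (id),
        quality TEXT
    );
    INSERT INTO category VALUES (1, 'Entertainment'), (2, 'Drawing');
    INSERT INTO video VALUES (1, 1, 'sd'), (2, 1, 'hd'), (3, 1, 'hd');
""")

# Condition inside the aggregate: all joined rows are kept, and CASE
# yields NULL (which COUNT ignores) for rows that are not HD.
in_aggregate = conn.execute("""
    SELECT c.id, c.name,
           COUNT(CASE WHEN v.quality = 'hd' THEN 1 END) AS hd_videos
    FROM category c
    LEFT JOIN video v ON c.id = v.category_id
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

# Condition in the ON clause: non-HD videos never join, but every
# category still appears because of the outer join.
in_on = conn.execute("""
    SELECT c.id, c.name, COUNT(v.id) AS hd_videos
    FROM category c
    LEFT JOIN video v ON c.id = v.category_id AND v.quality = 'hd'
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

# Condition in WHERE: applied after the outer join, so categories
# without any HD video are removed from the result entirely.
in_where = conn.execute("""
    SELECT c.id, c.name, COUNT(v.id) AS hd_videos
    FROM category c
    LEFT JOIN video v ON c.id = v.category_id
    WHERE v.quality = 'hd'
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

print(in_aggregate, in_on, in_where)
```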

sql select sum conditions

I'm studying SQL (by myself) and I would like to know how to write queries for these examples:
1- I created the 3 tables below:
CREATE TABLE Business (
Id INT,
Category INT,
Business_Name VARCHAR(30),
City_Id INT,
Billing INT
);
INSERT INTO business (Id, Category, Business_Name, City_Id, Billing) VALUES(1, 1, 'Bread', 1, 50);
INSERT INTO business (Id, Category, Business_Name, City_Id, Billing) VALUES(2, 2, 'Oreo', 2, 10);
INSERT INTO business (Id, Category, Business_Name, City_Id, Billing) VALUES(3, 2, 'Pizza', 3, 15);
INSERT INTO business (Id, Category, Business_Name, City_Id, Billing) VALUES(4, 2, 'Beer', 4, 25);
INSERT INTO business (Id, Category, Business_Name, City_Id, Billing) VALUES(5, 1, 'Steak', 1, 80);
CREATE TABLE City (
Id INT,
City_Name VARCHAR(30)
);
INSERT INTO City (Id, City_Name) VALUES(1, 'Paris');
INSERT INTO City (Id, City_Name) VALUES(2, 'New York');
INSERT INTO City (Id, City_Name) VALUES(3, 'Tokio');
INSERT INTO City (Id, City_Name) VALUES(4, 'Vancouver');
INSERT INTO City (Id, City_Name) VALUES(5, 'Cairo');
CREATE TABLE Category (
Id INT,
Category_Name VARCHAR(30)
);
INSERT INTO Category (Id, Category_Name) VALUES(1, 'Bar');
INSERT INTO Category (Id, Category_Name) VALUES(2, 'Pub');
INSERT INTO Category (Id, Category_Name) VALUES(3, 'Pizza');
2- I want to make these SQL queries:
a) Total Value of Billing (Billing) all stores, like this table:
-----------------------
|Business_Name | Total |
|--------------+-------|
|Total | 180 |
------------------------
b) All Total Billing by Category_Name like this table:
-------------------
|Category | Total |
|---------+-------|
|Bar | 130 |
|---------+-------|
|Pub | 50 |
|---------+-------|
|Pizza | 5 |
----------+--------
c) List the Business_Name with the minimum billing, showing the Category_Name, Business_Name, and Billing, like this table:
----------------------------------------
|Category_Name | Business_Name | Total |
|--------------+---------------+-------|
|Pub | Beer | 5 |
|--------------+---------------+--------
d) All Total of Billing by City, showing the: Category_Name, Business_Name, City_Name and Billing like this table
--------------------------
|City            | Total |
|----------------+-------|
|Cairo           | 0     |
|----------------+-------|
|New York        | 10    |
|----------------+-------|
|Paris           | 130   |
|----------------+-------|
|Tokio           | 15    |
|----------------+-------|
|Vancouver       | 25    |
--------------------------
Could anybody with a little more knowledge help me, please? =)
First things first: all of these are basic queries, and I have to point out that a simple Google search for tutorials (ex1, ex2, ex3) would have answered most of them. As we are here to provide help and guidance, I hope you take this to heart and read the tutorials before going over the answers.
With that said, to help you out I will walk through each query and provide an overview of what is happening.
a) You need an aggregate operation here to sum up the values, so you would use the sum keyword. Normally you need a group by, but in this case, since the only other column is the hard-coded literal 'Total', it is not required. We also give each column an alias to match your table; the alias goes after the column expression.
select 'Total' as business_Name,
sum(billing) Total
from business
b) This one is almost an exact copy of a, but it requires a grouping. Here you have to use the group by keyword for all columns that are not inside aggregates; in this case that is only the category name. It is good practice not to use ordinal positions in group by and order by clauses; spell out the columns you are using whenever you can.
select c.category_name,
sum(billing) total
from business b
inner join category c
on b.category = c.id
group by c.category_name
c) We continue to build onto the query and add another column in the select statement and then add a column to the group by to allow grouping.
select c.category_name,
b.business_name,
sum(billing) total
from business b
inner join category c
on b.category = c.id
group by c.category_name, b.business_name
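The query above lists every business with its total. If only the business with the minimum billing is wanted, as (c) asks, one option is a scalar subquery on MIN(billing). This is a minimal sketch, not the only way to do it; it runs the SQL against an in-memory SQLite database from Python with the sample data from the question. With that data the minimum is Oreo with 10, not the Beer / 5 row in the expected output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE business (
        id INT, category INT, business_name VARCHAR(30),
        city_id INT, billing INT
    );
    INSERT INTO business VALUES
        (1, 1, 'Bread', 1, 50), (2, 2, 'Oreo', 2, 10),
        (3, 2, 'Pizza', 3, 15), (4, 2, 'Beer', 4, 25),
        (5, 1, 'Steak', 1, 80);
    CREATE TABLE category (id INT, category_name VARCHAR(30));
    INSERT INTO category VALUES (1, 'Bar'), (2, 'Pub'), (3, 'Pizza');
""")

# Keep only the rows whose billing equals the overall minimum.
# If several businesses tie for the minimum, all of them are returned.
rows = conn.execute("""
    SELECT c.category_name, b.business_name, b.billing AS total
    FROM business b
    INNER JOIN category c ON b.category = c.id
    WHERE b.billing = (SELECT MIN(billing) FROM business)
""").fetchall()

print(rows)
```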
d) This query is very similar to b, but instead of category_name we join on city using the city id.
select c.city_name
,sum(billing) as total
from business b
inner join city c on c.id = b.city_id
group by c.city_name
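Note that the inner join leaves out cities that have no business, so Cairo does not appear. If cities without billing should show up with a total of 0, as in the expected output, a LEFT JOIN starting from city together with COALESCE is one way to get there. The sketch below is an illustration run against an in-memory SQLite database from Python, using the sample data from the question; the alias ci is made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE business (
        id INT, category INT, business_name VARCHAR(30),
        city_id INT, billing INT
    );
    INSERT INTO business VALUES
        (1, 1, 'Bread', 1, 50), (2, 2, 'Oreo', 2, 10),
        (3, 2, 'Pizza', 3, 15), (4, 2, 'Beer', 4, 25),
        (5, 1, 'Steak', 1, 80);
    CREATE TABLE city (id INT, city_name VARCHAR(30));
    INSERT INTO city VALUES
        (1, 'Paris'), (2, 'New York'), (3, 'Tokio'),
        (4, 'Vancouver'), (5, 'Cairo');
""")

# Start from city so every city is kept; SUM over no rows is NULL,
# which COALESCE turns into 0.
rows = conn.execute("""
    SELECT ci.city_name, COALESCE(SUM(b.billing), 0) AS total
    FROM city ci
    LEFT JOIN business b ON b.city_id = ci.id
    GROUP BY ci.city_name
    ORDER BY ci.city_name
""").fetchall()

for row in rows:
    print(row)
```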
With all of this said, several of your expected outputs do not match your sample data. These queries return the results that the data you provided actually produces.
I really do recommend going through some tutorials to grasp the basics of SQL better.
Here is an answer to one of the queries. But I'd recommend reading basic online SQL tutorials, and you will easily be able to write them yourself.
b)
select c.category_name
,sum(billing)
from business b
join category c
on b.category = c.id
group by 1

Using Datediff() on Joined Tables with CASES

I'm trying to determine the best way to calculate a datediff (in minutes). I want the time elapsed between two stages in a registration process (stage 8 and stage 10), but I only need this for anyone who has ever been in stage 10. There are two tables, Registration and Registrationstagehistory, which are linked by the registration id number (rid).
I get the following error from the query below:
Incorrect syntax near the keyword 'CASE'.
Select datediff(Minute,startdate,enddate)
from Registration r with(nolock)
inner join registrationstagehistory rh with(nolock) on r.rid=rh.rhrid
It may be something like this:
CASE WHEN R.RsID=10 THEN
CASE rh.rhrsid
WHEN 8 then 'startdate'
WHEN 10 THEN 'enddate'
ELSE NULL
END
END
You're missing an END for one of your CASE expressions (the inner one, it appears, but I can't be sure from the context I have):
CASE WHEN R.RsID=10 THEN
CASE rh.rhrsid
WHEN 8 then 'startdate'
WHEN 10 THEN 'enddate'
ELSE NULL
END
END
Check it here: http://rextester.com/BHBZ30100
I've used this schema for this answer:
create table #reg (id int, other text);
create table #his (reg_id int, rs_id int, rdate datetime);
insert into #reg values (1, 'registration 1'), (2, 'registration 2');
insert into #his values (1, 1, '2016-11-01')
,(1, 5, '2016-11-05')
,(1, 8, '2016-11-08')
,(1, 9, '2016-11-09')
,(1,10, '2016-11-10')
,(2, 1, '2016-11-01')
,(2, 5, '2016-11-05');
If you only want the datediff for Registration records that have reached stage 10, you can filter the RegistrationStageHistory records directly on stage = 10, then use a subquery to find the date of stage 8 and compute the difference.
select
#reg.*, #his.*,
datediff( minute
,isnull((select h2.rdate
from #his h2
where h2.rs_id = 8 and h2.reg_id = #his.reg_id), #his.rdate)
,#his.rdate) as minutes
from #his
inner join #reg on #his.reg_id = #reg.id
where
#his.rs_id = 10
And this is the result (I've included all fields, since I don't know exactly what you need):
+----+----+----------------+--------+-------+---------------------+---------+
| | id | other | reg_id | rs_id | rdate | minutes |
+----+----+----------------+--------+-------+---------------------+---------+
| 1 | 1 | registration 1 | 1 | 10 | 10.11.2016 00:00:00 | 2880 |
+----+----+----------------+--------+-------+---------------------+---------+
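The same result can also be sketched as a self-join of the history table (one alias for stage 10, one for stage 8) instead of a correlated subquery. This is only an illustration under some assumptions: it runs against an in-memory SQLite database from Python, so the T-SQL DATEDIFF(minute, ...) call is replaced by SQLite's julianday() arithmetic, the temp tables become plain tables reg and his, and the aliases h8 and h10 are made up. Unlike the query above, registrations that reached stage 10 without a stage 8 row are dropped rather than reported as 0 minutes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reg (id INTEGER, other TEXT);
    CREATE TABLE his (reg_id INTEGER, rs_id INTEGER, rdate TEXT);
    INSERT INTO reg VALUES (1, 'registration 1'), (2, 'registration 2');
    INSERT INTO his VALUES
        (1, 1, '2016-11-01'), (1, 5, '2016-11-05'), (1, 8, '2016-11-08'),
        (1, 9, '2016-11-09'), (1, 10, '2016-11-10'),
        (2, 1, '2016-11-01'), (2, 5, '2016-11-05');
""")

# h10 = the stage 10 row, h8 = the stage 8 row of the same registration.
# julianday() returns days, so the difference times 1440 is minutes.
# In T-SQL the expression would be DATEDIFF(minute, h8.rdate, h10.rdate).
rows = conn.execute("""
    SELECT r.id,
           CAST(ROUND((julianday(h10.rdate) - julianday(h8.rdate)) * 1440)
                AS INTEGER) AS minutes
    FROM reg r
    INNER JOIN his h10 ON h10.reg_id = r.id AND h10.rs_id = 10
    INNER JOIN his h8  ON h8.reg_id  = r.id AND h8.rs_id  = 8
    ORDER BY r.id
""").fetchall()

print(rows)
```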

Transposing an sql result so that one column goes onto multiple columns

I'm trying to get data out of a table for a survey in a particular format. However, all my attempts seem to hang the DB because of too many joins / being too heavy on the DB.
My data looks like this:
id, user, question_id, answer_id
1, 1, 1, 1
3, 1, 3, 15
4, 2, 1, 2
5, 2, 2, 12
6, 2, 3, 20
There are roughly 250,000 rows and each user has about 30 rows. I want the result to look like:
user0, q1, q2, q3
1, 1, NULL, 15
2, 2, 12, 20
So that each user has one row in the result, each with a separate column for each answer.
I'm using Postgres but answers in any SQL language would be appreciated as I could translate to Postgres.
EDIT: I also need to be able to deal with users not answering questions, i.e. in the example above q2 for user 1.
Consider the following demo:
CREATE TEMP TABLE qa (id int, usr int, question_id int, answer_id int);
INSERT INTO qa VALUES
(1,1,1,1)
,(2,1,2,9)
,(3,1,3,15)
,(4,2,1,2)
,(5,2,2,12)
,(6,2,3,20);
SELECT *
FROM crosstab('
SELECT usr::text
,question_id
,answer_id
FROM qa
ORDER BY 1,2')
AS ct (
usr text
,q1 int
,q2 int
,q3 int);
Result:
usr | q1 | q2 | q3
-----+----+----+----
1 | 1 | 9 | 15
2 | 2 | 12 | 20
(2 rows)
user is a reserved word. Don't use it as column name! I renamed it to usr.
You need to install the additional module tablefunc which provides the function crosstab(). Note that this operation is strictly per database.
In PostgreSQL 9.1 or later you can simply run:
CREATE EXTENSION tablefunc;
For older versions you would run the SQL script supplied in your contrib directory. In Debian, for PostgreSQL 8.4, that would be:
psql mydb -f /usr/share/postgresql/8.4/contrib/tablefunc.sql
Erwin's answer is good until a missing answer for a user shows up. I'm going to make an assumption: you have a users table that has one row per user, and you have a questions table that has one row per question.
select usr, question_id
from users u inner join questions q on 1=1
order by 1, 2
This statement will create a row for every user/question, and be in the same order. Turn it into a subquery and left join it to your data...
select a.usr, a.question_id, qa.answer_id
from
(select usr, question_id
from users u inner join questions q on 1=1
)a
left join qa on qa.usr = a.usr and qa.question_id = a.question_id
order by 1,2
Plug that into Erwin's crosstab statement and give him credit for the answer :P
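A different technique that needs neither crosstab() nor extra users/questions tables is conditional aggregation: one MAX(CASE ...) column per question, grouped by user. Missing answers simply come out as NULL. The sketch below is an illustration under assumptions: it runs against an in-memory SQLite database from Python rather than Postgres, uses the data from the question (where user 1 has no answer for question 2), and builds the column list from a hard-coded question_ids list that you would replace with your own ~30 question ids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE qa (id INT, usr INT, question_id INT, answer_id INT);
    INSERT INTO qa VALUES
        (1, 1, 1, 1), (3, 1, 3, 15),
        (4, 2, 1, 2), (5, 2, 2, 12), (6, 2, 3, 20);
""")

# Hypothetical list of question ids to pivot into columns.
question_ids = [1, 2, 3]

# One column per question: CASE picks the answer for that question and
# yields NULL otherwise; MAX collapses the group to the single non-NULL value.
columns = ",\n       ".join(
    f"MAX(CASE WHEN question_id = {int(q)} THEN answer_id END) AS q{int(q)}"
    for q in question_ids
)
query = f"""
    SELECT usr,
       {columns}
    FROM qa
    GROUP BY usr
    ORDER BY usr
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The table is scanned once and grouped, with no joins involved.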
I implemented a truly dynamic function to handle this problem without having to hard-code any specific number of questions or use external modules/extensions. It is also much simpler to use than crosstab().
You can find it here: https://github.com/jumpstarter-io/colpivot
Example that solves this particular problem:
begin;
create temp table qa (id int, usr int, question_id int, answer_id int);
insert into qa values
(1,1,1,1)
,(2,1,2,9)
,(3,1,3,15)
,(4,2,1,2)
,(5,2,2,12)
,(6,2,3,20);
select colpivot('_output', $$
select usr, ('q' || question_id::text) question_id, answer_id from qa
$$, array['usr'], array['question_id'], '#.answer_id', null);
select * from _output;
rollback;
Result:
usr | 'q1' | 'q2' | 'q3'
-----+------+------+------
1 | 1 | 9 | 15
2 | 2 | 12 | 20
(2 rows)