Issue with SQL involving JOINS - sql

I have 2 tables with similar layout, involving INCOME and EXPENSES.
The id column is a customer ID.
I need a result of customer TOTAL AMOUNT, summing up income and expenses.
Table: Income
| id | amountIN|
+--------------+
| 1 | a |
| 2 | b |
| 3 | c |
| 4 | d |
Table: Expenses
| id | amountOUT|
+---------------+
| 1 | -x |
| 4 | -z |
My problem is that some customers only have expenses and others just income... so cannot know in advance id I need to do a LEFT or RIGHT JOIN.
In the example above an RIGHT JOIN could do the trick, but if the situation is inverted (more customers on the Expenses table) it doesn't work.
Expected Result
| id | TotalAmount|
+--------------+
| 1 | a - x |
| 2 | b |
| 3 | c |
| 4 | d - z |
Any help?

select id, SUM(Amount)
from
(
select id, amountin as Amount
from Income
union all
select id, amountout as Amount
from Expense
) a
group by id

I believe a full join will solve your problem.

I would approach this as a union. Do that in your subquery then sum on it.
For instance:
select id, sum(amt) from
(
select i.id, i.amountIN as amt from Income i
union all
select e.id, e.amountOUT as amt from Expenses e
)
group by id

You should really have another table like client :
Table: Client
| id |
+----+
| 1 |
| 2 |
| 3 |
| 4 |
So you could do something like that
SELECT Client.ID, COALESCE(Income.AmountIN, 0) - COALESCE(Expenses.AmountOUT, 0)
FROM Client c
LEFT JOIN Income i ON i.ID = c.ID
LEFT JOIN Expense e ON e.ID = c.ID
Will be less complicated and i'm sure it will come handy another time :)

Related

Postgresql left join

I have two tables cars and usage. I create a record in usage once a month for some of cars.
Now I want to get distinct list of cars with their latest usage that I saved.
first of all look at the tables please
cars:
| id | model | reseller_id |
|----|-------------|-------------|
| 1 | Samand Sall | 324228 |
| 2 | Saba 141 | 92933 |
usages:
| id | car_id | year | month | gas |
|----|--------|------|-------|-----|
| 1 | 2 | 2020 | 2 | 68 |
| 2 | 2 | 2020 | 3 | 94 |
| 3 | 2 | 2020 | 4 | 33 |
| 4 | 2 | 2020 | 5 | 12 |
The problem is here
I need only the latest usage of year and month
I tried a lot of ways but none of them is good enough. because sometimes this query gets me one ofnot latest records of usages.
SELECT * FROM cars AS c
LEFT JOIN
(select *
from usages
) u on (c.id = u.car_id)
order by u.gas desc
You can do this with a DISTINCT ON in the derived table:
SELECT *
FROM cars AS c
LEFT JOIN (
select distinct on (u.car_id) *
from usages u
order by u.car_id, u.year desc, u.month desc
) lu on c.id = lu.car_id
order by u.gas desc;
I think you need window function row_number. Here is the demo.
select
id,
model,
reseller_id
from
(
select
c.id,
model,
reseller_id,
row_number() over (partition by u.car_id order by u.id desc) as rn
from cars c
left join usages u
on c.id = u.car_id
) subq
where rn = 1

What's the best way to create empty/default rows for missing aggregations

I have a table I want to group over two levels. As the output, I need all the grouping value combinations, such that I end up with zeros where non existant combinations occur. For example, say I have this table:
+------+------+
| user | page |
+------+------+
| a | 1 |
| a | 1 |
| a | 2 |
| b | 2 |
| b | 3 |
+------+------+
I'm after output like this:
+------+------+--------+
| user | page | visits |
+------+------+--------+
| a | 1 | 2 |
| a | 2 | 1 |
| a | 3 | 0 |
| b | 1 | 0 |
| b | 2 | 1 |
| b | 3 | 1 |
+------+------+--------+
I can achieve this with the following query, but it seems rather heavy handed:
WITH
users AS (SELECT distinct(user) FROM sometable),
pages AS (SELECT distinct(page) FROM sometable),
users_pages_empty AS (SELECT * FROM users CROSS JOIN pages),
users_pages_full AS (SELECT user, page, count(*) as visits FROM sometable GROUP BY user, page)
SELECT e.user, e.page, coalesce(f.visits, 0) as visits
FROM users_pages_empty e
LEFT JOIN users_pages_full f ON e.user=f.user AND e.page=f.page
I happen to be using AWS Athena, but I think this is more a generic SQL question than an Athena question.
The performance of this query is fine, it's more the readability/complexity I'm not happy with.
Use a cross join to generate the rows and a left join to bring in the existing rows and aggregate:
select u.user, p.page, count(s.user)
from (select distinct user from sometable) u cross join
(select distinct page from sometable) p left join
sometable s
on s.user = u.user and s.page = p.page
group by u.user, p.page
order by u.user, p.page;

Getting a distinct value from one column if all rows matches a certain criteria

I'm trying to find a performant and easy-to-read query to get a distinct value from one column, if all rows in the table matches a certain criteria.
I have a table that tracks e-commerce orders and whether they're delivered on time, contents and schema as following:
> select * from orders;
+----+--------------------+-------------+
| id | delivered_on_time | customer_id |
+----+--------------------+-------------+
| 1 | 1 | 9 |
| 2 | 0 | 9 |
| 3 | 1 | 10 |
| 4 | 1 | 10 |
| 5 | 0 | 11 |
+----+--------------------+-------------+
I would like to get all distinct customer_id's which have had all their orders delivered on time. I.e. I would like an output like this:
+-------------+
| customer_id |
+-------------+
| 10 |
+-------------+
What's the best way to do this?
I've found a solution, but it's a bit hard to read and I doubt it's the most efficient way to do it (using double CTE's):
> with hits_all as (
select memberid,count(*) as count from orders group by memberid
),
hits_true as
(select memberid,count(*) as count from orders where hit = true group by memberid)
select
*
from
hits_true
inner join
hits_all on
hits_all.memberid = hits_true.memberid
and hits_all.count = hits_true.count;
+----------+-------+----------+-------+
| memberid | count | memberid | count |
+----------+-------+----------+-------+
| 10 | 2 | 10 | 2 |
+----------+-------+----------+-------+
You use group by and having as follows:
select customer_id
from orders
group by customer_id
having sum(delivered_on_time) = count(*)
This works because an ontime delivery is identified by delivered_on_time = 1. So you can just ensure that the sum of delivered_on_time is equal to the number of records for the customer.
You can use aggregation and having:
select customer_id
from orders
group by customer_id
having min(delivered_on_time) = max(delivered_on_time);

SQL - finding the value between rows

I have 2 tables:
Employee table with 2 columns Name and Sales
Rewards table with 2 columns Bonus and Range
Sample data:
Employee Rewards
| Name | Sales | | Bonus | Range |
+------+-------+ +-------+-------+
| John | 112 | | 2 | 200 |
| Mary | 201 | | 3 | 300 |
| Joe | 400 | | 5 | 500 |
| Jack | 300 |
Each employee deserves bonus from the Rewords table if his sales <= Rewards.Range.
I want to select Employee.Name and Rewards.Bonus.
In this case the result should be:
| Name | Bonus |
+------+-------+
| John | 2 |
| Mary | 3 |
| Joe | 5 |
| Jack | 3 |
Any idea what this SQL query will be?
Thanks,
zb
I suggest this approach. There might be a syntax error or two.
select name
, (select bonus from rewards
where range =
(select min(range)
from rewards
where range >= sales)
)
from employee
I just tested this one out and it retrieved the right results with your test tables:
SELECT a.name,
MIN(b.bonus) AS bonus
FROM db.employee a
INNER JOIN db.rewards b
ON a.sales <= b.range
GROUP BY a.name;
I'd use lag to get the bottom part part of each range and join it on the employee's table:
SELECT name, bonus
FROM employee e
JOIN (SELECT bonus,
range AS top_range,
COALESCE(LAG(range) OVER (ORDER BY bonus ASC), 0) AS bottom_range
FROM rewards) r ON e.sales BETWEEN r.bottom_range AND r.top_range
;with OrderedBonuses as (
select e.name, r.bonus, row_number() over (partition by e.name order by r.Range desc) as ord
from Employee e
JOIN Rewards r on e.Sales <= r.Range
)
select name, bonus
from OrderedBonuses
where ord = 1;

how to bake in a record count in a sql query

I have a query that looks like this:
select id, extension, count(distinct(id)) from publicids group by id,extension;
This is what the results looks like:
id | extension | count
-------------+-------------------------+-------
18459154909 | 12333 | 1
18459154909 | 9891114 | 1
18459154919 | 43244 | 1
18459154919 | 8776232 | 1
18766145025 | 12311 | 1
18766145025 | 1122111 | 1
18766145201 | 12422 | 1
18766145201 | 14141 | 1
But what I really want is for the results to look like this:
id | extension | count
-------------+-------------------------+-------
18459154909 | 12333 | 2
18459154909 | 9891114 | 2
18459154919 | 43244 | 2
18459154919 | 8776232 | 2
18766145025 | 12311 | 2
18766145025 | 1122111 | 2
18766145201 | 12422 | 2
18766145201 | 14141 | 2
I'm trying to get the count field to show the total number of records that have the same id.
Any suggestions would be appreciated
I think you want to count distincts extentions, not ids.
Run this query:
select id
, extension
(select count(*) from publicids p1 where p.id = p1.id ) distinct_id_count
from publicids p
group by id,extension;
This is more or less the same as Pastor's answer. Depending on what the optimizer does it might be faster with higher record count source tables.
select p.id, p.extension, p2.id_count
from publicids p
inner join (
select id, count(*) as id_count
from publicids group by id
) as p2 on p.id = p2.id