I am working on a big (not real) task to manage the expenses of several countries. I have already calculated the investment capacity of every town, and now I need to calculate the budget to build these spaceships. The task is as follows:
We have the tables below (there are also tables Town and Spaceship, but the task is clear without them). We need to calculate how much money is needed to complete each type of ship available for production. We have different types of spaceships, and each type needs different types of parts (see table Spaceship_required_part). Several types of parts are produced in every town (see table Spaceship_part_in_town). We need to calculate the cost (see cost in Spaceship_part, stage in Spaceship_part_in_town, and amount in Spaceship_required_part) to build a unit of every available type of spaceship. By available we mean that all the needed parts can be found in the given town. We calculate the budget for a single town (I can do it for the rest of them by myself).
create table Spaceship_part(
id int PRIMARY KEY,
name text,
cost int
);
create table Spaceship_part_in_town(
id int PRIMARY KEY,
spaceship_part_id int references Spaceship_part,
city_id int references Town,
stage float -- the percentage of completion of the part
);
create table Spaceship_required_part(
id int PRIMARY KEY,
spaceship_part int references Spaceship_part,
spaceship int references Spaceship,
amount int -- amount of a particular part needed for the given spaceship
);
I understand how I would solve this task using a programming language, but my SQL skills are not that good. I understand that first I need to check which spaceships we can build using the parts available in the town. This can be done by comparing the number of needed parts (amount) against the parts available in town (count(spaceship_part_id)). Then I need to calculate the sum needed to build every spaceship using the formula (100 - stage) * cost / 100 (e.g. a part with cost 200 that is 70% complete needs (100 - 70) * 200 / 100 = 60 more).
However, I have no idea how to compose this in SQL. I am writing in PostgreSQL.
To build a spaceship with the least build cost, we can:
Step 1. Calculate a part's build_cost = (100 - stage) * cost / 100; for each part, rank the sources by stage so the cheapest ones come first and we minimize the total cost for a spaceship.
Step 2. Based on build_cost, we calculate the running total_cost of a part by quantity taken (in order to compare with spaceship_required_part.amount) and record where the parts come from in part_sources, a CSV-style list of (city_id, stage, build_cost) tuples.
Step 3. Once we have the available parts with their running quantity and cost, we join with spaceship_required_part to get a result like this:
spaceship_id|spaceship_part_id|amount|total_cost|part_sources |
------------+-----------------+------+----------+---------------------+
1| 1| 2| 50.0|(4,80,20),(3,70,30) |
1| 2| 1| 120.0|(1,40,120) |
2| 2| 2| 260.0|(1,40,120),(2,30,140)|
2| 3| 1| 180.0|(2,40,180) |
3| 3| 2| 360.0|(2,40,180),(4,40,180)|
The above tells us that to build:
spaceship#1, we need part#1 x 2 sourced from city#4 and city#3, and part#2 x 1 from city#1; total cost = 50 + 120 = 170, or
spaceship#2, we need part#2 x 2 sourced from city#1 and city#2, and part#3 x 1 from city#2; total cost = 260 + 180 = 440, or
spaceship#3, we need part#3 x 2 from city#2 and city#4; total cost = 360.
After the 1st iteration, we can update spaceship_part_in_town and remove the 1st spaceship from spaceship_required_part, then run the query again to get the 2nd spaceship to build and its part sources.
with cte_part_sources as (
    -- completion cost of each part in each town, cheapest sources first
    -- (a higher stage means less work left, hence a lower build_cost)
    select spt.spaceship_part_id,
           spt.city_id,
           sp.cost,
           spt.stage,
           (100.0 - spt.stage) * sp.cost / 100.0 as build_cost,
           row_number() over (partition by spt.spaceship_part_id
                              order by spt.stage desc) as cost_rank
    from spaceship_part_in_town spt
    join spaceship_part sp
      on spt.spaceship_part_id = sp.id),
cte_parts as (
    -- running totals: the cheapest cost_rank sources of a part give
    -- total_qty units at total_cost, coming from part_sources
    select spaceship_part_id,
           city_id,
           cost_rank,
           cost,
           stage,
           build_cost,
           cost_rank as total_qty,
           sum(build_cost) over (partition by spaceship_part_id
                                 order by cost_rank) as total_cost,
           string_agg('(' || city_id || ',' || stage || ',' || build_cost || ')', ',')
               over (partition by spaceship_part_id
                     order by cost_rank) as part_sources
    from cte_part_sources)
select srp.spaceship as spaceship_id,           -- column names per the DDL above
       srp.spaceship_part as spaceship_part_id,
       srp.amount,
       p.total_cost,
       p.part_sources
from spaceship_required_part srp
left join cte_parts p
  on srp.spaceship_part = p.spaceship_part_id
 and srp.amount = p.total_qty;
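If you also need one budget figure per spaceship (the stated goal), a minimal sketch: append the CTE below to the WITH list in place of the final SELECT above (cte_budget is a name introduced here). A NULL total_cost means no town supplies enough of that part, so the HAVING clause keeps only ships whose every required part is available.
, cte_budget as (
    -- per-part completion cost for each required part of each ship
    select srp.spaceship as spaceship_id,
           p.total_cost
    from spaceship_required_part srp
    left join cte_parts p
      on srp.spaceship_part = p.spaceship_part_id
     and srp.amount = p.total_qty)
select spaceship_id,
       sum(total_cost) as budget
from cte_budget
group by spaceship_id
having count(*) = count(total_cost) -- every required part can be sourced
order by budget;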
I have a problem that I know how to solve (more or less) using a regular programming language, in a non-optimal but good enough way.
I want to get a list of groups of points that are within a certain range of each other inside each group, but where no group overlaps another.
For example, the points in group A are at a distance of 1 or less from each other, and the same holds for the points in group B, but every point in A is at least at distance 1.1 from every point in group B.
The way I would do this in a programming language (in a non-optimal way, as I said) would be to pick any point and find all points that are within a range of 1 or less (call it group A), then pick a point that is not in that group and find all points that are not in group A and are at a distance of 1 or less (call it group B), and loop again, now taking groups A and B into account.
It's also worth mentioning that some points will have a flag marking them as processed (previously grouped and saved); they should be ignored. Hopefully this may speed up the query when several groups already exist.
I'm not sure if this is a task that can be accomplished within SQL in a single query, or if I would be better off extracting the data from the database and making another query with the new parameters.
My points are multi-dimensional (vectors of 128 components), but for simplicity these are some example ones:
id |x |y |z |
---|---------------------|----------------------|----------------------|
1| -0.03909766674041748| 0.03122374415397644| 0.02698654681444168|
2| -0.09763473272323608| 0.04069424420595169| 0.11512677371501923|
3|-0.040237002074718475| 0.0678766518831253| 0.03919816389679909|
4| -0.10432711988687515| 0.07187126576900482| 0.10971983522176743|
5| -0.1513511687517166| 0.07631294429302216| 0.05949840694665909|
6| -0.1276567280292511| 0.11543292552232742| 0.06757785379886627|
A query that I often use to find the closest points (very simplified) is this:
SELECT id, sqrt(
power(X - -0.10434776544570923, 2) +
power(Y - 0.08688679337501526, 2)
) AS distance
FROM points
HAVING distance < 0.5
ORDER BY distance ASC
I'm not sure how the output data would look in tabular form, but ideally I want something like this:
ref_point | point_ids
----------|-------------
2 | 12, 15, 16, 255, 85
6 | 8, 12, 55, 44
Where ref_point is a point id that is at a distance < 1 from all the points in its group.
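To illustrate just the first iteration of the loop described above (not the full grouping), a PostgreSQL-style sketch could pick one unprocessed point as the reference and collect everything within distance 1 of it; the processed flag column name and the 2-D distance are assumptions for brevity (the real vectors have 128 components):
-- Pick one unprocessed point and list every unprocessed point
-- within distance 1 of it (the reference includes itself).
with ref as (
    select id, x, y
    from points
    where not processed
    order by id
    limit 1
)
select ref.id as ref_point,
       string_agg(p.id::text, ', ' order by p.id) as point_ids
from ref
join points p
  on not p.processed
 and sqrt(power(p.x - ref.x, 2) + power(p.y - ref.y, 2)) <= 1
group by ref.id;
Each pass would then mark the returned ids as processed and repeat, mirroring the loop you describe; whether to iterate in SQL (e.g. a PL/pgSQL loop) or in application code is the open question.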
I'm just learning how to manipulate strings within SQL tables and am now trying to combine string manipulation with column value calculations. My problem requires that I cut a serial number, denoted by "xx-yyyyyyy", down to its first two characters (without the hyphen) and then add together the cost values that relate to these new serial values. However, when I add the cost values together, I get an incorrect result because costs are not combined per serial value (there are duplicate serial values in my output table). My question is: how do I write my code so that I have no duplicate serial values in my output and all values (excluding NULLs) are added together?
The example table I am working with looks like this:
____Serial____|____Cost____
1| xx-yyyyyy | $aaa.bb
2| xx-yyyyyy | $aaa.bb
3| ... | ...
Here is the code I have tried so far:
SELECT left(Serial, CHARINDEX('-', Serial)-1) AS NewSerial, sum(cost) AS TotalCost
FROM table
WHERE CHARINDEX('-', serial) > 0
GROUP BY Serial
ORDER BY TotalCost DESC
The results did add cost values together, but left duplicate NewSerial values (which I assume is due to the GROUP BY clause).
Output (From my code):
_|___NewSerial____|____TotalCost____
1| ab | $abc.de
2| cd | $abc.de
3| ab | $abc.de
4| ef | $abc.de
5| cd | $abc.de
How can I fix this so that the NewSerial values are added together rather than staying separate as in my output?
You need to repeat the expression in the GROUP BY:
SELECT left(Serial, CHARINDEX('-', Serial)-1) AS NewSerial, sum(cost) AS TotalCost
FROM table
WHERE CHARINDEX('-', serial) > 0
GROUP BY left(Serial, CHARINDEX('-', Serial)-1)
ORDER BY TotalCost DESC
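If you'd rather not repeat the expression, a common alternative (same logic, sketched here) is to compute it once in a CROSS APPLY so the alias is visible to the GROUP BY:
SELECT v.NewSerial, sum(t.cost) AS TotalCost
FROM table t
CROSS APPLY (SELECT left(t.Serial, CHARINDEX('-', t.Serial) - 1) AS NewSerial) v
WHERE CHARINDEX('-', t.serial) > 0
GROUP BY v.NewSerial
ORDER BY TotalCost DESC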
I'm leasing a car, which I use myself but also rent out for other people to use. I have 2000 km I can drive each month, so I'm trying to make an area pivot chart that tracks how much I use it vs. how much it's rented out.
I have a table consisting of the rented mileage and my own mileage:
___________________________________
|Date |Rented mileage|Own mileage|
|23/03-18| 315| 117|
|07-04-18| 255| 888|
|07/04-18| 349| 0|
|13/04-18| 114| 0|
|21/04-18| 246| 113|
|28/04-18| 1253| 0|
|01/05-18| 1253| 0|
So far I have two measures:
RentedMileage:=SUM(Table1[Rented Mileage])
OwnMileage:=SUM(Table1[Own Mileage])
Which, when I plot it on the pivot chart, looks like this:
I would like the mileage to be aggregated cumulatively, with a line that shows when I'm exceeding my 2000 km limit, so it would look something like this:
But I can't for the life of me figure out how to compute an aggregated value of my table.
The issue was solved by adding the following cumulative-total measure (swap in your own table and column names):
Cumulative Quantity :=
CALCULATE (
    SUM ( Transactions[Quantity] ),
    FILTER (
        ALL ( 'Date'[Date] ),  // ignore the current date filter on the date table...
        'Date'[Date] <= MAX ( 'Date'[Date] )  // ...and keep all dates up to the current one
    )
)
I have time periods spent in different units per user in a table. The time periods overlap and I would like to fix that. I have:
user|unit|start_time|end_time
1| 1|2015-01-01|2015-01-31
1| 2|2015-01-07|2015-01-14
2| 1|2015-01-09|2015-01-13
2| 2|2015-01-10|2015-01-15
i.e. user 1 started at unit 1 on 2015-01-01, transferred to unit 2 on 2015-01-07, returned to unit 1 on 2015-01-14, and left unit 1 on 2015-01-31. The user can't be in two places at once, so the table should look more like this:
user|unit|start_time|end_time
1| 1|2015-01-01|2015-01-07 --fixed end_time
1| 2|2015-01-07|2015-01-14
1| 1|2015-01-14|2015-01-31 --newly created line
2| 1|2015-01-09|2015-01-10 --fixed end_time
2| 2|2015-01-10|2015-01-15
Here is some SQL to create the test table with some entries.
CREATE TABLE users_n_units
(
users character varying (100),
units character varying (100),
start_time date,
end_time date
);
INSERT INTO users_n_units (users,units,start_time,end_time)
VALUES ('1','1','2015-01-01','2015-01-31'),
('1','2','2015-01-07','2015-01-14'),
('2','1','2015-01-09','2015-01-13'),
('2','2','2015-01-10','2015-01-15');
You don’t really give enough information to fully answer this, and as others have pointed out you may end up with special cases, so you should analyze carefully what your data looks like before running updates.
But in your test environment you can try something like the statement below. The trick is to join your table to itself with clauses that restrict it to the rows that match your business logic, and then update it.
This statement works on your tiny sample set: it simply runs through and mechanically sets end times to the following time period’s start times. I have used something very similar on similar problems before, so I know the mechanism should work for you.
CAUTION: not tested on anything other than this small set. Don’t run on production data!
UPDATE a SET a.end_time = b.start_time
FROM users_n_units a
INNER JOIN users_n_units b ON a.users = b.users AND a.units < b.units
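If you also need the "newly created line" from your expected output, here is a PostgreSQL-flavoured sketch (checked only against this four-row sample, and assuming each interrupting stay is wholly contained in a single outer stay): first re-create the tail of each containing interval, then trim the end times.
-- 1) Re-create the tail of each containing interval, e.g. user 1 back
--    in unit 1 from 2015-01-14 to 2015-01-31.
INSERT INTO users_n_units (users, units, start_time, end_time)
SELECT a.users, a.units, b.end_time, a.end_time
FROM users_n_units a
JOIN users_n_units b
  ON a.users = b.users
 AND a.units <> b.units
 AND b.start_time > a.start_time
 AND b.end_time < a.end_time;

-- 2) Trim each interval's end to the start of the interval that
--    interrupts it.
UPDATE users_n_units a
SET end_time = b.start_time
FROM users_n_units b
WHERE a.users = b.users
  AND a.units <> b.units
  AND b.start_time > a.start_time
  AND b.start_time < a.end_time;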
I have a static dataset that correlates a range of numbers to some metadata, e.g.
+--------+--------+-------+--------+----------------+
| Min | Max |Country|CardType| Issuing Bank |
+--------+--------+-------+--------+----------------+
| 400011 | 400051 | USA |VISA | Bank of America|
+--------+--------+-------+--------+----------------+
| 400052 | 400062 | UK |MAESTRO | HSBC |
+--------+--------+-------+--------+----------------+
I wish to look up the data for some arbitrary single value:
SELECT *
FROM SomeTable
WHERE Min <= 400030
AND Max >= 400030
I have about 200k of these range mappings and am wondering what the best table structure for SQL Server is.
A composite key doesn't seem right, since most of the time the value being looked up will fall between the two range values stored on disk. Similarly, indexing only the first column doesn't seem selective enough.
I know that 200k rows is fairly insignificant, and I could get by with doing not much, but let's assume that the number of rows could be orders of magnitude greater.
If you usually search on both min and max, then a compound key on (min, max) is appropriate. The engine will find all rows where min is less than X, then search within those results to find the rows where max is greater than Y.
The index would also be useful if you do searches on min only, but would not be applicable if you do searches only on max.
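For concreteness, a sketch of that index, using the column names from the question:
-- Compound index: seek on Min, then filter Max inside the matching slice.
CREATE INDEX IX_SomeTable_Min_Max ON SomeTable (Min, Max);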
You can index the first number and then do the lookup like this (note the descending sort, so you get the closest firstnum at or below the value):
select t.*,
(select top 1 s.country
from static s
where t.num >= s.firstnum
order by s.firstnum desc
) country
from sometable t;
Or use outer apply:
select t.*, s.country
from sometable t outer apply
(select top 1 s.country
from static s
where t.num >= s.firstnum
order by s.firstnum desc
) s
This should take advantage of an index on static(firstnum) or static(firstnum, country). This does not check against the second number. If that is important, use outer apply and do the check outside the subquery.
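A sketch of that last suggestion, with the range's upper bound (here assumed to be called lastnum) checked outside the subquery:
select t.*,
       case when t.num <= s.lastnum then s.country end as country
from sometable t outer apply
     (select top 1 s.country, s.lastnum
      from static s
      where t.num >= s.firstnum
      order by s.firstnum desc
     ) s;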
I would specify the primary key on (Min,Max). Queries are as simple as:
SELECT *
FROM SomeTable
WHERE @Value BETWEEN Min AND Max
I'd also define a constraint to enforce that Min <= Max. Then I would create a trigger to enforce uniqueness in ranges and prevent the database from storing an overlapping range.
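A sketch of those two pieces in T-SQL (constraint and trigger names invented here; the self-match exclusion relies on (Min, Max) being the primary key):
-- Guarantee well-formed ranges.
ALTER TABLE SomeTable ADD CONSTRAINT CK_SomeTable_MinMax CHECK (Min <= Max);
GO
-- Reject inserts/updates that overlap an existing range.
CREATE TRIGGER TR_SomeTable_NoOverlap ON SomeTable
AFTER INSERT, UPDATE
AS
BEGIN
    IF EXISTS (
        SELECT 1
        FROM inserted i
        JOIN SomeTable t
          ON t.Min <= i.Max
         AND t.Max >= i.Min
         -- skip the row compared against itself (PK is (Min, Max))
         AND NOT (t.Min = i.Min AND t.Max = i.Max)
    )
    BEGIN
        RAISERROR ('Overlapping range rejected.', 16, 1);
        ROLLBACK TRANSACTION;
    END
END;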
I believe it is easier/faster if you create a trigger for INSERT and fill in the related calculated columns: country, issuing bank, card-number length.
That way you do the calculation only once per row, instead of 200k comparisons every time you run a query. Of course there is a space cost, but queries will be much easier to maintain.
I remember once I had to calculate some sines and cosines to compute distances, so I just created the calculated columns once.
After your update I think it is even easier:
+--------+--------+-------+--------+----------------+----------+
| Min | Max |Country|CardType| Issuing Bank | TypeID |
+--------+--------+-------+--------+----------------+----------+
| 400011 | 400051 | USA |VISA | Bank of America| 1 |
+--------+--------+-------+--------+----------------+----------+
| 400052 | 400062 | UK |MAESTRO | HSBC | 2 |
+--------+--------+-------+--------+----------------+----------+
Then your Card table will also get a TypeID column, filled in the same way; a sketch follows.
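A sketch of what that trigger could look like; every name here (Card, CardID, CardNumberPrefix, Ranges) is a guess, since the question doesn't show the Card table:
-- Resolve TypeID once at insert time instead of range-scanning
-- 200k rows on every query.
CREATE TRIGGER TR_Card_SetTypeID ON Card
AFTER INSERT
AS
BEGIN
    UPDATE c
    SET c.TypeID = r.TypeID
    FROM Card c
    JOIN inserted i ON i.CardID = c.CardID
    JOIN Ranges r
      ON i.CardNumberPrefix BETWEEN r.Min AND r.Max;
END;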