Adding values together based upon creation of a formatted string - sql

I'm just learning how to manipulate strings in SQL tables and am now trying to combine string manipulation with column value calculations. My task is to truncate a serial number of the form "xx-yyyyyyy" to its first two characters (the part before the hyphen) and then sum the cost values that belong to each truncated serial. However, when I add the cost values together, I get an incorrect result because rows with the same truncated serial are not combined (my output table contains duplicate serial values). How should I write the query so that the output has no duplicate serial values and all costs (excluding NULLs) are added together?
Example table that I am working with is like so:
____Serial____|____Cost____
1| xx-yyyyyy | $aaa.bb
2| xx-yyyyyy | $aaa.bb
3| ... | ...
Here is my code that I have currently tried:
SELECT left(Serial, CHARINDEX('-', Serial)-1) AS NewSerial, sum(cost) AS TotalCost
FROM table
WHERE CHARINDEX('-', serial) > 0
GROUP BY Serial
ORDER BY TotalCost DESC
The results did add together cost values, but it did leave duplicate NewSerial values (which I assume is due to the GROUP BY clause).
Output (From my code):
_|___NewSerial____|____TotalCost____
1| ab | $abc.de
2| cd | $abc.de
3| ab | $abc.de
4| ef | $abc.de
5| cd | $abc.de
How can I go about fixing/solving this issue within this area so that the NewSerial values all add together rather than stay separate like in my output?

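You need to repeat the expression in the GROUP BY. Grouping by Serial groups on the full serial number, so every distinct full serial gets its own row even when the truncated prefix is the same: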
SELECT left(Serial, CHARINDEX('-', Serial)-1) AS NewSerial, sum(cost) AS TotalCost
FROM table
WHERE CHARINDEX('-', serial) > 0
GROUP BY left(Serial, CHARINDEX('-', Serial)-1)
ORDER BY TotalCost DESC
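A minimal sketch of the same idea that can be run locally with Python's built-in sqlite3 module. SQLite has no LEFT or CHARINDEX, so substr and instr stand in for them here; the table name parts and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (Serial TEXT, Cost REAL)")
conn.executemany(
    "INSERT INTO parts VALUES (?, ?)",
    [
        ("ab-0000001", 10.0),
        ("cd-0000002", 5.0),
        ("ab-0000003", 2.5),
        ("ef-0000004", 1.0),
        ("cd-0000005", None),   # NULL cost is ignored by sum()
        ("nohyphen", 99.0),     # filtered out by the WHERE clause
    ],
)

# substr/instr are the SQLite stand-ins for LEFT/CHARINDEX.
# The grouping expression is the same as the selected expression.
query = """
SELECT substr(Serial, 1, instr(Serial, '-') - 1) AS NewSerial,
       sum(Cost) AS TotalCost
FROM parts
WHERE instr(Serial, '-') > 0
GROUP BY substr(Serial, 1, instr(Serial, '-') - 1)
ORDER BY TotalCost DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each prefix appears once in the result, with the costs of all matching rows added together.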

Related

SQL query to calculate expenses based on several tables

I am working on a big (not real) task to manage the expenses of several countries. I have already calculated the capacities of every town in investments, now I need to calculate the budget to build these spaceships. The task is as follows:
We have the tables below (there are tables Town and Spaceship, but the task is clear without them here). We need to calculate how much money is needed to complete each type of ship available for production. So, we have different types of spaceships and each type needs different types of parts (see table Spaceship_required_part). In every town there are produced several types of parts (see table Spaceship_part_in_town). We need to calculate, what is the cost (see cost in Spaceship_part, stage in Spaceship_part_in_town, and amount in Spaceship_required_part) to build a unit of every available type of spaceship. By available we mean that the parts needed can be found in the given city. We calculate the budget for a given city (I can do it for the rest of them by myself).
create table Spaceship_part(
id int PRIMARY KEY,
name text,
cost int
);
create table Spaceship_part_in_town(
id int PRIMARY KEY,
spaceship_part_id int references Spaceship_part,
city_id int references Town,
stage float -- the percentage of completion of the part
);
create table Spaceship_required_part(
id int PRIMARY KEY,
spaceship_part int references Spaceship_part,
spaceship int references Spaceship,
amount int -- amount of a particular part needed for the given spaceship
);
I understand how I would solve this task using a programming language, but my SQL skills are not that good. I understand that first I need to check which spaceships we can build using the available parts in the town. This can be done by comparing a count of the needed parts (amount) with the available parts in town (count(spaceship_part_id)). Then I need to calculate the sum needed to build every spaceship using the formula (100-stage)*cost/100.
However, I have no idea how to compose this in SQL code. I am writing in PostgreSQL.
The data model is like:
To build a spaceship with least build cost, we can:
Step 1. Calculate a part's build_cost = (100 - stage) * cost / 100; for each part, rank the build cost based on stage so we minimize total cost for a spaceship.
Step 2. Based on build_cost, we calculate the running total_cost of each part for every quantity (in order to compare with spaceship_required_part.amount) and record where the parts come from in part_sources, a comma-separated list of (city_id, stage, build_cost) tuples.
Step 3. Once we have the available parts with their total quantity and cost calculated, we join them with spaceship_required_part to get a result like this:
spaceship_id|spaceship_part_id|amount|total_cost|part_sources |
------------+-----------------+------+----------+---------------------+
1| 1| 2| 50.0|(4,80,20),(3,70,30) |
1| 2| 1| 120.0|(1,40,120) |
2| 2| 2| 260.0|(1,40,120),(2,30,140)|
2| 3| 1| 180.0|(2,40,180) |
3| 3| 2| 360.0|(2,40,180),(4,40,180)|
The above tells us that to build:
spaceship#1, we need part#1 x 2 sourced from city#4 and city#3; part#2 x 1 from city#1; total cost = 50 + 120 = 170, or
spaceship#2, we need part#2 x 2 sourced from city#1 and city#2; part#3 x 1 from city#2; total cost = 260 + 180 = 440, or
spaceship#3, we need part#3 x 2 from city#2 and city#4; total cost = 360.
After 1st iteration, we can update spaceship_part_in_town and remove the 1st spaceship from spaceship_required_part, then run the query again to get the 2nd spaceship to build and its part sources.
with cte_part_sources as (
    select spt.spaceship_part_id,
           spt.city_id,
           sp.cost,
           spt.stage,
           (100.0-spt.stage)*sp.cost/100.0 as build_cost,
           row_number() over (partition by spt.spaceship_part_id order by spt.stage desc) as cost_rank
      from spaceship_part_in_town spt
      join spaceship_part sp
        on spt.spaceship_part_id = sp.id),
cte_parts as (
    select spaceship_part_id,
           city_id,
           cost_rank,
           cost,
           stage,
           build_cost,
           cost_rank as total_qty,
           sum(build_cost) over (partition by spaceship_part_id order by cost_rank) as total_cost,
           string_agg('(' || city_id || ',' || stage || ',' || build_cost || ')',',') over (partition by spaceship_part_id order by cost_rank) as part_sources
      from cte_part_sources)
select srp.spaceship_id,
       srp.spaceship_part_id,
       srp.amount,
       p.total_cost,
       p.part_sources
  from spaceship_required_part srp
  left join cte_parts p
    on srp.spaceship_part_id = p.spaceship_part_id
   and srp.amount = p.total_qty;
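As a rough check of the approach, here is a sketch that runs a trimmed-down version of the query in SQLite (3.25 or later, for window functions) through Python's sqlite3 module. The sample rows are assumptions reverse-engineered from the result table above, the column names spaceship_id and spaceship_part_id follow the query rather than the DDL in the question, and the part_sources column is left out. The final HAVING clause is an addition that keeps only ships whose every required part is available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE spaceship_part (id INTEGER PRIMARY KEY, name TEXT, cost INTEGER);
CREATE TABLE spaceship_part_in_town (
    id INTEGER PRIMARY KEY, spaceship_part_id INTEGER, city_id INTEGER, stage REAL);
CREATE TABLE spaceship_required_part (
    id INTEGER PRIMARY KEY, spaceship_part_id INTEGER, spaceship_id INTEGER, amount INTEGER);

-- Hypothetical data chosen to reproduce the result table in the answer.
INSERT INTO spaceship_part VALUES (1, 'p1', 100), (2, 'p2', 200), (3, 'p3', 300);
INSERT INTO spaceship_part_in_town VALUES
    (1, 1, 4, 80), (2, 1, 3, 70), (3, 2, 1, 40),
    (4, 2, 2, 30), (5, 3, 2, 40), (6, 3, 4, 40);
INSERT INTO spaceship_required_part VALUES
    (1, 1, 1, 2), (2, 2, 1, 1), (3, 2, 2, 2), (4, 3, 2, 1), (5, 3, 3, 2);
""")

query = """
WITH ranked AS (
    -- Step 1: build cost per available part, most complete part first.
    SELECT spt.spaceship_part_id,
           (100.0 - spt.stage) * sp.cost / 100.0 AS build_cost,
           row_number() OVER (PARTITION BY spt.spaceship_part_id
                              ORDER BY spt.stage DESC) AS cost_rank
    FROM spaceship_part_in_town spt
    JOIN spaceship_part sp ON spt.spaceship_part_id = sp.id
),
running AS (
    -- Step 2: running cost of taking the cheapest 1, 2, ... units of a part.
    SELECT spaceship_part_id,
           cost_rank AS total_qty,
           sum(build_cost) OVER (PARTITION BY spaceship_part_id
                                 ORDER BY cost_rank) AS total_cost
    FROM ranked
)
-- Step 3: match the required amount, then add up the cost per spaceship.
SELECT srp.spaceship_id, sum(r.total_cost) AS ship_cost
FROM spaceship_required_part srp
LEFT JOIN running r
  ON srp.spaceship_part_id = r.spaceship_part_id
 AND srp.amount = r.total_qty
GROUP BY srp.spaceship_id
HAVING count(r.spaceship_part_id) = count(*)  -- every required part was matched
ORDER BY srp.spaceship_id
"""
ship_costs = conn.execute(query).fetchall()
print(ship_costs)
```

With this sample data the per-ship totals agree with the sums of the total_cost column in the result table.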

How can I extend an SQL table with new primary keys as well as add up values for existing keys?

I want to join or update the following two tables and also add up df for existing words. So if the word endeavor does not exist in the first table, it should be added with its df value, and if the word hallo exists in both tables, its df values should be summed.
FYI I'm using MariaDB and PySpark to do word counts on documents and calculate tf, df, and tfidf values.
Table name: df
+--------+----+
| word| df|
+--------+----+
|vicinity| 5|
| hallo| 2|
| admire| 3|
| settled| 1|
+--------+----+
Table name: word_list
+----------+---+
|      word| df|
+----------+---+
| hallo| 1|
| settled| 1|
| endeavor| 1|
+----------+---+
So in the end the updated/combined table should look like this:
+----------+---+
|      word| df|
+----------+---+
| vicinity| 5|
| hallo| 3|
| admire| 3|
| settled| 2|
| endeavor| 1|
+----------+---+
What I've tried to do so far is the following:
SELECT df.word, df.df + word_list.df FROM df FULL OUTER JOIN word_list ON df.word=word_list.word
SELECT df.word FROM df JOIN word_list ON df.word=word_list.word
SELECT df.word FROM df FULL OUTER JOIN word_list ON df.word=word_list.word
None of them worked: I get either a table with only null values, a table with some null values, or an exception. I'm sure there must be an easy SQL statement to achieve this, but I've been stuck on it for hours and haven't found anything related on Stack Overflow.
You just need to UNION the two tables first, then aggregate on the word. Since the tables are identically structured, it's very easy. Look at this fiddle. I have used MariaDB 10.3 since you didn't specify a version, but these queries should work in just about any DBMS.
https://dbfiddle.uk/?rdbms=mariadb_10.3&fiddle=c6d86af77f19fc1f337ad1140ef07cd2
select word, sum(df) as df
from (
select * from df
UNION ALL
select * from word_list
) z
group by word
order by sum(df) desc;
UNION is the vertical cousin of JOIN: UNION combines two datasets vertically (row-wise), while JOIN combines them horizontally by adding columns to the output. Both datasets need to have the same number of columns for the UNION to work, and you need to use UNION ALL here so that the union returns all rows, because plain UNION returns only distinct rows. In this dataset, since settled has a value of 1 in both tables, it would have only one entry in the UNION if you didn't use the ALL keyword, so the sum of df would be 1 instead of the 2 you are expecting.
The ORDER BY isn't necessary if you are just transferring to a new table. I just added it to get my results in the same order as your sample output.
Let me know if this worked for you.
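To see the difference between UNION ALL and UNION on the sample data from the question, here is a small sketch using Python's built-in sqlite3 module. It is run against SQLite rather than MariaDB, and the helper name totals is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE df (word TEXT, df INTEGER);
CREATE TABLE word_list (word TEXT, df INTEGER);
INSERT INTO df VALUES ('vicinity', 5), ('hallo', 2), ('admire', 3), ('settled', 1);
INSERT INTO word_list VALUES ('hallo', 1), ('settled', 1), ('endeavor', 1);
""")


def totals(union_keyword):
    """Sum df per word after stacking both tables with the given set operator."""
    sql = f"""
        SELECT word, sum(df) AS df
        FROM (SELECT * FROM df {union_keyword} SELECT * FROM word_list) z
        GROUP BY word
    """
    return dict(conn.execute(sql).fetchall())


with_all = totals("UNION ALL")   # keeps both ('settled', 1) rows
without_all = totals("UNION")    # collapses the identical ('settled', 1) rows

print(with_all)
print(without_all)
```

Only settled differs between the two results, because it is the only word whose row is identical in both tables.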

Incremental integer ID in Impala

I am using Impala for querying parquet-tables and cannot find a solution to increment an integer-column ranging from 1..n. The column is supposed to be used as ID-reference. Currently I am aware of the uuid() function, which
Returns a universal unique identifier, a 128-bit value encoded as a string with groups of hexadecimal digits separated by dashes.
Anyhow, this is not suitable for me since I have to pass the ID to another system which requests an ID in style of 1..n. I also already know that Impala has no auto-increment-implementation.
The desired result should look like:
-- UUID() provided as example - I want to achieve the `my_id`-column.
| my_id | example_uuid | some_content |
|-------|--------------|--------------|
| 1 | 50d53ca4-b...| "a" |
| 2 | 6ba8dd54-1...| "b" |
| 3 | 515362df-f...| "c" |
| 4 | a52db5e9-e...| "d" |
|-------|--------------|--------------|
How can I achieve the desired result (integer-ID ranging from 1..n)?
Note: This question differs from this one which specifically handles Kudu-tables. However, answers should be applicable for this question as well.
Since other Q&As like this one only came up with uuid()-like answers, I put some thought into it and finally came up with this solution:
SELECT
row_number() OVER (PARTITION BY "dummy" ORDER BY "dummy") as my_id
, some_content
FROM some_table
row_number() generates a continuous integer sequence over the provided partition. Unlike rank(), row_number() always increments within its partition, even if duplicates occur.
PARTITION BY "dummy" puts the entire table into one partition. This works since "dummy" is interpreted in the execution graph as a temporary column that yields only the string value "dummy". Any other constant works just as well.
ORDER BY is required in order to generate the increment. Since we don't care about the order in this example (otherwise just use your respective column), the "dummy" workaround is used here as well.
The command creates the desired incremental ID without any nested SQL-statements or other tricks.
| my_id | some_content |
|-------|--------------|
| 1 | "a" |
| 2 | "b" |
| 3 | "c" |
| 4 | "d" |
|-------|--------------|
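The difference between row_number() and rank() on duplicate values can be tried out with the sketch below. It uses SQLite (3.25 or later) through Python's sqlite3 module as a stand-in, so it says nothing about how Impala distributes the work; the sample rows are made up, and it orders by a real column instead of the "dummy" constant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (some_content TEXT)")
conn.executemany(
    "INSERT INTO some_table VALUES (?)",
    [("a",), ("b",), ("b",), ("c",)],  # note the duplicate "b"
)

rows = conn.execute("""
    SELECT row_number() OVER (ORDER BY some_content) AS my_id,
           rank()       OVER (ORDER BY some_content) AS my_rank,
           some_content
    FROM some_table
    ORDER BY my_id
""").fetchall()

for row in rows:
    print(row)
```

row_number() yields 1, 2, 3, 4, while rank() gives both "b" rows the value 2 and then skips to 4.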
I used Markus's answer for a large partitioned table and found that I was getting duplicate ids. I think the ids were only unique within their partition; possibly PARTITION BY "dummy" leads Impala to think that each partition can execute row_number() on its own. I was able to get it working by specifying an actual column to order by and no partition by:
SELECT
row_number() OVER (ORDER BY actual_column) as my_id
, some_content
FROM some_table
It doesn't seem to matter whether the values in the column are unique (mine weren't), but using the actual partition key might result in the same issue as the "dummy" column.
Understandably, it took a lot longer to run than the dummy version.

Removing field that I concatenated to break my group by clause and show unique values

This is going to be an easy one for you.
I have used UNION to combine two tables, and therefore I am using a GROUP BY clause.
I have two values that are exactly the same in every way, except that they need to display as separate lines.
this is my column:
(select (concat(productiondetail.detailid, ProductionDetail.Description)) as [Description],
and that result gives me:
SM1250 | 11078Metabisulphite - 1250KG Bags | $ | 500.00
SM1250 | 11079Metabisulphite - 1250KG Bags | $ | 500.00
Now I want to remove the numbers in the Description.
Note: the Numbers are sequential and the Descriptions are of various lengths.

How to join records from multiple object tables to a master table with a single query?

So I have a data model which is set up with a table that contains NAME, ID, and CONDITION columns for a series of objects (each object has a unique id number). The rest of the attributes for these objects are contained in columns of several respective tables based on the object type (there are some different attributes associated with each type). All the type-specific tables have an ID column so that the objects can be matched to the master list.
I want to write an sql query that will return information about objects of several different types based on the CONDITION tied to their unique ID.
Here is a simplified example of what I am working with:
object_master_list
| ID | NAME | CONDITION |
-------------------------
|1234| obj1| true|
|0000| obj2| false|
|1236| obj3| true|
|0001| obj4| false|
|5832| obj5| true|
|6698| obj6| false|
|6699| obj7| false|
obj_type_one
| ID | NAME | HEIGHT |
-------------------------
|1234| obj1| o1height|
|0000| obj2| o2height|
|5832| obj5| o5height|
|6699| obj7| o7height|
obj_type_two
| ID | NAME | WEIGHT |
-------------------------
|1236|  obj3|   o3weight|
|0001|  obj4|   o4weight|
|6698|  obj6|   o6weight|
As you can see, there is no correlation between NAME and type or ID and type.
I am currently working in iReport, and I have been using the query designer and editing it manually as necessary.
Right now an example query would look like:
SELECT
object_master_list."NAME" AS NAME,
obj_type_one."HEIGHT" AS HEIGHT,
obj_type_two."WEIGHT" AS WEIGHT
FROM
object_master_list INNER JOIN obj_type_one ON object_master_list."ID" =
obj_type_one."ID"
INNER JOIN obj_type_two ON obj_type_two."ID" = object_master_list."ID"
WHERE
object_master_list."CONDITION" = 'true'
My data is returning no results. From the research I have done on sql joins, I believe this is happening:
Where circle "A" represents my master list.
iReport stores and utilizes the values returned from a query row by row, with a field for each column. So ideally I should end up with this:
$F{NAME} which will receive the following values in succession ("obj1", "obj3", "obj5")
$F{HEIGHT} with value series (o1height, null, o5height)
$F{WEIGHT} with value series (null, o3weight, null)
The table representation I suppose would look like this:
| NAME | HEIGHT | WEIGHT |
------------------------------
| obj1| o1height| null|
| obj3| null| o3weight|
| obj5| o5height| null|
My question is how do I accomplish this?
I ran into this on a smaller scale before, so I am aware that I could use subreports or create multiple data sets, but frankly I have a lot of object types and I would rather not if I can help it. I am also not allowed to add a TYPE column to the master list.
Thanks in advance for any replies.
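You can use LEFT JOIN in the following way. Unlike INNER JOIN, it keeps every row of the master list and fills the columns of a type table with null when there is no match: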
select o1.name, o2.height, o3.weight
from object_master_list o1 left join obj_type_one o2 on o1.id = o2.id
left join obj_type_two o3 on o1.id = o3.id
where o1.condition = 'true'
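The effect of swapping INNER JOIN for LEFT JOIN can be checked with the sketch below, which loads the sample tables from the question into SQLite through Python's sqlite3 module. IDs are stored as text to keep the leading zeros, the weight values are assumed to be o3weight, o4weight and o6weight as in the expected output, and an ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE object_master_list (id TEXT, name TEXT, "condition" TEXT);
CREATE TABLE obj_type_one (id TEXT, name TEXT, height TEXT);
CREATE TABLE obj_type_two (id TEXT, name TEXT, weight TEXT);
INSERT INTO object_master_list VALUES
    ('1234', 'obj1', 'true'), ('0000', 'obj2', 'false'), ('1236', 'obj3', 'true'),
    ('0001', 'obj4', 'false'), ('5832', 'obj5', 'true'), ('6698', 'obj6', 'false'),
    ('6699', 'obj7', 'false');
INSERT INTO obj_type_one VALUES
    ('1234', 'obj1', 'o1height'), ('0000', 'obj2', 'o2height'),
    ('5832', 'obj5', 'o5height'), ('6699', 'obj7', 'o7height');
INSERT INTO obj_type_two VALUES
    ('1236', 'obj3', 'o3weight'), ('0001', 'obj4', 'o4weight'),
    ('6698', 'obj6', 'o6weight');
""")

TEMPLATE = """
SELECT m.name, t1.height, t2.weight
FROM object_master_list m
{join} obj_type_one t1 ON m.id = t1.id
{join} obj_type_two t2 ON m.id = t2.id
WHERE m."condition" = 'true'
ORDER BY m.name
"""

# No object appears in both type tables, so chaining inner joins matches nothing.
inner_rows = conn.execute(TEMPLATE.format(join="INNER JOIN")).fetchall()
# Left joins keep each master row and use NULL (None) for the missing type.
left_rows = conn.execute(TEMPLATE.format(join="LEFT JOIN")).fetchall()

print(inner_rows)
print(left_rows)
```

The inner-join version returns no rows, which matches the behaviour described in the question, while the left-join version returns one row per object whose condition is true.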