Take sum of a concatenated column SQL - sql

I want to use post and pre revenue of an interaction to calculate net revenue. Sometimes there are multiple customers in an interaction. The data is like:
InteractionID | Customer ID | Pre | Post
--------------+-------------+--------+--------
1 | ab12 | 10 | 30
2 | cd12 | 40 | 15
3 | de12;gh12 | 15;30 | 20;10
The expected output sums the Pre and Post values of each interaction and computes Net as Post minus Pre:
InteractionID | Customer ID | Pre | Post | Net
--------------+---------------+--------+-------+------
1 | ab12 | 10 | 30 | 20
2 | cd12 | 40 | 15 | -25
3 | de12;gh12 | 45 | 30 | -15
How do I get the net revenue column?

The proper solution is to normalize your relational design by adding a separate table for customers and their respective pre and post.
While stuck with the current design, this would do it:
SELECT *, post - pre AS net
FROM  (
   SELECT interaction_id, customer_id
        , (SELECT sum(x::numeric) FROM string_to_table(pre, ';') x) AS pre
        , (SELECT sum(x::numeric) FROM string_to_table(post, ';') x) AS post
   FROM   tbl
   ) sub;
string_to_table() requires at least Postgres 14.
You did not declare your Postgres version, so I assume the current version Postgres 14.
For older versions replace with regexp_split_to_table() or unnest(string_to_array()).
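To check the arithmetic outside the database, the same split-and-sum logic can be sketched in Python. This is only an illustration of what the query computes on the sample rows from the question; the helper name sum_delimited is made up for this example.

```python
from decimal import Decimal


def sum_delimited(value: str, sep: str = ";") -> Decimal:
    """Sum the numeric parts of a delimiter-separated string such as '15;30'."""
    return sum((Decimal(part) for part in value.split(sep)), Decimal(0))


# Sample rows from the question: (InteractionID, Customer ID, Pre, Post)
rows = [
    (1, "ab12", "10", "30"),
    (2, "cd12", "40", "15"),
    (3, "de12;gh12", "15;30", "20;10"),
]

result = []
for interaction_id, customer_id, pre, post in rows:
    pre_sum, post_sum = sum_delimited(pre), sum_delimited(post)
    # Net is Post minus Pre, as in the SQL above
    result.append((interaction_id, customer_id, pre_sum, post_sum, post_sum - pre_sum))

for row in result:
    print(*row)
```

Decimal is used instead of float to mirror the numeric cast in the query.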

Related

SQL Query to return tier based values

I have a database with a table for tier-based pricing that depends on the quantity bought. For example: (1-10) is $5, (11-15) is $10, 16 is $15, and 17-20 is $20.
The table is structured in this way:
number int,
cost int
an example of the table:
number | cost
1 | 1
2 | 1
3 | 1
4 | 2
5 | 2
6 | 2
7 | 3
8 | 4
9 | 7
10 | 7
Is there any way for me to write a query that returns these numbers in the format min, max, and cost? For example, running the query on the pricing described above would return:
min|max|cost
-----|-----|----
1 | 10 | 5
11 |15 | 10
16 |16 | 15
17 |20 | 20
Also I am not sure if this is the best structure for such a table. Any and all help is appreciated. Thanks!
Try this. It's rather messy. I just tried it in SQL Server Management Studio.
Update: For better readability.
SELECT Minimum.Min, Maximum.Max, Minimum.Cost
FROM
(
    SELECT MIN([Number]) as 'Min', Cost
    FROM [TestDB].[dbo].[Testing]
    GROUP BY Cost
) as [Minimum]
INNER JOIN
(
    SELECT MAX([Number]) as 'Max', Cost
    FROM [TestDB].[dbo].[Testing]
    GROUP BY Cost
) as [Maximum]
ON Minimum.Cost = Maximum.Cost
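Since both subqueries group by the same column, a single GROUP BY with MIN and MAX should give the same rows. Here is a minimal sketch using Python's built-in sqlite3 module with the example table from the question. The table name testing and the column aliases are assumptions for this illustration, and it only works if each cost covers one contiguous range of numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testing (number INTEGER, cost INTEGER)")
conn.executemany(
    "INSERT INTO testing VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 2), (7, 3), (8, 4), (9, 7), (10, 7)],
)

# One pass over the table: lowest and highest number per cost
tiers = conn.execute(
    "SELECT MIN(number) AS min_number, MAX(number) AS max_number, cost "
    "FROM testing GROUP BY cost ORDER BY min_number"
).fetchall()

for min_number, max_number, cost in tiers:
    print(min_number, max_number, cost)
```

If the same cost could appear in two separate ranges, grouping by cost alone would merge them, so that case needs a different approach.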

Multiply a field with a counted field

I'm using DBISAM to build a table from existing data.
My problem is:
NAME | COUNT(QTY) | PRICE | ? |
_______|________________|____________|_________|
A | 1 | 9.5 | 9.5 |
_______|________________|____________|_________|
B | 2 | 12.5 | 25 |
_______|________________|____________|_________|
C | 5 | 20 | 100 |
_______|________________|____________|_________|
My SQL query is SELECT NAME, COUNT(QTY), PRICE FROM ARTICLE.
But I can't find out how to get the last column (the one marked with ?).
I tried PRICE * COUNT(QTY), but DBISAM can't handle this and says "Invalid use of non-aggregated column".
Do you have any idea how I can get the last column?
Thanks

Convert rows to columns for a Report

Using instructions found here I've tried to create a crosstab query to show historical data from three previous years and I would like to output it in a report.
I've got a few complications that are making this difficult and I'm having trouble getting the data to show correctly.
The query it is based on is structured like this:
EmpID | ReviewYearID | YearName | ReviewDate | SelfRating | ManagerRating | NotSelfRating |
1 | 5 | 2013 | 01/09/2013 | 3.5 | 3.5 | 3.5 |
1 | 6 | 2014 | 01/09/2014 | 2.5 | 2.5 | 2.5 |
1 | 7 | 2015 | 01/09/2015 | 4.5 | 4.5 | 4.5 |
2 | 6 | 2014 | 01/09/2014 | 2.0 | 2.0 | 2.0 |
2 | 7 | 2015 | 01/09/2015 | 2.0 | 2.0 | 2.0 |
3 | 7 | 2015 | 01/09/2015 | 5.0 | 5.0 | 5.0 |
[Edit]: Here is the SQL for the base query. It is combining data from two tables:
SELECT tblEmployeeYear.EmployeeID AS EmpID, tblReviewYear.ID AS ReviewYearID, tblReviewYear.YearName, tblReviewYear.ReviewDate, tblEmployeeYear.SelfRating, tblEmployeeYear.ManagerRating, tblEmployeeYear.NotSelfRating
FROM tblReviewYear INNER JOIN tblEmployeeYear ON tblReviewYear.ID = tblEmployeeYear.ReviewYearID;
[/Edit]
I would like a crosstab query that transposes the columns/rows to show historical data for up to 3 previous years (based on review date) for a specific employee. The end result would look something like this for Employee ID 1:
Year | 2015 | 2014 | 2013 |
SelfRating | 4.5 | 2.5 | 3.5 |
ManagerRating | 4.5 | 2.5 | 3.5 |
NotSelfRating | 4.5 | 2.5 | 3.5 |
Other employees would have fewer columns since they don't have data for previous years.
I'm having issues with filtering it down to a specific employee and sorting the years by their review date (the name isn't always a reliable way to sort them).
In the end I'm looking to use this as the data for a report.
If there is a different way than a crosstab query to accomplish this I would be okay with that as well.
Thanks!
You need a column for all the rating types, not an individual column for each type. If you can't redesign the table, I would suggest creating a new one for your purposes. The below uses a union to add in that type column referred to above. You create a column and hardcode the value (SelfRating, ManagerRating, etc):
SELECT * INTO EmployeeRatings
FROM (SELECT tblEmployeeYear.EmployeeId AS EmpId, ReviewYearId, "SelfRating" AS Category, SelfRating AS Score
FROM tblEmployeeYear
WHERE SelfRating Is Not Null
UNION ALL
SELECT tblEmployeeYear.EmployeeId, ReviewYearId, "ManagerRating", ManagerRating
FROM tblEmployeeYear
WHERE ManagerRating Is Not Null
UNION ALL
SELECT tblEmployeeYear.EmployeeId, ReviewYearId, "NotSelfRating", NotSelfRating
FROM tblEmployeeYear
WHERE NotSelfRating Is Not Null)
Then use the newly created table in place of tblEmployeeYear. Note that I use Year([ReviewDate]) which will return only the year. Also, since it looks like it may be possible to have more than one of each review type per year, I averaged the Score for the year.
TRANSFORM Avg(Score)
SELECT EmpId, Category
FROM (SELECT EmpId, Category, ReviewDate, Score
FROM tblReviewYear
INNER JOIN EmployeeRatings
ON tblReviewYear.ID = EmployeeRatings.ReviewYearID) AS Reviews
GROUP BY EmpId, Category
PIVOT Year([ReviewDate]);
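The two steps (unpivot the rating columns into Category/Score rows, then pivot by year with an average) can also be sketched in plain Python to check the expected shape of the result. This is only an illustration on the sample rows from the question; the function name crosstab and the tuple layout are assumptions, and the dates are read as 1 September of each year, which does not affect the year.

```python
from collections import defaultdict
from datetime import date

CATEGORIES = ("SelfRating", "ManagerRating", "NotSelfRating")

# (EmpID, ReviewDate, SelfRating, ManagerRating, NotSelfRating)
base_rows = [
    (1, date(2013, 9, 1), 3.5, 3.5, 3.5),
    (1, date(2014, 9, 1), 2.5, 2.5, 2.5),
    (1, date(2015, 9, 1), 4.5, 4.5, 4.5),
    (2, date(2014, 9, 1), 2.0, 2.0, 2.0),
    (2, date(2015, 9, 1), 2.0, 2.0, 2.0),
    (3, date(2015, 9, 1), 5.0, 5.0, 5.0),
]


def crosstab(rows, emp_id):
    """Return (years, table) for one employee, newest year first."""
    # Step 1: unpivot into (category, year) -> list of scores, skipping nulls
    scores = defaultdict(list)
    for emp, review_date, *ratings in rows:
        if emp != emp_id:
            continue
        for category, score in zip(CATEGORIES, ratings):
            if score is not None:
                scores[(category, review_date.year)].append(score)

    # Step 2: pivot, averaging the scores per category and year
    years = sorted({year for _, year in scores}, reverse=True)
    table = {}
    for category in CATEGORIES:
        cells = []
        for year in years:
            values = scores.get((category, year), [])
            cells.append(sum(values) / len(values) if values else None)
        table[category] = cells
    return years, table


years, table = crosstab(base_rows, 1)
print("Year", *years)
for category, cells in table.items():
    print(category, *cells)
```

Filtering on the employee happens before the pivot, so employees with fewer review years simply get fewer columns.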

Find a subset of numbers that equals to the target weighted average and target sum

There is a SQL Server table containing 1 million rows. Sample data is shown below.
Percentage column is computed as = ((Y/X)* 100)
+----+--------+-------------+-----+-----+-------------+
| ID | Amount | Percentage | X | Y | Z |
+----+--------+-------------+-----+-----+-------------+
| 1 | 10 | 9.5 | 100 | 9.5 | 95 |
| 2 | 20 | 9.5 | 100 | 9.5 | 190 |
| 3 | 40 | 5 | 100 | 5 | 200 |
| 4 | 50 | 5.555555556 | 90 | 5 | 277.7777778 |
| 5 | 70 | 8.571428571 | 70 | 6 | 600 |
| 6 | 100 | 9.230769231 | 65 | 6 | 923.0769231 |
| 7 | 120 | 7.058823529 | 85 | 6 | 847.0588235 |
| 8 | 60 | 10.52631579 | 95 | 10 | 631.5789474 |
| 9 | 80 | 10 | 100 | 10 | 800 |
| 10 | 95 | 10 | 100 | 10 | 950 |
+----+--------+-------------+-----+-----+-------------+
Now I need to find the rows whose Amount values add up to a given Amount and whose weighted average matches the given Percentage.
For example, if the target Amount =365 and target Percentage=9.84, then from the given dataset, we can say that rows with ID=1,2,6,8,9,10 form the subset which will match the given targets.
Amount = 10+20+100+60+80+95
= 365
Percentage = Sum of (product of Amount and Percentage)/Sum of (Amount)
(I am using Z column to store the products of Amount and Percentage to make the calculations easier)
= ((10*9.5)+(20*9.5)+(100*9.23077)+(60*10.5264)+(80*10)+(95*10))/ (10+20+100+60+80+95)
= 9.834673618
So the rows 1,2,6,8,9,10 match the given target sum and target weighted average.
The proposed algorithm should work on the 1 million rows. The main objective is to match the weighted average (Percentage), with the Amount as close as possible to the target Amount.
I found a few questions on Stack Overflow about matching a target sum. But my problem is to match two target attributes: sum and weighted average.
Which algorithm can be used to achieve this?
Since the target "Percentage" is only approximate (therefore not an actual constraint), let's try removing it and find a solution for Amount. This can only make the problem easier.
What's left is the Subset Sum Problem, which is NP-complete. There are simple exponential-time solutions, and sneaky pseudo-polynomial-time solutions, but I don't think any of them will be practical for a table with 10^6 rows.
If this is an academic exercise, I suggest you write up the cleverest pseudo-polynomial-time solution you can come up with. If it's a task in the real world, I suggest you go back to the person who gave it to you, explain that an exact solution is impractical, and negotiate for an approximate solution.
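For reference, here is a minimal sketch of the textbook pseudo-polynomial dynamic programme for Subset Sum, run on the ten sample rows. It assumes the amounts are non-negative integers, it ignores the Percentage target (as suggested above) and only reports the weighted average of whatever subset it happens to find. The function name subset_with_sum is made up for this example.

```python
def subset_with_sum(amounts, target):
    """Return indices of a subset of `amounts` summing exactly to `target`, or None.

    Runs in O(len(amounts) * target) time; amounts must be non-negative integers.
    """
    # reachable[s] = index of the item that first made the sum s reachable
    reachable = {0: None}
    for i, amount in enumerate(amounts):
        # Iterate over a snapshot so each item is used at most once
        for s in list(reachable):
            t = s + amount
            if t <= target and t not in reachable:
                reachable[t] = i
    if target not in reachable:
        return None

    # Walk back from the target to recover the chosen items
    chosen, s = [], target
    while s:
        i = reachable[s]
        chosen.append(i)
        s -= amounts[i]
    return sorted(chosen)


amounts = [10, 20, 40, 50, 70, 100, 120, 60, 80, 95]
percentages = [9.5, 9.5, 5, 5.555555556, 8.571428571,
               9.230769231, 7.058823529, 10.52631579, 10, 10]

chosen = subset_with_sum(amounts, 365)
total = sum(amounts[i] for i in chosen)
weighted = sum(amounts[i] * percentages[i] for i in chosen) / total
print("IDs:", [i + 1 for i in chosen], "Amount:", total, "Percentage:", weighted)
```

The running time grows with the number of rows times the target amount, so it is only feasible when the target is small relative to the table.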

Get latest child record without given order

Simplified, I have the following situation. I've got two tables. One migration has multiple checks through checks.migration_id. The column checks.old describes a type of check. Now I want to get, for each migration, the check with the biggest time where old is true (query 1) and where old is false (query 2).
There are about 30,000 migrations and each has around 1000 checks where old=true and 1000 checks where old=false. The table checks will grow very large. The order of the checks is not given and could be totally mixed up.
I want to get the latest check for a maximum of 150 migrations at once.
SQL Fiddle: http://sqlfiddle.com/#!15/282ce/15
I'm using PostgreSQL 9.3 and Rails 3.2 (shouldn't matter)
What's the most efficient way to get the latest subrecord where old = true?
Table Migrations:
| ID |
|----|
| 1 |
| 2 |
Table Checks:
| ID | MIGRATION_ID | OLD | OK | TIME |
|----|--------------|-----|----|----------------------------------|
| 1 | 1 | 1 | 1 | September, 22 2014 12:00:01+0000 |
| 2 | 1 | 0 | 1 | September, 22 2014 12:00:02+0000 |
| 3 | 2 | 1 | 1 | September, 22 2014 12:00:01+0000 |
| 4 | 2 | 0 | 1 | September, 22 2014 12:00:02+0000 |
| 5 | 1 | 1 | 1 | September, 22 2014 12:00:03+0000 |
| 6 | 1 | 0 | 1 | September, 22 2014 12:00:04+0000 |
| 7 | 2 | 1 | 1 | September, 22 2014 12:00:03+0000 |
| 8 | 2 | 0 | 1 | September, 22 2014 12:00:04+0000 |
Query 1 should return the following result:
| Migration.id | Check_ID | OLD | OK | TIME |
|--------------|----------|-----|----|----------------------------------|
| 1 | 5 | 1 | 1 | September, 22 2014 12:00:03+0000 |
| 2 | 7 | 1 | 1 | September, 22 2014 12:00:03+0000 |
Query 2 should return the following result:
| Migration.id | Check_ID | OLD | OK | TIME |
|--------------|----------|-----|----|----------------------------------|
| 1 | 6 | 0 | 1 | September, 22 2014 12:00:04+0000 |
| 2 | 8 | 0 | 1 | September, 22 2014 12:00:04+0000 |
I tried to solve it with a max in a subquery, but then I lose the information about checks.ok and check.time.
SELECT eq.id, (SELECT max(checks.id) FROM checks WHERE checks.migration_id = eq.id and checks.old = 't') AS latest FROM migrations eq;
SELECT eq.id, (SELECT max(checks.id) FROM checks WHERE checks.migration_id = eq.id and checks.old = 'f') AS latest FROM migrations eq;
(I know that I get max(id) instead of max(time).)
In Rails I tried to fetch the latest record for each migration, which resulted in the 1+n problem. I'm not able to include all checks because there are way too many of them.
A simple solution with the Postgres specific DISTINCT ON:
Query 1 ("for each migration the check with the biggest time where old is true"):
SELECT DISTINCT ON (migration_id)
migration_id, id AS check_id, old, ok, time
FROM checks
WHERE old
ORDER BY migration_id, time DESC;
Invert the WHERE condition for Query 2:
...
WHERE NOT old
...
Details:
Select first row in each GROUP BY group?
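To make the "first row per group" idea concrete, here is a small Python sketch of the same selection on the sample data from the question. It is only an illustration of what the queries return, not of how Postgres executes them. The function name latest_checks is made up for this example, and the timestamps are written as ISO strings so they compare correctly as text.

```python
# (id, migration_id, old, ok, time)
checks = [
    (1, 1, True, True, "2014-09-22 12:00:01"),
    (2, 1, False, True, "2014-09-22 12:00:02"),
    (3, 2, True, True, "2014-09-22 12:00:01"),
    (4, 2, False, True, "2014-09-22 12:00:02"),
    (5, 1, True, True, "2014-09-22 12:00:03"),
    (6, 1, False, True, "2014-09-22 12:00:04"),
    (7, 2, True, True, "2014-09-22 12:00:03"),
    (8, 2, False, True, "2014-09-22 12:00:04"),
]


def latest_checks(checks, old):
    """Return, per migration_id, the check with the greatest time for the given `old` flag."""
    latest = {}
    for check in checks:
        _check_id, migration_id, is_old, _ok, time = check
        if is_old != old:
            continue
        best = latest.get(migration_id)
        # Keep the row with the greatest time seen so far for this migration
        if best is None or time > best[4]:
            latest[migration_id] = check
    return [latest[m] for m in sorted(latest)]


for old in (True, False):
    print("old =", old, [c[0] for c in latest_checks(checks, old)])
```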
But if you want better read performance with big tables, use JOIN LATERAL (Postgres 9.3+, standard SQL), building on a multicolumn index like:
CREATE INDEX checks_special_idx ON checks(old, migration_id, time DESC);
Query 1:
SELECT m.id AS migration_id
, c.id AS check_id, c.old, c.ok, c.time
FROM migrations m
-- FROM (SELECT id FROM migrations LIMIT 150) m
JOIN LATERAL (
SELECT id, old, ok, time
FROM checks
WHERE migration_id = m.id
AND old
ORDER BY time DESC
LIMIT 1
) c ON TRUE;
Switch the condition on old again for query 2.
For an unspecified "maximum of 150 migrations", use the commented alternative line.
Details:
Optimize GROUP BY query to retrieve latest record per user
Aside: don't use "time" as identifier. It's a reserved word in standard SQL and a basic type name in Postgres.