Select Value based on Multiple Value Range in SQL

I have multiple criteria for giving incentives to my employees, as shown in the image below.
The grid table is dynamic in nature: it keeps changing based on business conditions.
I have a table of employee IDs for which I have calculated each employee's Resolution % and Normalization %. Now I need to assign the % incentive based on the above grid, using a SQL query.
There is also an output table in which I need to update the incentives.

I assume the grid table is also stored as a database table (so you can update it):
+-------------------------------------------------------------------------------------+
|                                     INCENTIVES                                      |
+-----------------+---------------+--------------------+------------------+-----------+
| from_resolution | to_resolution | from_normalization | to_normalization | incentive |
+-----------------+---------------+--------------------+------------------+-----------+
|               0 |            70 |                  0 |                5 |         9 |
|               0 |            70 |                  5 |               10 |        11 |
|               0 |            70 |                 10 |              100 |        13 |
|              71 |            75 |                  0 |                5 |        10 |
+-----------------+---------------+--------------------+------------------+-----------+
... I hope you get the idea.
And the update query can be:
update employee e
set e.incentive = (select i.incentive
                   from incentives i
                   where e.resolution >= i.from_resolution
                     and e.resolution < i.to_resolution
                     and e.normalization >= i.from_normalization
                     and e.normalization < i.to_normalization)
UPDATE: the TO values are exclusive, i.e. not part of the range. By making each TO value equal to the FROM value of the next range, we ensure that all values are covered (including fractional ones). Thanks to Gordon.
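For reference, a minimal sketch of the schema this assumes (the table and column names come from the answer above; the column types and the emp_id key are illustrative assumptions):

-- Hypothetical types; only the column names are taken from the answer above.
CREATE TABLE incentives (
    from_resolution    DECIMAL(5,2),
    to_resolution      DECIMAL(5,2),
    from_normalization DECIMAL(5,2),
    to_normalization   DECIMAL(5,2),
    incentive          DECIMAL(5,2)
);

CREATE TABLE employee (
    emp_id        INT PRIMARY KEY,  -- assumed key column
    resolution    DECIMAL(5,2),     -- calculated Resolution %
    normalization DECIMAL(5,2),     -- calculated Normalization %
    incentive     DECIMAL(5,2)      -- filled in by the UPDATE
);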


Comparing every row in a table with the master row

I have a Redshift table with a single VARCHAR column named "Test" and several float columns. The "Test" column has unique values; one of them is "Control", the others are not hardcoded.
The table has ~10 rows (not static) and ~10 columns.
I need to generate a Looker report that shows the original data and the difference between the corresponding float columns of "Control" and the other Tests.
Input Example:
Test    | Metric_1 | Metric_2
--------+----------+---------
Control | 10       | 100
A       | 12       | 120
B       | 8        | 80
The desired report:
         | Control | A   | A-Control | B  | B-Control
---------+---------+-----+-----------+----+----------
Metric_1 | 10      | 12  | 2         | 8  | -2
Metric_2 | 100     | 120 | 20        | 80 | -20
To calculate the difference between each row and "Control", I tried:
SELECT T.test,
       T.metric_1 - Control.metric_1 AS DIFF1,
       T.metric_2 - Control.metric_2 AS DIFF2,
       ...
FROM T CROSS JOIN (SELECT * FROM T WHERE test = 'Control') AS Control
I can do part of the work in Looker (it can transpose) and part in SQL, but I still cannot figure out how to build this report.
You could pivot on the test dimension, which lets you build part of it:
         | Control | A   | B
---------+---------+-----+----
Metric_1 | 10      | 12  | 8
Metric_2 | 100     | 120 | 80
Then operate on top of these results using table calculations.
You can use the functions pivot_where() or pivot_index().
For example, pivot_where(test = 'A', metric) - pivot_where(test = 'Control', metric)
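Alternatively, the differences could be precomputed in SQL before Looker pivots the result. A cross-join sketch, assuming the table is named t (a placeholder):

-- Compute each test's delta against the Control row; Looker can pivot the result.
SELECT t.test,
       t.metric_1,
       t.metric_1 - c.metric_1 AS metric_1_diff,
       t.metric_2,
       t.metric_2 - c.metric_2 AS metric_2_diff
FROM t
CROSS JOIN (SELECT metric_1, metric_2 FROM t WHERE test = 'Control') AS c;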

Creating a view that joins multiple tables on an ID and a timestamp that needs to be rounded

I have a web application that sends data to my SQLite database, into different tables depending on the information. I would like to make a view that merges multiple tables based on cownumber and ts (timestamp). There are no updates to my tables: a change to the same cownumber sends the full record as a new entry with a new timestamp. The AJAX calls are made table by table, so the timestamps do not sync up exactly; they are generally 5-20 seconds apart, depending on the connection.
Here is a sample of the three tables
+---- master_animal ----+
+-----------+--------+--------+---------------------+
| cownumber | height | weight | ts                  |
+-----------+--------+--------+---------------------+
| 1         | 150    | ...    | 2017-12-01 12:28:00 |
| 2         | 170    | ...    | 2017-12-03 17:16:00 |
| 3         | 60     | ...    | 2017-12-03 08:09:00 |
| 4         | 109    | ...    | 2017-12-04 23:23:00 |
+-----------+--------+--------+---------------------+
+---- animal_inventory ----+
+-----------+---------------+--------------+---------------------+
| cownumber | brandlocation | dateacquired | ts                  |
+-----------+---------------+--------------+---------------------+
| 1         | ...           | ...          | 2017-12-01 12:28:50 |
| 2         | ...           | ...          | 2017-12-03 17:16:30 |
| 3         | ...           | ...          | 2017-12-03 08:09:12 |
| 4         | ...           | ...          | 2017-12-04 23:23:23 |
+-----------+---------------+--------------+---------------------+
+---- experiment ----+
+-----------+-----------+-------------+---------------------+
| cownumber | ageatwean | birthweight | ts                  |
+-----------+-----------+-------------+---------------------+
| 1         | ...       | ...         | 2017-12-01 12:28:20 |
| 2         | ...       | ...         | 2017-12-03 17:16:41 |
| 3         | ...       | ...         | 2017-12-03 08:09:24 |
| 4         | ...       | ...         | 2017-12-04 23:23:11 |
+-----------+-----------+-------------+---------------------+
The view I wrote:
CREATE VIEW testing AS
SELECT a.height, a.weight, a.cownumber,
       b.brandlocation, b.dateacquired,
       c.ageatwean, c.birthweight
FROM master_animal a, animal_inventory b, experiment c
WHERE a.cownumber = b.cownumber
  AND ROUND(a.ts / 10000) = ROUND(b.ts / 10000)
  AND a.cownumber = c.cownumber
  AND ROUND(a.ts / 10000) = ROUND(c.ts / 10000);
The query I wrote:
SELECT * FROM testing WHERE cownumber = 941;
What I was hoping to get back was
+---- testing ----+
+-----------+--------+--------+---------------+--------------+-----------+-------------+
| cownumber | height | weight | brandlocation | dateacquired | ageatwean | birthweight |
+-----------+--------+--------+---------------+--------------+-----------+-------------+
| 941       | 0      | ...    | ...           | ...          | ...       | ...         |
There should be one row for cownumber 941, as long as all the correlated records were within a few seconds of each other. I am not exactly sure whether I need to divide by 10000 or by something smaller. Matching records should be no more than 50 seconds apart; anything more than 50 seconds apart should be considered a new record.
When I test this where there is only one record for that cownumber, it works fine. But let's say I change some information in each table: I provide a new height and a new brandlocation.
Instead of getting two rows (the first row being the initial data entry, and the second showing the same cownumber with the changed values), I get back 8 rows with partial changes.
height|weight|cownumber|brandlocation|dateacquired|ageatwean|birthweight|
0.0|0.0|941|0|0|0.0|0
0.0|0.0|941|0|0|0.0|0
0.0|0.0|941|Left Hip|0|0.0|0
0.0|0.0|941|Left Hip|0|0.0|0
50.0|0.0|941|0|0|0.0|0
50.0|0.0|941|0|0|0.0|0
50.0|0.0|941|Left Hip|0|0.0|0
50.0|0.0|941|Left Hip|0|0.0|0
I assume the issue is in my WHERE clause, but I am not sure exactly how to fix it.
The timestamps are stored as strings. When you try to divide one, the database converts it to a number, which yields just the leading 2017; so all timestamps end up being the same.
Dividing cannot determine the distance anyway: the values 9999 and 10000 land in different buckets although they are right next to each other. (And integer division yields an integer result, so the ROUND() has no effect.)
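A quick illustration of both points (the timestamp literal is taken from the sample data; '%s' assumes UTC):

SELECT '2017-12-03 17:16:00' / 10000;          -- string becomes 2017; 2017 / 10000 = 0
SELECT strftime('%s', '2017-12-03 17:16:00');  -- 1512321360 (seconds since the epoch)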
To compute the distance, convert the timestamp into a number of seconds first, and then use abs():
SELECT ...
FROM master_animal m
JOIN animal_inventory i
  ON m.cownumber = i.cownumber
 AND abs(strftime('%s', m.ts) - strftime('%s', i.ts)) <= 50
JOIN experiment e
  ON m.cownumber = e.cownumber
 AND abs(strftime('%s', m.ts) - strftime('%s', e.ts)) <= 50;
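Putting it together, the fixed view might look like this (a sketch that combines the column list of the original view with the joins above):

CREATE VIEW testing AS
SELECT m.cownumber, m.height, m.weight,
       i.brandlocation, i.dateacquired,
       e.ageatwean, e.birthweight
FROM master_animal m
JOIN animal_inventory i
  ON m.cownumber = i.cownumber
 AND abs(strftime('%s', m.ts) - strftime('%s', i.ts)) <= 50
JOIN experiment e
  ON m.cownumber = e.cownumber
 AND abs(strftime('%s', m.ts) - strftime('%s', e.ts)) <= 50;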

SQL deleting rows with duplicate dates conditional upon values in two columns

I have data on approximately 1000 individuals, where each individual can have multiple rows with multiple dates, and where the columns indicate the program admitted to and a code number.
I need each row to contain a distinct date, so I need to delete the rows with duplicate dates from my table. Where there are multiple rows with the same date, I need to keep the row with the lowest code number. Where more than one row has both the same date and the same lowest code, I need to keep the row that was in program (prog) B. For example:
| ID | DATE       | CODE | PROG |
---------------------------------
| 1  | 1996-08-16 | 24   | A    |
| 1  | 1997-06-02 | 123  | A    |
| 1  | 1997-06-02 | 123  | B    |
| 1  | 1997-06-02 | 211  | B    |
| 1  | 1997-08-19 | 67   | A    |
| 1  | 1997-08-19 | 23   | A    |
So my desired output would look like this:
| ID | DATE       | CODE | PROG |
---------------------------------
| 1  | 1996-08-16 | 24   | A    |
| 1  | 1997-06-02 | 123  | B    |
| 1  | 1997-08-19 | 23   | A    |
I'm struggling to come up with a solution to this, so any help is greatly appreciated!
Microsoft SQL Server 2012 (X64)
The following works with your test data (note that ID has to appear in the GROUP BY as well; also, MIN(code) and MAX(prog) are computed independently, and the tie-break comes out right here only because 'B' > 'A' alphabetically):
SELECT ID, [date], MIN(code) AS code, MAX(prog) AS prog
FROM yourtable
GROUP BY ID, [date]
You can then use the results of this query to create or populate a new table, or to delete all records not returned by this query.
SQLFiddle http://sqlfiddle.com/#!9/0ebb5/5
You can use the min() function (see the details here):
select ID, DATE, min(CODE), max(PROG)
from yourtable
group by ID, DATE
I assume that your table has a valid primary key; however, I would recommend using ID as the primary key. Hope this helps.
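Because MIN(CODE) and MAX(PROG) are picked independently, they can combine values from different rows on messier data. A window-function sketch that implements the stated rule directly (ROW_NUMBER() exists in SQL Server 2012; the table name admissions is a placeholder):

-- Keep one row per (ID, DATE): lowest CODE wins, PROG = 'B' preferred on ties.
WITH ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY ID, [DATE]
               ORDER BY CODE ASC,
                        CASE WHEN PROG = 'B' THEN 0 ELSE 1 END
           ) AS rn
    FROM admissions
)
DELETE FROM ranked
WHERE rn > 1;   -- deletes the duplicates from the underlying table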

Find a subset of numbers that matches the target weighted average and target sum

There is a SQL Server table containing 1 million rows. Sample data is shown below.
The Percentage column is computed as ((Y / X) * 100).
+----+--------+-------------+-----+-----+-------------+
| ID | Amount | Percentage  | X   | Y   | Z           |
+----+--------+-------------+-----+-----+-------------+
| 1  | 10     | 9.5         | 100 | 9.5 | 95          |
| 2  | 20     | 9.5         | 100 | 9.5 | 190         |
| 3  | 40     | 5           | 100 | 5   | 200         |
| 4  | 50     | 5.555555556 | 90  | 5   | 277.7777778 |
| 5  | 70     | 8.571428571 | 70  | 6   | 600         |
| 6  | 100    | 9.230769231 | 65  | 6   | 923.0769231 |
| 7  | 120    | 7.058823529 | 85  | 6   | 847.0588235 |
| 8  | 60     | 10.52631579 | 95  | 10  | 631.5789474 |
| 9  | 80     | 10          | 100 | 10  | 800         |
| 10 | 95     | 10          | 100 | 10  | 950         |
+----+--------+-------------+-----+-----+-------------+
Now I need to find rows whose Amount values add up to a given Amount and whose weighted average matches a given Percentage.
For example, if the target Amount = 365 and the target Percentage = 9.84, then from the given dataset the rows with ID = 1, 2, 6, 8, 9, 10 form a subset that matches the given targets.
Amount     = 10 + 20 + 100 + 60 + 80 + 95 = 365
Percentage = Sum of (Amount * Percentage) / Sum of (Amount)
           = ((10 * 9.5) + (20 * 9.5) + (100 * 9.230769231) + (60 * 10.52631579) + (80 * 10) + (95 * 10)) / (10 + 20 + 100 + 60 + 80 + 95)
           = 9.834673618
(I am using the Z column to store the products of Amount and Percentage to make the calculations easier.)
So the rows 1, 2, 6, 8, 9, and 10 match the given target sum and target weighted average.
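(For reference, this check can be written directly in SQL; the table name sample_rows is a placeholder:)

SELECT SUM(Amount)                            AS total_amount,  -- 365
       SUM(Amount * Percentage) / SUM(Amount) AS weighted_pct   -- 9.834673618
FROM sample_rows
WHERE ID IN (1, 2, 6, 8, 9, 10);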
The proposed algorithm should work on the 1 million rows; the main objective is to match the weighted average (Percentage), with the Amount as close as possible to the target Amount.
I found a few questions on Stack Overflow about matching a target sum, but my problem is to match two target attributes: the sum and the weighted average.
Which algorithm can be used to achieve this?
Since the target "Percentage" is only approximate (and therefore not a hard constraint), let's try removing it and finding a solution for Amount alone. This can only make the problem easier.
What's left is the Subset Sum Problem, which is NP-complete. There are simple exponential-time solutions and sneaky pseudo-polynomial-time solutions, but I don't think any of them will be practical for a table with 10^6 rows.
If this is an academic exercise, I suggest you write up the cleverest pseudo-polynomial-time solution you can come up with. If it's a task in the real world, I suggest you go back to the person who gave it to you, explain that an exact solution is impractical, and negotiate for an approximate solution.
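If you do want to experiment, the exponential route can at least be sketched in T-SQL with a recursive CTE; it is only viable on a small, pre-filtered candidate set. The table name sample_rows and the targets (365, 9.84) are placeholders:

-- Enumerate all subsets (exponential!) and keep those that hit both targets.
WITH subsets AS (
    SELECT CAST(CAST(ID AS varchar(20)) AS varchar(MAX)) AS ids,
           CAST(Amount AS float)                         AS total_amount,
           CAST(Amount * Percentage AS float)            AS total_z,
           ID                                            AS last_id
    FROM sample_rows
    UNION ALL
    SELECT CAST(s.ids + ',' + CAST(r.ID AS varchar(20)) AS varchar(MAX)),
           s.total_amount + r.Amount,
           s.total_z + r.Amount * r.Percentage,
           r.ID
    FROM subsets s
    JOIN sample_rows r ON r.ID > s.last_id  -- extend only with higher IDs to avoid duplicates
)
SELECT ids, total_amount, total_z / total_amount AS weighted_pct
FROM subsets
WHERE total_amount = 365                           -- exact target sum
  AND ABS(total_z / total_amount - 9.84) < 0.05;   -- approximate target percentage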

Merge same cells in iReport

Is it possible to have a sum in the detail band in iReport?
It is important to have the cells merged vertically after export to Excel, like this:
-----------------------------
| id | year | value | sum   |
-----------------------------
|    | 2010 | 55    |       |
| 1  | 2011 | 65    | 180   |
|    | 2012 | 60    |       |
-----------------------------
| 2  | 2010 | 70    | 70    |
-----------------------------
My idea is to have the main query with a GROUP BY clause and, for "year" and "value", to use a table component with another query. The problem is that my query is long-running, and I need to have only one query in the whole report.
First, have a look here; it's about grouping rows.
You will see that you should create a group in your report (not in the query) based on your id field.
To calculate the sum field, drag the value field to the column footer; a pop-up menu will appear. Click the "result of an aggregation function" radio button, then choose the Sum function. This creates a variable that calculates the sum of the value field. Change this variable's reset type to Group (to id_group), and use this variable in your sum field.
For grouping rows by id, click on the sum field and set its "print when group changes" property to id_group.
This should help :)
When you group your fields, your table will look like this; the grouped fields are at the top.
-----------------------------
| id | year | value | sum   |
-----------------------------
| 1  | 2010 | 55    | 180   |
|    | 2011 | 65    |       |
|    | 2012 | 60    |       |
-----------------------------
| 2  | 2010 | 70    | 70    |
-----------------------------