Rows in table with specific sum of a column


I have a table containing some payments looking something like this:
id | from | to | amount
--------------------------
1 | 125 | 135 | 2.4
2 | 123 | 134 | 1.7
3 | 124 | 138 | 4.8
4 | 118 | 119 | 3.9
5 | 56 | 254 | 23.5
...
I need to know whether there is a way to write an SQL query that tells me if there is a series of consecutive rows whose amounts sum to a certain value. For example, if I wanted the value 6.5, it would return rows 2 to 3. If I wanted 12.8, it would return rows 1 to 4, and so on.
I am absolutely stuck and would appreciate some help.

I would approach this as follows. First, calculate the cumulative sum. Then the condition that consecutive rows have a particular sum is equivalent to saying that the difference between two of the cumulative sums equals that value. Adding p1.amount back makes the range include its first row, so a run that starts at the first row of the table is found as well.
with p as (
select p.*, sum(amount) over (order by id) as cumamount
from payments p
)
select p1.id as first_id, p2.id as last_id
from p p1 join
p p2
on p1.id <= p2.id and
( p2.cumamount - p1.cumamount + p1.amount ) = 6.5;
As a note: this will probably not work if amount is stored as a floating point number, because of very small inaccuracies. If amount were an integer, it would be fine, but it clearly is not. A fixed point representation should be ok.
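To make the cumulative-sum idea concrete, here is a minimal sketch that runs the same kind of query against an in-memory SQLite database from Python. It rests on a few assumptions that are not in the question: the amounts are stored as integer cents (the hypothetical column amount_cents) to sidestep the floating point issue mentioned above, only the id and amount columns are modelled, and the function name find_runs is made up for illustration. It also needs an SQLite build with window function support (3.25 or later).

```python
import sqlite3


def find_runs(amounts_cents, target_cents):
    """Return (first_id, last_id) pairs of consecutive rows summing to target_cents."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: ids are assigned 1, 2, 3, ... in input order.
    conn.execute(
        "CREATE TABLE payments (id INTEGER PRIMARY KEY, amount_cents INTEGER)"
    )
    conn.executemany(
        "INSERT INTO payments VALUES (?, ?)", enumerate(amounts_cents, start=1)
    )
    query = """
        WITH p AS (
            SELECT id, amount_cents,
                   SUM(amount_cents) OVER (ORDER BY id) AS cum
            FROM payments
        )
        SELECT p1.id, p2.id
        FROM p AS p1
        JOIN p AS p2
          ON p1.id <= p2.id
         -- cum(p2) - cum(p1) covers rows p1.id+1 .. p2.id; add p1's own amount back
         AND p2.cum - p1.cum + p1.amount_cents = ?
        ORDER BY p1.id, p2.id
    """
    rows = conn.execute(query, (target_cents,)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    sample = [240, 170, 480, 390, 2350]  # 2.4, 1.7, 4.8, 3.9, 23.5 as cents
    print(find_runs(sample, 650))
    print(find_runs(sample, 1280))
```

With the sample amounts from the question, a target of 650 yields the pair (2, 3) and a target of 1280 yields (1, 4).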

;with numbers as (select number from master..spt_values where type='p' and number between 1 and (Select MAX(id) from yourtable)),
ranges as ( select n1.number as start, n2.number as finish from numbers n1 cross join numbers n2 where n1.number<=n2.number)
select yourtable.* from yourtable
inner join
(
select start, finish
from ranges
inner join yourtable on id between start and finish
group by start, finish
having SUM(amount)=12.8
) results
on yourtable.id between start and finish

Related

how can i get an accurate count based on max date when joining 3 tables when one of the join fields is many to 1 in oracle?

So, I have 3 tables that I am attempting to get counts for based on a groupid, and a task code. There are a few issues I am having as some of the relationships are many to one, which I think is somehow inflating my counts. I will list my 3 tables with the pertinent attributes.
task_table contains:
task_code - I would like to get the count of each one per group id, using the latest instance based on event date.
sol_id - used to join to worktable; many sol_id to one m_id is possible
edate - need to use this to get one record
cur_id - cur_id = 1 goes in the where clause
worktable contains:
sol_id - used to join to task_table
m_id - used to join to grouptable
grouptable contains:
m_id
groupid- used to group the task_code to get count
I'd like the end result to look like:
group_id task_count task
5555 45 A
5555 4 N
5624 67 A
5624 23 O
5624 42 X
I have been attempting to run a number of queries, but the counts I am getting back do not look correct. I am concerned that the query is returning more than one instance of each m_id. Here is the query in question:
select c.groupid, count(c.groupid) group_count, a.task_code from task_table a
join worktable b
on a.sol_id = b.sol_id
join grouptable c
on b.m_id= c.m_id
where a.cur_id = 1 and a.task_code is not null
group by c.groupid, a.task_code;
If I add 'edate = (select max(edate) from task_table)' to the where clause, it returns an empty table.
I am unsure how to incorporate edate to get only the newest record that fits the criteria in the where clause. The reason I think I want to use it is that more than one sol_id can be associated with an m_id, so I'd like to include only the newest record with cur_id = 1 in the count. Thank you for your time.
sample data
task_table
task_code sol_id edate cur_id
A 23 6/7/09 1
A 24 6/4/09 1
A 23 6/10/09 0
B 45 6/2/09 1
B 42 6/3/09 1
C 34 10/8/10 0
C 83 9/10/09 1
work table
sol_id m_id
23 1234
24 1234
45 1832
42 1343
83 7623
group table
m_id group_id
1234 A76
1832 Y23
1343 A76
7623 Y23
looking at these tables, the result should look like the following
group_id task_count task
A76 2 A
Y23 1 C
( A76 should only count sol_id 23 and 42)
( Y23 should only count sol_id 83)

So, there's a conflict in your requested result. According to your own sample, A76 should have a task_count of 2: sol_id 23, which has Task A, and sol_id 42, which has Task B. It's not possible to return a row like the one in your example result table, because that would require grouping by TASK_CODE, which means losing the COUNT(task_code) across tasks. Can't have it both ways.
To obtain only the most recent edate, I did a separate calculation to locate the max(edate) by task_code, then joined it back to obtain the sol_id. If this isn't accurate for your data set, you'll need to determine another way of obtaining max(edate). This works for your sample set.
with recentTasks as (
select task_code, max(edate) as recentDate
from task_table m
where cur_id = 1
and task_code is not null
group by task_code
), recentTaskWithSols as (
select m.task_code, m.recentDate as edate, t.sol_id
from recentTasks m
join task_table t on m.task_code = t.task_code AND m.recentDate = t.edate
where t.cur_id = 1
)
select c.group_id,
count(a.sol_id) task_count
from group_table c
join work_table b on c.m_id = b.m_id
join recentTaskWithSols a on b.sol_id = a.sol_id
group by c.group_id;
gives the result:
+----------+------------+
| GROUP_ID | TASK_COUNT |
+----------+------------+
| A76      | 2          |
| Y23      | 1          |
+----------+------------+
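For anyone who wants to try the approach without an Oracle instance, the following is a rough sketch of the same two steps (latest edate per task_code, then join back and count per group) run against in-memory SQLite from Python. Assumptions to note: the sample dates are read as month/day/year and rewritten as ISO strings so that MAX compares them correctly, the table names follow the answer's work_table and group_table spelling, and the function name count_latest_tasks is invented for the example.

```python
import sqlite3


def count_latest_tasks():
    """Count, per group, the sol_ids that carry the newest cur_id = 1 row of each task_code."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE task_table (task_code TEXT, sol_id INTEGER, edate TEXT, cur_id INTEGER);
        CREATE TABLE work_table (sol_id INTEGER, m_id INTEGER);
        CREATE TABLE group_table (m_id INTEGER, group_id TEXT);
        -- Sample rows from the question; dates assumed to be m/d/yy, stored as ISO text.
        INSERT INTO task_table VALUES
            ('A', 23, '2009-06-07', 1), ('A', 24, '2009-06-04', 1),
            ('A', 23, '2009-06-10', 0), ('B', 45, '2009-06-02', 1),
            ('B', 42, '2009-06-03', 1), ('C', 34, '2010-10-08', 0),
            ('C', 83, '2009-09-10', 1);
        INSERT INTO work_table VALUES
            (23, 1234), (24, 1234), (45, 1832), (42, 1343), (83, 7623);
        INSERT INTO group_table VALUES
            (1234, 'A76'), (1832, 'Y23'), (1343, 'A76'), (7623, 'Y23');
    """)
    query = """
        WITH recent AS (
            -- Step 1: newest edate per task_code among current rows.
            SELECT task_code, MAX(edate) AS edate
            FROM task_table
            WHERE cur_id = 1 AND task_code IS NOT NULL
            GROUP BY task_code
        )
        -- Step 2: join back to recover sol_id, then count per group.
        SELECT g.group_id, COUNT(t.sol_id) AS task_count
        FROM recent r
        JOIN task_table t
          ON t.task_code = r.task_code AND t.edate = r.edate AND t.cur_id = 1
        JOIN work_table w ON w.sol_id = t.sol_id
        JOIN group_table g ON g.m_id = w.m_id
        GROUP BY g.group_id
        ORDER BY g.group_id
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(count_latest_tasks())
```

On the sample data this gives A76 a count of 2 (sol_id 23 and 42) and Y23 a count of 1 (sol_id 83), matching the table above.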

Running "distinct on" across all unique thresholds in a postgres table

I have a Postgres 11 table called sample_a that looks like this:
time | cat | val
------+-----+-----
1 | 1 | 5
1 | 2 | 4
2 | 1 | 6
3 | 1 | 9
4 | 3 | 2
I would like to create a query that for each unique timestep, gets the most recent values across each category at or before that timestep, and aggregates these values by taking the sum of these values and dividing by the count of these values.
I believe I have the query to do this for a given timestep. For example, for time 3 I can run the following query:
select sum(val)::numeric / count(val) as result from (
select distinct on (cat) * from sample_a where time <= 3 order by cat, time desc
) x;
and get 6.5. (This is because at time 3, the latest value from category 1 is 9 and the latest from category 2 is 4. The count of the values is 2, they sum to 13, and 13 / 2 is 6.5.)
However, I would ideally like to run a query that will give me all the results for each unique time in the table. The output of this new query would look as follows:
time | result
------+----------
1 | 4.5
2 | 5
3 | 6.5
4 | 5
This new query ideally would avoid adding another subselect clause if possible; an efficient query would be preferred. I could get these prior results by running the prior query inside my application for each timestep, but this doesn't seem efficient for a large sample_a.
What would this new query look like?

See if performance is acceptable this way. Syntax might need minor tweaks:
select t.time, avg(mr.val) as result
from (select distinct time from sample_a) t,
lateral (
select distinct on (cat) val
from sample_a a
where a.time <= t.time
order by a.cat, a.time desc
) mr
group by t.time
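As a cross-check for whatever query you settle on, the expected numbers can be reproduced outside the database with a single pass over the rows sorted by time. This is only a sketch of the logic in the question (keep the latest value per category, average after each timestep), not a replacement for the SQL. The function name running_latest_average is made up, and it assumes that when a category appears twice at the same time, the row that comes later in the input wins.

```python
from itertools import groupby
from operator import itemgetter


def running_latest_average(rows):
    """rows: iterable of (time, cat, val). Returns a list of (time, average)."""
    latest = {}   # most recent val seen so far for each cat
    results = []
    ordered = sorted(rows, key=itemgetter(0))  # stable sort by time
    for time, group in groupby(ordered, key=itemgetter(0)):
        # Apply every row of this timestep before averaging.
        for _, cat, val in group:
            latest[cat] = val
        results.append((time, sum(latest.values()) / len(latest)))
    return results


if __name__ == "__main__":
    sample_a = [(1, 1, 5), (1, 2, 4), (2, 1, 6), (3, 1, 9), (4, 3, 2)]
    for time, result in running_latest_average(sample_a):
        print(time, result)
```

For the sample_a rows in the question this produces 4.5, 5.0, 6.5 and 5.0 for times 1 to 4, the same values as the desired output table.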

I think you just want cumulative functions:
select time,
sum(sum_val) over (order by time) / sum(num_val) over (order by time) as result
from (select time, sum(val) as sum_val, count(*) as num_val
from sample_a a
group by time
) a;
Note if val is an integer, you might need to convert to a numeric to get fractional values.
This can be expressed without a subquery as well:
select time,
sum(sum(val)) over (order by time) / sum(count(*)) over (order by time) as result
from sample_a
group by time

Cumulative count of duplicates

For a table looking like
ID | Value
-------------
1 | 2
2 | 10
3 | 3
4 | 2
5 | 0
6 | 3
7 | 3
I would like to calculate the number of IDs with a higher Value, for each Value that appears in the table, i.e.
Value | Position
----------------
10 | 0
3 | 1
2 | 4
0 | 6
This equates to the offset of the Value in an ORDER BY Value DESC ordering.
I have considered doing this by calculating the number of duplicates with something like
SELECT Value, count(*) AS ct FROM table GROUP BY Value;
and then cumulating the result, but I guess that is not the optimal way to do it (nor have I managed to combine the commands accordingly).
How would one go about calculating this efficiently (for several dozens of thousands of rows)?

This seems like a perfect opportunity for the window function rank() (not the related dense_rank()):
SELECT DISTINCT ON (value)
value, rank() OVER (ORDER BY value DESC) - 1 AS position
FROM tbl
ORDER BY value DESC;
rank() starts with 1, while your count starts with 0, so subtract 1.
A DISTINCT step removes the duplicate rows after the ranks have been computed, since DISTINCT is applied after window functions (DISTINCT ON is slightly cheaper here). Details in this related answer:
Best way to get result count before LIMIT was applied
Result exactly as requested.
An index on value will help performance.
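If you want to experiment with the rank() idea locally, here is a small sketch using Python's sqlite3 module. SQLite has no DISTINCT ON, so plain DISTINCT stands in for it here; that substitution, the function name positions, and the assumption of an SQLite build with window functions (3.25 or later) are mine, not part of the answer above.

```python
import sqlite3


def positions(values):
    """Return (value, number of rows with a strictly higher value) per distinct value."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, value INTEGER)")
    conn.executemany("INSERT INTO tbl (value) VALUES (?)", [(v,) for v in values])
    rows = conn.execute("""
        SELECT DISTINCT value,
               rank() OVER (ORDER BY value DESC) - 1 AS position  -- rank() starts at 1
        FROM tbl
        ORDER BY value DESC
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for value, position in positions([2, 10, 3, 2, 0, 3, 3]):
        print(value, position)
```

With the seven sample values from the question, this prints the pairs 10/0, 3/1, 2/4 and 0/6.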

You might also try this if you're not comfortable with window functions:
SELECT t1.value, COUNT(DISTINCT t2.id) AS position
FROM tbl t1 LEFT OUTER JOIN tbl t2
ON t1.value < t2.value
GROUP BY t1.value
Note the self-join.

Implementing Hierarchy in SQL

Suppose I have a table which has a "CDATE" representing the date when I retrieved the data, a "SECID" identifying the security I retrieved data for, a "SOURCE" designating where I got the data, and the "VALUE" which I got from the source. My data might look as follows:
CDATE | SECID | SOURCE | VALUE
--------------------------------
1/1/2012 1 1 23
1/1/2012 1 5 45
1/1/2012 1 3 33
1/4/2012 2 5 55
1/5/2012 1 5 54
1/5/2012 1 3 99
Suppose I have a HIERARCHY table like the following ("SOURCE" with greatest HIERARCHY number takes precedence):
SOURCE | NAME | HIERARCHY
---------------------------
1 ABC 10
3 DEF 5
5 GHI 2
Now let's suppose I want my results to be picked according to the hierarchy above. Applying the hierarchy and selecting, for each date and security, the source with the greatest HIERARCHY number, I would like to end up with the following:
CDATE | SECID | SOURCE | VALUE
---------------------------------
1/1/2012 1 1 23
1/4/2012 2 5 55
1/5/2012 1 3 99

This joins on your hierarchy and selects the top-ranked source for each date and security.
SELECT CDATE, SECID, SOURCE, VALUE
FROM (
SELECT t.CDATE, t.SECID, t.SOURCE, t.VALUE,
ROW_NUMBER() OVER (PARTITION BY t.CDATE, t.SECID
ORDER BY h.HIERARCHY DESC) as nRow
FROM table1 t
INNER JOIN table2 h ON h.SOURCE = t.SOURCE
) A
WHERE nRow = 1
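The ROW_NUMBER() approach can be tried on the sample data with the sketch below, which uses Python's sqlite3 module. The table names readings and hierarchy are placeholders of mine (the question does not name its tables), the dates are assumed to be month/day/year and stored as ISO strings, and the function name pick_preferred_sources is invented for the example. Window functions require SQLite 3.25 or later.

```python
import sqlite3


def pick_preferred_sources():
    """Keep, per (cdate, secid), the row whose source has the greatest hierarchy number."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE readings (cdate TEXT, secid INTEGER, source INTEGER, value INTEGER);
        CREATE TABLE hierarchy (source INTEGER, name TEXT, hierarchy INTEGER);
        INSERT INTO readings VALUES
            ('2012-01-01', 1, 1, 23), ('2012-01-01', 1, 5, 45),
            ('2012-01-01', 1, 3, 33), ('2012-01-04', 2, 5, 55),
            ('2012-01-05', 1, 5, 54), ('2012-01-05', 1, 3, 99);
        INSERT INTO hierarchy VALUES (1, 'ABC', 10), (3, 'DEF', 5), (5, 'GHI', 2);
    """)
    rows = conn.execute("""
        SELECT cdate, secid, source, value
        FROM (
            SELECT r.cdate, r.secid, r.source, r.value,
                   -- Number the rows of each date/security, best source first.
                   ROW_NUMBER() OVER (PARTITION BY r.cdate, r.secid
                                      ORDER BY h.hierarchy DESC) AS n_row
            FROM readings r
            JOIN hierarchy h ON h.source = r.source
        ) ranked
        WHERE n_row = 1
        ORDER BY cdate, secid
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in pick_preferred_sources():
        print(row)
```

On the sample rows this keeps source 1 for 1/1/2012, source 5 for 1/4/2012 and source 3 for 1/5/2012, as in the desired result.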

You can get the results you want with the query below. It combines your data with your hierarchies and ranks the rows by hierarchy, highest first. Note that if a source is repeated for the same date and security, only one of those rows is returned, chosen arbitrarily.
;with rankMyData as (
select
d.CDATE
, d.SECID
, d.SOURCE
, d.VALUE
, row_number() over(partition by d.CDate, d.SECID order by h.HIERARCHY desc) as ranking
from DATA d
inner join HIERARCHY h
on h.source = d.source
)
SELECT
CDATE
, SECID
, SOURCE
, VALUE
FROM rankMyData
where ranking = 1

Is there a way to get data from 2 tables without creating a Cartesian product?

In our database a customer can have any number of drivers, any number of vehicles, any number of storage locations, any number of buildings at those locations, any number of comments, and so on. I need a query that returns all of the customer's information and right now the query is something like:
SELECT *
FROM Customer c
INNER JOIN Driver d ON c.ID = d.CustomerID
INNER JOIN Vehicle v ON c.ID = v.CustomerID
The more a customer has, the bigger the result gets, and it grows multiplicatively with every table joined because a Cartesian product is being created here. 3 drivers and 3 vehicles create 9 rows, and this is a very small example compared to what our real data is like. We actually have 10 different tables that can hold as many rows per customer as they want. The norm is at least 2-7 rows per table per customer. We have had as many as 60,000,000+ rows returned (6 items each in 10 different tables, 6^10 = 60,466,176), and for our purposes 6 rows total would have given us all the data we needed if we could just stick the 6 rows in each table together.
So in the smaller example, if one customer had 3 vehicles and 2 drivers and another customer had 1 vehicle and 2 drivers, I would want a result set that looked like:
CustomerID | DriverID | VehicleID
1 | 1 | 1
1 (or NULL) | 2 | 2
1 (or NULL) | NULL | 3
2 | 3 | 4
2 (or NULL) | 4 | NULL
Instead our query that joins every table together on CustomerID looks like this:
CustomerID | DriverID | VehicleID
1 | 1 | 1
1 | 1 | 2
1 | 1 | 3
1 | 2 | 1
1 | 2 | 2
1 | 2 | 3
2 | 3 | 4
2 | 4 | 4
Really, what I want to do is just:
SELECT * FROM Driver
SELECT * FROM Vehicle
Because all we are doing with the data is looping through the rows and formatting the information in a document. All drivers are listed, then all vehicles are listed. It makes no sense to do this crazy huge join when we don't have to, but it's just an arbitrary requirement that it must return all the data in 1 result set from a stubborn superior who refuses to listen to reason. Since the columns are different a UNION isn't possible. I'm just hoping there's a way to stick them together horizontally instead of vertically.
Also, I'm using Microsoft SQL Server.

It's an ugly hack, but you know your proper solution is just as you state:
SELECT * FROM Driver
SELECT * FROM Vehicle
Instead you could use a union query and blank out the columns from the other tables. Just start it with a query that sets the types and names of the columns, with a false condition so it doesn't return a row:
SELECT 1 AS DriverID, '' AS DriverName, 1 AS VehicleID, '' AS VehicleName WHERE 1=0
UNION SELECT DriverID, DriverName, NULL, NULL FROM Driver
UNION SELECT NULL, NULL, VehicleID, VehicleName FROM Vehicle
Really, really bad code! Keep working on your superior to allow a better solution.

Here's how I'm doing it. Instead of:
SELECT *
FROM Customer c
INNER JOIN Driver d ON c.ID = d.CustomerID
INNER JOIN Vehicle v ON c.ID = v.CustomerID
I'm doing:
WITH CustomerCTE AS
(
SELECT 1 ROW_NUM, ID
FROM Customer
),
DriverCTE AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY ID) ROW_NUM, *
FROM Driver
),
VehicleCTE AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY ID) ROW_NUM, *
FROM Vehicle
)
SELECT *
FROM CustomerCTE c
FULL OUTER JOIN DriverCTE d ON c.ID = d.CustomerID AND c.ROW_NUM = d.ROW_NUM
FULL OUTER JOIN VehicleCTE v ON d.CustomerID = v.CustomerID AND d.ROW_NUM = v.ROW_NUM
ORDER BY
CASE WHEN c.ID IS NOT NULL THEN c.ID ELSE
CASE WHEN d.CustomerID IS NOT NULL THEN d.CustomerID ELSE
v.CustomerID
END
END,
CASE WHEN c.ROW_NUM IS NOT NULL THEN c.ROW_NUM ELSE
CASE WHEN d.ROW_NUM IS NOT NULL THEN d.ROW_NUM ELSE
v.ROW_NUM
END
END
Now if a customer has 3 drivers and 3 vehicles I get 3 rows instead of 9 rows. It makes it look like each driver is associated with 1 of the 3 vehicles, but that is not actually the case. Again, this is bad design, but it is necessary to cut down on the number of rows returned with the unreasonable restrictions I was given.
It looks like more work than webturner's answer, but in my real case, where I have to join 10 different tables with over 500 columns, it's a lot less work to do it this way than to explicitly name all 500 columns and fill in all of the remaining columns from each table with NULL.
Though, this may not be of much use to most people. In most cases if you're doing something like this you probably need to rethink your design, but there may be some cases where you have no choice.
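The effect of matching on a per-customer row number can be illustrated outside SQL as well. The sketch below pairs each customer's drivers and vehicles positionally with itertools.zip_longest, which plays the role of the FULL OUTER JOIN on ROW_NUM. It is only an illustration of the idea, with made-up input shapes (lists of (customer_id, item_id) tuples) and a hypothetical function name side_by_side.

```python
from collections import defaultdict
from itertools import zip_longest


def side_by_side(drivers, vehicles):
    """drivers, vehicles: lists of (customer_id, item_id).

    Returns (customer_id, driver_id, vehicle_id) rows, one per position,
    padding the shorter list with None instead of building a Cartesian product.
    """
    drivers_by_customer = defaultdict(list)
    vehicles_by_customer = defaultdict(list)
    for customer_id, driver_id in drivers:
        drivers_by_customer[customer_id].append(driver_id)
    for customer_id, vehicle_id in vehicles:
        vehicles_by_customer[customer_id].append(vehicle_id)

    rows = []
    for customer_id in sorted(set(drivers_by_customer) | set(vehicles_by_customer)):
        # Sorting by id mirrors ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY ID).
        d = sorted(drivers_by_customer.get(customer_id, []))
        v = sorted(vehicles_by_customer.get(customer_id, []))
        for driver_id, vehicle_id in zip_longest(d, v):
            rows.append((customer_id, driver_id, vehicle_id))
    return rows


if __name__ == "__main__":
    drivers = [(1, 1), (1, 2), (2, 3), (2, 4)]
    vehicles = [(1, 1), (1, 2), (1, 3), (2, 4)]
    for row in side_by_side(drivers, vehicles):
        print(row)
```

For the small example in the question this gives five rows, (1, 1, 1), (1, 2, 2), (1, None, 3), (2, 3, 4) and (2, 4, None), instead of the eight rows the plain join returns.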
