How to sum only the first row for each group in a result set - SQL

OK, I'll try to explain this as best I can. I have the following:
I have a data source that is basically a dynamic query. The query itself returns three fields: Name, Amount1, Amount2.
Now, I could have rows with the same Name. The idea is to sum Amount1 + Amount2 only WHEN Name is different from the previous one I saved. If I were doing this in C#, it could be something like this:
foreach (DataRow dr in repDset.Dataset.Rows)
{
    total = (long)dr["Amount1"] + (long)dr["Amount2"];
    if (thisconditiontrue)
    {
        if (PreviousName == "" || PreviousName != dr["Name"].ToString())
        {
            TotalName = TotalName + total;
        }
        PreviousName = dr["Name"].ToString();
    }
}
The idea is to turn this into a Reporting Services expression using the functions RS gives me, for example:
IIF(Fields!Name.Value<>Previous(Fields!Name.Value),Fields!Amount1.Value + Fields!Amount2.Value,False)
Something like that, but that stores the amount from the previous row.
Maybe by creating another field? A calculated one?
I can clarify further and edit if needed.
EDIT for visual clarification: as an example, it is something like this (image omitted):

This query assumes you're working with SQL Server. You're also going to need something to order the query results by; otherwise, how do you know which row is the first one?
SELECT SUM(NameTotal) AS Total
FROM (
    SELECT Name, Amount1 + Amount2 AS NameTotal,
           ROW_NUMBER() OVER (PARTITION BY Name ORDER BY OrderField) AS rowNum
    FROM srcTable
) AS a
WHERE rowNum = 1;
This uses the window function ROW_NUMBER() to number each row, and the PARTITION BY clause tells it to restart the numbering for every different value of Name in the result set (note that PARTITION BY comes before ORDER BY inside OVER()). You do need a field you can order the results by, though, or this won't work. If you really just want a random order you can do ORDER BY NEWID(), but that will give you a non-deterministic result.
This syntax is particular to SQL Server, but the same result can usually be achieved in other databases.
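For instance, here is a minimal runnable sketch of the same approach using SQLite (which also has ROW_NUMBER(), in version 3.25+) via Python's sqlite3 module. The table, the OrderField column, and the sample rows are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE srcTable (Name TEXT, Amount1 INTEGER, Amount2 INTEGER, OrderField INTEGER)"
)
conn.executemany("INSERT INTO srcTable VALUES (?, ?, ?, ?)", [
    ("A", 10, 5, 1),   # first row for A -> counted (10 + 5 = 15)
    ("A", 99, 99, 2),  # later row for A -> ignored
    ("B", 3, 4, 1),    # first row for B -> counted (3 + 4 = 7)
])

# Sum Amount1 + Amount2 over the first row of each Name only
total = conn.execute("""
    SELECT SUM(NameTotal) AS Total
    FROM (
        SELECT Name, Amount1 + Amount2 AS NameTotal,
               ROW_NUMBER() OVER (PARTITION BY Name ORDER BY OrderField) AS rowNum
        FROM srcTable
    ) AS a
    WHERE rowNum = 1
""").fetchone()[0]
print(total)  # 15 + 7 = 22
```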
If you're looking to display the output like you've shown in your example, you could use two datasets and reference the other one by passing its name as the scope to an aggregate function in an SSRS expression, like this:
=MAX(Fields!Total.Value, "TotalQueryDataset")
Where your dataset is called "TotalQueryDataset".
Otherwise you can achieve the output using pure SQL like this:
WITH nameTotals AS (
    SELECT Name, Amount1, Amount2,
           ROW_NUMBER() OVER (PARTITION BY Name ORDER BY OrderField) AS rowNum
    FROM srcTable
)
SELECT Name, Amount1, Amount2
FROM nameTotals
UNION ALL
SELECT 'Total', SUM(Amount1 + Amount2), NULL
FROM nameTotals
WHERE rowNum = 1;
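Under the same invented-data assumptions as before (a made-up srcTable with an OrderField column), this detail-rows-plus-total-row shape can be sketched in SQLite via Python's sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE srcTable (Name TEXT, Amount1 INTEGER, Amount2 INTEGER, OrderField INTEGER)"
)
conn.executemany("INSERT INTO srcTable VALUES (?, ?, ?, ?)", [
    ("A", 10, 5, 1), ("A", 99, 99, 2), ("B", 3, 4, 1),
])

# All detail rows, plus a 'Total' row that sums only the first row per Name
rows = conn.execute("""
    WITH nameTotals AS (
        SELECT Name, Amount1, Amount2,
               ROW_NUMBER() OVER (PARTITION BY Name ORDER BY OrderField) AS rowNum
        FROM srcTable
    )
    SELECT Name, Amount1, Amount2 FROM nameTotals
    UNION ALL
    SELECT 'Total', SUM(Amount1 + Amount2), NULL
    FROM nameTotals
    WHERE rowNum = 1
""").fetchall()
for r in rows:
    print(r)
# The 'Total' row sums only the first row per Name: (10+5) + (3+4) = 22
```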

Related

Iterate through Oracle SQL query results line by line, and produce sub-queries - not running efficiently

I have the below query (a simplified example of my query, for readability):
SELECT make, year, color, count(*)
FROM cars
GROUP BY make, year, color
ORDER BY 4 DESC;
I want to iterate through the resulting table and produce sub-queries for the criteria of each row (examples below). I then hope to use these sub-queries to build a single table with sample results (maybe 3 rows) that meet the criteria of each row from the original query results (e.g., there are multiple Jeeps from 2019 in the color black).
SELECT * from cars
WHERE make = 'Jeep'
AND year = '2019'
AND color = 'Black';
SELECT * from cars
WHERE make = 'Ford'
AND year = '2018'
AND color = 'Red';
This may seem like an odd or unnecessary request, but I believe it is the best approach given the complexity of my actual problem. I want a simplified solution that I can come back to and alter for future use and for different variations of queries.
I am currently using ROW_NUMBER() to retrieve a maximum of three rows per group (below). Although this compiles, it has never run to completion because of its very long runtime. When I go through the process manually (which I hope to automate with this query), producing the desired output doesn't take too long (an hour or two). When I run this solution, however, it keeps running all day and then Oracle kills the process because the database connection times out. Does anyone have a better approach to this problem, or a way to make this run more efficiently?
select *
from (
    select c.*,
           row_number() over (partition by make, year, color order by id) as rn
    from cars c
) x
where rn <= 3
NOTE: I am using Oracle SQL Developer
You can get all the queries by dynamically creating another column, like:
SELECT DISTINCT make, year, color,
       'SELECT * from cars WHERE make =''' || make || ''' AND year = ''' || year || ''' AND color = ''' || color || '''' AS SELECT_STATEMENTS
FROM (
    select *
    from (
        select c.*,
               row_number() over (partition by make, year, color order by id) as rn
        from cars c
    ) x
    where rn <= 3
)
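The string-concatenation trick above can be exercised end to end in SQLite via Python's sqlite3; the cars table and its rows are invented for illustration, and each generated statement is itself executable:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (id INTEGER, make TEXT, year TEXT, color TEXT)")
conn.executemany("INSERT INTO cars VALUES (?, ?, ?, ?)", [
    (1, "Jeep", "2019", "Black"),
    (2, "Jeep", "2019", "Black"),
    (3, "Ford", "2018", "Red"),
])

# Build one SELECT statement per distinct (make, year, color) group;
# '' inside a SQL string literal is an escaped single quote
rows = conn.execute("""
    SELECT DISTINCT make, year, color,
           'SELECT * FROM cars WHERE make = ''' || make ||
           ''' AND year = ''' || year ||
           ''' AND color = ''' || color || '''' AS SELECT_STATEMENTS
    FROM (
        SELECT * FROM (
            SELECT c.*, ROW_NUMBER() OVER (PARTITION BY make, year, color
                                           ORDER BY id) AS rn
            FROM cars c
        ) x
        WHERE rn <= 3
    )
""").fetchall()
stmts = [r[3] for r in rows]
for s in stmts:
    print(s)  # each string is a runnable query against cars
```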

Using Multiple aggregate functions in the where clause

We have a SELECT statement in production that takes quite a lot of time.
The current query uses ROW_NUMBER(), a window function.
I am trying to rewrite the query and test it. My assumption is that, since it's an ORC table, fetching aggregate values instead of using ROW_NUMBER may help reduce the execution time.
Is something like this possible? Let me know if I am missing anything.
Sorry, I am trying to learn, so please bear with my mistakes, if any.
I tried to rewrite the query as shown below.
Original query:
SELECT
    Q.id,
    Q.crt_ts,
    Q.upd_ts,
    Q.exp_ts,
    Q.biz_effdt
FROM (
    SELECT u.id, u.crt_ts, u.upd_ts, u.exp_ts, u.biz_effdt,
           ROW_NUMBER() OVER (PARTITION BY u.id ORDER BY u.crt_ts DESC) AS ROW_N
    FROM (
        SELECT cust_prd.id, cust_prd.crt_ts, cust_prd.upd_ts, cust_prd.exp_ts, cust_prd.biz_effdt
        FROM MSTR_CORE.cust_prd
        WHERE biz_effdt IN (SELECT MAX(cust_prd.biz_effdt) FROM MSTR_CORE.cust_prd)
    ) U
) Q
WHERE Q.row_n = 1
My attempt:
SELECT cust_prd.id, cust_prd.crt_ts, cust_prd.upd_ts, cust_prd.exp_ts, cust_prd.biz_effdt
FROM MSTR_CORE.cust_prd
WHERE biz_effdt IN (SELECT MAX(cust_prd.biz_effdt) FROM MSTR_CORE.cust_prd)
HAVING cust_prd.crt_ts = MAX(cust_prd.crt_ts)
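Assuming the intent is "latest crt_ts row per id, restricted to the max biz_effdt", one aggregate-based rewrite (a join on a grouped MAX instead of ROW_NUMBER) can be sketched like this in SQLite via Python's sqlite3. The table and data are invented stand-ins for MSTR_CORE.cust_prd, and unlike ROW_NUMBER this form returns multiple rows per id if crt_ts ties:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cust_prd (id INTEGER, crt_ts TEXT, biz_effdt TEXT)")
conn.executemany("INSERT INTO cust_prd VALUES (?, ?, ?)", [
    (1, "2020-01-01", "2020-12-31"),
    (1, "2020-02-01", "2020-12-31"),  # latest crt_ts for id 1
    (2, "2020-03-01", "2020-12-31"),
    (2, "2020-04-01", "2020-06-30"),  # older biz_effdt -> filtered out
])

# Latest crt_ts per id via GROUP BY + MAX instead of ROW_NUMBER
rows = conn.execute("""
    SELECT p.id, p.crt_ts
    FROM cust_prd p
    JOIN (
        SELECT id, MAX(crt_ts) AS max_ts
        FROM cust_prd
        WHERE biz_effdt = (SELECT MAX(biz_effdt) FROM cust_prd)
        GROUP BY id
    ) m ON p.id = m.id AND p.crt_ts = m.max_ts
    WHERE p.biz_effdt = (SELECT MAX(biz_effdt) FROM cust_prd)
    ORDER BY p.id
""").fetchall()
print(rows)  # [(1, '2020-02-01'), (2, '2020-03-01')]
```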

How to implement the LAG function in Teradata

Input and desired output were given as images (omitted here). I want the output as shown in the output image.
In the output image, the 4 in 'behind' is evaluated as tot_cnt - tot, and each subsequent number in 'behind' (e.g., the 2) is evaluated as lag(behind) - tot; as long as 'rank' remains the same, 'behind' should remain the same as well.
Can anyone please help me implement this in Teradata?
You appear to want:
select *, (select count(*)
           from table t1
           where t1.rank > t.rank
          ) as behind
from table t;
I would summarize the data and do:
select id, max(tot_cnt), max(tot),
       (max(tot_cnt) -
        sum(max(tot)) over (order by id rows between unbounded preceding and current row)
       ) as diff
from t
group by id;
This provides one row per id, which makes a lot more sense to me. If you want the original data rows (which are all duplicates anyway), you can join this back to your table.
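Since the original images aren't available, here is a hedged sketch of the summarize-then-running-sum idea with invented data, using SQLite via Python's sqlite3 as a stand-in for Teradata. The GROUP BY is pushed into a subquery so the window function runs over one row per id; the values are chosen so the first 'behind' is 4 and the next is 2, as described in the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, tot_cnt INTEGER, tot INTEGER)")
# invented data: duplicated rows per id, as the question describes
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, 10, 6), (1, 10, 6),
    (2, 10, 2),
    (3, 10, 1),
])

# One row per id, then a running sum of tot subtracted from tot_cnt
rows = conn.execute("""
    SELECT id, tot_cnt, tot,
           tot_cnt - SUM(tot) OVER (ORDER BY id
                                    ROWS BETWEEN UNBOUNDED PRECEDING
                                             AND CURRENT ROW) AS diff
    FROM (
        SELECT id, MAX(tot_cnt) AS tot_cnt, MAX(tot) AS tot
        FROM t
        GROUP BY id
    )
    ORDER BY id
""").fetchall()
print(rows)  # diff: 10-6=4, then 10-(6+2)=2, then 10-(6+2+1)=1
```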

SQL - after GROUP BY I need to take rows with the newest date

I need to write a query in SQL and I can't get it right. I have a table with 7 columns: 1st_num, 2nd_num, 3rd_num, opening_Date, Amount, code, cancel_Flag.
For every (1st_num, 2nd_num, 3rd_num) I want to take only the record with the minimum cancel_flag, and if there is more than one such row, the one with the newest opening_Date.
But when I do GROUP BY and take MIN and MAX of the relevant fields, I get a mix of the rows. For example:
1. 12,130,45678,2015-01-01,2005,333,0
2. 12,130,45678,2015-01-09,105,313,0
The result will be:
12,130,45678,2015-01-09,2005,333,0
which mixes the two rows into one.
Microsoft SQL Server 2008, using SSIS in Visual Studio 2008.
My code is:
SELECT
    1st_num,
    2nd_num,
    3rd_num,
    MAX(opening_date),
    MAX(Amount),
    code,
    MIN(cancel_flag)
FROM dbo.tablename
GROUP BY
    1st_num,
    2nd_num,
    3rd_num,
    code
HAVING COUNT(*) > 1
How do I take the row with the max date or min cancel_flag as it is, without mixing values?
I can't really post my code for security reasons, but I'm sure you can help.
thank you,
Oren
It is very difficult to answer like this, because every DBMS has different syntax.
Anyway, for most DBMSs this should work. It uses the ROW_NUMBER() function to rank the rows and takes only the first one by our definition (all your conditions):
SELECT *
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY t.1st_num, t.2nd_num, t.3rd_num
                              ORDER BY t.cancel_flag ASC, t.opening_date DESC) AS row_num
    FROM YourTable t
) AS tableTempName
WHERE row_num = 1
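A runnable sketch of this answer, using SQLite via Python's sqlite3 with the two sample rows from the question; the 1st_num/2nd_num/3rd_num columns are renamed num1/num2/num3 here because bare identifiers can't start with a digit:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE YourTable
    (num1 INTEGER, num2 INTEGER, num3 INTEGER,
     opening_Date TEXT, Amount INTEGER, code INTEGER, cancel_flag INTEGER)""")
conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (12, 130, 45678, "2015-01-01", 2005, 333, 0),
    (12, 130, 45678, "2015-01-09", 105, 313, 0),
])

# Rank rows per key: lowest cancel_flag first, then newest opening_Date
row = conn.execute("""
    SELECT num1, num2, num3, opening_Date, Amount, code, cancel_flag
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY t.num1, t.num2, t.num3
                                  ORDER BY t.cancel_flag ASC,
                                           t.opening_Date DESC) AS row_num
        FROM YourTable t
    ) AS tableTempName
    WHERE row_num = 1
""").fetchone()
print(row)  # (12, 130, 45678, '2015-01-09', 105, 313, 0) -- whole row kept intact
```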
Use NOT EXISTS to return a row as long as no other row with the same 1st_num, 2nd_num, 3rd_num has a lower cancel_flag value, or the same cancel_flag but a later opening_Date.
select *
from tablename t1
where not exists (select 1 from tablename t2
                  where t2.1st_num = t1.1st_num
                    and t2.2nd_num = t1.2nd_num
                    and t2.3rd_num = t1.3rd_num
                    and (t2.cancel_flag < t1.cancel_flag
                         or (t2.cancel_flag = t1.cancel_flag and
                             t2.opening_Date > t1.opening_Date)))
This is core ANSI SQL-99 and expected to work with (almost) any DBMS.
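The NOT EXISTS variant can be checked on the same two sample rows (SQLite via Python's sqlite3, with the digit-leading column names again renamed num1/num2/num3); it picks the same row as the ROW_NUMBER approach:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE tablename
    (num1 INTEGER, num2 INTEGER, num3 INTEGER,
     opening_Date TEXT, Amount INTEGER, code INTEGER, cancel_flag INTEGER)""")
conn.executemany("INSERT INTO tablename VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (12, 130, 45678, "2015-01-01", 2005, 333, 0),
    (12, 130, 45678, "2015-01-09", 105, 313, 0),
])

# Keep a row only if no same-key row beats it on (cancel_flag, opening_Date)
rows = conn.execute("""
    SELECT *
    FROM tablename t1
    WHERE NOT EXISTS (
        SELECT 1 FROM tablename t2
        WHERE t2.num1 = t1.num1
          AND t2.num2 = t1.num2
          AND t2.num3 = t1.num3
          AND (t2.cancel_flag < t1.cancel_flag
               OR (t2.cancel_flag = t1.cancel_flag
                   AND t2.opening_Date > t1.opening_Date))
    )
""").fetchall()
print(rows)  # [(12, 130, 45678, '2015-01-09', 105, 313, 0)]
```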

How to select rows in MySQL while a condition lasts

I have something like this:
Name  Value
A     10
B     9
C     8
Meaning, the values are in descending order. I need to create a new table that will contain the rows that make up 60% of the total value. So, this could be the pseudocode:
set Total = sum(value)
set counter = 0
foreach line from table OriginalTable do:
    counter = counter + value
    if counter > 0.6 * Total then break
    else insert line into FinalTable
end
As you can see, I'm iterating over the SQL rows here. I know this can be done using handlers, but I can't get it to work, so any solution, using handlers or something else creative, would be great.
It should also have a reasonable time complexity: the solution in "how to select values that sum up to 60% of the total" works, but it's slow as hell.
Thanks!
You'll likely need to use the lead() or lag() window function, possibly with a recursive query to merge the rows together. See this related question:
merge DATE-rows if episodes are in direct succession or overlapping
And in case you're using MySQL, you can work around the lack of window functions by using something like this:
Mysql query problem
I don't know which analytical functions SQL Server (which I assume you are using) supports; for Oracle, you could use something like:
select v.*,
       cumulative / overall percent_current,
       previous_cumulative / overall percent_previous
from (
    select
        id,
        name,
        value,
        cumulative,
        lag(cumulative) over (order by id) as previous_cumulative,
        overall
    from (
        select
            id,
            name,
            value,
            sum(value) over (order by id) as cumulative,
            (select sum(value) from mytab) overall
        from mytab
        order by id)
) v
Explanation:
- sum(value) over (order by id) computes a running total of the values
- lag() gives you the cumulative value from the previous row
- you can then combine these to find the first row where percent_current > 0.6 and percent_previous < 0.6
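Putting those pieces together on the sample data from the question, here is a sketch using SQLite via Python's sqlite3 as a stand-in (the id column is invented to give a stable ordering). Following the question's pseudocode, the row that crosses the 60% mark is excluded, so the filter keeps rows whose running total stays at or below 60% of the overall sum:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytab (id INTEGER, name TEXT, value INTEGER)")
conn.executemany("INSERT INTO mytab VALUES (?, ?, ?)", [
    (1, "A", 10), (2, "B", 9), (3, "C", 8),
])

# Keep rows while the running total is still within 60% of the overall sum
rows = conn.execute("""
    SELECT name
    FROM (
        SELECT name,
               SUM(value) OVER (ORDER BY id) AS cumulative,
               (SELECT SUM(value) FROM mytab) AS overall
        FROM mytab
    )
    WHERE cumulative <= 0.6 * overall
""").fetchall()
print(rows)  # total is 27, 60% is 16.2: only A (cumulative 10) qualifies
```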