Combining output rows by date - sql

I have the following problem with PL/SQL.
I need to retrieve data from a table for various parameters over a certain time period, but the output has duplicate rows per date: each column's value lands on its own row instead of being combined into one row per date. Could I please borrow your geniuses for this issue?
Here is my code (part of it, as it repeats the same for other parameters I need to deliver):
select /*+FULL(k)*/ k.date_n,
SUM(decode(bucket_flag_n,
'1',
(DECODE(type_s,
'MOC',
decode(on_off_net_s, 'On net', duration_sum),
'MOC_4',
decode(on_off_net_s, 'On net', duration_sum),
'MOC CF_4',
decode(on_off_net_s, 'On net', duration_sum),0)))) test1,
SUM(decode(bucket_flag_n,
'0',
(DECODE(type_s,
'MOC',
decode(on_off_net_s, 'Off net', duration_sum),
'MOC_4',
decode(on_off_net_s, 'Off net', duration_sum),
'MOC CF_4',
decode(on_off_net_s, 'Off net', duration_sum),0)))) test2
from (select /*+FULL(a)*/
a.d_timestamp date_n,
a.service_s type_s,
a.country_s,
a.on_off_net_s,
a.bucket_flag_n,
round(SUM(a.duration_n / 60)) duration_sum, --minutes rounded
SUM(a.count_n) sms_count, -- sms count
round(SUM(a.volume_n / 1024 / 1024)) volume_sum -- volume mb rounded
from database a, database2 b
where a.country_s = 'Country'
and a.free_of_charge_flag_n = '1'
and a.d_timestamp between b.date_from and b.date_to
group by a.d_timestamp,
a.service_s,
a.country_s,
a.on_off_net_s,
a.bucket_flag_n) k
group by k.date_n, bucket_flag_n
order by 1
Here is what I get on the output:
Thank you in advance!

Your group by clause is:
group by k.date_n, bucket_flag_n
If you only want one row per date, then change it to:
group by k.date_n
I would also suggest that you learn modern join syntax ("never use commas in the from clause") and replace decode() with case. However, those are syntactic conventions and don't affect the results from the query.
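To see the effect of the GROUP BY change, here is a minimal sketch that runs the same pivot twice against an in-memory SQLite database through Python. The table name daily_usage and its rows are made up for illustration, and CASE stands in for DECODE because SQLite has no DECODE:

```python
import sqlite3

# Hypothetical sample data: one "On net" and one "Off net" row per date.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_usage (date_n TEXT, bucket_flag_n TEXT, "
    "on_off_net_s TEXT, duration_sum INTEGER)"
)
conn.executemany(
    "INSERT INTO daily_usage VALUES (?, ?, ?, ?)",
    [
        ("2024-01-01", "1", "On net", 10),
        ("2024-01-01", "0", "Off net", 20),
        ("2024-01-02", "1", "On net", 5),
        ("2024-01-02", "0", "Off net", 7),
    ],
)

pivot = """
SELECT date_n,
       SUM(CASE WHEN bucket_flag_n = '1' AND on_off_net_s = 'On net'
                THEN duration_sum END) AS test1,
       SUM(CASE WHEN bucket_flag_n = '0' AND on_off_net_s = 'Off net'
                THEN duration_sum END) AS test2
FROM daily_usage
GROUP BY {cols}
ORDER BY 1
"""

# Grouping by the flag as well yields one row per (date, flag) pair,
# each with a NULL in the column that belongs to the other flag.
per_flag = conn.execute(pivot.format(cols="date_n, bucket_flag_n")).fetchall()

# Grouping by the date alone folds both flags into a single row per date.
per_date = conn.execute(pivot.format(cols="date_n")).fetchall()

print(per_flag)
print(per_date)
```

With the flag in the GROUP BY you get four rows; with the date alone you get two, each carrying both test1 and test2.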

There are several strange things going on here.
First off, you say:
Here is my code (part of it, as it repeats the same for other
parameters I need to deliver):
Which implies that all aggregate, non-grouped columns have the DECODE(...) containing 'MOC', 'MOC_4', and 'MOC CF_4' - if so, you can actually make those part of the WHERE clause, which may actually speed up your query (Assuming service_s has other codes not used in the query, and relevant indices).
The next thing is, you're using an inclusive upper-bound (<=, found in BETWEEN) with what appears to be a timestamp. This will give you wrong results - often, midnight of the next day is incorrectly included, although there are other possibilities too. When dealing with positive, contiguous-range types, you must use an exclusive upper-bound (<), or suffer the consequences: this is an inherent property of representation of numbers, and has nothing to do with implementation in a computer, or specific applications. (I also find the names somewhat poor, especially as d_timestamp doesn't really tell me anything about what it represents)
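The upper-bound point can be checked with a small sketch. It assumes timestamps stored as ISO-8601 text in SQLite (which compare correctly as strings) and a hypothetical table cdr; the column name d_timestamp is borrowed from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cdr (d_timestamp TEXT)")
conn.executemany(
    "INSERT INTO cdr VALUES (?)",
    [
        ("2024-01-01 00:00:00",),  # first instant of the day
        ("2024-01-01 23:59:59",),  # late on the same day
        ("2024-01-02 00:00:00",),  # midnight of the NEXT day
    ],
)

date_from, date_to = "2024-01-01 00:00:00", "2024-01-02 00:00:00"

# BETWEEN is inclusive on both ends, so next-day midnight sneaks in.
inclusive = conn.execute(
    "SELECT COUNT(*) FROM cdr WHERE d_timestamp BETWEEN ? AND ?",
    (date_from, date_to),
).fetchone()[0]

# Half-open range: inclusive lower bound, exclusive upper bound.
half_open = conn.execute(
    "SELECT COUNT(*) FROM cdr WHERE d_timestamp >= ? AND d_timestamp < ?",
    (date_from, date_to),
).fetchone()[0]

print(inclusive, half_open)
```

The BETWEEN version counts three rows, the half-open version two, so a row at exactly midnight would be attributed to two adjacent periods if the ranges in the lookup table touch.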
Math, and rounding issues:
If duration_n, count_n, and volume_n (...what does _n stand for? Why the suffix?) are INTEGER types and your DBMS performs integer division on them, ROUND(...) is unnecessary, as all math performed will be integer-based and return non-fractional amounts in the first place. (Oracle is an exception: its / operator returns a fractional NUMBER even for integer operands, so there ROUND(...) still does something.) Because division distributes over addition, you can rewrite SUM(a.duration_n / 60) as SUM(a.duration_n) / 60 (performance gains, if any, would be low). However, under integer division the two forms give different results, because the first truncates every row before summing; which one is correct is up to you. Strictly speaking, finite precision means the two can differ for any type, but the difference is most pronounced with an integral type.
Given some of the mentioned assumptions (namely, that all aggregate columns have the same DECODE(...)), we can simplify the query somewhat:
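As a quick illustration of the rounding point, the sketch below uses SQLite, which does truncate when both operands of / are integers (not every DBMS behaves this way). The table t and its two 90-second rows are invented for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (duration_n INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(90,), (90,)])  # seconds

row = conn.execute(
    """
    SELECT SUM(duration_n / 60)   AS divide_then_sum,  -- 1 + 1, truncated per row
           SUM(duration_n) / 60   AS sum_then_divide,  -- 180 / 60
           SUM(duration_n / 60.0) AS fractional        -- 1.5 + 1.5
    FROM t
    """
).fetchone()

print(row)
```

Dividing per row loses half a minute on each row, so the first column comes out one minute short of the other two.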
SELECT A.d_timestamp AS date_n,
SUM(CASE WHEN A.bucket_flag_n = '1' AND A.on_off_net_s = 'On net'
THEN A.duration_n END) / 60 AS test1,
SUM(CASE WHEN A.bucket_flag_n = '0' AND A.on_off_net_s = 'Off net'
THEN A.duration_n END) / 60 AS test2
FROM Database A
JOIN Database2 B
ON A.d_timestamp >= B.date_from
AND A.d_timestamp < B.date_to
WHERE A.country_s = 'Country'
AND A.free_of_charge_flag_n = '1'
AND A.service_s IN ('MOC', 'MOC_4', 'MOC CF_4')
AND ((bucket_flag_n = '1' AND on_off_net_s = 'On net')
OR (bucket_flag_n = '0' AND on_off_net_s = 'Off net'))
GROUP BY A.d_timestamp
ORDER BY A.d_timestamp
... adding the remaining aggregate columns is left as an exercise to the reader.
A couple of notes: If the relationship between bucket_flag_n and on_off_net_s is as indicated in all cases, you can actually remove those conditions from the WHERE clause. If the remaining columns bucket other combinations, you may have to remove them anyway. I'm also suspicious of the usefulness of grouping by something that claims to be a timestamp, as these are usually too high-resolution for useful groups in aggregation (i.e., each value tends to end up on its own line). If the value is a date you have a different problem...

Related

How to query column with letters on SQL?

I'm new to this.
I have a column: (chocolate_weight) On the table : (Chocolate) which has g at the end of every number, so 30x , 2x5g,10g etc.
I want to remove the letter at the end and then query it to show any that weigh greater than 35.
So far I have done
Select *
From Chocolate
Where chocolate_weight IN
(SELECT
REPLACE(chocolote_weight,'x','') From Chocolate) > 35
It is coming back with 0 , even though there are many that weigh more than 35.
Any help is appreciated
Thanks
If 'g' is always the suffix then your current query is along the right lines, but you don't need the IN you can do the replace in the where clause:
SELECT *
FROM Chocolate
WHERE CAST(REPLACE(chocolate_weight,'g','') AS DECIMAL(10, 2)) > 35;
N.B. This works in both the tagged DBMS SQL-Server and MySQL
This will fail (although only silently in MySQL) if you have anything that contains units other than grams, so I would strongly suggest that you fix your design if it is not too late: store the weight as a numeric type and lose the 'g' completely if you only ever store grams. If you use multiple different units then you may wish to standardise so that everything is in grams, or alternatively store the two things in separate columns, one decimal/int column for the numeric value and a separate column for the unit, e.g.
Weight  Unit
10      g
150     g
1000    lb
The issue you will have here though is that you will have to start doing conversions in your queries to ensure you get all results. It is easier to do the conversion once when the data is saved and use a standard measure for all records.
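For anyone who wants to try the REPLACE-and-CAST filter without a server, here is a small sketch against in-memory SQLite. The name column and the sample rows are assumptions for illustration; the filter expression itself is the one from the answer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Chocolate (name TEXT, chocolate_weight TEXT)")
conn.executemany(
    "INSERT INTO Chocolate VALUES (?, ?)",
    [("bar", "30g"), ("block", "40g"), ("slab", "100g"), ("bite", "10g")],
)

# Strip the trailing 'g', convert to a number, then compare numerically.
heavy = [
    name
    for (name,) in conn.execute(
        """
        SELECT name
        FROM Chocolate
        WHERE CAST(REPLACE(chocolate_weight, 'g', '') AS DECIMAL(10, 2)) > 35
        ORDER BY name
        """
    )
]

print(heavy)
```

Only the 40g and 100g rows pass the filter. Without the CAST the comparison would be done on text, which is not what you want for weights.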

Can I divide an amount across multiple parties and round to the 'primary' party in a single SQL query?

I am working on an Oracle PL/SQL process which divides a single monetary amount across multiple involved parties in a particular group. Assuming 'pGroupRef' is an input parameter, the current implementation first designates a 'primary' involved party, and then it splits the amount across all the secondaries as follows:
INSERT INTO ActualValue
SELECT
...
pGroupRef AS GroupRef,
ROUND(Am.Amount * P.SplitPercentage / 100, 2) AS Amount,
...
FROM
Amount Am,
Party P
WHERE
Am.GroupRef = pGroupRef
AND P.GroupRef = Am.GroupRef
...
P.PrimaryInd = 0;
Finally, it runs a second procedure to insert whatever amount is left over to the primary party, i.e.:
INSERT INTO ActualValue
SELECT
...
pGroupRef AS GroupRef,
Am.Amount - S.SecondaryAmounts,
FROM
Amount Am,
Party P,
(SELECT SUM(Amount) AS SecondaryAmounts FROM ActualValue WHERE GroupRef = pGroupRef) S
WHERE
Am.GroupRef = pGroupRef
AND P.GroupRef = Am.GroupRef
...
P.PrimaryInd = 1;
However, the full query here is very large and I am making this area more complex by adding subgroups, each of which will have their own primary member, and the possibility of overrides - hence if I continued to use this implementation then it would mean a lot of duplicated SQL.
I suppose I could always calculate the correct amounts into an array before running a single unified insert - but I feel like there has to be an elegant mathematical way to capture this logic in a single SQL Query.
So you can use analytical functions to get what you are looking for. As I didn't know your exact structure, this is only an example:
SELECT s.party_id, s.member_id,
s.portion + DECODE(s.prime, 1, s.total - SUM(s.portion) OVER (PARTITION BY s.party_id),0)
FROM (SELECT p.party_id, p.member_id,
ROUND(a.amt*(p.split/100), 2) AS PORTION,
a.amt AS TOTAL, p.prime
FROM party p
INNER JOIN amount a ON p.party_id = a.party_id) s
So in the query you have a subquery that gathers the required information, then the outer query puts everything together, only applying the remainder to the record marked as prime.
Here is a DBFiddle showing how this works (LINK)
N.B.: Interestingly in the example in the DBFiddle, there is a 0.01 overpayment, so the primary actually pays less.
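Here is a self-contained sketch of the same idea that can be run locally. It is not the DBFiddle from the answer: it uses SQLite (3.25 or later, for window functions) through Python, CASE instead of DECODE, and it keeps amounts in integer cents so the arithmetic is exact. The column amt_cents and the sample split percentages are assumptions made for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE amount (party_id INTEGER, amt_cents INTEGER);
    CREATE TABLE party  (party_id INTEGER, member_id INTEGER,
                         split REAL, prime INTEGER);
    INSERT INTO amount VALUES (1, 1000);          -- 10.00 to divide
    INSERT INTO party VALUES (1, 10, 33.34, 1),   -- the primary member
                             (1, 11, 33.33, 0),
                             (1, 12, 33.33, 0);
    """
)

rows = conn.execute(
    """
    SELECT s.member_id,
           s.portion
           + CASE WHEN s.prime = 1
                  -- remainder = total minus the sum of ALL rounded portions
                  THEN s.total - SUM(s.portion) OVER (PARTITION BY s.party_id)
                  ELSE 0
             END AS final_cents
    FROM (SELECT p.party_id, p.member_id, p.prime,
                 a.amt_cents AS total,
                 CAST(ROUND(a.amt_cents * p.split / 100.0) AS INTEGER) AS portion
          FROM party p
          INNER JOIN amount a ON p.party_id = a.party_id) s
    ORDER BY s.member_id
    """
).fetchall()

print(rows)
```

Each member's rounded share is 333 cents, which leaves one cent unallocated; the window SUM lets the primary row pick up that cent in the same statement, so the shares always add back up to the original amount.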

IIF Function returning incorrect calculated values - SQL Server

I am writing a query to show returns of placing each way bets on horse races
There is an issue with the PlaceProfit result. It should show a return if the horse's finishing position is between 1 and 4, and a loss if the position is >= 5.
It does show the correct result for finishing positions up to 9th, but 10th place and above is being counted as a win.
I include my code below along with the output.
ALTER VIEW EachWayBetting
AS
SELECT a.ID,
RaceDate,
runners,
track.NAME AS Track,
horse.NAME as HorseName,
IndustrySP,
Place AS 'FinishingPosition',
-- // calculates returns on the win & place parts of an each way bet with 1/5 place terms //
IIF(A.Place = '1', 1.0 * (A.IndustrySP-1), '-1') AS WinProfit,
IIF(A.Place <='4', 1.0 * (A.IndustrySP-1)/5, '-1') AS PlaceProfit
FROM dbo.NewRaceResult a
LEFT OUTER JOIN track ON track.ID = A.TrackID
LEFT OUTER JOIN horse ON horse.ID = A.HorseID
WHERE a.Runners > 22
This returns:
As I mention in the comments, the problem is your choice of data type for place, it's varchar. The ordering for a string data type is completely different to that of a numerical data type. Strings are sorted by character from left to right, in the order the characters are ordered in the collation you are using. Numerical data types, however, are ordered from the lowest to highest.
This means that, for a numerical data type, the value 2 has a lower value than 10, however, for a varchar the value '2' has a higher value than '10'. For the varchar that's because the ordering is completed on the first character first. '2' has a higher value than '1' and so '2' has a higher value than '10'.
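The difference is easy to reproduce. The sketch below uses SQLite through Python with a made-up table holding the same finishing positions once as text and once as integers; collations differ between systems, but digits sort the same way in any common one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE result (place_text TEXT, place_int INTEGER)")
conn.executemany(
    "INSERT INTO result VALUES (?, ?)",
    [(str(n), n) for n in range(1, 13)],  # positions 1 to 12
)

# Text comparison: '10', '11', '12' start with '1', which sorts before '4'.
as_text = [
    r[0]
    for r in conn.execute(
        "SELECT place_int FROM result WHERE place_text <= '4' ORDER BY place_int"
    )
]

# Numeric comparison: only positions 1 to 4 qualify.
as_int = [
    r[0]
    for r in conn.execute(
        "SELECT place_int FROM result WHERE place_int <= 4 ORDER BY place_int"
    )
]

print(as_text)
print(as_int)
```

The text version lets 10th, 11th and 12th through while correctly rejecting 5th to 9th, which is exactly the symptom described in the question.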
The solution here is simple, fix your design; store numerical data in a numerical data type (int seems appropriate here). You're also breaking Normal Form rules, as you're storing other data in the column; mainly the reason a horse failed to be classified. Such data isn't a "Place" but information on why the horse didn't place, and so should be in a separate column.
You can therefore fix this by first adding a new column, then updating its value to hold the values that aren't numerical while making place contain only numerical data, and then finally altering your place column.
ALTER TABLE dbo.YourTable ADD UnClassifiedReason varchar(5) NULL; --Obviously use an appropriate length.
GO
UPDATE dbo.YourTable
SET Place = TRY_CONVERT(int,Place),
UnClassifiedReason = CASE WHEN TRY_CONVERT(int,Place) IS NULL THEN Place END;
GO
ALTER TABLE dbo.YourTable ALTER COLUMN Place int NULL;
GO
If Place does not allow NULL values, you will need to ALTER the column first to allow them.
In addition to fixing the data as Larnu suggests, you should also fix the query:
SELECT nrr.ID, nrr.RaceDate, nrr.Runners,
t.NAME AS Track, h.NAME AS HorseName, nrr.IndustrySP,
nrr.Place AS FinishingPosition,
-- // calculates returns on the win & place parts of an each way bet with 1/5 place terms //
(CASE WHEN nrr.Place = 1 THEN (nrr.IndustrySP - 1.0) ELSE -1 END) AS WinProfit,
(CASE WHEN nrr.Place <= 4 THEN (nrr.IndustrySP - 1.0) / 5 ELSE -1 END) AS PlaceProfit
FROM dbo.NewRaceResult nrr LEFT JOIN
track t
ON t.ID = nrr.TrackID LEFT JOIN
horse h
ON h.ID = nrr.HorseID
WHERE nrr.Runners > 22;
The important changes are removing single quotes from numbers and column names. It seems you need to understand the differences among strings, numbers, and identifiers.
Other changes are:
Meaningful table aliases, rather than meaningless letters such as a.
Qualifying all column references, so it is clear where columns are coming from.
Switching from IIF() to CASE. IIF() is a SQL Server extension; CASE is standard SQL for conditional expressions (both work fine).
Being sure that the types returned by all branches of the conditional expressions are consistent.
Note: This version will work even if you don't change the type of Place. The strings will be converted to numbers in the appropriate places. I don't advocate relying on such silent conversion, so I recommend fixing the data.
If place can have non-numeric values, then you need to convert them:
(CASE WHEN TRY_CONVERT(int, nrr.Place) = 1 THEN (nrr.IndustrySP - 1.0) ELSE -1 END) AS WinProfit,
(CASE WHEN TRY_CONVERT(int, nrr.Place) <= 4 THEN (nrr.IndustrySP - 1.0) / 5 ELSE -1 END) AS PlaceProfit
But the important point is to fix the data.

Bizarre result in SQL query - PostgreSQL

I discovered this strange behavior with this query:
-- TP4N has stock_class = 'Bond'
select lot.symbol
, round(sum(lot.qty_left), 4) as "Qty"
from ( select symbol
, qty_left
-- , amount
from trade_lot_tbl t01
where t01.symbol not in (select symbol from stock_tbl where stock_class = 'Cash')
and t01.qty_left > 0
and t01.trade_date <= current_date -- only current trades
union
select 'CASH' as symbol
, sum(qty_left) as qty_left
-- , sum(amount) as amount
from trade_lot_tbl t11
where t11.symbol in (select symbol from stock_tbl where stock_class = 'Cash')
and t11.qty_left > 0
and t11.trade_date <= current_date -- only current trades
group by t11.symbol
) lot
group by lot.symbol
order by lot.symbol
;
Run as is, the Qty for TP4N is 1804.42
Run with the two 'amount' lines un-commented, which as far as I can tell should NOT affect the result, yet Qty for TP4N = 1815.36. Only ONE of the symbols (TP4N) has a changed value, all others remain the same.
Run with the entire 'union' statement commented out results in Qty for TP4N = 1827.17
The correct answer, as far as I can tell, is 1827.17.
So, to summarize, I get three different values by modifying parts of the query that, as far as I can tell, should NOT affect the answer.
I'm sure I'm going to kick myself when the puzzle is solved, this smells like a silly mistake.
Likely, what you are seeing is caused by the use of union. This set operator deduplicates the resultsets that are returned by both queries. So adding or removing columns in the unioned sets may affect the final resultset (by default, adding more columns reduces the risk of duplication).
As a rule of thumb: unless you do want deduplication, you should use union all (which is also more efficient, since the database does not need to search for duplicates).
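A stripped-down reproduction may help. It assumes two TP4N lots that happen to have the same qty_left but different amounts; the table trade_lot and its values are invented, and SQLite stands in for PostgreSQL since UNION deduplicates the same way in both:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trade_lot (symbol TEXT, qty_left INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO trade_lot VALUES (?, ?, ?)",
    [("TP4N", 10, 100), ("TP4N", 10, 250), ("ABC", 5, 50)],
)


def tp4n_qty(set_op, with_amount):
    """Sum qty_left for TP4N after combining two branches with set_op."""
    extra = ", amount" if with_amount else ""
    cash_extra = ", 0" if with_amount else ""
    sql = f"""
        SELECT SUM(lot.qty_left)
        FROM (SELECT symbol, qty_left{extra} FROM trade_lot
              {set_op}
              SELECT 'CASH', 3{cash_extra}) AS lot
        WHERE lot.symbol = 'TP4N'
    """
    return conn.execute(sql).fetchone()[0]


# UNION collapses the two identical (TP4N, 10) rows into one.
union_two_cols = tp4n_qty("UNION", with_amount=False)
# With amount selected the rows differ, so both survive deduplication.
union_three_cols = tp4n_qty("UNION", with_amount=True)
# UNION ALL never deduplicates.
union_all = tp4n_qty("UNION ALL", with_amount=False)

print(union_two_cols, union_three_cols, union_all)
```

This matches the pattern in the question: uncommenting the amount lines changes the total because fewer rows look identical, and only UNION ALL (or no union at all) gives the full sum.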

Is this statement quicker than the previous?

I am running through some old code and wondering about the logic of this CASE statement:
CASE WHEN ClaimNo.ClaimNo IS NULL THEN '0'
WHEN ClaimNo.ClaimNo = 1 THEN '1'
WHEN ClaimNo.ClaimNo = 2 THEN '2'
WHEN ClaimNo.ClaimNo = 3 THEN '3'
WHEN ClaimNo.ClaimNo = 4 THEN '4'
ELSE '5+'
END AS ClaimNo ,
If I changed it to:
CASE WHEN ClaimNo.ClaimNo >= 5 THEN '5+'
ELSE COALESCE(ClaimNo.ClaimNo,0) END 'ClaimNo' ,
Would the statement technically be quicker? It's obviously a lot shorter as a statement, and it appears that it wouldn't run as many comparisons to obtain the same result.
These are not the same! A case expression returns a single type, and in this case you want that type to be a string (because '5+' is a string). However, mixing strings and integers in the THEN and ELSE branches will result in a type conversion error.
Which is faster depends on the distribution of the data. If most of the data consists of 5 or more, then the second method would be faster . . . and work if written as:
(CASE WHEN ClaimNo.ClaimNo >= 5 THEN '5+'
ELSE CAST(COALESCE(ClaimNo.ClaimNo, 0) as VARCHAR(255))
END) as ClaimNo,
In fact, there is only one comparison, so from the perspective of doing the comparisons it will be faster.
The next question is whether the conversion from a number to a string is faster than the multiple comparisons with each value listed separately. Let me be honest: I do not know. And I have been concerned about query performance for a long time.
Why don't I know? Such micro-optimizations generally have basically no impact in the real world. You should use the version of the logic that works; readability and maintainability are also important. Of course performance is an issue, but the bit fiddling techniques that are important in other languages often have no place in SQL which is designed to handle much larger quantities of data, spread across multiple processors and disks.
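If you want to check that the two versions agree on your data before swapping them, a side-by-side comparison is cheap. The sketch below uses SQLite through Python with a hypothetical claim table; SQLite is loosely typed, so it will not reproduce the type conversion error, only the logic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claim (claim_no INTEGER)")
conn.executemany(
    "INSERT INTO claim VALUES (?)",
    [(None,), (0,), (1,), (2,), (3,), (4,), (5,), (9,)],
)

rows = conn.execute(
    """
    SELECT claim_no,
           CASE WHEN claim_no IS NULL THEN '0'
                WHEN claim_no = 1 THEN '1'
                WHEN claim_no = 2 THEN '2'
                WHEN claim_no = 3 THEN '3'
                WHEN claim_no = 4 THEN '4'
                ELSE '5+'
           END AS long_form,
           CASE WHEN claim_no >= 5 THEN '5+'
                ELSE CAST(COALESCE(claim_no, 0) AS TEXT)
           END AS short_form
    FROM claim
    ORDER BY claim_no
    """
).fetchall()

# Input values for which the two expressions disagree.
mismatches = [n for n, long_form, short_form in rows if long_form != short_form]

print(rows)
print(mismatches)
```

The two forms agree for NULL and for every value from 1 upwards, but not for 0: the long form has no branch for it and falls through to '5+', while the short form returns '0'. Whether that matters depends on whether a stored 0 (or a negative number) can occur in your data.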