How to optimize nested inner Hive query - SQL

I have a table with the following stock data, with columns such as date, ticker, open and close (stock prices).
To query this data, I want to know which stock has given the highest margin on a particular date. So if I have 516 different stocks, my query should return 516 rows of ticker, date, open, close and a new column Margin (which will be max(close - open)).
+-------------------+--------------------+------------------+-------------------+
| deep_stocks.date_ | deep_stocks.ticker | deep_stocks.open | deep_stocks.close |
+-------------------+--------------------+------------------+-------------------+
| 20100721          | A                  | 27.68            | 27.58             |
| 20100722          | A                  | 27.95            | 28.72             |
| 20100723          | A                  | 28.56            | 29.3              |
| 20100726          | A                  | 29.22            | 29.64             |
| 20100727          | A                  | 29.73            | 28.87             |
| 20100728          | A                  | 28.79            | 28.78             |
| 20100729          | A                  | 28.97            | 28.15             |
| 20100730          | A                  | 27.78            | 27.93             |
| 20100802          | A                  | 28.35            | 28.82             |
| 20100803          | A                  | 28.7             | 27.84             |
+-------------------+--------------------+------------------+-------------------+
I have written a query with the following approach:
Step 1 - Get the difference between the close and open prices (inner/sub query).
Step 2 - Get the maximum margin for every stock (GROUP BY with the MAX function).
Step 3 - Join the results with the main table to get the data.
My query is below; can someone please correct it, as it is taking a long time? I would also like to know whether there is any alternative approach.

As mentioned above, here is my query:
SELECT ds.ticker, ds.date_, ds.close, ds.open, ds.Margin
FROM
  (SELECT ticker, date_, close, open,
          CASE (close - open) > 0 WHEN true THEN ROUND(close - open, 2) ELSE 0 END AS Margin
   FROM DataStocks) ds
JOIN
  (SELECT dsIn.ticker, MAX(dsIn.Margin) mxMargin
   FROM
     (SELECT ticker,
             CASE (close - open) > 0 WHEN true THEN ROUND(close - open, 2) ELSE 0 END AS Margin
      FROM DataStocks) dsIn
   GROUP BY dsIn.ticker) dsEx
ON ds.ticker = dsEx.ticker AND ds.Margin = dsEx.mxMargin
ORDER BY ds.Margin;
Are there any alternatives to this query, or can it be optimized further?
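One alternative (a sketch, not benchmarked; it assumes Hive 0.11+ for windowing functions) is to compute the margin once and rank the rows per ticker, which avoids the self-join and the second scan of DataStocks:

SELECT ticker, date_, close, open, Margin
FROM (
    SELECT ticker, date_, close, open, Margin,
           RANK() OVER (PARTITION BY ticker ORDER BY Margin DESC) AS rnk
    FROM (
        -- same Margin definition as in the original query
        SELECT ticker, date_, close, open,
               CASE WHEN (close - open) > 0 THEN ROUND(close - open, 2) ELSE 0 END AS Margin
        FROM DataStocks
    ) m
) ranked
WHERE rnk = 1
ORDER BY Margin;

Like the original join, RANK() keeps all tied rows; swap in ROW_NUMBER() if exactly one row per ticker is required.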

Related

Cumulative SUM in a query (SQL Access)

Using MS Access SQL, I have a query (actually a UNION of multiple queries) and need a cumulative sum (actually a statement of account whose items are in chronological order).
How do I get a cumulative sum?
Since there are duplicates by date, I would have to add a new ID; however, SQL in MS Access does not seem to have ROW_ID or similar.
So, we need to sort donation data into chronological order across multiple tables with duplicates. First, combine all the tables of donators in one query, which gives the simplest syntax. Then, to put things in order, we need an ordering for the duplicate dates. The dataset has two natural ways to sort duplicate dates: by donator and by amount. For instance, we could decide that within a date, bigger donations come first. If the rule is complicated enough, we abstract it into a public function in a code module and include it in the query so that we can sort by it:
Sorted Donations (saved query):
SELECT BestDonator(q.donator) AS BestDonator, *
FROM tblCountries AS q
UNION
SELECT BestDonator(j.donator) AS BestDonator, *
FROM tblIndividuals AS j
ORDER BY EvDate ASC, Amount DESC, BestDonator DESC;
Public Function BestDonator(donator As String) As Long
    BestDonator = Len(donator) 'longer names are better :)
End Function
With Sorted Donations we have settled on an order for the duplicate dates and have combined both individual and country donations, so now we can calculate the running sum directly, using either DSum or a subquery. There is no need to calculate a row ID. The tricky part is getting the syntax correct. I ended up abstracting the running-sum calculation to a function, and omitting BestDonator, because I couldn't easily paste this query together in the query designer and ran out of time to bug-fix.
Public Function RunningSum(EvDate As Date, Amount As Currency) As Currency
    RunningSum = DSum("Amount", "Sorted Donations", _
        "(EvDate < #" & [EvDate] & "#) OR (EvDate = #" & [EvDate] & "# AND Amount >= " & [Amount] & ")")
End Function
Carefully note the OR in the DSum part of the RunningSum calculation. This is the tricky part of summing the right amounts.
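For reference, the subquery variant mentioned above would look roughly like this (a sketch, assuming the UNION query is saved as Sorted Donations, and ignoring the BestDonator tiebreak just as the final version does; the correlated WHERE clause repeats the DSum criteria):

SELECT s.donator, s.EvDate, s.Amount,
       (SELECT SUM(s2.Amount)
        FROM [Sorted Donations] AS s2
        WHERE (s2.EvDate < s.EvDate)
           OR (s2.EvDate = s.EvDate AND s2.Amount >= s.Amount)) AS RunningSum
FROM [Sorted Donations] AS s
ORDER BY s.EvDate ASC, s.Amount DESC;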
Output:
+------------+------------+--------+------------+
| donator    | EvDate     | Amount | RunningSum |
+------------+------------+--------+------------+
| Reiny      | 1/10/2020  |    321 |        321 |
| Czechia    | 3/1/2020   |   7455 |       7776 |
| Germany    | 3/18/2020  |   4222 |      11998 |
| Jim        | 3/18/2020  |    222 |      12220 |
| Australien | 4/15/2020  |  13423 |      25643 |
| Mike       | 5/31/2020  |    345 |      25988 |
| Portugal   | 6/6/2020   |   8755 |      34743 |
| Slovakia   | 8/31/2020  |   3455 |      38198 |
| Steve      | 9/6/2020   |    875 |      39073 |
| Japan      | 10/10/2020 |   5234 |      44307 |
| John       | 10/11/2020 |    465 |      44772 |
| Slowenia   | 11/11/2020 |   4665 |      49437 |
| Spain      | 11/22/2020 |   7677 |      57114 |
| Austria    | 11/22/2020 |   3221 |      60335 |
| Bill       | 11/22/2020 |    767 |      61102 |
| Bert       | 12/1/2020  |    755 |      61857 |
| Hungaria   | 12/24/2020 |   9996 |      71853 |
+------------+------------+--------+------------+

SQL GROUPING with conditional

I am sure this is easy to accomplish, but after spending the whole day trying I had to give up and ask for your help.
I have a table that looks like this:
+-----------+---------+-------------+---------------+-------------+
| PatientID | VisitId | DateOfVisit | FollowUp(Y/N) | FollowUpWks |
+-----------+---------+-------------+---------------+-------------+
| 123456789 | 2222222 | 20180802    | Y             | 2           |
| 123456789 | 3333333 | 20180902    | Y             | 4           |
| 234453656 | 4443232 | 20180506    | N             | NULL        |
| 455344243 | 2446364 | 20180618    | Y             | 12          |
+-----------+---------+-------------+---------------+-------------+
Basically I have a list of PatientIDs, and each patient can have multiple visits (VisitId and DateOfVisit). FollowUp(Y/N) specifies whether the patient has to be seen again, and FollowUpWks in how many weeks.
Now, what I need is a query that extracts PatientID, DateOfVisit (the most recent one, and only if FollowUp is Y) and the FollowUpWks field.
The final result should look like this:
+-----------+---------+-------------+---------------+-------------+
| PatientID | VisitId | DateOfVisit | FollowUp(Y/N) | FollowUpWks |
+-----------+---------+-------------+---------------+-------------+
| 123456789 | 3333333 | 20180902    | Y             | 4           |
| 455344243 | 2446364 | 20180618    | Y             | 12          |
+-----------+---------+-------------+---------------+-------------+
The closest I could get was with this code
SELECT PatientID,
Max(DateOfVisit) AS LastVisit
FROM mytable
WHERE FollowUp = True
GROUP BY PatientID;
The problem is that when I try adding the FollowUpWks field to the SELECT, I get the following error: "The query does not include the specified expression as part of an aggregate function." However, if I add FollowUpWks to the GROUP BY clause, then I get all visits, not just the most recent ones.
You need to match back to the most recent visit. One method uses a correlated subquery:
SELECT t.*
FROM mytable AS t
WHERE t.FollowUp = True AND
      t.DateOfVisit = (SELECT MAX(t2.DateOfVisit)
                       FROM mytable AS t2
                       WHERE t2.PatientID = t.PatientID);
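A join against an aggregated subquery is an equivalent alternative (a sketch along the same lines, not part of the original answer; derived tables in the FROM clause work in recent Access versions and most other dialects):

SELECT t.*
FROM mytable AS t
INNER JOIN (SELECT PatientID, MAX(DateOfVisit) AS LastVisit
            FROM mytable
            WHERE FollowUp = True
            GROUP BY PatientID) AS m
    ON t.PatientID = m.PatientID AND t.DateOfVisit = m.LastVisit
WHERE t.FollowUp = True;

This reuses the asker's own GROUP BY query as the derived table and matches each patient back to their latest follow-up visit.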

How to define a subquery inside an SQL statement to be used several times as a table alias?

I have an MS Access database of rainfall data from several climate stations.
For each day at each station, I want to calculate the rainfall on the previous day (if recorded), and the sum of the rainfall over the previous 3 and 7 days.
Due to the huge amount of data and the limitations of Access, I made a query that processes one station at a time. I then applied an auxiliary query to find the dates first. For each station, the following SQL statement is applied (saved as the RainFallStudy query):
SELECT
[173].ID, [173].AirportCode, [173].RFmm,
DateSerial([rYear], [rMonth], [rDay]) AS DateSer,
[DateSer]-1 AS DM1,
[DateSer]-2 AS DM2,
[DateSer]-3 AS DM3,
[DateSer]-4 AS DM4,
[DateSer]-5 AS DM5,
[DateSer]-6 AS DM6,
[DateSer]-7 AS DM7
FROM
[173]
WHERE
((([173].AirportCode) = 786660));
I used DM1, DM2, etc. as the date serials of day-1, day-2, etc.
Then I used another query that joins RainFallStudy to itself with LEFT JOINs, as shown in the figure:
The SQL statement is:
SELECT
RainFallStudy.ID, RainFallStudy.AirportCode,
RainFallStudy.RFmm AS RF0, RainFallStudy.DateSer,
RainFallStudy.DM1, RainFallStudy_1.RFmm AS RF1,
RainFallStudy_2.RFmm AS RF2, RainFallStudy_3.RFmm AS RF3,
RainFallStudy_4.RFmm AS RF4, RainFallStudy_5.RFmm AS RF5,
RainFallStudy_6.RFmm AS RF6, RainFallStudy_7.RFmm AS RF7,
Nz([rf1], 0) + Nz([rf2], 0) + Nz([rf3], 0) + Nz([rf4], 0) + Nz([rf5], 0) + Nz([rf6], 0) + Nz([rf7], 0) AS RF_W
FROM
((((((RainFallStudy
LEFT JOIN
RainFallStudy AS RainFallStudy_1 ON RainFallStudy.DM1 = RainFallStudy_1.DateSer)
LEFT JOIN
RainFallStudy AS RainFallStudy_2 ON RainFallStudy.DM2 = RainFallStudy_2.DateSer)
LEFT JOIN
RainFallStudy AS RainFallStudy_3 ON RainFallStudy.DM3 = RainFallStudy_3.DateSer)
LEFT JOIN
RainFallStudy AS RainFallStudy_4 ON RainFallStudy.DM4 = RainFallStudy_4.DateSer)
LEFT JOIN
RainFallStudy AS RainFallStudy_5 ON RainFallStudy.DM5 = RainFallStudy_5.DateSer)
LEFT JOIN
RainFallStudy AS RainFallStudy_6 ON RainFallStudy.DM6 = RainFallStudy_6.DateSer)
LEFT JOIN
RainFallStudy AS RainFallStudy_7 ON RainFallStudy.DM7 = RainFallStudy_7.DateSer;
Now I suffer from the slow performance of this query, as each station has between 1,000 and 750,000 records! Is there any better way to get what I need with a faster SQL statement? Second question: can I make this a standalone SQL statement (one query, without the auxiliary query)? I will use it from Python, which, to the best of my knowledge, requires a single SQL statement.
Thanks in advance.
Update
As requested by @Andre, here is some sample data from table [173]:
+-------+-------------+-------+--------+------+-------+
| ID    | AirportCode | rYear | rMonth | rDay | RFmm  |
+-------+-------------+-------+--------+------+-------+
| 11216 | 409040      | 2012  | 1      | 23   | 0.51  |
| 11217 | 409040      | 2012  | 1      | 24   | 0     |
| 11218 | 409040      | 2012  | 1      | 25   | 0     |
| 11219 | 409040      | 2012  | 1      | 26   | 2.03  |
| 11220 | 409040      | 2012  | 1      | 27   | 0     |
| 11221 | 409040      | 2012  | 1      | 28   | 0     |
| 11222 | 409040      | 2012  | 1      | 29   | 0     |
| 11223 | 409040      | 2012  | 1      | 30   | 0     |
| 11224 | 409040      | 2012  | 1      | 31   | 0.25  |
| 11225 | 409040      | 2012  | 2      | 1    | 0     |
| 11226 | 409040      | 2012  | 2      | 2    | 0     |
| 11227 | 409040      | 2012  | 2      | 3    | 4.32  |
| 11228 | 409040      | 2012  | 2      | 4    | 13.21 |
| 11229 | 409040      | 2012  | 2      | 5    | 1.02  |
| 11230 | 409040      | 2012  | 2      | 6    | 0     |
| 11231 | 409040      | 2012  | 2      | 7    | 0     |
| 11232 | 409040      | 2012  | 2      | 8    | 0     |
| 11233 | 409040      | 2012  | 2      | 9    | 0     |
| 11234 | 409040      | 2012  | 2      | 10   | 5.08  |
| 11235 | 409040      | 2012  | 2      | 11   | 0     |
| 11236 | 409040      | 2012  | 2      | 12   | 12.95 |
| 11237 | 409040      | 2012  | 2      | 13   | 5.59  |
| 11238 | 409040      | 2012  | 2      | 14   | 0.25  |
| 11239 | 409040      | 2012  | 2      | 15   | 0     |
| 11240 | 409040      | 2012  | 2      | 16   | 0     |
| 11241 | 409040      | 2012  | 2      | 17   | 0     |
| 11242 | 409040      | 2012  | 2      | 18   | 0     |
| 11243 | 409040      | 2012  | 2      | 19   | 0     |
| 11244 | 409040      | 2012  | 2      | 20   | 14.48 |
| 11245 | 409040      | 2012  | 2      | 21   | 9.65  |
| 11246 | 409040      | 2012  | 2      | 22   | 3.05  |
| 11247 | 409040      | 2012  | 2      | 23   | 0     |
| 11248 | 409040      | 2012  | 2      | 24   | 0     |
| 11249 | 409040      | 2012  | 2      | 25   | 0     |
| 11250 | 409040      | 2012  | 2      | 26   | 0     |
| 11251 | 409040      | 2012  | 2      | 27   | 0     |
| 11252 | 409040      | 2012  | 2      | 28   | 7.37  |
| 11253 | 409040      | 2012  | 2      | 29   | 0     |
+-------+-------------+-------+--------+------+-------+
And here is the sample output:
+-------+-------------+------------+---------+-----------+-----------+----------+
| ID    | AirportCode | DateSer    | ThisDay | Yesterday | Prev3days | PrevWeek |
+-------+-------------+------------+---------+-----------+-----------+----------+
| 11216 | 409040      | 23-01-2012 | 0.51    | 0         | 0         | 0        |
| 11217 | 409040      | 24-01-2012 | 0       | 0.51      | 0.51      | 0.51     |
| 11218 | 409040      | 25-01-2012 | 0       | 0         | 0.51      | 0.51     |
| 11219 | 409040      | 26-01-2012 | 2.03    | 0         | 0.51      | 0.51     |
| 11220 | 409040      | 27-01-2012 | 0       | 2.03      | 2.03      | 2.54     |
| 11221 | 409040      | 28-01-2012 | 0       | 0         | 2.03      | 2.54     |
| 11222 | 409040      | 29-01-2012 | 0       | 0         | 2.03      | 2.54     |
| 11223 | 409040      | 30-01-2012 | 0       | 0         | 0         | 2.54     |
| 11224 | 409040      | 31-01-2012 | 0.25    | 0         | 0         | 2.03     |
| 11225 | 409040      | 01-02-2012 | 0       | 0.25      | 0.25      | 2.28     |
| 11226 | 409040      | 02-02-2012 | 0       | 0         | 0.25      | 2.28     |
| 11227 | 409040      | 03-02-2012 | 4.32    | 0         | 0.25      | 0.25     |
| 11228 | 409040      | 04-02-2012 | 13.21   | 4.32      | 4.32      | 4.57     |
| 11229 | 409040      | 05-02-2012 | 1.02    | 13.21     | 17.53     | 17.78    |
| 11230 | 409040      | 06-02-2012 | 0       | 1.02      | 18.55     | 18.8     |
| 11231 | 409040      | 07-02-2012 | 0       | 0         | 14.23     | 18.8     |
| 11232 | 409040      | 08-02-2012 | 0       | 0         | 1.02      | 18.55    |
| 11233 | 409040      | 09-02-2012 | 0       | 0         | 0         | 18.55    |
| 11234 | 409040      | 10-02-2012 | 5.08    | 0         | 0         | 18.55    |
| 11235 | 409040      | 11-02-2012 | 0       | 5.08      | 5.08      | 19.31    |
| 11236 | 409040      | 12-02-2012 | 12.95   | 0         | 5.08      | 6.1      |
| 11237 | 409040      | 13-02-2012 | 5.59    | 12.95     | 18.03     | 18.03    |
| 11238 | 409040      | 14-02-2012 | 0.25    | 5.59      | 18.54     | 23.62    |
| 11239 | 409040      | 15-02-2012 | 0       | 0.25      | 18.79     | 23.87    |
| 11240 | 409040      | 16-02-2012 | 0       | 0         | 5.84      | 23.87    |
| 11241 | 409040      | 17-02-2012 | 0       | 0         | 0.25      | 23.87    |
| 11242 | 409040      | 18-02-2012 | 0       | 0         | 0         | 18.79    |
| 11243 | 409040      | 19-02-2012 | 0       | 0         | 0         | 18.79    |
| 11244 | 409040      | 20-02-2012 | 14.48   | 0         | 0         | 5.84     |
| 11245 | 409040      | 21-02-2012 | 9.65    | 14.48     | 14.48     | 14.73    |
| 11246 | 409040      | 22-02-2012 | 3.05    | 9.65      | 24.13     | 24.13    |
| 11247 | 409040      | 23-02-2012 | 0       | 3.05      | 27.18     | 27.18    |
| 11248 | 409040      | 24-02-2012 | 0       | 0         | 12.7      | 27.18    |
| 11249 | 409040      | 25-02-2012 | 0       | 0         | 3.05      | 27.18    |
| 11250 | 409040      | 26-02-2012 | 0       | 0         | 0         | 27.18    |
| 11251 | 409040      | 27-02-2012 | 0       | 0         | 0         | 27.18    |
| 11252 | 409040      | 28-02-2012 | 7.37    | 0         | 0         | 12.7     |
| 11253 | 409040      | 29-02-2012 | 0       | 7.37      | 7.37      | 10.42    |
+-------+-------------+------------+---------+-----------+-----------+----------+
I created an additional column rDate (DateTime) and filled it with this query:
UPDATE Rainfall SET Rainfall.rDate = DateSerial([rYear],[rMonth],[rDay]);
Then your desired result can be achieved with several subqueries, using SUM() for the last two columns:
SELECT r.ID, r.AirportCode, r.rDate, r.RFmm,
(SELECT RFmm FROM Rainfall r1 WHERE r1.AirportCode = r.AirportCode AND r1.rDate = r.rDate-1) AS Yesterday,
(SELECT SUM(RFmm) FROM Rainfall r3 WHERE r3.AirportCode = r.AirportCode AND r3.rDate BETWEEN r.rDate-3 AND r.rDate-1) AS Prev3days,
(SELECT SUM(RFmm) FROM Rainfall r7 WHERE r7.AirportCode = r.AirportCode AND r7.rDate BETWEEN r.rDate-7 AND r.rDate-1) AS PrevWeek
FROM Rainfall r
Make sure AirportCode and rDate are indexed for larger numbers of records.
Result:
+-------+-------------+------------+-------+-----------+-----------+----------+
| ID | AirportCode | rDate | RFmm | Yesterday | Prev3days | PrevWeek |
+-------+-------------+------------+-------+-----------+-----------+----------+
| 11216 | 409040 | 23.01.2012 | 0,51 | | | |
| 11217 | 409040 | 24.01.2012 | 0 | 0,51 | 0,51 | 0,51 |
| 11218 | 409040 | 25.01.2012 | 0 | 0 | 0,51 | 0,51 |
| 11219 | 409040 | 26.01.2012 | 2,03 | 0 | 0,51 | 0,51 |
| 11220 | 409040 | 27.01.2012 | 0 | 2,03 | 2,03 | 2,54 |
| 11221 | 409040 | 28.01.2012 | 0 | 0 | 2,03 | 2,54 |
| 11222 | 409040 | 29.01.2012 | 0 | 0 | 2,03 | 2,54 |
| 11223 | 409040 | 30.01.2012 | 0 | 0 | 0 | 2,54 |
| 11224 | 409040 | 31.01.2012 | 0,25 | 0 | 0 | 2,03 |
| 11225 | 409040 | 01.02.2012 | 0 | 0,25 | 0,25 | 2,28 |
| 11226 | 409040 | 02.02.2012 | 0 | 0 | 0,25 | 2,28 |
| 11227 | 409040 | 03.02.2012 | 4,32 | 0 | 0,25 | 0,25 |
| 11228 | 409040 | 04.02.2012 | 13,21 | 4,32 | 4,32 | 4,57 |
| 11229 | 409040 | 05.02.2012 | 1,02 | 13,21 | 17,53 | 17,78 |
+-------+-------------+------------+-------+-----------+-----------+----------+
Use Nz() to avoid NULL values in the first row.
It appears that you store the date in separate fields (rYear, rMonth, rDay), so to get the date you use the DateSerial function. This means that in order to use the date in a join or WHERE clause, Access must calculate the date for the entire table. You should store the date in a separate, indexed field to avoid that calculation.
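If it helps, the indexing step can be done in DDL as well (a sketch; the index name is arbitrary and the Rainfall table is the one from the answer above):

CREATE INDEX idxAirportDate ON Rainfall (AirportCode, rDate);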

T-SQL aggregate over contiguous dates more efficiently

I need to aggregate a sum over contiguous dates. I've seen solutions to similar problems that return the start and end dates, but they don't aggregate the data between those ranges. It's further complicated by the extremely large amount of data involved, to the point that a simple self-join takes an impractical amount of time (especially since the start and end date fields are unindexed).
I have a solution involving cursors, but I've generally been led to believe that cursors can always be replaced with joins that execute faster. Yet every set-based solution I've tried that comes anywhere close to giving me the data I need takes at least an hour, while my cursor solution takes about 10 seconds. So I'm asking if there is a more efficient answer.
The data includes both buy and sell transactions, and each returned row of aggregated contiguous dates also needs to list the transaction ID of the last sell that occurred before the first buy of the contiguous set of buy transactions.
An example of the data:
+------------------+----------+-----------+---------+---------+
| TRANSACTION_TYPE | TRANS_ID | StartDate | EndDate | Amount  |
+------------------+----------+-----------+---------+---------+
| sell             | 100      | 2/16/16   | 2/18/18 | $100.00 |
| sell             | 101      | 3/1/16    | 6/6/16  | $121.00 |
| buy              | 102      | 6/10/16   | 6/12/16 | $22.00  |
| buy              | 103      | 6/12/16   | 6/14/16 | $0.35   |
| buy              | 104      | 6/29/16   | 7/2/16  | $5.00   |
| sell             | 105      | 7/3/16    | 7/6/16  | $115.00 |
| buy              | 106      | 7/8/16    | 7/9/16  | $200.00 |
| sell             | 107      | 7/10/16   | 7/13/16 | $4.35   |
| sell             | 108      | 7/17/16   | 7/20/16 | $0.50   |
| buy              | 109      | 7/25/16   | 7/29/16 | $33.00  |
| buy              | 110      | 7/29/16   | 8/1/16  | $75.00  |
| buy              | 111      | 8/1/16    | 8/3/16  | $0.33   |
| sell             | 112      | 9/1/16    | 9/2/16  | $99.00  |
+------------------+----------+-----------+---------+---------+
The results should look like the following:
+-----------+-----------+---------+---------+
| Last_Sell | StartDate | EndDate | Amount  |
+-----------+-----------+---------+---------+
| 101       | 6/10/16   | 6/14/16 | $22.35  |
| 101       | 6/29/16   | 7/2/16  | $5.00   |
| 105       | 7/8/16    | 7/9/16  | $200.00 |
| 108       | 7/25/16   | 8/3/16  | $108.33 |
+-----------+-----------+---------+---------+
Right now I use queries to split the data into buys and sells, and I walk through the buy data, aggregating as I go and inserting into the return table every time I find a break in the dates; I step through the sell table until I reach the last sell before the start date of the set of buys.
Walking linearly through cursors gives me O(n) computation time. Even though cursors are orders of magnitude less efficient per row, the total is still linear, while I suspect the joins I would need would cost at least O(n log n). With the ridiculous amount of data I'm working with, the per-row inefficiency of cursors is swamped once the alternative goes beyond linear time.
If I assume that the transaction ID increases along with the dates, then you can get the last sell using a cumulative max. Then, the adjacency can be found by using similar logic, but with a lag first:
with cte as (
      select t.*,
             max(case when transaction_type = 'sell' then trans_id end) over
                 (order by trans_id) as last_sell,
             lag(enddate) over (partition by transaction_type order by trans_id) as prev_enddate
      from t
     )
select last_sell, min(startdate) as startdate, max(enddate) as enddate,
       sum(amount) as amount
from (select cte.*,
             sum(case when startdate = dateadd(day, 1, prev_enddate) then 0 else 1 end) over
                 (partition by last_sell order by trans_id) as grp
      from cte
      where transaction_type = 'buy'
     ) x
group by last_sell, grp;
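For anyone who wants to try this, here is a minimal harness matching the sample data (a sketch; the table name t and the column names are taken from the query and the example rows above, with dates rewritten in ISO format):

-- Minimal test table for the query above; only the first few sample rows are loaded.
CREATE TABLE t (
    transaction_type VARCHAR(4),
    trans_id         INT,
    startdate        DATE,
    enddate          DATE,
    amount           DECIMAL(10, 2)
);

INSERT INTO t (transaction_type, trans_id, startdate, enddate, amount) VALUES
('sell', 100, '2016-02-16', '2018-02-18', 100.00),
('sell', 101, '2016-03-01', '2016-06-06', 121.00),
('buy',  102, '2016-06-10', '2016-06-12',  22.00),
('buy',  103, '2016-06-12', '2016-06-14',   0.35),
('buy',  104, '2016-06-29', '2016-07-02',   5.00),
('sell', 105, '2016-07-03', '2016-07-06', 115.00);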

Grand Total value doesn't match with Top N Filtered values in SSRS

I have a report in reporting services. In this report, I am displaying the Top N values. But my Grand Total is displaying the sum of all the values.
Right now I am getting something like this (here N = 2):
+-------+------+-------------+
| Area |ID | Count |
+-------+------+-------------+
| - A | | 4 |
| | a1 | 1 |
| | b1 | 1 |
| | c1 | 1 |
| | d1 | 1 |
| | | |
| - B | | 3 |
| | a2 | 1 |
| | b2 | 1 |
| | c2 | 1 |
| | | |
|Grand | | 10 |
|Total | | |
+-------+------+-------------+
The correct Grand Total should be 7 instead of 10. A and B are toggle items (you can expand and collapse them).
How can I display the correct Grand Total when using a Top N filter?
I also want to apply the filter in the report, not in the SQL query.
You should apply the filter on the dataset. Filtering the report object itself only turns off the visibility of the items (rows, for example); the item/row itself is still part of the group and is still used for calculations.
I found a way to solve my question. As Ido said, I worked on the dataset. I am using an Analysis Services cube, so in this cube I created a named set calculation.
In this set I used the TopCount() function, which filters out the top N values, where N is an integer of your choice.
The final named set in this case is:
TopCount([Dim Area].[Area].[Area], 2, ([Measures].[Count]))
This gives you the Grand Total of the Top N filtered values.