I have a table with the following table.
----------------------------------
Hour Location Stock
----------------------------------
6 2000 20
9 2000 24
----------------------------------
So this shows stock against some of the hours in which there is a change in the quantity.
Now my requirement is to create a view on this table which will virtually show the data (if stock is not htere for a particular hour). So the data that should be shown is
----------------------------------
Hour Location Stock
----------------------------------
6 2000 20
7 2000 20 -- same as hour 6 stock
8 2000 20 -- same as hour 6 stock
9 2000 24
----------------------------------
That means even if the data is not there for some particular hour then we should show the last hour's stock which is having stock. And i have another table with all the available hours from 1-23 in a column.
I have tried partition over by method as given below. But i think i am missing some thing around this to get my requirement done.
SELECT
HOUR_NUMBER,
CASE WHEN TOTAL_STOCK IS NULL
THEN SUM(TOTAL_STOCK)
OVER (
PARTITION BY LOCATION
ORDER BY CURRENT_HOUR ROWS 1 PRECEDING
)
ELSE
TOTAL_STOCK
END AS FULL_STOCK
FROM
(
SELECT HOUR_NUMBER AS HOUR_NUMBER
FROM HOURS_TABLE -- REFEERENCE TABLE WITH HOURS FROM 1-23
GROUP BY 1
) HOURS_REF
LEFT OUTER JOIN
(
SEL CURRENT_HOUR AS CURRENT_HOUR
, STOCK AS TOTAL_STOCK
,LOCATION AS LOCATION
FROM STOCK_TABLE
WHERE STOCK<>0
) STOCKS
ON HOURS_REF.HOUR_NUMBER = STOCKS.CURRENT_HOUR
This query is giving all the hours with stock as null for the hours without data.
We are looking at ANSI sql solution so that it can be used on databases like Teradata.
I am thinking that i am using partition over by wrongly or is there any other way. We tried with CASE WHEN but that needs some kind of looping to check back for an hour with some stock.
I've run into similar problems before. It's often simpler to make sure that the data you need somehow gets into the database in the first place. You might be able to automate it with a stored procedure that runs periodically.
Having said that, did you consider trying COALESCE() with a scalar subquery? (Or whatever similar function your dbms supports.) I'd try it myself and post the SQL, but I'm leaving for work in two minutes.
Haven't tried, but along the lines of what Mike said:
SELECT a.hour
, COALESCE( a.stock
, ( select b.stock
from tbl.b
where b.hour=a.hour-1 )
) "stock"
FROM tbl a
Note: this will impact performance greatly.
Thanks for your responses. I have tried out RECURSIVE VIEW for the above requirement and is giving correct results (I am fearing about the CPU usage for big tables as it is recursive). So here is stock table
----------------------------------
Hour Location Stock
----------------------------------
6 2000 20
9 2000 24
----------------------------------
Then we will have a view on this table which will give all 12 hours data using Left outer join.
----------------------------------
Hour Location Stock
----------------------------------
6 2000 20
7 2000 NULL
8 2000 NULL
9 2000 24
----------------------------------
Then we will have a recursive view which joins the table recursively with the same view to get the Stock of each hour moved one hour up and appended with level of data coming incremented.
REPLACE RECURSIVE VIEW HOURLY_STOCK_VIEW
(HOUR_NUMBER,LOCATION, STOCK, LVL)
AS
(
SELECT
HOUR_NUMBER,
LOCATION,
STOCK,
1 AS LVL
FROM STOCK_VIEW_WITH_LEFT_OUTER_JOIN
UNION ALL
SELECT
STK.HOUR_NUMBER,
THE_VIEW.LOCATION,
THE_VIEW.STOCK,
LVL+1 AS LVL
FROM STOCK_VIEW_WITH_LEFT_OUTER_JOIN STK
JOIN
HOURLY_STOCK_VIEW THE_VIEW
ON THE_VIEW.HOUR_NUMBER = STK.HOUR_NUMBER -1
WHERE LVL <=12
)
;
You can observe that first we select from the Left outer joined view then we union it with the left outer join view joined on the same view which we are creating and giving it its level at which data is coming.
Then we select the data from this view with the minimum level.
SEL * FROM HOURLY_STOCK_VIEW
WHERE
(
HOUR_NUMBER,
LVL
)
IN
(
SEL
HOUR_NUMBER,
MIN(LVL)
FROM HOURLY_STOCK_VIEW
WHERE STOCK IS NOT NULL
GROUP BY 1
)
;
This is working fine and giving the result as
----------------------------------
Hour Location Stock
----------------------------------
6 2000 20
7 2000 20 -- same as hour 6 stock
8 2000 20 -- same as hour 6 stock
9 2000 24
10 2000 24
11 2000 24
12 2000 24
----------------------------------
I know this is going to take huge CPU for large tables to get the recursion work ( we are limiting the recursion to only 12 levels as 12 hours data is needed to stop it go into infinite loop). But I thought some body can use this for some kind of Hierarchy building. I will look for some more responses from you guys on any other approaches available. Thanks. You can have a look at Recursive views in the below link for teradata.
http://forums.teradata.com/forum/database/recursion-in-a-stored-procedure
The most common uses of view is, the removal of complexity.
For example:
CREATE VIEW FEESTUDENT
AS
SELECT S.NAME,F.AMOUNT FROM STUDENT AS S
INNER JOIN FEEPAID AS F ON S.TKNO=F.TKNO
Now do a SELECT:
SELECT * FROM FEESTUDENT
Related
I have the following table, let's call it Names:
Name Id Date
Dirk 1 27-01-2015
Jan 2 31-01-2015
Thomas 3 21-02-2015
Next I have the another table called Consumption:
Id Date Consumption
1 26-01-2015 30
1 01-01-2015 20
2 01-01-2015 10
2 05-05-2015 20
Now the problem is, that I think that doing this using SQL is the fastest, since the table contains about 1.5 million rows.
So the problem is as follows, I would like to match each Id from the Names table with the Consumption table provided that the difference between the dates are the lowest, so we have: Dirk consumes on 27-01-2015 about 30. In case there are two dates that have the same "difference", I would like to calculate the average consumption on those two dates.
While I know how to join, I do not know how to code the difference part.
Thanks.
DBMS is Microsoft SQL Server 2012.
I believe that my question differs from the one mentioned in the comments, because it is much more complicated since it involves comparison of dates between two tables rather than having one date and comparing it with the rest of the dates in the table.
This is how you could it in SQL Server:
SELECT Id, Name, AVG(Consumption)
FROM (
SELECT n.Id, Name, Consumption,
RANK() OVER (PARTITION BY n.Id
ORDER BY ABS(DATEDIFF(d, n.[Date], c.[Date]))) AS rnk
FROM Names AS n
INNER JOIN Consumption AS c ON n.Id = c.Id ) t
WHERE t.rnk = 1
GROUP BY Id, Name
Using RANK with PARTITION BY n.Id and ORDER BY ABS(DATEDIFF(d, n.[Date], c.[Date])) you can locate all matching records per Id: all records with the smallest difference in days are going to have rnk = 1.
Then, using AVG in the outer query, you are calculating the average value of Consumption between all matching records.
SQL Fiddle Demo
I am having performance issue on a set of SQLs to generate current month's statement in realtime.
Customers will purchase some goods using points from an online system, and the statement containing "open_balance", "point_earned", "point_used", "current_balance" should be generated.
The following shows the shortened schema :
//~200k records
customer: {account_id:string, create_date:timestamp, bill_day:int} //totally 14 fields
//~250k records per month, kept for 6 month
history_point: {point_id:long, account_id:string, point_date:timestamp, point:int} //totally 9 fields
//each customer have maximum of 12 past statements kept
history_statement: {account_id:string, open_date:date, close_date:date, open_balance:int, point_earned:int, point_used:int, close_balance:int} //totally 9 fields
On every bill day, the view should automatically create a new month statement.
i.e. If bill_day is 15, then transaction done on or after 16 Dec 2013 00:00:00 should belongs to new bill cycle of 16 Dec 2013 00:00:00 - 15 Jan 2014 23:59:59
I tried the approach described below,
Calculate the last close day for each account (in materialized view, so that it update only after there is new customer or past month statement inserted into history_statement)
Generate a record for each customer each month that I need to calculate (Also in materialized view)
Sieve the point record for only point records within the date that I will calculate (This takes ~0.1s only)
Join 2 with 3 to obtain point earned and used for each customer each month
Join 4 with 4 on date less than open date to sum for open and close balance
6a. Select from 5 where open date is less than 1 month old as current balance (these are not closed yet, and the point reflect the point each customer own now)
6b. All the statements are obtained by union of history_statement and 5
On a development server, the average response time (200K customer, 1.5M transactions in current month) is ~3s which is pretty slow for web application, and on the testing server, where resources are likely to be shared, the average response time (200K customer, ~200k transaction each month for 8 months) is 10-15s.
Does anyone have some idea on writing a query with better approach or to speed up the query?
Related SQL:
2: IV_STCLOSE_2_1_T(Materialized view)
3: IV_STCLOSE_2_2_T (~0.15s)
SELECT ACCOUNT_ID, POINT_DATE, POINT
FROM history_point
WHERE point_date >= (
SELECT MIN(open_date)
FROM IV_STCLOSE_2_1_t
)
4: IV_STCLOSE_3_T (~1.5s)
SELECT p0.account_id, p0.open_date, p0.close_date, COALESCE(SUM(DECODE(SIGN(p.point),-1,p.point)),0) AS point_used, COALESCE(SUM(DECODE(SIGN(p.point),1,p.point)),0) AS point_earned
FROM iv_stclose_2_1_t p0
LEFT JOIN iv_stclose_2_2_t p
ON p.account_id = p0.account_id
AND p.point_date >= p0.open_date
AND p.point_date < p0.close_date + INTERVAL '1' DAY
GROUP BY p0.account_id, p0.open_date, p0.close_date
5: IV_STCLOSE_4_T (~3s)
WITH t AS (SELECT * FROM IV_STCLOSE_3_T)
SELECT t1.account_id AS STAT_ACCOUNT_ID, t1.open_date, t1.close_date, t1.open_balance, t1.point_earned AS point_earn, t1.point_used , t1.open_balance + t1.point_earned + t1.point_used AS close_balance
FROM (
SELECT v1.account_id, v1.open_date, v1.close_date, v1.point_earned, v1.point_used, COALESCE(sum(v2.point_used + v2.point_earned),0) AS OPEN_BALANCE
FROM t v1
LEFT JOIN t v2
ON v1.account_id = v2.account_id
AND v1.OPEN_DATE > v2.OPEN_DATE
GROUP BY v1.account_id, v1.open_date, v1.close_date, v1.point_earned, v1.point_used
) t1
It turns out to be that in IV_STCLOSE_4_T
WITH t AS (SELECT * FROM IV_STCLOSE_3_T)
is problematic.
At first thought WITH t AS would be faster as IV_STCLOSE_3_T is only evaluated once, but it apparently forced materializing the whole IV_STCLOSE_3_T, generating over 200k records despite I only need at most 12 of them from a single customer at any time.
With the above statement removed and appropriately indexing account_id, the cost reduced from over 500k to less than 500.
I would like to calculate moving summary:
Total amount:100
first receipt: 20
second receipt: 10
the first row in calculation column is a difference between total amount and the first receipt: 100-20=80
the second row in calculation column is a difference between the first calculated_row and the first receip: 80-10=70
The presentation is supposed to present receipt_amount, balance:
receipt_amount | balance
20 | 80
10 | 70
I'll be glad to use your help
Thanks :-)
You didn't really give us much information about your tables and how they are structured.
I'm assuming that there is an orders table that contains the total_amount and a receipt_table that contains each receipt (as a positive value):
As you also didn't specify your DBMS, this is ANSI SQL:
select sum(amount) over (order by receipt_nr) as running_sum
from (
select total_amount as amount
from orders
where order_no = 1
union all
select -1 * receipt_amount
from the_receipt_table
where order_no =
) t
First of all- thanks for your response.
I work with Cache DB which can be used both SQL and ORACLE syntax.
Basically, the data is locaed in two different tables, but I have them in one join query.
Couple of rows with different receipt amounts and each row (receipt) has the same total amount.
Foe example:
Receipt_no Receipt_amount Total_amount Balance
1 20 100 80
1 10 100 70
1 30 100 40
2 20 50 30
2 10 50 20
So, the calculation is supposed to be in a way that in the first receipt the difference calculation is made from the total_amount and all other receipts (in the same receipt_no) are being reduced from the balance
Thanks!
I have a database of movies where one field is the year which it was released.
I want to create a query which will loop through each decade and will calculate the sum of a particular field for that decade. I have no idea how I can get a loop for every decade. Can anyone help?
If you want the decades where you don't have any movies as well as those with movies, then you can use generate_series to build you list of decades and the do a left outer join to your table; generate_series is the standard way to build numeric and time lists on the fly in PostgreSQL. Something like this should get you started:
select decade.d, count(t.year)
from generate_series(1900, 2100, 10) as decade(d)
left outer join your_table t on decade.d = floor(t.year / 10) * 10
group by decade.d
order by decade.d
That will produce output like this:
d | count
------+-------
1900 | 1
1910 | 0
1920 | 1
1930 | 3
1940 | 0
1950 | 0
1960 | 1
1970 | 0
1980 | 3
-- ...
2100 | 0
You could adjust the first and last values for the generate_series call to match your data if desired.
The floor(t.year / 10) * 10 bit gives you decade for a given year; it will convert 1942 to 1940, 2000 to 2000, etc.
You can set up a decade table (a one column table with one entry for each decade) if you move to a database that doesn't have something like generate_series. The SQL would be pretty much the same, just replace the generate_series call with your decade table.
Try something like this(don't know how your tables look, guessing):
SELECT movie_year, sum(column_x)
FROM (
SELECT year
, date_trunc('decade', movie_year)::date as decade
, column_x
FROM movies) as movies_with_decades
GROUP BY decade
ORDER BY decade;
If I have a table containing schedule information that implies particular dates, is there a SQL statement that can be written to convert that information into actual rows, using some sort of CROSS JOIN, perhaps?
Consider a payment schedule table with these columns:
StartDate - the date the schedule begins (1st payment is due on this date)
Term - the length in months of the schedule
Frequency - the number of months between recurrences
PaymentAmt - the payment amount :-)
SchedID StartDate Term Frequency PaymentAmt
-------------------------------------------------
1 05-Jan-2003 48 12 1000.00
2 20-Dec-2008 42 6 25.00
Is there a single SQL statement to allow me to go from the above to the following?
Running
SchedID Payment Due Expected
Num Date Total
--------------------------------------
1 1 05-Jan-2003 1000.00
1 2 05-Jan-2004 2000.00
1 3 05-Jan-2005 3000.00
1 4 05-Jan-2006 4000.00
1 5 05-Jan-2007 5000.00
2 1 20-Dec-2008 25.00
2 2 20-Jun-2009 50.00
2 3 20-Dec-2009 75.00
2 4 20-Jun-2010 100.00
2 5 20-Dec-2010 125.00
2 6 20-Jun-2011 150.00
2 7 20-Dec-2011 175.00
I'm using MS SQL Server 2005 (no hope for an upgrade soon) and I can already do this using a table variable and while loop, but it seemed like some sort of CROSS JOIN would apply but I don't know how that might work.
Your thoughts are appreciated.
EDIT: I'm actually using SQL Server 2005 though I initially said 2000. We aren't quite as backwards as I thought. Sorry.
I cannot test the code right now, so take it with a pinch of salt, but I think that something looking more or less like the following should answer the question:
with q(SchedId, PaymentNum, DueDate, RunningExpectedTotal) as
(select SchedId,
1 as PaymentNum,
StartDate as DueDate,
PaymentAmt as RunningExpectedTotal
from PaymentScheduleTable
union all
select q.SchedId,
1 + q.PaymentNum as PaymentNum,
DATEADD(month, s.Frequency, q.DueDate) as DueDate,
q.RunningExpectedTotal + s.PaymentAmt as RunningExpectedTotal
from q
inner join PaymentScheduleTable s
on s.SchedId = q.SchedId
where q.PaymentNum <= s.Term / s.Frequency)
select *
from q
order by SchedId, PaymentNum
Try using a table of integers (or better this: http://www.sql-server-helper.com/functions/integer-table.aspx) and a little date math, e..g. start + int * freq
I've used table-valued functions to achieve a similar result. Basically the same as using a table variable I know, but I remember being really pleased with the design.
The usage ends up reading very well, in my opinion:
/* assumes #startdate and #enddate schedule limits */
SELECT
p.paymentid,
ps.paymentnum,
ps.duedate,
ps.ret
FROM
payment p,
dbo.FUNC_get_payment_schedule(p.paymentid, #startdate, #enddate) ps
ORDER BY p.paymentid, ps.paymentnum
A typical solution is to use a Calendar table. You can expand it to fit your own needs, but it would look something like:
CREATE TABLE Calendar
(
calendar_date DATETIME NOT NULL,
is_holiday BIT NOT NULL DEFAULT(0),
CONSTRAINT PK_Calendar PRIMARY KEY CLUSTERED calendar_date
)
In addition to the is_holiday you can add other columns that are relevant for you. You can write a script to populate the table up through the next 10 or 100 or 1000 years and you should be all set. It makes queries like that one that you're trying to do much simpler and can give you additional functionality.