Update column in SQL Server 2008 - SSIS

If the customer transaction type is not recorded in the database:
If the model year is within the last 2 years, the transaction
type should be updated to Warranty.
Of the remaining customer data, 60% should be updated to Customer
Pay and 40% to Warranty, assigned randomly within each dealer.
I have a ModelYear table with this structure:
SlNo VehicleNo ModelYear
---- --------- ---------
1 AAAD1234 2012
2 VVV023333 2008
3 CRT456 2011
4 MTER6666 2010
Is it possible to achieve this using SSIS?
I have tried a query. Please help me fix it:
select
vehicleNo, Modelyear,
case
when DATEDIFF(year, ModelYear, GETDATE()) <= 2 then 'Warranty' END,
case
when COUNT(modelyear) * 100 / (select COUNT(*) from VehicleModel) > 2 then '100%' end,
case
when COUNT(modelyear) * 40 / (select COUNT(*) from VehicleModel) > 2 then '40%' end
from
vehiclemodel
group by
vehicleNo, Modelyear
Output
vehicleNo Modelyear (No column name) (No column name) (No column name)
--------- --------- ---------------- ---------------- ----------------
AAAD1234 2008 NULL 100% 40%
VVV023333 2010 Warranty 100% 40%
CRT456 2011 Warranty 100% 40%
MTER6666 2012 Warranty 100% 40%

What exactly are you trying to do with SSIS? Where are you moving data from, and where are you inserting it?
If you only need to run this query, you don't need SSIS; you can do this logic in SQL.
If you need to insert the result into another table or database, I would also do the calculation in SQL (as you just did), use it as the source for an OLE DB Source component, and then insert it into your destination.
I think you have to provide more information so we can help you.
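To make the 60/40 split concrete: here is a sketch of one way to do the random per-dealer assignment with window functions, demonstrated in SQLite through Python. The customer_txn table and its columns are made up for illustration; on SQL Server you would order by NEWID() instead of a precomputed random column.

```python
import random
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customer_txn (dealer TEXT, vehicleNo TEXT, rnd REAL)")
rows = [(d, f"{d}-V{i}", random.random()) for d in ("D1", "D2") for i in range(10)]
con.executemany("INSERT INTO customer_txn VALUES (?,?,?)", rows)

# Rank each dealer's rows in a random order, then label the first 40%
# 'Warranty' and the remaining 60% 'Customer Pay'.
split = con.execute("""
    SELECT dealer, vehicleNo,
           CASE WHEN rn * 100 <= cnt * 40 THEN 'Warranty'
                ELSE 'Customer Pay' END AS txn_type
    FROM (SELECT dealer, vehicleNo,
                 ROW_NUMBER() OVER (PARTITION BY dealer ORDER BY rnd) AS rn,
                 COUNT(*)     OVER (PARTITION BY dealer)              AS cnt
          FROM customer_txn)
""").fetchall()
warranty = sum(1 for _, _, t in split if t == 'Warranty')
print(warranty)  # 8: 40% of each dealer's 10 rows
```

Which rows land in the 40% varies per run, but the counts per dealer are fixed, which is what the requirement asks for.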

Related

How to index for a self join

I'm using SAS University Edition to analyze the following table (actually has 2.5M rows in it)
p_id c_id startyear endyear
0001 3201 2008 2013
0001 2131 2013 2015
0013 3201 2006 2010
where p_id is person_id and c_id is companyid.
I want to get the number of colleagues (the number of persons that worked at the same companies during an overlapping span) in a certain year, so I created a table with the distinct p_ids and run the following query:
PROC SQL;
UPDATE no_colleagues AS t1
SET c2007 = (
SELECT COUNT(DISTINCT t2.p_id) - 1
FROM table AS t2
INNER JOIN table AS t3
ON t3.p_id = t1.p_id
AND t3.c_id = t2.c_id
AND t3.startyear <= t2.endyear /* checks overlapping criteria */
AND t3.endyear >= t2.startyear /* checks overlapping criteria */
AND t3.startyear <= 2007 /* limits number of returns */
AND t2.startyear <= 2007 /* limits number of returns */
);
QUIT;
A single lookup on an indexed query (p_id, c_id, startyear, endyear) takes 0.04 seconds. The query above takes about 1.8 seconds for a single update, and does not use any indexes.
So my question is:
How to improve the query, and/or how to use indices to make sure the self join can use the indices?
Thanks in advance.
Based on your data, I'd do something like this, but maybe you need to tweak the code to fit your needs.
First, create a table with p_id, c_id, and year.
So the first person, who worked at company 3201 from 2008 to 2013, will have 6 observations in this table, one for each worked year.
data have_count;
set have;
do i=startyear to endyear;
worked_in = i;
output;
end;
drop i startyear endyear;
run;
Now you just count and aggregate:
proc sql;
select
worked_in as year
,c_id
,count(distinct p_id) as no_colleagues
from have_count
group by 1,2;
quit;
Result:
year c_id no_colleagues
2006 3201 1
2007 3201 1
2008 3201 2
2009 3201 2
2010 3201 2
2011 3201 1
2012 3201 1
2013 2131 1
2013 3201 1
2014 2131 1
2015 2131 1
A more efficient method:
1) Create a long format table for the results rather than wide format. This will be both easier to populate and easier to work with later.
create table colleagues_by_year (
p_id int,
year int,
colleagues int
);
Now this can be populated with a single insert statement. The only trick is getting the full list of years you want in the final table. There are a few options, but since I'm not too familiar with SAS SQL I'm going to go with a very simple one: a lookup table of years, to which you can join.
create table years (
year int
);
insert into years
values (2007),(2008),...
(A more sophisticated approach would be a recursive query that found the range of all years in the input data).
Now the final insert:
insert into colleagues_by_year
select p_id,
year,
count(*)
from colleagues
join years on
years.year between colleagues.startyear and colleagues.endyear
group by p_id,year
This won't have any rows where the number of colleagues for the year would be 0. If you wanted that you could make years be a left join and only count the rows where years.year is not null.
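For what it's worth, the expand-then-aggregate idea also works in plain SQL with a recursive CTE in place of the SAS data step. Here is a sketch in SQLite via Python, using the sample rows from the question; note the count per company/year is the headcount, so subtract 1 per person to get their colleagues.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE have (p_id TEXT, c_id TEXT, startyear INT, endyear INT)")
con.executemany("INSERT INTO have VALUES (?,?,?,?)", [
    ("0001", "3201", 2008, 2013),
    ("0001", "2131", 2013, 2015),
    ("0013", "3201", 2006, 2010),
])

# Expand each employment span into one row per worked year (what the
# SAS DO loop does), then count distinct people per company/year.
rows = con.execute("""
    WITH RECURSIVE have_count(p_id, c_id, worked_in, endyear) AS (
        SELECT p_id, c_id, startyear, endyear FROM have
        UNION ALL
        SELECT p_id, c_id, worked_in + 1, endyear
        FROM have_count WHERE worked_in < endyear
    )
    SELECT worked_in AS year, c_id, COUNT(DISTINCT p_id) AS headcount
    FROM have_count
    GROUP BY worked_in, c_id
    ORDER BY worked_in, c_id
""").fetchall()
for r in rows:
    print(r)
```

This reproduces the result table above: for example company 3201 has a headcount of 2 in 2008-2010, where the two spans overlap.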

SQL - combine two columns into a comma separated list

The problem I'm facing is probably easy to fix, but I can't seem to find an answer online due to the specificity of the issue.
In my database, I have 3 tables to denote how an educational course is planned. Suppose there is a course called Working with Excel. This means the table Courses has a row for it.
The second table denotes cycles of the same course. If the course is given on Jan 1 2013 and Feb 1 2013, in the underlying tables Cycles, you will find 2 rows, one for each date.
I currently already have an SQL script that gives me two columns: The course name, and a comma separated list with all the Cycle dates.
Please note I am using dd/MM/yyyy notation
This is how it's currently set up (small excerpt, this is the SELECT statement to explain the desired output):
SELECT course.name,
stuff((SELECT distinct ',' + CONVERT(varchar(10), cycleDate, 103) --code 101 = mm/dd/yyyy, code 103 = dd/mm/yyyy
FROM cycles t2
where t2.courseID= course.ID and t2.cycleDate > GETDATE()
FOR XML PATH('')),1,1,'') as 'datums'
The output it gives me:
NAME DATUMS
---------------------------------------------------
Working with Excel 01/01/2013,01/02/2013
Some other course 12/3/2013, 1/4/2013, 1/6/2013
The problem is that I need to add info from the third table I haven't mentioned yet. The table ExtraDays contains additional days for a cycle, in case this spans more than a day.
E.g., if the Working with Excel course takes 3 days, (Jan 1+2+3 and Feb 1+2+3), each of the course cycles will have 2 ExtraDays rows that contain the 'extra days'.
The tables would look like this:
Table COURSES
ID NAME
---------------------------------------------------
1 Working with Excel
Table CYCLES
ID DATE COURSEID
---------------------------------------------------
1 1/1/2013 1
2 1/2/2013 1
Table EXTRADAYS
ID EXTRADATE CYCLEID
---------------------------------------------------
1 2/1/2013 1
2 3/1/2013 1
3 2/2/2013 2
4 3/2/2013 2
I need to add these ExtraDates to the comma-separated list of dates in my output. Preferably sorted, but this is not necessary.
I've been stumped quite some time by this. I have some SQL experience, but apparently not enough for this issue :)
I'm hoping to get the following output:
NAME DATUMS
--------------------------------------------------------------------------------------
Working with Excel 01/01/2013,02/01/2013,03/01/2013,01/02/2013,02/02/2013,03/02/2013
I'm well aware that the database structure could be improved to simplify this, but unfortunately this is a legacy application, I cannot change the structure.
Can anyone point me in the right direction for combining these two columns?
I hope I described my issue clearly enough. If not, just ask :)
SELECT course.name,
stuff((SELECT distinct ',' + CONVERT(varchar(10), cycleDate, 103) --code 101 = mm/dd/yyyy, code 103 = dd/mm/yyyy
FROM (select id, [date] as cycleDate, courseID from cycles
union
select e.id, e.extraDate, c.courseID
from extradays e
join cycles c on c.id = e.cycleID) t2
where t2.courseID = course.ID and t2.cycleDate > GETDATE()
FOR XML PATH('')),1,1,'') as 'datums'
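As a runnable sanity check of the union approach (the key point being that ExtraDays carries a cycleID, not a courseID, so it has to be joined back through Cycles): here is the same idea in SQLite via Python, where group_concat plays the role of STUFF ... FOR XML PATH. Table and column names follow the question; ISO dates are used for simplicity.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE courses   (id INT, name TEXT);
CREATE TABLE cycles    (id INT, cycleDate TEXT, courseID INT);
CREATE TABLE extradays (id INT, extraDate TEXT, cycleID INT);
INSERT INTO courses   VALUES (1, 'Working with Excel');
INSERT INTO cycles    VALUES (1, '2013-01-01', 1), (2, '2013-02-01', 1);
INSERT INTO extradays VALUES (1, '2013-01-02', 1), (2, '2013-01-03', 1),
                             (3, '2013-02-02', 2), (4, '2013-02-03', 2);
""")

# Union the cycle dates with the extra days (joined through cycles to
# recover the courseID), then build the comma-separated list per course.
row = con.execute("""
    SELECT c.name, group_concat(d.d, ',') AS datums
    FROM courses c
    JOIN (SELECT courseID, cycleDate AS d FROM cycles
          UNION
          SELECT cy.courseID, e.extraDate
          FROM extradays e JOIN cycles cy ON cy.id = e.cycleID) d
      ON d.courseID = c.id
    GROUP BY c.name
""").fetchone()
print(row)
```

The derived table names its columns in the first branch of the UNION, which is why the outer query can refer to them, the same fix the T-SQL version needs.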

Why is there no "product()" aggregate function in SQL? [duplicate]

This question already has answers here:
is there a PRODUCT function like there is a SUM function in Oracle SQL?
(7 answers)
Closed 7 years ago.
When there are SUM(), MIN(), MAX(), AVG() and COUNT() functions, can someone help me understand why there is no built-in PRODUCT() function? And what would be the most efficient user implementation of this aggregate function?
Thanks,
Trinity
If you have exponential and log functions available, then:
PRODUCT(TheColumn) = EXP(SUM(LN(TheColumn)))
One can make a user-defined aggregate in SQL Server 2005 and up by using the CLR. In PostgreSQL you can do it in Postgres itself, and likewise in Oracle.
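A quick check of the EXP(SUM(LN(x))) identity, and its main caveat: it only holds when every value is strictly positive, since zeros make LN undefined and negative values need a separate sign count. Sketched in Python:

```python
import math
from functools import reduce

values = [2.0, 5.0, 7.0]

# Product via repeated multiplication.
direct = reduce(lambda a, b: a * b, values)

# The EXP(SUM(LN(x))) trick: sum the logs, then exponentiate.
# Only valid when every value is > 0.
via_logs = math.exp(sum(math.log(v) for v in values))

print(direct, via_logs)  # both 70 (up to float rounding)
```

In SQL the zero case is usually handled by a CASE that short-circuits the whole product to 0, and the sign by counting negative values and flipping the result when the count is odd.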
I'll focus on the question why it's not a standard function.
Aggregate functions are basic statistical functions, and product is not one of them
Applied to common numerical data, the result will in most cases be out of range (overflow), so it is of little general use
It's probably left out because most people don't need it and it can be defined easily in most databases.
Solution for PostgreSQL:
CREATE OR REPLACE FUNCTION product_sfunc(state numeric, factor numeric)
RETURNS numeric AS $$
SELECT $1 * $2
$$ LANGUAGE sql;
CREATE AGGREGATE product (
sfunc = product_sfunc,
basetype = numeric,
stype = numeric,
initcond = '1'
);
You can simulate product() using cursors. If you let us know which database platform you're using, then we might be able to give you some sample code.
I can confirm that it is indeed rare to use a product() aggregate function, but I have a quite valid example, especially working with highly aggregated data that must be presented to users in a report.
It utilizes the exp(sum(ln( multiplyTheseColumnValues ))) "trick" as mentioned in another post and other internet sources.
The report (which should care about the display and contain as little data-calculation logic as possible, for better maintainability and flexibility) basically displays this data along with some graphics:
DESCR SUM
---------------------------------- ----------
money available in 2013 33233235.3
money spent in 2013 4253235.3
money bound to contracts in 2013 34333500
money spent 2013 in % of available 12
money bound 2013 in % of available 103
(In real life its a bit more complex and used in state budget scenarios.)
It aggregates quite some complex data found in the first 3 rows.
I do not want to calculate the percentage values of the following rows (4th and 5th) by:
doing it in the (intentionally dumb) report (which just takes any number of such rows with a description descr and a number sum) with some fancy logic (using JasperReports, BIRT Reports or the like)
nor do I want to calculate the underlying data (money available, money spent, money bound) multiple times (since these are quite expensive operations) just to derive the percentage values
So I used another trick involving the product() functionality.
(If somebody knows a better way to achieve this under the above restrictions, I would be happy to hear it :-) )
The whole simplified example is available as one executable SQL below.
Maybe it could help convince some Oracle folks that this functionality is not as rare, or as little worth providing, as it may seem at first thought.
with
-- we have some 10g database without pivot/unpivot functionality
-- what is interesting for various summary reports
sum_data_meta as (
select 'MA' as sum_id, 'money available in 2013' as descr, 1 as agg_lvl from dual
union all select 'MS', 'money spent in 2013', 1 from dual
union all select 'MB', 'money bound to contracts in 2013', 1 from dual
union all select 'MSP', 'money spent 2013 in % of available', 2 from dual
union all select 'MBP', 'money bound 2013 in % of available', 2 from dual
)
/* select * from sum_data_meta
SUM_ID DESCR AGG_LVL
------ ---------------------------------- -------
MA money available in 2013 1
MS money spent in 2013 1
MB money bound to contracts in 2013 1
MSP money spent 2013 in % of available 2
MBP money bound 2013 in % of available 2
*/
-- 1st level of aggregation with the base data (the data actually comes from complex (sub)SQLs)
,sum_data_lvl1_base as (
select 'MA' as sum_id, 33233235.3 as sum from dual
union all select 'MS', 4253235.3 from dual
union all select 'MB', 34333500 from dual
)
/* select * from sum_data_lvl1_base
SUM_ID SUM
------ ----------
MA 33233235.3
MS 4253235.3
MB 34333500.0
*/
-- 1st level of aggregation with enhanced meta data infos
,sum_data_lvl1 as (
select
m.descr,
b.sum,
m.agg_lvl,
m.sum_id
from sum_data_meta m
left outer join sum_data_lvl1_base b on (b.sum_id=m.sum_id)
)
/* select * from sum_data_lvl1
DESCR SUM AGG_LVL SUM_ID
---------------------------------- ---------- ------- ------
money available in 2013 33233235.3 1 MA
money spent in 2013 4253235.3 1 MS
money bound to contracts in 2013 34333500.0 1 MB
money spent 2013 in % of available - 2 MSP
money bound 2013 in % of available - 2 MBP
*/
select
descr,
case
when agg_lvl < 2 then sum
when agg_lvl = 2 then -- our level where we have to calculate some things based on the previous level calculations < 2
case
when sum_id = 'MSP' then
-- we want to calculate MS/MA by tricky aggregating the product of
-- (MA row:) 1/33233235.3 * (MS:) 4253235.3/1 * (MB:) 1/1 * (MSP:) 1/1 * (MBP:) 1/1
trunc( -- cut off fractions, e.g. 12.7981 => 12
exp(sum(ln( -- trick simulating product(...) as mentioned here: http://stackoverflow.com/a/404761/1915920
case when sum_id = 'MS' then sum else 1 end
/ case when sum_id = 'MA' then sum else 1 end
)) over ()) -- "over()" => look at all resulting rows like an aggregate function
* 100 -- % display style
)
when sum_id = 'MBP' then
-- we want to calculate MB/MA by tricky aggregating the product as shown above with MSP
trunc(
exp(sum(ln(
case when sum_id = 'MB' then sum else 1 end
/ case when sum_id = 'MA' then sum else 1 end
)) over ())
* 100
)
else -1 -- indicates problem
end
else null -- will be calculated in a further step later on
end as sum,
agg_lvl,
sum_id
from sum_data_lvl1
/*
DESCR SUM AGG_LVL SUM_ID
---------------------------------- ---------- ------- ------
money available in 2013 33233235.3 1 MA
money spent in 2013 4253235.3 1 MS
money bound to contracts in 2013 34333500 1 MB
money spent 2013 in % of available 12 2 MSP
money bound 2013 in % of available 103 2 MBP
*/
Since a product is nothing but repeated addition, SQL did not introduce a PRODUCT aggregate function.
For example, 6 * 4 can be achieved
either by adding 6 to itself 4 times: 6+6+6+6
or
by adding 4 to itself 6 times: 4+4+4+4+4+4
thus giving the same result

SQL complex view for virtually showing data

I have a table with the following data.
----------------------------------
Hour Location Stock
----------------------------------
6 2000 20
9 2000 24
----------------------------------
So this shows stock against some of the hours in which there is a change in the quantity.
Now my requirement is to create a view on this table which will virtually fill in the data (if stock is not there for a particular hour). So the data that should be shown is:
----------------------------------
Hour Location Stock
----------------------------------
6 2000 20
7 2000 20 -- same as hour 6 stock
8 2000 20 -- same as hour 6 stock
9 2000 24
----------------------------------
That means even if the data is not there for a particular hour, we should show the stock from the last hour that has stock. And I have another table with all the available hours from 1-23 in a column.
I have tried the PARTITION BY method as given below, but I think I am missing something to meet my requirement.
SELECT
HOUR_NUMBER,
CASE WHEN TOTAL_STOCK IS NULL
THEN SUM(TOTAL_STOCK)
OVER (
PARTITION BY LOCATION
ORDER BY CURRENT_HOUR ROWS 1 PRECEDING
)
ELSE
TOTAL_STOCK
END AS FULL_STOCK
FROM
(
SELECT HOUR_NUMBER AS HOUR_NUMBER
FROM HOURS_TABLE -- REFERENCE TABLE WITH HOURS FROM 1-23
GROUP BY 1
) HOURS_REF
LEFT OUTER JOIN
(
SEL CURRENT_HOUR AS CURRENT_HOUR
, STOCK AS TOTAL_STOCK
,LOCATION AS LOCATION
FROM STOCK_TABLE
WHERE STOCK<>0
) STOCKS
ON HOURS_REF.HOUR_NUMBER = STOCKS.CURRENT_HOUR
This query is giving all the hours, but stock is null for the hours without data.
We are looking for an ANSI SQL solution so that it can be used on databases like Teradata.
I am thinking that I am using PARTITION BY wrongly, or maybe there is another way. We tried CASE WHEN, but that needs some kind of looping to check back for an hour with some stock.
I've run into similar problems before. It's often simpler to make sure that the data you need somehow gets into the database in the first place. You might be able to automate it with a stored procedure that runs periodically.
Having said that, did you consider trying COALESCE() with a scalar subquery? (Or whatever similar function your dbms supports.) I'd try it myself and post the SQL, but I'm leaving for work in two minutes.
Haven't tried, but along the lines of what Mike said:
SELECT a.hour
, COALESCE( a.stock
, ( select b.stock
from tbl b
where b.hour=a.hour-1 )
) "stock"
FROM tbl a
Note: this will impact performance greatly.
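One caveat with the sketch above: it only looks back a single hour, so a two-hour gap would still yield NULL. A gap-tolerant variant takes the stock recorded at the latest hour at or before each report hour. Here is that idea in SQLite via Python (LIMIT 1 is SQLite shorthand; an ANSI version would correlate on MAX(current_hour) instead), using the question's sample data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE hours_table (hour_number INT);
CREATE TABLE stock_table (current_hour INT, location INT, stock INT);
INSERT INTO stock_table VALUES (6, 2000, 20), (9, 2000, 24);
""")
con.executemany("INSERT INTO hours_table VALUES (?)",
                [(h,) for h in range(6, 10)])

# For each hour, pull the stock recorded at the latest hour <= that
# hour: a correlated subquery instead of recursion.
rows = con.execute("""
    SELECT h.hour_number,
           (SELECT s.stock FROM stock_table s
            WHERE s.current_hour <= h.hour_number
            ORDER BY s.current_hour DESC LIMIT 1) AS full_stock
    FROM hours_table h
    ORDER BY h.hour_number
""").fetchall()
print(rows)  # [(6, 20), (7, 20), (8, 20), (9, 24)]
```

The correlated subquery avoids both the recursion and the one-hour look-back limit, at the cost of one lookup per report hour, which an index on (current_hour) keeps cheap.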
Thanks for your responses. I have tried a RECURSIVE VIEW for the above requirement and it is giving correct results (I fear the CPU usage on big tables, as it is recursive). So here is the stock table:
----------------------------------
Hour Location Stock
----------------------------------
6 2000 20
9 2000 24
----------------------------------
Then we will have a view on this table which will give all 12 hours of data using a left outer join.
----------------------------------
Hour Location Stock
----------------------------------
6 2000 20
7 2000 NULL
8 2000 NULL
9 2000 24
----------------------------------
Then we will have a recursive view which recursively joins the left-outer-join view to itself, so that each hour's stock is carried one hour forward, with a level column incremented at each step.
REPLACE RECURSIVE VIEW HOURLY_STOCK_VIEW
(HOUR_NUMBER,LOCATION, STOCK, LVL)
AS
(
SELECT
HOUR_NUMBER,
LOCATION,
STOCK,
1 AS LVL
FROM STOCK_VIEW_WITH_LEFT_OUTER_JOIN
UNION ALL
SELECT
STK.HOUR_NUMBER,
THE_VIEW.LOCATION,
THE_VIEW.STOCK,
LVL+1 AS LVL
FROM STOCK_VIEW_WITH_LEFT_OUTER_JOIN STK
JOIN
HOURLY_STOCK_VIEW THE_VIEW
ON THE_VIEW.HOUR_NUMBER = STK.HOUR_NUMBER -1
WHERE LVL <=12
)
;
You can observe that we first select from the left-outer-join view, then union it with that view joined to the view we are creating, recording the level at which each row arrives.
Then we select the data from this view with the minimum level.
SEL * FROM HOURLY_STOCK_VIEW
WHERE
(
HOUR_NUMBER,
LVL
)
IN
(
SEL
HOUR_NUMBER,
MIN(LVL)
FROM HOURLY_STOCK_VIEW
WHERE STOCK IS NOT NULL
GROUP BY 1
)
;
This is working fine and giving the result as
----------------------------------
Hour Location Stock
----------------------------------
6 2000 20
7 2000 20 -- same as hour 6 stock
8 2000 20 -- same as hour 6 stock
9 2000 24
10 2000 24
11 2000 24
12 2000 24
----------------------------------
I know this will take huge CPU on large tables to make the recursion work (we limit the recursion to 12 levels, since 12 hours of data is needed, which also stops it going into an infinite loop). But I thought somebody could use this for some kind of hierarchy building. I will look for some more responses from you on other approaches. Thanks. You can have a look at recursive views in Teradata at the link below.
http://forums.teradata.com/forum/database/recursion-in-a-stored-procedure
The most common use of a view is the removal of complexity.
For example:
CREATE VIEW FEESTUDENT
AS
SELECT S.NAME,F.AMOUNT FROM STUDENT AS S
INNER JOIN FEEPAID AS F ON S.TKNO=F.TKNO
Now do a SELECT:
SELECT * FROM FEESTUDENT

Can I use SQL to plot actual dates based on schedule information?

If I have a table containing schedule information that implies particular dates, is there a SQL statement that can be written to convert that information into actual rows, using some sort of CROSS JOIN, perhaps?
Consider a payment schedule table with these columns:
StartDate - the date the schedule begins (1st payment is due on this date)
Term - the length in months of the schedule
Frequency - the number of months between recurrences
PaymentAmt - the payment amount :-)
SchedID StartDate Term Frequency PaymentAmt
-------------------------------------------------
1 05-Jan-2003 48 12 1000.00
2 20-Dec-2008 42 6 25.00
Is there a single SQL statement to allow me to go from the above to the following?
Running
SchedID Payment Due Expected
Num Date Total
--------------------------------------
1 1 05-Jan-2003 1000.00
1 2 05-Jan-2004 2000.00
1 3 05-Jan-2005 3000.00
1 4 05-Jan-2006 4000.00
1 5 05-Jan-2007 5000.00
2 1 20-Dec-2008 25.00
2 2 20-Jun-2009 50.00
2 3 20-Dec-2009 75.00
2 4 20-Jun-2010 100.00
2 5 20-Dec-2010 125.00
2 6 20-Jun-2011 150.00
2 7 20-Dec-2011 175.00
I'm using MS SQL Server 2005 (no hope for an upgrade soon) and I can already do this using a table variable and while loop, but it seemed like some sort of CROSS JOIN would apply but I don't know how that might work.
Your thoughts are appreciated.
EDIT: I'm actually using SQL Server 2005 though I initially said 2000. We aren't quite as backwards as I thought. Sorry.
I cannot test the code right now, so take it with a pinch of salt, but I think that something looking more or less like the following should answer the question:
with q(SchedId, PaymentNum, DueDate, RunningExpectedTotal) as
(select SchedId,
1 as PaymentNum,
StartDate as DueDate,
PaymentAmt as RunningExpectedTotal
from PaymentScheduleTable
union all
select q.SchedId,
1 + q.PaymentNum as PaymentNum,
DATEADD(month, s.Frequency, q.DueDate) as DueDate,
q.RunningExpectedTotal + s.PaymentAmt as RunningExpectedTotal
from q
inner join PaymentScheduleTable s
on s.SchedId = q.SchedId
where q.PaymentNum <= s.Term / s.Frequency)
select *
from q
order by SchedId, PaymentNum
Try using a table of integers (or better, the one described here: http://www.sql-server-helper.com/functions/integer-table.aspx) and a little date math, e.g. start + n * frequency.
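Sketching that numbers-table idea in SQLite via Python (table and column names follow the question; note the question's own expected output counts payments inconsistently between the two schedules, so whether the join condition uses < or <= is a judgment call; as written it yields Term/Frequency payments):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE schedule (SchedID INT, StartDate TEXT, Term INT,
                       Frequency INT, PaymentAmt REAL);
INSERT INTO schedule VALUES (1, '2003-01-05', 48, 12, 1000.00),
                            (2, '2008-12-20', 42, 6, 25.00);
CREATE TABLE numbers (n INT);
""")
con.executemany("INSERT INTO numbers VALUES (?)", [(n,) for n in range(100)])

# Cross join the schedule with a numbers table: payment k falls on
# StartDate + k * Frequency months, and the running total is just
# k * PaymentAmt.
rows = con.execute("""
    SELECT s.SchedID,
           n.n + 1 AS PaymentNum,
           date(s.StartDate, '+' || (n.n * s.Frequency) || ' months') AS DueDate,
           (n.n + 1) * s.PaymentAmt AS RunningTotal
    FROM schedule s
    JOIN numbers n ON n.n * s.Frequency < s.Term
    ORDER BY s.SchedID, PaymentNum
""").fetchall()
for r in rows[:3]:
    print(r)
```

On SQL Server the same shape works with DATEADD(month, n * Frequency, StartDate), with no recursion or WHILE loop needed.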
I've used table-valued functions to achieve a similar result. Basically the same as using a table variable I know, but I remember being really pleased with the design.
The usage ends up reading very well, in my opinion:
/* assumes @startdate and @enddate schedule limits */
SELECT
p.paymentid,
ps.paymentnum,
ps.duedate,
ps.ret
FROM
payment p,
dbo.FUNC_get_payment_schedule(p.paymentid, @startdate, @enddate) ps
ORDER BY p.paymentid, ps.paymentnum
A typical solution is to use a Calendar table. You can expand it to fit your own needs, but it would look something like:
CREATE TABLE Calendar
(
calendar_date DATETIME NOT NULL,
is_holiday BIT NOT NULL DEFAULT(0),
CONSTRAINT PK_Calendar PRIMARY KEY CLUSTERED (calendar_date)
)
In addition to the is_holiday you can add other columns that are relevant for you. You can write a script to populate the table up through the next 10 or 100 or 1000 years and you should be all set. It makes queries like that one that you're trying to do much simpler and can give you additional functionality.