SQL Database Design for SSIS - sql

OK my first question so here goes.
Currently users are using a huge Access Application. They wanted a web application with some functionality based off of the Access data and with some modifications.
Ok no problem. I used the Access to SQL migration assistant to convert the data over and then wrote some SSIS packages which are executed from the web end to allow the application to be updated as needed. All here is good.
Here is where I am kind of stumped. There are 2 types of imports, quarterly and yearly. The quarterly is fine but the yearly import is causing issues. The yearly import can be for an adopted budget or for a proposed budget (each is held in a separate Access db). I have one SSIS package for each type of yearly import. The table where the information goes is as follows..
CREATE TABLE Budget
(
BudgetID uniqueidentifier NOT NULL,
ProjectNumber int NOT NULL,
SubProjectNumber varchar(6) NOT NULL,
FiscalYearBegin int NOT NULL,
FiscalYearEnd int NOT NULL,
Sequence int NULL,
QuarterImportDate datetime NULL,
ProposedBudget money NULL,
AdoptedBudget money NULL,
CONSTRAINT PK_Budget PRIMARY KEY CLUSTERED
(
BudgetID ASC
),
CONSTRAINT uc_Budget UNIQUE NONCLUSTERED
(
ProjectNumber ASC,
SubProjectNumber ASC,
FiscalYearBegin ASC,
FiscalYearEnd ASC,
Sequence ASC
)
)
Also, there can be multiple versions of the budget for a specific year in terms of Project, SubProject, FiscalYearBegin, and FiscalYearEnd. That is why there is a sequence number.
So the problem becomes, since I have 2 different SSIS packages, each of which is an update statement on 1 specific column (either ProposedBudget or AdoptedBudget), I have no effective way of keeping track of the correct sequence.
Please let me know if I can make this any clearer, and any advice would be great!
Thanks.

Something like this will get you the next item with an empty AdoptedBudget, but I think you will need a cursor when there are multiple AdoptedBudgets. I was thinking of doing a nested subquery with an update, but that won't work when there are multiple AdoptedBudgets. It sounds like in the source application they should be selecting a ProposedBudget whenever they add the AdoptedBudget so that a relationship can be created. This way it is clear which AdoptedBudget goes with which ProposedBudget, and it would be a simple join. I have almost the same scenario, but the difference is that I don't keep all the versions. I only have to keep the most current "ProposedBudget" and most current "AdoptedBudget". It's a little bit more difficult trying to sequence them all.
--get the smallest SequenceId with an unfilled AdoptedBudget
Select min(SequenceID),
ProjectNumber,
FiscalYearBegin,
SubProjectNumber --any other fields needed for the join
From Budgets b
Where AdoptedBudget is null
Group By
ProjectNumber,
FiscalYearBegin,
SubProjectNumber --any other fields needed for the join
--This won't work I don't believe
Update Budgets
Set AdoptedBudget = BudgetAmount
From Budgets b
Inner Join SourceAdoptedBudgets ab on
b.ProjectNumber = ab.ProjectNumber
and b.FiscalYearBegin = ab.FiscalYearBegin
and b.FiscalYearEnd = ab.FiscalYearEnd
Inner Join
(
--get the smallest SequenceId with an unfilled AdoptedBudget
Select min(SequenceID) as SequenceID,
ProjectNumber,
FiscalYearBegin,
SubProjectNumber --any other fields needed for the join
From Budgets b
Where AdoptedBudget is null
Group By
ProjectNumber,
FiscalYearBegin,
SubProjectNumber --any other fields needed for the join
) as nextBudgets
on --the join fields again

Something like this using the BudgetType. Of course you'd probably create a code table for these or an IsAdopted bit field. But you get the idea.
Select
budgets.*
,row_number() over(partition by
ProjectNumber
,SubProjectNumber
,FiscalYearBegin
,FiscalYearEnd
order by QuarterImportDate) as SequenceNumber
From
(
Select
ProjectNumber
,SubProjectNumber
,FiscalYearBegin
,FiscalYearEnd
,QuarterImportDate
,'Proposed' as BudgetType
,ProposedBudget as Budget
From sourceProposed
Union
Select
ProjectNumber
,SubProjectNumber
,FiscalYearBegin
,FiscalYearEnd
,QuarterImportDate
,'Adopted' as BudgetType
,AdoptedBudget as Budget
From sourceAdopted
) as budgets
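To see the sequencing idea run end to end, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The sample rows and amounts are made up, the table names mirror the hypothetical sourceProposed and sourceAdopted above, and it assumes a SQLite build with window function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sourceProposed (ProjectNumber INT, SubProjectNumber TEXT,
    FiscalYearBegin INT, FiscalYearEnd INT, QuarterImportDate TEXT, ProposedBudget REAL);
CREATE TABLE sourceAdopted (ProjectNumber INT, SubProjectNumber TEXT,
    FiscalYearBegin INT, FiscalYearEnd INT, QuarterImportDate TEXT, AdoptedBudget REAL);
-- made-up sample data: two proposed versions, then one adopted version
INSERT INTO sourceProposed VALUES (1, 'A', 2013, 2014, '2013-01-15', 100.0);
INSERT INTO sourceProposed VALUES (1, 'A', 2013, 2014, '2013-03-01', 120.0);
INSERT INTO sourceAdopted  VALUES (1, 'A', 2013, 2014, '2013-06-30', 110.0);
""")

# Stack both sources into one set, then number the versions per budget key
# in import-date order.
rows = conn.execute("""
SELECT BudgetType, Budget,
       row_number() OVER (
           PARTITION BY ProjectNumber, SubProjectNumber, FiscalYearBegin, FiscalYearEnd
           ORDER BY QuarterImportDate) AS SequenceNumber
FROM (
    SELECT ProjectNumber, SubProjectNumber, FiscalYearBegin, FiscalYearEnd,
           QuarterImportDate, 'Proposed' AS BudgetType, ProposedBudget AS Budget
    FROM sourceProposed
    UNION ALL
    SELECT ProjectNumber, SubProjectNumber, FiscalYearBegin, FiscalYearEnd,
           QuarterImportDate, 'Adopted' AS BudgetType, AdoptedBudget AS Budget
    FROM sourceAdopted
) AS budgets
ORDER BY SequenceNumber
""").fetchall()

for row in rows:
    print(row)
```

The sketch uses UNION ALL so that two versions with identical values are not collapsed into one; whether that is what you want depends on your data.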

Related

I want to make SQL tables that are updated daily yet retain every single day's contents for later lookup. What is the best practice for this?

Basically I'm trying to create a database schema based around multiple unrelated tables that will not need to reference each other AFAIK.
Each table will be a different "category" that will have the same columns in each table - name, date, two int values and then a small string value.
My issue is that each one will need to be "updated" daily, but I want to keep a record of the items for every single day.
What's the best way to go about doing this? Would it be to make the composite key the combination of the date and the name? Or use something called a "trigger"?
Sorry I'm somewhat new to database design, I can be more specific if I need to be.
Yes, you would create a trigger for each category table.
I'm assuming name is the PK for each table? If that isn't the case, you will need to create a PK.
Let's say you have
table categoryA
name, date, int1, int2, string
table categoryB
name, date, int1, int2, string
You would create another table to store the change log.
table category_history
category_table, name, date, int1, int2, string, changeDate
You create two triggers, one for each category table, which save which table generated the update and when it was made.
create trigger before update for categoryA
INSERT INTO category_history VALUES
('categoryA', OLD.name, OLD.date, OLD.int1, OLD.int2, OLD.string, NOW());
This is pseudocode; you need to write the trigger using your RDBMS's syntax, and check how to get the system date (now()).
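For a version of the trigger idea that actually runs, here is a rough sketch in SQLite driven from Python's sqlite3 module. SQLite is only a stand-in for whatever RDBMS you use; the trigger name categoryA_before_update and the sample row are invented for the example, and datetime('now') is SQLite's way of getting the system time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categoryA (name TEXT PRIMARY KEY, date TEXT, int1 INT, int2 INT, string TEXT);
CREATE TABLE category_history (category_table TEXT, name TEXT, date TEXT,
    int1 INT, int2 INT, string TEXT, changeDate TEXT);

-- Copy the old row into the history table before each update.
CREATE TRIGGER categoryA_before_update BEFORE UPDATE ON categoryA
FOR EACH ROW
BEGIN
    INSERT INTO category_history
    VALUES ('categoryA', OLD.name, OLD.date, OLD.int1, OLD.int2, OLD.string,
            datetime('now'));
END;

INSERT INTO categoryA VALUES ('widget', '2014-05-05', 1, 2, 'first');
UPDATE categoryA SET date = '2014-05-06', int1 = 10, string = 'second'
WHERE name = 'widget';
""")

history = conn.execute(
    "SELECT category_table, name, date, int1, int2, string FROM category_history"
).fetchall()
current = conn.execute(
    "SELECT name, date, int1, int2, string FROM categoryA").fetchall()
print(history)
print(current)
```

After the update, the category table holds the new values and the history table holds the row as it was before the change.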
As has already been pointed out, it is poor design to have different identical tables for each category. Better would be a Categories table with one entry for each category and then a Dailies table with the daily information.
create table Categories(
ID smallint not null auto_generated,
Name varchar( 20 ) not null,
..., -- other information about each category
constraint UQ_Category_Name unique( Name ),
constraint PK_Categories primary key( ID )
);
create table Dailies(
CatID smallint not null,
UpdDate date not null,
..., -- Daily values
constraint PK_Dailies primary key( CatID, UpdDate ),
constraint FK_Dailies_Category foreign key( CatID )
references Categories( ID )
);
This way, adding a new category involves inserting a row into the Categories table rather than creating an entirely new table.
If the database has a Date type distinct from a DateTime -- no time data -- then fine. Otherwise, the time part must be removed such as by Oracle's trunc function. This allows only one entry for each category per day.
Retrieving all the values for all the posted dates is easy:
select C.Name as Category, d.UpdDate, d.<daily values>
from Categories C
join Dailies D
on D.CatID = C.ID;
This can be made into a view, DailyHistory. To see the complete history for Category Cat1:
select *
from DailyHistory
where Name = 'Cat1';
To see all the category information as it was updated on a specific date:
select *
from DailyHistory
where UpdDate = date '2014-05-06';
Most queries will probably be interested in the current values -- that is, the last update made (assuming some categories are not updated every day). This is a little more complicated but still very fast if you are worried about performance.
select C.Name as Category, d.UpdDate as "Date", d.<daily values>
from Categories C
join Dailies D
on D.CatID = C.ID
and D.UpdDate =(
select Max( UpdDate )
from Dailies
where CatID = D.CatID );
Of course, if every category is updated every day, the query is simplified:
select C.Name as Category, d.UpdDate as "Date", d.<daily values>
from Categories C
join Dailies D
on D.CatID = C.ID
and D.UpdDate = <today's date>;
This can also be made into a view. To see today's (or the latest) updates for Category Cat1:
select *
from DailyCurrent
where Name = 'Cat1';
Suppose now that updates are not necessarily made every day. The history view would show all the updates that were actually made. So the query shown for all categories as they were on a particular day would actually show only those categories that were actually updated on that day. What if you wanted to show the data that was "current" as of a particular date, even if the actual update was several days before?
That can be provided with a small change to the "current" query (just the last line added):
select C.Name as Category, d.UpdDate as "Date", d.<daily values>
from Categories C
join Dailies D
on D.CatID = C.ID
and D.UpdDate =(
select Max( UpdDate )
from Dailies
where CatID = D.CatID
and UpdDate <= date '2014-05-06' );
Now this shows all categories with the data updated on that date if it exists, otherwise the latest update made before that date.
As you can see, this is a very flexible design which allows access to the data in just about any way desired.
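To check the "as of a date" query against real rows, here is a small sketch using Python's sqlite3 module. The single Value column stands in for the daily values, the helper name as_of is made up, and dates are stored as ISO text so that string comparison matches date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Categories (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
CREATE TABLE Dailies (
    CatID INTEGER NOT NULL REFERENCES Categories (ID),
    UpdDate TEXT NOT NULL,          -- ISO yyyy-mm-dd, so text order = date order
    Value INTEGER,                  -- stand-in for the daily values
    PRIMARY KEY (CatID, UpdDate));
INSERT INTO Categories VALUES (1, 'Cat1'), (2, 'Cat2');
INSERT INTO Dailies VALUES (1, '2014-05-04', 10), (1, '2014-05-06', 12),
                           (2, '2014-05-03', 20), (2, '2014-05-07', 25);
""")

def as_of(day):
    """Return each category's most recent row on or before the given day."""
    return conn.execute("""
        SELECT C.Name, D.UpdDate, D.Value
        FROM Categories C
        JOIN Dailies D
          ON D.CatID = C.ID
         AND D.UpdDate = (SELECT max(UpdDate) FROM Dailies
                          WHERE CatID = D.CatID AND UpdDate <= ?)
        ORDER BY C.Name
    """, (day,)).fetchall()

print(as_of("2014-05-06"))
print(as_of("2014-05-07"))
```

Cat2 was not updated on 2014-05-06, so for that day the query falls back to its 2014-05-03 row.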

Finding the current status of each phone number in PostgreSQL

Terminology: msisdn = phone number
First I'd like to apologize for the names. This Database schema was created using the squeryl ORM and it has some interesting table definition choices. I've included the two relevant tables below.
Basically, Records contains provisioning requests. Provisioning is each attempt at that Record. Some Records are attempted multiple times.
create table "Provisioning" (
"record_id" bigint not null,
"responseCode" varchar(128),
"response" varchar(128),
"id" bigint primary key not null,
"status" integer not null,
"actionRequired" boolean not null
);
create sequence "s_Provisioning_id";
create table "Record" (
"source" varchar(128) not null,
"timestamp" varchar(128) not null,
"finalState" varchar(128) not null,
"fromMSISDN" varchar(128) not null,
"reason" varchar(128) not null,
"id" bigint primary key not null,
"toICCID" varchar(128) not null,
"fromICCID" varchar(128) not null,
"toMSISDN" varchar(128) not null,
"batch_id" bigint not null,
"action" varchar(128) not null,
"transactionId" varchar(128) not null
);
I realize the Provisioning has no timestamp. Yes the latest id is the latest request. The developer working on it forgot to put in timestamps, the project manager convinced them it wasn't a part of the original requirements and then the client didn't want to pay to add it in later. No I'm not happy about it. No there's nothing I can do about it. Yes I hate working for a consulting company. Moving on.
The Problem: I need a report that gives me the latest state of each phone number (msisdn). There can be multiple records for each phone number. In the case of from/toMSISDN, the toMSISDN should always be used unless it is empty, in which case the from is used. The following query gets all the unique phone numbers that are in Record:
SELECT
CASE
WHEN "toMSISDN" = '' OR "toMSISDN" IS NULL THEN "fromMSISDN"
ELSE "toMSISDN"
END AS msisdn
FROM "Record"
GROUP BY msisdn
So that gives me a subset of all the numbers I need to report on. Now I need the very latest Record and Provisioning. I can get the latest Provisioning with the following:
SELECT
max(p.id) latest_provision_id,
p.record_id
FROM "Provisioning" p
LEFT JOIN "Record" r ON r.id = p.record_id
group by p.record_id
So this gives me a 1-to-1 table of each record and what its latest provisioning is. And this is where I start to get stuck. From the Provisioning table I need the response and responseCode for the latest Provisioning. I thought about just adding max(p."responseCode") to the query, but then I realized it would most likely do an alphabetic comparison and not get the correct responseCode/response for the appropriate Provisioning.id. I tried adding those fields to the Group By, but then I started getting extra records in my queries and I wasn't quite sure what was going on.
This (very ugly subquery join) seems to give me the correct record row and provisioning row information, but it's for every record and I need to get the latest (max provisioning id) for each msisdn/phone number (computed field). I'm not sure what to group by and what aggregate functions to use.
SELECT *,
CASE
WHEN "toMSISDN" = '' OR "toMSISDN" IS NULL THEN "fromMSISDN"
ELSE "toMSISDN"
END AS msisdn
FROM (
SELECT
max(p.id) latest_provision_id,
p.record_id
FROM "Provisioning" p
LEFT JOIN "Record" r ON r.id = p.record_id
group by p.record_id
) latest_prov
LEFT JOIN "Provisioning" p2 ON p2.id=latest_prov.latest_provision_id
LEFT JOIN "Record" r2 ON r2.id=latest_prov.record_id
I can't seem to think of a clean way of doing this without running multiple queries and dealing with the results in the application layer.
I was also originally going to do this as a Scala app using the same squeryl ORM, but the query just got considerably more complicated and I stopped around the following statement, opting instead to do the reports as a Python application:
def getSimSnapshot() = {
join(record,provisioning.leftOuter)((r,p) =>
groupBy(r.fromMSISDN)
compute(max(r.id),r.finalState,r.fromMSISDN,r.reason,r.action)
on(r.id === p.map(_.record_id))
)
}
If there's an easier way to do this with the ORM I'm all for it.
Check out Window Functions: http://www.postgresql.org/docs/8.4/static/tutorial-window.html
Without joins you can get the latest data for each record_id:
select *
from
(
select p.record_id, p."responseCode", p.id, max(p.id) over (partition by p.record_id) max_id
from "Provisioning" p
) latest
where id = max_id
It can be a problem only if "Provisioning" also contains record_ids of tables other than "Record".
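The same window-function idea can be pushed one step further to get one row per phone number instead of one per record. Below is a rough sketch, run against SQLite through Python's sqlite3 module as a stand-in for PostgreSQL. The tables are cut down to the columns that matter, the sample rows are invented, and COALESCE(NULLIF(...)) is just a shorter way of writing the CASE expression from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "Record" ("id" INTEGER PRIMARY KEY,
    "fromMSISDN" TEXT NOT NULL, "toMSISDN" TEXT NOT NULL);
CREATE TABLE "Provisioning" ("id" INTEGER PRIMARY KEY, "record_id" INTEGER NOT NULL,
    "responseCode" TEXT, "response" TEXT);
-- made-up data: records 1 and 2 share msisdn 111, record 3 maps to 222
INSERT INTO "Record" VALUES (1, '111', ''), (2, '111', ''), (3, '999', '222');
INSERT INTO "Provisioning" VALUES
    (10, 1, '500', 'failed'), (11, 1, '200', 'ok'),
    (12, 2, '503', 'retry'), (13, 3, '200', 'ok');
""")

# Rank every provisioning attempt within its phone number, newest id first,
# and keep only the top-ranked row.
latest = conn.execute("""
SELECT msisdn, record_id, provisioning_id, response_code, response_text
FROM (
    SELECT COALESCE(NULLIF(r."toMSISDN", ''), r."fromMSISDN") AS msisdn,
           r."id" AS record_id,
           p."id" AS provisioning_id,
           p."responseCode" AS response_code,
           p."response" AS response_text,
           row_number() OVER (
               PARTITION BY COALESCE(NULLIF(r."toMSISDN", ''), r."fromMSISDN")
               ORDER BY p."id" DESC) AS rn
    FROM "Record" r
    JOIN "Provisioning" p ON p."record_id" = r."id"
) AS ranked
WHERE rn = 1
ORDER BY msisdn
""").fetchall()

for row in latest:
    print(row)
```

This relies on the statement in the question that the highest Provisioning id is the latest attempt. Records with no provisioning rows are dropped by the inner join; use a left join if they should appear.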

Unique constraint on Distinct select in Oracle database

I have a data processor that would create a table from a select query.
<_config:table definition="CREATE TABLE TEMP_TABLE (PRODUCT_ID NUMBER NOT NULL, STORE NUMBER NOT NULL, USD NUMBER(20, 5),
CAD NUMBER(20, 5), Description varchar(5), ITEM_ID VARCHAR(256), PRIMARY KEY (ITEM_ID))" name="TEMP_TABLE"/>
and the select query is
<_config:query sql="SELECT DISTINCT ce.PRODUCT_ID, ce.STORE, op.USD ,op.CAD, o.Description, ce.ITEM_ID
FROM PRICE op, PRODUCT ce, STORE ex, OFFER o, SALE t
where op.ITEM_ID = ce.ITEM_ID and ce.STORE = ex.STORE
and ce.PRODUCT_ID = o.PRODUCT_ID and o.SALE_ID IN (2345,1234,3456) and t.MEMBER = ce.MEMBER"/>
When I run that processor, I get a unique constraint error, even though I have a DISTINCT in my select statement.
I tried CREATE TABLE AS (SELECT .....) and it creates the table fine.
Is it possible to get that error? I'm doing a batch execute, so I'm not able to find the individual record.
The select distinct applies to the entire row, not to each column individually. So, two rows could have the same value of item_id but be different in the other columns.
The ultimate fix might be to have a group by item_id in the query, instead of select distinct. That would require other changes to the logic. Another possibility would be to use row_number() in a subquery and select the first row.
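Here is a small illustration of both points, using SQLite through Python's sqlite3 module instead of Oracle. The table src and its rows are invented; the goal is only to show that DISTINCT can still return the same ITEM_ID twice, and that row_number() keeps one row per ITEM_ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src (ITEM_ID TEXT, STORE INT, USD REAL);
-- item A appears in two stores, with one exact duplicate row
INSERT INTO src VALUES ('A', 1, 9.99), ('A', 2, 9.99), ('A', 2, 9.99), ('B', 1, 5.00);
""")

# DISTINCT removes only the exact duplicate; ITEM_ID 'A' still appears twice.
distinct_rows = conn.execute(
    "SELECT DISTINCT ITEM_ID, STORE, USD FROM src ORDER BY ITEM_ID, STORE"
).fetchall()

# row_number() per ITEM_ID keeps exactly one row for each key value.
one_per_item = conn.execute("""
SELECT ITEM_ID, STORE, USD
FROM (
    SELECT ITEM_ID, STORE, USD,
           row_number() OVER (PARTITION BY ITEM_ID ORDER BY STORE) AS rn
    FROM src
) AS ranked
WHERE rn = 1
ORDER BY ITEM_ID
""").fetchall()

print(distinct_rows)
print(one_per_item)
```

Which row counts as "first" is decided by the ORDER BY inside the window; ordering by STORE here is an arbitrary choice for the example.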

How to insert values from column A of table X to column B of table Y - and order them randomly

I need to collect the values from the column "EmployeeID" of the table "Employees" and insert them into the column "EmployeeID" of the table "Incident".
At the end, the Values in the rows of the column "EmployeeID" should be arranged randomly.
More precisely;
I created 10 employees with their ID's, counting from 1 up to 10.
Those Employees, in fact the ID's, should receive random Incidents to work on.
So ... there are 10 ID's to spread on all Incidents - which might be 1000s.
How do I do this?
It's just for personal exercise on the local machine.
I googled, but didn't find an explicit answer to my problem.
Should be simple to solve for you champs. :)
May anyone help me, please?
NOTES:
1) I've already created a column called "EmployeeID" in the table "Incident", therefore I'll need an update statement, won't I?
2) Schema:
[dbo].[EmployeeType]
[dbo].[Company]
[dbo].[Division]
[dbo].[Team]
[dbo].[sysdiagrams]
[dbo].[Incident]
[dbo].[Employees]
3) 1. Pre-solution:
CREATE TABLE IncidentToEmployee
(
IncidentToEmployeeID BIGINT IDENTITY(1,1) NOT NULL,
EmployeeID BIGINT NULL,
Incident FLOAT NULL
PRIMARY KEY CLUSTERED (IncidentToEmployeeID)
)
INSERT INTO IncidentToEmployee
SELECT
EmployeeID,
Incident
FROM dbo.Employees,
dbo.Incident
ORDER BY NEWID()
SELECT * FROM IncidentToEmployee
GO
3) 2. Output by INNER JOIN ON
In case you are wondering about the "Alias" column;
Nobody really knows which persons are behind the ID's - that's why I used an Alias column.
SELECT Employees.Alias,
IncidentToEmployee.Incident
FROM Employees
INNER JOIN
IncidentToEmployee ON
Employees.EmployeeID = IncidentToEmployee.EmployeeID
ORDER BY Alias
4) Final Solution
As I mentioned, I added at first a column called "EmployeeID" already to my "Incident" table. That's why I couldn't use an INSERT INTO statement at first and had to use an UPDATE statement. I found the most suitable solution now - without creating a new table as I did as a pre-solution.
Take a look at the following code:
ALTER Table Incident
ADD EmployeeID BIGINT NULL
UPDATE Incident
SET Incident.EmployeeID = EmployeeID
FROM Incident INNER JOIN Employees
ON Incident = EmployeeID
SELECT
EmployeeID,
Incident
FROM dbo.Employees,
dbo.Incident
ORDER BY NEWID()
Thank you all for your help. It took way longer to find a solution than I thought it would, but I finally made it. Thanks!
UPDATE
I think you need to allocate different tasks to different users. A better approach would be to create a new table, say EmployeeIncidents, with columns Id (primary key), EmployeeID and IncidentID.
Now you can insert random EmployeeID and IncidentID pairs into the new table; this way you will also be able to keep records.
Updating the Incident table will not be a smart choice.
INSERT INTO EmployeeIncidents
SELECT TOP ( 10 )
EmployeeID ,
IncidentID
FROM dbo.Employees,
dbo.Incident
ORDER BY NEWID()
Written by hand, so may need to tweak syntax, but something like this should do it. The Rand() function will give the same value unless seeded, and even when seeded it is evaluated only once per query, so every row would get the same value. Ordering by NewId() is the usual way to get randomness per row.
Insert Into Incidents
Select Top 10
EmployeeID
From Employees
Order By
NewId()
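If you just want to see every incident end up with a random employee, here is a rough sketch that does the random pick in application code rather than in T-SQL, using Python with an in-memory SQLite database as a stand-in for SQL Server. The IncidentID key column is an assumption (the question does not show the Incident table's key), and the fixed seed is only there to make the run repeatable.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY)")
# IncidentID is a hypothetical key column for the example
conn.execute("CREATE TABLE Incident (IncidentID INTEGER PRIMARY KEY, EmployeeID INTEGER)")
conn.executemany("INSERT INTO Employees VALUES (?)", [(i,) for i in range(1, 11)])
conn.executemany("INSERT INTO Incident (IncidentID) VALUES (?)",
                 [(i,) for i in range(1, 1001)])

rng = random.Random(42)  # seeded so the example is repeatable
employee_ids = [row[0] for row in conn.execute("SELECT EmployeeID FROM Employees")]
incident_ids = [row[0] for row in conn.execute("SELECT IncidentID FROM Incident")]

# Draw one employee independently for each incident and write it back.
conn.executemany(
    "UPDATE Incident SET EmployeeID = ? WHERE IncidentID = ?",
    [(rng.choice(employee_ids), incident_id) for incident_id in incident_ids])

unassigned = conn.execute(
    "SELECT count(*) FROM Incident WHERE EmployeeID IS NULL").fetchone()[0]
invalid = conn.execute(
    "SELECT count(*) FROM Incident "
    "WHERE EmployeeID NOT IN (SELECT EmployeeID FROM Employees)").fetchone()[0]
print(unassigned, invalid)
```

Each incident gets exactly one employee, and the load is spread roughly evenly because every pick is independent and uniform.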

in SQL, best way to join first and last instance of child table without NOT EXISTS?

in PostgreSQL, have issue table and child issue_step table - an issue contains one or more steps.
the view issue_v pulls things from the issue and the first and last step: author and from_ts are pulled from the first step, while status and thru_ts are pulled from the last step.
the tables
create table if not exists seeplai.issue(
isu_id serial primary key,
subject varchar(240)
);
create table if not exists seeplai.issue_step(
stp_id serial primary key,
isu_id int not null references seeplai.issue on delete cascade,
status varchar(12) default 'open',
stp_ts timestamp(0) default current_timestamp,
author varchar(40),
notes text
);
the view
create view seeplai.issue_v as
select isu.*,
first.stp_ts as from_ts,
first.author as author,
first.notes as notes,
last.stp_ts as thru_ts,
last.status as status
from seeplai.issue isu
join seeplai.issue_step first on( first.isu_id = isu.isu_id and not exists(
select 1 from seeplai.issue_step where isu_id=isu.isu_id and stp_id<first.stp_id ) )
join seeplai.issue_step last on( last.isu_id = isu.isu_id and not exists(
select 1 from seeplai.issue_step where isu_id=isu.isu_id and stp_id>last.stp_id ) );
note1: issue_step.stp_id is guaranteed to be chronologically sequential, so using it instead of stp_ts because it's already indexed
this works, but ugly as sin, and cannot be the most efficient query in the world.
In this code, I use a sub-query to find the first and last step IDs, and then join to the two instances of the step table by using those found values.
SELECT ISU.*
,S1.STP_TS AS FROM_TS
,S1.AUTHOR AS AUTHOR
,S1.NOTES AS NOTES
,S2.STP_TS AS THRU_TS
,S2.STATUS AS STATUS
FROM SEEPLAI.ISSUE ISU
INNER JOIN
(
SELECT ISU_ID
,MIN(STP_ID) AS MIN_ID
,MAX(STP_ID) AS MAX_ID
FROM SEEPLAI.ISSUE_STEP
GROUP BY
ISU_ID
) SQ
ON SQ.ISU_ID = ISU.ISU_ID
INNER JOIN
SEEPLAI.ISSUE_STEP S1
ON S1.STP_ID = SQ.MIN_ID
INNER JOIN
SEEPLAI.ISSUE_STEP S2
ON S2.STP_ID = SQ.MAX_ID
Note: you really shouldn't be using a select * in a view. It is much better practice to list out all the fields that you need in the view explicitly
Have you considered using window functions?
http://www.postgresql.org/docs/9.2/static/tutorial-window.html
http://www.postgresql.org/docs/9.2/static/functions-window.html
A starting point:
select steps.*,
first_value(steps.stp_id) over w as first_id,
last_value(steps.stp_id) over w as last_id
from issue_step steps
window w as (partition by steps.isu_id order by steps.stp_id
rows between unbounded preceding and unbounded following)
Btw, if you know the IDs in advance, you'll be much better off getting details in a separate query. (Trying to fetch everything in one go will just yield sucky plans due to subqueries or joins on aggregates, which will result in inefficiently considering/joining the entire tables together.)
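To take the window-function starting point all the way to one row per issue, here is a rough sketch run against SQLite through Python's sqlite3 module as a stand-in for PostgreSQL. The tables are trimmed to a few columns and the sample rows are invented. The explicit frame matters: with the default frame, last_value only looks as far as the current row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE issue (isu_id INTEGER PRIMARY KEY, subject TEXT);
CREATE TABLE issue_step (stp_id INTEGER PRIMARY KEY,
    isu_id INTEGER NOT NULL REFERENCES issue, status TEXT, author TEXT);
INSERT INTO issue VALUES (1, 'login fails'), (2, 'typo');
-- issue 1 has three steps, issue 2 has a single step
INSERT INTO issue_step VALUES
    (1, 1, 'open', 'ann'), (2, 2, 'open', 'bob'),
    (3, 1, 'review', 'cy'), (4, 1, 'closed', 'dee');
""")

# author comes from the first step, status from the last step of each issue;
# DISTINCT collapses the per-step rows into one row per issue.
rows = conn.execute("""
SELECT DISTINCT isu.isu_id, isu.subject,
       first_value(s.author) OVER w AS author,
       last_value(s.status)  OVER w AS status
FROM issue isu
JOIN issue_step s ON s.isu_id = isu.isu_id
WINDOW w AS (PARTITION BY s.isu_id ORDER BY s.stp_id
             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER BY isu.isu_id
""").fetchall()

for row in rows:
    print(row)
```

Whether this beats the MIN/MAX join on a large table is something to check with EXPLAIN on your own data; the sketch only shows that the result is the same shape as the view.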