How to fetch the changed data in a database? [closed]

I need to fetch the local names from table A that have been changed in the past 30 days, using SQL.
Do I need to create a backup of the table, or is there some other method?
And if creating a backup is the only method, how do we compare the two and find the locally overridden names?
Table Details:
TREE_ID (NUMBER)
TREE_NM (VARCHAR2)
TREE_LEVEL (VARCHAR2)
UPLEVEL_ID (NUMBER)
HRCHY_TYPE (VARCHAR2)
CATG_ID (NUMBER)
SUBCATG_ID (NUMBER)
STATUS (VARCHAR2)
USER_ID (NUMBER)
CREATE_DATE (DATE)
EFFCT_START_DATE (DATE)
EFFCT_END_DATE (DATE)
UPDATED_DATE (DATE)
TOP_LEVEL_ID (NUMBER)
I need to generate a feed at the end of every month listing the changed TREE_NM values.

As far as I know, there is no built-in operation in Oracle to do that. A possible workaround is to add a new column to your table A that stores the modification date, then define a BEFORE INSERT OR UPDATE trigger that writes the current date to every row that is inserted or updated.
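A minimal sketch of that idea (the names table_a and modification_date are placeholders, not from the question):
CREATE OR REPLACE TRIGGER trg_table_a_mod_date
BEFORE INSERT OR UPDATE ON table_a
FOR EACH ROW
BEGIN
  -- stamp every inserted or updated row with the current date/time
  :NEW.modification_date := SYSDATE;
END;
/
-- Monthly feed: names changed in the past 30 days
SELECT tree_id, tree_nm
FROM table_a
WHERE modification_date >= SYSDATE - 30;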
Hope this helps.

If you can't modify the table, this can't be done unless you can modify the apps that modify the table. If you can do the latter, make a second table with:
TreeID NUMBER (foreign key)
LastModifiedDate datetime
And write to this table every time the first table is modified. Then, you can join the two tables together on
TableA.TreeID = Table2.TreeID
WHERE Table2.LastModifiedDate >= DATEADD(d, -30, getdate())
And that will return all records that were modified in the last 30 days.
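Spelled out, that looks something like this (the table names are illustrative, and the date functions are SQL Server's, so adapt them to your platform):
-- Audit table, written by the application on every change to TableA
CREATE TABLE Table2 (
    TreeID           int      NOT NULL,  -- foreign key to TableA.TreeID
    LastModifiedDate datetime NOT NULL
);
-- All records modified in the last 30 days
SELECT a.*
FROM TableA a
INNER JOIN Table2 t2 ON a.TreeID = t2.TreeID
WHERE t2.LastModifiedDate >= DATEADD(d, -30, GETDATE());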
If you can't modify the database OR the apps, then this is impossible with your current structure, so hopefully you have the ability to make some changes.
EDIT:
If historical changes are something that you will need to track for other purposes in the future, you should look into implementing a data warehouse (specifically, look into slowly changing dimensions).
Second Edit:
I would seriously question why you're not allowed to add a field to this table. In SQL Server, you can add fields to tables without impacting the data or applications that access it. If I were you, I would push pretty hard to add the field to the table instead of creating a more complex and obfuscated database/application structure for no apparent reason.

Facts and dimensions: dynamic dimensions [closed]

(I appreciate this post is perhaps too high level or philosophical for SO; I'm in the schema-planning phase and seeking some guidance.)
After some difficulty working with a clone of our production database for analytics, I am attempting to define an events fact table, along with some dimension tables, to make analytics work simpler.
The block I've hit in my planning is this: we have different categories of event, with different dimensions needed to describe them. E.g. suppose we have an 'Account Settings' event category as well as 'Gallery' events.
In a fact table I might have a field eventCategory and eventName with example values from above such as:
'EventCategory': 'Account Settings'
'EventName': 'Update Card Billing Details'
Or:
'EventCategory': 'Gallery'
'EventName': 'Create New Gallery'
In each case I want to use a different collection of dimensions to describe them. E.g. for Gallery events we want to know 'template', 'count of images', 'gallery category e.g. fruits'. We have no need for these details with account settings events, which have their own distinct set of dimensions to describe them.
Via the textbook examples I find online, I would have a dimensions table for Gallery events and a dimensions table for Account Settings events.
The mental block I have is that these dimensions are dynamic not static. I want to record in the fact table the value of these dimensions at the time of the event not 'now'. For example, a user can either be in trial or a paid user. If I had a dimension table 'user' their status might currently be 'paid' but at the time of some previous gallery event they may have been in trial.
What is the 'right' way to handle this:
1. Multiple fact tables, one for Gallery events and one for Account Settings events?
2. Use json in a new field in the main fact table, e.g. 'EventDetail', which contains what would otherwise go in a dimension table; by using json we capture the values of the dimensions at the time of the event, as opposed to whatever those values are now?
3. A sparse fact table: include fields for every dimension across all categories, and leave them null where not applicable.
Given that the dimensions I use to describe an event are dynamic, what is the 'right' way to construct a fact table for analytics? The way I see it just now, the dimension tables would have to be facts themselves to capture the changing values of these attributes over time.
Adding a dimension to any SQL table is always done the same way: by adding a column.
In any kind of history, there is no "now". Every status has a time period: a beginning and ending. I usually name those columns AsOf and Until, because begin/end show up a lot as SQL keywords, making the column names harder to scan for. Usually, only AsOf is needed, because you can self-join the table to find succeeding periods, and use NULL to represent 'now' (where "now" means, as of the time the query is executed).
'user' their status might currently be 'paid' but at the time of some previous gallery event they may have been in trial.
Right, so the user's status isn't just paid/trial. It's paid or trial starting AsOf some date, until a later AsOf date for the same user.
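For concreteness, a minimal sketch of that pattern (the user_status table and its columns are hypothetical):
-- One row per status period; Until IS NULL marks the current period
CREATE TABLE user_status (
    user_id integer   NOT NULL,
    status  varchar   NOT NULL,  -- e.g. 'trial' or 'paid'
    AsOf    timestamp NOT NULL,  -- period start
    Until   timestamp NULL       -- NULL = still in effect "now"
);
-- The user's status at the time of a given event
SELECT status
FROM user_status
WHERE user_id = 42
  AND AsOf <= :event_time
  AND (Until IS NULL OR :event_time < Until);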
It's hard to be more helpful. There's a bit of jargon in your question, and it's couched in domain-specific terms. I hope by attaching a date/time to every status, you can see your way out of the forest.
(A) Managing temporal data in postgres
Temporal data is a fairly common need in many kinds of business applications, but it is not a built-in feature in Postgres, nor in many other RDBMS.
As stated by @James K. Lowden, you can use AsOf and Until columns of type timestamp (with or without time zone), or you can instead use a single column of type tsrange or tstzrange, i.e. a range of timestamps, which offers some nice built-in functions; see the manual.
To avoid overlaps between the timestamp ranges associated with different events for the same data, you can implement the business logic with trigger functions.
For instance, for the same user, you can implement the following trigger function so that the range r1 associated with the status 'in trial' and the range r2 associated with the status 'paid' are set up automatically when the corresponding rows are inserted into the user table, with the ranges of the existing rows for the same user updated accordingly:
CREATE OR REPLACE FUNCTION before_insert_user ()
RETURNS trigger LANGUAGE plpgsql AS
$$
BEGIN
  -- close all the existing rows (ie statuses) for the same user_id
  -- whose valid_range is still open as of now
  UPDATE "user"
  SET    valid_range = tstzrange(lower(valid_range), now())
  WHERE  user_id = NEW.user_id
  AND    valid_range @> now() ;   -- @> is the range containment operator
  -- set up the valid_range for the new row (ie the new status)
  NEW.valid_range := tstzrange(now(), NULL) ;
  RETURN NEW ;                    -- a BEFORE ROW trigger must return NEW
END ;
$$ ;
CREATE OR REPLACE TRIGGER before_insert_user BEFORE INSERT ON "user"
FOR EACH ROW EXECUTE FUNCTION before_insert_user () ;
(B) Managing different dimensions for different categories
As already discussed, json can be a solution to store various dimensions in the same column.
Another solution could be table inheritance, which has some interesting functionality:
CREATE TABLE Event
( EventCategory varchar
, EventName varchar
, ValidityRange tstzrange
, primary key (EventCategory, EventName, ValidityRange)
) ;
CREATE TABLE "user"      -- "user" must be quoted: it is a reserved word
( status varchar
) INHERITS (Event) ;     -- INHERITS takes a parenthesized list
CREATE TABLE Gallery
( template varchar
, "count of images" integer
, "gallery category e.g. fruits" varchar
) INHERITS (Event) ;
A fact table needs to have its grain defined; facts that don't match that grain can't be stored in that fact table. So if you have facts with different sets of dimensions, you need different fact tables.
Regarding the values in a dimension changing over time, you need to read up on Slowly Changing Dimensions.

Cross Checking a SQL server report [closed]

I have a report that runs daily. I want to send the output of this report to a CSV file. Due to the nature of the report, some data can occasionally be lost: new data is generated while the job is executing, and because it is a lengthy job, some of it misses the export.
Is there a way to cross-check on a daily basis that no data from the previous day has been lost? Perhaps with a tick or cross at the end of each row to show whether the data has been exported to the CSV?
I am working with sensitive information, so I can't share any of the report details.
This is a fairly common question. Without specifics, it's very hard to give you a concrete answer - but here are a few solutions I've used in the past.
Typically, such reports have "grand total" lines: your widget report might be broken down by month, region, salesperson, product type, etc., but you usually have a "total widgets sold" line. If that's a quick query (you may need to remove joins and other refinements), then running it after you've generated the report data lets you compare its result with the grand total at the end of the report. If the two differ, you know the data changed while the report was running.
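A sketch of that check, with hypothetical table and column names:
-- Cheap grand-total query, re-run after the report completes
SELECT COUNT(*) AS total_widgets_sold
FROM widget_sales
WHERE sale_date = CAST(DATEADD(d, -1, GETDATE()) AS date);
-- Compare this value with the grand-total line in the generated CSV;
-- a difference means rows changed while the report ran.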
Another option - SQL Server specific - is to use a checksum over the data you're reporting on. If the checksum changes between the start and the end of the reporting run, you know you've had data changes.
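For example, something like this (again with hypothetical names):
-- Aggregate checksum over the rows being reported on
SELECT CHECKSUM_AGG(BINARY_CHECKSUM(*)) AS data_checksum
FROM widget_sales
WHERE sale_date = CAST(DATEADD(d, -1, GETDATE()) AS date);
-- Run once before and once after generating the report;
-- a different value means the underlying data changed mid-run.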
Finally - and most dramatically - if the report's accuracy is critical, you can store the fact that a particular row was included in a reporting run. This makes your report much more complex, but it allows you to be clear that you've included all the data you need. For instance:
insert into reporting_history
select @reportID, widget_sales_id
from widget_sales
--- reporting logic here
select widgets.cost,
       widget_sales.date,
       widget_sales.price,
       widget_sales......
from widgets
inner join widget_sales on ...
inner join reporting_history on reporting_history.widget_sales_id = widget_sales.widget_sales_id
---- all your other logic

SQL Server/Table Design, table for data snapshots where hundreds of columns possible [closed]

We have a business process that requires taking a "snapshot" of portions of a client's data at a point in time, and being able to regurgitate it later. The data set has some oddities though that make the problem interesting:
The data is pulled from several databases, some of which are not ours.
The list of fields that could possibly be pulled is somewhere between 150 and 200.
The list of fields that are typically pulled is somewhere between 10 and 20.
Each client can pull a custom set of fields for storage; this set is pre-determined ahead of time.
For example (and I have vastly oversimplified these):
Client A decides on Fridays to take a snapshot of customer addresses (1 record per customer address).
Client B decides on alternate Tuesdays to take a snapshot of summary invoice information (1 record per type of invoice).
Client C monthly summarizes hours worked by each department (1 record per department).
When each of these periods happen, a process goes out and fetches the appropriate information for each of these clients... and does something with them.
Sounds like an historical reporting system, right? It kind of is. The data is later parsed up and regurgitated in a variety of formats (xml, csv, excel, text files, etc.) depending on the client's needs.
I get to rewrite this.
Since we don't own all of the databases, I can't just keep references to the data around. Some of that data is overwritten periodically anyway. I actually need to find the appropriate data and set it aside.
I'm hoping someone has a clever way of approaching the table design for such a beast. The methods that come to mind, all with their own drawbacks:
1. A dataset table (data set id, date captured, etc...); a data table (data set id, row number, "data as a blob of crap").
2. A dataset table (data set id, date captured, etc...); a data table (data set id, row number, possible field 1, possible field 2, possible field 3, ..., possible field x (where x > 150)).
3. A dataset table (data set id, date captured, etc...); a field table (1 row per possible field type); a selected-field table (1 row for each field the client has selected); one table for each possible primitive data type (varchar, decimal, integer), keyed on selected field, data set id, row, and position, where the data is the single field value.
The first is the easiest to implement, but the "blob of crap" would have to be engineered to be parseable so it can be broken down into reportable fields. It's not very database friendly either, not reportable, etc. Doesn't feel right.
The second is a horror show of columns. shudder
The third sounds right, but kind of doesn't. It's 3NF (yes, I'm old), so it feels right that way. However, reporting on the table screams of "rows that should have been columns" problems; it's fairly useless to select from outside of a program.
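For concreteness, here's a rough sketch of the third option (all names hypothetical):
-- One narrow value table per primitive type
CREATE TABLE dataset (
    dataset_id    int IDENTITY PRIMARY KEY,
    date_captured datetime NOT NULL
);
CREATE TABLE field (
    field_id   int PRIMARY KEY,
    field_name varchar(100) NOT NULL          -- 1 row per possible field
);
CREATE TABLE selected_field (
    client_id int NOT NULL,
    field_id  int NOT NULL REFERENCES field(field_id),
    PRIMARY KEY (client_id, field_id)         -- the fields this client has chosen
);
CREATE TABLE data_varchar (
    dataset_id int NOT NULL REFERENCES dataset(dataset_id),
    field_id   int NOT NULL REFERENCES field(field_id),
    row_num    int NOT NULL,
    val        varchar(4000) NOT NULL,        -- the single field value
    PRIMARY KEY (dataset_id, field_id, row_num)
);
-- ...plus data_decimal and data_int with the same shape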
What are your thoughts?
RE: "where hundreds of columns possible"
The limitations are 1000 columns per table
http://msdn.microsoft.com/en-us/library/ms143432.aspx

Add a Record To Last Of The Table [closed]

When I use the query below to add a new record to the table, the new record is added at the start of the table.
I want new records to be added at the end of the table.
This is my code:
begin
    insert into TBLCrowler (Url, Title, ParentId, HasData)
    values (@Url, @Title, @ParentId, @HasData)
end

select top 1 CatId, Title, ParentId, Url, CrawlerCheck
from TBLCrowler
where CatId = (select min(CatId) from TBLCrowler where CrawlerCheck = 1)

update TBLCrowler
set CrawlerCheck = 2, HasData = 2
where CatId = (select min(CatId) from TBLCrowler where CrawlerCheck = 1)
Okay, once more into the breach.
You are using a relational database. It's not a worksheet, and it's not a rectangular array of cells in a Word document. Its power relies on being able to store and retrieve records in the most efficient way possible.
Ordering is either implied through an index, which the DBMS is free to ignore and which could change anyway, or explicitly requested through an ORDER BY clause.
If you want things ordered by the time they were added to the table, you add a created_at column and populate it at the time you perform the insert.
Then when you select from it you add Order By Created_At to your select statement.
If you want that ordering to be "fast", you add an index on the Created_At column; the DBMS will then make a brave attempt at using the index to avoid the cost of a full sort.
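Putting those pieces together, a minimal sketch (the Created_At name is just an example):
-- add an insert timestamp, index it, and order by it
alter table TBLCrowler add Created_At datetime not null default getdate();
create index IX_TBLCrowler_CreatedAt on TBLCrowler (Created_At);
select top 1 CatId, Title, ParentId, Url, CrawlerCheck
from TBLCrowler
where CrawlerCheck = 1
order by Created_At;  -- oldest unprocessed row first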
Step back and think for a minute about how you would write a DBMS. What would the cost of your implicit orderedness be? Any change to any but the "last" record in the "last" table would mean rewriting to disk every record in every table "after" it. Insert is worse, delete is just as bad, and that's without considering that different records take up different amounts of space.
So throw first and last in the bin, if you can find them...

Merging SQL views [closed]

I am working for the IT Department of a college as a student worker and have very limited knowledge in using SQL (I am not majoring in Computer Science/Engineering). I'll try my best to describe what I want to accomplish here.
I want to create a table that includes info about new students, basically: id, first name, last name, need exams (Y/N), Course101 section, instructor.
My problem is that exchange and transfer students, and also some first-year students, would not have to / did not sign up for Course101, so using WHERE studnt_course = 'Course101%' will leave those students out. I would like to pick up those students in my view and display their Course101 section and instructor values as NULL.
I am thinking about making two views, one for all new students, and one for students with Course101 only, and do some kind of merging/union but not sure how to actually do that.
Any help will be greatly appreciated!
It's still a bit vague what the current tables actually look like which makes it hard to give a good suggestion.
Based on what you've given us, I'd suggest looking into a LEFT OUTER JOIN, which puts NULL where the second table has no matching row.
If you are interested in learning more (rather than just solving this particular problem), I'd suggest reading up on proper database design.
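A rough sketch of the LEFT OUTER JOIN approach (the table and column names are guesses, since the real schema wasn't shown):
SELECT s.id, s.first_name, s.last_name, s.need_exams,
       c.section, c.instructor
FROM new_students s
LEFT OUTER JOIN course_enrollments c
  ON c.student_id = s.id
 AND c.course_code LIKE 'Course101%';
-- Students with no Course101 row still appear, with section and instructor as NULL.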
Create two queries: one for the Course101 students and one for the overseas/transfer students (you will need to work out what the WHERE clause is for that).
Get each one working to your satisfaction.
If everything is the same apart from the WHERE clause, then grab the conditions from each, wrap them in brackets, put an OR in between, and combine that into one query.
So something like:
SELECT Name, Id, ShoeSize
FROM Students
WHERE (studnt_course LIKE 'Course101%') OR (transferred = 1 OR is_exchange = 1)
i.e. all students WHERE <this> OR <that> is true. (Note LIKE rather than =: the % wildcard only works with LIKE.)
Otherwise (e.g. if you're using different tables):
Make sure that you are selecting the same column names in your SELECT statements.
If one column has the same info but is called something different, you can write:
<columnName> AS <NameYouWantToCallIt>
Make sure the column names are in the same order.
Then put the word UNION in between the two queries.
This will combine the results from both queries (and remove duplicates).
SELECT Name, Id, ShoeSize
FROM Students
WHERE studnt_course LIKE 'Course101%'
UNION
SELECT Name, Id, FootSize AS ShoeSize
FROM exchangeStudents
WHERE transferred = 1 OR is_exchange = 1