SQL Server issue dealing with a huge volume of data - sql-server-2005

I have a requirement like this: I need to delete all customers who have not done a transaction for the past 800 days.
I have a table Customer, where CustomerID is the primary key.
The CreditCard table has columns CustomerID and CreditcardID, where CreditcardID is the primary key.
The Transaction table has columns transactiondatetime, CreditcardID, and CreditcardTransactionID, which is the primary key in this table.
All the Transaction table data is exposed through a view called CreditcardTransaction, so I am using the view to get the information.
I have written a query to get the credit cards that have done transactions in the past 800 days, get their CreditcardID, and store it in a table.
As the CreditcardTransaction view holds around 60 million rows, the query I have written fails: it logs a "log file is full" message and throws a system out-of-memory exception.
INSERT INTO Tempcard
SELECT CreditcardID, transactiondatetime
FROM CreditcardTransaction
WHERE DATEDIFF(DAY, CreditcardTransaction.transactiondatetime, GETDATE()) > 600
I need to get each CreditcardID together with its last transactiondatetime.
I need to show the data in an Excel sheet, so I am dumping the data into a table and then inserting it into Excel.
What is the best solution I should go with here?
I am using an SSIS package (VS 2008 R2) in which I call a stored procedure to dump the data into a table, apply some business logic, and finally insert the data into an Excel sheet.
Thanks
prince

One thought: using a function in a WHERE clause can slow things down considerably. Consider adding a column named IdleTransactionDays. This lets you call the DATEDIFF function in the SELECT clause instead. Later, you can query the Tempcard table to return the records with IdleTransactionDays greater than 600 - similar to this:
declare @DMinus600 datetime
set @DMinus600 = dateadd(day, -600, getdate())

INSERT INTO Tempcard
(CreditcardID, transactiondatetime, IdleTransactionDays)
SELECT CreditcardID, transactiondatetime, DATEDIFF(DAY, transactiondatetime, GETDATE())
FROM CreditcardTransaction

Select * From Tempcard
Where IdleTransactionDays > 600
-- or, using the precomputed cutoff: Where transactiondatetime < @DMinus600
Hope this helps,
Andy
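Since the question also asks for each CreditcardID with its last transaction date, that can be pulled from the staged table with a GROUP BY. A sketch, assuming Tempcard has been loaded with all rows from the view:

```sql
-- last transaction per card, keeping only cards idle for more than 600 days
SELECT CreditcardID, MAX(transactiondatetime) AS LastTransactionDate
FROM Tempcard
GROUP BY CreditcardID
HAVING DATEDIFF(DAY, MAX(transactiondatetime), GETDATE()) > 600
```

The HAVING on the MAX is what distinguishes "cards whose latest transaction is old" from "cards that have some old transactions".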

Currently you're inserting those records row by row. You could create an SSIS package that reads your data with an OLE DB Source component, performs the necessary operations, and bulk inserts the rows (a minimally logged operation) into your destination table.
You could also output your rows directly into an Excel file; writing rows to an intermediate table decreases performance.
If your source query still times out, check whether suitable indexes exist and whether they are heavily fragmented.
You could also partition your source data by year (based on transactiondatetime). That way the data is loaded in bursts.
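A minimal sketch of the load-by-year idea; the date boundaries here are hypothetical and would be adjusted to the actual data range:

```sql
-- load one year at a time to keep each transaction (and the log) small
INSERT INTO Tempcard (CreditcardID, transactiondatetime)
SELECT CreditcardID, transactiondatetime
FROM CreditcardTransaction
WHERE transactiondatetime >= '20100101'
  AND transactiondatetime <  '20110101'
-- repeat with the next year's boundaries
```

A range predicate like this is also sargable, so an index on transactiondatetime can be used, unlike the DATEDIFF form.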

Related

How can we get the complete records of a column from a Netezza DB table? By default I am getting only 1000 records

How can we get the complete records of a column from a Netezza DB table? By default I am getting only 1000 records.
I am not using any LIMIT keyword to tailor the output.
For instance:
SELECT Account_number from Customers
In the output grid I get only 1000 records, and after downloading the output the record count remains the same.
I want all the records; I have checked that the Account_Number column contains more than 20k records.
To work around this issue, create a temporary table and dump your query output into it.
Once the table is created, go to
Query > Current Query; the Query Options dialog will appear.
In it, check the option 'keep connection open between executions'.
You have to do this every time you run the SQL query. It is a bit irritating, but you have to manage. 😊
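The temp-table step above might look like this. A sketch only: the table and column names come from the question, and Netezza's CREATE TEMPORARY TABLE ... AS syntax is assumed; tmp_accounts is a hypothetical name:

```sql
-- dump the full result into a session-scoped temp table
CREATE TEMPORARY TABLE tmp_accounts AS
SELECT Account_number FROM Customers;

-- then read everything back from the temp table
SELECT * FROM tmp_accounts;
```

The temp table only survives as long as the session, which is why the 'keep connection open between executions' option matters.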

Adding Columns - SQL Server Tables

I have been asked to look into a manual process that one of my colleagues is completing every now and again.
He sometimes needs to add a new column to a large table (200 million rows), and it is taking him more than an hour to do this. Before you ask: yes, the columns are nullable, but sometimes the new column will have data in 90% of the rows.
Instead of adding a new column to the existing table, he...
Creates a new table
Selects * from the old table (inserting into the new one)
Adds the new column as part of his script
Then he deletes the old table, renames the new table back to the original name, adds the index, and then compresses. He says it is much quicker that way.
If this is the best way then I will try to write an SSIS package to make the process more seamless.
Any advice is welcome!
Thanks
Creating a new table structure, moving all the data to that table, and deleting the old table is a fine approach for a small amount of data - you can do it with the wizard in SQL Server - but it is the worst way to solve this problem for millions of rows.
For a large amount of data (millions of records) you should use ALTER TABLE:
Alter Table MyTable
ADD NewColumn nvarchar(10) null
The new column is added as the last column of the table.
This script takes less than a second, because no data is moved; adding a nullable column only changes the table's metadata.
The wizard method you mentioned, by contrast, can take hours with millions of records.
As Ali says:
alter Table MyTable
ADD NewColumn nvarchar(10) null
but then you still have to fill in the 90% of the data. Since he already has a table holding the new values, together with the key he joins on in the copy, this is all he needs:
UPDATE a
SET a.[NewColumn] = b.[NewColumn]
FROM MyTable a INNER JOIN NewColumnTable b ON a.[KeyField] = b.[KeyField]
That would be a lot quicker. You could do it in SSIS, but unless this happens very often it is not really worth it for a few lines of SQL.
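On a 200-million-row table, a single UPDATE touching 90% of the rows can bloat the transaction log. Batching is a common workaround; a sketch, assuming rows still to be filled have NewColumn IS NULL and the source values are non-NULL:

```sql
-- update in batches of 100,000 rows to keep each transaction small
WHILE 1 = 1
BEGIN
    UPDATE TOP (100000) a
    SET a.[NewColumn] = b.[NewColumn]
    FROM MyTable a
    INNER JOIN NewColumnTable b ON a.[KeyField] = b.[KeyField]
    WHERE a.[NewColumn] IS NULL;   -- only rows not yet filled

    IF @@ROWCOUNT = 0 BREAK;       -- nothing left to update
END
```

Each batch commits on its own, so the log can be reused (or backed up) between batches instead of growing to hold the whole update.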

How to add dates to database records for trending analysis

I have a SQL Server database table that contains a few thousand records. These records are populated by PowerShell scripts on a weekly basis. The scripts overwrite last week's data, so the table only holds information for the previous week. I would like to take a copy of that table's data each week and add a date column holding that day's date beside each record. I need this so I can do trend analysis in the future.
Unfortunately, I don't have access to the PowerShell scripts to edit them. Is there any way I can accomplish this using MS SQL Server, or some other way?
You can do the following. Create a table that will contain the clone plus a date column. Insert the results from your original table, along with the date, into the clone table. From your description you don't need a WHERE clause, because the original table is wiped each week and only holds the new data. After the initial table creation there is no need to do it again; you simply repeat the insert. Obviously the below is very basic and is just to provide you the framework.
CREATE TABLE yourTableClone
(
col1 int,
col2 varchar(5), ...
col5 date
)
insert into yourTableClone
select *, getdate()
from yourOriginalTable
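Once a few weekly snapshots have accumulated, a trend query over the clone might look like this (a sketch; col5 is the snapshot-date column from the framework above):

```sql
-- rows captured per weekly snapshot
SELECT col5 AS snapshot_date, COUNT(*) AS row_count
FROM yourTableClone
GROUP BY col5
ORDER BY col5;
```

Any aggregate of interest (sums, averages of the copied columns) can replace COUNT(*) once the real column names are in place.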

SQL Server 2008: partition table based on insert date

My question is about table partitioning in SQL Server 2008.
I have a program that loads data into a table every 10 mins or so. Approx 40 million rows per day.
The data is bcp'ed into the table and needs to be able to be loaded very quickly.
I would like to partition this table based on the date the data is inserted into the table. Each partition would contain the data loaded in one particular day.
The table should hold the last 50 days of data, so every night I need to drop any partitions older than 50 days.
I would like to have a process that aggregates data loaded into the current partition every hour into some aggregation tables. The summary will only ever run on the latest partition (since all other partitions will already be summarised) so it is important it is partitioned on insert_date.
Generally when querying the data, the insert date is specified (or multiple insert dates). The detailed data is queried by drilling down from the summarised data and as this is summarised based on insert date, the insert date is always specified when querying the detailed data in the partitioned table.
Can I create a column "insert_date" in the table with a default of GETDATE() and then partition on it somehow?
OR
I can create a column "insert_date" in the table and insert a hard-coded value of today's date.
What would the partition function look like?
Would separate tables and a partitioned view be better suited?
I have tried both, and even though I think partitioned tables are cooler, after trying to teach others how to maintain the code afterwards it just wasn't justified. In that scenario we used a hard-coded date field that was set in the insert statement.
Now I use separate tables (31 days / 31 tables) plus an aggregation table, and there is an ugly UNION ALL query that joins together the monthly data.
Advantage: super simple SQL, simple C# code for bcp, and nobody has complained about complexity.
But if you have the infrastructure and a gaggle of .NET / SQL gurus, I would choose the partitioning strategy.
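To sketch what the partition function from the question could look like: all names below are hypothetical, one boundary would exist per day, and the nightly job would SPLIT in a new boundary and SWITCH/MERGE out the oldest one to keep 50 days:

```sql
-- one boundary per day; RANGE RIGHT puts each boundary date at the start of its partition
CREATE PARTITION FUNCTION pf_insert_date (datetime)
AS RANGE RIGHT FOR VALUES ('20120101', '20120102', '20120103' /* ...one per day... */);

CREATE PARTITION SCHEME ps_insert_date
AS PARTITION pf_insert_date ALL TO ([PRIMARY]);

CREATE TABLE LoadTable
(
    -- ...the other columns of the load table...
    insert_date datetime NOT NULL
        CONSTRAINT DF_LoadTable_insert_date DEFAULT (GETDATE())
) ON ps_insert_date (insert_date);
```

The DEFAULT answers the first part of the question: bcp'ed rows that omit insert_date land in the current day's partition automatically, with no hard-coded value needed.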

Copy data between tables in different databases without PKs (like synchronizing)

I have a table (A) in a database that doesn't have PKs; it has about 300k records.
I have a subset copy (B) of that table in another database; it has only 50k records and contains a backup for a given time range (July data).
I want to copy the missing records from table B into table A without duplicating existing records, of course. (I can create a database link to make things easier.)
What strategy can I follow to successfully insert the missing rows from B into A?
These are the table columns:
IDLETIME NUMBER
ACTIVITY NUMBER
ROLE NUMBER
DURATION NUMBER
FINISHDATE DATE
USERID NUMBER
.. 40 extra varchar columns here ...
My biggest concern is the lack of a PK. Can I create something like a hash, or a PK using all the columns?
What could be a possible way to proceed in this case?
I'm using Oracle 9i for table A and Oracle XE (10g) for B.
The approximate number of rows to copy is 20,000.
Thanks in advance.
If the data volumes are small enough, I'd go with the following:
CREATE DATABASE LINK A CONNECT TO ... IDENTIFIED BY ... USING ....;
INSERT INTO COPY
SELECT * FROM table@A
MINUS
SELECT * FROM COPY;
You say there are about 20,000 rows to copy, but not how many are in the entire dataset.
The other option is to delete the current contents of the copy and insert the entire contents of the original table.
If the full datasets are large, you could go with a hash, but I suspect it would still drag the entire dataset across the DB link to apply the hash in the local database.
As long as no duplicate rows should exist in the table, you could apply a unique or primary key across all columns. If the overhead of maintaining such a key/index would be too much, you could instead query the database from your application to see whether the row exists, and only perform the insert if it is absent.
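Applied to the question's direction (copying from B into A), the same MINUS idea could be run from the XE database with a link pointing at the 9i database; the link name here is hypothetical:

```sql
-- rows present in the local backup B but missing from remote A
INSERT INTO A@link_to_a
SELECT * FROM B
MINUS
SELECT * FROM A@link_to_a;
```

Note that MINUS compares rows column by column, including NULLs, so it behaves as a whole-row key even though the tables have none; all 45-odd columns of A must still travel across the link for the comparison.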