SSIS Import from Multiple Data Sources

I have a little bit of a challenge. I have to consolidate some data that is coming from three different databases (Oracle, SQL Server, and Teradata).
How can I retrieve the data from Teradata and SQL Server based on the results retrieved from Oracle?
For instance, Oracle has the sales information, Teradata has the client information, and SQL Server has the employee information.
I pull the list of sales from Oracle, which includes a list of client IDs, and want to limit the Teradata pull to those client IDs.
The clients then have an employee identifier that ties to SQL Server.
I can connect to each source individually, but would like to limit the pulls from each.
Oracle returns about 3,000 rows, while Teradata by itself returns 400,000 rows. The Oracle-to-Teradata relationship is many-to-one (many Oracle records to one Teradata record).
I have tried using the data source merge option, but it runs each data source individually and then merges them, which drastically increases processing time because of the number of records in Teradata.
Your assistance is appreciated. Thanks.

You could pass over some SQL with an enormous IN string if you thought it would reduce the record count: SELECT Sales.* FROM Teradata.Sales WHERE ClientID IN (...). You'd need to pre-generate a static SQL string from the Oracle results before running it against Teradata. You might run into SQL length limits if the list is large.
Do you have a SQL statement that retrieves unique client IDs from Oracle?
SELECT DISTINCT ClientID FROM SCHEMA.SALES
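A minimal sketch of that approach, assuming the distinct client IDs have already been fetched from Oracle into a list (the table and column names are taken from the thread; the in-memory sqlite3 database is just a stand-in for Teradata):

```python
import sqlite3

def build_in_clause_query(client_ids):
    """Build a static SQL string with an IN (...) list of client IDs.

    Numeric IDs are interpolated directly; for string IDs you would
    need to quote/escape them (or switch to parameter placeholders).
    """
    id_list = ", ".join(str(int(cid)) for cid in client_ids)
    return f"SELECT * FROM Sales WHERE ClientID IN ({id_list})"

# Stand-in for the Teradata side: a local table with a few sales rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (ClientID INTEGER, Amount REAL)")
conn.executemany("INSERT INTO Sales VALUES (?, ?)",
                 [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0)])

# Distinct client IDs as they would come back from the Oracle query.
oracle_client_ids = [1, 3]

query = build_in_clause_query(oracle_client_ids)
rows = conn.execute(query).fetchall()
print(query)   # SELECT * FROM Sales WHERE ClientID IN (1, 3)
print(rows)    # [(1, 10.0), (3, 30.0)]
```

With roughly 3,000 IDs from Oracle, the generated string stays modest, but it is worth checking the target engine's statement-length limit before relying on this.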

Related

In DBeaver, how to find the largest fact table's current number of rows and size with a SQL query

I'm using DBeaver and trying to write a SQL query to find the current largest fact table's number of rows and the size of the table.
How do I write such a query?
My table name is: transactionsale (it is a fact table).
regards,
Prabhu
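The question doesn't say which database engine DBeaver is connected to, and the catalog views differ per engine (sys.partitions on SQL Server, user_tables/dba_segments on Oracle, and so on). As a generic sketch of the row-count half, here is the pattern against sqlite3, iterating the catalog and counting rows per table:

```python
import sqlite3

def table_row_counts(conn):
    """Return (table_name, row_count) pairs, largest table first.

    Reads SQLite's catalog; on another engine you would query its
    catalog views instead (e.g. sys.partitions on SQL Server).
    """
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    counts = [(t, conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0])
              for t in tables]
    return sorted(counts, key=lambda tc: tc[1], reverse=True)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactionsale (id INTEGER)")
conn.execute("CREATE TABLE dim_client (id INTEGER)")
conn.executemany("INSERT INTO transactionsale VALUES (?)",
                 [(i,) for i in range(5)])
conn.execute("INSERT INTO dim_client VALUES (1)")

print(table_row_counts(conn))   # transactionsale comes out largest
```

Table size on disk is even more engine-specific (allocation units, segments), so the catalog documentation for the actual engine is the place to look for that half.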

SQL Server, Efficient way to query multiple tables for a specific day of data

We have started saving daily change data for a bunch of tables. We first load all of the original data, then the change data, with start and end dates on each table record (so not all records are saved each day, just changed records). To pull the data for a specific date, I have to look at the begin and end dates (the end date might be NULL) and take the MAX begin date to get the right records.
SELECT *
FROM dbo.DIM_EX_NAME_MASTER AS DEXNM
INNER JOIN
    (SELECT APPID, MAX(DW_RECORD_START) AS StartDate
     FROM dbo.DIM_EX_NAME_MASTER
     WHERE (DW_RECORD_START < '4/4/2020' AND DW_RECORD_END > '4/4/2020')
        OR (DW_RECORD_START < '4/4/2020' AND DW_RECORD_END IS NULL)
     GROUP BY APPID) AS INNER_DEXNM
    ON DEXNM.APPID = INNER_DEXNM.APPID
   AND DEXNM.DW_RECORD_START = INNER_DEXNM.StartDate
That's not so bad for one table, but we want to build a report with a query that pulls from 25 tables, with subqueries, where the user selects the date to pull for.
That's going to be some really messy SQL. Ideally I would like to create a view for each table and pass in the date as a parameter, but SQL Server doesn't allow parameterized views.
Does anyone have ideas on how to build multi-table, date-based queries without adding all of this extra SQL per table?
Thanks for any help you can give!
SQL server doesn't allow for parameterized views.
No, but SQL Server does support user-defined table-valued functions.
This is pretty much exactly what you are asking for -- they can accept a date parameter and return the results as a table.
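The as-of-date logic that such a function would wrap can be sketched as a date-parameterized lookup; here it runs against sqlite3 with the table and column names from the question (in SQL Server the same SELECT body would sit inside an inline table-valued function taking an @AsOfDate parameter). Note that the two OR branches in the original WHERE clause collapse to a single condition, and ISO date strings avoid the ambiguity of '4/4/2020'-style literals:

```python
import sqlite3

ASOF_SQL = """
SELECT m.*
FROM DIM_EX_NAME_MASTER AS m
JOIN (SELECT APPID, MAX(DW_RECORD_START) AS StartDate
      FROM DIM_EX_NAME_MASTER
      WHERE DW_RECORD_START < :asof
        AND (DW_RECORD_END > :asof OR DW_RECORD_END IS NULL)
      GROUP BY APPID) AS inner_m
  ON m.APPID = inner_m.APPID
 AND m.DW_RECORD_START = inner_m.StartDate
"""

def records_as_of(conn, asof):
    """Return the version of each APPID's record in effect on `asof`."""
    return conn.execute(ASOF_SQL, {"asof": asof}).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE DIM_EX_NAME_MASTER
                (APPID INTEGER, VAL TEXT,
                 DW_RECORD_START TEXT, DW_RECORD_END TEXT)""")
conn.executemany("INSERT INTO DIM_EX_NAME_MASTER VALUES (?, ?, ?, ?)", [
    (1, "old",  "2020-01-01", "2020-03-01"),
    (1, "new",  "2020-03-01", None),        # current version of APPID 1
    (2, "only", "2020-02-01", None),
])

print(records_as_of(conn, "2020-04-04"))
# APPID 1 resolves to its "new" version; APPID 2 to "only"
```

With a table-valued function per table, the 25-table report becomes a series of joins against `fn_TableName(@AsOfDate)` calls rather than 25 copies of the subquery.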

SQL Linked Server Join

I have 2 different SQL databases "IM" and "TR". These have different schemas.
IM has the table "BAL" which has 2 columns "Account" and "Balance".
TR has the table "POS" which has 2 columns "AccountId" and "Position".
Here the common link is BAL.Account=POS.AccountId.
The POS table has > 100k records. BAL has only a few records, as it contains only accounts which are new.
I want to run a select query on the IM database's BAL table as follows:
Database: IM
Select Account, Balance from BAL
However, here "Balance" should come from the TR database's POS.Position, matched on BAL.Account = POS.AccountId.
How can I achieve this in the fastest way, without slowing down the databases, considering that this query will be run by many users on a regular basis? Should I use OPENQUERY? I will use a WHERE clause to shorten the return time.
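The shape of the cross-database join being asked for can be illustrated with sqlite3's ATTACH, using the table and column names from the question (on SQL Server the analogue is four-part naming across the linked server, e.g. joining to TR.dbo.POS, or pushing the filter to the remote side with OPENQUERY):

```python
import sqlite3

# Two separate databases stand in for IM and TR.
im = sqlite3.connect(":memory:")
im.execute("CREATE TABLE BAL (Account INTEGER, Balance REAL)")
im.executemany("INSERT INTO BAL VALUES (?, ?)", [(1, 100.0), (2, 200.0)])

im.execute("ATTACH DATABASE ':memory:' AS TR")
im.execute("CREATE TABLE TR.POS (AccountId INTEGER, Position REAL)")
im.executemany("INSERT INTO TR.POS VALUES (?, ?)",
               [(1, 50.0), (2, 75.0), (3, 999.0)])

# Drive the join from the small BAL side so only matching accounts
# are pulled from the large POS table.
rows = im.execute("""
    SELECT b.Account, p.Position
    FROM BAL AS b
    JOIN TR.POS AS p ON b.Account = p.AccountId
""").fetchall()
print(sorted(rows))   # [(1, 50.0), (2, 75.0)]
```

With a real linked server, the performance question is whether the filter on the small BAL side gets pushed to the remote server; comparing the four-part-name plan against an OPENQUERY version that embeds the filter is the usual way to decide.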

Same query returning thousands of records in Access 2016 but only a hundred in SQL Server

I have an MS Access DB I'm working in and there's a linked table connected to a SQL Server DB. I'm creating a make table query and using the linked table as the connection. The purpose of the query is to pull in records with a termination date in the last 30 days.
My query in SSMS looks like this:
select name,title,u_term_date
from dbo.sys_user
where u_term_date >= DATEADD(day,-30, GETDATE())
and it returns 315 rows. This is the correct number of rows.
My query in Access looks like this:
SELECT DISTINCTROW dbo_sys_user.active, dbo_sys_user.name, dbo_sys_user.u_term_date INTO Terminations
FROM dbo_sys_user
WHERE (((dbo_sys_user.active)=False) AND ((dbo_sys_user.u_term_date)>=DateAdd("d",-30,Date())));
and it pulls in 16,314 rows. I cannot figure out why. I've tried turning on unique values and unique records. I've tried narrowing it down to 10 days. Same result. Any insights?

How to use SQL to ignore part of a table

I have a SQL orders table with 500,000 records.
The database has been in use for the past 5 years, and about 100,000 records are added each year.
The table has about 30 fields; one of them is "OrderDate".
The query needs only records from the last few months, at most the past 12 months,
so all the records before that are useless and slow down every query.
The query is slow, taking 3 to 4 seconds; the same query was almost immediate a few years ago.
I have to load and print all columns at once.
Can I make SQL ignore and not look through part of the records, say records with OrderDate before 2013, or the first 400,000 records, without deleting them?
As far as I can see you have two options:
Create a new table identical to the old one, insert the rows you want to ignore into it, then delete those same rows from the original table. This is a good solution when those rows are "useless" to every query (and, where that isn't viable, you can update the other queries that use those rows to read from the archive table).
Index the column.
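The second option can be sketched as follows; with an index on OrderDate, a range predicate seeks straight to the recent rows instead of scanning all 500,000 (sqlite3 here as a stand-in, but the idea is the same on any engine, and the query plan confirms the index is used):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (OrderID INTEGER, OrderDate TEXT)")
# 1,000 sample orders spread across the years 2010-2019.
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, f"{2010 + i % 10}-06-15") for i in range(1000)])

# The index lets range predicates on OrderDate seek instead of scan.
conn.execute("CREATE INDEX ix_orders_orderdate ON orders (OrderDate)")

recent = conn.execute(
    "SELECT * FROM orders WHERE OrderDate >= ?", ("2018-01-01",)).fetchall()

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE OrderDate >= ?",
    ("2018-01-01",)).fetchall()
print(plan)   # the plan names ix_orders_orderdate instead of a full scan
```

Since the question says every query selects all 30 columns, the index is used only to locate the qualifying rows; the engine still fetches each row from the table, so partitioning or archiving may win if the old data truly is never read.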
This is a classic use case for table partitioning, but we don't know which database engine you're using, so we don't know whether it supports it. Add a tag for your database (SQL Server? Oracle?).