I use an Oracle database. I have many tables with very large amounts of data (300-500 million records each).
My query joins many of these tables together. I have created indexes on the tables, but the report is still very slow.
Please help me with an approach for working with data this big.
Thanks.
Do you really need to have all the data at once?
Try creating a table that stores just the information you need for the report, and run a query once a day (or every few hours) to update that table. You can also use SQL Server Integration Services (SSIS), although I have not tried SSIS with Oracle myself.
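In Oracle itself, a materialized view can handle the "summary table refreshed on a schedule" part. A minimal sketch, assuming hypothetical orders/order_items tables and columns:

CREATE MATERIALIZED VIEW report_summary_mv
BUILD IMMEDIATE
REFRESH COMPLETE
START WITH SYSDATE NEXT SYSDATE + 1   -- rebuild once a day
AS
SELECT o.customer_id,
       TRUNC(o.order_date)              AS order_day,
       SUM(oi.quantity * oi.unit_price) AS revenue
FROM   orders o
JOIN   order_items oi ON oi.order_id = o.order_id
GROUP  BY o.customer_id, TRUNC(o.order_date);

The report then selects from report_summary_mv instead of joining the 300-500 million row tables at runtime.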
I agree with the other users, you really need to give more info on the problem.
Due to the way our database is stored, we have tables for each significant event that occurs within a products life:
Acquired
Sold
Delivered
I need to go through and find the status of a product at any given time. In order to do so I'd need to query all of the tables within the schema and find the most recent record for that product. I know this is possible by UNIONing all the tables and then taking the MAX timestamp, but I wonder if there's a more elegant solution?
Is it possible to query all tables by just querying the root schema or database? Is there a way to loop through all tables within the schema and substitute that into the FROM clause?
Any help is appreciated.
Thanks
You could write a Stored Procedure but, IMO, that would only be worth the effort (and more elegant) if the list of tables changed regularly.
If the list of tables is relatively fixed then creating a UNION statement is probably the most elegant solution and relatively trivial to create - if you plan to use it regularly then just create it as a View.
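For the three tables in the question, a sketch of that view plus a "status as of a point in time" query could look like the following (the product_id/event_timestamp columns and the :as_of parameter are assumptions; adjust to your schema and your database's parameter syntax):

CREATE VIEW product_events AS
SELECT product_id, 'Acquired'  AS status, event_timestamp FROM acquired
UNION ALL
SELECT product_id, 'Sold'      AS status, event_timestamp FROM sold
UNION ALL
SELECT product_id, 'Delivered' AS status, event_timestamp FROM delivered;

-- Latest status of each product at a given point in time
SELECT pe.product_id, pe.status, pe.event_timestamp
FROM   product_events pe
JOIN  (SELECT product_id, MAX(event_timestamp) AS max_ts
       FROM   product_events
       WHERE  event_timestamp <= :as_of
       GROUP  BY product_id) latest
  ON   latest.product_id = pe.product_id
 AND   latest.max_ts     = pe.event_timestamp;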
The way I always approach this type of problem (creating the same SQL for multiple tables) is to dump the list of tables into Excel, generate the SQL statement for the first table using functions, copy that formula down for all the table names, and then concatenate all the statements with a final function. You can then paste the resulting text back into your SQL editor.
I use BigQuery heavily, and there are now quite a number of intermediate tables. Because teammates can upload their own tables, I don't know all of the tables well.
I want to check whether a table has not been used for a long time, and then decide whether it can be deleted manually.
Does anyone know how to do this?
Many thanks
You could use logs if you have access. If you make yourself familiar with how to filter log entries, you can find out about your usage quite easily: https://cloud.google.com/logging/docs/quickstart-sdk#explore
There's also the possibility of exporting the logs to BigQuery, so you could analyze them using SQL - I guess that's even more convenient.
You can get table-specific metadata from the __TABLES__ meta-table of a dataset (standard SQL; replace [DATASET] with your dataset name):
SELECT *, TIMESTAMP_MILLIS(last_modified_time) AS last_modified_date
FROM `[DATASET].__TABLES__`
This snippet gives you the last modification date of each table - note that __TABLES__ records when a table was last modified, not when it was last read.
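Building on that snippet, one way to surface candidates for manual cleanup is to order the tables by staleness, oldest first (again a sketch against `[DATASET].__TABLES__`):

SELECT table_id, TIMESTAMP_MILLIS(last_modified_time) AS last_modified_date
FROM `[DATASET].__TABLES__`
ORDER BY last_modified_date;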
I have a stored procedure in SQL Server that also queries tables in the same database and in a different Oracle database. This is for a data warehouse project that joins several large tables across databases and queries them.
Is it better to copy the table (with ~3 million records) into the same database and then query it, or is the slowdown not significant from the table being in a different database? The query is complicated and can take hours.
I'm not necessarily looking for a specific answer, informed opinion and/or specific further reading are also very appreciated. Thanks!
I always prefer a staging layer (some call it an integration layer).
In your case (judging blind), the best approach is probably to:
Copy table once
Create a sync step (Insert/Update) based on primary key(s)
Schedule step 2
Run your query
If there is some logical data-integrity rule, you can implement the sync in step 2 with simple SQL based on timestamps.
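A sketch of what step 2 could look like in SQL Server, assuming a linked server to Oracle named ORA_LINK and illustrative table/column names:

MERGE dbo.stage_customers AS tgt
USING (SELECT customer_id, customer_name, updated_at
       FROM   ORA_LINK..SALES.CUSTOMERS) AS src   -- Oracle side via the linked server
   ON tgt.customer_id = src.customer_id
WHEN MATCHED AND src.updated_at > tgt.updated_at THEN
     UPDATE SET tgt.customer_name = src.customer_name,
                tgt.updated_at    = src.updated_at
WHEN NOT MATCHED BY TARGET THEN
     INSERT (customer_id, customer_name, updated_at)
     VALUES (src.customer_id, src.customer_name, src.updated_at);

Schedule that as a SQL Server Agent job (step 3), and the heavy query in step 4 then only touches local tables.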
I'm working on a program that works with SQL Server.
For storing data in a database table, which of the approaches below is correct?
Store all rows in just one table (10 million records)
Store fewer rows in several tables (500,000 records each), e.g. create one table per year
It depends on how often you access the data. If you are not using the old records, then you can archive them. Splitting up the table is not desirable, as it may confuse you while fetching data.
I would say store all the data in a single table, but implement table partitioning on the older data. Partitioning the data will increase query performance.
Here are some references:
http://www.mssqltips.com/sqlservertip/1914/sql-server-database-partitioning-myths-and-truths/
http://msdn.microsoft.com/en-us/library/ms188730.aspx
http://blog.sqlauthority.com/2008/01/25/sql-server-2005-database-table-partitioning-tutorial-how-to-horizontal-partition-database-table/
Please note that table partitioning functionality is only available in Enterprise Edition.
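For reference, a minimal sketch of yearly partitioning in SQL Server (object names, boundary dates, and filegroup mapping are illustrative):

CREATE PARTITION FUNCTION pf_by_year (datetime)
AS RANGE RIGHT FOR VALUES ('2012-01-01', '2013-01-01', '2014-01-01');

CREATE PARTITION SCHEME ps_by_year
AS PARTITION pf_by_year ALL TO ([PRIMARY]);

CREATE TABLE dbo.sales
(
    sale_id   bigint   NOT NULL,
    sale_date datetime NOT NULL,
    amount    money    NOT NULL
) ON ps_by_year (sale_date);

Queries that filter on sale_date are then limited to the relevant partitions, which gives you the per-year split without breaking the schema into separate tables.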
Well, it depends!
What are you going to do with the data? If you query this data a lot, it could be better to split it into (for example) per-year tables. That way you get better performance, since you are querying smaller tables.
On the other hand, with a bigger table and well-written queries you might not even see a performance issue. If you only need to store this data, it is better to just use one table.
BTW, for loading this data into the database you could use bcp (bulk copy), which is a fast way of inserting a lot of rows.
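bcp is a command-line tool; its T-SQL counterpart is BULK INSERT. A minimal sketch, with an assumed file path, table, and CSV layout:

BULK INSERT dbo.sales
FROM 'C:\load\sales.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', BATCHSIZE = 100000);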
If I have an MS Access database with linked tables from two different database servers (say one table from a SQL Server db and one from an Oracle db) and I write a query which JOINs those two tables, how will Access (or the Jet engine, I guess?) handle this query? Will it issue some SELECTs on each table first to get the fields I'm JOINing on, figure out which rows match, then issue more SELECTs for those rows?
The key thing to understand is this:
Are you asking a question that Access/Jet can optimize before it sends its request to the two server databases? If you're joining the entirety of both tables, Jet will have to request both tables, which would be ugly.
If, on the other hand, you can provide criteria that limit one or both sides of the join, Access/Jet can be more efficient and request the filtered resultset instead of the full table.
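For example, a filter like the one below is something Jet can pass along to the server side instead of pulling both linked tables in full (the linked table names and the date criterion are illustrative):

SELECT c.CustomerID, c.CustomerName, o.OrderDate, o.Amount
FROM ORA_Customers AS c
INNER JOIN SQLSRV_Orders AS o ON o.CustomerID = c.CustomerID
WHERE o.OrderDate >= #01/01/2014#;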
Yep, you can have some serious performance issues. I have done this type of thing for years with Oracle, SQL Server, and DB2 - ugh. Sometimes I have had to set it up on a timer at 5:00 am so that when I get in at 7:00 it's done.
If your dataset is significant enough, it is often faster to build a table locally and then link the data. For remote datasets, also look into passthroughs.
For example, let's say you are pulling all of yesterday's customers from the Oracle db and all of the customer purchases from the SQL Server db. Say you average 100 customers daily out of a list of 30,000, and your product list has 500,000 entries. You could query the Oracle db for your list of 100 customers, then write it as an IN clause in a pass-through query to the SQL Server db. You'll get your data almost instantly.
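A sketch of such a pass-through query (the table, columns, and ID list are illustrative; in practice the IN list would be built from the Oracle results):

SELECT p.customer_id, p.product_id, p.purchase_date, p.amount
FROM   dbo.customer_purchases p
WHERE  p.customer_id IN (1001, 1004, 1027, 1113);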
Or if your recordsets are huge, build local tables of the two IDs, compare them locally and then just pull the necessary matches.
It's ugly but you can save yourself hours literally.
That would be my guess. It helps if there are indexes on both sides of the join but, as neither server has full control over the query, further query optimization is not possible.
I have no practical experience joining tables from two different data systems. However, depending on the requirements, you may find it faster to run SELECT queries that pull only the required records and fields into local Access tables and then do the final join and query in Access.