Various options for H2 - which is faster? - sql

Now I have two choices.
I have the same schema for all the data. Each record stands for the connection between two hosts, so one record belongs to two hosts. Currently, whenever I need the connections of a host, I insert the records into H2. So if there is a connection between host1 and host2, then once I have queried the connections of host1 I have already stored the connection between host1 and host2; when I later query the info about host2, the same record gets stored twice in the table. That is why I am thinking about creating a table for each host.
Here is the dilemma. There are a lot of hosts, so if I create a table for each host, the number of tables will be huge. Is it faster to query the one huge table, or to query lots of smaller tables? (No joins.)
Thanks

Indexing the one table with lots of records is the way to go. A table per host can become a maintenance nightmare, and indexing will take care of your search speed in the single table. Plus, if you have a huge number of records, SQL Server 2008 (and 2005) lets you partition the table into separate files, which helps with speed as well. Even outside SQL Server, keeping the data in the same table is the way to go, especially since your schema for the table is the same for each host.
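A rough sketch of that single-table layout (the table and column names here are made up; the same statements work in H2 and in SQL Server):

CREATE TABLE host_connection (
    host_a  VARCHAR(255)  NOT NULL,  -- the two endpoints of the connection
    host_b  VARCHAR(255)  NOT NULL,
    details VARCHAR(1000)            -- whatever else the record carries
);

-- one index per lookup column, so a query by either host is an index seek
CREATE INDEX idx_host_connection_a ON host_connection (host_a);
CREATE INDEX idx_host_connection_b ON host_connection (host_b);

-- all connections of a given host, whichever side it appears on
SELECT * FROM host_connection WHERE host_a = ? OR host_b = ?;

Storing each pair only once (for example, always putting the smaller host name in host_a) would also avoid the duplicated rows described in the question.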

Related

SELECT INTO where source data is in a different database than the target table

I execute a SELECT INTO query where my source data is in a different database than the table I insert into (but on the same server).
When I execute the query using the same database where my source data is (USE DATABASE_MY_SOURCE_DATA), it completes in under a minute. When I change the database to the one where my target table sits, it doesn't complete within 10 minutes (I don't know the exact time because I cancelled it).
Why is that? Why is the difference so huge? I can't get my head around it.
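To make the two cases concrete (everything except DATABASE_MY_SOURCE_DATA is a placeholder name):

-- fast: run from the source database, write into the target database
USE DATABASE_MY_SOURCE_DATA;
SELECT src.*
INTO TARGET_DB.dbo.TargetTable
FROM dbo.SourceTable AS src;

-- slow: run from the target database, read from the source database
USE TARGET_DB;
SELECT src.*
INTO dbo.TargetTable
FROM DATABASE_MY_SOURCE_DATA.dbo.SourceTable AS src;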
Querying cross-database, even over a linked server connection, is always likely (at least as of 2021) to present performance concerns.
The first problem is that the optimizer doesn't have the statistics to estimate the number of rows in the remote table(s). It is also going to miss indexes on those tables, resorting to table scans (which tend to be a lot slower on large tables than index seeks).
Another issue is that there is no data caching, so the optimizer makes round trips to the remote database for every necessary operation.
More information (from a great source):
https://www.brentozar.com/archive/2021/07/why-are-linked-server-queries-so-bad/
Assuming that you want this to be more performant, and that you are doing substantial filtering on the remote data source, you may see some performance benefit from creating, on the remote database, a view that filters down to just the rows you want in the target table, and then querying that view for your results.
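A sketch of that approach, with made-up object names:

-- on the remote (source) database: a view that does the filtering there
CREATE VIEW dbo.vw_RowsForTarget
AS
SELECT Id, Col1, Col2
FROM dbo.SourceTable
WHERE IsRelevant = 1;

-- on the target database: pull only the pre-filtered rows
-- (over a linked server this would be a four-part name)
SELECT Id, Col1, Col2
INTO dbo.TargetTable
FROM DATABASE_MY_SOURCE_DATA.dbo.vw_RowsForTarget;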
Alternatively (and likely more correctly) you should wrap these operations in an ETL process (such as SSIS) that better manages these connections.

Most efficient way to save result of SQL query in CDS/Dataverse into SQL Database table

For the purposes of my application I have created an Azure Function that connects to my Dataverse environment, queries data with SELECT from different tables (I create new records based on records in tables A, B, C), stores the result in a list, and then saves those records into the same Dataverse environment, but into another table (let's say D). I decided on this solution because Power Automate was creating those new records too slowly.
It works fine; however, when there are too many requests (more than 2-3 users working with the application and running the Azure Functions), the save into Dataverse becomes too slow as well.
So I am thinking about another way to save and store those records. What matters is that the records in table D are only for calculation purposes; users do not work with them or edit them. This is why I am thinking about creating a SQL Database table, storing those records (only from table D) there, and just changing the connection in my application where needed.
Can you suggest the most efficient way to do this? In a nutshell, what I need is:
connect to Dataverse and query data from tables A, B, C; the result of this query will be the records for table D
save the result of the query into a SQL Database table (table D)
There are quite a few things to consider here.
If users don't use the data in table D directly, could you maybe run this operation overnight, or at a time when there is low traffic and slow performance of this operation is acceptable?
Have you considered using SQL views? Do you really need to store the computed data?
Perhaps you are inserting one item at a time? Are you using the SqlBulkCopy class?
Bulk Insert In SQL Server From C#
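If it does turn out that rows are inserted one at a time, a single set-based statement on the SQL side is another option besides SqlBulkCopy. A rough sketch with hypothetical staging and target names:

-- land the computed rows in a staging table first (e.g. via bulk copy),
-- then populate table D in one statement instead of row by row
INSERT INTO dbo.TableD (KeyCol, Amount, CalculatedAt)
SELECT s.KeyCol, s.Amount, SYSUTCDATETIME()
FROM dbo.Staging_TableD AS s;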
Observe the CPU utilisation of your server during this operation. It probably shoots to 100%. You want to hit around 70% average utilisation for a good trade-off between performance and cost, so another option is to scale up.

How do I ETL big data between two SQL Servers?

My primary data source gets 50M records per day. I need to be able to view records with a maximum delay of about 5 minutes.
What is the best way to transfer data from the primary SQL Server data source to the reporting SQL Server data source?
At the moment I use a merge join every 30 seconds, but it seems to affect the primary data source's performance.
The most common approach to minimizing the load on your source server is to do periodic extracts using a timestamp, i.e. a simple SELECT ... WHERE timestamp > previous-max-timestamp-extracted.
The source table(s) need to provide a column that allows you to filter out already-extracted records. If that's completely impossible, you might extract e.g. the last hour's data into staging tables and deduplicate against previously extracted records.
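A minimal sketch of that pattern, assuming a small watermark table and an indexed ModifiedAt column on the source (all names are illustrative):

DECLARE @last datetime2 =
    (SELECT LastExtracted FROM etl.Watermark WHERE TableName = 'dbo.Connections');
DECLARE @now  datetime2 = SYSUTCDATETIME();

-- the "new rows only" extract; load these into the report server
SELECT Id, Payload, ModifiedAt
FROM dbo.Connections
WHERE ModifiedAt > @last AND ModifiedAt <= @now;

-- once loaded, advance the watermark for the next run
UPDATE etl.Watermark SET LastExtracted = @now WHERE TableName = 'dbo.Connections';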
Yes, you could use CDC, but that's often more involved, and usually adds some restrictions.
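For reference, enabling CDC looks roughly like this (the table name is a placeholder), and the changes are then read from the generated capture functions:

EXEC sys.sp_cdc_enable_db;             -- once per database

EXEC sys.sp_cdc_enable_table
     @source_schema = N'dbo',
     @source_name   = N'Connections',  -- the table to track
     @role_name     = NULL;

-- changed rows are then read via e.g. cdc.fn_cdc_get_all_changes_dbo_Connections(@from_lsn, @to_lsn, N'all')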
Cheers, Kristian

Session-level data in temporary tables and TVFs

I am working with a catalogue system at present that has many user settings and preferences. As such, when we set up a session we create a list of allowed products. These are currently stored in a table named like "allowedProducts_0001", where 0001 is the session ID.
We handle the data this way because there is a lot of complexity around product visibility that we do not wish to process repeatedly.
I have been asked to produce a TVF to select from this table, e.g.
SELECT * FROM allowedProducts('0001')
The problem I have is that I cannot query from a dynamic table name, even though the output would be in a static format.
I have considered creating a single table with a column for the session ID, hence removing the need for dynamic SQL, but the table would be too large to be efficient (100k+ products per session for some clients, with many open sessions at once).
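Something like this, with simplified column names, is what I had in mind for that variant:

CREATE TABLE allowedProducts (
    SessionId CHAR(4) NOT NULL,   -- e.g. '0001'
    ProductId INT     NOT NULL,
    CONSTRAINT PK_allowedProducts PRIMARY KEY (SessionId, ProductId)
);

-- the TVF would then just be a filtered SELECT
SELECT ProductId FROM allowedProducts WHERE SessionId = '0001';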
I cannot use temp tables because the calling system doesn't keep the SQL connection open constantly (several hundred possible sessions at once).
We're currently supporting back as far as MSSQL 2008 R2, but have the option of upgrading to newer servers as part of an upgrade program.
I'm looking for suggestions of how to work around these conditions. Anybody have any ideas?
Many thanks in advance.

Archiving data in SQL

Need some advice on how best to approach this. Basically we have a few tables in our database along with archive versions of those tables for deleted data (e.g. Booking and Booking_archive). The table structure in both these tables is exactly the same, except for two extra columns in the archive table: DateDeleted and DeletedBy.
I have removed these archive tables, and just added the DateDeleted and DeletedBy columns to the actual table. My plan is to then partition this table so I can separate archived info from non-archived.
Is this the best approach? I just did not like the idea of having two tables just to distinguish between archived and non-archived data.
Any other suggestions/pointers for doing this?
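Roughly what I have in mind (simplified, everything on the PRIMARY filegroup) is to partition on DateDeleted, so live rows (NULL) and archived rows end up in different partitions:

CREATE PARTITION FUNCTION pf_BookingDeleted (DATETIME2)
    AS RANGE RIGHT FOR VALUES ('1753-01-01');   -- NULL (live) rows fall left, deleted rows right

CREATE PARTITION SCHEME ps_BookingDeleted
    AS PARTITION pf_BookingDeleted ALL TO ([PRIMARY]);

CREATE TABLE dbo.Booking (
    BookingId   INT            NOT NULL,
    DateDeleted DATETIME2      NULL,             -- NULL = not deleted
    DeletedBy   NVARCHAR(128)  NULL
    -- ...the rest of the Booking columns...
) ON ps_BookingDeleted (DateDeleted);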
The point of archiving is to improve performance, so I would say it's definitely better to separate the data into another table. In fact, I would go as far as creating an archive database on a separate server, and keeping the archived data there. That would yield the biggest performance gains. Runner-up architecture is a 2nd "archive" database on the same server with exactly duplicated tables.
Even with partitioning, you'll still have table-locking issues and hardware limitations slowing you down. Separate tables or dbs will eliminate the former, and separate server or one drive per partition could solve the latter.
As for storing the archived date, I don't think I would bother doing that on the production database. Might as well make that your timestamp on the archive-db tables, so when you insert the record it'll auto-stamp it with the datetime when it was archived.
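A sketch of that shape, shown for brevity as a separate schema in the same database (placeholder columns; @BookingId stands in for the archiving procedure's parameter):

-- archive table mirrors the live table, plus auto-stamped audit columns
CREATE TABLE archive.Booking (
    BookingId   INT            NOT NULL,
    CustomerId  INT            NOT NULL,  -- ...the rest of the Booking columns...
    DateDeleted DATETIME2      NOT NULL DEFAULT SYSUTCDATETIME(),
    DeletedBy   NVARCHAR(128)  NOT NULL DEFAULT SUSER_SNAME()
);

-- "delete" = copy to the archive (audit columns fill themselves in), then remove
BEGIN TRANSACTION;
INSERT INTO archive.Booking (BookingId, CustomerId)
SELECT BookingId, CustomerId FROM dbo.Booking WHERE BookingId = @BookingId;
DELETE FROM dbo.Booking WHERE BookingId = @BookingId;
COMMIT;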
The right approach depends on:
How many tables have such archive tables
The arrival rate of data into the archive tables
Whether you want to invest in the software/hardware of a separate server
Based on the above, the various options are:
Same database, different schema on the same server
Archive database on the same server
Archive database on a different server
Don't go for partitioning if the data is archived and has no chance of getting back into the main tables.
You might also add lifecycle-management columns to the archived data (retention period or expiry date) so that the archive data's lifecycle can also be managed effectively.
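For example (the column name is purely illustrative):

ALTER TABLE archive.Booking
    ADD RetentionExpiresOn DATE NULL;  -- e.g. DateDeleted + 7 years, checked by a purge job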