I'm a new CDS/Dataverse user and am wondering why there are so many columns in new tables?

I'm new to CDS/Dataverse, coming from the SQL Server world. I created a new Dataverse table and there are over a dozen columns in my "new" table (e.g. "status", "version number"). Apparently these are added automatically. Why is this?
Also, there doesn't seem to be a way to view a grid of data (like I can with SQL Server) for quick review/modification of the data. Is there a way to view data visually like this?
Any tips for a new user, coming from SQL Server, would be appreciated. Thanks.
Edit: clarified the main question with examples (column names). (thanks David)

I am also new to CDS/Dataverse, so the following is a limited understanding from what I have explored so far.
The idea behind Dataverse is that it gives you a pre-built schema that follows best practice for you to build off of, so that you spend less time worrying about designing a comprehensive data schema, creating tables, and working out how to relate them all together, and more time building applications in Power Apps.
For example, amongst the several dozen tables it generates from the get-go are Account and Contact. The former is for organisational entities and the latter is for single-person entities. You can go straight into adding your records to one of these tables and take advantage of bits of Power Apps functionality already hooked up to them. You do not have to spend time thinking up column names, creating the table, making sure it hooks up to all the other Dataverse tables, testing whether the Power Apps functionality works with it correctly, etc.
It is much the same story with the automatically generated columns for new tables: they are all there to maintain a best-practice schema and functionality for Power Apps. For example, the extra columns give you auditing out of the box for the data you add, including when a row was created or modified and who created it. The important thing is to start from what you want to build, and not get too caught up in the extra tables/columns. After a bit of research, you'll probably find you can utilise some more of these tables/columns in your design.
Viewing and adding data is very tedious -- it seems to take five clicks and several seconds to load the bit of data you want, which is eons compared to doing it in SQL Server. I believe it is this way due to Microsoft's attempt to make it "user friendly".
Anyhow, the standard way to view data, starting from the main Power Apps view is:
From the right-hand side pane, click Data
Click Tables
From the list of tables, click your table
Along the top row, click Data
There is an alternative method that allows you to view the Dataverse tables in SSMS – see link below:
https://www.strategy365.co.uk/using-sql-to-query-the-common-data-service/
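For a quick sanity check once you're connected that way, a minimal read-only query works much like it would in SQL Server. This is just a sketch, assuming the environment's TDS (SQL) endpoint is enabled as described in the article, and it uses the built-in account table and some of its auto-generated columns as the example:

-- Read-only query against the built-in account table, showing a few of the
-- columns Dataverse creates for you automatically.
SELECT TOP (10)
       accountid,   -- GUID primary key, created automatically
       name,
       createdon,   -- auditing columns added to every table
       modifiedon,
       statecode    -- the "status" column mentioned in the question
FROM   account;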
To import data in bulk:
Click on Data from the top drop-down menu > Get data.
Importing data from Excel is free. Importing from other sources, including SQL Server, is, I believe, a paid service (although I think you may be able to do this on the free Community Plan).

Related

Power BI maxing connections to DB :( Can we populate multiple tables with single Sql.Database call?

I am assisting my team troubleshoot an issue with a Power BI report we are developing. We have a rather complex data model in the source SQL database, so we have created 5-6 views to better manage the data. We have a requirement to use DirectQuery, as one key requirement for the report is that the most up-to-date data in the database is visible, rather than having a delay in loading/caching the data. We also have the single data source, just the one database.
When we run the report, we see a spike of 200-500 connections to the database from the specific user for the report data source, and those connections don't close. This is clearly an issue and unsustainable for any product. We have a ticket open with Microsoft premium support to address the connections not closing, but in the meantime, I'm wondering if we're doing something wrong inside the report?
When I view the queries in the query editor, we basically have one query for each view, and it's a simple:
let
    Source = Sql.Database(Server, Database),
    query_view_name = Source{[Schema ......]}[Data]
in
    query_view_name
(I don't have the raw code in front of me, but that's the gist of it.)
It seems to me, based on analytics in the database, that "Sql.Database" is opening a new connection every time a view is called. And with 5-6 views, that's 5-6 connections at a minimum; then each time a filter is changed, more connections are opened, and it compounds from there until the database connection pool is maxed out.
Is there a way to populate all the tables using a single connection to the database? Why would Power BI be using so many connections? Can we populate multiple tables in the advanced query editor? Using DirectQuery, are there any suggestions for what we can look at/troubleshoot/change in the report?
Thanks!
Power BI establishes multiple connections to the database to load multiple tables in parallel. If you don't want this, you can turn it off from Options->Current file->Data Load->Enable parallel loading of tables.
Keep in mind that turning this option off will most likely increase the model loading time.
You may also want to take a look at the Maximum connections per data source option in Options->Current file->DirectQuery, and the whole Query reduction section beneath it. Turning on Slicer selection and Filter selection on that page is highly recommended for cases like yours, but you need to train your users to click Apply to see the results.
Ok.
We have a rather complex data model in the source SQL database, so we have created 5-6 views to better manage the data.
That's fine.
We have a requirement to use DirectQuery,
But now you're going to have a bad time. DirectQuery + complex views is a recipe for poor performance. Queries against your views will add joins, potentially across the whole model for filter context, as well as Measure and Calculated Column expressions. And these queries will change dynamically, based on the user's interaction with the report. So it's very difficult to see and test all the possible queries.
Basic guidance is to use import mode against views, and only use DirectQuery against properly-indexed tables. To address data freshness, you can replace the views with tables you load and keep up-to-date from your application, or perhaps use an Indexed View, etc.
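As a rough illustration of the indexed-view idea, here is a sketch only, with hypothetical table and column names; indexed views come with a long list of restrictions (SCHEMABINDING, two-part names, COUNT_BIG(*) alongside aggregates, no outer joins, and so on):

CREATE VIEW dbo.vw_SalesByCustomer
WITH SCHEMABINDING
AS
SELECT s.CustomerId,
       SUM(s.Amount) AS TotalAmount,   -- Amount assumed NOT NULL
       COUNT_BIG(*)  AS RowCnt         -- required when the view has GROUP BY
FROM   dbo.Sales AS s
GROUP BY s.CustomerId;
GO
-- Materialises the view, so DirectQuery hits an indexed structure
-- instead of re-running the join/aggregation on every interaction.
CREATE UNIQUE CLUSTERED INDEX IX_vw_SalesByCustomer
    ON dbo.vw_SalesByCustomer (CustomerId);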

Best way to create a shared data source with Report Builder

I am using Report Builder 3.0 (very similar to SQL Server Reporting Services) to create reports for users of an application running on a SQL Server 2012 database.
To set the scene, we have a database with over 1200 tables. We actually only need about 100 of these for reporting purposes. But it is very common that we need to combine fields from multiple tables together to get a common resource of data that my colleagues and I need for our reports.
E.g. if I want a view of a customer, I would want to bring in information about the customer from the customer_table, their phone details from the Phone table, information about their account(s) from the accounts table, and so on. Then I might need another view of the accounts - account type, various balance amounts, opening date, status, etc.
What I would love to do is create a "customer view" where we combine all these fields into a single combined virtual table. Then we have an "Accounts view". It would be easier to use, easier to manage etc. Then we use this for all our reports going forwards. And when we need to, we can combine the customer and accounts view to use on a report plus actual tables into one combo-dataset to use on a report.
I am unsure about the right way to do this.
I see I can create a data source. This doesn't seem right as this appears to be what one might do if working off 2 or more databases. We are using just 1 database.
Then there are report models. It seems these are being deprecated and phased out so this doesn't seem a good option.
Finally I see we can create shared datasets. However, this option (as far as I can tell) won't allow me to combine this with another dataset. So using the example above, I won't be able to combine the customer view and the account view with this approach to use for a report to display details about the customer and his/her accounts.
Would appreciate guidance on the best way to achieve what I am trying to do...
Thanks
I can only speak from personal experience, but using the data source approach has been good for our purposes. We have a single database with 50+ tables in it. This is linked to as a shared data source in the project, so it is available to all 50+ reports.
We then use stored procedures to make the information in the database available to the reports; each report has its own stored procedure that joins as many tables as required to provide the data for that report. Using stored procedures also allows you to return only the rows you are interested in, rather than entire tables.
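A minimal sketch of what one of those procedures might look like, using hypothetical table and column names based on the question's customer/phone/accounts example:

CREATE PROCEDURE dbo.rpt_CustomerAccounts
    @CustomerId INT
AS
BEGIN
    SET NOCOUNT ON;

    SELECT c.CustomerId,
           c.CustomerName,
           p.PhoneNumber,
           a.AccountNumber,
           a.AccountType,
           a.Balance,
           a.OpenedDate,
           a.Status
    FROM   dbo.Customer AS c
           JOIN dbo.Phone    AS p ON p.CustomerId = c.CustomerId
           JOIN dbo.Accounts AS a ON a.CustomerId = c.CustomerId
    WHERE  c.CustomerId = @CustomerId;   -- only the rows this report needs
END;

Each report then points its dataset at the procedure. A plain database view built from the same joins would give you the reusable "customer view" described in the question, at the cost of always returning every column.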
I'm not certain if this is the kind of answer that you were after, but describes how we solve a similar (smaller) issue.

What strategies are available for migrating Access databases to SQL server-based applications?

I'm considering undertaking a project to migrate a very large MS Access application to a new system based on SQL Server. The existing system is essentially an ERP application with a couple of dozen users, all sharing the Access database over the network. The database has around 300 tables and lots of messy VBA code. This system is beginning to break down (actually, it's amazing it has worked as long as it has).
Due to the size and complexity of the Access application, a 'big bang' approach is not really feasible. It seems sensible to rope off chunks of functionality and migrate them piecemeal to the new system. During the migration process, which I expect to take several months, there may be a need for both databases to be in operation and be able to query and modify data in both systems.
I have considered using something like the ADO.NET Entity Framework to implement a data abstraction layer to handle this, but as far as I can tell, the Entity Framework has no Access provider.
Does my approach seem reasonable? What other strategies have people used to accomplish similar goals?
You may find that the main problem is using the MS Access JET engine as the backend. I'm assuming that you do have an Access FE (frontend) with all objects except tables, and a BE (backend - tables only).
You may find that migrating the data to SQL Server, and linking the Access FE to that, would help alleviate problems immediately.
Then, if you don't want to continue to use MS Access as the FE, you could consider breaking it up into 'modules', and redesign modules one by one using a separate development platform.
We faced a similar situation a few years ago, but we knew from the beginning that we would have to switch one day to SQL Server, so the whole code was written to work from an Access client against both Access AND SQL Server databases.
The idea of having a 'one-step' migration to SQL Server is certainly the easier way to manage this on the database side, and there are many tools for that. But, depending on the way your client app talks to the database, your code might then not work properly. If, for example, your code includes a lot of SQL instructions (or generates them on the fly by, say, adding filters to SELECT instructions), your syntax might not be SQL Server compatible: Access wildcards, date literals, and functions will not work on SQL Server.
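A small illustration of the kind of syntax drift to expect (hypothetical table and column names; the Access version is shown only in comments because it would fail on SQL Server):

-- Access/Jet version, as the client might generate it today:
--   SELECT * FROM Customers WHERE Name LIKE 'Sm*' AND CreatedOn > #01/01/2009#
-- T-SQL equivalent the code would have to produce instead:
SELECT *
FROM   Customers
WHERE  Name LIKE 'Sm%'            -- '%' wildcard instead of Access's '*'
  AND  CreatedOn > '20090101';    -- quoted date literal instead of #...#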
In addition to this, and as said by #mjv, the other drawback of a one-time switch to MS SQL is that you will inherit many of the problems of the original database: wrong or inappropriate field names, inappropriate primary/foreign key policies, hidden one-to-many relations that you'd like to implement in the new database model, etc.
I'll propose here some principles and rules for implementing a 'soft transition' solution, which clearly fits you best. Just to say that it's not going to be easy, but it's definitely very interesting, particularly when dealing with 300 tables! Lucky you!
I assume here that you have the ability to update the client code, and that you'd prefer to keep the same client interface at all times. It is of course possible to have two different interfaces at transition time, one for each database, but this would be very confusing for the users, and a permanent source of frustration for them.
In my view, the best solution strongly depends on:
The original connection technology, and the way data is managed in your client's code: Access linked tables, ODBC, ADODB, recordsets, local tables, form recordsources, batch updating, etc.
The possibilities to split your tables and your app into 'mostly independent' modules.
And you will not be spared the following mandatory activities:
Set up a transfer procedure from the Access database to SQL Server. You can use already existing tools (the Access Upsizing Wizard is very poor, so do not hesitate to buy a real one, like SSW or EMS SQL Manager, which are very powerful) or build your own with Visual Basic. If your plan is to make some changes to the data definition, you'll definitely have to write some code. Keep in mind that you will run this code many, many times, so make sure it includes all the time-saving instructions that will allow you to restart the process from scratch as many times as you want. You will have to choose between two basic strategies when importing data:
a - DELETE existing records, then INSERT the imported records
b - UPDATE existing records from the imported records
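A hedged T-SQL illustration of those two strategies, with hypothetical staging and target tables (the staging schema and the LegacyId matching column are assumptions, not part of the original answer):

-- Strategy a: DELETE the existing records, then INSERT the imported ones.
DELETE t
FROM   dbo.Customers AS t
WHERE  EXISTS (SELECT 1 FROM staging.Customers AS s WHERE s.LegacyId = t.LegacyId);

INSERT INTO dbo.Customers (LegacyId, Name, Country)
SELECT LegacyId, Name, Country
FROM   staging.Customers;

-- Strategy b: UPDATE the existing records in place from the imported ones.
UPDATE t
SET    t.Name    = s.Name,
       t.Country = s.Country
FROM   dbo.Customers AS t
       JOIN staging.Customers AS s ON s.LegacyId = t.LegacyId;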
If you plan to switch to new primary/foreign key types, you'll have to keep track of the old identifiers in your new database model during the transition period. Do not hesitate to switch to GUID primary keys at this stage, especially if the plan is to replicate data across multiple sites one of these days.
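For example, a new-model table might carry both the GUID key and the old Access identifier for the duration of the transition (a sketch with hypothetical names):

CREATE TABLE dbo.Customers (
    CustomerId UNIQUEIDENTIFIER NOT NULL
               CONSTRAINT DF_Customers_Id DEFAULT NEWSEQUENTIALID()
               CONSTRAINT PK_Customers PRIMARY KEY,
    LegacyId   INT NULL,                 -- old Access autonumber, kept only for the transition
    Name       NVARCHAR(200) NOT NULL
);
-- Lets the transfer procedure match rows reliably on every re-run.
CREATE UNIQUE INDEX IX_Customers_LegacyId
    ON dbo.Customers (LegacyId)
    WHERE LegacyId IS NOT NULL;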
This transfer procedure will be divided into modules corresponding to the 'logical' modules defined previously, and you should be able to run any of these modules independently (keeping in mind, of course, that they'll probably have to run in a specific order, where the 'customers' module has to run before the 'invoicing' module).
Implement in your client's code the possibility to connect to both the original MS Access database and the new SQL Server database. Ideally, you should be able to manage both connections from within your code for displaying and validating data.
This possibility will be implemented module by module, where each module gets a 'trial period', i.e. the possibility to choose at testing time between the Access connection and the SQL connection when using the module. Once testing is done and complete, the module can then be run in exclusive SQL Server mode.
During the transfer period, which can last a few months, you will have to manage programmatically the database constraints that exist between 'SQL Server' modules and 'Access' modules. Going back to our customers/invoicing example, the customers module will be switched to SQL Server first. Before the invoicing module can be switched, you'll have to implement programmatically the one-to-many relation between Customers and Invoices, where each of the tables sits in a different database. Such a constraint can be implemented on the Invoice form by populating the Customers combobox with the Customers recordset from SQL Server.
My proposal is to build your modules following your database model, always beginning with the tables on the 'one' side of your 'one-to-many' relations: basic lists like 'Units', 'Currencies', and 'Countries' shall be switched first. You'll get a first hands-on experience in writing data transfer code and managing a second connection in your client interface. You'll then be able to 'go up' in your database model, switching the 'products' and 'customers' tables (where units, countries and currencies are foreign keys) to the new server.
Good luck!
I would second the suggestion to upsize the back end to SQL Server as step 1.
I would never go to the suggested Step 2, though (i.e., replacing the Access front end with something else). I would instead suggest investing the effort in fixing the flaws of the schema, and adjusting the Access app to work with the new schema.
Obviously, it is never the case that everything just works hunky-dory when you upsize -- some things that were previously quite fast will be dogs, and some things that were previously quite slow will be fast. And I've found that the problems are very often not where you anticipate they will be. You can only figure out what needs to be fixed by testing.
Basically, anything that works poorly gets re-architected, or moved entirely server-side.
Leverage the investment in the existing Access app rather than tossing all that out and starting from scratch. Access is a fine front end for a SQL Server back end as long as you don't assume it's going to work just the same way as it would with a Jet/ACE back end.
...thinking out loud... I think this may work.
It appears that the complexity of the application resides in the various VBA modules rather than in the database tables/schema themselves. A possible migration path could therefore be to first migrate the data storage to SQL Server, exactly as-is, as follows:
prevent any change to the data for a few hours
duplicate all tables to the SQL server; be sure to create the same indexes as well.
create linked tables via an ODBC source pointing to the newly created tables on SQL Server
these linked tables should have the very same names as the original tables (which may therefore need to be renamed, say with a leading underscore, so they can still be referenced if needed).
Now, the application can be restarted and should be using the SQL tables rather than the Access tables. All logic should work as previously (right...), with possible slowness to be expected, depending on the distance between the two machines.
All the above could be tested in about a day's work or so; the most tedious being the creation of the tables on SQL server (much of that can be automated, I'm sure). The next most tedious task is to assert that the application effectively works as previously, but with its storage on SQL.
EDIT: As suggested in a comment, I should stress that there is a [fair?] possibility that the application would not readily work so smoothly with a SQL Server back-end, and could require weeks of hard work in testing and fixing. However, unless some of these difficulties can be anticipated because of insight into the application not expressed in the question, I propose that attempting the "as-is" migration to SQL Server be considered; after all, it may just work with minimal effort, and if it doesn't, we'd know this very quickly. This is therefore a high-return, low-risk proposal...
The main advantage sought with this approach is that there will be a single storage location during the [as the OP expects] longer period during which the old Access application will co-exist with the new application.
The drawback of this approach is that, at least at first, the schema of the original database is reproduced verbatim, i.e. including some of its known quirks and legacy-inherited idiosyncrasies. These schema issues (and the underlying application logic) can be corrected over time, but this is of course less easy than if the new application started ab initio, with its own separate storage and distinct schema.
After the storage is moved to SQL Server, the most used and/or most independent modules of the Access application can be re-written in the new application, and as significant portions of the original application are ported, effective usage, by select beta testers or by actual users, can start to be switched to the new application.
Possibly, some kind of screen-scraping-based logic or some other system could be used to produce a hybrid application that provides the end users with a comprehensive application, which sometimes works from the new logic and sometimes from the original MS Access program.

Purging SQL Tables from large DB?

The site I am working on as a student will be redesigned and released in the near future, and I have been assigned the task of manually searching through every table in the DB the site uses to find tables we can consider for deletion. I'm doing the search through every HTML file's source code in Dreamweaver, but I was hoping there is an automated way to check my work. Does anyone have any suggestions as to how this is done in the business world?
If you search through the code, you may find SQL that is never used, because the users never choose those options in the application.
Instead, I would suggest that you turn on auditing on the database and log what SQL is actually used; in Oracle, for example, you would do this with its built-in auditing features, and other major database servers have similar capabilities.
From the log data you can identify not only what tables are being used, but their frequency of use. If there are any tables in the schema that do not show up during a week of auditing, or show up only rarely, then you could investigate this in the code using text search tools.
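If the site happens to run on SQL Server rather than Oracle, a lighter-weight alternative to full auditing is the index-usage DMV, which records reads per table since the last service restart. A sketch (collect it over a representative period before trusting it, since the counters reset on restart):

-- Tables with no recorded reads since the last SQL Server restart.
SELECT t.name AS table_name,
       SUM(us.user_seeks + us.user_scans + us.user_lookups) AS total_reads
FROM   sys.tables AS t
       LEFT JOIN sys.dm_db_index_usage_stats AS us
              ON us.object_id = t.object_id
             AND us.database_id = DB_ID()
GROUP BY t.name
HAVING COALESCE(SUM(us.user_seeks + us.user_scans + us.user_lookups), 0) = 0
ORDER BY t.name;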
Once you have candidate tables to remove from the database, and approval from your manager, don't just drop the tables: recreate them as empty tables, or put one dummy record in each with mostly null values (or zero or blank) in the fields, except for name and descriptive fields, where you can put something like "DELETED" or "Report error DELE to support center", etc. That way, the application won't fail with a hard error, and you have a chance of finding out what users are doing when they end up hitting these supposedly unused tables.
Reverse engineer the DB (Visio, Toad, etc...), document the structure and ask designers of the new site what they need -- then refactor.
I would start by combing through the HTML source for keywords:
SELECT
INSERT
UPDATE
DELETE
...using grep/etc. None of these are HTML entities, and you can't reliably use table names because you could be dealing with views (assuming any exist in the system). Then you have to pore over the statements themselves to determine what is being used.
If [hopefully] functions and/or stored procedures were used in the system, most DBs have a reference feature to check for dependencies.
This would be a good time to create a Design Document on a screen by screen basis, listing the attributes on screen & where the value(s) come from in the database at the table.column level.
Compile your list of tables used, and compare to what's actually in the database.
If the table names are specified in the HTML source (and if that's the only place they are ever specified!), you can do a Search in Files for the name of each table in the DB. If there are a lot of tables, consider using a tool like grep and creating a script that runs grep against the source code base (HTML files plus any others that can reference the table by name) for each table name.
Having said that, I would still follow Damir's advice and take a list of deletion candidates to the data designers for validation.
I'm guessing you don't have any tests in place around the data access or the UI, so there's no way to verify what is and isn't used. Provided that the data access is consistent, scripting will be your best bet. Have it search out the tables/views/stored procedures that are being called and dump those to a file for further analysis. That will at least give you a list of everything that is actually called from somewhere. As for whether those pages are actually used anywhere, that's another story.
Once you have the list of the database elements that are being called, compare that with a list of the user-defined elements that are in the database. That will give you the ones that could potentially be deleted.
All that being said, if the site is being redesigned then a fresh database schema may actually be a better approach. It's usually less intensive to start fresh and import the old data than it is to find dead tables and fields.

Ideas for Combining Thousand Databases into One Database

We have a SQL server that has a database for each client, and we have hundreds of clients. So imagine the following: database001, database002, database003, ..., database999. We want to combine all of these databases into one database.
Our thoughts are to add a siteId column, 001, 002, 003, ..., 999.
We are exploring options to make this transition as smoothly as possible. And we would LOVE to hear any ideas you have. It's proving to be a VERY challenging problem.
I've heard of a technique that would create a view that would match and then filter.
Any ideas guys?
Create a client database id for each of the client databases. You will use this id to keep the data logically separated. This is the "site id" concept, but you can use a derived key (identity field) instead of manually creating these numbers. Create a table that has database name and id, with any other metadata you need.
The next step would be to create an SSIS package that gets the ID for the database in question and adds it to the tables whose data has to be separated out logically. You can then run that same package over each database, looking up the ID for each one.
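A minimal sketch of that metadata table and the extra column the package would populate (table and column names are hypothetical):

-- Lookup of client databases to their derived id.
CREATE TABLE dbo.ClientDatabase (
    SiteId       INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_ClientDatabase PRIMARY KEY,
    DatabaseName SYSNAME NOT NULL,
    MigratedOn   DATETIME NULL        -- any other metadata you need
);

-- Every table whose data must stay logically separated gets the id.
ALTER TABLE dbo.Orders
    ADD SiteId INT NOT NULL CONSTRAINT DF_Orders_SiteId DEFAULT (0);  -- backfilled per source database by the SSIS package

CREATE INDEX IX_Orders_SiteId ON dbo.Orders (SiteId);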
After you have a unique id for the data and have imported it, you will have to alter your apps to fit the new schema (actually before, or you are pretty much screwed).
If you want to do this in steps, you can create views or functions in the different "databases" so the old client can still hit the client's data, even though it has been moved. This step may not be necessary if you deploy with some downtime.
The method I propose is fairly flexible and can be applied to one client at a time, depending on your client application deployment methodology.
Why do you want to do that?
You can read about Multi-Tenant Data Architecture and also listen to SO #19 (around 40-50 min) about this design.
The "site-id" solution is what's done.
Another possibility that may not work out as well (but is still appealing) is multiple schemas within a single database. You can pull common tables into a "common" schema, and leave the customer-specific stuff in customer-specific schemas. In some database products, however, each schema is -- effectively -- a separate database. In other products (Oracle and DB2, for example) you can easily write queries that work across multiple schemas.
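A minimal sketch of that layout on SQL Server (schema and table names are hypothetical):

CREATE SCHEMA common;
GO
CREATE SCHEMA client001;
GO
-- Shared reference data lives in the common schema...
CREATE TABLE common.Country (
    CountryCode CHAR(2) NOT NULL PRIMARY KEY,
    CountryName NVARCHAR(100) NOT NULL
);
-- ...while customer-specific data lives in that customer's own schema.
CREATE TABLE client001.Orders (
    OrderId     INT NOT NULL PRIMARY KEY,
    CountryCode CHAR(2) NOT NULL REFERENCES common.Country (CountryCode),
    Amount      MONEY NOT NULL
);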
Also note that -- as an optimization -- you may not need to add siteId column to EVERY table.
Sometimes you have a "contains" relationship. It's a master-detail FK, often defined with a cascade delete so that detail cannot exist without the parent. In this case, the children don't need siteId because they don't have an independent existence.
Your first step will be to determine if these databases even have the same structure. Even if you think they do, you need to compare them to make sure they do. Chances are there will be some that are customized or missed an upgrade cycle or two.
Now, depending on the number of clients and the number of records per client, your tables may get huge. Are you sure this will not create a performance problem? At any rate, you may need to take a fresh look at indexing. You may need a much more powerful set of servers, and may also need to partition by client anyway for performance.
Next, yes each table will need a site id of some sort. Further, depending on your design, you may have primary keys that are now no longer unique. You may need to redefine all primary keys to include the siteid. Always index this field when you add it.
Now all your queries, stored procs, views and UDFs will need to be rewritten to ensure that the siteid is part of them. Pay particular attention to any dynamic SQL. Otherwise you could be showing client A's information to client B. Clients don't tend to like that. We brought a client from a separate database into the main application one time (when they decided they didn't want to pay for a separate server any more). The developer missed just one place where client_id had to be added. Unfortunately, that sent emails concerning this client's proprietary information to every client, and to make matters worse, it was a nightly process that ran in the middle of the night, so it wasn't known about until the next day. (The developer was very lucky not to get fired.) The point is: be very, very careful when you do this, and test, test, test, and then test some more. Make sure to test all the automated behind-the-scenes stuff as well as the UI.
What I was explaining in Florence towards the end of last year applies if you have to keep the database names and the logical layer of the database the same for the application. In that case, you'd do the following:
Collapse all the data into consolidated tables in one master, consolidated database (hereafter referred to as the consolidated DB).
Those tables would have to have an identifier like SiteID.
Create the new databases with the existing names.
Create views with the old table names which use row-level security to query the tables in the consolidated DB, using the SiteID to filter (a sketch of such a view follows after this list).
Set up the databases for cross-database ownership chaining so that the service accounts can't "accidentally" query the base tables in the consolidated DB. Access must happen through the views or through stored procedures and other constructs that will enforce row-level security. Now, if it's the same service account for all sites, you can avoid the cross DB ownership chaining and assign the rights on the objects in the consolidated DB.
Rewrite the stored procedures to either handle the change (since they now refer to views and don't know to hit the base tables and include SiteID) or use INSTEAD OF triggers on the views to intercept update requests and put the appropriate site-specific information into the base tables.
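A sketch of the kind of view that would live in each (now thin) per-site database; the names and the hard-coded SiteID are hypothetical, and writes would be handled by the INSTEAD OF triggers or stored procedures from the last two steps:

-- Keeps the old table name the application expects, but only exposes
-- this site's rows from the consolidated database.
CREATE VIEW dbo.Orders
AS
SELECT o.OrderId, o.CustomerId, o.OrderDate, o.Amount
FROM   ConsolidatedDB.dbo.Orders AS o
WHERE  o.SiteId = 42;    -- this database's SiteID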
If the data is large you could look at using a partitioned view. This would simplify your access code, as all you'd have to maintain is the view; however, if the data is not large, just add a column to identify the customer.
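For reference, a partitioned view is just a UNION ALL over member tables whose CHECK constraints partition the key, so the optimizer can skip the irrelevant members. A hedged sketch with hypothetical names:

CREATE TABLE dbo.Orders_Site1 (
    SiteId  INT   NOT NULL CHECK (SiteId = 1),
    OrderId INT   NOT NULL,
    Amount  MONEY NOT NULL,
    CONSTRAINT PK_Orders_Site1 PRIMARY KEY (SiteId, OrderId)
);
CREATE TABLE dbo.Orders_Site2 (
    SiteId  INT   NOT NULL CHECK (SiteId = 2),
    OrderId INT   NOT NULL,
    Amount  MONEY NOT NULL,
    CONSTRAINT PK_Orders_Site2 PRIMARY KEY (SiteId, OrderId)
);
GO
-- The access code only ever references the view; queries filtering on
-- SiteId touch just the relevant member table.
CREATE VIEW dbo.AllOrders
AS
SELECT SiteId, OrderId, Amount FROM dbo.Orders_Site1
UNION ALL
SELECT SiteId, OrderId, Amount FROM dbo.Orders_Site2;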
Depending on what the data is and your security requirements the threat of cross contamination may be a show stopper.
Assuming you have considered this and deem it "safe enough", you may need/want to create VIEWs or impose some other access control to prevent customers from seeing each other's data.
IIRC a product called "Trusted Oracle" had the ability to partition data based on such a key (about the time Oracle 7 or 8 was out). The idea was that any given query would automagically have "and sourceKey = #userSecurityKey" (or some such) appended. The feature may have been rolled into later versions of the popular commercial product.
To expand on Gregory's answer, you can also make a parent SSIS package that calls the package doing the actual moving inside a Foreach Loop container.
The parent package queries a config table and puts the result in an object variable. The Foreach Loop then uses this recordset to pass variables to the child package, such as the database name and any other details the package might need.
Your table could list all of your client databases and have a flag to mark when you are ready to move them. This way you are not sitting around running the SSIS package on 32,767 databases. I'm hooked on the Foreach Loop in SSIS.
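A sketch of what that config table and the parent package's query might look like (names are hypothetical):

CREATE TABLE dbo.ClientDatabaseConfig (
    DatabaseName SYSNAME  NOT NULL CONSTRAINT PK_ClientDatabaseConfig PRIMARY KEY,
    ServerName   SYSNAME  NOT NULL,
    ReadyToMove  BIT      NOT NULL CONSTRAINT DF_Config_Ready DEFAULT (0),
    MovedOn      DATETIME NULL
);

-- The parent package loads only the flagged databases into its object
-- variable, and the Foreach Loop iterates over that recordset.
SELECT DatabaseName, ServerName
FROM   dbo.ClientDatabaseConfig
WHERE  ReadyToMove = 1
  AND  MovedOn IS NULL;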