Best Approach for Flat table representing data from multiple sources - sql

We are using SQL Server 2014.
We have a couple dozen tables that we create reports against. What is the best approach to report against all these tables with respect to SQL? Our need is a single flat table with select/specific columns from a number of these tables.
Options:
a) Create views across the existing live tables, using their relationships, to produce a single representation of our data (a sketch of this is shown below)
b) Create a scheduled process that runs at different times of the day to extract columnar data and insert/update an existing table (then perform reporting against that table)
c) Rely on C#/data access code to collect the data and push it to the reporting tool
d) Others?
Is there an optimized way to use data warehouse features within SQL Server to create this single aggregated table?
thx
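For illustration, a minimal sketch of option a) as a single view, assuming a handful of hypothetical live tables and columns (Orders, OrderDetails, Customers and Products stand in for the real names):

-- Hypothetical sketch of option a): a view that flattens selected columns
-- from several live tables into a single reporting shape.
CREATE VIEW dbo.vw_ReportingFlat
AS
SELECT o.OrderID,
       o.OrderDate,
       c.CustomerName,
       p.ProductName,
       d.Quantity,
       d.UnitPrice
FROM dbo.Orders AS o
JOIN dbo.OrderDetails AS d ON d.OrderID = o.OrderID
JOIN dbo.Customers AS c ON c.CustomerID = o.CustomerID
JOIN dbo.Products AS p ON p.ProductID = d.ProductID;

Option b) would be essentially the same query wrapped in a stored procedure that reloads a physical reporting table on a SQL Agent schedule, trading freshness for faster reads.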

Related

Query all tables within a Snowflake Schema

Due to the way our database is stored, we have tables for each significant event that occurs within a product's life:
Acquired
Sold
Delivered
I need to go through and find the status of a product at any given time. In order to do so I'd need to query all of the tables within the schema and find the most up-to-date record. I know this is possible by union-ing all the tables and then finding the MAX timestamp, but I wonder if there's a more elegant solution?
Is it possible to query all tables by just querying the root schema or database? Is there a way to loop through all tables within the schema and substitute that into the FROM clause?
Any help is appreciated.
Thanks
You could write a Stored Procedure but, IMO, that would only be worth the effort (and more elegant) if the list of tables changed regularly.
If the list of tables is relatively fixed then creating a UNION statement is probably the most elegant solution and relatively trivial to create - if you plan to use it regularly then just create it as a View.
The way I always approach this type of problem (creating the same SQL for multiple tables) is to dump the list of tables out into Excel, generate the SQL statement for the first table using functions, copy this function down for all the table names, and then concatenate all these statements in a final function. You can then just paste this text back into your SQL editor.
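For example, a minimal sketch of the UNION-as-a-view idea, assuming each event table exposes a ProductID and an EventTimestamp column (both column names are hypothetical):

-- Hypothetical sketch: latest event, and therefore current status, per product.
CREATE VIEW vw_ProductStatus
AS
WITH AllEvents AS (
    SELECT ProductID, EventTimestamp, 'Acquired' AS Status FROM Acquired
    UNION ALL
    SELECT ProductID, EventTimestamp, 'Sold' AS Status FROM Sold
    UNION ALL
    SELECT ProductID, EventTimestamp, 'Delivered' AS Status FROM Delivered
)
SELECT ProductID, EventTimestamp, Status
FROM (
    SELECT ProductID, EventTimestamp, Status,
           ROW_NUMBER() OVER (PARTITION BY ProductID
                              ORDER BY EventTimestamp DESC) AS rn
    FROM AllEvents
) ranked
WHERE rn = 1;

Adding a new event table means adding one more UNION ALL branch to the view, which is exactly the kind of repetitive change that is easy to generate from a spreadsheet of table names.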

How to dynamically use ARCMAIN or just partially copy tables in Teradata?

I need to copy quite a lot of tables from one Teradata server to another. To solve this problem, I have been advised to use ARCMAIN. A table can be transferred this way:
logon ZZZZ/YYYY,XXXX;
COPY DATA TABLES
(DATABASENAME.TABLENAME11) (FROM(DATABASENAME.TABLENAME1)),
(DATABASENAME.TABLENAME12) (FROM(DATABASENAME.TABLENAME2)),
(DATABASENAME.TABLENAME13) (FROM(DATABASENAME.TABLENAME3)),
RELEASE LOCK,
FILE=NVDSID1;
However, I have some tables with the same names in different databases. In addition, these tables need to have just their structure and some of their rows transferred (say, WHERE service_quality = 'epic'). Is there a solution for copying tables only partially?
Initially, I had figured out quite a different way:
1) Copy the structure of all these tables to a temp database
2) Insert just the required rows into them
3) Copy these tables to the required DB on the other server
But, once again, tables with the same names ruin this solution; they just can't simply be placed into the same DB. Is it possible to do these 3 steps in a loop, adding one more step (drop table) to avoid conflicts?
Creating 100 temp DBs is a really bad solution, and there are already tables with names 30 characters long.
Any ideas?
What we have done in our environment is to build a set of databases where we populate tables containing slices of data that are destined for another environment. This is very similar to your approach except we use multiple databases and don't have redundant object names. We then use ARCMAIN to ARCHIVE these tables and COPY them to the destination environment.
If you have multiple tables that share the same name across databases, I would suggest you create multiple databases to seed the slices of data, unless the table structures are the same and the intent is to merge the slices on the target environment. Then you can merge the data in these seed tables for your archive process.
Other solutions include using FastExport and FastLoad, or Teradata's Data Mover. The latter will likely require additional licensing from Teradata if you are not already using it. The former, being script driven, can be more flexible than ARCMAIN in accommodating the needs of your particular environment.
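For illustration, seeding one such slice table in Teradata SQL might look like this (STAGE_DB is a hypothetical seed database; the filter is taken from the question):

-- Hypothetical sketch: copy only the structure, then only the required rows,
-- into a separate seed database that ARCMAIN later archives and copies.
CREATE TABLE STAGE_DB.TABLENAME1 AS DATABASENAME.TABLENAME1 WITH NO DATA;

INSERT INTO STAGE_DB.TABLENAME1
SELECT *
FROM DATABASENAME.TABLENAME1
WHERE service_quality = 'epic';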

Union to already massive table vs. create new table

I have a table that I use for creating reports using PowerPivot in Excel. The table has 30 columns and ~2M rows, and because of this massive number of columns and rows, the Excel file is already ~20MB (and yes, I use every single column in that table).
The business has asked to add new data to the Excel file, which will add three new columns and ~1M rows. I'm worried about what that will do to my Excel file and the impact on its size. Unfortunately, I cannot use a join because the existing table has a different hierarchy level.
I was thinking of modifying the stored procedures behind the existing table to union in the table that includes the new data, instead of pulling the new table into Excel and creating a join within the PowerPivot data model.
What's the "better" approach here? Will the Excel file be smaller if I do the join within Excel or the union within SQL?
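For context, such a union would pad the three new columns with NULL for the existing rows, roughly like this (column names and types are hypothetical):

-- Hypothetical sketch: extend the existing result set with the new data,
-- padding the three new columns with NULL for the old rows.
SELECT ExistingCol1,
       ExistingCol2,
       CAST(NULL AS int) AS NewCol1,
       CAST(NULL AS varchar(50)) AS NewCol2,
       CAST(NULL AS date) AS NewCol3
FROM dbo.ExistingReportTable
UNION ALL
SELECT ExistingCol1,
       ExistingCol2,
       NewCol1,
       NewCol2,
       NewCol3
FROM dbo.NewDataTable;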
The better approach is: don't feed Excel massive union-style tables in order to run pivot queries.
OLAP cubes were invented for exactly that purpose, and Excel has been able to connect to this kind of data source for quite some time.
To create the cube using Microsoft's SQL Server, take a look at Analysis Services. The name has changed several times, and other DB vendors also have this kind of support.
The basic concept is the same: pump your data into an efficient, normalized, cross-indexed structure, and use Excel and other tools to analyze that structure.
The drawback is that you will not see the data "online", and in the case of Microsoft's SQL Server you need to purchase a special license to be able to use it.

sql server - create an indexed view using query results that concatenate row results into 1 column

I am trying to do this:
Concatenate many rows into a single text string?
And I want to join the query results against other tables, so I want the CSV queries to be an indexed view.
I tried the CTE and XML queries to get the csv results and created views using these queries. But SQL Server prevented me from creating an index on these views because CTE and subqueries are not allowed for indexed views.
Are there any other good ways to be able to join a large CSV result set against other tables and still get fast performance? Thanks
Another way is to do the materialization yourself. You create a table with the required structure and fill it with the content of your SELECT. After that, you track changes manually and keep the data in your "cache" table current. You can do this with triggers on ALL the tables involved in the base SELECT (synchronous, but a LOT of pain in complex systems) or by asynchronous processing (jobs, a self-written service, analysis of CDC logs, etc.).
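A minimal sketch of that "cache" table idea, refreshed from a scheduled job and using the classic FOR XML PATH concatenation (all object names are hypothetical):

-- Hypothetical sketch: rebuild a manually materialized "cache" table that
-- stores the concatenated CSV per parent row; run this from a SQL Agent job.
TRUNCATE TABLE dbo.OrderTagsCsv;

INSERT INTO dbo.OrderTagsCsv (OrderID, TagsCsv)
SELECT o.OrderID,
       STUFF((SELECT ',' + t.TagName
              FROM dbo.OrderTags AS t
              WHERE t.OrderID = o.OrderID
              ORDER BY t.TagName
              FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'), 1, 1, '')
FROM dbo.Orders AS o;

With an ordinary index on dbo.OrderTagsCsv (OrderID), joins against other tables stay fast; the trade-off is that the CSV column is only as fresh as the last refresh.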

Is a snowflake schema better than a star schema for data mining?

I know the basic difference between a star schema and a snowflake schema: a snowflake schema breaks dimension tables down into multiple tables in order to normalize them, while a star schema has only one "level" of dimension tables. But the Wikipedia article for the snowflake schema says:
"Some users may wish to submit queries to the database which, using conventional multidimensional reporting tools, cannot be expressed within a simple star schema. This is particularly common in data mining of customer databases, where a common requirement is to locate common factors between customers who bought products meeting complex criteria. Some snowflaking would typically be required to permit simple query tools to form such a query, especially if provision for these forms of query weren't anticipated when the data warehouse was first designed."
When would it be impossible to write a query in a star schema that could be written in a snowflake schema for the same underlying data? It seems like a star schema would always allow the same queries.
For data mining, you almost always have to prepare your data -- mostly as one "flat table".
It may be a query, prepared view or CSV export -- depends on the tool and your preference.
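For illustration, the flat table is typically just the fact table denormalized with its dimensions, along the lines of this hypothetical star-schema query:

-- Hypothetical sketch: flatten a star schema into one wide table for a
-- data mining tool (export the result set or wrap the query in a view).
SELECT f.SalesAmount,
       f.Quantity,
       d.CalendarYear,
       d.MonthName,
       c.CustomerSegment,
       p.ProductCategory
FROM FactSales AS f
JOIN DimDate AS d ON d.DateKey = f.DateKey
JOIN DimCustomer AS c ON c.CustomerKey = f.CustomerKey
JOIN DimProduct AS p ON p.ProductKey = f.ProductKey;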
Now, to properly understand that article, one would probably have to smoke-drink the same thing as the author when he/she wrote it.
As you mention, preparing a flat table for data mining starting from a relational database is no simple task, and the snowflake or the star schema only work up to a point.
However, there is a piece of software called Dataconda that automatically creates a flat table from a DB.
Basically, you select a target table in a relational database, and Dataconda "expands" it by adding thousands of new attributes to it; these attributes are obtained by executing complex queries involving multiple tables.